GitHub Copilot Leaks Secrets: AI Autocompletion Suggests Real Credentials from Training Data

Security researchers demonstrated that GitHub Copilot would suggest real API keys, passwords, and private credentials as code completions — because those secrets had been present in public GitHub repositories that formed its training data.

GitHub / Developers Globally · 2022 · 2 min read

Background

GitHub Copilot, launched in 2021, was trained on billions of lines of public code from GitHub repositories. Those repositories included code that had secrets accidentally committed: AWS keys, API tokens, passwords, and certificates. The AI memorised some of this data and could reproduce it as suggestions.

The Attack

Security researchers from Stanford University studied GitHub Copilot's suggestions at code positions that typically follow a credential declaration. In 40% of the cases where Copilot suggested a credential, the suggested string was a real value from the training data rather than a synthesised placeholder. Researchers could prompt Copilot to suggest real AWS keys simply by writing code that declared an AWS client. GitHub had policies against storing secrets in code and offered its own secret scanning feature, but Copilot was reproducing secrets committed before those policies were fully adopted.
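As a rough illustration (the names and prompt below are hypothetical, not the researchers' actual test harness), the trigger pattern amounts to declaring a cloud client and letting autocomplete fill in the credential argument. A simple format check can flag completions that have the shape of a live AWS access key rather than an obvious placeholder:

```python
import re

# AWS access key IDs have a fixed shape: "AKIA" followed by 16
# uppercase alphanumeric characters. A completion matching this
# shape deserves a manual review before it is accepted.
AWS_ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def looks_like_aws_key(suggestion: str) -> bool:
    """Return True if a suggested string is shaped like a real AWS access key ID."""
    return AWS_ACCESS_KEY_RE.search(suggestion) is not None

# The kind of prompt that elicited real keys: a client declaration
# with the credential argument left for the model to complete.
prompt = 'client = boto3.client("s3", aws_access_key_id='

# AWS's documented example key matches the format and is flagged;
# an obvious placeholder is not.
print(looks_like_aws_key("AKIAIOSFODNN7EXAMPLE"))  # True
print(looks_like_aws_key("YOUR_KEY_HERE"))         # False
```

A check like this cannot distinguish a memorised real key from a randomly generated one, which is exactly why any suggested credential string needs human review.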

Response

GitHub invested in filtering known secrets from Copilot suggestions and worked with AWS and other providers to identify and revoke keys that appeared in suggestions. Secret scanning was enhanced. GitHub published guidance on using Copilot safely. AWS and other cloud providers committed to automatic revocation of keys detected in public code.
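GitHub's actual filtering pipeline is not public, but the output-side defence can be sketched as a pass that redacts anything matching a known credential pattern before a completion is shown to the user (the patterns below are illustrative, not GitHub's real rule set):

```python
import re

# Hypothetical pattern list; a production filter would cover many
# more providers and also match against hashes of known leaked keys.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # GitHub personal access token
]

def redact_secrets(suggestion: str) -> str:
    """Replace anything shaped like a known credential before surfacing the completion."""
    for pattern in SECRET_PATTERNS:
        suggestion = pattern.sub("<redacted>", suggestion)
    return suggestion

completion = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'
print(redact_secrets(completion))  # aws_access_key_id = "<redacted>"
```

Filtering at suggestion time is a mitigation, not a cure: the secrets remain memorised in the model, so revocation of the underlying keys is still necessary.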

Outcome

The incident raised fundamental questions about AI training data and memorisation of sensitive information. GitHub rotated all secrets it could identify. The research drove adoption of pre-commit hooks that scan for secrets before commits reach any repository.
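The pre-commit scanning idea can be sketched in a few lines (the pattern set is a minimal assumption; real tools such as git-secrets and truffleHog ship far larger and more sophisticated rule sets):

```python
import re

# Minimal illustrative patterns; real scanners cover many providers.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_for_secrets(staged_text: str) -> list[str]:
    """Return the names of any secret patterns present in staged content."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(staged_text)]

# A pre-commit hook would feed this the output of `git diff --cached`
# and abort the commit (non-zero exit) when the list is non-empty.
print(scan_for_secrets('aws_key = "AKIAIOSFODNN7EXAMPLE"'))  # ['AWS access key']
```

Running this before the commit is created keeps the secret out of history entirely, which matters because a secret pushed even briefly to a public repository must be treated as compromised.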

Key Takeaways

  1. Never commit secrets to version control — use environment variables, secrets managers, or .env files excluded from git
  2. Pre-commit hooks that scan for secrets (git-secrets, truffleHog) should be mandatory in all development workflows
  3. AI code completion tools may suggest real credentials from their training data — review all suggested credential strings
  4. AWS, Google Cloud, and GitHub have automatic secret detection and revocation — enable these features immediately
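Takeaway 1 can be put into practice with a small helper that reads secrets from the environment and fails loudly when one is missing, so no credential string ever appears in source (the variable name below is a hypothetical example):

```python
import os

def get_required_secret(name: str) -> str:
    """Fetch a secret from the environment, failing loudly if it is absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Missing required secret: set the {name} environment variable"
        )
    return value

# Usage: the key lives in the deployment environment or a secrets
# manager, never in version control.
# api_key = get_required_secret("MY_SERVICE_API_KEY")  # hypothetical name
```

Failing at startup when a variable is unset is deliberate: a silent empty default tends to surface much later as a confusing authentication error.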
GitHub Copilot · credential leakage · AI training data · secrets in code · API keys