GitHub Copilot Leaks Secrets: AI Autocompletion Suggests Real Credentials from Training Data

Security researchers demonstrated that GitHub Copilot would suggest real API keys, passwords, and private credentials as code completions — because those secrets had been present in public GitHub repositories that formed its training data.

GitHub / Developers Globally · 2022 · 2 min read

Background

GitHub Copilot, launched in 2021, was trained on billions of lines of public code from GitHub repositories. Those repositories included code that had secrets accidentally committed: AWS keys, API tokens, passwords, and certificates. The AI memorised some of this data and could reproduce it as suggestions.

The Attack

Security researchers from Stanford University studied GitHub Copilot's suggestions at code positions that typically follow a credential declaration. In 40% of the cases where Copilot suggested a credential, the suggested string was a real value from the training data rather than a synthesised placeholder. Researchers could prompt Copilot to suggest real AWS keys simply by writing code that declared an AWS client. GitHub had policies against storing secrets in code and offered its own secret scanning feature, but Copilot was reproducing secrets committed before those policies were fully adopted.
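As a rough illustration (the names and prompt below are hypothetical, not the researchers' actual test harness), the trigger pattern amounts to declaring a cloud client and letting autocomplete fill in the credential argument. A simple format check can flag completions that have the shape of a live AWS access key rather than an obvious placeholder:

```python
import re

# AWS access key IDs have a fixed shape: "AKIA" followed by 16
# uppercase alphanumeric characters. A completion matching this
# shape deserves a manual review before it is accepted.
AWS_ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def looks_like_aws_key(suggestion: str) -> bool:
    """Return True if a suggested string is shaped like a real AWS access key ID."""
    return AWS_ACCESS_KEY_RE.search(suggestion) is not None

# The kind of prompt that elicited real keys: a client declaration
# with the credential argument left for the model to complete.
prompt = 'client = boto3.client("s3", aws_access_key_id='

# AWS's documented example key matches the format and is flagged;
# an obvious placeholder is not.
print(looks_like_aws_key("AKIAIOSFODNN7EXAMPLE"))  # True
print(looks_like_aws_key("YOUR_KEY_HERE"))         # False
```

A check like this cannot distinguish a memorised real key from a randomly generated one, which is exactly why any suggested credential string needs human review.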

Response

GitHub invested in filtering known secrets from Copilot suggestions and worked with AWS and other providers to identify and revoke keys that appeared in suggestions. Secret scanning was enhanced. GitHub published guidance on using Copilot safely. AWS and other cloud providers committed to automatic revocation of keys detected in public code.
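GitHub's actual filtering pipeline is not public, but the output-side defence can be sketched as a pass that redacts anything matching a known credential pattern before a completion is shown to the user (the patterns below are illustrative, not GitHub's real rule set):

```python
import re

# Hypothetical pattern list; a production filter would cover many
# more providers and also match against hashes of known leaked keys.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key ID
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # GitHub personal access token
]

def redact_secrets(suggestion: str) -> str:
    """Replace anything shaped like a known credential before surfacing the completion."""
    for pattern in SECRET_PATTERNS:
        suggestion = pattern.sub("<redacted>", suggestion)
    return suggestion

completion = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'
print(redact_secrets(completion))  # aws_access_key_id = "<redacted>"
```

Filtering at suggestion time is a mitigation, not a cure: the secrets remain memorised in the model, so revocation of the underlying keys is still necessary.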

Outcome

The incident raised fundamental questions about AI training data and memorisation of sensitive information. GitHub rotated all secrets it could identify. The research drove adoption of pre-commit hooks that scan for secrets before commits reach any repository.
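The pre-commit scanning idea can be sketched in a few lines (the pattern set is a minimal assumption; real tools such as git-secrets and truffleHog ship far larger and more sophisticated rule sets):

```python
import re

# Minimal illustrative patterns; real scanners cover many providers.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_for_secrets(staged_text: str) -> list[str]:
    """Return the names of any secret patterns present in staged content."""
    return [name for name, pattern in SECRET_PATTERNS.items()
            if pattern.search(staged_text)]

# A pre-commit hook would feed this the output of `git diff --cached`
# and abort the commit (non-zero exit) when the list is non-empty.
print(scan_for_secrets('aws_key = "AKIAIOSFODNN7EXAMPLE"'))  # ['AWS access key']
```

Running this before the commit is created keeps the secret out of history entirely, which matters because a secret pushed even briefly to a public repository must be treated as compromised.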

Key Takeaways

  1. Never commit secrets to version control — use environment variables, secrets managers, or .env files excluded from git
  2. Pre-commit hooks that scan for secrets (git-secrets, truffleHog) should be mandatory in all development workflows
  3. AI code completion tools may suggest real credentials from their training data — review all suggested credential strings
  4. AWS, Google Cloud, and GitHub have automatic secret detection and revocation — enable these features immediately
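Takeaway 1 can be put into practice with a small helper that reads secrets from the environment and fails loudly when one is missing, so no credential string ever appears in source (the variable name below is a hypothetical example):

```python
import os

def get_required_secret(name: str) -> str:
    """Fetch a secret from the environment, failing loudly if it is absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Missing required secret: set the {name} environment variable"
        )
    return value

# Usage: the key lives in the deployment environment or a secrets
# manager, never in version control.
# api_key = get_required_secret("MY_SERVICE_API_KEY")  # hypothetical name
```

Failing at startup when a variable is unset is deliberate: a silent empty default tends to surface much later as a confusing authentication error.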
GitHub Copilot · credential leakage · AI training data · secrets in code · API keys