Agentic AI Security: Best Practices for Developers Building AI Agents
AI agents can write code, send emails, push to GitHub, and access production systems. That autonomy is the whole point — and it's exactly what makes them dangerous when built without security in mind.
In August 2024, security researchers demonstrated how Slack AI could be manipulated through indirect prompt injection — malicious instructions hidden in private channel messages that tricked the AI into summarizing sensitive conversations and routing them to an external address. Nobody exploited a vulnerability in Slack's backend. No CVE was filed. The attacker just wrote some text that the AI followed.
That's the world we're building in now.
AI agents are no longer novelties. Teams are deploying them to manage codebases, respond to customer inquiries, monitor infrastructure, and execute business workflows. The productivity gains are real. So are the risks — and they're categorically different from anything in traditional software security.
This post covers what those risks actually are, how to address them, and what we've learned building and operating agentic AI systems at NYClaw.io.
1. The Unique Security Challenges of Agentic AI
Traditional software does exactly what it's programmed to do. The security surface is well-understood: injection vulnerabilities, authentication flaws, misconfigured permissions. Decades of tooling, education, and practice have made these manageable.
Agents are different. They make decisions. They interpret context. They take multi-step actions across systems they weren't specifically programmed to touch. This creates attack surfaces that didn't exist before.
Autonomy Amplifies Every Risk
When a traditional application has a bug, it produces a wrong output. When an agent has a bug — or gets manipulated — it can take a sequence of wrong actions across multiple systems before anyone notices. An agent that manages email, code, and calendar access doesn't just return a bad value. It might send an email, push a commit, and reschedule three meetings before you realize something went wrong.
The blast radius of an agent error scales with the agent's access level. This is not theoretical. Teams running agents with broad permissions have watched them:
- Commit debug code containing API keys to shared repositories
- Send draft emails that were not ready for external recipients
- Delete files while "cleaning up" what they incorrectly classified as temporary
- Grant permissions to resources based on ambiguous instructions
Prompt Injection: The Attack Nobody Trained for
Prompt injection is to AI agents what SQL injection was to web applications in the early 2000s — pervasive, underestimated, and not going away.
The attack is simple: an adversary embeds instructions inside content the agent is expected to process. The agent, unable to reliably distinguish "data I'm analyzing" from "instructions I should follow," acts on the malicious input.
Indirect prompt injection is more insidious. The attacker doesn't need to communicate with the agent directly. They just need to get their instructions into any content the agent will read: a webpage, a document, an email, a customer support ticket. When the agent processes that content, the attack executes.
A 2026 review in Information (MDPI) documented multiple critical vulnerabilities demonstrating how mature AI agents can be compromised through prompt injection in contexts where the AI performs actions with real-world consequences. This is not a niche academic concern. It's happening in production systems today.
The Credential Exposure Problem
Agents need credentials to function. They need API keys to call services, tokens to authenticate to platforms, connection strings to access databases. The operational convenience of having an agent that "just works" with all its tools creates enormous pressure to store credentials in accessible locations.
That pressure has produced a consistent pattern: credentials end up committed to version control. Research tools find API keys. Coding agents embed tokens in config files that get staged and committed. Infrastructure agents store connection strings in documentation that lives in repositories.
Once a credential hits a git commit — even in a private repository — it exists in history permanently until explicitly purged. Most teams don't purge. Many don't even know to look.
2. Credential Management: The One Rule That Cannot Break
There is one absolute rule in agentic AI systems: credentials never touch version control. Not even in private repositories. Not "just temporarily." Not "just for testing."
This rule is harder to follow than it sounds, because agents make credential management inconvenient by design. They're supposed to be autonomous. They need access. The path of least resistance is to put the key somewhere the agent can find it — and that somewhere is often a config file that eventually ends up in a commit.
The Right Credential Architecture
The answer is a strict separation between code and secrets:
✅ Safe credential locations:
- Environment variables injected at runtime
- System keychain (macOS Keychain, Windows Credential Manager)
- Dedicated secrets managers (HashiCorp Vault, AWS Secrets Manager)
- CI/CD secret stores (GitHub Secrets, GitLab CI Variables)
❌ Never store credentials:
- In any file tracked by git
- In documentation or markdown files ("for reference")
- In code comments
- In hardcoded strings, even in "internal" tools
When Credentials Are Exposed: The Response Protocol
Speed matters here. Every minute a compromised credential is active is a minute an attacker can use it. The moment you discover a credential in version control:
- Rotate immediately at the source. Go to the API provider, Discord Developer Portal, AWS console — wherever — and regenerate the key or token. Do this before anything else.
- Remove from the codebase. Delete the file or string containing the credential.
- Purge from git history. Use
git-filter-repo(preferred) or BFG Repo Cleaner to rewrite history and remove the credential from every commit. - Force push all branches. The rewritten history needs to replace the remote.
- Assume it's already compromised. Treat the old credential as burned regardless of whether you can confirm a breach.
The critical mindset shift: removing a credential from current code does not remove it from history. History rewrites are mandatory, not optional.
Building Prevention Into Your Workflow
Prevention is cheaper than remediation. Some concrete tools:
- GitHub Secret Scanning: Automatically detects common credential patterns in commits and alerts you (or blocks the push with push protection enabled)
- pre-commit hooks: Tools like
detect-secretsortruffleHogcan scan staged changes before a commit completes - Comprehensive .gitignore: All
.envfiles, config files with credential fields, and runtime secrets should be excluded from tracking by default
3. Privacy by Design: What AI Agents Get Wrong
AI agents that are useful tend to accumulate context. They remember conversations, store user preferences, log interactions, and build rich pictures of the people they work with. That context is what makes them valuable — and it's also a significant privacy liability.
The Context Accumulation Problem
An agent that has access to email, calendar, documents, and chat will inevitably develop a detailed profile of its user. The problem isn't the profile itself — it's what happens when:
- The agent operates in a shared environment (group chats, collaborative tools)
- The agent's memory files are stored in locations with broader access than intended
- The agent summarizes or references private context in semi-public outputs
- Another user manipulates the agent into surfacing information about someone else
Data Classification Before Data Access
Before an agent is given access to any data store, that data should be classified. The classification determines what the agent can do with it:
| Tier | Examples | Agent Access Rule |
|---|---|---|
| Critical | API keys, credentials, private keys | Never store in plaintext; inject at runtime only |
| Sensitive | Client info, financials, business strategy | Private storage only; never surface in public channels |
| Internal | Task lists, project plans, internal metrics | OK within team context; not for external sharing |
| Public | Blog posts, marketing copy, documentation | Safe for public repos and channels |
Transparency as a Security Property
AI agents that operate without transparency are security risks, not just ethical concerns. When users don't know what an agent is doing, they can't catch errors. When there's no audit trail, incidents can't be investigated. When the agent's reasoning is opaque, trust erodes.
Build transparency in from the start:
- Log every significant agent action with timestamp, action taken, and reasoning
- Surface agent reasoning to end users when it affects them
- Provide clear opt-out paths for data collection and processing
- Distinguish clearly between what the agent decided vs. what the user instructed
4. GitHub as Your Security Backbone
For developers building agentic AI systems, GitHub is where many of the most critical security decisions play out. What goes into repositories, what's public versus private, how history is managed — these decisions have lasting consequences.
Repository Visibility Is a Security Decision
The default impulse to make repositories public — for portfolio purposes, for collaboration, for open-source credibility — creates real risk when those repositories contain strategy documents, internal tooling configs, or anything that was "accidentally" committed.
A pattern we've seen repeatedly: a developer creates a repository for a client project or internal tool, sets it to public out of habit, and then commits a strategy document, pricing model, or configuration file containing API keys. GitHub's crawlers index the content within minutes. Secret scanning bots scrape new commits continuously.
The rule is simple: when in doubt, private. A repository can always be made public later. History cannot be unseen once public.
Treating Git History as Permanent
Many developers know not to commit credentials. Fewer understand that deleting a file doesn't remove it from git history, and that even after a deletion commit, the credential is accessible via git log, git show, or any tool that accesses the full repository object store.
This matters doubly for AI agents, which often write their own commits. An agent that generates configuration as part of a setup workflow, commits that configuration (with embedded credentials), and then "cleans up" by deleting the file has left credentials in history permanently.
The correct remediation is history rewriting via git-filter-repo, followed by a force push that replaces all remote branches. This is the tool GitHub itself recommends over the older git filter-branch approach.
Branch Protection and Review Gates
Agents that can commit and push directly to production branches are agents that can introduce security issues at scale. Branch protection rules create mandatory review checkpoints:
- Require pull request reviews before merging to
main - Enable required status checks (CI/CD must pass before merge)
- Restrict who (and what) can push directly to protected branches
- Enable GitHub's push protection to block commits containing detected secrets
For AI agents specifically: treat agent-generated commits as requiring human review before they reach production, just as you would with a junior developer's pull request.
5. Building Secure Learning Systems for AI Agents
Agentic AI systems don't just execute — they learn. They accumulate context, refine their understanding of user preferences, and adapt their behavior over time. This learning loop is what makes them powerful. It's also what makes security a continuous practice rather than a one-time setup.
The Post-Incident Learning Loop
Every security incident — whether it's a committed credential, an unauthorized action, or a prompt injection attempt — is an opportunity to improve the system. Teams that treat incidents as isolated failures miss the systemic improvements that would prevent recurrence.
The loop should look like this:
- Contain: Limit the immediate damage (rotate credentials, revert commits)
- Document: Record what happened, specifically — what file, what action, what the consequence was
- Update: Modify checklists, .gitignore rules, or decision trees to prevent recurrence
- Communicate: Surface the incident to relevant stakeholders with the fix attached
- Verify: Confirm the fix actually works in subsequent sessions
Audit Logging as a Security Primitive
For autonomous agents, audit logging isn't optional. It's how you know what happened when something goes wrong, and it's how you maintain accountability when an agent is making decisions independently.
Every significant agent action should be logged with:
- Timestamp and context: When the action occurred and what session/task triggered it
- Action taken: Specific, verifiable description (not "sent email" but "sent email to [recipient] with subject [X]")
- Reasoning: Why the agent took this action — what information or instruction led to it
- Result: What actually happened
- Risk flag: Any security or privacy implications of the action
Logs serve multiple purposes: they enable incident investigation, they create accountability, and they're the foundation for learning. An agent that logs its decisions can be audited, corrected, and improved. An agent that doesn't is a black box.
The Principle of Minimal Authority
The security principle of least privilege translates to AI agents as minimal authority: give agents only the access they need for their specific task, only for as long as they need it.
In practice, this means:
- Don't give a content-generation agent access to your production database
- Don't give a research agent permission to send emails
- Don't give any agent broad file system access when it only needs to read one directory
- Use scoped API tokens (read-only where read-only is sufficient)
- Prefer ephemeral credentials over long-lived tokens where possible
The IBM AI Security team puts it plainly: "Never trust, always verify — treat each tool as untrusted until validated." This is the zero-trust model applied to AI agent architecture, and it's the right mental model.
The Human-in-the-Loop Checkpoint
Not every agent action should require human approval — that defeats the purpose of automation. But certain categories of action should always have a human checkpoint:
- Irreversible actions (sending external communications, deleting data, making purchases)
- High-stakes decisions (anything with significant financial, legal, or reputational consequences)
- Actions outside the agent's pre-approved operational scope
- Anything the agent classifies as ambiguous or uncertain
The Ping Identity framework for AI agent authorization captures this well: "This provides a crucial checkpoint for ensuring that critical actions are reviewed and authorized by a human sponsor or end-user before execution." Build these checkpoints in from the start. Adding them retroactively is much harder.
Building Agentic AI Right
Agentic AI security is not a solved problem. The attack surfaces are new, the best practices are still evolving, and the tools for defense are maturing in real time. But the principles aren't new: least privilege, audit logging, transparency, credential hygiene, and human oversight at critical decision points.
What's different is the consequence of failure. When an agent makes a mistake, it doesn't just return a wrong value — it may take a chain of actions across multiple systems before anyone notices. The blast radius is proportional to the agent's access and autonomy.
That's not an argument against building agents. It's an argument for building them carefully.
At NYClaw.io, we operate a fully autonomous AI assistant (Ainsley) with access to file systems, git repositories, external APIs, and communication channels. We've built these practices through operational experience — including the incidents that taught us what not to do. The internal checklist we follow is available as a companion document for teams that want a more operational reference.
The teams winning with agentic AI right now aren't the ones moving the fastest. They're the ones moving fast with discipline — shipping autonomous systems that earn trust through accountability, not just capability.
Build Smarter Agents with NYClaw.io
We help founders and development teams design, deploy, and secure autonomous AI systems. Whether you're just starting with agentic AI or scaling an existing system, we bring the operational experience to do it right.
Talk to Us →Sources & Further Reading
- Prompt Injection Attacks in LLMs and AI Agent Systems — MDPI Information, January 2026
- Top Agentic AI Security Threats — Stellar Cyber, 2026
- AI Agent Security Best Practices — IBM, 2026
- IAM Best Practices for AI Agents — Ping Identity
- Best Practices of Authorizing AI Agents — Oso
- Removing Sensitive Data from a Repository — GitHub Docs