Agentic AI Security: Best Practices for Developers Building AI Agents

In August 2024, security researchers demonstrated how Slack AI could be manipulated through indirect prompt injection: instructions planted in a channel message led the AI to pull sensitive data out of private channels the attacker could not see and present it inside a link pointing to an attacker-controlled address. Nobody exploited a vulnerability in Slack's backend. No CVE was filed. The attacker just wrote some text that the AI followed.

That's the world we're building in now.

AI agents are no longer novelties. Teams are deploying them to manage codebases, respond to customer inquiries, monitor infrastructure, and execute business workflows. The productivity gains are real. So are the risks — and they're categorically different from anything in traditional software security.

This post covers what those risks actually are, how to address them, and what we've learned building and operating agentic AI systems at NYClaw.io.

1. The Unique Security Challenges of Agentic AI

Traditional software does exactly what it's programmed to do. The security surface is well-understood: injection vulnerabilities, authentication flaws, misconfigured permissions. Decades of tooling, education, and practice have made these manageable.

Agents are different. They make decisions. They interpret context. They take multi-step actions across systems they weren't specifically programmed to touch. This creates attack surfaces that didn't exist before.

Autonomy Amplifies Every Risk

When a traditional application has a bug, it produces a wrong output. When an agent has a bug — or gets manipulated — it can take a sequence of wrong actions across multiple systems before anyone notices. An agent that manages email, code, and calendar access doesn't just return a bad value. It might send an email, push a commit, and reschedule three meetings before you realize something went wrong.

The blast radius of an agent error scales with the agent's access level. This is not theoretical. Teams running agents with broad permissions have watched them:

Commit debug code containing API keys to shared repositories
Send draft emails that were not ready for external recipients
Delete files while "cleaning up" what they incorrectly classified as temporary
Grant permissions to resources based on ambiguous instructions

Prompt Injection: The Attack Nobody Trained For

Prompt injection is to AI agents what SQL injection was to web applications in the early 2000s — pervasive, underestimated, and not going away.

The attack is simple: an adversary puts instructions in front of the model, either by typing them straight into the agent's input (direct injection) or by embedding them inside content the agent is expected to process. The agent, unable to reliably distinguish "data I'm analyzing" from "instructions I should follow," acts on the malicious input.

Indirect prompt injection is more insidious. The attacker doesn't need to communicate with the agent directly. They just need to get their instructions into any content the agent will read: a webpage, a document, an email, a customer support ticket. When the agent processes that content, the attack executes.

A 2026 review in Information (MDPI) documented multiple critical vulnerabilities demonstrating how mature AI agents can be compromised through prompt injection in contexts where the AI performs actions with real-world consequences. This is not a niche academic concern. It's happening in production systems today.
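One partial mitigation for the data-versus-instructions confusion described above is to label external content as untrusted before it reaches the prompt, and to flag text that looks like an injection attempt. The sketch below is illustrative only: the pattern list, the `<untrusted>` delimiter, and the function names are assumptions made for this example, and pattern matching will miss many real attacks, so it complements permission limits and review gates rather than replacing them.

```python
import re

# Hypothetical phrases that often appear in injection attempts (illustrative, not exhaustive).
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the |your )?(system|previous) (prompt|instructions)", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
]


def flag_suspicious(text: str) -> list[str]:
    """Return the patterns that matched, so the caller can log or escalate."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]


def wrap_untrusted(text: str, source: str) -> str:
    """Mark external content as data before it is placed in a prompt."""
    # Neutralize delimiter look-alikes so the content cannot close the block early.
    cleaned = text.replace("<untrusted", "&lt;untrusted").replace("</untrusted", "&lt;/untrusted")
    return f'<untrusted source="{source}">\n{cleaned}\n</untrusted>'
```

A caller might run `flag_suspicious` on every fetched page or ticket, record any hits in the audit log, and pass only the wrapped text to the model alongside a system instruction that content inside the delimiter is never to be followed.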

The Credential Exposure Problem

Agents need credentials to function. They need API keys to call services, tokens to authenticate to platforms, connection strings to access databases. The operational convenience of having an agent that "just works" with all its tools creates enormous pressure to store credentials in accessible locations.

That pressure has produced a consistent pattern: credentials end up committed to version control. Research tools find API keys. Coding agents embed tokens in config files that get staged and committed. Infrastructure agents store connection strings in documentation that lives in repositories.

Once a credential hits a git commit, even in a private repository, it stays in history until it is explicitly purged. Most teams don't purge. Many don't even know to look.

2. Credential Management: The One Rule That Cannot Break

There is one absolute rule in agentic AI systems: credentials never touch version control. Not even in private repositories. Not "just temporarily." Not "just for testing."

This rule is harder to follow than it sounds, because agents make credential management inconvenient by design. They're supposed to be autonomous. They need access. The path of least resistance is to put the key somewhere the agent can find it — and that somewhere is often a config file that eventually ends up in a commit.

The Right Credential Architecture

The answer is a strict separation between code and secrets:

✅ Safe credential locations:

Environment variables injected at runtime
System keychain (macOS Keychain, Windows Credential Manager)
Dedicated secrets managers (HashiCorp Vault, AWS Secrets Manager)
CI/CD secret stores (GitHub Secrets, GitLab CI Variables)

❌ Never store credentials:

In any file tracked by git
In documentation or markdown files ("for reference")
In code comments
In hardcoded strings, even in "internal" tools
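As a minimal sketch of the first safe location above, the code below reads a credential from an environment variable at runtime and fails fast when it is missing. The names `get_credential`, `MissingCredentialError`, and `redact` are hypothetical helpers invented for this example, not part of any library.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential was not injected at runtime."""


def get_credential(name: str, env=None) -> str:
    """Read a credential from the environment; never fall back to a file or default."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise MissingCredentialError(f"{name} is not set; inject it at runtime")
    return value


def redact(value: str, keep: int = 4) -> str:
    """Mask a secret for log output, keeping only the last few characters."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]
```

Failing at startup is deliberate here: an agent that cannot find its key should stop, not go looking for one in a config file. If a credential must appear in a log line at all, passing it through something like `redact` keeps the full value out of the audit trail.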

When Credentials Are Exposed: The Response Protocol

Speed matters here. Every minute a compromised credential is active is a minute an attacker can use it. The moment you discover a credential in version control:

Rotate immediately at the source. Go to the API provider, Discord Developer Portal, AWS console — wherever — and regenerate the key or token. Do this before anything else.
Remove from the codebase. Delete the file or string containing the credential.
Purge from git history. Use git-filter-repo (preferred) or BFG Repo Cleaner to rewrite history and remove the credential from every commit.
Force push all branches. The rewritten history needs to replace the remote.
Assume it's already compromised. Treat the old credential as burned regardless of whether you can confirm a breach.

The critical mindset shift: removing a credential from current code does not remove it from history. History rewrites are mandatory, not optional.
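The purge step can be prepared with a small helper. This sketch assumes git-filter-repo's `--replace-text` option, which takes a file of expressions where each line may end with `==>` and a replacement string. The helper names are hypothetical, and the functions only build the rules and command lines; they do not run git. Rotation still comes first, because anyone who already cloned the repository keeps a copy of the old history.

```python
def build_replace_text_rules(leaked_values, replacement="***REMOVED***"):
    """Build the contents of an expressions file, one literal rule per leaked value."""
    rules = []
    for value in leaked_values:
        # A value containing a newline or the '==>' separator cannot be
        # written as a single literal rule in this simple form.
        if "\n" in value or "==>" in value:
            raise ValueError("value cannot be expressed as a single literal rule")
        rules.append(f"{value}==>{replacement}")
    return "\n".join(rules) + "\n"


def filter_repo_command(rules_path):
    """Command that rewrites history using the rules file (run it in a fresh clone)."""
    return ["git", "filter-repo", "--replace-text", rules_path]


def force_push_command(remote="origin"):
    """Command that replaces every remote branch with the rewritten history."""
    return ["git", "push", "--force", "--all", remote]
```

The rules file itself contains the leaked values, so it belongs outside the repository and should be deleted once the rewrite is done.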

Building Prevention Into Your Workflow

Prevention is cheaper than remediation. Some concrete tools:

GitHub Secret Scanning: Automatically detects common credential patterns in commits and alerts you (or blocks the push with push protection enabled)
pre-commit hooks: Tools like detect-secrets or TruffleHog can scan staged changes before a commit completes
Comprehensive .gitignore: All .env files, config files with credential fields, and runtime secrets should be excluded from tracking by default
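To make the idea behind these scanners concrete, here is a deliberately small sketch of pattern-based detection over text such as a staged diff. The three patterns cover widely documented formats (AWS access key IDs, classic GitHub personal access tokens, PEM private key headers); the function name and the pattern set are assumptions for illustration, and a real tool such as detect-secrets covers far more cases.

```python
import re

# A tiny, illustrative pattern set; real scanners ship hundreds of detectors.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every line that looks like a secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

A pre-commit hook built on this idea would exit non-zero whenever `find_secrets` returns anything, which blocks the commit until the finding is resolved.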

3. Privacy by Design: What AI Agents Get Wrong

AI agents that are useful tend to accumulate context. They remember conversations, store user preferences, log interactions, and build rich pictures of the people they work with. That context is what makes them valuable — and it's also a significant privacy liability.

The Context Accumulation Problem

An agent that has access to email, calendar, documents, and chat will inevitably develop a detailed profile of its user. The problem isn't the profile itself — it's what happens when:

The agent operates in a shared environment (group chats, collaborative tools)
The agent's memory files are stored in locations with broader access than intended
The agent summarizes or references private context in semi-public outputs
Another user manipulates the agent into surfacing information about someone else

Data Classification Before Data Access

Before an agent is given access to any data store, that data should be classified. The classification determines what the agent can do with it:

Tier	Examples	Agent Access Rule
Critical	API keys, credentials, private keys	Never store in plaintext; inject at runtime only
Sensitive	Client info, financials, business strategy	Private storage only; never surface in public channels
Internal	Task lists, project plans, internal metrics	OK within team context; not for external sharing
Public	Blog posts, marketing copy, documentation	Safe for public repos and channels
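The table above can be enforced in code as a simple ordering check. This is a sketch under assumptions: the `Tier` and `Channel` enums, the `MAX_TIER` mapping, and `may_surface` are names invented here, and the three channel types are one possible reading of the access rules in the table.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    SENSITIVE = 2
    CRITICAL = 3


class Channel(IntEnum):
    PUBLIC = 0   # public repos and channels
    TEAM = 1     # internal team context
    PRIVATE = 2  # private storage only


# Highest tier each channel may carry. CRITICAL appears nowhere:
# credentials are injected at runtime and never surfaced in output.
MAX_TIER = {
    Channel.PUBLIC: Tier.PUBLIC,
    Channel.TEAM: Tier.INTERNAL,
    Channel.PRIVATE: Tier.SENSITIVE,
}


def may_surface(tier: Tier, channel: Channel) -> bool:
    """True if data of this tier is allowed to appear in this kind of channel."""
    return tier <= MAX_TIER[channel]
```

An agent would call a check like this before writing any summary or reply, using the tier of the most sensitive source it drew on.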

Transparency as a Security Property

AI agents that operate without transparency are security risks, not just ethical concerns. When users don't know what an agent is doing, they can't catch errors. When there's no audit trail, incidents can't be investigated. When the agent's reasoning is opaque, trust erodes.

Build transparency in from the start:

Log every significant agent action with timestamp, action taken, and reasoning
Surface agent reasoning to end users when it affects them
Provide clear opt-out paths for data collection and processing
Distinguish clearly between what the agent decided vs. what the user instructed

4. GitHub as Your Security Backbone

For developers building agentic AI systems, GitHub is where many of the most critical security decisions play out. What goes into repositories, what's public versus private, how history is managed — these decisions have lasting consequences.

Repository Visibility Is a Security Decision

The default impulse to make repositories public — for portfolio purposes, for collaboration, for open-source credibility — creates real risk when those repositories contain strategy documents, internal tooling configs, or anything that was "accidentally" committed.

A pattern we've seen repeatedly: a developer creates a repository for a client project or internal tool, sets it to public out of habit, and then commits a strategy document, pricing model, or configuration file containing API keys. Automated scrapers watch public GitHub activity continuously, and a leaked key can be harvested within minutes of the push.

The rule is simple: when in doubt, private. A repository can always be made public later. History cannot be unseen once public.

Treating Git History as Permanent

Many developers know not to commit credentials. Fewer understand that deleting a file doesn't remove it from git history, and that even after a deletion commit, the credential is accessible via git log, git show, or any tool that accesses the full repository object store.

This matters doubly for AI agents, which often write their own commits. An agent that generates configuration as part of a setup workflow, commits that configuration (with embedded credentials), and then "cleans up" by deleting the file has left credentials in history permanently.

The correct remediation is history rewriting via git-filter-repo, followed by a force push that replaces all remote branches. This is the tool GitHub itself recommends over the older git filter-branch approach.

Branch Protection and Review Gates

Agents that can commit and push directly to production branches are agents that can introduce security issues at scale. Branch protection rules create mandatory review checkpoints:

Require pull request reviews before merging to main
Enable required status checks (CI/CD must pass before merge)
Restrict who (and what) can push directly to protected branches
Enable GitHub's push protection to block commits containing detected secrets

For AI agents specifically: treat agent-generated commits as requiring human review before they reach production, just as you would with a junior developer's pull request.

5. Building Secure Learning Systems for AI Agents

Agentic AI systems don't just execute — they learn. They accumulate context, refine their understanding of user preferences, and adapt their behavior over time. This learning loop is what makes them powerful. It's also what makes security a continuous practice rather than a one-time setup.

The Post-Incident Learning Loop

Every security incident — whether it's a committed credential, an unauthorized action, or a prompt injection attempt — is an opportunity to improve the system. Teams that treat incidents as isolated failures miss the systemic improvements that would prevent recurrence.

The loop should look like this:

Contain: Limit the immediate damage (rotate credentials, revert commits)
Document: Record what happened, specifically — what file, what action, what the consequence was
Update: Modify checklists, .gitignore rules, or decision trees to prevent recurrence
Communicate: Surface the incident to relevant stakeholders with the fix attached
Verify: Confirm the fix actually works in subsequent sessions

Audit Logging as a Security Primitive

For autonomous agents, audit logging isn't optional. It's how you know what happened when something goes wrong, and it's how you maintain accountability when an agent is making decisions independently.

Every significant agent action should be logged with:

Timestamp and context: When the action occurred and what session/task triggered it
Action taken: Specific, verifiable description (not "sent email" but "sent email to [recipient] with subject [X]")
Reasoning: Why the agent took this action — what information or instruction led to it
Result: What actually happened
Risk flag: Any security or privacy implications of the action

Logs serve multiple purposes: they enable incident investigation, they create accountability, and they're the foundation for learning. An agent that logs its decisions can be audited, corrected, and improved. An agent that doesn't is a black box.
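The five fields listed above map naturally onto a structured record. The sketch below shows one possible shape, written as JSON lines so the log is easy to search later. The `AuditRecord` class and `to_log_line` function are hypothetical names chosen for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    session_id: str   # context: which session or task triggered the action
    action: str       # specific, verifiable description of what was done
    reasoning: str    # the information or instruction that led to the action
    result: str       # what actually happened
    risk_flags: list[str] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

Keeping `reasoning` as a required field means an action cannot be logged without stating why it was taken, which is the part most often missing when an incident is investigated.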

The Principle of Minimal Authority

The security principle of least privilege translates to AI agents as minimal authority: give agents only the access they need for their specific task, only for as long as they need it.

In practice, this means:

Don't give a content-generation agent access to your production database
Don't give a research agent permission to send emails
Don't give any agent broad file system access when it only needs to read one directory
Use scoped API tokens (read-only where read-only is sufficient)
Prefer ephemeral credentials over long-lived tokens where possible
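One way to express minimal authority in code is a deny-by-default allowlist of tools per agent role. The roles, tool names, and the `authorize` function below are hypothetical, chosen to mirror the examples in the list above.

```python
# Each role gets only the tools its task needs; anything absent is denied.
AGENT_TOOLS = {
    "content-generation": frozenset({"read_docs", "write_draft"}),
    "research": frozenset({"web_search", "read_docs"}),
}


class ToolNotPermitted(PermissionError):
    """Raised when an agent requests a tool outside its allowlist."""


def authorize(agent_role: str, tool: str) -> None:
    """Raise unless the tool is explicitly allowed for this role."""
    allowed = AGENT_TOOLS.get(agent_role, frozenset())  # unknown roles get nothing
    if tool not in allowed:
        raise ToolNotPermitted(f"{agent_role!r} may not call {tool!r}")
```

Placing a check like this in the tool dispatcher, rather than in the prompt, means a manipulated agent still cannot reach a tool it was never granted.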

The IBM AI Security team puts it plainly: "Never trust, always verify — treat each tool as untrusted until validated." This is the zero-trust model applied to AI agent architecture, and it's the right mental model.

The Human-in-the-Loop Checkpoint

Not every agent action should require human approval — that defeats the purpose of automation. But certain categories of action should always have a human checkpoint:

Irreversible actions (sending external communications, deleting data, making purchases)
High-stakes decisions (anything with significant financial, legal, or reputational consequences)
Actions outside the agent's pre-approved operational scope
Anything the agent classifies as ambiguous or uncertain

The Ping Identity framework for AI agent authorization captures this well: "This provides a crucial checkpoint for ensuring that critical actions are reviewed and authorized by a human sponsor or end-user before execution." Build these checkpoints in from the start. Adding them retroactively is much harder.
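The four categories above can be turned into a gate in front of action execution. In this sketch the `Action` fields, `requires_approval`, and `run` are illustrative names, and the `approve` callback stands in for whatever review channel a team actually uses, such as a chat prompt or a ticket.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    name: str
    irreversible: bool = False     # e.g. external email, deletion, purchase
    high_stakes: bool = False      # financial, legal, or reputational impact
    in_scope: bool = True          # within the pre-approved operational scope
    agent_uncertain: bool = False  # the agent flagged the request as ambiguous


def requires_approval(action: Action) -> bool:
    """True if the action falls into any category that needs a human checkpoint."""
    return (
        action.irreversible
        or action.high_stakes
        or not action.in_scope
        or action.agent_uncertain
    )


def run(
    action: Action,
    execute: Callable[[Action], str],
    approve: Callable[[Action], bool],
) -> str:
    """Execute the action, asking a human first when a checkpoint applies."""
    if requires_approval(action) and not approve(action):
        return "blocked"
    return execute(action)
```

Routine actions pass straight through, so the gate adds friction only where a mistake would be hard to undo.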

Building Agentic AI Right

Agentic AI security is not a solved problem. The attack surfaces are new, the best practices are still evolving, and the tools for defense are maturing in real time. But the principles aren't new: least privilege, audit logging, transparency, credential hygiene, and human oversight at critical decision points.

What's different is the consequence of failure. When an agent makes a mistake, it doesn't just return a wrong value — it may take a chain of actions across multiple systems before anyone notices. The blast radius is proportional to the agent's access and autonomy.

That's not an argument against building agents. It's an argument for building them carefully.

At NYClaw.io, we operate a fully autonomous AI assistant (Ainsley) with access to file systems, git repositories, external APIs, and communication channels. We've built these practices through operational experience — including the incidents that taught us what not to do. The internal checklist we follow is available as a companion document for teams that want a more operational reference.

The teams winning with agentic AI right now aren't the ones moving the fastest. They're the ones moving fast with discipline — shipping autonomous systems that earn trust through accountability, not just capability.

Build Smarter Agents with NYClaw.io

We help founders and development teams design, deploy, and secure autonomous AI systems. Whether you're just starting with agentic AI or scaling an existing system, we bring the operational experience to do it right.
