Preventing Tool Misuse in AI Agents
In my side project (Biotrackr), I have a chat agent that lets me query my data using natural language. The agent has 12 tools that call APIs to retrieve data, which provides context to an LLM. I’m using Claude as my LLM provider, so Claude decides which tool to call, and with what parameters. Let’s pretend we are bad actors trying to disrupt my agent. Say we prompt-inject the agent and get it to perform an expensive query that retrieves 100 years of data (I’m not that old, thankfully!) in an attempt to return a massive payload, consume thousands of Claude API tokens, and hammer my APIM gateway. ...
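One way to blunt that attack is to validate tool parameters on the server side, before any API call is made, rather than trusting whatever the model supplies. Here's a minimal sketch of that idea: the tool name, parameters, and limits are hypothetical, not Biotrackr's actual code.

```python
from datetime import date, timedelta

# Illustrative bound: reject any query wider than a year.
MAX_RANGE_DAYS = 365

class ToolInputError(ValueError):
    """Raised when the model supplies parameters outside allowed bounds."""

def validate_date_range(start: date, end: date) -> None:
    """Guard a hypothetical data-retrieval tool against runaway queries."""
    if end < start:
        raise ToolInputError("end date precedes start date")
    if (end - start) > timedelta(days=MAX_RANGE_DAYS):
        raise ToolInputError(
            f"range exceeds {MAX_RANGE_DAYS} days; narrow the query"
        )

# The injected 100-year query is rejected before any downstream call:
try:
    validate_date_range(date(1925, 1, 1), date(2025, 1, 1))
except ToolInputError as e:
    print(f"rejected: {e}")
```

Because the check runs in the tool handler itself, it holds even if the prompt injection convinces the model that the request is legitimate.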