increasingly prevalent in a lot of applications. However, integrating agents into your application involves much more than giving an LLM access to all your data and functions. You also need to build effective guardrails that restrict the agent to relevant data and prevent misuse of functions. At the same time, the model should work effectively with the data it needs and utilize as many functions as possible without requiring a human in the loop.
My goal for this article is to highlight, at a high level, how to build effective agentic guardrails that ensure your agent only has access to necessary data and functions while maintaining a good user experience, for example, by minimizing the number of times a human has to approve an agent’s access. I’ll first discuss why guardrails are so important, before moving into a crucial component of guardrails: fine-grained authorization. Next, I’ll discuss building guardrails for your data, and then cover guardrails for functions.
Why you need guardrails for your agents
First, I want to describe why we need guardrails for AI agents. You could, in theory, just give the agent access to all databases and functions in your applications, right?
There are multiple reasons guardrails are necessary. The main one is to prevent the agent from performing undesired actions, such as deleting database tables. Furthermore, you need to ensure agents only have access to data within their scope, for example, that an agent used by one customer cannot access data belonging to another customer.
Some guardrails can be set up automatically and never need human involvement. Database access is one such guardrail: you set the scope an agent operates in (for example, within a customer) and only allow the agent access to that customer’s data. Other guardrails, however, need human interaction. Imagine an agent wants to run a command. How do we make sure the agent is not performing a destructive action (like deleting a database table), and that the user has approved the command?
In these scenarios, we have a human-in-the-loop, where the agent asks for permission to perform a specific action. If the user allows it, the agent can continue, and if it’s not allowed, the agent has to decide on a different course of action.
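This human-in-the-loop flow can be sketched in a few lines. The names here (`run_with_approval`, `ask_user`) are hypothetical; `ask_user` stands in for however your application prompts the human (CLI, UI dialog, etc.):

```python
def run_with_approval(action_name, action_fn, ask_user):
    """Run `action_fn` only if the human approves; otherwise report the
    denial so the agent can decide on a different course of action."""
    if ask_user(f"Allow the agent to run '{action_name}'?"):
        return {"status": "ok", "result": action_fn()}
    return {"status": "denied", "action": action_name}

# The human denies this request, so the action never runs.
outcome = run_with_approval("drop_table", lambda: "dropped", lambda q: False)
```

The key design point is that a denial is returned as data rather than raised as an error, so the agent can observe it and plan an alternative.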
Fine-grained permissions
A likely requirement for working with agents is fine-grained permissions. This means you can easily check whether a function, or some data, is available within a certain scope, such as:
- Does customer 1 have access to database table A?
- Does user 2 have access to function B?
- Does organization 3 have access to function C?
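These checks all reduce to the question "does this subject have this relation to this object?". As a minimal sketch, here is an in-memory version of that check; a real application would delegate it to a fine-grained authorization provider, and the tuples below are made-up examples:

```python
# Toy permission store mirroring the three example questions above.
PERMISSIONS = {
    ("customer:1", "read", "table:A"),
    ("user:2", "call", "function:B"),
    ("org:3", "call", "function:C"),
}

def is_allowed(subject: str, relation: str, obj: str) -> bool:
    """Check whether `subject` has `relation` to `obj`."""
    return (subject, relation, obj) in PERMISSIONS
```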
It’s crucial that you have fine-grained authorization implemented in your application. There are numerous providers out there offering this functionality.
Once you have fine-grained authorization implemented, you have to integrate it into all functions in your application, and handle both the scenario where access is granted and the one where it is denied. If access is denied, for example, you might consider returning a message stating that the user needs to ask an admin for a specific access level to perform that action.
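One way to handle both outcomes consistently is to route every call through a small guard. This is a hypothetical sketch (`guarded_call` and the grant set are illustrative, and `check` stands in for whatever your authorization layer exposes):

```python
def guarded_call(subject: str, action: str, check, fn):
    """Run `fn` if `subject` may perform `action`; otherwise return a
    message the agent can surface to the user."""
    if check(subject, action):
        return fn()
    return (f"Access denied: ask an admin to grant '{subject}' the "
            f"'{action}' permission.")

grants = {("user:2", "export_report")}
check = lambda subject, action: (subject, action) in grants
ok = guarded_call("user:2", "export_report", check, lambda: "report.pdf")
denied = guarded_call("user:9", "export_report", check, lambda: "report.pdf")
```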
Agentic guardrails for data
After you’ve implemented fine-grained permissions, you can start building guardrails around your data. It’s important that your agent has access to as much data as possible to effectively answer user questions. You then need to balance this against the fact that the agent shouldn’t access restricted data, or fetch unnecessary information it doesn’t need to answer the user’s query.
Access to restricted data
Restricting your agents’ access to data is mostly handled by fine-grained authorization. In any function that performs data search (database lookup, bucket retrieval, …), you should check the user’s access scope first.
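A minimal sketch of such a scope-checked lookup, assuming a customer-scoped table (the row data and `search_documents` are made up for illustration):

```python
# Rows outside the caller's customer scope are never returned,
# regardless of what the agent asks for.
ROWS = [
    {"customer_id": 1, "text": "invoice for widgets"},
    {"customer_id": 2, "text": "invoice for gadgets"},
]

def search_documents(customer_id: int, query: str):
    """Search only within the calling customer's rows."""
    return [r for r in ROWS
            if r["customer_id"] == customer_id and query in r["text"]]
```

Because the scope filter is applied inside the function, the agent cannot escape it by crafting a different query.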
Furthermore, you should also consider informing your agent in the prompt of what it’s allowed to do. Having the agent attempt to access data only to be denied is costly, both in token usage and in time.
Avoid fetching unnecessary information
If you give your agent access to all database tables and data buckets, you might experience issues where the agent has too many options, making it challenging for the agent to pick the correct table and fields. This is also a topic I discussed recently in my article about building tools for effective agents.
To solve this problem, I would focus on only informing the agent of relevant information sources. If the agent is working on a task that you know can be solved using only database A, you should consider only informing the agent about database A, and leaving all other databases out of the agent’s prompt. This, of course, assumes that you know which data is potentially relevant for the agent to answer queries.
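This filtering can be as simple as building the prompt from a registry of source descriptions. The registry and `describe_sources` below are a hypothetical sketch:

```python
# Made-up registry of data sources the application knows about.
DATA_SOURCES = {
    "database_a": "customer orders and invoices",
    "database_b": "internal HR records",
    "database_c": "marketing analytics",
}

def describe_sources(relevant):
    """Describe only the sources judged relevant to the current task."""
    lines = [f"- {name}: {desc}"
             for name, desc in DATA_SOURCES.items() if name in relevant]
    return "You may query these data sources:\n" + "\n".join(lines)
```

If the task only needs database A, `describe_sources({"database_a"})` yields a prompt that never even mentions the other databases.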
Agentic guardrails for functions
I think the topic of building agentic guardrails for functions is even more interesting, because there are many elements to consider when building these guardrails:
- How do you prevent destructive actions?
- How do you minimize human-in-the-loop interactions?
How do you prevent destructive actions
The most important subtopic of function guardrails is preventing destructive actions. To solve this, you should mark each function according to whether it performs an irreversible action. For example:
- Deleting a database table is irreversible (you can, of course, load a backup, but this requires some work)
- Reading from a table has no destructive impact
If the agent performs an easily reversible action (it can be reversed with the click of an undo button), or an action that has no destructive impact, you can likely just allow the agent to run the function.
If a function performs an irreversible action, however, you should inform the agent of this, and likely prompt the human user for permission before the agent performs the action.
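The tagging can be done with a small decorator. This is one possible sketch, not a prescribed implementation; the `tool` decorator, the two example functions, and `execute` are all hypothetical names:

```python
def tool(irreversible: bool):
    """Tag a function with whether its action is irreversible."""
    def wrap(fn):
        fn.irreversible = irreversible
        return fn
    return wrap

@tool(irreversible=True)
def drop_table(name):
    return f"dropped {name}"

@tool(irreversible=False)
def read_table(name):
    return f"rows of {name}"

def execute(fn, arg, ask_user):
    """Irreversible tools require human approval; others run directly."""
    if fn.irreversible and not ask_user(fn.__name__):
        return "denied"
    return fn(arg)
```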
How do you minimize human-in-the-loop interactions
Naturally, you want to prevent destructive actions. However, you also don’t want to bother the user too much by asking them whether the agent may perform an action.
A great approach to minimizing human interactions is function whitelisting, such as what Cursor does for running terminal commands. The first time Cursor wants to run a command, such as:
- cd into a folder
- Run pytest tests
- move a file from one location to another
Cursor will ask the user whether it’s allowed to run the command. You can then choose one of three options:
- Deny the request
- Accept the request (one-time)
- Whitelist the command (accept the request now, and going forward)
Whitelisting works well because you ensure the user allows the agent to run a function or command, but you don’t have to bother them about that exact function going forward. Still, whitelisting has a downside: some commands shouldn’t be whitelisted, because a user has to review the context every time the agent suggests running them (such as deleting a database table).
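The deny / accept-once / whitelist flow described above can be sketched as follows. This is a simplified model of the behavior, not Cursor's actual implementation; `ask_user` is assumed to return one of `"deny"`, `"accept"`, or `"whitelist"`:

```python
def maybe_run(command, whitelist, ask_user, run):
    """Run whitelisted commands directly; otherwise ask the human first."""
    if command in whitelist:
        return run(command)
    choice = ask_user(command)
    if choice == "deny":
        return None
    if choice == "whitelist":
        whitelist.add(command)  # skip the prompt for this command forever
    return run(command)

wl = set()
# First call: the user whitelists the command, so it runs.
maybe_run("pytest tests", wl, lambda c: "whitelist", lambda c: f"ran {c}")
# Second call: the command is whitelisted, so ask_user is never invoked.
second = maybe_run("pytest tests", wl, lambda c: "deny", lambda c: f"ran {c}")
```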
Conclusion
In this high-level article, I’ve discussed how to approach building agentic applications with regard to guardrails. Guardrails are necessary to ensure the agent behaves as intended and isn’t allowed to perform actions like fetching information outside its access scope or performing destructive actions without explicit permission from the user. I discussed building guardrails for your data and for the functions you make available to your agent. I believe guardrails are an important part of agentic application building and should always be kept top-of-mind. Ensuring proper guardrails are in place makes your agents safer to use, which is critical: once a user’s trust in the agent is broken, it is hard to recover.