The training wheels are off: the shift from ChatGPT-style chatbots to autonomous agents isn’t just another update – it’s a complete paradigm shift that has left me disconcerted, to say the least, writes Yule Guttenbeil.
After two weeks of testing Perplexity’s Comet semi-autonomous, agentic browser and experimenting with similar tools on my Mac, I can tell you that we’ve now moved from AI that talks to AI that can act. And the scope for both incredible productivity gains and disastrous failures has expanded exponentially.
From contained chat windows to cross-platform agents
Up until recently, our AI interactions were safely contained. You’d ask ChatGPT a question, it would give you an answer, and any action beyond that required you to copy, paste, and execute manually. The AI had no ability to affect anything beyond the chat window itself.
But it’s crucial to distinguish between different types of agents. Contained AI agents – like Perplexity’s research function – can break down complex research tasks into subtasks, search multiple sources, and synthesise comprehensive answers, but they’re still confined within their application. These are research agents that enhance information gathering without the ability to act on external systems.
Autonomous agents – like agentic browsers and system-wide tools built on the Model Context Protocol (MCP) – represent something fundamentally different.
So, what are agentic browsers? These are free-range digital actors that can click buttons, fill forms, navigate between pages, and execute commands across your entire digital environment. When I asked Comet to make a $40 payment to my wife – just to test the extent of its abilities – it was able to do it because I was logged into my internet banking. The AI didn’t just suggest how to make the payment – it actually executed the transaction.
MCP-based tools may be even more powerful – they act as central orchestrators that can control multiple applications locally, on your computer itself. When I used one to fix some SharePoint issues, it wrote PowerShell scripts, accessed my terminal program, and deployed the fix directly – all through a conversation in a chat window. The AI broke the task down into subtasks, asked for permission at each step, and executed the technical work that would have taken me weeks to organise through a consultant.
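To make this concrete, here is a minimal sketch of how such a capability can be exposed over MCP, using the Python MCP SDK’s FastMCP helper. The tool name and behaviour are my own illustration, not the actual tool I used – and note that this sketch deliberately returns a script for human review rather than executing anything:

```python
# Minimal sketch of an MCP tool server (Python MCP SDK, FastMCP).
# The tool name and behaviour here are hypothetical illustrations.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sharepoint-helper")

@mcp.tool()
def draft_powershell_fix(issue: str) -> str:
    """Draft a PowerShell script addressing the described issue,
    returned as text for a human to review - never auto-executed."""
    # A real implementation would generate the script here; the key
    # design choice is returning it for review instead of running it.
    return f"# Proposed fix for: {issue}\n# (script body omitted)"

if __name__ == "__main__":
    mcp.run()  # serve the tool to any MCP-capable chat client
```

Once a chat client connects to a server like this, the model can discover and call the tool mid-conversation – which is exactly why a permission prompt at each step matters.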
The difference is autonomy and scope. Contained agents process information within boundaries; autonomous agents (such as agentic browsers and MCP-based tools) take action in the real world with real consequences.
Immediate and exponential risks
The execution problem
The technology is very hit-and-miss at the moment. While we as users know exactly which buttons to press to achieve our desired result, autonomous agents struggle because of how LLMs approach reasoning and problem solving. They break every task into multiple subtasks, and each subtask is a discrete problem that can fail in its own right, so every step presents an opportunity to go down the wrong path – and those odds of failure compound across the whole chain.
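A rough illustration of that compounding (the per-step success rate here is an assumption for demonstration, not a measurement):

```python
# Illustrative arithmetic: if each subtask succeeds independently
# 95% of the time, a ten-step task completes end-to-end only about
# 60% of the time. Reliability decays geometrically with length.
per_step_success = 0.95
steps = 10
print(f"End-to-end success: {per_step_success ** steps:.0%}")  # ~60%
```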
If there are two ways to achieve the same outcome on a web page – say, reaching a feature via the main menu or via another button on the page – the choice seems obvious and simple to us, but not necessarily to an artificial intelligence that is essentially being asked to do something for the first time.
I had this experience during a presentation: I tested a simple use case beforehand, and it worked perfectly. But when I executed the exact same prompt during the actual presentation, it failed and couldn’t figure out how to do it. Every time you ask an agent to do something, it’s as if it’s doing it for the first time, regardless of whether it has succeeded at the same task before. These agents don’t appear to retain knowledge of how a task was completed for reuse.
Additionally, once execution of a task has begun, it can be difficult to stop or correct an error before damage is done.
Unauthorised actions and financial exposure
The most immediate risk is the scope for unauthorised actions. As I discovered, if you’re logged into financial systems, these agents can potentially execute transactions. You never want to leave your laptop unattended and unlocked if you’re using these agents, because someone could easily type commands instructing the agent to transfer money to accounts of their choosing.
The bad actor problem
While these tools are not yet mature, they will get better, and what that will mean for bad actors is hard to overstate. Even judging by what these tools can do today, anybody who wants to unleash a scam campaign or other antisocial online behaviour is going to be very difficult to hold back with this kind of technology. These tools will give all individuals superpowers on the internet, and the problem of information quality – already being polluted by AI slop – will be exacerbated exponentially.
Data destruction and system failures
Shocking instances of data destruction are already occurring. Google Gemini recently deleted all of a user’s data irrecoverably. The AI made an incorrect assumption and, based on that assumption, executed file operations that destroyed data while attempting to reorganise folders, concluding its actions with this chilling statement: “I have failed you completely and catastrophically.”
The scope for litigation has widened considerably: any service provider that is given access to client data, and whose agent goes rogue and makes a mistake like this, faces enormous liability exposure.
Information collection v execution: Drawing the line
Through my testing, I’ve found that the in-browser agent is genuinely useful for collecting and collating information across multiple tabs and web services. The risk is greatly reduced when you simply ask it to collect information and report back to you. For legal research, document review across multiple browser tabs, or gathering data from various online sources, these agents can be incredibly powerful productivity tools.
The main risk lies in asking it to execute tasks. This is where the potential for genuine catastrophe emerges – and where legal practitioners need to draw a very clear line in the sand.
My practical recommendation: information ‘yes’, execution ‘no’
Based on my testing, I recommend a cautious approach: do not use an autonomous agent to execute instructions that could damage accurate record keeping, post or publish to public forums, send communications, or do anything similar or analogous.
In short, it’s okay to use it to collect information. I do not recommend using it to execute anything with that information as yet.
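Where your tooling allows it, that line can be enforced mechanically rather than left to user discipline. A minimal sketch, with hypothetical tool names – the substance is the default-deny stance:

```python
# Sketch of an "information yes, execution no" tool policy.
# Tool names are hypothetical; anything not known to be read-only
# is refused, including payments, posting, email, and scripts.
READ_ONLY_TOOLS = {"search_web", "read_page", "collate_tabs"}

def is_tool_allowed(tool_name: str) -> bool:
    # Default-deny: only explicitly read-only tools may run.
    return tool_name in READ_ONLY_TOOLS
```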
Safeguards firms must implement now
Comprehensive AI policies with clear boundaries
You need an AI policy that covers what kinds of tools can be used in which situations, and the extent of and limitations on that use. This policy needs to explicitly distinguish between contained research agents and execution-capable agentic browsers.
Because the benefits and risks associated with autonomous AI agents are exponentially higher and broader than those of focused interactions with contained artificial intelligence tools, legal practices should be very, very slow to adopt autonomous agentic solutions into their practice.
Human-in-the-loop for any execution
When I used an MCP-based tool to update my SharePoint, I was asked to grant permission for each step it undertook. This approach has merit because it ultimately still relies on the user to positively confirm that they want the AI to take the next action. But in practice, users will likely get click-happy, clicking “next” and “accept” until the task is completed, for better or worse.
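The gate itself is simple to sketch (a hypothetical helper, not any particular product’s API); the hard part is the human behaviour around it:

```python
# Minimal human-in-the-loop gate: nothing runs without an explicit
# typed confirmation for that specific step.
def confirm_and_run(description: str, action):
    answer = input(f"Agent proposes: {description}. Allow? [y/N] ")
    if answer.strip().lower() == "y":
        return action()
    print("Step skipped.")
    return None
```

One partial mitigation for that click-fatigue is to require a fuller typed confirmation – rather than a single keypress – before irreversible steps such as payments or deletions.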
Professional standards and ultimate responsibility
AI should not be performing legal services. These tools are not lawyers; they are not even humans who can be held to account for providing poor legal services or advice. You are the lawyer. You hold ultimate responsibility for the quality of the legal services you provide.
The bottom line: Intelligent restraint
The analogy that comes to mind is that the bumpers in the bowling alley have been removed, and for now we’re mostly bowling gutter balls rather than strikes.
My strong recommendation is to harness the information-gathering power of these autonomous agents, while maintaining strict boundaries around execution. The productivity gains from research and data collection are real and immediate – the risks from autonomous execution are equally real and potentially disastrous.
The age of the agent has arrived, but an age of cautious discretion must arrive with it.
Yule Guttenbeil is the principal of Attune Legal.