My birthday, September 18, 1965, is of more cultural significance than I thought. It’s the same day as the NBC television premiere of Get Smart, the classic comedy series co-created by Mel Brooks. Billed as a combination of James Bond and Inspector Clouseau, the show ran for five seasons and starred Don Adams as the bumbling Maxwell Smart, otherwise known as Agent 86. Despite his clumsiness and incompetence, Agent 86 still manages, in every episode, to thwart the nefarious plots emanating from KAOS, the “International Organization of Evil.”
Fictional spies and spoofs aside, we all regularly deal with agents in our daily lives. An agent is simply someone who represents you or acts on your behalf. Some years ago, travel agents would book flights and hotels for you—before their business model was upended by the internet. The service still lives on in high-end bespoke travel planning. Real estate agents represent you in managing complex, high-value transactions. Well-to-do individuals may employ shopping agents who go to stores, buying everything from groceries to apparel for their clients. And most of us, at some point, hire a lawyer to look out for us in matters like drawing up a will, business contracts, civil disputes or criminal proceedings.
The common theme in any relationship with an agent is trust, built on a foundation of specialized expertise. We believe that our agents are knowledgeable in their field and expect that they will act in our best interests. To that end, many professional organizations like bar and medical associations require their members to adhere to an ethical code of conduct with severe penalties for non-compliance. The trust we place in an agent is earned and maintained through responsible behaviour by the agent.
Is agentic a word?
Generative AI, at least in its current incarnation, is barely two years old. It’s only natural, I suppose, given how quickly it has progressed through the hype cycle, that the ‘next big thing’ is now upon us. Over the last few months, I’ve seen a trickle turn into a flood of news items covering something with the ear-jarring name of agentic AI. Merriam-Webster thankfully contains no entry for agentic although it does suggest the slightly better-sounding word agential instead. Nonetheless, agentic has been in circulation for about a decade and a half in both psychology and technology and is used as a synonym for agential—meaning, simply, to act or behave as an agent. Agentic AI, therefore, is AI that, beyond providing information, will work on our behalf.
Forbes, in a recent article, calls agents the “third wave” of AI and, demonstrating that AI hype has yet to abate, assures us that agentic AI will transform the way we work. The article notes that the first two waves were predictive AI (which I take to mean machine learning and its variations) and generative AI. Forbes suggested in 2019 that AI’s first wave would transform the employee experience, and just a year ago told us that generative AI would change all of our jobs. That’s a lot of transformation in a short period of time, and I am not convinced that the hype is justified. I don’t think work has changed all that much—not yet, at any rate.
Microsoft, in an enthusiastic press release from November 2024, defines agents as “taking the power of generative AI a step further” because, instead of just answering questions as Copilot (Microsoft’s branding of the OpenAI technology behind ChatGPT) does, the agents can “work alongside you or even on your behalf.” Effectively, the company says, the agent is an app—a small computer program—that uses the knowledge base of generative AI in the course of executing a defined task. Microsoft offers plenty of hypothetical examples—AI agents could process a backlog of product returns or shipping invoices on behalf of a warehouse employee. They could act as a virtual project manager, reconcile financial statements or handle customer inquiries. Multiple agents, Microsoft goes on to say, could be chained together to work in concert, alleviating the drudgery of your daily job while you relax and sip your second cup of coffee. If it sounds a bit too good to be true, maybe it is.
Let’s take a closer look at AI agents and see what all the fuss is about.
Does that work for you?
It’s one thing to say that AI agents might be able to do all these tasks for us, but the next question is how? What would it take to create a useful agent? There are lots of forums, websites and YouTube videos out there explaining the process, and if you already know, or would like to learn, a programming language like Python, it would not be too hard to code your own agent.
To accomplish his missions, Maxwell Smart had many gadgets at his disposal—from a powerful magnet in his belt buckle, to a hidden camera in a bowl of soup, to the famous shoe phone. Similarly, once you decide on a task for which you want to build an AI agent, you then select all the gadgets, tools and components you need for the agent to work. You’ll have to start with an AI system—which could be generative AI but doesn’t have to be. If you want an agent that will compare images, you might need a machine learning (ML) system. An agent to respond to customer service tasks might be better suited to a Large Language Model (LLM). Most commercial AI systems come with an application programming interface (API) allowing your agent code to access their functions—for example, to automatically issue a prompt to ChatGPT and evaluate the response.
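To make this concrete, here is a minimal sketch of a single agent step in Python, using OpenAI’s published client library. The model name, the prompt and the acceptance check are my own illustrative assumptions, not a recipe:

```python
# A minimal agent step: send a prompt to an LLM through its API and apply a
# simple sanity check to the response. Assumes the official OpenAI Python
# client (pip install openai) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def ask_and_check(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any chat model would do
        messages=[{"role": "user", "content": question}],
    )
    answer = response.choices[0].message.content
    # Naive evaluation: reject empty or evasive answers so the agent can retry.
    if not answer or "i don't know" in answer.lower():
        raise ValueError("No usable answer; the agent should retry or escalate.")
    return answer

print(ask_and_check("Summarize this complaint in one sentence: "
                    "the parcel arrived late and damaged."))
```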
Beyond interfacing with the AI system, your agent will need to read and make sense of other data sources. To use Microsoft’s examples I cited above, an agent that reconciles financial statements probably would need to be able to read Excel spreadsheets. An agent processing shipping invoices or returns would have to understand PDF files or Word documents and be able to interface with back-end Enterprise Resource Planning (ERP) systems that contain customer and product data. APIs exist for all this as well, and your code will have to connect them all together.
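As a rough illustration of that plumbing, the core of a reconciliation agent might look something like the sketch below. The file names and column names are invented for the example, and pandas needs the openpyxl package to read Excel files:

```python
# Read two hypothetical spreadsheet exports and flag transactions that don't
# reconcile. File and column names are assumptions for illustration only.
import pandas as pd

ledger = pd.read_excel("general_ledger.xlsx")
statement = pd.read_excel("bank_statement.xlsx")

# Match records on a shared transaction ID and compare the amounts.
merged = ledger.merge(statement, on="txn_id", suffixes=("_ledger", "_bank"))
mismatches = merged[merged["amount_ledger"] != merged["amount_bank"]]

print(f"{len(mismatches)} transactions need human review")
```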
Bear in mind, an agent is really an algorithm—so you still need to code the steps it must take to complete its task. There are lots of well-documented programming techniques that can help, depending on what your agent is supposed to do: rules for procedural tasks, decision trees for predictive tasks, classification algorithms for analyzing large data sets, collaborative filtering for recommendations and many others.
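The humblest of those techniques, plain if/then rules, is often all a procedural agent needs. For the product-returns example above, the heart of the agent could be as simple as this sketch, with policy thresholds invented for illustration:

```python
# A rule-based step for a procedural task: route an incoming product return.
# The 30-day window and the $500 escalation threshold are made-up policy.
def route_return(days_since_purchase: int, amount: float, opened: bool) -> str:
    if days_since_purchase > 30:
        return "reject: outside the return window"
    if amount > 500 or opened:
        return "escalate: human approval required"
    return "approve: issue an automatic refund"

print(route_return(days_since_purchase=12, amount=89.99, opened=False))
```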
Once you’ve completed and tested all this coding, your agent is good to go. Simple enough, if you’re reasonably experienced in Python coding. Even better, you may not actually have to do all that work yourself. Most purveyors of generative AI systems are now putting visual agent development tools into the hands of end users who don’t need to know any coding. Microsoft, for example, promotes Copilot Studio as a tool for users of any skill level to create agents simply by telling the tool, in natural language, what they want it to do. Has anybody considered the administrative and security burdens that user-created agents might impose on companies’ already taxed IT departments?
Error history
When I consider the above description of what an agent is and what it does, it strikes me that agents aren’t much different from ordinary computer programs that have been around since at least the middle of the last century. The only difference, I think, is that an agent might retain some knowledge of its state between iterations of its execution, and therefore may improve at its assigned tasks, much as an ML system improves through reinforcement learning.
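That bit of state-keeping is not exotic, either. A sketch of it might be nothing more than a tally persisted to disk between runs, so the agent can favour actions that have worked before; the file name and scoring scheme here are assumptions for illustration:

```python
# Remember outcomes between runs: a crude form of agent "memory" kept in a
# JSON file, tracking wins and attempts for each action the agent can take.
import json
from pathlib import Path

STATE_FILE = Path("agent_state.json")

def load_state() -> dict:
    return json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

def record_outcome(state: dict, action: str, success: bool) -> None:
    wins, tries = state.get(action, [0, 0])
    state[action] = [wins + int(success), tries + 1]
    STATE_FILE.write_text(json.dumps(state))

state = load_state()
record_outcome(state, "auto_refund", success=True)
print(state)  # e.g. {"auto_refund": [1, 1]} after the first run
```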
Given that AI in other forms has existed for decades, the idea of agentic AI as a new third wave is also exaggerated. Agent-like programs that leverage neural networks, machine learning or expert systems are not new, and one could certainly argue that much of the field of Robotic Process Automation (RPA), including customer service chatbots, is already agentic, or agential—choose your own preferred adjective.
What’s new, then, is only the idea of agents built on generative AI. And this is where I have a problem with the utility of agents, just as I do with generative AI itself. Remember that a successful working relationship with an agent is built on trust and specialized knowledge. To trust a computer agent, I need to believe in its ability to complete its task accurately. An agent that will reconcile financial statements or provide product information to my client had better be good at mathematics or English.
But even IBM, which I consider to be a leader in AI governance, admits that its product, Watsonx, is fallible. I recently attended an IBM webcast that covered Watsonx agents and also introduced the idea of “uncertainty quantification” in generative AI, which simply means an expression of the confidence the system has in the correctness of its answers.
The company offered a few examples: When asked where IBM is headquartered, Watsonx gave Armonk, NY as its response with 85% certainty. (I would have thought IBM’s own product might have felt a bit better about this 100% correct answer, but OK.) Then, when asked to divide 20,438,502,840 by 2,349 it came up with “approximately 8,705,760.57” at only 5% certainty. The correct answer, any calculator will tell you, is 8,700,937.78. Watsonx was off by less than one tenth of one per cent, which might be good enough in some situations, but do you think that Neil Armstrong and Buzz Aldrin (whom Watsonx correctly identified, again with 85% certainty, as the first humans on the moon) would have accepted a .001 margin of error in calculating the Apollo 11 trajectory? Finally, when asked how many times the letter ‘r’ occurs in the word ‘strawberry’, Watsonx answered zero, but again admitted only 5% certainty. Maxwell Smart, I think, would at least have been able to get this question right.
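For perspective, ordinary deterministic code settles the arithmetic and the letter-counting instantly, and with none of this hedging:

```python
# Check Watsonx's two shakiest answers the old-fashioned way.
quotient = 20_438_502_840 / 2_349
print(f"{quotient:,.2f}")        # 8,700,937.78, not 8,705,760.57

relative_error = abs(8_705_760.57 - quotient) / quotient
print(f"{relative_error:.4%}")   # about 0.0554%, under a tenth of one per cent

print("strawberry".count("r"))   # 3, not zero
```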
I suppose I am encouraged by the fact that IBM’s generative AI technology at least recognizes its weaknesses. It’s a step in the right direction and can help make AI agents useful, but I do worry about the lack of certainty for even easy questions like IBM’s headquarters or the first moon landing. Restricting the knowledge domain to specific use cases and carefully curating the training data, as IBM does with its Granite foundation models and InstructLab tools, will also help. But when it comes to broad, consumer-oriented generative AI, I can’t yet rely on an agent to complete even basic tasks. If I have to constantly check the agent’s work for accuracy, I’m not sure there’s much productivity gain. I might as well do the work myself.
Shop ‘til the agent drops
Here’s another example: I’m writing this article in December, during peak holiday shopping season. I abhor going to malls, and as I get older, I even find online shopping tedious. So I was intrigued by a recent article in TechCrunch describing the race to create AI agents that will automate online shopping. Wouldn’t it be nice to just give the agent my wish list and credit card, and let it cruise through all the online stores for me?
It turns out that this is quite a bit harder to accomplish than it seems. Part of the profitability of e-commerce is its ability to bombard the online shopper with advertising, cross-sells, up-sells and other promotions. An AI agent could in theory bypass all that and query the e-commerce site directly via back-end interfaces, to get specific product and price information and even make purchases. Retailers are trying to put up defences to block agents, coming up with better ways to determine whether a user is human.
Therefore, according to TechCrunch, companies like Perplexity are building new kinds of agents that either scrape data directly from web pages, or else mimic human behaviour by entering keystrokes and mouse clicks into the websites themselves. Assigned to buy a particular product, the agent would either use the website’s search function, or else browse categories, add to cart and finally check out just as you would.
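That second, mimic-a-human approach is essentially browser automation, a technique that long predates generative AI. A sketch using the Playwright library might look like the following; the storefront URL and the page selectors are invented, and any real site would differ:

```python
# Drive a real browser the way a human shopper would: search, add to cart,
# check out. Assumes Playwright (pip install playwright; playwright install).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://shop.example.com")       # hypothetical storefront
    page.fill("input#search", "toothpaste")     # type into the search box
    page.press("input#search", "Enter")
    page.click("button.add-to-cart >> nth=0")   # first matching result
    page.click("a#checkout")
    browser.close()
```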
Either way, I still worry about trust and accuracy. I’m OK entering my credit card number into an encrypted payment page, but I’m not sure about handing it over to an AI agent that is not good at basic arithmetic. And could it accidentally click on an upsell I don’t want? I would have to be assured of substantial guardrails around the agent’s behaviour, something that hasn’t even been applied consistently yet for generative AI itself.
Perplexity’s initial results were not encouraging. The first test run by TechCrunch asked the agent to buy toothpaste on Amazon, and it came back with the unlikely response that it was out of stock. Later, it was able to complete a purchase, but the transaction took close to six hours, and the credit card payment went to Perplexity, not to the merchant. I suppose the business model—fee per transaction, or percentage of sale—has yet to be worked out, but I question the value of yet another intermediary in the payment process. For now, I will stick with going to stores, physically or virtually, myself.
Sorry about that, Chief
The comedic genius of Maxwell Smart lies in how he always succeeds despite his many failures along the way. Generative AI agents, I’m afraid, aren’t that lucky—they will bumble along much like Agent 86 but, for now at least, probably can’t be counted on to get the job done in the end. Until the industry figures out how to deal with the inaccuracy, bias and hallucinations that plague ChatGPT, Gemini, Claude and even Watsonx, let’s not expect flawless execution—maybe we can at least get apologies from our agents for their frequent missteps.
Get Smart borrowed the term 86 from old American diner slang, where it’s a euphemism for discarding something or declaring it out of stock. Maybe we should just 86 the AI agents until we know we can get them right.