AI Agents: From Definition to Deployment


By Jussi Rasku, Vice Head of GPT-Lab, leader of the work package on agentic development in the AI Champion project

There is no universally agreed definition of an AI agent, and the current surge of interest across industries, together with hastily assembled AI solutions from established and up-and-coming providers, only adds to the confusion. In this post, I use the term AI agent to refer to a system with a specific purpose and role, where a large language model (LLM) acts as its “brain.” To be genuinely useful, an agent also needs access to tools that allow it to read external information (for example by browsing the web), write documents, and interact with other systems, agents, or people. Such tool use is sometimes further divided into perception (input) and execution (output, including artifact generation and manipulation). An agent may also include some form of memory, enabling it to handle longer-running tasks. The key feature distinguishing non-agentic from agentic AI is autonomy: AI agents operate with more independence than basic chat-based assistants.
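To make the definition concrete, here is a minimal sketch of these building blocks in Python. The class and field names are illustrative, not drawn from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Illustrative container for the building blocks discussed above."""
    role: str                                        # specific purpose and role
    llm: Callable[[str], str]                        # the "brain": prompt in, text out
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)  # perception & execution
    memory: list[str] = field(default_factory=list)  # enables longer-running tasks
```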

That said, the boundary between chat-based assistants and AI agents is becoming increasingly blurred. Major providers such as OpenAI (ChatGPT), Anthropic (Claude), and Microsoft (Copilot) have introduced many agent-like capabilities into their systems, including advanced tool use and even the ability to launch sub-agents for handling multi-step tasks. Still, these are typically designed to be general-purpose helpers, whereas agentic AI is usually built to automate a specific type of work.

Tool use, from primates through cows to LLMs

To me, tool use is what fundamentally separates traditional AI systems or standalone LLMs from agentic AI systems. Many animals, such as primates, crows, otters, elephants, and even cows, are capable of using tools, typically to build shelter, access food, or scratch an itch. What we now have with AI agents is something new: a non-human intellect, an LLM, capable of using tools to perform complex information processing and even content generation. This opens an entirely new class of problems that can be solved with AI. I would even go as far as to argue that tool use is, in fact, the main driver behind the growing interest in agentic AI: foundation models are increasingly fine-tuned for tool use, making them more skilled and flexible, which in turn enables agents to handle more complex tasks.
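As a sketch of what such tool use looks like in practice, the loop below lets a model request tool calls until it produces a final answer. The `call_llm` stub and the message format are hypothetical stand-ins for a real model API; most providers expose a comparable function-calling interface.

```python
def web_search(query: str) -> str:
    """Stub tool: a real implementation would call a search API."""
    return f"(search results for {query!r})"

TOOLS = {"web_search": web_search}

def call_llm(messages: list[dict]) -> dict:
    """Hypothetical stand-in for a provider API call. It returns either a
    tool request or a final answer, as real function-calling APIs do."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "web_search", "args": {"query": messages[-1]["content"]}}
    return {"answer": "Summary based on the tool output."}

def run_agent(task: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "answer" in reply:            # the model is done
            return reply["answer"]
        tool = TOOLS[reply["tool"]]      # the model requested a tool
        messages.append({"role": "tool", "content": tool(**reply["args"])})
    return "Stopped: step budget exhausted."

print(run_agent("Find recent definitions of agentic AI."))
```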

Large jobs break large models; small jobs fit small ones

Alongside these advances in tool use and other capabilities, the research organization METR has reported an interesting trend: the length of a software engineering task that an AI agent can complete independently with a 50% success rate has been doubling approximately every seven months. At the current pace, only a few such doublings are needed before AI agents can independently complete tasks that would take a human an entire workday.
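To see what the doubling implies, here is a quick back-of-the-envelope extrapolation. The one-hour starting horizon is an assumption for illustration, not a figure from the METR report.

```python
horizon_hours = 1.0          # assumed current 50%-success task horizon
doubling_months = 7          # reported doubling time
months = 0
while horizon_hours < 8.0:   # one workday
    horizon_hours *= 2
    months += doubling_months
print(f"~{months} months until an 8-hour horizon ({horizon_hours:.0f} h)")
# -> ~21 months (three doublings: 1 h -> 2 h -> 4 h -> 8 h)
```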

Since the tasks AI can currently handle reliably are still relatively small, more complex workflows must be decomposed into smaller steps, each sized to what current technology can dependably handle. A common approach is to assign each step to a specialized AI agent and compose these into a multi-agent system (MAS), where each agent passes its work to the next. MAS have proven to be a flexible approach that can be adapted to more complex automation tasks. For example, our research group has applied this approach to generating optimization procedures and solutions for operational planning tasks, where agents identify and mathematically model a problem, construct solution methods, and iteratively refine outputs.
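A minimal sketch of such a pipeline, with each specialized agent handing its result to the next, might look like this. The agent roles mirror the optimization example above, and the `call_llm` stub stands in for a real model call.

```python
def call_llm(role: str, prompt: str) -> str:
    """Hypothetical model call; a real system would prompt an LLM (or a
    fine-tuned SLM) with the agent's role and input."""
    return f"[{role}] output for: {prompt[:40]}"

class Agent:
    def __init__(self, role: str, instructions: str):
        self.role = role
        self.instructions = instructions

    def run(self, work: str) -> str:
        return call_llm(self.role, f"{self.instructions}\n\n{work}")

# Specialized agents composed into a sequential multi-agent pipeline.
pipeline = [
    Agent("modeler", "Identify and mathematically model the problem."),
    Agent("solver",  "Construct a solution method for the model."),
    Agent("refiner", "Iteratively refine and validate the output."),
]

work = "Plan weekly vehicle routes for a delivery fleet."
for agent in pipeline:
    work = agent.run(work)   # each agent passes its work to the next
print(work)
```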

Decomposing large tasks into smaller steps has the added benefit of making the systems more robust and more deterministic. The LLMs still introduce inherent randomness, but the range of possible system states becomes narrower. Decomposition also allows the use of small language models (SLMs) that can be fine-tuned for specific subtasks, significantly lowering costs and computation requirements.

Failing to keep the human in the loop

Because agentic workflows remain fragile, a human must be kept in the loop. This means building interfaces where agent outputs can be reviewed and edited before acceptance. It is far harder than it sounds: LLMs can easily overwhelm a human reviewer with the sheer volume of their output. The gravity of this should not be overlooked. Too many agent UIs are built to impress in a demo rather than to support a human who needs to review the outputs. Human attention is finite, and an interface that ignores this turns “human in the loop” into a reflexive click-through. Yet the human remains responsible for the correctness and lawfulness of whatever the system produces.
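One minimal pattern is an explicit approval gate: nothing an agent produces is acted on until a reviewer accepts, edits, or rejects it. The sketch below is illustrative, not a description of any particular UI.

```python
def review_gate(output: str) -> str | None:
    """Block until a human accepts, edits, or rejects the agent's output.
    Returns the approved text, or None if rejected."""
    print("--- agent output ---")
    print(output)
    choice = input("[a]ccept / [e]dit / [r]eject? ").strip().lower()
    if choice == "a":
        return output
    if choice == "e":
        return input("Edited version: ")
    return None  # rejected: the human stays responsible for what ships

draft = "Quarterly report draft generated by the agent."
approved = review_gate(draft)
if approved is None:
    print("Output discarded; nothing leaves the loop unreviewed.")
```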

Finally, from definitions to deployment

The building blocks of an AI agent, namely role, tools, memory, and the LLM, are by now relatively well established. Much of the current work at GPT-Lab focuses on how agents are defined and orchestrated, and how their work can be reviewed. So far, most agents and agentic workflows have been implemented in code, often in Python, using frameworks that provide structure for agent behavior. More accessible alternatives have also emerged: tools such as n8n enable visual programming and integration with various data sources and systems, making agent development more approachable.

Within the AI Champion project, GPT-Lab has been tasked with building an AI Agent Library that serves as a showcase of agentic solutions developed by the consortium. The library supports the discovery of new ideas and applications, allows users to observe these systems in action, and facilitates the customization and deployment of agent-based solutions. To support this, we have been exploring different declarative languages for defining AI agents. The first agents are being created, and our task is to define them in runtime-agnostic ways, allowing the full consortium to benefit from the results.
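As a flavor of what a runtime-agnostic, declarative agent definition might look like, here is a hypothetical specification expressed as plain data. The schema is illustrative and deliberately does not follow any of the candidate standards discussed below.

```python
# Hypothetical, runtime-agnostic agent specification expressed as plain data.
AGENT_SPEC = {
    "name": "report-summarizer",
    "role": "Summarize weekly operational reports for management.",
    "model": {"provider": "any", "capability": "tool-use"},  # no runtime pinned
    "tools": [
        {"name": "web_search", "description": "Look up background facts."},
        {"name": "write_document", "description": "Produce the summary file."},
    ],
    "memory": {"kind": "conversation", "max_items": 50},
    "review": {"human_in_the_loop": True},  # outputs gated by a reviewer
}
```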

Several candidates already exist, including the Open Agent Specification (AgentSpec) from Oracle and AgentSchema from Microsoft, along with various other industry initiatives. Academic efforts have also contributed to this space. For example, our ANSE project, which we are implementing together with the University of Jyväskylä, has proposed a declarative modeling system called PRISM that will be the subject of a future blog post. Despite, or perhaps because of, the high activity in this space, the field remains fragmented, and no de facto standard has yet emerged.

On the next layer are the runtimes and deployment infrastructure for such agents. This introduces concerns around cost and capacity management, monitoring, reliability, quality of service, and dynamic resource allocation. These infrastructure concerns are precisely what the GPT-Lab Sandbox, developed by my fellow vice head Dr. Waseem’s team, is designed to address. The sandbox gives us a platform to host, experiment with, and share the AI agents we build.

Which brings me back to declarative definitions. A similar pattern was seen in cloud computing, where provider-agnostic infrastructure-as-code solutions (e.g., Terraform) gained traction to enable more efficient orchestration and deployment of services. As organizations adopt agentic AI, similar needs are likely to arise. Interest in declarative, provider-agnostic languages for defining agents and orchestrating workflows is expected to grow. We at GPT-Lab will follow this development closely and contribute to how agentic AI systems are defined, developed, and deployed.