Tim Cvetko

AI Engineer’s Guide To Building Autonomous AI Agents in 2 Mins

April 25, 2024

Any Agent. Any Task. Any LLM Model. Anytime. 2 Mins.

AI Agents have become powerful at translating human instructions into tasks like:

Connecting Slack and Drive accounts
Transcribing YouTube Videos into SEO-optimized Content
Or planning a trip to Hawaii (AgentGPT)

This is a guide for any AI Engineer looking to build autonomous AI Agents. Let’s get a few things cleared up before we start:

an LLM is an ML model that generates text, often resembling reasoning, based on patterns learned during training
an LLM application is a product where LLMs are deployed in production to accomplish a given completion task
an AI Agent is a software entity designed to act autonomously within its environment, making decisions or taking actions to achieve specific goals.

Image by Author: Agent in Larger Environment


Here’s What You’ll Learn in This Article:

How we Go From LLM Outputs to AI Agents Performing Tasks
How you can Build any Autonomous AI Agent from Scratch in 2 Mins

Introduction to AI Agents

How we Go From LLM Outputs to AI Agents Performing Tasks

Agents = LLM outputs + Task Completion Based on that Output.


Okay, so How do We Get AI agents to Complete Tasks Based on Output?!

The simple answer: we make the LLM output a precise & actionable response that the AI Agent can parse into further tasks.
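As a minimal sketch of that idea, assume the LLM has been prompted to reply with a JSON object naming an action and its arguments. The `action`/`args` schema and the `parse_action` helper are hypothetical, chosen only for illustration:

```python
import json
from typing import Any, Dict, Tuple


def parse_action(llm_output: str) -> Tuple[str, Dict[str, Any]]:
    """Parse a JSON reply such as {"action": ..., "args": {...}} into (action, args)."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    data = json.loads(llm_output)
    if not isinstance(data, dict) or "action" not in data:
        raise ValueError("reply must be a JSON object with an 'action' key")
    args = data.get("args", {})
    if not isinstance(args, dict):
        raise ValueError("'args' must be a JSON object")
    return str(data["action"]), args


# Hypothetical reply from the model.
reply = '{"action": "search_arxiv", "args": {"query": "GPT", "max_results": 5}}'
action, args = parse_action(reply)
print(action, args)
```

Because the reply is validated before anything runs, a vague or malformed answer fails loudly instead of triggering the wrong task.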

Let me give you an example — let’s say you wanted to:

Collect relevant arXiv papers using a search query
Generate a bar chart of the application domains and the number of papers in each domain.
Save the bar chart as an image file.

In that case, you would prompt the LLM via a prompt template to output the desired code:

task1 = '''
This recipe is available for you to reuse.

<begin recipe>
**Recipe Name:** Analyzing and Visualizing Application Domains in Arxiv Papers

**Steps:**

1. Collect relevant papers from arxiv using a search query.
2. Analyze the abstracts of the collected papers to identify application domains.
3. Count the number of papers in each application domain.
4. Generate a bar chart of the application domains and the number of papers in each domain.
5. Save the bar chart as an image file.

Here are the well-documented, generalized Python functions to perform the coding steps in the future:

```python
import requests
import feedparser
import matplotlib.pyplot as plt
from typing import List, Dict

def search_arxiv(query: str, max_results: int = 10) -> List[Dict[str, str]]:
    """
    Search arxiv for papers related to a specific query.

    :param query: The search query for arxiv papers.
    :param max_results: The maximum number of results to return. Default is 10.
    :return: A list of dictionaries containing the title, link, and summary of each paper.
    """
    base_url = "http://export.arxiv.org/api/query?"
    search_query = f"search_query=all:{query}"
    start = 0
    # Build the URL directly so the integer parameter is not rebound to a string.
    url = f"{base_url}{search_query}&start={start}&max_results={max_results}"
    response = requests.get(url)
    feed = feedparser.parse(response.content)

    papers = [{"title": entry.title, "link": entry.link, "summary": entry.summary} for entry in feed.entries]
    return papers

def generate_bar_chart(domains: Dict[str, int], output_file: str) -> None:
    """
    Generate a bar chart of application domains and the number of papers in each domain, and save it as an image file.

    :param domains: A dictionary containing application domains as keys and the number of papers as values.
    :param output_file: The name of the output image file.
    """
    fig, ax = plt.subplots()
    ax.bar(domains.keys(), domains.values())
    plt.xticks(rotation=45, ha="right")
    plt.xlabel("Application Domains")
    plt.ylabel("Number of Papers")
    plt.title("Number of Papers per Application Domain")

    plt.tight_layout()
    plt.savefig(output_file)
    plt.show()
```

**Usage:**

1. Use the `search_arxiv` function to collect relevant papers from arxiv using a search query.
2. Analyze the abstracts of the collected papers using your language skills to identify application domains and count the number of papers in each domain.
3. Use the `generate_bar_chart` function to generate a bar chart of the application domains and the number of papers in each domain, and save it as an image file.

</end recipe>

Here is a new task:
Plot a chart for application domains of GPT models
'''

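Given this prompt, the LLM writes code that reuses the recipe’s functions, and running that code produces a bar chart of application domains for GPT models.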
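Before any generated code can run, the agent has to pull it out of the LLM’s reply. Here is a minimal sketch of that extraction step, assuming the reply wraps code in Markdown-style fences. `extract_code_blocks` is a hypothetical helper, not part of any specific framework:

```python
import re
from typing import List

# Three backticks, built programmatically so this example is easy to embed.
FENCE = "`" * 3


def extract_code_blocks(reply: str, language: str = "python") -> List[str]:
    """Return the bodies of all fenced code blocks tagged with `language`."""
    pattern = re.compile(
        re.escape(FENCE) + re.escape(language) + r"[ \t]*\n(.*?)" + re.escape(FENCE),
        re.DOTALL,
    )
    return [body.strip() for body in pattern.findall(reply)]


# Hypothetical LLM reply: prose around one fenced Python block.
reply = (
    "Here is the plotting code:\n"
    + FENCE + "python\n"
    + "print('plotting domains')\n"
    + FENCE + "\n"
    + "Let me know if it works."
)
blocks = extract_code_blocks(reply)
print(blocks)
```

In practice the extracted code should run in a sandbox, since it is model-generated and untrusted.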

And that is how AI Agents work under the surface. Taking these agents into production environments entails a few more steps:

Task Breakdown: This involves parsing the output from the language model into actionable steps or commands.
Execution: Translate the output from the language model into specific actions or instructions that the AI agent can understand and execute. This may involve interfacing with other software components or systems to perform tasks such as data retrieval, computation, or manipulation.
Iterate: Establish a feedback loop to evaluate the success of the AI agent’s actions and adjust its behavior accordingly. This may involve monitoring performance metrics, user feedback, or real-world outcomes to continuously improve the agent’s ability to complete tasks effectively and efficiently.
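The three steps above can be sketched as a small loop. Everything here is an assumption for illustration: the `plan`, `execute`, and `run_agent` names, the newline-separated plan format, and the retry policy are hypothetical, and the LLM is replaced by a plain function.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]


def plan(llm_output: str) -> List[str]:
    """Task breakdown: split a newline-separated LLM reply into non-empty steps."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


def execute(step: str, tools: Dict[str, Tool]) -> str:
    """Execution: treat the first word as a tool name, the rest as its argument."""
    name, _, argument = step.partition(" ")
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](argument)


def run_agent(llm: Callable[[str], str], goal: str,
              tools: Dict[str, Tool], max_attempts: int = 3) -> List[str]:
    """Iterate: on failure, feed the error back into the prompt and ask again."""
    prompt = goal
    for _ in range(max_attempts):
        try:
            return [execute(step, tools) for step in plan(llm(prompt))]
        except KeyError as error:
            prompt = f"{goal}\nPrevious attempt failed: {error}. Use only: {sorted(tools)}"
    raise RuntimeError("agent did not succeed within max_attempts")


# Stand-in LLM: wrong on the first call, corrected once it sees the feedback.
def fake_llm(prompt: str) -> str:
    return "upper hello\necho done" if "failed" in prompt else "shout hello"


tools: Dict[str, Tool] = {"upper": str.upper, "echo": lambda text: text}
results = run_agent(fake_llm, "Greet the user", tools)
print(results)
```

The feedback here is only an error message appended to the prompt. A production agent would likely also track metrics and user feedback, as described above.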

Here are some of the most popular frameworks for building AI Agents:

How you can Build any Autonomous AI Agent from Scratch

Alrighty, here are the 3 Steps to building an autonomous AI Agent:

Define the LLM Model
Create a Prompt Template
Define the process_command function

Let’s go through them 1 by 1.


Step 1: Define the LLM Model

Let’s bring the Llama-2 model in.
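One possible way to do this, sketched under assumptions: the snippet below uses the Hugging Face `transformers` text-generation pipeline and the gated `meta-llama/Llama-2-7b-chat-hf` checkpoint, neither of which is confirmed as the setup used here. The `LLM` alias, `load_llama2`, and `canned_llm` are hypothetical names. The canned stand-in lets the rest of the agent be exercised without downloading weights.

```python
from typing import Callable

# An "LLM" here is simply a function from prompt text to completion text.
LLM = Callable[[str], str]


def load_llama2(model_name: str = "meta-llama/Llama-2-7b-chat-hf",
                max_new_tokens: int = 64) -> LLM:
    """Wrap a transformers text-generation pipeline as a prompt -> text function.

    Assumes `transformers` is installed and access to the gated model was granted.
    """
    from transformers import pipeline  # imported lazily: heavy, optional dependency

    generator = pipeline("text-generation", model=model_name)

    def generate(prompt: str) -> str:
        outputs = generator(prompt, max_new_tokens=max_new_tokens, return_full_text=False)
        return outputs[0]["generated_text"]

    return generate


def canned_llm(reply: str) -> LLM:
    """Stand-in model that always returns `reply`; handy for local testing."""
    return lambda prompt: reply


# Swap in load_llama2() here once the model is available locally.
llm = canned_llm('{"color": "blue"}')
print(llm("Change the color to blue"))
```

Treating the model as a plain `prompt -> text` function keeps the later steps independent of which LLM is plugged in.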

Step 2: Create a Prompt Template

In this example, we will build a simple color changer driven by a prompt. The prompt template instructs our Llama-2 model to answer in JSON so that the output can be parsed.

Image by Author: Defining the Prompt Template
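Since the template itself is shown only as an image, here is a hedged sketch of what such a template could look like. The wording, the `color` JSON key, and the `build_prompt` helper are assumptions, not the author’s exact template:

```python
from string import Template

# Hypothetical template: asks the model for JSON only, with a single "color" key.
PROMPT_TEMPLATE = Template(
    "You control the color of a UI element.\n"
    'Reply with JSON only, in the form {"color": "<css color name>"}.\n'
    "User request: $request\n"
    "JSON:"
)


def build_prompt(request: str) -> str:
    """Fill the user's request into the template."""
    return PROMPT_TEMPLATE.substitute(request=request.strip())


prompt = build_prompt("  make it look like the ocean ")
print(prompt)
```

`string.Template` is used instead of `str.format` so the literal JSON braces in the instructions need no escaping.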


Step 3: Define the process_command Function

The process_command function ensures the LLM outputs get translated correctly into the resulting tasks. This is the true AI Agent step.

Image by Author: the process_command function
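The function itself appears only as an image, so the following is a sketch under assumptions rather than the author’s code. It assumes the model was asked for a JSON object with a `color` key. `ALLOWED_COLORS`, `set_color`, and the `state` dictionary are hypothetical stand-ins for whatever the agent actually controls.

```python
import json
from typing import Dict

# Assumed whitelist of colors the agent is allowed to apply.
ALLOWED_COLORS = {"red", "green", "blue", "yellow", "black", "white"}


def set_color(state: Dict[str, str], color: str) -> None:
    """The 'task': here it only updates a dict; a real agent might repaint a UI."""
    state["color"] = color


def process_command(llm_output: str, state: Dict[str, str]) -> bool:
    """Parse the LLM's JSON reply and perform the color change.

    Returns True if an action was taken, False if the reply was unusable.
    """
    # Models often add text around the JSON, so keep only the outermost braces.
    start, end = llm_output.find("{"), llm_output.rfind("}")
    if start == -1 or end < start:
        return False
    try:
        data = json.loads(llm_output[start:end + 1])
    except json.JSONDecodeError:
        return False
    color = str(data.get("color", "")).lower() if isinstance(data, dict) else ""
    if color not in ALLOWED_COLORS:
        return False
    set_color(state, color)
    return True


state = {"color": "white"}
handled = process_command('Sure! {"color": "Blue"}', state)
print(handled, state)
```

Returning a boolean instead of raising lets the caller decide whether to re-prompt the model when the reply cannot be used.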

That’s it. That’s the simple 3-step process for defining any AI agent.

Conclusion

AI Agents are autonomous agents performing tasks based on LLM output.
The way to force LLMs to output meaningful responses is to perform prompt engineering via prompt templates.
You can recreate any AI Agent with 3 simple steps:

Defining an LLM model
Creating a Prompt Template
Defining the process_command function

Enjoyed This Story?

Thanks for getting to the end of this article. My name is Tim, and I work at the intersection of AI, business, and biology. I love to elaborate on ML concepts or write about business (VC or micro)! Get in touch!
