Demystifying AI Agents in Crypto

My first exposure to the intersection of AI and crypto was in December 2023 when I heard about Bittensor, a collection of AI projects federated around a blockchain. Fast forward to today and oh boy, the web3 AI train is in full swing. But beyond the hype, there is a real sense that the industry is overselling, and that a lot of what gets labeled AI isn't really AI.

So I want to take this post as an opportunity to classify different types of "AI Agents" in the crypto space.

What is an AI Agent?

First of all, let's define what an AI Agent is. At its core, an agent is a system that can take autonomous actions.

We will define 3 levels of autonomy:

Executing a process when a user asks a question.
Repeating processes autonomously after being defined by a user.
Multi-step, long-running processes that operate fully free of user input.

The key technological paradigm powering all these levels is called Function Calling or Tool Usage.

This is the process of writing functions (the tools) in code, describing them to the LLM, and letting the LLM decide which one to call and with what arguments. Imagine we wanted to make an agent that helps users invest in bitcoin.

LLMs are powerful at understanding and generating natural language, but on their own they cannot search the web or reliably perform financial calculations. Let's see how this would look without tool calling.

What are the latest news headlines about bitcoin?

As my knowledge cutoff is December 2023, I'll provide you with some of the recent news headlines about Bitcoin up to that point. Please note

That message on its own is quite dangerous! Outdated information is a real risk when it comes to financial decisions. What if we instead asked our LLM to come up with a query to search for news about bitcoin?

What are the latest news headlines about bitcoin?

{"query": "latest bitcoin news", "params": {"news": "true"}}

CoinDesk: Bitcoin, Ethereum, Crypto News and Price Data

Leader in cryptocurrency, Bitcoin, Ethereum, XRP, blockchain, DeFi, digital finance and Web 3.0 news with analysis, video and live price updates.

Now we are getting somewhere! We can now search the web for news about bitcoin. This is the essence of tool calling, and a good example of the first level of autonomy. These agents are simply responding to a user's input and executing a predefined process. This is essentially a new form of user experience driven by natural language instead of user clicks or forms.
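
To make this concrete, here is a minimal sketch of what happens around the model in the exchange above. The search_news function, the TOOLS registry and run_tool_call are hypothetical names invented for illustration, and the search itself is stubbed with a canned result so the example runs offline. The point is only that the model produces text (the JSON query) and our own code does the actual work.

```python
import json
from typing import Callable, Dict, List


def search_news(query: str, params: Dict[str, str]) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a real search API; returns a canned result."""
    return [{
        "title": "CoinDesk: Bitcoin, Ethereum, Crypto News and Price Data",
        "query": query,
    }]


# Registry mapping tool names to the Python functions that implement them.
TOOLS: Dict[str, Callable[..., object]] = {"search_news": search_news}


def run_tool_call(tool_name: str, llm_output: str) -> object:
    """Parse the JSON arguments written by the model and run the named tool."""
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool: {tool_name}")
    args = json.loads(llm_output)      # the model only ever produces text
    return TOOLS[tool_name](**args)    # our code performs the real action


# The JSON query from the exchange above, as the model would emit it.
llm_output = '{"query": "latest bitcoin news", "params": {"news": "true"}}'
results = run_tool_call("search_news", llm_output)
```

In a real agent the results would be handed back to the LLM so it can write the final answer for the user.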

Level 2: Autonomous Agentic Processes

The second level we defined is where things start to get interesting! Humans have defined processes since the industrial revolution. The nice difference is that we can now use language models to stick these processes together with tool calling. Let's look at an example.

As a human we want to invest in bitcoin:

Daily, we'll look at news about bitcoin going up or down
We'll analyze this news and decide if we want to buy or sell
We'll buy or sell based on our analysis

Now let's explore how this translates in the second level of autonomy. We'll graph the architecture of this process and make it "agentic".

Here we see a typical investment algorithm that invests based on sentiment analysis of news. What is new is that we are driving the process with a language model and a series of tools. A process like this could already exist before the launch of ChatGPT, back when agents were not a thing.
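
The three steps of this process can be sketched in a few lines of Python. This is an illustration under loud assumptions, not a trading strategy: score_sentiment is a toy keyword counter standing in for an LLM call, the 0.3 threshold and the extra "hold" outcome are made up, and fetch_news and execute_order are hypothetical hooks that a real system would wire to a news API and an exchange.

```python
from typing import Callable, List


def score_sentiment(headlines: List[str]) -> float:
    """Toy stand-in for an LLM sentiment call; returns a score in [-1, 1]."""
    positive = ("surges", "rally", "adoption")
    negative = ("crash", "ban", "hack")
    score = 0
    for headline in headlines:
        text = headline.lower()
        score += sum(word in text for word in positive)
        score -= sum(word in text for word in negative)
    # Average per headline, then clamp to the [-1, 1] range.
    return max(-1.0, min(1.0, score / max(len(headlines), 1)))


def decide(score: float, threshold: float = 0.3) -> str:
    """Turn the sentiment score into an action (threshold is an assumption)."""
    if score >= threshold:
        return "buy"
    if score <= -threshold:
        return "sell"
    return "hold"


def daily_run(fetch_news: Callable[[], List[str]],
              execute_order: Callable[[str], None]) -> str:
    """One run of the process: look at news, analyze it, then act on it."""
    action = decide(score_sentiment(fetch_news()))
    if action != "hold":
        execute_order(action)
    return action


# Usage with stubbed news and an in-memory list instead of a real exchange.
orders: List[str] = []
action = daily_run(lambda: ["Bitcoin surges as ETF adoption grows"], orders.append)
```

A scheduler such as cron would call daily_run once a day, which is what makes this level 2: a human defined the process and the machine repeats it.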

The key difference is the availability of powerful, cheap compute and the speed at which such a process can be built. It was essentially restricted to rich organizations with the following capacities:

A Machine Learning and Data Science team focused on fine-tuning a model for analyzing bitcoin news
An operations team deploying the models in a Python environment on AWS or other cloud providers
An integrations team building the execution layer for the model and the purchasing of bitcoin

Now a solo developer can build this in a few hours simply by calling APIs and running it on their local machine. This is where the concept of agents starts to make sense. Automation is now something that can be done by tinkerers and even non-technical users, given the rise of no-code automation tools.

But this is still a semi-autonomous process defined by a human to run each morning. Let's look at the third level of autonomy.

Level 3: Fully Autonomous Agentic Processes

Now let's explore what kind of process we could build at this level. There have been attempts at fully autonomous systems but they always hit fundamental issues:

Reasoning is not possible for LLMs!

No matter what OpenAI touts about o1 and reasoning chains, LLMs are still fundamentally text predictors. A reasoning chain is the process of cutting a problem into multiple prompts and tool calls:

Human prompt: Should I buy bitcoin?

LLM receives an internal prompt: "analyze this problem and cut it into smaller problems that form a solution"
"find news about bitcoin" -> "is the news good or bad for bitcoin" -> "analyze price charts of bitcoin" -> "how does the news affect the price charts" -> "should I buy bitcoin?"

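As a rough sketch, the chain above is just a loop that feeds each sub-prompt to the model along with the answers collected so far. The run_chain helper and the fake_llm function are hypothetical: fake_llm only reports how many lines of prompt it received, so the example runs without any API and shows the context growing at each step.

```python
from typing import Callable, List, Tuple

# The sub-prompts from the chain above, in order.
CHAIN: List[str] = [
    "find news about bitcoin",
    "is the news good or bad for bitcoin",
    "analyze price charts of bitcoin",
    "how does the news affect the price charts",
    "should I buy bitcoin?",
]


def run_chain(llm: Callable[[str], str], steps: List[str]) -> List[Tuple[str, str]]:
    """Run each sub-prompt in order, passing earlier answers along as context."""
    transcript: List[Tuple[str, str]] = []
    for step in steps:
        context = "\n".join(f"{q} -> {a}" for q, a in transcript)
        prompt = f"{context}\n{step}" if context else step
        transcript.append((step, llm(prompt)))
    return transcript


def fake_llm(prompt: str) -> str:
    """Hypothetical offline model: reports how many prompt lines it was given."""
    return f"answer based on {len(prompt.splitlines())} line(s)"


transcript = run_chain(fake_llm, CHAIN)
```

Nothing in this loop reasons. Each call is still text prediction, and the quality of the final answer depends on every earlier step going well.
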
This might be a good start, but it faces a really common problem most developers run into when coding with AI:

The prompting valley of death!

The prompting valley of death is a big reason why Devin and other AI agents are not yet replacing developers.

You make fast progress generating code and stitching it together via AI
You get more and more complex as your application needs refinement to match real-life use
You hit a wall where prompt after prompt just makes things worse
You take over and fix the issue in 5-6 minutes coding manually.

This is important context to understand the limits of web3 agents. Prompting is a powerful tool but it is not a silver bullet. We hit the fundamental limits of LLMs when we give them full autonomy.

So how do we get around this?

The general answer is that we build specialized systems and stitch them together to form a more complete semi-autonomous system. A single well-defined prompt tends to get better results than a loose set of prompts. This comes down to the concept of "attention" in LLMs: an LLM can be given vast amounts of information but, like a human, it will focus on what it considers most important.

Knowing this let's explore how we could build a web3 agent that runs a meme coin. We'll split it into parts:

an agent process that checks news about popular memes happening
an agent process that generates Solidity code for a meme coin
an agent process that posts on Twitter about the meme coin
an orchestrator script that runs each agent and stitches them together

This then starts to run as a continuous script but the logic between each agent is defined by a human planning the process.
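
Here is a minimal sketch of such an orchestrator, assuming each agent can be reduced to a function that reads the shared state and returns its output. The Step class, orchestrate and the three stub agents are hypothetical names; in a real system each stub would wrap an LLM call plus tools, and nothing here generates real Solidity or posts anywhere.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    name: str
    run: Callable[[Dict[str, str]], str]  # reads shared state, returns its output


def orchestrate(steps: List[Step]) -> Dict[str, str]:
    """Run the agents in the order a human planned, sharing state between them."""
    state: Dict[str, str] = {}
    for step in steps:
        state[step.name] = step.run(state)
    return state


# Stub agents (assumptions): each would normally call an LLM and some tools.
def find_meme(state: Dict[str, str]) -> str:
    return "doge wearing a hat"


def generate_contract(state: Dict[str, str]) -> str:
    return f"// Solidity placeholder for a token themed on: {state['meme']}"


def post_announcement(state: Dict[str, str]) -> str:
    return f"New coin inspired by {state['meme']}!"


PIPELINE = [
    Step("meme", find_meme),
    Step("contract", generate_contract),
    Step("post", post_announcement),
]
state = orchestrate(PIPELINE)
```

The order of PIPELINE is the part a human decides. The agents fill in content, but they do not choose what happens next.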

We won't dive into the code here, but coding these systems essentially comes back to good old software engineering mixed in with LLMs and tool calling. If we move beyond the hype and into the reality of what we can build with AI agents, we can see that there is a lot of potential but there is nothing magic about it.

We'll need to give the agent a database, webhooks, and other tools to make it work. It becomes a matter of stitching together a system that can be run in a loop.
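
As one small illustration of why the database matters, the sketch below uses Python's built-in sqlite3 module to remember which items an agent has already handled, so that running the loop again does not repeat work. The seen table, open_store and run_once are assumptions made for this example, and fetch_items and handle are hypothetical hooks for whatever the agent reads and does.

```python
import sqlite3
from typing import Callable, Iterable, List


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the small database holding the agent's memory."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen (item TEXT PRIMARY KEY)")
    return conn


def run_once(conn: sqlite3.Connection,
             fetch_items: Callable[[], Iterable[str]],
             handle: Callable[[str], None]) -> List[str]:
    """One loop iteration: handle only items not processed in an earlier run."""
    handled: List[str] = []
    for item in fetch_items():
        row = conn.execute("SELECT 1 FROM seen WHERE item = ?", (item,)).fetchone()
        if row is not None:
            continue  # already handled in a previous iteration
        conn.execute("INSERT INTO seen (item) VALUES (?)", (item,))
        handle(item)
        handled.append(item)
    conn.commit()
    return handled


# Two iterations of the loop with stubbed inputs and an in-memory database.
conn = open_store()
posted: List[str] = []
first = run_once(conn, lambda: ["meme-a", "meme-b"], posted.append)
second = run_once(conn, lambda: ["meme-b", "meme-c"], posted.append)
```

A real deployment would use a file path instead of ":memory:" and trigger run_once from a scheduler or a webhook.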

Conclusion

The intersection of AI agents and crypto represents an exciting frontier, but it's crucial to separate the hype from reality. While we're not yet at the stage of fully autonomous AI agents running complex crypto operations, we are seeing meaningful applications emerge at different levels of autonomy.

The key takeaway is that successful AI agent implementations in crypto tend to be well-scoped, specialized systems rather than catch-all solutions. By breaking down complex processes into discrete, manageable tasks and combining traditional software engineering practices with LLM capabilities, we can create valuable tools that enhance rather than replace human decision-making.

The future of AI agents in crypto likely lies not in building fully autonomous systems, but in creating thoughtful, semi-autonomous processes that leverage the strengths of both LLMs and traditional programming. This means embracing the limitations of current AI technology while taking advantage of its ability to process natural language, analyze information, and execute specific tasks efficiently.

For developers entering this space, the focus should be on building robust, practical systems that solve real problems rather than chasing the illusion of full autonomy. By understanding these constraints and opportunities, we can create meaningful applications that advance the field without falling into the trap of overselling AI's current capabilities.

The real innovation isn't in creating magical AI agents that do everything, but in thoughtfully combining AI capabilities with solid engineering practices to build tools that genuinely improve how we interact with crypto systems.