Agents: An optimistic painting

Agents, at least under their older definition, were around long before the term took on its current meaning. I'd like to explore a positive future with some level of semi-autonomous but still directed control, where one guiding intelligence holds the hand of many.

Main Questions

At what point can you trust an agent to fulfill its task without human intervention, and at what point is that a good idea? With current tools, what are the best approaches? I'm not here to unpack everything, but I'd like to explore an idea.

Definition

The Merriam-Webster dictionary defines an agent as:

"a means or instrument by which a guiding intelligence achieves a result"
"a computer application designed to automate certain tasks (such as gathering information online)"

The Wayback Machine shows the same definition going back more than 5 years, which makes sense, unless the archived snapshot is simply loading the current definition from a live API.

Today's Agents

Most of today's "agents" are semi-autonomous apps that orchestrate LLMs, third-party APIs, and SDKs to achieve a narrowly defined goal. They might edit an IDE's buffer, issue DOM commands in a browser, or drive other software. Given the same architecture, they could extend into the physical world.

There are parallels between agents and RTS (real-time strategy) games. StarCraft, one of the most famous, is a good example.

In StarCraft, the best players in the world have very high APM (actions per minute) and the lowest action latency, as a chart of these metrics by rank shows.

APM has the highest correlation with win rate of any variable, at 0.65. We also see that GrandMasters have the lowest action latency of any rank.

So the more relevant actions you take, the better, and how decisively you execute those actions also plays a role.

Being fast and correct matters.

How do RTS games inform Agents?

We know from StarCraft that performing more correct, decisive actions per minute makes you better. But there's something we must acknowledge about current agents: they're built on LLMs, which are constantly evolving. This is both good and bad. On one hand, the models keep getting better. On the other, you have to update your system instructions and get used to the new changes. You also need to scrutinize the model itself. The AI labs already do this with their evaluations and testing, but the user (director) needs to evaluate it for themselves.

An Ideal UX

I'm an engineer and not a UX expert, but it's still not hard to suggest what a better UX could be. RTS games often have units on the playing field, and the player selects groups of units and gives them actions (move here, attack this, wait here, etc.). They also have indicators that show how the units are doing.

A better UX for agents is a little murkier and still being fleshed out. If we scope it to AI/ML/software engineering, people have different preferences, but there is a pattern. For instance, some prefer Neovim in the terminal. Others prefer full-fledged IDEs, or a hybrid of an IDE with Vim extensions and a terminal UX.

These UX preferences will change, but as long as we scrutinize LLM output, we need a way to see the changes agents make and rapidly iterate on them. That scrutiny is becoming less necessary, but from a maintainability perspective, LLMs still prefer well-architected and well-reasoned code. If you have to click around a bunch of panels, you're not going to be as productive.

Multi-Agent Workflows

One interesting feature that Claude Code now has is the ability to work on different branches simultaneously via Git worktrees, which lets you effectively run agents in parallel and direct them as you see fit. I think it's a good starting point.
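As a rough illustration of the worktree idea, here is a minimal sketch that builds one `git worktree add` command per agent branch. The function names, the sibling-directory layout, and the `main` base branch are my own assumptions for the example, not anything Claude Code prescribes. By default it only returns the commands so you can inspect them before running anything.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def worktree_command(repo: Path, branch: str, base: str = "main") -> list[str]:
    """Build the `git worktree add` command for one agent's branch."""
    # Assumed layout: each worktree lives next to the repo,
    # e.g. ../myrepo-agent-docs for branch "agent/docs".
    target = repo.parent / f"{repo.name}-{branch.replace('/', '-')}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target), base]


def create_worktrees(
    repo: Path, branches: list[str], base: str = "main", dry_run: bool = True
) -> list[list[str]]:
    """Return the commands, and run them only when dry_run is False."""
    commands = [worktree_command(repo, branch, base) for branch in branches]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical repo path and branch names, printed rather than executed.
    for cmd in create_worktrees(Path("/tmp/myrepo"), ["agent/docs", "agent/tests"]):
        print(" ".join(cmd))
```

Each agent can then be started inside its own directory, so their edits never collide in a single working copy.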

The picture below is from Masataka Mayuzumi's article "Claude Code for Parallel Development". You can see they have their normal editor on one side, Claude Code in the middle, and Cursor's agent on the far right.

In pursuit of a better UX

I personally don't use Cursor's agent and prefer a split screen with Claude Code. I'll use vim + a small prompt modal to select blocks of code for more fine-grained work. Claude Code's tooling is exceptional, but I'd love to see open source alternatives that allow for local LLM work with Ollama and a way to tweak its tooling.

The ideal would be some way to rapidly move between working agents, not by clicking but through a smart context menu, where we can break down the results of each agent's steps and quickly review them. I'm not sure what the ultimate UX looks like, but being able to rapidly iterate on outputs from several agents sounds much better than what we have today.
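To make that a bit more concrete, here is a small sketch of the kind of state a keyboard-driven review menu might keep. Everything in it (`AgentResult`, `ReviewQueue`, the field names) is hypothetical and only meant to show the idea: one cursor, wrap-around stepping, and a jump to the next result that still needs review.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentResult:
    agent: str             # hypothetical label, e.g. the branch the agent works on
    summary: str           # short description of the agent's latest step
    reviewed: bool = False


@dataclass
class ReviewQueue:
    results: list[AgentResult] = field(default_factory=list)
    cursor: int = 0

    def current(self) -> AgentResult:
        return self.results[self.cursor]

    def step(self, delta: int) -> AgentResult:
        """Move the cursor forward or backward, wrapping around the ends."""
        self.cursor = (self.cursor + delta) % len(self.results)
        return self.current()

    def next_unreviewed(self) -> AgentResult | None:
        """Jump to the next result that still needs review, if there is one."""
        count = len(self.results)
        for offset in range(1, count + 1):
            index = (self.cursor + offset) % count
            if not self.results[index].reviewed:
                self.cursor = index
                return self.results[index]
        return None
```

A frontend could bind `step(1)` and `step(-1)` to two keys and `next_unreviewed()` to a third, so moving between agents never requires the mouse.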

A quick UX sidebar

One common UX pattern is that agents have a chat pane for communicating with them. I strongly prefer an ephemeral modal that can be collapsed at any time, so your attention doesn't drift away from the editor itself, since that breaks your working context. Plus, without extensive keyboard support, you're going to be slower than someone who can manipulate their environment at will.

If you are working on agentic software tooling, it'd be nice to take the following into account:

Well-defined UX that is traversable by keyboard and does not require significant mouse movement.
Outputs are intentionally presented to the user (if you have to scroll through a massive list to see the diff or the reasoning, it's harder to parse quickly).
Built-in git worktree support or alternative (work on multiple branches simultaneously), with the outputs presented elegantly and easy to follow.
My personal preference is a nice modal that is customizable and doesn't get in your face so you can oversee what your agents are doing.

A case for optimism

As long as the UX improves substantially, agents have the capacity to drastically increase not only our productivity, but also the overall amount of software and digital products built. Whether the human stays a director is another question. I'm of the opinion that if you're building for humans, a human should be in the loop. But those questions are best addressed elsewhere.

Let me know how you imagine the next agents.