I'm Ran, one of the people behind CopilotKit, a developer tool for building AI-native interfaces. Over the past year, we've spent a lot of time wiring agent frameworks into real frontends: stitching tool calls, tracking partial updates, and guessing when messages end. At some point, we realized we couldn't keep patching forever. Here's why, and how we started standardizing instead.
💬 The Problem With Agent Streams
Agent frameworks are growing fast. But they're all speaking different stream dialects.
Some emit partial deltas. Others send state snapshots.
Some give you raw tool call fragments. Others just dump the whole state tree.
None of them tell your UI what's actually happening, at least not in a way that's consistent or structured.
So every time you integrate a new framework, you end up doing the same thing:
guessing what the stream means, stitching events together by hand, and hoping your frontend logic doesn't fall apart the next time the backend changes.
If your product is tightly coupled to a framework's quirks, switching runtimes means rewriting everything downstream.
And if you're consuming the stream directly in your UI, the pain is even sharper.
You can keep patching. Or you can start defining what the stream should look like.
🔌 We Didn't Plan to Build a Spec
For a while at CopilotKit, my job boiled down to this: take whatever stream the agent framework gave us, and try to make the frontend behave.
Tool calls came in fragments. Messages had no clear end.
Sometimes state updated silently, sometimes it didn't.
We built logic to detect what kind of event was coming through, track it across updates, and do our best to keep the UI in sync.
LangGraph was a good example. It's flexible, powerful, and emits a raw stream that gives you everything, except clear semantics.
To support it, we wrote a stream adapter that watched every chunk and tried to reconstruct intent in real time.
- Was this a tool call? If yes, was it starting or midway through?
- If it had a name but no arguments, we assumed it was the beginning.
- If it had arguments, we guessed it was the middle.
- If the next chunk didn't fit either, maybe that meant it ended.
Same with messages. We'd accumulate content until a new event looked unrelated, then we'd assume the message was complete.
Tool call arguments? We had to track them manually and hope they lined up with the right context.
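For a sense of how fragile this was, here's a simplified TypeScript sketch of the kind of phase-guessing heuristic described above. The types are made up for illustration; this is the shape of the logic, not our actual adapter:

```ts
// Made-up chunk shape standing in for a framework's raw stream output.
type RawChunk = { toolName?: string; toolArgs?: string; content?: string };

type ToolCallPhase = "start" | "args" | "maybe-ended" | "unknown";

// The heuristic in code form: a name alone probably means the call is starting,
// arguments probably mean it's midway, and anything that fits neither pattern
// while a call is open probably means it ended. "Probably" is doing a lot of work.
function guessToolCallPhase(chunk: RawChunk, inToolCall: boolean): ToolCallPhase {
  if (chunk.toolName && !chunk.toolArgs) return "start";
  if (chunk.toolArgs) return "args";
  if (inToolCall) return "maybe-ended"; // a chunk that fits neither: assume it ended?
  return "unknown";
}
```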
It worked for a while. Then bugs started to show up.
Unclosed tool calls. Missing arguments. Messages that never finalized.
Users reported them. We patched things. Then broke something else.
Each framework brought new quirks. Each bug brought more glue code.
And every time we thought "maybe this one is stable now," a small change upstream broke everything again.
At some point, we realized we weren't just adapting streams.
We were inventing structure. So we wrote it down.
Not as a one-off adapter, but as a shared format anyone could use.
Something that defined how these streams should actually behave when they reach the UI.
That became AG-UI.
🪄 Enter AG-UI: The Adapter That Actually Adapts
AG-UI, at its core, is a protocol.
It defines a common structure for how agent runtimes can emit events, and how frontends can consume them.
It's not a library, and it doesn't tell you how to build your agent.
It just describes the shape of the data that moves between runtime and UI.
The idea is simple:
Agent frameworks can stream whatever they want internally, but when it comes to the frontend, they emit a consistent set of event types.
These include things like when a message starts, when a tool is called, or when shared state updates.
Once you have that structure, the frontend doesn't need to guess what's happening.
You can build components that listen for specific events and respond accordingly.
And you can swap out the agent backend without rewriting the UI every time.
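For example, a run might reach the UI as a flat sequence of typed events. The event type names below are the ones this post describes; the payload fields (`messageId`, `delta`, `snapshot`) are illustrative assumptions, not the exact wire format:

```ts
// Illustrative sketch of a consistent event stream. Only the event types mirror
// the spec as described here; the payload field names are assumptions.
const events = [
  { type: "TEXT_MESSAGE_START", messageId: "msg_1" },
  { type: "TOOL_CALL_START", toolCallId: "call_1", toolCallName: "search_docs" },
  { type: "TOOL_CALL_ARGS", toolCallId: "call_1", delta: '{"query": "AG-UI"}' },
  { type: "TOOL_CALL_END", toolCallId: "call_1" },
  { type: "STATE_SNAPSHOT", snapshot: { resultsFound: 3 } },
  { type: "TEXT_MESSAGE_END", messageId: "msg_1" },
];
```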
That's the role AG-UI plays. It sits between your agent runtime and your React tree, and gives both sides something to agree on.
🧩 How AG-UI Works in Practice
AG-UI defines a set of structured events that describe what's happening during an agent run: message flow, tool usage, shared state, and more.
Some of these events map directly to what frameworks already emit. Others, like `STATE_SNAPSHOT`, represent higher-level behaviors that aren't always explicitly available in the raw stream but make sense for the "Agentic Experience" (the term I use for agent usage that's designed for humans to work with).
There are two ways these events get produced:
1. Native support
Some frameworks can choose to emit AG-UI events directly. This means they adopt the event structure internally and produce streams that conform to the spec out of the box.
2. Adapter-based support
When native support isn't available, a translation layer can be created, typically as a subclass of AG-UI's abstract `Agent` class. Each agent implementation (e.g. `LangGraphAgent`) wraps the native stream and transforms it into AG-UI events in real time.
The `Agent` base class provides a shared interface with methods and properties like `run()`, `runId`, `messages`, `tools`, and more. The `run()` method returns an observable stream of AG-UI events, no matter which backend is used.
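As a rough sketch of that adapter pattern, assuming RxJS-style observables and using hand-rolled stand-ins for the SDK's actual types, an adapter might look something like this:

```ts
import { Observable } from "rxjs";

// Hand-rolled stand-ins for the SDK types named above (Agent, run(), etc.).
interface AgUiEvent {
  type: string;
  [key: string]: unknown;
}

abstract class Agent {
  runId?: string;
  messages: unknown[] = [];
  tools: unknown[] = [];
  // run() returns an observable stream of AG-UI events, whatever the backend.
  abstract run(): Observable<AgUiEvent>;
}

// An adapter wraps a framework's native stream and re-emits it as AG-UI events.
class ExampleAdapterAgent extends Agent {
  run(): Observable<AgUiEvent> {
    return new Observable<AgUiEvent>((subscriber) => {
      // A real adapter would consume the framework's native stream here and
      // translate each chunk; this hard-coded lifecycle just shows the shape.
      subscriber.next({ type: "TEXT_MESSAGE_START", messageId: "msg_1" });
      subscriber.next({ type: "TEXT_MESSAGE_END", messageId: "msg_1" });
      subscriber.complete();
    });
  }
}

// The frontend only ever sees the unified stream:
new ExampleAdapterAgent().run().subscribe((event) => console.log(event.type));
```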
Once this structure is in place, the frontend doesn't need to know or care which framework is running behind the scenes. It just listens to a clean stream of well-defined events.
📑 The Event Types That Make Up an Agent Run
AG-UI includes a small, focused set of event types designed to support most real-time agent interactions:
🗨️ Message Events
- `TEXT_MESSAGE_START`: A message has begun streaming (e.g. the assistant starts typing).
- `TEXT_MESSAGE_END`: The message has completed. Useful for locking in output or triggering animations.
🛠️ Tool Call Lifecycle
- `TOOL_CALL_START`: A tool is being invoked. Includes metadata like `toolCallId` and `toolCallName`.
- `TOOL_CALL_ARGS`: The arguments being passed to the tool. Enables live rendering of param inputs.
- `TOOL_CALL_END`: The tool call has returned. Includes the result payload.

Each tool event shares the same `toolCallId` so the frontend can track the call as a lifecycle.
📦 State and Snapshot Events
- `STATE_SNAPSHOT`: A structured snapshot of the current state, useful for syncing or debugging.
- `MESSAGE_SNAPSHOT`: A full or partial message state reconstruction. Handy when full messages aren't emitted explicitly by the backend.
🧩 Custom Events
- `CUSTOM`: Lets you define custom signals not covered by the core spec, like `"PredictState"`. These can still be handled uniformly by shared UI components.
Together, these events form a stable contract between agent logic and UI behavior, making it possible to build reusable components that work across frameworks, without needing to reverse-engineer every stream format from scratch.
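To make that contract concrete, here's a hedged sketch of the consuming side: a small reducer that folds AG-UI-style events into view state. The event type names come from the list above; the state shape and payload fields are illustrative assumptions.

```ts
// Event union built from the types listed above; payload fields are assumed.
type AgUiEvent =
  | { type: "TEXT_MESSAGE_START"; messageId: string }
  | { type: "TEXT_MESSAGE_END"; messageId: string }
  | { type: "TOOL_CALL_START"; toolCallId: string; toolCallName: string }
  | { type: "TOOL_CALL_ARGS"; toolCallId: string; delta: string }
  | { type: "TOOL_CALL_END"; toolCallId: string; result?: unknown }
  | { type: "STATE_SNAPSHOT"; snapshot: Record<string, unknown> }
  | { type: "CUSTOM"; name: string; value?: unknown };

// Hypothetical view state a UI might keep in sync with the stream.
interface ViewState {
  streamingMessageId: string | null;
  toolCalls: Record<string, { name: string; args: string; done: boolean }>;
  sharedState: Record<string, unknown>;
}

function reduce(state: ViewState, event: AgUiEvent): ViewState {
  switch (event.type) {
    case "TEXT_MESSAGE_START":
      return { ...state, streamingMessageId: event.messageId };
    case "TEXT_MESSAGE_END":
      return { ...state, streamingMessageId: null };
    case "TOOL_CALL_START":
      // Open a tool-call entry keyed by toolCallId; later events refer back to it.
      return {
        ...state,
        toolCalls: {
          ...state.toolCalls,
          [event.toolCallId]: { name: event.toolCallName, args: "", done: false },
        },
      };
    case "TOOL_CALL_ARGS": {
      const call = state.toolCalls[event.toolCallId];
      if (!call) return state;
      // Accumulate argument text so the UI can render params as they stream in.
      return {
        ...state,
        toolCalls: {
          ...state.toolCalls,
          [event.toolCallId]: { ...call, args: call.args + event.delta },
        },
      };
    }
    case "TOOL_CALL_END": {
      const call = state.toolCalls[event.toolCallId];
      if (!call) return state;
      return {
        ...state,
        toolCalls: { ...state.toolCalls, [event.toolCallId]: { ...call, done: true } },
      };
    }
    case "STATE_SNAPSHOT":
      return { ...state, sharedState: event.snapshot };
    default:
      return state;
  }
}
```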
🌱 Ecosystem & Adoption: Growing the Standard
AG-UI seems to resonate with the people actually building with agents.
We've seen everything from community-built adapters to blog posts, LinkedIn threads, and YouTube videos.
It's already integrated with several agent runtimes, either natively or through adapters:
- ✅ LangGraph
- ✅ Mastra
- ✅ Agno
- ✅ Vercel AI SDK
- ✅ AG2
- ✅ LlamaIndex
Some emit AG-UI events directly. Others, like LangGraph, are supported via wrapper classes that map their native stream into structured events using the AG-UI spec.
Adapters are easy to write. If a framework emits a stream, it can be made to emit AG-UI. No changes to the runtime required.
The SDKs in JavaScript and Python are small and purpose-built to sit between an agent backend and a frontend UI.
Once that layer is in place, everything downstream gets simpler: shared components, dev-tools, run replay, even multi-agent orchestration.
🎯 Finalized Conclusion
Building UIs on top of agent frameworks has meant writing the same stream logic over and over.
AG-UI came out of trying to stop that, first for ourselves, then for others.
If you're building an AI product, and you're tired of translating agent internals into frontend updates, AG-UI is worth a look.
Curious? Start here:
- Check out the docs
- Try the TypeScript or Python SDKs
- Join the discussions
- Or just build your own adapter and emit events in your own stack. The protocol is open and extensible.
The protocol is open. The stream is yours to shape.