March 23, 2026

Good AI agents need good managers

Written by Maxmilian Aoki

When you think about a "good" AI agent, what comes to mind?

You might think of a frictionless experience for the end user – where their questions are answered on the agent's first try, and actions are taken on their behalf.

Or maybe you think about it from a builder perspective: it's the seamless integration of tools and custom connections that come together on your platform of choice – allowing your team to assemble even the most unique solutions in a way that feels intuitive.

The thing is, these two pictures offer very different perspectives on AI agents. One captures the end result and business goal. The other captures the technical implementation. Both are equally important, but even together they don't tell the full story.

Because the backbone of any successful agentic solution is good management.

Good AI agents need good "managers". And we're not talking about any specific role – agent management isn't necessarily coupled with a position, like the product owner or CX designer. Agent management is a mindset that anyone on a CX team can start embodying to set their team up for success.

But this is easier said than done, because the role of an agent "manager" is changing at the same rate that AI agents themselves are evolving.

Let's go back in time to illustrate.

Agent management used to be plug-and-play.

Think back to when agentic systems weren't agentic at all. The humble origins of today's AI agents were NLU systems that would direct a user through a "tree" of responses, where each response and user intent was pre-scripted. Agent "managers" took a very hands-on approach, and "performance review" was technical:

"Is the agent covering every reasonable use case?"

You could literally map this. Draw your decision tree, count the branches. Coverage was measurable: "We handle 47 scenarios."

"Are the intents triggering correctly?"

You'd test with variations: "I want to return my order" or "Can I send this back?" If the NLU failed, you'd add more training phrases. The fix was mechanical.

"Are the scripted responses appropriate?"

You wrote them, so you already knew the answer. Performance review meant reading your own scripts and asking, "Is this how we want to sound?"

This kind of management was time-consuming, but it was auditable. You could point to any part of the system and say with certainty: "This is what will happen when a customer does this."
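To make that auditability concrete, here is a minimal sketch of a scripted system and its "performance review". Everything in it is hypothetical: the intents, the training phrases, and the exact-match `match_intent` function are toy stand-ins for a real NLU, not any particular product's behaviour.

```python
from typing import Optional

# Hypothetical scripted tree: each intent has training phrases and one pre-written response.
SCRIPTED_TREE = {
    "return_order": {
        "training_phrases": ["i want to return my order", "can i send this back?"],
        "response": "Sure, I can help you start a return.",
    },
    "track_order": {
        "training_phrases": ["where is my order?", "track my package"],
        "response": "Let me look up your tracking details.",
    },
}


def match_intent(utterance: str) -> Optional[str]:
    """Toy stand-in for an NLU: exact match against the training phrases."""
    normalized = utterance.strip().lower()
    for intent, node in SCRIPTED_TREE.items():
        if normalized in node["training_phrases"]:
            return intent
    return None


def audit(test_cases: dict) -> list:
    """Return the test utterances whose matched intent differs from the expected one."""
    return [u for u, expected in test_cases.items() if match_intent(u) != expected]


# Coverage is countable: one entry per scripted scenario.
coverage = len(SCRIPTED_TREE)

# Intent triggering is testable: list variations and see which ones fail.
failures = audit({
    "I want to return my order": "return_order",
    "Can I send this back?": "return_order",
    "How do I return this?": "return_order",  # not a training phrase yet
})
```

The fix for a failing variation is as mechanical as the article describes: append the phrase to `training_phrases` and run the audit again.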

Now contrast this with modern agentic systems.

Today, agents don't follow a tree. They reason about what to do next based on context, guidelines, and goals. You can't map every possible conversation because your agent is generating paths in real time based on what the (endearingly unpredictable) end user actually says.

This is what happens when an LLM is integrated into a legacy system: the result is something fundamentally new. In customer service, "new" isn't an incremental advancement – it's a paradigm shift.

Which means the old rules of management no longer apply. Much of the agent's behaviour has become abstracted, and at first glance this feels like relinquishing control. That comes with reasonable fears:

"If I don't script the agent, it'll say something wrong."

"I'll lose control of the brand experience."

"I don't know how to measure success anymore."

As the agent has become more capable, the stakes have been raised for its manager. It's now the manager's responsibility to ensure consistency in an "employee" that behaves probabilistically.

The best platforms anticipated this shift.

Many platforms on the market will get you a good agent. Voiceflow is purpose-built for the teams that also need good agent management – which is what makes agents adaptable, iterable, and scalable to complex business use cases.

So what does the new management framework actually look like?

Think about the best managers you've worked with. They don't hand a new hire a script for every scenario – that's impossible. But they also don't say "figure it out" and disappear. They do something more nuanced: they set the identity, values, and tone. They establish SOPs for the things that have to go right every time. And for everything else, they say: here's the goal, here are the boundaries – use your judgment.

That's the model Voiceflow was designed around.

The agent's identity lives in a global prompt – the employee handbook. Below that, playbooks give the agent a goal and room to reason through how to achieve it, adapting based on what the customer actually says. Workflows are the opposite – deterministic SOPs for tasks like refund processing or compliance checks that demand precision every time.

And crucially, these compose together: a workflow can invoke a playbook for flexible reasoning mid-sequence, and a playbook can hand off to a workflow when the conversation enters regulated territory. The agent moves between structure and autonomy depending on the moment – which is exactly what good management enables. You don't choose between trusting your people and holding them accountable. You do both.
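The composition described above can be sketched in a few lines. This is an illustration of the pattern only, not Voiceflow's API: `GLOBAL_PROMPT`, `refund_workflow`, `support_playbook`, `fake_reasoner`, and the 30-day refund window are all hypothetical names and values, and the reasoner is a simple stand-in for an LLM call.

```python
from typing import Callable

# Hypothetical "employee handbook": identity and tone shared by every step.
GLOBAL_PROMPT = "You are a friendly support agent for an example store."


def refund_workflow(ctx: dict) -> str:
    """Deterministic SOP: the same checks in the same order, every time."""
    if not ctx.get("order_id"):
        return "Please provide your order number."
    if ctx.get("days_since_purchase", 0) > 30:  # assumed policy, for illustration
        return "This order is outside the 30-day refund window."
    return f"Refund issued for order {ctx['order_id']}."


def support_playbook(ctx: dict, reason: Callable[[str, dict], str]) -> str:
    """Goal-driven step: the reasoner chooses the next move within set boundaries."""
    decision = reason(GLOBAL_PROMPT, ctx)
    if decision == "refund":
        # Hand off to the strict process once the conversation needs precision.
        return refund_workflow(ctx)
    return "Happy to help. Could you tell me a bit more?"


def fake_reasoner(prompt: str, ctx: dict) -> str:
    """Stand-in for an LLM call: a keyword check instead of real reasoning."""
    return "refund" if "refund" in ctx.get("message", "").lower() else "clarify"


reply = support_playbook(
    {"message": "I need a refund", "order_id": "A1", "days_since_purchase": 3},
    fake_reasoner,
)
```

The point of the split is that the judgment call (what does this customer want?) and the strict process (how is a refund issued?) live in separate pieces, so each can be reviewed and changed on its own.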

But this is all about fostering great performance. How do you improve it?

The performance review just got a whole lot more powerful.

Remember the one undeniable advantage of legacy customer support automation? Auditability. You wrote the scripts, so you knew what the agent would say. With agentic systems, that certainty disappears. Your agent is generating novel responses thousands of times a day. The fears we mentioned earlier – it'll say something wrong, I'll lose control – are really fears about losing visibility.

Voiceflow's observability suite addresses this directly.

It starts with full visibility into what's actually happening. Every conversation captured, every decision the agent made traceable – not just what it said, but why it said it. Which skill it invoked, which knowledge source it referenced, where it exercised judgment versus where it followed a strict process. This is the difference between reading someone's email and understanding their reasoning. Without it, you're guessing.

That visibility feeds into structured evaluations. As the manager, you define what "good" looks like for your business, and the system evaluates every interaction against those criteria automatically.
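As a rough sketch of what "defining good" might look like in practice, the snippet below scores logged conversations against a list of criteria. The `Criterion` class, the two example checks, and the conversation fields (`resolved`, `agent_reply`) are assumptions made for illustration; a real evaluation setup would likely use richer checks than these.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    check: Callable[[dict], bool]  # returns True when the conversation passes


# Hypothetical definition of "good" for one business.
CRITERIA = [
    Criterion("resolved", lambda conv: conv["resolved"]),
    Criterion(
        "no_unsupported_promises",
        lambda conv: "guarantee" not in conv["agent_reply"].lower(),
    ),
]


def evaluate(conversations: list) -> dict:
    """Return the pass rate per criterion across all logged conversations."""
    if not conversations:
        return {c.name: 0.0 for c in CRITERIA}
    total = len(conversations)
    return {
        c.name: sum(1 for conv in conversations if c.check(conv)) / total
        for c in CRITERIA
    }


# Two made-up conversation records, standing in for captured transcripts.
scores = evaluate([
    {"resolved": True, "agent_reply": "We guarantee delivery by Friday."},
    {"resolved": False, "agent_reply": "Let me connect you with a specialist."},
])
```

Because the criteria are plain data, changing what "good" means is a one-line edit rather than a rewrite of the agent.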

Then, staging environments let you test changes safely, and version control lets you roll back when a change doesn't work out.

This creates a compounding feedback loop: observation surfaces insights, insights drive improvements, improvements generate new data that feeds back into observation. Your agent gets better because you're paying attention – and paying attention gets easier because the tools are built for it.

Resource allocation has to shift in favour of management.

The old way takes more upfront time (building all the scripts) but less ongoing time (the tree doesn't change unless you change it).

The new way takes less upfront time (write guidelines instead of scripts) but more ongoing time (continuous monitoring and iterating).

And the teams who succeed aren't the ones with the best technical skills. They're the ones who understand that good AI agents, like good employees, need good managers.

They need clear guidelines, not micromanagement. They need coaching, not scripts. They need someone who operates at a strategic level and prioritizes long-term growth, not someone who is merely fluent in tactical details.

The teams that figure this out won't just have better chatbots. They'll have a fundamentally different kind of customer experience capability – one their competitors can't easily copy, because it's not just about the technology.

It's about how you manage it.

Here's my challenge to you: In your next "one-on-one" with your conversational AI system, ask yourself:

Am I being the manager this agent needs?

Let's work together to level up your AI agent leadership.
