Every article on building an AI customer service team starts with the same move: a list of new job titles. AI Manager, Knowledge Manager, Conversational AI Designer, AI Trainer, AI Agent Owner, Director of ACX. The implicit promise is that if you hire those six people, you’ll have an AI program.
Most of the CX leaders we talk to know that’s not how it works. They have a hiring freeze. They have an org chart they didn’t get to design from scratch. And they have a very specific question: who on my current team is going to do this, and what do they need to learn?
This piece answers that. There are four functions every AI agent program needs covered. None of them require a net-new hire if your CX team is already mid-sized. What they require is that you stop thinking in titles and start thinking in functions — and that you take the maintain side as seriously as the build side.
If you’ve spent any time on Google researching this, the SERP top 10 will look familiar. Ada lists three operating models and a handful of new roles. IBM names AI Owners and AI Champions. Writer says your next headcount is an AI Agent Owner. Zendesk talks about AI teammates. The HBR piece tells you to onboard them like new hires.
They’re not wrong about the work. They’re wrong about the unit of analysis.
The right unit isn’t a job title. It’s a function — a body of work that has to be owned by someone, named or not. We’ve watched dozens of teams build AI agent programs at Voiceflow, and the shape that consistently works is small. Turo runs theirs in a contained corner of their product. StubHub International launched their MVP only inside the “My Account” section. One of our customers handles tens of thousands of conversations a week with a PM and an engineer. That’s it. Two people.
Headcount is rarely the constraint. Clarity of ownership is. A function being unowned is what kills programs. A title being vacant rarely is.
There are four functions. Here’s what each one looks like, who in your existing org is most likely to own it, and what falls apart when no one does.
The first function is agent design. The work is writing the agent’s instructions: its identity, its goals, the procedures it follows for specific tasks, the tone it strikes when a customer is frustrated.
What’s changed since the chatbot era: you’re not writing scripts. Modern agents reason about what to do next based on the situation. So you write playbooks — goal-oriented instructions with room for the agent to adapt — and workflows for the things that have to go right every time. A refund flow is a workflow. A billing dispute conversation is a playbook. The two compose together, and the agent moves between them based on what the customer says.
Voiceflow’s playbook and workflow split is built around exactly this distinction: playbooks for goal-driven reasoning, workflows for deterministic procedures. Other platforms structure it differently, but the underlying split is universal. Whoever owns this function has to understand both modes and pick the right one for each task.
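To make the split concrete, here’s a rough sketch in TypeScript of how the two modes differ as data. The type names and the refund and billing examples are illustrative, not Voiceflow’s (or any platform’s) actual schema.

```typescript
// Illustrative types only; not any platform's actual schema.

// A workflow is a fixed sequence of steps the agent must follow exactly.
interface Workflow {
  kind: "workflow";
  name: string;
  steps: string[];
}

// A playbook is a goal plus guidance the agent adapts to the conversation.
interface Playbook {
  kind: "playbook";
  name: string;
  goal: string;
  guidance: string[];
}

type AgentInstruction = Workflow | Playbook;

const refundFlow: Workflow = {
  kind: "workflow",
  name: "refund",
  steps: [
    "Verify the order ID against the order system",
    "Confirm the purchase is within the return window",
    "Issue the refund and send a confirmation email",
  ],
};

const billingDispute: Playbook = {
  kind: "playbook",
  name: "billing-dispute",
  goal: "Understand the billing concern and resolve or escalate it",
  guidance: [
    "Acknowledge the frustration before asking for account details",
    "Pull up the invoice before proposing a resolution",
    "Escalate to a human if the disputed amount exceeds policy limits",
  ],
};

// The test for choosing a mode: does every step have to happen, in order, every time?
const instructions: AgentInstruction[] = [refundFlow, billingDispute];
for (const task of instructions) {
  console.log(task.name, task.kind === "workflow" ? "deterministic" : "goal-driven");
}
```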
In your existing org, this person is almost always your senior CX strategist, your conversation designer if you have one, or your most experienced support manager. The skills overlap is high. They already know how customers ask things, what edge cases bite, where escalation paths matter. What they need to learn is how to write for an LLM: bottom line up front (BLUF) instead of preamble, specific over abstract, anticipating the messy ways real customers phrase things.
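What writing for an LLM looks like in practice is easiest to show side by side. Both instructions below are hypothetical; the point is the shape: state the goal first, then name the messy phrasings customers actually use.

```typescript
// Hypothetical playbook instruction, written two ways.

// Before: preamble first, the actual goal buried at the end.
const beforeInstruction = `When customers reach out they may have various
concerns about their subscription, and it is important to be considerate
and helpful while eventually determining whether they want to cancel.`;

// After: bottom line up front, specific about real phrasings.
const afterInstruction = `Goal: detect cancellation intent within the first two turns.
Customers rarely say "cancel my subscription". Watch for "stop billing me",
"I don't want this anymore", or "how do I close my account".
If the intent is cancellation, hand off to the cancellation workflow.`;

console.log({ beforeInstruction, afterInstruction });
```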
When this function is unowned, the agent sounds generic. It drifts off-brand. It fumbles edge cases — and no one knows how to fix it because no one owns the prompt. You’ll spot it in the transcripts: every reply reads like the AI’s default voice, not yours.
The second function is knowledge management. The agent only knows what you’ve put in front of it. Pricing, policies, product details, troubleshooting steps, how the warranty works — all of it lives in your knowledge base, and it has to reflect current reality.
Knowledge isn’t static. Policies change quarterly. Products launch monthly. Pricing updates without warning when someone in finance pushes a memo through. The agent is downstream of all of that, and the only way it stays accurate is if a specific person owns the job of keeping it current.
In your existing org, that’s your help center owner, your technical writer, or your knowledge management lead. Sometimes it’s a CX ops person who already triages “the article on X is wrong” tickets. The skill they need to add is writing for retrieval, not for human reading. Long human-friendly walls of text retrieve poorly. Chunk size matters. Headings carry weight. The agent searches for relevant snippets, not the whole document — and that changes how you structure content.
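Mechanically, writing for retrieval often comes down to something like the sketch below: split an article at its headings so the agent can pull back one snippet at a time. The chunking logic and the size cap are assumptions for illustration, not how any particular platform indexes content.

```typescript
// Illustrative chunker: splits a markdown article at its headings so each
// chunk is one retrievable snippet. The 800-character cap is an assumption.

interface Chunk {
  heading: string;
  body: string;
}

function chunkByHeadings(markdown: string, maxChars = 800): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { heading: "Introduction", body: "" };

  for (const line of markdown.split("\n")) {
    if (line.startsWith("#")) {
      // New heading: close the previous chunk and start a fresh one.
      if (current.body.trim()) chunks.push(current);
      current = { heading: line.replace(/^#+\s*/, ""), body: "" };
    } else {
      current.body += line + "\n";
    }
  }
  if (current.body.trim()) chunks.push(current);

  // Anything still longer than the cap needs to be split again, not truncated.
  for (const c of chunks) {
    if (c.body.length > maxChars) {
      console.warn(`"${c.heading}" is ${c.body.length} chars; split it further.`);
    }
  }
  return chunks;
}

const article = `# Return policy
Returns are accepted within 60 days of purchase.

# Warranty
Hardware carries a one-year limited warranty.`;

console.log(chunkByHeadings(article));
```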
Here’s the honest part: this is the function most often “owned by everyone and therefore no one.” Pin it to a person before launch. Otherwise you’ll get the most painful failure mode in customer service: the agent confidently telling customers something that used to be true. “The return policy is 30 days” — except it changed to 60 last quarter, and now you’re refunding the gap on principle. Trust collapses fast when an AI is fluently wrong.
The third function is ops and observability. It’s the function most teams skip. It’s also the single most common reason AI agent programs degrade after launch.
The work is reading transcripts every week. Running structured evaluations against the criteria you defined for what “good” looks like. Tracking the metrics that matter — resolution rate, escalation rate, latency, CSAT — and turning what you see into prompt updates and KB additions and the occasional “we need a new playbook for this.”
Modern agents behave probabilistically. You can’t audit them by reading the script you wrote, because you didn’t write a script. You wrote guidelines. The agent generates its own paths through them, thousands of times a day. The only way to know what’s happening is to watch.
In your existing org, this is your support ops lead, your QA manager, or a senior support manager who already cares about transcript review. They’re already wired to look at conversations and ask “why did this go sideways?” The new skill is reading agent traces, not just transcripts — understanding why the agent did what it did. Which knowledge source it referenced, which tool it called, where it exercised judgment versus where it followed a fixed procedure. This is what observability and evaluations are for: every conversation captured, every decision traceable, structured evals running automatically against the bar you set.
The Monday morning version: 30 minutes of transcript spot-checking, an eval run on the previous week’s conversations, a list of three prompts to update by Friday. Repeat every week. Forever.
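If your transcripts export as structured records, most of that pass can be scripted. A sketch, with field names and thresholds that are assumptions rather than anyone’s real schema:

```typescript
// Illustrative weekly report over exported transcripts. Field names,
// thresholds, and the CSAT scale are assumptions.

interface Transcript {
  id: string;
  resolved: boolean;
  escalated: boolean;
  csat?: number; // 1-5, when the customer answered the survey
}

interface WeeklyReport {
  resolutionRate: number;
  escalationRate: number;
  avgCsat: number;
  flaggedForReview: string[]; // the spot-check queue for Monday morning
}

function weeklyEval(transcripts: Transcript[]): WeeklyReport {
  const total = Math.max(transcripts.length, 1);
  const rated = transcripts.filter((t) => t.csat !== undefined);

  return {
    resolutionRate: transcripts.filter((t) => t.resolved).length / total,
    escalationRate: transcripts.filter((t) => t.escalated).length / total,
    avgCsat:
      rated.reduce((sum, t) => sum + (t.csat ?? 0), 0) / Math.max(rated.length, 1),
    // Unresolved conversations with a low survey score go to the top of the pile.
    flaggedForReview: transcripts
      .filter((t) => !t.resolved && t.csat !== undefined && t.csat <= 2)
      .map((t) => t.id),
  };
}
```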
When this function is unowned, the agent silently degrades. Knowledge goes stale. Edge cases pile up. Escalations creep up by 2% a month and no one notices until someone in another org complains. By the time customers are loud about it, you’re months behind.
The fourth function is engineering. The work is connecting the agent to the systems it needs — Salesforce, Zendesk, your order management, your auth provider, whatever ticketing tool you’re standardized on. Then maintaining those connections. Then deploying changes safely through dev, staging, and production without breaking production at 4pm on a Friday.
Modern agent platforms abstract a lot of the AI-specific engineering. Your engineer doesn’t need to fine-tune a model or stand up a vector database. They need to integrate with the systems your CX team already uses, write the occasional function tool when an integration needs custom logic, and own the deployment pipeline so changes don’t go straight to production untested.
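A typical function tool is a thin wrapper around an internal API, written so the agent gets a usable answer even when the upstream call fails. The endpoint, auth, and response shape below are made up for this sketch.

```typescript
// Illustrative function tool: the endpoint and response shape are invented.

interface OrderStatus {
  orderId: string;
  status: "processing" | "shipped" | "delivered" | "unknown";
  eta?: string;
}

async function lookupOrderStatus(orderId: string, apiToken: string): Promise<OrderStatus> {
  const res = await fetch(`https://orders.internal.example.com/v1/orders/${orderId}`, {
    headers: { Authorization: `Bearer ${apiToken}` },
  });

  if (!res.ok) {
    // Fail soft: give the agent something it can say, not a stack trace.
    return { orderId, status: "unknown" };
  }

  const data = await res.json();
  return { orderId, status: data.status, eta: data.estimatedDelivery };
}
```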
In your existing org, this is a platform engineer, your IT integrations lead, or a backend developer who already owns CX system integrations. They probably already know Zendesk and Salesforce APIs better than anyone. The skill they need to add is the agent platform’s primitives — tools, function steps, environments — which is closer to learning a new SaaS product than to learning machine learning.
Voiceflow’s environments (separate dev, staging, production) exist precisely so this person can ship safely without becoming a release-day bottleneck. You don’t want every prompt change to require an engineer; you want engineers to own the deploy mechanism so the design and ops people can ship their work through it.
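The deploy mechanism itself can be small. A sketch of the gate, where runEvalSuite and promoteToProduction are hypothetical stand-ins for whatever your platform exposes:

```typescript
// Illustrative gate: runEvalSuite and promoteToProduction are placeholders,
// not any platform's real API.

type EvalResult = { passRate: number };

async function promoteIfHealthy(
  runEvalSuite: (env: "staging") => Promise<EvalResult>,
  promoteToProduction: () => Promise<void>,
  minPassRate = 0.95
): Promise<void> {
  const { passRate } = await runEvalSuite("staging");

  if (passRate < minPassRate) {
    throw new Error(
      `Staging pass rate ${passRate} is below ${minPassRate}; not promoting.`
    );
  }
  await promoteToProduction();
}
```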
When this function is unowned, integrations break in silence. Deploys go straight to production with no staging check. No one can debug a tool failure when one happens. The agent looks broken to customers; the team can’t tell whether it’s the agent’s reasoning or the CRM that’s down.
Here’s the part that surprises CXOs most: building takes weeks; maintaining takes forever.
A reasonable first-year split is 30% build, 70% maintain. The build share shrinks from there.
Most platform pitches are framed around the build phase — “look how fast you can launch your first agent!” — which is true and also misleading. The work that determines whether the program succeeds happens after launch. More knowledge to curate as the business evolves. More transcripts to review as volume grows. More integrations to maintain as upstream systems change. More evals to keep current as the agent’s scope expands.
What does not grow as fast: design effort. Once the playbook structure is right, expanding into a new use case is much cheaper than building the first one. Sanlam Studios built and threw away four or five agents before landing on something that worked. That’s not failure — that’s what learning looks like. Once they had the structure, additions came fast.
By month 18, your ops and observability function is probably 60–70% of the team’s time, even though it was 30% at launch. Plan for that. The CXOs who treat the launch as the finish line end up with a degraded agent and a team that quietly stopped owning it. The ones who treat it as the starting line end up with a capability that compounds.
The smallest viable shape is three people: one owner for agent design and prioritization, one owner for knowledge and ops, and one engineer for integrations and deployments.
That’s a real team. You can run a meaningful program with it. The customer mentioned earlier — tens of thousands of conversations a week with a PM and an engineer — is the proof that even three is sometimes more than necessary if your scope is contained and the two people are senior.
Mid-size shape, when scope grows or you cross a few use cases: five people. Split conversation design from prioritization (separate CxD). Split knowledge from ops (separate KB owner).
Enterprise shape, when you’re running multiple agents across product lines: 8–12 people. Multiple ops owners by domain. A dedicated KB team. Two engineers — one for integrations, one for platform and security.
Notice what’s missing from all three shapes: an “AI Manager” with that title, an “AI Agent Owner” with that title, a “Conversational AI Designer” with that title. Those titles can exist if you find them useful, but they’re not what makes the team work. Coverage of the four functions is what makes the team work.
The team you need is mostly the team you have. But only if you take ownership seriously, and only if you take maintenance as seriously as you take launch.
If your pilot is in flight and you’re trying to get it across the line, our guide on moving an AI CX pilot into production covers the 90-day version of this conversation. If you want to see how playbooks, workflows, observability, and environments fit together in practice — the actual product surface the four functions work against — book a Voiceflow walkthrough.