Iteration Patterns for Improving Assistants

Zoran Slamkov
November 4, 2022


A common request we make of teams is to walk us through their deployment workflow when updating a production assistant. After extensive feedback, the only consistent theme across these teams is how inconsistent the answers are, exposing a genuine lack of automation and testing between design time and runtime.

The absence of automation and testing in the development pipeline has many downstream effects for businesses looking to improve their assistants, namely:

  • Major reductions in the scope of what can be deployed on a regular cadence,
  • Risk-averse teams putting their assistants into “maintenance mode”,
  • Limited or no opportunity to deploy an MVP with quick, successive iterations, and
  • Reporting and insights becoming stale before they can be actioned.

These delivery challenges are ubiquitous across teams and industries, notably even among those employing a wide variety of technologies, all with mixed results.

Typical DevOps and CI/CD pipeline workflows fail to translate well to the deployment of virtual assistants for primarily two reasons:

  1. The diversity, in both role and technical aptitude, of the stakeholders who contribute to an assistant’s success, and
  2. The variety of tooling required to produce the best end product.

Many enterprises, in an effort to curb the problems outlined above, opt to invest more in servicing technical debt at the expense of new flows and use cases.

Admittedly, we rarely come across a stakeholder in a mature conversational AI team who doesn’t lament the persistent challenges expressed above: from designers who don’t see their visions come to fruition, to developers who are constantly paying down technical debt or hardcoding response content instead of working on new integrations and automation tools.

Below we detail the modern iteration patterns we’ve seen improve or, in their absence, hinder teams deploying complex assistants into production today. The goal is to minimize the time it takes to get a new version of your assistant into the hands of customers, and to automate the entire delivery system.

Accessible and queryable transcripts

The accessibility and usefulness of conversation transcripts is likely the most important aspect of CAI development that many businesses still struggle with today. We heard from one of our large financial services customers:

Transcript data is extremely important to making good design decisions and improvements. I’d venture that from a design perspective, transcripts are the single most informative type of data.

Some benefits exposed through accessible and queryable transcripts include:

  • Capture utterances with no intent match to improve your NLU model;
  • Capture drop-off in Topics to know where to focus conversational improvements;
  • Track goals (resolutions, meetings booked, etc.) and let transcripts inform areas for optimization.
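As an illustration of the “queryable” part, here is a minimal sketch in Python, assuming a hypothetical transcript store with one row per turn (session, turn, utterance, matched intent). An in-memory SQLite table stands in for the real store:

```python
import sqlite3

# In-memory example; a real pipeline would point at your transcript store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transcripts (
        session_id TEXT, turn INTEGER, utterance TEXT, matched_intent TEXT
    )
""")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?, ?, ?)",
    [
        ("s1", 1, "book a meeting", "schedule_meeting"),
        ("s1", 2, "actually cancel everything", None),  # no intent match
        ("s2", 1, "talk to a human please", None),      # no intent match
    ],
)

# Surface utterances with no intent match, most frequent first, as
# candidates for new intents or additional training phrases.
no_match = conn.execute("""
    SELECT utterance, COUNT(*) AS hits
    FROM transcripts
    WHERE matched_intent IS NULL
    GROUP BY utterance
    ORDER BY hits DESC
""").fetchall()

for utterance, hits in no_match:
    print(f"{hits:>3}  {utterance}")
```

The same table supports the other two bullets: drop-off is the last turn per session, and goal tracking is a join against whatever goal events your assistant logs.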

While the importance of transcript data cannot be overstated, we’ve found it is still notoriously difficult to access and, once received, often “stale” because it comes from an assistant many versions behind the current one. Additionally, in many organizations, the data is provided in a format ill-suited to quantitative analysis at scale, requiring manual inspection of every individual conversation to extract findings and map them to others with similar outcomes.

Building a transcript pipeline and incorporating it into the flow of conversation design, when done correctly, can yield significant results. See below what one of our customers was able to achieve:

We have a summary dashboard that will give us a sense of where in the journey users are running into problems (e.g., where are they dropping off, where are there the most no-match events, no-input events, etc., which sessions are re-contacts), then we can use that data to pull a targeted list of transcripts to read, analyze, and come up with recommendations for how the bot design (or the NLU or other backend integrations) needs to change to avoid the problem continuing.

Finally, automating the ingestion and classification of transcripts means businesses can iterate and deploy with confidence, knowing they will quickly get data back to inform where and how to improve their assistant. We heard from a customer that “sometimes we need to put out a flow at 80% and see how customers are going to interact with it, but we don't always have the access to that information,” leading to the next version being based not on data but on small samples of qualitative testing from users who were willing and available to provide feedback.
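The classification step can be sketched as follows. The event names (`goal_reached`, `no_match`) and outcome labels here are illustrative assumptions, not a standard schema:

```python
from collections import Counter

def classify_session(events: list) -> str:
    """Assign a coarse outcome label to one session's event stream."""
    if "goal_reached" in events:
        return "resolved"
    if events and events[-1] == "no_match":
        return "dropped_on_no_match"
    return "abandoned"

# Sample ingested sessions; a real pipeline would stream these from logs.
sessions = {
    "s1": ["greeting", "intent_match", "goal_reached"],
    "s2": ["greeting", "no_match"],
    "s3": ["greeting", "intent_match"],
}

# Aggregate outcomes so every deploy produces a fresh summary, rather
# than waiting on manual inspection of individual conversations.
outcomes = Counter(classify_session(ev) for ev in sessions.values())
print(outcomes)
```

Even a rule-based classifier like this turns raw transcripts into the kind of summary dashboard the customer above describes; the rules can later be replaced with a model without changing the pipeline shape.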

Splitting functional components from response data

Separating functional data from conversation data means more of the assistant build can happen concurrently, keeping dependencies and monolithic designs out of your deployment pipelines. Functional components, also known as function steps, are responsible for performing work that is invisible to the user but extends what your assistant is capable of. For example, a functional component might integrate with a third-party service, or simply generate a random number. Response data is the content the assistant says and the paths the conversation can take based on the user’s replies.

When functional and conversation data are blended in a single resource, problems emerge: developers are left with a monolith that can be unstable and difficult to test. Below is a quote from a conversation designer at a major retailer, describing the handoff of a new assistant design to development:

I did everything, and it wasn't until the end that I got told by development that we can't do it because it’s too big and too out of scope. Had we been doing that concurrently, we could have had those integrations set up already.

By separating functional and conversation data, functional code can be tested and versioned independently of the rest of the dialog. Ultimately, this means greater speed and flexibility when adding features for conversation designers to use in new assistants.
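A minimal sketch of this separation, with illustrative names throughout: the function step lives as versioned, testable code, while the response content is plain data a designer can edit without touching the code:

```python
import random

# Function step: invisible to the user, tested and versioned on its own.
def generate_ticket_number() -> int:
    return random.randint(1000, 9999)

# Registry of function steps the dialog can invoke by name.
FUNCTION_STEPS = {"generate_ticket_number": generate_ticket_number}

# Response data: what the assistant says. In practice this would live in
# a separate file or CMS owned by the conversation designers.
RESPONSES = {
    "ticket_created": "Your ticket number is {ticket}. An agent will follow up soon.",
}

def run_flow() -> str:
    """One dialog turn: run the function step, then render the response."""
    ticket = FUNCTION_STEPS["generate_ticket_number"]()
    return RESPONSES["ticket_created"].format(ticket=ticket)

message = run_flow()
print(message)
```

Because the two halves meet only at named keys, either side can ship independently: developers can swap the random number for a real ticketing integration while designers rewrite the copy, with neither change blocking the other.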

Automated regression testing

There are still too few tools on the market today that support or use regression testing for assistants. As complexity increases with dynamic, multi-turn assistants, testing every scenario requires an extensive library of scripts with expected responses to validate that everything performs to expectation.

The effort to create regression tests is also a burden, as any addition to a path in the design can impact every script developed to date. This leads many organizations today to rely on manual, human-led tests to qualitatively approve that the assistant “seems” to work as expected. However, since they do not have time to run through every scenario, it’s hard to exit the testing phase with certainty that the new version contains no regressions. Worse, some teams opt to keep their assistant simple, with only single-turn flows, so that their tests can easily cover all scenarios.

Modern CAI teams are learning that leveraging the interplay between the NLU model, dialog design, and personas is the only way to automate script-building and testing. One customer told us:

We’ve developed a system where we can determine which intents are available to be triggered at any viable state in the conversation, and because we know all the utterances that can trigger an intent, we can programmatically generate every variation of the test script. It takes a bit of time to run but nowhere near the time it would take to do this manually, and we have peace of mind knowing that what we are pushing to production will work.
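The customer’s approach can be sketched roughly as follows. The states, intents, and utterances here are hypothetical; a real system would derive them from the NLU model and dialog design rather than hardcoding them:

```python
# Which intents are viable at each conversation state (from dialog design).
INTENTS_BY_STATE = {
    "start": ["check_balance", "book_meeting"],
    "confirm": ["yes", "no"],
}

# Which utterances trigger each intent (from the NLU training data).
UTTERANCES = {
    "check_balance": ["what's my balance", "show balance"],
    "book_meeting": ["book a meeting"],
    "yes": ["yes", "yep"],
    "no": ["no"],
}

def generate_scripts():
    """Yield (state, intent, utterance) triples covering every variation."""
    for state, intents in INTENTS_BY_STATE.items():
        for intent in intents:
            for utterance in UTTERANCES[intent]:
                yield (state, intent, utterance)

scripts = list(generate_scripts())
print(f"{len(scripts)} test cases generated")
```

Each generated triple becomes one regression test: send the utterance while the assistant is in that state, and assert the expected intent (and response) fires. Crucially, when the design changes, the scripts regenerate themselves instead of being rewritten by hand.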


For modern CAI teams, clean and frictionless automation patterns lead to dramatic benefits in team collaboration and assistant performance. Following and developing the patterns listed above will accelerate development and empower designers to iterate faster, use better data, and track better outcomes.

As you strategize your CAI team’s workflow, our team is always here to chat.