How To Evaluate Production AI Agents (NEW UPDATE) -- Voiceflow

August 4, 2025

Voiceflow just dropped a game-changing update for evaluating production AI agents, introducing new Transcripts and Evaluations features directly on-platform. This video dives deep into how to use these tools to get granular performance data without relying on external platforms. You'll learn to navigate the new Transcript view for full debug logs and access call recordings for voice agents. A key highlight is creating, running, and batch-running AI Evaluations (Pass/Fail, Text, Rating) to assess agent performance, with practical tips on prompt crafting and understanding the current re-run limitations. This update, which also includes new API endpoints, is essential for Voiceflow builders aiming to prove agent value and streamline their management workflow.

🤖 Get 1000 bonus credits with Voiceflow! Exclusively for Umbral: design voice or chat agents in minutes and save $100+
https://www.voiceflow.com/partners/conner-burton

⚫ Sign up for N8N cloud and start automating workflows! ⚫ (affiliate link)
https://n8n.partnerlinks.io/adblhu2pfwu3

Voiceflow just dropped a massive update for anyone managing production AI agents. In this deep dive, we'll go beyond the release notes and explore the powerful new Transcripts and Evaluations features.

If you've been extracting transcripts to other platforms like n8n or Make to analyze them, this update changes everything. We'll cover how to use the new on-platform tools to get granular data on your agent's performance, from booking rates to customer sentiment.

Looking to implement AI Chatbots, Agents or Automations? Feel free to book a discovery call:
https://cal.com/umbral/discovery-call

Umbral Website:
https://umbral.ai

In this video, you'll learn:
- The new Transcript view and how to access full debug logs.
- How to listen to and download call recordings for voice agents (Twilio).
- How to create, run, and batch-run AI Evaluations (Pass/Fail, Text, Rating).
- Common pitfalls, like prompt crafting and the current re-run limitations.
- A walkthrough of the new Transcripts & Evaluations API endpoints.

This is a must-watch for any serious Voiceflow builder looking to prove the value of their agents and streamline their management workflow.

--- Chapters ---
00:00 - Voiceflow's Game-Changing Update
01:44 - Exploring the New Transcript View
03:41 - HUGE for Voice Agents: Call Recordings
05:09 - Finding the Hidden Debug Logs
06:55 - What Are AI Agent Evaluations?
09:29 - The New Analytics Dashboard
11:23 - How to Create a New Evaluation
14:33 - How to Batch Run Evaluations
17:08 - CRITICAL: The Re-Run Limitation
19:47 - Deep Dive: New API Endpoints
22:48 - Why This Matters for Builders
24:15 - Final Thoughts
