The text-to-speech (TTS) market is projected at roughly $5.8 billion in 2026, on a 22%+ CAGR through 2030. Resemble AI is one of the platforms riding that curve, and it has changed shape materially in the last 18 months: consumer subscriptions gone, pay-per-use pricing instead, deepfake detection bolted on, and a customer roster that now reads Netflix, Paramount, Deutsche Telekom, and the World Bank.
This is a 2026 review of what Resemble AI actually does, what it costs under the new Flex model, where it sits against ElevenLabs and the rest of the voice-cloning field, and where a voice-agent platform like Voiceflow fits in the stack underneath your cloned voice.
Resemble AI is a voice cloning and synthesis platform. The core product takes a sample of human speech and creates a synthetic voice clone that can read any text you give it in the cloned voice. The 2024-2025 product expansion added real-time / streaming TTS (around 75ms latency on Chatterbox Turbo), a deepfake detection product called Detect, and watermarking through Verify.
Resemble's positioning in 2026 leans firmly enterprise. The current homepage logo strip includes Netflix (the Andy Warhol Diaries voice work won an Emmy / Webby nomination), Paramount (Ghostface Is Calling promo), Deutsche Telekom (T Challenge 2025 winner), Telnyx, and the World Bank. The company closed a $13M Series B on December 8, 2025, led by Sony and Okta Ventures, bringing total raised to $25M.
Use cases the product actively targets: voice-enabled IVR, character voices for media production, audiobook generation, multilingual dubbing, automated call center voices, and synthetic data for ML training. Voice cloning is the spine, but the platform increasingly markets itself as "voice infrastructure" rather than a creator tool.
Resemble retired its consumer subscription tiers (Creator $29/mo, $1 first month) sometime in 2025. The 2026 pricing model is pay-per-use, organized as Flex (consumption-based) and Enterprise (custom). Here's the current breakdown:

Flex: $0 to start. Credits never expire. Full API access from day one, including voice cloning and Detect.
Enterprise: Quote-based. Up to 80% volume discount, SOC 2 Type 2, SSO/SAML, on-prem deployment option, dedicated CSM. Sales-led; the deal floor is generally in the high four figures of monthly committed spend.
Worked example: an IVR generating 5,000 minutes of TTS per month (300,000 seconds) on the Flex plan costs $150/month for TTS alone, plus voice-cloning add-ons and team seats. A media-production shop generating 200 minutes per week of cloned character audio (~50,000 seconds/month) lands closer to $25/month plus the voice-clone add-on. Pay-per-use rewards bursty creator workloads and punishes always-on production traffic; that's where Enterprise volume discounts come in.
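The arithmetic in the worked example can be reproduced with a short script. This is a minimal sketch, not an official calculator: the function name `flex_monthly_cost` and its parameters are made up for illustration, the rates are the Flex figures quoted in this review ($0.0005 per second of TTS, $20 per seat per month, $2-5 per cloned voice per month), and the default voice fee assumes the top of that range.

```python
TTS_RATE = 0.0005   # USD per second of TTS (Flex rate quoted in this review)
SEAT_FEE = 20.0     # USD per user per month (Team Seats add-on)


def flex_monthly_cost(tts_minutes, seats=0, voices=0, voice_fee=5.0):
    """Estimate a monthly Flex bill in USD.

    voice_fee is an assumption: the review quotes $2-5 per voice per month,
    and this sketch defaults to the upper end.
    """
    tts_cost = tts_minutes * 60 * TTS_RATE
    addons = seats * SEAT_FEE + voices * voice_fee
    return round(tts_cost + addons, 2)


# IVR scenario: 5,000 minutes of TTS per month, TTS only.
print("IVR, TTS only:", flex_monthly_cost(5_000))

# Same IVR with two team seats and one cloned voice.
print("IVR with add-ons:", flex_monthly_cost(5_000, seats=2, voices=1))

# Media shop: 200 minutes per week, converted at 52 weeks / 12 months.
media_minutes = 200 * 52 / 12
print("Media shop, TTS only:", flex_monthly_cost(media_minutes))
```

Run against the two scenarios, the IVR case comes to $150 for TTS alone. The media shop comes to about $26 when 200 minutes per week is converted at 52 weeks per year, in line with the rounder "closer to $25" figure.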
The pricing trap to watch: deepfake detection is roughly 80× the cost of TTS ($0.04/sec for audio Detect vs $0.0005/sec for TTS). If you're running Detect on every inbound call in a contact-center setting, your detection bill can outpace your TTS bill within a single billing cycle.
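To see how quickly that happens, it helps to compute the break-even point. The sketch below uses only the two rates quoted above; the function names are illustrative, and it assumes Detect is billed on every second of scanned audio with no volume discount.

```python
TTS_RATE = 0.0005          # USD per second of TTS (Flex)
DETECT_AUDIO_RATE = 0.04   # USD per second of audio Detect (Flex)


def cost_ratio():
    """How many times more expensive one second of Detect is than one second of TTS."""
    return DETECT_AUDIO_RATE / TTS_RATE


def detect_breakeven_seconds(tts_seconds):
    """Seconds of scanned audio at which the Detect bill equals the TTS bill."""
    return tts_seconds * TTS_RATE / DETECT_AUDIO_RATE


tts_seconds = 300_000  # the 5,000-minute IVR workload
breakeven = detect_breakeven_seconds(tts_seconds)
print(f"Detect costs about {cost_ratio():.0f}x TTS per second")
print(f"Detect bill matches the TTS bill after {breakeven:,.0f} s "
      f"({breakeven / 60:.1f} min) of scanned audio")
```

For the 300,000-second IVR workload, the Detect bill matches the $150 TTS bill after roughly 3,750 seconds of scanned audio, or about 62.5 minutes of inbound calls.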
The product surface beyond voice cloning is where Resemble has spent most of its 2024-2026 engineering effort.
Plus a smaller set of incremental shipped items: longer audio context windows, improved emotional control parameters (joy / sadness / anger / fear at finer granularity), and tighter API rate limits on Flex for fraud-prevention reasons.
The voice-cloning workflow changed completely in 2025. The old "record 25 sentences in the browser, then 100 more for high quality" flow is gone. The new flow is upload-driven and offers two tiers:
Workflow (Professional Clone):
Resemble's consent and safety stance: voice cloning of someone else's voice without their permission is against the Terms of Service, and the platform asks for confirmation-of-consent during clone creation. Detect + Verify exist partly to enforce this. Every Resemble-generated voice is watermarked and detectable by their own detection model.
The voice-cloning and TTS market in 2026 has consolidated around four serious platforms plus a long tail of single-feature tools. Here's how the field stacks up:

The dominant TTS competitor. ElevenLabs bundles voice cloning, expressive TTS, and a developer ecosystem under flat subscription tiers (Starter $5/mo, Creator $22/mo, Pro $99/mo, Scale $330/mo, Business $1,320/mo). Voice cloning works on Instant (60 seconds of audio) or Professional (30+ minutes). Better for predictable monthly costs; worse for pure pay-per-use bursty workloads.
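One rough way to compare the two pricing models is to ask how much Resemble Flex TTS the same monthly spend would buy. The sketch below is only a budget conversion under stated assumptions: it uses the ElevenLabs tier prices and the $0.0005 per second Flex rate quoted in this review, the helper name `flex_minutes_for_budget` is hypothetical, and it ignores add-ons and discounts on both sides.

```python
RESEMBLE_TTS_RATE = 0.0005  # USD per second of TTS (Flex rate quoted in this review)

# ElevenLabs monthly subscription prices as listed in this review (USD).
ELEVENLABS_TIERS = {
    "Starter": 5,
    "Creator": 22,
    "Pro": 99,
    "Scale": 330,
    "Business": 1320,
}


def flex_minutes_for_budget(usd):
    """Minutes of Flex TTS that a given monthly budget would cover."""
    return usd / RESEMBLE_TTS_RATE / 60


for tier, price in ELEVENLABS_TIERS.items():
    minutes = flex_minutes_for_budget(price)
    print(f"{tier:<9} ${price:>5}/mo is about {minutes:,.0f} min of Flex TTS")
```

This says nothing about how much audio each ElevenLabs tier includes, which this review does not list. It only converts each subscription price into an equivalent pay-per-use budget, which is a useful sanity check when a workload is idle in some months and busy in others.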
The honest read on Resemble vs ElevenLabs in 2026: ElevenLabs wins on developer mindshare and ecosystem breadth; Resemble wins on the integrated Detect+Verify deepfake stack and on enterprise compliance posture (HIPAA, GDPR, SOC 2, on-prem). For a buyer evaluating both: pick ElevenLabs for predictable creator/SaaS-product use cases; pick Resemble if your security and compliance team is in the buying loop.
For developers who want a builder-side reference, see "Build a custom voice AI agent with ElevenLabs API." The same pattern works against Resemble's API.
Tools like Kits.AI (sometimes listed in older comparisons) are music/stems-focused and not direct TTS competitors, so they belong on a different shortlist.
Resemble AI generates voice. Voiceflow runs voice agents. They aren't competing products; they sit at different layers of a voice-AI stack.
A cloned voice is one component. A voice agent is the whole conversational system that uses the cloned voice to actually talk to customers. It handles real-time turn-taking, routes to a voice chatbot flow, queries a knowledge base, transfers to a human if needed, and stays in character across thousands of calls.
Voiceflow is the agent platform. Five things it brings that a voice cloning tool can't, on its own:
call_forward for live-agent escalation, dtmf for IVR menus and PIN capture, barge-in, no-reply timeouts, multi-provider automatic speech recognition (Deepgram, Google) and TTS (ElevenLabs, Google, Amazon Polly).
The honest framing: Resemble AI is not currently a native TTS provider in Voiceflow's voice channel (the natives are ElevenLabs, Google, and Amazon Polly). If you want to use a Resemble-cloned voice inside a Voiceflow agent, you'd integrate via a custom API call, similar to how teams integrate non-native LLM providers like Mistral.
Downstream, voice-agent platforms power AI phone agents, AI call center deployments, virtual receptionist services, and AI call center agent automations. Cloned voice is the surface layer; the agent platform is what turns it into a usable product. If you want a fast builder-side starting point, try standing up a voicebot on Voiceflow and plugging in your TTS provider of choice.
Resemble AI uses a pay-per-use Flex plan: $0 to start, then $0.0005 per second for TTS, $0.001 per second for voice agents, and $0.04 per second for deepfake detection. Add-ons include Team Seats ($20/mo per user) and voice-clone fees ($2-5/mo per voice). Enterprise is custom-quoted with up to 80% volume discount.
There's no permanent free tier, but the Flex plan starts at $0 with no minimum commitment. You pay only when you generate audio or run detection, and credits never expire. That's effectively a "free to start" model rather than a free tier.
Rapid Clone takes about a minute and needs 10 seconds of audio. Professional Clone takes around 40 minutes of training time and needs 10-25 minutes of varied speech for full emotional range.
It depends on your use case. ElevenLabs has stronger developer mindshare and a flat subscription model that's easier to budget. Resemble wins on the integrated deepfake-detection stack (Detect + Verify), on enterprise compliance (HIPAA, GDPR, SOC 2, on-prem), and on pay-per-use pricing for bursty workloads.
Yes. Detect is Resemble's deepfake-detection model, claiming 98.1% accuracy on the ASVspoof 2021 benchmark. It's available as an API and Chrome extension, billed at $0.04/sec for audio and $0.07/sec for video.
Cloning someone else's voice without their permission violates Resemble's Terms of Service. The platform requires confirmation-of-consent at clone creation. Every Resemble-generated voice is watermarked through Verify and detectable by Detect, which is part of how the platform enforces its consent policy and complies with the ELVIS Act, EU AI Act, and similar regulations.
