Voice assistants are increasingly popular because they give users an efficient, intuitive way to interact with applications. With the advent of large language models (LLMs) such as OpenAI’s GPT series, they can also understand and respond to longer, more complex user inputs.
This quick project, the Voiceflow ASR Demo, uses OpenAI’s Whisper model for automatic speech recognition (ASR) without an external API. Because the ASR service runs in a Docker container, you can host it locally or on your own server, which makes the setup easier to customize.
What’s the idea?
As users interact with LLM-powered voice assistants, they tend to provide longer and more complex utterances. This is beneficial because it gives the assistant more context to generate better answers. The idea then is to use the Whisper model for ASR without relying on an external API, offering you more control and customization options while keeping your data in-house.
What is the Voiceflow ASR Demo?
The Voiceflow ASR Demo is a test page that demonstrates ASR capabilities using OpenAI’s Whisper model. The project consists of a simple webpage that captures audio from the user’s microphone, sends it to your custom endpoint, and displays the transcribed text along with the time the transcription took. Its main features:
- Start and stop recording with a button
- Auto-end recording after a specified duration of silence
- Run the ASR webservice locally in a Docker container
- Avoid CORS issues by routing requests through a proxy
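To illustrate the auto-end feature, here is a minimal sketch of how silence detection could work. It is not taken from the demo's source: the function name `findAutoStopFrame`, the per-frame volume readings, and the `threshold` value are all assumptions made for illustration. Only the 2-second default comes from the demo itself.

```typescript
// Hypothetical helper, not the demo's actual implementation.
// Returns the index of the frame at which recording should stop,
// or -1 if the required silence was never reached.
function findAutoStopFrame(
  levels: number[],          // one volume reading per frame, in the range 0..1
  frameMs: number,           // duration of each frame in milliseconds
  silenceMs: number = 2000,  // required silence (2 seconds, as in the demo)
  threshold: number = 0.01   // assumed level below which a frame counts as silent
): number {
  let silentMs = 0;
  for (let i = 0; i < levels.length; i++) {
    if (levels[i] < threshold) {
      silentMs += frameMs;
      if (silentMs >= silenceMs) {
        return i;
      }
    } else {
      // Any speech resets the silence timer.
      silentMs = 0;
    }
  }
  return -1;
}
```

In a browser, the volume readings would come from the microphone stream, for example through the Web Audio API. The right threshold depends on the microphone and on background noise, so treat the value above as a placeholder.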
Setting up the Voiceflow ASR Demo
To get started, you'll need Node.js and Docker installed on your machine. Follow these steps to set up the demo:
- Clone the repository: git clone https://github.com/voiceflow-gallagan/whisper-asr-demo.git
- Change to the project directory: cd whisper-asr-demo
- Install the required dependencies: npm install
- Pull and run the Docker container for the ASR webservice: docker run -d -p 9000:9000 -e ASR_MODEL=base.en onerahmet/openai-whisper-asr-webservice:latest
- Start the proxy server: npm start
Now, the proxy server should be running at http://localhost:3000. Open the index.html file in your browser to test the ASR demo.
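As a rough sketch of what the proxy has to do, the helpers below build the upstream URL for the ASR container and the CORS headers the browser needs to see. They rest on assumptions: the `/asr` path and the `task`, `language`, and `output` query parameters reflect my understanding of the whisper-asr-webservice API and should be checked against its documentation, and `buildAsrUrl` and `corsHeaders` are hypothetical names that may not match the demo's proxy code.

```typescript
// Assumed query parameters for the whisper-asr-webservice endpoint.
interface AsrOptions {
  task?: "transcribe" | "translate";
  language?: string;
  output?: string;
}

// Hypothetical helper: builds the URL the proxy would forward audio to.
function buildAsrUrl(baseUrl: string, options: AsrOptions = {}): string {
  const params: [string, string][] = [["task", options.task ?? "transcribe"]];
  if (options.language) {
    params.push(["language", options.language]);
  }
  params.push(["output", options.output ?? "json"]);
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  // Strip trailing slashes so the path is not doubled.
  return `${baseUrl.replace(/\/+$/, "")}/asr?${query}`;
}

// Hypothetical helper: headers a proxy can add so the browser accepts the response.
function corsHeaders(origin: string = "*"): Record<string, string> {
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
  };
}
```

With the Docker command above, the base URL would be `http://localhost:9000`. In production you would likely replace the `*` origin with the specific origin that serves your page.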
Using the Voiceflow ASR Demo
- Click the “Start Recording” button to start capturing audio from your microphone.
- Speak into your microphone.
- The recording stops automatically after a specified duration of silence (2 seconds by default), or you can stop it manually by clicking the “Stop Recording” button.
- The transcribed text and the time it took to render the transcription will be displayed on the page.
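The timing shown on the page can be measured by wrapping the transcription call, as in the sketch below. This is an illustration rather than the demo's own code: `timeTranscription` is a hypothetical name, and the `transcribe` callback stands in for whatever request your page sends to the proxy.

```typescript
// Hypothetical wrapper: runs a transcription call and reports how long it took.
async function timeTranscription(
  transcribe: () => Promise<string>
): Promise<{ text: string; elapsedMs: number }> {
  const start = Date.now();
  const text = await transcribe();
  // Elapsed time covers the upload, the model inference, and the response.
  return { text, elapsedMs: Date.now() - start };
}
```

Because the measurement is taken on the client, it includes network latency to the proxy and the container as well as the model's processing time.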
Do more with OpenAI's Whisper model
This demo should be a good starting point for using OpenAI’s Whisper model for ASR in your Voiceflow voice assistants in an efficient, customizable way. By using a local or server-hosted Docker container, you can avoid relying on external APIs and maintain greater control over your data.
Thanks to Ahmet Oner for sharing the whisper-asr-webservice we're using in this demo. Check out that project for more information, including how to use a different model.