OpenAI embeddings and vector databases
Let's start by going over OpenAI embeddings and the vector database. OpenAI embeddings are mathematical representations of text that can be used to measure the similarity between different pieces of text, making it easier for machine learning models to process and understand. In this project, we use OpenAI embeddings to create vector representations of the text data fetched from your webpages. These vectors are then stored in a local database that is unique to your Voiceflow Assistant, as it is linked to the Assistant API Key. The vector database is created using HNSWLib, a library for approximate nearest neighbor search, which allows us to efficiently find the documents most similar to a given query.
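To make "measuring similarity between vectors" concrete, here is a toy sketch of cosine similarity, the measure typically used to compare embedding vectors. The 4-dimensional vectors and their text labels below are made up for illustration; real OpenAI embeddings have far more dimensions.

```javascript
// Cosine similarity: the measure typically used to compare embedding vectors.
// The 4-dimensional vectors below are made-up stand-ins for illustration;
// real OpenAI embeddings have far more dimensions.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const docA = [0.8, 0.1, 0.3, 0.2]; // e.g. "How do I reset my password?"
const docB = [0.7, 0.2, 0.4, 0.1]; // e.g. "Steps to change your password"
const docC = [0.1, 0.9, 0.1, 0.8]; // e.g. "Pricing plans and billing"

// Texts about the same topic end up closer together:
console.log(cosineSimilarity(docA, docB) > cosineSimilarity(docA, docC)); // true
```

HNSWLib speeds up exactly this kind of comparison: instead of scoring the query against every stored vector, it uses a graph index to find the approximate nearest neighbors quickly.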
The code in this project utilizes several technologies, including OpenAI GPT, Langchain, HNSWLib, and Cheerio. OpenAI GPT is used to generate answers to users' questions, while Langchain is a library that helps with natural language processing tasks such as text splitting and document loading. HNSWLib is used to create the vector store, and Cheerio is a library that lets us fetch web content from URLs and parse it.
Langchain and its role
Langchain is a crucial component of this project, as it provides various tools and utilities to work with natural language data. It allows us to create embeddings, save them in a local database, and use them with GPT to answer questions. This involves tasks such as loading documents from web pages using CheerioWebBaseLoader, splitting text into smaller chunks using RecursiveCharacterTextSplitter, and creating embeddings for the text using OpenAIEmbeddings.
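To give a feel for the text-splitting step, here is a much-simplified sketch of what RecursiveCharacterTextSplitter does: it tries the "biggest" separator first (paragraph breaks) and only falls back to smaller ones (newlines, then spaces) when a piece is still too long. This is not Langchain's actual implementation; the real class also supports chunk overlap, among other options.

```javascript
// Much-simplified sketch of Langchain's RecursiveCharacterTextSplitter:
// try the largest separator first, recurse with smaller ones on any
// chunk that is still too long. (The real class also handles overlap.)
function splitText(text, chunkSize, separators = ["\n\n", "\n", " "]) {
  if (text.length <= chunkSize) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separators left: hard-split by character count.
    const chunks = [];
    for (let i = 0; i < text.length; i += chunkSize) {
      chunks.push(text.slice(i, i + chunkSize));
    }
    return chunks;
  }
  const chunks = [];
  let current = "";
  for (const piece of text.split(sep)) {
    if (current && (current + sep + piece).length > chunkSize) {
      chunks.push(current);
      current = piece;
    } else {
      current = current ? current + sep + piece : piece;
    }
  }
  if (current) chunks.push(current);
  // Recurse on any chunk that is still too long, using smaller separators.
  return chunks.flatMap((c) =>
    c.length > chunkSize ? splitText(c, chunkSize, rest) : [c]
  );
}

const chunks = splitText(
  "First paragraph.\n\nSecond, much longer paragraph that needs its own chunk.",
  40
);
console.log(chunks); // every chunk stays within the 40-character limit
```

Splitting matters because GPT prompts have a token limit: only a handful of small, relevant chunks can be passed along as context with each question.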
Langchain also provides a way to create chains of operations, such as the VectorDBQAChain, which can be used to answer questions using the vector database and OpenAI GPT.
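VectorDBQAChain follows a retrieve-then-generate pattern: find the stored documents most similar to the question, then paste them into the GPT prompt as context. The sketch below shows that pattern in self-contained form; a toy keyword scorer stands in for the real embedding search, and the prompt template is only illustrative, not Langchain's actual one.

```javascript
// The retrieve-then-generate pattern that VectorDBQAChain implements:
// 1) rank stored documents by similarity to the question,
// 2) paste the top matches into the prompt as context for GPT.
// A toy keyword scorer stands in for embedding search here, and the
// prompt template is illustrative, not Langchain's actual template.
const documents = [
  "Voiceflow lets you design conversational assistants.",
  "HNSWLib provides approximate nearest neighbor search.",
  "Cheerio parses HTML fetched from web pages.",
];

// Toy relevance score: number of words the doc shares with the question.
function score(question, doc) {
  const qWords = new Set(question.toLowerCase().split(/\W+/));
  return doc.toLowerCase().split(/\W+/).filter((w) => qWords.has(w)).length;
}

function buildPrompt(question, docs, topK = 2) {
  const context = [...docs]
    .sort((a, b) => score(question, b) - score(question, a))
    .slice(0, topK)
    .join("\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}

const prompt = buildPrompt("What does HNSWLib provide?", documents);
// In the real chain, this prompt string is what gets sent to OpenAI GPT.
console.log(prompt.includes("approximate nearest neighbor")); // true
```

Grounding the prompt in retrieved context is what keeps the Assistant's answers tied to your own content instead of GPT's general knowledge.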
Endpoints and Voiceflow integration
There are two endpoints available in this project: the parser endpoint and the question endpoint.
The parser endpoint allows you to fetch a webpage, extract text from it, and create a document to add to the vector database.
/parser: This endpoint allows you to send a POST request with the URL of the page you want to parse and add to the vector database. You also need to pass the Voiceflow Assistant API Key in the request body.
The question endpoint allows you to send a user’s question and generate an answer using GPT and the context from the vector database.
/question: This endpoint allows you to send a POST request with the user’s question. You also need to pass the Voiceflow Assistant API Key in the request body.
[.code-tag] { [.code-tag]
[.code-tag] "question":"What are the types of use cases where LLMs can be used for more natural conversations?" [.code-tag]
[.code-tag] } [.code-tag]
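As a sketch of building that POST request in Node.js: the snippet above shows only the "question" field, so the field name used for the Voiceflow Assistant API Key below (`apiKey`) is a placeholder — check the project's source code for the actual field name, and the server URL is assumed to be a local instance.

```javascript
// Building the POST request for the /question endpoint.
// NOTE: "apiKey" is a placeholder field name — the article's snippet only
// shows "question"; check the project's source for the real field name.
function buildQuestionRequest(question, apiKey) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ question, apiKey }),
  };
}

const req = buildQuestionRequest(
  "What are the types of use cases where LLMs can be used for more natural conversations?",
  "VF.DM.xxxx" // placeholder Voiceflow Assistant API Key
);

// To actually send it (assuming the server runs locally on port 3000):
// const res = await fetch("http://localhost:3000/question", req);
// const answer = await res.json();
```

The /parser endpoint is called the same way, with a `url` field for the page to ingest in place of the question.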
To use these endpoints in your Voiceflow Assistant, you can use the API step to send requests to the appropriate endpoint and process the response.
Quick start video and demo
Integrating your Voiceflow Assistant with your existing FAQ, knowledge base, and documentation portal means users get more accurate and helpful answers to their questions. The combination of Voiceflow, OpenAI GPT, and Langchain JS makes this possible and easy to implement.
To get started with this project, check out the source code at https://github.com/voiceflow-gallagan/kb-voiceflow-demo.git