
This tutorial shows you how to build a chatbot that can scrape and search through a website, Notion documents, PDFs, PowerPoints, and any other unstructured text documents that you have!

This chatbot is powered by a custom repository that you will need to spin up in your local environment. Here are the steps:

  1. Clone the Voiceflow project to your workspace
  2. Clone the GitHub repo to your local environment
  3. Run the repository and copy its endpoint into your Voiceflow project
  4. Test the chatbot and add any documentation you want through it
  5. Add the chatbot to your website!

Check out the video tutorial and the repository explanation, and let us know if you have any questions!

Transcript of the Video

Hey everyone, this is Daniel from Voiceflow, back with another use case combining Voiceflow, LangChain, and GPT. This one is able to scrape an entire website, load it into a database, and then let a user ask it any questions. Over here we've got our documentation page, with all the docs about how to use Voiceflow. We're using a repository that Nico put together, which uses LangChain and a number of other services, plus a Voiceflow project that I've included a template for, to create a chatbot that we can build in under 10 minutes and add to our website. A user can ask it any question, like "How do I manage an entity?", and it's able to look through all of our documentation, summarize it, give the answer to the user directly, and include the sources it got that information from. Let me show you how to spin this up.

First off, we're going to go to the Voiceflow demos and examples repo. This is where we keep all of our POCs. The one we're using today is the LangChain local knowledge base, but we've got a number of other ones you can check out on our developer blog to see how we can use some of these.

The first thing we're going to do is clone this repo. I'm going to pop open VS Code, choose Clone Repository, paste in the URL, save it to a local folder, and open it up.

Awesome. Once we're in here, we're going to open a terminal and navigate to the LangChain local knowledge base folder with cd.
Now, once we're in here, there are a couple of things we need to do. If we open up the README, it has pretty much all the instructions. First off, you need Node.js 18 to run this code, so download it if you haven't yet. Next, we want to create an environment file by copying the provided template file. If you copy this command from the README, you'll see that it creates a new environment file. Then you're going to want to put in your OpenAI key. If you don't have one yet, you can go to OpenAI and get one from a free or paid account. I'm going to grab mine in a second here.

I went ahead and put my API key into the environment file, and now we're pretty much good to go. First, make sure that you've got Docker installed and running on your computer; you can see over here that Docker is running. The next step is very simple: run yarn build. You'll see the command over here. Make sure you install Yarn if you haven't done that already.
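Before the first yarn build, it can help to confirm that the environment file actually contains your key. The sketch below is a hypothetical preflight check, not part of the repository: it assumes the file is named .env, uses simple KEY=VALUE lines, and that the key variable is called OPENAI_API_KEY. Check the README for the real file and variable names.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. OPENAI_API_KEY="sk-..."
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(
    env: Dict[str, str], required: Iterable[str] = ("OPENAI_API_KEY",)
) -> List[str]:
    """Return required keys that are absent or empty (OPENAI_API_KEY is an assumed name)."""
    return [key for key in required if not env.get(key)]


def check_env_file(path: str = ".env") -> List[str]:
    """Read the env file (assumed to be .env) and list any missing keys."""
    env_path = Path(path)
    if not env_path.is_file():
        raise FileNotFoundError(f"{path} not found; create it from the README's template first")
    return missing_keys(parse_env(env_path.read_text(encoding="utf-8")))
```

Running check_env_file() from the project folder should return an empty list once the key is filled in.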

This is going to start up. It may take a while the first time because it has a ton of packages to download; I already did it before, so it'll be a bit quicker. Let's give it a couple of seconds, and we're ready: the Voiceflow LangChain API is listening on port 3000 and is connected to the Redis server. You can check out Nico's video for a breakdown of exactly how this works.

I'm going to show you more about the implementation. Now that we've got this running, a couple of things are happening. One is that there's a server running on port 3000. The other is ngrok: if I type 127.0.0.1:4040 into my search bar, the ngrok inspector pops up, and to get the actual endpoint, I go to the ngrok Status page and grab the URL there. Just to check that this is actually working, I'm going to pull up Postman to run an API call. Let's do the health one.

Cool. I'm going to hit the health API. You can find all the APIs in the documentation; you can see there's an add API, a clear cache API, and a health API. Let's see if this is running: localhost:3000/api/health, content type application/json, and I haven't sent anything in the body.
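The same health check can be scripted instead of run from Postman. This is a minimal sketch using only Python's standard library. The /api/health path is inferred from the call shown in the video, and the GET method and port 3000 default are assumptions, so confirm them against the repository's API docs.

```python
import urllib.request


def health_request(base_url: str) -> urllib.request.Request:
    """Build the health-check request (/api/health inferred from the video, GET assumed)."""
    url = base_url.rstrip("/") + "/api/health"
    return urllib.request.Request(
        url, method="GET", headers={"Content-Type": "application/json"}
    )


def is_healthy(base_url: str = "http://localhost:3000", timeout: float = 5.0) -> bool:
    """Return True if the server answers with a 2xx status, False on errors or timeouts."""
    try:
        with urllib.request.urlopen(health_request(base_url), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (4xx/5xx), and socket timeouts are all OSError subclasses.
        return False
```

Calling is_healthy() should return True while yarn build is serving the API, and False if the server is down.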

And yep, cool, the server's up, so we're good to go. Now we can hop over to Voiceflow and start using it. In Voiceflow, I've got a project here: the GPT custom knowledge base project. I'll explain how it works step by step.
The first thing you need to do is set some variables at the beginning. Make sure that your endpoint is the ngrok endpoint. You can't use localhost with Voiceflow; it needs to be a publicly reachable endpoint, so just put in the ngrok one. Now your project is pretty much ready to go.
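The URL on the ngrok Status page can also be read programmatically: the ngrok agent serves a local API next to the inspector at 127.0.0.1:4040, and GET /api/tunnels returns a JSON object whose tunnels list carries each tunnel's public_url. The sketch below picks the HTTPS tunnel and rejects localhost-style URLs, since Voiceflow needs a publicly reachable endpoint. It assumes one ngrok agent on the default inspector port; the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlparse

NGROK_TUNNELS_API = "http://127.0.0.1:4040/api/tunnels"  # default ngrok agent API address
LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0", "::1"}


def pick_public_url(tunnels_payload: dict) -> str:
    """Return the first https public_url from an /api/tunnels response."""
    for tunnel in tunnels_payload.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    raise LookupError("no https tunnel found; is ngrok running?")


def is_public_endpoint(url: str) -> bool:
    """Voiceflow's API step cannot reach your machine's localhost, so reject it."""
    host = urlparse(url).hostname or ""
    return bool(host) and host not in LOCAL_HOSTS


def fetch_ngrok_url(api: str = NGROK_TUNNELS_API, timeout: float = 5.0) -> str:
    """Query the local ngrok agent and return its public https URL."""
    with urllib.request.urlopen(api, timeout=timeout) as resp:
        return pick_public_url(json.load(resp))
```

The value returned by fetch_ngrok_url() is what goes into the endpoint variable in the Voiceflow project; it changes whenever ngrok restarts on a free plan, so re-check it after restarting the stack.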

The way this works is that, within the app itself, there are different collections that you can add documentation to. If I go to my add API call here, you'll see that one of the fields is collection. As you add documents, whether you're scraping a website or adding a PDF document, a PowerPoint, or something else, you're going to want to add them to a specific collection. This allows you to have multiple collections for different types of documents. In Voiceflow, I've just set a collection at the top and called it general. If you want, you can get a bit more granular: use a Set step (under Logic) to set the collection to something else. For example, if I wanted to scrape a bunch of support documents, I could set the collection to support. What's important here is that when you ask the bot questions, you also ask them against a collection. So if you do have multiple collections, you might need to modify this template so that a user can choose a collection, or you might set the collection for them depending on what kind of question they want to ask. Alternatively, you can just save everything to one collection, which makes it a lot simpler. Once you've set the endpoint and the collection you want to use, you can get started.
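If you do split content into several collections, setting the collection for the user can start as simple keyword routing. The sketch below is hypothetical: the collection names and keywords are made-up examples, and it falls back to general, the collection used in the template.

```python
from typing import Mapping, Sequence


def choose_collection(
    question: str,
    routes: Mapping[str, Sequence[str]],
    default: str = "general",
) -> str:
    """Return the first collection whose keywords appear in the question, else the default."""
    text = question.lower()
    for collection, keywords in routes.items():
        if any(keyword.lower() in text for keyword in keywords):
            return collection
    return default


# Hypothetical routing table: collection name -> keywords that suggest it.
ROUTES = {
    "support": ["error", "bug", "troubleshoot"],
    "pricing": ["price", "plan", "billing"],
}
```

Keyword matching is crude (the first matching entry in the table wins), so letting the user pick a collection, as described above, is often the more reliable option.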

The first thing you do is add information. When you run this chatbot, the first thing it asks is what you want to do: ask a question or add information. If you choose add information, it lets you do three different things: scrape a website, add a web page, or add a document. All three use the same API endpoint; you can see over here in the API step that they all go to the add endpoint, and that we've dynamically inserted our ngrok endpoint here. In each of these flows, we're just capturing the different parameters that we want to send. Over here, I'm capturing the sitemap, a filter if the user wants one, and a limit, and I'm inserting those into the body of the API call dynamically: URL, filter, collection, and limit.
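The body the API step sends can also be reproduced outside Voiceflow, which is handy for bulk loading. A minimal sketch, assuming the add endpoint lives at /api/add and accepts a POST (inferred from the video, not verified) and that the JSON keys are url, filter, collection, and limit as named in the flow. Check the repository's docs for the exact contract.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_add_payload(
    url: str,
    collection: str = "general",
    url_filter: Optional[str] = None,
    limit: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble the add-request body; optional fields are omitted when unset."""
    payload: Dict[str, Any] = {"url": url, "collection": collection}
    if url_filter:
        payload["filter"] = url_filter  # assumed key name
    if limit is not None:
        if limit <= 0:
            raise ValueError("limit must be a positive integer")
        payload["limit"] = limit  # assumed key name
    return payload


def add_request(base_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST to the (assumed) /api/add endpoint with a JSON body."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/add",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be urllib.request.urlopen(add_request(public_url, payload), timeout=120). A generous timeout helps with large sitemaps, and even if the client gives up, the scrape may keep running on the server.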

You can modify this if you want: remove limits, add limits, whatever it is. It's the same for adding a web page and adding a document. These are actually exactly the same; I've just broken them out for the sake of a better user experience, but you can modify that. Once that's done, let's try running this. We'll click Run Test, choose add information, and pick something like a blog post.

Here I'm going to take Nico's blog post, which documents his app. I'll go back to Voiceflow, choose add a web page, and drop in the URL. Let's see if this works. Great: the API call was a success, and I can see back here in my app that it says added at the bottom. That's awesome, super easy. Now I can actually ask questions about it. If I hit Run Test, I can choose ask a question and ask about this proof of concept.

Awesome, sweet. It was able to go through the article and pull out all the different pieces of tech that Nico used to build this. Let's add some more information, and this time I'll use the sitemap. Scraping a website is a bit different because you need the sitemap of the website. The easiest one I know about is our docs, so go to sitemap.xml. Most websites serve their sitemap at the main domain followed by /sitemap.xml, and that gives you the actual map of the website with all of its pages. Make sure that anything you want to scrape is actually included in the sitemap; otherwise it won't work, because the scraper doesn't know where to look. I'm going to copy this, go back to Voiceflow, and drop it in.
Great. Now I can add a filter. I want to filter for anything that's an article, and if you remember from the sitemap, that's basically all of the articles we've got here in Zendesk. So back in Voiceflow, I'll enter article as the filter and add a limit of 10. If the job is too long, the API call will time out, but check whether it's still running in the app: you may have something like 50 articles, it may take a long time to scrape all of them, and Voiceflow's API step may time out while the scrape keeps running in the app. Let's enter 10 and see if this works. Usually, if it's taking a little longer to load, that means it's working, and I can see here in my app that it's going through 10 of these pages, scraping them, and adding them to our database.
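To preview which pages a given filter and limit will pick up before sending them, you can apply the same idea to the sitemap yourself. This is an illustration rather than the repository's scraping code: it assumes the filter is a plain substring match on each URL and that the limit keeps the first N matches in sitemap order.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Standard namespace used by sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def select_pages(
    sitemap_xml: str,
    url_filter: Optional[str] = None,
    limit: Optional[int] = None,
) -> List[str]:
    """List <loc> URLs from a sitemap, keep those containing url_filter, cap at limit."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    if url_filter:
        urls = [u for u in urls if url_filter in u]  # assumed substring semantics
    if limit is not None:
        urls = urls[:limit]
    return urls
```

Large sites often publish a sitemap index that points to further sitemaps; this sketch only reads <url> entries, so an index file would return an empty list.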

Let's give this a couple more seconds and see whether it times out in Voiceflow. Awesome, it finished: it's now in the collection general, and we've got all 10 pages. Now I can ask a question. You do want to wait a couple of minutes after you scrape a number of articles so the content can be properly vectorized and loaded into the database, but let's try this out. I'll ask: "How do I manage an entity in Voiceflow?" You may have to wait a couple of minutes to get a proper answer, but sweet, you can see here that it's already done: it's able to answer this and provide the source. Go ahead and play around with this and see what it looks like. There you go, you've got this all set up, and attaching it to a website is actually very simple. You just hit Publish and enter a version name; I'll say web chat V1.

When it's done publishing and updating the API, it's going to give me a code snippet that I can add to my website. Let's wait for that. It's done uploading, and it says embed widget (I've already done this). If you click embed widget, it takes you to this tab over here, the Integrations tab, and gives you a widget. If you have access to your website's code, you can just put this right in the footer. If you want to preview it, there's a little trick you can use: copy what's between the script tags. Now let's say I go to our docs page. On my side, I could add this into the code in the footer and it would appear, but to preview it, just open Inspect, go to the console, and paste in what you copied. You'll see that the chat widget appears here on the right-hand side.
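Pasting into the console only lasts until the page reloads. For a static site where you edit the HTML yourself, adding the copied snippet to the footer can be scripted. A hypothetical helper, assuming you pass in the full snippet from the Integrations tab (script tags included) and that it belongs just before the closing </body> tag:

```python
import re

# Matches a closing body tag in any letter case, e.g. </body> or </BODY >.
_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_snippet(html: str, snippet: str) -> str:
    """Insert snippet before the last closing body tag; skip if already present."""
    if snippet in html:
        return html  # idempotent: running the script twice adds nothing
    matches = list(_BODY_CLOSE.finditer(html))
    if not matches:
        return html + "\n" + snippet  # no body tag: append at the end
    idx = matches[-1].start()
    return html[:idx] + snippet + "\n" + html[idx:]
```

Read the page file, pass its contents through inject_snippet, and write the result back; because the widget loads the latest published version, this only needs to be done once.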

Now if I chat with this, it's actually interacting with my assistant. If I choose ask a question and ask "How do I manage an entity?", it hits our server, and you can see that it's pulling the managing entities content from the vector store and is able to answer the question. So now I've got a live chatbot on my website. If I put this in the code, it'll be there permanently, and whenever I update my Voiceflow project, the chatbot will update in real time as well. I can also start modifying the appearance here: the image, the title, the color, whatever it may be. Let's change this color to red, keep everything else the same, and change the title to Voiceflow Support Bot. Awesome. Now let's go back to our canvas and hit Publish, calling this version test two. I put in the code again, and you can now see that the chatbot is actually red and the title is Voiceflow Support Bot. That makes it really easy to update: if you've got this live on your website, it updates automatically once you hit Publish. It's super easy to create this chatbot, powered by our whole website, and add it to whatever website you want. Let us know if you have any questions; I'll drop the template and the tutorial in the comments.

What can you use this for? 💡

1. Product Support Assistant
- Answer frequently asked questions about the product
- Provide step-by-step instructions on how to use specific features
- Offer troubleshooting assistance to resolve user issues
- Provide links to relevant documentation and tutorials to assist users in learning more about the product

2. Website Chatbot
- Introduce the product to potential customers by providing information on its features and benefits
- Answer questions about the product's pricing, availability, and other details
- Help guide customers through the purchasing process by providing recommendations and advice
- Collect customer feedback and insights to improve the product and customer experience

3. Research Assistant
- Ingest a collection of PDFs, PowerPoints, and research information to identify and summarize key points and insights
- Provide a list of relevant sources for further research
- Offer insights on trends and patterns within the research data
- Assist in identifying gaps or inconsistencies within the research data

4. Internal Documentation
- Scrape Notion documents and internal documents to create an assistant for internal employees
- Provide access to internal policies, procedures, and guidelines
- Offer insights on best practices for internal workflows and processes
- Assist in onboarding new employees by providing access to relevant information and resources

How does the technology work?

The chatbot is powered by a custom API that uses OpenAI, LangChain, Redis, OpenSearch, and Unstructured to fetch content from various sources, including URLs, sitemaps, text, PDFs, PowerPoints, Notion docs (markdown), and images (OCR).

Once the content is collected, it is turned into embeddings (vectors) and saved in a local OpenSearch database, creating a knowledge base. Users can then ask the chatbot questions about anything covered by the ingested content; in the demo, that is Voiceflow's documentation. The chatbot retrieves the most relevant passages from the knowledge base, uses them as context to generate an answer, and includes the sources it drew from.
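The retrieval step described above can be pictured with a toy example. This sketch is not the repository's code: the real app uses OpenAI, LangChain, and OpenSearch for embedding and search, whereas here the vectors are tiny hand-made lists and the search is a brute-force cosine-similarity ranking in memory. It shows how the top matches become the context, with sources, that is handed to the model.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Chunk:
    text: str
    source: str  # where the text came from, so answers can cite it
    vector: List[float]  # toy embedding; real embeddings have far more dimensions


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vector: Sequence[float], chunks: List[Chunk], k: int = 3) -> List[Chunk]:
    """Rank every chunk by similarity to the query and keep the best k."""
    return sorted(chunks, key=lambda c: cosine(query_vector, c.vector), reverse=True)[:k]


def build_prompt(question: str, hits: List[Chunk]) -> str:
    """Number each retrieved chunk and include its source so the answer can cite it."""
    context = "\n".join(
        f"[{i}] {c.text} (source: {c.source})" for i, c in enumerate(hits, start=1)
    )
    return f"Answer using only the context below and cite sources.\n\n{context}\n\nQuestion: {question}"
```

In a real pipeline, the query vector comes from embedding the user's question with the same model used for the documents, and a vector database replaces the linear scan in top_k.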

With the Voiceflow docs loaded, users can search for specific keywords or topics, ask for tutorials or guides on particular Voiceflow features, ask for troubleshooting help, or ask for recommendations on how to improve their voice or chat applications. The chatbot gives them a fast, self-serve way to find information and get help with Voiceflow.
