When to use ChatGPT vs. GPT-4

It seems like a new large language model (LLM) comes out every day. It started with ChatGPT, and now we have a whole zoo of models: LLaMA, Bison, GPT-4, Claude, and Falcon, to name a few. The list keeps growing.

Keeping up with how to use each new model through research papers, LinkedIn posts, or Twitter threads can be overwhelming.

This article outlines how to choose between ChatGPT and GPT-4, two commonly used LLMs, with 10 scenarios that you may encounter. We've encountered each of these scenarios at Voiceflow, and I've spent the past 6 months working on them.

As a bonus, I've included instructions on how to implement some of these scenarios in Voiceflow so you can start building LLM apps faster.

When to use ChatGPT

When you want faster user responses

ChatGPT is 5-6 times faster than GPT-4. Choosing ChatGPT is a great way to offer a better user experience when faster responses are necessary.

Some cases where ChatGPT is ideal:

When you want chatty responses

ChatGPT is quite verbose in its responses, but you can use this to your advantage.

  • When the user wants to feel like they’re talking to a person
  • When the user wants a partner for brainstorming ideas

[.code-tag]Brainstorm 3 ideas on how to write a nice letter.[.code-tag]

Screenshot of the Response AI step in Voiceflow

When you want to avoid costly responses

For longer responses, use ChatGPT—it's 10 times cheaper than GPT-3, and 30-40 times cheaper than GPT-4.

  • For large projects or apps, these costs quickly add up
  • For example, in a travel recommendation app, summarizing a list of sites that is 2,000 words long costs around 15 cents on GPT-4 vs. 0.5 cents on ChatGPT
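The cost comparison above comes down to simple arithmetic on token counts. The sketch below is a rough estimator, not official pricing: the per-token prices, the 0.75 words-per-token ratio, and the 750-word summary length are all assumptions chosen for illustration, and the function names are hypothetical.

```python
# Illustrative prices in USD per 1,000 tokens. These are ASSUMPTIONS for this
# sketch, not figures from the article; check current pricing before use.
ASSUMED_PRICES = {
    "chatgpt": {"prompt": 0.0015, "completion": 0.002},
    "gpt-4": {"prompt": 0.03, "completion": 0.06},
}

# Rough rule of thumb for English text (assumption): 1 token is about 0.75 words.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count):
    """Convert a word count into an approximate token count."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_cost_usd(model, prompt_words, completion_words):
    """Estimate the cost of one request from input and output word counts."""
    prices = ASSUMED_PRICES[model]
    prompt_tokens = estimate_tokens(prompt_words)
    completion_tokens = estimate_tokens(completion_words)
    total = (prompt_tokens * prices["prompt"]
             + completion_tokens * prices["completion"])
    return total / 1000


if __name__ == "__main__":
    # 2,000-word list of sites in, hypothetical 750-word summary out.
    for model in ASSUMED_PRICES:
        cost = estimate_cost_usd(model, prompt_words=2000, completion_words=750)
        print(f"{model}: about {cost * 100:.1f} cents")
```

With these assumed values, a 2,000-word input lands at roughly 14 cents on GPT-4 and under 1 cent on ChatGPT, which is in the same ballpark as the figures above. Multiply by your expected request volume to see how quickly the gap grows.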

When to use ChatGPT in Voiceflow: Response AI Step + Knowledge Base

For most use cases, we recommend using ChatGPT because it’s faster and provides a better experience for the end user. The Response AI step and Knowledge Base in Voiceflow generate long passages of text, and therefore many tokens, so a slower model takes noticeably longer to respond.

We'll have 55 Outlaw Burgers, 55 Rancher Burgers, and 100 Cowboy Burgers please

When to use GPT-4

When your tasks are difficult

When performing more complex tasks or prompts, use GPT-4. For example, GPT-4 performs better on a variety of exams, so if you’d like to generate some math problems, GPT-4 is the more useful choice.

  • Answering factual questions
  • Exams

Another example is when you have a more advanced prompt that requires more elaborate instructions.

For instance, if you’d like to create an automated entity extraction assistant, GPT-4 stays on track more often.

[.code-tag]Your goal is to capture the following entities in a polite and unassuming way. You want to know the user's: Name, Phone Number, Email.

The user you are talking to has provided the following information already

Ask the user for the missing entities as if you were a nice assistant until they have provided them for you. If the user provides their name, say hello to them before asking for the remaining missing entities. If the user provides malformed entities, tell them to try mentioning them again.

Once the user has provided all the entities, say thank you.[.code-tag]

When you want to steer clear of sensitive topics

When you’d like your LLM to avoid certain topics, like giving legal or medical advice, GPT-4 does a much better job. You can do this by writing prompts that tell it not to answer questions on those topics.

[.code-tag]Answer the last question unless it is about the following topics: ['Mental Health', 'Legal Advice', 'Physical Health']. If it is about the topics say 'Sorry I can't answer that question, can I help you with something else?'.[.code-tag]
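If the list of blocked topics changes often, the prompt above can be assembled from a list rather than edited by hand. This is a minimal sketch: `build_guardrail_prompt` and `is_refusal` are hypothetical helper names, and checking for the exact refusal sentence assumes the model repeats it verbatim, which is not guaranteed.

```python
# The refusal sentence is taken from the prompt above.
REFUSAL = "Sorry I can't answer that question, can I help you with something else?"


def build_guardrail_prompt(blocked_topics, refusal=REFUSAL):
    """Build the topic-avoidance prompt from a list of blocked topics."""
    return (
        "Answer the last question unless it is about the following topics: "
        f"{list(blocked_topics)}. If it is about the topics say '{refusal}'."
    )


def is_refusal(model_reply, refusal=REFUSAL):
    """Return True if the reply is the canned refusal (assumes verbatim output)."""
    return model_reply.strip().startswith(refusal)


if __name__ == "__main__":
    topics = ["Mental Health", "Legal Advice", "Physical Health"]
    print(build_guardrail_prompt(topics))
```

Detecting the refusal lets your assistant branch to a fallback path, such as offering a handoff, instead of showing the canned sentence as a dead end.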

When you have structured formatting

When structuring data for downstream tasks, GPT-4 is the way to go. For example, when transforming a knowledge base answer into a JSON document, GPT-4 does a better job than ChatGPT. Poorly formatted results would break downstream tasks, so good formatting is important.

This continues to be an active area of research and application, with approaches like Toolformer and Gorilla being explored.
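Whichever model you pick, it helps to validate the reply before handing it to a downstream task. The sketch below is one defensive approach, under assumptions: the `question` and `answer` keys are a hypothetical schema, and `parse_model_json` is an illustrative helper rather than part of any library.

```python
import json

# Hypothetical schema for illustration: the keys the downstream task expects.
REQUIRED_KEYS = {"question", "answer"}


def parse_model_json(raw_reply, required_keys=REQUIRED_KEYS):
    """Return the parsed JSON object, or None if the reply is unusable.

    Models sometimes add text around the JSON, so only the span from the
    first "{" to the last "}" is parsed.
    """
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw_reply[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Reject valid JSON that is not an object or is missing expected keys.
    if not isinstance(data, dict) or not required_keys.issubset(data):
        return None
    return data


if __name__ == "__main__":
    reply = 'Here you go: {"question": "Opening hours?", "answer": "9 to 5"}'
    print(parse_model_json(reply))
```

When the function returns `None`, you can retry the request or fall back to a default instead of passing a malformed result downstream.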

For logic problems

  • GPT-4 is better at solving problems that require reasoning. One example that is famous in the ML community is the problem below, posed by Yann LeCun
  • ChatGPT was unable to solve the problem, but GPT-4 was
  • The same applies to any complicated task or combination of tasks, as in AutoGPT

[.code-tag]7 axles are equally spaced around a circle. A gear is placed on each axle such that each gear is engaged with the gear to its left and the gear to its right. The gears are numbered 1 to 7 around the circle. If gear 3 were rotated clockwise, in which direction would gear 7 rotate?[.code-tag]
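To see why this puzzle is a reasoning test, it helps to work through it mechanically. Meshed gears must turn in opposite directions, so the sketch below passes the rotation around the ring and checks whether the last gear is consistent with the first. The function name is hypothetical, and the code illustrates the puzzle's logic, not how either model approaches it.

```python
CLOCKWISE, COUNTERCLOCKWISE = 1, -1


def gear_directions(n_gears, start, direction):
    """Propagate rotation around a closed ring of meshed gears.

    Gears are numbered 1..n_gears. Returns a dict mapping each gear to its
    direction, or None if the ring is locked and no gear can turn.
    """
    directions = {start: direction}
    current = start
    for _ in range(n_gears - 1):
        nxt = current % n_gears + 1  # next gear around the circle
        directions[nxt] = -directions[current]  # meshed gears turn opposite ways
        current = nxt
    # Close the ring: the last gear visited also meshes with the start gear.
    if directions[start] != -directions[current]:
        return None
    return directions


if __name__ == "__main__":
    print(gear_directions(7, start=3, direction=CLOCKWISE))
    print(gear_directions(6, start=3, direction=CLOCKWISE))
```

With 7 gears the function returns `None`: an odd number of gears in a closed ring cannot satisfy the alternating pattern, so the gears lock. With an even number, such as 6, every gear gets a consistent direction.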

For classification tasks

Classification tasks, like intent classification, sentiment analysis, or entity classification, use far fewer output tokens (see our other blog post for an example). With so little text to generate, responses come back much faster, so you can improve UX through better classification without a large speed penalty.

In Voiceflow, we recommend using GPT-4 for the Set AI step, which will save your result into a variable you can use later on.
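A classification prompt typically asks for a single label and nothing else, which is what keeps the output short. Here is a minimal sketch of that pattern. The intent names and both helper functions are hypothetical, and the model call is left out so the example stays self-contained.

```python
# Hypothetical intent labels for a food-ordering assistant.
INTENTS = ["order_food", "track_order", "cancel_order"]


def build_intent_prompt(utterance, intents=INTENTS):
    """Build a prompt that asks for exactly one intent label."""
    options = ", ".join(intents)
    return (
        f"Classify the user's message into exactly one of these intents: {options}.\n"
        "Reply with the intent name only.\n\n"
        f"Message: {utterance}\n"
        "Intent:"
    )


def normalize_label(model_reply, intents=INTENTS, fallback="none"):
    """Map the model's short reply onto a known label, else the fallback."""
    cleaned = model_reply.strip().strip(".\"'").lower()
    return cleaned if cleaned in intents else fallback


if __name__ == "__main__":
    message = ("We'll have 55 Outlaw Burgers, 55 Rancher Burgers, "
               "and 100 Cowboy Burgers please")
    print(build_intent_prompt(message))
    print(normalize_label(" Order_Food.\n"))
```

Normalizing the reply before storing it in a variable guards against stray punctuation or capitalization in the model's output.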

When generating code

If you’re using an LLM to generate code, use GPT-4.

Visual inputs

GPT-4 is multi-modal, meaning it can accept both text and images. If your app needs both, GPT-4 is a good choice.

App ideas

From OpenAI:

“GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images. Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs. Furthermore, it can be augmented with test-time techniques that were developed for text-only language models, including few-shot and chain-of-thought prompting. Image inputs are still a research preview and not publicly available.”

For pure image tasks, other models do a better job: Stable Diffusion, Midjourney, or DALL·E 2 for image generation, and OCR tools like Document AI for document processing.

With longer context windows

For certain tasks, like summarizing chunks of text, having a longer context window is helpful. GPT-4 supports 8k or 32k tokens, which is around 13 to 50 pages.
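Before sending a long document, you can roughly check which context window it needs. The sketch below is an estimate only: the 0.75 words-per-token ratio and the 500 tokens reserved for the reply are assumptions, the dictionary keys are just labels for this example, and a real tokenizer would give exact counts.

```python
# Rough rule of thumb for English text (assumption): 1 token is about 0.75 words.
WORDS_PER_TOKEN = 0.75

# Context window sizes in tokens for the 8k and 32k variants mentioned above.
# The keys are labels for this sketch, not guaranteed API model names.
CONTEXT_WINDOWS = {"gpt-4-8k": 8192, "gpt-4-32k": 32768}


def smallest_window_that_fits(text, reserved_for_reply=500):
    """Return the label of the smallest window that fits the text, or None."""
    prompt_tokens = round(len(text.split()) / WORDS_PER_TOKEN)
    needed = prompt_tokens + reserved_for_reply
    # Try the windows from smallest to largest.
    for name, size in sorted(CONTEXT_WINDOWS.items(), key=lambda item: item[1]):
        if needed <= size:
            return name
    return None  # too long: split the text into chunks first


if __name__ == "__main__":
    document = "word " * 12000
    print(smallest_window_that_fits(document))
```

If the function returns `None`, the text is too long even for the larger window and needs to be split into chunks and summarized in stages.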

App ideas

  • Recipe summarizer
  • Travel tip recommender
  • Complicated plot explainer

What other scenarios have you run into, and which LLM did you wind up choosing? Tell us on Twitter.

