drunk.support
just another wordprussite.

Creating a GPT-3-powered support bot for WP Fusion


I’ve been itching for a few weeks to try training a GPT-3 chatbot on some of our WP Fusion support resources.

We have over 400 pages of docs, 4,500 chat logs, and over 18,000 support tickets at our disposal, so there’s quite a bit of data.

@gilgNYC’s tweet over the weekend about creating a Shopify help center Q&A bot inspired me to give it a try.

I mostly followed LangChain’s ChatLangChain instructions, but I needed to make some changes to get it working with a different dataset.

If you already have your data in a Notion database, this tutorial may be simpler to follow.

You can try out LangChain’s deployed chatbot here 🦜.

Tools

I’m doing this on my local machine running macOS Ventura. Things will be a bit different on Windows or Linux.

To get going you’ll want:

  • Homebrew package manager
  • OpenAI developer account. This comes with $18 in free credit, but you’ll probably want to add a credit card; I ended up spending about $65 in API tokens to get the first training working
  • Python 3
  • pip
  • LangChain
  • Weaviate (the cloud service is the simplest to get started with)
  • wget (can be installed with Homebrew)
  • (optional) Some kind of API testing tool for managing the data stored in Weaviate. I use Insomnia.

Getting started

Create a directory to store everything and check out ChatLangChain using

git clone https://github.com/hwchase17/chat-langchain.git

Switch into the directory and install the requirements using

pip3 install -r requirements.txt  

Now we need to scrape our documentation. For this we use wget, which isn’t available on Macs by default but can be installed using

brew install wget

And then we can scrape the target URL using

wget -r -A.html https://wpfusion.com/documentation/

If everything works, you should see your content saved as a bunch of .html files in subdirectories of your project directory.

The next step is ingesting the data.

Ingesting

We’ll ingest the scraped data into a vectorstore, in this case Weaviate.

I made a few modifications to the ChatLangChain ingest script. You can see the modified file on our forked version.

Notably:

  1. We use Elementor for our docs and blog content, so the BeautifulSoup HTML parser has been updated to look for any content inside of div.elementor-widget-theme-post-content
  2. I’ve added error handling in check_batch_result() based on the Weaviate docs
  3. The Path() command needs to be updated to point to the directory of your scraped HTML files
  4. .rglob("*"): was changed to .rglob("*.html"): to exclude any non-HTML files (RSS feeds, etc)
  5. url = str(p).replace("index.html", "") strips index.html from the file paths before they are synced to Weaviate. Since our documentation URLs don’t end in .html, this avoids 404 errors
  6. client.schema.delete_class("Paragraph") will throw an error the first time you run the script since the Paragraph class doesn’t exist. I’ve updated this to client.schema.delete_all(). This means every time you run the script, the Weaviate data will be reset and re-ingested
    • Q: Can we somehow update this so new resources can be ingested without erasing the previous data, while avoiding duplicates? 🤔
  7. I changed the model from ada to curie. This should in theory give better results, but it is more expensive to ingest. For an overview of the models see the OpenAI docs.
  8. Finally, I set some batch parameters using client.batch(). I was running into API-limit-exceeded errors trying to sync all 2,095 objects at once, so this breaks them up into smaller chunks and also assigns check_batch_result() as the error handler. For more info see the Weaviate Python docs.
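Modifications 5 and 8 above can be sketched in plain Python. The helper names doc_url and batched are made up for illustration; the real ingest.py is structured differently:

```python
def doc_url(path, base="https://wpfusion.com"):
    # Strip "index.html" so synced URLs match the live docs and avoid
    # 404 errors (modification 5). Paths are relative to the scrape folder.
    rel = str(path).replace("index.html", "")
    return base + "/" + rel.lstrip("/")

def batched(objects, size=100):
    # Yield fixed-size chunks so each Weaviate batch stays under the
    # API limits (modification 8).
    for i in range(0, len(objects), size):
        yield objects[i:i + size]
```

For example, doc_url("documentation/setup/index.html") returns https://wpfusion.com/documentation/setup/, and 2,095 objects in batches of 100 become 21 batches.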

First set your environment variables using

export OPENAI_API_KEY=xxxx

export WEAVIATE_URL="https://yyyy.weaviate.network/"
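For reference, the ingest script picks these up via os.environ. A minimal fail-fast check might look like this (require_env is a made-up helper, not part of the original script):

```python
import os

def require_env(name):
    # Fail fast with a clear message if a required key wasn't exported.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set {name} before running ingest.py")
    return value

# The two keys used by the ingest step:
# openai_key = require_env("OPENAI_API_KEY")
# weaviate_url = require_env("WEAVIATE_URL")
```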

And then run the ingest using

python3 ingest.py

If everything is working correctly, your HTML documents will be loaded, chunked, and sent to Weaviate in batches.

To confirm everything worked, I like to load up the Weaviate objects endpoint in Insomnia just to check that the data is there and the URLs are correct.
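If you'd rather spot-check from Python instead of Insomnia, the same objects endpoint can be queried with the standard library. This is a sketch with made-up names (objects_endpoint, fetch_objects); the Paragraph class comes from the ingest script, and yyyy stands in for your cluster:

```python
import json
import urllib.request

WEAVIATE_URL = "https://yyyy.weaviate.network"  # your cluster URL

def objects_endpoint(base, cls="Paragraph", limit=5):
    # Build the REST URL for listing a few stored objects of a class.
    return f"{base.rstrip('/')}/v1/objects?class={cls}&limit={limit}"

def fetch_objects(base):
    # Return a list of object dicts so you can spot-check that each
    # one's url property points at a real docs page.
    with urllib.request.urlopen(objects_endpoint(base)) as resp:
        return json.load(resp).get("objects", [])
```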

Optional: You can add additional example questions and responses in the ingest_examples.py file and ingest those by running

python3 ingest_examples.py

Customize the prompt

Finally, to get optimal answers, you should customize the prompt in the chain.py file.

The LangChain blog has some info on how they engineered their prompt:

You are an AI assistant for the open source library LangChain. The documentation is located at https://langchain.readthedocs.io.
You are given the following extracted parts of a long document and a question. Provide a conversational answer with a hyperlink to the documentation.
You should only use hyperlinks that are explicitly listed as a source in the context. Do NOT make up a hyperlink that is not listed.
If the question includes a request for code, provide a code block directly from the documentation.
If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer.
If the question is not about LangChain, politely inform them that you are tuned to only answer questions about LangChain.
Question: {question}
=========
{context}
=========
Answer in Markdown:

For our purposes, we’ve just replaced the references to LangChain with WP Fusion, and left the rest the same.
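A minimal sketch of the result, assuming the prompt in chain.py is a plain format string as in ChatLangChain (only the product name and docs URL changed):

```python
# Hypothetical edited prompt for chain.py; {question} and {context}
# are filled in by the chain at query time.
template = """You are an AI assistant for the WordPress plugin WP Fusion. The documentation is located at https://wpfusion.com/documentation/.
You are given the following extracted parts of a long document and a question. Provide a conversational answer with a hyperlink to the documentation.
You should only use hyperlinks that are explicitly listed as a source in the context. Do NOT make up a hyperlink that is not listed.
If the question includes a request for code, provide a code block directly from the documentation.
If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer.
If the question is not about WP Fusion, politely inform them that you are tuned to only answer questions about WP Fusion.
Question: {question}
=========
{context}
=========
Answer in Markdown:"""

prompt = template.format(question="How do I connect HubSpot?",
                         context="(retrieved docs go here)")
```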

I may run some experiments with different prompts later 📝

Start a chat

Finally, you can start a chat with your model by running

python3 app.py

This will start a Gradio server running on your local machine, and give you the URL to connect to it. In my case it’s http://127.0.0.1:7860

Open the URL in your browser, paste in your OpenAI key, and start a chat. If everything is working, you should get a response. If not, check your console for any errors.

To share the chat widget publicly, set share=True in the launch() function and Gradio will give you a public URL where others can try out your bot.

Examples

Here we can see the bot has correctly understood WP Fusion’s features with regards to WooCommerce, and can provide a code example from the docs for customizing which order statuses are synced with your CRM.

Here the bot understands that WP Fusion has a Paid Memberships Pro integration, the available features, and also the ecommerce plugins that WP Fusion supports.

It can suggest some basic strategies for using WP Fusion with an LMS, and provides steps for setting up the initial connection to HubSpot.

Each reply links to the relevant page in the WP Fusion documentation.

The bot can understand the gist of a question and locate the relevant article.

When it can’t help, it provides contact information.

The bot is not as good as ChatGPT at modifying an existing code snippet. While the answer here is correct, I would have preferred a complete example for the pending order status like in the previous prompt.

