drunk.support
just another wordprussite.

Creating a GPT-3-powered support bot for WP Fusion


I’ve been itching for a few weeks to try training a GPT-3 chatbot on some of our WP Fusion support resources.

We have over 400 pages of docs, 4,500 chat logs, and over 18,000 support tickets at our disposal, so there’s quite a bit of data.

@gilgNYC’s tweet over the weekend about creating a Shopify help center Q&A bot inspired me to give it a try.

I mostly followed LangChain’s ChatLangChain instructions, but I needed to make some changes to get it working with a different dataset.

If you already have your data in a Notion database, this tutorial may be simpler to follow.

You can try out LangChain’s deployed chatbot here 🦜.

Tools

I’m doing this on my local machine running macOS Ventura. Things will be a bit different on Windows or Linux.

To get going you’ll want:

  • Homebrew package manager
  • OpenAI developer account. This comes with $18 in free credit, but you’ll probably want to add a credit card; I ended up spending about $65 in API tokens to get the first training working
  • Python 3
  • pip
  • LangChain
  • Weaviate (the cloud service is the simplest to get started with)
  • wget (can be installed with Homebrew)
  • (optional) Some kind of API testing tool for managing the data stored in Weaviate. I use Insomnia.

Getting started

Create a directory to store everything and check out ChatLangChain using

git clone https://github.com/hwchase17/chat-langchain.git

Switch into the directory and install the requirements using

pip3 install -r requirements.txt  

Now we need to scrape our documentation. For this we use wget, which isn’t available on Macs by default but can be installed using

brew install wget

And then we can scrape the target URL using

wget -r -A.html https://wpfusion.com/documentation/

If everything works, you should see your content saved as a bunch of .html files in subdirectories of your project directory.

The next step is ingesting the data.

Ingesting

We’ll ingest the scraped data into a vectorstore, in this case Weaviate.

I made a few modifications to the ChatLangChain ingest script. You can see the modified file on our forked version.

Notably:

  1. We use Elementor for our docs and blog content, so the BeautifulSoup HTML parser has been updated to look for any content inside of div.elementor-widget-theme-post-content
  2. I’ve added error handling in check_batch_result() based on the Weaviate docs
  3. The Path() command needs to be updated to point to the directory of your scraped HTML files
  4. .rglob("*"): was changed to .rglob("*.html"): to exclude any non-HTML files (RSS feeds, etc)
  5. url = str(p).replace("index.html", "") strips index.html from the file paths before they are synced to Weaviate. Since our documentation URLs don’t end in .html, this avoids 404 errors
  6. client.schema.delete_class("Paragraph") will throw an error the first time you run the script since the Paragraph class doesn’t exist. I’ve updated this to client.schema.delete_all(). This means every time you run the script, the Weaviate data will be reset and re-ingested
    • Q: Can we somehow update this so new resources can be ingested without erasing the previous data, while avoiding duplicates? 🤔
  7. I changed the model from ada to curie. This should in theory give better results, but it is more expensive to ingest. For an overview of the models see the OpenAI docs.
  8. Finally, I set some batch parameters using client.batch(). I was running into API-limit-exceeded errors trying to sync all 2,095 objects at once, so this breaks them up into smaller chunks and also assigns check_batch_result() as the error handler. For more info see the Weaviate Python docs.
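Modifications 5 and 8 above can be sketched in plain Python. The helper names doc_url and batched are made up for illustration; the real ingest.py is structured differently:

```python
def doc_url(path, base="https://wpfusion.com"):
    # Strip "index.html" so synced URLs match the live docs and avoid
    # 404 errors (modification 5). Paths are relative to the scrape folder.
    rel = str(path).replace("index.html", "")
    return base + "/" + rel.lstrip("/")

def batched(objects, size=100):
    # Yield fixed-size chunks so each Weaviate batch stays under the
    # API limits (modification 8).
    for i in range(0, len(objects), size):
        yield objects[i:i + size]
```

For example, doc_url("documentation/setup/index.html") returns https://wpfusion.com/documentation/setup/, and 2,095 objects in batches of 100 become 21 batches.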

First set your environment variables using

export OPENAI_API_KEY=xxxx

export WEAVIATE_URL="https://yyyy.weaviate.network/"
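For reference, the ingest script picks these up via os.environ. A minimal fail-fast check might look like this (require_env is a made-up helper, not part of the original script):

```python
import os

def require_env(name):
    # Fail fast with a clear message if a required key wasn't exported.
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set {name} before running ingest.py")
    return value

# The two keys used by the ingest step:
# openai_key = require_env("OPENAI_API_KEY")
# weaviate_url = require_env("WEAVIATE_URL")
```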

And then run the ingest using

python3 ingest.py

If everything is working correctly, your HTML documents will be loaded, chunked, and sent to Weaviate in batches.

To confirm everything worked, I like to load up the Weaviate objects endpoint in Insomnia just to check that the data is there and the URLs are correct.
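If you'd rather spot-check from Python instead of Insomnia, the same objects endpoint can be queried with the standard library. This is a sketch with made-up names (objects_endpoint, fetch_objects); the Paragraph class comes from the ingest script, and yyyy stands in for your cluster:

```python
import json
import urllib.request

WEAVIATE_URL = "https://yyyy.weaviate.network"  # your cluster URL

def objects_endpoint(base, cls="Paragraph", limit=5):
    # Build the REST URL for listing a few stored objects of a class.
    return f"{base.rstrip('/')}/v1/objects?class={cls}&limit={limit}"

def fetch_objects(base):
    # Return a list of object dicts so you can spot-check that each
    # one's url property points at a real docs page.
    with urllib.request.urlopen(objects_endpoint(base)) as resp:
        return json.load(resp).get("objects", [])
```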

Optional: You can add additional example questions and responses in the ingest_examples.py file and ingest those by running

python3 ingest_examples.py

Customize the prompt

Finally, to get optimal answers, you should customize the prompt in the chain.py file.

The LangChain blog has some info on how they engineered their prompt:

You are an AI assistant for the open source library LangChain. The documentation is located at https://langchain.readthedocs.io.
You are given the following extracted parts of a long document and a question. Provide a conversational answer with a hyperlink to the documentation.
You should only use hyperlinks that are explicitly listed as a source in the context. Do NOT make up a hyperlink that is not listed.
If the question includes a request for code, provide a code block directly from the documentation.
If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer.
If the question is not about LangChain, politely inform them that you are tuned to only answer questions about LangChain.
Question: {question}
=========
{context}
=========
Answer in Markdown:

For our purposes, we’ve just replaced the references to LangChain with WP Fusion, and left the rest the same.
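A minimal sketch of the result, assuming the prompt in chain.py is a plain format string as in ChatLangChain (only the product name and docs URL changed):

```python
# Hypothetical edited prompt for chain.py; {question} and {context}
# are filled in by the chain at query time.
template = """You are an AI assistant for the WordPress plugin WP Fusion. The documentation is located at https://wpfusion.com/documentation/.
You are given the following extracted parts of a long document and a question. Provide a conversational answer with a hyperlink to the documentation.
You should only use hyperlinks that are explicitly listed as a source in the context. Do NOT make up a hyperlink that is not listed.
If the question includes a request for code, provide a code block directly from the documentation.
If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer.
If the question is not about WP Fusion, politely inform them that you are tuned to only answer questions about WP Fusion.
Question: {question}
=========
{context}
=========
Answer in Markdown:"""

prompt = template.format(question="How do I connect HubSpot?",
                         context="(retrieved docs go here)")
```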

I may run some experiments with different prompts later 📝

Start a chat

Finally, you can start a chat with your model by running

python3 app.py

This will start a Gradio server running on your local machine, and give you the URL to connect to it. In my case it’s http://127.0.0.1:7860

Open the URL in your browser, paste in your OpenAI key, and start a chat. If everything is working, you should get a response. If not, check your console for any errors.

To share the chat widget publicly, set share=True in the launch() function and Gradio will give you a public URL where others can try out your bot.

Examples

Here we can see the bot has correctly understood WP Fusion’s features with regards to WooCommerce, and can provide a code example from the docs for customizing which order statuses are synced with your CRM.

Here the bot understands that WP Fusion has a Paid Memberships Pro integration, the available features, and also the ecommerce plugins that WP Fusion supports.

It can suggest some basic strategies for using WP Fusion with an LMS, and provides steps for setting up the initial connection to HubSpot.

Each reply links to the relevant page in the WP Fusion documentation.

The bot can understand the gist of a question and locate the relevant article.

When it can’t help, it provides contact information.

The bot is not as good as ChatGPT at modifying an existing code snippet. While the answer here is correct, I would have preferred a complete example for the pending order status like in the previous prompt.

