I’ve been itching for a few weeks to try training a GPT-3 chatbot on some of our WP Fusion support resources.
We have over 400 pages of docs, 4,500 chat logs, and over 18,000 support tickets at our disposal, so there’s quite a bit of data.
@gilgNYC’s tweet over the weekend about creating a Shopify help center Q&A bot inspired me to give it a try.
I mostly followed LangChain’s ChatLangChain instructions, but needed to make some changes to work with a different dataset.
If you already have your data in a Notion database, this tutorial may be simpler to follow.
You can try out LangChain’s deployed chatbot here 🦜.
Tools
I’m doing this on my local machine running macOS Ventura. Things will be a bit different on Windows / Linux.
To get going you’ll want:
- Homebrew package manager
- OpenAI developer account. This comes with $18 free credit but you’ll probably want to add a credit card. I ended up spending about $65 in API tokens to get the first training working
- Python 3
- pip
- LangChain
- Weaviate (the cloud service is the simplest to get started with)
- wget (can be installed with Homebrew)
- (optional) Some kind of API testing tool for managing the data stored in Weaviate. I use Insomnia.
Getting started
Create a directory to store everything and check out ChatLangChain using
git clone https://github.com/hwchase17/chat-langchain.git
Switch into the directory and install the requirements using
pip3 install -r requirements.txt
Now we need to scrape our documentation. For this we use wget, which isn’t available on Macs by default but can be installed using
brew install wget
And then we can scrape the target URL using
wget -r -A.html https://wpfusion.com/documentation/
If everything works, you should see your content saved to a bunch of .html files in subdirectories of your project directory.

Ingesting
The next step is to ingest the data into a vectorstore, in this case Weaviate.
I made a few modifications to the ChatLangChain ingest script. You can see the modified file on our forked version.
Notably:
- We use Elementor for our docs and blog content, so the BeautifulSoup HTML parser has been updated to look for any content inside of div.elementor-widget-theme-post-content
- I’ve added error handling in check_batch_result() based on the Weaviate docs
- The Path() command needs to be updated to point to the directory of your scraped HTML files
- .rglob("*"): was changed to .rglob("*.html"): to exclude any non-HTML files (RSS feeds, etc.)
- url = str(p).replace("index.html", "") removes the .html extension from the files before they are synced to Weaviate; since our documentation resources don’t end in .html, this avoids 404 errors
- client.schema.delete_class("Paragraph") will throw an error the first time you run the script, since the Paragraph class doesn’t exist yet. I’ve updated this to client.schema.delete_all(). This means every time you run the script, the Weaviate data will be reset and re-ingested
  - Q: Can we somehow update this so new resources can be ingested without erasing the previous data, while avoiding duplicates? 🤔
- I changed the model from ada to curie. This should in theory give better results, but it is more expensive to ingest. For an overview of the models see the OpenAI docs.
- Finally, I set some batch parameters using client.batch(). I was running into “API limits exceeded” errors trying to sync all 2,095 objects at once, so this breaks them up into smaller chunks and also assigns check_batch_result as the error handler. For more info see the Weaviate Python docs.
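Two of those tweaks can be sketched in isolation. This is a rough, standard-library-only illustration of the URL mapping and the batch error callback; the function names are my stand-ins, not the exact code from the forked ingest script (and the real callback prints errors rather than returning them):

```python
from pathlib import Path

def docs_url_for(p: Path) -> str:
    """Map a scraped file path back to its live URL.
    Hypothetical helper mirroring the ingest script's replace() logic."""
    url = "https://" + str(p).replace("index.html", "")
    if url.endswith(".html"):
        # Our documentation URLs don't end in .html, so strip the extension
        url = url[: -len(".html")]
    return url

def check_batch_result(results):
    """Weaviate batch callback: collect any object-level errors.
    (The version from the Weaviate docs prints them; returning a list
    here just makes the sketch easy to inspect.)"""
    errors = []
    for result in results or []:
        errs = result.get("result", {}).get("errors")
        if errs and "error" in errs:
            errors.append(errs["error"])
    return errors
```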
First set your environment variables using
export OPENAI_API_KEY=xxxx
export WEAVIATE_URL="https://yyyy.weaviate.network/"
And then run the ingest using
python3 ingest.py
If everything is working correctly, your HTML documents will be loaded, chunked, and sent to Weaviate in batches.
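Under the hood, ChatLangChain splits each document into overlapping chunks before embedding; the real script uses a LangChain text splitter, but the core idea can be sketched in plain Python (the size and overlap here are illustrative, not the script’s actual settings):

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-window chunking with overlap between neighbors,
    so sentences cut at a chunk boundary still appear intact in one chunk."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]
```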

To confirm everything worked, I like to load up the Weaviate objects endpoint in Insomnia just to check that the data is there and the URLs are correct:
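If you’d rather script this check than use an API client, Weaviate exposes stored objects at its /v1/objects REST endpoint. A small standard-library sketch (the helper names are mine; pass the same URL you set in WEAVIATE_URL):

```python
import json
import urllib.request

def objects_endpoint(weaviate_url: str, limit: int = 5) -> str:
    """Build the REST URL for listing stored objects."""
    return f"{weaviate_url.rstrip('/')}/v1/objects?limit={limit}"

def list_objects(weaviate_url: str, limit: int = 5):
    """Fetch the first few ingested objects to spot-check URLs and text."""
    with urllib.request.urlopen(objects_endpoint(weaviate_url, limit)) as resp:
        return json.load(resp)["objects"]
```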

Optional: You can add additional example questions and responses in the ingest_examples.py
file and ingest those by running
python3 ingest_examples.py
Customize the prompt
Finally, to get optimal answers, you should customize the prompt in the chain.py
file.
The LangChain blog has some info on how they engineered their prompt:
You are an AI assistant for the open source library LangChain. The documentation is located at https://langchain.readthedocs.io. You are given the following extracted parts of a long document and a question. Provide a conversational answer with a hyperlink to the documentation. You should only use hyperlinks that are explicitly listed as a source in the context. Do NOT make up a hyperlink that is not listed. If the question includes a request for code, provide a code block directly from the documentation. If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer. If the question is not about LangChain, politely inform them that you are tuned to only answer questions about LangChain. Question: {question} ========= {context} ========= Answer in Markdown:
For our purposes, we’ve just replaced the references to LangChain with WP Fusion, and left the rest the same.
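As a quick illustration, here’s roughly what the adapted template looks like with those swaps applied; this is a sketch of the edit, not the verbatim contents of chain.py:

```python
# Adapted prompt: LangChain references swapped for WP Fusion,
# docs URL pointed at our documentation (abbreviated here).
template = """You are an AI assistant for WP Fusion. \
The documentation is located at https://wpfusion.com/documentation/. \
You are given the following extracted parts of a long document and a question. \
Provide a conversational answer with a hyperlink to the documentation. \
You should only use hyperlinks that are explicitly listed as a source in the context. \
Do NOT make up a hyperlink that is not listed. \
If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer. \
If the question is not about WP Fusion, politely inform them that you are tuned \
to only answer questions about WP Fusion.
Question: {question}
=========
{context}
=========
Answer in Markdown:"""

# The chain fills the two placeholders at query time, e.g.:
prompt = template.format(
    question="How do I sync order statuses?",
    context="Source: https://wpfusion.com/documentation/ ...",
)
```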
I may run some experiments with different prompts later 📝
Start a chat
Finally, you can start a chat with your model by running
python3 app.py
This will start a Gradio server running on your local machine, and give you the URL to connect to it. In my case it’s http://127.0.0.1:7860
Open the URL in your browser, paste in your OpenAI key, and start a chat. If everything is working, you should get a response. If not, check your console for any errors.
To share the chat widget publicly, set share=True
in the launch()
function and Gradio will give you a public URL where others can try out your bot.
Examples

Here we can see the bot has correctly understood WP Fusion’s features with regards to WooCommerce, and can provide a code example from the docs for customizing which order statuses are synced with your CRM.

Here the bot understands that WP Fusion has a Paid Memberships Pro integration, the available features, and also the ecommerce plugins that WP Fusion supports.
It can suggest some basic strategies for using WP Fusion with an LMS, and provides steps for setting up the initial connection to HubSpot.
Each reply links to the relevant page in the WP Fusion documentation.

The bot can understand the gist of a question and locate the relevant article.

When it can’t help, it provides contact information.

The bot is not as good as ChatGPT at modifying an existing code snippet. While the answer here is correct, I would have preferred a complete example for the pending order status, like in the previous prompt.