(2025-01-31) Lab Notebook: How to run RAG on the HPCC
Warning
This is a Lab Notebook that describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.
Retrieval-Augmented Generation (RAG) is a technique that grants generative artificial intelligence models information retrieval capabilities. It modifies interactions with a large language model so that the model responds to user queries with reference to a specified set of documents. This allows the model to generate more accurate and relevant responses.
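The idea can be sketched in a few lines of Python. This is a toy illustration only: real RAG systems (including the apps below) score documents with embedding models, while this sketch uses simple word overlap to stay self-contained.

```python
# Toy RAG sketch: retrieve the most relevant document, then
# prepend it to the prompt sent to the language model.
# Word-overlap scoring is a stand-in for real embedding-based scoring.

def score(query: str, doc: str) -> int:
    """Count how many query words appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document with the highest overlap score."""
    return max(docs, key=lambda d: score(query, d))

def build_prompt(query: str, docs: list[str]) -> str:
    """Augment the user query with the retrieved context."""
    context = retrieve(query, docs)
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The HPCC login nodes are for light work only.",
    "OnDemand apps run inside interactive HPCC jobs.",
]
print(build_prompt("What are login nodes for?", docs))
```

The model then answers with the retrieved text in front of it, rather than from its training data alone.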
At this time these apps can only access files already on the HPCC. If you want to transfer files from your local machine to the HPCC, view our File Transfer documentation.
OpenWebUI
Before you open the OpenWebUI OnDemand app, click on the advanced options. Here you will see a few options related to RAG. You can set a RAG directory, which is where the model will look for documents; by default, this is ~/.cache/openwebui/docs, but you can change it to any directory you want, or simply place your files in the default directory. You can also set the 'RAG Top K', which sets the number of results to consider when using RAG (the default is 5). If you have a SQLite or Postgres database, you can set the 'Database URL' to its location. Once these are set, launch the app.
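The 'RAG Top K' setting can be pictured as follows: every chunk of your documents gets a relevance score, and only the k highest-scoring chunks are added to the model's context. The scores below are made-up placeholders; OpenWebUI computes them with an embedding model.

```python
# Sketch of the 'RAG Top K' setting: rank all chunks by relevance
# and keep only the k best for the model's context window.
# Scores are illustrative placeholders, not real embedding scores.

def top_k(scored_chunks: dict[str, float], k: int = 5) -> list[str]:
    """Return the k chunks with the highest relevance scores."""
    ranked = sorted(scored_chunks, key=scored_chunks.get, reverse=True)
    return ranked[:k]

scores = {"chunk A": 0.91, "chunk B": 0.40, "chunk C": 0.75,
          "chunk D": 0.12, "chunk E": 0.88, "chunk F": 0.66}
print(top_k(scores, k=5))  # the five most relevant chunks; 'chunk D' is dropped
```

A larger k gives the model more supporting text at the cost of a larger context; a smaller k keeps the context focused.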
Once the app has started, these settings cannot be changed; to change them, stop the app and launch a new session.
Once the app is open, you can either click the plus ('+') icon on the left side of the message window to upload a new document, or, if you have files in your RAG directory (either the one you specified or the default), type a pound sign ('#') followed by the name of the file you want to use; the file name will autocomplete. Press enter to add the file to the conversation context.
To change some RAG settings or refresh the directory, click on the user icon in the top right corner, then click 'Admin Panel', then at the top click on 'Settings', and then on the left side click on 'Documents'.
Here you can scan the RAG directory for new files, change the Embedding Model, and change some other settings.
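The Embedding Model is what turns your query and each document chunk into vectors, so that relevance can be measured as the similarity between those vectors. A minimal sketch of that idea, with made-up three-dimensional vectors standing in for real embeddings:

```python
import math

# Sketch of what an embedding model provides: each text becomes a
# vector, and relevance is the cosine similarity between the query
# vector and each document vector. These vectors are invented for
# illustration; real embeddings have hundreds of dimensions.

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

query_vec = [0.9, 0.1, 0.2]           # pretend embedding of the query
doc_vecs = {"doc1": [0.8, 0.2, 0.1],  # similar direction -> high score
            "doc2": [0.1, 0.9, 0.3]}  # different direction -> low score
best = max(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]))
print(best)
```

Changing the Embedding Model changes how these vectors are produced, and therefore which documents are judged relevant.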
OpenWebUI also lets you perform web searches. To enable this, navigate to the 'Admin Panel', then 'Settings', then 'Web Search', and then enable the 'Web Search' toggle and select the search engine you want to use.
Now, when in a conversation, click the plus ('+') icon on the left side of the message window and enable the web search option. The model will then use the search engine you selected to find information relevant to the conversation.
For more information, please see the OpenWebUI RAG docs.
LM Studio
Open the LM Studio OnDemand app, or load the LM Studio module and run LM_Studio in the terminal. Once open, select the yellow chat icon on the left side of the screen. At the top, you can select which model you would like to use.
In the bottom message window, there is a paperclip icon. Click on it to upload your documents. A file browser will open in your home directory. Select one or more files to attach, then click the 'Open' button in the top right corner. You will see your files attached to the current message. The next message you send will do one of two things:
- Inject full context: If the file is small enough, LM Studio will inject the full context of the file into the conversation (i.e., the full text of the file will be put into the conversation context).
- RAG: If the file is too large, LM Studio will try to retrieve the most relevant parts of the file to the conversation and add those to the conversation context.
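The two behaviors above amount to a size check. The sketch below makes that decision explicit; the 2000-token budget is an assumption for illustration (not LM Studio's actual threshold), and tokens are approximated as whitespace-separated words.

```python
# Sketch of the attach-file behavior described above: small files are
# injected verbatim; large files go through retrieval instead.
# TOKEN_BUDGET is an illustrative assumption, not LM Studio's real limit.

TOKEN_BUDGET = 2000  # assumed context budget for full injection

def attach(file_text: str, query: str) -> str:
    tokens = file_text.split()
    if len(tokens) <= TOKEN_BUDGET:
        # Inject full context: the entire file enters the conversation.
        return file_text
    # RAG: keep only the lines that share words with the query
    # (a toy stand-in for embedding-based retrieval).
    query_words = set(query.lower().split())
    return "\n".join(line for line in file_text.splitlines()
                     if query_words & set(line.lower().split()))
```

Either way, the model ends up with file content in its context; the difference is whether it sees all of the file or only the parts judged relevant.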
For more information, please see the LM Studio RAG docs.