RAG Service
The RAG service is a Python web server running on Flask, which acts as the gateway to modern AI, NLP, and ML technologies. It handles the creation of embeddings, querying large language models (LLMs) for the RAGbot, and similar operations.
High Resources
Depending on the use case and the available machine, the RAG service may be resource-intensive, particularly in terms of RAM and GPU usage.
User Setup
For this, it is assumed that the repository has already been cloned in a prior step. Afterwards, simply start the service via Docker Compose:
The RAG service should now be up and running on the port mapped in the Dockerfile. See further down for information about model usage and customizable settings of the service.
Developer Setup
After cloning the repository, navigate to the rag folder. There, create a new Python environment using a tool of your choice:
The name env is already included in .gitignore, so it's recommended to use that name if possible.
Activate the environment:
Activate on Windows
On Windows, activate the environment with the following command:
Then, install the dependencies and start the service:
Settings
The RAGBot can be configured with different LLMs for inference and user interaction. It is recommended to use either an OpenAI model (in which case an OpenAI API key must be set) or a locally hosted Ollama server.
In both cases, the configuration must be defined in the UCE Configuration. The relevant excerpt is shown below:
Example RAGBot settings within uceConfig.json
{
"settings": {
"rag": {
"models": [
{
"model": "ollama/gemma3:latest",
"url": "http://ollama.llm.texttechnologylab.org/",
"apiKey": "",
"displayName": "Gemma3 (4.3B - Google)"
},
{
"model": "ollama/deepseek-r1:latest",
"url": "http://ollama.llm.texttechnologylab.org/",
"apiKey": "",
"displayName": "DeepSeek-R1 (7.6B - DeepSeek)"
},
{
"model": "openai/o4-mini-2025-04-16",
"url": "",
"apiKey": "YOUR_API_KEY",
"displayName": "OpenAI's O4 Mini"
}
]
}
}
}
For OpenAI models, you can specify any model name listed in the OpenAI model catalog. UCE provides the option for users to select which model they want to use from the list of models defined in the UCE configuration.