Placing a Model

RAG-DocBot uses a GGUF-format language model file to power the inference service. You must provide this file yourself.


Steps

  1. Download a compatible GGUF model.

    GGUF models are available on Hugging Face. Choose a model appropriate for your hardware (CPU or GPU) and available RAM.

  2. Rename the file to modelfile.gguf.

    The inference service expects the file at a specific path. Rename your downloaded file:

    mv your-downloaded-model.gguf modelfile.gguf

  3. Move the file to the models/ directory.

    mv modelfile.gguf ./models/

  4. Start (or restart) the services.

    docker compose up -d

    The inference service will load the model on startup.
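Before starting the services, it can save a restart cycle to confirm the file is actually a GGUF model. The sketch below is an optional helper, not part of RAG-DocBot itself; it relies on the fact that GGUF files begin with the four ASCII magic bytes "GGUF".

```shell
# Sketch: confirm a file exists and starts with the "GGUF" magic bytes.
# Not part of RAG-DocBot; a convenience check before `docker compose up`.
check_model() {
  local path="$1"
  [ -f "$path" ] || { echo "missing: $path"; return 1; }
  # GGUF files begin with the ASCII magic "GGUF"
  [ "$(head -c 4 "$path")" = "GGUF" ] || { echo "not a GGUF file: $path"; return 1; }
  echo "ok: $path"
}

# Usage: check_model ./models/modelfile.gguf
```

If the check fails, re-download the model rather than starting the stack against a truncated or mislabelled file.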


Notes

  • The model file must be named exactly modelfile.gguf and placed in the models/ directory created by the installer.
  • Larger models (e.g. 13B+ parameter models) require more RAM. For CPU-only servers, quantised (Q4 or Q5) variants are recommended.
  • If the inference service fails to start, check docker compose logs inference for model loading errors.
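As a rough rule of thumb for the RAM note above, you can pick a quantisation tier from the memory you have free. The thresholds and tier names below are illustrative assumptions (llama.cpp-style quant labels), not figures from the project:

```shell
# Sketch: map free RAM (GiB) to a quantisation tier.
# Thresholds and tier names are illustrative assumptions, not project guidance.
recommend_quant() {
  local ram_gb="$1"
  if   [ "$ram_gb" -ge 32 ]; then echo "Q8_0"    # plenty of headroom
  elif [ "$ram_gb" -ge 16 ]; then echo "Q5_K_M"  # quality/size balance
  else                            echo "Q4_K_M"  # tighter CPU-only hosts
  fi
}

# Example (Linux): recommend_quant "$(free -g | awk '/^Mem:/ {print $7}')"
```

Whatever tier you choose, the resulting file still needs to be renamed to modelfile.gguf and placed in models/ as described above.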