Placing a Model

RAG-DocBot uses a GGUF-format language model file to power the inference service. You must provide this file yourself.


Steps

  1. Download a compatible GGUF model.

    GGUF models are available on Hugging Face. Choose a model appropriate for your hardware (CPU or GPU) and available RAM.

  2. Rename the file to modelfile.gguf.

    The inference service expects the file at a specific path. Rename your downloaded file:

    mv your-downloaded-model.gguf modelfile.gguf

  3. Move the file to the models/ directory.

    mv modelfile.gguf ./models/

  4. Start (or restart) the services.

    docker compose up -d

    The inference service will load the model on startup.
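Before starting the services, it can save a restart cycle to confirm the file is actually a GGUF model. The sketch below is an optional helper, not part of RAG-DocBot itself; it relies on the fact that GGUF files begin with the four ASCII magic bytes "GGUF".

```shell
# Sketch: confirm a file exists and starts with the "GGUF" magic bytes.
# Not part of RAG-DocBot; a convenience check before `docker compose up`.
check_model() {
  local path="$1"
  [ -f "$path" ] || { echo "missing: $path"; return 1; }
  # GGUF files begin with the ASCII magic "GGUF"
  [ "$(head -c 4 "$path")" = "GGUF" ] || { echo "not a GGUF file: $path"; return 1; }
  echo "ok: $path"
}

# Usage: check_model ./models/modelfile.gguf
```

If the check fails, re-download the model rather than starting the stack against a truncated or mislabelled file.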


Notes

  • The model file must be named exactly modelfile.gguf and placed in the models/ directory created by the installer.
  • Larger models (e.g. 13B+ parameter models) require more RAM. For CPU-only servers, quantised (Q4 or Q5) variants are recommended.
  • If the inference service fails to start, check docker compose logs inference for model loading errors.
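As a rough rule of thumb for the RAM note above, you can pick a quantisation tier from the memory you have free. The thresholds and tier names below are illustrative assumptions (llama.cpp-style quant labels), not figures from the project:

```shell
# Sketch: map free RAM (GiB) to a quantisation tier.
# Thresholds and tier names are illustrative assumptions, not project guidance.
recommend_quant() {
  local ram_gb="$1"
  if   [ "$ram_gb" -ge 32 ]; then echo "Q8_0"    # plenty of headroom
  elif [ "$ram_gb" -ge 16 ]; then echo "Q5_K_M"  # quality/size balance
  else                            echo "Q4_K_M"  # tighter CPU-only hosts
  fi
}

# Example (Linux): recommend_quant "$(free -g | awk '/^Mem:/ {print $7}')"
```

Whatever tier you choose, the resulting file still needs to be renamed to modelfile.gguf and placed in models/ as described above.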