Troubleshooting
Unknown Runtime Specified nvidia
Symptom: Docker Compose fails with an error similar to:
Error response from daemon: Unknown runtime specified nvidia
Cause: The NVIDIA Container Runtime is not installed or not configured as a Docker runtime.
Fix:
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Then retry docker compose up.
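To confirm the fix took effect before retrying, you can check whether Docker now lists nvidia among its registered runtimes. The sketch below assumes the "Runtimes:" line that docker info prints in its server section. has_nvidia_runtime is a hypothetical helper name, and because it only matches text, treat it as a quick sanity check rather than a guarantee.

```shell
# Hypothetical helper: reads `docker info` output on stdin and succeeds
# only if the "Runtimes:" line lists a runtime named exactly "nvidia".
has_nvidia_runtime() {
  grep -Eq '^[[:space:]]*Runtimes:(.*[[:space:]])?nvidia([[:space:]]|$)'
}
```

For example: docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered". If nothing is printed, re-run the nvidia-ctk command and restart Docker.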
Out-of-Memory (OOM)
Symptom: The inference container is killed by the OOM killer, or you see errors such as CUDA out of memory or cudaMalloc failed.
Cause: The model weights and/or the KV cache for the configured context window exceed the RAM available on the Jetson, which is shared between the CPU and GPU.
Fix:
- Reduce N_GPU_LAYERS to offload fewer layers to the GPU (e.g., change from -1 to 20).
- Reduce N_CTX to lower the KV cache memory footprint (e.g., 4096 → 2048).
- Switch to a smaller or more aggressively quantized model (e.g., Q4_K_M → Q2_K, or 7B → 3B).
See Model Selection & Sizing for recommended settings.
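To see why lowering N_CTX helps, you can estimate the KV cache size: it holds one key and one value vector per layer for every token of context, so it grows linearly with context length. The sketch below uses plain shell arithmetic. kv_cache_mib is a hypothetical helper, and the model dimensions in the usage example (32 layers, 32 KV heads, head dimension 128, 2-byte f16 entries, roughly a Llama-2-7B-style model) are illustrative assumptions; the real values depend on your model and on the cache type your inference server uses.

```shell
# Hypothetical helper: estimate KV cache size in MiB.
# Arguments: layers ctx_len kv_heads head_dim bytes_per_element
# Formula: 2 (K and V) * layers * ctx_len * kv_heads * head_dim * bytes
kv_cache_mib() {
  layers=$1 ctx=$2 kv_heads=$3 head_dim=$4 elem_bytes=$5
  bytes=$((2 * layers * ctx * kv_heads * head_dim * elem_bytes))
  echo $((bytes / 1048576))
}
```

Under those assumptions, kv_cache_mib 32 4096 32 128 2 prints 2048 and kv_cache_mib 32 2048 32 128 2 prints 1024, so halving N_CTX halves the cache.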
Inference Is Very Slow
Symptom: Token generation is much slower than expected (e.g., < 1 token/second).
Cause: The model is running on CPU only, or clock speeds are throttled due to thermal limits.
Fix:
- Verify that N_GPU_LAYERS is set to a non-zero value (use -1 to offload all layers).
- Check that the NVIDIA Container Runtime is active and the container has GPU access:
docker exec <inference-container-name> nvidia-smi
- Set the maximum power mode and optionally lock clocks:
sudo nvpmodel -m 0
sudo jetson_clocks
- Check for thermal throttling: if the device overheats, it reduces clock speeds. Ensure adequate cooling.
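To check for overheating from the command line, you can read the kernel's thermal zones, which Linux exposes under /sys/class/thermal with temperatures in millidegrees Celsius. This is a minimal sketch: show_thermal_zones is a hypothetical helper, not a JetPack tool, and the zone names it prints (for example cpu-thermal or gpu-thermal) vary by Jetson module and L4T release.

```shell
# Hypothetical helper: print each thermal zone's name and temperature.
# Linux sysfs reports temp in millidegrees Celsius; this prints whole degrees.
# Optional argument: base directory (defaults to /sys/class/thermal).
show_thermal_zones() {
  base=${1:-/sys/class/thermal}
  for zone in "$base"/thermal_zone*; do
    # Skip if the glob matched nothing or the zone has no readable temp.
    [ -r "$zone/temp" ] || continue
    zone_type=$(cat "$zone/type" 2>/dev/null || echo unknown)
    milli=$(cat "$zone/temp")
    echo "$zone_type: $((milli / 1000)) C"
  done
}
```

Running show_thermal_zones a few times during token generation shows whether temperatures keep climbing under load, which suggests throttling is likely.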
Model Not Found
Symptom: The inference server fails to start with a message such as model file not found or no such file or directory.
Cause: The GGUF model file is not present at the path specified by MODEL_PATH.
Fix:
- Confirm the model file exists on the host:
ls -lh ./models/
- Check that MODEL_PATH in your .env file matches the actual filename, including the full path relative to the project root.
- Ensure the models/ directory is correctly mounted into the container (check the volumes section in docker-compose.jetson.yml).
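The first two checks can be combined into a small function run from the project root. This sketch assumes the .env file uses plain KEY=value lines and that MODEL_PATH holds a host path relative to the project root, as described above. check_model_path is a hypothetical helper name, and it does not inspect the container mount, so the volumes check still applies.

```shell
# Hypothetical helper: read MODEL_PATH from an env file and confirm the file exists.
# Assumes plain KEY=value lines; the last MODEL_PATH= line wins.
# Relative paths resolve against the current directory, so run it from the project root.
check_model_path() {
  env_file=${1:-.env}
  [ -r "$env_file" ] || { echo "cannot read $env_file" >&2; return 1; }
  # Drop quote characters and any trailing CR from Windows line endings.
  model_path=$(grep -E '^MODEL_PATH=' "$env_file" | tail -n 1 | cut -d= -f2- | tr -d "\"'\r")
  if [ -z "$model_path" ]; then
    echo "MODEL_PATH is not set in $env_file" >&2
    return 1
  fi
  if [ -f "$model_path" ]; then
    echo "found: $model_path"
  else
    echo "missing: $model_path" >&2
    return 1
  fi
}
```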
CUDA Driver Version Mismatch
Symptom: Container logs show an error such as:
CUDA driver version is insufficient for CUDA runtime version
Cause: The JetPack / L4T version on the host does not match the CUDA version expected by the container image.
Fix:
- Verify your JetPack version by checking the L4T release on the host:
cat /etc/nv_tegra_release
- Ensure you are running JetPack 6.x (L4T R36.x). Earlier JetPack versions are not supported.
- If you have recently updated JetPack, rebuild the inference image to pick up the correct L4T base:
docker compose -f docker-compose.yml -f docker-compose.jetson.yml build inference
docker compose -f docker-compose.yml -f docker-compose.jetson.yml up -d
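The version check can also be scripted, for example so a setup script refuses to build on an unsupported release. This sketch assumes the usual first line of /etc/nv_tegra_release, of the form "# R36 (release), REVISION: 3.0, GCID: ...", and treats major release 36 as JetPack 6.x, as stated above. check_l4t_release is a hypothetical helper.

```shell
# Hypothetical helper: parse the L4T release and succeed only on R36 (JetPack 6.x).
# Optional argument: path to the release file (defaults to /etc/nv_tegra_release).
check_l4t_release() {
  release_file=${1:-/etc/nv_tegra_release}
  [ -r "$release_file" ] || { echo "cannot read $release_file" >&2; return 1; }
  # Expected first line: "# R36 (release), REVISION: 3.0, GCID: ..., BOARD: ..."
  major=$(head -n 1 "$release_file" | sed -n 's/^# R\([0-9][0-9]*\).*/\1/p')
  revision=$(head -n 1 "$release_file" | sed -n 's/.*REVISION: \([0-9.]*\).*/\1/p')
  if [ "$major" = "36" ]; then
    echo "L4T R$major.$revision: JetPack 6.x"
  else
    echo "L4T R${major:-?}.${revision:-?}: not JetPack 6.x" >&2
    return 1
  fi
}
```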