Troubleshooting Guide
This guide helps you diagnose and fix common issues with scmd.
Quick Diagnosis
First step: Run the doctor command to check your setup:
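scmd doctor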
This will check:

- ✅ scmd installation
- ✅ Models downloaded
- ✅ llama-server availability
- ✅ System resources
- ✅ Backend connectivity
Common Issues
1. "Cannot connect to llama-server"¶
Problem: Commands fail with connection errors.
Solutions:
- Let scmd auto-start the server (recommended)
- Manually start the server
- Check server status
- Use a cloud backend instead
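Sketches of each option, using commands that appear elsewhere in this guide:

# Auto-start: any query launches the server if it is not already running
echo "Hello world" | scmd /explain

# Manual start
scmd server start

# Check status
scmd server status

# Cloud backend (no local server needed)
export GROQ_API_KEY=your-key
scmd -b groq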
2. "llama-server not found"¶
Problem: llama-server binary is not installed.
Solutions:
macOS:
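# Install via Homebrew (the matching uninstall step appears later in this guide)
brew install llama.cpp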
Linux:
# Build from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make llama-server
sudo cp llama-server /usr/local/bin/
Alternative: Use cloud backends (no local installation needed):
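export GROQ_API_KEY=your-key
scmd -b groq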
3. GPU Out of Memory (OOM) Crashes
Problem: Server crashes with "kIOGPUCommandBufferCallbackErrorOutOfMemory".
Solutions:
- Restart in CPU mode (slower but stable)
- Use a smaller context size
- Switch to a smaller model
- Close other applications using GPU/memory
- Check memory with doctor
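Sketches of each option; the --cpu and -m flags and scmd doctor appear elsewhere in this guide, while the context-size flag name is an assumption (check scmd server start --help for the exact spelling):

# Restart in CPU mode
scmd server stop
scmd server start --cpu

# Smaller context size (--ctx-size is an assumed flag name)
scmd server start --ctx-size 2048

# Smaller model
scmd server start -m qwen2.5-0.5b

# Check memory and other system resources
scmd doctor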
Prevention: scmd now auto-detects available memory and tunes configuration automatically!
4. CPU Mode is Very Slow
Problem: Queries take 30-60+ seconds.
Explanation: CPU-only inference is inherently slow. This is expected behavior.
Solutions:
- Enable GPU acceleration (if you have a GPU)
- Use a smaller model
- Use a cloud backend for faster results
- Use Groq (free tier, very fast)
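Sketches of each option, built from commands used elsewhere in this guide:

# GPU acceleration: restart without --cpu; scmd auto-tunes GPU layers
scmd server stop
scmd server start

# Smaller model
scmd server start -m qwen2.5-0.5b

# Groq cloud backend (free tier, very fast)
export GROQ_API_KEY=your-key
scmd -b groq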
Performance expectations:

- CPU mode: 30-60 seconds per query (0.2-0.5 tokens/sec)
- GPU mode (M1/M2): 2-5 seconds per query (~20 tokens/sec)
- Cloud (OpenAI/Groq): 1-3 seconds per query
5. Model Not Downloaded
Problem: "Model 'xxx' not found"
Solution:
Models download automatically on first use, but you can also download manually:
# List available models
scmd models list
# Download specific model
scmd models download qwen2.5-1.5b
# Check downloaded models
scmd doctor
6. Port 8089 Already in Use
Problem: Another process is using port 8089.
Solutions:
- Let scmd use the existing server
- Stop the conflicting process
- Stop scmd's server
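Sketches of each option; lsof and kill are standard Unix tools, and the scmd commands appear elsewhere in this guide:

# Reuse the existing server: just run a query as normal
echo "Hello world" | scmd /explain

# Find and stop the conflicting process (replace <PID> with the reported PID)
lsof -i :8089
kill <PID>

# Stop scmd's own server
scmd server stop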
7. Commands Fail with Generic "Error"
Problem: Older versions reported a generic "Error" with no detail; this should be fixed now.
Solution: Update to the latest version, which includes improved error messages.
New error messages include:

- ❌ Clear description of the problem
- 💡 2-4 actionable solutions
- 🔗 Links to relevant documentation
8. Server Won't Start
Problem: scmd server start fails.
Debug steps:
- Check logs
- Run doctor
- Try manual start with debug
- Check disk space
- Check llama-server installation
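One command per step; the scmd commands and SCMD_DEBUG appear elsewhere in this guide, and df/which are standard Unix tools:

# Check logs
scmd server logs --tail 20

# Run diagnostics
scmd doctor

# Manual start with debug output
SCMD_DEBUG=1 scmd server start

# Check disk space (models live under ~/.scmd)
df -h ~

# Check llama-server installation
which llama-server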
Environment Variables
Control scmd behavior with environment variables:
# Disable auto-start (for debugging)
export SCMD_NO_AUTOSTART=1
# Enable debug output
export SCMD_DEBUG=1
# Set custom data directory
export SCMD_DATA_DIR=~/custom/path
# Suppress progress messages
export SCMD_QUIET=1
Getting More Help
- Check logs
- Run diagnostics
- Enable debug mode
- Report an issue:
  - GitHub: https://github.com/scmd/scmd/issues
  - Include the output from scmd doctor
  - Include relevant error messages
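The corresponding commands, all shown elsewhere in this guide:

# Check logs
scmd server logs --tail 20

# Run diagnostics
scmd doctor

# Enable debug mode
export SCMD_DEBUG=1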
Performance Tuning
Memory-Constrained Systems (< 8GB RAM)
# Use smallest model
scmd server start -m qwen2.5-0.5b --cpu
# Or use cloud backend
export GROQ_API_KEY=your-key
scmd -b groq
High-Performance Systems (16+ GB RAM)
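A reasonable sketch, using the larger models from the FAQ's size list below:

# With RAM to spare, a larger model trades speed for quality
scmd server start -m qwen2.5-7b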
M1/M2 Macs (8GB)
# Recommended: Medium model with auto-tuned settings
scmd server start -m qwen2.5-3b
# scmd will auto-tune context size and GPU layers
Verifying Installation
After installation, verify everything works:
# 1. Check installation
scmd doctor
# 2. Start server (should auto-start, but let's be explicit)
scmd server start
# 3. Test with simple query
echo "Hello world" | scmd /explain
# 4. Check status
scmd server status
# 5. View logs
scmd server logs --tail 20
Expected output:

- ✅ All scmd doctor checks pass (or have helpful recommendations)
- ✅ Server starts within 10-30 seconds
- ✅ Test query completes successfully
- ✅ No errors in logs
Uninstalling
To completely remove scmd:
# 1. Stop server
scmd server stop
# 2. Remove data directory
rm -rf ~/.scmd
# 3. Remove binary
rm $(which scmd)
# 4. (Optional) Uninstall llama.cpp
brew uninstall llama.cpp
FAQ
Q: Do I need to manually start llama-server?
A: No! As of the latest version, scmd automatically starts llama-server when needed.
Q: Can I use scmd without installing llama.cpp?
A: Yes! Use cloud backends:
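export GROQ_API_KEY=your-key
scmd -b groq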
Q: Why is CPU mode so slow?
A: CPU inference is inherently slow (30-60s per query). Use GPU mode or cloud backends for better performance.
Q: How much disk space do I need?
A: Models range from 400MB to 5GB:

- qwen2.5-0.5b: ~400MB
- qwen2.5-1.5b: ~1GB (default)
- qwen2.5-3b: ~2GB
- qwen3-4b: ~2.6GB
- qwen2.5-7b: ~4.7GB
Q: Can I use multiple models?
A: Yes! Download multiple models and switch between them:
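A sketch using commands shown elsewhere in this guide; one way to switch is to restart the server with a different -m value:

# Download more than one model
scmd models download qwen2.5-1.5b
scmd models download qwen2.5-3b

# Restart the server with a different model
scmd server stop
scmd server start -m qwen2.5-3b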
Q: Is my data sent to the cloud?
A: When using the llama.cpp backend: No - everything runs locally. When using cloud backends (OpenAI, Groq, etc.): Yes - data is sent to their servers.
Last Updated: January 2026
Version: 1.0.0