Troubleshooting Guide

This guide helps you diagnose and fix common issues with scmd.

Quick Diagnosis

First step: Run the doctor command to check your setup:

scmd doctor

This will check:

- ✅ scmd installation
- ✅ Models downloaded
- ✅ llama-server availability
- ✅ System resources
- ✅ Backend connectivity


Common Issues

1. "Cannot connect to llama-server"

Problem: Commands fail with connection errors.

Solutions:

  1. Let scmd auto-start the server (recommended):

    # Just run your command - server will start automatically
    echo "Hello" | scmd /explain
    

  2. Manually start the server:

    scmd server start
    

  3. Check server status:

    scmd server status
    

  4. Use a cloud backend instead:

    export OPENAI_API_KEY=your-key
    scmd -b openai /explain code.go
    


2. "llama-server not found"

Problem: llama-server binary is not installed.

Solutions:

macOS:

brew install llama.cpp

Linux:

# Build from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make llama-server
sudo cp llama-server /usr/local/bin/
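
After building, verify the binary is on your PATH (these are the same checks used in the debug steps later in this guide):

which llama-server
llama-server --help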

Alternative: Use cloud backends (no local installation needed):

export OPENAI_API_KEY=your-key
scmd -b openai /explain


3. GPU Out of Memory (OOM) Crashes

Problem: Server crashes with "kIOGPUCommandBufferCallbackErrorOutOfMemory" (a macOS Metal out-of-memory error).

Solutions:

  1. Restart in CPU mode (slower but stable):

    scmd server restart --cpu
    

  2. Use a smaller context size:

    scmd server restart -c 2048
    

  3. Switch to a smaller model:

    scmd server start -m qwen2.5-1.5b
    

  4. Close other applications using GPU/memory

  5. Check memory with doctor:

    scmd doctor
    

Prevention: Recent versions of scmd auto-detect available memory and tune the server configuration accordingly.
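
To confirm what the auto-tuner picked on your machine, one approach (assuming the chosen settings are echoed in debug output and the server logs, which this guide does not spell out) is to start with debug enabled and inspect the logs:

# Start with debug output, then review status and logs
SCMD_DEBUG=1 scmd server start
scmd server status
scmd server logs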


4. CPU Mode is Very Slow

Problem: Queries take 30-60+ seconds.

Explanation: CPU-only inference is inherently slow. This is expected behavior.

Solutions:

  1. Enable GPU acceleration (if you have a GPU):

    scmd server restart --gpu
    

  2. Use a smaller model:

    scmd server start -m qwen2.5-0.5b  # Fastest
    

  3. Use a cloud backend for faster results:

    export OPENAI_API_KEY=your-key
    scmd -b openai /explain
    

  4. Use Groq (free tier, very fast):

    export GROQ_API_KEY=your-key
    scmd -b groq /explain
    

Performance expectations:

- CPU mode: 30-60 seconds per query (0.2-0.5 tokens/sec)
- GPU mode (M1/M2): 2-5 seconds per query (~20 tokens/sec)
- Cloud (OpenAI/Groq): 1-3 seconds per query
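
These numbers vary with hardware and model size; to measure end-to-end latency on your own machine, time a simple query with the standard shell time builtin:

# Wall-clock time for a single query
time (echo "Hello world" | scmd /explain)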


5. Model Not Downloaded

Problem: "Model 'xxx' not found"

Solution:

Models download automatically on first use, but you can also download manually:

# List available models
scmd models list

# Download specific model
scmd models download qwen2.5-1.5b

# Check downloaded models
scmd doctor

6. Port 8089 Already in Use

Problem: Another process is using port 8089.

Solutions:

  1. Let scmd use the existing server:

    # scmd will automatically detect and use it
    scmd /explain code.go
    

  2. Stop the conflicting process:

    # Find process on port 8089
    lsof -ti:8089
    
    # Kill it
    kill $(lsof -ti:8089)
    
    # Restart scmd server
    scmd server start
    

  3. If scmd's own server is the one holding the port, stop it:

    scmd server stop
    


7. Commands Fail with Generic "Error"

Problem: Commands fail with a generic "Error" message and no details. This was old behavior and should be fixed in current versions.

Solution: Update to the latest version, which includes improved error messages:

# Build latest from source
go build -o scmd ./cmd/scmd

New error messages include:

- ❌ A clear description of the problem
- 💡 2-4 actionable solutions
- 🔗 Links to relevant documentation


8. Server Won't Start

Problem: scmd server start fails.

Debug steps:

  1. Check logs:

    scmd server logs
    

  2. Run doctor:

    scmd doctor
    

  3. Try manual start with debug:

    SCMD_DEBUG=1 scmd server start
    

  4. Check disk space:

    df -h ~/.scmd
    

  5. Check llama-server installation:

    which llama-server
    llama-server --help
    


Environment Variables

Control scmd behavior with environment variables:

# Disable auto-start (for debugging)
export SCMD_NO_AUTOSTART=1

# Enable debug output
export SCMD_DEBUG=1

# Set custom data directory
export SCMD_DATA_DIR=~/custom/path

# Suppress progress messages
export SCMD_QUIET=1
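
For a one-off run you can also set these inline instead of exporting them, e.g. to get debug output for a single command without the server auto-starting:

# Inline environment variables apply to this command only
SCMD_DEBUG=1 SCMD_NO_AUTOSTART=1 scmd server status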

Getting More Help

  1. Check logs:

    scmd server logs
    tail -f ~/.scmd/logs/llama-server.log
    

  2. Run diagnostics:

    scmd doctor
    

  3. Enable debug mode:

    SCMD_DEBUG=1 scmd /explain test.go
    

  4. Report an issue on GitHub: https://github.com/scmd/scmd/issues

     Include the output from scmd doctor and any relevant error messages.

Performance Tuning

Memory-Constrained Systems (< 8GB RAM)

# Use smallest model
scmd server start -m qwen2.5-0.5b --cpu

# Or use cloud backend
export GROQ_API_KEY=your-key
scmd -b groq

High-Performance Systems (16+ GB RAM)

# Use larger model with more context
scmd server start -m qwen2.5-7b -c 8192 --gpu

M1/M2 Macs (8GB)

# Recommended: Medium model with auto-tuned settings
scmd server start -m qwen2.5-3b
# scmd will auto-tune context size and GPU layers

Verifying Installation

After installation, verify everything works:

# 1. Check installation
scmd doctor

# 2. Start server (should auto-start, but let's be explicit)
scmd server start

# 3. Test with simple query
echo "Hello world" | scmd /explain

# 4. Check status
scmd server status

# 5. View logs
scmd server logs --tail 20

Expected output:

- ✅ All scmd doctor checks pass (or have helpful recommendations)
- ✅ Server starts within 10-30 seconds
- ✅ Test query completes successfully
- ✅ No errors in logs


Uninstalling

To completely remove scmd:

# 1. Stop server
scmd server stop

# 2. Remove data directory
rm -rf ~/.scmd

# 3. Remove binary
rm $(which scmd)

# 4. (Optional) Uninstall llama.cpp
brew uninstall llama.cpp

FAQ

Q: Do I need to manually start llama-server?

A: No! As of the latest version, scmd automatically starts llama-server when needed.

Q: Can I use scmd without installing llama.cpp?

A: Yes! Use cloud backends:

export OPENAI_API_KEY=your-key
scmd -b openai /explain

Q: Why is CPU mode so slow?

A: CPU inference is inherently slow (30-60s per query). Use GPU mode or cloud backends for better performance.

Q: How much disk space do I need?

A: Models range from 400MB to 5GB:

- qwen2.5-0.5b: ~400MB
- qwen2.5-1.5b: ~1GB (default)
- qwen2.5-3b: ~2GB
- qwen3-4b: ~2.6GB
- qwen2.5-7b: ~4.7GB
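
To see how much space scmd is actually using, check the data directory (default location shown; adjust if you set SCMD_DATA_DIR):

# Total size of models, logs, and other scmd data
du -sh ~/.scmd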

Q: Can I use multiple models?

A: Yes! Download multiple models and switch between them:

scmd server start -m qwen2.5-3b
scmd server restart -m qwen2.5-1.5b

Q: Is my data sent to the cloud?

A: When using the llama.cpp backend: no, everything runs locally. When using cloud backends (OpenAI, Groq, etc.): yes, your data is sent to their servers.


Last Updated: January 2026
Version: 1.0.0