# Provider Setup Guide
Configure and optimize each AI provider for llmswap.
## Table of Contents

- Overview
- OpenAI
- Anthropic Claude
- Google Gemini
- IBM Watson
- Groq
- Cohere
- Perplexity
- Ollama (Local Models)
- Provider Selection Strategy
- Auto-Fallback Chain
- Troubleshooting
## Overview
llmswap supports 8 AI providers, each with unique strengths:
| Provider | Best For | Starting Cost | Speed |
|---|---|---|---|
| OpenAI | General purpose, GPT-4 | $0.01/1K tokens | Fast |
| Anthropic | Long context, Claude 3.5 | $0.003/1K tokens | Fast |
| Google Gemini | Cost-effective, multimodal | $0.00025/1K tokens | Fast |
| IBM Watson | Enterprise, secure | $0.0002/1K tokens | Medium |
| Groq | Ultra-fast inference | $0.00005/1K tokens | Ultra-fast |
| Cohere | RAG, enterprise search | $0.0005/1K tokens | Fast |
| Perplexity | Web-connected, search | $0.0002/1K tokens | Medium |
| Ollama | Local, private | Free | Varies |
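To see the trade-offs in practice, you can send the same prompt to two providers and compare the answers. A quick sketch, assuming the relevant API keys are already exported:

```bash
# Same prompt, two providers: Groq for raw speed, Gemini for low cost
llmswap ask "Explain HTTP caching in two sentences" --provider groq
llmswap ask "Explain HTTP caching in two sentences" --provider gemini
```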
## OpenAI

### Setup

1. **Get API Key**: platform.openai.com/api-keys
2. **Set Environment Variable**:

```bash
export OPENAI_API_KEY="sk-..."
```
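Note that `export` only lasts for the current shell session. To make the key permanent, a common approach is to append it to your shell profile (a sketch assuming bash; zsh users would use `~/.zshrc` instead):

```bash
# Persist the key across shell sessions (bash)
echo 'export OPENAI_API_KEY="sk-..."' >> ~/.bashrc
source ~/.bashrc
```

The same pattern applies to every provider key in this guide.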
### Available Models

- `gpt-4o` - Most capable, multimodal (default)
- `gpt-4-turbo` - Faster GPT-4 variant
- `gpt-4` - Original GPT-4
- `gpt-4o-mini` - Small, fast, cheap
- `o1-preview` - Reasoning model
- `o1-mini` - Smaller reasoning model
- `gpt-3.5-turbo` - Fast, affordable
### Configuration

```bash
# Set default model
llmswap config set provider.models.openai gpt-4-turbo

# Use specific model
llmswap chat --provider openai --model gpt-4o-mini
```
### Best Practices

- Use `gpt-4o-mini` for simple tasks (90% cheaper)
- Use `gpt-4o` for complex reasoning
- Use `o1-preview` for math/coding problems
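For example, applying the first two tips (a sketch; the prompts are placeholders):

```bash
# Cheap, fast model for a routine rewrite
llmswap ask "Reword this politely: send me the report today" \
  --provider openai --model gpt-4o-mini

# Full model for multi-step reasoning
llmswap ask "Design a schema for a multi-tenant billing system" \
  --provider openai --model gpt-4o
```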
## Anthropic Claude

### Setup

1. **Get API Key**: console.anthropic.com
2. **Set Environment Variable**:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```
### Available Models

- `claude-3-5-sonnet-20241022` - Latest, most capable (default)
- `claude-3-opus` - Powerful, more expensive
- `claude-3-5-haiku` - Fast, affordable
- `claude-3-haiku` - Fastest, cheapest
### Configuration

```bash
# Set default model
llmswap config set provider.models.anthropic claude-3-5-sonnet-20241022

# Use specific model
llmswap chat --provider anthropic --model claude-3-haiku
```
### Best Practices
- Claude excels at long documents (200K context)
- Best for creative writing and analysis
- Use Haiku for simple tasks (10x cheaper)
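One way to use the long context window from the CLI is shell command substitution (a sketch; `report.txt` is a placeholder file, and extremely large files can exceed shell argument limits):

```bash
# Feed an entire document into one prompt (placeholder file)
llmswap ask "Summarize the key decisions in this document: $(cat report.txt)" \
  --provider anthropic
```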
## Google Gemini

### Setup

1. **Get API Key**: makersuite.google.com/app/apikey
2. **Set Environment Variable**:

```bash
export GEMINI_API_KEY="..."
```
### Available Models

- `gemini-1.5-pro` - Most capable (default)
- `gemini-1.5-flash` - Fast, efficient
- `gemini-2.0-flash-exp` - Experimental, cutting-edge
### Configuration

```bash
# Set default model
llmswap config set provider.models.gemini gemini-1.5-flash

# Use specific model
llmswap chat --provider gemini --model gemini-2.0-flash-exp
```
### Best Practices

- Extremely cost-effective (up to 90% cheaper than GPT-4)
- Great for multimodal tasks
- Flash models are perfect for high-volume workloads
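For high-volume batch work, Flash pairs naturally with a shell loop (a sketch; `questions.txt` is a placeholder file with one prompt per line):

```bash
# One cheap Flash query per input line (placeholder file)
while IFS= read -r prompt; do
  llmswap ask "$prompt" --provider gemini --model gemini-1.5-flash
done < questions.txt
```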
## IBM Watson

### Setup

1. **Get Credentials**: cloud.ibm.com/catalog/services/watsonx-ai
2. **Set Environment Variables**:

```bash
export WATSONX_API_KEY="..."
export WATSONX_PROJECT_ID="..."
```
### Available Models

- `granite-13b-chat` - General purpose (default)
- `granite-3.1-8b-instruct` - Efficient
- `granite-3.1-2b-instruct` - Lightweight
### Configuration

```bash
# Set default model
llmswap config set provider.models.watsonx granite-3.1-8b-instruct

# Use Watson
llmswap chat --provider watsonx
```
### Best Practices
- Enterprise-grade security and compliance
- Best for regulated industries
- Supports custom model deployment
## Groq

### Setup

1. **Get API Key**: console.groq.com
2. **Set Environment Variable**:

```bash
export GROQ_API_KEY="gsk_..."
```
### Available Models

- `llama-3.3-70b-versatile` - Most capable (default)
- `llama-3.1-8b-instant` - Ultra-fast
- `mixtral-8x7b-32768` - Good for code
### Configuration

```bash
# Set default model
llmswap config set provider.models.groq llama-3.1-8b-instant

# Use Groq for speed
llmswap generate "python web scraper" --provider groq
```
### Best Practices
- Fastest inference (840+ tokens/second)
- Perfect for real-time applications
- 5-15x faster than other providers
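To measure the speed difference on your own prompts, the shell's `time` is enough (a sketch):

```bash
# Rough wall-clock comparison of the same prompt on two providers
time llmswap ask "Write a haiku about compilers" --provider groq
time llmswap ask "Write a haiku about compilers" --provider openai
```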
## Cohere

### Setup

1. **Get API Key**: dashboard.cohere.com/api-keys
2. **Set Environment Variable**:

```bash
export COHERE_API_KEY="..."
```
### Available Models

- `command-r-plus-08-2024` - Most capable (default)
- `command-r-03-2024` - Efficient
- `aya-expanse-32b` - Multilingual
### Configuration

```bash
# Set default model
llmswap config set provider.models.cohere command-r-plus-08-2024

# Use Cohere
llmswap chat --provider cohere
```
### Best Practices
- Excellent for RAG applications
- Strong multilingual support
- Good for enterprise search
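For instance, the multilingual Aya model can be selected for a single query (a sketch; the prompt is a placeholder):

```bash
# Route a multilingual task to the Aya model
llmswap ask "Translate 'release notes' into Spanish, Hindi, and Japanese" \
  --provider cohere --model aya-expanse-32b
```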
## Perplexity

### Setup

1. **Get API Key**: perplexity.ai/settings/api
2. **Set Environment Variable**:

```bash
export PERPLEXITY_API_KEY="pplx-..."
```
### Available Models

- `sonar-pro` - Web-connected (default)
- `sonar` - Standard web search
- `sonar-reasoning` - Complex reasoning
### Configuration

```bash
# Set default model
llmswap config set provider.models.perplexity sonar-pro

# Use for web-connected queries
llmswap ask "latest news about AI" --provider perplexity
```
### Best Practices
- Real-time web access for current information
- Automatic source citations
- Best for research and fact-checking
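Because Sonar models search the web at query time, they suit questions whose answers change daily (a sketch):

```bash
# Current-events query against the web-connected model
llmswap ask "What were the major AI model releases this month?" \
  --provider perplexity --model sonar-pro
```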
## Ollama (Local Models)

### Setup

1. **Install Ollama**:
   - macOS/Linux: `curl -fsSL https://ollama.ai/install.sh | sh`
   - Windows: Download from ollama.ai
2. **Pull Models**:

```bash
# Popular models
ollama pull llama3.1   # 8B parameters
ollama pull mistral    # 7B parameters
ollama pull codellama  # Code-focused
ollama pull phi3       # Microsoft's 3.8B
```

3. **Start Ollama Service**:

```bash
ollama serve  # Usually auto-starts
```
### Available Models

Run `ollama list` to see installed models:

- `llama3.1` - Meta's latest (default)
- `mistral` - Fast, efficient
- `codellama` - Optimized for code
- `phi3` - Small but capable
- `gemma2` - Google's open model
- `qwen2.5-coder` - Excellent for coding
### Configuration

```bash
# Set default model
llmswap config set provider.models.ollama llama3.1

# Use local model
llmswap chat --provider ollama --model codellama
```
### Best Practices
- Completely free and private
- No internet required after setup
- Speed depends on your hardware
- Use smaller models (3B-7B) for speed
- Use larger models (13B-70B) for quality
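Following the last two tips, you might keep one small and one large model installed and pick per task (a sketch; the `llama3.1:70b` tag assumes the Ollama registry's current naming, and the 70B pull needs substantial disk space and RAM):

```bash
# Small model for quick interactive sessions
ollama pull phi3
llmswap chat --provider ollama --model phi3

# Larger model when quality matters more than latency
ollama pull llama3.1:70b
llmswap ask "Review this function for edge cases" --provider ollama --model llama3.1:70b
```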
## Provider Selection Strategy

### By Use Case
| Use Case | Recommended Provider | Why |
|---|---|---|
| General chat | Anthropic Claude | Best conversation quality |
| Code generation | OpenAI GPT-4 | Strong coding abilities |
| Fast responses | Groq | 5-15x faster |
| Cost savings | Gemini | 90% cheaper |
| Privacy | Ollama | Local, no data sharing |
| Web search | Perplexity | Real-time information |
| Enterprise | Watson | Compliance, security |
| RAG/Search | Cohere | Optimized for retrieval |
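If you switch providers often, a small shell helper can encode this table (a sketch; `ask_for` and its task labels are placeholders, not part of llmswap):

```bash
# Route a prompt to a provider based on task type
ask_for() {
  local task="$1"; shift
  case "$task" in
    code)    llmswap ask "$@" --provider openai ;;
    fast)    llmswap ask "$@" --provider groq ;;
    cheap)   llmswap ask "$@" --provider gemini ;;
    search)  llmswap ask "$@" --provider perplexity ;;
    private) llmswap ask "$@" --provider ollama ;;
    *)       llmswap ask "$@" --provider anthropic ;;
  esac
}

ask_for code "Write a binary search in Python"
```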
### By Budget

```bash
# See cost comparison
llmswap compare --input-tokens 10000 --output-tokens 5000

# Monthly cost estimate
llmswap costs --estimate --daily-queries 100
```
## Auto-Fallback Chain
llmswap automatically falls back through providers if one fails:
1. Anthropic (if configured)
2. OpenAI (if configured)
3. Gemini (if configured)
4. Cohere (if configured)
5. Perplexity (if configured)
6. Watson (if configured)
7. Groq (if configured)
8. Ollama (if available)
Configure fallback order:

```bash
llmswap config set provider.fallback_order "anthropic,gemini,ollama"
```
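You can watch the fallback happen by hiding one key in a subshell (a sketch; it assumes at least two providers in your chain are configured):

```bash
# With the Anthropic key hidden, the request should fall through
# to the next configured provider in the chain
(unset ANTHROPIC_API_KEY; llmswap ask "hello")
```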
## Troubleshooting

### Check Provider Status

```bash
llmswap providers
```
### Common Issues

**"API key invalid"**
- Verify the key is correct: `echo $PROVIDER_API_KEY`
- Check key permissions on the provider dashboard
- Regenerate the key if needed

**"Rate limit exceeded"**
- Wait a few minutes
- Upgrade your plan
- Switch to a different provider: `/switch gemini`

**"Model not found"**
- Check available models: `llmswap config get provider.models`
- Update the model name: `llmswap config set provider.models.openai gpt-4o`

**"Ollama not responding"**
- Check the service: `curl http://localhost:11434/api/tags`
- Restart: `ollama serve`
- Verify the model is installed: `ollama list`
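Before digging deeper, a quick audit of which keys are actually set often finds the problem (a bash sketch; it checks only the variables named in this guide):

```bash
# Print set/missing status for every provider key in this guide (bash)
for var in OPENAI_API_KEY ANTHROPIC_API_KEY GEMINI_API_KEY \
           WATSONX_API_KEY WATSONX_PROJECT_ID GROQ_API_KEY \
           COHERE_API_KEY PERPLEXITY_API_KEY; do
  if [ -n "${!var}" ]; then echo "$var: set"; else echo "$var: MISSING"; fi
done
```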
Ready to start? Check our Getting Started guide or explore Examples.