# Provider Setup Guide
Configure and optimize each AI provider for llmswap.
## Table of Contents

- Overview
- OpenAI
- Anthropic Claude
- Google Gemini
- IBM Watson
- Groq
- Cohere
- Perplexity
- Ollama (Local Models)
- Provider Selection Strategy
- Troubleshooting
## Overview
llmswap supports 8 AI providers, each with unique strengths:
| Provider | Best For | Starting Cost | Speed |
|---|---|---|---|
| OpenAI | General purpose, GPT-4 | $0.01/1K tokens | Fast |
| Anthropic | Long context, Claude 3.5 | $0.003/1K tokens | Fast |
| Google Gemini | Cost-effective, multimodal | $0.00025/1K tokens | Fast |
| IBM Watson | Enterprise, secure | $0.0002/1K tokens | Medium |
| Groq | Ultra-fast inference | $0.00005/1K tokens | Ultra-fast |
| Cohere | RAG, enterprise search | $0.0005/1K tokens | Fast |
| Perplexity | Web-connected, search | $0.0002/1K tokens | Medium |
| Ollama | Local, private | Free | Varies |
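To see what these per-token rates mean for a concrete workload, you can run the cost-comparison command covered later in the By Budget section:

```bash
# Compare estimated cost across providers for a sample workload
llmswap compare --input-tokens 10000 --output-tokens 5000
```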
## OpenAI
### Setup

1. **Get API Key**: platform.openai.com/api-keys
2. **Set Environment Variable**:
   ```bash
   export OPENAI_API_KEY="sk-..."
   ```
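To verify the key is picked up, a quick smoke test helps; this sketch uses the `providers` and `ask` commands shown later in this guide:

```bash
# Check that llmswap sees the provider as configured
llmswap providers

# One-off test query against OpenAI
llmswap ask "Say hello" --provider openai
```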
### Available Models

- `gpt-4o` - Most capable, multimodal (default)
- `gpt-4-turbo` - Faster GPT-4 variant
- `gpt-4` - Original GPT-4
- `gpt-4o-mini` - Small, fast, cheap
- `o1-preview` - Reasoning model
- `o1-mini` - Smaller reasoning model
- `gpt-3.5-turbo` - Fast, affordable
### Configuration

```bash
# Set default model
llmswap config set provider.models.openai gpt-4-turbo

# Use specific model
llmswap chat --provider openai --model gpt-4o-mini
```
### Best Practices

- Use `gpt-4o-mini` for simple tasks (90% cheaper)
- Use `gpt-4o` for complex reasoning
- Use `o1-preview` for math/coding problems
## Anthropic Claude

### Setup

1. **Get API Key**: console.anthropic.com
2. **Set Environment Variable**:
   ```bash
   export ANTHROPIC_API_KEY="sk-ant-..."
   ```
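An `export` only lasts for the current shell session. To persist a key across sessions, append it to your shell profile (bash/zsh shown; adjust for your shell):

```bash
# Persist the key in your shell profile
echo 'export ANTHROPIC_API_KEY="sk-ant-..."' >> ~/.bashrc
source ~/.bashrc
```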
Available Models
claude-3-5-sonnet-20241022
- Latest, most capable (default)claude-3-opus
- Powerful, more expensiveclaude-3-5-haiku
- Fast, affordableclaude-3-haiku
- Fastest, cheapest
### Configuration

```bash
# Set default model
llmswap config set provider.models.anthropic claude-3-5-sonnet-20241022

# Use specific model
llmswap chat --provider anthropic --model claude-3-haiku
```
### Best Practices
- Claude excels at long documents (200K context)
- Best for creative writing and analysis
- Use Haiku for simple tasks (10x cheaper)
## Google Gemini

### Setup

1. **Get API Key**: makersuite.google.com/app/apikey
2. **Set Environment Variable**:
   ```bash
   export GEMINI_API_KEY="..."
   ```
Available Models
gemini-1.5-pro
- Most capable (default)gemini-1.5-flash
- Fast, efficientgemini-2.0-flash-exp
- Experimental, cutting-edge
### Configuration

```bash
# Set default model
llmswap config set provider.models.gemini gemini-1.5-flash

# Use specific model
llmswap chat --provider gemini --model gemini-2.0-flash-exp
```
### Best Practices

- Extremely cost-effective (90% cheaper than GPT-4)
- Great for multimodal tasks
- Flash models are ideal for high-volume workloads (see the loop sketch below)
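For example, a batch job can pin the cheap flash model per call. This is a minimal sketch: the `notes/*.txt` files are hypothetical, and it assumes `ask` accepts the same `--model` flag shown for `chat`:

```bash
# Summarize a folder of text files on the low-cost flash model
# (assumes `ask` supports --model like `chat` does)
for f in notes/*.txt; do
  llmswap ask "Summarize in two sentences: $(cat "$f")" \
    --provider gemini --model gemini-1.5-flash
done
```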
## IBM Watson

### Setup

1. **Get Credentials**: cloud.ibm.com/catalog/services/watsonx-ai
2. **Set Environment Variables**:
   ```bash
   export WATSONX_API_KEY="..."
   export WATSONX_PROJECT_ID="..."
   ```
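Watson needs both variables set; a quick shell check can confirm this before you start a session:

```bash
# Confirm both watsonx credentials are present in the environment
[ -n "$WATSONX_API_KEY" ] && [ -n "$WATSONX_PROJECT_ID" ] \
  && echo "watsonx credentials set" \
  || echo "missing WATSONX_API_KEY or WATSONX_PROJECT_ID"
```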
Available Models
granite-13b-chat
- General purpose (default)granite-3.1-8b-instruct
- Efficientgranite-3.1-2b-instruct
- Lightweight
### Configuration

```bash
# Set default model
llmswap config set provider.models.watsonx granite-3.1-8b-instruct

# Use Watson
llmswap chat --provider watsonx
```
### Best Practices
- Enterprise-grade security and compliance
- Best for regulated industries
- Supports custom model deployment
## Groq

### Setup

1. **Get API Key**: console.groq.com
2. **Set Environment Variable**:
   ```bash
   export GROQ_API_KEY="gsk_..."
   ```
Available Models
llama-3.3-70b-versatile
- Most capable (default)llama-3.1-8b-instant
- Ultra-fastmixtral-8x7b-32768
- Good for code
### Configuration

```bash
# Set default model
llmswap config set provider.models.groq llama-3.1-8b-instant

# Use Groq for speed
llmswap generate "python web scraper" --provider groq
```
### Best Practices

- Fastest inference (840+ tokens/second)
- Perfect for real-time applications
- 5-15x faster than other providers (see the timing sketch below)
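You can get a rough feel for the speed difference with the shell's `time` builtin; actual numbers will vary with your network and account tier:

```bash
# Rough wall-clock comparison of the same prompt on two providers
time llmswap ask "Explain recursion in one sentence" --provider groq
time llmswap ask "Explain recursion in one sentence" --provider openai
```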
## Cohere

### Setup

1. **Get API Key**: dashboard.cohere.com/api-keys
2. **Set Environment Variable**:
   ```bash
   export COHERE_API_KEY="..."
   ```
Available Models
command-r-plus-08-2024
- Most capable (default)command-r-03-2024
- Efficientaya-expanse-32b
- Multilingual
### Configuration

```bash
# Set default model
llmswap config set provider.models.cohere command-r-plus-08-2024

# Use Cohere
llmswap chat --provider cohere
```
### Best Practices
- Excellent for RAG applications
- Strong multilingual support
- Good for enterprise search
## Perplexity

### Setup

1. **Get API Key**: perplexity.ai/settings/api
2. **Set Environment Variable**:
   ```bash
   export PERPLEXITY_API_KEY="pplx-..."
   ```
Available Models
sonar-pro
- Web-connected (default)sonar
- Standard web searchsonar-reasoning
- Complex reasoning
### Configuration

```bash
# Set default model
llmswap config set provider.models.perplexity sonar-pro

# Use for web-connected queries
llmswap ask "latest news about AI" --provider perplexity
```
### Best Practices
- Real-time web access for current information
- Automatic source citations
- Best for research and fact-checking
## Ollama (Local Models)

### Setup

1. **Install Ollama**:
   - **macOS/Linux**:
     ```bash
     curl -fsSL https://ollama.ai/install.sh | sh
     ```
   - **Windows**: Download from ollama.ai
2. **Pull Models**:
   ```bash
   # Popular models
   ollama pull llama3.1   # 8B parameters
   ollama pull mistral    # 7B parameters
   ollama pull codellama  # Code-focused
   ollama pull phi3       # Microsoft's 3.8B
   ```
3. **Start Ollama Service**:
   ```bash
   ollama serve  # Usually auto-starts
   ```
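Once the service is up, you can talk to it directly over Ollama's local HTTP API as a sanity check before involving llmswap:

```bash
# Query the local Ollama server directly (default port 11434)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```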
### Available Models

Run `ollama list` to see installed models:

- `llama3.1` - Meta's latest (default)
- `mistral` - Fast, efficient
- `codellama` - Optimized for code
- `phi3` - Small but capable
- `gemma2` - Google's open model
- `qwen2.5-coder` - Excellent for coding
### Configuration

```bash
# Set default model
llmswap config set provider.models.ollama llama3.1

# Use local model
llmswap chat --provider ollama --model codellama
```
### Best Practices
- Completely free and private
- No internet required after setup
- Speed depends on your hardware
- Use smaller models (3B-7B) for speed
- Use larger models (13B-70B) for quality
## Provider Selection Strategy

### By Use Case

| Use Case | Recommended Provider | Why |
|---|---|---|
| General chat | Anthropic Claude | Best conversation quality |
| Code generation | OpenAI GPT-4 | Strong coding abilities |
| Fast responses | Groq | 5-15x faster |
| Cost savings | Gemini | 90% cheaper |
| Privacy | Ollama | Local, no data sharing |
| Web search | Perplexity | Real-time information |
| Enterprise | Watson | Compliance, security |
| RAG/Search | Cohere | Optimized for retrieval |
### By Budget

```bash
# See cost comparison
llmswap compare --input-tokens 10000 --output-tokens 5000

# Monthly cost estimate
llmswap costs --estimate --daily-queries 100
```
### Auto-Fallback Chain

llmswap automatically falls back through providers if one fails, in this order:

1. Anthropic (if configured)
2. OpenAI (if configured)
3. Gemini (if configured)
4. Cohere (if configured)
5. Perplexity (if configured)
6. Watson (if configured)
7. Groq (if configured)
8. Ollama (if available)
Configure a custom fallback order:

```bash
llmswap config set provider.fallback_order "anthropic,gemini,ollama"
```
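To double-check what is currently active, this sketch assumes `config get` (used again in Troubleshooting below) reads the same paths that `config set` writes:

```bash
# Inspect the active fallback order
llmswap config get provider.fallback_order
```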
## Troubleshooting

### Check Provider Status

```bash
llmswap providers
```

### Common Issues
**“API key invalid”**

- Verify the key is set: `echo $PROVIDER_API_KEY` (substitute your provider's variable, e.g. `echo $OPENAI_API_KEY`)
- Check key permissions on the provider dashboard
- Regenerate the key if needed
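To see at a glance which provider keys are exported, without printing their values (variable names as used throughout this guide):

```bash
# List exported provider key names only
env | grep -oE '^(OPENAI|ANTHROPIC|GEMINI|WATSONX|GROQ|COHERE|PERPLEXITY)[A-Z_]*'
```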
**“Rate limit exceeded”**

- Wait a few minutes
- Upgrade your plan
- Switch to a different provider: `/switch gemini`
**“Model not found”**

- Check available models: `llmswap config get provider.models`
- Update the model name: `llmswap config set provider.models.openai gpt-4o`
**“Ollama not responding”**

- Check the service: `curl http://localhost:11434/api/tags`
- Restart: `ollama serve`
- Verify the model is installed: `ollama list`
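If `ollama serve` complains that the port is busy or exits right away, these generic shell checks can help narrow it down:

```bash
# See whether anything is already listening on Ollama's default port
lsof -i :11434

# Run the server in the background and inspect its log
nohup ollama serve > /tmp/ollama.log 2>&1 &
tail -n 20 /tmp/ollama.log
```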
Ready to start? Check our Getting Started guide or explore Examples.