Endprompt supports multiple LLM providers, giving you flexibility to choose the best model for each use case.
Supported Providers
OpenAI
GPT-4o, GPT-3.5-turbo, gpt-image-1
Anthropic
Claude 3 Opus, Sonnet, Haiku
Google
Gemini 2.0 Flash, Gemini 2.5 Pro
Model Comparison
| Model | Provider | Speed | Quality | Cost | Vision | Image Gen | Best For |
|---|---|---|---|---|---|---|---|
| GPT-4o | OpenAI | Fast | Excellent | Medium | ✅ | ❌ | General purpose, vision, balanced |
| GPT-4 | OpenAI | Medium | Excellent | High | ✅ | ❌ | Complex reasoning |
| GPT-3.5-turbo | OpenAI | Very Fast | Good | Low | ❌ | ❌ | Simple tasks, high volume |
| gpt-image-1 | OpenAI | Medium | Excellent | Per-image | ❌ | ✅ | Image generation and editing |
| Claude 3 Opus | Anthropic | Medium | Excellent | High | ✅ | ❌ | Nuanced analysis, long context |
| Claude 3 Sonnet | Anthropic | Fast | Very Good | Medium | ✅ | ❌ | Balanced performance |
| Claude 3 Haiku | Anthropic | Very Fast | Good | Low | ✅ | ❌ | Fast responses, simple tasks |
| Gemini 2.0 Flash | Google | Very Fast | Good | Low | ✅ | ❌ | Fast multimodal tasks |
| Gemini 2.5 Pro | Google | Medium | Excellent | Medium | ✅ | ❌ | Complex reasoning, long context |
Choosing a Model
For Quality-Critical Tasks
- GPT-4o or Claude 3 Opus
- Complex analysis, nuanced responses
- Higher cost, worth it for important outputs
For High-Volume Tasks
- GPT-3.5-turbo or Claude 3 Haiku
- Simple classification, extraction
- Low cost, high throughput
For Long Documents
- Claude 3 models
- Support up to 200K tokens of context
- Ideal for document analysis
For Image Tasks
- Vision (image inputs): GPT-4o, Claude 3 models, Gemini — analyze images alongside text prompts
- Image Generation (image outputs): gpt-image-1 — generate or edit images from text descriptions
- Image Editing: Use gpt-image-1 with both image inputs and outputs to edit existing images
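The guidance above can be sketched as a small selection helper. This is only an illustration: the `Task` class, the `suggest_models` function, and the order in which the checks run are assumptions made for this example, not part of Endprompt. The model names and the GPT-4o default come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a workload (not an Endprompt type)."""
    needs_image_output: bool = False
    needs_image_input: bool = False
    long_document: bool = False
    quality_critical: bool = False
    high_volume: bool = False


def suggest_models(task: Task) -> list[str]:
    """Return candidate models for a task, following the guidance on this page.

    The priority order of the checks is an assumption for illustration.
    """
    if task.needs_image_output:
        # Only gpt-image-1 is listed with image generation support.
        return ["gpt-image-1"]
    if task.long_document:
        # Claude 3 models are recommended for long documents.
        return ["Claude 3 Opus", "Claude 3 Sonnet", "Claude 3 Haiku"]
    if task.quality_critical:
        return ["GPT-4o", "Claude 3 Opus"]
    if task.high_volume:
        # GPT-3.5-turbo is listed without vision, so drop it for image inputs.
        if task.needs_image_input:
            return ["Claude 3 Haiku"]
        return ["GPT-3.5-turbo", "Claude 3 Haiku"]
    if task.needs_image_input:
        return ["GPT-4o", "Claude 3 Opus", "Claude 3 Sonnet", "Claude 3 Haiku",
                "Gemini 2.0 Flash", "Gemini 2.5 Pro"]
    # GPT-4o is described as the recommended default.
    return ["GPT-4o"]


print(suggest_models(Task(high_volume=True)))
```

Treat the result as a shortlist to test against your own prompts rather than a final answer.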
Model Capabilities
Context Window
How much input each model can handle:

| Model | Max Input Tokens |
|---|---|
| GPT-4o | 128,000 |
| GPT-4 | 128,000 |
| GPT-3.5-turbo | 16,385 |
| Claude 3 Opus | 200,000 |
| Claude 3 Sonnet | 200,000 |
| Claude 3 Haiku | 200,000 |
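To check in advance whether a prompt is likely to fit, you could compare a rough token estimate against the limits in the table. The sketch below assumes about four characters per token, which is a common rule of thumb for English text and not an exact count. The helper names are hypothetical, and a real tokenizer will give different numbers.

```python
import math

# Limits copied from the table above.
MAX_INPUT_TOKENS = {
    "GPT-4o": 128_000,
    "GPT-4": 128_000,
    "GPT-3.5-turbo": 16_385,
    "Claude 3 Opus": 200_000,
    "Claude 3 Sonnet": 200_000,
    "Claude 3 Haiku": 200_000,
}


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4 chars/token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def models_that_fit(prompt: str, reserve_tokens: int = 0) -> list[str]:
    """List models whose input limit covers the estimated prompt size.

    reserve_tokens leaves headroom, e.g. for template variables.
    """
    needed = estimate_tokens(prompt) + reserve_tokens
    return [name for name, limit in MAX_INPUT_TOKENS.items() if needed <= limit]


document = "x" * 100_000  # about 25,000 estimated tokens
print(models_that_fit(document))
```

Because the estimate is approximate, leave a generous margin when a prompt lands close to a limit.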
JSON Mode
Some models support native JSON output mode:

| Model | JSON Mode |
|---|---|
| GPT-4o | ✅ Supported |
| GPT-4 | ✅ Supported |
| GPT-3.5-turbo | ✅ Supported |
| Claude 3 models | Via prompting |
JSON mode is enabled automatically when you define an output schema.
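For models that produce JSON via prompting, the reply may include extra text around the JSON object. If your own code ever handles raw model text, a defensive parser along these lines can help. This is a minimal sketch: `parse_json_reply` is a hypothetical helper, and whether Endprompt already cleans up such replies for you is not covered on this page.

```python
import json


def parse_json_reply(reply: str) -> dict:
    """Parse a JSON object from model text, tolerating surrounding prose."""
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        # Fall back to the span between the first "{" and the last "}".
        start, end = reply.find("{"), reply.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model reply")
        parsed = json.loads(reply[start:end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed


print(parse_json_reply('Sure, here is the result: {"label": "spam"}'))
```

The fallback is deliberately simple and still raises an error when the extracted span is not valid JSON, so callers can retry or log the failure.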
Model Selection in Prompts
When creating or editing a prompt:
- Click the Model dropdown
- Select your preferred model
- The model is saved with the prompt
Cost Considerations
LLM costs are based on tokens:
- Input tokens: Your prompt text (including rendered template)
- Output tokens: The model’s response

| Cost Factor | Impact |
|---|---|
| Model choice | Higher-end models cost more per token |
| Prompt length | Longer prompts = more input tokens |
| Response length | Higher max_tokens = potentially more output tokens |
| Request volume | More requests = more total cost |
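The way these factors combine can be sketched as a simple estimate. The `Pricing` class, the function name, and the prices below are placeholders invented for this example. Substitute the current per-token rates published by your provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price per 1,000 tokens (placeholder structure, not real rates)."""
    input_per_1k: float
    output_per_1k: float


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                  requests: int = 1) -> float:
    """Estimate total cost: per-request token cost times request volume."""
    per_request = (
        (input_tokens / 1000) * pricing.input_per_1k
        + (output_tokens / 1000) * pricing.output_per_1k
    )
    return per_request * requests


# Placeholder prices chosen for easy arithmetic, not real provider rates.
example = Pricing(input_per_1k=1.0, output_per_1k=2.0)
print(estimate_cost(example, input_tokens=2000, output_tokens=500, requests=10))
```

Running the same estimate with a shorter prompt or a lower response length shows how each row of the table affects the total.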
Model-Specific Tips
OpenAI
GPT-4o is the recommended default:
- Best balance of speed, quality, and cost
- Multimodal capabilities (can process images)
- Reliable JSON output
- Use temperature 0-0.3 for deterministic tasks
- Enable JSON mode for structured output
- GPT-3.5-turbo is 10x cheaper for simple tasks
Adding Provider API Keys
Your tenant needs API keys configured for each provider you want to use:
- Go to LLM API Keys in the sidebar (under Configuration)
- Add your OpenAI and/or Anthropic API keys
- Keys are encrypted and stored securely
Next Steps
Model Settings
Configure temperature, tokens, and more
API Authentication
Set up keys to call your endpoints

