Endprompt supports multiple LLM providers, giving you flexibility to choose the best model for each use case.

Supported Providers

OpenAI

GPT-4o, GPT-4, GPT-3.5-turbo

Anthropic

Claude 3 Opus, Sonnet, Haiku

Model Comparison

| Model | Provider | Speed | Quality | Cost | Best For |
| --- | --- | --- | --- | --- | --- |
| GPT-4o | OpenAI | Fast | Excellent | Medium | General purpose, balanced |
| GPT-4 | OpenAI | Medium | Excellent | High | Complex reasoning |
| GPT-3.5-turbo | OpenAI | Very Fast | Good | Low | Simple tasks, high volume |
| Claude 3 Opus | Anthropic | Medium | Excellent | High | Nuanced analysis, long context |
| Claude 3 Sonnet | Anthropic | Fast | Very Good | Medium | Balanced performance |
| Claude 3 Haiku | Anthropic | Very Fast | Good | Low | Fast responses, simple tasks |

Choosing a Model

For Quality-Critical Tasks

  • GPT-4o or Claude 3 Opus
  • Complex analysis, nuanced responses
  • Higher cost, worth it for important outputs

For High-Volume Tasks

  • GPT-3.5-turbo or Claude 3 Haiku
  • Simple classification, extraction
  • Low cost, high throughput

For Long Documents

  • Claude 3 models
  • Support up to 200K tokens of context
  • Ideal for document analysis (see the routing sketch below)
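
To make the guidance above concrete, here is a minimal routing sketch. The task categories and the choose_model helper are illustrative placeholders, not part of Endprompt's API; adjust the mapping to your own workloads.

```python
# Illustrative only: a task-profile -> model mapping based on the
# guidance above. The categories and helper are hypothetical.
ROUTING_TABLE = {
    "quality_critical": "gpt-4o",                    # or "claude-3-opus-20240229"
    "high_volume":      "gpt-3.5-turbo",             # or "claude-3-haiku-20240307"
    "long_document":    "claude-3-sonnet-20240229",  # 200K-token context
}

def choose_model(task_type: str) -> str:
    """Return a model ID for the given task profile, defaulting to GPT-4o."""
    return ROUTING_TABLE.get(task_type, "gpt-4o")

print(choose_model("long_document"))  # claude-3-sonnet-20240229
```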

Model Capabilities

Context Window

How much input each model can handle:
| Model | Max Input Tokens |
| --- | --- |
| GPT-4o | 128,000 |
| GPT-4 | 128,000 |
| GPT-3.5-turbo | 16,385 |
| Claude 3 Opus | 200,000 |
| Claude 3 Sonnet | 200,000 |
| Claude 3 Haiku | 200,000 |
For very long documents, Claude models offer the largest context windows.
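
If you want to check whether a document fits before sending it, you can estimate the token count locally. A minimal sketch using OpenAI's open-source tiktoken tokenizer; Claude tokenizes differently, so treat the result as a rough approximation for Anthropic models. The file path is a placeholder.

```python
import tiktoken  # pip install tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Rough token count using an OpenAI tokenizer.

    cl100k_base matches GPT-4 / GPT-3.5-turbo; GPT-4o uses o200k_base.
    For Anthropic models this is only an approximation.
    """
    return len(tiktoken.get_encoding(encoding_name).encode(text))

document = open("report.txt").read()  # hypothetical long document
if count_tokens(document) > 128_000:
    print("Too long for GPT-4o; consider a Claude 3 model (200K context)")
```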

JSON Mode

Some models support native JSON output mode:
| Model | JSON Mode |
| --- | --- |
| GPT-4o | ✅ Supported |
| GPT-4 | ✅ Supported |
| GPT-3.5-turbo | ✅ Supported |
| Claude 3 models | Via prompting |
JSON mode is enabled automatically when you define an output schema.
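
For reference, here is roughly what that looks like against the OpenAI SDK directly. This is a sketch of the underlying provider feature, not Endprompt's internal code; the prompt content is illustrative.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Native JSON mode: the model is constrained to emit valid JSON.
# Note: OpenAI requires the word "JSON" to appear in the messages.
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'sentiment' and 'confidence'."},
        {"role": "user", "content": "The product arrived late but works great."},
    ],
)
print(response.choices[0].message.content)  # e.g. {"sentiment": "mixed", ...}
```

For Claude 3 models, the equivalent is an instruction in the prompt itself (e.g. "Respond with only a JSON object"), which Endprompt applies for you when a schema is defined.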

Model Selection in Prompts

When creating or editing a prompt:
  1. Click the Model dropdown
  2. Select your preferred model
  3. The model is saved with the prompt
Different prompts on the same endpoint can use different models, which is useful for A/B testing (see the sketch below).
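
If you ever need to reproduce an A/B split outside the UI, the logic is simple. A hypothetical sketch, not an Endprompt API; the 50/50 ratio and model IDs are placeholders.

```python
import random

def pick_variant() -> str:
    """Hypothetical 50/50 split between two candidate models."""
    return "gpt-4o" if random.random() < 0.5 else "claude-3-sonnet-20240229"

model = pick_variant()
# Log the chosen variant alongside each response so you can
# compare quality and cost between the two models later.
```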

Cost Considerations

LLM costs are based on tokens:
  • Input tokens: Your prompt text (including rendered template)
  • Output tokens: The model’s response
| Cost Factor | Impact |
| --- | --- |
| Model choice | Higher-end models cost more per token |
| Prompt length | Longer prompts = more input tokens |
| Response length | Higher max_tokens = potentially more output tokens |
| Request volume | More requests = more total cost |
Start with a capable model (GPT-4o), then move tasks to cheaper models where quality remains acceptable.
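
A worked example makes the token arithmetic concrete. The per-token rates below are illustrative placeholders, not current prices; check your provider's pricing page.

```python
# Illustrative cost estimate; the rates are placeholders.
PRICE_PER_1M = {               # USD per 1M tokens: (input, output)
    "gpt-4o":        (2.50, 10.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p_in, p_out = PRICE_PER_1M[model]
    return input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out

# 10,000 requests with a 1,200-token prompt and a ~300-token response:
total = 10_000 * estimate_cost("gpt-4o", 1_200, 300)
print(f"${total:.2f}")  # ~$60.00 at the placeholder rates
```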

Model-Specific Tips

GPT-4o is the recommended default:
  • Best balance of speed, quality, and cost
  • Multimodal capabilities (can process images)
  • Reliable JSON output
Tips:
  • Use temperature 0-0.3 for tasks that need consistent, repeatable output (example below)
  • Enable JSON mode for structured output
  • GPT-3.5-turbo is roughly 10x cheaper than GPT-4-class models, a good fit for simple tasks
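
As an example of the temperature tip, here is a minimal call with the OpenAI SDK; the classification prompt is illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Low temperature (0-0.3) keeps outputs consistent run to run,
# which suits classification and extraction tasks.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # cheap and fast for simple tasks
    temperature=0.0,
    messages=[{"role": "user",
               "content": "Classify as positive/negative/neutral: 'Great support team!'"}],
)
print(response.choices[0].message.content)
```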

Adding Provider API Keys

Your tenant needs API keys configured for each provider you want to use:
  1. Go to LLM API Keys in the sidebar (under Configuration)
  2. Add your OpenAI and/or Anthropic API keys
  3. Keys are encrypted and stored securely
Without a valid API key for a provider, prompts using that provider’s models will fail.
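
To sanity-check a key before adding it, you can make a minimal call against each provider. A sketch using the official Python SDKs; the key strings are placeholders.

```python
from openai import OpenAI
from anthropic import Anthropic  # pip install openai anthropic

# Cheap calls that do little more than verify authentication.
try:
    OpenAI(api_key="sk-...").models.list()  # placeholder key
    print("OpenAI key OK")
except Exception as exc:
    print(f"OpenAI key failed: {exc}")

try:
    Anthropic(api_key="sk-ant-...").messages.create(  # placeholder key
        model="claude-3-haiku-20240307",
        max_tokens=1,
        messages=[{"role": "user", "content": "ping"}],
    )
    print("Anthropic key OK")
except Exception as exc:
    print(f"Anthropic key failed: {exc}")
```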

Next Steps