Endprompt supports multiple LLM providers, giving you flexibility to choose the best model for each use case.

Supported Providers

OpenAI

GPT-4o, GPT-3.5-turbo, gpt-image-1

Anthropic

Claude 3 Opus, Sonnet, Haiku

Google

Gemini 2.0 Flash, Gemini 2.5 Pro

Model Comparison

| Model | Provider | Speed | Quality | Cost | Vision | Image Gen | Best For |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPT-4o | OpenAI | Fast | Excellent | Medium | Yes | | General purpose, vision, balanced |
| GPT-4 | OpenAI | Medium | Excellent | High | | | Complex reasoning |
| GPT-3.5-turbo | OpenAI | Very Fast | Good | Low | | | Simple tasks, high volume |
| gpt-image-1 | OpenAI | Medium | Excellent | Per-image | Yes | Yes | Image generation and editing |
| Claude 3 Opus | Anthropic | Medium | Excellent | High | Yes | | Nuanced analysis, long context |
| Claude 3 Sonnet | Anthropic | Fast | Very Good | Medium | Yes | | Balanced performance |
| Claude 3 Haiku | Anthropic | Very Fast | Good | Low | Yes | | Fast responses, simple tasks |
| Gemini 2.0 Flash | Google | Very Fast | Good | Low | Yes | | Fast multimodal tasks |
| Gemini 2.5 Pro | Google | Medium | Excellent | Medium | Yes | | Complex reasoning, long context |

Choosing a Model

For Quality-Critical Tasks

  • GPT-4o or Claude 3 Opus
  • Complex analysis, nuanced responses
  • Higher cost, worth it for important outputs

For High-Volume Tasks

  • GPT-3.5-turbo or Claude 3 Haiku
  • Simple classification, extraction
  • Low cost, high throughput

For Long Documents

  • Claude 3 models
  • Support up to 200K tokens context
  • Ideal for document analysis

For Image Tasks

  • Vision (image inputs): GPT-4o, Claude 3 models, and Gemini models can analyze images alongside text prompts
  • Image Generation (image outputs): gpt-image-1 generates or edits images from text descriptions
  • Image Editing: Use gpt-image-1 with both image inputs and outputs to edit existing images
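
The guidance in this section can be captured as a small lookup table. The following is a minimal sketch, not part of Endprompt itself: the category names and the `suggest_models` helper are hypothetical, and the model names are the display names used on this page rather than confirmed API identifiers.

```python
from typing import Dict, List

# Hypothetical task categories mapped to the models this page suggests for them.
SUGGESTIONS: Dict[str, List[str]] = {
    "quality": ["GPT-4o", "Claude 3 Opus"],
    "high_volume": ["GPT-3.5-turbo", "Claude 3 Haiku"],
    "long_document": ["Claude 3 Opus", "Claude 3 Sonnet", "Claude 3 Haiku"],
    "image_generation": ["gpt-image-1"],
}


def suggest_models(task: str) -> List[str]:
    """Return the candidate models for a coarse task category."""
    if task not in SUGGESTIONS:
        raise ValueError(f"unknown task category: {task!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(SUGGESTIONS[task])
```

A lookup like this is only a starting point; the final choice still depends on testing each candidate against your own prompts.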

Model Capabilities

Context Window

How much input each model can handle:
| Model | Max Input Tokens |
| --- | --- |
| GPT-4o | 128,000 |
| GPT-4 | 128,000 |
| GPT-3.5-turbo | 16,385 |
| Claude 3 Opus | 200,000 |
| Claude 3 Sonnet | 200,000 |
| Claude 3 Haiku | 200,000 |
For very long documents, the Claude 3 models offer the largest context windows of the models listed here.
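
To check in advance whether a document is likely to fit, you can compare an estimated token count against the limits above. This is a rough sketch under stated assumptions: the 4-characters-per-token ratio is a common heuristic for English text, not an exact tokenizer, and `estimate_tokens` and `models_that_fit` are hypothetical helper names.

```python
from typing import Dict, List

# Maximum input tokens per model, copied from the table above.
MAX_INPUT_TOKENS: Dict[str, int] = {
    "GPT-4o": 128_000,
    "GPT-4": 128_000,
    "GPT-3.5-turbo": 16_385,
    "Claude 3 Opus": 200_000,
    "Claude 3 Sonnet": 200_000,
    "Claude 3 Haiku": 200_000,
}


def estimate_tokens(text: str) -> int:
    """Approximate token count, assuming about 4 characters per token."""
    return (len(text) + 3) // 4  # round up


def models_that_fit(text: str, reserve: int = 0) -> List[str]:
    """List models whose input limit covers the text plus a reserved margin."""
    needed = estimate_tokens(text) + reserve
    return [name for name, limit in MAX_INPUT_TOKENS.items() if needed <= limit]
```

Because the estimate is approximate, it is worth passing a generous `reserve` so that borderline documents are routed to a larger-context model.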

JSON Mode

Some models support native JSON output mode:
| Model | JSON Mode |
| --- | --- |
| GPT-4o | ✅ Supported |
| GPT-4 | ✅ Supported |
| GPT-3.5-turbo | ✅ Supported |
| Claude 3 models | Via prompting |
JSON mode is enabled automatically when you define an output schema.
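
When JSON is obtained via prompting rather than a native mode, the reply may include extra prose around the object. The sketch below shows one defensive way to parse such a reply on the client side. It is an illustration only: `parse_json_reply` is a hypothetical helper, and nothing here describes how Endprompt handles this internally.

```python
import json


def parse_json_reply(reply: str) -> dict:
    """Parse a JSON object from a reply that may have prose around it."""
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span in the text.
        start = reply.find("{")
        end = reply.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in reply")
        parsed = json.loads(reply[start:end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed
```

The fallback is deliberately simple: it fails loudly if the extracted span is still not valid JSON, which is usually preferable to silently accepting a malformed result.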

Model Selection in Prompts

When creating or editing a prompt:
  1. Click the Model dropdown
  2. Select your preferred model
  3. The model is saved with the prompt
Different prompts on the same endpoint can use different models, which is useful for A/B testing.

Cost Considerations

LLM costs are based on tokens:
  • Input tokens: Your prompt text (including rendered template)
  • Output tokens: The model’s response
| Cost Factor | Impact |
| --- | --- |
| Model choice | Higher-end models cost more per token |
| Prompt length | Longer prompts = more input tokens |
| Response length | Higher max_tokens = potentially more output tokens |
| Request volume | More requests = more total cost |
Start with a capable model (GPT-4o), then optimize to cheaper models for tasks where quality remains acceptable.
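
The cost factors above combine into a simple estimate: per-request cost is input tokens times the input price plus output tokens times the output price, multiplied by request volume. The sketch below assumes prices are quoted per 1,000 tokens. `estimate_cost` is a hypothetical helper, and the prices in the usage example are placeholders, not real provider rates.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
    requests: int = 1,
) -> float:
    """Estimate total cost for a number of identical requests."""
    per_request = (
        (input_tokens / 1000) * input_price_per_1k
        + (output_tokens / 1000) * output_price_per_1k
    )
    return per_request * requests


# Placeholder prices for illustration only; check your provider's current rates.
total = estimate_cost(2000, 500, input_price_per_1k=0.01,
                      output_price_per_1k=0.03, requests=100)
print(f"Estimated cost: ${total:.2f}")
```

Running the same calculation with a cheaper model's prices makes it easy to see how much a downgrade would save at your request volume.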

Model-Specific Tips

GPT-4o is the recommended default:
  • Best balance of speed, quality, and cost
  • Multimodal capabilities (can process images)
  • Reliable JSON output
Tips:
  • Use a temperature between 0 and 0.3 for tasks that need consistent, repeatable output
  • Enable JSON mode for structured output
  • For simple tasks, GPT-3.5-turbo is roughly 10x cheaper

Adding Provider API Keys

Your tenant needs API keys configured for each provider you want to use:
  1. Go to LLM API Keys in the sidebar (under Configuration)
  2. Add an API key for each provider you plan to use (for example, OpenAI and/or Anthropic)
  3. Keys are encrypted and stored securely
Without a valid API key for a provider, prompts using that provider’s models will fail.

Next Steps

Model Settings

Configure temperature, tokens, and more

API Authentication

Set up keys to call your endpoints