Endprompt supports multiple LLM providers, giving you flexibility to choose the best model for each use case.

Supported Providers

OpenAI

GPT-4o, GPT-4, GPT-3.5-turbo

Anthropic

Claude 3 Opus, Sonnet, Haiku

Model Comparison

| Model | Provider | Speed | Quality | Cost | Best For |
| --- | --- | --- | --- | --- | --- |
| GPT-4o | OpenAI | Fast | Excellent | Medium | General purpose, balanced |
| GPT-4 | OpenAI | Medium | Excellent | High | Complex reasoning |
| GPT-3.5-turbo | OpenAI | Very Fast | Good | Low | Simple tasks, high volume |
| Claude 3 Opus | Anthropic | Medium | Excellent | High | Nuanced analysis, long context |
| Claude 3 Sonnet | Anthropic | Fast | Very Good | Medium | Balanced performance |
| Claude 3 Haiku | Anthropic | Very Fast | Good | Low | Fast responses, simple tasks |

Choosing a Model

For Quality-Critical Tasks

  • GPT-4o or Claude 3 Opus
  • Complex analysis, nuanced responses
  • Higher cost, worth it for important outputs

For High-Volume Tasks

  • GPT-3.5-turbo or Claude 3 Haiku
  • Simple classification, extraction
  • Low cost, high throughput

For Long Documents

  • Claude 3 models
  • Support up to 200K tokens of context
  • Ideal for document analysis (see the routing sketch below)
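
To make the guidance above concrete, here is a minimal routing sketch. The task categories and the choose_model helper are illustrative placeholders, not part of Endprompt's API; adjust the mapping to your own workloads.

```python
# Illustrative only: a task-profile -> model mapping based on the
# guidance above. The categories and helper are hypothetical.
ROUTING_TABLE = {
    "quality_critical": "gpt-4o",                    # or "claude-3-opus-20240229"
    "high_volume":      "gpt-3.5-turbo",             # or "claude-3-haiku-20240307"
    "long_document":    "claude-3-sonnet-20240229",  # 200K-token context
}

def choose_model(task_type: str) -> str:
    """Return a model ID for the given task profile, defaulting to GPT-4o."""
    return ROUTING_TABLE.get(task_type, "gpt-4o")

print(choose_model("long_document"))  # claude-3-sonnet-20240229
```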

Model Capabilities

Context Window

How much input each model can handle:
| Model | Max Input Tokens |
| --- | --- |
| GPT-4o | 128,000 |
| GPT-4 | 128,000 |
| GPT-3.5-turbo | 16,385 |
| Claude 3 Opus | 200,000 |
| Claude 3 Sonnet | 200,000 |
| Claude 3 Haiku | 200,000 |
For very long documents, Claude models offer the largest context windows.
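
If you want to check whether a document fits before sending it, you can estimate the token count locally. A minimal sketch using OpenAI's open-source tiktoken tokenizer; Claude tokenizes differently, so treat the result as a rough approximation for Anthropic models. The file path is a placeholder.

```python
import tiktoken  # pip install tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Rough token count using an OpenAI tokenizer.

    cl100k_base matches GPT-4 / GPT-3.5-turbo; GPT-4o uses o200k_base.
    For Anthropic models this is only an approximation.
    """
    return len(tiktoken.get_encoding(encoding_name).encode(text))

document = open("report.txt").read()  # hypothetical long document
if count_tokens(document) > 128_000:
    print("Too long for GPT-4o; consider a Claude 3 model (200K context)")
```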

JSON Mode

Some models support native JSON output mode:
| Model | JSON Mode |
| --- | --- |
| GPT-4o | ✅ Supported |
| GPT-4 | ✅ Supported |
| GPT-3.5-turbo | ✅ Supported |
| Claude 3 models | Via prompting |
JSON mode is enabled automatically when you define an output schema.
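
For reference, here is roughly what that looks like against the OpenAI SDK directly. This is a sketch of the underlying provider feature, not Endprompt's internal code; the prompt content is illustrative.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Native JSON mode: the model is constrained to emit valid JSON.
# Note: OpenAI requires the word "JSON" to appear in the messages.
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'sentiment' and 'confidence'."},
        {"role": "user", "content": "The product arrived late but works great."},
    ],
)
print(response.choices[0].message.content)  # e.g. {"sentiment": "mixed", ...}
```

For Claude 3 models, the equivalent is an instruction in the prompt itself (e.g. "Respond with only a JSON object"), which Endprompt applies for you when a schema is defined.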

Model Selection in Prompts

When creating or editing a prompt:
  1. Click the Model dropdown
  2. Select your preferred model
  3. The model is saved with the prompt
Different prompts on the same endpoint can use different models, which is useful for A/B testing (see the sketch below).
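
If you ever need to reproduce an A/B split outside the UI, the logic is simple. A hypothetical sketch, not an Endprompt API; the 50/50 ratio and model IDs are placeholders.

```python
import random

def pick_variant() -> str:
    """Hypothetical 50/50 split between two candidate models."""
    return "gpt-4o" if random.random() < 0.5 else "claude-3-sonnet-20240229"

model = pick_variant()
# Log the chosen variant alongside each response so you can
# compare quality and cost between the two models later.
```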

Cost Considerations

LLM costs are based on tokens:
  • Input tokens: Your prompt text (including rendered template)
  • Output tokens: The model’s response
| Cost Factor | Impact |
| --- | --- |
| Model choice | Higher-end models cost more per token |
| Prompt length | Longer prompts = more input tokens |
| Response length | Higher max_tokens = potentially more output tokens |
| Request volume | More requests = more total cost |
Start with a capable model (GPT-4o), then move tasks to cheaper models where quality remains acceptable.
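
A worked example makes the token arithmetic concrete. The per-token rates below are illustrative placeholders, not current prices; check your provider's pricing page.

```python
# Illustrative cost estimate; the rates are placeholders.
PRICE_PER_1M = {               # USD per 1M tokens: (input, output)
    "gpt-4o":        (2.50, 10.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p_in, p_out = PRICE_PER_1M[model]
    return input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out

# 10,000 requests with a 1,200-token prompt and a ~300-token response:
total = 10_000 * estimate_cost("gpt-4o", 1_200, 300)
print(f"${total:.2f}")  # ~$60.00 at the placeholder rates
```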

Model-Specific Tips

GPT-4o is the recommended default:
  • Best balance of speed, quality, and cost
  • Multimodal capabilities (can process images)
  • Reliable JSON output
Tips:
  • Use temperature 0-0.3 for tasks that need consistent, repeatable output (example below)
  • Enable JSON mode for structured output
  • GPT-3.5-turbo is roughly 10x cheaper than GPT-4-class models, a good fit for simple tasks
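
As an example of the temperature tip, here is a minimal call with the OpenAI SDK; the classification prompt is illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Low temperature (0-0.3) keeps outputs consistent run to run,
# which suits classification and extraction tasks.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # cheap and fast for simple tasks
    temperature=0.0,
    messages=[{"role": "user",
               "content": "Classify as positive/negative/neutral: 'Great support team!'"}],
)
print(response.choices[0].message.content)
```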

Adding Provider API Keys

Your tenant needs API keys configured for each provider you want to use:
  1. Go to LLM API Keys in the sidebar (under Configuration)
  2. Add your OpenAI and/or Anthropic API keys
  3. Keys are encrypted and stored securely
Without a valid API key for a provider, prompts using that provider’s models will fail.
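
To sanity-check a key before adding it, you can make a minimal call against each provider. A sketch using the official Python SDKs; the key strings are placeholders.

```python
from openai import OpenAI
from anthropic import Anthropic  # pip install openai anthropic

# Cheap calls that do little more than verify authentication.
try:
    OpenAI(api_key="sk-...").models.list()  # placeholder key
    print("OpenAI key OK")
except Exception as exc:
    print(f"OpenAI key failed: {exc}")

try:
    Anthropic(api_key="sk-ant-...").messages.create(  # placeholder key
        model="claude-3-haiku-20240307",
        max_tokens=1,
        messages=[{"role": "user", "content": "ping"}],
    )
    print("Anthropic key OK")
except Exception as exc:
    print(f"Anthropic key failed: {exc}")
```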

Next Steps