Each prompt includes settings that control how the LLM generates responses. Understanding these settings helps you get better, more consistent outputs.

Core Settings

Temperature

Controls randomness in the model’s output.
| Value     | Behavior                        | Use Cases                               |
|-----------|---------------------------------|-----------------------------------------|
| 0.0       | Most deterministic              | Extraction, classification, factual Q&A |
| 0.1 - 0.3 | Consistent with minor variation | Summarization, analysis                 |
| 0.4 - 0.6 | Balanced                        | General tasks                           |
| 0.7 - 0.9 | Creative variation              | Writing, brainstorming                  |
| 1.0       | Maximum randomness              | Creative exploration                    |
For production APIs, use lower temperatures (0.1-0.3) to ensure consistent responses.
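The guidance above can be sketched as a small lookup that picks a task-appropriate temperature when assembling a request. This assumes an OpenAI-style chat completions payload; the model name and task labels are illustrative, not required by this guide.

```python
# Temperature by task type, following the table above.
TEMPERATURE_BY_TASK = {
    "extraction": 0.0,     # most deterministic
    "summarization": 0.2,  # consistent with minor variation
    "general": 0.5,        # balanced
    "writing": 0.8,        # creative variation
}

def build_request(task: str, prompt: str) -> dict:
    """Assemble request parameters with a task-appropriate temperature."""
    return {
        "model": "gpt-4o",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        # Fall back to the conservative production default (0.1-0.3 range).
        "temperature": TEMPERATURE_BY_TASK.get(task, 0.3),
    }
```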

Max Tokens

Maximum number of tokens in the model’s response.
| Setting   | Typical Use                    |
|-----------|--------------------------------|
| 100-300   | Short answers, classifications |
| 500-1000  | Summaries, explanations        |
| 1000-2000 | Detailed analyses              |
| 2000+     | Long-form content, reports     |
Setting max tokens too low may truncate responses mid-sentence; setting it too high wastes context capacity and increases cost.

System Prompt

An optional instruction that sets the AI’s behavior and persona:
You are a professional legal analyst with expertise in contract law.
Always cite specific clauses when making recommendations.
Never provide advice that constitutes legal representation.
System prompts are useful for:
  • Setting a consistent persona
  • Establishing behavioral guidelines
  • Defining constraints and limitations
System prompts are separate from your main template. They’re sent to the model as a “system” message when supported.
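In code, this amounts to prepending a system message to the conversation. A minimal sketch, assuming a chat-style messages API that supports a "system" role (the system prompt text is the legal-analyst example above):

```python
# System prompt from the example above.
SYSTEM_PROMPT = (
    "You are a professional legal analyst with expertise in contract law. "
    "Always cite specific clauses when making recommendations. "
    "Never provide advice that constitutes legal representation."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Pair the system prompt with the user's prompt as chat messages."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```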

Advanced Settings

Top P (Nucleus Sampling)

Alternative to temperature for controlling randomness:
  • Top P = 1.0: Consider all tokens (default)
  • Top P = 0.9: Consider tokens in top 90% probability mass
  • Top P = 0.5: More focused, less diverse outputs
Use either temperature OR top_p, not both. Temperature is more intuitive for most users.
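A simple guard can enforce the "either/or" rule before sending a request. Parameter names follow the common OpenAI-style convention (both default to 1.0), which is an assumption here:

```python
# Reject requests that tune both sampling knobs at once.
def validate_sampling(params: dict) -> None:
    """Raise if both temperature and top_p deviate from their defaults."""
    temp_set = params.get("temperature", 1.0) != 1.0
    top_p_set = params.get("top_p", 1.0) != 1.0
    if temp_set and top_p_set:
        raise ValueError("Tune either temperature or top_p, not both.")
```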

Frequency Penalty

Reduces repetition by penalizing tokens based on how often they’ve appeared:
| Value | Effect                              |
|-------|-------------------------------------|
| 0.0   | No penalty (default)                |
| 0.5   | Moderate reduction in repetition    |
| 1.0+  | Strong avoidance of repeated phrases |

Presence Penalty

Encourages the model to talk about new topics:
| Value | Effect                               |
|-------|--------------------------------------|
| 0.0   | No penalty (default)                 |
| 0.5   | Moderate encouragement of new topics |
| 1.0+  | Strong push for novelty              |
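The two penalties are often tuned together. A hypothetical helper that layers both onto an existing request, using the moderate values from the tables above as starting points (the parameter names follow the common OpenAI-style convention):

```python
# Apply both repetition-control knobs to a request without mutating it.
def with_repetition_controls(params: dict,
                             frequency_penalty: float = 0.5,
                             presence_penalty: float = 0.5) -> dict:
    """Return a copy of params with both penalty settings applied."""
    return {
        **params,
        "frequency_penalty": frequency_penalty,  # penalize repeated tokens
        "presence_penalty": presence_penalty,    # encourage new topics
    }
```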

Settings by Use Case

Classification / Extraction

Temperature: 0.0 - 0.2
Max Tokens: 100 - 500
You want consistent, predictable outputs.

Summarization

Temperature: 0.2 - 0.4
Max Tokens: 500 - 1500
Some flexibility in wording, but consistent structure.

Content Generation

Temperature: 0.5 - 0.8
Max Tokens: 1000 - 3000
Allow creativity while maintaining coherence.

Brainstorming / Ideation

Temperature: 0.8 - 1.0
Max Tokens: 1000+
Maximum creativity and diversity.
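The use-case guidance above can be captured as a presets table. The names and ranges come from this page; the specific default values picked from within each range are an assumption:

```python
# Presets drawn from the use-case guidance above.
PRESETS = {
    "classification": {"temperature": 0.1, "max_tokens": 300},
    "summarization": {"temperature": 0.3, "max_tokens": 1000},
    "content_generation": {"temperature": 0.7, "max_tokens": 2000},
    "brainstorming": {"temperature": 0.9, "max_tokens": 2000},
}

def settings_for(use_case: str) -> dict:
    """Look up preset settings, falling back to a conservative default."""
    return PRESETS.get(use_case, {"temperature": 0.3, "max_tokens": 1000})
```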

Model-Specific Defaults

Different models may have different optimal settings:
| Model           | Recommended Temp | Notes                              |
|-----------------|------------------|------------------------------------|
| GPT-4o          | 0.3              | Very capable at low temps          |
| GPT-4           | 0.3              | Excellent reasoning                |
| GPT-3.5-turbo   | 0.5              | May need higher temp for quality   |
| Claude 3 Opus   | 0.3              | Excellent at following instructions |
| Claude 3 Sonnet | 0.4              | Good balance                       |
| Claude 3 Haiku  | 0.5              | May need higher temp               |

Setting Configuration

In the prompt editor:
  1. Open the Settings panel (usually on the right side)
  2. Adjust the sliders or enter values
  3. Settings are saved with the prompt
Different prompts on the same endpoint can have different settings—useful for A/B testing configurations.

Testing Settings

When experimenting with settings:
  1. Start conservative: begin with temperature 0.3 and a reasonable max tokens limit.
  2. Test multiple times: run the same input 3-5 times to see variation.
  3. Adjust incrementally: change one setting at a time to understand its effect.
  4. Document what works: note the optimal settings for each type of task.
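The "test multiple times" step can be automated: run the same input N times and tally the distinct outputs. Here `generate` is a stand-in for your actual model call (an assumption, not a fixed API), which keeps the sketch testable without network access:

```python
from collections import Counter
from typing import Callable

def measure_variation(generate: Callable[[str], str],
                      prompt: str, runs: int = 5) -> Counter:
    """Run the same prompt repeatedly and tally distinct outputs."""
    return Counter(generate(prompt) for _ in range(runs))
```

If the tally shows many unique outputs for a task that needs consistency, lower the temperature and re-test.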

Common Issues

Truncated Responses

Symptom: Response ends mid-sentence. Solution: Increase max tokens.

Too Much Variation

Symptom: Same input gives wildly different outputs. Solution: Lower temperature to 0.1-0.3.

Repetitive Output

Symptom: Model repeats phrases or ideas. Solution: Increase frequency penalty to 0.3-0.5.

Boring/Generic Output

Symptom: Responses feel template-like. Solution: Increase temperature to 0.6-0.8.

Best Practices

  • Match settings to the task: classification needs a low temperature; creative writing needs a higher one.
  • Always test settings with real-world inputs before going live.
  • Set max tokens to what you actually need, not the maximum possible.
  • Note why you chose specific settings in the prompt description.

Next Steps