Each prompt includes settings that control how the LLM generates responses. Understanding these settings helps you get better, more consistent outputs.

Core Settings

Temperature

Controls randomness in the model’s output.
| Value     | Behavior                        | Use Cases                               |
|-----------|---------------------------------|-----------------------------------------|
| 0.0       | Most deterministic              | Extraction, classification, factual Q&A |
| 0.1 - 0.3 | Consistent with minor variation | Summarization, analysis                 |
| 0.4 - 0.6 | Balanced                        | General tasks                           |
| 0.7 - 0.9 | Creative variation              | Writing, brainstorming                  |
| 1.0       | Maximum randomness              | Creative exploration                    |
For production APIs, use lower temperatures (0.1-0.3) to ensure consistent responses.
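The guidance above can be sketched as a small lookup that picks a task-appropriate temperature when assembling a request. This assumes an OpenAI-style chat completions payload; the model name and task labels are illustrative, not required by this guide.

```python
# Temperature by task type, following the table above.
TEMPERATURE_BY_TASK = {
    "extraction": 0.0,     # most deterministic
    "summarization": 0.2,  # consistent with minor variation
    "general": 0.5,        # balanced
    "writing": 0.8,        # creative variation
}

def build_request(task: str, prompt: str) -> dict:
    """Assemble request parameters with a task-appropriate temperature."""
    return {
        "model": "gpt-4o",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        # Fall back to the conservative production default (0.1-0.3 range).
        "temperature": TEMPERATURE_BY_TASK.get(task, 0.3),
    }
```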

Max Tokens

Maximum number of tokens in the model’s response.
| Setting   | Typical Use                    |
|-----------|--------------------------------|
| 100-300   | Short answers, classifications |
| 500-1000  | Summaries, explanations        |
| 1000-2000 | Detailed analyses              |
| 2000+     | Long-form content, reports     |
Setting max tokens too low may truncate responses mid-sentence; setting it too high wastes context capacity and increases cost.

System Prompt

An optional instruction that sets the AI’s behavior and persona:
You are a professional legal analyst with expertise in contract law.
Always cite specific clauses when making recommendations.
Never provide advice that constitutes legal representation.
System prompts are useful for:
  • Setting a consistent persona
  • Establishing behavioral guidelines
  • Defining constraints and limitations
System prompts are separate from your main template. They’re sent to the model as a “system” message when supported.
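In code, this amounts to prepending a system message to the conversation. A minimal sketch, assuming a chat-style messages API that supports a "system" role (the system prompt text is the legal-analyst example above):

```python
# System prompt from the example above.
SYSTEM_PROMPT = (
    "You are a professional legal analyst with expertise in contract law. "
    "Always cite specific clauses when making recommendations. "
    "Never provide advice that constitutes legal representation."
)

def build_messages(user_prompt: str) -> list[dict]:
    """Pair the system prompt with the user's prompt as chat messages."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]
```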

Advanced Settings

Top P (Nucleus Sampling)

Alternative to temperature for controlling randomness:
  • Top P = 1.0: Consider all tokens (default)
  • Top P = 0.9: Consider tokens in top 90% probability mass
  • Top P = 0.5: More focused, less diverse outputs
Use either temperature OR top_p, not both. Temperature is more intuitive for most users.
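A simple guard can enforce the "either/or" rule before sending a request. Parameter names follow the common OpenAI-style convention (both default to 1.0), which is an assumption here:

```python
# Reject requests that tune both sampling knobs at once.
def validate_sampling(params: dict) -> None:
    """Raise if both temperature and top_p deviate from their defaults."""
    temp_set = params.get("temperature", 1.0) != 1.0
    top_p_set = params.get("top_p", 1.0) != 1.0
    if temp_set and top_p_set:
        raise ValueError("Tune either temperature or top_p, not both.")
```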

Frequency Penalty

Reduces repetition by penalizing tokens based on how often they’ve appeared:
| Value | Effect                              |
|-------|-------------------------------------|
| 0.0   | No penalty (default)                |
| 0.5   | Moderate reduction in repetition    |
| 1.0+  | Strong avoidance of repeated phrases |

Presence Penalty

Encourages the model to talk about new topics:
| Value | Effect                               |
|-------|--------------------------------------|
| 0.0   | No penalty (default)                 |
| 0.5   | Moderate encouragement of new topics |
| 1.0+  | Strong push for novelty              |
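The two penalties are often tuned together. A hypothetical helper that layers both onto an existing request, using the moderate values from the tables above as starting points (the parameter names follow the common OpenAI-style convention):

```python
# Apply both repetition-control knobs to a request without mutating it.
def with_repetition_controls(params: dict,
                             frequency_penalty: float = 0.5,
                             presence_penalty: float = 0.5) -> dict:
    """Return a copy of params with both penalty settings applied."""
    return {
        **params,
        "frequency_penalty": frequency_penalty,  # penalize repeated tokens
        "presence_penalty": presence_penalty,    # encourage new topics
    }
```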

Settings by Use Case

Classification / Extraction

Temperature: 0.0 - 0.2
Max Tokens: 100 - 500
You want consistent, predictable outputs.

Summarization

Temperature: 0.2 - 0.4
Max Tokens: 500 - 1500
Some flexibility in wording, but consistent structure.

Content Generation

Temperature: 0.5 - 0.8
Max Tokens: 1000 - 3000
Allow creativity while maintaining coherence.

Brainstorming / Ideation

Temperature: 0.8 - 1.0
Max Tokens: 1000+
Maximum creativity and diversity.
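The use-case guidance above can be captured as a presets table. The names and ranges come from this page; the specific default values picked from within each range are an assumption:

```python
# Presets drawn from the use-case guidance above.
PRESETS = {
    "classification": {"temperature": 0.1, "max_tokens": 300},
    "summarization": {"temperature": 0.3, "max_tokens": 1000},
    "content_generation": {"temperature": 0.7, "max_tokens": 2000},
    "brainstorming": {"temperature": 0.9, "max_tokens": 2000},
}

def settings_for(use_case: str) -> dict:
    """Look up preset settings, falling back to a conservative default."""
    return PRESETS.get(use_case, {"temperature": 0.3, "max_tokens": 1000})
```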

Model-Specific Defaults

Different models may have different optimal settings:
| Model           | Recommended Temp | Notes                              |
|-----------------|------------------|------------------------------------|
| GPT-4o          | 0.3              | Very capable at low temps          |
| GPT-4           | 0.3              | Excellent reasoning                |
| GPT-3.5-turbo   | 0.5              | May need higher temp for quality   |
| Claude 3 Opus   | 0.3              | Excellent at following instructions |
| Claude 3 Sonnet | 0.4              | Good balance                       |
| Claude 3 Haiku  | 0.5              | May need higher temp               |

Setting Configuration

In the prompt editor:
  1. Open the Settings panel (usually on the right side)
  2. Adjust the sliders or enter values
  3. Settings are saved with the prompt
Different prompts on the same endpoint can have different settings—useful for A/B testing configurations.

Testing Settings

When experimenting with settings:
  1. Start conservative: begin with temperature 0.3 and a reasonable max tokens limit.
  2. Test multiple times: run the same input 3-5 times to see variation.
  3. Adjust incrementally: change one setting at a time to understand its effect.
  4. Document what works: note the optimal settings for each type of task.
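The "test multiple times" step can be automated: run the same input N times and tally the distinct outputs. Here `generate` is a stand-in for your actual model call (an assumption, not a fixed API), which keeps the sketch testable without network access:

```python
from collections import Counter
from typing import Callable

def measure_variation(generate: Callable[[str], str],
                      prompt: str, runs: int = 5) -> Counter:
    """Run the same prompt repeatedly and tally distinct outputs."""
    return Counter(generate(prompt) for _ in range(runs))
```

If the tally shows many unique outputs for a task that needs consistency, lower the temperature and re-test.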

Common Issues

Truncated Responses

Symptom: Response ends mid-sentence. Solution: Increase max tokens.

Too Much Variation

Symptom: Same input gives wildly different outputs. Solution: Lower temperature to 0.1-0.3.

Repetitive Output

Symptom: Model repeats phrases or ideas. Solution: Increase frequency penalty to 0.3-0.5.

Boring/Generic Output

Symptom: Responses feel template-like. Solution: Increase temperature to 0.6-0.8.

Best Practices

  • Match settings to the task: classification needs a low temperature; creative writing needs a higher one.
  • Always test settings with real-world inputs before going live.
  • Set max tokens to what you actually need, not the maximum possible.
  • Note why you chose specific settings in the prompt description.

Next Steps