The Settings tab lets you configure advanced endpoint behavior: caching, rate limits, visibility, timeouts, and other operational settings.

Accessing Settings

Open any endpoint and click the Settings tab in the workspace.

Caching

Caching stores LLM responses so identical requests return instantly without calling the LLM again.

Enable Caching

Setting         Description
Enable Cache    Toggle caching on or off for this endpoint
Cache Duration  How long to cache responses, in seconds
Caching is based on the complete request payload. Different inputs produce different cache keys.
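The exact key derivation is not specified here, but "based on the complete request payload" can be sketched as hashing a canonicalized form of the payload. The function name below is illustrative, not part of the product:

```python
import hashlib
import json

def cache_key(payload: dict) -> str:
    """Illustrative cache key: hash of the canonicalized request payload.

    Sorting keys makes the key independent of field order, so payloads
    that differ only in ordering still hit the same cache entry.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same payload (any field order) -> same key; any content change -> new key.
a = cache_key({"text": "hello", "lang": "en"})
b = cache_key({"lang": "en", "text": "hello"})
c = cache_key({"text": "hello!", "lang": "en"})
```

This is why even a whitespace change in an input produces a cache miss: the key covers the entire payload, not just selected fields.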

When to Use Caching

Good for Caching

  • Reference data lookups
  • Static content generation
  • Classification tasks
  • Repeated queries

Avoid Caching

  • User-specific content
  • Time-sensitive data
  • Random/creative outputs
  • Conversational contexts

Cache Bypass

During testing, you may want fresh responses. The test runner includes a “Bypass Cache” option that forces a new LLM call. You can also bypass cache programmatically:
curl -X POST https://yourcompany.api.endprompt.ai/api/v1/summarize \
  -H "x-api-key: your-key" \
  -H "Content-Type: application/json" \
  -H "x-cache-bypass: true" \
  -d '{"text": "..."}'

Rate Limits

Protect your endpoints from abuse with rate limiting.

Rate Limit Settings

Setting              Description
Requests per Minute  Maximum requests allowed per minute
Requests per Hour    Maximum requests allowed per hour
Requests per Day     Maximum requests allowed per day
Rate limits apply per API key. Different keys have independent limits.

Rate Limit Headers

Responses include rate limit information:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1699999999
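Clients can read these headers to throttle proactively instead of waiting for a 429. A minimal sketch (the helper name is illustrative; `X-RateLimit-Reset` is assumed to be a Unix epoch timestamp, as in the example above):

```python
import time

def parse_rate_limit(headers, now=None):
    """Turn X-RateLimit-* response headers into usable numbers."""
    now = time.time() if now is None else now
    reset_epoch = int(headers.get("X-RateLimit-Reset", 0))
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 0)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
        "seconds_until_reset": max(0, int(reset_epoch - now)),
    }

# Using the header values from the example above, 45 seconds before reset:
info = parse_rate_limit(
    {
        "X-RateLimit-Limit": "100",
        "X-RateLimit-Remaining": "95",
        "X-RateLimit-Reset": "1699999999",
    },
    now=1699999954,
)
```

When `remaining` approaches zero, a well-behaved client slows down until `seconds_until_reset` has elapsed.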
When exceeded, requests receive a 429 Too Many Requests response:
{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Try again in 45 seconds.",
  "retry_after": 45
}
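A client can honor the `retry_after` field by sleeping and retrying. This is a sketch using only the standard library; the URL and function names are illustrative, and real code would likely cap total wait time:

```python
import json
import time
import urllib.error
import urllib.request

def retry_delay(error_body, default=1.0):
    """Seconds to wait before retrying, taken from the 429 error payload."""
    return float(error_body.get("retry_after", default))

def post_with_retry(url, api_key, payload, max_retries=3):
    """POST to an endpoint, sleeping out the rate-limit window on HTTP 429."""
    data = json.dumps(payload).encode("utf-8")
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    for attempt in range(max_retries + 1):
        req = urllib.request.Request(url, data=data, headers=headers)
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code == 429 and attempt < max_retries:
                # The error body carries retry_after, as shown above.
                time.sleep(retry_delay(json.load(err)))
            else:
                raise
```

Any non-429 error, or a 429 after the final attempt, is re-raised for the caller to handle.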

Visibility

Control who can see and use this endpoint.
Setting   Description
Public    Visible to all team members
Internal  Only visible to you and admins
Use Internal visibility for endpoints under development or for personal experiments.

Timeout Settings

Configure how long to wait for LLM responses.
Setting          Default  Description
Request Timeout  60s      Maximum time to wait for an LLM response
Long-running prompts (complex analysis, long documents) may need higher timeouts.
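On the client side, it is worth setting a socket timeout slightly above the endpoint's configured Request Timeout, so the server can time out first and return a proper error instead of the connection simply dropping. A sketch (the margin and function names are assumptions, not product behavior):

```python
import json
import urllib.request

def client_timeout(server_timeout_s, margin_s=5.0):
    """Client-side timeout: the endpoint's Request Timeout plus a margin,
    so the server's timeout fires first and yields a structured error."""
    return server_timeout_s + margin_s

def call_endpoint(url, api_key, payload, server_timeout_s=60.0):
    """POST to an endpoint, waiting only slightly longer than the server would."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=client_timeout(server_timeout_s)) as resp:
        return json.load(resp)
```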

Danger Zone

Actions in the Danger Zone are permanent and cannot be undone.

Delete Endpoint

Permanently removes the endpoint along with all of its associated data:
  • Prompts and versions
  • Execution logs
  • Cached responses
To delete:
  1. Click Delete Endpoint
  2. Type the endpoint name to confirm
  3. Click Permanently Delete
Deleting an endpoint will break any applications calling it. Ensure nothing depends on the endpoint before deleting.

Configuration Best Practices

  • Caching: Enable caching only after you’ve stabilized your prompts and confirmed outputs are deterministic.
  • Rate limits: Start with lower limits and increase them based on actual usage patterns.
  • Visibility: Keep endpoints Internal while testing, then make them Public when ready.
  • Documentation: Use the endpoint description to note why certain settings were chosen.

Settings by Use Case

Use Case            Cache          Rate Limit  Timeout
Search/lookup       Yes, 1 hour    100/min     30s
Content generation  No             20/min      60s
Document analysis   No             10/min      120s
Classification      Yes, 24 hours  200/min     30s
Chatbot             No             30/min      45s

Next Steps