Accessing Settings
Open any endpoint and click the Settings tab in the workspace.
Caching
Caching stores LLM responses so identical requests return instantly without calling the LLM again.
Enable Caching
| Setting | Description |
|---|---|
| Enable Cache | Toggle caching on/off for this endpoint |
| Cache Duration | How long to cache responses (in seconds) |
Caching is based on the complete request payload. Different inputs produce different cache keys.
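To illustrate what "based on the complete request payload" can mean in practice, here is a minimal sketch of deriving a cache key by hashing a canonical serialization of the payload. The function name `cache_key`, the endpoint identifier, and the choice of SHA-256 over sorted JSON are assumptions for illustration; the platform's actual key derivation is not documented here and may differ (for example, it may or may not ignore field order).

```python
import hashlib
import json


def cache_key(endpoint_id: str, payload: dict) -> str:
    """Derive a deterministic key from the endpoint and the full request payload.

    Hypothetical helper: illustrates the idea, not the platform's implementation.
    """
    # sort_keys makes the serialization independent of dict insertion order,
    # so payloads with the same fields and values map to the same key.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{endpoint_id}:{canonical}".encode("utf-8")).hexdigest()


a = cache_key("summarize", {"text": "hello", "lang": "en"})
b = cache_key("summarize", {"lang": "en", "text": "hello"})   # same fields, different order
c = cache_key("summarize", {"text": "hello!", "lang": "en"})  # one character differs

print(a == b)  # True
print(a == c)  # False
```

Under this scheme, even a one-character change in any input field yields a different key and therefore a cache miss.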
When to Use Caching
Good for Caching
- Reference data lookups
- Static content generation
- Classification tasks
- Repeated queries
Avoid Caching
- User-specific content
- Time-sensitive data
- Random/creative outputs
- Conversational contexts
Cache Bypass
During testing, you may want fresh responses. The test runner includes a “Bypass Cache” option that forces a new LLM call. You can also bypass the cache programmatically.
Rate Limits
Protect your endpoints from abuse with rate limiting.
Rate Limit Settings
| Setting | Description |
|---|---|
| Requests per Minute | Maximum requests allowed per minute |
| Requests per Hour | Maximum requests allowed per hour |
| Requests per Day | Maximum requests allowed per day |
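The table does not say how the three limits interact. One plausible reading is that a request must satisfy all of them at once. The sketch below shows that idea with a simple sliding-window counter; the class name `MultiWindowLimiter` and the sliding-window approach are assumptions for illustration, not a description of how the platform enforces limits.

```python
import time
from collections import deque
from typing import Optional


class MultiWindowLimiter:
    """Sliding-window limiter that enforces minute, hour, and day limits together.

    Hypothetical sketch; the platform's real algorithm is not documented here.
    """

    def __init__(self, per_minute: int, per_hour: int, per_day: int):
        # Each entry is (window length in seconds, max requests in that window).
        self.limits = [(60, per_minute), (3600, per_hour), (86400, per_day)]
        self.timestamps = deque()  # times of accepted requests, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget requests that are older than the longest window.
        longest = max(window for window, _ in self.limits)
        while self.timestamps and now - self.timestamps[0] >= longest:
            self.timestamps.popleft()
        # Reject if any single window is already full.
        for window, limit in self.limits:
            recent = sum(1 for t in self.timestamps if now - t < window)
            if recent >= limit:
                return False
        self.timestamps.append(now)
        return True


limiter = MultiWindowLimiter(per_minute=2, per_hour=3, per_day=10)
results = [limiter.allow(now=t) for t in (0, 1, 2, 61, 62, 3700)]
print(results)  # [True, True, False, True, False, True]
```

In the example, the third request is rejected by the per-minute limit and the fifth by the per-hour limit, which shows why the tightest window at any moment decides the outcome.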
Rate Limit Headers
Responses include rate limit information in their headers. When a limit is exceeded, the endpoint returns a 429 Too Many Requests response.
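A client can react to a 429 by waiting before it retries. The sketch below computes a wait time from a response's status and headers. `Retry-After` is a standard HTTP header, but whether this platform sends it is an assumption, and the helper name `retry_delay` and its backoff defaults are hypothetical.

```python
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if no retry is needed."""
    if status != 429:
        return None
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: the server may send Retry-After as a whole number of seconds.
    value = lowered.get("retry-after")
    if value is not None and value.strip().isdigit():
        return min(float(value), cap)
    # Fallback: exponential backoff (1s, 2s, 4s, ...) capped at `cap` seconds.
    return min(base * (2 ** attempt), cap)


print(retry_delay(200, {}, attempt=0))                     # None
print(retry_delay(429, {"Retry-After": "12"}, attempt=0))  # 12.0
print(retry_delay(429, {}, attempt=3))                     # 8.0
print(retry_delay(429, {}, attempt=10))                    # 60.0
```

Capping the delay keeps a long run of failures from producing unbounded waits.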
Visibility
Control who can see and use this endpoint.
| Setting | Description |
|---|---|
| Public | Visible to all team members |
| Internal | Only visible to you and admins |
Timeout Settings
Configure how long to wait for LLM responses.
| Setting | Default | Description |
|---|---|---|
| Request Timeout | 60s | Maximum time to wait for LLM response |
Danger Zone
Delete Endpoint
Permanently removes the endpoint and all associated:
- Prompts and versions
- Execution logs
- Cached responses
To delete an endpoint:
1. Click Delete Endpoint
2. Type the endpoint name to confirm
3. Click Permanently Delete
Deleting an endpoint will break any applications calling it. Ensure nothing depends on the endpoint before deleting.
Configuration Best Practices
Start without caching
Enable caching only after you’ve stabilized your prompts and confirmed outputs are deterministic.
Set conservative rate limits
Start with lower limits and increase based on actual usage patterns.
Use Internal during development
Keep endpoints Internal while testing, then make Public when ready.
Document settings choices
Use the endpoint description to note why certain settings were chosen.
Settings by Use Case
| Use Case | Cache | Rate Limit | Timeout |
|---|---|---|---|
| Search/lookup | Yes, 1 hour | 100/min | 30s |
| Content generation | No | 20/min | 60s |
| Document analysis | No | 10/min | 120s |
| Classification | Yes, 24 hours | 200/min | 30s |
| Chatbot | No | 30/min | 45s |
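Because Cache Duration is entered in seconds, the hour-based values in the table need converting. The sketch below records the table as data and performs that conversion. The `EndpointSettings` class, its field names, and the preset keys are hypothetical and do not correspond to any documented API.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 60 * 60  # seconds in one hour


@dataclass(frozen=True)
class EndpointSettings:
    cache_seconds: Optional[int]  # None means caching is disabled
    requests_per_minute: int
    timeout_seconds: int


# Values copied from the "Settings by Use Case" table above.
PRESETS = {
    "search_lookup": EndpointSettings(1 * HOUR, 100, 30),
    "content_generation": EndpointSettings(None, 20, 60),
    "document_analysis": EndpointSettings(None, 10, 120),
    "classification": EndpointSettings(24 * HOUR, 200, 30),
    "chatbot": EndpointSettings(None, 30, 45),
}

for name, s in PRESETS.items():
    cache = f"{s.cache_seconds}s" if s.cache_seconds is not None else "off"
    print(f"{name}: cache={cache}, limit={s.requests_per_minute}/min, "
          f"timeout={s.timeout_seconds}s")
```

A one-hour cache is therefore entered as 3600 and a 24-hour cache as 86400.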

