Shared Inference
User Manual
Complete guide for the Shared Inference tier. Multi-tenant shared GPU infrastructure with pay-per-use pricing at $3.00 per 1M tokens and no base fee.
Getting Started
1. Create Your Account
Visit solacesentry.com/signup to create your account. You will need a valid email address and a password that meets our security requirements.
After signing up, you will be redirected to the pricing page to select your plan and safety domains.
2. Choose the Shared Inference Plan
Select "Shared Inference" from the pricing page. Choose one or more of the 25 safety domains relevant to your use case. You can add or remove domains at any time from your dashboard.
3. Get Your API Key
After subscribing, navigate to your dashboard. Your API key will be available under the API Keys section. API keys follow the format:
sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Important: Keep your API key secret. Do not share it in public repositories, client-side code, or logs. If compromised, rotate it immediately from your dashboard.
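One common way to keep the key out of source code and logs is to read it from an environment variable at startup. The sketch below assumes a variable named SOLACESENTRY_API_KEY; that name, and the helper functions, are illustrative and not part of the SolaceSentry API.

```python
import os

# Hypothetical variable name; the manual does not define one.
ENV_VAR = "SOLACESENTRY_API_KEY"


def load_api_key(environ=os.environ):
    """Return the API key from the environment, or raise if it is missing."""
    key = environ.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} before starting the application")
    return key


def redact(key):
    """Return a log-safe form of the key: prefix kept, secret part masked."""
    prefix, _, _ = key.rpartition("_")
    return f"{prefix}_****" if prefix else "****"
```

Logging redact(key) instead of the key itself still shows which key type is in use without exposing the secret.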
API Authentication
All API requests must include your API key in the Authorization header using the Bearer token scheme.
Authorization: Bearer sk_live_your_key_here
Key Prefixes
sk_live_
Production key. Use in your production environment. All requests are billed.
sk_test_
Test key. Use for development and testing. Requests are free but rate-limited.
sk_dev_
Development key. Local development with mock responses available.
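Because only sk_live_ requests are billed, it can help to check the prefix before a key is used, for example to stop a test suite from running against production. This is a minimal sketch; key_environment, auth_headers, and the allow_live flag are hypothetical helpers, not SDK functions.

```python
# Prefix-to-environment mapping taken from the Key Prefixes list above.
KEY_PREFIXES = {
    "sk_live_": "production",
    "sk_test_": "test",
    "sk_dev_": "development",
}


def key_environment(api_key):
    """Return the environment name implied by the key's prefix."""
    for prefix, environment in KEY_PREFIXES.items():
        if api_key.startswith(prefix):
            return environment
    raise ValueError("unrecognized API key prefix")


def auth_headers(api_key, allow_live=False):
    """Build the Authorization header, refusing billed keys unless allowed."""
    if key_environment(api_key) == "production" and not allow_live:
        raise ValueError("production key rejected; pass allow_live=True to use it")
    return {"Authorization": f"Bearer {api_key}"}
```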
Submitting Observations
Observations are the data points you submit for violation detection. Each observation contains a payload with domain-specific data that the inference engine processes.
Using curl
curl -X POST https://api.solacesentry.com/v1/projects/{project_id}/observations \
  -H "Authorization: Bearer sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "payload": {
      "temperature": "39.5",
      "heart_rate": "120",
      "blood_pressure_systolic": "85",
      "domain": "clinical"
    }
  }'
Using Python (requests)
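import requests

# Replace with your own project ID and API key.
project_id = "proj_your_project"
url = f"https://api.solacesentry.com/v1/projects/{project_id}/observations"
headers = {
    "Authorization": "Bearer sk_live_your_key_here",
    "Content-Type": "application/json",
}
payload = {
    "payload": {
        "temperature": "39.5",
        "heart_rate": "120",
        "blood_pressure_systolic": "85",
        "domain": "clinical",
    }
}

# json= serializes the payload as the JSON request body.
response = requests.post(url, json=payload, headers=headers, timeout=10)
response.raise_for_status()  # fail loudly on 4xx/5xx responses
print(response.json())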
Observation Response
{
  "observation_id": "obs_a1b2c3d4e5f6",
  "project_id": "proj_your_project",
  "status": "accepted",
  "created_at": "2026-02-11T10:30:00Z"
}
Running Inference
After submitting one or more observations, call the inference endpoint to run violation detection. The engine analyzes accumulated evidence and returns a classification with a human-readable narrative.
Using curl
curl -X POST https://api.solacesentry.com/v1/projects/{project_id}/infer \
  -H "Authorization: Bearer sk_live_your_key_here" \
  -H "Content-Type: application/json"
Using Python (requests)
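import requests

# Replace with your own project ID and API key.
project_id = "proj_your_project"
url = f"https://api.solacesentry.com/v1/projects/{project_id}/infer"
headers = {
    "Authorization": "Bearer sk_live_your_key_here",
    "Content-Type": "application/json",
}

# The inference call takes no request body.
response = requests.post(url, headers=headers, timeout=30)
response.raise_for_status()  # fail loudly on 4xx/5xx responses
result = response.json()
print("Classification:", result["classification"])
print("Narrative:", result["narrative"])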
Inference Response Format
{
  "classification": "veto",
  "narrative": "Patient vitals indicate critical hypotension (systolic 85mmHg) combined with tachycardia (120bpm) and fever (39.5C). This pattern is consistent with septic shock. Immediate clinical intervention is required.",
  "decision_trace": {
    "sparse_gate": "passed",
    "evidence_state": { ... },
    "judge_verdicts": [
      { "judge": "safety", "verdict": "veto", "confidence": 0.97 },
      { "judge": "policy", "verdict": "veto", "confidence": 0.94 },
      { "judge": "consistency", "verdict": "concern", "confidence": 0.82 },
      { "judge": "viability", "verdict": "approve", "confidence": 0.88 }
    ],
    "tribunal_outcome": "veto"
  },
  "evidence_state": {
    "current_weight": 0.87,
    "observation_count": 3
  }
}
Understanding Results
Classification Levels
veto
A hard safety violation has been identified. The system has determined that proceeding would pose unacceptable risk. Immediate human review is required. In safety-critical domains (healthcare, autonomous), this means stop and escalate.
concern
The system has identified patterns that warrant attention but do not constitute an immediate safety violation. Review the narrative and evidence to determine if intervention is needed. Additional observations may clarify the situation.
approve
The submitted observations fall within expected parameters. No safety violations or concerns have been identified based on current evidence. Continue normal operations.
Narrative
Every inference response includes a human-readable narrative explaining the decision. Narratives are:
- Grounded in evidence -- every claim in the narrative maps to an observed data point (INV-8)
- Limited to 2 generation attempts -- if narrative generation fails twice, a fallback summary is used (INV-6)
- Deterministic -- the same evidence always produces the same classification
Decision Trace
The decision_trace field provides full explainability into how the decision was reached. SolaceSentry never operates as a black box.
The trace includes which judges voted and how, the tribunal consensus mechanism, and the evidence that informed each verdict.
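As an illustration of reading the trace, the sketch below works on the parsed JSON of an inference response. It uses only field names shown in the Inference Response Format above; the helper names summarize_trace and dissenting_judges are hypothetical.

```python
def summarize_trace(result):
    """Return one line for the tribunal outcome and one per judge verdict."""
    trace = result["decision_trace"]
    lines = [f"tribunal outcome: {trace['tribunal_outcome']}"]
    for verdict in trace["judge_verdicts"]:
        lines.append(
            f"{verdict['judge']}: {verdict['verdict']} ({verdict['confidence']:.2f})"
        )
    return lines


def dissenting_judges(result):
    """Return the judges whose verdict differs from the tribunal outcome."""
    trace = result["decision_trace"]
    outcome = trace["tribunal_outcome"]
    return [v["judge"] for v in trace["judge_verdicts"] if v["verdict"] != outcome]


# Sample data copied from the response format shown in this manual.
sample = {
    "classification": "veto",
    "decision_trace": {
        "sparse_gate": "passed",
        "judge_verdicts": [
            {"judge": "safety", "verdict": "veto", "confidence": 0.97},
            {"judge": "policy", "verdict": "veto", "confidence": 0.94},
            {"judge": "consistency", "verdict": "concern", "confidence": 0.82},
            {"judge": "viability", "verdict": "approve", "confidence": 0.88},
        ],
        "tribunal_outcome": "veto",
    },
}

for line in summarize_trace(sample):
    print(line)
```

Listing the dissenting judges is a quick way to see where the tribunal was not unanimous before reading the full narrative.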
Evidence & Expectations
Evidence State
Evidence accumulates over time as you submit observations. A core invariant of SolaceSentry is that evidence never decays (INV-2). Once an observation contributes evidence, that evidence weight can only increase.
Get Current Evidence
curl -X GET https://api.solacesentry.com/v1/projects/{project_id}/evidence \
  -H "Authorization: Bearer sk_live_your_key_here"
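The no-decay invariant suggests a simple client-side sanity check: across successive reads, the evidence weight should never go down. The sketch below assumes you have collected current_weight values from consecutive evidence responses, and that the evidence response carries the same current_weight field as the evidence_state object in the inference response. The function name and the readings are illustrative.

```python
def first_decay(weights):
    """Return the index of the first reading lower than its predecessor, or None."""
    for i in range(1, len(weights)):
        if weights[i] < weights[i - 1]:
            return i
    return None


# Illustrative readings, not real API output.
readings = [0.41, 0.63, 0.63, 0.87]
print(first_decay(readings))
```

A non-None result would indicate either a client bug (for example, mixing readings from two projects) or behavior that contradicts INV-2 and is worth reporting to support.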
Setting Expectations
Expectations define the bounds you expect your data to stay within. When observations violate expectations, this contributes stronger evidence to violation detection.
Set Expectations
curl -X POST https://api.solacesentry.com/v1/projects/{project_id}/expectations \
  -H "Authorization: Bearer sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "expectations": [
      {
        "field": "temperature",
        "min": "36.0",
        "max": "38.5",
        "unit": "celsius"
      },
      {
        "field": "heart_rate",
        "min": "60",
        "max": "100",
        "unit": "bpm"
      }
    ]
  }'
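To preview which fields of an observation fall outside the bounds you have set, you can run the same comparison locally before submitting. This is a rough sketch, not the engine's logic: it assumes bounds are inclusive and that values are numeric strings as in the examples above. find_violations is a hypothetical helper.

```python
def find_violations(payload, expectations):
    """Return (field, value, min, max) for each payload field outside its bounds."""
    violations = []
    for expectation in expectations:
        field = expectation["field"]
        if field not in payload:
            continue  # nothing to check for this expectation
        value = float(payload[field])
        low, high = float(expectation["min"]), float(expectation["max"])
        # Assumption: both bounds are inclusive.
        if not low <= value <= high:
            violations.append((field, value, low, high))
    return violations


expectations = [
    {"field": "temperature", "min": "36.0", "max": "38.5", "unit": "celsius"},
    {"field": "heart_rate", "min": "60", "max": "100", "unit": "bpm"},
]
payload = {"temperature": "39.5", "heart_rate": "120", "domain": "clinical"}

for field, value, low, high in find_violations(payload, expectations):
    print(f"{field}={value} is outside [{low}, {high}]")
```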
Python SDK
The official Python SDK provides an async-first interface to the SolaceSentry API. Install it with pip:
pip install solace-sentry
Complete Example
import asyncio

from solace_sentry.sdk import SolaceSentryClient


async def main():
    client = SolaceSentryClient(
        api_key="sk_live_your_key_here",
        base_url="https://api.solacesentry.com"
    )

    # Submit an observation
    obs = await client.observations.create(
        project_id="proj_abc123",
        payload={
            "temperature": "39.5",
            "heart_rate": "120",
            "domain": "clinical"
        }
    )
    print(f"Observation: {obs.observation_id}")

    # Run inference
    result = await client.inference.create(project_id="proj_abc123")
    print(f"Classification: {result.classification}")
    print(f"Narrative: {result.narrative}")

    # Access decision trace for full explainability
    for verdict in result.decision_trace.judge_verdicts:
        print(f"  {verdict.judge}: {verdict.verdict} ({verdict.confidence:.2f})")

    # Get current evidence state
    evidence = await client.evidence.get(project_id="proj_abc123")
    print(f"Evidence weight: {evidence.current_weight}")


asyncio.run(main())
Using the Interpreter
The Interpreter (Clinical Reasoning Workbench) is a natural language interface available in your customer dashboard under each entitlement. It supports all 25 safety domains and provides an intuitive way to explore your data without writing code.
Accessing the Interpreter
- Log in to your dashboard at solacesentry.com/client/dashboard
- Navigate to Entitlements
- Click the Interpreter button on any active entitlement
- Type your question in natural language
Supported Query Intents
assess_risk
"What is the current risk level?"
explain_decision
"Why was this vetoed?"
compare_scenarios
"Compare outcomes A and B"
list_violations
"Show all detected violations"
show_evidence
"What evidence has been collected?"
trace_decision
"Trace how this decision was made"
suggest_action
"What should I do next?"
summarize_state
"Give me a summary"
query_history
"Show recent observations"
check_compliance
"Are we compliant with policy X?"
forecast_trend
"What trends do you see?"
validate_data
"Is this data valid?"
Safety Domains
SolaceSentry supports 25 safety domains across critical industries. You select your domains during subscription and can modify them from your dashboard at any time.
Healthcare
healthcare_ops
clinical
pharma
lab
Financial
revenue
financial
insurance
claims
fraud
Legal & Regulatory
legal
regulatory
government
Cyber & Security
cybersec
threat
incident
ai_governance
Industrial
manufacturing
supply_chain
energy
infrastructure
Transport & People
aviation
autonomous
safety_eng
hr
Hard Invariants
SolaceSentry enforces 8 hard invariants that can never be violated regardless of configuration, input data, or system state. These invariants are the foundation of the system's safety guarantees.
INV-1: Sparse Gate
Fast-path bypass for trivial observations. Non-critical data is filtered early to reduce latency.
INV-2: No-Decay Evidence
Evidence weights never decrease. Once observed, evidence can only accumulate.
INV-3: Lazy Staleness
Stale evidence is detected lazily at read time rather than actively expired.
INV-4: Fast Gate Before Planning
Planning is only invoked if necessary. Simple cases are resolved without the planner.
INV-5: Planning Gated
Crisis check always runs before any planning operation to ensure immediate threats are handled first.
INV-6: Max 2 Narrative Attempts
Narrative generation is limited to 2 attempts. If both fail, a deterministic fallback is used.
INV-7: Record Immutability
Records cannot be modified after creation. This ensures auditability and prevents tampering.
INV-8: Narrative Reads Record Only
Narratives are always grounded in recorded evidence. No unsubstantiated claims.
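Two of these invariants, record immutability and lazy staleness, can be pictured with a small toy model. The sketch below is only an illustration of the ideas and says nothing about how SolaceSentry implements them; the Record class, its fields, and the 24-hour max_age are all assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Record:
    observation_id: str
    weight: float
    created_at: datetime


def is_stale(record, now, max_age=timedelta(hours=24)):
    """Decide staleness when the record is read; nothing is expired in the background."""
    return now - record.created_at > max_age


record = Record(
    observation_id="obs_a1b2c3d4e5f6",
    weight=0.87,
    created_at=datetime(2026, 2, 11, 10, 30, tzinfo=timezone.utc),
)

try:
    record.weight = 0.10  # any attempt to modify the record is rejected
except FrozenInstanceError:
    print("record is immutable")
```

Here staleness is a property computed by the reader from the stored timestamp, so the record itself never has to change.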
Rate Limits
The Shared Inference tier has standard rate limits to ensure fair usage across all tenants on the shared GPU infrastructure.
| Endpoint | Rate Limit | Burst |
|---|---|---|
| Observations | 60 requests/min | 10 |
| Inference | 30 requests/min | 5 |
| Evidence / Expectations | 120 requests/min | 20 |
| Health Check | 600 requests/min | 100 |
When rate limited, you will receive a 429 Too Many Requests response with a Retry-After header indicating how many seconds to wait before retrying.
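If you call the API without the SDK, a retry loop along these lines honors Retry-After and falls back to exponential backoff when the header is absent. It is a sketch under stated assumptions: send is any zero-argument callable you supply that performs one request and returns (status_code, headers, body), and Retry-After is taken to be a number of seconds, as described above. call_with_retry is not an SDK function.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
        retry_after = headers.get("Retry-After")
        delay = float(retry_after) if retry_after is not None else 2.0 ** attempt
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the loop testable: a test can supply a function that records the delays instead of actually waiting.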
Billing & Usage
Pay-Per-Use Pricing
The Shared Inference tier uses straightforward pay-per-use pricing with no base fee and no minimum commitment.
$3.00 / 1M tokens
No base fee. No minimum commitment.
Usage Dashboard
Track your token usage in real time from your dashboard. You can:
- View current billing period usage
- Download usage reports (CSV)
- View historical invoices
- Set usage alerts
- Export audit logs
What Counts as a Token?
Tokens are the units processed by SolaceSentry's custom BPE tokenizer, optimized for safety-domain vocabulary. Both input (observation payloads) and output (inference results) tokens are counted. The tokenizer is deterministic -- the same input always produces the same token count.
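Since both input and output tokens are billed at the same rate, a cost estimate is a single multiplication. The sketch below uses the $3.00 per 1M tokens price from this manual; the function name is hypothetical, and rounding to whole cents is an assumption that may differ from how invoices are rounded.

```python
from decimal import Decimal

PRICE_PER_MILLION_TOKENS = Decimal("3.00")  # USD, Shared Inference tier


def estimate_cost(input_tokens, output_tokens):
    """Return the estimated charge in USD for the given token counts."""
    total_tokens = input_tokens + output_tokens
    cost = PRICE_PER_MILLION_TOKENS * total_tokens / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))  # assumption: round to whole cents


# 200,000 input tokens plus 50,000 output tokens
print(estimate_cost(200_000, 50_000))
```

Decimal is used instead of float so that currency amounts do not pick up binary rounding errors.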
API Reference
Base URL: https://api.solacesentry.com
| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/projects/{project_id}/observations | Submit an observation |
| POST | /v1/projects/{project_id}/infer | Run violation inference |
| GET | /v1/projects/{project_id}/evidence | Get current evidence state |
| GET | /v1/projects/{project_id}/expectations | Get expectations |
| POST | /v1/projects/{project_id}/expectations | Set expectations |
| GET | /v1/health | Health check |
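When calling these endpoints directly, a small helper can build the project-scoped URLs from the table so that paths are not retyped throughout a codebase. This is an illustrative sketch; endpoint_url and the PROJECT_RESOURCES set are hypothetical names, and only the paths listed above are used.

```python
from urllib.parse import quote

BASE_URL = "https://api.solacesentry.com"

# Project-scoped resources from the API reference table.
PROJECT_RESOURCES = {"observations", "infer", "evidence", "expectations"}


def endpoint_url(project_id, resource):
    """Return the full URL for a project-scoped resource."""
    if resource not in PROJECT_RESOURCES:
        raise ValueError(f"unknown resource: {resource}")
    # Percent-encode the project ID so it cannot alter the path.
    return f"{BASE_URL}/v1/projects/{quote(project_id, safe='')}/{resource}"


print(endpoint_url("proj_abc123", "infer"))
```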
Support
Email Support
For questions, issues, or feedback, contact our support team:
support@solacesentry.com
Standard response time: within 24 business hours.
Need More Support?
The Dedicated Domain tier includes priority Slack + email support, and the Enterprise Security tier includes a dedicated support engineer. Consider upgrading if you need faster response times or hands-on assistance.
FAQ
Can I switch to Dedicated Domain or Enterprise later?
Yes. You can upgrade at any time from your billing page. Your data and configuration will be migrated to your new dedicated infrastructure.
Is my data shared with other tenants?
No. While the Shared Inference tier uses shared GPU infrastructure for cost efficiency, your data is logically isolated. Each tenant has separate projects, evidence stores, and access controls. For physical isolation, consider the Enterprise Security tier.
What happens if I exceed rate limits?
You will receive a 429 response with a Retry-After header. Implement exponential backoff in your client. The Python SDK handles this automatically.
How is the inference classification determined?
SolaceSentry uses a multi-judge tribunal system. Four specialized judge transformers (safety, policy, consistency, viability) independently assess the evidence. The tribunal then reaches a consensus. If any judge vetoes, the overall result is a veto. Full transparency is provided via the decision_trace field.
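The one consensus rule stated here, that any veto makes the overall result a veto, can be sketched as follows. How the tribunal resolves cases without a veto is not documented in this manual, so the "concern" and "approve" branches below are assumptions added only to make the example complete.

```python
def tribunal_outcome(verdicts):
    """Combine judge verdicts; only the veto rule is documented behavior."""
    if not verdicts:
        raise ValueError("at least one judge verdict is required")
    values = [v["verdict"] for v in verdicts]
    if "veto" in values:
        return "veto"  # documented: any single veto vetoes the result
    if "concern" in values:
        return "concern"  # ASSUMPTION: not specified in the manual
    return "approve"  # ASSUMPTION: not specified in the manual


verdicts = [
    {"judge": "safety", "verdict": "veto", "confidence": 0.97},
    {"judge": "policy", "verdict": "veto", "confidence": 0.94},
    {"judge": "consistency", "verdict": "concern", "confidence": 0.82},
    {"judge": "viability", "verdict": "approve", "confidence": 0.88},
]
print(tribunal_outcome(verdicts))
```

For real decisions, rely on the tribunal_outcome value returned in decision_trace rather than recomputing it client-side.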
Can I use SolaceSentry for HIPAA-regulated data?
The Shared Inference tier is not designed for HIPAA-regulated data. For HIPAA compliance, isolated infrastructure, and BAA, please use the Enterprise Security tier.
What is the uptime guarantee?
The Shared Inference tier operates on a best-effort basis: it has an availability target but no contractual guarantee. For SLA-backed uptime guarantees (99.9%), the Enterprise Security tier is recommended.
How do I rotate my API key?
Navigate to your dashboard, go to API Keys, and click "Rotate Key." Your old key will be invalidated immediately and a new key will be generated. Update your applications with the new key.