Shared Inference Tier

User Manual

Complete guide to the Shared Inference tier, which runs on multi-tenant shared GPU infrastructure with pay-per-use pricing of $3.00 per 1M tokens and no base fee.

Getting Started

1. Create Your Account

Visit solacesentry.com/signup to create your account. You will need a valid email address and a password that meets our security requirements.

After signing up, you will be redirected to the pricing page to select your plan and safety domains.

2. Choose the Shared Inference Plan

Select "Shared Inference" from the pricing page. Choose one or more of the 25 safety domains relevant to your use case. You can add or remove domains at any time from your dashboard.

3. Get Your API Key

After subscribing, navigate to your dashboard. Your API key will be available under the API Keys section. API keys follow the format:

sk_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Important: Keep your API key secret. Do not share it in public repositories, client-side code, or logs. If compromised, rotate it immediately from your dashboard.
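
One common way to keep the key out of source code and logs is to read it from an environment variable and redact it before printing. The sketch below is illustrative only: the variable name SOLACESENTRY_API_KEY and both helper functions are assumptions, not part of the SolaceSentry API.

```python
import os


def load_api_key(env_var: str = "SOLACESENTRY_API_KEY") -> str:
    """Read the API key from the environment (variable name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


def redact_key(api_key: str) -> str:
    """Return a log-safe form: the first 7 and last 4 characters only."""
    return f"{api_key[:7]}...{api_key[-4:]}"
```

Logging redact_key(key) instead of the key itself keeps enough of the value to tell keys apart without exposing the secret.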

API Authentication

All API requests must include your API key in the Authorization header using the Bearer token scheme.

Authorization: Bearer sk_live_your_key_here

Key Prefixes

Prefix     Key type          Usage
sk_live_   Production key    Use in your production environment. All requests are billed.
sk_test_   Test key          Use for development and testing. Requests are free but rate-limited.
sk_dev_    Development key   Use for local development. Mock responses are available.
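
Because the prefix determines whether requests are billed, it can be worth checking the prefix before a test suite or script runs. This is a minimal client-side sketch; the function names are hypothetical and the mapping simply mirrors the prefixes listed above.

```python
# Prefix-to-environment mapping taken from the key prefixes listed above.
KEY_PREFIXES = {
    "sk_live_": "production",
    "sk_test_": "test",
    "sk_dev_": "development",
}


def key_environment(api_key: str) -> str:
    """Return the environment implied by the key prefix."""
    for prefix, environment in KEY_PREFIXES.items():
        if api_key.startswith(prefix):
            return environment
    raise ValueError("Unrecognized API key prefix")


def is_billed(api_key: str) -> bool:
    """Only production (sk_live_) keys are described as billed."""
    return key_environment(api_key) == "production"
```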

Submitting Observations

Observations are the data points you submit for violation detection. Each observation contains a payload with domain-specific data that the inference engine processes.

Using curl

curl -X POST https://api.solacesentry.com/v1/projects/{project_id}/observations \
  -H "Authorization: Bearer sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "payload": {
      "temperature": "39.5",
      "heart_rate": "120",
      "blood_pressure_systolic": "85",
      "domain": "clinical"
    }
  }'

Using Python (requests)

import requests

project_id = "proj_your_project"  # replace with your project ID
url = f"https://api.solacesentry.com/v1/projects/{project_id}/observations"
headers = {
    "Authorization": "Bearer sk_live_your_key_here",
    "Content-Type": "application/json"
}
payload = {
    "payload": {
        "temperature": "39.5",
        "heart_rate": "120",
        "blood_pressure_systolic": "85",
        "domain": "clinical"
    }
}

# Send the observation; fail loudly on 4xx/5xx instead of parsing an error body
response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json())

Observation Response

{
  "observation_id": "obs_a1b2c3d4e5f6",
  "project_id": "proj_your_project",
  "status": "accepted",
  "created_at": "2026-02-11T10:30:00Z"
}

Running Inference

After submitting one or more observations, call the inference endpoint to run violation detection. The engine analyzes accumulated evidence and returns a classification with a human-readable narrative.

Using curl

curl -X POST https://api.solacesentry.com/v1/projects/{project_id}/infer \
  -H "Authorization: Bearer sk_live_your_key_here" \
  -H "Content-Type: application/json"

Using Python (requests)

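import requests

project_id = "proj_your_project"  # replace with your project ID
url = f"https://api.solacesentry.com/v1/projects/{project_id}/infer"
headers = {
    "Authorization": "Bearer sk_live_your_key_here",
    "Content-Type": "application/json"
}

# Run inference; fail loudly on 4xx/5xx instead of parsing an error body
response = requests.post(url, headers=headers, timeout=30)
response.raise_for_status()
result = response.json()

print("Classification:", result["classification"])
print("Narrative:", result["narrative"])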

Inference Response Format

{
  "classification": "veto",
  "narrative": "Patient vitals indicate critical hypotension (systolic 85mmHg) combined with tachycardia (120bpm) and fever (39.5C). This pattern is consistent with septic shock. Immediate clinical intervention is required.",
  "decision_trace": {
    "sparse_gate": "passed",
    "evidence_state": { ... },
    "judge_verdicts": [
      { "judge": "safety", "verdict": "veto", "confidence": 0.97 },
      { "judge": "policy", "verdict": "veto", "confidence": 0.94 },
      { "judge": "consistency", "verdict": "concern", "confidence": 0.82 },
      { "judge": "viability", "verdict": "approve", "confidence": 0.88 }
    ],
    "tribunal_outcome": "veto"
  },
  "evidence_state": {
    "current_weight": 0.87,
    "observation_count": 3
  }
}

Understanding Results

Classification Levels

VETO Critical violation detected

A hard safety violation has been identified. The system has determined that proceeding would pose unacceptable risk. Immediate human review is required. In safety-critical domains (healthcare, autonomous), this means stop and escalate.

CONCERN Potential issue identified

The system has identified patterns that warrant attention but do not constitute an immediate safety violation. Review the narrative and evidence to determine if intervention is needed. Additional observations may clarify the situation.

APPROVE No violations detected

The submitted observations fall within expected parameters. No safety violations or concerns have been identified based on current evidence. Continue normal operations.
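
A client will typically branch on the classification value. The sketch below maps the three levels described above to next steps; the action names are hypothetical placeholders for your own handling logic, and unknown values are rejected rather than silently treated as safe.

```python
# Action names are illustrative; substitute your own handlers.
ACTIONS = {
    "veto": "stop_and_escalate",
    "concern": "review_narrative_and_evidence",
    "approve": "continue_normal_operations",
}


def next_step(classification: str) -> str:
    """Map a classification from the inference response to a next step."""
    try:
        return ACTIONS[classification.lower()]
    except KeyError:
        # Fail closed: an unexpected value should never be read as "approve".
        raise ValueError(f"Unknown classification: {classification!r}") from None
```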

Narrative

Every inference response includes a human-readable narrative explaining the decision. Narratives are:

  • Grounded in evidence -- every claim in the narrative maps to an observed data point (INV-8)
  • Limited to 2 generation attempts -- if narrative generation fails twice, a fallback summary is used (INV-6)
  • Deterministic -- the same evidence always produces the same classification
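
The two-attempt limit can be pictured as a bounded retry with a fallback. This is a sketch of the general pattern only, not SolaceSentry's internal implementation; the generate callable and the fallback string are hypothetical stand-ins.

```python
from typing import Callable

MAX_NARRATIVE_ATTEMPTS = 2  # the limit stated for narrative generation


def narrative_with_fallback(generate: Callable[[], str], fallback: str) -> str:
    """Try generation at most twice, then return the fallback summary."""
    for _ in range(MAX_NARRATIVE_ATTEMPTS):
        try:
            text = generate()
        except Exception:
            continue  # a failed attempt counts against the limit
        if text:
            return text
    return fallback
```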

Decision Trace

The decision_trace field provides full explainability into how the decision was reached. SolaceSentry never operates as a black box. The trace includes which judges voted and how, the tribunal consensus mechanism, and the evidence that informed each verdict.
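
As an illustration of reading the trace, the following sketch groups judges by their verdict. It assumes the decision_trace dictionary has the shape shown in the Inference Response Format example; the function name is hypothetical.

```python
def verdicts_by_outcome(decision_trace: dict) -> dict:
    """Group judge names by verdict, e.g. {"veto": ["safety", "policy"]}."""
    grouped: dict = {}
    for entry in decision_trace.get("judge_verdicts", []):
        grouped.setdefault(entry["verdict"], []).append(entry["judge"])
    return grouped
```

Applied to the example response, this yields safety and policy under "veto", consistency under "concern", and viability under "approve".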

Evidence & Expectations

Evidence State

Evidence accumulates over time as you submit observations. A core invariant of SolaceSentry is that evidence never decays (INV-2). Once an observation contributes evidence, that evidence weight can only increase.
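
The no-decay property can be modeled as a value that is only ever raised. The manual does not specify how weights are combined, so the max() rule below is purely an assumption chosen to make the monotonic behavior visible; the class is a hypothetical illustration.

```python
class EvidenceWeight:
    """Tracks a weight that may rise but never fall (a sketch of INV-2)."""

    def __init__(self) -> None:
        self.current_weight = 0.0
        self.observation_count = 0

    def observe(self, weight: float) -> float:
        # Assumption: combine by taking the maximum, so a weaker
        # observation can never lower the accumulated weight.
        self.observation_count += 1
        self.current_weight = max(self.current_weight, weight)
        return self.current_weight
```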

Get Current Evidence

curl -X GET https://api.solacesentry.com/v1/projects/{project_id}/evidence \
  -H "Authorization: Bearer sk_live_your_key_here"

Setting Expectations

Expectations define the bounds you expect your data to stay within. When observations violate expectations, this contributes stronger evidence to violation detection.

Set Expectations

curl -X POST https://api.solacesentry.com/v1/projects/{project_id}/expectations \
  -H "Authorization: Bearer sk_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "expectations": [
      {
        "field": "temperature",
        "min": "36.0",
        "max": "38.5",
        "unit": "celsius"
      },
      {
        "field": "heart_rate",
        "min": "60",
        "max": "100",
        "unit": "bpm"
      }
    ]
  }'
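
Before submitting, it can be useful to check locally which fields of an observation fall outside the bounds you have set. The sketch below assumes bounds are inclusive and that values are numeric strings, as in the examples above; the function name is hypothetical and the check does not replace server-side evaluation.

```python
def out_of_bounds(payload: dict, expectations: list) -> list:
    """Return the fields in payload whose values violate their min/max bounds."""
    violations = []
    for expectation in expectations:
        field = expectation["field"]
        if field not in payload:
            continue  # nothing to check for this expectation
        value = float(payload[field])
        if value < float(expectation["min"]) or value > float(expectation["max"]):
            violations.append(field)
    return violations
```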

Python SDK

The official Python SDK provides an async-first interface to the SolaceSentry API. Install it with pip:

pip install solace-sentry

Complete Example

import asyncio
from solace_sentry.sdk import SolaceSentryClient

async def main():
    client = SolaceSentryClient(
        api_key="sk_live_your_key_here",
        base_url="https://api.solacesentry.com"
    )

    # Submit an observation
    obs = await client.observations.create(
        project_id="proj_abc123",
        payload={
            "temperature": "39.5",
            "heart_rate": "120",
            "domain": "clinical"
        }
    )
    print(f"Observation: {obs.observation_id}")

    # Run inference
    result = await client.inference.create(project_id="proj_abc123")
    print(f"Classification: {result.classification}")
    print(f"Narrative: {result.narrative}")

    # Access decision trace for full explainability
    for verdict in result.decision_trace.judge_verdicts:
        print(f"  {verdict.judge}: {verdict.verdict} ({verdict.confidence:.2f})")

    # Get current evidence state
    evidence = await client.evidence.get(project_id="proj_abc123")
    print(f"Evidence weight: {evidence.current_weight}")

asyncio.run(main())

Using the Interpreter

The Interpreter (Clinical Reasoning Workbench) is a natural language interface available in your customer dashboard under each entitlement. It supports all 25 safety domains and provides an intuitive way to explore your data without writing code.

Accessing the Interpreter

  1. Log in to your dashboard at solacesentry.com/client/dashboard
  2. Navigate to Entitlements
  3. Click the Interpreter button on any active entitlement
  4. Type your question in natural language

Supported Query Intents

  1. assess_risk -- "What is the current risk level?"
  2. explain_decision -- "Why was this vetoed?"
  3. compare_scenarios -- "Compare outcomes A and B"
  4. list_violations -- "Show all detected violations"
  5. show_evidence -- "What evidence has been collected?"
  6. trace_decision -- "Trace how this decision was made"
  7. suggest_action -- "What should I do next?"
  8. summarize_state -- "Give me a summary"
  9. query_history -- "Show recent observations"
  10. check_compliance -- "Are we compliant with policy X?"
  11. forecast_trend -- "What trends do you see?"
  12. validate_data -- "Is this data valid?"

Safety Domains

SolaceSentry supports 25 safety domains across critical industries. You select your domains during subscription and can modify them from your dashboard at any time.

  • Healthcare -- healthcare_ops, clinical, pharma, lab
  • Financial -- revenue, financial, insurance, claims, fraud
  • Legal & Regulatory -- legal, regulatory, government
  • Cyber & Security -- cybersec, threat, incident, ai_governance
  • Industrial -- manufacturing, supply_chain, energy, infrastructure
  • Transport & People -- aviation, autonomous, safety_eng, hr

Hard Invariants

SolaceSentry enforces 8 hard invariants that can never be violated regardless of configuration, input data, or system state. These invariants are the foundation of the system's safety guarantees.

  1. Sparse Gate -- Fast-path bypass for trivial observations. Non-critical data is filtered early to reduce latency.
  2. No-Decay Evidence -- Evidence weights never decrease. Once observed, evidence can only accumulate.
  3. Lazy Staleness -- Stale evidence is detected lazily at read time rather than actively expired.
  4. Fast Gate Before Planning -- Planning is only invoked if necessary. Simple cases are resolved without the planner.
  5. Planning Gated -- Crisis check always runs before any planning operation to ensure immediate threats are handled first.
  6. Max 2 Narrative Attempts -- Narrative generation is limited to 2 attempts. If both fail, a deterministic fallback is used.
  7. Record Immutability -- Records cannot be modified after creation. This ensures auditability and prevents tampering.
  8. Narrative Reads Record Only -- Narratives are always grounded in recorded evidence. No unsubstantiated claims.

Rate Limits

The Shared Inference tier has standard rate limits to ensure fair usage across all tenants on the shared GPU infrastructure.

Endpoint                  Rate limit         Burst
Observations              60 requests/min    10
Inference                 30 requests/min    5
Evidence / Expectations   120 requests/min   20
Health Check              600 requests/min   100

When rate limited, you will receive a 429 Too Many Requests response with a Retry-After header indicating how many seconds to wait before retrying.
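
A client can honor the Retry-After header and fall back to exponential backoff when the header is missing or unparseable. The sketch below is not the SDK's implementation: call_with_retry and its parameters are hypothetical, and send is any zero-argument callable returning an object with status_code and headers attributes, such as a requests.Response.

```python
import time
from typing import Any, Callable


def call_with_retry(
    send: Callable[[], Any],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send(), retrying on HTTP 429 up to max_retries times."""
    delay = 1.0  # fallback delay in seconds, doubled after every retry
    response = send()
    for _ in range(max_retries):
        if response.status_code != 429:
            return response
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            wait = float(response.headers.get("Retry-After"))
        except (TypeError, ValueError):
            wait = delay
        sleep(wait)
        delay *= 2
        response = send()
    return response  # may still be a 429 if every retry was rate limited
```

With requests, a call might look like call_with_retry(lambda: requests.post(url, json=payload, headers=headers, timeout=30)).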

Billing & Usage

Pay-Per-Use Pricing

The Shared Inference tier uses straightforward pay-per-use pricing with no base fee and no minimum commitment.

$3.00 / 1M tokens

Usage Dashboard

Track your token usage in real time from your dashboard. You can:

  • View current billing period usage
  • Download usage reports (CSV)
  • View historical invoices
  • Set usage alerts
  • Export audit logs

What Counts as a Token?

Tokens are the units processed by SolaceSentry's custom BPE tokenizer, optimized for safety-domain vocabulary. Both input (observation payloads) and output (inference results) tokens are counted. The tokenizer is deterministic -- the same input always produces the same token count.
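
Since both input and output tokens are billed at the same rate, a rough cost estimate is simple arithmetic. The sketch below assumes plain linear pricing at $3.00 per 1M tokens and rounds to the cent; how invoices actually round is not stated here, so treat the result as an estimate. The function name is hypothetical.

```python
from decimal import Decimal

PRICE_PER_MILLION_TOKENS = Decimal("3.00")  # Shared Inference tier rate


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the charge in USD for the given token counts."""
    total_tokens = input_tokens + output_tokens
    cost = PRICE_PER_MILLION_TOKENS * total_tokens / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))  # round to cents (an assumption)
```

For example, 400,000 input tokens plus 100,000 output tokens come to 500,000 tokens, or an estimated $1.50.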

API Reference

Base URL: https://api.solacesentry.com

Method   Endpoint                                 Description
POST     /v1/projects/{project_id}/observations   Submit an observation
POST     /v1/projects/{project_id}/infer          Run violation inference
GET      /v1/projects/{project_id}/evidence       Get current evidence state
GET      /v1/projects/{project_id}/expectations   Get expectations
POST     /v1/projects/{project_id}/expectations   Set expectations
GET      /v1/health                               Health check

Support

Email Support

For questions, issues, or feedback, contact our support team:

support@solacesentry.com

Standard response time: within 24 business hours.

Need More Support?

The Dedicated Domain tier includes priority Slack + email support, and the Enterprise Security tier includes a dedicated support engineer. Consider upgrading if you need faster response times or hands-on assistance.

FAQ

Can I switch to Dedicated Domain or Enterprise later?

Yes. You can upgrade at any time from your billing page. Your data and configuration will be migrated to your new dedicated infrastructure.

Is my data shared with other tenants?

No. While the Shared Inference tier uses shared GPU infrastructure for cost efficiency, your data is logically isolated. Each tenant has separate projects, evidence stores, and access controls. For physical isolation, consider the Enterprise Security tier.

What happens if I exceed rate limits?

You will receive a 429 response with a Retry-After header. Implement exponential backoff in your client. The Python SDK handles this automatically.

How is the inference classification determined?

SolaceSentry uses a multi-judge tribunal system. Four specialized judge transformers (safety, policy, consistency, viability) independently assess the evidence. The tribunal then reaches a consensus. If any judge vetoes, the overall result is a veto. Full transparency is provided via the decision_trace field.
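
The veto rule can be sketched as "the most severe verdict wins". Only the veto behavior is stated above; treating a concern as outranking an approve is an assumption made here for illustration, and the real consensus mechanism may differ. The function name is hypothetical.

```python
# Severity ordering: only "any veto wins" comes from the manual;
# ranking "concern" above "approve" is an assumption.
SEVERITY = {"approve": 0, "concern": 1, "veto": 2}


def tribunal_outcome(judge_verdicts: list) -> str:
    """Return the most severe verdict among the judges."""
    if not judge_verdicts:
        raise ValueError("At least one judge verdict is required")
    verdicts = [entry["verdict"] for entry in judge_verdicts]
    return max(verdicts, key=SEVERITY.__getitem__)
```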

Can I use SolaceSentry for HIPAA-regulated data?

The Shared Inference tier is not designed for HIPAA-regulated data. For HIPAA compliance, isolated infrastructure, and BAA, please use the Enterprise Security tier.

What is the uptime guarantee?

The Shared Inference tier operates on a best-effort basis: availability is a target, not a guarantee. For SLA-backed uptime guarantees (99.9%), the Enterprise Security tier is recommended.

How do I rotate my API key?

Navigate to your dashboard, go to API Keys, and click "Rotate Key." Your old key will be invalidated immediately and a new key will be generated. Update your applications with the new key.