Confidential inference with cryptographic proof

Run sensitive AI inference where model weights and prompts stay protected, even if the host is compromised. Built on AWS Nitro Enclaves with attestation-gated key release and end-to-end encrypted sessions.

Designed for government clouds, defense contractors, and regulated industries.

13k+ Lines of Rust
110 Tests Passing
12.9% Enclave Overhead
0 Pre-release Dependencies
4 CI Gates

Most "secure inference" still leaves gaps

Standard approaches protect data in transit but leave it exposed on the host. EphemeralML assumes the host is compromised and still keeps secrets protected.

Host exposure

The host can see plaintext prompts or decrypted weights at some point in the pipeline.

Transit is not protection

Encryption in transit doesn't prevent data exposure on compromised hosts.

No proof of execution

Compliance teams need proof of what code processed an inference, not just logs.

Circular trust

Key management is often handled by the very environment you are trying not to trust.

A confidential inference gateway

EphemeralML protects model weights, user inputs/outputs, and execution integrity through TEE isolation and attestation-bound cryptography.

Model Weights (IP)

Encrypted at rest, decrypted only inside an attested enclave. The host never sees plaintext keys.

User Data (PII / Classified)

Prompts and outputs are encrypted end-to-end via HPKE sessions. The host relays ciphertext only.

Execution Integrity

Each inference produces an Attested Execution Receipt (AER) with enclave measurements and cryptographic signature.
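
For illustration, such a receipt can be modeled as a small signed structure. The field names and sizes below are assumptions for the sketch, not EphemeralML's actual schema.

    /// Illustrative AER layout; field names and sizes are assumptions.
    struct AttestedExecutionReceipt {
        /// Enclave code measurement (e.g. Nitro PCR0) that ran the inference.
        enclave_measurement: [u8; 48],
        /// Hash of the Ed25519-signed model manifest that was loaded.
        model_manifest_hash: [u8; 32],
        /// Hashes of the encrypted request and response, for audit correlation.
        request_hash: [u8; 32],
        response_hash: [u8; 32],
        /// Timestamp assigned inside the enclave.
        issued_at_unix_ms: u64,
        /// Ed25519 signature by the enclave's attested signing key over the fields above.
        signature: [u8; 64],
    }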

Three steps to confidential inference

1. Verify the enclave

The client verifies the enclave's identity and code measurements against a policy allowlist using the signed COSE/CBOR attestation document.
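
A minimal sketch of that allowlist check, assuming the attestation document has already been COSE-verified and its code measurement (PCR0) extracted; the type and field names are illustrative, not the project's API:

    /// Illustrative client-side allowlist; type and field names are assumptions.
    struct AttestationPolicy {
        /// Hex-encoded enclave image measurements (PCR0) the client accepts.
        allowed_pcr0: Vec<String>,
    }

    impl AttestationPolicy {
        /// Accept the enclave only if its verified measurement is on the allowlist.
        fn is_allowed(&self, verified_pcr0_hex: &str) -> bool {
            self.allowed_pcr0
                .iter()
                .any(|m| m.eq_ignore_ascii_case(verified_pcr0_hex))
        }
    }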

2. Establish an encrypted session

All requests and responses are encrypted via HPKE (X25519 + ChaCha20-Poly1305). The host forwards ciphertext only.
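
A minimal client-side sketch of sealing one request, assuming the rust-hpke crate (X25519-HKDF-SHA256 KEM with ChaCha20-Poly1305); how the session is bound to attestation via the HPKE info input is an assumption here, not the project's exact wire format:

    use hpke::{
        aead::ChaCha20Poly1305, kdf::HkdfSha256, kem::X25519HkdfSha256,
        Kem, OpModeS, Serializable,
    };

    type SessionKem = X25519HkdfSha256;

    /// Seal one request to the enclave. The host only ever forwards
    /// `encapped_key` and `ciphertext`; it holds no key material.
    fn seal_request(
        enclave_pk: &<SessionKem as Kem>::PublicKey, // from the verified attestation
        session_binding: &[u8], // e.g. attestation hash + nonce (assumed layout)
        plaintext: &[u8],
    ) -> (Vec<u8>, Vec<u8>) {
        let mut rng = rand::thread_rng();
        let (encapped_key, mut ctx) =
            hpke::setup_sender::<ChaCha20Poly1305, HkdfSha256, SessionKem, _>(
                &OpModeS::Base,
                enclave_pk,
                session_binding, // mixed into key derivation so keys cannot be swapped across sessions
                &mut rng,
            )
            .expect("HPKE setup failed");
        let ciphertext = ctx.seal(plaintext, b"").expect("seal failed");
        (encapped_key.to_bytes().to_vec(), ciphertext)
    }

The enclave side would run the matching receiver setup and open the ciphertext; the host relay only moves opaque bytes.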

3. Load models with gated keys

Model decryption keys are released only when KMS confirms the enclave measurement matches policy. The host never sees plaintext keys.
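
On AWS, this gate is expressed in the KMS key policy using Nitro Enclaves attestation condition keys. The statement below is an illustrative sketch with placeholder account, role, and measurement values, not the deployed policy:

    /// Illustrative KMS key-policy statement (placeholder ARNs and PCR0 value).
    /// kms:Decrypt succeeds only when the request carries an attestation document
    /// whose image measurement matches the approved enclave build.
    const MODEL_KEY_POLICY_STATEMENT: &str = r#"{
      "Sid": "AllowDecryptOnlyFromApprovedEnclave",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/ephemeralml-host" },
      "Action": "kms:Decrypt",
      "Resource": "*",
      "Condition": {
        "StringEqualsIgnoreCase": {
          "kms:RecipientAttestation:ImageSha384": "<approved-enclave-pcr0>"
        }
      }
    }"#;

A request without a valid attestation document, or with a non-matching measurement, is denied, so a compromised host cannot obtain the model key.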

What we guarantee (and what we don't)

We guarantee

  • Host blindness: the host relays ciphertext only and cannot decrypt prompts, outputs, or model keys
  • Attestation-gated key release: model DEKs are released only to approved enclave measurements
  • Session binding: encryption keys are bound to attestation + nonce to prevent key swapping
  • Model integrity: Ed25519-signed manifests prevent serving a different model blob (see the sketch after this list)
  • Auditability: each inference produces a verifiable Attested Execution Receipt (AER)
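
A minimal sketch of the manifest signature check, assuming ed25519-dalek v2 and a publisher verifying key pinned inside the enclave image; the manifest layout itself is an assumption:

    use ed25519_dalek::{Signature, Verifier, VerifyingKey};

    /// Verify an Ed25519-signed model manifest before loading the model blob.
    /// After this check the enclave would also compare the blob's hash against
    /// the hash recorded in the manifest (layout assumed, not shown).
    fn verify_manifest(
        publisher_key: &[u8; 32],   // pinned inside the enclave image
        manifest_bytes: &[u8],      // canonical serialized manifest
        signature_bytes: &[u8; 64],
    ) -> bool {
        let Ok(verifying_key) = VerifyingKey::from_bytes(publisher_key) else {
            return false;
        };
        let signature = Signature::from_bytes(signature_bytes);
        verifying_key.verify(manifest_bytes, &signature).is_ok()
    }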

We explicitly do not claim (v1)

  • Protection against all microarchitectural side-channels
  • Availability guarantees (the host can DoS)
  • Confidentiality under full enclave compromise
  • Multi-cloud or confidential GPU support

Three-zone trust model

Client (Trusted)

Verifies attestation, holds policy allowlists, establishes HPKE sessions

Host (Untrusted)

Networking, storage, and AWS API proxy. Forwards ciphertext only.

Enclave (Trusted TEE)

Decrypts data, loads models, runs inference, signs execution receipts

Client ↔ Host Relay ↔ Enclave  •  Host ↔ KMS / S3

Built with

Rust 2021
AWS Nitro Enclaves
HPKE (X25519)
ChaCha20-Poly1305
Ed25519
COSE / CBOR
Candle ML
KMS Attestation
KMS Release Gate
cargo-audit CI

Who it's for

Protected Model Serving

Keep model weights encrypted at rest and decrypted only inside an attested enclave.

Sensitive Inference

PII and classified prompts/outputs remain encrypted end-to-end to the enclave.

Auditable AI

Attach a verifiable AER receipt to each inference for compliance and forensics.

Third-Party Inference

Offer "trust but verify" inference without exposing customer data or model keys.

Where we are

V1 — Gateway (Complete)

  • Attestation + HPKE E2E sessions
  • KMS attestation-gated key release
  • Attested Execution Receipts (AER)
  • Ed25519-signed model manifests
  • Policy hot-reload with version tracking
  • CI pipeline with cargo-audit
  • Stable crypto dependencies (no pre-release)
  • Hardened error handling (no unwraps on untrusted input)
  • MSRV policy (Rust 1.75+)

Common questions

Can the host read prompts or outputs?

No. The host relays ciphertext only; decryption occurs inside the enclave.

What stops the host from decrypting model keys?

KMS policy binds decrypt authorization to enclave measurements. Without valid attestation, decrypt is denied.

Is this a full defense against side-channels?

No. V1 documents residual side-channel risk; mitigations are limited and explicit.

What's the performance overhead?

Recent multi-model runs on m6i.xlarge show 11.9-14.0% enclave overhead (weighted mean 12.9%), with ~0.028 ms of enclave-side crypto per inference and ~0.164 ms of end-to-end crypto overhead per request. Embedding quality remains near-identical to bare metal.

How do you verify benchmark artifacts before release?

Run ./scripts/final_release_gate.sh. It enforces --require-kms, audits every run with check_kms_integrity.sh, and outputs a final summary and manifest.

Do you support confidential GPU?

Not in v1. CPU inference via Candle is supported today. GPU TEE support is planned for v2.

Request a Pilot

If you deploy AI in a high-assurance environment and need verifiable confidentiality, EphemeralML is built for you.