Confidential inference with cryptographic proof

Run sensitive AI inference where model weights and prompts stay protected, even if the host is compromised. Built on AWS Nitro Enclaves with attestation-gated key release and end-to-end encrypted sessions.

Designed for government clouds, defense contractors, and regulated industries.

13k+ Lines of Rust
110 Tests Passing
12.9% Enclave Overhead
0 Pre-release Dependencies
4 CI Gates

Most "secure inference" still leaves gaps

Standard approaches protect data in transit but leave it exposed on the host. EphemeralML assumes the host is compromised and still keeps secrets protected.

Host exposure

The host can see plaintext prompts or decrypted weights at some point in the pipeline.

Transit is not protection

Encryption in transit doesn't prevent data exposure on compromised hosts.

No proof of execution

Compliance teams need proof of what code processed an inference, not just logs.

Circular trust

Key management is often handled by the very environment you are trying not to trust.

A confidential inference gateway

EphemeralML protects model weights, user inputs/outputs, and execution integrity through TEE isolation and attestation-bound cryptography.

Model Weights (IP)

Encrypted at rest, decrypted only inside an attested enclave. The host never sees plaintext keys.

User Data (PII / Classified)

Prompts and outputs are encrypted end-to-end via HPKE sessions. The host relays ciphertext only.

Execution Integrity

Each inference produces an Attested Execution Receipt (AER) with enclave measurements and cryptographic signature.
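
For illustration, such a receipt can be modeled as a small signed structure. The field names and sizes below are assumptions for the sketch, not EphemeralML's actual schema.

    /// Illustrative AER layout; field names and sizes are assumptions.
    struct AttestedExecutionReceipt {
        /// Enclave code measurement (e.g. Nitro PCR0) that ran the inference.
        enclave_measurement: [u8; 48],
        /// Hash of the Ed25519-signed model manifest that was loaded.
        model_manifest_hash: [u8; 32],
        /// Hashes of the encrypted request and response, for audit correlation.
        request_hash: [u8; 32],
        response_hash: [u8; 32],
        /// Timestamp assigned inside the enclave.
        issued_at_unix_ms: u64,
        /// Ed25519 signature by the enclave's attested signing key over the fields above.
        signature: [u8; 64],
    }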

Three steps to confidential inference

1. Verify the enclave

The client verifies the enclave's identity and code measurements against a policy allowlist using the signed COSE/CBOR attestation document.
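
A minimal sketch of that allowlist check, assuming the attestation document has already been COSE-verified and its code measurement (PCR0) extracted; the type and field names are illustrative, not the project's API:

    /// Illustrative client-side allowlist; type and field names are assumptions.
    struct AttestationPolicy {
        /// Hex-encoded enclave image measurements (PCR0) the client accepts.
        allowed_pcr0: Vec<String>,
    }

    impl AttestationPolicy {
        /// Accept the enclave only if its verified measurement is on the allowlist.
        fn is_allowed(&self, verified_pcr0_hex: &str) -> bool {
            self.allowed_pcr0
                .iter()
                .any(|m| m.eq_ignore_ascii_case(verified_pcr0_hex))
        }
    }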

2. Establish an encrypted session

All requests and responses are encrypted via HPKE (X25519 + ChaCha20-Poly1305). The host forwards ciphertext only.
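
A minimal client-side sketch of sealing one request, assuming the rust-hpke crate (X25519-HKDF-SHA256 KEM with ChaCha20-Poly1305); how the session is bound to attestation via the HPKE info input is an assumption here, not the project's exact wire format:

    use hpke::{
        aead::ChaCha20Poly1305, kdf::HkdfSha256, kem::X25519HkdfSha256,
        Kem, OpModeS, Serializable,
    };

    type SessionKem = X25519HkdfSha256;

    /// Seal one request to the enclave. The host only ever forwards
    /// `encapped_key` and `ciphertext`; it holds no key material.
    fn seal_request(
        enclave_pk: &<SessionKem as Kem>::PublicKey, // from the verified attestation
        session_binding: &[u8], // e.g. attestation hash + nonce (assumed layout)
        plaintext: &[u8],
    ) -> (Vec<u8>, Vec<u8>) {
        let mut rng = rand::thread_rng();
        let (encapped_key, mut ctx) =
            hpke::setup_sender::<ChaCha20Poly1305, HkdfSha256, SessionKem, _>(
                &OpModeS::Base,
                enclave_pk,
                session_binding, // mixed into key derivation so keys cannot be swapped across sessions
                &mut rng,
            )
            .expect("HPKE setup failed");
        let ciphertext = ctx.seal(plaintext, b"").expect("seal failed");
        (encapped_key.to_bytes().to_vec(), ciphertext)
    }

The enclave side would run the matching receiver setup and open the ciphertext; the host relay only moves opaque bytes.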

3. Load models with gated keys

Model decryption keys are released only when KMS confirms the enclave measurement matches policy. The host never sees plaintext keys.
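
On AWS, this gate is expressed in the KMS key policy using Nitro Enclaves attestation condition keys. The statement below is an illustrative sketch with placeholder account, role, and measurement values, not the deployed policy:

    /// Illustrative KMS key-policy statement (placeholder ARNs and PCR0 value).
    /// kms:Decrypt succeeds only when the request carries an attestation document
    /// whose image measurement matches the approved enclave build.
    const MODEL_KEY_POLICY_STATEMENT: &str = r#"{
      "Sid": "AllowDecryptOnlyFromApprovedEnclave",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/ephemeralml-host" },
      "Action": "kms:Decrypt",
      "Resource": "*",
      "Condition": {
        "StringEqualsIgnoreCase": {
          "kms:RecipientAttestation:ImageSha384": "<approved-enclave-pcr0>"
        }
      }
    }"#;

A request without a valid attestation document, or with a non-matching measurement, is denied, so a compromised host cannot obtain the model key.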

What we guarantee (and what we don't)

We guarantee

  • Host blindness: the host relays ciphertext only and cannot decrypt prompts, outputs, or model keys
  • Attestation-gated key release: model DEKs are released only to approved enclave measurements
  • Session binding: encryption keys are bound to attestation + nonce to prevent key swapping
  • Model integrity: Ed25519-signed manifests prevent serving a different model blob (see the sketch after this list)
  • Auditability: each inference produces a verifiable Attested Execution Receipt (AER)
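
A minimal sketch of the manifest signature check, assuming ed25519-dalek v2 and a publisher verifying key pinned inside the enclave image; the manifest layout itself is an assumption:

    use ed25519_dalek::{Signature, Verifier, VerifyingKey};

    /// Verify an Ed25519-signed model manifest before loading the model blob.
    /// After this check the enclave would also compare the blob's hash against
    /// the hash recorded in the manifest (layout assumed, not shown).
    fn verify_manifest(
        publisher_key: &[u8; 32],   // pinned inside the enclave image
        manifest_bytes: &[u8],      // canonical serialized manifest
        signature_bytes: &[u8; 64],
    ) -> bool {
        let Ok(verifying_key) = VerifyingKey::from_bytes(publisher_key) else {
            return false;
        };
        let signature = Signature::from_bytes(signature_bytes);
        verifying_key.verify(manifest_bytes, &signature).is_ok()
    }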

We explicitly do not claim (v1)

  • Protection against all microarchitectural side-channels
  • Availability guarantees (the host can DoS)
  • Confidentiality under full enclave compromise
  • Multi-cloud or confidential GPU support

Three-zone trust model

Client (Trusted)

Verifies attestation, holds policy allowlists, establishes HPKE sessions

Host (Untrusted)

Networking, storage, and AWS API proxy. Forwards ciphertext only.

Enclave (Trusted TEE)

Decrypts data, loads models, runs inference, signs execution receipts

Client ↔ Host Relay ↔ Enclave  •  Host ↔ KMS / S3

Built with

Rust 2021
AWS Nitro Enclaves
HPKE (X25519)
ChaCha20-Poly1305
Ed25519
COSE / CBOR
Candle ML
KMS Attestation
KMS Release Gate
cargo-audit CI

Who it's for

Protected Model Serving

Keep model weights encrypted at rest and decrypted only inside an attested enclave.

Sensitive Inference

PII and classified prompts/outputs remain encrypted end-to-end to the enclave.

Auditable AI

Attach a verifiable AER receipt to each inference for compliance and forensics.

Third-Party Inference

Offer "trust but verify" inference without exposing customer data or model keys.

Where we are

V1 — Gateway (Complete)

  • Attestation + HPKE E2E sessions
  • KMS attestation-gated key release
  • Attested Execution Receipts (AER)
  • Ed25519-signed model manifests
  • Policy hot-reload with version tracking
  • CI pipeline with cargo-audit
  • Stable crypto dependencies (no pre-release)
  • Hardened error handling (no unwraps on untrusted input)
  • MSRV policy (Rust 1.75+)

Common questions

Can the host read prompts or outputs?

No. The host relays ciphertext only; decryption occurs inside the enclave.

What stops the host from decrypting model keys?

KMS policy binds decrypt authorization to enclave measurements. Without valid attestation, decrypt is denied.

Is this a full defense against side-channels?

No. V1 documents residual side-channel risk; mitigations are limited and explicit.

What's the performance overhead?

Recent multi-model runs on m6i.xlarge show 11.9-14.0% enclave overhead (weighted mean 12.9%), with ~0.028 ms of enclave-side crypto per inference and ~0.164 ms of end-to-end crypto overhead per request. Embedding quality remains near-identical to bare metal.

How do you verify benchmark artifacts before release?

Run ./scripts/final_release_gate.sh. It enforces --require-kms, audits every run with check_kms_integrity.sh, and outputs a final summary and manifest.

Do you support confidential GPU?

Not in v1. CPU inference via Candle is supported today. GPU TEE support is planned for v2.

Request a Pilot

If you deploy AI in a high-assurance environment and need verifiable confidentiality, EphemeralML is built for you.