Run sensitive AI inference where model weights and prompts stay protected, even if the host is compromised. Built on AWS Nitro Enclaves with attestation-gated key release and end-to-end encrypted sessions.
Designed for government clouds, defense contractors, and regulated industries.
Standard approaches protect data in transit but leave it exposed on the host. EphemeralML assumes the host is compromised and still keeps secrets protected.
The host can see plaintext prompts or decrypted weights at some point in the pipeline.
Encryption in transit doesn't prevent data exposure on compromised hosts.
Compliance teams need proof of what code processed an inference, not just logs.
Key management is often handled by the same environment you don't trust.
EphemeralML protects model weights, user inputs/outputs, and execution integrity through TEE isolation and attestation-bound cryptography.
Model weights are encrypted at rest and decrypted only inside an attested enclave. The host never sees plaintext keys.
Prompts and outputs are encrypted end-to-end via HPKE sessions. The host relays ciphertext only.
Each inference produces an Attested Execution Receipt (AER) with enclave measurements and cryptographic signature.
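For illustration, a receipt of this kind might carry fields like the following. This is a hypothetical shape, not the actual AER wire format:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical shape of an Attested Execution Receipt (AER).
/// Field names and encoding are illustrative, not the real wire format.
#[derive(Serialize, Deserialize)]
struct ExecutionReceipt {
    /// Enclave image measurement (PCR0) of the code that ran the inference.
    enclave_measurement: String,
    /// Digests binding this receipt to one specific request and response.
    request_digest: String,
    response_digest: String,
    /// Time of execution as recorded inside the enclave.
    timestamp: u64,
    /// Signature over the fields above, made with an enclave-held key.
    signature: Vec<u8>,
}
```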
The client verifies enclave identity and code measurement against a policy allowlist via the COSE-signed CBOR attestation document.
All requests and responses are encrypted via HPKE (X25519 + ChaCha20-Poly1305); the host forwards ciphertext only. A client-side sketch is shown below.
Model decryption keys are released only when KMS confirms the enclave measurement matches policy. The host never sees plaintext keys.
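As a concrete illustration of the session layer, here is a minimal client-side sketch using the open-source hpke crate. The crate choice, info label, and function shape are assumptions, not the project's actual code; the enclave public key must come from a verified attestation document or the scheme protects nothing.

```rust
// Sketch only. Dependencies assumed: hpke = "0.11", rand = "0.8".
use hpke::{
    aead::ChaCha20Poly1305, kdf::HkdfSha256, kem::X25519HkdfSha256,
    Kem, OpModeS, Serializable,
};

type ChosenKem = X25519HkdfSha256;

/// Encrypt one request for the enclave. `enclave_pk` must be the key
/// extracted from a verified attestation document.
fn seal_request(
    enclave_pk: &<ChosenKem as Kem>::PublicKey,
    plaintext: &[u8],
) -> Result<(Vec<u8>, Vec<u8>), hpke::HpkeError> {
    let mut rng = rand::thread_rng();
    // X25519 key encapsulation + HKDF-SHA256 + ChaCha20-Poly1305 AEAD.
    let (encapped_key, mut ctx) = hpke::setup_sender::<
        ChaCha20Poly1305, HkdfSha256, ChosenKem, _,
    >(&OpModeS::Base, enclave_pk, b"ephemeralml-session", &mut rng)?;
    let ciphertext = ctx.seal(plaintext, b"")?;
    // The host relays (encapped_key, ciphertext) and can decrypt neither.
    Ok((encapped_key.to_bytes().to_vec(), ciphertext))
}
```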
Verifies attestation, holds policy allowlists, and establishes HPKE sessions (allowlist check sketched below).
Handles networking, storage, and the AWS API proxy. Forwards ciphertext only.
Decrypts data, loads models, runs inference, and signs execution receipts.
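To make the client verifier's policy check concrete, here is a minimal sketch. It assumes the COSE signature over the CBOR attestation document has already been validated against the AWS Nitro root of trust; AttestationDoc is a hypothetical parsed view, not a real library type.

```rust
use std::collections::HashSet;

/// Hypothetical parsed view of a Nitro attestation document. The real
/// document is a COSE-signed CBOR payload whose signature chain must be
/// verified before any of these fields can be trusted.
struct AttestationDoc {
    /// PCR0: SHA-384 measurement of the enclave image.
    pcr0: Vec<u8>,
    /// Enclave-generated session public key embedded in the document.
    public_key: Vec<u8>,
}

/// Accept only enclaves whose image measurement is on the policy
/// allowlist; on success, return the attested session public key.
fn check_policy(
    doc: &AttestationDoc,
    allowlist: &HashSet<Vec<u8>>,
) -> Result<Vec<u8>, &'static str> {
    if allowlist.contains(&doc.pcr0) {
        Ok(doc.public_key.clone())
    } else {
        Err("enclave measurement not in policy allowlist")
    }
}
```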
Keep model weights encrypted at rest and decrypt them only inside an attested enclave.
PII and classified prompts/outputs remain encrypted end-to-end to the enclave.
Attach a verifiable AER receipt to each inference for compliance and forensics.
Offer "trust but verify" inference without exposing customer data or model keys.
No. The host relays ciphertext only; decryption occurs inside the enclave.
KMS policy binds decrypt authorization to enclave measurements. Without valid attestation, decrypt is denied.
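AWS KMS exposes attestation condition keys for exactly this binding. A sketch of one key policy statement, with placeholder account, role, and measurement values:

```json
{
  "Sid": "DecryptOnlyFromAttestedEnclave",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::111122223333:role/EnclaveHostRole" },
  "Action": "kms:Decrypt",
  "Resource": "*",
  "Condition": {
    "StringEqualsIgnoreCase": {
      "kms:RecipientAttestation:ImageSha384": "<expected enclave image measurement>"
    }
  }
}
```

Decrypt calls that do not carry an attestation document matching the condition are denied, so a compromised host holding the same IAM role still cannot unwrap the model key.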
No. V1 documents residual side-channel risk; mitigations are limited and explicit.
Recent multi-model runs show 11.9–14.0% overhead (weighted mean 12.9%), ~0.028 ms enclave-side per-inference crypto, and ~0.164 ms end-to-end per-request crypto overhead on m6i.xlarge. Embedding quality remains near-identical to bare metal.
Run ./scripts/final_release_gate.sh. It enforces --require-kms, audits every run with check_kms_integrity.sh, and outputs a final summary and manifest.
Not in v1. CPU inference via Candle is supported today. GPU TEE support is planned for v2.
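For reference, the CPU path is ordinary Rust through Candle. A toy tensor example on Device::Cpu, illustrative only and unrelated to the project's actual inference code:

```rust
use candle_core::{Device, Tensor};

// Toy Candle example: one matmul on the CPU device. Illustrative only;
// it shows the Device::Cpu path, not EphemeralML's inference code.
fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    let x = Tensor::new(&[[1f32, 2.0], [3.0, 4.0]], &device)?;
    let w = Tensor::new(&[[0.5f32], [0.25]], &device)?;
    let y = x.matmul(&w)?;
    println!("{y}");
    Ok(())
}
```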
If you deploy AI in a high-assurance environment and need verifiable confidentiality, EphemeralML is built for you.