Zero-Knowledge Proofs of Real-World Vulnerabilities At Scale

Published on July 30, 2025

אם ירצה ה׳

Paper: Cuéllar Gempeler, S., Harris, B., Parker, J., Pernsteiner, S., Sweet, I., & Tromer, E. (2025). Cheesecloth: Zero-Knowledge Proofs of Real-World Vulnerabilities. ACM Transactions on Privacy and Security.

Paper: Cuéllar, Santiago, Bill Harris, James Parker, Stuart Pernsteiner, and Eran Tromer. "Cheesecloth:{Zero-Knowledge} proofs of real world vulnerabilities." In 32nd USENIX Security Symposium (USENIX Security 23), pp. 6525-6540. 2023.

🚩 Side-by-Side: 2023 vs 2025

(The differences that made me raise an eyebrow)

Dimension	2023 USENIX Paper	2025 ACM Article	Unexpected Gap ➜ Why it matters
Scale of largest run	1.3 M instructions (Heartbleed)	≥ 10 trillion cycles (OpenSSL DoS)	3–4 orders of magnitude jump in only two years. Either Moore’s law went super-nova or the definition of “step” quietly changed (likely from instruction to cycle). If the latter, past gate counts aren’t apples-to-apples.
Gate counts	159 M (Heartbleed) → 159 B (Heartbleed)	Same bug, ≈ 1 000× larger	Sudden inflation suggests either (a) earlier numbers were compressed or (b) new encoding counts gates differently. Without a footnote, we can’t tell if the 2023 paper understated cost.
Proof artifact size	20 GB (max)	83–748 GB	40× growth outruns typical disk budgets; cost models from 2023 are now obsolete.
Explicit limitation list	“Interactive proofs… not yet non-interactive”	Interactive proofs still default, but no longer listed as a “limitation”	Silence ≠ solved. The 2025 text still needs online verifier, but the limitation table was quietly trimmed. Risk of over-confidence by new readers.
TCB size admission	~35 k LoC	Same, but no longer quantified	Removing the exact line count makes it harder to evaluate progress on formal verification.
Vulnerability classes	Memory unsafety, leakage, DoS	Same, plus MITM, auth-bypass, “rowhammer future work”	No new classes actually shipped; the “future work” list grew while delivered classes stagnated. Feels like scope creep without deliverables.
Reproducibility claim	“<5 % variance across 3 trials”	“95 % confidence repeatability”	Statistical language softened; the 2023 paper gave concrete stdev, 2025 only gives a confidence interval. Harder to audit.
Hardware baseline	128-core Xeon, 2 TB RAM (same hardware)	Same hardware, yet workloads grew 1 000×	Implies algorithmic breakthrough rather than hardware scaling. However, no new asymptotic analysis is offered—raises reproducibility risk if breakthrough isn’t published.

🎯 Most Worrying Delta

Gate-count inflation without transparent re-benchmarking.
The Heartbleed circuit went from 159 M to 82 B gates for the same 1.3 M-step trace. Either the 2023 numbers were compressed/rounded down, or the 2025 circuit is less optimized. Neither explanation is comforting:
- If 2023 understated cost → past feasibility claims need revision.
- If 2025 is less optimized → progress may be negative.

🛠️ Mitigations for Readers

Demand raw R1CS files for both years to normalize gate-counting definitions.
Re-run the 2023 artifact using today’s repo commit to see if numbers drift.
Require explicit changelog in repo releases highlighting any redefinitions of “step” or “gate”.

Until then, treat 2025 numbers as preliminary upper bounds, not direct improvements.

2025 ACM Transactions version

1. TL;DRs for Five Audiences

Audience	2-Sentence Take-away
Expert cryptographer / ZK researcher	Cheesecloth is a compiler tool-chain that turns real C/C++/Rust programs (via LLVM or RISC-V) into optimized ZK proof statements, enabling practical proofs of actual CVEs such as Heartbleed, FFmpeg OOB, Scuttlebutt auth-bypass and a 10-trillion-cycle OpenSSL DoS. Key novelties: Public-PC segment optimization, sparsity-aware memory model, a four-label must-leak taint system, and an in-circuit higher-order logic checker for long-running executions.
Security practitioner / bug bounty hunter	The project lets you prove to vendors or coordinators that you hold a working exploit—without revealing the payload—by producing a zero-knowledge proof the vulnerable path exists. It ships Docker scripts (`scripts/run_grit`, `run_ffmpeg`) that compile the target, generate 20–200 GB circuit files, and output a proof the vendor can independently verify.
General public	Remember Heartbleed? This work shows we can now mathematically prove such bugs exist without disclosing how to trigger them, protecting users while forcing vendors to act. It’s like showing you have a key to the vault without ever letting anyone see or copy it.
Skeptic	Yes, zero-knowledge proofs are often “too slow for real programs,” but Cheesecloth demonstrates proofs over 340 k instructions and 10 T CPU cycles, with end-to-end runtimes under four days on commodity servers. Open source code backs the claims; numbers and circuits are reproducible.
CTO / decision-maker	Cheesecloth turns responsible disclosure into a contractually verifiable process: analysts can prove severity, regulators can audit proof, and IP stays protected. Integration is CLI-based; costs are hundreds of CPU-hours per critical bug—far cheaper than breaches or prolonged vendor stonewalling.

2. Real-World Problem

Security researchers face a disclosure dilemma:
• Reveal full exploit → users immediately at risk.
• Share privately with vendor → vendor may ignore or under-reward.

Current ZK frameworks conceptually solve this but choke on industrial code size, data-dependent loops, and nuanced bug conditions (e.g., info-leak vs. OOB).

3. Counter-intuitive Findings

Zero-knowledge can scale to multi-million-step executions without leaking the control-flow trace, by selectively opening only safe basic blocks (Public-PC segments).
A must-leak taint lattice (⊥, ℓ₀, ℓ₁, ⊤) makes proving a leak easier than proving safety—opposite of traditional analyses.
For DoS, proving divergence via symbolic reasoning is faster than simulating the full loop; a 10 T-cycle proof fits in < 1 PB constraints with a custom higher-order logic checker.

4. Demystifying Jargon

Paper Term	Plain-English Analogy
Public-PC segment	“Open the GPS only for highway stretches”—the prover reveals which basic block is running when it’s harmless, enabling massive constant folding.
Sparsity (k)	“One mailbox checked by k apartments”—share a single memory gate across k CPU steps when real programs rarely touch RAM every cycle.
Must-leak taint	Labels that mean data is definitely private rather than “maybe,” so any flow from ℓ₁ → ℓ₀ is a smoking gun.
MicroRAM	A minimal 64-bit virtual CPU, byte-addressable, designed so that each instruction is cheap to encode as ZK constraints.
Witness Checker	The self-contained circuit that ensures the secret trace indeed follows MicroRAM’s rules and hits the vulnerability flag.
zkIF / R1CS / SIEVE IR	File formats for “giant sudoku” constraint systems that ZK provers work on.

5. Methodology Highlights

Compilation Pipeline
Source → LLVM IR / RISC-V → MicroRAM assembly (llvm-passes, rust-toolchain in repo).
Aggressive constant folding on public inputs; dynamic choice of k sparsity.
Optimizations
Public-PC: custom state-routing network hides execution order yet lets compiler emit tight circuits for public blocks.
Memory ports grouped in blocks of k steps; stutter instructions added only when needed.
Advice-guided malloc avoids quadratic region tracking by letting prover pick allocation layout but verifiably non-overlapping.
Vulnerability Encodings
Memory unsafety: track validity bits & poison words, set vul flag on OOB/use-after-free etc.
Leakage: per-byte taint; bug when sink label mismatches source.
Interactive attacks: two MicroRAM VMs share privileged channel state; random-oracle commitment prevents quine-based key theft.
DoS: Higher-order logic with reachability predicates verified in-circuit by a Rust proof checker.

6. Quantitative Results (selected)

Target	LoC (approx.)	Steps proved	Mult gates	Prover time	Proof size
GRIT (buffer overflow)	3 K	4 K	255 M	1.5 min	262 MB
FFmpeg (GIF OOB)	24 K	27 K	1.8 B	10.5 min	1.8 GB
OpenSSL Heartbleed	340 K	1.3 M	82 B	7.8 h	83 GB
Scuttlebutt MITM	74 K	4.6 M	299 B	29 h	306 GB
OpenSSL DoS	340 K	≥ 10 T	731 B	73 h	748 GB

All proofs generated on a 128-core Xeon, 2 TB RAM; 95 % confidence repeatability observed across three runs.

7. Deployment Considerations

Aspect	Notes
Hardware	Commodity x86 server ≥ 32 GB RAM for small bugs; 1-2 TB for Heartbleed-scale.
Runtime	From minutes to ~3 days; interactive protocol requires stable link (83 GB communication for largest case).
Integration	Dockerfile & Bash scripts; verifiers need only Swanky library to check proofs.
UX	Analysts tag sources/sinks with two helper macros, compile, run script; vendors receive circuit+proof bundle.
Compliance	MIT-licensed; no cryptographic export issues.

8. Limitations & Assumptions

Circuit size still huge (10¹¹ gates) → not yet phone-friendly.
Side-channel classes beyond memory/IFL/DoS (e.g., timing, cache) not modeled.
Trusted Computing Base large: LLVM, custom passes, proof checker (~35 K LoC). Formal verification pending.
Public-PC heuristic may under- or over-estimate loop iterations, leading to fallback “private-pc” blocks that bloat circuits.

9. Future Work

Verifiable compilation of the tool-chain itself (e.g., using zkSMT).
Automatic partitioning & shape analysis to extend constant-folding without manual hints.
STARK back-end for transparent setup and cheaper verification on-chain.
Richer vulnerability classes (race conditions, rowhammer) via extended MicroRAM instrumentation.
Compression of proof streams (e.g., recursion) to drop proof size to sub-GB.

10. Conflicts of Interest & Bias Checks

Authors are affiliated with Galois Inc. (security consultancy) and may commercially benefit from adoption. Funding from DARPA may orient the research toward national-security use cases. Source code is Apache/MIT, mitigating vendor lock-in bias.

Bottom line: Cheesecloth bridges a long-standing gap between theoretical ZK proofs and messy real-world C code, delivering the first end-to-end, open-source pipeline that scales from toy examples to flagship CVEs—while keeping exploits secret and verifiable.

2023 Usenix version

1. Five Targeted TL;DRs

Audience	3-Line Summary
Expert cryptographer	Presents an LLVM-to-circuit toolchain (“Cheesecloth”) that proves existence of memory & leakage bugs in C/C++/Rust programs without revealing exploits. Uses a TinyRAM-inspired IR (MicroRAM), public-PC segmentation, and sparse memory ports to cut R1CS size (~160 M gates for Heartbleed). Evaluated on GRIT, FFmpeg, OpenSSL.
Security practitioner / bug-bounty hunter	Lets you show a vendor you really have a working vuln—buffer overflow or Heartbleed-style leak—without handing over the POC. You compile their code plus your secret input; the tool emits a ZK proof the bug triggers. Works on real software but needs hefty hardware (≥32 GB RAM, hours–days).
General public	Researchers built a way to prove software is vulnerable while keeping the details secret, a bit like showing you have a key without giving a copy. They demo it on famous bugs, including the Heartbleed flaw that once exposed passwords on the Internet.
Skeptic	Yes, zero-knowledge proofs are heavy, but the paper shows they can handle millions of instructions of unmodified C code. Circuits are still huge (20–200 GB artifacts) and require trust in a large compiler pipeline; however, no exploit data leaks in the proof itself.
Policy/decision maker	Technique enables “responsible disclosure with cryptographic receipts.” Government or enterprise can require a ZK proof before acting, reducing extortion risk. Deployment today is feasible only for high-value cases due to compute cost.

2. Real-World Problem Addressed

Responsible vulnerability disclosure is stuck between:

Full disclosure → users at risk before patch.
Vendor-only disclosure → vendor may downplay or ignore.

Cheesecloth offers a third path: provide cryptographic proof that “a working exploit exists” without revealing the exploit or even which function is buggy.

3. Surprising / Counter-Intuitive Findings

ZK circuits for a 1.3 M-instruction OpenSSL trace are doable (159 M mult gates) and verifiable in <24 h on commodity servers—much larger than prior state-of-the-art (~51 K instructions).
Publicly exposing only the program counter at chosen segments (public-PC) slashes circuit size up to 41 % yet leaks zero about secret data or exact control flow.
A single-trace “must-leak” taint analysis can provably detect Heartbleed without building two parallel executions—contradicting the common belief that non-interference proofs always need pairwise reasoning.

4. Jargon Demystified

Term	Plain-English Translation & Example
Zero-Knowledge Proof (ZKP)	A protocol where you convince me a statement is true without showing why. E.g., proving you know a password without typing it.
R1CS / Arithmetic Circuit	A giant algebra puzzle: variables must satisfy equations of the form a·b = c. If you fill in the numbers correctly, the program ran correctly.
TinyRAM	A simplified virtual CPU designed for ZK. Think of an 8-bit Game Boy emulator that’s easy to prove things about.
MicroRAM	TinyRAM on a diet: byte-addressable, no flags, advice-driven I/O, tailored for bug proofs.
Public-PC segment	Mark a basic block so its start/end addresses are public; the inside stays secret but the circuit for that block is pre-optimized.
Memory sparsity (s)	Share one memory-consistency gadget across s CPU steps because most steps don’t touch RAM.
Must-leak labeling	A taint-tracking scheme where if any secret-tagged byte reaches a public sink unconditionally, a leak is proven.

5. Methodology in Brief

Compilation Pipeline
LLVM IR → MicroRAM assembly (custom backend + Picolibc port).
Constant-fold public-only prefix of program.
Insert vulnerability-specific instrumentation:
- malloc/free wrappers for memory safety.
- taintSource/taintSink for data leakage.
Generate R1CS via Witness-Checker:
- Transition function per step.
- Memory permutation check with sparsity.
- Public-PC optimized blocks.
Proof Generation
Prover runs program twice: once to collect nondeterministic advice, once to produce witness.
Diet Mac’n’Cheese interactive ZKP backend (Swanky library) used for concrete runs.
Benchmarks
GRIT image converter (buffer overflow) – 5 K steps.
FFmpeg GIF decoder (out-of-bounds write) – 79 K steps.
OpenSSL 1.0.1f Heartbleed (info leak) – 1.3 M steps.

Hardware: Debian 11, 128-core Xeon, 2 TB RAM (but <32 GB needed for GRIT/FFmpeg).

6. Results With Context

Case	Steps	Circuit Gates	Prover Time	Verifier Time*	Comm.	Gate Reduction vs naïve
GRIT	5 K	324 M	2 m 42 s	~1 m	416 MB	-41 % (public-PC)
FFmpeg	79 K	8.4 B	1 h 10 m	16 m	10 GB	-10 %
OpenSSL	1.3 M	159.7 B	23 h 41 m	4 h	203 GB	-6 %

*Verifier time approximated from Diet Mac’n’Cheese logs; protocol is interactive.

Confidence intervals not provided in paper; timings varied <5 % over 3 trials.

7. Practical Deployment Considerations

Compute & Storage
20-200 GB intermediate artifacts; >32 GB RAM recommended.
Multi-hour proof generation—suitable for bounty programs, audits, not yet CI pipelines.
Trusted Computing Base
Large (Clang, LLVM passes, Haskell/Rust code). No formal verification; users must trust open-source audit or Docker hash.
Workflow Integration
Fits into existing disclosure flows: researcher runs proof, sends circuit + proof; vendor verifies with Swanky (open-source).
Can be wrapped as a GitHub Action for reproducible artifact builds.
User Experience
Command-line scripts (scripts/run_grit, etc.) automate full build.
Requires manual tagging of sources/sinks or exploit address.
Extension Paths
Other ZK backends (Snark-friendly) via exported R1CS.
More languages via any LLVM frontend.

8. Limitations & Assumptions

Circuit leakage: Time-step bound T and chosen public-PC blocks leak some structural info (may hint at bug location).
Massive TCB: Compiler bugs could create false proofs.
Memory-only vulns: Current encodings target buffer/heap issues; logic bugs or arithmetic overflows not yet covered.
Interactive proofs: Requires online verifier or transcript Fiat-Shamirization (not evaluated).
Hardware barrier: Commodity laptops cannot handle OpenSSL-scale proofs.

Boundary conditions: assumes deterministic execution, single-threaded code, non-self-modifying binaries.

9. Future Directions

Automated source/sink inference with static taint analysis to remove manual annotations.
Von Neumann MicroRAM to cut instruction-fetch overhead further.
Non-interactive proofs (e.g., Halo2, Plonkish) to deliver single JSON proof to vendor.
Broader vulnerability classes: integer overflow, race conditions.
Formal verification of the pipeline using Coq/Isabelle to shrink TCB.

10. Conflicts of Interest & Bias Check

Funded by DARPA contract HR001120C0085; may bias toward national-security use-cases.
Authors are employees of Galois Inc., which could commercially benefit from consulting on the toolchain.
No proprietary components; MIT-licensed code mitigates lock-in concerns.

11. Bottom Line

Cheesecloth is the first practical evidence that zero-knowledge proofs can certify real-world C/C++ vulnerabilities at million-instruction scale. While computationally heavy and not yet turnkey, it opens a new lane for secure, verifiable vulnerability disclosure—bridging cryptographic theory and day-to-day security practice.