Comparison of formal verification frameworks for arithmetic circuits

Prelude: Proof Market

In 2014 I was running a website called Proof Market where you could put a bounty in Bitcoin on mathematical statements to be proven and when you prove them in Rocq you could receive the Bitcoin. I want to build a non-custodian version of Proof Market. For that purpose, zkVM can be very useful. I prefer that the proof-checking zkVM be formally verified. I am verifying arithmetic circuits because I want a market for theorem proving people and theorem proving AIs so that I can easily offload the verification of arithmetic circuits!

What

When you encode a problem (something to prove for Proof Market, or in general some verification to pass) as a puzzle of solving polynomial equations, cryptographers give you succinct ways to show that there's a solution to the puzzle (succinct arguments).

The receiver of a succinct argument will be able to check, given some cryptographic assumptions, with a very high probability, that there is a solution to the polynomial puzzle. Then the receiver will also want to make sure that the solution of the polynomial puzzle corresponds to a solution of the original problem (soundness). Also it's often critical for the sender of a succinct argument that any solution of the original problem can be encoded as a polynomial puzzle solution (completeness).

When people write polynomial puzzles (arithmetic circuits), they often have mathematical proofs of soundness and completeness in their mind. Mathematical proofs can be transferred with whiteboards and discussions, but (i) it takes a lot of effort from both sides, the sender and the receiver of the mathematical ideas, and (ii) everybody gets bored after checking five cases out of 40. They get used to some repetitive patterns and lose attention (though I have a hypothesis that it's possible to distinguish theoretical computer scientists from mathematicians by showing them proofs involving 40 cases). Sometimes people miss a slight mistake in the sixth case out of 40 and that matters because an attacker can study that case to find a hole in the implementation.

There are computer languages for writing and checking mathematical proofs, and they can be used to reason about hardware and software systems too. In case of arithmetic circuits the software is written with polynomials, so we are back to math. People have used these to check soundness and completeness of arithmetic circuits. Let me compare these.

Note

For cryptographers: I'm ignoring the argument of knowledge in this blog post. None of the projects mentioned below seems to state that the prover has a polynomial-time access to the solutions. It's all about existence of solutions.

Disclaimer: I'm biased

I'm developing Clean, a framework for proving correctness of arithmetic circuits in a somewhat modular way. Clean is written in Lean Language (TM of Lean FRO).

I used Rocq for years in my jobs. For an early-era formal verification of smart contracts, I used Isabelle. I used ACL2 from time to time for small projects. I've talked with people developing these tools, but I don't belong in the inner circle of any tool developments.

Template

How advanced are the existing examples (available online)?
- Level 1: simple examples involving up to a dozen finite field elements
- Level 2: useful components like hash functions or risc vm instructions
- Level 3: end-to-end results like characterization of risc vm executions
Can I reproduce the result? How much time does it take?
- Without rushing, keeping my ordinary leisurely existence.
- On Macbook Air M4, 2025 with 16 GB memory
Can Claude Code verify IsZero circuit?
Can Claude Code verify weighted-sum circuit?
- I start with: "Let's try another gadget. It takes 65 field elements. The first 32 and the second 32 should be taken the inner-product, and it should match the last 65th element."

Summary Comparison

Here's the summary of results. Note that the Level 0 examples are only about the publicly available examples. The projects have different goals. Two of them, acl2-jolt and sp1-lean, aim at tricky aspects of existing zkVM implementations. The others aim to be general frameworks. Some try to import circuits from other languages and verify them. Others aim to be frontends. All of the projects contain components that can be reused.

Framework	Language	Example Level	Reproducible	Claude Code: IsZero	Claude Code: Weighted-Sum
ACL2 (r1cs)	ACL2	Level 2?	Yes (3 hours)	Yes	Yes
ACL2 (PFCS)	ACL2	Level 1	Yes (no extra time)	Yes (one-shot)	Yes (verbose)
acl2-jolt	ACL2	Level 2	Partial (Ubuntu only)	N/A	N/A
Garden	Rocq	Level 2	Yes (20 min)	Eventually yes	Yes (with intervention)
zk-lean	Lean	Level 1	Yes (<10 min)	No	N/A
sp1-lean	Lean	Level 2	Partial	N/A	N/A
Clean	Lean	Level 2	N/A (author)	N/A (exists)	No

Circuit visualization Image generated by ChatGPT

ACL2 (r1cs) like in the ACL2 book

People used ACL2 for verifying arithmetic circuits. An example is a page in the ACL2 book. The framework lives in the ACL2 book and some applications seem to be proprietary.

R1CS is a popular format of polynomial equations. It seems that the ACL2 (r1cs) library can deal with other kinds of polynomial equations too.

How advanced are the existing examples available online?
- Level 2? I have difficulty finding a list of the examples. I might have missed the most interesting project.
- BLAKE2 is mentioned in a paper but I could not find the source code online.
- MIMC hash specification is available online. Proofs about some components are also online.
Can I reproduce the result? How much time does it take?
- Yes, three hours.
- From my previous experiences, I knew that the best way on a recent Mac is to follow this guide, and use sbcl. I just did brew install sbcl for installing sbcl.
- And then, I should have done cd books; make ACL2=/<snip>/acl2-sources/saved_acl2 -j 8 kestrel/ethereum/semaphore/top.cert.
- But I wasted some time by waiting for a while for make ACL2=/<snip>/acl2-sources/saved_acl2 -j 8 kestrel/ethereum
Can Claude Code verify IsZero circuit?
- Yes, but I think it followed an existing similar gadget to some extent.
- Claude Code asked me "R1CS or PFCS". This is clever because the ACL2 Book contains at least two ways of formulating polynomial circuit verification problems. PFCS stands for prime field constraint systems.
Can Claude Code verify weighted-sum circuit?
- Yes.
- It was about to give up proofs so I gave advice like "maybe there can be a hint to guide the proof engine to use the proven sublemmas rather than expanding the definition".
- Claude Code is helpful because I don't remember the exact hinting syntax for ACL2. The whole process took 30 minutes.

What I like about ACL2 (r1cs) like in ACL2 book

The formalization of the arithmetic circuits is straightforward and easy to follow. There's a list of equations with the left-hand side and the right-hand side spelled out.

(define make-rel-iszero ()
  :returns (def definitionp)
  (make-definition
   :name 'rel-iszero
   :para '(x out)
   :body (list
          ;; Constraint 1: x * x_inv = 1 - out
          (make-constraint-equal
           :left (expression-mul
                  (expression-var 'x)
                  (expression-var 'x-inv))
           :right (expression-sub (expression-const 1)
                                  (expression-var 'out)))
          ;; Constraint 2: out * x = 0
          (make-constraint-equal
           :left (expression-mul
                  (expression-var 'out)
                  (expression-var 'x))
           :right (expression-const 0)))))

ACL2 doesn't require writing tactics in the normal usage. You can give hints to the prover, and it's just about naming the relevant lemmas. This might be easier for humans as well as for LLMs.

My remaining wishes about ACL2 (r1cs) like in ACL2 book

Installation of ACL2 took some trial and error, though this is my third time doing this. I tried to use the particular gcl version mentioned in the installation guide, and the ./configure script didn't work, etc. Some of the online documentation pages returned 403 errors. It's a real hurdle when compared with Lean installation (installing a VSCode plugin and pressing 'ok' a couple of times).

Me vs Claude Code

Claude Code is better than me at dealing with ACL2 (r1cs) like in ACL2 book. I believe that stating and proving correctness theorems happened 100x faster with Claude Code than just me doing it.

ACL2 (PFCS) like in the ACL2 book

How advanced are the existing examples (available online)?
- Level 1: a few field elements (rel-select)
- Research papers suggest that the PFCS approach was used to verify many gadgets of snarkVM. However, I could not find these advanced examples online.
Can I reproduce the result? How much time does it take?
- No time spent after finishing the ACL2 (r1cs) process above.
Can Claude Code verify IsZero circuit?
- Yes. One-shot complete.
Can Claude Code verify weighted-sum circuit?
- Yes, but very verbosely.
- I had to tell it "don't give up" once.

What I like about ACL2 (PFCS) like in the ACL2 book

PFCS is a reasonable way to package arithmetic circuits.

My remaining wishes about ACL2 (PFCS) like in the ACL2 book

This framework deserves more attention. It's suffering from the lack of publicly available advanced examples.

There are data types that can be encoded as vectors of field elements, and allowing those as parameters might be an interesting direction.

Me vs Claude Code

It's unclear whether I'm better or Claude Code is in dealing with ACL2 (PFCS). I had a better grasp when Claude Code was about to give up but I insisted on continuing.

ACL2 like in `acl2-jolt`

Another project using ACL2 is acl2-jolt available on GitHub.

How advanced are the existing examples?
- Level 2
Can I reproduce the result? How much time does it take?
- Not on the Mac.
- Yes, to some extent on Ubuntu 24.04.3 LTS
  - I could certify the ACL files after some advice from the authors (PR).
  - I could not check the correspondence with the Rust implementation of Jolt (because a dependency was not pinned and wanted a newer version of Rust)
Can Claude Code verify IsZero circuit?
- Not really applicable.
- Reflecting the lookup-singularity approach of Jolt, this verification approach is also focusing on verifying subtables and composing subtables into instructions.

What I like about ACL2 like in `acl2-jolt`

The project was verifying a real implementation. A part of the codebase is well-documented. The codebase is very uniform and the proven statements are short and easy to read.

My remaining wishes about ACL2 like in `acl2-jolt`

The verification stops at each instruction. The notion of the program execution has not been formalized. Most users care about execution of programs, so it's important that circuits represent program executions.

There are some confusing aspects in the codebase. Some subtables are made with 257x257 dimension with (create-tuple-indices (expt 2 8) (expt 2 8)). Other subtables are made with 256x256 dimension with (create-tuple-indices (1- (expt 2 8)) (1- (expt 2 8))). I'm mostly sure it doesn't matter, but something similar might matter.

Me vs Claude Code

Claude Code is better than me at dealing with acl2-jolt. I was replacing 257x257 table with a 256x256 table and Claude Code was fixing proofs before I understood why they were broken.

Garden

Garden is a framework for verifying arithmetic circuits written in Rocq.

How advanced are the existing examples (available online)?
- Level 2, multiple counts.
- Garden can import circuits from many languages.
Can I reproduce the result? How much time does it take?
- Yes, 20 minutes. Including filing a tiny PR about the build instruction.
Can Claude Code verify IsZero circuit?
- Eventually yes.
- Claude Code was confused how to decompose A /\ B /\ C and it was complaining "This is incredibly frustrating." I was sorry to hear that.
- Most difficulty came from the current axiomatic treatment of multiplicative inverse.
  - Claude Code added some axioms about UnOp.inv (because the inverse operation has no definition and is just assumed to exist, see M.v).
  - However it forgot to add an axiom saying UnOp.inv _ is smaller than p.
  - Claude Code was spending a lot of time shuffling mods like in H_out: (1 mod p - ((x * (self.inv mod p)) mod p) mod p) mod p = 0
- I was in a situation where coqide works but VsRocq doesn't because it somehow doesn't figure out the Require Imports according to the unique _CoqProject. This limited Claude Code's options of IDE access.
Can Claude Code verify weighted-sum circuit?
- Yes. Some intervention was needed because Claude Code told Rocq to accept some unnecessary statements without proofs.

What I like about Garden

It verifies circuits written in popular DSLs.

My remaining wishes about Garden

Only solvable problems:

Sometimes field operations are defined to be integer modulo p, and it's very easy to get distracted.
The syntax is currently rather noisy to the eye (but that might be a reasonable choice for a target of automatic translation).

The IsZero circuit looks like this

Definition constrain {p} `{Prime p} (arg_fun_0 : IsZero.t) (arg_fun_1 : Felt.t) : M.t unit :=
  let var_out : Felt.t := arg_fun_0.(IsZero.out) in
  let var_inv : Felt.t := arg_fun_0.(IsZero.inv) in
  let var_felt_const_1 : Felt.t := UnOp.from 1 in
  let var_felt_const_0 : Felt.t := UnOp.from 0 in
  (* Constraint 1: out = 1 - x * inv *)
  let var_prod : Felt.t := BinOp.mul arg_fun_1 var_inv in
  let var_out_expr : Felt.t := BinOp.sub var_felt_const_1 var_prod in
  let* _ : unit := M.AssertEqual var_out var_out_expr in
  (* Constraint 2: x * out = 0 *)
  let var_prod2 : Felt.t := BinOp.mul arg_fun_1 var_out in
  let* _ : unit := M.AssertEqual var_prod2 var_felt_const_0 in
  M.Pure tt.

Me vs Claude Code

I'm better than Claude Code at dealing with Garden code.

zk-lean

zk-lean is a framework for verifying arithmetic circuits written in Lean.

How advanced are the existing examples (available online)?
- Level 1, just comparing two field elements
- However, the framework itself seems to be more featureful.
Can I reproduce the result? How much time does it take?
- Yes, under ten minutes.
Can Claude Code verify IsZero circuit?
- No, it could not figure out how to deal with the binds of the free monad. I could not either. We couldn't figure out how to use the machinery.

What I like about zk-lean

It's too early to tell. The codebase is a combination of a few, very elementary examples and some sophisticated machinery. I will need to see the sophisticated machinery at work before I can say something.

My remaining wishes about zk-lean

I guess more interesting examples are coming. Jolt repository has a feature of exporting some Lean definitions for zk-lean, so there might be some efforts on top of this.

Me vs Claude Code

For me vs Claude Code, about dealing with zk-lean code, it's a tie. Both failed to understand how to reason about the syntax. The point I gave up was when I saw the definition of FreeM.bind is protected. I guessed I wasn't supposed to unfold the definition (unfolding is the act of replacing FreeM.bind with its definition), but then what else?

sp1-lean

Succinct announced formal verification of Sp1 Hypercube. This is a notable achievement.

How advanced are the existing examples (available online)?
- Level 2: risc vm instructions
Can I reproduce the result? How much time does it take?
- Partially. I haven't converted Rust constraints into Lean code. Some proof about LB instruction seems to be missing in the commit on the day of the announcement. The file LoadByteChip.lean was added in the commit on the day of announcement. It is not yet in the default build lake build.
- The build of Lean code succeeded in under two hours with a lake build (on a commit made on the same day as the announcement).
- The announcement does not mention a git commit, so I guessed the commit.
Can Claude Code verify IsZero circuit?
- Not applicable. An experiment would be to add an iszero instruction in the Sail specification and implement it and verify it, but that won't be a fair comparison.

What I like about sp1-lean

The announcement is very transparent about what the team did and what they didn't do. I especially appreciate the "Assumptions and Caveats" section. The level of transparency is an example to follow.

The effort aimed at verifying all relevant instructions.

The definitions and the statements are easy to follow if you're familiar with RISC-V zkVMs (ways of encoding VM executions into arithmetic circuits).

I feel considerable theorem-proving kung-fu applied in a targeted manner to finish the work. The repo contains cool Lean techniques (use of advanced attributes and skillfully adjusted invocations of automatic tactics) that keep the size of proof scripts to, say, less than 1/5 of what I expect for a naive approach.

My remaining wishes about sp1-lean

The announcement does not specify the git commit of the proof artifacts. The most recent commit at the time of announcement doesn't contain completed proof of LB correctness. A previous commit contains a theorem about LB instruction.

Completeness is missing. For instance correct_add says "if constraints hold, something something". I could not find another theorem saying "if something something, constraints can be satisfied".

There are some tricky points about the definitions. The mitigation would involve some design changes in the constraint development framework:

Main[6] and Main[14] are inputs and Main[21] is the output of AddChip.
Results of Sp1 specific conventions (how ByteOperation related tables are made) and the general Air constraint conventions are mixed.
Send and receive of multiplicities are treated magically, so that multiplicity overflow would not be noticed.
MemoryAccessInShardTimestamp contains low limbs only, why is that safe for what purposes, and so on, are not a part of verification in the repo.
I don't think I've noticed everything.

The proofs look fragile to changes in the number of constraints etc. Perhaps coding agents are able to update this kind of things.

Clean

Clean is a framework for proving correctness of arithmetic circuits in a modular way, written in Lean.

How advanced are the existing examples (available online)?
- Level 2: Keccak and BLAKE3 hash functions, there's also a small VM with soundness
Can I reproduce the result? How much time does it take?
- Not applicable because I'm working on this project. Please try and tell us what you think.
Can Claude Code verify IsZero circuit?
- IsZero already exists, so not applicable.
Can Claude Code verify weighted-sum circuit?
- No, it started doing something else.
- Claude Code forgot to constrain a variable for the inner product.
- Claude Code tried to prove soundness but could not, so it added an assumption saying that the result is the intended value, instead of fixing the circuit.
- I was able to finish the soundness proof.

What I like about Clean

The whole development is in Lean, so nothing is lost in translation.

Clean offers ways to verify sub-circuits in a composable way.

There's some conceptual simplicity about looking at everything as a predicate on the infinite tape of field variables.

The circuit syntax is passable as a front-end language (in my opinion). Here's a IsZeroField.lean in Clean project (written before this experiment):

def main (x : Var field F) : Circuit F (Var field F) := do
  -- When x ≠ 0, we need x_inv such that x * x_inv = 1
  -- When x = 0, x_inv can be anything (we use 0)
  let xInv ← witness fun env =>
    if x.eval env = 0 then 0 else (x.eval env : F)⁻¹

  let isZero <== 1 - x*xInv -- If x is zero, isZero is one
  isZero * x === 0  -- If x is not zero, isZero is zero

  return isZero

My remaining wishes about Clean

It's cumbersome having to remember the different kinds of circuits: Circuit, ElaboratedCircuit, Subcircuit, FormalCircuit, FormalAssertion and FormalGeneralCircuit.

When I use a sub-circuit, I see (something_complicated).output in the goal state instead of just a single variable for the output of the sub-circuit. This doesn't usually matter because I have the specifications of the subcircuits telling me things about the (something_complicated).output. Since I never need to look into the structure of the complicated term, I could just see a variable instead of this.

Me vs Claude Code

I'm better than Claude Code at working with Clean.

Some dormant projects

Coda and risc0-lean4 have some Lean code for arithmetic circuits. The last updates are years ago, so I didn't include these projects in the comparison.

Future

What's coming in this field? In the short term, more frameworks will appear. Somebody will come up with the idea that polynomial circuits are like effectful lenses/relation-oriented programming/representation theory. In the mid-term, except for hand-craft optimized circuits, people will stop writing circuits themselves. Most polynomial circuits will be automatically generated. Some formal verification techniques will be used with less and less manual intervention. Evaluating results of verification efforts will remain a challenge.