Verifying Poseidon in Clean: Why the Last 'sorry' Is About Primality

poseidon-clean

Poseidon is one of the most popular hash functions of the zero-knowledge ecosystem. It was designed as an arithmetization-friendly hash function for SNARKs, STARKs, and related proof systems, and it appears in widely used libraries such as circomlib's Poseidon circuit. It is used in Merkle trees, commitments, nullifiers, and many other places where a proof system needs a hash that is cheap to express as arithmetic constraints.

In this post, we walk through a Lean 4 proof of correctness for a Clean model of circomlib's optimized Poseidon circuit at arity 1. The theorem says that the modeled constraints are sound and complete with respect to the optimized Poseidon specification used by circomlib:

def circuit : FormalCircuit (F BN254_PRIME) field field where
  Assumptions _ := True
  Spec input output := output = poseidon1Opt input
  soundness := ...
  completeness := ...

In other words, if the constraints are satisfied, the output is the specified Poseidon hash. And for every input, the expected witness exists.

Info

The proof and specs live in the Clean repository:

Circuit and proofs: Clean/Circomlib/Poseidon.lean
Textbook spec: Clean/Specs/Poseidon.lean
Optimized spec: Clean/Specs/PoseidonOptimized.lean
Companion walkthrough: Poseidon.md

If you are familiar with Lean, you know that large proofs are often developed top-down. You state helper lemmas first, use them to shape the main theorem, and fill in their proofs later. Lean marks these unfinished proof obligations with the keyword sorry. After weeks of work on the Poseidon proof, there was one sorry left:

instance : Fact (Nat.Prime BN254_PRIME) := by sorry

Interestingly, this was not just a missing crypto argument. It was the primality proof for the BN254 scalar field modulus. This number is a standard parameter of the BN254 curve (also known as alt_bn128), used in Ethereum's pairing precompiles and many zkSNARK libraries. But Lean does not accept "standard parameter" as a proof. To complete the theorem, we still have to prove that this 254-bit integer is prime.

You have probably heard of probabilistic primality tests such as Miller-Rabin. Those are useful engineering tools: run enough rounds, and a false positive becomes astronomically unlikely. But high confidence would also not constitute a Lean proof. Here we need a proof object that the kernel can check.

Closing this sorry therefore requires a primality certificate: a finite witness that reduces the claim "BN254_PRIME is prime" to computations and smaller primality claims. Surprisingly, as we will see in the following, the certificate we need is based on a theorem from 1876.

Why Poseidon Is Worth Verifying

Most familiar hash functions, like SHA-256, are bit-oriented. They use rotations, XORs, bitwise Boolean operations, and additions modulo powers of two. These operations are natural on CPUs, but expensive inside arithmetic circuits.

Poseidon was designed for the opposite setting. Its core operations are field additions, field multiplications, S-boxes of the form $x \mapsto x^5$, and small matrix multiplications. These are the kinds of operations that R1CS, PLONKish arithmetizations, and many SNARK circuits handle efficiently.

Efficiency is a major reason behind why Poseidon appears so often in circom circuits. But it also means that a bug in the circuit is serious. If the circuit accepts a witness whose output is not the intended hash, then every protocol depending on that hash can inherit the bug. This is precisely the kind of mistake formal verification is designed to rule out. Instead of testing many inputs and hoping the wiring is right, we state mathematically what the circuit should compute and prove that every satisfying assignment computes it.

Standard Poseidon vs. Optimized Poseidon

Poseidon is usually described as a sequence of full rounds and partial rounds using an MDS matrix. That is the clean mathematical description: apply S-boxes, add round constants, mix the state, and repeat. The circomlib circuit however, does not implement this description in the most literal way. It implements an optimized schedule that is algebraically intended to produce the same permutation, but with fewer constraints. The optimization uses a pre-sparse matrix before the partial rounds and sparse matrices during the partial-round phase.

For verification, this distinction matters. If the circuit checks the optimized schedule, then the theorem should target the optimized schedule. Proving the circuit against a textbook-looking spec directly would force the proof to also absorb the optimization argument.

So the proof uses two Lean specs:

poseidon1     -- simpler, non-optimized, full-MDS output spec
poseidon1Opt  -- optimized circomlib-style schedule

You can find these definitions in Specs/Poseidon.lean and Specs/PoseidonOptimized.lean respectively.

The final circuit theorem is stated against poseidon1Opt, because that is the structure the Clean circuit model actually follows. The simpler poseidon1 spec is still useful: we test poseidon1Opt against it and against circomlibjs vectors as an output-level sanity check.

What We Proved

Currently, the proof is about arity 1 Poseidon. In circomlib terminology, this means one input element, plus the initial zero element, so the internal state has width $t = 2$. The formally verified object is a Clean circuit model of the optimized circomlib schedule. The top-level theorem proves two properties:

Soundness. If an assignment satisfies all constraints of the Clean circuit, then the output is poseidon1Opt input.

Completeness. For every input, the honest computation produces a witness satisfying all constraints.

Together, these say that the Clean circuit computes exactly the optimized Poseidon function. No extra outputs are accepted, and no correct computation is rejected.

There are also two trust boundaries worth stating explicitly:

First, the proof does not mechanically parse poseidon.circom. The Clean circuit is a direct hand model of the same optimized arity-1 structure. The correspondence is small and explicit, but it is not a verified compiler pipeline from Circom source to R1CS.
Second, the optimized Poseidon spec is tested against the simpler non-optimized spec and against circomlibjs vectors, but the equality poseidon1Opt = poseidon1 is not yet proved for all field elements. This is a useful validation layer, not a formal equivalence proof.

With those boundaries made explicit, we gain strong confidence in the soundness and completeness of the circomlib Poseidon circuit: the Lean proof gives a machine-checked guarantee for the Clean circuit model, while the remaining assumptions are small, inspectable modeling links rather than hundreds of algebraic constraints checked by hand.

The Circuit Shape

The optimized arity-1 circuit follows this pipeline:

[0, input]
    |
ARK C[0..1]
    |
3 full rounds
    |
transition round with P
    |
56 sparse partial rounds
    |
3 full rounds
    |
final SBOX + MDS
    |
state[0]

This is the optimized schedule, so the Lean specification used by the final theorem is:

Specs.PoseidonOptimized.poseidon1Opt

Proving Small Pieces First

Trying to prove the whole circuit at once would be difficult: arity-1 Poseidon already has more than 400 intermediate witness variables, and the 56 partial rounds are enough to make naive unfolding painful for Lean. Instead, the proof follows a natural circuit decomposition strategy.

The proof has three layers of FormalCircuits:

Sigma                                       -- x ↦ x^5
FullRoundOpt_t2 / PartialRoundOptStep_t2    -- one round
ApplyFullRounds / ApplyPartialRoundsOpt     -- a phase of N rounds

Only Sigma is its own bottom-level circuit: it introduces witnesses for $x^2$, $x^4$, and $x^5$, and the soundness proof just rewrites the constraints and closes with ring arithmetic. The ARK and matrix arithmetic is inlined directly into the round-level circuits, since it's cheap enough that factoring it out would obscure the proof more than it would shorten it.

The round-level circuits are then built around Sigma:

FullRoundOpt_t2:        Sigma on both state elements, then ARK, then Mix
PartialRoundOptStep_t2: Sigma on the first element, ARK on the first element, then MixS

This gives the proof a modular shape. If a component is wrong, the local theorem for that component fails. If the component is correct, the next layer can use its specification instead of redoing the algebra every time.

The Hard Part: Composition

Once the round-level components are sound and complete, the challenge is to glue 56 partial rounds (and two full-round phases of 3 each) into one end-to-end proof. The naive approach is to write the whole pipeline as one expression and let Lean's simplifier unfold every fold. That does not scale: the resulting term is enormous, elaboration becomes slow and brittle, and the proof script becomes unreadable.

The technique we use is to wrap each loop in its own FormalCircuit. ApplyPartialRoundsOpt is a FormalCircuit whose main is a Circuit.foldl over the 56 per-round circuits. Its Spec says simply: "the output equals partialRoundsOpt_t2 C_t2 S_t2 56 10 0 input". ApplyFullRounds does the same for the 3-round full-round phases. Each loop becomes a self-contained gadget with its own soundness proof.

The benefit of this encapsulation is that the loop is hidden behind its Spec. The top-level Poseidon circuit composes a handful of these gadgets sequentially; it never has to see the underlying fold, and the composition between phases reduces to chaining equalities between their specs.

Inside ApplyPartialRoundsOpt.soundness, we still need an induction to go from "each partial round produces the spec's next state" to "after 56 rounds, the state matches partialRoundsOpt_t2". This is handled by a private induction lemma, together with an envState helper that recovers the state at each iteration from the witness environment. The induction is local to the loop circuit and is never re-run at the top level.

With the loops encapsulated this way, the top-level Poseidon1 proof is relatively short. After circuit_proof_start unpacks the constraint hypotheses, a focused simp unfolds each phase's Spec, and a final projection extracts state element 0. The linear composition of subcircuit specs is the proof.

What the Final Theorem Says

The final circuit is:

def circuit : FormalCircuit (F BN254_PRIME) field field where
  main := main
  localLength _ := 402
  Assumptions _ := True
  Spec input output :=
    output = Specs.PoseidonOptimized.poseidon1Opt input
  soundness := ...
  completeness := ...

The localLength value accounts for every witness variable:

initial ARK:        2   (InitialArk)
first full rounds: 24   (3 × 8 in ApplyFullRounds)
transition:         8   (FullRoundOpt_t2 with P)
partial rounds:   336   (56 × 6 in ApplyPartialRoundsOpt)
last full rounds:  24   (3 × 8 in ApplyFullRounds)
final round:        8   (FullRoundOpt_t2)
----------------------
total:            402

The soundness theorem starts from an arbitrary environment satisfying the constraints and ends with:

Expression.eval env ((main input_var).output offset)
  = Specs.PoseidonOptimized.poseidon1Opt input

The completeness theorem goes in the other direction: for any input, the circuit's honest witness generation satisfies the constraints. This is the core result. The Clean model of optimized arity-1 Poseidon computes exactly the optimized Poseidon specification.

The `sorry` in the Room

All of this takes place over the BN254 scalar field:

$$ p = 21888242871839275222246405745257275088548364400416034343698204186575808495617. $$

Lean needs to know that this number is prime. As discussed before, it is not enough to show it is a probable prime, such as in "Miller-Rabin passed on many bases", but prime as a theorem:

Fact (Nat.Prime BN254_PRIME)

This matters because the proof uses field reasoning. If the modulus were composite, ZMod p would not be a field. Algebraic facts that are valid in a field would no longer be available.

So the last sorry is crucial, but it is also independent of Poseidon. It is a number-theoretic certificate for a fixed 254-bit integer.

Lucas's Test

The classical Fermat test says that if $p$ is prime, then:

$$ a^{p-1} \equiv 1 \pmod p $$

for every $a$ not divisible by $p$. The converse is notably false. In fact, a whole (infinite) family of numbers, the so-called Carmichael numbers, can pass Fermat tests while being composite.

Lucas's test adds one crucial condition. If there exists an $a$ such that:

$a^{n-1} \equiv 1 \pmod n$,
for every prime factor $q$ of $n-1$, $a^{(n-1)/q} \not\equiv 1 \pmod n$,

then $n$ is prime.

The intuition relates to the order of elements in a group. The first condition says the order of $a$ divides $n-1$. The second says it does not divide any of the smaller candidates obtained by removing a prime factor. Together, they force the order of $a$ to be exactly $n-1$, and that can only happen when $n$ is prime. Mathlib contains this theorem as lucas_primality.

Pratt Certificates

Lucas's test seems to require knowing the prime factorization of $n-1$, including proofs that those factors are prime. Pratt's observation was one can use a recursive strategy in order to use Luca's test to proof primality.

To prove that $n$ is prime, provide:

a Lucas witness $a$,
the factorization of $n-1$,
Pratt certificates for each prime factor.

The recursion stops at the number 2, as in the following example.

Pratt certificate tree for 2063

This gives a compact proof object that a proof assistant can check. The relatively expensive work of finding the factorization and witnesses can happen offline. Lean only has to verify the certificate.

For BN254, the top-level factorization of $p-1$ is:

$$ p - 1 = 2^{28} \times 3 \times 13 \times 29 \times 983 \times 11003 \times 237073 \times 405928799 \times 1670836401704629 \times 13818066809895115352007386748515426880336692474882178609894547503. $$

The largest factor is itself large, so it needs its own certificate tree. But the structure is the same all the way down.

The Verified-zkEVM project maintains CompPoly, a Lean library for computable polynomials, which already contains a machine-checked Pratt certificate for the BN254 scalar field prime in its BN254 field formalization. That certificate can close the remaining sorry by providing the missing Fact (Nat.Prime BN254_PRIME) instance.

Pratt certificate sketch for BN254

At the time of writing, this last sorry is still present in our Clean development. The missing ingredient is not a new number-theoretic argument, since the certificate already exists. The remaining work is engineering work: bumping Clean's Lean toolchain to the 4.29 release that CompPoly targets, so the certificate can be imported and the sorry discharged. This work is already in progress.

What This Gives Us

After closing the primality certificate, the Lean proof establishes that the Clean arity-1 model is sound and complete for poseidon1Opt. The strongest remaining uncertainty is no longer inside the Lean proof. It is at the modeling boundary:

poseidon.circom source
        |
   hand-modeled in Clean
        |
Lean proof of soundness/completeness
        |
optimized Poseidon spec
        |
test-vector agreement with non-optimized spec and circomlibjs

The result is useful precisely because of this shift. The question is no longer "did we inspect 402 witness variables and dozens of round constants correctly?" (something which would be very hard for a human reader to confidently answer) but "does this Clean model match the Circom circuit it claims to model?", which an auditor can verify by reading both side by side.

What's Next

The current proof covers arity 1. Circomlib Poseidon is also commonly used at arities 2, 3, and 4, for example in binary Merkle trees and wider hash inputs.

The specs and test vectors for those arities already exist. The next step is to extend the circuit proofs. The same pattern should carry over: a small S-box circuit, round-level gadgets with inlined arithmetic, and phase-level loop circuits. The harder part is again the composition problem at larger state widths.

There are also two natural ways to strengthen the statement:

prove the optimized and non-optimized Poseidon specs equivalent for all inputs,
mechanically connect the Clean model to the original Circom source or compiler output.

The current result is a useful milestone: a machine-checked proof that a realistic optimized ZK hash circuit model computes exactly the function it claims to compute, with the last missing assumption reduced to a classical primality certificate.

Acknowledgements We thank Gregor Mitscha-Baude, Yoichi Hirai, and David Tedgui for their help with the Clean proof and for valuable comments on this blog post.

ZK/SEC Quarterly

Verifying Poseidon in Clean: Why the Last 'Sorry' Is About Primality

Why Poseidon Is Worth Verifying

Standard Poseidon vs. Optimized Poseidon

What We Proved

The Circuit Shape

Proving Small Pieces First

The Hard Part: Composition

What the Final Theorem Says

The `sorry` in the Room

Lucas's Test

Pratt Certificates

What This Gives Us

What's Next

Why Poseidon Is Worth Verifying

Standard Poseidon vs. Optimized Poseidon

What We Proved

The Circuit Shape

Proving Small Pieces First

The Hard Part: Composition

What the Final Theorem Says

The sorry in the Room

Lucas's Test

Pratt Certificates

What This Gives Us

What's Next

Share This Article

Latest Posts

Groth16, Intuitively

Soundness Failures in LaBRADOR Implementations from NTT -Friendly Rings

The Final Form of Software Development

Recommended Reading

Introducing Clean, a Formal Verification DSL for ZK Circuits in Lean4

More to Explore

Archetype X zkSecurity - Proof Is in the Pudding: ZK on Bitcoin

Unfolding the Bulletproofs Magic: a SageMath Deep Dive

A Challenge on the Jolt zkVM

The `sorry` in the Room