New Optimization Techniques for PlonK's Arithmetization (2022/462)

Abstract

PlonK is a universal and updatable zk-SNARK for general circuit satisfiability that allows a verifier to check the validity of a certain NP statement very efficiently, optionally in zero-knowledge. PlonK requires that the NP relation of interest be expressed as a system of so-called PlonK constraints. Such conversion is complex and can be implemented in various ways, having a great impact on the prover complexity (which is typically linearithmic in the number of PlonK constraints).

We propose several general results for simplifying PlonK constraint systems, which produce more compact but equivalent systems and can lead to significant performance improvements. We also develop an automated optimizer of constraints, based on our techniques, that can be used to construct very compact and less error-prone constraint systems, favoring a more auditable circuit design.

Finally, we demonstrate the potential of our techniques by implementing optimized constraint systems for the Poseidon hash, obtaining the most compact representations in the Turbo-PlonK model with minimal custom gates. En route, we devise a novel optimization idea for implementing Poseidon partial rounds and show that it can be applied to both simplifying SNARK circuits and achieving performance improvements in CPU implementations of the Poseidon hash.

1. Introduction

Succinct non-interactive arguments of knowledge (SNARKs) [GGPR13, BCG+13, PHGR13, Gro16] are a class of non-interactive arguments of knowledge systems with generally constant communication complexity and logarithmic verification complexity. This comes at the cost of a significantly slower prover, compared to other zero-knowledge proof systems with higher communication complexity [JKO13, GMO16, CGM16].

To be secure, SNARKs usually require a so-called structured reference string (SRS), usually comprising the successive powers of a secret element, which if known would break soundness. The generation of such SRS is usually performed collaboratively through a multi-party computation (MPC) protocol in so-called setup ceremonies [BGM17].

PlonK [GWC19], which stands for Permutations over Lagrange-bases for Oecumenical Non-interactive arguments of Knowledge, is a universal and updatable zero-knowledge SNARK for general circuit satisfiability. Given its significant improvements with respect to its predecessor Sonic [MBKM19], especially on prover efficiency, PlonK has become very popular and has been adopted by several state-of-the-art blockchain projects such as Zcash [HBHW22], Mina [BMRS20], the Dusk Network [MKF21] or Anoma [GYB21].

In PlonK’s original paper, circuits are expressed in terms of a 2 fan-in 1 fan-out parametric arithmetic gate over an algebraic field. The whole constraint system is transformed into polynomials which are then committed using a polynomial commitment scheme and evaluated on a random point during the proving process. Using Fast Fourier Transforms for the polynomial conversion and efficient polynomial multiplication, and computing the commitments with KZG [KZG10], the prover complexity is in O(kn \cdot \log n) scalar operations for the former and O(kn) group operations for the latter where k is the number of wires, and n the number of constraints.

The circuit is furthermore limited by the SRS size. As the polynomials are committed with the SRS, the number of constraints in a circuit is upper-bounded by the size of the SRS. A compact representation of circuits is paramount to make PlonK-based SNARKs more practical. This is the subject of this paper.

1.1 Applications of SNARKs

A number of blockchain projects have jumped on the opportunity of using zk-SNARKs, and leverage the unique properties that zero-knowledge proofs provide, for various applications. From identification to storage (Filecoin [Lab17]) as well as private transactions (Zcash [HBHW22]). Unlike other alternative ZK proving systems, zk-SNARKs enjoy extremely efficient verification times and very compact proofs, this makes them particularly suitable for on-chain validation.

Hash functions are an essential building block for blockchain applications, as they are the basis for Merkle-tree-based set accumulator schemes and signature schemes. Although SNARK-friendly hash functions can be implemented significantly more efficiently than their more standard counterparts, they are still one of the main bottlenecks [WS21, Zca21]. For these reasons, it is important to develop new techniques that allow us to model efficient and compact circuits of such primitives.

1.2 Our Contributions

Simplification of PlonK constraint systems. We propose generic optimization mechanisms to simplify Turbo-PlonK constraint systems. Our techniques can reduce the number of constraints and variables in the arithmetization, while preserving the satisfiability of the underlying system of polynomial equations induced by them. Our simplifications exploit the fact that Turbo-PlonK constraints can access the wires corresponding to the next constraint in the system.

Automated optimizer of constraints. We propose an automated mechanism for applying the above simplification techniques, abstracted out as simple rewriting rules (Section 3.2). This allows a user to design circuits in the original and less error-prone (but unoptimized) input/output abstraction based on arithmetic gates and then use the automated optimizer to benefit from the simplification techniques.

Optimized circuits for the Poseidon hash. We demonstrate the potential of our techniques by implementing optimized circuits for the Poseidon hash. En route, we devise a novel idea for implementing partial rounds, coined the linear skip (Section 4.2), that leverages the fact that partial rounds are almost linear functions, and the composition of linear functions is again a linear function.

Remark 1

This work is part of a library implemented by the Nomadic Labs’ Crypto Team [Nom22b], designed to develop private transactions (based on the Sapling protocol by Zcash [HBHW22]) and zero-knowledge rollups over the Tezos blockchain [Goo14]. The current proposal uses Merkle-trees as set accumulators and Schnorr signatures for validating the authenticity of transactions. Both primitives heavily rely on a secure hash function, which has been chosen to be Poseidon.

1.3 Related Work

Zero-knowledge arguments. Zero-knowledge arguments were introduced by Goldwasser, Micali and Rackoff in 1985 [GMR85]. They allow a prover to convince a verifier of the validity of a certain statement without revealing any other information, e.g., why the statement is true. A few years later, Blum, Feldman and Micali [BFM88] extended this notion and considered non-interactive zero-knowledge arguments (NIZK).

PlonK family. A number of projects has branched out since PlonK’s release. Following Turbo-PlonK, Plookup [GW20], in 2020, allows the use of look-up tables and facilitates range checks and queries with the use of new identities. Several polynomial commitment schemes, SHPlonk [BDFG20] and FFlonk [GW21a], released in 2020 and 2021, implicitly target PlonK. The Electric Coin Company changed their proving system from Groth16 to PlonK in 2020 when publishing Halo2 [BGH].

R1CS vs PlonK’s constraint system. Groth16 and PlonK are the two main proof systems used in the blockchain world. They differ in many points, including the universality of the setup, but especially in their arithmetization. PlonK uses an algebraic gate which corresponds to the equality of the addition and multiplication of two input wires with an output variable, while Groth16 uses Rank 1 Constraint Systems (R1CS).

Alternative efficient implementations of Poseidon. In the original Poseidon paper [GKR+19, Appendix B], the authors propose an optimization that exploits the structure of partial rounds and the fact that they only involve one S-box. This idea consists of rearranging the structure of the constant and linear layers in order to express the linear part of partial rounds as the multiplication by a very sparse matrix. This technique is similar to the proposed linear skip, where such linear part of partial rounds is replaced by the identity matrix at the cost of carrying a computation for the next layer.

2. Preliminaries

2.1 Notation

We consider bilinear groups (\mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_t, e : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_t) of prime order p. We use additive notation for all three groups. Throughout the paper, all polynomial equations will have coefficients over the scalar field \mathbb{Z}_p.

We use bold case for vectors. Given a vector \boldsymbol{x} over some n dimensional vector space, we denote by x_i its i-th coordinate, for all i \in [n]. We denote concatenation as \parallel, used as a separator of elements of the same type (typically constraints).

2.2 Hades Strategy

Confusion and diffusion [Sha49] are two key properties when building cryptographic primitives. One of the simplest ways to achieve both is to use a substitution-permutation network (SPNs). This approach has been recently revisited in the case of algebraic hash functions in an effort to reduce the number of permutations. These partial SPNs (P-SPNs) gave birth to Zorro in 2013, LowMC in 2015 and MiMC in 2016 to cite a few.

Grassi et al. presented in 2020 the HADES Design Strategy together with an analysis framework built upon the wide trail design strategy. The Hades strategy is organized in layers, with so-called partial rounds (P-SPNs) surrounded by full rounds (SPNs) to prevent statistical attacks.

2.3 Poseidon

Poseidon [GKR+21] is an algebraic hash function family presented by Grassi et al. in 2020 relying on the Hades Design strategy. In particular, Poseidon consists of a substitution-permutation network applied to a state with w registers. Such network combines so-called full rounds with partial rounds. In a full round the state is modified by adding a constant to each register element, applying a non-linear function (S-box) to each of them and a linear layer involving all the elements in the state. Partial rounds are similar, but only apply the non-linear S-box to one of the register elements.

The use of partial rounds aims at reducing the number of non-linear (and more expensive) operations. For this reason, it is particularly amenable for its implementation in R1CS constraint systems and PlonK-based ones and the Poseidon function family has been used in a number of projects, including Zcash [HBHW22], the Dusk Network [MKF21] and Mina [BMRS20].

2.4 SNARKs Arithmetizations

Most SNARKs model NP relations as arithmetic circuits that are represented by systems of multivariate polynomial equations. More concretely, let C^{\boldsymbol{x}}(\boldsymbol{w}) := R(\boldsymbol{x}, \boldsymbol{w}) be a circuit parameterized by a statement \boldsymbol{x}, which is satisfiable (on a certain input witness \boldsymbol{w}) iff \boldsymbol{x} belongs in the NP language induced by R.

The specific details about how arithmetic circuits are transformed into polynomial constraints are known as the system’s arithmetization. For example, Rank 1 Constraint Systems (R1CS) require that every polynomial f_i be such that it can be factored into three linear polynomials a_i, b_i, c_i as follows:

f_i(\boldsymbol{x}, \boldsymbol{w}) = a_i(\boldsymbol{x}, \boldsymbol{w}) \cdot b_i(\boldsymbol{x}, \boldsymbol{w}) - c_i(\boldsymbol{x}, \boldsymbol{w})

On the other hand, PlonK imposes the following restrictions: each polynomial f_i can involve at most N different variables, where N is the gate architecture size; and a polynomial f_i can only contain monomials of a restricted form, which may be extended with so-called custom gates [GW19].

2.5 PlonK’s Arithmetization

From PlonK constraint system. In the PlonK proof system [GWC19], a constraint is a tuple (\mathsf{a}, \mathsf{b}, \mathsf{c}, \mathsf{sels}) where \mathsf{a}, \mathsf{b}, \mathsf{c} are formal variables and \mathsf{sels} is a set of so-called selectors. In the original version of PlonK, the available selectors are q_L, q_R, q_O, adding the left, right and output wires (resp. \mathsf{a}, \mathsf{b}, \mathsf{c}), q_M, multiplying two wires (\mathsf{a} and \mathsf{b}), and q_C adding a public constant. Every constraint induces a polynomial equation:

q_{\mathsf{L}} \cdot \mathsf{a} + q_{\mathsf{R}} \cdot \mathsf{b} + q_{\mathsf{O}} \cdot \mathsf{c} + q_{\mathsf{M}} \cdot \mathsf{a} \cdot \mathsf{b} + q_{\mathsf{C}} = 0 \tag{1}

Such equation captures addition and multiplication gates. For example, an addition gate c = a + b can be modeled by setting q_L = q_R = 1, q_O = -1 and q_M = q_C = 0. And a multiplication gate c = ab can be modeled as q_M = 1, q_O = -1 and q_L = q_R = q_C = 0.

To a polynomial equation. NP relations are converted into algebraic circuits by modeling their satisfiability as sets of polynomial equations of the form (1), induced by PlonK constraints \{(\mathsf{a}_i, \mathsf{b}_i, \mathsf{c}_i, \mathsf{sels}_i)\}_{i \in [n]}. These constraints are then compiled into a single polynomial equation:

(Q_l \cdot A + Q_r \cdot B + Q_o \cdot C + Q_m \cdot A \cdot B + Q_c)(X) = 0 \tag{2}

where polynomials A, B, C encode the witness to the NP relation and Q polynomials define the circuit and may be precomputed. Let \omega \in \mathbb{Z}_p be a primitive 2^k-th root of unity, for some k such that n < 2^k. Polynomial Q_l is defined as the minimal polynomial that evaluates to q_{L_i} on \omega^i, for every i \in [n].

Extending to arbitrary polynomials: Turbo-PlonK. The Turbo-PlonK proposal is key in unlocking PlonK capabilities by allowing one to change PlonK’s constraint equation (1) or even append another one to it. With this technique, one can view the selectors as elementary gates that are added up to form an arithmetic identity. We can define new multivariate polynomials of any degree P(\mathsf{a}, \mathsf{b}, \mathsf{c}), a new selector q_P and add the expression q_P \cdot P(\mathsf{a}, \mathsf{b}, \mathsf{c}) to the identity. Another possibility offered by Turbo-PlonK is the addition of new identities of the form I(\mathsf{a}, \mathsf{b}, \mathsf{c}, \mathsf{sels}) = 0.

We can also extend the arithmetic identity by defining linear selectors referring to the next gate wires, representing identity:

q_{L} a_i + q_{R} b_i + q_{O} c_i + q_{M} a_i b_i + q_{C} + \widetilde{q}_{L} a_{i+1} + \widetilde{q}_{R} b_{i+1} + \widetilde{q}_{O} c_{i+1} = 0 \tag{4}

where a_{i+1}, b_{i+1}, c_{i+1} refer to the next-constraint wires. In the rest of this work, we consider all arithmetic selectors from (3), as well as an extra arithmetic selector q_{x^5} : \delta that adds an extra term \delta \, a_i^5 to the identity from (4). This selector will be used to model Poseidon S-boxes with just 1 constraint.

3. New Optimizations on PlonK’s Arithmetization

A Turbo-PlonK constraint is simply a PlonK constraint inducing (possibly) more than one polynomial equality.

Definition 1 (Turbo-PlonK Constraint)

A Turbo-PlonK constraint is a tuple (\mathsf{a}, \mathsf{b}, \mathsf{c}, \mathsf{sels}) where \mathsf{a}, \mathsf{b}, \mathsf{c} are formal variables and \mathsf{sels} is a set of selectors. A selector is a pair q : \alpha, where q is the selector’s name and \alpha \in \mathbb{Z}_p, its coefficient.

Definition 2 (Turbo-PlonK Constraint System)

A Turbo-PlonK constraint system is an ordered list of Turbo-PlonK constraints. There exists a function \mathcal{P} that maps each pair of consecutive constraints into a set of polynomial equations (each corresponding to a so-called identity), involving the formal variables of both constraints, but whose coefficients depend only on the selectors of the first.

Remark 2

One may ask why only the variables of a constraint and the next are involved in an equation. The reason is that accessing the next constraint variables comes “for free”. Indeed, PlonK’s permutation argument requires evaluating the permutation polynomials on \xi, but also on \xi\omega, which corresponds to the next constraint values. Accessing further values, e.g. two constraints away, is possible but would incur the additional cost of opening polynomial commitments at a new evaluation point, e.g. \xi\omega^2, making these gates less practical.

3.1 Optimizing Turbo-PlonK Constraint Systems

Shared wires across constraints. Two linear combinations that involve common wires can be implemented more efficiently if they are handled together. For example, a linear combination of the form \mathsf{out} = \alpha_1 x + \alpha_2 y + \alpha_3 z would require two constraints to be implemented (over a 3-wires architecture). Now, imagine we needed to encode an extra linear combination that depends on the same x, y, z, say \mathsf{out}' = \beta_1 x + \beta_2 y + \beta_3 z. We could model both linear combinations with 3 constraints instead of 4 by sharing wires across constraints using next-constraint selectors.

Merging constraints. The use of next-constraint selectors can lead to constraints that have an empty set of selectors whose only purpose is to include wires that are used by the previous constraint. Such constraints can be combined with other constraints if the wires are not in conflict. For example, two groups of three constraints as above can be combined into simply one group of five.

Gaussian elimination. It is advantageous to view constraints as the polynomial equalities that they induce and not think about what wires represent inputs and what wires represent outputs. That way, we can transform the polynomial equalities into an equivalent set of equalities that can be represented with simpler or fewer constraints. For example, imagine we want to compute r := 8x^5 + 2y^5 and s := 4x^5 - 3y^5. The naive approach would require introducing two auxiliary variables and 4 Turbo-PlonK constraints:

(x, \_, \mathsf{aux}_1, \{q_{x^5}: 1, q_O: -1\}) \quad (y, \_, \mathsf{aux}_2, \{q_{x^5}: 1, q_O: -1\})

(\mathsf{aux}_1, \mathsf{aux}_2, r, \{q_L: 8, q_R: 2, q_O: -1\}) \quad (\mathsf{aux}_1, \mathsf{aux}_2, s, \{q_L: 4, q_R: -3, q_O: -1\})

Alternatively, we could simplify the polynomial equalities by considering linear combinations between them:

\begin{cases} 8x^5 + 2y^5 - r = 0 \\ 4x^5 - 3y^5 - s = 0 \end{cases} \iff \begin{cases} 32x^5 - 3r - 2s = 0 \\ -8y^5 - 2s + r = 0 \end{cases}

Now, it is possible to model both equations with just two constraints, avoiding auxiliary variables:

(x, r, s, \{q_{x^5}: 32, q_R: -3, q_O: -2\}) \quad (y, r, s, \{q_{x^5}: -8, q_R: 1, q_O: -2\})

3.2 Automated Optimizer of Constraints

We propose an automated mechanism for applying the techniques presented in the previous section in order to simplify constraint systems. This method is less error-prone and can lead to more effective optimizations than the manual application of the simplification rules, e.g., by combining separated parts of the circuit that are compatible.

Given a list of constraints \Gamma := ((\mathsf{a}_1, \mathsf{b}_1, \mathsf{c}_1, \mathsf{sels}_1), \ldots, (\mathsf{a}_n, \mathsf{b}_n, \mathsf{c}_n, \mathsf{sels}_n)), we say \Gamma is admissible if its last constraint does not contain next-constraint selectors. We define \mathcal{P}(\Gamma) as the set:

\{q_{L\,i} \mathsf{a}_i + q_{R\,i} \mathsf{b}_i + q_{O\,i} \mathsf{c}_i + q_{M\,i} \mathsf{a}_i \mathsf{b}_i + q_{C\,i} + \widetilde{q}_{L\,i} \mathsf{a}_{i+1} + \widetilde{q}_{R\,i} \mathsf{b}_{i+1} + \widetilde{q}_{O\,i} \mathsf{c}_{i+1}\}_{i \in [n]}

Definition 3 (Satisfiability)

We say an extended constraint system (\Gamma, L), with N variables, is satisfiable if there exists an evaluation \boldsymbol{x} \in \mathbb{Z}_p^N such that f(\boldsymbol{x}) = 0 for every polynomial f \in \mathcal{P}(\Gamma) \cup L.

The rewriting rules are described as follows. The Constraint Rules (Figure 2) include: Permute (rearranging next-constraint wires by any permutation \sigma \in S_3), Reduce (removing trivial next-constraint selectors with zero coefficients), and Combine constraints (merging an empty-selector constraint with an adjacent one if wires are compatible).

The Optimizer Rules (Figure 3) include: Collect linear (moving purely linear constraints into a set L of linear polynomials), Free variable (substituting a variable that does not appear in \Gamma using a linear equation from L), Efficient sum (encoding a linear equation over up to 6 variables using a single constraint with next-constraint selectors), Auxiliary variable (splitting a large linear equation into smaller parts using a fresh variable), and Constraint rule (applying any of the constraint rules to a constraint within the system).

Theorem 1

Let (\Gamma, L) be an admissible extended constraint system and let (\Gamma', L') be the result of applying any of the rules from Figure 3 to (\Gamma, L). Then, (\Gamma', L') is admissible. Furthermore, (\Gamma, L) is satisfiable if and only if (\Gamma', L') is satisfiable.

The optimizer applies the rules heuristically as follows: (1) It first applies collect linear on all constraints if possible. (2) It then applies the free variable rule on all polynomials in L that contain at most two non-constant monomials. (3) It applies the auxiliary variable rule to partition polynomials with fewer than 2N monomials. (4) It moves the resulting polynomials from L to \Gamma by applying efficient sum one by one.

The optimizer is particularly efficient for optimizing circuits that involve linear computations. For example, the naive implementation of the Poseidon circuit with w = 3 and N = 3, using an S-box custom gate requires 464 constraints. It can be reduced to just 272 constraints after the optimizer. The prototype implementation is open source [Nom22c].

4. Optimizing Poseidon Hash

4.1 Optimized Implementation of Poseidon

We present an optimized Turbo-PlonK modelization of Poseidon for w = 3. Our techniques can be generalized to other versions. We model (shifted) full rounds with 4 constraints where the last one has no selectors and is compatible with its successor. On the other hand, we can leverage the linear skip and model 4 (shifted) partial rounds with just 7 constraints where the last is compatible with its successor. Consequently we are effectively modeling every full round with 3 constraints and every partial round with 1.5 constraints.

Modelling (shifted) full rounds. Modelling a shifted full round involves capturing the constraints \mathbf{y} = M\mathbf{x}^5 + \boldsymbol{\kappa}, or equivalently M^{-1}\mathbf{y} - \mathbf{x}^5 - M^{-1}\boldsymbol{\kappa} = 0. Let M^{-1} = (\beta_{ij}). This can be modeled with 4 constraints, where the last constraint (with no selectors) is compatible with the first constraint of the next shifted full round. The last shifted full round of the first block of full rounds is also compatible with the first shifted partial round.

Modelling (shifted) partial rounds. Ignoring the constraint coefficients and focusing on the variables that are involved, modeling 4 nested shifted partial rounds requires asserting six linear relations. Gaussian elimination can be applied to construct an equivalent system of six equations with at most one power of five per constraint. The last constraint with no selectors starts with y_3, making it compatible with the subsequent partial round.

As detailed in Table 1, the techniques presented above allow modeling one iteration of the Poseidon strategy with 3 constraints per full round, plus 6 constraints per block of 4 partial rounds and two extra constraints. This gives a total of 1.5R_P + 3R_F + 2 constraints with an architecture of N = 3 wires per gate.

4.2 Linear Skip

A naive implementation of Poseidon Hash would require including constraints that model w additions (of w terms) for every (full or partial) round; and constraints modelling w S-boxes (per full round) plus one S-box per partial round.

We propose an optimization, coined the linear skip, which can be used to reduce the number of constraints modelling partial rounds. In a nutshell, we leverage the fact that the composition of linear functions is again a linear function in order to “skip” the evaluation of certain wires.

Consider two Poseidon partial rounds transforming state (x_1, \ldots, x_w) into state (z_1, \ldots, z_w). The linear trick consists of ignoring certain intermediate equations, not creating variables nor constraints for intermediate wires y_i and c_i for any i \in [w-1]. (Note that we must keep a variable for c_w, which is the input to an S-box.) This way, the constraints modelling a pair of partial rounds can be reduced to:

\forall i \in [\mathbf{w}].\; a_i = x_i + \kappa_i

c_{\mathbf{w}} = \sum_{j=1}^{\mathbf{w}-1} \alpha_{\mathbf{w}j} a_j + \alpha_{\mathbf{w}\mathbf{w}} b_{\mathbf{w}} + \hat{\kappa}_{\mathbf{w}}

b_{\mathbf{w}} = a_{\mathbf{w}}^5 \qquad d_{\mathbf{w}} = c_{\mathbf{w}}^5 \qquad \forall i \in [\mathbf{w}].\; z_i = \sum_{j=1}^{\mathbf{w}-1} \alpha'_{ij} a_j + \alpha'_{i\mathbf{w}} b_{\mathbf{w}} + \alpha_{i\mathbf{w}} d_{\mathbf{w}} + \kappa'_i

where the coefficients \alpha'_{ij} and \kappa'_i are the result of composing two linear functions. In particular:

\forall i,j \in [\mathbf{w}].\; \alpha'_{ij} := \sum_{k=1}^{\mathbf{w}-1} \alpha_{ik} \alpha_{kj} \qquad \forall i \in [\mathbf{w}].\; \kappa'_i := \sum_{k=1}^{\mathbf{w}-1} \alpha_{ik} \hat{\kappa}_i

That is, M' = (\alpha'_{ij}) is the result of multiplying the submatrix of M without the last column by the submatrix of M without the last row. Instead of having to perform 2w additions of w terms to model the pair of partial rounds, the linear skip allows us to model them with simply 1 addition of w terms and w additions of w+1 terms.

Multiple skips. The above technique can be generalized to skipping multiple partial rounds. However, skipping too many rounds can be counterproductive, because the required sums become heavier as the number of skips increases. In particular, skipping n partial rounds would require 1 addition of w+i terms for every i = 0, \ldots, n-1; plus w extra additions of w+n terms. The number of additions necessary to model n+1 partial rounds when skipping n of them is:

t_{n,w} := w(w + n - 1) + \sum_{i=1}^{n-1} (w + i - 1) \tag{14}

The optimal number of skips depends on the register size w, but also on the constraint system being used, since some systems can handle several additions per constraint.

5. Results

5.1 Optimized PlonK Circuits for Poseidon

Table 1 presents a comparison with respect to the number of constraints and prover cost of the optimized circuits for Poseidon and other existing references. The most popular register sizes of w = 3 and w = 5 and three different Turbo-PlonK models are considered: (i) plain PlonK with no extra selectors, (ii) one extra selector, q_{x^5}, for computing S-boxes in just one constraint, and (iii) q_{x^5} combined with extra linear selectors that point to the next constraint.

Key results from Table 1 for w = 3, N = 3 with R_F = 8, R_P = 56: the estimated count [GKK+19] requires 624 constraints; using an S-box custom gate [GW19] reduces this to 464; applying a linear 1-skip plus the optimizer yields 216 constraints; and the full Section 4.1 approach plus the optimizer achieves 110 constraints — a reduction of over 82% from the baseline.

For N = 4, the same techniques achieve as few as 98 constraints with w = 3 and 173 constraints with w = 5. Other projects such as Zcash [Zca21] define specialized custom gates allowing R_P + R_F + 1 constraints (65 constraints) and Mina [WS21] uses N = 15 wires per gate with very specialized custom gates. These models introduce significant overhead on both prover and verifier, but the reduction in constraints pays off.

5.2 Optimized Poseidon on CPU

The linear skip essentially decreases the number of additions and multiplications in partial rounds, and it can be exploited to build a very efficient implementation of Poseidon on CPU. An OCaml script is provided, available in Mec [Nom21], that computes the number of additions and multiplications for a given instance of Poseidon and a given number of skipped partial rounds.

A C implementation of HADES over the scalar field of BLS12-381 is available under the MIT license in [Nom22a]. The implementation supports any security parameters and any number of skipped rounds. It uses the library blst [Sup21] as a backend for arithmetic operations.

For w = 3, R_F = 8, R_P = 56: the baseline (no skip) requires 579 additions and 816 multiplications at 16.03 \mus; the optimal 2-skip requires 486 additions and 723 multiplications at 13.97 \mus — a 13% reduction. For w = 5, R_F = 8, R_P = 59: the baseline requires 1680 additions and 1972 multiplications at 40.14 \mus; the optimal 5-skip requires 1041 additions and 1333 multiplications at 26.06 \mus — a 35% reduction.

6. Conclusions

We have shown how simple, yet powerful, optimization techniques make it possible to express circuits in the most basic Turbo-PlonK model very compactly, whose number of constraints is comparable to the one obtained through more complex and expensive custom gates. Our automated optimizer of constraints, implemented in a prototype which is publicly available as open source, allows users to focus on the circuit design and obtain efficient implementations while avoiding error-prone and complex manual optimizations. This way, the construction of circuits can be more transparent and easier to audit.

Our techniques have led to the most efficient and compact implementations of the Poseidon circuit in the Turbo-PlonK model with basic custom gates such as q_{x^5} and next-constraint linear selectors. Other projects, such as the Dusk Network could benefit from our techniques (in particular the linear skip) to reduce the number of constraints of their Hades implementation [Dus21] by at least a factor of 25% for free (without any modifications of their model or additional custom gates). Furthermore, we believe our techniques nicely complement other works that aim at reducing the number of constraints by exploring additional PlonK identities and more complex custom gates [WS21, Zca21].

An appealing direction for future work would be to explore additional heuristics and optimizer rules that can capture the most advanced optimizations presented in Section 4. Also, it would be very interesting to explore whether these techniques can be extended to other arithmetizations.

Acknowledgments

We are very thankful to Marc Beunardeau, Victor Dumitrescu, Antonio Locascio, Marina Polubelova and Marco Stronati for very fruitful discussions about this project and for all their help and feedback.

References

[ALSZ21] Behzad Abdolmaleki, Helger Lipmaa, Janno Siim, and Michał Zając. “On Subversion-Resistant SNARKs”. In: Journal of Cryptology, 34(3):1–42, 2021.
[Azt20] Aztec. Ignition ceremony, 2020. ignition.aztecprotocol.com.
[BBB+18] Benedikt Bünz et al. “Bulletproofs: Short Proofs for Confidential Transactions and More”. In: 2018 IEEE Symposium on Security and Privacy, pp. 315–334. IEEE, May 2018.
[BCG+13] Eli Ben-Sasson et al. “SNARKs for C: Verifying Program Executions Succinctly and in Zero Knowledge”. In: CRYPTO 2013, Part II. Vol. 8043. LNCS. Springer, Aug. 2013, pp. 90–108.
[BCG+14] Eli Ben-Sasson et al. “Zerocash: Decentralized Anonymous Payments from Bitcoin”. In: 2014 IEEE Symposium on Security and Privacy, pp. 459–474. IEEE, May 2014.
[BCG+17] Jonathan Bootle et al. “Linear-Time Zero-Knowledge Proofs for Arithmetic Circuit Satisfiability”. In: ASIACRYPT 2017, Part III. Vol. 10626. LNCS. Springer, Dec. 2017, pp. 336–365.
[BCL+20] Tim Beyne et al. “On the Security of the Rescue Hash Function”. Cryptology ePrint Archive, Report 2020/820, 2020. eprint.iacr.org/2020/820. [page on this site]
[BCMS20] Benedikt Bünz et al. “Recursive Proof Composition from Accumulation Schemes”. In: TCC 2020, Part II. Vol. 12551. LNCS. Springer, Nov. 2020, pp. 1–18.
[BCR+19] Eli Ben-Sasson et al. “Aurora: Transparent Succinct Arguments for R1CS”. In: EUROCRYPT 2019, Part I. Vol. 11476. LNCS. Springer, May 2019, pp. 103–128.
[BDFG20] Dan Boneh et al. “Efficient Polynomial Commitment Schemes for Multiple Points and Polynomials”. Cryptology ePrint Archive, Report 2020/081, 2020. eprint.iacr.org/2020/081.
[BDFG21] Dan Boneh et al. “Halo Infinite: Proof-Carrying Data from Additive Polynomial Commitments”. In: CRYPTO 2021, Part I. Vol. 12825. LNCS. Springer, Aug. 2021, pp. 649–680.
[BFLS91] László Babai et al. “Checking Computations in Polylogarithmic Time”. In: 23rd ACM STOC, pp. 21–31. ACM Press, May 1991.
[BFM88] Manuel Blum, Paul Feldman, and Silvio Micali. “Non-Interactive Zero-Knowledge and Its Applications (Extended Abstract)”. In: 20th ACM STOC, pp. 103–112. ACM Press, May 1988.
[BGH] Sean Bowe, Jack Grigg, and Daira Hopwood. Halo2 (2020). URL: github.com/zcash/halo2.
[BGH19] Sean Bowe, Jack Grigg, and Daira Hopwood. “Halo: Recursive Proof Composition without a Trusted Setup”. Cryptology ePrint Archive, Report 2019/1021, 2019. eprint.iacr.org/2019/1021. [page on this site]
[BGK+21] Mario Barbara et al. “Reinforced Concrete: Fast Hash Function for Zero Knowledge Proofs and Verifiable Computation”. Cryptology ePrint Archive, Report 2021/1038, 2021. eprint.iacr.org/2021/1038. [page on this site]
[BGM17] Sean Bowe, Ariel Gabizon, and Ian Miers. “Scalable Multi-Party Computation for zk-SNARK Parameters in the Random Beacon Model”. Cryptology ePrint Archive, Report 2017/1050, 2017. eprint.iacr.org/2017/1050. [page on this site]
[BMRS20] Joseph Bonneau et al. “Mina: Decentralized Cryptocurrency at Scale”, 2020. Whitepaper. docs.minaprotocol.com.
[CDS94] Ronald Cramer, Ivan Damgård, and Berry Schoenmakers. “Proofs of Partial Knowledge and Simplified Design of Witness Hiding Protocols”. In: CRYPTO’94. Vol. 839. LNCS. Springer, Aug. 1994, pp. 174–187.
[CGM16] Melissa Chase, Chaya Ganesh, and Payman Mohassel. “Efficient Zero-Knowledge Proof of Algebraic and Non-Algebraic Statements with Applications to Privacy Preserving Credentials”. In: CRYPTO 2016, Part III. Vol. 9816. LNCS. Springer, Aug. 2016, pp. 499–530.
[CHM+20] Alessandro Chiesa et al. “Marlin: Preprocessing zkSNARKs with Universal and Updatable SRS”. In: EUROCRYPT 2020, Part I. Vol. 12105. LNCS. Springer, May 2020, pp. 738–768.
[Com16] Electric Coin Company. “The Design of the Ceremony”, 2016. electriccoin.co.
[Cra97] Ronald Cramer. “Modular Design of Secure Yet Practical Cryptographic Protocols”. 1997.
[DFKP13] George Danezis et al. “Pinocchio Coin: Building Zerocoin from a Succinct Pairing-Based Proof System”. In: PETShop ’13, pp. 27–30. ACM, 2013.
[DKP+19] Itai Dinur et al. “Linear Equivalence of Block Ciphers with Partial Non-Linear Layers: Application to LowMC”. In: EUROCRYPT 2019, Part I. Vol. 11476. LNCS. Springer, May 2019, pp. 343–372.
[DMP90] Alfredo De Santis, Silvio Micali, and Giuseppe Persiano. “Non-Interactive Zero-Knowledge with Preprocessing”. In: CRYPTO’88. Vol. 403. LNCS. Springer, Aug. 1990, pp. 269–282.
[Dus21] Dusk Network. Hades252, 2021. github.com/dusk-network/Hades252.
[EGL+20] Maria Eichlseder et al. “An Algebraic Attack on Ciphers with Low-Degree Round Functions: Application to Full MiMC”. In: ASIACRYPT 2020, Part I. Vol. 12491. LNCS. Springer, Dec. 2020, pp. 477–506.
[Fil20] Filecoin. “Trusted Setup Complete!”, 2020. filecoin.io.
[FLS90] Uriel Feige, Dror Lapidot, and Adi Shamir. “Multiple Non-Interactive Zero Knowledge Proofs Based on a Single Random String (Extended Abstract)”. In: 31st FOCS, pp. 308–317. IEEE, Oct. 1990.
[GGPR13] Rosario Gennaro et al. “Quadratic Span Programs and Succinct NIZKs without PCPs”. In: EUROCRYPT 2013. Vol. 7881. LNCS. Springer, May 2013, pp. 626–645.
[GKK+19] Lorenzo Grassi et al. “Starkad and Poseidon: New Hash Functions for Zero Knowledge Proof Systems”. Cryptology ePrint Archive, Report 2019/458, 2019. eprint.iacr.org/2019/458. [page on this site]
[GKM+18] Jens Groth et al. “Updatable and Universal Common Reference Strings with Applications to zk-SNARKs”. In: CRYPTO 2018, Part III. Vol. 10993. LNCS. Springer, Aug. 2018, pp. 698–728.
[GKR+19] Lorenzo Grassi et al. “Poseidon: A New Hash Function for Zero-Knowledge Proof Systems”. Cryptology ePrint Archive, Report 2019/458, 2019. ia.cr/2019/458. [page on this site]
[GKR+21] Lorenzo Grassi et al. “Poseidon: A New Hash Function for Zero-Knowledge Proof Systems”. In: USENIX Security 2021, pp. 519–535. USENIX Association, Aug. 2021.
[GMO16] Irene Giacomelli, Jesper Madsen, and Claudio Orlandi. “ZKBoo: Faster Zero-Knowledge for Boolean Circuits”. In: USENIX Security 2016, pp. 1069–1083. USENIX Association, Aug. 2016.
[GMR85] Shafi Goldwasser, Silvio Micali, and Charles Rackoff. “The Knowledge Complexity of Interactive Proof-Systems (Extended Abstract)”. In: 17th ACM STOC, pp. 291–304. ACM Press, May 1985.
[Goo14] L.M. Goodman. “Tezos: A Self-Amending Crypto-Ledger”, 2014. tezos.com/whitepaper.pdf.
[Gro16] Jens Groth. “On the Size of Pairing-Based Non-Interactive Arguments”. In: EUROCRYPT 2016, Part II. Vol. 9666. LNCS. Springer, May 2016, pp. 305–326.
[GS08] Jens Groth and Amit Sahai. “Efficient Non-Interactive Proof Systems for Bilinear Groups”. In: EUROCRYPT 2008. Vol. 4965. LNCS. Springer, Apr. 2008, pp. 415–432.
[GW19] Ariel Gabizon and Zachary J. Williamson. “The Turbo-PlonK Program Syntax for Specifying SNARK Programs”, 2019. Preprint. docs.zkproof.org.
[GW20] Ariel Gabizon and Zachary J. Williamson. “plookup: A Simplified Polynomial Protocol for Lookup Tables”. Cryptology ePrint Archive, Report 2020/315, 2020. eprint.iacr.org/2020/315. [page on this site]
[GW21a] Ariel Gabizon and Zachary J. Williamson. “fflonk: A Fast-Fourier Inspired Verifier Efficient Version of PlonK”. Cryptology ePrint Archive, Report 2021/1167, 2021. eprint.iacr.org/2021/1167.
[GW21b] Ariel Gabizon and Zachary J. Williamson. “Public Inputs in PlonK’s Permutation Argument”, 2021. github.com/arielgabizon/plonk-addendum.
[GWC19] Ariel Gabizon, Zachary J. Williamson, and Oana Ciobotaru. “PLONK: Permutations over Lagrange-Bases for Oecumenical Noninteractive Arguments of Knowledge”. Cryptology ePrint Archive, Report 2019/953, 2019. eprint.iacr.org/2019/953. [page on this site]
[GYB21] Christopher Goes, Awa Sun Yin, and Adrian Brink. “Anoma: Undefining Money”, 2021. Whitepaper. anoma.network.
[HBHW22] Daira Hopwood et al. “Zcash Protocol Specification”, 2022. Documentation. zips.z.cash/protocol/protocol.pdf.
[HGB21] Ulrich Haböck, Alberto Garoffolo, and Daniele Di Benedetto. “Darlin: Recursive Proofs Using Marlin”. Cryptology ePrint Archive, Report 2021/930, 2021. eprint.iacr.org/2021/930.
[JKO13] Marek Jawurek, Florian Kerschbaum, and Claudio Orlandi. “Zero-Knowledge Using Garbled Circuits: How to Prove Non-Algebraic Statements Efficiently”. In: ACM CCS 2013, pp. 955–966. ACM Press, Nov. 2013.
[KZG10] Aniket Kate, Gregory M. Zaverucha, and Ian Goldberg. “Constant-Size Commitments to Polynomials and Their Applications”. In: ASIACRYPT 2010. Vol. 6477. LNCS. Springer, Dec. 2010, pp. 177–194.
[Lab17] Protocol Labs. “Filecoin: A Decentralized Storage Network”, 2017. filecoin.io/filecoin.pdf.
[LMR19] Russell W. F. Lai, Giulio Malavolta, and Viktoria Ronge. “Succinct Arguments for Bilinear Group Arithmetic: Practical Structure-Preserving Cryptography”. In: ACM CCS 2019, pp. 2057–2074. ACM Press, Nov. 2019.
[MBKM19] Mary Maller et al. “Sonic: Zero-Knowledge SNARKs from Linear-Size Universal and Updatable Structured Reference Strings”. In: ACM CCS 2019, pp. 2111–2128. ACM Press, Nov. 2019.
[MKF21] Toghrul Maharramov, Dmitry Khovratovich, and Emanuele Francioni. “The Dusk Network Whitepaper”, 2021. dusk.network.
[Nom21] Nomadic Labs. Mec: Mini Elliptic Curve Library, 2021. gitlab.com/dannywillems/ocaml-ec.
[Nom22a] Nomadic Labs. C Implementation of Poseidon, 2022. gitlab.com/dannywillems/ocaml-bls12-381.
[Nom22b] Nomadic Labs’ Crypto Team. Library for ZK Proofs and ZK Rollups, 2022. gitlab.com/nomadic-labs/privacy-team.
[Nom22c] Nomadic Labs’ Crypto Team. Optimizer of PlonK-Constraints, 2022. gitlab.com/nomadic-labs/privacy-team/optimizer.ml.
[PFM+22] Luke Pearson et al. “Plonkup: Reconciling PlonK with Plookup”. Cryptology ePrint Archive, Report 2022/086, 2022. ia.cr/2022/086.
[PHGR13] Bryan Parno et al. “Pinocchio: Nearly Practical Verifiable Computation”. In: 2013 IEEE Symposium on Security and Privacy, pp. 238–252. IEEE, May 2013.
[pro20] Mina Protocol. “A More Efficient Approach to Zero Knowledge for PlonK”, 2020.
[pro22] Mir Protocol. Plonky2, 2022.
[Sch91] Claus-Peter Schnorr. “Efficient Signature Generation by Smart Cards”. In: Journal of Cryptology, 4(3):161–174, Jan. 1991.
[Sha49] C. E. Shannon. “Communication Theory of Secrecy Systems”. In: The Bell System Technical Journal, 28(4):656–715, 1949.
[Sup21] Supranational. blst (pronounced ‘blast’) is a BLS12-381 signature library focused on performance and security, 2021. github.com/supranational/blst.
[WS21] David Wong and Joseph Spadavecchia. “Poseidon Gate (Mina Protocol)”, 2021. Kimchi Specification.
[Zca21] Zcash. Poseidon128, 2021. github.com/zcash/halo2.

Appendix A: Proofs of the Main Body

Proof of Theorem 1

Let (\Gamma, L) be an admissible extended constraint system and let (\Gamma', L') be the result of applying any of the rules from Figure 3 to (\Gamma, L). Then, (\Gamma', L') is admissible. Furthermore, (\Gamma, L) is satisfiable if and only if (\Gamma', L') is satisfiable.

The proof focuses on each rule individually:

Collect linear. Since the last constraint of \Gamma_0 does not contain next-constraint selectors, we have that \mathcal{P}(\Gamma_0 \parallel (x_1, x_2, x_3, \{q_L : \alpha_1, q_R : \alpha_2, q_O : \alpha_3, q_C : \alpha_0\}) \parallel \Gamma_1) is equal to \mathcal{P}(\Gamma_0) \cup \mathcal{P}(\Gamma_1) \cup \{\alpha_0 + \sum_{i=1}^3 \alpha_i x_i = 0\}, and \mathcal{P}(\Gamma_0 \parallel \Gamma_1) = \mathcal{P}(\Gamma_0) \cup \mathcal{P}(\Gamma_1), from which the result follows immediately.

Free variable. Say \boldsymbol{x} is a satisfying assignment for (\Gamma, L). Since L contains equation \alpha_0 + \sum_{i \in S} \alpha_i x_i = 0 with \alpha_j \neq 0, and it is satisfied by \boldsymbol{x}, it must be x_j = -(\alpha_0 + \sum_{i \in S \setminus \{j\}} \alpha_i x_i) \div \alpha_j. So \boldsymbol{x} must also be a satisfying assignment for (\Gamma, L[x_j \mapsto -(\alpha_0 + \sum_{i \in S \setminus \{j\}} \alpha_i x_i) \div \alpha_j]). On the other hand, if \boldsymbol{x}' is a satisfying assignment for (\Gamma', L'), because x_j is not in \Gamma' (and not in L'), we have that \boldsymbol{x}' modified so that x_j := -(\alpha_0 + \sum_{i \in S \setminus \{j\}} \alpha_i x_i) \div \alpha_j is a satisfying assignment for the original system.

Efficient sum. Since \Gamma is admissible, its last constraint does not include next-constraint selectors. In that case, \mathcal{P}(\Gamma) \cup L equals \mathcal{P}(\Gamma') \cup L' as desired.

Auxiliary variable. If \boldsymbol{x} is a satisfying assignment of t_1 + t_2 = 0, then \boldsymbol{x}, s \mapsto t_1 is a satisfying assignment for t_1 - s = 0 and s + t_2 = 0. On the other hand, if t_1 - s = 0 and s + t_2 = 0 are satisfiable at the same time, by adding both equations, t_1 + t_2 = 0 must also be satisfiable.

Constraint rule. Note that L = L' and for any t \in \mathsf{ConstraintRules} we have \mathcal{P}(\mathsf{g} \parallel \Gamma_1) = \mathcal{P}(t(\mathsf{g}) \parallel \Gamma_1) which implies that \mathcal{P}(\Gamma_0 \parallel \mathsf{g} \parallel \Gamma_1) = \mathcal{P}(\Gamma_0 \parallel t(\mathsf{g}) \parallel \Gamma_1) given that \Gamma_0 is admissible, as desired.

New Optimization Techniques for PlonK’s Arithmetization

Abstract

1. Introduction

1.1 Applications of SNARKs

1.2 Our Contributions

1.3 Related Work

2. Preliminaries

2.1 Notation

2.2 Hades Strategy

2.3 Poseidon

2.4 SNARKs Arithmetizations

2.5 PlonK’s Arithmetization

3. New Optimizations on PlonK’s Arithmetization

3.1 Optimizing Turbo-PlonK Constraint Systems

3.2 Automated Optimizer of Constraints

4. Optimizing Poseidon Hash

4.1 Optimized Implementation of Poseidon

4.2 Linear Skip

5. Results

5.1 Optimized PlonK Circuits for Poseidon

5.2 Optimized Poseidon on CPU

6. Conclusions

Acknowledgments

References

Appendix A: Proofs of the Main Body

History