Barrett reduction is a widely used algorithm for reducing a value modulo 14%
performance improvement in scalar multiplication.
What Is Barrett Reduction?
Barrett reduction is a method for efficiently computing the remainder of a division operation (i.e., modulus operation),
We want to compute
One way to compute
Once
First, let’s rewrite the formula above as:
So far, our computation is exact. However, instead of computing
This allows us to precompute
Note: Both and are fast to compute using right shifts.
Since the two approximations can be smaller than their exact computations, we have
Computing in
If we had computed
Since
(where modulo
However, remember that we have computed an approximation
We know that either:
To compute
This means that we might not have
Since
Then, we can calculate
Again, the
We finally arrive at the algorithm as described in Handbook of Applied Cryptography, Chapter 14. The steps outlined above closely follow the process detailed in the book.
Source: Handbook of Applied Cryptography, Chapter 14
A Tighter Bound Analysis
The bound of
Remember that
Since
Let’s denote
Then we can simplify
We use
That is saying
The bound of
Prove that
We have:
We know that
And thus:
That’s exactly what we want. Therefore, we conclude that
Tighter Bound in Practice:
We have proved that
Let’s examine which moduli
Which means:
Further:
That’s saying if
It’s worth noting that
The behavior of
To quantify this probability, let’s assume the common case where
To put this in perspective:
- When
, the probability of achieving the tighter bound exceeds . - When
, the probability of achieving the tighter bound exceeds .
That is, for nearly all moduli in practice, the tighter bound
Here is an intuitive explanation:
Practical Optimization For Barrett Reduction Implementation
The tighter bound enables faster Barrett reduction implementation. According to traditional analysis, we may need to subtract
This is very useful for constant time implementation where it always substract the max times. For example, in RustCrypto P-256 scalar field implementation, it always subtracts twice.
pub(super) const fn barrett_reduce(lo: U256, hi: U256) -> U256 {
[...]
let r1 = [a0, a1, a2, a3, a4];
let r2 = q3_times_n_keep_five(&q3);
let r = sub_inner_five(r1, r2);
// Result is in range (0, 3*n - 1),
// and 90% of the time, no subtraction will be needed.
let r = subtract_n_if_necessary(r);
let r = subtract_n_if_necessary(r);
U256::new([r[0], r[1], r[2], r[3]])
}
As P-256 scalar field satisfies r
can be subtracted at most once to get the final result. The second subtract_n_if_necessary
call in the code above is unnecessary and can be safely removed.
Benchmarks show that simply removing the second subtraction improves multiplication and inversion performance by 14%
.
scalar operations/mul
time: [38.900 ns 38.957 ns 39.026 ns]
change: [-14.379% -14.052% -13.734%] (p = 0.00 < 0.05)
Performance has improved.
scalar operations/invert
time: [20.716 µs 20.758 µs 20.823 µs]
change: [-14.817% -14.331% -13.969%] (p = 0.00 < 0.05)
Performance has improved.
As analyzed in the previous section, the tighter bound holds for almost all moduli. Thus, this optimization applies to most of the Barrett reductions with fixed moduli (e.g., ECC, ZKP).
Here is the Python script to test if a modulus
# Tighter Bound Criterion for the Barret reduction
def tighter_bound_criterion(m):
def inner_test(m, b):
# k chosen such that b^{k-1} < m < b^k
k = 1
while b**k < m:
k += 1
print("k = ", k)
beta = b**(2*k) % m
# calculate criterion
return beta <= m - b**(k-1)
# test both b=2^32 and b=2^64
return inner_test(m, 2**32) and inner_test(m, 2**64)
# P-256 scalar field
assert(tighter_bound_criterion(0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551) == True)
# P-256 base field
assert(tighter_bound_criterion(0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff) == True)
Conclusion
Our analysis demonstrates that the error bound for the quotient approximation
We showed how this optimization applies to real-world cryptographic libraries, such as RustCrypto’s P-256 scalar field implementation, where removing the unnecessary subtraction improved performance significantly. These findings are broadly applicable to other cryptographic systems and zero-knowledge proof frameworks that rely on fixed moduli.