Sound and fast gas monitoring? Let’s use saturation arithmetic!

Introduction: we got gas

In Tezos, as with most smart contract platforms, on-chain operations cost gas — a theoretical resource intended to reflect (and so limit) the on-chain computational cost of running a smart contract.

The gas model allocates gas costs to atomic computation steps. When a computation starts it receives some finite allocation of gas, from which the gas cost of each of its atomic computations is deducted as the computation runs, step by step. If the gas allocation is exhausted, the computation is deemed to have become too expensive and is aborted.¹

Three design considerations for a gas system are:

The gas model should be accurate.

We don’t want easy computations to get terminated unnecessarily for lack of gas; and conversely, we don’t want the gas model to allow an attacker to run expensive computations whose cost is undetected by the gas model, thus opening a potential DoS attack surface.

Getting this right is an art: the recent Tezos Delphi protocol upgrade, for example, finessed gas costs significantly.
The gas model should be computed (reasonably) correctly.

That is, the arithmetic of gas computations should be correctly performed. This sounds basic, but as we shall see, there is subtlety to this, because:
The gas model should be computed (fairly) cheaply.

Again, this sounds basic — but computing gas costs is a computation, just like anything else; and gas computations are ubiquitous on Tezos, so there may be a trade-off between efficiency and accuracy.

Let’s talk about how we optimized the cost of gas monitoring while preserving its correctness, thanks to saturation arithmetic. This optimization is a part of the Florence protocol proposal, currently under consideration by the Tezos on-chain voting procedure.

How does gas monitoring work?

Consider a node — call it Maria — on the Tezos network, about to perform a computation \(C\) on some value \(V\). Before computing \(C(V)\), Maria first uses a gas cost model \(M\) to compute a gas quantity \(G_M(C,V)\) which — if we designed our model correctly — fairly realistically anticipates the real-world cost of actually computing \(C(V)\). Note that:

\(G_M\) is determined by the cost model \(M\).
\(G_M(C,V)\) depends on the computation \(C\), and also on the value (or values) passed to it. This means that \(G_M(C,V)\) must be computed at runtime, when \(V\) is known, and cannot be statically determined at compile-time.²

\(G_M(C,V)\) has dimension mgas (milligas) units, and \(G_M(C,V)\) mgas is subtracted from the remaining gas counter.

If the remaining gas counter falls below 0, then Maria declines to compute \(C(V)\), due to it exhausting its available gas.
If the counter remains nonnegative, then Maria proceeds to compute \(C(V)\).

Both the gas model and the check for exhaustion perform arithmetic operations over gas. The usual arithmetic operations available in most CPUs implement modular arithmetic over 64-bit integers, in which only the numbers from \(-2^{63}\) to \(2^{63}-1\) inclusive can be represented.

The OCaml int memory representation reserves one bit for a compiler tag. Thus the effective length of an OCaml integer datum is 63 bits, and the native 64-bit machine instructions yield effective bounds of \(-2^{62}\) to \(2^{62}-1\) for a programmer using a signed integer in the OCaml compiler on a 64-bit system (or \(0\) to \(2^{63}-1\) for an unsigned integer).

If the result of an operation (e.g. add or multiply) exceeds these bounds, then we say the operation overflows. The special case of an overflow downwards may also be called an underflow, and we will use this terminology in this post.

For example:

\((2^{62}-1) + 1\) yields \(-2^{62}\) (we overflow).
\((-2^{62}) - 1\) yields \(2^{62}-1\) (we underflow).

Overflow is dangerous!

The gas model is public knowledge so an attacker knowing that \((2^{62}-1)+1 = -2^{62}\) need only request an operation whose gas cost calculation triggers this addition, to make a gas profit of \(2^{62}\) via overflow, thus literally counjuring gas out of nowhere and taking over the system.³

How is it currently implemented?

We currently use the arbitrary-precision integer ZArith OCaml library, by Antoine Miné, Xavier Leroy, Pascal Cuoq, and Christophe Troestler. Edo performs gas computations using ZArith, and the Michelson interpreter uses ZArith for its arithmetic on natural numbers, integers, and timestamps.

ZArith uses a dynamically-sized representation for numbers: any number can be represented, provided it fits in the computer memory. This is safe, but slow. It protects Tezos from overflows by making them impossible, but:

The representation is complex. A mere machine word will not suffice, so arithmetic operations are costly simply because they process a more complex datastructure.
ZArith is implemented as a hybrid OCaml/C library. Calling a C function from OCaml is expensive. ZArith tries to avoid C functions when dealing with small integers, but it cannot fully escape this cost. Thus even an addition over a 63-bit integer represented in ZArith is about five times slower than a typical addition over int type in OCaml.
Inevitably and by design, very large numbers are there to be computed on, and the computational cost scales with their size.

\(2^{62} - 1\) `mgas` ought to be enough for anybody

From the Delphi protocol onwards — thus: including the current Edo protocol, and also in the Florence protocol proposal — an operation cannot consume more than \(1,040,000,000 \approx 2^{30}\) mgas. So representing mgas with a native OCaml 63 bit integer seems reasonable; on a 64-bit architecture the integer limit of \(2^{62} - 1 = 4,611,686,018,427,387,903\) mgas would be hit in future evolutions of the protocol only if we multiply the gas limit by several billion.⁴

Once the cost model reaches \(1,040,000,000\) mgas the operation will be cancelled, and that’s all we need to know. So integer precision is not an issue and we just need to consider the potential for over- or underflow.

Saturation arithmetic equips bounded integers with operations that do not overflow/underflow, but instead might saturate — they default to the biggest/smallest value that fits within the bounds (as we discuss next).

Because \(2^{30}<2^{62}\), our final results cannot saturate, and while there might exist intermediate computations that could saturate, we commit to writing code that will detect this and correct as necessary.

How to implement saturation arithmetic?

Saturation arithmetic is a well-known technique to avoid underflow and overflow while computing with bounded integers. In case of an overflow / underflow, the result is replaced by the maximum / minimum value. We only need nonnegative integers for gas — remember if we drop below 0 we stop computing — so the minimum is 0 and the maximal is max_int, which in OCaml running on a 64-bit system is \(2^{62} - 1\). Thus:

\((2^{62}-1)+1\) yields \(2^{62}-1\).
\(0-1\) yields \(0\).

Saturation arithmetic operations are available in machine instruction sets like MMX or AVX2. However, we prefer to implement them in software on top of standard modular arithmetic operations, for portability (the efficiency hit is modest).

The challenge is therefore to efficiently detect when an overflow or underflow has occurred:

For addition: for \(0\leq x,y<2^{62}\), we have that \(x+y<0\) if and only if the computation of \(x+y\) overflows (where here \(+\) denotes the modular arithmetic operation). Thus to check whether an addition has overflowed, we just check that the result is nonnegative.
For subtraction: for \(0\leq x,y<2^{62}\), we have that \(x-y<0\) if and only if the computation \(x-y\) underflows. Thus to check whether a subtraction has underflowed, we just check that the result is nonnegative.
Addition cannot underflow, and subtraction cannot overflow. Division can neither overflow nor underflow.
The interesting case is multiplication, as we now discuss.

We use a bespoke OCaml module saturation_repr.ml (see lines 73 and 98):

 1 let saturated = max_int
 2
 3 let small_enough z =
 4   z land 0x7fffffff80000000 = 0
 5 
 6 let mul x y =
 7   (* assert (x >= 0 && y >= 0); *)
 8   match x with
 9   | 0 ->
10      0
11   | x ->
12      if small_enough x && small_enough y then x * y
13      else if Compare.Int.(y > saturated / x) then saturated
14      else x * y

A multiplication between two nonnegative integers overflows when y > saturated / x. We see this check on line 13 above, but it requires a slow division so we try to avoid it on line 12 with a fast small_enough bitmask test that is sound, but not complete — an integer that passes small_enough is certainly small enough, but some integers are close enough to the bound that they may need re-checked the slow way. In practice, most executions run through the fast path.

An additional implementation trick: we use a phantom type to statically track the integers that are known to be safe for multiplication. The clients of the saturation arithmetic module are then offered several multiplications depending on the static knowledge they have about their arguments:

(** [mul x y] behaves like multiplication between native integers as
   long as its result stays below [saturated]. Otherwise, [mul] returns
   [saturated]. *)
val mul : _ t -> _ t -> may_saturate t

(** [mul_safe x] returns a [mul_safe t] only if [x] does not trigger
    overflows when multiplied with another [mul_safe t]. *)
val mul_safe : _ t -> mul_safe t option

(** [mul_fast x y] exploits the fact that [x] and [y] are known not to
   provoke overflows during multiplication to perform a mere
   multiplication. *)
val mul_fast : mul_safe t -> mul_safe t -> may_saturate t

(** [scale_fast x y] exploits the fact that [x] is known not to
   provoke overflows during multiplication to perform a
   multiplication faster than [mul]. *)
val scale_fast : mul_safe t -> _ t -> may_saturate t

What are the gains?

Here are the running times of a micro-benchmark which does 10,000 sequences of 10 operations over random pairs of integers:⁵

┌────────────────┬────────────┐
│ Name           │   Time/Run │
├────────────────┼────────────┤
│ Zarith add     │ 2,993.12us │
│ Standard add   │   316.32us │
│ Saturation add │   411.82us │
└────────────────┴────────────┘

┌────────────────┬────────────┐
│ Name           │   Time/Run │
├────────────────┼────────────┤
│ Zarith sub     │ 2,340.94us │
│ Standard sub   │   322.81us │
│ Saturation sub │   421.24us │
└────────────────┴────────────┘

┌────────────────┬────────────┐
│ Name           │   Time/Run │
├────────────────┼────────────┤
│ Zarith mul     │ 3,285.63us │
│ Standard mul   │   321.70us │
│ Saturation mul │   479.08us │
└────────────────┴────────────┘

┌────────────────┬────────────┐
│ Name           │   Time/Run │
├────────────────┼────────────┤
│ Zarith div     │   481.98us │
│ Standard div   │   432.28us │
│ Saturation div │   436.89us │
└────────────────┴────────────┘

For addition, subtraction, and multiplication,

the ratio between modulo and ZArith arithmetic operations is roughly \(10\), while
the ratio between modulo and saturation arithmetic operations is roughly \(1.25\).

Unsurprisingly, there is no significant performance difference for division.

We see above that

saturation arithmetic is almost as fast as standard modulo arithmetic, and
provided you steer clear of division, it is ten times faster than ZArith.⁶

In the Michelson interpreter (documentation; nice overview) every execution cycle starts with a gas monitoring operation, and using saturation arithmetic makes this operation quicker: the gas counter can be represented in a machine register (which was not the case in Edo using ZArith); the OCaml compiler can inline the saturation arithmetic operations; and the cost model execution is faster because it uses a more efficient arithmetic. These optimizations taken together reduced the real computational cost of the Michelson execution cycle, and correspondingly allowed us to reduce by around 35% its gas cost in the Florence gas model.

When the node executes the following three well-known contracts, the gas consumed by the execution cycle⁷ is measured as follows (units are milligas; mgas):

┌──────────┬────────┬──────────┐
│ Contract │ Edo    │ Florence │
├──────────┼────────┼──────────┤
│ Dexter   │ 44,867 │  28,813  │
│ FA1.2    │  9,718 │   6,238  │
│ Manager  │  3,764 │   2,301  │
└──────────┴────────┴──────────┘

We used a bespoke version of Tezos to isolate the relevant signal in the measurement above, which we are happy to make freely available for your convenience. Code pointers for the contracts above are:

The Dexter (v1) contract (script).
The FA1.2 contract (script).
For the Manager contract, see this Manager script from the Mi-Cho-Coq repo.

What are the risks?

The system is still protected from an attacker armed with a hostile operation that requires a very large amount of gas: the cost model will saturate; the gas counter will be set to 0; and the hostile operation will be cancelled. So saturation arithmetic protects the node from this type of attack just as well as ZArith.

However, a more subtle issue is that we lose some standard arithmetic identities, because saturation arithmetic operations treats saturated values just like any other value — max_int and 0 serve double duty as themselves, and as overflow values, and this distinction is not recorded. For example:

\((x - y) + y = x\) is invalid in general since \(x - y\) can saturate to \(0\). For example, (0-1)+1 = 1.
\((x + y) - y = x\) is invalid in general because \(x + y\) can saturate to max_int. For example, (1 + max_int) - max_int = 0.
We lose the ability to rebracket in general. For example, (0-1)+1=1 whereas 0-(1-1)=0.

This makes saturation arithmetic potentially counterintuitive, which could in principle lead to programmer error. For instance, a programmer needs to be aware of situations where an intermediate computation might saturate (even if the final result is expected to be in bounds). However, the risk of such error seems fairly low, for one general reason and one specific reason:

The general reason is that OCaml is a strongly-typed programming language and integers equipped with saturation arithmetic are represented in OCaml by an abstract type, such that the programmer is reminded (by the type) that these are saturation arithmetic integers, and only saturation arithmetic operations can be used to manipulate its values.
The specific reason is that the cost model implementation should only need increasing functions, so the pitfalls described above should not arise as long as the functions are written to correctly propagate saturation. Aside from this, the rest of the gas monitoring subsystem is just trying to efficiently decrement a nonsaturated gas counter; if it saturates to 0, the operation is cancelled.

So in practice, saturation arithmetic is a good fit for our needs, and adds efficiency.

Conclusions and future work

So in conclusion: the 63-bit native integers of OCaml running on a 64-bit system are precise enough for gas monitoring using saturation arithmetic operations. Moving from a general-purpose arbitrary-precision C library in Edo, to a bespoke OCaml module implementing saturation arithmetic operations in Florence, allowed us to

speed up gas arithmetic by a factor of ten,
increase overall performance⁸ of an execution cycle of the Michelson interpreter in Florence by 35%,
and thus globally improve the efficiency of the gas monitoring subsystem.

For future work:

We believe that saturation arithmetic could be further optimized by making it branch-free.
Some automatic verification would be useful to further eliminate risk, by checking that the cost model implementation only uses increasing functions (and raise an alert if non-increasing functions are introduced).

You can also read more about Nomadic Labs’ work on gas costing in a previous blog post on Delphi.

So gas is a proxy for computational complexity, like money is a proxy for value. The gas model is like a pricelist, and the gas allocation is a budget. Computations that exceed their budget, get shut down. Nomadic Labs is advertising an open internship on gas model validation. If interested, please e-mail careers@nomadic-labs.com. ↩
Gas costs depending on values is common for Michelson because we have built-in data structures and arbitrary-precision numbers. Contrast with e.g. the Ethereum Virtual Machine’s gas model (EVM), in which almost all instructions have a fixed gas cost. ↩
Early flight simulators were vulnerable to integer underflow: if you dived towards the ground and picked up enough downward speed, you could underflow the speed counter and find yourself travelling upwards at incredible velocity. It was possible using this trick to “hop” indefinitely with an empty fuel tank. Unfortunately this issue was not restricted to games. ↩
… but not on a 32-bit architecture, since \(log_2 (1040000000) = 29.954\). The Tezos node only runs on 64-bit architectures so that is not a problem. ↩
We compose several operations to reach situations where large numbers appear in ZArith. ↩
In fairness to ZArith, it is good at what it does. But we only need to count to \(2^{30}\), so we just don’t need all the power that it offers. ↩
In practice, the cost of the Michelson interpreter cycle is only part of the full picture, and other costs may intervene (as for any large system), e.g. deserialisation costs. We are of course actively optimising these too (for instance, we recently optimised deserialisation costs by a factor of around 10, see this MR). ↩
The speedup per arithmetic operation from ZArith to saturation arithmetic is approximately 10, as observed above. The execution cycle includes other operations — typically dispatching over instructions and moving to the next instruction — so the overall reduction is 35%. ↩

Sound and fast gas monitoring with saturation arithmetic