1. Research & development
  2. > Blog >
  3. Verify, but test: extracting QCheck Property-Based Tests from F* specifications

Verify, but test: extracting QCheck Property-Based Tests from F* specifications

12 September 2022
Nomadic Labs

The focus of this blog post is on the automated extraction of an F* specification to OCaml and its execution as a QCheck Property-Based Test against an OCaml implementation. This work was done by Antonio Locascio during a 4 month internship at Nomadic Labs under the supervision of Germán Delbianco and Marco Stronati.

This work will also be presented at the ML 2022 workshop.


A typical workflow when developing high-assurance software is to start with a prototype to better understand the problem at hand, write a specification for the desired system (having learnt from all the mistakes in the process of implementing said prototype), write a reference implementation of the specification, and then formally verify that the implementation matches the specification, a process that is slow and costly.

However in a fast paced development cycle, the not-yet-verified implementation is often sent to production, continues to be improved over time and becomes a moving target for the verification effort, which often does not bear fruit. This is not a new problem. One of the main goals behind the development of novel formal verification tools has been to reduce the cost of verification, to allow it to be better integrated in the development cycle (see for example proof oriented programming).

In many cases it is possible today to extract an implementation that matches a specification, thus merging the last two steps of the workflow described above.

Going a step further, we would like to start sketching a specification while we develop the prototype and use this model to test the prototype itself. Thus moving from a linear development process to an iterative one, where implementation and specification are constantly refined and adapted. While abstracting away implementation details in the specification, the verification gap problem arises: given that the verified model is loosely coupled with the implementation, do the properties that we have proved for the model really hold for the implementation?

One way of addressing the verification gap problem is to rely on Property-Based Testing (PBT), that is, to extract the model as properties that can be executed as tests on the implementation. The focus of this work is on a lightweight automated tool for extracting F* specifications to OCaml, and its execution as a QCheck test against an OCaml implementation. We propose to OCaml programmers to write specifications as F* programs, instead of manually writing QCheck tests. By doing this, our tool will automatically extract the specification as QCheck properties, which can then be used to test either the code extracted from the model, or another implementation.

Thanks to this approach, rather than waiting for a finished implementation and working on a complete formal verification, we can move along a verification spectrum. Start with a prototype implementation tested by PBT on its co-developed abstract model, refine the model over time in order to extract parts of it and replace them in the implementation until (possibly) a full extraction can be achieved.

It should be noted that a full extraction is sometimes too costly, as it may require the model to precisely represent complex parts of the code that are not essential to the program’s core functionality. However, being able to find the sweet spot along this spectrum where only the crucial parts have been verified is still very valuable and common practice.

Even where a full extraction can be achieved, the complex toolchains and large trusted computing base can still benefit from a testing infrastructure. For example it is common practice to assume (and axiomatize) the existence of a List library that, during extraction, will be linked to the List module of the standard library, despite the fact that there is no formal connection between the two. For this reason, even if our properties have already been proved in the model, they should also be tested in the extracted implementation. In other words the verification gap may never be really closed in full, even at the end of the spectrum, and property based testing is a way to account for it in the development process.

The rest of the document is structured as follows: we first give a brief overview of the F* language and Property-Based Testing. Then, we give a detailed account of our proposed specification extraction tool for F*. After that, we present the case study for our workflow, Incremental Merkle Trees, which illustrates the wide spectrum of verification and testing strategies. Next, we describe our experience of working with F*, and explain our contributions to that project. Finally, we discuss related projects and some future work.


The workflow presented in this post, and the tooling developed, requires choosing a verification framework capable of extracting correct code to the target programming language, and a PBT library for the target language. In particular, given our OCaml codebase, we choose F* as a mechanization medium, and QCheck as a testing library. In the following, we give a brief overview of both systems.

An F* primer

F* is an ML-style functional programming language aimed at formal verification. F* combines automated verification through an SMT solver with the interactive functionality of a proof assistant based on dependent types, making it a hybrid verification tool. After the verification step, an F* program can be extracted to OCaml in order to be executed. Additionally F* supports meta-programming, which is mainly developed to support proving by tactics, but is a crucial feature for this work as it allows the extraction of specifications.

We’ll work through an example to introduce F*’s fundamental aspects. Let’s take a look at the following function, which reverses a list:

val reverse : list α -> Tot (list α)
let rec reverse l =
  match l with
  | []      -> []
  | x :: xs -> reverse xs @ [x]

Those familiar with ML-style languages (such as OCaml) will straight away feel at home with F*’s syntax. One difference with OCaml’s syntax is that in F* signatures are usually given before a definition. These are a central aspect of F*, because of its rich type system.

One key feature of the type system is the effect system. Notice that in the return type of the example, Tot is an effect annotation, expressing that the function is total, pure and non-diverging. The generic shape of a function type in F* is α -> E β, where E is an effect. Additional built-in effects are: Dv, for diverging functions, ST for stateful computations, and ML for functions with arbitrary side-effects. Tot is the default effect, meaning that its declaration can be omitted.

Refinement types are another important part of F*’s type system. The refinement type x:T{φ(x)} is a sub-type of T for which all its inhabitants satisfy the property φ, also called the refinement. In other words, type x:T{φ(x)} is the type of values x from T for which φ(x) is true. For simplicity, we can assume that the predicates φ are total functions returning a boolean, although they can be more general. For instance, we can use them to give a more precise signature to our example:

val reverse : l:list α -> Tot (l':(list α){length l' = length l})

Here, we use the refinement type to state that we expect the resulting list to have the same length as the original one. To refer to the original list, we needed to name the argument (l) in the signature, making this a dependent function type.

F* provides an alternative syntax for writing a function’s pre- and post-conditions explicitly. Let’s look at it with a new example:

val tail : l:list α -> Pure (list α)
                            (requires (length l > 0))
                            (ensures (fun l' -> length l' = length l - 1))

Pure is just a version of the Tot effect that, in addition to the type of the return value (list α), is parameterized by a pre-condition (requires clause) and a post-condition (ensures clause). Then, this signature is equivalent to:

val tail : l:(list α){length l > 0} ->
           Tot (l':(list α){length l' = length l - 1})

It’s not always desirable to pollute a function’s signature with properties about it. That’s when lemmas come into play. In F*, a lemma is a computationally irrelevant function (a total function returning unit), used to state some property about a program. For example, we could state (and subsequently prove) that our reverse function is involutive with the following lemma:

val reverse_involutive : l:list α -> Lemma (l = reverse (reverse l))

A more detailed introduction to the F* language can be found in its tutorial.

Property-Based Testing with QCheck

Property-based testing (or PBT for short) is a testing discipline in which properties about programs are checked to be true by validating them on a large number of randomly generated examples. Property-based testing a program generally requires the user to do two things: (i) to define the desired properties about the code, and (ii) to provide functions for generating random inputs for those properties (commonly known as generators).

To aid the definition of these properties and custom data type generators, several libraries have been developed. One of these is QCheck, which offers a wide range of combinators to do PBT in OCaml. In this project we present a tool that automatically extracts QCheck properties from F* specifications. The generator definition, however, is still left to the user, as explained in the following section.

Specification Extraction Toolchain

Let’s start with some motivation for using a specification extraction tool. Say you have an OCaml program, or even an idea for one, and you want to write a specification for it. We propose that, instead of directly writing QCheck tests for it, you write the specification as an F* program. If you do this, our tool will automatically extract the specification as QCheck properties, which can be used to test both the code extracted from the model and the original implementation.

It’s not required to do a full verification of the code in order to use our tool. As previously mentioned, one could specify an abstract model of the implementation, whose properties could be tested against the more complex OCaml implementation. Further, one might even write the specification as an axiomatization of the implementation, i.e assuming the properties and leaving their actual proofs as a future step.

In order to extract a specification written in F* as properties for property-based testing we introduce a Meta-F* tactic called extract_spec, which is defined in the SpecExtraction module.

This tactic does two things. First, it extracts a function’s pre- and post-conditions as OCaml boolean functions. Second, it synthesizes the QCheck boilerplate code for defining the test.


Before getting into the tool’s design, let’s take a look at a simple example of its usage. Its main functionality consists of extracting a function’s or lemma’s pre- and post-conditions as OCaml boolean functions. For instance, given the following F* function:

val foo : x:T{P1 x} -> Pure T' (requires (P2 x)) (ensures (fun y -> Q x y))
let foo x = ...

One can call the extract_spec tactic through a splice, which inserts syntax generated through Meta-F*. This is done by adding the following line (anywhere foo is in scope):

%splice[] (extract_spec (`%foo))

As a result, in the extracted OCaml code of the module where the splice was called, the following predicates will be defined:

let foo_pre = fun x -> P1 x && P2 x
let foo_post = fun x -> fun y -> Q x y

As shown in this example, a function’s pre-condition is parametrized by the function’s arguments, while its post-condition takes as an additional argument the function’s return value.

The second step, generating the QCheck tests, is performed by the make_test tactic (called by extract_spec). Continuing with the example, this tactic generates the following declarations:

let (gen_foo_args : T FStar_QCheck.arbitrary)
  = FStar_QCheck.undefined_gen "Generator for foo's input not yet implemented"

let (test_foo_spec : FStar_QCheck.test_t) =
  FStar_QCheck.test_make "foo_spec" gen_foo_args
    (fun a0 ->
      match a0 with
      | x -> (FStar_QCheck.assume_ (foo_pre x);
              foo_post x (foo x)))

The first of these corresponds to a template definition of the QCheck generator for foos arguments. Deriving the generators is outside the scope of this project. Some discussion regarding this can be found in this issue.

The second is the definition of the QCheck test, in which the pre-condition is assumed and the post-condition is checked. The FStar_QCheck module is just a simple interface wrapper to expose some QCheck functions in F*.

Tool’s design

The tool’s design revolves around Meta-F* tactics. Meta-F* is a metaprogramming framework for F*, i.e. a library for manipulating F* terms from within F*. A tactic, then, is simply a function that uses Meta-F*.

The bulk of the specification extraction work is performed by the extract_spec tactic, which takes as argument the name of the function the user wants to extract a specification from. We’ll give an overlook of the algorithm it implements:

  1. The first step is to query the environment to get the function’s type. This could either be annotated in a signature or inferred by the typechecker.
  2. The retrieved type is subject to some (mostly trivial) transformations. The most interesting preprocessing step is synonym resolution (for example turning nat into v:int{v >= 0}), which is crucial to capture properties hidden in refinements.
  3. Next, the type is traversed, one arrow at a time, collecting the names, types and refinements for each argument. For instance, for a type x1:T1{P1 x1} -> x2:T2{P2 x1 x2} -> C, collect_args will compute the list: [(x1;T1; fun x1 -> P1 x1); (x2;T2; fun x1 x2 -> P2 x1 x2)]. For each refinement we abstract all the previous binders. For now, we leave the final computation C untouched. It’s important to note that an argument’s type might have nested refinements, e.g. x:(v:int{v >= 0}){x < 5}. In this step, these refinements are flattened to get fun x -> x >= 0 && x < 5.
  4. All of the arguments’ refinements are then joined to get the function’s implicit pre-condition.
  5. Finally, the final computation type C is inspected. From it the post-condition (both from an ensures clause and the return type’s refinement) and potential explicit pre-condition (requires clause) are retrieved. When there is an explicit pre-condition, it is joined with the computed implicit pre-condition. There’s special treatment for lemmas, as in that case the post-condition doesn’t require an extra argument.

Most of this procedure is performed by the pre_post tactic defined in SpecExtraction.PrePost. The QCheck test boilerplate generation, defined in SpecExtraction.TestGeneration.make_test, is not discussed, as it’s mostly trivial syntax declaration.


Our specification extraction tool has two limitations worth mentioning.

The first one is that only bool refinements are supported. This is necessary, as Type predicates might not be computable, for they can include universal and existential quantifiers anywhere in the term.

The second is that only pure functions work with the specification extraction mechanism for now. Although some other effect monads might be easily supported, this is not true in general. Supporting the ever-so-present ST effect, that carries a model of the heap in the type, for instance, doesn’t seem a trivial matter.

Case study: Incremental Merkle Trees

We applied our proposed workflow of verification and testing to the implementation of the Incremental Merkle Tree data structure, used in the Sapling protocol. This was a suitable target for our project, as:

  • It’s a self-contained piece of code, whose only important dependencies are well-specified cryptographic primitives.
  • The data structure has clear invariants, the preservation of which is not trivial.
  • There are multiple implementations for it, of varying complexity, which help showcase the wide spectrum of verification and testing models.

We now proceed to give a brief introduction to Sapling and IMTs, before delving into a detailed account of our case study.


Sapling is a protocol used in Tezos that enables privacy-preserving transactions of tokens. For storage purposes, this protocol uses the Incremental Merkle Tree data structure, or IMT. This IMT structure is simply a fixed height Merkle tree, in which the leaves are only stored on the last level in the leftmost positions and cannot be deleted. Because of their use in Zero-Knowledge proofs, the IMTs must always be considered to be full trees of fixed capacity. In the next section, we provide a more detailed description of IMTs. For more documentation, refer to the protocol’s specification.

In Tezos, there are currently two implementations of Sapling. The first one, found in lib_sapling, implements IMTs through purely functional ADTs. In contrast, the second implementation, which is part of the economic protocol, is written in a stateful style to make use of the protocol’s key-value storage. While the latter is more efficient, the former is simpler and thus easier to verify.

Incremental Merkle Trees

An Incremental Merkle Tree of height h contains 2^h leaves and h + 1 levels of nodes, with all leaves at level 0 and root at level h.

For now, we focus on the IMT implementation found in the Storage module from lib_sapling. This implementation includes the following definition of an algebraic data type tree to represent IMTs:

type tree = Empty | Leaf of C.Commitment.t | Node of (H.t * tree * tree)

This type of trees have commitments in their leaves and hashes (H.t) in their nodes. Although both of these have the same internal representation, they are differentiated because the commitments correspond to the values stored in the tree, while nodes’ hashes are used to preserve integrity (as in any Merkle tree).

The trees are always treated as being full, using the default value H.uncommitted 0 for unused leaves. All the nodes at the same level of an empty tree have the same hash, which can be computed from the default value of the leaves. These hashes are stored in the H.uncommitted list, so H.uncommitted n is the hash of an empty tree of height n.

To avoid storing huge empty trees, any subtree filled with default values is represented by the Empty constructor and given its height it’s possible to compute its hash using the H.uncommitted list. Because of this, we’ll sometimes refer to this representation of IMTs as compressed trees.

The leaves are indexed by their position pos, ranging from 0 to (2^h) - 1. The tree is incremental in the sense that leaves cannot be modified but only added and exclusively in successive positions from the leftmost to the rightmost leaf — i.e., from 0 to (2^h) - 1.

IMT F* Model

Commitments axiomatization

Let’s begin our dive into the model by focusing on the Commitments module. This interface axiomatizes the cryptographic primitives needed for IMTs, namely commitments and hashes. Here is where most of the model’s assumptions are introduced, most notably the following lemma:

val perfect_merkle : h:v_height ->
  Lemma (ensures (forall h1 h2 h3 h4.
          merkle_hash h h1 h2 = merkle_hash h h3 h4
            ==> h1 = h3 /\ h2 = h4))

which states that the assumed Merkle hash function is injective at every valid level (h). This property, which can be read as “if two outputs are equal, then their inputs must be equal”, models the absence of collisions in an ideal hash function.

This module interface’s OCaml realization (ml/Commitments.ml) is just a wrapper for the corresponding functions in lib_sapling.

Machine integers

The second aspect of the model we’ll discuss is the use of machine integers. Our initial version of the model was based on F* unbounded ints, which are implemented as Zarith ints. In order to make the model as close to the implementation as possible, we refined the specification using machine integer types, following the ones used in the OCaml code. This not only means that we prove the absence of arithmetic over/underflow in our model, but also that the extracted code should have the same performance (w.r.t. arithmetic operations) as the original implementation.

In the OCaml implementation from lib_sapling, both Stdint and OCaml native ints are used. For the former, F* already provides [U]IntN modules in the standard library, that are extracted to their Stdint counterpart.

Things are a little trickier for OCaml’s native int type, whose representation is architecture dependent. For this model, we assume a 64 bit architecture, so we need to deal with 63 bit integers. To do so, we provide an interface module Int63 following the structure of those provided by the standard library. We also include an OCaml realization for it (ml/Int63.ml), that implements them using the native int type.

Refining the specification using these representations of machine integers proved to be much smoother than what we initially feared. There was a downside, however, from an extraction point of view, which will be discussed in a future section.

Main Tree model

Now we can delve into the main part of the model for IMTs, defined in the Tree module. There are three important components that build up this model, each implemented as a sub-module. The first of these is Tree.Data, which introduces the underlying data type that we use to represent IMTs.

type tree α β : Type =
  | Empty : tree α β
  | Leaf : α -> tree α β
  | Node : β -> l:tree α β -> r:tree α β -> tree α β

As implied by its name, the type tree models lib_saplings type of the same name. The only difference is that in our model, the types of the values of the leaves and internal nodes are parameterized. For our IMT spec, α will generally be instantiated with the type for commitments and β with the one for hashes.

This tree type, then, is a direct translation of its OCaml counterpart. Things only start getting interesting when the type is refined with richer properties. Hence, in Tree.Properties the core properties used to specify IMTs are presented, alongside some lemmas about them. Let’s take a look at the two properties that give IMTs their name.

val incremental : h:C.v_height -> t:tree α β -> Tot bool
let incremental h t =
    valid t && left_leaning t && balanced t && has_height h t

The first one is incrementality. This property essentially states that the leaves are filled in consecutive positions from left to right. Additionally, some other structural invariants are included in this property, such as the fact that no internal node should have two empty sub-trees. The next one, merkle, checks that the values in the internal nodes actually are the hashes of their sub-trees.

val merkle :
  h:C.v_height ->
  t:tree C.t_C C.t_H{valid t /\ has_height h t /\ balanced t} ->
  Tot bool
  (decreases %[v63 h])
let rec merkle h t =
  match t with
  | Empty -> true
  | Leaf _ -> true
  | Node ha l r ->
    let hs = pred63 h in
    ha = hash hs l r && merkle hs l && merkle hs r

With these two properties, we can define the following type:

type imt (h:C.v_height) : Type  =
   t:tree C.t_C C.t_H {incremental h t && merkle h t}

So, imt h is the type of incremental Merkle trees of height h. As we’ll see next, this type makes it quite simple and elegant to specify that a tree manipulating function preserves the IMT invariants directly on the function’s signature.

The third component of the core IMT model is Tree.Methods, where the functions that operate on IMTs are defined. We’ll focus on insertion, as it’s the only way to modify an IMT. The insert_list function, whose signature we show next, inserts a list of commitments into a tree. Its implementation is a model of insert, from lib_sapling.

val insert_list :
  vs:list C.t_C ->
  pos:uint32 ->
  h:C.v_height ->
  t:(imt h) ->
  Pure (imt h)
       (requires (length vs <= pow2 (v63 h) - UInt32.v pos
                  && UInt32.v pos = count_leaves t))
       (ensures (fun t' ->  to_list t' = to_list t @ vs))
       (decreases %[v63 h;0])

Let’s break down this signature. The first three arguments are quite simple: a list of commitments vs, the position pos of the tree in which those commitments will be inserted, and the height h of the tree (which is bounded by 32). Then, the function takes the initial tree t, which must be an IMT of height h. The return type first states that the function is Pure, meaning that it terminates and has no side-effects. After that, it describes that the function returns an IMT of height h. Finally, we have the requires and ensures clauses, which define additional pre- and post-conditions. For pre-conditions, the function requires that the list of commitments fits on the tree, and that pos is the next position to fill. As for post-conditions, we have that the list of elements of the resulting tree must be equal to appending vs to that of t. The decreases tag is just aiding the termination-checker.

It’s important to note that some pre- and post-conditions are implicit (those in the types of the arguments and return value), while others are explicit (requires/ensures clauses). In this example, the implicit properties are used to state the preservation of the IMT invariants, while the explicit clauses are reserved for the functional specification of insert_list.

Insertion identities

There were three insertion functions defined in the initial IMT specification:

  • insert_model: a naive implementation that checks if the left sub-tree is full to determine the path,
  • insert_pow: an optimization carrying the next available position,
  • insert_list: a further optimization for batch insertion. This is the one found in the OCaml implementation.

In the Tree.InsertionIdentities module the first two are defined, and all three are proven to be semantically equivalent.

Full Merkle Tree Translation

As explained in a previous section, our model uses a compressed representation of IMTs, following their implementation. An important verification target was to prove a translation from this representation to a standard Merkle tree model. To do so, we used the off-the-shelf Merkle tree model from F*, being able to adapt the Merkle properties proven in this model for our IMT type.

Certified OCaml implementation extraction

For the certified OCaml implementation, we need to extract the Tree.Data and Tree.Methods modules. The module ml/Tree.ml bundles these two extracted modules, and should replace the original OCaml implementation.

To perform the extraction, we just ask F* to extract these modules to OCaml. This will generate the OCaml files in _out/ containing the extracted code (alongside the extracted specification).

We can now take a brief look at some of the extracted OCaml code. First, the tree is extracted to the expected OCaml part.

type ('a, 'b) tree =
  | Empty
  | Leaf of 'a
  | Node of 'b * ('a, 'b) tree * ('a, 'b) tree

More interestingly, the refined imt h type becomes:

type 'h imt = (Commitments.t_C, Commitments.t_H) Tree_Data.tree

Although the refinements are erased as expected, we’re left with a phantom type parameter h, which would be instantiated to unit in the rest of the extracted code. However, as the imt type is tagged with inline_for_extraction, it’s replaced by its definition. As for the insert_list function, the resulting OCaml code is:

let rec (insert_list :
  Commitments.t_C list ->
    Tree_Data.uint32 ->
      Commitments.v_height ->
        (Commitments.t_C, Commitments.t_H) Tree_Data.tree ->
          (Commitments.t_C, Commitments.t_H) Tree_Data.tree)
  fun vs ->
    fun pos ->
      fun h ->
        fun t ->
          match (t, (Int63.eq h (0)), vs) with
          | (Tree_Data.Empty, true, v::[]) -> Tree_Data.Leaf v
          | (uu___, uu___1, []) -> t
          | (Tree_Data.Empty, uu___, uu___1) ->
              insert_node (Tree_Data.pred63 h) Tree_Data.Empty
                Tree_Data.Empty pos vs
          | (Tree_Data.Node (uu___, l, r), uu___1, uu___2) ->
              insert_node (Tree_Data.pred63 h) l r pos vs

Here we see that the final code looks quite similar to the original version, apart from some very clearly automatically generated names for the unused variables in patterns.

The extracted code has important shortcomings, some of which we managed to solve, as explained in a future section.

The Makefile target to run the extraction is:

make extract

Specification extraction and QCheck integration

In our concrete case, we put all the splices in a new module called Tree.PrePost.fst. This is not only helpful for organizing the extracted code, but also allows the user to verify the IMT model without running the specification extraction tactic (which can be slow).

In turn, running the splices will generate the OCaml file _out/Tree_PrePost.ml, containing the pre- and post-conditions and tests spliced with the extract_spec tactic.

We can take a look at the extracted property for insert_lists pre-condition:

let (insert_list_pre :
  Commitments.t_C list ->
    Stdint.Uint32.t ->
      Int63.t ->
        (Commitments.t_C, Commitments.t_H) Tree_Data.tree -> bool)
  fun vs ->
    fun pos ->
      fun h ->
        fun t ->
          (((Int63.gte h (0)) &&
              (Int63.lte h Commitments.max_height))
             ((Tree_Properties.incremental h t) &&
                (Tree_Properties.merkle h t)))
            (((Util.(Z.of_int << List.length) vs) <=
                ((Prims.pow2 (Tree_Data.v63 h)) - (Util.(Z.of_int << Stdint.Uint32.to_int) pos)))
               && ((Util.(Z.of_int << Stdint.Uint32.to_int) pos) = (Tree_Properties.count_leaves t)))

Although it may be somewhat hard to see at once, this is just the conjunction of the refinements in the function’s signature. Note that the << operator is just function composition.

Finally, we want to actually test the certified implementation against the extracted properties. To do so, we first complete the definition of the extracted QCheck generator templates, which can be found in the ml/generators.ml. Next, we copy the extracted QCheck tests in ml/test_internal_tree.ml. For instance, one of these tests is:

let (test_insert_model_spec : FStar_QCheck.test_t) =
  FStar_QCheck.test_make "insert_model_spec" gen_insert_model_args
    (fun a0 ->
       match a0 with
       | Prims.Mkdtuple2 (v, Prims.Mkdtuple2 (h, t)) ->
           (FStar_QCheck.assume_ (insert_model_pre v h t);
            insert_model_post v h t
              (Tree_InsertionIdentities.insert_model v h t)))

These splices are executed by running:

make extract

To compile the tests, just run:

make test

This will generate an executable called test.exe in the _out directory, which runs the tests.

Testing the Protocol Sapling Implementation

As mentioned earlier, there are two Sapling implementations in Tezos. Up to now, we’ve solely focused on the one found in lib_sapling, for its relative simplicity. The other implementation is part of the Tezos Protocol, and is written in a stateful manner to make use of the Context (Tezos’ key-value store). This makes this IMT implementation much harder to verify directly than the ADT-based version.

This is a perfect opportunity to recall the verification spectrum we proposed in the introduction. The IMT implementation from lib_sapling sat at one end of this spectrum: it was easy to verify it directly and it could be replaced by the extracted OCaml code. The PBTs extracted from the specification in this case help close a somewhat narrow verification gap (mostly assumptions about machine integers, lists, cryptographic primitive and F*’s extraction mechanism).

The protocol version, on the other hand, stands at the opposite end of the spectrum. Due to its pervasive use of the Context and Lwt (an OCaml library for promises and concurrent I/O with a monadic interface), doing a full verification and replacing the code with the extracted specification becomes unfeasible. Then, the verification effort in this case is limited to a model of the implementation, whose extracted code then becomes a reference implementation against which the complex version can be tested. In our particular case, the work of specifying and verifying an abstract model for IMTs is already done, as we can use the work done for lib_sapling to this end.

Following this idea, we decided to test the protocol Sapling implementation against the certified and tested extracted ADT implementation. This is implemented on the ml/test/proto-test.ml module. Several tests were considered, which are discussed in this issue.

The key to carrying this out was to define a projection function (project_tree) from the context tree representation to the IMT ADT. With this, we can describe the test we implemented with the following diagram:

Testing the Protocol sapling implementation.

This diagram is divided in two by the vertical dotted line that delimits the protocol IMTs to the ADT IMTs. For instance, t0 stands for a context in which an IMT is stored, following the representation defined in sapling_storage.ml.

Here, proj0 and proj1 are calls to the projection function, and insert' is the IMT insertion function from our ADT spec (insert_list).

The function that we want to test is insert, the protocol IMT insertion (actually called add). Our test consists of checking the commutativity of this diagram. This means, to check that, for any given t0, proj1 (insert t0) is equal to insert' (proj0 t0).

By performing this test we ensure that the complex protocol insertion function is equivalent to the one we know is well-behaved, modulo the projections.

Our experience of working with F*

In this section we’ll summarize our experience with the F* language, and outline our contributions to the F* project.

General evaluation of F* for verifying OCaml code

One aspect that makes F* suitable for verifying OCaml code is the similarity of their syntax and type system. However, F* is not a superset of OCaml, so the modelling of a piece of OCaml code requires some varying degree of manual work. This will depend on the source program, as the translation of certain OCaml constructs (such as functors) is not trivial. In our experience, this process was quite straightforward, but that might not be the case for code bases that make heavy use of first-class modules.

Another of F*’s key selling points is its hybrid nature. The language’s SMT support did in fact allow for many of our model’s properties to be proven with little to no work, while being flexible enough to spell out more complex proofs. This, however, came at a cost. Overreliance on the SMT backend often led to increasingly flaky (i.e. unstable) proofs. A small change in a seemingly unrelated lemma might cause a previously proven theorem to no longer be accepted. This issue is often exacerbated by uninformative error messages that require some deciphering, which might negate some of the time savings gained by proof automation. All this should not be read as taking merit away from F*’s hybrid approach, but rather as a remark on the importance of finding the right balance between explicitness and reliance on automation when writing a proof.

Contributions to F*

As previously mentioned, during the duration of this project we encountered some of F*’s rough edges, which are to be expected in a research language. After discussing them with some of the language’s designers we agreed to try to provide solutions for some of these shortcomings, under their guidance. We can split these contributions into two groups.

Extraction of machine integers

After going to the trouble of refining the specification with machine integer types, it came as a surprise that the extracted OCaml code using machine integers was quite inefficient.

This was mainly due to how machine integer constants were extracted. The code generated for a literal such as 42l was the following:

FStar.Int32.int_to_t (Prims.of_int 42)

where int_to_t is defined as:

let int_to_t x = Stdint.Int32.of_string (Z.to_string x)

This means that for every machine integer constant, the extracted OCaml code would:

  1. Convert it to a Zarith int,
  2. Convert that Zarith int into a string, and
  3. Parse that string to get the desired Stdint int.

This added a significant unnecessary overhead, considering that the same can be achieved by the code:

Stdint.Int32.of_int 42

This was precisely the solution we implemented in PR FStar/#2325, which has already been merged.

A particular case of this issue was the extraction of 0 and 1 constants, which followed a similar logic. These cases are special, because all the Stdint modules, which implement F*’s machine integers, expose zero and one constants. So the solution for this was much easier: to add these constants to the machine integer interface. This is implemented by PR FStar/#2306, which is also merged.

Another source of inefficiency is the Int.Cast module, that defines the conversion between machine integer types. The issue here is almost the same as in the extraction of literals. For casting a value of a machine integer type to another, the extracted code converts the value into a string, parses the string into a Zarith int, which in turn gets converted into a new string that finally gets parsed as a value of the desired type. Again, this means that casting has a significant overhead. This can be avoided by providing OCaml realization for this module, instead of extracting it. The realization can use Stdint casts directly, improving its performance. We implemented this on PR FStar/#2315, which hasn’t been merged yet.


The biggest contribution we made to F*, and the most crucial to our project, was adding support for mutually recursive let-bindings to the reflected syntax. Reflection allows inspecting, manipulating and creating F* terms from within F*, and was the tool we used for extracting specifications.

However, F*’s reflection AST didn’t support mutually recursive let-bindings, so our toolchain wouldn’t work for the function we cared the most about (insert_list).

After discussion with the F* team, and several iterations, we arrived to the solution in PR FStar/#2291, which has already been merged. This was a (though slight) breaking change, so we also had to fix some uses of the reflection interface in PR HACL-star/#471.

Finally, we encountered an issue when extracting polymorphic functions built through Meta-F*. In short, if a function’s type had more than one universally quantified type variable, then all except the first one would be instantiated as unit in the extracted code. We solved this issue in PR FStar/#2305, which is yet to be merged, but has already been reviewed. Although the fix is quite small, finding the source of this error took some time.

Unsolved issues with F*

Even though we managed to overcome the limitations of F* outlined in section [ref to section], there’s still one problem with the extracted OCaml code: its reliance on fstarlib (the OCaml implementation of F*’s standard library).

The issue lies in the fact that the extracted code uses fstarlib even when that could be avoided. For instance, primitive types (such as bool) are extracted as their F* standard library symbols (Prims.bool), instead of being extracted directly to their OCaml counterpart. Although both alternatives are semantically equivalent — Prims.bool is defined as a synonym to bool — this means that the extracted OCaml code will depend on fstarlib, even if the only part of the standard library that is used is the primitive types. Then, if one wanted to plug a module extracted form F* into an existing codebase, a number of possibly unnecessary dependencies needed by fstarlib might be added (e.g. batteries, yojson), greatly hindering the adoption of such a workflow.

There are possible, though unsatisfactory, workarounds to this issue. One of them is to manually tinker the extracted code to avoid the uses of the F* types, which is essentially what we did with the cleanup script. It goes without saying that doing this somewhat defeats the purpose of having a certified implementation, so one has to be very careful when doing so.

Before we go, we discuss projects that address similar problems through different approaches.

Extraction vs Lightweight Validation

Another well-established workflow is to directly extract correct implementations automatically from mechanized implementations. Projects like the Verified Software Toolchain narrow the verification gap by building a vertical stack of mechanized components, building on the foundations provided by the Compcert certified C compiler. Closer to F*, the KaRaMeL project allows the extraction of certified programs written in Low*, a subset of F*, to C.

Even when this approach has been successfully adopted in large industrial contexts, such as the verification of the seL4 micro-kernel, it would be hard to apply to a codebase like Tezos’, which is naturally designed to evolve. Instead, we propose to tackle the verification gap by leveraging the automated extraction of property-based tests from a formal specification in order to validate that the implementation complies with their formal model.


QuickChick is a randomized property-based testing plugin for the Coq Proof Assistant. Its central idea is foundational testing, which means that the testing code is formally verified to be testing the desired property. Additionally, QuickChick supports automatic derivation of generators for data satisfying a particular predicate. Our approach is more light-weight: implementing QuickChick’s features for F* would have required a very significant effort, and QCheck was already integrated into the Octez test suite.

There has also been some recent work leveraging QuickChick for testing OCaml code. Their workflow is the opposite of ours: they use coq-of-ocaml to translate OCaml definitions into Coq, and then rely on QuickChick to test the implementation. Moreover, this approach suffers from the limitation that coq-of-ocaml cannot fully translate all OCaml features into Gallina (Coq) without making arbitrary design decisions which carry semantic weight.


Monolith is a framework for testing OCaml libraries, that supports random testing and fuzzing. The user has to specify the library’s interface (types and operations) and provide a reference implementation. Then, Monolith runs sequences of operations trying to find unexpected behaviours.

Given that we propose extracting reference implementations from formally verified abstract models, it would be interesting to study if those could be integrated with Monolith. This way, the open issue of defining generators could be solved by making use of Monolith’s fuzzing and random testing features.

Extracting effectful specifications

Currently, our tool targets only pure F* programs with Boolean pre- and post-conditions. An interesting line of future work would be to extend it to support other F* monadic effects, and to leverage the specification extraction mechanism to test arbitrary monadic OCaml code implementing such effects. For example, in our setting this could be used for directly extracting a specification for the more complex stateful implementation of Incremental Merkle Trees.