Improving OCaml Dependency Management in Octez

TL;DR: To mitigate the risk of errors and supply-chain attacks, we make sure that Octez is always compiled with the same OCaml dependencies. Recently we have streamlined the process of managing dependencies by using ‘opam lock’.

The majority of Octez is written in the OCaml programming language, and makes heavy use of third-party OCaml libraries. As Octez is used to run a network handling assets of considerable value, it is critical that those third-party libraries can be trusted.

Imagine the following scenario:

Fulbert, an Octez contributor, wants to use the (hypothetical) third-party library “FasterCrypto” in Octez. He checks that the library is safe, and even finds papers claiming that FasterCrypto has been formally verified. Reassured, Fulbert brings FasterCrypto into Octez.

The day after, Gerberge, a respected baker, updates Octez to the latest version. To her horror, she realizes that her private keys have been wiped from the hard drive. Fortunately she has backups – but what went wrong? Gerberge reaches out to Fulbert and they investigate together.

They check that they are using the same version of FasterCrypto. They are. Maybe different versions of other Octez dependencies somehow interact differently with FasterCrypto? Gerberge installs the same versions that Fulbert is using, but the problem persists.

Desperate to find the reason, they begin comparing their respective source codes, not only of Octez but of all third-party libraries. They find one difference… in one of FasterCrypto’s files. Despite Gerberge and Fulbert using the same version of FasterCrypto, their copies differed. What happened?

Managing the supply chain

A benign scenario is that the authors of FasterCrypto had discovered the bug and fixed the library without updating the version number. It’s unlikely, but technically possible.

A much worse scenario is that Gerberge and Fulbert have experienced a supply-chain attack, in which an attacker manages to introduce malicious code via a third-party library used by the software they’re targeting.

One notable example of a supply-chain attack is the recent Ledger Connect Kit exploit, where malicious code was introduced into a JavaScript library used for connecting Ledger hardware devices to third-party dApps. It was the result of a former Ledger employee falling victim to a phishing attack, which allowed a bad actor to upload a malicious file to Ledger’s package manager for JavaScript code shared across apps.

In practice, such scenarios are very unlikely for Octez. Its build system is designed to mitigate the risk of them happening by ensuring that everyone uses the exact same source code for Octez and its dependencies. In this blog post, we explain our process for maintaining this guarantee, and how we have recently simplified it using opam lock.

Opam is not enough

To fetch Octez dependencies, Octez developers and users who build from source use opam, an OCaml Package Manager. When opam installs a dependency, it downloads the source code of the dependency (its “tarball”), checks the integrity of the data, and only then installs the dependency. So opam guarantees that what has been installed is what opam expects. Problem solved, just use opam? Not quite.

What opam expects is that the tarball it downloads has a given hash. Each opam package is specified by an opam file which contains, in particular, a URL where to find the tarball, and the hash of this tarball. The opam files themselves are stored in the public opam repository.

When Gerberge installs FasterCrypto, she has to trust that no one changed the hashes in the opam file of FasterCrypto – neither the authors of FasterCrypto, nor the maintainers of the opam repository, nor GitHub, nor any attacker who could have gotten hold of the opam repository keys. She also has to trust that those hashes are cryptographically secure, i.e. that one will not be able to substitute the tarball with a different one that happens to have the same hash (for some time, many opam packages only had an MD5 sum). This is a lot of trust that Gerberge has to put in a lot of third-party people.
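
The integrity check discussed above can be pictured with a short Python sketch. It only illustrates the idea and is not opam's actual implementation: the names `verify_tarball` and `STRONG_ALGORITHMS` are made up for this example, and the choice to reject anything other than SHA256 and SHA512 is an assumption of the sketch.

```python
import hashlib

# Algorithms this sketch accepts (an assumption made for illustration).
STRONG_ALGORITHMS = {"sha256", "sha512"}


def verify_tarball(data: bytes, checksum: str) -> bool:
    """Check `data` against a checksum written as "<algorithm>=<hex digest>".

    Raises ValueError when the algorithm is not in STRONG_ALGORITHMS,
    for example when only an MD5 sum is available.
    """
    algorithm, _, expected = checksum.partition("=")
    if algorithm not in STRONG_ALGORITHMS:
        raise ValueError(f"refusing weak or unknown hash algorithm: {algorithm!r}")
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == expected.lower()
```

The check is only as trustworthy as the expected checksum itself: if someone can change the hash recorded in the opam file, a tampered tarball passes just as easily as the genuine one.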

The dedicated ‘Tezos Opam Repository’ is a costly solution

To minimize the need to trust external parties, Octez developers until recently maintained their own copy of the public opam repository, the so-called Tezos Opam Repository. They made sure that:

this repository only contained Octez dependencies, and exactly one version per package;
hashes were cryptographically secure (both SHA256 and SHA512 were provided);
building Octez from source would use this repository instead of the public opam repository.

With this approach, Gerberge only had to trust Octez developers, and could in theory more easily check that the Tezos Opam Repository had not been tampered with, since it was updated much less often than the public one.

However, maintaining a clone of the public opam repository, and ensuring that building Octez from source uses this clone, takes a lot of effort. It requires using opam commands that are not your everyday opam update, opam upgrade and opam install.

This meant that only a handful of Octez developers were capable of maintaining the Tezos Opam Repository. Even with scripts created to ease repository management, the process was inflexible and it was difficult to get critical and urgent releases out quickly.

Fortunately, we were recently able to begin using the opam lock feature to make this process simpler and more flexible.

Choosing the right tool for the job

Opam has two features that are related to what we are trying to achieve:

opam switch export --freeze can save the set of packages that are currently installed, as well as the corresponding opam files, into a single export file.
opam lock can take a package (such as Octez) and save the set of dependencies that are used to build this package into a lock file. Only package names and version numbers are saved.

Both were introduced in opam 2.1.0, although opam lock was available as a plugin before. We ended up using opam lock and not opam switch export, but comparing the two is interesting.

Files obtained with opam switch export can be imported with opam switch import. Importing a switch like this makes sure that the state is exactly the same as the one that was exported with opam switch export --freeze. This is exactly what we want. In fact, it looks like it does exactly what Octez developers were doing with the Tezos Opam Repository: store the exact opam files of our dependencies in a single location, and ensure that only those opam files are used to build those dependencies. The main difference is that this location is a file instead of an opam repository.

However, a solution based on opam switch export would suffer from some drawbacks:

the export file is quite large (more than 10000 lines for a typical Octez developer) and changes to this file would be impractical to review;
one can expect the import process to erase all the current state, meaning that if a developer installed some developer tools that are only used by them, they would have to be reinstalled after each import.

The benefits of ‘opam lock’ (and a drawback)

Opam lock, on the other hand, produces files that are much smaller: they contain one line per dependency, with the dependency name and the version number. Lock files are actually regular opam files that happen to list all dependencies recursively with exact version constraints. This makes them easy to understand and changes are easy to review. One could easily edit them manually, although in practice one should update them using opam lock to ensure that their contents can actually be installed.
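
To give a feel for why such files are easy to review, here is a small Python sketch that reads the exact version constraints out of two lock files and lists what changed. It is not part of the Octez tooling: the helper names, the regular expression, and the package names and versions in the two sample snippets are all hypothetical, and the parsing is deliberately naive.

```python
import re

# Matches entries of the form: "package-name" {= "version" ...}
PINNED = re.compile(r'"([^"]+)"\s*\{\s*=\s*"([^"]+)"')

# Hypothetical excerpts of a lock file before and after an update.
OLD_LOCK = '''
depends: [
  "fastercrypto" {= "1.0.0"}
  "ocaml" {= "4.14.1"}
]
'''
NEW_LOCK = '''
depends: [
  "fastercrypto" {= "1.0.1"}
  "newlib" {= "0.3"}
  "ocaml" {= "4.14.1"}
]
'''


def pinned_versions(lock_text: str) -> dict:
    """Map each package name to the exact version it is pinned to."""
    return dict(PINNED.findall(lock_text))


def diff_locks(old_text: str, new_text: str) -> list:
    """Describe every package whose pinned version differs between two lock files."""
    before, after = pinned_versions(old_text), pinned_versions(new_text)
    changes = []
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name, "absent"), after.get(name, "absent")
        if old != new:
            changes.append(f"{name}: {old} -> {new}")
    return changes
```

Calling `diff_locks(OLD_LOCK, NEW_LOCK)` reports one line per changed dependency, which is roughly what a reviewer sees when reading the textual diff of a lock file.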

Since lock files are regular opam files, they can be installed with opam install like any opam package. In our case, we use opam install --deps-only so that the lock file itself is not installed, only its contents (the list of dependencies). Because those dependencies are installed in addition to the current state, this does not usually remove any existing package. Developers can install their preferred development tools, and then install the lock file to obtain both at the same time.

But opam lock, contrary to opam switch export --freeze, does not guarantee that the source code we compile is the same for everyone. It merely guarantees that version numbers are the same. So… back to square one? Not necessarily.

Specifying hashes

To ensure that the lock file installs exactly the same files for everyone, we tell opam to use a specific commit hash from the public opam repository.

More concretely, we use opam repository remove default to tell opam not to use the default repository (which is the public opam repository, without any constraint on the commit hash). Then we add the public opam repository back with opam repository add, setting the URL to https://github.com/ocaml/opam-repository.git#HASH where HASH is a commit hash for a chosen release which has the dependencies (and versions) we want to use.
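
The two steps above can be sketched in Python as a helper that builds the corresponding opam invocations. This is an illustration under stated assumptions, not the actual Octez script: the function names and the repository name `pinned-opam-repository` are hypothetical, and the real script may pass additional options to opam.

```python
import string
import subprocess

OPAM_REPOSITORY_URL = "https://github.com/ocaml/opam-repository.git"


def pin_repository_commands(commit: str, name: str = "pinned-opam-repository") -> list:
    """Build the opam commands that swap the default repository for a pinned commit.

    `name` is a hypothetical label for the re-added repository.
    """
    if not commit or any(c not in string.hexdigits for c in commit):
        raise ValueError(f"not a commit hash: {commit!r}")
    return [
        # Stop using the unpinned public repository.
        ["opam", "repository", "remove", "default"],
        # Add it back, restricted to the chosen commit.
        ["opam", "repository", "add", name, f"{OPAM_REPOSITORY_URL}#{commit}"],
    ]


def run(commands: list, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

With the default `dry_run=True`, `run(pin_repository_commands(commit))` only prints the commands, so the sketch can be tried without touching an existing opam setup.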

With opam set up to only use the opam files from this particular commit, we only need to verify that the packages referred to by the lock file have cryptographically strong hashes.
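
A check of this kind could look like the following Python sketch. It is not the check used for Octez. It assumes that checksums appear in an opam file as quoted strings such as "sha256=<hex digest>", searches the whole text instead of parsing the file properly, and treats only SHA256 and SHA512 as strong. The function names are hypothetical.

```python
import re

# Expected hex digest length for each algorithm this sketch treats as strong.
STRONG_DIGEST_LENGTHS = {"sha256": 64, "sha512": 128}

# Quoted checksum strings such as "sha256=ab12..." or "md5=ab12...".
CHECKSUM = re.compile(r'"(md5|sha256|sha512)=([0-9a-fA-F]+)"')


def strong_checksums(opam_text: str) -> list:
    """Return the well-formed SHA256/SHA512 checksums found in an opam file."""
    return [
        f"{algorithm}={digest}"
        for algorithm, digest in CHECKSUM.findall(opam_text)
        if STRONG_DIGEST_LENGTHS.get(algorithm) == len(digest)
    ]


def has_strong_hash(opam_text: str) -> bool:
    """True if the opam file carries at least one cryptographically strong hash."""
    return bool(strong_checksums(opam_text))
```

A package whose opam file only carries an MD5 sum would fail `has_strong_hash` and could be flagged before the lock file is accepted.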

The benefit of all this is that Octez developers define the set of files that can be used to compile Octez – not just by version numbers, but by hashes. In other words, Gerberge only has to trust Fulbert.

A more streamlined setup

Using ‘opam lock’ with the public repository is considerably simpler than maintaining a separate opam repository for Octez’ OCaml dependencies. It provides much more flexibility in choosing which dependencies to upgrade, which is of particular importance when handling urgent security updates.

In short, Octez developers specify which commit hash from the public opam repository to use and update the lock file to reflect this. The Octez installation script (make build-deps) then automatically replaces the default opam repository and installs dependencies from the chosen commit for Octez’ end users (and the Continuous Integration system).

The lock file is updated with a custom script which simplifies the process and implements certain checks. The script is simpler than the previous ones, which makes it easier to understand and maintain for future Octez developers.

We could probably simplify our workflow further and get rid of all of our scripts entirely if opam could guarantee that installing a lock file results in the same source files being installed as the ones that were used to generate the lock file. As our approach shows, one possible solution would be to store the commit hash of the public opam repository in lock files.

Still, our updated solution is a big step forward in making management of Octez’ OCaml dependencies more streamlined – to the benefit of both Octez’ developers and end users.