Introducing new Octez node logs for better UX

TL;DR: Future versions of the Octez node will only output essential information, while a more detailed log is written to disk in the background

Core developers and users of the Octez software suite know it well: the Octez node is pretty verbose.

For every block, the node yells out a whole series of steps taken:

Jan  6 14:33:10.879 - validator.block: prechecked block BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E
Jan  6 14:33:10.900 - validator.block: block BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E validated
Jan  6 14:33:10.900 - validator.block:   Request pushed on 2023-01-06T13:33:10.853-00:00, treated in 153us, completed in 45.531ms
Jan  6 14:33:10.920 - prevalidator: switching to new head BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E
Jan  6 14:33:10.920 - prevalidator:   Request pushed on 2023-01-06T13:33:10.900-00:00, treated in 308us, completed in 19.333ms
Jan  6 14:33:10.920 - validator.chain: Update current head to BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E
Jan  6 14:33:10.920 - validator.chain:   (level 462824, timestamp 2023-01-06T13:33:10-00:00, fitness
Jan  6 14:33:10.920 - validator.chain:   02::00070fe8::::ffffffff::00000000), same branch

To improve the user experience, we’ve decided to make a significant change to these logs. In the next major version of Octez (17.0), the node will output ONE SINGLE LINE for each new block, plus for injected operations and baked blocks (if running a baker). But don’t worry, all the nitty gritty technical information remains available.

Read on to learn what we did and why.

What a node log should do

Logs are a way to communicate information on the state of some software. In the context of the Octez node, the goal is to keep the user aware of the interaction between local Octez software and the Tezos blockchain.

Some examples of what could be going on:

The chain is running, node is synced
The node is stuck or bootstrapping
New blocks are being discovered
Connecting or disconnecting to/from peers
Operations are being received
Various errors
…

When everything goes well, a new head appears regularly with every new block. Or the node reports receiving new operations. During these peaceful times there is really no need to be verbose.

Still, at every new block, the node is yelling the eight lines in the introduction of this post. These lines contain what to most users must seem like dark magic incantations, like prevalidator and Request pushed or timing in micro-seconds.

Is that interesting for a user? We don’t believe it is. The only pertinent information for a user is :

Hey, it's going well, the level is now 462824 and the most recent head 
has the hash BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E.

Wait, I want the juicy details!

On the other hand, when issues arise, we want users to be able to communicate errors to relevant people, such as the developer team and incident managers.

In fact, it’s not only the errors we need, but also the state of the node during the last few hours or days. This information can sometimes be even more important than the error itself.

To make the logs less of a pain to read for a user, we need to remove information. But if we want logs to stay useful enough for a developer to fix issues after an incident, we need to keep this information somewhere, and even add more!

So, we’ve split logs into two channels:

User logs: Simple and clean, except for exceptional behaviors or errors. It is shown as the standard output.
Internal logs: Verbose, but silently written and stored to disk for a limited number of days to avoid excessive use of disk space.

For internal logs, we added a new node configuration option (as specified in the documentation), daily-logs, to create a log file every day, which is kept during a user-defined number of days.

Users can find these log files at the path <node-data-dir>/daily-logs/. By default, the daily log files will be kept for seven days, in which case the directory shouldn’t exceed 500MB.

Stripping down the node output

Now that the important debugging logs are stored elsewhere, we can safely remove information from the user logs. At each block, we now have only one line:

Jan  6 14:33:10.879: head is now BLkhtA81uak1SxxCDCVMXEnXCKsJcFLajVT84ZjrFKbxAaJUL4E (462824)

In short, we

Removed the part of the log (like prevalidator, validator.chain) that references a location in the code
Removed a few redundant messages saying the same thing: a new block has been processed and added to the chain
Split some messages containing too much information, like timestamp or fitness that can be found elsewhere anyway.

Most of this work is expected to appear in Octez suite v17.

What’s next?

The next step is mostly a refactoring of the node logging engine, and hopefully removing a deprecated library, lwt_log. This work will allow us to have colors in the logs to emphasize specific information, like when your baker baked a block, or when an error is shown.