Lighthouse Update #24

v0.11.1 Testnets

Since the last update we've been working on testing our implementation of the latest spec, v0.11.1. In the last week we've raised two testnets:

  • unity-4k: a testnet with 4,096 validators for quick and easy testing.
  • unity-alpha: a testnet with 16,364 validators serving as a test-run for our next publicised testnet.

Both testnets are running on AWS across four regions with 16 instances per testnet. The testnets are not yet 100% aligned with v0.11.1, with some networking message validation still undergoing unit testing. These testnets are publicly accessible and have been used by other client teams and researchers; however, we haven't publicised them so that we remain free to restart them at any time without disappointing users.

We expect to be fully aligned with v0.11.1 this week and to launch a public, user-facing testnet next week (pending successful stress testing).

Testnet Automation

The two aforementioned testnets were both launched using our new Ansible automation setup. This setup fully automates testnet creation, including:

  • Multi-region AWS infrastructure deployment
  • Deposit contract deployment
  • Installing Lighthouse, Geth, etc.
  • Provisioning boot node networking keys
  • Provisioning ETH to each validator node, which then submits its own staking deposits
  • Defining and distributing a testnet specification
  • Starting the testnet and waiting for genesis

It is designed to easily support multiple testnets with different validator counts, infrastructure layouts and specification constants, so we're looking forward to running some testnets with challenging characteristics, such as short slot times.

This automation setup is fully open source and available here: sigp/lighthouse-ansible.

Additional achievements

Stable Futures

We've made progress on switching to Rust's latest async programming feature, known as stable futures, which gives us access to the latest versions of the Rust async runtime, tokio.

This latest version of tokio has a new "blocking" thread-pool that stops long-running tasks (block processing) from blocking time-sensitive tasks (network handshakes). We've experienced these issues during stress testing and are looking forward to alleviating them.
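
To illustrate the idea of keeping long-running work away from time-sensitive work, here is a minimal sketch using only the Rust standard library. It is not tokio's API and not Lighthouse's code: in tokio this role is played by the blocking thread pool (for example via `tokio::task::spawn_blocking`), whereas this sketch simply hands the heavy job to a dedicated thread. All function names (`process_block`, `handle_handshake`, and so on) are hypothetical stand-ins.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a long-running, CPU-heavy job such as block processing.
fn process_block(slot: u64) -> u64 {
    (0..1_000_000u64).fold(slot, |acc, x| acc.wrapping_mul(31).wrapping_add(x))
}

/// Hypothetical stand-in for a time-sensitive task such as a network handshake.
fn handle_handshake(peer_id: u32) -> String {
    format!("handshake with peer {} complete", peer_id)
}

/// Runs `process_block` on a dedicated thread and returns a receiver for its result,
/// so the calling thread is free to keep serving other work in the meantime.
fn spawn_block_processing(slot: u64) -> mpsc::Receiver<u64> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver has been dropped, nobody wants the result, so ignore the error.
        let _ = tx.send(process_block(slot));
    });
    rx
}

/// Starts block processing in the background, serves all handshakes on the calling
/// thread, and only then waits for the block-processing result.
fn handshakes_while_processing(slot: u64, peers: &[u32]) -> (Vec<String>, u64) {
    let pending = spawn_block_processing(slot);
    let handshakes: Vec<String> = peers.iter().map(|p| handle_handshake(*p)).collect();
    let result = pending.recv().expect("worker thread exited without sending a result");
    (handshakes, result)
}
```

Calling `handshakes_while_processing(7, &[1, 2])` serves both handshakes without waiting for the block to finish processing. A real async runtime adds scheduling and a bounded pool on top of this basic separation.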

Discv5

Our implementation of discovery v5 has been maintained on our fork of rust-libp2p. Other users are beginning to make use of it, so we plan to move it into its own crate. Notably, @jrhea identified some bugs in our implementation, which have been fixed.

Over the next few weeks we will undergo this transition and start proper versioning of the protocol for public use.

API Improvements

Clients are diverging on their high-level user APIs, so client teams have come together to try to unify the API standards. We have been slowly adding to our external API and will conform to the new standards once they are finalized. Of particular note, we have (finally) added API access to our syncing logic via /node/syncing, as well as some advanced information about our networking stack, for example /lighthouse/peers.

In the coming weeks, we will be growing our API rapidly to meet the standards and improving our documentation to make Lighthouse more end-user friendly.

Safer arithmetic

During fuzzing we noticed that the spec doesn't clearly define how to handle an overflow in integer arithmetic. We opened issue #1701 on the spec repo to deal with this, which has had a good reception.

Whilst there should be no way to trigger an overflow when processing all blocks from a trusted genesis state, more complicated sync strategies such as Eth1 warp-sync may mean that states are no longer trusted and may be malicious.

To address this, Michael has implemented safe arithmetic throughout the state transition logic and we have started fuzzing our implementation with arbitrary states.

During the process, Michael also made a PR to clippy so that we can do linting for unchecked arithmetic.
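
As an illustration of what safe arithmetic can look like in Rust, here is a minimal sketch built on the standard library's `checked_add`, `checked_mul` and `checked_div`. It is not Lighthouse's actual implementation: the `ArithError` type and the `total_balance` and `scaled_reward` helpers are hypothetical names chosen for this example.

```rust
/// Hypothetical error type returned instead of silently wrapping or panicking.
#[derive(Debug, PartialEq)]
enum ArithError {
    Overflow,
    DivisionByZero,
}

/// Sums a list of balances, returning an error if the total does not fit in a u64.
fn total_balance(balances: &[u64]) -> Result<u64, ArithError> {
    balances.iter().try_fold(0u64, |acc, b| {
        acc.checked_add(*b).ok_or(ArithError::Overflow)
    })
}

/// Computes `balance * factor / divisor`, reporting overflow and division by zero
/// as errors rather than trusting the inputs.
fn scaled_reward(balance: u64, factor: u64, divisor: u64) -> Result<u64, ArithError> {
    balance
        .checked_mul(factor)
        .ok_or(ArithError::Overflow)?
        .checked_div(divisor)
        .ok_or(ArithError::DivisionByZero)
}
```

With helpers like these, a state containing absurd values (for example a balance of `u64::MAX`) produces an `Err` that the caller can reject, rather than a wrapped value or a panic.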

Multiclient Interoperability

Over the past few weeks we have been doing interoperability tests with other clients in order to achieve our goal of a multi-client testnet.

Teku has performed various tests checking interoperability with our RPC methods. It recently became compatible with our Noise libp2p protocol and has demonstrated successful connections with noise handshakes.

Prysm recently launched their topaz testnet, which we initially had difficulties connecting to. We've tracked the connection issues down to Prysm nodes reaching max peers, often preventing Lighthouse nodes from joining. With this aside, we have managed to connect to the topaz network. Prysm nodes currently only support the secio protocol, so a successful connection has demonstrated Lighthouse's ability to attempt to negotiate noise, then fall back to secio.
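
The fallback behaviour described above can be sketched as picking the first protocol in our preference order that the remote also supports. This is a simplified illustration, not the real libp2p negotiation (which proposes protocols to the peer one at a time), and the `negotiate` function is a hypothetical name. The protocol strings are included for illustration only.

```rust
// Illustrative protocol identifiers for the two encryption transports.
const NOISE: &str = "/noise";
const SECIO: &str = "/secio/1.0.0";

/// Returns the first protocol from `preferred` (in order) that also appears in
/// `remote_supported`, or `None` if the two sides share no protocol.
fn negotiate<'a>(preferred: &[&'a str], remote_supported: &[&str]) -> Option<&'a str> {
    preferred
        .iter()
        .copied()
        .find(|p| remote_supported.iter().any(|r| *r == *p))
}
```

For a peer that only supports secio, `negotiate(&[NOISE, SECIO], &[SECIO])` yields `Some(SECIO)`. For a peer that supports both, noise is chosen because it comes first in the preference list.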

Once connected, the initial RPC methods seemed to be compatible and we began syncing the topaz testnet, yay! However, we found a state-root mismatch on the first block of the second epoch. This was traced to a critical bug in the rewards calculations in Prysm's nodes. Therefore, further multi-client interoperability tests with Prysm will be conducted on a new Prysm testnet, alongside tests with Prysm nodes joining one of our v0.11 testnets.