Lighthouse Update #18

A collection of Lighthouse updates from the drive towards our public testnet

Summary

We have been delaying this update in the hope that it would primarily detail the instructions for joining our public testnet. However, as this update will show, there have been a number of improvements, bug fixes and upgrades that we feel are necessary before releasing our public testnet.

The list of major achievements since the last update:

  • Eth1 integration
  • Discovery v5 standardisation and interoperability with other implementations
  • Full network upgrade (improved stream management and syncing) merged into master
  • IP limiting in the DHT as a mitigation for Sybil attacks
  • v0.9.1 specification update
  • Validator slashing protection
  • Improved syncing stability
  • Initial work on the Noise handshake for rust-libp2p
  • Improved network testing (RPC stream management tests, artificial chaos monkey, etc)
  • Freezer database (improvements in speed and storage)
  • Validator on-boarding and deposit contract input generation for public testnets
  • Bug fixes in ENR library from fuzzing
  • AWS infrastructure for public and private testnets
  • Private testnet using Goerli testnet

In other news, we have selected Trail of Bits as the security auditor for Lighthouse. This audit is planned to begin in January 2020.

Public Testnet

There has been a fair amount of interest in when the first Lighthouse public testnet will be available. In short, the answer is very soon (likely next week). For the interested reader, we have a tracking milestone for issues still to be completed: Public Testnet Milestone

Over the last month, we have been working on building out the features necessary for a long-lasting public testnet. These include: Eth1 integration (connecting to an Eth1 node to vote on blocks and read the deposit contract), the network update (the shift to the new stream-based specification and the associated syncing re-write), discovery v5 interoperability, the v0.9.1 specification update, a new database storage strategy, AWS infrastructure and validator on-boarding (allowing public users to generate validator keys, interact with the deposit contract and join the network). All of these features have now been completed independently.

If these are complete, why has a public testnet still not been released?

Along with the completion of these features, we have been ramping up our internal testing efforts. As the system becomes more intricate, we are discovering more bugs in unforeseen edge cases. Our plan is to iron these out before releasing the public testnet to ensure a smooth user experience for the community.

On this note, we are trying to isolate and find as many edge cases as we can, and as such intend to release the public testnet with Lighthouse configured to maximise bug discovery. For example, session timeouts in discovery v5 are set to 30 seconds (when they should be on the order of days or weeks). Settings such as this should allow us to catch issues with a minimal number of nodes and in far less time than real-world values would allow.
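
As a rough sketch of what such a configuration tweak looks like, consider the hypothetical settings below; the struct and field names are illustrative and do not mirror Lighthouse's actual configuration.

```rust
use std::time::Duration;

/// Hypothetical discovery settings for a bug-hunting testnet.
/// Field names are illustrative, not Lighthouse's real configuration.
struct DiscoveryConfig {
    /// How long an established discv5 session remains valid.
    session_timeout: Duration,
    /// How long a pending request waits for a response before failing.
    request_timeout: Duration,
}

impl DiscoveryConfig {
    /// Aggressive settings intended to force session re-establishment far
    /// more often than real-world values (days or weeks) would.
    fn bug_hunting() -> Self {
        DiscoveryConfig {
            session_timeout: Duration::from_secs(30),
            // Illustrative value only.
            request_timeout: Duration::from_secs(10),
        }
    }
}
```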

We hope any users participating in our testnet will log and report back any discovered issues to aid in this effort.

The final elements left before our testnet are:

  • Write up clear documentation on how to use and interact with the testnet and become a validator
  • Polish and test our new database implementation
  • Implement a load-balanced syncing strategy
  • Perform some final large scale tests

We hope to complete these last few items in about a week.

Freezer DB

In our earlier private testnets we have been storing the blocks and the associated state at each slot in a local database. With a 6-second slot time, this occupies 32GB in about 5 days, which is clearly not satisfactory for a public testnet.

We have implemented a new approach which drastically reduces the storage cost of the local database. The technique removes the duplication of state in the database and adds periodic state snapshots. From any given state snapshot, a new state can be constructed by replaying the blocks stored in the database.
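
As a rough sketch of the idea (the types and store interface below are simplified stand-ins, not Lighthouse's actual database code), reconstructing a state means loading the nearest prior snapshot and replaying the stored blocks on top of it:

```rust
/// Illustrative stand-ins; Lighthouse's real store types differ.
struct BeaconState { slot: u64 }
struct SignedBlock { slot: u64 }

trait FreezerStore {
    /// The most recent stored snapshot at or before `slot`.
    fn snapshot_at_or_before(&self, slot: u64) -> Option<BeaconState>;
    /// Stored blocks, in slot order, for slots in the range (from, to].
    fn blocks_between(&self, from: u64, to: u64) -> Vec<SignedBlock>;
}

/// Placeholder for the beacon chain state transition function.
fn apply_block(mut state: BeaconState, block: &SignedBlock) -> BeaconState {
    state.slot = block.slot;
    state
}

/// Rebuild the state at `slot` by replaying blocks on top of the nearest snapshot.
fn load_state(store: &impl FreezerStore, slot: u64) -> Option<BeaconState> {
    let mut state = store.snapshot_at_or_before(slot)?;
    for block in store.blocks_between(state.slot, slot) {
        state = apply_block(state, &block);
    }
    Some(state)
}
```

The snapshot interval then becomes a trade-off between storage and state-reconstruction time.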

Further testing and performance benchmarking are required to provide exact improvement metrics; these will be given in the next update.

Network Updates

This PR has been merged into the master branch and brings all the latest network updates. This primarily includes the transition to a stream-based RPC along with various interoperability updates and bug fixes.

Alongside this, we have implemented further testing tooling and infrastructure. We've added comprehensive tests to ensure the RPC and libp2p protocols behave as expected, and we've added the ability for Lighthouse to randomly propagate messages on the network to simulate packet loss, which has allowed us to identify (and correct) some issues with our syncing algorithm.
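
The core of this "chaos" behaviour is simply a probabilistic gate in front of message propagation. The sketch below is illustrative only; the names and configuration are hypothetical rather than Lighthouse's actual implementation.

```rust
use rand::Rng;

/// Hypothetical chaos setting: the fraction of gossip messages actually
/// forwarded to peers. A value of 1.0 disables the behaviour.
struct ChaosConfig {
    propagation_probability: f64,
}

/// Decide whether to forward a gossip message, randomly dropping some
/// messages to simulate packet loss on the test network.
fn should_propagate(cfg: &ChaosConfig, rng: &mut impl Rng) -> bool {
    rng.gen::<f64>() < cfg.propagation_probability
}
```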

We have improved Lighthouse's ambient block discovery process. When a block seen via attestations or gossip has an unknown parent, we perform a chained query to connected peers, walking back through the block's ancestors until one matches a block in our chain, and then process all of the downloaded blocks. This helps ensure a Lighthouse node remains at the current head given all information available to it, and helps prevent accidental forking during periods of high packet loss.
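
Conceptually, this parent-lookup behaviour works along the lines of the sketch below, where the chain and peer interfaces are simplified stand-ins rather than Lighthouse's actual types:

```rust
type Hash256 = [u8; 32];

/// Simplified block representation for illustration only.
struct Block { root: Hash256, parent_root: Hash256 }

trait Chain {
    fn contains_block(&self, root: &Hash256) -> bool;
    fn process_block(&mut self, block: Block);
}

trait Peers {
    /// Request a single block by root from a connected peer.
    fn request_block(&self, root: &Hash256) -> Option<Block>;
}

/// Walk backwards from a block with an unknown parent until a known
/// ancestor is found, then process the downloaded segment in forward order.
fn resolve_unknown_block(chain: &mut impl Chain, peers: &impl Peers, block: Block) {
    let mut segment = vec![block];
    loop {
        let parent_root = segment.last().unwrap().parent_root;
        if chain.contains_block(&parent_root) {
            break;
        }
        match peers.request_block(&parent_root) {
            Some(parent) => segment.push(parent),
            // Give up (for now) if no peer can supply the ancestor.
            None => return,
        }
    }
    // Oldest ancestor first, so each block's parent is known when processed.
    for block in segment.into_iter().rev() {
        chain.process_block(block);
    }
}
```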

We have a PR ready for review which adds more advanced block and attestation validation (thanks @g-r-a-n-t). This validation allows us to decide more thoroughly whether to propagate a block or attestation, filtering out malicious or invalid messages and helping to maintain a healthy network.
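
In rough terms, this kind of gossip validation boils down to a per-message decision along the lines of the sketch below; the variant names are illustrative and not necessarily those used in the PR.

```rust
/// Possible outcomes when validating a gossiped block or attestation
/// before deciding whether to re-propagate it.
enum GossipOutcome {
    /// The message appears valid: process it and forward it to peers.
    Propagate,
    /// The message cannot be fully validated yet (e.g. its parent is
    /// unknown): keep or queue it locally, but do not forward it.
    Ignore,
    /// The message is provably invalid: drop it and do not forward it.
    Reject,
}
```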

We have also started work on some mainnet features. These include the Noise handshake for rust-libp2p and compression support for our RPC protocols (currently being worked on by @b-m-f).

Security Updates

The main news for this section is that we have selected and confirmed Trail of Bits as the firm that will be auditing the Lighthouse codebase. We expect the audit to start in January 2020 and look forward to working with such a highly reputable team.

As always, the security efforts for Lighthouse are continual and integrated into the development process. Further fuzzing of our ENR (Ethereum Node Record) implementation revealed a crash when attempting to decode encoded ENRs of a version other than v4 (currently the only specified version). This has been fixed.
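
The class of bug is illustrated, very loosely, below: a decoder that assumes the v4 identity scheme should reject other schemes with an error rather than crashing. The types and function here are simplified stand-ins, not the actual ENR library API.

```rust
/// Simplified stand-in for a decoded ENR record; the real library differs.
struct Enr {
    /// Identity scheme identifier, e.g. "v4".
    id: String,
    // signature, sequence number, key/value pairs, ...
}

#[derive(Debug)]
enum DecodeError {
    UnsupportedIdentityScheme(String),
    // ...
}

/// Return an error (rather than panicking) when the identity scheme is
/// anything other than the specified "v4".
fn verify_identity_scheme(enr: &Enr) -> Result<(), DecodeError> {
    if enr.id != "v4" {
        return Err(DecodeError::UnsupportedIdentityScheme(enr.id.clone()));
    }
    // ... proceed with v4 signature verification ...
    Ok(())
}
```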

Next Steps

We are focused on completing all the necessary final updates and features we feel are required for a public-facing testnet. This is our primary goal and we hope to release a new update very soon detailing how to join, contribute, become a validator, and bug hunt with us!