Lighthouse Update #29

Update Summary

Lighthouse has reached a new level of stability and performance. As we approach the mainnet launch and the completion of our final audits, we have been adding the last features and spec changes. As always, we continue to rigorously test all aspects of the client to identify anything that can improve stability, security and performance.

The main highlights for this update in dot-point form are:

Ethereum 2.0 Standard API upgrade - The official Ethereum 2.0 client APIs have stabilised and we have a working implementation of them. It should find its way into our master branch soon, bringing Lighthouse in line with the standard and making life easier for developers and users.
Gossipsub v1.1 - We have completed the Rust implementation and it is currently undergoing an external audit. We have also begun initial research into a set of v1.1 scoring parameters.
Attestation inclusion updates - A number of updates to improve Lighthouse's attestation inclusion rate.
Discv5.1 - We have started a protocol upgrade to our current discovery mechanism.
Network Fuzzing - We have enhanced our fuzzing efforts to target all of our networking sub-protocols.
Sync Enhancements - Improvements to the sync protocol to increase its reliability and speed.
Key management updates - Improvements to key management, including the ability to recover keys.
Network Testing - We have started building tooling for security testing of testnets.

Ethereum 2.0 Standard API

All clients have agreed to implement a set of standard HTTP API endpoints as specified in the Ethereum 2.0 Standard API.

The purpose is to expose a uniform set of endpoints that developers, users, block explorers and even other clients' validator clients can interact with. We hope this makes life easier for end-users: with the same standardised endpoints available from every client, applications and users can swap between clients easily.

We are in the final testing phase of our implementation of these APIs and they should reach our master branch in the coming week(s). We recommend visiting the Eth2 API standards repository to see the updated endpoints and the information available from Eth2 clients, including Lighthouse.
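To give a feel for what a uniform endpoint means in practice, here is a minimal sketch of querying a node's version. The path `/eth/v1/node/version` and the top-level "data" wrapper follow our reading of the standard API; the host, port and helper names are assumptions made for illustration, so check them against your own node and the standards repository.

```python
import json
from urllib.request import urlopen

# Assumption: a beacon node serving the standard API on localhost:5052.
BASE_URL = "http://localhost:5052"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the node's base URL and a standard API path."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def parse_version(body: str) -> str:
    """Extract the version string from a node-version response body.

    The payload is expected to sit inside a top-level "data" object.
    """
    return json.loads(body)["data"]["version"]


def fetch_node_version(base_url: str = BASE_URL) -> str:
    """Query a running node (hypothetical helper, needs a live node)."""
    url = endpoint_url("/eth/v1/node/version", base_url)
    with urlopen(url, timeout=5) as resp:
        return parse_version(resp.read().decode("utf-8"))
```

Because the path and response shape are standardised, the same helper should work unchanged when pointed at a different client's node.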

Gossipsub v1.1

Although most clients (including us) have Gossipsub v1.1 currently enabled in Medalla, the core features of v1.1 are accessed via a set of scoring parameters that we apply to each gossipsub topic.

These parameters are highly network-specific, meaning that we need to devise, simulate and test a set of parameters that will work for each topic in the Ethereum 2.0 network.
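To illustrate why these parameters need tuning per topic, here is a heavily simplified sketch of a per-topic peer score, loosely modelled on the gossipsub v1.1 scoring function. It keeps only three components (time in mesh, first message deliveries, invalid messages) and leaves out counter decay, the mesh delivery penalties and the global terms. All names and values are placeholders, not the parameters under research.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TopicParams:
    """Illustrative per-topic scoring parameters (placeholder values only)."""
    topic_weight: float
    time_in_mesh_weight: float     # reward for staying in the mesh
    time_in_mesh_cap: float
    first_delivery_weight: float   # reward for delivering messages first
    first_delivery_cap: float
    invalid_message_weight: float  # should be negative: it is a penalty


@dataclass
class PeerTopicStats:
    """Counters a node might track for one peer on one topic."""
    time_in_mesh_quanta: float
    first_deliveries: float
    invalid_messages: float


def topic_score(p: TopicParams, s: PeerTopicStats) -> float:
    """Weighted sum of capped rewards and a squared invalid-message penalty."""
    reward_mesh = p.time_in_mesh_weight * min(s.time_in_mesh_quanta, p.time_in_mesh_cap)
    reward_first = p.first_delivery_weight * min(s.first_deliveries, p.first_delivery_cap)
    penalty_invalid = p.invalid_message_weight * s.invalid_messages ** 2
    return p.topic_weight * (reward_mesh + reward_first + penalty_invalid)


def peer_score(topics: Dict[str, Tuple[TopicParams, PeerTopicStats]]) -> float:
    """Sum the per-topic contributions for a single peer."""
    return sum(topic_score(params, stats) for params, stats in topics.values())
```

Even in this toy version the caps and weights interact: a cap that is too generous lets a long-lived peer absorb many invalid messages before its score turns negative, while a penalty that is too harsh punishes honest peers on a noisy topic.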

We are currently researching a set of parameters and actively testing them on Medalla. Alongside this, we are using previously built libp2p Testground plans to further simulate and test our scoring parameters.

In the course of this testing we have discovered a number of incompatibilities between clients at the gossipsub network layer. We have been correcting these across clients, making the network more efficient (less bandwidth) and faster (the incompatibilities would have degraded message propagation).

We hope to have the scoring parameters finalized soon (this is a multi-client collaboration) so that all clients can use the full features of gossipsub v1.1.

Discv5.1

A number of security tests (attacks) were performed by Jonny on the discovery protocol implementations for each client. Lighthouse performed well and no attack was found in the latest round of testing that could stop discovery or meaningfully impact a Lighthouse node. One attack in particular revealed an inadequacy in the Discv5.0 spec.

Felix has been working hard updating the specification to a new version (5.1), which contains a number of fixes and security updates. The attack was enough for the client implementers to decide to update to 5.1 before mainnet. This is a breaking change in the discovery protocol, but one we prefer to make now, before we launch on mainnet.

For Lighthouse, we have started this update and intend to complete it in time for our two rounds of network security audits. Naturally, this update will require more interoperability tests with other clients as they also update their discovery protocols.

The transition in Medalla may see outdated nodes finding fewer and fewer peers, forcing them to update to a 5.1 discovery mechanism.

Sync Enhancements

The long period of non-finality has surfaced a race condition in our sync algorithm. Essentially, we attempt to sync from the largest group of peers that agree on a finalized root. As the chain progresses and new epochs are finalized, peers start agreeing on new finalized roots (in good conditions these are updated every epoch). In normal conditions this is fine: we seamlessly continue syncing with a new target of the updated finalized epoch.
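The peer selection described above can be sketched roughly as follows. This is an illustration rather than Lighthouse's sync code: the `PeerStatus` shape, the function name and the tie-break rule (prefer the higher finalized epoch) are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical shape: peer id -> (finalized_epoch, finalized_root).
Checkpoint = Tuple[int, str]
PeerStatus = Dict[str, Checkpoint]


def select_sync_target(peers: PeerStatus) -> Optional[Tuple[Checkpoint, List[str]]]:
    """Return the most widely agreed finalized checkpoint and its peers.

    Ties on peer count are broken in favour of the higher finalized epoch.
    """
    if not peers:
        return None
    counts = Counter(peers.values())
    target = max(counts, key=lambda cp: (counts[cp], cp[0]))
    supporters = sorted(pid for pid, cp in peers.items() if cp == target)
    return target, supporters
```

Each time peers report a newer finalized checkpoint the selected target can change, which is where the switching behaviour discussed in this section comes from.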

In Medalla, however, there is a long period of non-finalization. As Lighthouse nodes sync through this section, peers typically update their agreed finalized root each epoch, and the syncing node switches to these peers because they hold the latest source of truth. For complete security, we then start downloading blocks from the last finalized epoch known to us. In Medalla this can be many, many blocks in the past (if we are still syncing through the period of non-finality), so the sync process essentially starts again from a point far back in the chain.

Although this is technically a safe approach that accounts for any potential forks (even in the finalized part of the chain), it can be improved: we can optimistically continue from where we left off and be smarter about detecting abnormalities in the chain, restarting from our last known finalized epoch only as a last resort.

This update should not only make syncing more reliable but also remove the duplicated block downloads and processing, reducing the time to sync to the head of the chain.
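As a rough sketch of the difference between the two strategies, the functions below compute where a sync would start after the target changes. The function names and the `chain_looks_consistent` flag are hypothetical; the flag stands in for whatever abnormality checks the real implementation performs. The only protocol value used is the 32 slots per epoch of the Phase 0 configuration.

```python
SLOTS_PER_EPOCH = 32  # Phase 0 configuration value


def start_slot_conservative(local_finalized_epoch: int) -> int:
    """Always restart from the first slot of our last finalized epoch."""
    return local_finalized_epoch * SLOTS_PER_EPOCH


def start_slot_optimistic(local_finalized_epoch: int,
                          last_processed_slot: int,
                          chain_looks_consistent: bool) -> int:
    """Resume after the last processed slot unless something looks wrong."""
    finalized_slot = local_finalized_epoch * SLOTS_PER_EPOCH
    if chain_looks_consistent and last_processed_slot >= finalized_slot:
        return last_processed_slot + 1
    # Last resort: fall back to the safe starting point.
    return finalized_slot


def slots_saved(local_finalized_epoch: int, last_processed_slot: int) -> int:
    """Slots that are not re-downloaded when the optimistic path applies."""
    return (start_slot_optimistic(local_finalized_epoch, last_processed_slot, True)
            - start_slot_conservative(local_finalized_epoch))
```

The longer the period of non-finality, the larger the gap between the two starting points, which is why the duplicated work was so visible on Medalla.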

Attestation Inclusion

In the last update we talked about the complexity of an attestation's life-cycle and its journey to being included in a proposed block. We spent a bit of time analyzing the gossipsub network and tracking down missed attestations from Lighthouse validators.

Eventually it was discovered that the attestation production cache was faulty under certain patterns of skip-slots. Since Pawan discovered and fixed this bug we have not noticed a single missed attestation on any of the many nodes we're monitoring.

The detective work in finding this issue has led to extended metrics and monitoring of attestation propagation and the gossipsub network in general. We intend to use this extra tooling to help evaluate our new gossipsub v1.1 scoring parameters.

Network Fuzzing

We have two scheduled audits approaching that will be targeting Lighthouse's networking stack. We therefore wanted to further ensure the stability and robustness of Lighthouse at the network layer before handing over to the auditors.

To this end, we have implemented a number of fuzzing targets which test not only various elements internal to Lighthouse, but also our discv5 implementation and our gossipsub implementation. We have been running these fuzzers non-stop for almost two weeks and have not found a single crash.
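For readers unfamiliar with fuzzing, the idea can be sketched with a toy random fuzzer. This is not one of our fuzz targets: the frame format, `decode_frame` and `DecodeError` are invented for the example, and real fuzzers are typically coverage-guided rather than purely random. The property being checked is the same, though: malformed input may be rejected cleanly, but must never crash the decoder.

```python
import random


class DecodeError(Exception):
    """The only error the decoder is allowed to raise on bad input."""


def decode_frame(data: bytes, max_len: int = 1024) -> bytes:
    """Decode a hypothetical frame: 2-byte big-endian length, then payload."""
    if len(data) < 2:
        raise DecodeError("missing length prefix")
    length = int.from_bytes(data[:2], "big")
    if length > max_len:
        raise DecodeError("declared length exceeds limit")
    payload = data[2:]
    if len(payload) != length:
        raise DecodeError("payload length mismatch")
    return payload


def fuzz(target, iterations: int = 10_000, seed: int = 0) -> int:
    """Feed random byte strings to `target` and count the accepted inputs.

    Any exception other than DecodeError propagates and counts as a crash.
    """
    rng = random.Random(seed)
    accepted = 0
    for _ in range(iterations):
        size = rng.randint(0, 64)
        data = bytes(rng.getrandbits(8) for _ in range(size))
        try:
            target(data)
            accepted += 1
        except DecodeError:
            pass  # clean rejection is the expected outcome for garbage
    return accepted
```

Running `fuzz(decode_frame)` to completion without an unexpected exception is the toy equivalent of a fuzzer finding no crashes.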

This is a sign of the stability and battle-testing that Lighthouse has gained through the various interoperability and adverse testnet conditions. The final stage before we hand over the keys to the auditors is to try to attack Lighthouse ourselves.

Network Testing

Now that all the moving parts in Lighthouse have come together and are running very smoothly, we are approaching the time to start performing some of our own security testing on Lighthouse.

Although we have woven security best-practices into the development process of Lighthouse and employed fuzzing where applicable, we think there is value in approaching Lighthouse with the mindset of a malicious actor.

This involves building malicious tools/infrastructure capable of attacking various parts of the client, including known weak spots that may be exploitable. If time permits, we hope to run some serious attacks on internal testnets before the audits to see how Lighthouse performs.

This process will require some extra development as we build out some tooling that weaponizes some of the protocols and networking inside Lighthouse.

Hopefully by the next update we will have some interesting exploits or bugs that were uncovered, but we're equally hopeful that nothing is found :).

Still to come

With all the above happening, we're still working away at a validator client UI, which we'll unveil soon, so keep an eye on our updates.

We're approaching the end of Phase 0 development and are set to complete our final core features and protocols before undertaking our final audits ahead of Mainnet.

Our core focus is now on testing and ensuring the stability, performance and security of Lighthouse nodes. We'll be actively stress testing the client and attacking it from every direction we can think of to battle-harden it for a live Eth2 Mainnet.