Lighthouse Update #07

The Sigma Prime team seeks to provide some clarity regarding the March testnet, provide an update on our recent achievements, and outline the next steps for Lighthouse.

Beacon Chain Testnet

Ameen Soleimani from SpankChain recently posted an article titled "The State of Ethereum 2.0". I found this a great read for many reasons -- it was well-researched, and written by someone who demonstrates experience in managing projects. One thing that resonated with me was the idea that the updates we're providing are full of technical jargon that reads a bit like:

We implemented the ASDF protocol from the JKL; v0.2.1 specification and demonstrated a 0.0024ns improvement in our QWERTY cycles. This gives us confidence to upgrade to ZXVCV v3.1.4.1.5.9.

Whilst this information is useful to those familiar with the core-protocol, it doesn't do much to set expectations for everyday users of Ethereum.

To address this, the first section of this article is dedicated to setting some expectations for the "2019 Q1 Beacon Chain Testnet". I hope it provides useful information without drowning you in technicalities.

Thanks to Danny Ryan for providing feedback. (This line seems like a cliché now!)

Introduction to Ethereum Testnets

Before I get started, I want to clarify that a client is a piece of software which implements a blockchain specification. We use "client" in the "client-server architecture" sense, not the "customer" sense. For example, Parity-Ethereum and Geth are clients implementing the Eth 1.x specification, whereas Lodestar, Nimbus, Prysm, and Lighthouse are some of the clients which will implement the Eth 2.0 specification. (See EthHub for a comprehensive list of Eth 2.0 clients.)

In the Ethereum space, a testnet (short for test-network) is a blockchain with a core token of no substantial value which provides a cheap, low-stakes environment for testing network upgrades and smart-contracts. Görli, Ropsten, Rinkeby, and Kovan are Ethereum 1.x testnets which have provided testing-grounds for both core protocol updates and dApps.

The "Ether" on testnets has almost no value and can generally be obtained for free from websites or chat channels. The combination of a low-value token and beta software makes testnets insecure and unreliable environments -- the cheap token makes attacks cheap and beta code has been known to cause testnet downtime.

Ethereum 2.0 Testnets

Whilst the Eth 1.x testnets listed earlier provide a useful environment for smart-contract developers, the first Eth 2.0 testnets will not cater for dApp and smart-contract developers. Instead, they will serve as a proof-of-concept that the networking and core-consensus aspects of the Beacon Chain can stand up to the harsh realities of running on the Internet (latency, lossy transmission, attackers, etc.).

There are quite a few qualifications you can apply to a testnet: single-client/multi-client, private/public and short-lived/long-lived. Each of these is explored in the following three sections.

Interoperability

  • Multi-client testnets: multiple different clients co-operating on the same testnet, communicating with each other, and reaching consensus on a canonical chain.
  • Single-client testnets: each client runs its own testnet, and each testnet agrees on a completely different blockchain.

Single-client testnets are only useful for proving that your client works as you expect, whereas multi-client testnets are additionally useful for proving that your client works in a way that all the other clients expect -- i.e. it's an interoperability test.

Ethereum 2.0 intends to have multiple clients all building the same blockchain, so single-client testnets eventually need to evolve into multi-client testnets.

Accessibility

  • Public testnets: are available on the public internet and are open to participation by anyone. All of the testnets listed in the introduction are public.
  • Private testnets: run on a private network, accessible only to authorised parties.

Private testnets are useful for client developers who need to test functionality without uptime requirements or distractions from users and potential attackers.

Of course, ignoring users and attackers is unrealistic, so private testnets must eventually evolve into public testnets.

Longevity

  • Short-lived testnets: no guarantees that the testnet won't be completely abandoned next week.
  • Long-lived testnets: some guarantees that the testnet will survive for some period of time.

Short-lived testnets are much easier to maintain -- if you discover a bug, you can fix the bug and abandon the previous buggy software. A long-lived testnet, on the other hand, requires that blocks generated by buggy code can still be processed by the updated code. Not only does this bloat the codebase, it involves complicated decision-making regarding trade-offs when handling the buggy blocks.

Ethereum should be a persistent store of information, so short-lived testnets must evolve into long-lived testnets. No-one wants to lose their wallet every few weeks.

Getting to Testnet

It's clear that the "holy grail" of testnets is a long-lived, multi-client, public testnet and that everything before that is a stepping stone. For implementers, getting to the holy grail has the following primary requirements:

  1. An agreed-upon specification. This is almost certainly going to be the result of the Ethereum Foundation (EF) researchers "blessing" a specification release as "testnet ready".
  2. Agreed-upon test-vectors. Test-vectors are, generally speaking, a collection of files that define inputs and outputs to some function/program. Implementers can run the inputs through their functions and ensure their output matches that of the test-vector. This allows implementers to ensure their code operates as expected in a controlled environment, rather than "in the wild" on a testnet. At this stage, test vectors look like they're going to be generated by Python code in a repository managed by the Ethereum Foundation.
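To make the idea of test-vectors concrete, here is a minimal Rust sketch of a test-vector runner. The one-vector-per-line `input -> expected` format and the names `TestVector`, `parse_vectors` and `failing_vectors` are hypothetical; they are not the file format produced by the EF's Python generators. The function under test is modelled on the spec helper `integer_squareroot`.

```rust
/// One test vector: an input and the output the reference implementation produced.
/// (Hypothetical format for illustration only.)
#[derive(Debug)]
struct TestVector {
    input: u64,
    expected: u64,
}

/// Function under test, modelled on the spec helper `integer_squareroot`:
/// the largest x such that x * x <= n, found with Newton's method.
fn integer_squareroot(n: u64) -> u64 {
    let mut x = n;
    let mut y = x / 2 + x % 2; // ceil(x / 2) without overflowing at u64::MAX
    while y < x {
        x = y;
        y = (x + n / x) / 2;
    }
    x
}

/// Parses lines of the form `input -> expected`; blank lines and `#` comments are skipped.
fn parse_vectors(text: &str) -> Result<Vec<TestVector>, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(|line| -> Result<TestVector, String> {
            let (input, expected) = line
                .split_once("->")
                .ok_or_else(|| format!("malformed line: {line}"))?;
            Ok(TestVector {
                input: input.trim().parse::<u64>().map_err(|e| format!("{line}: {e}"))?,
                expected: expected.trim().parse::<u64>().map_err(|e| format!("{line}: {e}"))?,
            })
        })
        .collect()
}

/// Runs every vector and returns (input, expected, actual) for each mismatch.
fn failing_vectors(vectors: &[TestVector]) -> Vec<(u64, u64, u64)> {
    vectors
        .iter()
        .map(|v| (v.input, v.expected, integer_squareroot(v.input)))
        .filter(|&(_, expected, actual)| expected != actual)
        .collect()
}
```

A file containing `15 -> 3` passes silently, whereas a wrong expectation such as `10 -> 4` is reported as `(10, 4, 3)`: input, expected, actual. The point is that a disagreement is caught on a developer's machine rather than as a fork on a live testnet.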

Whilst a long-lived multi-client testnet requires co-ordination from the Ethereum Foundation, it is possible for implementers to start laying the stepping-stones by building short-lived, single- and multi-client testnets without a "blessed" specification from the EF. (Indeed, it is always possible for implementers to ignore the input of the EF, but they seem benevolent enough for now.)

Producing single-client testnets can be done without test-vectors. However, multi-client testnets are challenging to achieve without test-vectors, as critical bugs are very difficult to discover without comparing outputs against an agreed reference.

EF researchers are dedicating more effort towards producing test-vectors and acknowledge there are bugs to be ironed out before multi-client testnets are sensible. As far as I know, they've been hesitant to provide hard deadlines and I find that completely reasonable. This is a first-of-its-kind network -- rushing the research and design process is risky and committing to uncertain deadlines can reduce confidence. I'd rather have a robust design tomorrow than a rushed design today. Of course, perfect is the enemy of good, but I don't presently accuse the EF of perfectionism.

Testnet Timelines

Moving on to the "when testnet" question, you've probably heard "Ethereum 2.0 testnet in Q1 2019" before. This is indeed the informal mandate of several Eth 2.0 teams (Lighthouse included). However, will it be many single-client testnets or a big multi-client testnet? Will they be private and short-lived, or public and long-lived?

Please note: the following sections are my opinions and are based upon a subjective "feel" of the current situation.

What to expect in March

In my opinion, we should expect private, single-client testnets as a minimum. Most teams are already nearing this goal, especially those which already have a working implementation for their networking stack (libp2p gossipsub).

More optimistically, we could also expect to see an assortment of short-lived, single-client, public testnets, and some short-lived, multi-client private testnets.

What not to expect in March

We should not expect a long-lived, multi-client, public testnet. At this stage, I think it would be a mistake to commit to longevity, as I suspect we'll find some major bugs in the coming months which may produce an unmaintainable chain, or a chain so messy that keeping its blocks around would hinder ongoing development.

I do not think it's reasonable to expect anything which resembles Ropsten or has smart-contract functionality. That's well beyond this phase of the project.

Summary

We're going to see testnets in March, but they won't be comparable to Ropsten -- they'll be proofs of concept of a new blockchain design. What we'll see is a handful of different testnet varieties, each testing and demonstrating isolated parts of a larger system.

They won't demonstrate a user-ready platform, but they will demonstrate huge progress towards a scalable and efficient Ethereum.

Progress update

The rest of this post details the technical progress and achievements of the Lighthouse project over the past month and our plans for the future.

Achievements:

  • Adrian (@agemanning) submitted a PR which implements the Gossipsub protocol for rust-libp2p.
  • Block processing and state transitions have been implemented, as per late January.
  • Paul (@paulhauner) implemented a test harness which is capable of building a beacon chain with thousands of validators.
  • Benchmarking on state transition and fork choice has begun.
  • A new Rust developer is starting in March.

Next steps:

  • Update to match the January pre-release spec.
  • Implement an optimised LMD GHOST.
  • Improve state transition efficiency.
  • Implement basic network syncing.
  • More developer interviews, apply here.
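For readers unfamiliar with LMD GHOST, the Rust sketch below shows the naive form of the fork choice: starting from a justified block, repeatedly step to the child whose subtree carries the most weight from validators' latest votes. It assumes hypothetical `Store` and `BlockId` types with integer block ids, breaks ties by the larger id purely as a stand-in for a deterministic tie-break, and recomputes every weight from scratch, costing roughly (validators × chain depth) work per step. That repeated work is the kind of cost an optimised implementation aims to avoid.

```rust
use std::collections::HashMap;

/// Hypothetical block identifier (a real client would use a 32-byte block root).
type BlockId = u64;

#[derive(Default)]
struct Store {
    parent: HashMap<BlockId, BlockId>,        // block -> parent
    children: HashMap<BlockId, Vec<BlockId>>, // block -> children
    latest_votes: HashMap<usize, BlockId>,    // validator index -> latest voted block
    balances: Vec<u64>,                       // validator index -> vote weight
}

impl Store {
    fn add_block(&mut self, block: BlockId, parent: BlockId) {
        self.parent.insert(block, parent);
        self.children.entry(parent).or_default().push(block);
    }

    /// True if `ancestor` is `block` itself or lies on its path back to the root.
    fn is_descendant(&self, mut block: BlockId, ancestor: BlockId) -> bool {
        loop {
            if block == ancestor {
                return true;
            }
            match self.parent.get(&block) {
                Some(&p) => block = p,
                None => return false,
            }
        }
    }

    /// Total weight of validators whose latest vote is for `block` or a descendant.
    fn weight(&self, block: BlockId) -> u64 {
        self.latest_votes
            .iter()
            .filter(|&(_, &vote)| self.is_descendant(vote, block))
            .map(|(&validator, _)| self.balances[validator])
            .sum()
    }

    /// Naive LMD GHOST: walk down from the justified block, always taking the
    /// heaviest child (ties broken by the larger id), until reaching a leaf.
    fn head(&self, justified: BlockId) -> BlockId {
        let mut head = justified;
        while let Some(kids) = self.children.get(&head) {
            head = *kids
                .iter()
                .max_by_key(|&&child| (self.weight(child), child))
                .expect("child lists are never empty");
        }
        head
    }
}
```

With blocks `1` and `2` built on `0`, and `3` built on `1`, two of three equally weighted validators voting for `2` make `2` the head; once all three vote for `3`, the walk goes `0 -> 1 -> 3`.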

Gossipsub

For Ethereum 2.0, nodes will communicate in a P2P fashion using a publish/subscribe (pubsub) protocol. These protocols allow a network to be segregated into topics that nodes can publish and subscribe to. Thus, a node that only cares about a particular topic does not need to be troubled by the potential noise of all other topics on the network. In Ethereum 2.0, we imagine each shard will exist on its own topic and only nodes interested in specific shards will subscribe to that topic.
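The topic filtering described above can be sketched in a few lines of Rust. This is an in-memory model of the subscription semantics only, with no networking or routing; `PubSub`, `shard_topic` and the `shard_<n>` topic naming are hypothetical illustrations of "one topic per shard", not names from any Eth 2.0 spec.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical topic naming: one topic per shard.
fn shard_topic(shard: u64) -> String {
    format!("shard_{shard}")
}

/// In-memory model of topic-based pubsub semantics (no networking or routing).
#[derive(Default)]
struct PubSub {
    subscribers: HashMap<String, HashSet<usize>>,    // topic -> subscribed node ids
    inboxes: HashMap<usize, Vec<(String, Vec<u8>)>>, // node id -> (topic, payload) received
}

impl PubSub {
    fn subscribe(&mut self, node: usize, topic: &str) {
        self.subscribers.entry(topic.to_string()).or_default().insert(node);
    }

    /// Delivers `payload` only to nodes subscribed to `topic`; returns the number of recipients.
    fn publish(&mut self, topic: &str, payload: &[u8]) -> usize {
        let Some(subscribers) = self.subscribers.get(topic) else {
            return 0; // nobody subscribed, so nobody is disturbed
        };
        for &node in subscribers {
            self.inboxes
                .entry(node)
                .or_default()
                .push((topic.to_string(), payload.to_vec()));
        }
        subscribers.len()
    }

    /// Number of messages a node has received across all of its topics.
    fn received(&self, node: usize) -> usize {
        self.inboxes.get(&node).map_or(0, Vec::len)
    }
}
```

Publishing to `shard_5` reaches only the nodes subscribed to it; a node that subscribed only to `shard_7` never sees the message.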

Routing within these protocols is also important: messages should be passed from node to node efficiently. For example, one routing mechanism, known as floodsub, dictates that when a node receives a message, it passes that message to every other node it knows about that is also subscribed to the message's topic. If all nodes do this, the result is significant network congestion, as messages are duplicated everywhere. The upside is that messages propagate through the network quite quickly.
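The cost of floodsub's duplication is easy to see in a small simulation. The following Rust sketch is a hypothetical model, not rust-libp2p code: it treats every peer as subscribed to the topic, forwards a message on first receipt to every peer except the sender, and counts transmissions.

```rust
use std::collections::{HashSet, VecDeque};

/// Simulates floodsub for one message. `peers[i]` lists node i's neighbours, all
/// assumed to be subscribed to the topic. On first receipt a node forwards the
/// message to every neighbour except the one it came from; repeat receipts are
/// dropped (a real implementation tracks seen message ids).
/// Returns (nodes reached, total transmissions, duplicate transmissions).
fn floodsub(peers: &[Vec<usize>], origin: usize) -> (usize, usize, usize) {
    let mut seen = HashSet::from([origin]);
    let mut queue = VecDeque::from([(origin, origin)]); // (node, received_from)
    let (mut sent, mut duplicates) = (0, 0);
    while let Some((node, from)) = queue.pop_front() {
        for &peer in peers[node].iter().filter(|&&p| p != from) {
            sent += 1;
            if seen.insert(peer) {
                queue.push_back((peer, node));
            } else {
                duplicates += 1; // peer already had it: wasted bandwidth
            }
        }
    }
    (seen.len(), sent, duplicates)
}
```

In a fully connected network of four nodes, delivering one message to the three other nodes takes 9 transmissions, 6 of which are duplicates. On a simple line of three nodes there is no waste at all, which shows the duplication grows with how densely peers are connected.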

Gossipsub is a more efficient routing mechanism than floodsub. It has been implemented in a standalone daemon; however, there was no implementation in Rust. Sigma Prime decided to build this implementation for a variety of reasons:

  • Making it available for other projects in the space to utilise
  • Gaining a greater understanding of the inner workings of the rust-libp2p implementation
  • Having a native Rust implementation, which allows us to tweak and optimise the networking layer of the Lighthouse client for our needs on the Ethereum 2.0 chain
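Much of gossipsub's efficiency comes from forwarding full messages only along a sparse per-topic "mesh" of peers (a small target degree, usually called D). Other peers receive lightweight IHAVE announcements of message ids and can request anything they missed with IWANT. The Rust sketch below is a heavily simplified model under those assumptions: the mesh is supplied by the caller rather than maintained with GRAFT/PRUNE control messages, and IHAVE is modelled as sent once per node rather than on periodic heartbeats. The `gossipsub` function is hypothetical, not rust-libp2p's API.

```rust
use std::collections::{HashSet, VecDeque};

/// Simplified gossipsub propagation for one message (illustrative model only).
/// `mesh[i]`: peers node i eagerly forwards full messages to (a small, bounded set).
/// `others[i]`: node i's remaining peers on the topic, which only receive an
/// IHAVE (message id) announcement, modelled here as sent once per node.
/// Returns (nodes reached, full-message sends, IHAVE sends).
fn gossipsub(mesh: &[Vec<usize>], others: &[Vec<usize>], origin: usize) -> (usize, usize, usize) {
    let mut seen = HashSet::from([origin]);
    let mut queue = VecDeque::from([(origin, origin)]); // (node, received_from)
    let (mut full_sends, mut ihave_sends) = (0, 0);
    while let Some((node, from)) = queue.pop_front() {
        // Eager push: the full payload travels only along mesh links.
        for &peer in mesh[node].iter().filter(|&&p| p != from) {
            full_sends += 1;
            if seen.insert(peer) {
                queue.push_back((peer, node));
            }
        }
        // Lazy push: only the small message id goes to non-mesh peers.
        ihave_sends += others[node].len();
    }
    (seen.len(), full_sends, ihave_sends)
}
```

With four fully connected nodes and a ring-shaped mesh of degree 2, the message reaches every node with 5 full-message sends (flooding the same network would take 9), plus 4 small IHAVE announcements. The savings grow with peer count, because full sends per node are bounded by the mesh degree rather than by the number of connected peers.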

Benchmarking

As shared on Twitter, we've been doing benchmarks on chain operations with many thousands of validators. We're excited to have the infrastructure to benchmark our code and look forward to integrating it into our process.

What we've discovered is that our present implementation is too slow for production usage. This is not alarming and we suspected it would be the case. We have the following items listed as good candidates for speeding up chain operations:

  • Implementing cached hash-tree-root updates. This involves reducing the number of hashes performed during state updates by only re-hashing the parts which have changed. Vitalik posted examples of this code last week and we're currently investigating it.
  • Optimising state transition logic. Our state transition logic is implemented in a way that makes it easy to change as the spec updates. Once we see more stability from the spec (which is expected soon) we can start to optimise for execution speed instead.
  • Faster LMD GHOST implementations. We're in the process of implementing and benchmarking two different implementations and will post results once we have them.
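The idea behind cached hash-tree-root updates can be sketched with a flat binary Merkle tree that keeps every internal node, so changing one leaf only re-hashes the log2(n) nodes on its path to the root. This is neither Vitalik's code nor Lighthouse's implementation: it uses `u64` leaves and Rust's std `DefaultHasher` purely as a non-cryptographic stand-in for SHA-256 so that it needs no external crates, and `CachedTree` is a hypothetical name.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for SHA-256 so the sketch has no dependencies. NOT cryptographic.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    (left, right).hash(&mut h);
    h.finish()
}

/// Binary Merkle tree kept in a flat array: node i has children 2i and 2i+1,
/// the root is at index 1 and the n leaves occupy indices n..2n.
struct CachedTree {
    nodes: Vec<u64>,
    hashes_performed: usize,
}

impl CachedTree {
    /// Builds the full tree once; `leaves.len()` must be a power of two.
    fn new(leaves: &[u64]) -> Self {
        let n = leaves.len();
        assert!(n.is_power_of_two(), "leaf count must be a power of two");
        let mut nodes = vec![0u64; 2 * n];
        nodes[n..].copy_from_slice(leaves);
        for i in (1..n).rev() {
            nodes[i] = hash_pair(nodes[2 * i], nodes[2 * i + 1]);
        }
        CachedTree { nodes, hashes_performed: n - 1 }
    }

    fn root(&self) -> u64 {
        self.nodes[1]
    }

    /// Changes one leaf and re-hashes only the path from it to the root,
    /// reusing the cached hashes of every untouched sibling subtree.
    fn update(&mut self, index: usize, value: u64) {
        let n = self.nodes.len() / 2;
        let mut i = n + index;
        self.nodes[i] = value;
        while i > 1 {
            i /= 2;
            self.nodes[i] = hash_pair(self.nodes[2 * i], self.nodes[2 * i + 1]);
            self.hashes_performed += 1;
        }
    }
}
```

For 1,024 leaves the initial build performs 1,023 hashes, while each subsequent single-leaf update performs only 10, and the resulting root matches a full rebuild from the modified leaves.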

At this point in time, we're not releasing our benchmarks as they don't represent what one can expect from a final product. We look forward to sharing them once we're confident in their relevance.

Contributors

As always, we're keen for more contributors! If you want to get involved, please find a "good-first-issue" and leave a comment or drop a line on the Gitter channel.