Introducing Byzantine Fault Tolerance Consensus for Lisk
I will give an overview of Lightcurve’s proposal for a Byzantine fault tolerance (BFT) consensus algorithm for Lisk. I will explain the importance of BFT, outline the main features and benefits of our proposed change and compare it to other consensus algorithms.
Byzantine fault tolerance addresses a very real challenge in distributed systems — that challenge is ensuring that computers in a said system can reach consensus on certain data in a robust fashion. Today, I will give an overview of Lightcurve’s proposal of introducing a BFT consensus algorithm in Lisk. The idea of the consensus algorithm is that delegates not only forge blocks, but also vote on the correct block at every height. This allows to add Byzantine fault tolerance (BFT) to the consensus algorithm in Lisk and yields block finality guarantees, that is, guarantees that a block is never reverted assuming a certain fraction of honest delegates. Block finality and a robust consensus are key requirements before introducing sidechains and communication between different chains in Lisk. Below, I will explain in more detail what these properties mean, why they are useful and how the proposed consensus algorithm works, as well as compare it to other industry solutions.
Lightcurve Science team has published this proposal on our Research forum following the Lisk Improvement Proposals (LIPs) process. Introduced at the end of last year, LIPs provide an open way to discuss and contribute to the evolution of the Lisk platform. Check out our Research forum for many LIPs that are already proposed and ongoing discussions about them. As a science team at Lightcurve, we are researching LIPs for all the objectives defined on the Lisk roadmap and are encouraging the Lisk community to also participate in this research effort by giving feedback, discussing proposals and presenting your own ideas.
Table of Contents:
- Motivation for introducing Byzantine fault tolerance
- Proposed BFT consensus algorithm
- Basic concepts: voting on blocks, safety and liveness
- Key properties of the proposed algorithm
- Comparison with other BFT consensus algorithms
- Comparison to Tendermint BFT: separate messages slow down block proposals
- Comparison to Ethereum BFT: overhead of vote messages that need to propagate through the network and be stored
- Comparison to proof-of-work blockchains
- Comparison to Bitcoin: probabilistic guarantees depending on the number of confirmations
- Comparison to Ethereum Classic: 51 % attack proven possible for market caps of $550 million
- Next steps: community review and feedback
Motivation for introducing Byzantine fault tolerance
Now that we have established the main concepts, let us have a look at the current consensus protocol used in the Lisk blockchain. In Lisk, in every 10 second time window exactly one of the 101 active delegates (the top 101 delegates by vote weights) is allowed to add a block to the Lisk blockchain. If the network conditions are good and the active delegates are online, which is true most of the time, one block is proposed after another, always referencing the previously proposed block and hence forming only one growing chain, simply known as the Lisk blockchain. Sometimes, however, there can be multiple child blocks of the same parent block and separate growing chains. We call two blocks conflicting if neither of them is a descendant of the other. This situation can happen, for example, when not all nodes in the network receive a block due to latency or attacks on the network, or if a delegate forges multiple valid blocks in its 10 second time window.
In such a case, there are multiple valid chains, and nodes need a common rule in order to decide how to choose one of them. This rule is called a fork choice rule. In Lisk, nodes use the longest chain rule, which means that among all valid chains the longest chain is chosen, as described in the original DPoS whitepaper. However, there are some additional rules that have been put in place to prevent attackers from easily disrupting the consensus protocol using blocks with invalid heights, or by creating alternative chains that lead to the deletion of a large number of blocks. For instance, the number of nodes in the network that have the same chain is taken into account (also known as broadhash consensus). A node also never deletes more than 201 blocks when changing from one chain to another chain. This value has been chosen based on the experience that situations with conflicting blocks are almost always resolved within two rounds.
The current fork choice rule used in the Lisk network has worked well in practice, but does not provide finality, which is a guarantee that a certain block is never reverted, in all theoretically possible network situations. For instance, once a transaction is included in one block of the Lisk blockchain and 201 blocks are added as descendants of this block, it is still possible that this block is eventually not part of the Lisk blockchain, even if no delegate acts maliciously. For instance, if there is a network split for an extended period of time, a minority group of delegates could forge on one chain and the majority group another chain, both containing more than 201 distinct blocks.
Eventually, all exchanges and network participants would likely move to the majority chain, which is longer. The delegates in the minority chain now have to revert more than 201 blocks (this would require manual intervention) to be able to change to the majority chain.
Due to scenarios as described above, we want to introduce a consensus algorithm in Lisk that is based on theoretical analysis and has proven guarantees that hold even in arbitrary long periods of bad network conditions. Furthermore, we would also like to tolerate Byzantine faults, i.e., delegates that try to maliciously attack the network by sending contradicting information or delegates that are simply offline. A consensus algorithm that only relies on the minimal assumption that a certain fraction of delegates behaves honestly (follows the consensus rules) for guaranteeing finality yields the highest degree of certainty that a transaction will irrevocably be part of the blockchain in the future. In other words, these minimal assumptions yield the highest degree of trust that a specific part of a blockchain is immutable and the corresponding blocks cannot be reverted. This is important for many applications building on blockchain technology. When exchanges credit you a deposit in LSK tokens, for instance, they require a very high degree of certainty that this cannot be undone. For the future possibility of value transfers between chains in the Lisk ecosystem, it is also necessary that one chain has a very high degree of certainty that certain transactions in another chain, which initiate the value transfer, cannot be reversed.
Proposed BFT consensus algorithm
Now that we have established why finality is important, let us have a deeper look at how the proposed BFT consensus algorithm would improve the Lisk network.
Basic concepts: voting on blocks, safety and liveness
The basic idea of the proposed consensus algorithm is that delegates not only propose blocks, but also vote on what they believe to be the correct block at every height. Let us first consider a simplified model to get a better understanding of the problem. Assume that delegates are supposed to cast one vote for what they believe to be the correct block at a given height (e.g., by sending a specific message containing a vote for a block to the network). A node further finalizes a block, meaning it irrevocably deems it part of the blockchain, once a block reaches a certain number of votes.
The first question would be how to choose the right threshold of votes for finalizing a block. One idea may be that a majority is sufficient, i.e., votes by at least 51 out of 101 delegates in Lisk. Recall that we also want the consensus algorithm to tolerate Byzantine faults, in particular, malicious delegates that may send contradicting vote messages. Assume, for instance, that there are two blocks in different chains with votes by 50 delegates each.
Then one malicious delegate could vote for blocks in both chains that would be deemed final if 51 votes are already sufficient. A consensus algorithm that never finalizes conflicting blocks is said to satisfy the safety property. So in order to satisfy the safety property we would need to choose a higher threshold.
Now, assume that we want to be very conservative and only finalize a block once we receive votes by all 101 delegates. In this case, a malicious delegate could never vote and a block is never finalized so this is also not a good idea. Namely, we also want the consensus algorithm to satisfy the liveness property, which means at some point new blocks actually are finalized. Therefore, the threshold cannot be chosen too high.
By trying different thresholds, we may realize that a finality threshold of at least 68 out of 101 delegates allows for up to 33 delegates to not participate in voting and still a new block can still be finalized. Moreover, if at most 33 delegates vote for two conflicting blocks, still at most one at every height can obtain 68 votes. For our simplified example this means that by choosing the threshold of 68, we can tolerate the largest number of Byzantine delegates, namely 33 Byzantine delegates.
Actually the situation is more complex and one round of voting is not enough to satisfy the safety property, as there can be delays in the network and vote messages can get lost. If for example 68 delegates vote for a block on one chain, but never receive the votes by the other delegates and later vote for blocks on another chain, again two conflicting blocks could get 68 votes. It is therefore necessary to introduce two rounds of voting and only finalize a block after two rounds are completed successfully, which is described in more detail in the next section.
Key properties of the proposed algorithm
We call votes in the first round of voting prevotes, and those in the second round of voting precommits. Only after a block receives at least 68 prevotes, a delegate is allowed to cast precommits for it. Moreover, after a block receives 68 precommits, a block is finalized. There are some additional constraints that need to be satisfied when casting a precommit for a block. For instance, after sending a precommit for a block, a delegate will afterwards only cast prevotes for descendants of that block (unless a different chain has a block of larger height with at least 68 prevotes). The full formal description of this consensus algorithm, referred to as Lisk-BFT, and the correctness proofs are given in the related paper. The detailed specifications how this consensus algorithm could be introduced to Lisk is given in the “Introduce BFT consensus” LIP.
The main advantage of the proposed Lisk-BFT consensus algorithm is that delegates do not explicitly send separate messages. Instead, the prevotes and precommits are implied by delegates adding the largest height, at which they previously forged, to every block. Moreover, the fork choice rule requires delegates to vote and forge on the chain that contains the block of largest height with at least 68 prevotes. In order for delegates to apply this fork choice rule efficiently, every block additionally contains the height of the block with at least 68 prevotes. This means that only two additional integers in blocks and no overhead of additional vote messages enable Lisk to have a robust Byzantine fault tolerant consensus algorithm. The consensus algorithm provides both the safety and liveness property that we introduced above if at least 68 delegates are honest. In particular, two conflicting blocks are never finalized, as long as at most 33 delegates are Byzantine.
Comparison with other BFT consensus algorithms
On our road to formulating this LIP, we considered multiple other Byzantine fault tolerant consensus algorithms. Below you can find a comparison with two other popular ones.
Comparison to Tendermint BFT: separate messages slow down block proposals
In Tendermint, one of the first BFT consensus algorithms used for blockchains, two rounds of voting are necessary to finalize blocks, but block proposers send their votes in separate messages and a new block is only proposed after the previous block is finalized. This means blocks are finalized fast, but the block proposal is slowed down by two additional voting phases that need to be completed successfully before a new block can be proposed. For instance, if 101 block proposers participate in voting in Tendermint, this means that 202 different messages need to propagate through the network for every block and this data including the signatures needs to be stored. Moreover, without the use of an aggregate signature scheme, this overhead currently increases linearly with the number of proposers involved. For Lisk-BFT the blockchain can progress as fast as delegates are proposing blocks as no separate messages are necessaryand the blockchain can progress without requiring blocks to be finalized. On average, around 150 blocks have to be subsequently added as descendants of a block for it to be finalized.
Comparison to Ethereum BFT: overhead of vote messages that need to propagate through the network and be stored
Casper the Friendly Finality Gadget (Casper-FFG) is a consensus algorithm proposed for the Ethereum blockchain. In Casper-FFG the block proposal and consensus on blocks happen independently and participants send separate vote messages not for every block, but only for specific checkpoints (e.g., every 50 blocks). Depending on the exact parametrization adopted by Ethereum, the time until a block is finalized is similar to our proposal, but Casper-FFG has the overhead of additional vote messages that need to propagate through the network and need to be stored to be able to prove protocol violations of the participants. In the future we may explore to evolve the proposed consensus algorithm by using separate vote messages together with a suitable aggregate signature scheme to achieve a lot faster finality with little overhead as vote messages can be aggregated.
Comparison to proof-of-work blockchains
In this section, we want to compare the finality guarantee given by our proposed Byzantine fault tolerant consensus algorithm to the finality in proof-of-work, such as Bitcoin. Proof-of-work blockchains, where participants choose the chain that requires the most work, yield probabilistic finality assuming that a majority of miners (in terms of hash rate) is honest. Probabilistic finality means that it becomes increasingly unlikely that a block will be reverted, the more blocks are added as descendant of it, but there is never a guarantee that a block cannot be reverted.
Comparison to Bitcoin: probabilistic guarantees depending on the number of confirmations
In Bitcoin, for instance, it is recommended to wait until a transaction has 6 confirmations (i.e., the are 5 subsequent blocks building on top of the block containing a transaction) until it can be deemed part of the Bitcoin blockchain in the future with a high degree of certainty. However, an extremely lucky miner with a rather small hash rate may still find an alternative longer chain not containing this transaction even if the majority of miners is honest (see Section 11 of the Bitcoin Whitepaper or https://people.xiph.org/~greg/attack%5Fsuccess.html for concrete probabilities). Moreover, the assumption that always a majority of miners is honest does not actually hold for most proof-of-work blockchains. Due to mining marketplaces like NiceHash it is possible to temporarily obtain more than half of the hash rate of a network without a large investment into mining hardware. This can then be used to perform a 51 % attack, i.e., revert a large number of blocks that the network participants already deemed irrevocably be part of the blockchain in the future.
Comparison to Ethereum Classic: 51 % attack proven possible for market caps of $550 million
The recent 51 % attack on Ethereum Classic shows that 51 % attacks are feasible in practice, even for cryptocurrencies with a significant market capitalization. See also Crypto51 for an overview for the price of performing a 1 hour 51 % attack on different proof-of-work blockchains. This shows that for most proof-of-work blockchains there is not a very high degree of certainty that some information like transfers of tokens will irrevocably be part of the blockchain in the future.
Next steps: community review and feedback
As outlined in the LIP purpose and guidelines, the Lisk community is now encouraged to thoroughly review and give feedback in our Research forumto the “Introduce BFT consensus” LIP proposed by Lightcurve. Note that the “Improve blocks verification” objective and its implementation is scheduled for the next roadmap phase “Security and Reliability”. In the meantime, we look forward to community input on this improvement.