Lisk Core 2.0.1 Released to Fix a P2P Network Vulnerability

Today, we released Lisk Core 2.0.1 to both Testnet and Mainnet. The patch fixes a vulnerability in the way nodes manage their peer lists, which led to unsustainable growth in CPU and memory usage across all nodes in the network. This update is for node operators only: all funds are safe, and Lisk Network users do not need to take any action.

By Lisk, 31 Jul 2019

On July 15th, we received a Bug Bounty Program submission describing a number of vulnerabilities, three of which we accepted as valid. Two of those three relate to the P2P layer of our application, and one of them was a possible attack vector involving spamming the network with a huge number of invalid peers. In the meantime, a separate issue was opened on GitHub highlighting that unreachable peers were not being removed. We had planned to fix all the P2P-related issues in the upcoming Lisk Core 2.1.0 release (based on Lisk SDK 2.3.0), because that release already brings many changes and improvements to our P2P protocol that make some of these attack vectors no longer possible.

However, two days ago, a slightly different variation of the invalid peers spam attack was executed on Testnet. Almost 30,000 invalid peers were announced to the network and then propagated through it over the following hours. Nodes kept trying to connect to all of these peers at once, and the sheer number of simultaneous connections opened in a short period of time overwhelmed them. Testnet lost stability because nodes were too busy to forge and validate blocks on time. Shortly afterwards, we received two additional Bug Bounty Program submissions describing the attack in detail.

Because the vulnerability had been disclosed publicly and left both networks exposed, we decided to release a patch with the following mitigations:

Remove a peer on the first failed connection attempt (previously, peers were kept as disconnected and nodes tried to re-establish the connection every 30 seconds).
When discovering new peers through the list remote procedure call (RPC), the node accepts only 100 of the peers it receives (previously it accepted all of them, however many there were).
During peer discovery, the node announces only its connected peers in response to RPC calls from other nodes (previously it returned up to 100 known peers, regardless of their connection status).
In the consensus calculation and when sending requests to other peers, the node picks only one peer per IP address (this prevents many nodes sharing an IP address from having a negative impact on the network).
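
To make the four mitigations concrete, here is a minimal Python sketch of a peer list that follows them. It is an illustration under stated assumptions, not Lisk Core's actual code (which is written in JavaScript): the names `PeerBook`, `Peer`, `add_discovered`, `on_connect_failed`, `peers_to_announce`, `one_per_ip` and `MAX_PEERS_PER_DISCOVERY` are all hypothetical, and applying the 100-peer cap to each received discovery list is an assumption about how the limit works.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Dict, Iterable, List

# Hypothetical constant: the release notes say a node accepts only 100 peers
# during discovery; applying it per received list is an assumption.
MAX_PEERS_PER_DISCOVERY = 100


@dataclass(frozen=True)
class Peer:
    ip: str
    port: int


@dataclass
class PeerBook:
    # Maps each known peer to whether it currently has an open connection.
    peers: Dict[Peer, bool] = field(default_factory=dict)

    def add_discovered(self, received: Iterable[Peer]) -> int:
        """Consider at most MAX_PEERS_PER_DISCOVERY entries of one discovery reply."""
        added = 0
        for peer in islice(received, MAX_PEERS_PER_DISCOVERY):
            if peer not in self.peers:
                self.peers[peer] = False  # known, but not yet connected
                added += 1
        return added

    def mark_connected(self, peer: Peer) -> None:
        self.peers[peer] = True

    def on_connect_failed(self, peer: Peer) -> None:
        # Drop the peer after the first failure instead of keeping it as
        # "disconnected" and retrying it later.
        self.peers.pop(peer, None)

    def peers_to_announce(self) -> List[Peer]:
        # Answer other nodes' discovery requests with connected peers only.
        return [p for p, connected in self.peers.items() if connected]


def one_per_ip(peers: Iterable[Peer]) -> List[Peer]:
    """Keep only the first peer seen for each IP address."""
    seen = set()
    selected = []
    for peer in peers:
        if peer.ip not in seen:
            seen.add(peer.ip)
            selected.append(peer)
    return selected


if __name__ == "__main__":
    book = PeerBook()
    # Simulate a flood of roughly 30,000 announced peers, as seen on Testnet.
    flood = [Peer(f"10.0.{i // 256}.{i % 256}", 5000) for i in range(30_000)]
    print(book.add_discovered(flood))  # 100
```

Dropping a peer on its first failed connection trades some tolerance for brief outages for bounded resource usage: a flood of unreachable addresses can no longer pile up as disconnected entries that the node keeps retrying.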

The situation on Testnet is stabilizing, as node operators are upgrading quickly. The full release notes are now available on GitHub.