New Lisk ID System
In this article, we give an overview of the main improvements related to the new ID system introduced in the Lisk platform, as part of the Network Longevity phase of the Lisk Protocol Roadmap. All these changes are formally specified as Lisk Improvement Proposals (LIPs) that have been thoroughly researched and reviewed.
We start by introducing the main technical concepts used in this article. This is followed by an explanation of the new address system for the Lisk ecosystem (LIP 0018). We discuss it from the protocol perspective and also the user interface perspective. Next, this is followed by the new ID format for transactions (LIP 0019), and finally by covering the new ID system for blocks in Lisk (LIP 0020).
This article mentions some technical concepts that are important to fully understand the motivation behind the new ID system. In this section, we briefly explain these concepts in case you are not familiar with them.
A hash function is a function that maps an input of arbitrary size to a fixed-size output. One of the most commonly used hash functions is SHA-256 which has a fixed output length of 256 bits. A cryptographic hash function is a hash function that is suitable for use in cryptography and is required to have some additional properties. In this article, we are particularly interested in the following two properties of cryptographic hash functions: preimage resistance and collision resistance.
Given any hash output, it should be difficult to find an input that gives this output. Functions that lack this property are vulnerable to preimage attacks. If the output length is 128 bits, then there are 2128 possible outputs of the hash function. For a cryptographic hash function, these outputs are assumed to be equally likely. So in turn it would take 2128 attempts in expectation in order to find the input that yields the desired hash output. This is considered to be sufficiently difficult so that hash outputs of at least 128-bit lengths are assumed to be secure in the Cryptography community.
A hash function is collision resistant, if it is hard to find two different inputs such that their hash outputs are the same. When this occurs, we define this as a hash collision. Every hash function has collisions due to the fixed size of the output, and the infinite number of possible inputs. In this context, hard means that there is basically no better way to find a hash collision than trying numerous inputs until any two of them yield the same output.
Representation of Numbers
In this article, we see different number representations used to represent public keys and addresses in both the old format and the new one. The table below shows a basic example of these number representations and examples of usage:
This address format was designed with the aim to be convenient for users when reading, typing or communicating an address. However, it also has three relevant shortcomings, two of them affecting the security of the users:
- Low preimage-resistance
Currently, Lisk holders are strongly advised to perform at least one transaction from their account. This way their public key is stored in the Lisk blockchain as part of the account, and only this public key is allowed to sign any future transactions from the account. Otherwise, anyone who has a public key yielding the same address would be able to access all funds in the account. For 64 bits this is still highly unlikely, but as mentioned previously, 128-bit preimage resistance is considered safe, whereas 64-bit preimage resistance is generally considered too little for the long term.
- Low collision-resistance
Due to the length of only 64 bits, the probability of having an address collision, i.e., two users coincidentally generating key pairs that both yield the same address, is not negligible. Note that collisions do not provide any value to an attacker per se, but it can lead to accidental loss of money if a new user generates an existing address.
- No error-detection
The addresses do not have any mechanism to detect typos or errors in an address. Hence, mistyping a single character of the address in a balance transfer transaction results in sending funds to a different account. In addition, those funds will be lost forever if no user has the corresponding key pair for the receiving account.
We will explain in the next subsections how the new address system solves these shortcomings and how we plan to migrate to it in the Lisk platform.
At the protocol level, the new address is the 160 most significant bits of the SHA-256 hash of the public key of the account. For example, let’s assume that we have a public key, given in hexadecimal format as shown below: `0x0eb0a6d7b862dc35c856c02c47fde3b4f60f2f3571a888b9a8ca7540c6793243`, then the address used at the protocol level is the following: `0xc247a42e09e6aafd818821f75b2f5b0de47c8235` (it is a common standard to add the prefix `0x` to hexadecimal representations).
With a length of 160 bits, new addresses are long enough so that it is computationally infeasible to find a new key pair (public key and private key), that returns an existing address. A 160-bit long string means one has to try 2160 key pairs on average to find the key pair for a given address (see Preimage Resistance above). Therefore, users do not need to perform the initial registration transaction process anymore. This also means that it is practically impossible that two independent users pick two key pairs that result in the same address.
However, dealing with these hexadecimal strings may be tedious and prone to errors for end users. We encode this 160-bit string in a way that the address format at the user interface level is easier to use, more compact, and prevents typing mistakes or copy-paste errors. Let’s see how this encoding works in the next subsection.
Address Representation for the User Interface Level
For users of the Lisk platform, addresses will always be shown in a user-friendly Base32 format and include a checksum which detects typing errors. This means that instead of ‘0xc247a42e09e6aafd818821f75b2f5b0de47c8235’, the example address above will look like ‘lsk24cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5eu’.
This user-interface address representation is always 41 characters long, contains decimal digits and lower-case letters from the Latin alphabet, and starts with the prefix ‘lsk’. Basically, the encoding of the addresses for the user interface level is divided into three main steps listed below:
- Compute a 30-bit checksum of the 160-bit address and append this checksum to the address.
- Encode the output into the Base32 format.
- Prepend the prefix “lsk” to the previous output.
In the next diagram, we can see how the whole process of address generation works in each step, and how it is applied for our initial public key: `0x0eb0a6d7b862dc35c856c02c47fde3b4f60f2f3571a888b9a8ca7540c6793243`
Let’s see why we introduced these BCH code checksum and Base32 encoding steps in the next subsections.
Checksum and error detection capabilities
A checksum is a small piece of redundant data added to a certain block of data, namely for the purpose of detecting errors that may have been introduced during its transmission or manipulation. The simplest and most common example of checksum is the parity bit. This means that you add an extra 0 or 1 to the data such that the total number of 1’s in the binary representation is even. This allows it to detect flips of a single bit or an odd number of bits. With this goal, we introduced a 30-bit checksum to be part of the address. This is computed by using the same BCH code that is used in bech32 for Bitcoin.
Thanks to this added checksum, a user can mistype up to 4 characters in the address and it is guaranteed that the application will detect it. It does not matter if the errors are introduced in the checksum part, and/or the part that represents the 160-bits address. If the user introduces more than 4 errors, there is no detection guarantee anymore, but a very high detection probability. For example, the probability of detecting 5 errors is larger than 99,9999999%.
Let’s consider again the previous address: ‘lsk24cd35u4jdq8szo3pnsqe5dsxwrnazyqqqg5eu’. For this address, the checksum (in Base32) is ‘qqg5eu’. Imagine that a user wants to send LSK to this address, The following graphic shows how sending a transaction to an address that includes 4 errors would be rejected by the wallet:
Base32 format encoding
A Base32 encoding basically means that we use a set of 32 digits, each representing 5 bits (note that 32 is 25). Base32 is especially interesting for our case because it allows us to use common numbers and letters to represent these 32 digits. Since the modern Latin alphabet has 26 letters and there are 10 decimal digits, i.e., overall 36 distinct characters, we have to choose what characters are going to represent our Base32 format. Thus, the most ambiguous characters, “i”, “l”, “0” and “1” are excluded. Also, for readability reasons, we only use lower case letters. Hence the 32 characters used in our Base32 format in order of encoding are as follows:
An address in Lisk containing a character that is not in this list will automatically result in an invalid address (excluding the ‘lsk prefix).’ To convert the address to its Base32 representation, each 5-bit chunk is mapped to the corresponding Base32 character.
Migration to the New Address System
For the majority of users, the migration to this new address system will be done seamlessly. Once the new address system is introduced, users will still be able to log into Lisk Mobile or Lisk Desktop with the passphrase as before. The only difference will be that now the interface will show the new Lisk address in the Base32 format, and users will only be able to send money to addresses in this new format. Hence the only requirement for the users is to become familiar with the new address format and its appearance, so they can continue using the platform as usual. This is the case for accounts with an associated public key, which are the majority in Lisk.
Only accounts that never send a transaction cannot be migrated to the new address system automatically, as the public key is not stored in the Lisk blockchain. For these accounts, users will have to once send a special reclaim transaction, signed with their original key pair to perform the migration.
New ID System for transactions
In Lisk, every transaction has an identifier, abbreviated as a transaction ID that is supposed to uniquely identify a transaction, either included in the blockchain or waiting for inclusion. Currently, this transaction ID is a 64-bit value that is shown in decimal representation in the user interfaces. For example, one of the first transactions included in the Lisk genesis block has the transaction ID of ‘5449806225917864483’. Transaction identifiers do not suffer the same security vulnerabilities as addresses. However, the current 64-bit length has some user experience drawbacks, as the probability of a hash collision is non-negligible. For instance, if every block contains 120 transactions, a collision is expected in approximately 11 years. This means that users might experience a scenario whereby a validly signed transaction is rejected by the network, due to the uniqueness requirement of transaction IDs. More importantly, the user interface level (e.g., Lisk explorer) would be also affected since the already included transaction and the rejected transaction have equal IDs.
The solution for this issue is as straightforward as you may assume here, which is that we need to define longer transaction IDs. Hence, the new transaction ID is the entire SHA-256 output of the transaction, which is 256 bits long. With this ID length, assuming that each block contains around 120 transactions, the probability of having a collision within the next 50 years is practically zero. For a more compact representation in the user interfaces for 256-bit IDs, we further switch from a decimal representation to a hex representation. This means, that in the future a transaction ID will appear as the following: ‘0xda63e78daf2096db8316a157a839c8b9a616d3ce6692cfe61d6d380a623a1902’ instead of ‘15822870279184933850’.
New ID System for blocks
Current identifiers for blocks, also denoted by block IDs, have the same format as transaction IDs, namely a 64-bit value derived from the block header and shown in decimal representation. For block IDs, the limited range of possible hash values provides only relatively weak protection against pre-image attacks. This means that an attacker could try to create an alternative block with the same block ID, and also the same previousBlock property as a given block in the blockchain. If they succeed, other community members might be tricked to synchronize from the attacker chain as they could be convinced this is genuine, as the altered chain would be valid. To be able to perform the whole attack, the attacker would need the keypair for the delegate of the corresponding block slot, to complete ~264 computation steps, and also the targeted users would need to synchronize to the altered blockchain. This means that this attack is very expensive and complex to achieve practically.
Nevertheless, in order to ensure the immutability of the Lisk blockchain for a long time into the future, the new block ID is the entire SHA-256 output of the block header. This 256-bit length provides resistance against pre-image attacks of 256 bits, which makes the explained attack infeasible in any situation in the foreseeable future.
This blog post gives an overview of the new ID system, but all the details are given in the specific LIPs. For further questions and comments on the topic, we will host an AMA on Lisk.chat with Iker Alustiza (Research Scientist) this Friday, July the 24th at 4 pm CEST. We also invite all community members to go to the Lisk Research forum where we are always happy to hear your feedback and to participate in discussions on this and other topics.
Lisk is on a mission to enable you to create decentralized, efficient, and transparent blockchain applications. Join us: