AMA Recap: Sparse Merkle Trees and the New State Model

In the previous research blog post "Sparse Merkle Trees and New State Model", we introduced one of the cornerstones of Lisk-interoperability: The state model of an interoperable chain. The blog post describes the concept of sparse Merkle trees, first introduced to the Lisk protocol in LIP 0039, and explores the state model, the state tree, and the new storage interface.

In order to understand what will change from a developer perspective when interoperability is implemented into the new SDK we published a blog post and also hosted a live AMA (Ask Me Anything) on Lisk.chat to give the community a chance to learn more about the topic of Sparse Merkle Trees and the New State Model. In this blog post, we recap the full AMA session and feature the questions asked by community members, as well as the answers of Alessandro Ricottone, Research Scientist.

By Lisk

28 Jun 2021

AMA_Sparse_Merkle_Trees&New_State_Model_Social@2x.png

In this blog post, we recap the full AMA session and feature the questions asked by community members, as well as the answers of Alessandro Ricottone, Research Scientist.

Alessandro opens the AMA Hello everyone!
Sand: Hi Alessandro, could you explain the regular Merkle tree
Alessandro: First things first, let me link my previous [blog post](https://lisk.com/blog/research/introducing-lisk-tree). In short, a regular Merkle tree (RMT from now on) is an authenticated data structure organized as a tree. Authenticated because it is possible to hold a single hash, the Merkle root, to commit to the underlying array.

This means that there is a "one-wayness" if you want, where each possible array corresponds to a unique Merkle root. It's organized as a tree because rather than just hashing the array directly, we put each element in a leaf node and then hash those pairwise until we get to the root the advantage of doing so is that basically you can show that an element is part of the underlying array with a short proof, where short means that it scales logarithmically with the number of elements, rather than linearly.

Moreover, the tree structure allows the prover to prove (nomen est omen) that a certain element is in the array without revealing any other element. If anything is unclear at all, please ask, we're here for this. The defining feature of a Merle tree is that the order of the array matters. If you switch two elements in the array, you get a different root.

That is a very useful property if we want to be sure that indeed nobody is messing with the order of the array. For instance, our interoperability solution uses RMT for the inbox/outbox, i.e. the data structures holding the cross-chain messages (CCM from now on). In this case it's very important to authenticate the order, because we wanna make sure that messages are received in the same order in which they were sent RMT for transactions.
Przemer: Are there any projects that use Sparse Merkle trees
Alessandro: Projects as in other blockchains? Sure! Basically every other project that wants to implement interoperability uses sparse Merkle trees (with slightly different implementations).
Sand: Why do Sparse Merkle Tree come into the picture?
Alessandro Good question. RMT are good at authenticating arrays. Sparse Merkle trees (SMT from now on) are good to authenticate generic key-value maps.

This is because RMT cannot be updated efficiently when you try to insert a new element in the middle of the array, but only when you append at the end (they are also called append-only)

If you instead want to update a generic element, then you need SMT. The point being that in a SMT every leaf node already exists in the correct position (given by the key of the entry in the key-value map)

Going back to the SMT topic. The state of a blockchain created with the Lisk SDK is basically a giant key-value map. Every module can define the entries that get into this map, which gets updated whenever a transaction is processed (or in general when the state is modified). Then the state tree is calculated as a sparse Merkle tree on top of the state key-value map. The tree root is the state root

I think the biggest take-home message is that this new state model is much more powerful in terms of what a dev can do with it. For instance, now it is possible to prove that something happened on another chain, with an inclusion proof against that chain state root (i'm omitting some details)
Romeo: What are the practical advantages for the developers by introducing sparse Merkle trees to the Lisk protocol?
Alessandro: First of all, introducing the sparse Merkle trees was really necessary to define the state root, and the state root is really necessary for interoperability.

But as I write in the blogpost there are also other advantages, like checking the consistency of the state or implementing light clients in the future.

From a dev perspective, I would say that first of all the storage interface now is much more powerful, you just have a generic key-value map for your module and you can store whatever you need there. Furthermore, applications using stuff like SPV are now easier to implement (as I mentioned before).

Jurre | Moosty: Thanks for all the clarification so far. I'm not a computer scientist so this may sound like a not so smart question but I try to understand where the trees are used. Do I understand correctly that it is specifically to calculate a "state of the chain" and that you can prove these states to other chains? Or is it used in other circumstances too?
Alessandro: At the moment, it is only used to authenticate the state of the chain.

In other words, you start with this big key-value map, containing the full state of the chain. Why? Well, a single hash is much easier to communicate to other chains, it's just 32 bytes!

This hash is signed by the chain validators, and since it's a one-way function, it completely authenticates the state, meaning that you can prove with an inclusion proof that a certain key-value entry is in the state.
LiskPoland: How will this check be implemented? Additional nodes (validators?) is this are needed in the network? how it might be implemented physically?
Alessandro: It's all done automatically. The state root is included in the block header. When you receive the block you process it and calculate the state root. You compare it with the one in the block header and hope they are the same. There are no additional nodes, every node does this.
Jurre | Moosty: Thanks! A totally different question then. As you explain it, it sounds very logical and easy and a must for every blockchain project. (In short, I hope I am correct); We use SMT because it can hold key value pairs instead of RMT that hold arrays and are appended only.) How did you get to this solution? Was it already planned a long time ago? Did you compare different other solutions?
Alessandro:We certainly studied very carefully other solutions in the ecosystem. In the end we opted for this because it was the one that better fulfilled our requirements: guarantee of short proofs, non-inclusion multi proofs, of course security requirements.... Also we opted for something relatively simple and not error prone, this is like the cornerstone of interoperability.
LiskPoland: Are you going to improve it (SMT) in the future somehow? Can it be improved?
Alessandro: I don't think it will be necessary. It could be that for performance reason we do something fancy like concurrent update or batching nodes together, but I think it should not be necessary.
Jurre | Moosty: Something else, again.. What will you work on next? What has been your favourite research topic so far?
Alessandro: We have the new research topics that we presented at Liskjs 2021. My favourite topics are in general the most theoretical ones, like I really enjoyed looking at cryptographic accumulators and in general crypto/math topics.
Gaxda: There's a new position for Research Scientist in Lightcurve, a lot of work still to do or someone leaves the team?
Alessandro: A lot of work! Nobody is leaving the team :) we are kind of like a family in the research group (I know Maxime will probably cry when he reads this). In all seriousness, we wanted to give a boost to research to go get the future goals that Jan presented as soon as possible.