Simply put, a Merkle Tree is a method in cryptography and computer science to store and represent data in a concise and secretive (encoded) form. This means a large amount of data can be represented in a concise and encoded (Hash) manner.
In order to dig deep the Merkle Trees, we need to understand the “Hash Function”.
A Hash is a function in cryptography, which when applied to data of any length or form (text, audio, picture, etc.) produces a unique output called ” Digest” of a specific length. The Hash function has the below given characteristics.
- Hash of any two data values x and y cannot be the same. That is for any two data values x and y we can not find Hash(x) and Hash(y), such that Hash(x)=Hash(y). Hence any data has a unique Hash Value which can be treated as its fingerprint.
- Hash of any data of any length and form is of the same length. The figure below depicts SHA 256 (A Hash function) applied to different data values.
Merkle Trees in Blockchain
Data in the Blocks in most Blockchains are stored in the form of Merkle Trees. As per Wikipedia the definition of Merkle Trees is as follows:
In cryptography and computer science, a hash tree or Merkle tree is a tree in which every leaf node is labelled with the cryptographic hash of a data block, and every non-leaf node is labelled with the cryptographic hash of the labels of its child nodes.
Let us take the example of a Blockchain with three blocks, namely A, B and C. See the figure below. The Block B which is being discussed here has (suppose) transactions A to F, namely T(A), T(B), T(C), T(D),……….., T(F).
- Each of the transactions will be converted into its corresponding Hash value, i.e., H(A), H(B), H(C),……………,H(F).
- Each Hash value is paired with the Hash value next to it to create a new Hash Value. H(A) + H(B)=H(AB).
- In case of a single Hash value with no nearby Hash value to pair, it creates a Hash with itself, i.e. H(EFEF).
- This process continues, until we get a single Hash Value of all the transactions of the current block, i.e. H(ABCDEF) in the figure above. This Hash value is called the Merkle Root.
- The Merkle or Hash Root is saved in the Block Header of the current block and the next block in the chain.
Why are Merkle Trees used in Blockchain
- Scalability : Theoretically it is possible to store data in a block without employing a Merkle Tree. But then contrary to the concise Hash value of all the combined transactions in the block, we will get giant Block Headers with all the transaction values of the block. These giant block headers will pose scalability challenge and will limit the use of Blockchain to only the super computers with immense storage capacities.
- Immutability: The Merkle Root of all the transactions of a block of a blockchain is saved in the Block Header of the current and the next block in the blockchain. Now if any data tampering happens, it will change the hash value which in turn will alter the Merkle root. This altered Merkle root will not match with the original root saved in the Block Header of the next block which will be easily detectable. This is the magic formula with which data remains tamperproof and secure online in a public blockchain such as Bitcoin or Ethereum.