# Blockchain 101
To understand the interchain, you must first understand blockchain in general. This first chapter will help to demystify the foundations of this powerful technology. The background information contained will be important as you progress through the course.
The modules include learning materials for your studies, as well as examples and exercises for you to practice. You will also find freely accessible resources from other sites to help you dive deeper into the subject matter at hand.
# What is a blockchain? And what does it solve?
A blockchain is a distributed ledger that records all transactions on a network. All nodes in such a network need a copy of said ledger. The network and ledger need to run continuously even if nodes join and leave unpredictably. Nodes that join the network must be able to sync up with the latest ledger state. The ledger state must be secure, with strong defenses to prevent malicious nodes from inserting invalid information.
Therefore, a blockchain is a highly available network that runs a stateful protocol, always operating as designed and never permitting a departure from well-defined rules.
Blockchain technology changes fast and is difficult to understand in its entirety, but the basics are here to stay. This video provides a quick introduction to the concepts that underpin blockchain, including how data is encoded in transactions and added to a block, and how blocks and transactions spread through a network.
Blockchain is a solution to a particular problem: the double-spending problem.
We are all familiar with digital artifacts, like text files or images, and the ease with which they can be copied. This presents obvious problems if digital artifacts are intended to represent assets with value. When it comes to such digital tokens, a more pressing issue is the possibility of spending a token more than once, also known as double-spending. What prevents someone from making copies and spending the same token twice?
The double-spending problem refers to the challenge of designing a digital cash system in which tokens are digital artifacts but cannot be spent more than once.
Satoshi Nakamoto, whose identity remains shrouded in mystery, published a seminal whitepaper in October 2008 that presented a solution to the double-spending problem for digital currencies. In doing so, Nakamoto revealed the underlying technology known as blockchain, and an example of blockchain's possible application in the form of a simple implementation called Bitcoin.
Want to take a closer look at the first blockchain implementation whitepaper? See Satoshi Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System. It is a fairly straightforward paper.
Bitcoin has gained widespread attention since then and the world has discovered blockchain's usefulness in many environments and a great variety of possible use cases. The interchain is one such blockchain implementation.
# P2P networking & distributed networks
Networks can be centralized, decentralized, distributed, or decentralized and distributed.
Because blockchains are decentralized ledgers, they run on peer-to-peer (P2P) networks. Security is challenging in P2P networking for two reasons:
- P2P software has to be downloaded to join a network, making it especially vulnerable to remote exploits.
- Malicious participants can send incorrect requests or responses, as well as malware, and because of interconnectivity corrupted data may propagate throughout the network.
Other security risks include denial-of-service (DoS) attacks, routing attacks, and routing network partitions.
A "secure" P2P network needs to repel malicious and erroneous input. A key example of this is double-spending.
# How does blockchain prevent double-spending?
In the current financial system, double-spending is avoided by involving legacy actors and institutions of the financial sector, i.e. centralized third parties that manage and control financial transactions. A third party such as a bank, credit card company, or payment service is used as a trusted ledger keeper. They maintain the digital ledgers and do not allow funds to be spent twice.
Consequently, it is generally not possible for two parties to exchange value online without involving a trusted third party to handle the settlement process and update their ledgers, as well as the account balances.
At a high level, blockchain solves the double-spending problem by replacing the trusted, central ledger-keepers with a decentralized and distributed ledger that is maintained by a large network of ledger-keepers. Each member of the network has an exact replica of the ledger, and no one can update the ledger without establishing consensus with the other ledger-keepers.
It is as though each transaction is observed by a large crowd of witnesses who reach consensus about proposed changes. The crowd prohibits events that should not occur, such as spending the same funds twice (i.e. double-spending).
Bitcoin and its underlying technology convincingly demonstrated that a network of participants that do not necessarily trust each other can achieve consensus about the validity of a transaction, its history, and the resulting state of the ledger. This is interesting, because simple account balance ledgers and protocols to move funds are far from the only use cases for distributed consensus.
# How does blockchain work?
Imagine you want to retain and monitor changes to a file, for example a log file. Now, imagine you also want to verify an unbroken history of all changes ever made to the file. How can you proceed?
A well-understood solution uses cryptographic hash functions. Let us briefly introduce this concept in case you are unfamiliar with it.
The ideal cryptographic hash function has five main properties:
- Deterministic: the same message always results in the same hash.
- Fast: the hash value for any given message is computed quickly.
- Resistant: it is not feasible to generate a message from its hash value except by trying all possible messages.
- Uncorrelated: a small change to a message alters the hash value so extensively that the new value shows no relation to the old.
- Collision-resistant: it is infeasible to find two different messages with the same hash value.
You can see hashing in action to get a feel for it here: https://www.browserling.com/tools/all-hashes. As you type into the text box, the hash updates automatically. Even a minuscule change to the input creates completely different hashes. You can also see that different hashing algorithms produce different output. Hash algorithms have evolved over time, often for security reasons. Try it out!
Notice that there are many different hashing algorithms that aim for similar results and fulfill the properties described above. Each algorithm consistently produces hashes of the same size regardless of the input's size.
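The properties above can be observed locally with Python's standard `hashlib` module. This is only a small sketch: SHA-256 and SHA-512 are chosen here as examples, and the helper name `sha256_hex` is made up for illustration.

```python
import hashlib


def sha256_hex(message: str) -> str:
    """Return the SHA-256 digest of a text message as a hex string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


# Deterministic: the same message always results in the same hash.
same = sha256_hex("hello") == sha256_hex("hello")

# Uncorrelated: changing a single character gives a different digest.
differs = sha256_hex("hello") != sha256_hex("hellp")

# Fixed size: SHA-256 always yields 32 bytes (64 hex characters),
# whether the input is one character or a million.
short_len = len(sha256_hex("a"))
long_len = len(sha256_hex("a" * 1_000_000))

# A different algorithm gives a different output size for the same input.
sha512_len = len(hashlib.sha512(b"hello").hexdigest())

print(same, differs, short_len, long_len, sha512_len)
```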
A hash can be used to prove an input exactly matches the original, but the original cannot be reconstructed from a hash. So, a hash function can demonstrate that a copy of the file is an authentic replica of the original in every detail.
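As a brief illustration of this check, the sketch below compares the hash of a copy against the hash of the original. The log line contents and the name `fingerprint` are invented for the example.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents, used only for illustration.
original = b"2024-01-01 service started\n"
published_hash = fingerprint(original)

faithful_copy = b"2024-01-01 service started\n"
altered_copy = b"2024-01-01 service stopped\n"

# Only a byte-for-byte identical copy reproduces the published hash.
copy_is_authentic = fingerprint(faithful_copy) == published_hash
altered_is_authentic = fingerprint(altered_copy) == published_hash

print(copy_is_authentic, altered_is_authentic)
```

Note that the comparison only needs the published hash, not the original file itself.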
What about subsequent changes to the file? How would you demonstrate that a series of changes is authentic, complete, and correctly ordered?
Accountants have known the advantages of "append-only" ledgers for centuries, so suppose that you will only add new entries (i.e. make changes) to the end of the file. You start with an empty file and then append a series of changes that, when replayed in order, will produce the current state of the file - Git users are familiar with this concept.
How can a hash function help you be certain that a series of entries is the unbroken chain of inputs? You make a rule that states that, in addition to the new content, the previous hash will also be an input of the next hash.
In pseudo-code, the rule reads: `hash_of_version_n = hash(hash_of_version_n_minus_1 + new_content_of_version_n)`.
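A minimal Python sketch of this chaining rule follows. It assumes SHA-256 as the hash function and an all-zero placeholder as the "previous hash" of the empty file; the names `next_hash` and `GENESIS_HASH` are illustrative, not taken from any particular implementation.

```python
import hashlib

# Assumption: the empty starting file is represented by an all-zero hash.
GENESIS_HASH = "0" * 64


def next_hash(previous_hash: str, new_content: str) -> str:
    """Hash the previous version's hash together with the appended content."""
    data = (previous_hash + new_content).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


version1_hash = next_hash(GENESIS_HASH, "first entry")
version2_hash = next_hash(version1_hash, "second entry")
version3_hash = next_hash(version2_hash, "third entry")

print(version1_hash)
print(version2_hash)
print(version3_hash)
```

Because each hash depends on the one before it, the hash of the third version indirectly commits to the first and second entries as well.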
This way, you can examine candidate changes, and confirm that the proposed changes belong to the known, authentic, previous version of the file. This method ensures that changes to the file are accurately disclosed. Some further basic examples of hashing can be found in the section Cryptographic Fundamentals of Blockchain.
This process repeats for all subsequent versions. Any version of the file contents can be shown to be part of an unbroken chain of changes all the way back to the file's inception. This is pure mathematics.
Any departure from this scheme (e.g. a hash that does not compute as expected) proves a break in the history, and the affected version is therefore invalid.
Since the hash of the latest valid version is an input to the next version's hash function, it is not possible to generate a new valid version without knowledge of the valid version that precedes it. This process forces changes to be appended to a previous valid version.
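To make the verification step concrete, here is a hedged sketch that rebuilds each hash from the start and reports the first link that does not compute as expected. SHA-256, the all-zero starting hash, and the function names are assumptions made for this example.

```python
import hashlib

GENESIS_HASH = "0" * 64  # assumption: placeholder hash for the empty file


def next_hash(previous_hash: str, new_content: str) -> str:
    """Hash the previous version's hash together with the appended content."""
    return hashlib.sha256((previous_hash + new_content).encode("utf-8")).hexdigest()


def build_history(entries):
    """Return (content, hash) pairs, each hash chained to the previous one."""
    history, previous = [], GENESIS_HASH
    for content in entries:
        previous = next_hash(previous, content)
        history.append((content, previous))
    return history


def first_broken_link(history):
    """Return the index of the first entry whose hash is wrong, or None."""
    previous = GENESIS_HASH
    for index, (content, recorded_hash) in enumerate(history):
        if next_hash(previous, content) != recorded_hash:
            return index
        previous = recorded_hash
    return None


history = build_history(["first entry", "second entry", "third entry"])
intact = first_broken_link(history)

# Forge the second entry but keep its old hash: the break shows up at index 1.
tampered = list(history)
tampered[1] = ("forged entry", history[1][1])
broken_at = first_broken_link(tampered)

# Forge the second entry and recompute its hash: the next link breaks instead.
rehashed = list(history)
rehashed[1] = ("forged entry", next_hash(history[0][1], "forged entry"))
broken_after_rehash = first_broken_link(rehashed)

print(intact, broken_at, broken_after_rehash)
```

The second forgery shows why a change cannot be slipped into the middle of the history: every later hash would have to be recomputed as well.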
Blockchains function similarly: blocks of transactions are appended, using hashes of previous blocks as inputs into hashes of subsequent blocks. Any participant can quickly verify an unbroken chain of blocks (i.e. the correct historical order).
Transaction blocks are logical units that wrap a set of transactions in a specific order. While the implementation details are somewhat more subtle, for now think of this as a set of transactions that occurred during a specific time interval and in a specific order.
Transaction ordering is surprisingly challenging in a distributed system due to design goals and constraints. The example of Bitcoin and its novel solution is valuable to understanding how this challenge can be addressed.
In case the foregoing is not clear:
- A valid block is a well-ordered set of transactions.
- Each block contains the hash of the previous block.
- The hash of the block has properties that are especially difficult to generate but very easy to verify.
A well-ordered set of blocks that each contain well-ordered transactions is a well-ordered set of all transactions that have ever occurred.
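The three points above can be sketched as a toy chain of blocks. As an illustration of a hash that is hard to generate but easy to verify, the sketch uses a simplified proof-of-work in the spirit of Bitcoin: a nonce is searched until the block hash starts with a few zeros. The difficulty value, the JSON encoding, the transaction strings, and all function names are assumptions for this example and do not reflect any real protocol's block format.

```python
import hashlib
import json

# Hypothetical toy difficulty: the hash must start with three zero hex digits.
DIFFICULTY_PREFIX = "000"


def block_hash(previous_hash, transactions, nonce):
    """Hash the previous block's hash, the ordered transactions, and a nonce."""
    payload = json.dumps(
        {"previous_hash": previous_hash, "transactions": transactions, "nonce": nonce},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def mine_block(previous_hash, transactions):
    """Try nonces until the hash meets the difficulty (costly to generate)."""
    nonce = 0
    while True:
        digest = block_hash(previous_hash, transactions, nonce)
        if digest.startswith(DIFFICULTY_PREFIX):
            return {
                "previous_hash": previous_hash,
                "transactions": transactions,
                "nonce": nonce,
                "hash": digest,
            }
        nonce += 1


def is_valid_block(block):
    """Recompute a single hash to check the block (cheap to verify)."""
    digest = block_hash(block["previous_hash"], block["transactions"], block["nonce"])
    return digest == block["hash"] and digest.startswith(DIFFICULTY_PREFIX)


def all_transactions(chain):
    """Flatten ordered blocks of ordered transactions into one ordered list."""
    return [tx for block in chain for tx in block["transactions"]]


genesis = mine_block("0" * 64, [])
block1 = mine_block(genesis["hash"], ["alice->bob:5", "bob->carol:2"])
block2 = mine_block(block1["hash"], ["carol->alice:1"])
chain = [genesis, block1, block2]

print(all(is_valid_block(b) for b in chain))
print(all_transactions(chain))
```

Mining a block takes thousands of hash attempts on average at this toy difficulty, while checking it takes one.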
There are some important constraints to keep in mind when talking generally about transaction ordering:
- In a truly distributed network, no one's clock is considered more authoritative than anyone else's clock: a blockchain is a distributed timestamp server without a central network time.
- Because of physics and network latency, even if all members of a network mean well and participate honestly, everyone in the network will learn about transaction proposals in a slightly different order and each node will arrive at a slightly different opinion about the ordering of transactions.
Although there is no obvious way to settle it, transaction order must be resolved because processing transactions out of order would produce non-trivial differences in outcomes. Such a non-trivial difference could be an instance of double-spending. Without agreement about the transaction order, there can be no agreement about the balance of accounts.
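A small sketch shows how ordering changes the outcome. The accounts, amounts, and the `apply_in_order` helper are hypothetical; the only rule assumed is that a transfer is rejected when the sender's balance is too low.

```python
def apply_in_order(balances, transfers):
    """Apply transfers one by one, skipping any the sender cannot afford."""
    state = dict(balances)
    accepted = []
    for sender, recipient, amount in transfers:
        if state.get(sender, 0) >= amount:
            state[sender] -= amount
            state[recipient] = state.get(recipient, 0) + amount
            accepted.append((sender, recipient, amount))
    return state, accepted


start = {"alice": 10}
pay_bob = ("alice", "bob", 10)
pay_carol = ("alice", "carol", 10)  # tries to spend the same 10 tokens again

# Two nodes that saw the proposals in a different order disagree on the result.
state_a, accepted_a = apply_in_order(start, [pay_bob, pay_carol])
state_b, accepted_b = apply_in_order(start, [pay_carol, pay_bob])

print(state_a)
print(state_b)
```

In both orderings only one of the two payments succeeds, so the double-spend is prevented, but which payment succeeds depends entirely on the agreed order.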
So, how is the correct order of transactions determined? There are now a number of different approaches to achieving consensus in blockchain technology. You can find more information in the section Consensus in Distributed Networks.
However, cryptographic hash functions are instrumental to all forms of consensus, in that they empower all participants to ensure that they possess an undistorted history of everything. Since all nodes can verify the chain independently, they can proceed on the assumption that all other nodes will eventually come into agreement about the history of everything. This is known as eventual consensus.
# Deterministic, atomic operations - all or nothing
What are the implications of blockchain's way of ordering transactions and blocks on its overall state?
A blockchain starts with a known state. This is a simple matter of an initialized universe in which nothing has happened. It is often referred to as the genesis block.
It proceeds by constructing a verifiable and widely agreed history of everything that has ever happened on an append-only basis. Nodes independently construct a present state of the universe by reviewing the ordered history of every change (i.e. the transactions) that has ever occurred. This comprehensive history moves forward in time as "lottery winners" announce new transaction blocks and these are accepted as valid by a consensus of network participants.
Thus, the state changes as transactions are included in blocks and those blocks become part of the chain.
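The idea of reconstructing the present state from the ordered history can be sketched as a replay from the genesis state. The genesis balances, the transaction tuples, and the function names below are assumptions for illustration only.

```python
# Hypothetical genesis state: the initial, agreed-upon balances.
GENESIS_STATE = {"alice": 10, "bob": 0}


def apply_transaction(state, tx):
    """Return the state after one transfer; invalid transfers change nothing."""
    sender, recipient, amount = tx
    if state.get(sender, 0) < amount:
        return state
    new_state = dict(state)
    new_state[sender] -= amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def replay(blocks):
    """Rebuild the current state by applying every transaction in order."""
    state = dict(GENESIS_STATE)
    for block in blocks:
        for tx in block:
            state = apply_transaction(state, tx)
    return state


blocks = [
    [("alice", "bob", 4)],
    [("bob", "alice", 1), ("alice", "bob", 20)],  # the last transfer is unaffordable
]

# Two independent nodes replaying the same history reach the same state.
node_1_state = replay(blocks)
node_2_state = replay(blocks)

print(node_1_state)
```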
In computer science, an event is said to be "atomic" if it cannot be split into smaller parts. For example, the statement `x = y` is atomic if the language guarantees that `y` cannot be partially copied to `x`.
In the world of databases, atomicity is often specified by the developer by grouping multiple operations in a wrapper such as a commit or rollback block, to ensure that all of the steps are complete or none of the steps execute at all. This method is often used to ensure database integrity.
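As one possible illustration of database atomicity, the sketch below uses Python's built-in `sqlite3` module, where using the connection as a context manager commits on success and rolls back if an exception escapes. The `accounts` table, the balances, and the `transfer` helper are invented for the example.

```python
import sqlite3


def transfer(conn, sender, recipient, amount):
    """Credit and debit in one transaction; return True only if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, recipient),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 10), ("bob", 0)])
conn.commit()

ok = transfer(conn, "alice", "bob", 7)      # succeeds: alice 3, bob 7
failed = transfer(conn, "alice", "bob", 7)  # debit violates the CHECK constraint

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to the recipient had already been executed, yet the rollback undoes it, so none of the steps take effect.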
In the context of a blockchain, a transaction is a single instruction allowed by the protocol, signed as required by the protocol, and sent to the blockchain through a local node connected to the network. Transactions are either completely successful or they fail. Generally, the actual result cannot be known with certainty until the transaction is included in a block to establish an execution order. For example, a transaction to send funds from Alice to Bob depends on Alice's balance at execution time.
All nodes must arrive at the same conclusion; given a transaction in a certain sequence, all nodes must agree on the result. This means the protocols must be deterministic. Either the transaction was successful or it failed, and the effect must be indisputable.
Therefore, blockchain transactions are both deterministic and atomic.
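The all-or-nothing behavior can be sketched with a transaction made of several steps that are applied to a working copy of the state and discarded as a whole on failure. The fee, the account names, and the helper functions are hypothetical and only serve the illustration.

```python
class InsufficientFunds(Exception):
    """Raised when an account cannot cover a debit."""


def debit(state, account, amount):
    if state.get(account, 0) < amount:
        raise InsufficientFunds(account)
    state[account] -= amount


def credit(state, account, amount):
    state[account] = state.get(account, 0) + amount


def execute(state, sender, recipient, amount, fee):
    """Run every step on a copy; return (new_state, True) or (old_state, False)."""
    working = dict(state)
    try:
        debit(working, sender, fee)       # hypothetical fee, charged first
        credit(working, "fees", fee)
        debit(working, sender, amount)
        credit(working, recipient, amount)
    except InsufficientFunds:
        return state, False               # nothing from the copy is kept
    return working, True


state = {"alice": 10, "bob": 0}
state_ok, succeeded = execute(state, "alice", "bob", 5, 1)

# The fee step would succeed here, but the transfer step fails,
# so the whole transaction is discarded, including the fee.
state_fail, succeeded_again = execute(state_ok, "alice", "bob", 4, 1)

print(state_ok, succeeded)
print(state_fail, succeeded_again)
```

Since `execute` depends only on its inputs, every node that runs the same transaction against the same state reaches the same result.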
To summarize, this section has explored:
- How blockchain is a form of append-only ledger, a database in which immutably ordered blocks of immutably ordered transactions provide an easily verifiable historical record that can be held and updated across multiple nodes of a network.
- How blockchain is a solution to the double-spending problem, which provides a way to ensure that digital tokens (such as those of an online currency) can only be spent once, despite the relative ease with which digital files and artifacts can be copied.
- How networks can generally be categorized as centralized, decentralized, or distributed, with blockchain using the advantages of distributed, decentralized networking to minimize or completely remove the need for any form of centralized authority.
- How blockchain uses the deterministic, fast, resistant, uncorrelated, and collision-resistant properties of hash functions to allow for easy and rapid verification of an unbroken history of all changes ever made to its ledger of transactions.