# Consensus in Distributed Networks
A known challenge in distributed computing, which is provably unsolvable but can nonetheless be mitigated, is how to reach consensus in a hierarchy-free, permissionless, and failure-prone network. In a distributed network without authorities, we need a process to reach consensus about what is to be considered the truth. This is referred to as distributed consensus.
This problem is commonly known as the Byzantine Generals Problem and mitigation strategies for it are subsumed under the term Byzantine Fault Tolerance (BFT).
An overview of well-known blockchain solutions to this provides perspective and insight into the options the interchain offers because it allows a network designer to choose the consensus algorithm that is best suited to the developer's purpose.
# The Byzantine Generals Problem
In the traditional description of the problem, generals whose armies are spread around a target city need to reach consensus on a time to attack. The generals can only rely on unsecured communication channels to achieve this. A lack of acknowledgement can either be caused by a failure to deliver a message, by a dead general, or by a failure to deliver the acknowledgement. While there are variations of the problem description to adapt to varying real-world fault-tolerance situations, most descriptions include an element of catastrophe if the generals fail to coordinate their actions.
The practical challenge of blockchain reflects this hypothetical. Similar to the generals who must decide when to attack, the agreed transaction list in a distributed ledger has to be identified, and consensus on the correct order of transactions has to be reached.
Network failures in a Byzantine sense are:
- Fail-stop failures: a node fails and stops operating.
- Arbitrary node failures: a node fails to return a result, responds with an incorrect result deliberately or not, or responds with a result different from other results in the network.
When a distributed network reaches consensus even with nodes failing to respond or responding with wrong information (i.e. faulty nodes), it has Byzantine Fault Tolerance (BFT). BFT ensures that a distributed network can continue to work even with faulty nodes, as the consensus is a result of collective decision-making.
# Reaching consensus
In addition to the chain of blocks, one of the innovations introduced by Bitcoin was to use Proof-of-Work (PoW) to reach consensus. Since then, many other consensus algorithms have been proposed and employed. One should note that there were consensus algorithms presented in the academic community before Bitcoin.
Many teams have experimented with distributed consensus using many algorithms deployed to different chains. The algorithms that exist can be compared in terms of the distribution of authority, expected performance, and transaction finality. Transaction finality can be categorized as:
- Probabilistic: it is exponentially more improbable that a seen transaction will be reversed.
- Deterministic: the protocol prevents alterations in a definite way.
Remember, in blockchain individual transactions are sent to the network from individual nodes. Each node must pass (or fail to pass) transactions to other nodes. Because of the time delay required for data to physically travel across the network (i.e. physical latencies), not all nodes will see the same transactions at the same time.
Each node must therefore build its own order of transactions. Since all nodes participate equally, there is no authoritative order of transactions. Nevertheless, the network must decide which node's version (or any version) of the truth will become the authoritative truth, also known as the canonical chain.
The rest of this section will look at some of the most popular consensus algorithms to keep in mind when diving into consensus in the interchain.
# Proof-of-Work (PoW)
In PoW consensus, a user must complete a task of arbitrary difficulty which when combined with ordered transactions in a block yields a hash function result that matches certain criteria. Finding such a solution is taken as evidence of considerable effort, meaning proof that considerable work must have been invested in the search, hence the name.
The network's nodes, aka "miners", conduct their searches independently. The successful node that announces a solution first receives an economic reward to encourage participation in the process.
The idea of substantiating a claim through an arbitrary amount of work was previously suggested as a way to combat denial-of-service (DoS) attacks and spam in other contexts. "Work" is understood as the amount of specific computational effort used.
If a node wishes to distort the ledger in a PoW system, it must first acquire an authoritative position or it will be unable to overcome the combined problem-solving capacity of the network. This known attack vector is called the 51%-attack. As the name suggests, if a single party acquires more than 50% of the total problem-solving capacity of the network, that party is theoretically able to alter the consensus.
The control mechanism for the amount of work required is called difficulty. PoW networks set a target average time for a solution to be found by any node on the network. The difficulty adjusts to compensate for increasing/decreasing total network problem-solving capacity. Thus, PoW networks do not get faster as more computing capacity is added. They become more resilient by increasing difficulty, which raises the threshold a 51%-attacker needs to overcome.
In the following video, Jae Kwon, founder and CEO of Tendermint, discusses finding an alternative solution to the energy-intensive Proof-of-Work consensus methodology that still maintains the provable security benefits of distributed networking: Tendermint.
# Proof-of-Authority (PoA)
Proof-of-Authority (PoA) is a simple mechanism that solves for faster block times by relying on validators that are pre-selected based on the belief that they are trustworthy. In other words, validators are allowed to produce blocks based solely on their pre-existing authority - they do not need to engage in the practice of competitively searching for a nonce, which underpins the high energy consumption of PoW networks.
This benefit of PoA networks is counter-balanced by the frequently cited criticism that everyone needs to trust the PoA validators in order to trust the overall network and blockchain. Such centralization of authority might be considered antithetical to the motivations originally driving the creation of a blockchain network.
# Proof-of-Stake (PoS)
Proof-of-Stake (PoS) is another method of selecting the authoritative node for a given block, based on the assumption that those with the most to lose are the most incentivized to safeguard network integrity.
A successful PoS system must address the problem of "nothing at stake". That is, validators must face a disincentive for bad behavior as well as a high probability that bad behavior will be detected. The burden of detection usually falls on the rest of the network, which can either accept or reject the validator's opinion. Validators place funds (i.e. the stake) at risk, and face economic penalties for emitting opinions that are ultimately rejected by sizable numbers of other nodes. A validator is thus incentivized to generate blocks that are likely to be accepted by the network and faces economic punishment when it fails to do so.
PoS validators are selected in a pseudo-random fashion, with a larger stake producing a higher probability of being selected to generate a block. While PoS systems generally reward validators with new coins for honest behavior, so-called block rewards, validators also receive transaction fees in return for generating blocks that the rest of the network accepts. A criticism of PoS is that it tends to centralize economic rewards in a small number of already wealthy nodes, those validators with sufficiently high stakes to win selection.
# Delegated-Proof-of-Stake (DPoS)
An extension of Proof-of-Stake algorithms is Delegated-Proof-of-Stake (DPoS). It is used for example in BitShares, EOS, Steem, Lisk, and Tron.
In this type of consensus mechanism, so-called witnesses are elected by the stakeholders of the network to secure the network, who "vote" by staking their own tokens upon a given witness. Afterward, several witnesses are chosen for the block creation so that they represent at least 50% of the stakeholders' votes, thereby ensuring a majority consensus regarding new blocks.
Witnesses are paid for their services, receiving fees for creating and validating blocks; those who voted for an active witness typically receive a share in these rewards proportional to the value of the tokens they staked. As a result, the operation of DPoS networks tends to be more economically balanced than pure PoS.
This economic incentivization to become a witness also leads to competition potentially increasing with each new member, because the number of witnesses is limited. In case a witness misbehaves, the network's community can withdraw their votes (i.e. they fire the witness). Witnesses that no longer hold enough votes lose their income basis.
Alongside ascribing the role of witnesses to some participants, DPoS networks also elect delegates. Delegates are a group of participants that supervise network governance and performance, and propose changes that are then voted on by the entire network.
Many consider DPoS algorithms superior to PoW and PoS because of their fast block creation, high degree of security, energy efficiency, level of integrity, and democratic structure.
However, DPoS systems are less decentralized than PoW and PoS systems, because they have fixed validator sets and higher barriers to entry. In pure PoS systems, there is no fixed validator set and the number of potential validators that can participate in the consensus mechanism is dependent on the total supply of tokens in circulation.
# Practical Byzantine Fault Tolerance (pBFT)
Practical Byzantine Fault Tolerance (pBFT) was first introduced in 1999 and arose from academia.
You can access the 1999 paper by Miguel Castro and Barbara Liskov on pBFT here (opens new window). There is also an intuitive presentation (opens new window) you can look at.
In pBFT, a replica is a network node that maintains a full copy of the ledger state. pBFT is a three-phase protocol. Periodically, a "primary" replica is selected to create a new block, and it broadcasts a message containing an order of transactions to the other replicas for approval. When a threshold of agreement is reached, the message is verified. The broader network is informed about transaction blocks when they are finalized (i.e. "signed by sufficient replicas").
pBFT brings two main advantages to consensus on blockchains:
- Energy efficiency: consensus is achieved without having to conduct expensive computations, such as in PoW algorithms.
- Transaction finality: once a block is finalized it is immutable, as are all the included transactions.
This of course is a very simplified presentation!
# CometBFT and consensus in the interchain
While most consensus implementations are tightly coupled with a particular blockchain project, the interchain focuses on simplifying the process of creating blockchains (among others) by offering a decentralized consensus mechanism called CometBFT.
CometBFT's engine runs its own public blockchain. It differs from Ethereum on its consensus protocol, as it uses the concept of validators who need to bind funds to validate, and who then validate blocks over the course of a certain number of rounds.
CometBFT offers the most mature Byzantine Fault Tolerance, transaction finality, and a great deal of flexibility - flexibility that is passed to the developers of custom blockchains built with the Cosmos SDK.
CometBFT will be explored in more detail in the Main Concepts section.
Further reading
- Castro, M. & Liskov, B. (1999): Practical Byzantine Fault Tolerance (opens new window)
- Castro, M. & Liskov, B.: Practical Byzantine Fault Tolerance presentation (opens new window)
- Curran, B. (2018): What is Pracictcal Byzantine Fault Tolerance? Complete Beginner's Guide (opens new window)
- Kwon, Jae (2014): Tendermint: Consensus without Mining (opens new window)
- Vasa (2018): ConsensusPedia: An Encyclopedia of 30 Consensus Algorithms. A complete list of all consensus algorithms (opens new window)
- Vitalik Buterin (2016): On Settlement Finality (opens new window)
- Witherspoon, Z. (2013): A Hitchhiker’s Guide to Consensus Algorithms. A quick classification of cryptocurrency consensus types (opens new window)
To summarize, this section has explored:
- How the challenge of reaching consensus in hierarchy-free, permissionless, and failure-prone distributed networks is conveniently revealed through the Byzantine Generals Problem, with Byzantine Fault Tolerance (BFT) being the blanket term for any mitigating strategies.
- How Proof-of-Work (PoW) consensus assigns a difficult arbitrary task to block creation, with winning miners rewarded for the effort necessary to complete it. The volume of work done contributes to the security of the network, as any attempt to distort the historical record would demand potentially vast quantities of work which quickly become impractical as the valid chain continues to lengthen.
- How Proof-of-Authority (PoA) provides a more energy-efficient alternative to PoW at the cost of some centralization of power, by assigning block creation duties to "validator nodes" which are assumed to be trustworthy based on their preexisting authority.
- How Proof-of-Stake (PoS) reduces the reliance on validator trust, as validators now undertake the duties of block creation in proportion to their stake of financial value in the network, earning rewards for beneficial behavior but having penalties levied on their stake for poor performance.
- How PoS's extension, Delegated Proof-of-Work (DPoS), restores some of the "democratic" aspect of PoW and reduces value centralization, as users are able to delegate their own stakes to those of the validators, sharing the rewards (and possibly penalties) associated with block creation through the network.
- How Practical Byzantine Fault Tolerance (pBFT) represents the capacity for a distributed network to keep reaching consensus even if significant numbers of nodes are either failing to respond or are responding with wrong information. An example of pBFT in action is CometBFT, which underpins custom blockchains built with the Cosmos SDK.