Byzantine Fault Tolerance

Getting started in the blockchain space can be overwhelming. There are a ton of concepts and terms with unfamiliar names, and the terminology is sometimes more confusing than the concepts themselves. As we are building out the DFINITY ecosystem for developers and miners, we made it our goal to give everyone a way to get up to speed. Here’s a first piece of what will be the ‘DFINITY Academy’. Ákos and I talk about the term ‘Byzantine Fault Tolerance’. If you’re interested in what the Byzantine Empire and blockchain have in common, dive in and watch the video!

Video Transcription

Alright, I just finished my first call and with that, we’re starting another beautiful day!

This morning we’ve already had a team call and every day we discuss another kind of basic topic when it comes to the blockchain.

We try to clear up terms one by one, one term a day. Ákos on our team is leading that process. He has a very good understanding of all the terminology and concepts, and so the first topic we covered was Byzantine Fault Tolerance, which is very often shortened to BFT.

I took the occasion that I’m still here and can still get together with him to do this video. I wanted to share some of our thoughts with you, and the way we would explain the term in the simplest way. Hopefully you’ll get some value from this, and in the future a term like Byzantine Fault Tolerance, or many of the other terms that seem very complex at first sight, will become a bit easier to grasp and more accessible.

The DFINITY Academy

The blockchain world can be quite a difficult one to get into. Even for me who’s been dabbling in technology-related projects for the past 15 years, it can be quite a challenge getting used to all the different terms that are prevalent in the blockchain world.

When we started moving part of the Sendtask team over to DFINITY, we realized that we all need a way to catch up and figure out what all this stuff means. So we’ve come up with something we call the DFINITY Academy. Which is just a way for us to, bit by bit, day by day, learn something about the blockchain world.

There are a ton of concepts and terms that at first seem very confusing, and one of them is Byzantine Fault Tolerance, or BFT for short. You will also hear people talk about BFT systems. That was one of the terms that we started to think about. I’m sitting here today with Ákos, who’s our lead developer, and the two of us will talk a bit about what BFT is, where the term comes from, and the importance and impact it has for the blockchain world.

The origins of Byzantine Fault Tolerance

Cédric: So, Ákos, first question – where does this term “Byzantine Fault Tolerance” come from?

Ákos: The term comes from a story about a Byzantine general who tries to attack the city with his three armies led by his three lieutenants. They need to communicate with each other by sending messages. But messages can be captured by the enemy so they need to come up with a scheme for how to successfully attack the city.

Cédric: So just to summarize it real quick. We have one general and he has an army that’s divided into three groups. Each one of these groups is led by a lieutenant. The term “Byzantine” refers to the Byzantine Empire, which existed many hundreds of years ago. Back then, they did not have phones or the internet to communicate with each other. So communicating with each other and agreeing on a common time to attack the city meant they had to send messengers.

And now these messengers could either get captured or get lost on the way. So there was never really a way to be 100% sure that a message was delivered. Plus, there were traitors. It could be that some of the lieutenants, or even the general himself, had a different interest than winning this attack. That’s why they could pass on a message that did not match the original intent.

Ákos: The scheme that they come up with in order to be successful is that each lieutenant shares the order that he received from the commander with the other lieutenants. Each lieutenant then collects all the orders that the other lieutenants received and sees if there is a consensus among them.

So in the scenario where one of the lieutenants is a traitor, he would get the order to attack but he forwards the message to retreat to his peers – to the other lieutenants. Then the other two lieutenants would get two orders to attack and one to retreat.

Cédric: Okay, so lots and lots of content in there. First of all, what happens: instead of the general just sending out one message to each of the lieutenants and the lieutenants just acting upon that message, each lieutenant also propagates that message to the other two lieutenants.

Let’s say the message that the general sends to lieutenant number one is captured and never makes it to lieutenant number one. Lieutenant number one will now still receive the message from lieutenant number two and number three. There’s some redundancy in the system.
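
The redundancy Cédric describes can be sketched in a few lines of Python. This is only an illustration of the story, not an actual consensus protocol: the function name decide and the use of None for a captured message are assumptions made for this example.

```python
from collections import Counter

def decide(direct_order, relayed_orders):
    """Pick the most common order among everything a lieutenant received.

    direct_order is None when the general's messenger was captured
    (a convention assumed for this sketch).
    """
    received = [o for o in [direct_order, *relayed_orders] if o is not None]
    if not received:
        return None  # nothing arrived at all, so no decision is possible
    return Counter(received).most_common(1)[0][0]

# Lieutenant number one never gets the general's message,
# but lieutenants number two and three relay what they received.
print(decide(None, ["attack", "attack"]))  # attack
```

Even without the general's direct message, lieutenant number one still ends up with the order, because the two relayed copies are enough to act on.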

Now, the second thing that you mentioned is that they look at the majority of the messages they receive, in case there’s a traitor. What could happen is that the general sends the message to lieutenants number one, two, and three. Lieutenant number two, even though the order was “attack tomorrow morning”, has some secret deal with the city, so he sends a message that says “retreat”. But lieutenants number one and number three will each receive an “attack” from the general and from each other, so two attack orders, plus one message from lieutenant number two that says “retreat”.

If they count and tally up those orders, they will see two orders say “attack” and one order says “retreat”. In this case, they will follow the majority.
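
The tally can be simulated as a small Python sketch. It assumes an honest general, three lieutenants numbered 1 to 3, and a traitor who simply relays the opposite order. The names flip and run_round are made up for this illustration.

```python
from collections import Counter

def flip(order):
    """Return the opposite order, which is what a traitor relays in this sketch."""
    return "retreat" if order == "attack" else "attack"

def run_round(general_order, traitors):
    """Return the decision of every loyal lieutenant after one round of relaying."""
    lieutenants = [1, 2, 3]
    decisions = {}
    for me in lieutenants:
        if me in traitors:
            continue  # we only care about what the loyal lieutenants decide
        votes = [general_order]  # the order received directly from the general
        for peer in lieutenants:
            if peer == me:
                continue
            # A loyal peer relays what it received; a traitor relays the opposite.
            votes.append(flip(general_order) if peer in traitors else general_order)
        decisions[me] = Counter(votes).most_common(1)[0][0]
    return decisions

# Lieutenant number two is the traitor: the others see two "attack" and one "retreat".
print(run_round("attack", {2}))  # {1: 'attack', 3: 'attack'}
```

With two traitors, for example run_round("attack", {2, 3}), the single loyal lieutenant sees one "attack" and two "retreat" and follows the wrong order, which is the failure case the simple majority rule cannot handle.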

Ákos: Also, this system relies on at least two-thirds of the lieutenants being honest. If more than one-third of the lieutenants are traitors, then this system will fall apart.
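
As a rough illustration of this threshold: the classical result for the Byzantine generals problem with plain, unsigned messages is that n participants (general included) can tolerate f traitors only if n is at least 3f + 1. The helper name max_traitors below is an assumption made for this sketch.

```python
def max_traitors(participants):
    """Largest f that still satisfies participants >= 3 * f + 1."""
    return (participants - 1) // 3

for n in (4, 7, 100):
    print(n, max_traitors(n))
# 4 1
# 7 2
# 100 33
```

The story above fits this bound: one general plus three lieutenants makes four participants, which is just enough to tolerate a single traitor.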

What happens when you scale up the army

Cédric: And that holds for a system where we only have three lieutenants, right? Where majority just means we need 2 out of 3. How does this look, let’s say when we have a hundred lieutenants?

Ákos: It’s all about getting a majority order that can be acted upon. So if we scale up the army and there are hundreds or thousands of lieutenants or participants in this system, then the required majority will be closer and closer to 50%.

Cédric: Ok, so that means when the network scales to potentially hundreds or thousands of lieutenants, all that’s needed is that the majority, which could mean half plus one, acts in accordance with the rules. Then the system can still function and reach consensus.

For me, the takeaway is that a Byzantine Fault Tolerant system is one where individual actors all relay their messages and each actor then acts upon the order that has a majority. They don’t have to receive all messages, and not all messages have to say the same thing, but as long as there’s a majority among the messages they receive, the system comes to a consensus and they all move in the same direction.

Ákos: And it also assumes that the majority of the participants in these networks are honest.

Applications in the blockchain world

Cédric: So how does Byzantine Fault Tolerance, or a Byzantine Fault Tolerant system, relate to the decentralized world of blockchain?

Ákos: There is a huge economic incentive for participants to manipulate the system for their own good. A system which can be easily manipulated isn’t reliable. We need a Byzantine Fault Tolerant system that cannot be manipulated by the minority or by a few bad actors. As long as the majority is honest, the system remains reliable.

Cédric: My first takeaway is that the name comes from a problem that was originally framed around the Byzantine Empire and a general who was attacking a city back then. And number two, I think my most important takeaway, is that this describes systems where, as long as the majority of the actors in that system are honest, the system reaches consensus and can function. And we’ll see you next time!

You can listen to the audio version here: