This is Part 2 of my conversation with Robert about the System Model and the Threat Model that are used for the proofs in the DFINITY Consensus Whitepaper. In particular, Robert explains the assumptions about network synchrony and Byzantine faults.
Robert: So let’s jump to the next important thing in our proof system, which is the model behind our proofs. I’ve already mentioned that the model can be thought of as two sub-models. One is the system model, which describes the toolset available in our system: all the mechanisms, tools, and methods that we can use. On the other hand, we want to have a secure system, which means we assume an adversary could come and try to destroy or abuse it, so we have to model the adversary as well.
Robert: To go a bit more into the details: in the system model you can make use of different kinds of tools, and one of them, obviously, in a crypto setting, is cryptography. All cryptography comes with its own assumptions. Cryptographic algorithms like signature schemes, encryption, or the threshold signature scheme used by DFINITY have a set of assumptions that need to be fulfilled in order to guarantee the security of the underlying cryptographic tools. We won’t go into the details here, but those assumptions must hold in order to have sound cryptography.
Robert: Another important assumption for our system is the synchrony assumption. What does that mean? One way to differentiate the whole field of distributed algorithms is into two kinds: synchronous and asynchronous algorithms. Synchrony means that you have some kind of assumption on the time it takes for a data packet, a block, or a transaction to spread over the network and reach some other party in the network.
That time is variable; it depends on the geographic location. It is upper-bounded by the speed of light in some sense, because you cannot send your packets faster than the speed of light, but in practice it will obviously be somewhat lower than that.
Network Traversal Time
Robert: We have some network traversal time, which is abbreviated as Delta. In our system we rely on this network traversal time assumption: we need an upper bound on the network traversal time in order to guarantee the liveness and security of our system. We use slightly different kinds of assumptions for liveness and for security.
For liveness, the block time, a forced delay, makes sure that the notaries wait some time and collect blocks, so that the block makers actually have a chance of getting their blocks spread out to the notaries. There is a block time in our system, and this block time needs to be at least three Delta in order to have liveness.
Robert: We have a second property, which we call T in our whitepaper. It is the time that you, as an observer of the network, wait in order to be assured that your transaction has been recorded and finalized in the system. As I have mentioned in previous videos, you need to wait two confirmations plus 2 Delta after the second confirmation in order to have the assurance that your transaction is final, i.e. to have safety. That is an overview of our system. In the whitepaper it is a bit more complicated than that, but this gives you the big picture.
Cédric: So just to understand where we are right now: we first talked about what is needed in order to come up with proofs, and you said we first have to define the properties that we want to prove and, of course, the environment or circumstances. We went through the properties on three different levels. Now we’re talking about the circumstances, and you mentioned there’s a model. Basically, we have to look at two models: the system model, which is what we own and control, and the threat model, which is what we can’t control.
Robert: Let’s now move on to the threat model. This is also interesting because it, too, can be broken down into two aspects. One of them is what the adversary can actually do. Can he just stop other nodes from taking any action or from creating blocks, say by hacking them or by DDoS-attacking them? Or can the adversary do more: actually control a node, send any kind of data, and take arbitrary actions? This is a big and important distinction.
Robert: The first model, in which the adversary is only able to make nodes crash, is called the crash-fault model. On the other hand, we have the full model with an adversary that has complete control over the nodes it owns, which is called the Byzantine fault model.
Byzantine Fault Model
Cédric: The crash model is in a sense more binary: either he can block or he cannot block. The Byzantine fault model is more open-ended in terms of the attack vectors that an attacker has.
Robert: In the crash-fault model, you either send data and act as the protocol prescribes, or you stay quiet. The Byzantine fault model allows any kind of action. It does not, however, allow the adversary to break cryptography, because we keep the assumption that the adversary cannot break the underlying hardness assumptions, for example that you cannot factor extremely large numbers into their prime factors. That is also a bound we need to place on the adversary; otherwise, our system would break down.
Fraction of Honest Nodes
Robert: Another important aspect is the fraction of adversarial power: how many nodes the adversary owns compared to the number of honest nodes that follow the protocol. This can be described as the fraction of honest nodes. There are important theoretical bounds on this number, which depend on whether you have a synchronous network or an asynchronous network.
In a synchronous network, it is normally sufficient to have an honest majority, meaning more than half of the network should be honest. In asynchronous networks, you normally need at least two-thirds of the nodes to be honest.
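These classical bounds can be written out as a small sketch. They are standard results from the distributed-systems literature, not DFINITY-specific, and the function name here is hypothetical:

```python
# Classical Byzantine fault-tolerance bounds mentioned above: with n
# nodes, a synchronous protocol tolerates f < n/2 Byzantine nodes,
# while an asynchronous protocol tolerates only f < n/3.
def max_faults(n, synchronous=True):
    # The inequality is strict, so the largest tolerable f is
    # (n - 1) // 2 (synchronous) or (n - 1) // 3 (asynchronous).
    return (n - 1) // 2 if synchronous else (n - 1) // 3

print(max_faults(400, synchronous=True))   # 199
print(max_faults(400, synchronous=False))  # 133
```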
DFINITY Synchronous Networks
Robert: But in DFINITY we have a synchronous network, so you would assume that you would be fine with a simple honest majority. But there is one caveat in our system that I have to mention.
DFINITY is based on subgroups: groups of four hundred nodes that notarize blocks and contribute to the randomness, randomly selected from the whole population of nodes. Due to the variance of random sampling, it can happen that even if you have 50% or 51% honest nodes overall, by chance a certain group ends up with more than 50% bad nodes. That’s why we need an extra margin, to make sure this happens only with very low probability.
What we do is take this two-thirds bound from asynchronous systems: we can show that if we have a 2/3 honest majority over the whole population, then it is very unlikely that any single group will have more than 50% dishonest nodes.
It is elegant that the numbers we have chosen correspond exactly to the upper bound for asynchronous networks.
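This sampling argument can be checked numerically with the hypergeometric distribution. The group size of 400 comes from the discussion above; the total population size and adversarial count below are assumed illustration values, and the function name is hypothetical:

```python
from math import comb

# Probability that a uniformly sampled committee of n nodes, drawn
# from N total nodes of which B are adversarial, contains a dishonest
# majority (more than n/2 bad nodes), via the hypergeometric
# distribution. N and B below are assumed illustration values.
def p_bad_majority(N, B, n):
    total = comb(N, n)
    tail = sum(comb(B, k) * comb(N - B, n - k)
               for k in range(n // 2 + 1, n + 1))
    return tail / total

# Example: 10,000 nodes, roughly one-third adversarial, groups of 400.
p = p_bad_majority(10_000, 3_333, 400)
print(p)  # a very small probability
```

The extra margin in the honest fraction is what drives this tail probability down: with a bare 50% honest population, a dishonest group majority would occur with non-negligible probability, while at two-thirds honest it becomes negligible.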
Cédric: Even though it is a synchronous network, it’s a completely different property that leads us to the 2/3 bound; it just happens to coincide with the upper bound for asynchronous networks.