I sat down with Mahnush who joined the DFINITY team early on to talk about the various innovations that DFINITY will release. We cover terms such as Threshold Signatures, Random Beacon and Distributed Key Generation. We not only speak about these concepts but cover why they are so important.
Hi. Welcome to yet another Inside DFINITY episode. Today I’m going to be talking to Mahnush, who joined us very early on as a researcher. She came from Yale University and today we are going to find out what she works on at DFINITY, what she’s done before, and then also she’s going to give us some insights into some of the technologies and innovations that DFINITY developed. [Music]
Good having you. Thanks. Thanks for having me. So, you’ve joined DFINITY very early on. You’ve been a key part of our research since the very early days. Maybe for people who have not heard of you yet, “What is it that you do here?” So, I’m a researcher: I help design the consensus layer, and I prove the security of the consensus layer and the random beacon.
My day-to-day life is that I gather the ideas we have in the team, try to make them more formal so we can understand them among ourselves, write them down formally, and then prove that they are secure so we can use them. I think a very nice way of how you described it yesterday to me was, “I take whatever we come up with and I make it secure.” I prove that it’s secure, which I think is a very important property for what we’re trying to build with DFINITY. Cool. What led you to this position? What did you do before? Where did you come from? Before joining DFINITY, I was a postdoc at Yale University; before that I was a PhD student. For my PhD I worked on secure multi-party computation.
Secure Multi-Party Computation
In secure multi-party computation you have to make sure that the parties reach agreement. One important part of the computation is the output, and everybody should agree on the output. So, consensus and Byzantine agreement are a very important part of MPC protocols; that’s why I got interested in consensus protocols. Then, during my time at Yale, I was trying to build a consensus protocol that scales to millions of parties using committee-based protocols. Then I met Timo and Dominic, and I understood that DFINITY was trying to do the same thing, which was very interesting to me. It’s been great having you. I know we’ve made a lot of progress since you first joined, including the white paper that we put out at the beginning of this year, which I know you’ve spent many sleepless nights working on.
What are some of the hard problems or some of the interesting problems that we are solving here? I think what we are trying to do at DFINITY is a new version of cloud computing, for cloud computing, if you think about it, is actually a replicated state machine. You have to have a lot of replicas to make sure that you don’t lose your computation or your data, but all these replicas should reach agreement; because if some of them think that something is final and other replicas think that something else is final, then the users don’t know which version is correct. So making sure that all these replicas are in agreement is a very important part of DFINITY. That is why consensus protocols and agreement, in general, are very important for DFINITY.
Now, if all the replicas are honest, then it’s much easier. For example, in Google Cloud you have a small number of replicas and you are sure that all of them are controlled by Google, then… Right! So, because they are all controlled by Google, you can make sure that every machine applies the same algorithm. Exactly. That’s what you mean by, “They’re all honest!” Honest. Exactly. Because all of them are owned by one central party, it’s much easier for them, but at the same time you have to put your trust in Google, trusting that the algorithm they are running is honest.
But, in DFINITY, instead of assuming that all the replicas are owned by DFINITY, we want to make sure that everybody in the world can have one replica; it’s a more open way of doing cloud computing. Users don’t need to put trust in DFINITY; they put trust in the protocol that is run by DFINITY. At some point, it’ll also be open-source, right? Exactly! Everyone can look at the code! Exactly! What do you mean by “Everyone can have a replica”? Basically, it means everyone can be a miner for our system – we still need miners in DFINITY, although they’re not going to be solving proof-of-work riddles – they are what makes up the distributed computation power of DFINITY. Exactly! They are helping with the computation, they are helping maintain the data, and they are helping maintain the computation over that data.
Practical Byzantine Fault Tolerance Protocol
If the replicas are different, like computers in different domains, some of them can be corrupted or adversarial – in computer science we call them Byzantine, because the name comes from the Byzantine generals problem. Solving this problem is very difficult – it has been around in computer science for the last 40 years, and everybody has been trying to improve the solution because it’s a fundamental underlying problem: if you solve it efficiently, you solve a lot of problems on top of it. Now what DFINITY is trying to do is a new method of solving this problem.
So when you say the problem has been around for a long time means, just off the top of my head – one solution could be if I have three independent replicas, and two say one thing
and the third says something else, I just believe the majority always. To which degree is that what’s actually happening in current networks and to which degree is it completely different?
That’s kind of the first answer that computer scientists came up with – if you have multiple replicas, like three of them, then you can take the majority. But to take the majority, everyone should send his input to everybody, so it’s a protocol with a very high communication cost. And if you are agreeing only on one bit, then taking the majority might work; but if you are agreeing on a string, then even honest parties might not be in agreement. One solution to that, which the PBFT protocol used, was to choose a leader.
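The majority idea can be sketched in a few lines (a hypothetical illustration, not DFINITY code): with three replicas and one fault, majority voting on a single bit works, but over arbitrary values there may be no majority at all.

```python
from collections import Counter

def majority_vote(inputs):
    """Return the value reported by a strict majority of replicas, or None."""
    value, count = Counter(inputs).most_common(1)[0]
    return value if count > len(inputs) / 2 else None

# Three replicas, one faulty: the majority wins on a single bit.
print(majority_vote([1, 1, 0]))            # 1
# With arbitrary strings, honest replicas may not share any majority value.
print(majority_vote(["tx-a", "tx-b", "tx-c"]))  # None
```

This also shows why the naive scheme is expensive: producing `inputs` already requires every replica to send its value to every other replica.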
PBFT (Byzantine Fault Tolerance) Protocol
What does PBFT mean? BFT stands for Byzantine fault tolerance – Byzantine meaning the bad parties – and the P in PBFT comes from Practical. In that protocol, the replicas choose a leader. In the real world, too, if you want to reach an agreement, the best way is to choose one leader; then the leader says what to do next, everybody does the same thing, and they reach agreement.
So it’s a similar thing for like PBFT protocol – they choose a leader and if the leader is honest, he proposes the same value to everybody in the network and, then, they can reach agreement;
but the problem is that if the leader is dishonest, then they don’t reach agreement, so they have to change the leader. Checking whether the leader was honest and proposed the same value to everybody and, if not, changing the leader – which they call a view change – is a very expensive protocol: it needs all-to-all communication.
Then, blockchain protocols came up, like Bitcoin. Instead of changing the leader whenever he is dishonest, they decided, “Okay, we don’t really need to change the leader only when he’s dishonest; we are assuming that the protocol is synchronous, so we change the leader every new epoch, every new round.” Then they need a way to choose the leader, because you have to choose the leader, and the next leader, and the next leader.
Instead of checking if the leader is honest or not, you just continue building on top of the protocol, building on top of the log. With some probability, one of these leaders is honest, and that is enough to reach agreement. The problem is now reduced to choosing the leader, and for choosing the leader you need a random number; because if you have a random number, like a die, then you can use this random number to choose a leader – that’s how proof-of-work (PoW) helps.
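Once all replicas share a random number, leader election needs no further communication: everyone applies the same deterministic map from the randomness to a replica. A minimal sketch (the names and the hash-then-modulo scheme are illustrative assumptions, not DFINITY’s exact mechanism):

```python
import hashlib

def choose_leader(round_randomness: bytes, replicas: list) -> str:
    """Deterministically map a shared random value to one replica.

    Every honest replica that sees the same randomness computes the same
    leader, so no extra rounds of communication are needed.
    """
    digest = hashlib.sha256(round_randomness).digest()
    index = int.from_bytes(digest, "big") % len(replicas)
    return replicas[index]

replicas = ["replica-0", "replica-1", "replica-2", "replica-3"]
leader = choose_leader(b"randomness-for-round-7", replicas)
assert leader in replicas
```

If the randomness is unbiased, each replica is chosen with (almost exactly) equal probability, which is what makes the fresh random number per round so valuable.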
Proof of Work
Blockchain protocols are, essentially, Byzantine agreement protocols, because they essentially choose a leader using proof-of-work. By running a proof-of-work, the first person who comes up with a solution to the puzzle is the leader, and this is a random process. So, I often do this when I talk to my parents or someone else who’s interested in what blockchain is: I ask them, “How would you choose someone random among the people in this room; let’s say there are 10 people in the room?”
First, everyone thinks that’s easy – we just roll a die. I’m like, “So, who rolls the die?” They say we just add all our birth dates, we start with you, and then we go in circles until that number is reached. But then, “How do you choose who to start with? Right?” It sounds like a very simple problem; but if you really want to make it random, it’s hard. That was the big innovation of Bitcoin – finding a way among peers to agree on one random selection, basically. Inherently, creating randomness using proof-of-work is a very expensive way of creating randomness, because you have to burn a lot of electricity.
For Bitcoin to work currently, it’s burning around a country’s worth of electricity to solve the proof-of-work; that’s why it’s inherently very expensive. And we are not even using Bitcoin that much – if you wanted to scale Bitcoin up for a lot of computation, you would have to burn the whole world’s worth of electricity. If we compare where Bitcoin is now and where it has to go to fulfill its vision, you would still have to scale to tenfold or a hundredfold of its current usage. And as you already mentioned – it’s already burning a lot of electricity, for nothing basically. So, we’re looking for a more efficient process to do that leader selection, to choose a random person among peers.
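The leader-election-by-lottery idea behind proof-of-work can be sketched as a hash puzzle (a toy with a tiny difficulty; real networks tune the difficulty so that solving takes about one block interval, which is exactly where the energy cost comes from):

```python
import hashlib

def solve_pow(miner_id: str, prev_block_hash: str, difficulty_bits: int) -> int:
    """Brute-force a nonce until the hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        h = hashlib.sha256(f"{prev_block_hash}:{miner_id}:{nonce}".encode()).digest()
        if int.from_bytes(h, "big") < target:
            return nonce
        nonce += 1

# Whichever miner finds a valid nonce first becomes this block's leader.
# With 12 bits this finishes after ~4096 tries; Bitcoin uses far larger values.
nonce = solve_pow("miner-42", "prev-hash", difficulty_bits=12)
```

Since hash outputs are unpredictable, which miner wins is random in proportion to hashing power, but every attempt burns real computation.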
Proof-of-Work as a Synchronous Protocol
Another reason we are trying to change the way Bitcoin creates randomness is that proof-of-work is inherently a synchronous protocol: the amount of time you have to spend on the proof-of-work should be an order of magnitude larger than the delay of the network. Otherwise, you don’t know whether people were burning electricity and time to solve the proof-of-work, or whether their message was lost in the network.
Inherently, proof-of-work chains, or proof-of-work consensus, have low throughput – the chain cannot tick fast because, inherently, you have to wait much longer than the network delay to create the next randomness. That is why you cannot build high-throughput protocols on top of it. That’s also why transactions get very expensive: only a limited number of transactions fit in each block, and we cannot increase the number of blocks per time unit without breaking the protocol.
The Way DFINITY Creates Randomness
DFINITY is trying to solve this problem – the first idea is that we need a new way of creating randomness, but how can we create randomness? It is a very difficult problem, as I said. One way is to have one person create the randomness, but then if that person is dishonest, he can bias the randomness. So we need a group to create the randomness – but if we choose a group, then if I’m the first person to contribute randomness and you are the second, the last person can just look at our random numbers and choose his own randomness in a way that makes the result whatever he wants, which we call last-person bias. We cannot have a group in which a last person gets to choose the randomness; that’s why we use threshold cryptography.
Threshold cryptography means that instead of having a group of n people where all n of them are required to create the randomness, we have a group of n people where only k of them are required to participate. If you are waiting for k people and you have received inputs from k − 1 parties, you are waiting for one more input; but any of the remaining parties can be that last person, so there is no last-person bias. If the last person decides, “I don’t want to participate,” there is another party who can participate instead. But that alone is not enough – because if one set of k people creates a randomness and another set of k people creates a different randomness, the adversary – by deciding whether or not to participate in the group – can still bias the protocol. That’s why we need uniqueness.
Uniqueness in randomness means that it doesn’t matter which k parties participate in creating the randomness, the result is always unique – the result is always the same. Which is kind of a mind-blowing property at first. It sounds very unintuitive – “How do we get there?” If you have a message and you threshold-sign it with BLS signatures, then you need k parties to sign it, if the threshold is k – and no matter which k parties sign it, you get a unique result back. That is the key property of BLS signatures.
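A full BLS threshold signature needs pairing-based cryptography, but the “any k of n give the same result” property rests on the same polynomial interpolation as Shamir secret sharing, which we can sketch directly: any k shares reconstruct the identical value, no matter which k are used. (The toy field and parameters here are illustrative, not DFINITY’s.)

```python
import random

P = 2**127 - 1  # a Mersenne prime, our toy field modulus

def make_shares(secret: int, k: int, n: int):
    """Shamir-share `secret` with threshold k among n parties as (x, f(x)) pairs."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x=0: any k shares recover the same secret."""
    total = 0
    for x_i, y_i in shares:
        num, den = 1, 1
        for x_j, _ in shares:
            if x_j != x_i:
                num = num * (-x_j) % P
                den = den * (x_i - x_j) % P
        total = (total + y_i * num * pow(den, -1, P)) % P
    return total

shares = make_shares(secret=123456789, k=5, n=9)
# Two different subsets of 5 shares reconstruct the *same* value.
assert reconstruct(shares[:5]) == reconstruct(shares[4:]) == 123456789
```

In BLS threshold signing, the same interpolation happens “in the exponent”, which is why any k signature shares combine into one and the same group signature.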
It’s pairing-based cryptography. So it’s kind of new, but not too new – because if a scheme is too new, you are a little bit worried about its security, and whether it has been proven correct, whether enough people have reviewed it, commented on it, and tested it. Enough people have looked at it – the group at Stanford worked on it for a long time to make sure that it is secure, and it is a very secure scheme. There are good implementations of it around. We have a very fast implementation of BLS, which is open source – everybody can use it.
Maybe, just to give a bit of an overview, and then we can also talk about the random beacon and what that exactly means. You’ve given, I think, a great explanation of where things started and how proof-of-work just doesn’t scale beyond a certain point in terms of throughput and energy usage; and then the problems with creating this random number, with the last-person bias, which we get around with a k-out-of-n threshold signature. And we’re using this BLS signature that is both unique for any k out of n people and still random; which are two properties that are still hard for me to see how they go together. Cool.
So what do we do with this? Let’s go back to blockchain, Nakamoto’s blockchain, and see how they do it. Nakamoto’s blockchain creates the randomness for each block completely independently – meaning that for each block you solve the proof-of-work from scratch and you don’t reuse any randomness. That’s one of the reasons it’s very expensive. Here, we don’t want to do that, because threshold cryptography is also expensive – not proof-of-work expensive, but still an expensive operation.
We want to make sure that most of the expensive operations can be done offline, and then when we are online, we do only a very small amount of work. If you look at signature schemes, there are different steps required for them to work. One of them is creating the keys, which we call Distributed Key Generation (DKG).
DKG (Distributed Key Generation)
DKG (Distributed Key Generation) is a very expensive operation, because everybody should agree on the public key that they want to sign with, and everybody should output for himself a secret key that is part of the big secret key of the group, so they can sign inputs with it. It’s an all-to-all process, and you need verifiable secret sharing, so it’s an expensive process. Just to quickly talk about that, because I think that’s also a fascinating topic: it basically means that we’re all out in the open, all talking to each other, and we all agree on a public key – which is not that astonishing – but it also means we all create private keys that work with that public key, and all our private keys in some way work together, even though we never reveal our private keys to anyone else. Exactly.
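A real DKG adds verifiable secret sharing and complaint rounds so misbehaving dealers get caught; stripped of all that, the core arithmetic can be sketched as follows: every party deals Shamir shares of its own random contribution, and each party’s final key share is the sum of the shares it received. (This is a toy sketch under simplifying honesty assumptions, not DFINITY’s protocol.)

```python
import random

P = 2**127 - 1  # toy prime field
N, K = 5, 3     # 5 parties, threshold 3

def deal(secret):
    """One party Shamir-shares its own random secret among all N parties."""
    coeffs = [secret] + [random.randrange(P) for _ in range(K - 1)]
    return {x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
            for x in range(1, N + 1)}

# Each party picks a random contribution and deals shares of it to everyone.
contributions = [random.randrange(P) for _ in range(N)]
dealt = [deal(s) for s in contributions]

# Party x's final secret-key share is the sum of the shares it received.
final_share = {x: sum(d[x] for d in dealt) % P for x in range(1, N + 1)}

# The implicit group secret is the sum of all contributions; no single party
# ever needs to compute it -- any K final shares define it via interpolation.
group_secret = sum(contributions) % P

def interpolate(points):
    """Lagrange interpolation at x=0 over the toy field."""
    total = 0
    for x_i, y_i in points:
        num = den = 1
        for x_j, _ in points:
            if x_j != x_i:
                num = num * (-x_j) % P
                den = den * (x_i - x_j) % P
        total = (total + y_i * num * pow(den, -1, P)) % P
    return total

assert interpolate([(x, final_share[x]) for x in (1, 3, 5)]) == group_secret
```

Note that the group secret only exists implicitly: as long as fewer than K parties collude, nobody learns it, yet any K parties can jointly act as if they held it.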
The good part is that you can run the DKG process once and then sign with the same key multiple times – it means that you amortize the expensive cost of DKG over multiple rounds. You can have an epoch of rounds: you run the DKG among a group, and then, after they have their keys, they can sign different inputs with them. Every time they sign, you get a new random number – more precisely, a pseudorandom number. It is not truly random anymore: the initial key is random, but after that point the scheme acts as a pseudorandom function.
Then, in this way, the signature scheme is non-interactive, meaning, “I don’t need to talk to you to sign something. I do my part and sign, you do your part and sign, and then we can put these signatures together and combine them into one combined signature without talking to each other.” That’s what they call non-interactive. This is a very good property because it makes the signature scheme much, much faster compared to interactive versions.
How DFINITY works is that we create this group and do the DKG; then we sign to produce the first randomness, so we have one random number for this round; then for the next round we sign the previous randomness to create the new randomness for that round; and then we continue signing for the blocks. How long do we reuse the same original randomness? You can reuse it for a long time – around one or two months. You can keep going as long as there are enough parties in the group who are still active and honest. You also mentioned groups.
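The beacon recurrence described above – sign the previous round’s randomness to get the next – can be sketched by standing in an HMAC for the group’s threshold signature. (This is a deliberate simplification: in the real scheme no single machine holds a group key, and the signature is a unique, unpredictable BLS threshold signature; the HMAC only mimics its deterministic, keyed behaviour.)

```python
import hashlib
import hmac

GROUP_KEY = b"stand-in for the group's threshold key"  # hypothetical placeholder

def group_sign(message: bytes) -> bytes:
    """Stand-in for a unique, deterministic threshold signature (real: BLS)."""
    return hmac.new(GROUP_KEY, message, hashlib.sha256).digest()

def next_beacon(prev_beacon: bytes) -> bytes:
    """Round r's randomness is (a hash of) the group signature on round r-1's."""
    return hashlib.sha256(group_sign(prev_beacon)).digest()

beacon = hashlib.sha256(b"genesis").digest()
for _ in range(3):
    beacon = next_beacon(beacon)  # one fresh pseudorandom value per round
```

Because the signature is unique, every honest party derives the same beacon value each round, and because it is unforgeable, nobody can predict or bias the next value.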
Are there different groups in the way DFINITY works? We have this threshold cryptography and this DKG process, but if you want to run this process among a huge number of parties – like 5 million people in Bitcoin – it’s still a very expensive procedure, because you have to send a message to everybody and everybody has to sign. We looked at this and saw, “You don’t need the signature of everybody; you don’t need everybody to participate in the randomness – you can randomly select the parties who participate.” It’s like elections in countries – if you want to decide on something, you don’t need everybody to come up with a law; you can have a committee, you can have delegates. So instead of having everybody, we choose a group – but then how do we choose a group?
We can use our own randomness for that, because a randomly selected group has similar properties to the total set of parties. Instead of running the whole protocol among five million people, we just need to run the protocol among, say, five hundred people, which is orders of magnitude faster. Independent of whether there are 100,000 people participating in the whole network or a hundred million, the process that this group goes through will always be of similar length, because the group size is constant. Another good thing about choosing groups is that you get not only scalability but also parallelism, because now instead of having only one chain, we can have multiple chains.
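Committee selection from the beacon can be sketched as seeded sampling: every node that knows the beacon derives the same group with no extra communication. (The population size, names, and seeding scheme here are illustrative assumptions; a small population is used so the example runs quickly.)

```python
import hashlib
import random

def sample_committee(beacon: bytes, all_parties: list, group_size: int) -> list:
    """Select a committee using the beacon as seed.

    Everyone who knows the beacon derives exactly the same group, so no
    further messages are needed to agree on who is in the committee.
    """
    seed = int.from_bytes(hashlib.sha256(beacon).digest(), "big")
    rng = random.Random(seed)
    return rng.sample(all_parties, group_size)

parties = [f"node-{i}" for i in range(10_000)]  # stand-in for the whole network
committee = sample_committee(b"round-randomness", parties, group_size=500)
assert len(set(committee)) == 500
```

The committee work (DKG, signing) then costs the same whether the network has ten thousand nodes or ten million, since only the fixed-size group runs the heavy protocol.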
Each group is responsible for one chain, so you can have multiple chains in parallel, which in blockchains is usually called sharding; it’s a very natural way of sharding. That’s why, I think, DFINITY has a very good approach to sharding – it’s inherent in DFINITY, because we have groups, and each group can have its own shard. So what are some bigger questions that we are working on right now? Currently, we have a complete proof in a synchronous setting. In the synchronous model of communication, when an honest party sends a message, it will be received by all the honest parties within a bound Delta.
So, for example, in Bitcoin, when an honest party sends a message, it will be received by everybody in the world within a bound of about one minute. The protocol knows this bound and uses it; that’s why, for example, in Bitcoin the length of the proof-of-work depends on the delay of the network and should be an order of magnitude longer – that’s why it’s around 10 minutes for Bitcoin. Synchrony assumptions work, but the problem is that if the synchrony assumption does not hold in the actual network, then the protocol is not secure anymore.
Complete Proof in an Asynchronous Network
What we are doing in DFINITY is that we have a proof of security in a completely synchronous model, and we are updating our proof to work in an asynchronous network as well. That’s one of the areas I’m working on now. The other area I’m working on is the DKG process: the one we currently have is interactive, which means that every party who wants to participate in the DKG should be available at the time the DKG protocol is run.
Non-Interactive DKG (Distributed Key Generation)
We want to make it non-interactive, meaning that the parties can participate, go home, and come back, and in the meantime the protocol completes. A non-interactive version is a more natural way of running DKG than an interactive one. The third thing I’m working on is the sharding part, which I talked about a little. We are designing a sharding system for DFINITY. We have designed a sharding system for a Bitcoin-like network, and we are using the same concepts – our paper is going to be published at CCS. It is a nice paper, so you should read it.
So, maybe just to wrap this up, a couple of questions that I’ve had. “What are the requirements for DFINITY? How many people can be dishonest so that the network will still survive?” That’s a very good question. In general, if you are working in an asynchronous or semi-synchronous environment, the fraction of dishonest parties that the protocol can tolerate is one third. That is a fundamental bound in the asynchronous model – you cannot tolerate more than that.
DFINITY’s Sharding System
DFINITY is working in that model, meaning that the fraction of dishonest parties in the total network should be at most one third, but within each group we can tolerate a fraction of one half – so we go from one third to one half. “Do we already know how big these groups are going to be?” The size of the group depends on the security level that you want to get – the larger the group, the better the security. Currently, a group size of 400 to 800 seems logical for a security level of around 86–90 bits – it means that the probability of error is going to be one over two to the 86. So you’re essentially never going to lose your tokens.
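The shape of this calculation can be sketched with a binomial approximation: sample a group from a very large population with one third dishonest, and ask how likely it is that the dishonest members exceed the fraction the group can tolerate. (DFINITY’s published analysis uses its own thresholds and adversary model, so the number this toy computes should not be read as the 86-bit figure from the interview.)

```python
from fractions import Fraction
from math import comb, log2

def group_failure_probability(group_size: int, dishonest_fraction: Fraction,
                              tolerated_fraction: Fraction) -> Fraction:
    """P[a randomly sampled group contains more than the tolerated fraction of
    dishonest parties], approximating sampling from a huge population as
    independent Bernoulli draws (binomial)."""
    p = dishonest_fraction
    threshold = int(tolerated_fraction * group_size)  # strictly more than this fails
    return sum(comb(group_size, i) * p**i * (1 - p)**(group_size - i)
               for i in range(threshold + 1, group_size + 1))

prob = group_failure_probability(400, Fraction(1, 3), Fraction(1, 2))
print(f"failure probability ~ {float(prob):.3g}, i.e. about 2^{log2(prob):.0f}")
```

Increasing the group size drives this tail probability down exponentially, which is why the security level is tuned by choosing the group size.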
Block Time & Finality
The other question that I had before we wrap up: “We talked about energy efficiency and we mentioned some block times for the Bitcoin blockchain; you mentioned that it’s roughly ten minutes. So what are the block times that we achieve in DFINITY?” Currently, in our test network the block time is around one second or less. Another difference between DFINITY and, for example, Bitcoin is the time to reach finality: in Bitcoin, because they don’t have an actual consensus protocol, you need to wait for six blocks; in DFINITY we need two blocks. At one second per block, two blocks for finality means that we have a finality of two seconds, compared to Bitcoin’s finality of around one hour.
That’s basically what will enable credit-card-like transactions and finality on DFINITY versus Bitcoin: if I go to a coffee shop and buy my coffee, with Bitcoin I’d have to wait up to one hour in the worst case to be sure that the transaction actually went through; whereas on DFINITY it could be two seconds. Yes. On a good day, if the blockmaker is honest – in the optimistic case – you only wait two seconds to reach finality in DFINITY, and that makes it a very high-throughput system. Thanks, Mahnush, for taking the time and walking us through some of these challenges and innovations of DFINITY.