Byzantine Fault Tolerance

Consider the simple (!) problem of getting a value of a variable $x$ from a distributed (containing multiple servers) storage system that uses replication to ensure correctness.

Basic problem in such a system would be inaccessible servers. Assuming that there would be $f$ inaccessible servers, as long as there are $f + 1$ servers in the system, value of $x$ queried from this system would be correct.

However, things get uglier if servers can behave the worst possible way: lie about the value of a variable.

If in addition to $f$ inaccessible servers, $f$ servers can lie about a value of a variable, the number of servers we would need to provide correctness would be $3f + 1$ due to the fact that it is not possible to differentiate between right and wrong answers. The solution is to use majority ( $f + 1 > f \rightarrow |Right| > |Wrong|$ ).

Systems tolerant to lying parts are called “Byzantine fault-tolerant” systems. The term was coined in this paper and the name comes from a representation of the problem at hand in a war scenario: Byzantine generals trying to get to a consensus for a decision on “attack” or “retreat”.

Share this:

Leave a comment Cancel reply