**What does it mean if somebody claims that their software is formally verified?**

## 1. Verification as we learn in high school

In high-school maths, we are introduced to the idea of a *proof* — the word “proof” is etymologically related to the word “probe”. In a proof, we reason from first principles (aka *axioms*), why a certain assertion (aka *theorem*) holds true. Said differently, we *verify* the theorem, and **the** **proof represents the verification process**.

The axioms and the proof need a language in which they are represented — at a minimum, this language involves a syntax and a set of *proof rules* governing that syntax. This mathematical rule-based verification is also called *formal verification*, to distinguish it from the colloquial use of the word verification which may just involve human inspection.

The theorem being verified is often broken up into two parts: (1) the implementation, and (2) a specification that is supposed to hold for that implementation. Here is an example of an implementation, a specification, the axioms that we assume, and the proof system that can be used to complete the proof.

Implementation | Specification |

Axioms | Proof System |

Properties of natural numbers and associated operators such as addition, multiplication, and division | Mathematical Induction, Propositional Logic |

###### Choosing an Abstraction Level for the Proof

The proof author can choose an abstraction level for the axioms and the proof rules, to trade-off between proof simplicity and the level of rigour. For example, the author may choose to consider the commutativity of natural number addition as an axiom, and perhaps prove it separately elsewhere. Alternatively, the author may choose to restrict herself to lower-level axioms only, to potentially obtain more succinct and elegant proofs.

Similarly, the author may choose to assume the validity of mathematical induction over natural numbers (proof rule) even though that may be derivable from more general proof rules such as Noetherian induction.

From a pure theoretical standpoint, it should not matter which (abstraction level of) axioms and proof rules you pick. However, from a computer systems point of view, this does have an effect on your confidence in the verification effort as we discuss below (TL;DR higher abstraction levels slightly increase the size of specifications and proof rules, thus potentially increasing the probability of a bug escaping the verification process).

## 2. Verification of software systems

```
size_t strlen(char const* str) {
char* ptr; ulong *longword_ptr;
ulong longword, himagic, lomagic;
for (ptr = str; ((ulong)ptr & 7) != 0; ++ptr)
if (*ptr == ‘\0’) return ptr - str;
longword_ptr = (ulong*)ptr;
himagic = 0x8080808080808080L;
lomagic = 0x0101010101010101L;
for (;;) {
longword = *longword_ptr++;
if ((longword - lomagic) & ~longword & himagic) {
char *cp = (char*)(longword_ptr - 1);
if (cp[0] == 0) return cp - str;
if (cp[1] == 0) return cp - str + 1;
if (cp[2] == 0) return cp - str + 2;
if (cp[3] == 0) return cp - str + 3;
if (cp[4] == 0) return cp - str + 4;
if (cp[5] == 0) return cp - str + 5;
if (cp[6] == 0) return cp - str + 6;
if (cp[7] == 0) return cp - str + 7;
} } }
```

In software, we are usually interested in proving properties of a computer program. The program will have its own syntax, depending on the programming language (PL) in which it has been developed. The syntax of the computer program can be different from the syntax that is used for the specification, the axioms and the proof. In the text below, we show four different ways to write the specification for the C library’s `strlen()`

function; for each specification method, we also provide the required axioms and proof system.

(i) Specification in First Order Logic

Axioms | Proof System |

Logical encoding of the C programming language semantics | First-order logic with bitvector and array theories |

(ii) Specification in Quantifier-Free Logic With the a Custom Axiomatic Operator

Axioms | Proof System |

Logical encoding of the C programming language semantics | Quantifier-free logic with bitvector and array theories. Proof rules for the operator, as shown below |

The proof rules for the operator are shown as inference rules below. If you are not an expert in reading this notation, don’t bother — basically these are a set of rules that try and resolve the value of the operator depending on the context in which it occurs. The mechanical checker can use these inference rules as input to decide the validity of a proof step.

In these inference rules, *Ctx* represents the context under which this evaluation is performed, *Heap* represents the memory array, *addr _{b}* and

*addr*represent the bitvectors representing addresses used to index into the heap.

_{e}*L*represents a (lambda) function that takes a byte and returns either true or false. Intuitively, the operator returns the first index (relative to

*addr*) of the byte (starting at

_{b}*addr*and before

_{b}*addr*) which evaluates

_{e}*L*to true. The sentinel value of

*-1*is used if none of the bytes between

*addr*and

_{b}*addr*

_{e}evaluate

*L*to true. For example, the first inference rule (base case) states that if

*addr*and

_{b}*addr*

_{e}are equal (i.e., there are no bytes in the range), then evaluates to the sentinel value

*-1*(irrespective of

*Heap*and

*L*). The operator represents bitvector addition with wrap-around semantics.

Notice that the *addr*_{e} argument in our theorem relating *strlen* to is zero, indicating that we are interested in finding the first null character between *str* and the maximum address value *addr _{max}* in the memory address space (recall that

*addr*1=0).

_{max}(iii) Specification through Equivalence to a Program in the Same Syntax

```
size_t strlenNaive(char const* s) {
char const* p = s;
for (p = s; *p; p++);
return p - s;
}
```

In this method of specification, both **strlenNaive** and **strlen** are specified in the same syntax (the syntax of the C programming language). However, the specification program (**strlenNaive**) is significantly simpler and smaller than the implementation (**strlen**).

Axioms | Proof System |

Logical encoding of the C programming language semantics | Quantifier-free logic with bitvector and array theories. Bisimulation Proof Theory. |

(iv) Specification through Equivalence to a Program in a Different Syntax

```
Inductive string : Set :=
| EmptyString : string
| String : ascii -> string -> string
Fixpoint strlenCoq (strCoq : string) : nat :=
match s with
| EmptyString => 0
| String c s' => S (length s')
end.
```

Here, a different syntax (Coq) is used for the specification program. Using a higher-level Coq syntax has the benefit that a higher level syntax does not allow several unsafe behaviours, such as memory errors, that are possible in the C syntax (discussed in more detail in Adam Chlipala’s blog post). On the other hand, proofs with respect to the Coq specification now require the prover to relate the data structures between two different representations — e.g., *str* in C is a character array that needs to be related to *strCoq* that is a user-specified abstract data type.

Axioms | Proof System |

Logical encoding of the C and Coq programming language semantics | Quantifier-free logic with bitvector and array theories. CoInduction and Bisimulation. Support for relational predicates across data structures in C and Coq |

## 3. Correctness of a Proof

A proof, or a verification process, is incorrect if it concludes that a theorem is provably valid even when it was not.

Irrespective of the method of specification and proof, the verification process can be incorrect for one of the following reasons:

**Incoherent Proof Steps**: If any step in the proof does not follow from the existing theorems/axioms/proof rules, then we have an incorrect proof. The common solution to guard against this possibility is to have a mechanical*proof checker*, codified as a computer program, that can ascertain that all the steps in your proof are valid.

However, if the proof checker has bugs or oversights, then you may admit an incorrect proof.**Incorrect Axioms**: If the axioms were incorrect, then this can corrupt the entire proof effort. In the context of verification of computer systems, the semantics of a programming language form the set of axioms for example. Further, the*specification*itself is a part of the axioms, e.g., if the simpler**strlenNaive**function has a bug, then we would admit the same bug in the**strlen**implementation as well.*It is important for the axioms to be either small (so that they are manually verifiable) or rigorously tested*. For example, the**strlenNaive**function is small enough to develop confidence through manual inspection alone. On the other hand, the C language semantics are much larger and hard to read manually to develop confidence. Fortunately, C language semantics involve a single codebase that is used for verification and testing across tens of thousands of codebases, and so such wide testing and production use provides confidence in its correctness. For example, it is possible to test the logical encodings of the C semantics by checking that the outputs of the logical encodings are identical to the outputs produced through C compilations on a physical computer.**Incorrect Proof Rules**: If any of the proof rules is unsound (buggy), then the proof steps can become unsound, which in turn results in an incorrect proof.

In our example, most proof rules are based on standard quantifier-free or first-order logic for bitvectors and arrays. These proof rules have undergone several code reviews, extensive testing, and are in production use by thousands of users — and so we can be reasonably confident of their correctness.

There is a slightly higher likelihood for the proof rules to be unsound for custom proof rules added for custom axiomatic operators, such as in our example. However, even so, these custom rules are typically much simpler than the implementations that they verify (because they do not need to be optimized) and further they have wider use and testing (e.g., could be used for several search-based algorithms, and is not limited to*strlen*).

## 4. So what does Formal Verification buy us?

We have already discussed that the proof can be incorrect. If you have read this interesting blog post on the Piano Test by John Regehr, you now know that you do not want to be sitting under that piano.

Then, *what does it mean for a software program to be formally verified*, if it does not guarantee the absence of bugs?

The value of a formal verification effort is that it **reduces your Trusted Computing Base/Probability of a Bug**, or **TCB/PB**. In our *strlen* example, if the implementation has been formally verified (using any of the four specification methods), then we know that for there to be a bug in the implementation, there must be a bug either in the specification (which is much smaller than the implementation) or there must be a bug in the logical encoding of the C programming language semantics (which have been heavily tested) or there must be a bug in the proof rules (also heavily tested) or there must be a bug in the proof checker (also heavily tested). The formal verification process adds value if the chances of bugs in any of these subsystems is lower than the chance of a bug in your original implementation.

It is worth emphasizing that a “reduction” in the TCB is not necessarily a quantifiable reduction, e.g., it often cannot be measured by SLOC (source lines of code) alone. For some components of the proof, such as the specification, it makes sense to quantify the TCB reduction through SLOC. But for other components, such as axioms, proof rules, and proof checkers, we are not relying on SLOC reduction but on the fact that these are infrastructural components that have been heavily tested and are often in heavy production use. Thus TCB reduction is not enough to capture the value of the verification effort, and so we also need to quantify the reduction in the bug probability (PB).

Further, even if any of these infrastructural verification components have a corner-case bug, it is very unlikely that the bug manifests itself in a way that it causes an incorrect proof to go through. For example, it is much more likely for my *strlen* implementation to have a memory error, than the possibility of a* *memory error in the proof checker that causes the verification of *strlen* to incorrectly succeed. At worst, a corner-case bug in the verification tools would cause the verification process to end abruptly, not cause a wrong proof to go through (with a very high probability).

## 5. Applications of Formal Verification

In practice, formal verification is very useful because it is able to catch several corner-case bugs. For **mission-critical applications**, such as those found in defence, aerospace, and automobile sectors, the formal verification of software reduces the probability of errors by orders of magnitude and **prevents catastrophes** (e.g., Boeing).

The speed at which implementations evolve is significantly faster than the pace at which the verification infrastructure needs to be modified/adapted — and so the **presence of a verification harness provides the much needed confidence-of-correctness **even in the presence of continuous code modifications. Such verification infrastructure allows organizations to **save testing/QA costs**.

**Security properties **are usually better suited for verification than testing. Researchers have found serious bugs in rigorously tested compilers and libraries — most such **corner-case bugs are very difficult to catch through testing and much easier to catch through verification tools**. If you look through our bugs-found page for examples of bugs found by our verification tool, you will find that the most common bugs involve corner cases like pointer aliasing, buffer/integer overflows, overflows during type-conversions, etc. Verification tools have unique advantages over traditional testing methods in identifying such hidden errors and security loopholes.

**Machine-readable proofs serve as executable documents** and help an engineering team in the development and maintenance of their software over time.

It is my humble prediction that it is only a matter of time before verified software becomes the norm.

Regarding the code snippet shown at the beginning of section 2 “Verification of software systems”,

Should the variable longword_ptr be incremented after the first line of the for loop?

Otherwise, it seems to me that, the code inside the for loop is executed on the same data in each iteration.

Indeed, I had made an omission while copying the code by hand. A post-increment was missing and I have added it now. Thank you for reading the code so carefully and for pointing out the error!

Regarding subsection ii “Specification in Quantifier-Free Logic With the a Custom Axiomatic Operator” of section 2 “Verification of software systems”,

won’t the wrap around semantics of the special addition operator make the provided axioms inconsistent?

As an example, using the rule “Found Previously”, we could simply wraparound addr_e back to addr_b and then the value of FirstOccurrence is potentially non-negative. However, according to the rule “Base Case”, the value of FirstOccurrence when addr_b and addr_e are equal, must always be -1.

This sounds correct — thank you for pointing this out! I think the rules would become consistent if I add “addr_e + 1 != addr_b” (+ is bitvector addition) as one of the premises to each of the “Just Found”, “Not Found Yet”, and “Previously Found” rules. Would you agree?

I agree. It looks consistent to me.

There is a corner case when str is 0. FirstOccurence(Heap, str = 0, 0, …) = -1 using the BaseCase rule. Granted, str is generally not 0 in practice since zero is a special value nullptr, so this shouldn’t be an issue.