A Rodeo for Production Software Verification Tools that are based on Formal Methods

This is currently a proposal.

The goal is to motivate progess by
measuring the efficacy of tools using
formal methods
algorithm and techniques
for software verification
of real-world software.

We will focus on tools that handle entire pieces of software (of production size and complexity), not just particular approaches or problems.

Our approach is to

Connect with the community of tool developers and tool users,
Develop tools to measure code complexity quantities of production software, using them to
Select a suite of tests (and accompanying specs),
Run tools against them in a competition or rodeo, and
Assess findings to quantify improvement and identify research opportunities.
This requires establishing a software laboratory.

We presented the Rodeo proposal at TOOLympics at TACAS'19 in Prague, April 2019.

White Paper: What Formal Methods Can Help Assure Statistical Software?

A simplistic pseudorandom number generator may offer sufficient randomness for a video game, but may not be sufficient for valid Monte Carlo simulation. Such problems are hard to find through traditional testing. Formal methods can complement testing to gain greater assurance that critical portions of programs are correct.

The purpose of this part of the Rodeo is to determine which formal methods, tools, and techniques can be used now (or have the potential in the near future) to gain higher assurance of statistical software with reasonable resources.

Our plan is to

Contact key researchers and institutions acquainted with the purpose and architecture of statistical software. NIST will develop a list of current challenges facing those who are developing statistical software, especially errors that are difficult or impossible to detect with current methods.
Conduct a preliminary survey of types of existing or prototype tools based on formal methods, proofs of concepts, or building blocks. The survey will include their capabilities.
Hold a one-day workshop to get additional input. It will be co-located with High Confidence Software and Systems Conference (HCSS) Conference in Annapolis, Maryland, USA, on Wednesday, 1 May, 2019. (Jonas Green room.)
Deliver a report containing the survey, a summary of key research and researchers, workshop participant input, a sketch of current state of the art, and some ideas for progress.

This is part of the Rodeo is funded. We will deliver the report no later than the end of fiscal year 2019. We plan to deliver it much sooner.

Details of the Rodeo

Complexity Measurement Tool

It is difficult to objectively measure how easy or hard the verification of a piece of software will be. We propose to

Identify quantities embodied in test suites of SV-COMP, RERS , VerifyThis, etc. Quantities may be depth of recursion or number of states.
Build a tool to measure those quantities.
Calibrate the tool against those test suites.

The result is an automated tool capable of objectively measuring properties that are relevant to the use of formal methods.

This tool complements automated relevant measures such as CISQ's code quality standards.

Test Suite

We want the test suite to be realistic, scalable software and specs with a variety of properties to “prove.”

Criteria

Computational or analytical software
- Not embedded, distributed, real time, etc.
Open source; good quality; widely used
- We may take production code as a basis to synthesize code or mix in generated code.
Some software not in C
Prefer software that already has formal specs

To help select the production software, we will use the complexity measurment tool above to scan thousands of applications.

The test suite can effectively be scaled by selecting other software of the proper complexity.

Verification and Proof Validation

To verify software, the analyst needs to be an expert in using tools to verify or have really smart tools. In contrast, to gain assurance, the analyst can validate witnesses of correctness (proof hints). Witnesses may be supplied by the software developer or a third-party. Tools to validate proofs are far more automatic. That is, they need far less expertise to run.

Accordingly, we will separate the task of verification into two parts: coming up with a proof and validating that proof. See the image. This allows the software user to have high assurance without having to be an expert in verification or relying on upstream parties with no way to check.

A Software Laboratory

Like a research chemist or physcist needs a stocked laboratory with skilled staff to assemble, calibrate, and run equipment, we need a software laboratory.

It will consist of programs and tools to manipulate, transform, and analyze software . . . and the in-house expertise to use them.

We need this to quickly get an idea of the potential of new techniques and approaches. Also to confirm (or refute) hypothesis, e.g., would we detect this pattern if loops were unrolled or spread across function invocations.

The following types of tools could be in the laboratory:

Semantic and structural search tools
Transformers capable of obfuscation and other changes
Tools to refactor and combine common code
Inlining and macro expansion tools to do the inverse
Program mutators or bug injectors
Code generators to produce many different programs with specified characteristics
Various compilers and interpreters (many in existence)

Examples of tools are LLVM, ROSE, Txl, Frama-C, DMS, and Bandera.

Please direct suggestions, comments, or questions to
Paul Black
paul.black@nist.gov

The FMSwVRodeo web site was created Fri Nov 9 10:42:53 2018

by Paul E. Black (paul.black@nist.gov)

Updated Fri May 3 09:07:15 2019

by Paul E. Black (paul.black@nist.gov)

This page's URL is https://samate.nist.gov/FMSwVRodeo/.