U.S. Census Bureau
National Institute of Standards and Technology (NIST)
Workshop on
Formal Methods for Statistical Software
FMfSS
Wednesday, 1 May 2019 Jonas Green room co-located with High
Confidence Software |
Overview
Statistical software and related data analysis pose different challenges than typical software. A simplistic pseudorandom number generator may offer sufficient randomness for a video game, but be inadequate for valid Monte Carlo simulation or other stochastic algorithms. Likewise, erroneous implementations of mathematical functions may generate results that look correct, but which fail in unpredictable ways—examples of which include both errors in floating point division implementations and bugs in differential privacy implementations that leak confidential data into supposedly “privatized” outputs. Such errors are typically missed by traditional testing.
Formal methods can complement testing to gain greater assurance that critical portions of programs are correct and that results are valid. Although there is a vast expanse of formal methods, tools, and techniques, they are rarely applied to such software.
The purpose of this workshop is to identify formal methods, tools, and techniques can be used now (or have the potential in the near future) to gain higher assurance of statistical software and results with reasonable resources.
We call for position statements from one to three paragraph long. Position statements may be on topics like the following:
- types of errors, bugs, and failures in statistical software or its use.
- formal methods that can assure statistical software or its results.
- correct-by-construction approaches.
- lists or definitions of important properties, such as convergence and differential privacy.
- gaps in current capabilities and specific research needed.
- higher-level, non-procedural languages or tools so a statistician can specify the desired analysis instead of how to achieve it.
- approaches to gain assurance in statistical software and its use.
We will publish a workshop report, including selected position statements, in autumn.
The output of this workshop informed a white paper on formal methods for statistical software and also the Rodeo for Production Software Verification Tools based on Formal Methods. The paper is "Formal Methods for Statistical Software", October 2019, NIST Interagency Report (IR) 8274, Paul E. Black, DOI 10.6028/NIST.IR.8274.
Scope
We use the term "statistical software" to mean software that has particular features or characteristics that challenge typical means of software assurance. These features include
- stochastic algorithms, such as Monte Carlo simulations, and
- software to effect differential privacy, where any one query must be handled correctly but a series of queries may be blocked if they would violate privacy.
This workshop also deals with the proper use or selection of algorithms. A result may also be invalid if a computation does not converge or an off-by-one mistake puts data into 99 bins instead of an intended 100.
Formal methods are an analytical approach relying on mathematical and logical reason to establish the applicability of properties. Testing demonstrates results in a few discrete circumstances. However, like proofs of trigonometric identities, formal methods employ rules of inference, along with assumptions and models, to argue for assurance in all cases, at least potentially. Formal methods range from basic and automated, like LALR parsing or type checking, to sophisticated and undecidable, like program correctness.
Submissions
Submissions are now closed.
Important Dates
2 March: deadline to submit a position statement1 April: presentation invitations sent1 May: Workshop
Registration
The workshop was at Historic Inns in Annapolis, Maryland.
Program
0900 Welcome and definitions, Paul E. Black, NIST
Individual introductions and expectations, all participants
Keeling and Pavur, "Statistical Accuracy of Spreadsheet Software"
1005 A Trustworthy Mechanized Formalization of R, Martin Bodin, Imperial College, m.bodin@imperial.ac.uk
1030 break
1100 Formalizing Statistical Computation in HOL4, Jared Yeager, University of Massachusetts Amherst, jyeager@cs.umass.edu
1120 Trustworthy Data Wrangling and Analysis, Eric Davis
1140 The Challenges of Using SAST for Validation and Verification, Arthur Hicken, Parasoft, arthur.hicken@parasoft.com
1200 lunch
1330 Intro to Differential Privacy, Simson L. Garfinkel, U.S. Census Bureau
1340 Assuring Real-World Differential Privacy, José Calderón, Galois, jmct@galois.com
1405 Detecting Violations of Differential Privacy, Zeyu Ding, Penn State, dxd437@psu.edu
1430 Discussion: Future of Formal Methods in Differential Privacy, Simson L. Garfinkel, U.S. Census Bureau
1500 brief break
1515 A Formal Methods Tools Rodeo, Paul E. Black, NIST
1530 Static Analysis Challenges: Code Coverage, Warning Classes, and Types of Checkers, Arthur Hicken, Parasoft, arthur.hicken@parasoft.com
1555 Formal Methods in Statistical Work, Paul E. Black, NIST
1600 Workshop ends
Other Material
People mentioned other resources. We give them here.
Filip Křikava and Jan Vitek, Tests from Traces: Automated Unit Test Extraction for R, Proc 27th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2018), pp 232-241. DOI: 10.1145/3213846.3213863. br> The paper presents "Genthat, a tool ... to non-invasively record execution traces of R programs and extract unit tests from those traces." br> Link includes a docker image showing "how the genthat tool ... works and [how it] can be used for extracting code from R packages, and reproduce the experiment used to evaluate it."
Pavel Panchekha, Alex Sachez-Stern, James R. Wilcox, and Zachary Tatlock, Automatically Improving Accuracy for Floating Point Expressions, 36th annual ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI 2015). br> The paper presents "Herbie, a tool which automatically discovers the rewrites experts perform to improve accuracy." That is, "manually rearranging expressions and understanding the finer details of floating point arithmetic" "to mitigate rounding error". br> Link includes the paper, a talk, and the slides.
Co-Chairs
Paul E. Black
(NIST)
paul.black@nist.gov
Simson L. Garfinkel
(U.S. Census Bureau)
simson.l.garfinkel@census.gov
Program Committee
Jim Alves-Foss | (U Idaho) |
Rance Cleaveland | (NSF) |
Alan Heckert | (NIST) |
Martin Klein | (U.S. Census Bureau) |
Rick Kuhn | (NIST) |
Stephen Magill | (Galois) |
Edward "Ned" Porter | (U.S. Census Bureau) |
John Henry Scott | (NIST) |
Ray Richards | (DARPA) |
Eric Smith | (Kestrel Institute) |
Matthew Wilding | (Collins Aerospace) |
Huan Xu | (U Maryland) |
This page's URL is https://samate.nist.gov/FMSwVRodeo/FMfSS2019.html.