The SAMATE Reference Dataset and Target Practice Test Suite


   The SAMATE Reference Dataset (SRD) is a rapidly growing set of contributed test cases for measuring software assurance (SwA) tool capability against a functional specification for that tool.

This initial distribution is a compilation of C source code test cases that will be used for evaluating the functional capability of C source code scanning tools. Contributions from MIT Lincoln Lab and Fortify Software Inc. make up this initial set. Additional contributions from Klocwork Inc. and Ounce Labs Inc. will be added soon.

We expect to expand the SRD to include other languages (e.g. C++, Java) as well as test suites for other SwA tools (e.g. tools that analyze requirements and software design documents).

MIT Contribution

   Documentation for each test case is contained in the source files themselves. In the MIT contribution, the first line of each test case carries a classification code describing the test case “signature” in terms of code complexity. All of the MIT discrete test cases are buffer overflow examples, each permuting some of the 22 coding variation factors to challenge a tool's ability to discover the overflow or to recognize a patched version of it. MIT also contributed 14 models (scaled-down versions) of three real-world applications: bind, sendmail, and wu-ftpd.
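   As a rough illustration of this flawed/patched pairing, the sketch below shows a minimal buffer overflow test case of the same general kind. It is hypothetical: the classification comment and function names are illustrative only and are not drawn from the actual MIT test cases or metadata scheme.

       /* Illustrative sketch only; the classification comment and names below
          are hypothetical, not taken from the MIT contribution. */
       #include <string.h>

       /* "Bad" version: the unbounded copy can overrun the 10-byte buffer. */
       void copy_name_bad(const char *input)
       {
           char name[10];
           strcpy(name, input);                     /* overflow if input > 9 chars */
       }

       /* "Good" (patched) version: the copy is bounded and explicitly terminated. */
       void copy_name_good(const char *input)
       {
           char name[10];
           strncpy(name, input, sizeof(name) - 1);
           name[sizeof(name) - 1] = '\0';
       }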

Fortify Software Test Case Contribution

   Fortify Software has contributed C code test cases, the majority of which are also buffer overflow vulnerabilities; a number of race condition, command injection, and other vulnerabilities are included as well. Like the MIT test cases, the Fortify test cases are “self documenting”, with keywords describing the type of software flaw present in the code. To provide a uniform way of classifying the complexity of the test cases, the MIT classification code is also placed at the top of each test file.
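   For readers less familiar with the non-overflow categories, the sketch below illustrates a command injection flaw and one possible patch, in the same spirit as such test cases. It is hypothetical and is not taken from the Fortify contribution itself.

       /* Illustrative sketch only; function names are hypothetical. */
       #include <stdio.h>
       #include <stdlib.h>
       #include <unistd.h>

       /* Flawed: untrusted input is interpolated into a shell command line. */
       void list_directory_bad(const char *dir)
       {
           char cmd[256];
           snprintf(cmd, sizeof(cmd), "ls %s", dir);   /* dir may contain "; rm -rf ." */
           system(cmd);                                /* the shell interprets dir */
       }

       /* Patched: the argument is passed directly to the program, never to a shell.
          (execlp replaces the current process image; a real program would fork first.) */
       void list_directory_good(const char *dir)
       {
           execlp("ls", "ls", dir, (char *)NULL);
       }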

Klocwork Test Case Contribution

   Klocwork Inc. has donated an initial contribution of C++ test cases, the majority of which are memory management related (e.g. memory leaks, bad frees, and use-after-frees). They intend to follow up with an additional donation of Java test cases.
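   As a sketch of two of the flaw classes named above: the Klocwork contribution itself is C++, but the hypothetical examples below use plain C allocation calls for consistency with the rest of this page, and the function names are illustrative only.

       /* Illustrative sketch only; not taken from the Klocwork contribution. */
       #include <stdlib.h>
       #include <string.h>

       /* Use-after-free: the buffer is written after it has been released. */
       void use_after_free_bad(void)
       {
           char *buf = malloc(32);
           if (buf == NULL)
               return;
           free(buf);
           memset(buf, 0, 32);        /* flaw: buf no longer owns this memory */
       }

       /* Memory leak: the allocation is never released on the early-return path. */
       int leak_bad(int fail)
       {
           char *buf = malloc(32);
           if (buf == NULL)
               return -1;
           if (fail)
               return -1;             /* flaw: buf leaks here */
           free(buf);
           return 0;
       }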

Target Practice Test Suite - [View the files] - [Download the files (zip)]

   A subset of both the MIT test cases (152 discrete test cases and 3 models) and the Fortify test cases (12) makes up the “target practice” test suite. This representative group of well-understood and documented tests is presented as a “starting point” for gathering initial feedback from tool developers and users on how useful the test suite is. Both a “bad” (flawed) and a “good” (patched) version exist for each test case.

  • Test Suite Execution - It is expected that each tool developer/user will run their tool against the target practice test suite before attending the workshop on Tuesday, so as to provide maximum time for discussion of the merits/deficiencies in the test suite. Tests are provided in two separate directories (MIT and Fortify). How a tool scans the test suite is at the discretion of the tool implementer/user.
  • Test Suite Evaluation - After running their tool on the Target Practice test suite, tool developers/users will be asked to fill out a questionnaire regarding the usefulness of the test suite in the following areas:
    • Validity of the tests (Do test cases reflect real world examples?)
    • Test case coverage (What software flaws should we focus on initially?)
    • Complexity (Were the tests challenging/enlightening for discovering a tool's capability?)
    • Sufficient metadata for describing test case flaws and code complexity (e.g. MIT's metadata scheme - do we need more? If so, what?)

Confidentiality of Test Results - At no time is a tool developer required to report anything about their tool's performance against the Target Practice test suite. The purpose of the target practice is to solicit feedback on the SRD, not on the tools that run against it. If a tool developer wishes to provide further insight into the usefulness of the SRD by disclosing how their tool performed against it, they may do so at their own discretion.



Agenda for the Target Practice:

9:00 AM - 11:30 AM - Discussion of Test Results and Reference Dataset by target practice participants and workshop attendees:

  • 9:00 - 10:30 - Usefulness of test cases:
    • Validity
      • Do test cases reflect real world examples?
    • Coverage
      • Where (what flaws) should we focus on initially?
    • Complexity
      • What levels of code complexity are necessary to properly evaluate a tool's capability?
    • Variation
      • Should variation be expressed in the taxonomy of flaws, or in the test case itself?

  • 10:30 - 11:00 - Test Case Metadata:
    • Classification of software flaws in test cases
      • What common taxonomy to use for all code scanning tools? (Plover, CLASP, Fortify, Klocwork)
      • How can all the taxonomies be harmonized?
      • Correct metadata for describing test case complexity (e.g. MIT's metadata scheme - do we need more? If so, what?)

  • 11:00 - 11:20 - Requirements for an “easy to use” Reference Dataset:
    • Security
    • Web Accessibility?
    • Ad Hoc Query Capability
    • Validatable Submission
    • Batch Submission (1000s of Test Cases)
    • Dynamic Test Case Generation
    • Access Control
    • Demo NIST Prototype SRD

  • 11:20 - 11:30 - Next Steps:
    • Harmonize ideas for a common taxonomy of SwA flaws and vulnerabilities
    • Test Case Submission by SwA community
      • Fortify
      • Klocwork
      • Ounce Labs
      • MIT
      • Other

  • 11:30 - Lunch

Disclaimer: Any commercial product mentioned is for information only; it does not imply recommendation or endorsement by NIST nor does it imply that the products mentioned are necessarily the best available for the purpose.