SAMATE Reference Dataset (SRD) Manual
The purpose of the reference dataset is to provide researchers, software assurance (SA) tool developers, and end users with a database of known software security errors and fixes for them. Those security errors are presented as problem samples in design, source code, binaries, etc., from all phases of the software lifecycle. The samples will include "synthetic" samples (written to test), samples collected from "the wild" (production), and academic samples (from university researchers). The database will also contain real software applications with known bugs and vulnerabilities. This will allow developers to test their methods and end users to evaluate a tool when considering it. The dataset is intended to encompass a wide variety of possible vulnerabilities, languages, platforms, and compilers. It is anticipated to become a large-scale effort, gathering samples from any contributor. The creation of such a reference dataset is a major enabling goal of the NIST SAMATE Project. There is more information about the ideas behind the reference dataset in the SAMATE Reference Dataset philosophy page. To access the database itself, visit http://samate.nist.gov/SRD/.
Eligibility of a test case
Any software artifact with security vulnerabilities may be submitted. Samples that avoid or mitigate such vulnerabilities are also welcome. Although we intend to cover security errors from the whole software lifecycle, the dataset concentrates on source code for now.
Submit Test Cases
A test case consists of one or more files, which manifest the security error, and metadata about the file(s), such as the weakness type, language, etc. Contact us to submit test cases.
View, Search, and Download Test Cases
Any user can view, search, or download test cases. The view/download screen and its subsequent screens present all the test cases in the SRD. You can download selected test cases on a page, all the test cases on a page, or the entire SRD.
Clicking a Test Case ID displays that test case.
You can search for test cases by criteria such as test case ID, test case description, language, weakness type, or a string in the test file. As above, you can download selected test cases from the set found, all test cases found, or all the test cases on a page.
Source Code Obfuscation
In order to provide variation of source code, we added a simple obfuscation tool. If the user selects "Obfuscate" on a test case listing page, then clicks one of the download buttons, the downloaded test cases will be obfuscated.
The obfuscation tool preserves the weaknesses (or lack thereof) found in the original code. Also, if a source file can be compiled, then the obfuscated source file can be compiled (but see limitations below).
The obfuscation tool:
- Replaces the names of structures, classes, enumerated types and namespaces with random strings.
- Replaces source file names with random strings.
- Removes comments.
- Does not replace file names in Makefiles, shell scripts, SQL files, etc.
- Works only with C, C++, and Java source code, and obfuscates only files with known extensions (c, cc, cpp, c++, cxx, h, hh, hpp, h++, hxx, java).
- Sometimes replaces the contents of strings.
- Does not properly handle cases where a variable has the same name as an enum or struct.
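The behavior listed above can be illustrated with a deliberately naive sketch. This is not the SRD obfuscation tool: the function name `obfuscate_c_source`, the helper `random_name`, and the regular expressions are assumptions chosen for illustration. The sketch removes comments and renames struct, class, enum, and namespace names with a plain whole-word substitution, which is enough to reproduce the two limitations noted in the list: string contents and same-named variables are rewritten too.

```python
import random
import re
import string

# Hypothetical sketch, not the SRD tool: all names and regexes are assumptions.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
TYPE_NAME_RE = re.compile(
    r"\b(?:struct|class|enum|namespace)\s+(?!(?:struct|class)\b)([A-Za-z_]\w*)"
)


def random_name(rng, length=8):
    """Return a random identifier made of ASCII letters."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))


def obfuscate_c_source(source, seed=None):
    """Strip comments and rename type names; return (code, mapping)."""
    rng = random.Random(seed)
    # Naive comment removal: comment markers inside string literals
    # (for example "http://...") are not protected.
    code = COMMENT_RE.sub("", source)

    # Collect every name that follows struct/class/enum/namespace. Uses are
    # not distinguished from declarations, so library names can be caught too.
    mapping = {}
    for name in TYPE_NAME_RE.findall(code):
        if name not in mapping:
            mapping[name] = random_name(rng)

    if mapping:
        # Single-pass whole-word replacement. It also rewrites string
        # literals and variables that share a type's name.
        pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, mapping)))
        code = pattern.sub(lambda m: mapping[m.group(0)], code)
    return code, mapping
```

Given the input `struct node *node; char *s = "node";`, this sketch renames the type, the variable, and the string contents alike, because it works on text rather than on a parsed program.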
Anyone can view and download test suites.
- Test Suite on the navigation bar leads to a screen where you can display and download test suites.
- Clicking a cell in the Test Suite column of the view/download or search/download screen prompts you to create a test suite. After a test suite has been created, the signed-in user can create more test suites via the Test Suite tab on the navigation bar.
- After a test suite is created, the registered user can modify or delete it via Test Suite on the navigation bar.
When a test case is first added, its status is "Candidate". After review, the status may change to "Approved". If a test case needs to be withdrawn, it is marked "Deprecated". It remains available for historical purposes, but should not be used in any new work. See Test Case Status - What it Means for details of the review process.
Comments are remarks, observations, annotations, or clarifications about a test case. Anyone can view the comments on the screen for a selected test case. Registered users are welcome to comment on any test case.
Only registered users can submit test cases or create test suites. The registration process is simple: send us some basic information, such as name and email address. Upon our approval, a registered user has the following privileges:
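The status lifecycle described above can be summarized as a small state machine. The sketch below is only an illustration: the `Status` enum, the `ALLOWED` table, and the two functions are hypothetical names, and the assumption that both candidate and approved test cases can be deprecated (and that "Deprecated" is final) is not a documented SRD rule.

```python
from enum import Enum


class Status(Enum):
    CANDIDATE = "Candidate"
    APPROVED = "Approved"
    DEPRECATED = "Deprecated"


# Assumed transitions: review approves a candidate; a candidate or an
# approved test case can be withdrawn. Deprecated is treated as final.
ALLOWED = {
    Status.CANDIDATE: {Status.APPROVED, Status.DEPRECATED},
    Status.APPROVED: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}


def change_status(current, new):
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def usable_in_new_work(status):
    """Deprecated test cases stay available but should not be used."""
    return status is not Status.DEPRECATED
```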
- Submit test case
- Add comments to any test case
- Modify the metadata of a test case that he/she submitted
- Create a test suite
- Modify or delete a test suite that he/she created
A registered user can display or modify their profile by clicking profile on the top menu bar, which displays the profile of the signed-in user. Click a specific field to change it. On the profile screen, the signed-in user can also list and search the test cases they entered.
Modify Test Case Metadata
The contributor can modify the metadata of a test case they submitted. Click the field to be changed on the test case screen and enter the updated information. The source code of a test case, however, cannot be modified.
The SRD is continuing to evolve. Suggested near-term enhancements include:
These are major subsystems to be added, which will require many changes.
- Enhance SRD architecture to support "complex" test cases, for example,
- If test cases use a large infrastructure (like a web app scanner), they could share it.
- A single, large case may have many pieces that are updated from time to time. The best we could do now is (a) one huge test case, which is deprecated whenever a piece changes, or (b) a test suite composed of test cases, which the user would have to know to collect and combine in order to use them.
- Support Easy Running of Downloaded Test Cases
- Indicate which file(s) is the argument to the test script
- Add "target file(s)" field to test case metadata
- Indicate the expected result (such as good or bad, and where)
- Populate the Expected Output field in the metadata
- As above, have download collect this info in some easily (mechanically) accessible way.
- The sample tool script should have an option to do all the downloaded target files at once, that is, in one tool invocation.
- Present all downloaded test cases as one big program. That is, there is one main().
- "Compile" instructions would have to indicate each subdirectory, or everything would have to be put in one directory.
- This could be an option. With this, a source code analyzer would only have to be started once, which may save considerable time.
- Each test case would have to be a procedure, which could be called by a (synthesized) main().
- This only applies to source code artifacts, not binaries, designs, etc.
- Link from contributor
- In a test case page, e.g., http://samate.nist.gov/SRD/?7, link from Contributor to the SRD Acknowledgements.
- Add ability to search test cases by:
- test case id (ranges)
- file size
- Full-text search of description - natural language search, Boolean operators.
- other metadata
- other code metrics? LOC (lines of code)? Cyclomatic complexity? Keywords?
- Optional case-sensitive search
- Allow Boolean connections between search criteria
- Users should be able to download information about all the test suites. (7 May 2007)
- They can download all test cases, with a manifest, but not the test suites. That manifest, or another manifest, could include all test suites and their information.
- Visit statistics for each web page (available through Special pages, Popular pages) and each test case
- This is partially done. There are statistics about visitors and pages, but not really about test cases.
- When we can search by contributor, enhance the SRD Acknowledgements page with links to search and show the test cases of the contributor.
- Add a button to select (or download) all test cases in a search or display set, so that a user can search and then easily download them all.
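The "one big program" enhancement listed above, in which each test case becomes a procedure called from a synthesized main(), could be generated mechanically. The following sketch assumes, hypothetically, that each downloaded test case exposes an entry point named `testcase_<id>` that takes no arguments. Neither that naming convention nor the `synthesize_main` helper exists in the SRD.

```python
def synthesize_main(test_case_ids):
    """Return C source for a main() that calls one procedure per test case."""
    # Assumed convention: test case 7 provides `void testcase_7(void)`.
    names = [f"testcase_{int(tc_id)}" for tc_id in test_case_ids]
    lines = ["/* Synthesized driver: hypothetical sketch, not SRD output. */"]
    # Forward-declare every entry point so the driver compiles on its own.
    lines += [f"void {name}(void);" for name in names]
    lines += ["", "int main(void)", "{"]
    # Call the test cases in the order they were given.
    lines += [f"    {name}();" for name in names]
    lines += ["    return 0;", "}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(synthesize_main([7, 12]))
```

With a driver like this, a source code analyzer would be started once on the combined program rather than once per test case, which is the time saving the enhancement is after.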
- It's possible to upload the same file twice for a given test case.
- Refreshing the screen after submitting a test case may submit the test case again with a different test case ID.
- Due to POST variables... don't know yet what to do (Romain)
- The registration process should use a randomly generated image (e.g., letters and numbers to be typed by the user, or a CAPTCHA) to verify human interaction. The risk is that someone uses a script to register hundreds of users.
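One common way to address the resubmission-on-refresh problem noted above is to embed a one-time token in the submission form and accept each token only once. The sketch below is a suggestion under assumptions, not a description of the SRD code: the `SubmissionGuard` class and its methods are hypothetical, and an in-memory set stands in for whatever session or database storage the site would actually use.

```python
import secrets


class SubmissionGuard:
    """Accept each issued form token at most once (hypothetical helper)."""

    def __init__(self):
        # Tokens that have been issued but not yet used.
        self._pending = set()

    def issue_token(self):
        """Create a token to embed in the form as a hidden field."""
        token = secrets.token_hex(16)
        self._pending.add(token)
        return token

    def accept(self, token):
        """Return True the first time a valid token is posted, else False."""
        if token in self._pending:
            self._pending.remove(token)
            return True
        return False
```

A refresh replays the same POST with the same token, so the second `accept` call returns False and no second test case ID needs to be allocated. Redirecting to the new test case page after a successful POST is a complementary option.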
These are internal to the SRD. They are not visible to users.
- Use a source code control system. Let SAMATE members make small changes by themselves.
- Document the tests (for regression testing)