Engineering and Public Policy: Data Management: Documentation

Metadata

Metadata is data that provides descriptive information (content, context, quality, structure, and accessibility) about a data product and enables others to search for and use that product. In a lab setting, much of the information used to describe data is first recorded in a notebook; metadata is a more formal, shareable expression of that information. It can include contact information, geographic locations, units of measure, abbreviations or codes used in the dataset, instrument and protocol information, survey tool details, provenance and version information, and much more. Where no appropriate formal metadata standard exists, writing “readme” style metadata is an appropriate strategy for internal use.
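As a rough illustration of “readme” style metadata, the sketch below renders a plain-text README from a small dictionary. The field names and every value are hypothetical placeholders chosen to mirror the kinds of content listed above; they do not come from any formal metadata standard.

```python
# Minimal sketch: render "readme" style metadata as plain text.
# All field names and values below are hypothetical placeholders.

def render_readme(metadata: dict) -> str:
    """Return one 'Field: value' line per entry; nested dicts are indented."""
    lines = []
    for field, value in metadata.items():
        if isinstance(value, dict):
            # Nested entries (e.g. units per variable) go on indented lines.
            lines.append(f"{field}:")
            for key, description in value.items():
                lines.append(f"  {key}: {description}")
        else:
            lines.append(f"{field}: {value}")
    return "\n".join(lines) + "\n"


example_metadata = {
    "Title": "Example air quality survey",
    "Contact": "Jane Doe <jane.doe@example.org>",
    "Geographic location": "Example County",
    "Units of measure": {
        "pm25": "micrograms per cubic meter",
        "temp": "degrees Celsius",
    },
    "Codes used": {"NA": "value not recorded"},
    "Instrument": "Example particulate sensor, model X",
    "Version": "1.0",
}

readme_text = render_readme(example_metadata)
print(readme_text)
```

Saving the result as a plain-text file next to the data keeps the description readable without any special software.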

The Digital Curation Centre provides a catalog of common metadata standards, organized by discipline: http://www.dcc.ac.uk/resources/metadata-standards.

Some specific examples of metadata standards, both general and domain-specific, are:

Data Documentation

Make sure all data generated or collected is easy to understand and analyze. Someone looking at the data in 20 years should be able to understand what was done and why. Use stable, non-proprietary software. Documentation may include, but is not limited to:

  • lab notebooks
  • methodology reports
  • codebooks with full variable and value labels
  • documentation of decisions about software
  • records of changes between versions of the dataset
  • records of assumptions made during analysis
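One way to keep a codebook with variable and value labels in a non-proprietary format is a small CSV file. The sketch below is only an illustration: the variable names, labels, and codes are hypothetical, and the column layout is an assumption rather than a prescribed standard.

```python
import csv
import io

# Hypothetical codebook: one entry per variable, with a label, a unit,
# and value labels for coded variables.
codebook = [
    {"variable": "resp_id", "label": "Respondent identifier", "unit": "", "values": {}},
    {"variable": "commute_km", "label": "One-way commute distance", "unit": "km", "values": {}},
    {
        "variable": "mode",
        "label": "Primary commute mode",
        "unit": "",
        "values": {1: "car", 2: "bus", 3: "bicycle", -9: "missing"},
    },
]


def codebook_to_csv(entries) -> str:
    """Serialize codebook entries to CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["variable", "label", "unit", "value_labels"])
    for entry in entries:
        # Collapse the code -> meaning mapping into a single readable cell.
        value_labels = "; ".join(
            f"{code}={text}" for code, text in entry["values"].items()
        )
        writer.writerow([entry["variable"], entry["label"], entry["unit"], value_labels])
    return buffer.getvalue()


codebook_csv = codebook_to_csv(codebook)
print(codebook_csv)
```

Because the codebook is plain text, it can be stored and versioned alongside the dataset it describes.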

Cite Your Data

Datasets should have an identifier and a locator. A persistent identifier is a unique, Web-compatible alphanumeric code that points to a specific dataset that will be preserved for the long term. A dataset identifier is anything that names the dataset, such as its title, file name, or an object ID code. Examples of identifiers are UUID (Universally Unique Identifier), OID (Object Identifier), and LSID (Life Sciences Identifier).
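As a small illustration of one identifier type named above, the sketch below generates UUIDs with Python's standard `uuid` module. The dataset URL is a hypothetical placeholder, and whether a random or a name-based UUID is the better fit depends on the project.

```python
import uuid

# Random (version 4) UUID: unique, but different every time it is generated,
# so it must be recorded somewhere once assigned.
random_id = uuid.uuid4()

# Name-based (version 5) UUID: the same namespace and name always yield
# the same identifier. The URL below is a hypothetical dataset location.
dataset_url = "https://data.example.org/datasets/commute-survey/v1"
stable_id = uuid.uuid5(uuid.NAMESPACE_URL, dataset_url)

print(random_id)
print(stable_id)
```

A UUID identifies a dataset but does not say where to find it, so a locator is still needed alongside it.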