CMU LibGuides: Engineering and Public Policy: Data Management: Documentation

Metadata

Metadata is data that provides descriptive information (content, context, quality, structure, and accessibility) about a data product and enables others to find and use it. In a lab setting, much of the information used to describe data is first recorded in a notebook; metadata is a more formal, shareable expression of that information. It can include contact information, geographic locations, units of measure, abbreviations or codes used in the dataset, instrument and protocol information, survey tool details, provenance and version information, and much more. For internal use, where no appropriate formal metadata standard exists, writing “readme”-style metadata is an appropriate strategy.
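As an illustration of the "readme" approach, the sketch below renders a plain-text metadata file from a small dictionary. The field names (title, contact, location, version, variables, codes) and the example dataset are hypothetical and not part of any standard; adapt them to whatever your project actually needs to record.

```python
# A minimal sketch of readme-style metadata. All field names and the example
# values below are assumptions made for illustration, not a formal standard.

def render_readme(meta: dict) -> str:
    """Render a plain-text, readme-style metadata file from a dictionary."""
    lines = [
        f"Title: {meta['title']}",
        f"Contact: {meta['contact']}",
        f"Location: {meta['location']}",
        f"Version: {meta['version']}",
        "",
        "Variables (name [unit]: description):",
    ]
    for name, unit, description in meta["variables"]:
        lines.append(f"  {name} [{unit}]: {description}")
    lines.append("")
    lines.append("Abbreviations and codes:")
    for code, meaning in sorted(meta["codes"].items()):
        lines.append(f"  {code} = {meaning}")
    return "\n".join(lines) + "\n"


example = {
    "title": "Example air quality measurements",  # made-up dataset
    "contact": "Jane Doe <jane@example.org>",     # made-up contact
    "location": "Pittsburgh, PA",
    "version": "1.0",
    "variables": [
        ("pm25", "ug/m3", "Fine particulate matter concentration"),
        ("temp", "deg C", "Ambient air temperature"),
    ],
    "codes": {"NA": "not available", "BDL": "below detection limit"},
}

readme_text = render_readme(example)
print(readme_text)
```

Saving this text as a README file next to the data keeps the description and the dataset together, which is the main point of the strategy.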

The Digital Curation Centre (DCC) provides a catalog of common metadata standards, organized by discipline: http://www.dcc.ac.uk/resources/metadata-standards.

Some specific examples of metadata standards, both general and domain-specific, are:

Dublin Core
Domain-agnostic, basic, and widely used metadata standard
DDI (Data Documentation Initiative)
Common standard for social, behavioral and economic sciences, including survey data
EML (Ecological Metadata Language)
Specific to ecology disciplines
ISO 19115
For describing geospatial information
FGDC-CSDGM (Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata)
For describing geospatial information
MINSEQE (MINimum Information about a high-throughput SEQuencing Experiment)
Genomics standard
FITS (Flexible Image Transport System)
Astronomy digital file standard that includes structured, embedded metadata
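To make the idea of a formal standard concrete, here is a minimal sketch of a Dublin Core description serialized as XML with Python's standard library. The namespace URI is the one used for the Dublin Core element set; the wrapping "metadata" element, the choice of four elements, and the example values are assumptions for illustration, and a real repository may require a different wrapper or additional elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> ET.Element:
    """Build a simple XML record with one Dublin Core element per field."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Example values are made up for illustration.
record = dublin_core_record({
    "title": "Example air quality measurements",
    "creator": "Doe, Jane",
    "date": "2020-01-15",
    "format": "text/csv",
})

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the element names come from a shared vocabulary, a catalog or search tool that understands Dublin Core can index such a record without knowing anything else about the dataset.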

Data Documentation

Make sure that all data generated or collected is easy to understand and analyze. Someone looking at the data in 20 years should be able to understand what was done and why. Use stable, non-proprietary software. Documentation may include, but is not limited to:

lab notebooks
methodology reports
codebooks with full variable and value labels
documentation of decisions about software
a record of changes between versions of the dataset
a record of assumptions made during analysis
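A codebook with variable and value labels can be kept in machine-readable form so that it both documents the data and helps check it. The sketch below is one possible layout, not a standard: the variable names, labels, and codes are hypothetical.

```python
# A minimal, hypothetical codebook: each variable has a human-readable label,
# an optional unit, and optional value labels for coded responses.
CODEBOOK = {
    "age": {"label": "Age of respondent", "unit": "years", "values": None},
    "employ": {
        "label": "Employment status",
        "unit": None,
        "values": {1: "Employed", 2: "Unemployed", 9: "No answer"},
    },
}


def decode_row(row: dict, codebook: dict) -> dict:
    """Replace variable names and coded values with their documented labels.

    A KeyError signals a variable or code that the codebook does not document.
    """
    decoded = {}
    for variable, raw in row.items():
        entry = codebook[variable]
        value_labels = entry["values"]
        if value_labels is None:
            decoded[entry["label"]] = raw                 # uncoded value, keep as is
        else:
            decoded[entry["label"]] = value_labels[raw]   # coded value, look up label
    return decoded


decoded = decode_row({"age": 34, "employ": 2}, CODEBOOK)
print(decoded)
```

Running every row through such a function is a cheap way to find values that were recorded but never documented.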

Cite Your Data

Datasets should have both an identifier and a locator. The locator is a persistent identifier: a unique, Web-compatible alphanumeric code that points to a specific dataset that will be preserved for the long term. The dataset identifier names the dataset itself, for example by its title, file name, or an object ID code. Examples of identifier schemes are UUID (Universally Unique Identifier), OID (Object Identifier), and LSID (Life Sciences Identifier).
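As a small illustration of one of these identifier types, the sketch below generates UUIDs with Python's standard uuid module. The function names and the example URL are hypothetical. Note that a UUID only identifies a dataset; it does not by itself locate the dataset or guarantee long-term preservation.

```python
import uuid


def random_dataset_id() -> str:
    """Return a random (version 4) UUID, suitable as a fresh identifier."""
    return str(uuid.uuid4())


def name_based_dataset_id(locator: str) -> str:
    """Return a name-based (version 5) UUID derived from a URL.

    The same input always yields the same UUID, so the identifier can be
    regenerated later from the locator alone.
    """
    return str(uuid.uuid5(uuid.NAMESPACE_URL, locator))


# Hypothetical locator, used only for illustration.
locator = "https://example.org/datasets/air-quality-v1"

print(random_dataset_id())
print(name_based_dataset_id(locator))
```

A random UUID suits a newly created dataset, while a name-based UUID is convenient when the identifier must be reproducible from information you already have.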

Additional Resources

EPP Librarian

Contact Info
Julie Chen, Librarian
Sorrells Engineering & Science Library, Wean Hall
412-268-6116
