
Data 101: Documentation & Metadata

In this LibGuide, we introduce you to the wide world of data, including data types (qualitative, quantitative, ethnographic, geospatial, etc.), finding data, visualizing data, and managing data.

Image Description: Machine learning workflow documenting different stages of the analysis

Documenting your data means: 

  • Better data discovery
  • Increased citations of your data
  • Broader impact of your scholarly products
  • Less stress managing your own research

Data Documentation

Documenting your research data means ensuring all data generated and/or collected is easy to understand, analyze, and reuse. A good practice is to consider whether your documentation would address the following situations:

1) If someone from a discipline other than my own were to look at the data, would the documentation provide the context needed to understand it?

2) If someone were to look at this data in 20 years, would they be able to understand why and how it was collected?

3) If someone wanted to reuse my data, would they know which software to use to replicate my findings?

Types of research data documentation may include, but are not limited to:

  • lab notebooks (such as LabArchives)
  • methodology reports
  • codebooks or data dictionaries with full variable and value labels
  • documenting decisions about software
  • tracking changes to different versions of the dataset through version control (such as in GitHub)
  • recording assumptions made during analysis
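
As one illustration of the codebook or data dictionary item above, here is a minimal Python sketch. The variable names, labels, and codes are hypothetical and describe an imaginary survey dataset; the helper names `codebook_to_csv` and `undocumented_columns` are likewise made up for this example. It shows one possible way to keep variable and value labels in a machine-readable form and to check that every column in a data file is documented.

```python
import csv
import io

# Hypothetical codebook: one entry per variable in an imaginary survey dataset.
CODEBOOK = {
    "resp_id": {"label": "Respondent identifier", "type": "integer", "values": {}},
    "age": {
        "label": "Age at time of survey",
        "type": "integer",
        "units": "years",
        "values": {-9: "Not reported"},
    },
    "employ": {
        "label": "Employment status",
        "type": "categorical",
        "values": {1: "Employed", 2: "Unemployed", 3: "Student"},
    },
}


def codebook_to_csv(codebook):
    """Flatten the codebook into CSV text with one row per variable."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["variable", "label", "type", "units", "value_labels"])
    for name, entry in codebook.items():
        value_labels = "; ".join(
            f"{code} = {text}" for code, text in entry["values"].items()
        )
        writer.writerow(
            [name, entry["label"], entry["type"], entry.get("units", ""), value_labels]
        )
    return buffer.getvalue()


def undocumented_columns(header, codebook):
    """Return the column names in a data file that the codebook does not describe."""
    return [column for column in header if column not in codebook]


if __name__ == "__main__":
    print(codebook_to_csv(CODEBOOK))
    # "income" is not in the codebook, so it is reported as undocumented.
    print(undocumented_columns(["resp_id", "age", "income"], CODEBOOK))
```

Saving the CSV text next to the dataset gives future users a single file that explains every variable, and the check can be rerun whenever a column is added.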

Open Science Framework is a great place to document not only your research data but an entire research project, all in one simple location. 

Cite Your Data

Why should you cite your data? There are many reasons, from ensuring your research data can be discovered, verified, and even reused, to increasing the impact and reach of your scholarly research. Datasets should have an identifier and a locator. A persistent identifier is a unique, Web-compatible alphanumeric code that points to a specific dataset that will be preserved for the long term. A dataset identifier labels the dataset itself, for example by its title, file name, or an object ID code. Examples of identifier schemes are UUID (Universally Unique Identifier), OID (Object Identifier), and LSID (Life Sciences Identifier).
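
To make the UUID example concrete, here is a short Python sketch using the standard library's `uuid` module. The dataset URL is a hypothetical placeholder, not a real landing page. Note that a UUID on its own only identifies a dataset; it does not tell anyone where to find it, so this is an illustration of the identifier half only.

```python
import uuid

# Hypothetical landing-page URL for a dataset; replace with your own.
DATASET_URL = "https://example.org/datasets/stream-temperature-2020"

# Random identifier (version 4): a different value on every call.
random_id = uuid.uuid4()

# Name-based identifier (version 5): the same URL always yields the same UUID,
# which can be handy if the identifier ever needs to be regenerated.
stable_id = uuid.uuid5(uuid.NAMESPACE_URL, DATASET_URL)

print(random_id)
print(stable_id)
```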

Metadata

Simply put, metadata are often described as "data about data." But they are so much more complex (and fascinating) than that! Metadata are data that provide descriptive information (content, context, quality, structure, and accessibility) about a data product and enable others to search for and use the data product. In a lab setting, much of the content used to describe data is initially collected in a notebook; metadata are a more formal, sharable expression of this information. They can include content such as contact information, geographic locations, details about units of measure, abbreviations or codes used in the dataset, instrument and protocol information, survey tool details, provenance and version information, and much more. For internal use, where no appropriate formal metadata standard exists, writing “readme” style metadata is an appropriate strategy.
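
As a rough sketch of what “readme” style metadata could look like, the Python below renders a small dictionary into a plain-text readme. The field names follow the kinds of content listed above, but the function name `build_readme`, the field list, and every value in `EXAMPLE` are hypothetical; this is not a formal metadata standard.

```python
# Fields to report, in order. Chosen for illustration from the kinds of
# content mentioned above; adjust to suit your own project.
FIELDS = ("contact", "location", "units", "abbreviations", "instrument", "version")

# Hypothetical metadata for an imaginary dataset.
EXAMPLE = {
    "title": "Stream temperature readings",
    "contact": "data-owner@example.org",
    "location": "Example Creek, site A",
    "units": "temp_c = degrees Celsius",
    "abbreviations": "NA = sensor offline",
    "version": "1.2",
}


def build_readme(metadata):
    """Render metadata as readme text, flagging any field left unrecorded."""
    lines = [f"README for dataset: {metadata['title']}", ""]
    for field in FIELDS:
        value = metadata.get(field, "NOT RECORDED")
        lines.append(f"{field.capitalize()}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_readme(EXAMPLE))
```

Flagging missing fields explicitly, rather than omitting them, makes gaps in the documentation visible to whoever opens the file later.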

The Digital Curation Centre provides a catalog of common metadata standards, organized by discipline: http://www.dcc.ac.uk/resources/metadata-standards.

Some specific examples of metadata standards, both general and domain specific are:

 

GitHub for Version Control

GitHub is a code hosting platform for version control and collaboration. When you make changes to a file on GitHub, it keeps track of what changes you made, and when. Whether working by yourself or with research collaborators, using GitHub as a version control platform can help you keep track of your evolving research data outputs!
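
To give a feel for what "keeping track of what changes you made" means, here is a small Python sketch that compares two versions of a data file line by line with the standard library's `difflib`. This is only an illustration of the idea; it is not how Git or GitHub is implemented, and the file contents and the function name `describe_changes` are made up for the example.

```python
import difflib

# Two hypothetical versions of the same small CSV file, as lists of lines.
old_version = ["site,temp_c", "A,12.1", "B,13.4"]
new_version = ["site,temp_c", "A,12.1", "B,13.9", "C,11.0"]


def describe_changes(old_lines, new_lines):
    """Return (removed, added) lines between two versions of a file."""
    removed, added = [], []
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are hints only.
    return removed, added


if __name__ == "__main__":
    removed, added = describe_changes(old_version, new_version)
    print("Removed:", removed)
    print("Added:", added)
```

A version control platform records this kind of difference for every saved change, along with who made it and when, so earlier versions of a dataset can always be recovered.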

Additional Resources

Helpful Books in the CMU Libraries:

Helpful Online Resources:

Metadata Specialist

Angelina Spotts
Contact:
Hunt Library
Digitization Lab
5550 Penn Ave.
Pittsburgh, PA
15213
412-268-7261

Data, Gaming, and Popular Culture Librarian

Hannah Gunderman
Contact:
410A Hunt Library
4909 Frew Street
Carnegie Mellon University
Pittsburgh, PA 15213
Office Phone: 412-268-7258

Credits and Acknowledgements

A special thank you to Sue Collins (https://www.library.cmu.edu/about/people/sue-collins), Senior Librarian and Liaison for Engineering & Public Policy and History, for creating many of the original sections and structures on which this LibGuide is based and from which it evolved.

Banner image courtesy of Yancy Min on Unsplash, found here: https://unsplash.com/photos/842ofHC6MaI. Design made in Canva.