Image Description: Machine learning workflow documenting different stages of the analysis
Documenting your data means:
Documenting your research data means ensuring all data generated and/or collected is easy to understand, analyze, and reuse. A good practice is to consider whether your documentation would address the following situations:
1) If someone from outside my discipline were to look at the data, would the documentation provide the context they need to understand it?
2) If someone were to look at this data in 20 years, would they be able to understand why and how it was collected?
3) If someone wanted to reuse my data, would they know which software to use to replicate my findings?
Types of research data documentation may include, but are not limited to:
Open Science Framework is a great place to document not only your research data but an entire research project, all in one simple location.
Why should you cite your data? There are many reasons, from ensuring your research data can be discovered, verified, and reused, to increasing the impact and reach of your scholarly research. Datasets should have an identifier and a locator. A persistent identifier is a unique, Web-compatible alphanumeric code that points to a specific dataset that will be preserved for the long term. A dataset identifier is anything that names the dataset, such as its title, file name, or an object ID code. Examples of identifiers are UUID (Universally Unique Identifier), OID (Object Identifier), and LSID (Life Sciences Identifier).
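As a rough illustration of the identifier idea, the sketch below attaches a UUID to a small dataset record using Python's standard `uuid` module. The function name `make_dataset_record`, the field names, and the example title and file name are all hypothetical and chosen only for illustration. Note that a UUID on its own is only an identifier: it does not tell anyone where the dataset lives, so a locator is still needed alongside it.

```python
import uuid


def make_dataset_record(title: str, file_name: str) -> dict:
    """Build a small record pairing a dataset's descriptive names with a UUID.

    The field names here are hypothetical, not part of any metadata standard.
    """
    return {
        # uuid4() generates a random Universally Unique Identifier
        "identifier": str(uuid.uuid4()),
        "title": title,
        "file_name": file_name,
    }


# Hypothetical dataset, for illustration only
record = make_dataset_record("Example survey responses", "survey_responses.csv")
print(record["identifier"])
```

The identifier printed will differ on every run, since version 4 UUIDs are randomly generated.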
Simply put, metadata can be described as "data about data." But they are so much more complex (and fascinating) than that! Metadata are data that provide descriptive information (content, context, quality, structure, and accessibility) about a data product and enable others to search for and use the data product. In a lab setting, much of the content used to describe data is initially collected in a notebook; metadata are a more formal, sharable expression of this information. They can include content such as contact information, geographic locations, details about units of measure, abbreviations or codes used in the dataset, instrument and protocol information, survey tool details, provenance and version information, and much more. Where no appropriate formal metadata standard exists, writing "readme"-style metadata for internal use is an appropriate strategy.
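For "readme"-style metadata, one possible approach is to generate a plain-text file from a fixed set of labels so that every dataset is described the same way. The sketch below is a minimal example in Python; the helper names `build_readme` and `write_readme`, the header line, and every field value are hypothetical placeholders, and the labels simply mirror the kinds of content listed above rather than any formal standard.

```python
from pathlib import Path


def build_readme(fields: dict[str, str]) -> str:
    """Return plain-text readme content with one 'Label: value' line per field."""
    lines = ["DATASET README", ""]
    for label, value in fields.items():
        lines.append(f"{label}: {value}")
    return "\n".join(lines) + "\n"


def write_readme(path: str, fields: dict[str, str]) -> None:
    """Write the readme text next to the data files (path is up to you)."""
    Path(path).write_text(build_readme(fields), encoding="utf-8")


# Hypothetical values, for illustration only
example_fields = {
    "Contact": "Jane Doe <jane.doe@example.org>",
    "Geographic location": "Pittsburgh, PA, USA",
    "Units of measure": "temperature in degrees Celsius",
    "Abbreviations and codes": "NA = not measured",
    "Instrument and protocol": "handheld thermometer, read hourly",
    "Version": "1.0",
}

readme_text = build_readme(example_fields)
print(readme_text)
```

Calling `write_readme("README.txt", example_fields)` would save the same text to a file. Keeping the labels in one dictionary makes it easy to add or rename fields as a project's needs change.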
The Digital Curation Centre provides a catalog of common metadata standards, organized by discipline: http://www.dcc.ac.uk/resources/metadata-standards.
Some specific examples of metadata standards, both general and domain-specific, are:
GitHub is a code hosting platform for version control and collaboration. When you make changes to a file on GitHub, it keeps track of what changes you made, and when. Whether working by yourself or with research collaborators, using GitHub as a version control platform can help you keep track of your evolving research data outputs!
Helpful Books in the CMU Libraries:
Helpful Online Resources:
A special thank you to Sue Collins (https://www.library.cmu.edu/about/people/sue-collins), Senior Librarian and Liaison for Engineering & Public Policy and History, for creating many of the original sections and structures on which this LibGuide is based and from which it evolved.
Banner image courtesy of Yancy Min on Unsplash, found here: https://unsplash.com/photos/842ofHC6MaI. Design made in Canva.