Carnegie Mellon University Libraries

Data Management for Research

Documentation

Documentation captures your work on the dataset so that others can understand what you did and reproduce your work if needed. Your documentation should cover both your step-by-step processes (what you did) and the larger context of the project (why you did it). At a minimum, it should include the following elements:

  • What was done
  • How it was done
  • Why it was done
  • When the work was performed
  • Where it was performed
  • Who performed the work

This helps ensure that all data generated and/or collected is easier to understand, analyze, and reuse. A good practice is to consider whether your documentation would address the following situations:

  • If someone from outside my discipline were to look at the data, would the documentation provide the context needed to understand it?
  • If someone were to look at this data in 5 years, would they be able to understand why and how it was collected?
  • If someone wanted to reuse my data, would they know which software to use to replicate my findings?
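
One way to capture the minimum elements listed earlier, along with the software someone would need to replicate the work, is a small machine-readable record kept next to the data. The following is a minimal sketch in Python, not a prescribed format: the function name `build_record`, the field names, and all example values are hypothetical.

```python
import json
import platform
import sys
from datetime import date


def build_record(what, how, why, when, where, who):
    """Return a dict holding the six minimum documentation elements.

    The field names are illustrative assumptions, not a standard schema.
    """
    return {
        "what": what,
        "how": how,
        "why": why,
        "when": when,
        "where": where,
        "who": who,
        # Record the software environment so that others know what to
        # use when trying to replicate the work.
        "software": {
            "python": sys.version.split()[0],
            "platform": platform.system(),
        },
    }


# Placeholder values for illustration only.
record = build_record(
    what="Measured water temperature at three sampling sites",
    how="Handheld probe, one reading per site per day",
    why="Baseline for a seasonal comparison",
    when=date(2024, 5, 1).isoformat(),
    where="Example field site",
    who="Example researcher",
)

# Serialize the record; the text could be saved alongside the data files.
text = json.dumps(record, indent=2)
print(text)
```

A plain-text README with the same headings would serve equally well; the point is that each of the six questions has an explicit answer.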

Possible elements to consider documenting include:

  • The reasoning and context for the research you are conducting
  • The reasoning and hypotheses for the specific experiments that you are conducting
  • Descriptions of your experiments, methodologies and analysis
  • A high-level description of your data
  • The content of your data files, including the units of measurement used, how missing values are accounted for, and any conditions that might affect the quality of the data
  • Time and duration of the data collection
  • Location or geographical coordinates of the data collection
  • Information about the precision, accuracy or any uncertainties in your data, including any outliers
  • A description of the instruments that you are working with, including the settings, calibrations and software that you are using
  • Where the data is stored and backed up
  • Any quality assurance procedures applied to your data
  • Definitions of the terms that you are using (a glossary)
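
Several of these elements (file contents, units of measurement, handling of missing values) are often collected in a data dictionary with one entry per column. The sketch below is one possible way to do this in Python, under stated assumptions: the column names, the `NA` missing-value code, and the helper names `undocumented_columns` and `dictionary_to_csv` are all hypothetical.

```python
import csv
import io

FIELDS = ["column", "description", "unit", "missing_value"]

# Hypothetical data dictionary: one entry per column in a data file.
DATA_DICTIONARY = [
    {"column": "site_id", "description": "Sampling site identifier",
     "unit": "", "missing_value": ""},
    {"column": "temp_c", "description": "Water temperature",
     "unit": "degrees Celsius", "missing_value": "NA"},
]


def undocumented_columns(header, dictionary):
    """Return columns that appear in a data file header but have no
    entry in the data dictionary, in header order."""
    documented = {entry["column"] for entry in dictionary}
    return [name for name in header if name not in documented]


def dictionary_to_csv(dictionary):
    """Render the data dictionary as CSV text that can be stored
    next to the data files."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()


# Example: "depth_m" is in the data file but not yet documented.
print(undocumented_columns(["site_id", "temp_c", "depth_m"], DATA_DICTIONARY))
print(dictionary_to_csv(DATA_DICTIONARY))
```

Checking the file header against the dictionary is a simple way to notice when a column was added to the data but never described.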

Version Control

Version control is a documentation technique for tracking the changes you make to your data over time, so that you can easily go back and review those changes and keep a clear record of how your data has evolved. It can be as simple as putting a "v1", "v3", "v8", etc. at the end of your file names to note changes in the file over time, or it can involve specialized tools such as Git that keep track of granular changes to your files.
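
The file-naming approach can be kept consistent with a small helper. The following is a minimal sketch, assuming a hypothetical `_v<number>` suffix placed just before the file extension; the function name `next_version` and the example file names are illustrative only, and the helper does not handle multi-part extensions such as `.tar.gz`.

```python
import re
from pathlib import Path

# Matches a file stem that ends in "_v<number>", e.g. "survey_v3".
_VERSION_SUFFIX = re.compile(r"^(?P<base>.*)_v(?P<num>\d+)$")


def next_version(filename):
    """Return the file name for the next version of `filename`.

    A name without a version suffix becomes "_v1"; a name that already
    ends in "_v<number>" has that number increased by one.
    """
    path = Path(filename)
    match = _VERSION_SUFFIX.match(path.stem)
    if match:
        base = match.group("base")
        number = int(match.group("num")) + 1
    else:
        base = path.stem
        number = 1
    return str(path.with_name(f"{base}_v{number}{path.suffix}"))


print(next_version("survey.csv"))
print(next_version("survey_v3.csv"))
```

Saving each revision under a new name, rather than overwriting the previous file, is what preserves the record of how the data evolved.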


Contact

If you are unsure of the data management policies or practices best suited for your research, or if you have any other questions, please contact the University Libraries Data and Publishing Services team.