Data Management for Research

What does it mean to document your data? What are the benefits of it?

Documenting your data in a research project means writing down what you do during the project! Documentation captures your work on the dataset so that others can understand what you did and reproduce it if needed. It should cover both your step-by-step processes (what you did) and the larger context of the project (why you did it). At a minimum, your documentation should include the following elements:

  • What was done
  • How it was done
  • Why it was done
  • When the work was performed
  • Where it was performed
  • Who performed the work
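
One way to capture these six elements is as a small structured record saved alongside the data. The sketch below is only an illustration, not a standard: the `ProcessingRecord` class, its field names, the `missing_fields` helper, and the example values are all hypothetical.

```python
from dataclasses import dataclass, asdict
import json


@dataclass
class ProcessingRecord:
    """One documented step in a project (hypothetical structure)."""
    what: str
    how: str
    why: str
    when: str   # ISO 8601 date, e.g. "2024-03-15"
    where: str
    who: str


def missing_fields(record: ProcessingRecord) -> list[str]:
    """Return the names of any elements that were left blank."""
    return [name for name, value in asdict(record).items() if not value.strip()]


# Example values are invented for illustration; "who" is left blank on purpose.
record = ProcessingRecord(
    what="Removed duplicate survey responses",
    how="Dropped rows with repeated respondent IDs",
    why="Duplicates would inflate the sample size",
    when="2024-03-15",
    where="Lab workstation",
    who="",
)

print(json.dumps(asdict(record), indent=2))
print(missing_fields(record))
```

Checking for blank fields before saving a record is a simple way to notice when one of the six elements has been skipped.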

 

Good documentation makes the data you generate or collect easier to understand, analyze, and reuse. A good practice is to ask whether your documentation would address the following situations:

  • If someone from outside my discipline were to look at the data, would the documentation provide the context they need to understand it?
  • If someone were to look at the data in 5 years, would they be able to understand why and how it was collected?
  • If someone wanted to reuse my data, would they know which software to use to replicate my findings?

 

Possible elements to consider documenting include:

  • The reasoning and context for the research you are conducting
  • The reasoning and hypotheses for the specific experiments that you are conducting
  • Descriptions of your experiments, methodologies, and analysis
  • A high-level description of your data
  • The content of your data files, including the units of measurement used, how missing values are accounted for, and any conditions that might affect the quality of the data
  • Time and duration of the data collection
  • Location or geographical coordinates of the data collection
  • Information about the precision, accuracy or any uncertainties in your data, including any outliers
  • A description of the instruments that you are working with, including the settings, calibrations, and software that you are using
  • Where the data is stored and backed up
  • Any quality assurance procedures applied to your data
  • Definitions of the terms that you are using (a glossary)
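
Several of the elements above (column contents, units of measurement, and how missing values are recorded) can be gathered into a simple data dictionary stored next to the data files. The following is a minimal sketch under stated assumptions: the column names, units, and the `-999` missing-value code are invented examples, and `dictionary_to_csv` and `undocumented_columns` are hypothetical helpers rather than part of any established tool.

```python
import csv
import io

FIELDS = ["column", "description", "units", "missing_code"]

# Hypothetical data dictionary: one entry per column in a data file.
DATA_DICTIONARY = [
    {"column": "site_id", "description": "Sampling site identifier",
     "units": "", "missing_code": ""},
    {"column": "temp_c", "description": "Water temperature",
     "units": "degrees Celsius", "missing_code": "-999"},
    {"column": "depth_m", "description": "Depth of the measurement",
     "units": "meters", "missing_code": "-999"},
]


def dictionary_to_csv(entries: list[dict]) -> str:
    """Render the data dictionary as CSV text that can be saved with the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(header: list[str], entries: list[dict]) -> list[str]:
    """Return the columns of a data file that have no dictionary entry."""
    documented = {entry["column"] for entry in entries}
    return [name for name in header if name not in documented]


print(dictionary_to_csv(DATA_DICTIONARY))
# "salinity" has no entry in the dictionary, so it is reported.
print(undocumented_columns(["site_id", "temp_c", "salinity"], DATA_DICTIONARY))
```

Comparing a file's header against the dictionary is one way to catch columns that were added to the data but never described.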

 

 

What is Version Control?

Do you ever change something in your data, such as adding or deleting variables, and tell yourself that you'll definitely remember when and why you made the change without writing it down? The truth is, we rarely remember! That is no failing on your part: we have so much going on that we can't recall every decision we make each day. This is where version control comes in! Version control is a documentation technique for keeping track of the changes you make to your data over time, so that you can go back, review those changes, and have a clear record of how your data has evolved. It can be as simple as putting "v1", "v3", "v8", etc. at the end of your file names to mark changes in the file over time, or it can involve specialized tools such as Git that track granular changes in your files.
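
The file-naming approach can be sketched in a few lines. This example assumes a `name_vN.ext` naming pattern and a plain-text change log; both conventions, along with the helper names `next_version_name` and `changelog_entry` and the sample file names, are hypothetical choices for illustration. A tool such as Git handles this more robustly.

```python
import re
from datetime import date

# Matches names such as "survey_v3.csv" or "survey_v3" (assumed convention).
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<number>\d+)(?P<suffix>\.[^.]+)?$")


def next_version_name(filename: str) -> str:
    """Return the file name for the next version of a file."""
    match = VERSION_PATTERN.match(filename)
    if match is None:
        # No version marker yet: start numbering at v1.
        stem, dot, extension = filename.rpartition(".")
        if not dot:
            return f"{filename}_v1"
        return f"{stem}_v1.{extension}"
    number = int(match["number"]) + 1
    suffix = match["suffix"] or ""
    return f"{match['stem']}_v{number}{suffix}"


def changelog_entry(filename: str, change: str, reason: str, on: date) -> str:
    """Format one line recording what changed, why, and when."""
    return f"{on.isoformat()} | {filename} | {change} | {reason}"


new_name = next_version_name("survey_v3.csv")
print(new_name)
print(changelog_entry(new_name, "Dropped column age_group",
                      "Merged into age", date(2024, 3, 15)))
```

Pairing each new version with a change log line records the "when" and "why" that file names alone cannot hold.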

Resources for Learning More about Documentation

  • Maintaining a Laboratory Notebook: Written by Colin Purrington, an evolutionary biologist, this site provides a lengthy list of dos and don'ts for keeping a lab notebook in any discipline.
  • A Visual Guide to Version Control: Produced by Better Explained, this comprehensive guide covers terminology, graphics, and recommended practices for version control.