CMU LibGuides: Research Reproducibility: Home (Editable)

JupyterLab

JupyterLab Is Now Ready for Users!

JupyterLab is an interactive development environment for working with notebooks, code and data. It enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.

Main features:

Drag-and-drop to reorder notebook cells and copy them between notebooks.
Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).
Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.
Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, and VegaLite.

Essential Readings on Research Reproducibility

What is Research Reproducibility?

"What does research reproducibility mean?" by Steven N. Goodman, Daniele Fanelli, and John P. A. Ioannidis. Science Translational Medicine, 01 Jun 2016: 341ps12.

The "Reproducibility Crisis" and What Can We Do?

"1,500 scientists lift the lid on reproducibility": A survey article published in Nature, discussing whether the "reproducibility crisis" in scientific research is real, what the potential causes are, and what we could do to make our research more reproducible.
"Challenges in irreproducible research": A special collection of Nature (July 2018) that addresses the growing alarm about results that cannot be reproduced.
"Scientific Rigour and Reproducibility": Another special collection of Nature (April 2018) that contains articles specifically discussing issues in research reproducibility.
"Reproducibility of research: Issues and proposed remedies": A special issue of PNAS (March 2018), focusing on issues and potential solutions for research reproducibility.
"Report on the First IEEE Workshop on The Future of Research Curation and Research Reproducibility": An NSF report on the reproducibility landscape and the importance of research curation, which calls for collaborative effort among publishers, research communities, and funders.
NIH resources and guidelines on "Rigor and Reproducibility"

Simple Rules to Enhance Reproducibility

"Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript": A list of some of the most common statistical mistakes that appear in the scientific literature, as well as advice on how authors, reviewers and readers can identify and resolve these mistakes.
Best practices and good enough practices for computational reproducibility: "Ten Simple Rules for Reproducible Computational Research" and "Good enough practices in scientific computing"

Reproducible Workflow for Biomedical Research

"Reproducibility: automated.": Use a "continuous analysis" workflow to automate and containerize data analysis steps, allowing others to easily reproduce and build on the results.

Reproducibility in the Research Life Cycle

Stage 1: Designing and Planning

Organizing literature with reference managers: Mendeley or Zotero

Collect articles and PDFs from web browsers as you discover them
Organize, read and annotate in one place
Share with your group
Prepare references for Microsoft Word or LaTeX (BibTeX) with ease

Use an Electronic Lab Notebook (ELN)

Document study design, reagents, procedures, data analysis, images, and other results in one platform
Searchable and discoverable
Great tool for note-keeping, lab management, and collaboration
Which one to pick?
- Assess your experimental needs
- Institutional support (We are in the process of evaluating ELNs and purchasing a license. Feedback welcomed!)

Use a project management platform: Open Science Framework

Open source, great tool for research transparency and reproducibility
Manage research components on different platforms in one place
Ideal for documenting and organizing collaborations
Project examples:
- Reproducibility Project: Cancer Biology
- Reproducibility Project: Psychological Science

Plan ahead and write the Data Management Plan (DMP)

Required by many funders and publishers
Before starting a project, think about the research question, sample collection methods, statistical power, software and hardware tools, project management and documentation, and how results will be shared and disseminated
Find more about DMP basics here

Stage 2: Collecting and Analyzing Data

Follow good data management practices to avoid losing your work

Follow good file naming conventions: use meaningful names, and avoid spaces and special characters
Document metadata
Consider file security
Back up following the 3-2-1 rule
- 3 copies of your data - 2 copies are not enough
- 2 different formats - e.g., hard drive + tape backup, or DVD (short term) + flash drive
- 1 off-site backup - e.g., keep 2 physical backups and 1 in the cloud
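
The file naming and backup advice above can be partly automated. The following is a minimal Python sketch, not a prescribed tool: the function names (safe_filename, sha256_of, copies_match) are hypothetical, and it assumes "special characters" means anything other than ASCII letters, digits, hyphens, and underscores. The checksum comparison is one common way to confirm that a backup copy still matches the original.

```python
import hashlib
import re
from pathlib import Path

# Assumption: anything other than ASCII letters, digits, hyphen, and
# underscore counts as a "special character". Adjust to your convention.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]+")


def safe_filename(name: str) -> str:
    """Replace spaces and special characters in the stem with underscores."""
    path = Path(name)
    stem = _UNSAFE.sub("_", path.stem).strip("_")
    # Keep the extension, lower-cased for consistency.
    return stem + path.suffix.lower()


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, backup: Path) -> bool:
    """True when the backup is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(backup)
```

For example, safe_filename("2019-03 mouse data (final!).CSV") returns "2019-03_mouse_data_final.csv". Running copies_match against each of the three copies after a backup is a cheap check that none of them has been silently corrupted.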

Use an ELN for note-taking and OSF for project management (see above)

Use Literate Programming to weave together text, code, and visualization

Jupyter Notebook
- Interactive programming environment that not only works as a text editor, but also displays notes and execution results
- Supports 40+ programming languages
- Works together with Binder and GitHub to reproduce code and computational environment
- Project example: used to release the code for the discovery of gravitational waves
R Markdown
- Supports multiple programming languages, including R, Python, and SQL
- Works seamlessly with other RStudio packages
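
To make the "weaving" idea concrete: a Jupyter notebook file stores prose cells and code cells side by side as JSON. The sketch below builds a two-cell notebook with only the Python standard library. It assumes the version 4 notebook JSON layout, and the helper names (make_notebook, save_notebook) are hypothetical; in practice the nbformat library is the more robust way to create and validate notebooks.

```python
import json


def make_notebook(title: str, code: str) -> dict:
    """Build a minimal notebook: one Markdown cell followed by one code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # Narrative text lives in a Markdown cell.
            {"cell_type": "markdown", "metadata": {}, "source": f"# {title}"},
            # Code lives in a code cell; outputs are filled in when it is run.
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": code,
            },
        ],
    }


def save_notebook(notebook: dict, path: str) -> None:
    """Write the notebook dict to a .ipynb file as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(notebook, handle, indent=1)
```

Calling save_notebook(make_notebook("Analysis", "print(1 + 1)"), "analysis.ipynb") should produce a file that opens in Jupyter Notebook or JupyterLab, with the heading above the code cell.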

Use Version Control

Git and GitHub

Reproducible computational environment

Docker
Free research computing allocations at PSC Bridges
- A data- and memory-intensive system designed to integrate HPC with Big Data
- Supports a high degree of interactivity, science gateways, and a very flexible user environment
- Many popular applications for simulation, machine learning and data analytics already installed and running
- Available at no charge to the open research community
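
Whether or not you containerize with Docker, recording the software environment alongside your results is a lightweight first step toward a reproducible computational environment. The sketch below is an illustration only, using the Python standard library; the function name environment_snapshot is hypothetical, and it does not replace a container image or a lock file.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot() -> dict:
    """Collect the interpreter version, platform, and installed packages."""
    packages = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            packages.append(f"{name}=={dist.version}")
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": sorted(packages),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be saved next to the results.
    print(json.dumps(environment_snapshot(), indent=2))
```

Saving this output with each analysis run gives collaborators a record of which package versions produced a given result.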

Stage 3+4: Publishing, Archiving, and Sharing your work

Collaborative writing tool for LaTeX: Overleaf (CMU license)

Version-controlled, web-based platform that allows multiple authors to work simultaneously
Many tutorials available
Many style templates for specific journals, presentations, reports
Format and insert citations with ease using .bib files
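
A .bib file is plain text made up of entries like the one generated below. This is a minimal Python sketch: the helper name bibtex_entry and the citation key goodman2016 are hypothetical, and the field values come from the Goodman et al. reading listed earlier in this guide. Reference managers such as Zotero or Mendeley can export these entries for you.

```python
def bibtex_entry(key: str, entry_type: str, fields: dict) -> str:
    """Format one BibTeX entry from a citation key, entry type, and fields."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


entry = bibtex_entry(
    "goodman2016",  # hypothetical citation key
    "article",
    {
        "author": "Goodman, Steven N. and Fanelli, Daniele and Ioannidis, John P. A.",
        "title": "What does research reproducibility mean?",
        "journal": "Science Translational Medicine",
        "year": "2016",
        "note": "341ps12",
    },
)
print(entry)
```

After saving the output to a .bib file in an Overleaf project, the entry can be cited in the LaTeX source with \cite{goodman2016}.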

Publish in open access journals

CMU’s institutional repository: KiltHub

Repository for many forms of research products, including papers, posters, datasets, videos, etc.
Every item gets assigned a DOI
Powered by FigShare, indexed by Google search, and usually ranks high in search results

Other Generalist or Subject-specific Open Repositories

See a list of data repositories recommended by Nature Scientific Data

Repositories for Computational Reproducibility

Code Ocean
- A cloud-based computational reproducibility platform
- Preserve your code, data, and computational environment in a capsule and get a DOI
- Let others easily run your code in the cloud and share it privately or publicly
- Use a widget to embed a working copy of your code directly into any webpage, including your personal site
- Free version available, with limited features
Software and Data Artifacts in the ACM Digital Library
- A repository put together by ACM’s Reproducibility Task Force
- Encourages authors to submit software and data sets with their papers.
