JupyterLab is an interactive development environment for working with notebooks, code and data. It enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area.
- Drag-and-drop to reorder notebook cells and copy them between notebooks.
- Run code blocks interactively from text files (.py, .R, .md, .tex, etc.).
- Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work.
- Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, Vega-Lite, and more.
Data Collaborations Lab
The CMU Libraries Data Collaboration Lab (DataCoLAB) connects the research community across disciplinary borders, and facilitates collaborations between data producers and data scientists. The program connects researchers who want more from their datasets with individuals who have data and computer science skills, creating opportunities for people with different technical and disciplinary backgrounds to work together.
Want to learn more or ask questions? Email dataCoLAB@andrew.cmu.edu.
Essential Readings on Research Reproducibility
What is Research Reproducibility?
The "Reproducibility Crisis" and What Can We Do?
Simple Rules to Enhance Reproducibility
Reproducible Workflow for Biomedical Research
"Reproducibility: automated.": use “continuous analysis" workflow to automate and containerize data analysis steps, allowing others to easily reproduce and build on the results.
Reproducibility in the Research Life Cycle
Stage 1: Designing and Planning
Organizing literature with reference managers: Mendeley or Zotero
- Collect articles and PDFs from your web browser as you discover them
- Organize, read, and annotate in one place
- Share with your group
- Prepare references for Microsoft Word or LaTeX (BibTeX) with ease
Use an Electronic Lab Notebook (ELN)
- Document study design, reagents, procedures, data analysis, images, and other results in one platform
- Searchable and discoverable
- Great tool for note-keeping, lab management, and collaboration
- Which one to pick?
Use the Open Science Framework (OSF)
- Open source, great tool for research transparency and reproducibility
- Manage research components hosted on different platforms in one place
- Ideal for documenting and organizing collaborations
- Project examples:
Plan ahead and write a Data Management Plan (DMP)
- Required by many funders and publishers
- Before starting a project, think through your research question, sample collection methods, statistical power, software and hardware tools, project management and documentation, and how results will be shared and disseminated
- Find more about DMP basics here
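The statistical-power step can be made concrete before any data are collected. As a minimal sketch, assuming a two-sided comparison of two group means with equal variances, the Python function below applies the normal-approximation formula `n = 2 * (z_(1-α/2) + z_(1-β))^2 / d^2`, where `d` is the expected effect size (Cohen's d). The function name and the defaults (α = 0.05, 80% power) are illustrative assumptions; t-test-based calculations, as used by dedicated power-analysis software, give slightly larger numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group for a two-sided, two-sample
    comparison of means, using the normal approximation.

    effect_size is Cohen's d: expected mean difference / common standard deviation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for a two-sided test
    z_beta = z.inv_cdf(power)            # quantile corresponding to the desired power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)                  # round up: you cannot recruit part of a participant


print(sample_size_per_group(0.5))  # medium effect
print(sample_size_per_group(0.8))  # large effect
```

With these defaults, a medium effect (d = 0.5) needs about 63 participants per group and a large effect (d = 0.8) about 25. Recording a calculation like this in the DMP documents why the sample size was chosen.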
Stage 2: Collecting and Analyzing Data
- Follow good file naming conventions: use meaningful names, and avoid spaces and special characters
- Document metadata
- Consider file security
- Back up your data following the 3-2-1 rule:
  - 3 copies of your data: 2 copies are not enough
  - 2 different formats, e.g., hard drive + tape backup, or DVD (short term) + flash drive
  - 1 off-site backup, e.g., keep 2 physical backups and one in the cloud
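The naming and backup practices above can be checked with a few lines of code. The Python sketch below is one possible approach, not a standard tool: `safe_filename` (a hypothetical helper) replaces spaces and special characters with underscores while keeping the meaningful words, and `meets_3_2_1` (also hypothetical) tests a list of backup copies against the 3-2-1 rule, treating each copy's storage medium as its "format".

```python
from __future__ import annotations

import re
import unicodedata
from dataclasses import dataclass


def safe_filename(name: str) -> str:
    """Hypothetical helper: keep meaningful words, drop spaces and special characters."""
    # Decompose accented letters and discard the accents (e.g. "é" becomes "e").
    text = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    stem, dot, ext = text.rpartition(".")
    if not dot:  # the name has no extension
        stem, ext = text, ""
    # Collapse every run of characters other than letters, digits, and hyphens into "_".
    stem = re.sub(r"[^A-Za-z0-9-]+", "_", stem).strip("_")
    ext = re.sub(r"[^A-Za-z0-9]+", "", ext).lower()
    return f"{stem}.{ext}" if ext else stem


@dataclass
class BackupCopy:
    location: str  # where the copy lives, e.g. "office workstation" or "cloud storage"
    medium: str    # storage format, e.g. "hard drive", "tape", "cloud"
    offsite: bool  # True if stored away from the primary work site


def meets_3_2_1(copies: list[BackupCopy]) -> bool:
    """Hypothetical check: at least 3 copies, on at least 2 media, at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


copies = [
    BackupCopy("office workstation", "hard drive", offsite=False),
    BackupCopy("tape backup", "tape", offsite=False),
    BackupCopy("cloud storage", "cloud", offsite=True),
]
print(safe_filename("Survey results (final) 2023-05-01.csv"))  # Survey_results_final_2023-05-01.csv
print(meets_3_2_1(copies))      # True
print(meets_3_2_1(copies[:2]))  # False: only two copies, none off-site
```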
Use an ELN for note-taking and OSF for project management (see above)
Use Literate Programming to weave together text, code, and visualization
Jupyter Notebook
- Interactive programming environment that not only works as a text editor but also displays notes and execution results
- Supports 40+ programming languages
- Works together with Binder and GitHub to reproduce code and the computational environment
- Project example: was used to release the code for the discovery of gravitational waves
R Markdown
- Supports multiple programming languages, including R, Python, and SQL
- Works seamlessly with other RStudio packages
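Literate programming does not require a notebook file. As a sketch, the Python script below uses the "percent" cell format, in which each `# %%` line starts a code cell and `# %% [markdown]` starts a text cell. Tools such as Jupytext (which lets JupyterLab open a `.py` file as a notebook), VS Code, and Spyder recognize these markers, assuming one of them is installed; plain Python simply runs the file top to bottom. The data values are hypothetical.

```python
# %% [markdown]
# # Growth-rate analysis (hypothetical example)
# Narrative text lives in markdown cells like this one; code lives in code cells.
# Each "# %%" line starts a new cell that compatible tools can run on its own.

# %%
import statistics

# Hypothetical measurements; replace this with your own data-loading step.
measurements = [2.1, 2.4, 2.2, 2.8, 2.5]

# %% [markdown]
# ## Summary statistics
# These values are recomputed every time the script runs,
# so the reported numbers always match the current code and data.

# %%
mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)  # sample standard deviation
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```

Because text and code share one plain-text file, changes show up as readable line-by-line diffs under version control, which is one reason some researchers prefer this format to `.ipynb` JSON.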
Use Version Control
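Version control (for example, Git) records every change to code and documents. One habit that links a result back to the exact code that produced it is to stamp outputs with the current commit hash. The sketch below does this from Python by calling the standard `git rev-parse HEAD` and `git status --porcelain` commands; it assumes Git is installed, and the function names are hypothetical.

```python
import subprocess
from typing import Optional


def _git(args: list, repo_dir: str) -> Optional[str]:
    """Run a git command in repo_dir and return its stripped stdout, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=repo_dir, capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is missing, repo_dir does not exist, or it is not a Git repository.
        return None
    return result.stdout.strip()


def current_commit(repo_dir: str = ".") -> Optional[str]:
    """Hash of the commit currently checked out (git rev-parse HEAD)."""
    return _git(["rev-parse", "HEAD"], repo_dir)


def has_uncommitted_changes(repo_dir: str = ".") -> Optional[bool]:
    """True if git status --porcelain reports modified or untracked files."""
    status = _git(["status", "--porcelain"], repo_dir)
    return None if status is None else status != ""


commit = current_commit()
dirty = has_uncommitted_changes()
print(f"Results generated from commit {commit} (uncommitted changes: {dirty})")
```

If `dirty` is `True`, the hash alone does not identify the code that ran, so committing before producing final results keeps the record honest.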
Reproducible computational environment
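Containers and environment files (for example, a Docker image, or the `requirements.txt` that Binder can build from) are the thorough way to reproduce an environment. A lighter first step is to record exactly what was installed when an analysis ran. The Python sketch below does this with the standard-library `platform` and `importlib.metadata` modules; the function names are hypothetical, and the `name==version` output mirrors the pinning style of `requirements.txt`.

```python
import json
import platform
import sys
from importlib import metadata  # standard library since Python 3.8


def environment_snapshot() -> dict:
    """Record the interpreter, operating system, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with missing metadata
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items(), key=lambda item: item[0].lower())),
    }


def pinned_requirements(snapshot: dict) -> str:
    """Format the packages as name==version lines, the pinning style of requirements.txt."""
    return "\n".join(f"{name}=={version}" for name, version in snapshot["packages"].items())


snapshot = environment_snapshot()
# Print everything except the (long) package list; save the full snapshot alongside your results.
print(json.dumps({k: v for k, v in snapshot.items() if k != "packages"}, indent=2))
print(f"{len(snapshot['packages'])} packages recorded")
```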
Free research computing allocations at PSC Bridges
- A data- and memory-intensive system designed to integrate HPC with Big Data
- Supports a high degree of interactivity, science gateways, and a very flexible user environment
- Many popular applications for simulation, machine learning, and data analytics are already installed and running
- Available at no charge to the open research community
Stages 3 and 4: Publishing, Archiving, and Sharing Your Work
- Version-controlled, web-based platform that allows multiple authors to work simultaneously
- Many tutorials available
- Many style templates for specific journals, presentations, reports
- Format and insert citations with ease using .bib files
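A `.bib` file is plain text made of entries like the one below. Reference managers such as Zotero and Mendeley can export these files, but the structure is simple enough to generate directly. The sketch below uses a hypothetical helper, not part of any library, and a made-up reference; in the LaTeX source you would then cite the entry with `\cite{doe2020example}` and point `\bibliography{...}` at the `.bib` file.

```python
def bibtex_entry(entry_type: str, key: str, fields: dict) -> str:
    """Render one BibTeX entry, e.g. @article{key, author = {...}, ...}."""
    # Each field becomes an indented "name = {value}" line; fields are separated by commas.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


# Hypothetical reference, used only to illustrate the format.
entry = bibtex_entry(
    "article",
    "doe2020example",
    {
        "author": "Doe, Jane and Roe, Richard",
        "title": "An Example Title",
        "journal": "Journal of Examples",
        "year": "2020",
    },
)
print(entry)
```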
Publish in open access journals
CMU’s institutional repository: KiltHub
- Repository for many forms of research products, including papers, posters, datasets, and videos
- Every item is assigned a DOI
- Powered by Figshare and indexed by Google search, where items usually rank high in results
Other Generalist or Subject-specific Open Repositories
Repositories for Computational Reproducibility
- A cloud-based computational reproducibility platform
- Preserve your code, data, and computational environment in a capsule and get a DOI
- Let others easily run your code in the cloud and share it privately or publicly
- Use a widget to embed a working copy of your code directly into any webpage, including your personal site
- Free version available, with limited features
- A repository put together by ACM’s Reproducibility Task Force
- Encourages authors to submit software and data sets with their papers.