Carnegie Mellon University Libraries

Data Management for Research

FAIR Principles for Research Software

Executive Summary

In the modern research landscape, software has become a fundamental component of scholarly work across all disciplines. The FAIR principles—Findable, Accessible, Interoperable, and Reusable—provide a framework to enhance the quality, impact, and longevity of research software. This guide offers practical approaches to implementing these principles in your research.

1. Introduction

Research software is "source code files, algorithms, scripts, computational workflows and executables that were created during the research process or for a research purpose" (Gruenpeter et al., 2021). As a critical research output, software requires the same careful attention to management and preservation as data and publications.

The FAIR principles were originally developed for data management (Wilkinson et al., 2016) but have been adapted for research software through the collaborative efforts of the Research Software Alliance (ReSA), FORCE11, and the Research Data Alliance (RDA), resulting in the FAIR Principles for Research Software (FAIR4RS) (Barker et al., 2022).

2. The FAIR4RS Principles

Each principle below is followed by practical implementation steps.

2.1 Findable

Software, and its associated metadata, is easy for both humans and machines to find.

Principle  Description
F1         Software is assigned a globally unique and persistent identifier
F1.1       Components of the software representing levels of granularity are assigned distinct identifiers
F1.2       Different versions of the software are assigned distinct identifiers
F2         Software is described with rich metadata
F3         Metadata clearly and explicitly include the identifier of the software they describe
F4         Metadata are FAIR, searchable and indexable

Implementation:

  1. Obtain a persistent identifier: Deposit your software in a repository that issues DOIs, such as:
    • Zenodo (integrated with GitHub)
    • KiltHub, CMU's institutional repository
    • Domain-specific repositories (e.g., bio.tools for bioinformatics)
  2. Develop comprehensive metadata: Include at minimum:
    • Software name and version
    • Authors with ORCID identifiers
    • Description of purpose and functionality
    • Keywords and research field classifications
    • Dependencies and system requirements
    • Date of creation/last update
    • Associated grants and funding
  3. Create a project website: Consider creating a dedicated website for your software project hosted on your university's domain, enhancing discoverability through academic networks.
  4. Register with software indexes: Submit your software to relevant directories like the Research Software Directory, SciCrunch, or discipline-specific catalogs.
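
The minimum metadata listed in step 2 can also be kept in a machine-readable file next to the code. The sketch below builds a CodeMeta-style record and serializes it to JSON. The choice of CodeMeta is one option among several, the helper `build_codemeta` is hypothetical, and every value (project name, author, ORCID, DOI, funder, grant number) is a placeholder to replace with your own.

```python
import json


def build_codemeta(name, version, description, authors, keywords,
                   requirements, date_created, date_modified,
                   funder, grant, doi=None):
    """Return a CodeMeta 2.0 style dictionary covering the minimum fields above."""
    record = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "description": description,
        "author": [
            {
                "@type": "Person",
                "@id": f"https://orcid.org/{author['orcid']}",
                "givenName": author["given"],
                "familyName": author["family"],
            }
            for author in authors
        ],
        "keywords": list(keywords),
        "softwareRequirements": list(requirements),
        "dateCreated": date_created,
        "dateModified": date_modified,
        "funder": {"@type": "Organization", "name": funder},
        "funding": grant,
    }
    if doi:
        # F3: the metadata explicitly carry the identifier of the software.
        record["identifier"] = f"https://doi.org/{doi}"
    return record


# All values below are placeholders, not real identifiers.
example_record = build_codemeta(
    name="example-toolbox",
    version="1.2.0",
    description="Placeholder description of purpose and functionality.",
    authors=[{"given": "Ada", "family": "Example", "orcid": "0000-0000-0000-0000"}],
    keywords=["neuroimaging", "signal processing"],
    requirements=["Python >= 3.9", "numpy"],
    date_created="2023-01-15",
    date_modified="2024-06-01",
    funder="Example Funding Agency",
    grant="GRANT-0000",
    doi="10.5281/zenodo.0000000",
)
codemeta_json = json.dumps(example_record, indent=2)
```

Writing `codemeta_json` to a file named `codemeta.json` in the repository root lets harvesters read the same record that humans see on the landing page.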

2.2 Accessible

Software, and its metadata, is retrievable via standardized protocols.

Principle  Description
A1         Software is retrievable by its identifier using a standardized communications protocol
A1.1       The protocol is open, free, and universally implementable
A1.2       The protocol allows for an authentication and authorization procedure, where necessary
A2         Metadata are accessible, even when the software is no longer available

Implementation:

  1. Use standard access protocols: Make your software available through:
    • HTTPS for downloads
    • Git for version control
    • Package managers when appropriate (e.g., PyPI, CRAN, npm)
  2. Clarify access conditions: Specify any authentication requirements clearly, especially for software with sensitive applications or data.
  3. Leverage institutional infrastructure: Work with CMU Libraries, the Open Source Programs Office (OSPO), and IT services to ensure long-term accessibility, even after you change institutions.
  4. Separate metadata from software: Ensure metadata is deposited in a persistent location (e.g., institutional repositories, Zenodo) even if the software itself is hosted elsewhere.
  5. Provide alternative access methods: Consider offering multiple ways to access your software to enhance accessibility across different computational environments.
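
Principle A1 can be illustrated with a DOI: the identifier resolves over plain HTTPS, and the resolver can return machine-readable metadata through content negotiation. The sketch below uses only the Python standard library. The function names are hypothetical, and it assumes the registration agency behind the DOI supports the CSL JSON media type, which should be verified for your repository.

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def normalize_doi(identifier):
    """Strip common prefixes so that only the bare DOI (10.xxxx/...) remains."""
    doi = identifier.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def metadata_request(identifier):
    """Build an HTTPS request that asks the resolver for metadata, not the landing page."""
    url = DOI_RESOLVER + normalize_doi(identifier)
    # Content negotiation: request CSL JSON via the Accept header.
    return urllib.request.Request(
        url, headers={"Accept": "application/vnd.citationstyles.csl+json"}
    )


def fetch_metadata(identifier, timeout=30):
    """Resolve the DOI and return its metadata as a dictionary (needs network access)."""
    with urllib.request.urlopen(metadata_request(identifier), timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_metadata("10.5281/zenodo.5504016")` would request the record for the Gruenpeter et al. (2021) deposit cited in this guide. Because the metadata live with the DOI registration, they can remain retrievable even if the software itself is withdrawn, which is the intent of A2.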

2.3 Interoperable

Software interoperates with other software by exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs), described through standards.

Principle  Description
I1         Software reads, writes and exchanges data in a way that meets domain-relevant community standards
I2         Software includes qualified references to other objects

Implementation:

  1. Follow community data standards: Use established data formats and standards in your field. Examples include:
    • Life sciences: FASTA, FASTQ, BAM, VCF
    • Chemistry: CML, SDF
    • Earth sciences: NetCDF, GeoJSON
    • Social sciences: DDI, CSV with standardized variables
  2. Document APIs clearly: If your software provides an API, document it thoroughly using standards like OpenAPI.
  3. Include structured citations: Reference datasets, other software, and publications using persistent identifiers.
  4. Implement common interfaces: Design your software to work with established workflows and pipelines in your research domain.
  5. Collaborate with your research IT office.
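
As a small illustration of step 1, the sketch below reads and writes FASTA, one of the life-science formats listed above, instead of inventing a custom layout. It is a deliberately minimal parser written for this guide. The function names are hypothetical, and a maintained library such as Biopython is usually the better choice in real projects.

```python
def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records, width=60):
    """Serialize (header, sequence) pairs to FASTA text, wrapping sequence lines."""
    out = []
    for header, sequence in records:
        out.append(f">{header}")
        for start in range(0, len(sequence), width):
            out.append(sequence[start:start + width])
    return "\n".join(out) + "\n"
```

Because both functions speak the community format, their output can be passed directly to other tools in a pipeline without a conversion step.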

2.4 Reusable

Software is both usable (can be executed) and reusable (can be understood, modified, built upon, or incorporated into other software).

Principle  Description
R1         Software is described with a plurality of accurate and relevant attributes
R1.1       Software is given a clear and accessible license
R1.2       Software is associated with detailed provenance
R2         Software includes qualified references to other software
R3         Software meets domain-relevant community standards

Implementation:

  1. Choose an appropriate license:
    • MIT or Apache 2.0 for maximum reusability
    • GPL for ensuring derivatives remain open source
    • Custom licenses for specific university requirements
    • If you are unsure which license fits, consult your librarian
  2. Document thoroughly: Provide comprehensive documentation:
    • README file with overview and quick start
    • Installation instructions for different environments
    • User guide with examples
    • API documentation if applicable
    • Contributing guidelines
  3. Follow coding best practices:
    • Comment your code
    • Use consistent style
    • Implement automated testing
    • Manage dependencies explicitly
    • Version control with descriptive commit messages
  4. Track provenance: Document the origin of algorithms, methods, and data used in your software.
  5. Create examples and tutorials: Develop example data, use cases, and tutorials that demonstrate the functionality of your software in research contexts.
  6. Encourage community engagement: Set up mechanisms for users to report issues, request features, and contribute improvements.
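
Steps 3 and 4 can be combined in very little code. The sketch below shows one hypothetical function, `standardize`, with a docstring that states what it computes and where the method comes from, plus an automated test written with the standard `unittest` module. The function is a generic placeholder, not taken from any particular research package.

```python
import math
import unittest


def standardize(values):
    """Rescale numbers to zero mean and unit standard deviation.

    Provenance: textbook z-score using the population standard deviation
    (divisor n, not n - 1). Raises ValueError for empty or constant input.
    """
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if variance == 0:
        raise ValueError("values must not all be identical")
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]


class StandardizeTests(unittest.TestCase):
    def test_result_has_zero_mean(self):
        result = standardize([1.0, 2.0, 3.0])
        self.assertAlmostEqual(sum(result), 0.0)

    def test_constant_input_is_rejected(self):
        with self.assertRaises(ValueError):
            standardize([5, 5, 5])
```

Running `python -m unittest` in the project directory would discover and execute tests like these, and a continuous integration service can run the same command on every commit.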

3. Practical Implementation Examples

Example 1: MATLAB Toolbox for Neuroimaging Analysis

Findable:

  • Deposited in Zenodo with DOI
  • Comprehensive README and function-level documentation
  • Registered in the university's software catalog
  • Published a methods paper describing the toolbox

Accessible:

  • Available for download from Zenodo and GitHub
  • Pre-compiled standalone versions for those without MATLAB licenses
  • Regular releases with version-specific DOIs
  • Maintained metadata in institutional repository

Interoperable:

  • Supports the standard NIfTI image format and the BIDS dataset organization standard
  • Well-documented API for integration with other tools
  • Can be called from Python using MATLAB Engine
  • Clear documentation of input/output specifications

Reusable:

  • BSD 3-Clause license with university approval
  • Extensive tutorials with sample datasets
  • Modular design allowing reuse of specific components
  • Comprehensive test suite ensuring reliability
  • Detailed attribution of included algorithms

Example 2: R Package for Statistical Analysis in Ecology

Findable:

  • Published on CRAN and GitHub
  • DOI from Zenodo for specific versions
  • Registered in the rOpenSci catalog
  • Keywords optimized for ecology research

Accessible:

  • Installable directly from R via CRAN
  • Source code openly available on GitHub
  • All releases preserved in Zenodo
  • Detailed vignettes accessible through R help system

Interoperable:

  • Functions accept and return tibbles/data.frames
  • Imports and exports standard file formats (CSV, NetCDF)
  • Compatible with tidyverse workflows
  • Uses controlled vocabularies for ecological parameters

Reusable:

  • GPL-3 license
  • Includes citation file (CITATION.cff)
  • Extensive documentation with ecological examples
  • Continuous integration testing
  • Explicit handling of dependencies

4. Benefits in the Academic Context

Implementing FAIR principles for research software offers several benefits in academic settings:

  1. Enhanced Research Impact: FAIR software is more likely to be discovered, used, and cited.
  2. Improved Reproducibility: Well-documented, accessible software supports reproducible research claims.
  3. Funding Advantage: Many grants now require software management plans aligned with FAIR principles.
  4. Collaboration Opportunities: FAIR software facilitates collaboration across institutions and disciplines.
  5. Career Recognition: Some universities now recognize software as a legitimate research output for promotion and tenure.
  6. Teaching Integration: FAIR research software can be more easily incorporated into teaching materials.
  7. Institutional Compliance: Helps meet institutional and funder requirements for open science.

5. Organizational Adoption Examples

Several research organizations are implementing the FAIR4RS Principles:

Australian Research Data Commons (ARDC)

The ARDC is updating its co-investment policy to reference the FAIR4RS Principles, developing a FAIR research software self-assessment tool, and making its own software outputs FAIR to demonstrate impact.

ELIXIR

ELIXIR recommends that all research outputs of ELIXIR infrastructure be FAIR, including software. They are aligning their Software Management Plan with the FAIR4RS Principles and developing training materials.

Netherlands eScience Center

The Netherlands eScience Center is using the FAIR4RS Principles to support reusable software creation, developing necessary skills through training programs, and collaborating on national templates for Software Management Plans.

6. Checklist for Making Your Research Software FAIR

F - Findable

Choose appropriate repositories

  • Primary development repository (e.g., GitHub, GitLab)
  • Archival repository that issues DOIs (e.g., Zenodo, Figshare, institutional repository)
  • Domain-specific repository if applicable (e.g., bio.tools, ASCL, R-universe)

Implement persistent identification

  • Obtain a DOI for stable releases via integration (e.g., GitHub-Zenodo)
  • Create a Software Heritage identifier (SWHID) as a secondary persistent ID
  • Ensure each major version receives its own DOI
  • If applicable, register with a language-specific package manager (PyPI, CRAN, npm)
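
To make the SWHID item more concrete: a Software Heritage identifier for a single file is derived from the file's content alone, using the same hash Git computes for a blob. The sketch below covers only this content (`cnt`) case. Identifiers for directories, revisions, and releases are normally obtained by archiving the repository in Software Heritage rather than computed by hand, and the function name here is hypothetical.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Compute the SWHID of one file's content (object type 'cnt').

    The digest is SHA-1 over the Git blob header ("blob", a space, the byte
    length, and a NUL byte) followed by the raw file bytes.
    """
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


# The empty file has a well-known Git blob hash, which makes a handy check.
empty_file_id = swhid_for_content(b"")
```

Because the identifier depends only on the bytes, anyone holding a copy of the file can recompute it and confirm they have exactly the version that was cited.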

Create comprehensive metadata

  • Complete all required fields in repository metadata
  • Create a standardized metadata file (e.g., codemeta.json, CITATION.cff)
  • Include contextual information about research domain and purpose
  • Add relevant keywords for discovery
  • Link to related publications, datasets, and other research objects
  • List all contributors with their ORCIDs
  • Include funding information and grant IDs

Enhance discoverability

  • Register in institutional software catalog
  • Submit to subject-specific software indexes
  • Create a dedicated software landing page or website
  • Ensure repository is indexed by search engines (no robots.txt restrictions)
  • Provide descriptive repository name and description

A - Accessible

Enable straightforward retrieval

  • Ensure software is downloadable via standard protocols (HTTPS, FTP)
  • Provide installation instructions for different operating systems
  • Create packages for relevant package managers if applicable
  • Consider containerization (Docker, Singularity) for complex dependencies
  • Provide conda/pip installation options for scientific software

Clarify access conditions

  • Describe any authentication requirements clearly
  • Document any institutional access agreements
  • Specify if special hardware/infrastructure is needed
  • Explain how to request access for restricted software

Ensure long-term accessibility

  • Deposit in an archival repository (not just GitHub)
  • Separate metadata publication to ensure it persists
  • Create preservation-friendly release packages (e.g., source tarballs)
  • Document the preservation plan and expected maintenance period
  • Consider escrow arrangements for sensitive software

Prepare for future access scenarios

  • Document contact information for software stewards
  • Create succession plan for maintenance
  • Set up institutional ownership of repositories when appropriate
  • Consider how access will continue post-funding/post-project

I - Interoperable

Adopt community standards

  • Use standard data formats for input/output
  • Implement established APIs and protocols when applicable
  • Follow domain-specific conventions for algorithms and methods
  • Adhere to relevant controlled vocabularies
  • Adopt standard command-line argument patterns (if applicable)

Design for integration

  • Create well-documented APIs
  • Provide machine-readable interface specifications (e.g., OpenAPI)
  • Implement standard input/output streams
  • Support common integration patterns (pipes, callbacks, hooks)
  • Enable component-based reuse where possible
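
The items on standard streams and pipes can be sketched as a small command-line filter. The example below reads records from standard input and writes them to standard output, so it can sit in the middle of a shell pipeline. The `--skip-comments` flag and the filtering rule are placeholders chosen for illustration. The streams are passed in as parameters so the same function can be exercised with in-memory text.

```python
import argparse
import io
import sys


def build_parser():
    parser = argparse.ArgumentParser(
        description="Copy lines from input to output, optionally dropping '#' comment lines."
    )
    parser.add_argument("--skip-comments", action="store_true",
                        help="drop lines that start with '#'")
    return parser


def main(argv=None, stdin=None, stdout=None):
    """Run the filter; defaults to the process's real standard streams."""
    args = build_parser().parse_args(argv)
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    for line in stdin:
        if args.skip_comments and line.startswith("#"):
            continue
        stdout.write(line)
    return 0


# In-memory streams stand in for a shell pipe in this demonstration.
demo_output = io.StringIO()
exit_code = main(["--skip-comments"],
                 stdin=io.StringIO("# header\nsite,count\nA,3\n"),
                 stdout=demo_output)
```

Saved as a script that ends with `sys.exit(main())`, the filter could be used as `cat input.csv | python tool.py --skip-comments > output.csv`, where the file and script names are hypothetical.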

Structure references to external objects

  • Include structured citations to other software
  • Use persistent identifiers when referencing datasets
  • Link related publications with DOIs
  • Document external service dependencies
  • Use formal references to standards implementations

Ensure portability

  • Test on multiple platforms (Windows, macOS, Linux)
  • Minimize hardcoded environment assumptions
  • Virtualize environment-specific features
  • Provide cross-platform installation options
  • Consider containerization for complex dependencies
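
One common way to minimize hardcoded environment assumptions is to resolve file locations at run time instead of embedding an absolute path. The sketch below looks for an override in an environment variable and otherwise falls back to a directory under the user's home. The variable name `EXAMPLE_TOOL_DATA` and the application name are hypothetical.

```python
import os
from pathlib import Path


def data_directory(env=None, app_name="example-tool"):
    """Return the directory where the tool should look for its data files.

    An explicit override in EXAMPLE_TOOL_DATA wins; otherwise a per-user
    default is built with pathlib, so separators match the operating system.
    """
    env = os.environ if env is None else env
    override = env.get("EXAMPLE_TOOL_DATA")
    if override:
        return Path(override).expanduser()
    return Path.home() / f".{app_name}" / "data"
```

Passing the environment in as a parameter keeps the function easy to test on any platform without modifying the real process environment.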

R - Reusable

Apply appropriate licensing

  • Select an OSI-approved open-source license
  • Include full license text in the repository
  • Document license in metadata
  • Consult university IP office for institutional requirements
  • Address any third-party components and their licenses
  • Consider dual-licensing if required for specific use cases

Create comprehensive documentation

  • Write a detailed README with overview
  • Create installation guide with prerequisites
  • Develop user manual with examples
  • Produce API/function documentation
  • Include inline code comments
  • Provide tutorials for common use cases
  • Document known limitations and edge cases

Establish provenance information

  • Document origins of algorithms and methods
  • Cite foundational papers and theoretical basis
  • Clarify development history and major changes
  • Acknowledge all contributors (not just code authors)
  • Detail the roadmap and future development plans
  • Document the governance model

Enable proper citation

  • Create a CITATION file (CFF format recommended)
  • Include recommended citation format in documentation
  • Check how your software's citation is surfaced by a service like CiteAs
  • Document version-specific citations
  • Include DOI in citation information
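
A CITATION.cff file is plain YAML, so it can be generated from the same values used elsewhere in the project. The sketch below renders a minimal file with keys from the Citation File Format 1.2.0 schema. The helper `render_citation_cff` is hypothetical, and the title, DOI, ORCID, and names are placeholders. Strings are quoted with `json.dumps` because JSON strings are also valid double-quoted YAML scalars.

```python
import json


def render_citation_cff(title, version, doi, date_released, license_id, authors):
    """Render a minimal CITATION.cff document as text."""
    q = json.dumps  # produces a safely quoted YAML string
    message = "If you use this software, please cite it using the metadata from this file."
    lines = [
        "cff-version: 1.2.0",
        f"message: {q(message)}",
        f"title: {q(title)}",
        f"version: {q(version)}",
        f"doi: {q(doi)}",
        f"date-released: {q(date_released)}",
        f"license: {q(license_id)}",
        "authors:",
    ]
    for author in authors:
        orcid_url = "https://orcid.org/" + author["orcid"]
        lines.append(f"  - family-names: {q(author['family'])}")
        lines.append(f"    given-names: {q(author['given'])}")
        lines.append(f"    orcid: {q(orcid_url)}")
    return "\n".join(lines) + "\n"


# Placeholder values only; replace with the details of your own release.
cff_text = render_citation_cff(
    title="example-toolbox",
    version="1.2.0",
    doi="10.5281/zenodo.0000000",
    date_released="2024-06-01",
    license_id="BSD-3-Clause",
    authors=[{"family": "Example", "given": "Ada", "orcid": "0000-0000-0000-0000"}],
)
```

Saving `cff_text` as `CITATION.cff` in the repository root gives each release a version-specific citation that includes its DOI.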

Support reproducible execution

  • Document all dependencies with specific versions
  • Provide environment specification files (requirements.txt, environment.yml)
  • Consider containerization (Docker, Singularity)
  • Create automated build/test workflows
  • Include sample data for verification
  • Document expected outputs for validation
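
Recording exact dependency versions can be automated. The sketch below uses the standard `importlib.metadata` module to list every distribution installed in the current Python environment in the `name==version` form used by requirements.txt. The function name is hypothetical. Note that it captures the whole environment, so in practice it is best run inside a project-specific virtual environment.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for all installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip installations with missing or broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)
```

Writing `"\n".join(pinned_requirements())` to a file at release time, and archiving that file with the release, documents the environment in which the published results were produced.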

Foster community engagement

  • Create contributing guidelines
  • Document code of conduct
  • Set up issue templates
  • Establish pull request process
  • Create a community discussion forum/mailing list
  • Plan for user support mechanisms

References

Barker, M., Chue Hong, N. P., Katz, D. S., Lamprecht, A. L., Martinez-Ortiz, C., Psomopoulos, F., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., & Honeyman, T. (2022). Introducing the FAIR Principles for research software. Scientific Data, 9(1), 622. https://doi.org/10.1038/s41597-022-01710-x

Gruenpeter, M., Katz, D. S., Lamprecht, A. L., Honeyman, T., Garijo, D., Struck, A., Niehues, A., Martinez, P. A., Castro, L. J., Rabemanantsoa, T., Chue Hong, N. P., Martinez-Ortiz, C., Fouilloux, A., Liffers, M., Foufoulas, Y., Konovalov, A., Weilenmann, J.-M., Pelikan, M., Orviz, P., & Grüning, B. (2021). Defining Research Software: a controversial discussion. Zenodo. https://doi.org/10.5281/zenodo.5504016

Jiménez, R. C., Kuzak, M., Alhamdoosh, M., Barker, M., Batut, B., Borg, M., Capella-Gutierrez, S., Chue Hong, N., Cook, M., Corpas, M., Flannery, M., Garcia, L., Gelpí, J. L., Gladman, S., Goble, C., González Ferreiro, M., Gonzalez-Beltran, A., Griffin, P. C., Grüning, B., … Crouch, S. (2017). Four simple recommendations to encourage best practices in research software. F1000Research, 6, 876. https://doi.org/10.12688/f1000research.11407.1

Lamprecht, A.-L., Garcia, L., Kuzak, M., Martinez, C., Arcila, R., Martin Del Pico, E., Dominguez Del Angel, V., van de Sandt, S., Ison, J., Martinez, P. A., McQuilton, P., Valencia, A., Harrow, J., Psomopoulos, F., Gelpi, J. L., Chue Hong, N., Goble, C., & Capella-Gutierrez, S. (2020). Towards FAIR principles for research software. Data Science, 3(1), 37–59. https://doi.org/10.3233/DS-190026

Netherlands eScience Center & DANS. (2020). Five Recommendations for FAIR Software. https://fair-software.nl/

Wilkinson, M. D., Dumontier, M., Aalbersberg, IJ. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. https://doi.org/10.1038/sdata.2016.18