FAIR Principles for Research Software

Executive Summary
1. Introduction
2. The FAIR4RS Principles
3. Practical Implementation Examples
4. Benefits in the Academic Context
5. Organizational Adoption Examples
6. Checklist for Making your Research Software FAIR
References

Executive Summary

In the modern research landscape, software has become a fundamental component of scholarly work across all disciplines. The FAIR principles—Findable, Accessible, Interoperable, and Reusable—provide a framework to enhance the quality, impact, and longevity of research software. This guide offers practical approaches to implementing these principles in your research.

1. Introduction

Research software is "source code files, algorithms, scripts, computational workflows and executables that were created during the research process or for a research purpose" (Gruenpeter et al., 2021). As a critical research output, software requires the same careful attention to management and preservation as data and publications.

The FAIR principles were originally developed for data management (Wilkinson et al., 2016) but have been adapted for research software through the collaborative efforts of the Research Software Alliance (ReSA), FORCE11, and the Research Data Alliance (RDA), resulting in the FAIR Principles for Research Software (FAIR4RS) (Barker et al., 2022).

2. The FAIR4RS Principles

Each principle below is followed by practical implementation steps.

2.1 Findable

Software, and its associated metadata, is easy for both humans and machines to find.

Principle	Description
F1	Software is assigned a globally unique and persistent identifier
F1.1	Components of the software representing levels of granularity are assigned distinct identifiers
F1.2	Different versions of the software are assigned distinct identifiers
F2	Software is described with rich metadata
F3	Metadata clearly and explicitly include the identifier of the software they describe
F4	Metadata are FAIR, searchable and indexable

Implementation:

Obtain a persistent identifier: Deposit your software in a repository that issues DOIs, such as:
- Zenodo (integrated with GitHub)
- KiltHub, CMU's institutional repository
- Domain-specific repositories (e.g., bio.tools for bioinformatics)
Develop comprehensive metadata: Include at minimum:
- Software name and version
- Authors with ORCID identifiers
- Description of purpose and functionality
- Keywords and research field classifications
- Dependencies and system requirements
- Date of creation/last update
- Associated grants and funding
Create a project website: Consider creating a dedicated website for your software project hosted on your university's domain, enhancing discoverability through academic networks.
Register with software indexes: Submit your software to relevant directories like the Research Software Directory, SciCrunch, or discipline-specific catalogs.
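The minimum metadata fields listed above can be captured in a machine-readable file. The sketch below assumes the CodeMeta 2.0 vocabulary and writes a codemeta.json; every value (names, ORCID, grant ID, dates) is a hypothetical placeholder, and the REQUIRED tuple is only one possible reading of the "at minimum" list, not an official schema.

```python
import json

# All values below are hypothetical placeholders; replace them with your own.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-toolbox",
    "version": "1.2.0",
    "description": "Illustrative analysis toolbox (placeholder description).",
    "author": [
        {
            "@type": "Person",
            "@id": "https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
            "givenName": "Ada",
            "familyName": "Example",
        }
    ],
    "keywords": ["neuroimaging", "statistics"],
    "softwareRequirements": ["Python >= 3.9", "numpy"],
    "dateCreated": "2023-01-15",
    "dateModified": "2024-03-01",
    "funder": {"@type": "Organization", "name": "Example Funding Agency"},
    "funding": "GRANT-0000 (placeholder grant ID)",
}

# One possible mapping of the minimum metadata list onto CodeMeta keys (an assumption).
REQUIRED = ("name", "version", "author", "description", "keywords",
            "softwareRequirements", "dateModified", "funding")


def missing_fields(record):
    """Return the required keys that are absent or empty in the record."""
    return [key for key in REQUIRED if not record.get(key)]


def write_codemeta(record, path="codemeta.json"):
    """Serialize the record as pretty-printed UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, ensure_ascii=False)
        handle.write("\n")
```

Calling missing_fields(codemeta) before write_codemeta(codemeta) gives a quick completeness check before each release.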

2.2 Accessible

Software, and its metadata, is retrievable via standardized protocols.

Principle	Description
A1	Software is retrievable by its identifier using a standardized communications protocol
A1.1	The protocol is open, free, and universally implementable
A1.2	The protocol allows for an authentication and authorization procedure, where necessary
A2	Metadata are accessible, even when the software is no longer available

Implementation:

Use standard access protocols: Make your software available through:
- HTTPS for downloads
- Git for version control
- Package managers when appropriate (e.g., PyPI, CRAN, npm)
Clarify access conditions: Specify any authentication requirements clearly, especially for software with sensitive applications or data.
Leverage institutional infrastructure: Work with CMU Libraries, the Open Source Programs Office (OSPO), and IT services to ensure long-term accessibility, even after you change institutions.
Separate metadata from software: Ensure metadata is deposited in a persistent location (e.g., institutional repositories, Zenodo) even if the software itself is hosted elsewhere.
Provide alternative access methods: Consider offering multiple ways to access your software to enhance accessibility across different computational environments.
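As an illustration of retrieval by identifier over an open protocol (A1, A1.1), the sketch below builds an HTTPS request that asks the doi.org resolver for citation metadata. It uses the Zenodo DOI already cited in this guide's references. Whether a given DOI answers this kind of content negotiation depends on its registration agency, so treat the Accept header as an assumption to verify for your repository.

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def build_metadata_request(doi):
    """Build (but do not send) an HTTPS request for a DOI's citation metadata."""
    return urllib.request.Request(
        DOI_RESOLVER + doi,
        # Ask for CSL JSON instead of the human-readable landing page.
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )


def fetch_metadata(doi, timeout=10):
    """Send the request and parse the JSON response; requires network access."""
    with urllib.request.urlopen(build_metadata_request(doi), timeout=timeout) as resp:
        return json.load(resp)


# DOI taken from the reference list of this guide (Gruenpeter et al., 2021).
request = build_metadata_request("10.5281/zenodo.5504016")
```

Separating request construction from sending keeps the example testable offline; fetch_metadata is only needed when you actually want the record.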

2.3 Interoperable

Software interoperates with other software by exchanging data and/or metadata, and/or through interaction via application programming interfaces (APIs), described through standards.

Principle	Description
I1	Software reads, writes and exchanges data in a way that meets domain-relevant community standards
I2	Software includes qualified references to other objects

Implementation:

Follow community data standards: Use established data formats and standards in your field. Examples include:
- Life sciences: FASTA, FASTQ, BAM, VCF
- Chemistry: CML, SDF
- Earth sciences: NetCDF, GeoJSON
- Social sciences: DDI, CSV with standardized variables
Document APIs clearly: If your software provides an API, document it thoroughly using standards like OpenAPI.
Include structured citations: Reference datasets, other software, and publications using persistent identifiers.
Implement common interfaces: Design your software to work with established workflows and pipelines in your research domain.
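Consult your research IT office: Ask for help aligning your software with the standards and infrastructure already in use at your institution.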
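To make principle I1 concrete, here is a minimal sketch of reading and writing FASTA, one of the life-science formats mentioned above. It is intentionally simplified and is not a substitute for an established parser from your community; the function names and the 60-character line width are illustrative choices.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs, wrapping sequence lines at `width`."""
    for header, sequence in records:
        handle.write(f">{header}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")


# Small in-memory example; real input would come from a file or pipe.
example = io.StringIO(">seq1 demo record\nACGT\nACGT\n>seq2\nGGCC\n")
records = list(read_fasta(example))
```

Because both functions work on any text stream, the same code serves files, pipes, and in-memory tests.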

2.4 Reusable

Software is both usable (can be executed) and reusable (can be understood, modified, built upon, or incorporated into other software).

Principle	Description
R1	Software is described with a plurality of accurate and relevant attributes
R1.1	Software is given a clear and accessible license
R1.2	Software is associated with detailed provenance
R2	Software includes qualified references to other software
R3	Software meets domain-relevant community standards

Implementation:

Choose an appropriate license:
- MIT or Apache 2.0 for maximum reusability
- GPL for ensuring derivatives remain open source
- Custom licenses for specific university requirements
- If you are unsure which license fits, consult your librarian
Document thoroughly: Provide comprehensive documentation:
- README file with overview and quick start
- Installation instructions for different environments
- User guide with examples
- API documentation if applicable
- Contributing guidelines
Follow coding best practices:
- Comment your code
- Use consistent style
- Implement automated testing
- Manage dependencies explicitly
- Use version control with descriptive commit messages
Track provenance: Document the origin of algorithms, methods, and data used in your software.
Create examples and tutorials: Develop example data, use cases, and tutorials that demonstrate the functionality of your software in research contexts.
Encourage community engagement: Set up mechanisms for users to report issues, request features, and contribute improvements.
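The coding practices above (comments, consistent style, automated testing) can be small in scope. The following sketch pairs a documented function with unit tests using Python's built-in unittest module; rescale is a hypothetical example function, not part of any particular package.

```python
import unittest


def rescale(values):
    """Linearly rescale a sequence of numbers to the range [0, 1].

    Raises ValueError if all values are equal, because the result
    would require dividing by zero.
    """
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("cannot rescale a constant sequence")
    span = highest - lowest
    return [(value - lowest) / span for value in values]


class RescaleTests(unittest.TestCase):
    def test_endpoints_map_to_zero_and_one(self):
        self.assertEqual(rescale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_is_rejected(self):
        with self.assertRaises(ValueError):
            rescale([3, 3, 3])
```

Saved in a file whose name starts with test_, these tests can be run with python -m unittest, and the same command can be reused in a continuous integration workflow.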

3. Practical Implementation Examples

Example 1: MATLAB Toolbox for Neuroimaging Analysis

Findable:

Deposited in Zenodo with DOI
Comprehensive README and function-level documentation
Registered in the university's software catalog
Published a methods paper describing the toolbox

Accessible:

Available for download from Zenodo and GitHub
Precompiled standalone versions for users without MATLAB licenses
Regular releases with version-specific DOIs
Maintained metadata in institutional repository

Interoperable:

Supports standard neuroimaging formats and conventions (NIfTI, BIDS)
Well-documented API for integration with other tools
Can be called from Python using the MATLAB Engine API for Python
Clear documentation of input/output specifications

Reusable:

BSD 3-Clause license with university approval
Extensive tutorials with sample datasets
Modular design allowing reuse of specific components
Comprehensive test suite ensuring reliability
Detailed attribution of included algorithms

Example 2: R Package for Statistical Analysis in Ecology

Findable:

Published on CRAN and GitHub
DOI from Zenodo for specific versions
Registered in the rOpenSci catalog
Keywords optimized for ecology research

Accessible:

Installable directly from R via CRAN
Source code openly available on GitHub
All releases preserved in Zenodo
Detailed vignettes accessible through R help system

Interoperable:

Functions accept and return tibbles/data.frames
Imports and exports standard file formats (CSV, NetCDF)
Compatible with tidyverse workflows
Uses controlled vocabularies for ecological parameters

Reusable:

GPL-3 license
Includes citation file (CITATION.cff)
Extensive documentation with ecological examples
Continuous integration testing
Explicit handling of dependencies

4. Benefits in the Academic Context

Implementing FAIR principles for research software offers several benefits in academic settings:

Enhanced Research Impact: FAIR software is more likely to be discovered, used, and cited.
Improved Reproducibility: Well-documented, accessible software supports reproducible research claims.
Funding Advantage: Many grants now require software management plans aligned with FAIR principles.
Collaboration Opportunities: FAIR software facilitates collaboration across institutions and disciplines.
Career Recognition: Some universities now recognize software as a legitimate research output for promotion and tenure.
Teaching Integration: FAIR research software can be more easily incorporated into teaching materials.
Institutional Compliance: Helps meet institutional and funder requirements for open science.

5. Organizational Adoption Examples

Several research organizations are implementing the FAIR4RS Principles:

Australian Research Data Commons (ARDC)

The ARDC is updating its co-investment policy to reference the FAIR4RS Principles, developing a FAIR research software self-assessment tool, and making its own software outputs FAIR to demonstrate impact.

ELIXIR

ELIXIR recommends that all research outputs of ELIXIR infrastructure be FAIR, including software. They are aligning their Software Management Plan with the FAIR4RS Principles and developing training materials.

Netherlands eScience Center

The Netherlands eScience Center is using the FAIR4RS Principles to support reusable software creation, developing necessary skills through training programs, and collaborating on national templates for Software Management Plans.

6. Checklist for Making your Research Software FAIR

F - Findable

Choose appropriate repositories

Primary development repository (e.g., GitHub, GitLab)
Archival repository that issues DOIs (e.g., Zenodo, Figshare, institutional repository)
Domain-specific repository if applicable (e.g., bio.tools, ASCL, R-universe)

Implement persistent identification

Obtain a DOI for stable releases via integration (e.g., GitHub-Zenodo)
Create a Software Heritage identifier (SWHID) as a secondary persistent ID
Ensure each major version receives its own DOI
If applicable, register with a language-specific package manager (PyPI, CRAN, npm)
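To show what a Software Heritage identifier looks like, the sketch below computes the SWHID of a single file's content. Content SWHIDs use the same hash as a Git blob (SHA-1 over a "blob <length>" header, a NUL byte, and the bytes), so they can be derived locally. Identifiers for directories, revisions, and whole repositories follow different rules and normally come from the Software Heritage archive itself, so this covers only the simplest case.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return the SWHID (version 1, content type) for raw file bytes."""
    # Same construction as a Git blob object: header, NUL byte, then the data.
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


def swhid_for_file(path: str) -> str:
    """Convenience wrapper that reads a file in binary mode."""
    with open(path, "rb") as handle:
        return swhid_for_content(handle.read())


# An empty file always maps to the well-known empty-blob hash.
empty_id = swhid_for_content(b"")
```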

Create comprehensive metadata

Complete all required fields in repository metadata
Create a standardized metadata file (e.g., codemeta.json, CITATION.cff)
Include contextual information about research domain and purpose
Add relevant keywords for discovery
Link to related publications, datasets, and other research objects
List all contributors with their ORCIDs
Include funding information and grant IDs

Enhance discoverability

Register in institutional software catalog
Submit to subject-specific software indexes
Create a dedicated software landing page or website
Ensure repository is indexed by search engines (no robots.txt restrictions)
Provide descriptive repository name and description

A - Accessible

Enable straightforward retrieval

Ensure software is downloadable via standard protocols (HTTPS, FTP)
Provide installation instructions for different operating systems
Create packages for relevant package managers if applicable
Consider containerization (Docker, Singularity) for complex dependencies
Provide conda/pip installation options for scientific software

Clarify access conditions

Describe any authentication requirements clearly
Document any institutional access agreements
Specify if special hardware/infrastructure is needed
Explain how to request access for restricted software

Ensure long-term accessibility

Deposit in an archival repository (not just GitHub)
Publish metadata separately so that it persists even if the software becomes unavailable
Create preservation-friendly release packages (e.g., source tarballs)
Document the preservation plan and expected maintenance period
Consider escrow arrangements for sensitive software

Prepare for future access scenarios

Document contact information for software stewards
Create succession plan for maintenance
Set up institutional ownership of repositories when appropriate
Consider how access will continue post-funding/post-project

I - Interoperable

Adopt community standards

Use standard data formats for input/output
Implement established APIs and protocols when applicable
Follow domain-specific conventions for algorithms and methods
Adhere to relevant controlled vocabularies
Adopt standard command-line argument patterns (if applicable)

Design for integration

Create well-documented APIs
Provide machine-readable interface specifications (e.g., OpenAPI)
Implement standard input/output streams
Support common integration patterns (pipes, callbacks, hooks)
Enable component-based reuse where possible
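A tool that reads standard input and writes standard output can be chained with other programs in a pipe. The sketch below is a minimal, hypothetical filter that converts CSV to JSON Lines; the function names and column names are illustrative assumptions.

```python
import csv
import io
import json
import sys


def csv_to_jsonl(source, sink):
    """Read CSV rows from `source` and write one JSON object per line to `sink`.

    Returns the number of rows converted.
    """
    count = 0
    for row in csv.DictReader(source):
        sink.write(json.dumps(row) + "\n")
        count += 1
    return count


def main():
    """Entry point for command-line use: stdin in, stdout out."""
    csv_to_jsonl(sys.stdin, sys.stdout)
    return 0


# In-memory demonstration; no files or pipes needed.
demo_out = io.StringIO()
demo_rows = csv_to_jsonl(io.StringIO("site,count\nA,3\nB,5\n"), demo_out)
```

Keeping the conversion in a function that accepts any text stream lets the same code serve both the command line (through main) and other Python programs that import it.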

Structure references to external objects

Include structured citations to other software
Use persistent identifiers when referencing datasets
Link related publications with DOIs
Document external service dependencies
Use formal references to standards implementations

Ensure portability

Test on multiple platforms (Windows, macOS, Linux)
Minimize hardcoded environment assumptions
Isolate environment-specific features behind a common interface
Provide cross-platform installation options
Consider containerization for complex dependencies
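One way to minimize hardcoded environment assumptions is to resolve paths at run time instead of embedding them in the code. In this sketch the environment variable name EXAMPLE_TOOL_DATA_DIR and the fallback folder are hypothetical; pathlib takes care of platform-specific path separators.

```python
import os
from pathlib import Path

ENV_VAR = "EXAMPLE_TOOL_DATA_DIR"  # hypothetical variable name


def resolve_data_dir(environ=None, home=None):
    """Return the data directory, preferring an explicit user override.

    `environ` and `home` can be injected for testing; by default the
    process environment and the current user's home directory are used.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    override = environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser()
    # Fallback: a per-user folder instead of an absolute, machine-specific path.
    return home / ".example_tool" / "data"
```

Documenting the override variable in the README gives users on clusters or shared machines a supported way to relocate data without editing the source.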

R - Reusable

Apply appropriate licensing

Select an OSI-approved open-source license
Include full license text in the repository
Document license in metadata
Consult university IP office for institutional requirements
Address any third-party components and their licenses
Consider dual-licensing if required for specific use cases

Create comprehensive documentation

Write a detailed README with overview
Create installation guide with prerequisites
Develop user manual with examples
Produce API/function documentation
Include inline code comments
Provide tutorials for common use cases
Document known limitations and edge cases

Establish provenance information

Document origins of algorithms and methods
Cite foundational papers and theoretical basis
Clarify development history and major changes
Acknowledge all contributors (not just code authors)
Detail the roadmap and future development plans
Document the governance model

Enable proper citation

Create a CITATION file (CFF format recommended)
Include recommended citation format in documentation
Register with a service like CiteAs
Document version-specific citations
Include DOI in citation information
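A CITATION.cff file is plain YAML, so a basic one can be generated without extra libraries. The sketch below assumes version 1.2.0 of the Citation File Format and uses only a handful of its keys; the title, DOI, and author shown in the usage are placeholders, and the output is best checked with a CFF validator before release.

```python
import json


def yaml_str(value):
    """Quote a string; JSON double-quoted strings are also valid YAML scalars."""
    return json.dumps(value, ensure_ascii=False)


def render_citation_cff(title, version, doi, date_released, authors):
    """Return the text of a minimal CITATION.cff file.

    `authors` is a list of dicts with 'family', 'given', and optional 'orcid'.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f"title: {yaml_str(title)}",
        f"version: {yaml_str(version)}",
        f"doi: {yaml_str(doi)}",
        f"date-released: {yaml_str(date_released)}",
        "authors:",
    ]
    for author in authors:
        lines.append(f"  - family-names: {yaml_str(author['family'])}")
        lines.append(f"    given-names: {yaml_str(author['given'])}")
        if author.get("orcid"):
            lines.append(f"    orcid: {yaml_str(author['orcid'])}")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
cff_text = render_citation_cff(
    title="example-toolbox",
    version="1.2.0",
    doi="10.5281/zenodo.0000000",
    date_released="2024-03-01",
    authors=[{"family": "Example", "given": "Ada",
              "orcid": "https://orcid.org/0000-0000-0000-0000"}],
)
```

Regenerating the file as part of each release keeps the version and DOI in the citation aligned with the archived copy.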

Support reproducible execution

Document all dependencies with specific versions
Provide environment specification files (requirements.txt, environment.yml)
Consider containerization (Docker, Singularity)
Create automated build/test workflows
Include sample data for verification
Document expected outputs for validation
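Documenting dependencies with specific versions can be partly automated. The following sketch lists every package installed in the current Python environment in the name==version form used by requirements.txt files. It records what happens to be installed, not what the software strictly needs, so treat the output as a starting point to prune.

```python
import platform
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' strings for installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip distributions with missing or broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def environment_report():
    """Return requirements-style text with the interpreter noted in a comment."""
    header = f"# Python {platform.python_version()} on {platform.system()}"
    return "\n".join([header, *pinned_requirements()]) + "\n"
```

Writing environment_report() to a file alongside each set of published results records the environment that actually produced them.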

Foster community engagement

Create contributing guidelines
Document code of conduct
Set up issue templates
Establish pull request process
Create a community discussion forum/mailing list
Plan for user support mechanisms

References

Barker, M., Chue Hong, N. P., Katz, D. S., Lamprecht, A. L., Martinez-Ortiz, C., Psomopoulos, F., Harrow, J., Castro, L. J., Gruenpeter, M., Martinez, P. A., & Honeyman, T. (2022). Introducing the FAIR Principles for research software. Scientific Data, 9(1), 622. https://doi.org/10.1038/s41597-022-01710-x

Gruenpeter, M., Katz, D. S., Lamprecht, A. L., Honeyman, T., Garijo, D., Struck, A., Niehues, A., Martinez, P. A., Castro, L. J., Rabemanantsoa, T., Chue Hong, N. P., Martinez-Ortiz, C., Fouilloux, A., Liffers, M., Foufoulas, Y., Konovalov, A., Weilenmann, J.-M., Pelikan, M., Orviz, P., & Grüning, B. (2021). Defining Research Software: a controversial discussion. Zenodo. https://doi.org/10.5281/zenodo.5504016

Jiménez, R. C., Kuzak, M., Alhamdoosh, M., Barker, M., Batut, B., Borg, M., Capella-Gutierrez, S., Chue Hong, N., Cook, M., Corpas, M., Flannery, M., Garcia, L., Gelpí, J. L., Gladman, S., Goble, C., González Ferreiro, M., Gonzalez-Beltran, A., Griffin, P. C., Grüning, B., … Crouch, S. (2017). Four simple recommendations to encourage best practices in research software. F1000Research, 6, 876. https://doi.org/10.12688/f1000research.11407.1

Lamprecht, A.-L., Garcia, L., Kuzak, M., Martinez, C., Arcila, R., Martin Del Pico, E., Dominguez Del Angel, V., van de Sandt, S., Ison, J., Martinez, P. A., McQuilton, P., Valencia, A., Harrow, J., Psomopoulos, F., Gelpi, J. L., Chue Hong, N., Goble, C., & Capella-Gutierrez, S. (2020). Towards FAIR principles for research software. Data Science, 3(1), 37–59. https://doi.org/10.3233/DS-190026

Netherlands eScience Center & DANS. (2020). Five Recommendations for FAIR Software. https://fair-software.nl/

Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. https://doi.org/10.1038/sdata.2016.18
