Carnegie Mellon University Libraries

Data Management for Research

FAIR Principles for Computational Workflows

What Are FAIR Principles?

The FAIR principles stand for Findable, Accessible, Interoperable, and Reusable. Originally developed for research data, these principles have been adapted specifically for computational workflows to maximize their value as research assets and facilitate their adoption by the wider research community.

Key Definitions

Computational Workflow

Software with two main characteristics:

  1. Composition of multiple components (software, workflows, code snippets, tools, services)
  2. Explicit abstraction from the mechanics of execution, expressed in a high-level workflow language that specifies the data flow between components

Workflow Specification

The formal specification of data flow and execution control between executable components, expected datasets, and parameter files.

Workflow Run

The instantiation of the workflow with inputs (parameters, input datasets) and outputs (output data, provenance execution log, lineage of data products).

Workflow Management System (WMS)

Software that handles data flow and/or execution control, abstracting the workflow from underlying digital infrastructure (examples: Nextflow, Galaxy, Snakemake, Parsl).

The FAIR Principles for Computational Workflows

🔍F - Findable

F1. Workflow assigned globally unique and persistent identifier

What it means: Your workflow needs a permanent, unique "address" on the internet

How to implement:

  • Use DOI (Digital Object Identifier) through repositories like Zenodo, WorkflowHub
  • Register workflows in workflow registries (WorkflowHub, Dockstore)
  • Ensure identifiers persist even if hosting changes

F1.1. Components assigned distinct identifiers

What it means: Each part of your workflow (scripts, tools, sub-workflows) needs its own identifier

How to implement:

  • Version control individual components
  • Use container registries for Docker/Singularity containers
  • Reference specific versions of external tools and datasets

F1.2. Different versions assigned distinct identifiers

What it means: Each version of your workflow gets a unique identifier

How to implement:

  • Use semantic versioning (v1.0.0, v1.1.0, etc.)
  • Tag releases in Git repositories
  • Create new DOIs for major versions
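
The version-tagging steps above can be sketched in a few lines of Python: semantic version tags compare correctly once they are parsed into integer tuples (naive string comparison would rank "v1.10.0" below "v1.2.0"). The tags below are illustrative.

```python
def parse_semver(tag: str) -> tuple[int, int, int]:
    """Parse a tag like 'v1.2.0' into a (major, minor, patch) tuple."""
    major, minor, patch = tag.lstrip("v").split(".")
    return (int(major), int(minor), int(patch))

# Tuples compare element-wise, so version ordering falls out naturally.
releases = ["v1.0.0", "v1.10.0", "v1.2.0"]
latest = max(releases, key=parse_semver)
print(latest)  # v1.10.0
```

A Git tag created per release (and a Zenodo DOI minted for major versions) gives each of these identifiers a persistent home.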

F2. Workflow described with rich metadata

What it means: Comprehensive information about your workflow's purpose, requirements, and usage

How to implement:

  • Document workflow purpose and scientific application
  • List computational requirements and dependencies
  • Provide example input/output data
  • Include author information and creation date

F3. Metadata explicitly includes workflow identifier

What it means: The description clearly states which workflow it describes

How to implement:

  • Include DOI/identifier in README files
  • Reference identifier in documentation
  • Use structured metadata formats (schema.org, Bioschemas)
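
As a minimal sketch of the structured-metadata approach, the snippet below builds a schema.org-style JSON-LD description whose `@id` field carries the workflow's identifier, so the metadata explicitly states which workflow it describes. Every field value here is a placeholder for your own workflow's details, including the DOI.

```python
import json

# Minimal schema.org/Bioschemas-style description; all field values are
# placeholders (including the DOI) for your own workflow's details.
metadata = {
    "@context": "https://schema.org",
    "@type": "SoftwareSourceCode",
    "@id": "https://doi.org/10.5281/zenodo.0000000",  # placeholder DOI
    "name": "example-workflow",
    "codeRepository": "https://github.com/your-lab/example-workflow",
    "programmingLanguage": "Python",
    "license": "https://spdx.org/licenses/MIT",
    "version": "1.0.0",
}

# Ship this alongside the code (e.g. in the repository or an RO-Crate)
# so the identifier travels with the description.
print(json.dumps(metadata, indent=2))
```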

F4. Registered in searchable FAIR resource

What it means: Your workflow can be found through search engines and registries

How to implement:

  • Submit to WorkflowHub
  • Register in Dockstore
  • Use institutional repositories with metadata harvesting

🔓A - Accessible

A1. Retrievable by identifier using standardized protocol

What it means: Anyone can download your workflow using standard web protocols

How to implement:

  • Host on platforms using HTTPS
  • Provide direct download links
  • Ensure stable URLs that don't break

A1.1. Protocol is open, free, and universally implementable

What it means: No special software needed to access your workflow

How to implement:

  • Use HTTPS (not proprietary protocols)
  • Avoid platform-specific access methods
  • Provide standard file downloads

A1.2. Authentication/authorization when necessary

What it means: If access restrictions are needed, use standard authentication

How to implement:

  • Use institutional single sign-on (SSO)
  • Implement standard OAuth protocols
  • Document access requirements clearly

A2. Metadata accessible even when workflow unavailable

What it means: Description remains available even if workflow can't be run

How to implement:

  • Store metadata separately from workflow code
  • Use long-term preservation repositories
  • Maintain documentation in multiple locations

🔗I - Interoperable

I1. Use formal, accessible language for knowledge representation

What it means: Use standard formats that both humans and computers can understand

How to implement:

  • Write workflows in standard languages (CWL, WDL, Nextflow DSL)
  • Use structured metadata formats (JSON-LD, RDF)
  • Follow established workflow description standards

I2. Use vocabularies following FAIR principles

What it means: Use standardized terms and classifications

How to implement:

  • Use domain-specific ontologies (EDAM for bioinformatics)
  • Apply Bioschemas markup for life sciences
  • Reference standard vocabulary resources

I3. Components read/write data meeting domain standards

What it means: Your workflow uses standard file formats and data structures

How to implement:

  • Use standard file formats (CSV, JSON, standard domain formats)
  • Document input/output specifications
  • Ensure compatibility with common tools
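
A small illustration of standard-format I/O: the sketch below reads CSV rows and re-emits them as JSON records, so either representation can feed a downstream tool. The tiny in-memory table stands in for a real input file.

```python
import csv
import io
import json

# A tiny in-memory CSV standing in for a real input file.
raw = "sample,count\nA,10\nB,7\n"

# Read CSV rows into dictionaries, then emit them as JSON records so
# downstream tools can consume either standard format.
rows = list(csv.DictReader(io.StringIO(raw)))
print(json.dumps(rows))
```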

I4. Include qualified references to other objects/components

What it means: Clear links to external tools, datasets, and related workflows

How to implement:

  • Cite tool versions with DOIs when available
  • Reference datasets with persistent identifiers
  • Link to related workflows and publications

♻️R - Reusable

R1. Described with accurate and relevant attributes

What it means: Complete documentation enabling others to understand and use your workflow

How to implement:

  • Write comprehensive README files
  • Document installation and execution instructions
  • Provide example usage scenarios
  • Include troubleshooting guides

R1.1. Released with clear and accessible license

What it means: Legal terms for using and modifying your workflow are explicit

How to implement:

  • Choose appropriate open source license (MIT, Apache 2.0, GPL)
  • Include LICENSE file in repository
  • Clearly state licensing terms in documentation

R1.2. Components have clear licenses

What it means: All parts of your workflow have explicit licensing

How to implement:

  • Document licenses of all dependencies
  • Ensure license compatibility
  • Include license information for containers and external tools

R1.3. Associated with detailed provenance

What it means: Clear history of workflow development and data lineage

How to implement:

  • Maintain version history in Git
  • Document workflow development process
  • Include provenance tracking in workflow runs
  • Link to source publications and datasets
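
A minimal sketch of what per-step provenance capture can look like, assuming you record it yourself rather than relying on a WMS: each entry ties a step name to a timestamp, a checksum of its input, its parameters, and the software environment. The step name and parameters below are illustrative.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def provenance_record(step: str, input_bytes: bytes, params: dict) -> dict:
    """Build a minimal provenance entry for one workflow step."""
    return {
        "step": step,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "parameters": params,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }

# Illustrative step: appending one record per step yields a run log.
record = provenance_record("normalize", b"raw data", {"method": "zscore"})
print(json.dumps(record, indent=2))
```

Writing these records to a JSON log per run gives later users (and your future self) the lineage of every data product.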

R2. Include qualified references to other workflows

What it means: Clear connections to related workflows and dependencies

How to implement:

  • Reference parent workflows or templates
  • Link to sub-workflows with specific versions
  • Document workflow ecosystem relationships

R3. Meet domain-relevant community standards

What it means: Follow best practices specific to your research field

How to implement:

  • Use field-specific workflow languages and tools
  • Follow community guidelines (e.g., Galaxy Tool Shed guidelines)
  • Implement domain-specific quality checks

Workflow Complexity Spectrum

Computational workflows exist on a spectrum of complexity, and the implementation of FAIR principles can vary depending on the scale and sophistication of your workflow.

Simple Workflows (Without WMS)

Simple Workflow Characteristics
  • Scale: Small datasets, few processing steps (2-10 steps)
  • Implementation: Scripts (Bash, Python, R), Jupyter notebooks, simple pipelines
  • Infrastructure: Single machine, minimal computational requirements
  • Examples: Data cleaning scripts, basic analysis pipelines, small-scale data transformations

FAIR Implementation for Simple Workflows:

  • Findability: Focus on clear documentation and version control (Git repositories)
  • Accessibility: Share via GitHub/GitLab with direct download links
  • Interoperability: Use standard file formats (CSV, JSON), document dependencies clearly
  • Reusability: Provide clear README files, example data, and licensing

Practical Steps:

  1. Version Control: Use Git from day one, even for single scripts
  2. Documentation: Write clear README with usage examples
  3. Dependencies: Use requirements.txt (Python) or similar dependency files
  4. Testing: Include sample input/output data
  5. Licensing: Add a simple license file (MIT, Apache 2.0)
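
The steps above can be sketched as a single-file script whose logic lives in a small, testable function, with an argparse interface that doubles as usage documentation. All names (the cleaning rule, the CLI arguments) are illustrative.

```python
import argparse
import csv
import io

def clean_rows(rows):
    """Drop rows with any empty field -- the script's testable core."""
    return [r for r in rows if all(v.strip() for v in r.values())]

def build_parser() -> argparse.ArgumentParser:
    """A documented CLI doubles as usage documentation for the README."""
    parser = argparse.ArgumentParser(description="Remove incomplete rows from a CSV file.")
    parser.add_argument("infile", help="input CSV with a header row")
    parser.add_argument("outfile", help="destination for the cleaned CSV")
    return parser

# Demonstration on a small in-memory sample (step 4: sample data).
sample = "id,value\n1,ok\n2,\n3,fine\n"
rows = list(csv.DictReader(io.StringIO(sample)))
print(clean_rows(rows))  # keeps rows 1 and 3
```

Because the core logic is a plain function, the sample data above can serve directly as the test case you commit alongside the script.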

Complex Workflows (With WMS)

Complex Workflow Characteristics
  • Scale: Large datasets, many processing steps (10+ steps), parallel processing
  • Implementation: Workflow Management Systems (Nextflow, Snakemake, Galaxy, CWL)
  • Infrastructure: Multi-node clusters, cloud computing, HPC environments
  • Examples: Genomics pipelines, climate modeling, machine learning pipelines, multi-omics analysis

FAIR Implementation for Complex Workflows:

  • Findability: Use workflow registries (WorkflowHub, Dockstore), structured metadata
  • Accessibility: Containerization (Docker/Singularity), cloud deployment
  • Interoperability: Standard workflow languages (CWL, WDL), formal metadata schemas
  • Reusability: Comprehensive documentation, provenance tracking, modular design

Practical Steps:

  1. Workflow Language: Choose established WMS (Nextflow, Snakemake, etc.)
  2. Containerization: Package all dependencies in containers
  3. Registry Publication: Submit to WorkflowHub or Dockstore
  4. Metadata Standards: Use Bioschemas, schema.org, or domain-specific ontologies
  5. Provenance Tracking: Implement automatic execution logging
  6. Testing: Include continuous integration, multiple test datasets
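
The testing step can be approximated with a checksum-based regression check, a hedged sketch of what a CI job might run: compare each pipeline output against known-good checksums recorded from a reference run on a small test dataset. File names and contents here are illustrative.

```python
import hashlib

def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def check_outputs(outputs: dict[str, bytes], expected: dict[str, str]) -> list[str]:
    """Compare produced outputs against recorded checksums; return failures."""
    failures = []
    for name, checksum in expected.items():
        if sha256_bytes(outputs.get(name, b"")) != checksum:
            failures.append(name)
    return failures

# In CI, 'outputs' would come from a pipeline run on a test dataset and
# 'expected' from a checked-in file of known-good checksums.
outputs = {"counts.tsv": b"gene\tcount\nTP53\t42\n"}
expected = {"counts.tsv": sha256_bytes(b"gene\tcount\nTP53\t42\n")}
print(check_outputs(outputs, expected))  # an empty list means the run matches
```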

Comparison Table

Below is a table that illustrates common features for simple and complex workflows:

Aspect          | Simple Workflows        | Complex Workflows
Scale           | < 1 GB data, < 10 steps | > 1 GB data, 10+ steps
Time Investment | Hours to days           | Weeks to months
FAIR Complexity | Basic implementation    | Full implementation
Primary Tools   | Git, GitHub, basic docs | WMS, registries, containers
Metadata        | README files, comments  | Structured schemas, ontologies
Testing         | Manual testing          | Automated CI/CD
Deployment      | Local execution         | Multi-platform deployment
Maintenance     | Occasional updates      | Ongoing maintenance

Consider a WMS When:
  • Your workflow has more than 5-10 interconnected steps
  • You need to process multiple datasets with the same pipeline
  • You require parallel processing or cluster computing
  • You collaborate with multiple researchers or institutions
  • You need detailed provenance tracking
  • Your workflow takes more than a few hours to run
  • You plan to publish your methodology

Quick Implementation Checklist

For Simple Workflows (Scripts/Notebooks)

Before You Start

  • Choose appropriate scripting language and libraries
  • Plan your data processing steps
  • Identify input/output file formats

During Development

  • Use version control (Git) from the beginning
  • Write clear comments in your code
  • Use meaningful variable and file names
  • Test with small sample datasets

For Sharing

  • Create clear README with usage instructions
  • Add LICENSE file (recommend MIT or Apache 2.0)
  • Include requirements/dependencies file
  • Provide example input and expected output
  • Upload to GitHub/GitLab public repository
  • Consider Zenodo integration for DOI

For Complex Workflows (WMS-based)

Before You Start

  • Choose appropriate workflow management system
  • Plan your workflow architecture and components
  • Identify required licenses and dependencies
  • Design containerization strategy

During Development

  • Use version control (Git) from the beginning
  • Document as you develop
  • Use standard file formats and naming conventions
  • Test with example data
  • Implement modular, reusable components
  • Set up continuous integration testing

For Publication

  • Create comprehensive README and documentation
  • Add LICENSE file
  • Test installation instructions on clean systems
  • Prepare example datasets and test cases
  • Register in appropriate workflow registry
  • Obtain DOI through registry or Zenodo
  • Create structured metadata (Bioschemas, schema.org)

After Publication

  • Monitor for issues and provide support
  • Update documentation as needed
  • Create new versions with distinct identifiers
  • Maintain long-term accessibility

Practical Tools and Resources

Workflow Registries

  • WorkflowHub (https://workflowhub.eu/): Multi-domain workflow registry
  • Dockstore (https://dockstore.org/): Container and workflow sharing platform
  • Galaxy ToolShed: For Galaxy workflows and tools

Repository Services

  • Zenodo (https://zenodo.org/): General-purpose research repository with DOIs
  • GitHub/GitLab: Version control with release management
  • Institutional repositories: KiltHub, CMU's institutional repository, is an instance of Figshare; both follow FAIR principles.

Metadata Standards

  • schema.org: General structured metadata
  • Bioschemas: Life sciences extension of schema.org
  • CodeMeta: Software metadata standard
  • CWL: Common Workflow Language for portable workflows

Containerization

  • Docker Hub: Container registry
  • Singularity Hub (now archived): legacy scientific container registry; Singularity/Apptainer images are now typically hosted on OCI registries or the Sylabs Cloud Library
  • GitHub Container Registry: Integrated with GitHub

Common Challenges and Solutions

Challenge: Complex Dependencies

Solution: Use containerization (Docker, Singularity) to package dependencies

Challenge: Large Data Files

Solution: Use data repositories (Zenodo, domain-specific archives) and reference by DOI

Challenge: Platform-Specific Code

Solution: Use workflow management systems that abstract execution environment

Challenge: Evolving Software Dependencies

Solution: Pin specific versions and use containers for reproducibility
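
One lightweight way to act on this, sketched below: compare the versions actually installed in an environment against your pins using the standard-library `importlib.metadata`. The pinned package name is illustrative.

```python
from importlib import metadata

# Pins as they might appear in a requirements.txt with '=='; the package
# name and version here are illustrative.
pins = {"numpy": "1.26.4"}

def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Report packages whose installed version differs from the pin."""
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        if installed != wanted:
            mismatches[package] = installed
    return mismatches

print(check_pins(pins))  # empty dict when the environment matches the pins
```

Running a check like this at workflow start (or in CI) catches environment drift before it silently changes results; containers then freeze the verified environment.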

Challenge: Limited Documentation Time

Solution: Start with minimal documentation and improve iteratively

Examples of FAIR Workflow Implementation

Simple Workflow Example: Data Processing Script

Example: Gene Expression Analysis Script
  • Type: Python script for differential expression analysis
  • Scale: Single-file script, processes < 100MB data
  • Repository: GitHub repository with direct download
  • FAIR Implementation:
    • F1: GitHub release with version tags (v1.0.0, v1.1.0)
    • F2: Detailed README with purpose, requirements, usage
    • A1: Direct download via HTTPS from GitHub
    • I1: Standard CSV input/output formats
    • R1: MIT license, clear documentation, example datasets
    • Example (illustrative URL): https://github.com/researcher/gene-expression-analysis

Key FAIR Features for Simple Workflows:

  • Clear repository structure with README, LICENSE, requirements.txt
  • Example data files and expected outputs
  • Simple installation instructions
  • Version tags for releases
  • DOI through Zenodo-GitHub integration

Complex Workflow Example: Protein MD Setup Workflow

  • Repository: WorkflowHub (DOI: 10.48546/workflowhub.workflow.29.3)
  • Language: Common Workflow Language (CWL)
  • License: Apache License 2.0
  • Components: BioBB building blocks with individual identifiers
  • Metadata: Structured using RO-Crate format
  • Scale: Multi-step molecular dynamics simulation pipeline
  • Infrastructure: Requires HPC or cloud computing resources

Complex Workflow Example: Digital Pathology Workflow

  • Repository: Zenodo with workflow run provenance
  • Language: CWL with CWLProv provenance tracking
  • Format: RO-Crate with Provenance Run Crate profile
  • Standards: OpenSlide for digital pathology images
  • License: MIT License for workflow and components
  • Scale: Processes whole slide images (GBs of data)
  • Infrastructure: GPU-accelerated computing for deep learning

Comparison of Implementation Approaches

FAIR Aspect        | Simple Workflow Approach             | Complex Workflow Approach
F1 - Identifiers   | GitHub releases, optional Zenodo DOI | Workflow registry DOI, component DOIs
F2 - Metadata      | README files, inline comments        | Structured metadata, ontologies
A1 - Access        | Direct GitHub download               | Container images, registry access
I1 - Standards     | Standard file formats                | Workflow languages (CWL, WDL)
R1 - Documentation | README + examples                    | Comprehensive docs + tutorials
R1.3 - Provenance  | Git history, manual logs             | Automated provenance tracking

Getting Started Today

For Simple Workflows (Start Here)

  1. Pick one script or notebook you use regularly for analysis
  2. Create a Git repository and upload your code
  3. Write a basic README explaining what the script does and how to run it
  4. Add a LICENSE file (MIT is simple and permissive)
  5. Create a requirements file listing dependencies
  6. Test your instructions by asking a colleague to run your code

Quick Start for Simple Workflows

Time investment: 2-4 hours

Immediate benefits: Easier sharing, version control, basic reproducibility

Next steps: Add example data, create GitHub release, consider Zenodo DOI

For Complex Workflows (Advanced)

  1. Choose one complex analysis pipeline to make FAIR as a pilot project
  2. Start simple: Focus on F1 (getting a DOI) and R1.1 (adding a license)
  3. Use existing tools: Don't reinvent the wheel - use established registries
  4. Document iteratively: Improve documentation over time
  5. Engage with community: Join relevant working groups and forums

Migration Path

Start with simple workflow practices, then gradually adopt WMS features:

  1. Basic FAIR (Git + README + License): 1-2 hours
  2. Enhanced documentation (Examples + tests): 4-8 hours
  3. Registry publication (DOI + metadata): 8-16 hours
  4. Full WMS implementation (Containers + provenance): 40+ hours

Progressive Implementation Strategy

Phase 1: Foundation (All Workflows)

  • Set up version control
  • Write basic documentation
  • Add licensing information
  • Include example data

Phase 2: Enhancement (Growing Complexity)

  • Obtain persistent identifiers (DOIs)
  • Improve metadata quality
  • Add containerization for dependencies
  • Implement basic testing

Phase 3: Full FAIR (Complex Workflows)

  • Use workflow management systems
  • Publish in specialized registries
  • Implement provenance tracking
  • Follow domain-specific standards

Benefits of FAIR Workflows

  • Increased citations and research impact
  • Easier collaboration with other researchers
  • Reduced duplication of effort
  • Better reproducibility of research results
  • Compliance with funder and publisher requirements
  • Future-proofing your research outputs

Remember

FAIR is a journey, not a destination. Start with what you can implement today and improve over time!

References

  1. Wilkinson, S.R., Aloqalaa, M., Belhajjame, K., et al. (2025). Applying the FAIR Principles to computational workflows. Scientific Data, 12:328. https://doi.org/10.1038/s41597-025-04451-9
  2. Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3:160018. https://doi.org/10.1038/sdata.2016.18
  3. Barker, M., Chue Hong, N.P., Katz, D.S., et al. (2022). Introducing the FAIR principles for research software. Scientific Data, 9:622. https://doi.org/10.1038/s41597-022-01710-x