The FAIR Principles: A Comprehensive Guide

Summary
Introduction
Core Principles
- Overview
- Detailed Breakdown
Key Terms and Definitions
Significance of Machine-Actionability
Implementation and Examples
Benefits of FAIR Data
Complementary Initiatives
Challenges and Considerations
Resources for Implementation
Practical Checklist to go FAIR
References

Summary

The FAIR Guiding Principles represent a concise and measurable set of guidelines designed to improve the Findability, Accessibility, Interoperability, and Reusability of digital research objects. Developed by a diverse group of stakeholders from academia, industry, funding agencies, and publishers, these principles aim to enhance the infrastructure supporting scholarly data management and stewardship.

Introduction

In today's data-intensive research environment, there is an urgent need to improve how scholarly data is managed and shared. Good data management is not an end in itself but a fundamental component leading to knowledge discovery, innovation, and data integration. The FAIR Principles provide guidelines for those wishing to enhance the reusability of their data holdings.

Unlike initiatives that focus primarily on human scholars, the FAIR Principles emphasize the ability of machines to automatically find and use data, in addition to supporting its reuse by individuals. This machine-actionability is becoming increasingly important as the scale of data and the complexity of research questions continue to grow.

Core Principles

Overview of the FAIR Guiding Principles

Principle	Description
Findable	Data and metadata should be easy to find for both humans and computers
Accessible	Once found, data should be retrievable through standardized protocols
Interoperable	Data should integrate readily with other data and work with applications for analysis
Reusable	Data should be well-described so it can be replicated and/or combined in different settings

Detailed Breakdown

To be Findable:

F1. (Meta)data are assigned a globally unique and persistent identifier
F2. Data are described with rich metadata (defined by R1 below)
F3. Metadata clearly and explicitly include the identifier of the data they describe
F4. (Meta)data are registered or indexed in a searchable resource
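
As a minimal sketch of F1 and F3, assume a metadata record held as a plain Python dictionary. The field names (`identifier`, `title`, `keywords`) are illustrative and not mandated by the principles, and the regular expression is a common heuristic for DOI syntax rather than an official validator.

```python
import re

# Heuristic pattern for DOIs: "10.", a 4-9 digit registrant code, "/", a suffix.
# Assumption for illustration only; a match does not prove the DOI is registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def has_plausible_doi(metadata: dict) -> bool:
    """Return True if the record's 'identifier' field looks like a DOI."""
    identifier = metadata.get("identifier", "")
    return bool(DOI_PATTERN.match(identifier))


# Hypothetical record; the DOI is that of the FAIR paper listed in the References.
record = {
    "identifier": "10.1038/sdata.2016.18",
    "title": "The FAIR Guiding Principles for scientific data management and stewardship",
    "keywords": ["FAIR", "data stewardship"],
}

print(has_plausible_doi(record))
print(has_plausible_doi({"title": "record without an identifier"}))
```

A check like this only covers syntax. Whether the identifier is persistent and indexed (F1, F4) depends on the repository that issues it.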

To be Accessible:

A1. (Meta)data are retrievable by their identifier using a standardized communications protocol
- A1.1 The protocol is open, free, and universally implementable
- A1.2 The protocol allows for an authentication and authorization procedure, where necessary
A2. Metadata are accessible, even when the data are no longer available
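
One way to picture A1 is retrieval of metadata over HTTPS from a DOI resolver. The sketch below only builds the request and does not send it. It assumes the DOI's registration agency supports content negotiation for the `application/vnd.citationstyles.csl+json` media type, which is common but not guaranteed for every DOI. The function name `build_metadata_request` is hypothetical.

```python
import urllib.request


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build an HTTPS request asking the DOI resolver for JSON metadata."""
    return urllib.request.Request(
        url=f"https://doi.org/{doi}",
        # Ask for machine-readable citation metadata instead of a landing page.
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        method="GET",
    )


request = build_metadata_request("10.1038/sdata.2016.18")
print(request.full_url)
print(request.get_header("Accept"))

# Sending the request needs network access, so it is left commented out:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

HTTPS is open, free, and universally implementable (A1.1), and it can carry authentication headers where access must be restricted (A1.2).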

To be Interoperable:

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data
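
A qualified reference (I3) states how two resources are related, not just that they are linked. The sketch below expresses a dataset description as JSON-LD using the schema.org vocabulary, one possible choice among many. The `example.org` identifiers and the dataset name are hypothetical, and the helper `qualified_references` is an illustration, not part of any standard.

```python
import json

# Hypothetical dataset description using schema.org terms.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/datasets/42",
    "name": "Example survey results",
    # Each link names its relationship: derived from one resource, citing another.
    "isBasedOn": {"@id": "https://example.org/datasets/41"},
    "citation": {"@id": "https://doi.org/10.1038/sdata.2016.18"},
}


def qualified_references(record: dict) -> list:
    """Return (relation, target) pairs for every linked resource in the record."""
    return [
        (key, value["@id"])
        for key, value in record.items()
        if isinstance(value, dict) and "@id" in value
    ]


print(json.dumps(dataset, indent=2))
print(qualified_references(dataset))
```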

To be Reusable:

R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
- R1.1. (Meta)data are released with a clear and accessible data usage license
- R1.2. (Meta)data are associated with detailed provenance
- R1.3. (Meta)data meet domain-relevant community standards
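
The R1 sub-principles can be read as fields that a metadata record should not leave empty. The following sketch assumes a hypothetical mapping from sub-principle to field name (`license`, `provenance`, `conforms_to`); real metadata schemas use their own names for these attributes.

```python
# Hypothetical mapping from R1 sub-principles to metadata field names.
REQUIRED_FIELDS = {
    "R1.1": "license",
    "R1.2": "provenance",
    "R1.3": "conforms_to",
}


def missing_reuse_fields(metadata: dict) -> list:
    """List the R1 sub-principles whose field is absent or empty."""
    return [
        principle
        for principle, field in REQUIRED_FIELDS.items()
        if not metadata.get(field)
    ]


# Illustrative record with a license and provenance but no community standard.
record = {
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "provenance": "Collected by online survey; cleaned with an in-house script",
}

print(missing_reuse_fields(record))
```

A presence check is only a first pass. Whether the provenance is detailed enough, or the standard is the right one for the domain, still needs human judgment.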

Key Terms and Definitions

Term	Definition
Data Stewardship	The long-term care of valuable digital assets, with the goal that they should be discovered and re-used for downstream investigations
Machine-actionable	The capability of computational systems to find, access, interoperate, and reuse data with minimal or no human intervention
Metadata	Data that provides information about other data; descriptive information about a digital object
DOI	Digital Object Identifier; a code used to permanently and stably identify (usually digital) objects
Interoperability	The ability of data or tools from non-cooperating resources to integrate or work together with minimal effort
Provenance	Information about entities, activities, and people involved in producing a piece of data, which can be used to form assessments about its quality, reliability, or trustworthiness

Significance of Machine-Actionability

A distinguishing feature of the FAIR Principles is their emphasis on enhancing the ability of machines to automatically find and use data. This is increasingly important because:

Scale: Humans cannot operate at the scope, scale, and speed necessitated by the volume of contemporary scientific data
Integration: Complex research questions often require integration of diverse data types from multiple sources
Automation: Computational agents need to be able to autonomously and appropriately act when faced with diverse types, formats, and access protocols

For data to be machine-actionable, it should enable a computational agent to:

Identify the type of object (structure and intent)
Determine if it is useful within the current task
Determine if it is usable (license, consent, accessibility)
Take appropriate action automatically
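
The four steps above can be sketched as a small decision function. Everything here is a simplifying assumption: the `DigitalObject` fields, the exact-match test for relevance, and the string results are illustrative, and a real agent would rely on much richer metadata.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    media_type: str  # what kind of object this is
    topic: str       # what the object is about
    license: str     # usage terms; empty string if unknown


def decide(obj: DigitalObject, wanted_topic: str,
           readable_types: set, allowed_licenses: set) -> str:
    """Walk through the four steps and return the action the agent would take."""
    # Step 1: identify the type of object.
    if obj.media_type not in readable_types:
        return "skip: unrecognized type"
    # Step 2: determine whether it is useful for the current task.
    if obj.topic != wanted_topic:
        return "skip: not relevant"
    # Step 3: determine whether it is usable under its license.
    if obj.license not in allowed_licenses:
        return "skip: license not permitted"
    # Step 4: take the appropriate action.
    return "retrieve"


candidate = DigitalObject("text/csv", "air temperature", "CC-BY-4.0")
print(decide(candidate, "air temperature", {"text/csv"}, {"CC-BY-4.0", "CC0-1.0"}))
```

Note that an object with missing license information fails step 3, which is why R1.1 matters for automated reuse.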

Implementation and Examples

The FAIR Principles are designed to be domain-independent and applicable to a wide range of scholarly outputs. They define characteristics that contemporary data resources, tools, vocabularies, and infrastructures should exhibit to assist discovery and reuse by third parties.

Example Implementations

Several repositories and platforms have implemented FAIR principles to varying degrees:

FigShare

Did you know that KiltHub, our institutional repository, is an instance of FigShare? Like every FigShare instance, it:

Assigns persistent identifiers (DOIs or Handles) to all research outputs (F)
Provides access to metadata through standard protocols (HTTPS, REST API, OAI-PMH) (A)
Supports multiple citation metadata formats and controlled vocabularies (I)
Requires clear licensing and maintains detailed versioning for all items (R)

Dataverse

Generates formal citations with DOIs for datasets (F)
Provides access to metadata, data files, licenses, and version information (F, A, R)
Offers metadata at three levels, supporting interoperability (I, R)
Provides machine-accessible interfaces to search data (A)

FAIRDOM

Identifies research assets with unique and persistent HTTP URLs (F)
Provides web access to assets in various formats (RDF, XML) (I)
Annotates with rich metadata using community standards (I)
Stores metadata as RDF to enable interoperability (R)

ISA

Provides structured metadata for data papers and repositories
Offers RDF-based and JSON serializations to enable interoperability (I, R)
Becomes fully FAIR when published as linked data

Open PHACTS

Provides machine-accessible interface with multiple representations (A)
Allows multiple URLs to access information about a particular entity (F, A)
Describes data sources using standardized dataset descriptions (R, I)
Uses community-agreed ontologies (I)

wwPDB

Hosts data on stable FTP servers (A)
Represents data in machine-readable formats (F, I)
Contains cross-references to common identifiers (R)
Represents entries by DOIs (F, A)

UniProt

Identifies entries by stable URLs (F, A)
Provides rich metadata in both human and machine-readable formats (F)
Uses shared vocabularies and ontologies in RDF format (I)
Includes extensive links to other databases (R)

Benefits of FAIR Data

Implementing the FAIR Principles provides numerous benefits:

Enhanced Discoverability: Making data findable increases its visibility within the research community
Improved Reproducibility: Well-described, accessible data facilitates validation of research findings
Increased Efficiency: Reduced time spent searching for and processing data for reuse
Greater Impact: Data that is easy to find and reuse can attract more citations and recognition
Future-Proofing: Applying FAIR principles helps ensure data remains valuable over time
Facilitated Collaboration: Interoperable data enables collaboration across disciplines and institutions
Accelerated Innovation: Machine-actionable data enables new types of data-driven discovery
Compliance: Many funding agencies and publishers are beginning to require FAIR data practices

Complementary Initiatives

The FAIR Principles complement and build upon other initiatives:

Joint Declaration of Data Citation Principles (JDDCP): Focuses on making data citable, discoverable, and available for reuse
Data Seal of Approval (DSA): Focuses on the responsibilities and conduct of data producers and repositories
FORCE11: A community working on the future of research communications and e-scholarship

Challenges and Considerations

Implementing FAIR principles may involve addressing several challenges:

Technical Infrastructure: Developing systems that support FAIR data practices
Standards Development: Creating or adopting community standards for metadata and data formats
Training and Skills: Building capacity for researchers to implement FAIR principles
Incentives: Aligning academic rewards with good data stewardship practices
Sensitive Data: Balancing openness with privacy and security concerns
Legacy Data: Retrofitting existing data and systems to be more FAIR
Sustainability: Ensuring long-term preservation and maintenance of FAIR data

Practical Checklist to go FAIR

Planning Stage

☐ Create a data management plan that incorporates FAIR principles
☐ Identify relevant data standards and repositories in your field
☐ Plan for long-term storage and accessibility of your data
☐ Determine appropriate licenses for your data and documentation
☐ Allocate resources (time, storage, tools) for FAIR implementation

✓ Findable

☐ Select a repository that provides persistent identifiers (e.g., DOI, Handle)
☐ Create comprehensive metadata describing your dataset
☐ Ensure your metadata includes the dataset's identifier
☐ Use descriptive titles and keywords to improve discoverability
☐ Register your dataset in appropriate disciplinary indexes or search engines

✓ Accessible

☐ Store data in a repository with a stable interface/API
☐ Ensure access protocols are open, free, and widely implemented
☐ If data has access restrictions, clearly document how to request access
☐ Implement appropriate authentication mechanisms if needed
☐ Ensure metadata remains accessible even if the data is restricted

✓ Interoperable

☐ Use open, standard file formats (e.g., CSV, XML, or JSON rather than Excel workbooks)
☐ Apply community standards for metadata in your field
☐ Utilize controlled vocabularies, ontologies, and thesauri where available
☐ Include qualified references to other datasets when relationships exist
☐ Structure data to be easily integrated with other datasets
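
As one illustration of the file-format item above, tabular data held in memory can be written as CSV with Python's standard library. The column names and values are hypothetical; putting the unit in the column name (`temperature_celsius`) is one convention among several.

```python
import csv
import io

# Hypothetical measurements; each dictionary is one row.
rows = [
    {"site_id": "A1", "temperature_celsius": 12.5},
    {"site_id": "B2", "temperature_celsius": 14.0},
]


def to_csv(records: list, fieldnames: list) -> str:
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows, ["site_id", "temperature_celsius"])
print(csv_text)
```

To write to disk instead, open the file with `newline=""` and pass the file object to `csv.DictWriter`, as the `csv` module documentation recommends.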

✓ Reusable

☐ Apply a clear and accessible license to your data (e.g., Creative Commons)
☐ Document the provenance of your data (how it was collected/generated)
☐ Include detailed methodology in your documentation
☐ Provide information about data quality and any limitations
☐ Follow community standards for data and metadata

Documentation Best Practices

☐ Create a comprehensive README file explaining your data
☐ Document variable names, units, and allowed values
☐ Explain file naming conventions and organization
☐ Note software and version information used for data processing
☐ Include contact information for future inquiries
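
A data dictionary for the README can be generated from a small structure kept next to the data. This is a sketch under stated assumptions: the variables, units, and allowed values are invented for illustration, and the tab-separated layout is just one readable option.

```python
# Hypothetical variable descriptions for a small dataset.
variables = [
    {"name": "site_id", "unit": "", "allowed": "A1, B2",
     "description": "Sampling site code"},
    {"name": "temperature_celsius", "unit": "degrees Celsius", "allowed": "-50 to 60",
     "description": "Air temperature"},
]


def render_data_dictionary(entries: list) -> str:
    """Render variable descriptions as tab-separated text for a README."""
    lines = ["Variable\tUnit\tAllowed values\tDescription"]
    for entry in entries:
        lines.append("\t".join([
            entry["name"],
            entry["unit"] or "n/a",  # show "n/a" for unitless variables
            entry["allowed"],
            entry["description"],
        ]))
    return "\n".join(lines)


print(render_data_dictionary(variables))
```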

Publication and Citation

☐ Cite datasets properly in your publications
☐ Link your paper to your datasets and vice versa
☐ Publish your data before or simultaneously with related papers
☐ Consider publishing a data paper describing your dataset
☐ Track usage of your dataset through citation metrics

Maintenance

☐ Plan for updates or corrections to your dataset
☐ Establish version control for your data
☐ Set reminders to check persistent links are working
☐ Review and update documentation as needed
☐ Monitor for changes in community standards or best practices
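
Checking that persistent links still resolve can be partly automated. The sketch below is one possible approach using only the standard library. It assumes that a final status between 200 and 399 means the link works, and that the server answers HEAD requests, which some servers refuse. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def http_status(url: str) -> int:
    """Return the HTTP status for a HEAD request, or 0 if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # no usable answer (DNS failure, timeout, refused connection)


def broken_links(urls, fetch_status=http_status) -> list:
    """Return the URLs whose status falls outside the 200-399 range."""
    return [url for url in urls if not 200 <= fetch_status(url) < 400]


# Offline demonstration with made-up statuses instead of real network calls.
fake_statuses = {
    "https://doi.org/10.1038/sdata.2016.18": 200,
    "https://example.org/missing": 404,
}
print(broken_links(fake_statuses, fetch_status=fake_statuses.get))
```

Passing the status function as a parameter keeps the check testable without network access; calling `broken_links(urls)` with the default performs real requests.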

Use this checklist throughout your research project to ensure your data follows FAIR principles, increasing its value to both you and the broader research community.

References

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
