The FAIR Principles: A Comprehensive Guide
Summary
The FAIR Guiding Principles represent a concise and measurable set of guidelines designed to improve the Findability, Accessibility, Interoperability, and Reusability of digital research objects. Developed by a diverse group of stakeholders from academia, industry, funding agencies, and publishers, these principles aim to enhance the infrastructure supporting scholarly data management and stewardship.
Introduction
In today's data-intensive research environment, there is an urgent need to improve how scholarly data is managed and shared. Good data management is not an end in itself but a fundamental component leading to knowledge discovery, innovation, and data integration. The FAIR Principles provide guidelines for those wishing to enhance the reusability of their data holdings.
Unlike initiatives that focus primarily on human scholars, the FAIR Principles emphasize the ability of machines to automatically find and use data, in addition to supporting its reuse by individuals. This machine-actionability is becoming increasingly important as the scale of data and the complexity of research questions continue to grow.
Core Principles
Overview of the FAIR Guiding Principles
Principle | Description
Findable | Data and metadata should be easy to find for both humans and computers
Accessible | Once found, data should be retrievable through standardized protocols
Interoperable | Data should be able to be integrated with other data and work with applications for analysis
Reusable | Data should be well-described so it can be replicated and/or combined in different settings
Detailed Breakdown
To be Findable:
- F1. (Meta)data are assigned a globally unique and persistent identifier
- F2. Data are described with rich metadata (defined by R1 below)
- F3. Metadata clearly and explicitly include the identifier of the data it describes
- F4. (Meta)data are registered or indexed in a searchable resource
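As a rough illustration of F1 to F4, the sketch below checks a metadata record for the four Findable properties. The field names (`identifier`, `data_identifier`, `indexed_in`), the example DOIs, and the "minimum" set of descriptive fields are all assumptions made for this example; they are not defined by the FAIR Principles, and the DOI pattern is a deliberate simplification.

```python
import re

# Loose DOI shape, e.g. 10.1038/sdata.2016.18 (a simplification for illustration).
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def findability_gaps(record):
    """List the Findable sub-principles a metadata record appears to miss."""
    gaps = []
    # F1: the record carries a persistent identifier (here: something DOI-shaped).
    if not DOI_SHAPE.match(record.get("identifier", "")):
        gaps.append("F1")
    # F2: rich metadata; these three fields are a hypothetical minimum.
    if not all(record.get(field) for field in ("title", "description", "keywords")):
        gaps.append("F2")
    # F3: the metadata names the identifier of the data it describes.
    if not record.get("data_identifier"):
        gaps.append("F3")
    # F4: the record is registered in at least one searchable index.
    if not record.get("indexed_in"):
        gaps.append("F4")
    return gaps


record = {
    "identifier": "10.1234/example.meta",       # hypothetical DOI
    "data_identifier": "10.1234/example.data",  # hypothetical DOI
    "title": "Example soil measurements",
    "description": "Hourly soil moisture readings from a test plot.",
    "keywords": ["soil", "moisture"],
    "indexed_in": [],                           # not yet registered anywhere
}
print(findability_gaps(record))  # → ['F4']
```

A check like this only tests for the presence of fields; whether the metadata are genuinely "rich" still needs human or community judgment.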
To be Accessible:
- A1. (Meta)data are retrievable by their identifier using a standardized communications protocol
- A1.1. The protocol is open, free, and universally implementable
- A1.2. The protocol allows for an authentication and authorization procedure, where necessary
- A2. Metadata are accessible, even when the data are no longer available
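One way to picture A1 is retrieving metadata by identifier over HTTPS. The sketch below builds, but does not send, a request that asks the DOI resolver for machine-readable citation metadata through content negotiation. It assumes the DOI's registration agency supports the CSL JSON media type; the `token` parameter and the Bearer scheme are hypothetical stand-ins for whatever authentication a given repository might use (A1.2).

```python
from urllib.request import Request

# Media type for citation metadata in CSL JSON form.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi, token=None):
    """Build (but do not send) an HTTPS request for a DOI's metadata."""
    headers = {"Accept": CSL_JSON}
    if token is not None:
        # Hypothetical: only needed where access is restricted (A1.2).
        headers["Authorization"] = f"Bearer {token}"
    return Request(f"https://doi.org/{doi}", headers=headers)


request = metadata_request("10.1038/sdata.2016.18")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual retrieval; that step is left out here so the example runs without network access.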
To be Interoperable:
- I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
- I2. (Meta)data use vocabularies that follow FAIR principles
- I3. (Meta)data include qualified references to other (meta)data
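To make I1 to I3 more concrete, the following sketch describes a dataset as JSON-LD using the schema.org vocabulary and then lists its qualified references, meaning links whose relationship type is stated explicitly. Choosing schema.org and JSON-LD is an assumption for illustration (many communities use other vocabularies), and the two `10.1234/...` identifiers are hypothetical.

```python
import json

dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://doi.org/10.1234/example.data",  # hypothetical
    "name": "Example soil measurements",
    # Qualified references: the key states how this dataset relates to the target.
    "isBasedOn": {"@id": "https://doi.org/10.1234/raw.data"},  # hypothetical
    "citation": {"@id": "https://doi.org/10.1038/sdata.2016.18"},
}


def qualified_references(doc):
    """Return (relationship, target) pairs for every typed link in the record."""
    return [
        (key, value["@id"])
        for key, value in doc.items()
        if isinstance(value, dict) and "@id" in value
    ]


print(json.dumps(dataset, indent=2))
for relation, target in qualified_references(dataset):
    print(relation, "->", target)
```

The point of the typed keys is that a machine can tell "this dataset is derived from that one" apart from "this dataset cites that paper", rather than seeing two undifferentiated links.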
To be Reusable:
- R1. (Meta)data are richly described with a plurality of accurate and relevant attributes
- R1.1. (Meta)data are released with a clear and accessible data usage license
- R1.2. (Meta)data are associated with detailed provenance
- R1.3. (Meta)data meet domain-relevant community standards
Key Terms and Definitions
Term | Definition
Data Stewardship | The long-term care of valuable digital assets, with the goal that they should be discovered and re-used for downstream investigations
Machine-actionable | The capability of computational systems to find, access, interoperate, and reuse data with minimal or no human intervention
Metadata | Data that provides information about other data; descriptive information about a digital object
DOI | Digital Object Identifier; a code used to permanently and stably identify (usually digital) objects
Interoperability | The ability of data or tools from non-cooperating resources to integrate or work together with minimal effort
Provenance | Information about entities, activities, and people involved in producing a piece of data, which can be used to form assessments about its quality, reliability, or trustworthiness
Significance of Machine-Actionability
A distinguishing feature of the FAIR Principles is their emphasis on enhancing the ability of machines to automatically find and use data. This is increasingly important because:
- Scale: Humans cannot operate at the scope, scale, and speed necessitated by the volume of contemporary scientific data
- Integration: Complex research questions often require integration of diverse data types from multiple sources
- Automation: Computational agents need to be able to autonomously and appropriately act when faced with diverse types, formats, and access protocols
For data to be machine-actionable, it should enable a computational agent to:
- Identify the type of object (structure and intent)
- Determine if it is useful within the current task
- Determine if it is usable (license, consent, accessibility)
- Take appropriate action automatically
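The four steps above can be sketched as a small decision function. This is only a toy model of what a computational agent might do: the metadata keys (`type`, `license`, `access`), the returned action strings, and the example values are hypothetical and chosen for this illustration.

```python
def decide(metadata, wanted_type, allowed_licenses):
    """Walk the four steps and return the action an agent might take."""
    # 1. Identify the type of object.
    object_type = metadata.get("type")
    if object_type is None:
        return "skip: type unknown"
    # 2. Determine whether it is useful for the current task.
    if object_type != wanted_type:
        return "skip: not relevant"
    # 3. Determine whether it is usable (license, accessibility).
    if metadata.get("license") not in allowed_licenses:
        return "skip: license not acceptable"
    if metadata.get("access") != "open":
        return "request access"
    # 4. Take the appropriate action automatically.
    return "download"


candidate = {"type": "Dataset", "license": "CC-BY-4.0", "access": "open"}
print(decide(candidate, "Dataset", {"CC0-1.0", "CC-BY-4.0"}))  # → download
```

Each early return corresponds to a point where missing or unclear metadata would otherwise force a human to step in, which is why the principles stress explicit types, licenses, and access conditions.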
Implementation and Examples
The FAIR Principles are designed to be domain-independent and applicable to a wide range of scholarly outputs. They define characteristics that contemporary data resources, tools, vocabularies, and infrastructures should exhibit to assist discovery and reuse by third parties.
Example Implementations
Several repositories and platforms have implemented FAIR principles to varying degrees:
FigShare
Did you know that KiltHub, our institutional repository, is an instance of FigShare? Each of them:
- Assigns persistent identifiers (DOIs or Handles) to all research outputs (F)
- Provides access to metadata through standard protocols (https, REST API, OAI-PMH) (A)
- Supports multiple citation metadata formats and controlled vocabularies (I)
- Requires clear licensing and maintains detailed versioning for all items (R)
Dataverse
- Generates formal citations with DOIs for datasets (F)
- Provides access to metadata, data files, licenses, and version information (F, A, R)
- Offers metadata at three levels, supporting interoperability (I, R)
- Provides machine-accessible interfaces to search data (A)
FAIRDOM
- Identifies research assets with unique and persistent HTTP URLs (F)
- Provides web access to assets in various formats (RDF, XML) (I)
- Annotates with rich metadata using community standards (I)
- Stores metadata as RDF to enable interoperability (R)
ISA
- Provides structured metadata for data papers and repositories
- Offers RDF-based and JSON serializations to enable interoperability (I, R)
- Becomes fully FAIR when published as linked data
Open PHACTS
- Provides machine-accessible interface with multiple representations (A)
- Allows multiple URLs to access information about a particular entity (F, A)
- Describes data sources using standardized dataset descriptions (R, I)
- Uses community-agreed ontologies (I)
wwPDB
- Hosts data on stable FTP servers (A)
- Represents data in machine-readable formats (F, I)
- Contains cross-references to common identifiers (R)
- Represents entries by DOIs (F, A)
UniProt
- Identifies entries by stable URLs (F, A)
- Provides rich metadata in both human and machine-readable formats (F)
- Uses shared vocabularies and ontologies in RDF format (I)
- Includes extensive links to other databases (R)
Benefits of FAIR Data
Implementing the FAIR Principles provides numerous benefits:
- Enhanced Discoverability: Making data findable increases its visibility within the research community
- Improved Reproducibility: Well-described, accessible data facilitates validation of research findings
- Increased Efficiency: Reduced time spent searching for and processing data for reuse
- Greater Impact: Data that is easy to find and reuse can attract more citations and recognition
- Future-Proofing: Applying FAIR principles helps ensure data remains valuable over time
- Facilitated Collaboration: Interoperable data enables collaboration across disciplines and institutions
- Accelerated Innovation: Machine-actionable data enables new types of data-driven discovery
- Compliance: Many funding agencies and publishers are beginning to require FAIR data practices
Complementary Initiatives
The FAIR Principles complement and build upon other initiatives:
- Joint Declaration of Data Citation Principles (JDDCP): Focuses on making data citable, discoverable, and available for reuse
- Data Seal of Approval (DSA): Focuses on the responsibilities and conduct of data producers and repositories
- Force11: A community working on the future of research communications and e-scholarship
Challenges and Considerations
Implementing FAIR principles may involve addressing several challenges:
- Technical Infrastructure: Developing systems that support FAIR data practices
- Standards Development: Creating or adopting community standards for metadata and data formats
- Training and Skills: Building capacity for researchers to implement FAIR principles
- Incentives: Aligning academic rewards with good data stewardship practices
- Sensitive Data: Balancing openness with privacy and security concerns
- Legacy Data: Retrofitting existing data and systems to be more FAIR
- Sustainability: Ensuring long-term preservation and maintenance of FAIR data
Practical Checklist to go FAIR
Planning Stage
- ☐ Create a data management plan that incorporates FAIR principles
- ☐ Identify relevant data standards and repositories in your field
- ☐ Plan for long-term storage and accessibility of your data
- ☐ Determine appropriate licenses for your data and documentation
- ☐ Allocate resources (time, storage, tools) for FAIR implementation
✓ Findable
- ☐ Select a repository that provides persistent identifiers (e.g., DOI, Handle)
- ☐ Create comprehensive metadata describing your dataset
- ☐ Ensure your metadata includes the dataset's identifier
- ☐ Use descriptive titles and keywords to improve discoverability
- ☐ Register your dataset in appropriate disciplinary indexes or search engines
✓ Accessible
- ☐ Store data in a repository with a stable interface/API
- ☐ Ensure access protocols are open, free, and widely implemented
- ☐ If data has access restrictions, clearly document how to request access
- ☐ Implement appropriate authentication mechanisms if needed
- ☐ Ensure metadata remains accessible even if the data is restricted
✓ Interoperable
- ☐ Use open, standard file formats (e.g., CSV, XML, or JSON rather than Excel)
- ☐ Apply community standards for metadata in your field
- ☐ Utilize controlled vocabularies, ontologies, and thesauri where available
- ☐ Include qualified references to other datasets when relationships exist
- ☐ Structure data to be easily integrated with other datasets
✓ Reusable
- ☐ Apply a clear and accessible license to your data (e.g., Creative Commons)
- ☐ Document the provenance of your data (how it was collected/generated)
- ☐ Include detailed methodology in your documentation
- ☐ Provide information about data quality and any limitations
- ☐ Follow community standards for data and metadata
Documentation Best Practices
- ☐ Create a comprehensive README file explaining your data
- ☐ Document variable names, units, and allowed values
- ☐ Explain file naming conventions and organization
- ☐ Note software and version information used for data processing
- ☐ Include contact information for future inquiries
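The documentation items above can be captured in a README skeleton generated from a few fields. The sketch below is one possible layout, not a standard: the function name, the section order, and all example values (title, contact address, variables) are hypothetical.

```python
def build_readme(title, contact, software, variables):
    """Assemble a plain-text README from a few documentation fields."""
    lines = [
        title,
        "",
        f"Contact: {contact}",
        f"Processed with: {software}",
        "",
        "Variables (name [unit]: allowed values)",
    ]
    # One line per variable: its name, unit, and allowed values.
    for name, (unit, allowed) in variables.items():
        lines.append(f"- {name} [{unit}]: {allowed}")
    return "\n".join(lines)


readme = build_readme(
    title="Example soil measurements",
    contact="data-steward@example.org",
    software="Python 3.11",
    variables={
        "moisture": ("percent", "0 to 100"),
        "depth": ("cm", "5, 10, or 20"),
    },
)
print(readme)
```

Generating the README from structured fields keeps the variable list in one place, so the same dictionary could also feed a machine-readable data dictionary.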
Publication and Citation
- ☐ Cite datasets properly in your publications
- ☐ Link your paper to your datasets and vice versa
- ☐ Publish your data before or simultaneously with related papers
- ☐ Consider publishing a data paper describing your dataset
- ☐ Track usage of your dataset through citation metrics
Maintenance
- ☐ Plan for updates or corrections to your dataset
- ☐ Establish version control for your data
- ☐ Set reminders to check persistent links are working
- ☐ Review and update documentation as needed
- ☐ Monitor for changes in community standards or best practices
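Checking that persistent links still resolve can be partly automated. The sketch below is a minimal version under stated assumptions: it treats any status of 400 or above, or a failed request, as broken, and it uses HEAD requests, which some servers reject. The helper names and the `https://example.org/gone` URL are hypothetical, and the demonstration uses canned status codes so that it runs without network access.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(url):
    """Return the HTTP status for a HEAD request, or None if the request fails."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as error:
        return error.code  # e.g. 404 or 410
    except OSError:
        return None  # DNS failure, timeout, refused connection


def broken_links(urls, status_of=http_status):
    """Return the URLs whose status is missing or signals an error."""
    broken = []
    for url in urls:
        status = status_of(url)
        if status is None or status >= 400:
            broken.append(url)
    return broken


# Offline demonstration with canned statuses instead of real requests.
canned = {
    "https://doi.org/10.1038/sdata.2016.18": 200,
    "https://example.org/gone": 404,  # hypothetical dead link
}
print(broken_links(canned, status_of=canned.get))  # → ['https://example.org/gone']
```

Calling `broken_links(list_of_urls)` without the `status_of` argument would issue real requests; running that on a schedule is one way to act on the reminder in the checklist.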
Use this checklist throughout your research project to ensure your data follows FAIR principles, increasing its value to both you and the broader research community.
References
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J. W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18