Different communities create, collect, manage, and use different types of information. These communities may have different concerns and approaches to metadata. There is no single standard that can satisfy all the needs across communities.
However, in order for metadata to be understandable to the people and software applications that use it, some sort of consistency is required. Organizations often define and publish metadata standards to meet needs broadly across knowledge domains or within specialized disciplines. Published standards help system designers and end users accomplish their goals effectively.
Explore the different pages in this section of the guide to learn more about metadata standards.
Dublin Core is one of the most widely used metadata schemas for web content.
Named in part for a 1995 metadata conference hosted by the Online Computer Library Center (OCLC) located in Dublin, Ohio, Dublin Core consists of 15 elements that were considered broad and generic enough to describe a wide range of resources.
This document is an up-to-date specification of all metadata terms maintained by the Dublin Core Metadata Initiative, including properties, vocabulary encoding schemes, syntax encoding schemes, and classes.
Element | Description |
---|---|
Title | A name given to the resource, either supplied by the individual assigning metadata or from the object. Example: "A Pilot's Guide to Aircraft Insurance" |
Creator | Entity responsible for making the resource. Example: "Duncan, P. A." |
Subject | The topic of the resource, typically represented using keywords. Example: "Colonial medicine" |
Description | An account of the resource. Example: "Illustrated guide to airport markings and lighting signals for airports with low visibility conditions." |
Publisher | An entity responsible for making the resource available. Example: "The University of Texas Press" |
Contributor | An entity responsible for making contributions to the resource (e.g. editor, transcriber, illustrator). Example: "Austin Citizen Photograph" |
Coverage | The spatial or temporal topic of the resource. Example: "Austin, TX" |
Date | A point or period of time associated with the resource. Example: "1998-02-16" |
Type | The nature or genre of the resource. For a list of possible types, visit the DCMI Type Vocabulary. Example: "image" |
Format | The file format, physical medium, or dimensions of the resource. Example: "[128] p. : ill. ; 15 cm." |
Rights | Information about rights held over the resource. Example: "This electronic resource is made available by the University of Texas Libraries solely for the purposes of research, teaching and private study." |
Source | A related resource from which the described resource is derived. Example: "ZA 3075 Y69 2007" |
Language | Language(s) of the resource. Example: "Spanish" |
Relation | A related resource. For a list of possible relations, visit the Summary Refinement and Scheme Table. Example: "HasVersion 13th Edition" |
Identifier | A unique reference to the resource. Example: "doi:10.15781/T2251FN91" |
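To make the element list more concrete, the sketch below serializes a handful of Dublin Core elements as XML using Python's standard library. The namespace URI is the one DCMI publishes for the 15-element set. The `record` wrapper element and the `build_dc_record` helper are assumptions made for this illustration, and the values are borrowed from the table's examples, which describe different resources and are combined here only to show the encoding.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return a wrapper element with one dc:* child per (element, value) pair."""
    record = ET.Element("record")  # hypothetical wrapper, not part of Dublin Core
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Illustrative values taken from the table above (they describe different
# resources; a real record would describe a single one).
fields = [
    ("title", "A Pilot's Guide to Aircraft Insurance"),
    ("creator", "Duncan, P. A."),
    ("date", "1998-02-16"),
    ("language", "Spanish"),
    ("identifier", "doi:10.15781/T2251FN91"),
]

record = build_dc_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because every Dublin Core element is optional and repeatable, the helper accepts a list of pairs rather than a dictionary, so the same element can appear more than once.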
MARC, an acronym for MAchine-Readable Cataloging, is a widely used standard among libraries. MARC was developed by the Library of Congress in the 1960s to enable computer production of catalog cards.
The MARC record structure groups fields into ten blocks (0XX through 9XX). Each field is identified by a 3-digit number called a tag, and some fields are further defined by indicators and subfield codes.
These publications contain descriptions of every data element, along with examples, in full or concise versions.
This booklet explains what a MARC record is and provides the basic information needed to understand and evaluate a MARC record.
Tag | Description |
---|---|
0XX | Control information, numbers, and codes |
1XX | Main entries related to personal and corporate names. Example: 100 1# $a Arnosky, Jim. |
2XX | Titles, edition, imprint information. Example: 245 10 $a Dinosaurs : $b a visual encyclopedia. |
3XX | Physical description. Example: 300 ## $a 303 p. : $b col. ill. ; $c 29 cm. |
4XX | Series statements |
5XX | Notes. Example: 520 ## $a Presents an illustrated look at dinosaurs. |
6XX | Subject access entries. Example: 650 #1 $a Dinosaurs. |
7XX | Added entries other than subjects or series |
8XX | Series added entries and holdings information |
9XX | Fields for locally-defined use. Example: 900 ## $a 599.74 ARN |
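The examples in the table follow a common display convention: a 3-digit tag, two indicator characters (with `#` standing for a blank), and subfields introduced by `$` plus a one-character code. The sketch below splits such a display line into those parts. It is only an illustration of the structure: `MarcField` and `parse_marc_line` are hypothetical names, the function does not read the binary MARC exchange format, and it does not handle control fields, which carry no indicators or subfields.

```python
import re
from dataclasses import dataclass


@dataclass
class MarcField:
    tag: str          # 3-digit field tag, e.g. "245"
    indicators: tuple  # two one-character indicators; "#" means blank
    subfields: list    # (code, value) pairs in their original order

    @property
    def block(self):
        """Return the tag block from the table above, e.g. '2XX'."""
        return f"{self.tag[0]}XX"


def parse_marc_line(line):
    """Parse a display line such as '245 10 $a Title : $b subtitle.'"""
    tag, indicators, rest = line.strip().split(" ", 2)
    if not re.fullmatch(r"\d{3}", tag):
        raise ValueError(f"not a 3-digit tag: {tag!r}")
    if len(indicators) != 2:
        raise ValueError(f"expected two indicators, got {indicators!r}")
    # Each subfield starts with "$" and a one-character code.
    pairs = re.findall(r"\$(\w)\s*([^$]*)", rest)
    subfields = [(code, value.strip()) for code, value in pairs]
    return MarcField(tag, (indicators[0], indicators[1]), subfields)


field = parse_marc_line("245 10 $a Dinosaurs : $b a visual encyclopedia.")
print(field.block, field.tag, field.indicators, field.subfields)
```

Keeping subfields as an ordered list of pairs, rather than a dictionary, preserves repeated subfield codes and the punctuation-dependent order that catalog displays rely on.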
Encoded Archival Description (EAD) was developed in the 1990s by the archival community as a way of presenting finding aids in electronic form. It uses Standard Generalized Markup Language (SGML) and XML as the encoding schemes. Maintained by the Society of American Archivists and the Library of Congress, the latest version is EAD3.
Inside the outermost wrapper element, <ead>, the EAD contains two main sections: <control> and <archdesc>. Each of these sections also includes required and optional child elements.
This tag library represents version EAD3 of the Encoded Archival Description schemas, containing descriptions of 165 elements, arranged alphabetically by element name.
Element | Description |
---|---|
Encoded Archival Description | Contains all the elements in an EAD document, including Control and Archival Description. |
Control | Contains elements about the finding aid itself, such as title, author, creation date, language, and description rules used. |
Archival Description | Contains elements describing the content, context, and extent of the archival collection, rather than the finding aid, such as subjects, format, and box and folder inventory. |
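The two-section layout can be sketched in a few lines of Python. The example below builds a bare `<ead>` tree with a `<control>` section holding the finding aid's title and an `<archdesc>` section holding the collection's title. It is a deliberately incomplete outline, not a schema-valid EAD3 document: the namespace and several required child elements are left out, and the helper name and both titles are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def build_ead_skeleton(finding_aid_title, collection_title):
    """Build a bare <ead> tree showing its two main sections."""
    ead = ET.Element("ead")

    # <control>: information about the finding aid itself.
    control = ET.SubElement(ead, "control")
    filedesc = ET.SubElement(control, "filedesc")
    titlestmt = ET.SubElement(filedesc, "titlestmt")
    ET.SubElement(titlestmt, "titleproper").text = finding_aid_title

    # <archdesc>: information about the archival collection being described.
    archdesc = ET.SubElement(ead, "archdesc", {"level": "collection"})
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = collection_title
    return ead


# Hypothetical titles, used only to show where each kind of title lives.
ead = build_ead_skeleton(
    "Guide to the Example Family Papers",
    "Example Family Papers",
)
print(ET.tostring(ead, encoding="unicode"))
```

Note that the same collection produces two different titles: one for the finding aid (under `<control>`) and one for the materials themselves (under `<archdesc>`), which mirrors the distinction drawn in the table.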
Metadata Object Description Schema (MODS) is an XML schema with MARC-like semantics. MODS was developed by the Library of Congress out of the need for something easier to learn than MARC and richer than Dublin Core for describing complex digital objects.
MODS is "friendly" because it uses language-based tags rather than the numeric codes (e.g. 250) traditional to MARC. There are 20 top-level MODS elements, many of which contain subelements for granular description.
MODS Top-Level Elements
Element | Description |
---|---|
Title Information | Contains all subelements related to title information. |
Name | Contains all subelements related to information about the name of a person, organization, or event (conference, meeting, etc.) associated in some way with the resource. |
Type of Resource | Information about the original item that specifies the characteristics and general type of content of the resource, as chosen from a defined list of terms. |
Genre | A term that gives more specificity for the form, style, or content of an object than the broad terms used in Type of Resource. |
Origin Info | Contains subelements related to place of origin or publication, publisher/originator, and dates associated with the resource. |
Language | Contains a subelement to record the language in which the content of a resource is expressed. |
Physical Description | Contains all subelements relating to physical attributes of the resource. |
Abstract | A succinct summary of the content of the resource. |
Table of Contents | A description of the contents of a resource. |
Target Audience | A description of the intellectual level of the audience for which the resource is intended. |
Note | General textual information relating to a resource that is not encoded in other, more specific elements. |
Subject | Contains subelements relating to the primary topic(s) on which a work is focused. |
Classification | Indication of the subject via a formal system of coding and organizing resources (e.g. call number). |
Related Item | Contains subelements with information that identifies other resources related to the one being described. |
Identifier | A unique standard number or code that distinctively identifies a resource. |
Location | Contains subelements that identify the institution or repository holding the resource, or the electronic location in the form of a URL when available. |
Access Condition | Information about restrictions imposed on access to a resource. |
Part | Contains subelements to designate physical parts of a resource in detail. |
Extension | Provides additional information not covered by MODS (when local elements or elements from other standard schemas are needed). |
Record Info | Contains subelements relating to information necessary to managing metadata. |
VRA Core is a standard commonly employed by cultural heritage organizations to describe images and works of art. VRA Core is hosted by the Visual Resources Association (VRA) and the Library of Congress.
VRA Core contains 19 elements. There are three primary entities: Work (a built or created object), Collection (an aggregate of such objects), and Image (a visual surrogate of such objects).
Element | Description |
Work, Collection, or Image | A record is described as a Work, a Collection, or an Image. |
Agent | Individual, group, or corporate body that has contributed to the design, creation, production, etc. of the work. |
Cultural Context | Name of the culture, people, or country with which the work has been associated. |
Date | Date associated with the work. |
Description | Free-text note about the work that gives additional information not in other categories. |
Inscription | All marks added to the work at the time of production (e.g. signatures or stamps). |
Location | Geographic location or repository whose boundaries include the work. |
Material | Substance of which the work is composed (e.g. oil paint, bronze, or graphite). |
Measurements | Dimensions of the work. |
Relation | Terms describing the relationship between the work and a related work. |
Rights | Information about the copyright status for the work. |
Source | Reference to the source of information recorded about the work. |
State Edition | Identifying number or name assigned to the edition of a work that exists in more than one form. |
Style Period | Defined style, historical period, school, or movement whose characteristics are represented in the work. |
Subject | Terms that describe the work. |
Technique | Production processes, techniques, and methods incorporated in the fabrication of the work. |
Textref | Unique identifier assigned to the work. |
Title | Title given to the work. |
Work Type | Identifies the specific type of Work, Collection, or Image being described in the record. |
CDWA is a standard for the description of art, architecture, and other cultural works.
TEI is most commonly used for literary texts or manuscripts in the field of digital humanities.
ONIX is an international standard for representing and communicating book industry product information.
PBCore is a metadata schema designed for sound and moving images.
PREMIS is used for creating preservation metadata for digital objects.
Darwin Core is an extension of Dublin Core for biological diversity data.
EML is a standard for the earth, environmental, and ecological sciences.
CIF is a standard for archiving and reporting crystal structure data.
FITS is a standard used for astronomy image data.
NeXus is an international standard for neutron, x-ray, and muon experiment data.
ISO 19115 is an international standard for describing geographic information.
DDI is an international standard for describing social science datasets which includes a metadata specification, controlled vocabularies, and tools for working with DDI metadata.
FOAF is a standard for social media and is focused on the relationships that connect people, places, and things described on the web.
Common types of controlled vocabularies include subject headings lists, authority files, and thesauri. Controlled vocabularies can be arranged as alphabetical lists of terms or as taxonomies with a hierarchical structure of broader and narrower terms. Thesauri also include synonyms, related terms, scope and editorial notes, term history, alternate languages, or numerical codes. Ontologies include even more specification, such as descriptions of terms or concepts by their position in the hierarchy and any number of relationships to other terms/concepts.
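The difference between a flat list, a taxonomy, and a thesaurus is easier to see with a small data structure. The sketch below stores a toy vocabulary as a mapping from each term to its broader term, plus a mapping from non-preferred synonyms to preferred terms. All terms and function names are invented for this illustration and do not come from any published vocabulary.

```python
# Taxonomy: each term points to its broader (parent) term.
BROADER = {
    "Dinosaurs": "Reptiles",
    "Reptiles": "Animals",
    "Birds": "Animals",
}

# Thesaurus-style synonym control: non-preferred term -> preferred term.
USE = {
    "Dinos": "Dinosaurs",
    "Fauna": "Animals",
}


def preferred_term(term):
    """Map a synonym to its preferred form; preferred terms pass through."""
    return USE.get(term, term)


def broader_terms(term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    seen = {term}
    while term in BROADER:
        term = BROADER[term]
        if term in seen:  # guard against a malformed, circular hierarchy
            raise ValueError(f"cycle detected at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain


def narrower_terms(term):
    """Return the direct narrower (child) terms, alphabetically."""
    return sorted(child for child, parent in BROADER.items() if parent == term)


print(preferred_term("Dinos"))
print(broader_terms("Dinosaurs"))
print(narrower_terms("Animals"))
```

An ontology would go further than this, for example by typing the relationships between terms instead of recording only broader and narrower links.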
There are well-established standards to control names of people, geographic names, topics, concepts, resource types or genres, and languages.
Below are examples of controlled vocabularies commonly used in various communities. There are also tools available to help find appropriate vocabularies and ontologies.
General Purpose
Arts & Humanities
Physical & Life Sciences
Social & Behavioural Sciences
Metadata crosswalks translate elements and values from one schema to those of another. Crosswalks facilitate interoperability between different metadata schemas and serve as a base for metadata harvesting and record exchange.
Below are examples of crosswalks that have been developed between widely applied metadata standards.
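As a toy illustration of the idea (not a reproduction of any published crosswalk), the sketch below maps a few Dublin Core elements to MODS element paths and reports any elements it could not translate. The mapping table is a simplified assumption: real crosswalks cover every element, often map one source element to several targets, and may also transform the values.

```python
# Simplified, illustrative mapping from Dublin Core elements to MODS paths.
DC_TO_MODS = {
    "title": "titleInfo/title",
    "creator": "name/namePart",
    "subject": "subject/topic",
    "description": "abstract",
    "publisher": "originInfo/publisher",
    "language": "language/languageTerm",
    "identifier": "identifier",
    "rights": "accessCondition",
}


def crosswalk(record, mapping=DC_TO_MODS):
    """Translate {element: [values]} using the mapping.

    Returns (converted, unmapped) so that nothing is dropped silently.
    """
    converted, unmapped = {}, {}
    for element, values in record.items():
        target = mapping.get(element)
        if target is None:
            unmapped[element] = list(values)
        else:
            converted.setdefault(target, []).extend(values)
    return converted, unmapped


dc_record = {
    "title": ["A Pilot's Guide to Aircraft Insurance"],
    "creator": ["Duncan, P. A."],
    "coverage": ["Austin, TX"],  # deliberately absent from the toy mapping
}

mods_record, leftovers = crosswalk(dc_record)
print(mods_record)
print(leftovers)
```

Returning the unmapped elements separately makes the loss of information visible, which is a common concern when moving from a richer schema to a simpler one or between schemas with different granularity.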
Metadata harvesting is the automated collection of metadata descriptions from different sources to create useful aggregations of metadata and related services. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which uses XML over HTTP to exchange data on the World Wide Web, was introduced in 2001. OAI-PMH is widely used by digital libraries, institutional repositories, and digital archives, as well as commercial services, to facilitate data exchange among systems and expand access to collections.
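The "XML over HTTP" exchange can be sketched without touching the network. In the example below, one function builds a `ListRecords` request URL and another extracts identifiers, titles, and the resumption token from a response. The `verb`, `metadataPrefix`, and `set` parameters and the two namespace URIs come from the OAI-PMH and Dublin Core specifications; the base URL, the helper names, and the heavily trimmed sample response are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"
NAMESPACES = {"oai": OAI_NS, "dc": DC_NS}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Trimmed, made-up response: a real one also carries elements such as
# <responseDate> and <request>, and a fuller record header.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Dinosaurs : a visual encyclopedia</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return ([(identifier, [titles])], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NAMESPACES):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NAMESPACES)
        titles = [t.text for t in rec.iterfind(".//dc:title", NAMESPACES)]
        records.append((identifier, titles))
    token = root.findtext(".//oai:resumptionToken", namespaces=NAMESPACES)
    return records, token


url = list_records_url("https://example.org/oai")  # hypothetical endpoint
records, token = parse_list_records(SAMPLE_RESPONSE)
print(url)
print(records, token)
```

A harvester would typically repeat the request with the returned resumption token until the repository stops sending one, which is how large collections are delivered in batches.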
A few common examples of aggregators using OAI-PMH:
DPLA aggregates metadata from America's libraries, archives, and museums, providing an open access portal to discover materials and a platform to enable innovative reuse of DPLA content.
NSDL, established by the National Science Foundation and hosted by the University Corporation for Atmospheric Research (UCAR), is an open-access digital library of learning collections for the sciences, technology, engineering, and mathematics (STEM) education community.
OAIster is a union catalog of millions of records representing open access resources from around the world, managed by the Online Computer Library Center (OCLC).
OLAC, an international partnership of institutions and individuals, aggregates metadata from participating archives to create a worldwide virtual library of language resources.
Created and maintained by the University of North Texas Libraries, the Portal to Texas History aggregates metadata from partners across Texas to provide a gateway to rare, historical, and primary source materials from or about Texas. The Portal is also a service hub for the Digital Public Library of America.