Engineering and Public Policy: Data Management: File Naming Conventions

Data Sets

Data set titles should be as descriptive as possible. These data sets may be accessed many years in the future by people who will be unaware of the details of the project. Data set titles should contain the type of data and other information such as the date range, the location, and the instrument used.

Examples-Data Sets

Examples of bad titles are:

  • The Aerostar 100 Data Set
  • Respiration Data

Examples of good titles are:

  • SAFARI 2000 Upper Air Meteorological Profiles, Skukuza, Dry Seasons 1999-2000
  • NACP Integrated Wildland and Cropland 30-m Fuel Characteristics Map, U.S.A., 2010
  • Global Fire Emissions Database, Version 2 (GFEDv2.2)

File Names

In order for others to use your data, they must fully understand the contents of the data set, including the parameter names, units of measure, formats, and definitions of coded values. Parameters, units, and other coded values may be required to follow certain naming standards as defined in experiment plans and the destination archive.

File names should reflect the contents of the file and include enough information to uniquely identify the data file. File names may contain information such as

  • project acronym,
  • study title,
  • location,
  • investigator,
  • year(s) of study,
  • data type,
  • version number, and
  • file type.
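As a minimal sketch of how such components might be combined, the Python function below joins a list of parts with underscores and appends the file type. The function name `build_file_name`, the underscore separator, and the ordering of the parts are illustrative assumptions, not a standard prescribed by this guide or by any archive.

```python
import re


def build_file_name(parts, file_type):
    """Join descriptive parts with underscores and append the file type.

    Hypothetical helper for illustration; the separator and the order
    of the parts are assumptions, not a required convention.
    """
    cleaned = []
    for part in parts:
        # Convert to text and turn any run of whitespace into one underscore.
        token = re.sub(r"\s+", "_", str(part).strip())
        if token:
            cleaned.append(token)
    if not cleaned:
        raise ValueError("at least one non-empty part is required")
    # Accept the file type with or without a leading dot.
    return "_".join(cleaned) + "." + file_type.lstrip(".")


# Project name, state abbreviation, year, and data type, then the file type.
print(build_file_name(["Sevilleta_LTER", "NM", 2001, "NPP"], "csv"))
```

Keeping the parts in a fixed order across a project means that files sort predictably and that each position in the name always carries the same kind of information.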

Select a consistent format that can be read well into the future and is independent of changes in applications. If your data collection process used proprietary file formats, convert those files into a stable, well-documented, non-proprietary format to maximize others' ability to use and build upon your data.

Examples-File Names

Avoid very long file names; aim for no more than 64 characters.

Examples of bad file names:

  • SamTempsFridayNight.xls
  • Temps1.doc
  • Temps2.doc

Good file name examples:

  • Howland_small_stem_biometry_2010.csv
    From the data set NACP New England and Sierra National Forests Biophysical Measurements: 2008-2010

  • Sevilleta_LTER_NM_2001_NPP.csv
    Sevilleta_LTER is the project name
    NM is the state abbreviation
    2001 is the calendar year
    NPP represents Net Primary Productivity data
    csv stands for the file type: ASCII comma-separated values

Avoid spaces in file names. Instead of "data May2011", use "data_May2011" or "data-May2011".
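The two rules of thumb in this section (no spaces, and no more than 64 characters) can be checked automatically. The Python sketch below is one possible way to do that; the names `check_file_name` and `replace_spaces` are hypothetical helpers introduced for illustration, and the 64-character limit is taken from the guidance above.

```python
MAX_LENGTH = 64  # upper bound suggested in this guide


def check_file_name(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if any(ch.isspace() for ch in name):
        problems.append("contains whitespace")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


def replace_spaces(name, separator="_"):
    """Replace each run of whitespace with a separator such as "_" or "-"."""
    return separator.join(name.split())


print(check_file_name("data May2011"))      # reports the space
print(replace_spaces("data May2011"))       # underscore variant
print(replace_spaces("data May2011", "-"))  # hyphen variant
```

A check like this could be run over a whole directory before files are submitted to an archive, so that problem names are caught while renaming them is still easy.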

EPP pages created by Sue Collins, maintained by Julie Chen