
Research Data Services at UK: File Formats & Naming Guidelines

Resources and information to help you get started with data management, data preservation, and data sharing

File Formats

It is important to consider what format will be best for managing, sharing, and preserving your data. How you choose to represent your data is a primary factor in someone else's ability to use your data in the future.

Formats that are likely to be accessible in the future are:

  • Non-proprietary
  • Open, documented standards
  • Commonly used by the research community
  • Encoded with standard character encodings (ASCII, UTF-8)
  • Uncompressed
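As a minimal sketch of several of these properties at once, the Python snippet below saves a table as CSV (a non-proprietary, documented, uncompressed text format) and sets the character encoding explicitly to UTF-8 instead of leaving it to the platform default. It assumes tabular data and uses only the standard `csv` and `io` modules; the function name `rows_to_utf8_csv` and the sample values are hypothetical.

```python
import csv
import io


def rows_to_utf8_csv(header: list[str], rows: list[list[object]]) -> bytes:
    """Serialize a table as plain-text CSV, encoded explicitly as UTF-8."""
    buffer = io.StringIO(newline="")  # no newline translation; csv controls line endings
    writer = csv.writer(buffer)       # default dialect: commas, minimal quoting, "\r\n"
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


data = rows_to_utf8_csv(["site", "temp_c"], [["Lexington", 21.5], ["Café", 3]])
```

The returned bytes can be written to disk with `open(path, "wb")`, so no further re-encoding takes place and non-ASCII text such as "Café" survives intact.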

Examples of preferred format choices include:

Discouraged formats include:

  • Microsoft Word (prefer PDF)
  • QuickTime (prefer MPEG-4)
  • GIF (its LZW compression was long encumbered by patents)

File Naming Guidelines

UK Libraries recommends applying a consistent file-naming convention to your research data using the following guidelines:

First and foremost:
• Keep file names short but descriptive
• Be consistent with established conventions

Recommended practices:
• Denote dates in YYYYMMDD format
• Use a unique identifier, e.g., project name or grant number
• Identify document content, e.g., questionnaire or grant proposal
• Use underscores or hyphens as delimiters; avoid spaces and special characters, e.g., &, *, #
• Track document versions either sequentially or with a unique date and time, e.g., v01, v02, 20140403_1800
• Avoid complex folder hierarchies
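The recommended practices can be combined into a small helper. This is a minimal Python sketch that assumes names of the form identifier_content_YYYYMMDD_vNN.ext; that ordering, the function names `build_file_name` and `find_naming_problems`, the identifier "UKRDS", and the choice of underscores between parts and hyphens within a part are all illustrative assumptions, not a UK Libraries requirement.

```python
import re
from datetime import date


def find_naming_problems(name: str) -> list[str]:
    """Return the guidelines a file name appears to break (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    # Anything other than ASCII letters, digits, '.', '_', '-' (or a space,
    # already reported above) is treated as a special character.
    if re.search(r"[^A-Za-z0-9._\- ]", name):
        problems.append("contains special characters")
    return problems


def build_file_name(project_id: str, content: str, when: date,
                    version: int, extension: str) -> str:
    """Build identifier_content_YYYYMMDD_vNN.ext from its parts (hypothetical layout)."""
    parts = [
        project_id,
        content,
        when.strftime("%Y%m%d"),  # YYYYMMDD sorts chronologically as plain text
        f"v{version:02d}",        # zero-padded: v01, v02, ..., v10
    ]
    # Underscores separate parts; whitespace inside a part becomes a hyphen.
    cleaned = [re.sub(r"\s+", "-", part.strip()) for part in parts]
    name = "_".join(cleaned) + "." + extension.lstrip(".")
    problems = find_naming_problems(name)
    if problems:
        raise ValueError(f"{name!r}: {', '.join(problems)}")
    return name


print(build_file_name("UKRDS", "grant proposal", date(2014, 4, 3), 1, "pdf"))
# UKRDS_grant-proposal_20140403_v01.pdf
print(find_naming_problems("final data #2 & notes.xlsx"))
# ['contains spaces', 'contains special characters']
```

Because the date is written as YYYYMMDD and the version is zero-padded, an ordinary alphabetical sort of such names also lists them in date and version order.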

File Format Factors

UK Libraries recommends considering multiple factors when choosing a file format for your research data. 

• Proprietary and non-proprietary (open) formats

Proprietary formats are restricted by software patents, incomplete or unpublished specifications, or built-in encryption that prevents the public from using them freely. As a result, working with a proprietary format usually requires software from a single vendor. In contrast, an open format is freely available for anyone to use. Because its specification is published, open-source developers can write software that reads and writes the format even if a particular vendor stops supporting it. This reduces the risk that technological change will make the format obsolete.

• Industry format adoption

In some cases, an industry or profession treats a specific file format as a de facto standard even though the format is proprietary and relies on expensive software. In those cases, it may be more practical to use that proprietary format as well.

• Technical dependencies

Technical dependencies are the degree to which a format relies on particular hardware, operating systems, or software, and how those dependencies might affect future use of the file. Using non-proprietary file formats may lower the risk of technical obsolescence by reducing dependence on any one underlying technology.

• File quality and file size

Each type of content, such as text, images, or sound, can be stored in many file formats. File quality, meaning how faithfully a file represents the original item’s characteristics, is a large part of the file format decision. Formats that encode high resolution produce larger files than lower-quality formats, so the gain in quality comes at the cost of storage space and of convenience when sharing the file with others.
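To make the size side of this trade-off concrete, the sketch below computes raw, uncompressed sizes from resolution and bit depth for images and sound. The function names and example dimensions are hypothetical; real files also carry headers and metadata, and compressed formats will usually be smaller than these figures.

```python
def uncompressed_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Raw bitmap size in bytes; ignores row padding, headers, and metadata."""
    return width_px * height_px * bits_per_pixel // 8


def uncompressed_audio_bytes(sample_rate_hz: int, bit_depth: int,
                             channels: int, seconds: int) -> int:
    """Raw PCM audio size in bytes; ignores container headers and metadata."""
    return sample_rate_hz * bit_depth * channels * seconds // 8


print(uncompressed_image_bytes(4000, 3000, 24))     # 36000000 bytes, about 36 MB
print(uncompressed_audio_bytes(44_100, 16, 2, 60))  # 10584000 bytes for one minute of CD-quality stereo
```

Doubling both the width and the height of an image quadruples its raw size.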