Broadly speaking, data management refers to the practices that researchers put in place to keep their data secure, organized, and well-documented. Although data management practices are most relevant during the data collection and analysis portions of the research lifecycle, it is important to think about your project's data management needs during the planning stages. Management practices are much easier for both individuals and teams to implement if they have been thought through and discussed beforehand.
The boxes below contain information and resources on some of the most important areas to consider for your own work.
Use the table below to compare data storage options available to UK researchers. You can download the table as a spreadsheet using the link below.
Storage | SharePoint and OneDrive | LabArchives | LCC/MCC Scratch Space Storage | Network Attached Storage (Gemini) Paid Condo Storage | UKY Tape Archive | OURRstore Tape Archival | OSN (Open Storage Network) | Google Drive | Dropbox | Box | AWS Glacier |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
Purpose | General document and data storage | Electronic research notebook for collaboration, notes, and data management | Mounted to the HPC cluster; short-term scratch space for projects that do not need to keep data long-term | Stand-alone on-site disk storage for researchers needing terabytes of space | On-site magnetic tape storage for long-term data preservation or creating a backup | Magnetic tape storage in two copies, with one copy kept at the University of Oklahoma and the other sent to the researcher | Stand-alone object store for large amounts of data | General document and data storage | General document and data storage | General document and data storage | Object storage for large amounts of infrequently accessed data |
Can data be made accessible to non-UK researchers? | Yes | Yes | No | No | No | No | Yes | Yes | Yes | Yes | Yes |
Quota (individual) | 5TB to 25TB | Unlimited number of notebooks; individual attached files must be under 4GB | 25TB | Minimum purchase quota is 100TB | Based on quota purchased | Based on number of tapes purchased | 10TB | 15GB | 2GB (free version) | 10GB (free version) | Based on quota purchased |
Graphical user interface (GUI) for editing data | Yes | Yes | No | No | No | No | No | Yes | Yes | Yes | No |
Cost (per TB) | Free for UK researchers | Free for UK researchers | Free (see quotas) | Paid service (condo) | Paid service | $12/TB for 2 copies | Free | Free | Free | Free | Paid |
Good | Syncs easily with UK devices through OneDrive | Good for setting up workflows and protocols for research teams and organizing multiple team members' data | Convenient for HPC users | Highly cost-effective option for large amounts of data | On-site storage option for preserving data | Cheap option for backing up massive quantities of data | Easy to get access (with a short proposal); available to researchers at other institutions | Smooth, intuitive web GUI; familiar to most researchers | Smooth, intuitive web GUI; familiar to most researchers | Smooth, intuitive web GUI; familiar to most researchers | Relatively inexpensive storage for large amounts of data |
Bad | Slow data transfer speeds | Intended for notebook attachments rather than general data storage; maximum file size for upload is 16GB | Short-term only | Difficult to learn for new users; lacks reliable backups of data | Limited support; not ideal for actively managed data | Initial set-up (tape purchase, tutorial session, etc.) can be confusing | Storage is available only for the duration of the active proposal, not long-term | Low amount of free storage | Low amount of free storage | Low amount of free storage | Users must purchase on their own; egress charges are high |
Where to Learn More | UK ITS | UK OVPR | UK CCS | UK CCS | UK ITS | OURRstore | NSF | UK ITS (set up your UK Google Workspace) | Dropbox | Box | AWS |
This table was adapted from one made by Mami Hayashida.
For document storage options at UK using Microsoft services, see this guide from UK ITS.
For more detailed information on these storage options, see this guide from the Center for Computational Science.
Accidents and unfortunate events happen, but they don't need to ruin your research. Taking proactive steps to keep your data safe is a crucial part of any project.
The 3-2-1 rule is a guideline for data security that recommends maintaining 3 copies of your data on 2 different forms of media, with 1 copy stored off-site. The scope of your work should determine the degree of intensity of your backup system. For a class project, storing a copy of your files in a cloud service such as OneDrive will likely be sufficient, but grant-funded research should adhere to the 3-2-1 rule.
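To make this concrete, here is a minimal Python sketch of a 3-2-1 workflow that copies a working file to an external drive and a cloud-synced folder. All paths and file names are hypothetical placeholders; adapt them to your own storage locations.

```python
import shutil
from pathlib import Path

# Hypothetical locations: adjust these to your own setup.
working_copy = Path("C:/projects/study/data/survey_results.csv")   # copy 1: working copy
external_drive = Path("E:/backups/study")                          # copy 2: second medium (external drive)
cloud_folder = Path("C:/Users/researcher/OneDrive/backups/study")  # copy 3: off-site (synced to the cloud)

for destination in (external_drive, cloud_folder):
    destination.mkdir(parents=True, exist_ok=True)              # create the backup folder if needed
    shutil.copy2(working_copy, destination / working_copy.name)  # copy2 also preserves timestamps
```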
When making a plan for backing up your data, consider how often backups will be made, whether they will happen automatically or require manual steps, where each copy will be stored, how long backups need to be kept, and how you will periodically test that your files can actually be restored.
There are a variety of options for backing up your data depending on your needs, including external hard drives, cloud storage services such as OneDrive, university-managed storage (see the comparison table above), and tape archives.
These considerations are adapted from "Chapter 8: Storage and Backups" of Research Data Management for Researchers by Kristin Briney.
When developing data storage procedures, researchers collecting data from human subjects should consult the University of Kentucky Institutional Review Board's Confidentiality and Data Security Guidelines for Electronic Data to ensure they are in compliance. Other important UK and legal policy information regarding data security, privacy, and confidentiality can be found in the IRB Survival Handbook.
Depending on how much data you are generating or collecting, the combination of your data, code, and outside sources can quickly become unmanageable as your research progresses. A key file might get lost in a folder with hundreds of unrelated items, you might forget what the variable names you chose actually mean, or you might need to reproduce a particular series of steps for data analysis but aren't sure what you did the first time. Following best practices in organizing and naming your files and folders can make it easier for you and your collaborators to find the files you need quickly.
Consistent file naming conventions make it easier to locate specific files quickly. The table below shows some best practices in file naming.
Practice | Reasoning | Example |
--- | --- | --- |
Start file names with the date written in YYYYMMDD format | Dates at the start of file names make them easy to sort chronologically. | Use 20230614 to represent June 14, 2023 |
For version numbers, include leading zeros | Without leading zeros, operating systems will not sort files numerically. Computers sort file_v1.txt, file_v10.txt, and file_v2.txt in that order. | Name files file_v01.txt, file_v02.txt, and file_v10.txt so they sort in numerical order (see the sketch after this table) |
Separate terms using underscores or camel case rather than spaces or special characters | Spaces and special characters like / : \ ) ( # % * ? " \| are not always read correctly by operating systems or programs. | Use water_quality_data.txt or waterQualityData.txt rather than water quality data.txt |
Keep file names descriptive but brief | Lengthy file names become difficult for humans to read and may result in file paths that are too long for operating systems to process. Using abbreviations can be helpful, but be sure to document their meaning. | Use 20230614_mwd_data_v01.txt rather than 20230614_MetropolitanWaterDistrict_data_v01.txt |
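The sorting problem described above is easy to see, and avoid, in code. The sketch below is a minimal example of building names that follow these conventions using Python's date formatting and zero-padded version numbers; the project abbreviation is a hypothetical placeholder.

```python
from datetime import date

def make_file_name(project: str, version: int, extension: str = "txt") -> str:
    """Build a name like 20230614_mwd_data_v01.txt (project abbreviation is hypothetical)."""
    return f"{date.today():%Y%m%d}_{project}_data_v{version:02d}.{extension}"

# Without leading zeros, alphabetical sorting puts v10 before v2:
print(sorted(["data_v1.txt", "data_v10.txt", "data_v2.txt"]))
# ['data_v1.txt', 'data_v10.txt', 'data_v2.txt']

# With leading zeros, the sorted order matches the numeric order:
print(sorted(["data_v01.txt", "data_v02.txt", "data_v10.txt"]))
# ['data_v01.txt', 'data_v02.txt', 'data_v10.txt']
```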
It is much easier to develop a file naming structure before data collection and analysis begins rather than attempting to rename your files in the middle of your work. Kristin Briney has created a File Naming Convention Worksheet that can help you consider your needs and develop an appropriate naming convention.
It's best practice to keep data files simple to make them easier to analyze, but that also means important information about the data may not be readily available. A data dictionary is a table or document that describes the variables used in a dataset. For each variable, a data dictionary might include the variable's name, a plain-language definition, the data type, units of measurement, and the range of allowed values.
Use the link below to download a worksheet (developed by Kristin Briney) for brainstorming a data dictionary for one of your datasets.
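If you prefer to keep a data dictionary alongside your data as a plain file, a simple CSV works well. The sketch below is a minimal example whose columns mirror the elements listed above; the variables describe a hypothetical water-sampling dataset.

```python
import csv

# Hypothetical variables from a water-sampling dataset.
data_dictionary = [
    {"variable": "sample_id", "definition": "Unique identifier for each water sample",
     "type": "string", "units": "", "allowed_values": "S0001-S9999"},
    {"variable": "collected_on", "definition": "Date the sample was collected",
     "type": "date", "units": "YYYYMMDD", "allowed_values": "valid dates"},
    {"variable": "ph", "definition": "Acidity of the sample",
     "type": "float", "units": "pH", "allowed_values": "0-14"},
]

# Write the dictionary next to the data so it travels with the dataset.
with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=data_dictionary[0].keys())
    writer.writeheader()
    writer.writerows(data_dictionary)
```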
Sub-folders (or sub-directories) can keep you from confusing similarly named files by separating different phases of your work. Consider the different categories of the files that you are working with to determine how best to sort them into different folders. One common division is to maintain different folders for raw data, cleaned data, and data visualizations (such as charts and tables).
When creating sub-folders, be careful not to create too many divisions between your files. A project folder containing dozens of sub-folders that only contain two or three files each will not be very easy to navigate. Additionally, nesting too many folders inside one another can create lengthy file paths which may prevent you from opening your files.
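One way to apply the raw/cleaned/visualizations division described above is to create the folders programmatically at the start of each project, so every project follows the same layout. A minimal sketch, where the project and folder names are just one reasonable choice:

```python
from pathlib import Path

project = Path("water_quality_study")  # hypothetical project name

# One common division: raw data, cleaned data, and visualizations,
# plus a place for documentation.
for sub_folder in ("data_raw", "data_cleaned", "figures", "docs"):
    (project / sub_folder).mkdir(parents=True, exist_ok=True)
```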
A README file is a single file that describes your data and its technical, interpretive, and analytical requirements. It's a form of documentation, usually a simple text file, that serves as a guide to your data. README files are considered a general best practice for data management.
At a minimum, it is a good idea to create a README file for your project data as a whole, but if you are working with a large amount of data, consider using them in your sub-folders as well to describe portions of your work.
Cornell University's Research Data Services Group has created useful guidelines for writing README style metadata, which can be helpful in creating README files at various levels of granularity in your project folder.
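A README doesn't require any special tooling; a short text file is enough. The sketch below writes a minimal skeleton whose headings are one reasonable starting point, not a fixed standard; adapt them to the Cornell guidelines linked above.

```python
from pathlib import Path

# A bare-bones README skeleton; fill in each heading by hand.
readme_template = """\
Project title:
Principal investigator / contact:
Date range of data collection:

Description of the data:

File list and naming convention:

Software needed to open or analyze the files:

Known issues or limitations:
"""

Path("README.txt").write_text(readme_template)
```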
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage data. Metadata is key to ensuring that data will survive and continue to be accessible in the future. Typically, it is helpful to use a defined metadata standard to describe your research.
A few popular standards can be found on this page, but there may be others that are more specific to your research.
Some basic metadata elements include a title, the name of the creator, dates of creation and modification, a description of the content, subject keywords, and a unique identifier.
While you can always choose to create your own metadata based on what is most relevant to your project, it can also be helpful to use an existing metadata standard. Because standards are used by a very large community of practitioners, implementing them in your own work can help make your data more discoverable once you share it.
Below are some common metadata standards. If you are planning to deposit your final dataset in a repository, check to see if that repository prefers a particular standard.
Darwin Core
The Darwin Core is a body of standards intended to facilitate the sharing of information about biological diversity.
Data Documentation Initiative (DDI)
The Data Documentation Initiative (DDI) is an international standard for describing statistical and social science data.
Dublin Core (DC)
The Dublin Core is a simple, easy-to-use vocabulary of fifteen properties for use in resource description (see the sketch after this list).
Federal Geographic Data Committee (FGDC)
The FGDC maintains the Content Standard for Digital Geospatial Metadata, the content standard for describing digital geospatial data.
Integrated Taxonomic Information System (ITIS)
ITIS provides authoritative taxonomic information on plants, animals, fungi, and microbes of North America and the world.
Text Encoding Initiative (TEI)
The TEI is a standard for the representation of texts in digital form.
Visual Resources Association Core (VRA)
The VRA Core is a standard for the description of works of visual culture and the images that document them.
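To make this concrete, here is a sketch of what a small Dublin Core record might look like for a dataset, represented as a simple Python dictionary. The field names are genuine Dublin Core properties, but all of the values (including the DOI) are hypothetical placeholders; repositories usually collect these fields through their own forms.

```python
# A hypothetical dataset described with a subset of Dublin Core's fifteen properties.
dublin_core_record = {
    "title": "Metropolitan Water District Sampling Data, 2023",
    "creator": "Doe, Jane",
    "date": "2023-06-14",
    "description": "Water quality measurements collected at 40 sites.",
    "subject": "water quality; environmental monitoring",
    "identifier": "https://doi.org/10.xxxx/xxxxx",  # placeholder DOI
    "format": "text/csv",
    "rights": "CC BY 4.0",
}
```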
As you transform your data to produce new insights, it is important to document your process so that other researchers (including yourself in the future) can understand the steps you took, evaluate them, and perhaps make changes to produce different results. This page offers guidance on incorporating documentation practices into your habitual processes.
Some tools are better equipped than others to make documenting analysis a natural part of your workflow. Writing scripts in a programming language such as R or Python to transform your data means that the code in your script naturally serves as a step-by-step record of everything you did (and can be further improved with comments in your code). If you are analyzing data in a spreadsheet program like Excel, however, it can be more difficult to keep track of your transformation steps, especially for complicated processes.
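For example, a short, commented script makes every transformation step explicit and repeatable. The sketch below assumes the pandas library; the file names, folder names, and column names are hypothetical, and the cleaning steps are purely illustrative.

```python
import pandas as pd

# Step 1: load the raw data (never edit the raw file itself).
raw = pd.read_csv("data_raw/20230614_mwd_data_v01.csv")

# Step 2: drop rows with no recorded measurement.
cleaned = raw.dropna(subset=["measurement"])

# Step 3: standardize the site-name column to lowercase.
cleaned["site"] = cleaned["site"].str.lower()

# Step 4: save the cleaned copy to a separate folder, with a new version number.
cleaned.to_csv("data_cleaned/20230614_mwd_data_v02.csv", index=False)
```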
Version control refers to the process of keeping track of what changes have been made to your files. It can be particularly important when working as part of a team.
At a minimum, save a copy of your original, unedited data. Perform your analysis on a separate copy of the file so that you can return to the original if need be. Consider saving copies of your transformed data at other important checkpoints, such as the completion of your cleaning process.
If you anticipate saving separate versions of your data at many different points, consider using a version control system such as Git. Instructions for installing Git and an explanation of a sample workflow can be found in this tutorial on GitHub for Version Control.
Even if you are not working in a discipline that uses research notebooks in a formalized manner, they can still provide a useful means of documenting elements of your research process that may be important in the future, such as decisions made during data collection and cleaning, the reasoning behind your choice of methods, unexpected results, and ideas you want to revisit later.
Your notebook can be as simple as a document where you regularly add date-stamped entries with any thoughts on your process. It can be useful later for recalling a specific choice that you haven't documented elsewhere, providing a narrative writeup of your process in an article, or reflecting on your research practices.
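If you keep your notebook as a plain text file, even the date-stamping can be automated. A minimal sketch, where the notebook file name and the sample entry are hypothetical:

```python
from datetime import date

def add_entry(text: str, notebook: str = "research_notebook.txt") -> None:
    """Append a date-stamped entry to a plain-text notebook."""
    with open(notebook, "a") as f:
        f.write(f"\n{date.today():%Y-%m-%d}\n{text}\n")

add_entry("Re-ran the cleaning script after fixing the site-name typo; results unchanged.")
```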