Data Storage

This is a rolling copy of the group’s storage policy.

Our group’s allocation is 1.5 PB of Active plus 505 TB of Archive. Our Active allocation is at /storage1/fs1/rvmartin/Active (1 PB) and /storage1/fs1/rvmartin2/Active (0.5 PB).

Our Archive allocation is at /storage1/fs1/rvmartin/Archive (1.5 PB) and /storage1/fs1/rvmartin2/Archive (5 TB).

The Active allocation is where you should put data that you are currently working with. Once a project finishes you should move its data to Archive.

Data Transfer between Active and Archive

  • Globus (Recommended): The Globus endpoints for Storage1 are named RIS Storage1, Wash U RIS storage1 dtn1 and Wash U RIS storage1 dtn2.

    • File status can be checked with mmlsattr -L $FILE_PATH. In the output, Misc attributes of ARCHIVE OFFLINE means the file has been moved from cache to tape, and ARCHIVE alone means the file is on cache.

    • Globus transfers files that are on cache quickly and seamlessly. Files on tape incur unpredictable delays, ranging from several minutes to hours, and will probably be reported as an endpoint error.

    • Manually bringing files from tape back to cache in advance, called rehydration, can speed up Globus transfers considerably. To rehydrate a file, run file $FILE_PATH from a home node (compute1-client-*). The file command also supports wildcards (*) and rehydrating all files in a directory.

    • When you restart a Globus transfer task, check the sync box under Transfer & Timer Options in Globus (it is unchecked by default).

  • The safe-transfer script in the /Shared directory.
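The status check and rehydration steps described above can be combined into a small helper. This is a minimal sketch, not a group-provided tool: the function names tier_from_mmlsattr and rehydrate_if_needed are hypothetical, and the parsing assumes that the mmlsattr -L output contains a line labelled "Misc attributes", as described in the list above.

```shell
# Classify a file as "tape" or "cache" from `mmlsattr -L` output read on stdin.
# Assumption: the output has a "Misc attributes" line containing either
# ARCHIVE OFFLINE (file is on tape) or ARCHIVE (file is on cache).
tier_from_mmlsattr() {
    misc_line=$(sed -n '/Misc attributes/p')
    case "$misc_line" in
        *OFFLINE*) echo "tape" ;;
        *ARCHIVE*) echo "cache" ;;
        *)         echo "unknown" ;;
    esac
}

# Rehydrate a single file only if it is currently on tape.
# Run this where mmlsattr is available, e.g. a compute1-client-* node.
# The tier printed is the state observed before any rehydration started.
rehydrate_if_needed() {
    target=$1
    tier=$(mmlsattr -L "$target" | tier_from_mmlsattr)
    if [ "$tier" = "tape" ]; then
        # Running `file` on the path triggers the recall from tape to cache.
        file "$target" > /dev/null
    fi
    echo "$target: $tier"
}
```

For example, rehydrate_if_needed "$FILE_PATH" prints the path followed by the tier that was observed, and starts a recall if the file was on tape. Calling it in a loop over the files you plan to transfer, before you submit the Globus task, avoids the tape delays mentioned above.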

Checking usage

Monthly usage reports for the rvmartin allocation are generated in /storage1/fs1/rvmartin/Active/Shared/StorageReports. To view the latest one, run the ./view_most_recent_usage_report command:

$ cd /storage1/fs1/rvmartin/Active/Shared/StorageReports
$ ./view_most_recent_usage_report

Monthly usage reports for the rvmartin2 allocation are generated in /storage1/fs1/rvmartin2/Active/Shared/StorageReports. To view the latest one, run the ./view_most_recent_usage_report command:

$ cd /storage1/fs1/rvmartin2/Active/Shared/StorageReports
$ ./view_most_recent_usage_report

A combined usage report for both the rvmartin and rvmartin2 allocations is generated by running the ./view_both_allocations_usage_report command:

$ cd /storage1/fs1/rvmartin/Active/Shared/StorageReports
$ ./view_both_allocations_usage_report

The controls for viewing a report are:

q - quit
j - down
k - up
/ - search for text
   n - next match
   N - previous match

To view the total Active storage space left on the rvmartin allocation:

$ mmlsquota --block-size auto -j rvmartin_active rdcw-fs1

To view the total Active storage space left on the rvmartin2 allocation:

$ mmlsquota --block-size auto -j rvmartin2_active rdcw-fs2

Policy

This policy is not intended to impose any restrictions on your usage of Compute1 or Storage1. Rather, it is intended to facilitate the data lifecycle of the group.

  1. The Archive directory should have the same structure as the Active directory. For example, if you are archiving rvmartin/Active/lbindle/sgv, it should go in rvmartin/Archive/lbindle/sgv.

  2. If you are done with a directory and it can be archived, move it to Archive. If at any point you need to bring it back, that is easy to do with Globus, although it might take several days if the directory is large.

  3. Before you leave the group, archive your entire project directory.

  4. Every 6 months to a year, go through your directories and archive anything that you can.
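The mirrored layout in item 1 can be derived mechanically from an Active path. The following is a minimal sketch under that assumption. The function name archive_path_for is hypothetical, and it only computes the destination path; it does not move any data.

```shell
# Print the Archive path that mirrors a given Active path.
# Assumption: the path contains exactly the component "/Active/" that
# should become "/Archive/"; only the first occurrence is rewritten.
archive_path_for() {
    active_path=$1
    case "$active_path" in
        */Active/*)
            prefix=${active_path%%/Active/*}   # everything before the first /Active/
            suffix=${active_path#*/Active/}    # everything after the first /Active/
            printf '%s/Archive/%s\n' "$prefix" "$suffix"
            ;;
        *)
            echo "error: not under an Active directory: $active_path" >&2
            return 1
            ;;
    esac
}
```

Called as archive_path_for /storage1/fs1/rvmartin/Active/lbindle/sgv, it prints /storage1/fs1/rvmartin/Archive/lbindle/sgv, which you can then use as the destination path in your transfer.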