Data Storage
This is a rolling copy of the group’s storage policy.
Our group’s allocation is 1.5 PB of Active plus 505 TB of Archive. Our Active allocation is at /storage1/fs1/rvmartin/Active (1 PB) and /storage1/fs1/rvmartin2/Active (0.5 PB).
Our Archive allocation is at /storage1/fs1/rvmartin/Archive (1.5 PB) and /storage1/fs1/rvmartin2/Archive (5 TB).
The Active allocation is where you should put data that you are currently working with. Once a project finishes, you should move its data to Archive.
Data Transfer between Active and Archive
Globus (Recommended): The Globus endpoints for Storage1 are named RIS Storage1, Wash U RIS storage1 dtn1, and Wash U RIS storage1 dtn2.
File status can be checked with:
$ mmlsattr -L $FILE_PATH
In the output, a Misc attributes value of ARCHIVE OFFLINE means the file has been moved from cache to tape, and ARCHIVE means the file is on cache. Globus transfers files on cache quickly and seamlessly, but files on tape cause unpredictable delays, ranging from several minutes to hours, and may show up as an endpoint error.
Manually bringing files from tape to cache in advance, called rehydration, can speed up Globus transfers considerably. To rehydrate a file, run the file command on it from a home node (compute1-client-*):
$ file $FILE_PATH
The file command also supports wildcards (*), so all files in a directory can be rehydrated at once.
When you restart a Globus transfer task, check the sync box under Transfer & Timer Options in Globus (it is unchecked by default).
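The status check and rehydration steps above can be combined in a small script. The following is a minimal bash sketch, assuming the Misc attributes line of mmlsattr -L reads as described above; archive_state and rehydrate_offline are hypothetical helper names, not tools provided by Storage1. The sketch reads only the files reported as being on tape, using the file command as described.

```shell
#!/usr/bin/env bash
# Sketch: classify Storage1 files as on cache or on tape, and rehydrate
# only the ones on tape. Helper names are hypothetical.

# Reads `mmlsattr -L` output on stdin; prints "tape", "cache", or "unknown".
archive_state() {
  local attrs
  attrs=$(grep 'Misc attributes' || true)   # empty if the line is missing
  case "$attrs" in
    *"ARCHIVE OFFLINE"*) echo tape ;;         # moved from cache to tape
    *ARCHIVE*)           echo cache ;;        # still on cache
    *)                   echo unknown ;;
  esac
}

# Runs `file` on every file under DIR that is still on tape.
# Assumes it is run on a compute1-client-* node where mmlsattr and file exist.
rehydrate_offline() {
  local f
  find "$1" -type f -print0 |
    while IFS= read -r -d '' f; do
      if [ "$(mmlsattr -L "$f" | archive_state)" = tape ]; then
        file "$f" > /dev/null                 # reading the file triggers rehydration
      fi
    done
}
```

For example, rehydrate_offline /storage1/fs1/rvmartin/Archive/lbindle/sgv would rehydrate that directory before a Globus transfer. Calling mmlsattr once per file is slow for very large directories; in that case running file on a wildcard, as described above, is simpler.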
Alternatively, you can use the safe-transfer script in the /Shared directory.
Checking usage
Monthly usage reports for the rvmartin allocation are generated in /storage1/fs1/rvmartin/Active/Shared/StorageReports. To view the most recent report, run the ./view_most_recent_usage_report command:
$ cd /storage1/fs1/rvmartin/Active/Shared/StorageReports
$ ./view_most_recent_usage_report
Monthly usage reports for the rvmartin2 allocation are generated in /storage1/fs1/rvmartin2/Active/Shared/StorageReports. To view the most recent report, run the ./view_most_recent_usage_report command:
$ cd /storage1/fs1/rvmartin2/Active/Shared/StorageReports
$ ./view_most_recent_usage_report
A combined usage report for both the rvmartin and rvmartin2 allocations is generated by running the ./view_both_allocations_usage_report command:
$ cd /storage1/fs1/rvmartin/Active/Shared/StorageReports
$ ./view_both_allocations_usage_report
The report viewer uses the following controls:
q - quit
j - down
k - up
/ - search for text
n - next match
N - previous match
To view the total Active storage space left on the rvmartin allocation:
$ mmlsquota --block-size auto -j rvmartin_active rdcw-fs1
To view the total Active storage space left on the rvmartin2 allocation:
$ mmlsquota --block-size auto -j rvmartin2_active rdcw-fs2
Policy
This policy is not intended to impose any restrictions on your usage of Compute1 or Storage1. It is mainly intended to facilitate the group’s data lifecycle.
The Archive directory should have the same structure as the Active directory. For example, if you are archiving rvmartin/Active/lbindle/sgv, it should go in rvmartin/Archive/lbindle/sgv.
When you are done with a directory and it can be archived, move it to Archive. If you later need to bring it back, this is easy to do with Globus (note that it may take several days if the directory is large).
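Because Archive mirrors Active, the Globus destination for an archived directory can be derived from its source path. A minimal bash sketch, assuming paths of the form /storage1/fs1/<allocation>/Active/...; archive_path is a hypothetical helper that only prints the destination and moves nothing.

```shell
#!/usr/bin/env bash
# Sketch: map an Active path to its mirrored Archive path.
# Assumes the layout /storage1/fs1/<allocation>/Active/...
# archive_path is hypothetical; it prints a path and does not move data.
archive_path() {
  local src=$1
  case "$src" in
    /storage1/fs1/*/Active/*) ;;                          # expected layout
    *) echo "not an Active path: $src" >&2; return 1 ;;
  esac
  # Replace the first /Active/ component (the one after the allocation name).
  printf '%s\n' "$src" | sed 's#/Active/#/Archive/#'
}
```

For example, archive_path /storage1/fs1/rvmartin/Active/lbindle/sgv prints /storage1/fs1/rvmartin/Archive/lbindle/sgv, which can then be entered as the destination of the Globus transfer.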
Before you leave the group, archive your entire project directory.
Every 6 months to a year, go through your directories and archive anything that you can.
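For the periodic review, a sketch like the following can list top-level subdirectories in which no file has been modified recently. It assumes GNU find (for -quit) and treats file modification time as a rough proxy for a finished project; the name stale_dirs and the 180-day default are illustrative choices, not group policy.

```shell
#!/usr/bin/env bash
# Sketch: list archive candidates under a project directory.
# A top-level subdirectory is reported when no regular file inside it
# (at any depth) was modified within the last DAYS days.
# Usage: stale_dirs DIR [DAYS]   (DAYS defaults to an illustrative 180)
stale_dirs() {
  local root=$1 days=${2:-180} d
  for d in "$root"/*/; do
    [ -d "$d" ] || continue                  # skip if the glob matched nothing
    # Look for one recently modified file, then stop searching.
    if [ -z "$(find "$d" -type f -mtime "-$days" -print -quit)" ]; then
      printf '%s\n' "${d%/}"                 # nothing recent: archive candidate
    fi
  done
}
```

For example, stale_dirs /storage1/fs1/rvmartin/Active/lbindle 365 would list subdirectories with no changes in the past year. The output is only a starting point; whether a directory can actually be archived is still your call.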