Data Storage#
This is a rolling copy of the group’s storage policy.
Our group’s allocation is 1.5 PB of Active plus 505 TB of Archive. Our Active allocation is at /storage1/fs1/rvmartin/Active (1 PB) and /storage1/fs1/rvmartin2/Active (0.5 PB). Our Archive allocation is at /storage1/fs1/rvmartin/Archive (500 TB) and /storage1/fs1/rvmartin2/Archive (5 TB).
The Active allocation is where you should put data that you are currently working with. Once a project finishes you should move its data to Archive.
Data Transfer between Active and Archive#
Globus (Recommended): The Globus endpoints for Storage1 are named RIS Storage1, Wash U RIS storage1 dtn1 and Wash U RIS storage1 dtn2.
File status can be checked with mmlsattr -L $FILE_PATH, where a Misc attributes value of ARCHIVE OFFLINE means the file has been moved from cache to tape, and ARCHIVE means the file is on cache.
Globus transfers files on cache seamlessly and quickly. Files on tape experience unpredictable delays, ranging from several minutes to hours, and the transfer will probably report an endpoint error.
Manually bringing files from tape to cache in advance, called rehydration, can speed up Globus transfers considerably. To rehydrate a file, run file $FILE_PATH from a home node (compute1-client-*). The file command also supports wildcards (*) and rehydrating all files in a directory.
When you restart a Globus transfer task, check the sync box under Transfer & Timer Options on Globus (it is unchecked by default).
The safe-transfer script in the /Shared directory.
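The status check and rehydration steps above can be combined into a small helper. This is only a sketch: the function names residency_from_attrs and rehydrate_if_offline are made up for illustration, and it assumes the mmlsattr -L output contains a line such as "Misc attributes: ARCHIVE OFFLINE", as described above.

```shell
# Classify where a file lives, reading `mmlsattr -L` output on stdin.
# Assumption: the output has a "Misc attributes" line containing either
# ARCHIVE OFFLINE (file is on tape) or ARCHIVE (file is on cache).
residency_from_attrs() {
  attrs=$(sed -n '/Misc attributes/p')
  case "$attrs" in
    *OFFLINE*) echo tape ;;
    *ARCHIVE*) echo cache ;;
    *)         echo unknown ;;
  esac
}

# Rehydrate a single file only if it is currently on tape.
# Intended to be run from a compute1-client-* node, where mmlsattr exists.
rehydrate_if_offline() {
  state=$(mmlsattr -L "$1" | residency_from_attrs)
  if [ "$state" = tape ]; then
    file "$1" > /dev/null   # the rehydration command described above
  fi
  echo "$1: $state"
}

# Hypothetical usage, before starting a Globus transfer of one directory:
#   for f in "$DIR"/*; do rehydrate_if_offline "$f"; done
```

Checking the status first avoids touching files that are already on cache, and the printed states give a quick picture of how much of a directory still sits on tape.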
Checking usage#
Monthly usage reports for the rvmartin allocation are generated in /storage1/fs1/rvmartin/Active/Shared/StorageReports. Run the ./view_most_recent_usage_report command:
$ cd /storage1/fs1/rvmartin/Active/Shared/StorageReports
$ ./view_most_recent_usage_report
Monthly usage reports for the rvmartin2 allocation are generated in /storage1/fs1/rvmartin2/Active/Shared/StorageReports. Run the ./view_most_recent_usage_report command:
$ cd /storage1/fs1/rvmartin2/Active/Shared/StorageReports
$ ./view_most_recent_usage_report
A combined usage report for both the rvmartin and rvmartin2 allocations is generated by running the ./view_both_allocations_usage_report command:
$ cd /storage1/fs1/rvmartin/Active/Shared/StorageReports
$ ./view_both_allocations_usage_report
The controls are below.
q - quit
j - down
k - up
/ - search for text
n - next match
N - previous match
To view the total Active storage space left on the rvmartin allocation:
$ mmlsquota --block-size auto -j rvmartin_active rdcw-fs1
To view the total Active storage space left on the rvmartin2 allocation:
$ mmlsquota --block-size auto -j rvmartin2_active rdcw-fs2
Policy#
This policy is not intended to impose any restrictions on your usage of Compute1 or Storage1. Rather, it is intended to facilitate the data lifecycle of the group.
The Archive directory should have the same structure as the Active directory. If you are archiving rvmartin/Active/lbindle/sgv, it should go in rvmartin/Archive/lbindle/sgv.
If you are done with a directory and it can be archived, move it to Archive. If at any point you need to bring it back, it’s easy to do with Globus (though it might take several days if the directory is large).
Before you leave the group, archive your entire project directory.
Every 6 months to a year, go through your directories and archive anything that you can.
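Because the policy asks for Archive to mirror the structure of Active, the destination for any directory can be derived mechanically from its Active path. The following is a minimal sketch; the function name archive_path_for is hypothetical and not part of any group tooling.

```shell
# Print the Archive path that mirrors a given Active path.
# Replaces the first "/Active/" component with "/Archive/" and leaves
# the rest of the path unchanged. Fails if the path is not under Active.
archive_path_for() {
  case "$1" in
    */Active/*)
      # ${1%%/Active/*} is everything before the first /Active/,
      # ${1#*/Active/} is everything after it.
      printf '%s/Archive/%s\n' "${1%%/Active/*}" "${1#*/Active/}"
      ;;
    *)
      echo "not an Active path: $1" >&2
      return 1
      ;;
  esac
}

# Example, using the path from the policy above:
#   archive_path_for /storage1/fs1/rvmartin/Active/lbindle/sgv
#   prints /storage1/fs1/rvmartin/Archive/lbindle/sgv
```

The printed path can then be used as the destination when setting up the Globus transfer, which keeps the two trees aligned without retyping paths by hand.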