Rclone backups to AWS
Set up Rclone 1.62.2 for AWS backups from HPC
Set up the IAM user permissions in AWS
Perform the following before configuring rclone.
Log into the AWS console
Navigate to IAM
Click Add User
name: rclone-backup-from-hpc
Do not check "Provide user access to the AWS Management Console", as this account will be used for CLI-based access via rclone.
Click Next > Select "Attach policies directly"
Search for S3 and select "AmazonS3FullAccess". If this AWS account already has multiple buckets created it's up to the end user to specify a more granular access policy to S3 for this new IAM user.
Click Next > Create User
Now that the user is created, click on the user in the IAM console.
Navigate to Security Credentials and locate the Access Keys section.
Click "Create access key" and select "Applications running outside of AWS"
Click Next
For Description enter "rclone-backups-resnick-hpc" > Create Access Key
Leave this window open and access keys visible during the configuration of rclone on the cluster side.
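For accounts that already hold other buckets, the more granular policy mentioned in the IAM steps above could look roughly like the following sketch. It writes a policy document scoped to a single bucket. The output file name is hypothetical, the bucket name assumes the resnick-hpc-backups bucket used later in this guide, and the action list is an assumption modeled on the minimum permissions described in the rclone S3 documentation; adjust it to your own requirements.

```shell
# Assumptions: bucket name taken from this guide; output file name is hypothetical.
BUCKET="resnick-hpc-backups"
POLICY_FILE="${TMPDIR:-/tmp}/rclone-backup-policy.json"

# Write an IAM policy limited to one bucket and the objects inside it.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:CreateBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectAcl",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${BUCKET}",
        "arn:aws:s3:::${BUCKET}/*"
      ]
    }
  ]
}
EOF
echo "Policy written to $POLICY_FILE"
```

The contents of the generated file can be pasted into the JSON policy editor in the IAM console and attached to the user in place of "AmazonS3FullAccess".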
Configure Rclone
module load rclone/1.62.2
rclone config
Enter name for new remote.
name> my-aws-account
Select S3 compatible object store. (This same backend can also be used for Backblaze B2, Wasabi, etc.)
Storage > 5
Now select the actual provider of this S3 compatible object storage
Provider > 1
Option env_auth. Specify 1 to enter your credentials in the next steps. (Option 2 instead reads the credentials from your shell environment.)
env_auth > 1
Enter the new access key ID shown in the AWS console during IAM creation.
access_key_id > AKIAX4EICHL66XXXXXXX
Now enter the secret access key provided by the AWS console.
secret_access_key > aa8jXXXXXXXXXXXXXXXXXXXXXXXXX
Select a region for the storage bucket. We recommend us-west-2 (Oregon): it is one of the newer regions, is seismically distinct from California, and provides fast network access via CENIC/Internet2.
region > 4
Option Endpoint (leave default by hitting enter)
endpoint > (enter)
Location constraint (used when creating buckets via rclone commands). Set to option 4 (Oregon us-west-2)
location_constraint > 4
Optional ACL (just hit enter to leave blank)
acl > (enter)
Optional server side encryption. Shown disabled below, set accordingly.
server_side_encryption > (enter)
Option sse_kms_key_id
sse_kms_key_id > (enter)
Option Storage Class
Standard IA (Good for fast recovery with the intention of not touching the data much). Set accordingly based on requirements and full understanding of the implications.
storage_class > 4
Edit advanced config? No
y/n > n
-------------------------------
Configuration complete.
Options:
- type: s3
- provider: AWS
- access_key_id: AKIAX4EICHL66XXXXXXX
- secret_access_key:
aa8jXXXXXXXXXXXXXXXXXXXXXXXXX
- region: us-west-2
- location_constraint: us-west-2
- storage_class: STANDARD_IA
Keep this "my-aws-account" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d>
-------------------------------
Enter yes to save config.
y/e/d> y
Quit the config by entering q
e/n/d/r/c/s/q> q
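The same remote can likely be created without the interactive prompts. The sketch below assumes that rclone config create accepts key=value pairs, as documented for recent rclone releases, and uses env_auth=true so that the keys are read from the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables rather than stored in the config file. The function name and the RCLONE override variable are illustrative and not part of rclone.

```shell
# RCLONE may be set to a full path; "rclone" on PATH is assumed otherwise.
RCLONE="${RCLONE:-rclone}"

# Create an S3 remote non-interactively.
# $1 = remote name, e.g. my-aws-account
create_s3_remote() {
    "$RCLONE" config create "$1" s3 \
        provider=AWS \
        env_auth=true \
        region=us-west-2 \
        location_constraint=us-west-2 \
        storage_class=STANDARD_IA
}
```

Usage would be `create_s3_remote my-aws-account` after exporting the two AWS variables. Note that cron jobs do not inherit variables exported in an interactive shell, so storing the keys via the interactive configuration may be simpler for scheduled backups.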
Now create the new bucket for backups inside of the AWS account. Here we are creating the resnick-hpc-backups bucket.
rclone mkdir my-aws-account:/resnick-hpc-backups/
Notes
* Rclone stores configs and AWS keys in ~/.config/rclone/rclone.conf.
* If using one of the deeper storage classes such as GLACIER, DEEP_ARCHIVE, or GLACIER_IR, you'll want to familiarize yourself with its limitations, retention periods, and retrieval fees.
* To specify a remote, be sure to include the colon, e.g. my-aws-account:
* Backups are self-managed. You will need to check on the backup process now and then to ensure everything is backing up as intended.
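Because the rclone config file holds the AWS keys in plain text, it may be worth restricting it to its owner. The following is a minimal sketch that does so and confirms that a remote is defined in the file. It assumes the default per-user config path unless the RCLONE_CONFIG environment variable points elsewhere; the function name is hypothetical.

```shell
# rclone honours RCLONE_CONFIG; otherwise the default per-user path is assumed.
CONF="${RCLONE_CONFIG:-$HOME/.config/rclone/rclone.conf}"

# Restrict the config file to its owner and confirm a remote is defined in it.
# $1 = config file path, $2 = remote name (without the trailing colon)
secure_rclone_conf() {
    conf="$1"
    remote="$2"
    if [ ! -f "$conf" ]; then
        echo "no config file at $conf"
        return 1
    fi
    chmod 600 "$conf"
    # Remotes appear as section headers, e.g. a line reading [my-aws-account]
    if grep -qxF "[$remote]" "$conf"; then
        echo "remote $remote found"
    else
        echo "remote $remote not found"
        return 1
    fi
}
```

For the remote configured in this guide, the call would be `secure_rclone_conf "$CONF" my-aws-account`.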
Useful Links
[Amazon S3 Glacier Storage Classes | AWS](https://aws.amazon.com/s3/storage-classes/glacier/)
[AWS Pricing Calculator](https://calculator.aws/#/)
https://rclone.org/s3/
https://rclone.org/commands/
Common commands
Basic ls
rclone ls my-aws-account:/resnick-hpc-backups/
Basic copy command
rclone copy ~/my-source-directory my-aws-account:/resnick-hpc-backups/
Add the -P (--progress) flag to view real-time transfer statistics.
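Two further commands can help when checking that backups behave as intended: a dry run that previews what a copy would transfer, and a one-way check that reports files missing or different on the destination. The sketch below wraps them in small functions; the function names and the RCLONE override variable are illustrative, while --dry-run and --one-way are standard rclone flags.

```shell
# RCLONE may be set to a full path; "rclone" on PATH is assumed otherwise.
RCLONE="${RCLONE:-rclone}"

# Show what a copy would transfer without changing anything on the remote.
# $1 = source directory, $2 = destination (remote:path)
preview_copy() {
    "$RCLONE" copy "$1" "$2" --dry-run
}

# Compare source against destination; --one-way only requires that every
# source file exists and matches on the destination.
verify_copy() {
    "$RCLONE" check "$1" "$2" --one-way
}
```

For example, `preview_copy ~/my-source-directory my-aws-account:/resnick-hpc-backups/` before the first real copy, and `verify_copy` with the same arguments afterwards.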
Scheduling recurring backups with crontab
To schedule a job that runs repeatedly, add it to your crontab. As a test, we suggest running the job at 5-minute intervals to confirm the backups are capturing new data on the cluster side (i.e. let it run every 5 minutes while adding data to the cluster/source directories, then check S3 to verify the files are showing up).
crontab -e
*/5 * * * * /central/software/rclone/1.62.2/rclone copy ~/Google-GCP my-aws-account:/resnick-hpc-backups/
After verifying the data is being backed up successfully, you can switch to a daily or less frequent backup. Example for once a day at 4 AM:
0 4 * * * /central/software/rclone/1.62.2/rclone copy ~/Google-GCP my-aws-account:/resnick-hpc-backups/
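With short intervals, a new cron run can start while the previous copy is still in progress. One way to guard against that is a small wrapper that takes a lock before calling rclone and writes a log that can be reviewed later. The following is a sketch under stated assumptions: the lock directory, log path, and function name are hypothetical, the source and destination are the ones used in the crontab examples above, and --log-file and --log-level are standard rclone flags.

```shell
#!/bin/sh
# Defaults mirror the crontab examples; every value can be overridden.
RCLONE="${RCLONE:-/central/software/rclone/1.62.2/rclone}"
SRC="${SRC:-$HOME/Google-GCP}"
DEST="${DEST:-my-aws-account:/resnick-hpc-backups/}"
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/rclone-backup.lock}"   # hypothetical path
LOG="${LOG:-$HOME/rclone-backup.log}"                      # hypothetical path

run_backup() {
    # mkdir is atomic, so only one run at a time can take the lock.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "previous backup still running, skipping" >> "$LOG"
        return 0
    fi
    "$RCLONE" copy "$SRC" "$DEST" --log-file "$LOG" --log-level INFO
    status=$?
    # Release the lock and report rclone's exit status to the caller.
    rmdir "$LOCKDIR"
    return "$status"
}
```

Saved as an executable script that ends with a call to `run_backup`, this could be referenced from the crontab entry in place of the direct rclone command. If a run is killed before it releases the lock, the lock directory has to be removed by hand.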