Elastifile Backup based on Snapshot Shifting

Overview

Elastifile Cloud File System (ECFS) includes a built-in snapshot scheduling and cooling mechanism that moves consistent snapshots from primary storage to Google Cloud Storage (GCS) in a proprietary ECFS format.

The proprietary format preserves the file structure as well as all file attributes. Data is compressed in transit, and the cooling mechanism is dedup-aware, so only unique data is sent over the wire to GCS.

As with other backup solutions, the first backup is a full backup and all subsequent backups are incremental.

Solution Capabilities and Limitations

The solution backs up data only; it does not back up the file system or the cluster. Recovering from a backup requires remounting and validating the file system.
It is recommended to take one snapshot per day per data container (DC) to avoid performance degradation caused by the scanning process and snapshot deletion.
No more than 30 namespaces (data containers) per ECFS cluster should have the snapshot cooling scheduler enabled.
The system does not enforce the limitations above, so customers are advised to keep them in mind.
Up to 100 snapshots per data container can be stored in GCS.
Restoring a full snapshot from GCS for disaster recovery (DR) is a manual, external procedure that must be carried out with elastifile-support@google.com. For more information, check this KB article.
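
To sanity-check a planned schedule against the 100-snapshot limit, the number of snapshots held in GCS at one time can be estimated from the task values used in the Configuration Guide (repeat-after, cool-after, delete-after, all in minutes). The following is a rough sketch, not part of elfs-cli: the function names are hypothetical, and it assumes a snapshot sits in GCS from the moment it is cooled until it is deleted, with both ages counted from creation.

```shell
# Hypothetical helpers (not part of elfs-cli).

# cooled_snapshots REPEAT_AFTER COOL_AFTER DELETE_AFTER   (all in minutes)
# Prints an upper bound on the snapshots of one data container held in GCS.
cooled_snapshots() {
    repeat=$1; cool=$2; delete=$3
    window=$((delete - cool))          # time each snapshot spends in GCS
    if [ "$window" -le 0 ]; then
        echo 0                         # deleted before it is ever cooled
        return 0
    fi
    # Ceiling division: snapshots created during one window length.
    echo $(( (window + repeat - 1) / repeat ))
}

# check_gcs_limit REPEAT_AFTER COOL_AFTER DELETE_AFTER
# Returns non-zero when the estimate exceeds the documented limit of 100.
check_gcs_limit() {
    count=$(cooled_snapshots "$@")
    if [ "$count" -le 100 ]; then
        echo "ok: at most $count snapshots in GCS"
    else
        echo "over limit: up to $count snapshots in GCS (limit is 100)"
        return 1
    fi
}

# Daily snapshot, cool after 7 days, delete after 10 days.
check_gcs_limit 1440 10080 14400   # ok: at most 3 snapshots in GCS
```

The estimate is only a planning aid; the system itself does not enforce the limit.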

Scope

Elastifile provides the ability to define snapshot schedulers at the data container level as a backup mechanism.

When schedulers are defined on multiple DCs, many snapshots can be deleted at the same time. Such a deletion storm is a heavy operation that can impact system performance.

This article describes how to implement snapshot schedulers through the CLI, which offers capabilities that are not available in the GUI.

The most important of these is the ability to define the start time, which helps overcome the challenge described above.

Using the start time flag, each schedule can be offset from the others to avoid snapshot deletion contention.

Pre-requisites

1. Make sure "Private Google Access" is enabled on the VPC subnetwork

Read the following article on How To Check and Change the Private Google Access setting

2. Object Tier is Available

If the object tier feature is not available, contact elastifile-support@google.com.

Elastifile will evaluate the request. If approved, a suitable license will be provided.

3. Data ILM is disabled

Data ILM is a feature that moves live cold data, not just snapshots, to GCS buckets.

This feature is currently not supported and must be disabled.

To disable the feature on all existing DCs:

for i in $(elfs-cli data_container list | awk '{print $1}' | grep -o '^[0-9]*$'); do
    elfs-cli data_container update --id "$i" --data_ilm data_ilm_disabled --automatic_data_ilm auto_data_ilm_disabled
done

For any new data container, run the following as well:

elfs-cli data_container update --id <NEW_DC_ID> --data_ilm data_ilm_disabled --automatic_data_ilm auto_data_ilm_disabled

4. Activate Object Tier

The object tier must be activated only after step 2 has completed successfully.

5. Set the data tiering policy to 100% primary

Configuration Guide

1. List the data containers in the system and their IDs

[root@schedule ~(elfs_admin)]# elfs-cli data_container list | awk '{print$1, $2}'

id name

------------

1 dc01

2 dc02

3 dc03

4 dc04

2. Create the relevant schedule policies per DC

[root@schedule ~(elfs_admin)]# elfs-cli schedule create --name dc01 --data-container-ids 1

id: 1

name: dc01

state: enabled

data_containers:

id: 1 name: dc01

[root@schedule ~(elfs_admin)]# elfs-cli schedule create --name dc02 --data-container-ids 2

id: 2

name: dc02

state: enabled

data_containers:

id: 2 name: dc02

3. Define the task policy for each schedule (use a 1-hour gap between schedules)

The id is the schedule id created in step 2.
The repeat-after, cool-after, and delete-after values are in minutes.
The start time is defined in UTC.
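
Because the task values are given in minutes and the start time in UTC, small helpers can reduce arithmetic mistakes. The following is a minimal sketch, not part of elfs-cli: the function names are hypothetical, and local_to_utc assumes GNU date with the time zone database installed.

```shell
# Hypothetical helpers (not part of elfs-cli) for preparing create_task values.

# Convert whole days or hours to minutes, the unit used by
# --repeat-after, --cool-after, and --delete-after.
days_to_minutes()  { echo $(( $1 * 24 * 60 )); }
hours_to_minutes() { echo $(( $1 * 60 )); }

# Convert a local wall-clock time to the UTC format used by --start-time.
# Assumes GNU date. Usage: local_to_utc "America/New_York" "2020-04-27 20:00:00"
local_to_utc() {
    date -u -d "TZ=\"$1\" $2" +%Y-%m-%dT%H:%M:%S
}

days_to_minutes 1     # 1440  (repeat daily)
days_to_minutes 7     # 10080
days_to_minutes 10    # 14400
days_to_minutes 30    # 43200
```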

Example 1:

Create a snapshot every day, cool it after 7 days, and delete it after 10 days, starting at 8 PM UTC (after the workday)

[root@schedule ~(elfs_admin)]# elfs-cli schedule create_task --id 1 --name dc01 --type SnapshotIlmTask --repeat-after 1440 --cool-after 10080 --delete-after 14400 --start-time 2020-04-27T20:00:00

id: 1

name: dc01

schedule_id: 1

type: SnapshotIlmTask

start_time: 2020-04-27T20:00:00.000Z

repeat_after: 1440

cool_after: 10080

delete_after: 14400

created_at: Apr 11, 15:37:58

updated_at: Apr 11, 15:37:58

Example 2:

Create a snapshot every 7 days, cool it after 10 days, and delete it after 30 days, starting at 11 PM UTC (not at the same time as the daily snapshots)

[root@schedule ~(elfs_admin)]# elfs-cli schedule create_task --id 2 --name dc02 --type SnapshotIlmTask --repeat-after 10080 --cool-after 14400 --delete-after 43200 --start-time 2020-04-27T23:00:00

id: 2

name: dc02

schedule_id: 2

type: SnapshotIlmTask

start_time: 2020-04-27T23:00:00.000Z

repeat_after: 10080

cool_after: 14400

delete_after: 43200

created_at: Apr 11, 15:44:06

updated_at: Apr 11, 15:44:06
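
When many schedules need tasks, the offsets can be generated instead of typed by hand. The sketch below only prints create_task commands for review and does not run them. It is an illustration under stated assumptions: the function name and the task-<id> task names are hypothetical, the minute values are those of a daily snapshot cooled after 7 days and deleted after 10 days, and GNU date is required.

```shell
# Hypothetical helper (not part of elfs-cli). Prints commands, runs nothing.

# staggered_tasks BASE_UTC GAP_MINUTES SCHEDULE_ID...
#   BASE_UTC     start time of the first schedule, e.g. 2020-04-27T20:00:00
#   GAP_MINUTES  offset between consecutive schedules, e.g. 60
staggered_tasks() {
    # GNU date parses "YYYY-MM-DD HH:MM:SS"; -u makes it read the value as UTC.
    base_epoch=$(date -u -d "$(printf '%s' "$1" | tr 'T' ' ')" +%s) || return 1
    gap_min=$2
    shift 2
    n=0
    for schedule_id in "$@"; do
        start=$(date -u -d "@$((base_epoch + n * gap_min * 60))" +%Y-%m-%dT%H:%M:%S)
        echo "elfs-cli schedule create_task --id $schedule_id --name task-$schedule_id" \
             "--type SnapshotIlmTask --repeat-after 1440 --cool-after 10080" \
             "--delete-after 14400 --start-time $start"
        n=$((n + 1))
    done
}

# Four schedules, one hour apart, starting at 20:00 UTC.
staggered_tasks 2020-04-27T20:00:00 60 1 2 3 4
```

Printing the commands first makes it possible to check each start time before applying anything to the cluster.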

Manage ECFS Snapshots

DC01 has 4 snapshots: 2 are stored locally (in the SSD tier) and 2 are stored in the object tier.

ECFS snapshots are reachable through hidden directories under the mount point, depending on where they are stored: .snapshot for local snapshots and .object for snapshots in the object tier.

[root@client ~]# mkdir /mnt/DC01

[root@client ~]# mount 10.229.255.1:DC01/root /mnt/DC01

[root@client ~]#

[root@client ~]# df -h /mnt/DC01

Filesystem Size Used Avail Use% Mounted on

10.229.255.1:DC01/root 1000G 0 1000G 0% /mnt/DC01

[root@client ~]#

[root@client ~]# cd /mnt/DC01

[root@client DC01]# ll .snapshot

total 0

drwxr-xr-x. 2 root root 0 Apr 26 11:00 local01

drwxr-xr-x. 2 root root 0 Apr 26 11:00 local02

[root@client DC01]#

[root@client DC01]# ll .object

total 0

drwxr-xr-x. 2 root root 0 Apr 26 11:00 object01

drwxr-xr-x. 2 root root 0 Apr 26 11:00 object02

How to Restore a File or Directory from a Snapshot

Elastifile KB
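
Because snapshots appear as ordinary directories under .snapshot and .object, a single file or directory can in principle be copied back with standard tools. The following is a minimal sketch and not the procedure from the KB article: the function name and the example path are hypothetical, and it assumes the snapshot directory mirrors the layout of the data container root. The copy is written next to the original with a .restored suffix so nothing is overwritten.

```shell
# Hypothetical helper (not part of ECFS tooling).

# restore_from_snapshot MOUNT_POINT SNAPSHOT_DIR RELATIVE_PATH
#   MOUNT_POINT    where the data container is mounted, e.g. /mnt/DC01
#   SNAPSHOT_DIR   e.g. .snapshot/local01 or .object/object01
#   RELATIVE_PATH  file or directory to restore, relative to the mount point
restore_from_snapshot() {
    mount_point=$1; snap_dir=$2; rel_path=$3
    src="$mount_point/$snap_dir/$rel_path"
    dest="$mount_point/$rel_path.restored"
    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    mkdir -p "$(dirname "$dest")" || return 1
    # -a keeps permissions, ownership, and timestamps, and copies directories recursively.
    cp -a "$src" "$dest"
}

# Example (path is illustrative only):
# restore_from_snapshot /mnt/DC01 .snapshot/local01 projects/report.txt
```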

For the manual full DR recovery procedure, contact elastifile-support@google.com.

How to Prevent a Snapshot Deletion

In some cases you may want to stop the scheduler from deleting snapshots that already exist in the system, for example when a snapshot of a specific point in time is important, or when a large number of simultaneous snapshot deletions is causing a performance issue.

1. List the snapshots in the system:

elfs-cli snapshot list

2. Choose the relevant snapshot and update it by its id:

elfs-cli snapshot update --id 2 --no-deletion_schedule_mins
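
When several snapshots need to be protected, the same update command can be generated for a list of IDs. The sketch below only prints the commands so they can be reviewed before being run; the function name is hypothetical, and the IDs are assumed to have been read from the output of elfs-cli snapshot list.

```shell
# Hypothetical helper (not part of elfs-cli). Prints commands, runs nothing.

# protect_snapshots ID...
# Prints one "snapshot update" command per numeric snapshot ID.
protect_snapshots() {
    for id in "$@"; do
        case $id in
            ''|*[!0-9]*)
                echo "skipping non-numeric id: $id" >&2
                continue
                ;;
        esac
        echo "elfs-cli snapshot update --id $id --no-deletion_schedule_mins"
    done
}

# Snapshot IDs 2, 5, and 9 are examples only.
protect_snapshots 2 5 9
```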
