Elastifile on GCP best practices

Overview

System limitations and Boundaries

Before you deploy

Deployment best practices

HA modes

Cluster type

DC configuration

Snapshot to object

Elastifile Cloud File System Capacity Planning (Add nodes)

Expand Elastifile Cloud file system size

 

Overview

This document gives a high-level overview of Elastifile on GCP best practices. As with any software product, boundaries and best practices change with each new release. Please reach out to elastifile-support@google.com to get the most up-to-date document.

System limitations and Boundaries

  • Number of Data Containers (DCs) - exceeding the DC limit can slow down the management UI or cause missed snapshot schedules and missed Async DR RPO targets
    • Up to 50 with snapshot scheduler or Snapshot to Object 
    • Up to 4 with Async DR
  • Max DC capacity - no limit
  • Files/folders per DC
    • 4M for Async DR with a short RPO (30 min); a longer RPO can support more files
    • 10M in a single directory (not total files on the DC) for snapshot cooling
  • Number of nodes - 32
  • Max clients per node - 1000
  • AsyncDR
    • Supported RPO of 30 min under the following conditions:
      • Number of DCs to be replicated - 4
      • Number of RPAs - 1 per DC
      • Number of files per DC - 4M
      • Daily change rate - 5%
      • If any of these parameters is exceeded, meeting the RPO is not guaranteed
  • Snapshots 
    • Number of Snapshots per DC - 255
    • Number of Snapshots per cluster - number of DCs x number of Snapshots per DC
  • Snapshot to Object
    • Frequency - one per day
    • History - one month (30 snapshots)
  • CSI limitations
    • K8s version - 1.11 to 1.14
    • OC version - 3.11
    • Deleting a DC that contains data - manual only, not supported through CSI
    • Number of DCs - 50
  • Data Information Life Management (ClearTier) is not supported.
  • Data Deduplication is not supported
  • EMS limitations
    • REST API calls - 104 concurrent requests
    • Parallel UI sessions - 4
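
The limits above can be checked mechanically before a deployment is planned. The following is a minimal sketch, not an Elastifile tool or API: the `ClusterPlan` class, its field names, and the function names are hypothetical and exist only for illustration. The numeric limits are copied from the list above.

```python
from dataclasses import dataclass
from typing import List

# Limits copied from the list above; the constant names are illustrative only.
MAX_NODES = 32
MAX_DCS_WITH_SNAPSHOTS = 50        # snapshot scheduler or Snapshot to Object
MAX_DCS_WITH_ASYNC_DR = 4
MAX_SNAPSHOTS_PER_DC = 255
MAX_FILES_PER_DC_ASYNC_30MIN = 4_000_000


@dataclass
class ClusterPlan:
    """Hypothetical description of a planned cluster (not an Elastifile API)."""
    nodes: int
    data_containers: int
    uses_async_dr: bool
    max_files_per_dc: int
    snapshots_per_dc: int


def check_plan(plan: ClusterPlan) -> List[str]:
    """Return human-readable limit violations; an empty list means none found."""
    problems = []
    if plan.nodes > MAX_NODES:
        problems.append(f"{plan.nodes} nodes exceeds the limit of {MAX_NODES}")
    # The DC limit depends on whether Async DR is in use.
    dc_limit = MAX_DCS_WITH_ASYNC_DR if plan.uses_async_dr else MAX_DCS_WITH_SNAPSHOTS
    if plan.data_containers > dc_limit:
        problems.append(f"{plan.data_containers} DCs exceeds the limit of {dc_limit}")
    if plan.snapshots_per_dc > MAX_SNAPSHOTS_PER_DC:
        problems.append(
            f"{plan.snapshots_per_dc} snapshots per DC exceeds the limit of {MAX_SNAPSHOTS_PER_DC}"
        )
    if plan.uses_async_dr and plan.max_files_per_dc > MAX_FILES_PER_DC_ASYNC_30MIN:
        problems.append("more than 4M files per DC: a 30 min RPO is not guaranteed")
    return problems


def max_snapshots_per_cluster(data_containers: int) -> int:
    """Number of DCs x number of snapshots per DC."""
    return data_containers * MAX_SNAPSHOTS_PER_DC
```

For example, `max_snapshots_per_cluster(50)` evaluates to 12750, the ceiling for a cluster that uses all 50 DCs.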

Before you deploy

Select a network subnet with enough free IP addresses to accommodate twice the size of the largest cluster you will run. An Elastifile non-disruptive upgrade doubles the cluster size for the duration of the upgrade.
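
The subnet sizing rule can be sketched as a quick calculation. This is an illustrative sketch under stated assumptions: the function names are hypothetical, one IP address per storage node is assumed (adjust `ips_per_node` if your deployment differs), and addresses used by the management instance or other VMs must be passed in through `ips_in_use`. It relies on the fact that GCP reserves four addresses in each primary IPv4 subnet range.

```python
import ipaddress

# GCP reserves four addresses in every primary IPv4 subnet range.
GCP_RESERVED_ADDRESSES = 4


def required_free_ips(largest_cluster_nodes: int, ips_per_node: int = 1) -> int:
    """Free IPs needed during a non-disruptive upgrade, which doubles the cluster.

    ips_per_node=1 is an assumption made for this sketch.
    """
    return 2 * largest_cluster_nodes * ips_per_node


def subnet_is_large_enough(cidr: str, ips_in_use: int, largest_cluster_nodes: int) -> bool:
    """Check whether the subnet still has room for twice the largest cluster."""
    network = ipaddress.ip_network(cidr)
    usable = network.num_addresses - GCP_RESERVED_ADDRESSES
    free = usable - ips_in_use
    return free >= required_free_ips(largest_cluster_nodes)
```

Under these assumptions a 32-node cluster needs 64 free addresses, so an empty /26 (60 usable addresses) is too small while a /25 is sufficient.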

For better availability and faster recovery times, Google recommends splitting the workloads between several smaller Elastifile clusters, rather than using one big Elastifile cluster with many Data Containers.

Deployment best practices

HA modes

For production, use "Single Zone High Availability w/ Intra-Zone Replication" or "Cross Zone High Availability".

"Single-Zone High Availability" is not recommended for production use cases, because data is not replicated internally between Elastifile storage nodes and any infrastructure issue will impact system availability.

# terraform.tfvars

DEPLOYMENT_TYPE = "multizone"

or

DEPLOYMENT_TYPE = "dual"

Cluster type

HDD-based clusters are not supported for production.

Small nodes are not intended for production.
If you plan to use Async DR, select Medium-plus or Large nodes.

# terraform.tfvars

TEMPLATE_TYPE = "medium-plus"

or

TEMPLATE_TYPE = "large"

Estimate the target capacity of your cluster and select the node configuration that reaches it with the fewest nodes, leaving room for future growth. If you have performance concerns, please contact elastifile-support@google.com for additional help.

Elastifile Instance Types

Size            Capacity    Min Instances    CPU cores    RAM
Small           0.7TB       3                4            32
Medium          4TB         3                4            42
Medium-plus*    4TB         3                8            64
Large           20TB        3                16           104

*The Medium-plus configuration was created for clusters that use Asynchronous Disaster Recovery but do not require a large capacity.
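
Choosing the configuration that needs the fewest nodes can be sketched as follows. This is a rough planning aid, not an official sizing tool. It assumes the Capacity column is per instance and adds up linearly across nodes, and it ignores replication overhead and performance needs. The `"medium"` key and the function names are hypothetical. The 32-node ceiling comes from the system limits, and Small is left out because it is not intended for production.

```python
import math
from typing import Optional, Tuple

# (capacity per instance in TB, minimum instances), taken from the table above.
# Assumption: capacity is per instance and scales linearly with node count.
INSTANCE_TYPES = {
    "medium": (4.0, 3),
    "medium-plus": (4.0, 3),
    "large": (20.0, 3),
}
MAX_NODES = 32  # node limit from the system limitations list


def nodes_needed(target_tb: float, node_type: str) -> int:
    """Nodes required to reach target_tb, never below the minimum instance count."""
    capacity_tb, min_instances = INSTANCE_TYPES[node_type]
    return max(min_instances, math.ceil(target_tb / capacity_tb))


def pick_node_type(target_tb: float, async_dr: bool = False) -> Optional[Tuple[str, int]]:
    """Return (node type, node count) with the fewest nodes, or None if none fits."""
    # Async DR requires Medium-plus or Large nodes.
    candidates = ["medium-plus", "large"] if async_dr else list(INSTANCE_TYPES)
    best = None
    for node_type in candidates:
        count = nodes_needed(target_tb, node_type)
        if count > MAX_NODES:
            continue  # this type cannot reach the target within the node limit
        if best is None or count < best[1]:
            best = (node_type, count)
    return best
```

A `None` result means no single cluster reaches the target within the node limit, which is a case for splitting the workload across several clusters.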

 

DC configuration

  • Dedup is not supported
  • Data tiering is not supported
Check what quota this DC needs and adjust it accordingly. Thin provisioning is supported. Note that the default quotas are 1TB soft and 1.5TB hard.




 

Snapshot to object

 

Please follow the recommendations in the following KB

 

Elastifile Cloud File System Capacity Planning (Add nodes)

 

The Elastifile Cloud File System can be used up to 100% of disk size (up to 95% in “Single Zone High Availability” mode). When the system becomes full, it enters “Protocols Write Disable Mode”:

  • No new data can be written to the system
  • Data can be deleted
  • Data can be read

To resolve this state, two options are available:

  1. Delete unused data to reclaim space.
  2. Expand the Elastifile Cloud File System by adding one or more storage nodes.

 

Expand Elastifile Cloud File System size

  • To prevent the file system from reaching “Protocols Write Disable Mode”, we recommend adding capacity before the system reaches the threshold.
  • When the system expands, all data is redistributed across all storage nodes by a rebuild operation.
  • From the end user's perspective in the UI, the capacity is added as soon as the new storage nodes are added to the system.
  • In reality, the space is added incrementally in the background by the rebuild operation.
  • Because the file system is still being written to during the rebuild and the dataset may keep growing, a system whose capacity was added too late may enter “Protocols Write Disable Mode” even though the file system does not appear full in the UI.

To prevent the system from entering “Protocols Write Disable Mode” even though capacity was expanded:

  1. We strongly recommend starting to expand the file system (add nodes) when file system utilization is between 70% and 75%.
  2. Adding more than one node at a time speeds up the rebuild process and shortens the window in which the system is exposed to “Protocols Write Disable Mode”.
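
The thresholds described in this section can be turned into a simple monitoring check. The sketch below is illustrative only: the function name and status strings are hypothetical, and the inputs are whatever used and total capacity your monitoring reports. Because capacity shown in the UI during a rebuild can run ahead of the space that is actually available, treat an "ok" result during a rebuild with caution.

```python
EXPAND_THRESHOLD = 0.70               # start adding nodes at 70%-75% utilization
WRITE_DISABLE_THRESHOLD = 1.00        # file system usable up to 100% of disk size
WRITE_DISABLE_SINGLE_ZONE_HA = 0.95   # 95% in "Single Zone High Availability"


def capacity_status(used_tb: float, total_tb: float, single_zone_ha: bool = False) -> str:
    """Classify utilization as 'ok', 'expand-now', or 'write-disabled'."""
    if total_tb <= 0:
        raise ValueError("total_tb must be positive")
    utilization = used_tb / total_tb
    limit = WRITE_DISABLE_SINGLE_ZONE_HA if single_zone_ha else WRITE_DISABLE_THRESHOLD
    if utilization >= limit:
        return "write-disabled"   # no new writes; reads and deletes still work
    if utilization >= EXPAND_THRESHOLD:
        return "expand-now"       # add one or more nodes before the system fills up
    return "ok"
```

Using the lower end of the 70%-75% range as the trigger is a conservative choice that leaves the rebuild more time to complete.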


 
