Replication Agent Limitations

The Elastifile Replication Agent (also known as the Replication Service) handles several tasks within the Elastifile cluster, such as asynchronous replication, snapshot cooling, and deleting snapshots from object storage.

Every Replication Agent can handle up to 3 tasks in parallel, with the following limitations:

  • up to 2 concurrent snapshot cooling tasks (no more than 1 per data container)
  • up to 3 concurrent tasks that delete snapshots from object storage and/or perform asynchronous replication

For example, a mix of 1 asynchronous replication task and 2 cooling tasks is valid.
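
The limits above can be read as a simple admission check. The following Python sketch only illustrates the rules as stated in this article and is not Elastifile code; the `Task` class, the task kind labels and the `can_accept` function are hypothetical names introduced for the example.

```python
from collections import Counter
from dataclasses import dataclass

MAX_TASKS = 3    # total tasks one agent runs in parallel
MAX_COOLING = 2  # concurrent snapshot cooling tasks per agent


@dataclass(frozen=True)
class Task:
    kind: str            # "cool", "delete" or "replicate" (labels made up for this sketch)
    data_container: str  # hypothetical data container identifier


def can_accept(running, new_task):
    """Return True if an agent already running `running` could also take `new_task`."""
    tasks = list(running) + [new_task]
    if len(tasks) > MAX_TASKS:
        return False
    cooling = [t for t in tasks if t.kind == "cool"]
    if len(cooling) > MAX_COOLING:
        return False
    # No more than one cooling task per data container on the same agent.
    per_container = Counter(t.data_container for t in cooling)
    return all(count <= 1 for count in per_container.values())


running = [Task("replicate", "dc1"), Task("cool", "dc1")]
print(can_accept(running, Task("cool", "dc2")))  # True: 1 replication + 2 cooling tasks
print(can_accept(running, Task("cool", "dc1")))  # False: second cooling task on dc1
```

The first call reproduces the valid mix mentioned above; the second is rejected because both cooling tasks would belong to the same data container.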

 

If a replication agent receives a new request while it is already at its limit of running tasks, the following error may appear as a system event:

Could not find available replication agent

NOTE: Elastifile retries tasks automatically, so if a task fails because a resource is temporarily unavailable, the next attempt may succeed.

 

Mitigation:

If the above error keeps recurring, increase the number of replication agents in the system, especially when many data containers use asynchronous replication or snapshot cooling.
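
As a rough aid when deciding how many agents to add, the per-agent limits in this article imply a lower bound on the number of agents needed to run a given set of tasks at the same time. The sketch below is plain arithmetic on those limits, not an official sizing formula; `min_agents` and its parameters are hypothetical, and the actual scheduler may need more agents than this bound suggests.

```python
import math


def min_agents(cooling_per_container, other_tasks):
    """Rough lower bound on agents needed to run all the given tasks at once.

    cooling_per_container: dict of data container name -> concurrent cooling tasks wanted
    other_tasks: concurrent asynchronous replication plus delete-from-object tasks
    """
    cooling = sum(cooling_per_container.values())
    total = cooling + other_tasks
    by_total = math.ceil(total / 3)      # at most 3 tasks per agent
    by_cooling = math.ceil(cooling / 2)  # at most 2 cooling tasks per agent
    # One agent runs at most 1 cooling task per data container.
    by_container = max(cooling_per_container.values(), default=0)
    return max(by_total, by_cooling, by_container)


# 3 data containers cooling one snapshot each, plus 4 replication/delete tasks
print(min_agents({"dc1": 1, "dc2": 1, "dc3": 1}, other_tasks=4))  # 3
```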

 

Escalation:

If the system uses AsyncDR, the error might refer to the remote system, in which case the commands below should be executed there.
  1. Make sure all replication agents are active and running by executing the following command:
elfs-cli replication_service list

If any of them are not, check connectivity to the agent on port 10015 and its disk usage from the EMS:

telnet <RA_IP> 10015
ssh <RA_IP> df -h

 

  2. Check which control tasks are currently running in the system by executing the following command:

elfs-cli control_task list --search status=in_progress

For further assistance, contact elastifile-support@google.com by email.

Please attach the output of the commands above for faster troubleshooting.
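
To gather that output in one place, the checks can be wrapped in a small script. This is a minimal Python sketch, assuming it runs on the EMS where `elfs-cli` and `ssh` are available; the function names (`run_command`, `port_is_open`, `build_report`) are hypothetical helpers, and the TCP connection attempt stands in for the manual `telnet` check.

```python
import socket
import subprocess

# Commands taken from the escalation steps in this article.
COMMANDS = [
    ["elfs-cli", "replication_service", "list"],
    ["elfs-cli", "control_task", "list", "--search", "status=in_progress"],
]


def run_command(argv, timeout=60):
    """Run one command and return its stdout and stderr as text."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return result.stdout + result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"failed to run: {exc}\n"


def port_is_open(host, port=10015, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (like the telnet check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_report(agent_ips=(), runner=run_command, port_check=port_is_open):
    """Collect the output of all checks into one text report."""
    sections = []
    for argv in COMMANDS:
        sections.append("$ " + " ".join(argv) + "\n" + runner(argv))
    for ip in agent_ips:
        state = "reachable" if port_check(ip) else "NOT reachable"
        sections.append(f"{ip}:10015 is {state}\n")
        sections.append(f"$ ssh {ip} df -h\n" + runner(["ssh", ip, "df", "-h"]))
    return "\n".join(sections)
```

Calling `build_report` with the list of replication agent IP addresses returns a single string that can be saved to a file and attached to the support email. The `runner` and `port_check` parameters exist only so the function can be exercised without a live cluster.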

 
