AsyncDR Troubleshooting

Introduction

This article lists elfs-cli commands that help troubleshoot AsyncDR issues.

When an AsyncDR issue occurs, run the commands below and review their outputs to determine the cause.

If you need further assistance, contact elastifile-support@google.com and include the outputs of all the commands.

Troubleshooting Commands

elfs-cli replication_service list

List all the replication services in the system.

Verify that all of them are running and active.

elfs-cli remote_site show --id 1

Show the status of the remote system connected to the local one.

Verify that the connection_status between the two sites is connected.
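If you want to script the connection check, the bash function below is one possible sketch. It assumes that elfs-cli prints the connection_status field and its value on the same line; the exact output layout is not documented here, so treat that as an assumption. The function name check_connection_status is hypothetical and not part of elfs-cli.

```bash
# Requires bash. Hypothetical helper (not part of elfs-cli): reads elfs-cli
# output on stdin and succeeds only if a line containing connection_status
# also contains the whole word "connected".
# Assumption: the field name and its value are printed on the same line.
# grep -w ensures values such as "disconnected" or "not_connected" do not match.
check_connection_status() {
  grep -w 'connection_status' | grep -w 'connected' > /dev/null
}
```

For example: elfs-cli remote_site show --id 1 | check_connection_status && echo "remote site connected"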

elfs-cli dc_pair list --data-container-id <DC_ID>

List all the pairs for the specified data container. To find the data container ID, run elfs-cli data_container list.

Verify that the connection_status is connected.

Also check the replication_status; in the optimal case, it is sync.

A replication_status of exceeded_rpo means that replicating the data or snapshot to the remote site takes longer than half of the configured RPO. For example, if the RPO is set to 30 minutes, the data should be replicated within 15 minutes.

Note that exceeded_rpo does not indicate a problem with the Elastifile AsyncDR feature itself; it points to the RPO configuration.
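The RPO/2 rule above can be expressed as a small calculation. The bash function below is a minimal sketch that converts an RPO given in whole minutes into the replication time budget in seconds; the function name rpo_budget_seconds is hypothetical and not part of elfs-cli.

```bash
# Requires bash. Hypothetical helper (not part of elfs-cli): prints the time
# budget, in seconds, within which replication must finish to stay in sync,
# i.e. RPO/2 as described in this article. The RPO is given in whole minutes.
rpo_budget_seconds() {
  local rpo_minutes=$1
  # Convert minutes to seconds, then halve; 60 is even, so no rounding occurs.
  echo $(( rpo_minutes * 60 / 2 ))
}
```

For the example above, rpo_budget_seconds 30 prints 900, which is 15 minutes.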

elfs-cli dc_pair replication_logs --id=<DC_PAIR_ID>

List all the snapshots for the specified pair. The pair ID can be taken from the output of the previous command.

Check how long it took to replicate each historical snapshot.

If the duration is shorter than RPO/2, the dr_status is sync.

If the duration is longer than RPO/2, the dr_status is exceeded_rpo.
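To compare several historical durations against the rule above, the bash function below is a minimal sketch. It assumes you have already copied the durations out of the replication_logs output and converted them to whole seconds, one per line; the function name expected_dr_status is hypothetical and not part of elfs-cli. The article does not say what happens when a duration is exactly RPO/2, so this sketch treats that case as sync.

```bash
# Requires bash. Hypothetical helper (not part of elfs-cli): reads one
# replication duration per line, in seconds, on stdin and prints the dr_status
# this article predicts for it. The first argument is the RPO in minutes.
# Assumption: a duration exactly equal to RPO/2 is reported as sync; the
# article only describes the shorter (sync) and longer (exceeded_rpo) cases.
expected_dr_status() {
  local rpo_minutes=$1
  local budget=$(( rpo_minutes * 60 / 2 ))   # RPO/2 in seconds
  local duration
  while read -r duration; do
    [[ -n $duration ]] || continue            # skip blank lines
    if (( duration <= budget )); then
      echo "${duration}s sync"
    else
      echo "${duration}s exceeded_rpo"
    fi
  done
}
```

For example, printf '600\n1200\n' | expected_dr_status 30 prints "600s sync" followed by "1200s exceeded_rpo", because the budget for a 30-minute RPO is 900 seconds.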

For any other dr_status, such as failed, use the outputs of the previous commands to identify the issue.
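To gather the outputs of all the commands above into one file before contacting support, the bash function below may help. It is a sketch only: it runs exactly the commands listed in this article, uses remote site ID 1 as in the example above, and expects you to supply your own data container ID and pair ID. The function name collect_asyncdr_report, the default report file name, and the ELFS_CLI override variable are hypothetical.

```bash
# Requires bash. Hypothetical helper (not part of elfs-cli): runs the commands
# from this article and writes each command line and its output to one report
# file that can be attached to a support request.
# Usage: collect_asyncdr_report <DC_ID> <DC_PAIR_ID> [report_file]
# ELFS_CLI (hypothetical) overrides the CLI path; it defaults to elfs-cli.
collect_asyncdr_report() {
  if (( $# < 2 )); then
    echo "usage: collect_asyncdr_report <DC_ID> <DC_PAIR_ID> [report_file]" >&2
    return 2
  fi
  local dc_id=$1 dc_pair_id=$2 report=${3:-asyncdr-report.txt}
  local cli=${ELFS_CLI:-elfs-cli}
  local spec
  local -a args
  : > "$report"   # start from an empty report file
  for spec in \
    "replication_service list" \
    "remote_site show --id 1" \
    "data_container list" \
    "dc_pair list --data-container-id $dc_id" \
    "dc_pair replication_logs --id=$dc_pair_id"; do
    read -r -a args <<< "$spec"   # split the command spec into words
    {
      echo "### $cli ${args[*]}"
      # Capture stderr too, and record failures instead of stopping early.
      "$cli" "${args[@]}" 2>&1 || echo "(exited with status $?)"
      echo
    } >> "$report"
  done
}
```

Running the commands in a fixed order and keeping going after a failure means the report still contains every output support asked for, even when one command errors out.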
