How to migrate data utilizing multiple TCP connections for NFSv3 (nconnect)

Background

For all Elastifile versions, a single NFS client opens a single TCP connection to the system, and all meta-data and data operations are serviced over that single connection.

An enhancement in the Linux kernel (nconnect) allows a single client to open multiple TCP connections for a single NFS session.

For migrating data between data-containers, "nconnect" can be enabled so that the client distributes meta-data and data requests over multiple TCP connections.

Create a client instance

In the same project and VPC as the Elastifile fileserver, create a new instance with the following minimum specs:

  1. 16 vCPUs
  2. 16 GB memory
  3. 512GB SSD boot disk
  4. Ubuntu image ubuntu-minimal-2010-groovy-v20201022

The high CPU count and large boot disk remove any throttling imposed on the client instance.

The Linux distribution can be any available as long as the kernel version is >= 5.3, the first kernel release with nconnect support.
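A quick way to confirm the running kernel meets that minimum (a sketch, assuming the >= 5.3 requirement for nconnect):

```shell
# Compare the running kernel version against the nconnect minimum (5.3)
# using a version-aware sort.
required="5.3"
current="$(uname -r | cut -d- -f1)"
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" = "$required" ]; then
    echo "kernel $current supports nconnect"
else
    echo "kernel $current is too old for nconnect"
fi
```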

Verifying client to fileserver latency

  1. Make sure the client instance network is on the correct VPC
  2. Ping the source and target fileserver VIPs to ensure latency is low

root@lin-02:~# ping -c 3 10.255.255.1

PING 10.255.255.1 (10.255.255.1) 56(84) bytes of data.

64 bytes from 10.255.255.1: icmp_seq=1 ttl=64 time=0.397 ms

64 bytes from 10.255.255.1: icmp_seq=2 ttl=64 time=0.327 ms

64 bytes from 10.255.255.1: icmp_seq=3 ttl=64 time=0.394 ms

 

root@lin-02:~# ping -c 3 10.254.255.1

PING 10.254.255.1 (10.254.255.1) 56(84) bytes of data.

64 bytes from 10.254.255.1: icmp_seq=1 ttl=64 time=1.42 ms

64 bytes from 10.254.255.1: icmp_seq=2 ttl=64 time=0.450 ms

64 bytes from 10.254.255.1: icmp_seq=3 ttl=64 time=0.286 ms

High network latency will degrade NFS performance for data sets with large numbers of small files.
Configuring VPC peering between the source and target subnets reduces latency.

Mount the fileserver from the client

In this example the source fileserver VIP is 10.255.255.1 and the target is 10.254.255.1

Create a directory on the client instance for each fileserver

mkdir -p /mnt/10.255.255.1/src_dc

mkdir -p /mnt/10.254.255.1/tgt_dc

Mount the source and target on the client instance

mount -t nfs -o noatime,nodiratime,actimeo=120,nconnect=16 10.255.255.1:/src_dc/root  /mnt/10.255.255.1/src_dc

mount -t nfs -o noatime,nodiratime,actimeo=120,nconnect=16 10.254.255.1:/tgt_dc/root  /mnt/10.254.255.1/tgt_dc

"noatime,nodiratime" prevents unnecessary meta-data updates
"actimeo=120" prevents unnecessary meta-data lookups
"nconnect=16" enables 16 TCP connections for each session

Verify "nconnect" is active for the NFS mounts

netstat -anpt | grep "10.255.255.1:2049" | wc -l

netstat -anpt | grep "10.254.255.1:2049" | wc -l

Install "rclone" on client instance

Standard utilities used to perform data copies can be single threaded and may not effectively use all the available CPU resources. In internal testing, rclone has been shown to fully saturate the client CPUs.

Full documentation and source code for rclone can be found at rclone.org

curl https://rclone.org/install.sh | sudo bash
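Once the installer finishes, it is worth confirming the binary landed on the PATH; a small sketch:

```shell
# Report the installed rclone version, or flag a failed install.
if command -v rclone >/dev/null 2>&1; then
    rclone version | head -n 1
else
    echo "rclone is not installed"
fi
```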

Start "rclone" to synchronize data

rclone sync /mnt/10.255.255.1 /mnt/10.254.255.1 --checkers 256 --transfers 256 -v  --log-file=/tmp/sync.log
"/tmp" as the output directory for the log file prevents any log prints slowing down the copy process