File Transfer

Rsync and FTP, here we come.

Everybody knows FTP, but as a Linux user, why not run your own server? If you have more than one computer, or more than one user on your intranet, it sometimes makes sense to share files via FTP. SCP, the secure copy program that uses ssh for authentication and encryption, gives up a lot of throughput to that encryption. With a large file, I could squeeze only about 400 KB/s out of a 100 Mbit LAN connection with SCP. Over FTP there is no such overhead, and I get about 10-11 MB/s from that same LAN. Basically, if your hard drive can keep up and your CPU isn't busy with something else at the time, the link is pretty much saturated (100 Mbit/s / 8 bits per byte = 12.5 MB/s).

The reason I qualified that speed with "large file" is that a small file transfers in well under a second, and then the negotiation for the next file begins and uses up time, so the reported transfer rate drops. For this reason, a large number of small files is best transferred as a single archive. See the documentation for the tar command for more information. In my experience, a tarball compressed with bzip2 comes out at about 80% of the size of the same tarball compressed with gzip, which in turn is about 10%-30% of the original size, depending on the data. Zip has poor compression ratios compared to those two.
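To make the "bundle the small files first" advice concrete, here is a minimal sketch. The function name bundle_dir and the throwaway file names are my own inventions for illustration; only the tar options themselves are standard.

```shell
#!/bin/sh
# bundle_dir SRC_DIR ARCHIVE
# Pack SRC_DIR into one gzip-compressed tarball so that a single large
# file crosses the wire instead of many small ones.
# (bundle_dir is a hypothetical helper name, not a standard tool.)
bundle_dir() {
    src=$1
    archive=$2
    # -C switches to the parent directory first, so the archive stores
    # paths relative to it rather than absolute paths.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}

# Demo with throwaway data in a temporary directory.
work=$(mktemp -d)
mkdir "$work/small-files"
for i in 1 2 3 4 5; do
    echo "file number $i" > "$work/small-files/note$i.txt"
done

bundle_dir "$work/small-files" "$work/small-files.tar.gz"

# List the archive contents to confirm everything made it in.
tar -tzf "$work/small-files.tar.gz"

# For bzip2 compression, use -j in place of -z (tar -cjf ... .tar.bz2).
```

Send the resulting tarball in one transfer and unpack it on the other side with tar -xzf.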

So which FTP server should we run? There is the infamous tftp server, called by the inet daemon (strictly speaking, TFTP is a separate, much simpler protocol with no authentication, so it is not a real alternative), and there is proftpd, which can be run as a standalone daemon or called by inetd, and which I used for quite some time. With proftpd, the configuration is somewhat complex: there is the /etc/ftpusers file that must be examined, and there is also /etc/proftpd.conf, which contains the access permissions for every directory that you want to open to FTP users. In my experience, proftpd also becomes slow in the handshaking stages, like the initial connection, and after a while it hangs when your FTP client requests a LIST of the cwd (current working directory). I began to look for another server. Pure-ftpd came up, and I started using it. There is no configuration file to write. If you have a user called "ftp", then it uses that user's home directory as the directory for anonymous connections. All other users are chroot-ed to their home directory (they cannot browse above it except by symlink), and in my experience it is much more responsive than proftpd. It is recommended to run it as a standalone daemon. Settings such as how many concurrent connections the server will allow, or how many from the same user, are specified on the command line. Watch out though: not all options are available by default. Some must be compiled in, so check the output of "./configure --help" when you go to build it.
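As a rough sketch of what "everything on the command line" looks like, the snippet below only prints a pure-ftpd invocation so you can review it before running it. The helper name build_pureftpd_cmd and the limits 10 and 2 are made up for illustration. The flags are the short options as I know them, and since some features are compile-time options, check "pure-ftpd --help" on your own build before trusting them.

```shell
#!/bin/sh
# build_pureftpd_cmd MAX_CLIENTS MAX_PER_IP
# Print (not run) a pure-ftpd command line with connection limits.
# (build_pureftpd_cmd is a hypothetical helper name.)
build_pureftpd_cmd() {
    max_clients=$1
    max_per_ip=$2
    # -B  run as a standalone daemon in the background
    # -A  chroot every user into their home directory
    # -c  maximum number of simultaneous clients
    # -C  maximum number of simultaneous connections from one IP address
    echo "pure-ftpd -B -A -c $max_clients -C $max_per_ip"
}

# Example values only: 10 clients in total, 2 per IP address.
build_pureftpd_cmd 10 2
```

Note that -C limits connections per IP address, not per user. A per-user limit is one of the features that may have to be compiled in.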

That's about it for FTP; now what about rsync? rsync synchronizes directories and files over the network (even over the loopback interface). It uses a crafty technique to transfer only the differences between files, which works much better than archiving a large directory into a tarball, sending that, then unpacking it on the other side. rsync is only as encrypted as the underlying connection. Talking to an rsync daemon directly, it authenticates with a simple hashed challenge-response that rsync handles itself, and the data is sent unencrypted; alternatively, it can run over ssh. Because the transfer time is greatly reduced anyway, ssh is a good choice: both the encryption and the authentication are better. If you go the daemon route, you have to set up "shares" (rsync calls them modules), just like in samba (which is a beast to configure, btw). These are like pseudonames for the directories that you want rsync to be able to deal with. In the config file, you also specify which users are allowed to connect; these are rsync's own usernames, listed in a secrets file, and need not be system accounts. Over ssh, none of that is needed. Here is the rsync command that I use to synchronize my documents between computers:

rsync -e ssh -a ~pnguyen/Documents/ lilmax88@192.168.0.101:Documents

On my LAN, the other box, 192.168.0.101, is supposed to keep a copy of all of my documents from this computer. Using ssh (192.168.0.101 is configured for pnguyen to connect as lilmax88 with only the public/private key pair and no password), everything is brought up to date. It does chew on the CPU for a minute or less while it figures out what needs to be transferred, but then it's all done and ready to go. On the remote side, :Documents after a single colon is a path relative to the login user's home directory, so it points to ~lilmax88/Documents on that system; an absolute path after the colon works too. A share from the rsync config file is addressed with a double colon instead (192.168.0.101::Documents), and in that case you name the share and cannot specify an absolute path.
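For the daemon-style "shares" mentioned earlier, here is a hedged sketch of what such a config could look like. It writes a sample file called rsyncd.conf.example into the current directory rather than touching /etc/rsyncd.conf. The module path /home/lilmax88/Documents, the secrets file location, and the choice of pnguyen as the allowed user are assumptions for illustration, not my actual setup.

```shell
#!/bin/sh
# Write a sample rsync daemon config to a scratch file for review.
# The file name is an example; a real daemon usually reads /etc/rsyncd.conf.
conf=rsyncd.conf.example

cat > "$conf" <<'EOF'
# One module ("share") named Documents.
# The path and secrets file locations are assumptions for this example.
[Documents]
    path = /home/lilmax88/Documents
    comment = Copy of my documents
    read only = false
    auth users = pnguyen
    secrets file = /etc/rsyncd.secrets
EOF

echo "wrote $conf"

# Daemon access uses a double colon followed by the module name:
#   rsync -a ~/Documents/ pnguyen@192.168.0.101::Documents
# Remote-shell access uses a single colon followed by a path
# (relative to the remote user's home directory):
#   rsync -a -e ssh ~/Documents/ lilmax88@192.168.0.101:Documents
# Adding -n to either command does a dry run that only reports
# what would be transferred.
```

The secrets file holds username:password lines for rsync's own users, so pnguyen here does not have to be an account on the remote box.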

Hit me up with any questions! As a Slackware user, I know my stuff!

Registered Linux User #370740 (http://counter.li.org)
