How to move files to your webserver using rsync

I recently moved web hosting and found my new host didn't support FTP. That's a reasonable restriction. After all, plain FTP isn't secure: it sends logon credentials "in the clear", meaning that anyone able to intercept the traffic as it crosses the internet could easily read them.
 
Many hosts offer alternatives such as SFTP or FTPS (explicit), and a range of other more secure options. My specific use case was sending text and HTML (.htm/.html) files from a local Raspberry Pi, generated via Weather Display for Raspberry Pi (ConsoleWD). The files contain weather data and need to go up to my webserver on a regular basis, every minute or so.
 
Weather Display for Raspberry Pi (ConsoleWD) has its own built-in FTP program. Its config.txt file has a number of options, including specifying the port number (check with your host; the usual defaults are 21 for FTP and 22 for SFTP), using curl, and using a passive connection. It is worthwhile exploring these to see if they can work for you.
 
In my case I had no luck, so I decided to head down another route using rsync.
 
rsync is a Linux tool for synchronising files and folders between two machines. In my use case, that means synchronising files in my ConsoleWD folder to a folder on my webserver. rsync can push (send files to a remote server) or pull (retrieve files from a remote server). It can be used for a number of purposes. For example, backup tools such as rsnapshot use rsync to copy backups to remote or local servers.
 
By default rsync runs a "quick check", comparing each file's size and modification time on both sides. If it finds the same filename on both servers but the size or timestamp differs, it sends the file to the remote server again. That's perfect for my use case, where a file called clientraw.txt gets updated regularly and needs to go up to the remote server (my website).
 
The basic rsync command line is the following:
 
rsync options SOURCE DESTINATION
 
Options:
There are a number of options for rsync. We'll work through the ones we need for this example, but if you want to know more, check out https://linux.die.net/man/1/rsync
 
  • -a - use archive mode. It is a quick way of saying you want recursion and want to preserve almost everything. Hard links are a notable omission; add -H if you need them preserved.
  • -P - Show progress while transferring, and retain partial file transfers
  • -e  - Allows you to choose an alternative remote shell program to use for communication between the local and remote copies of rsync. In our case we'll be using SSH
  • --include - allows you to specify a list of files or file mask to be used as a filter to include files
  • --exclude - allows you to specify a list of files or file mask to be used as a filter to exclude files
Source:
 The source is where you want to sync the files from. In my case it's my Raspberry Pi, since this is where Weather Display (ConsoleWD) saves the text files that contain the current weather conditions.
 
Destination:
 This is where we want the synced files to go to. In my case it's my webserver, including the location that I want the files to be synced to. The destination string contains the user logon details, in the form:
 
username@example.com:<path to storage location>
 
You can use the rsync command without the password and it will prompt you to enter the password, so when figuring out your source and destination strings, you can easily do this from the Raspberry Pi Terminal.
 
 

Authentication

Obviously, typing a password into the terminal each time you want to sync updated data isn't something you would want to do. We need something that is going to manage the authentication process for us, and this is where another Linux tool comes in: SSH.
I've posted before about SSH and we even used it to send a webcam image from the Pi to our server. 
 
Key-based SSH authentication relies on a key pair. A private key and its matching public key are generated on the local machine, then the public key is uploaded to the remote server. When an SSH connection is made, the server checks that the client holds the private key matching the public key it has on file, and if it does, the authentication is complete.
 
I'm not going to go through the details of how to create the key pair in this article. This is already covered in the authentication article we did for the webcam. If you already have a key pair set up, you can use that, or if you want to, you can create a new pair specifically for the data transfer. Just give it a unique name when setting the key pair up.
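For orientation only, here is a rough sketch of what a dedicated key pair looks like. It is not a substitute for the webcam article. The key is written to a temporary directory standing in for /home/pi/.ssh, the comment text is made up, and it assumes the OpenSSH ssh-keygen tool is installed.

```shell
#!/bin/bash
# Sketch only: generates a key pair in a temporary directory, not in ~/.ssh.
set -eu
keydir=$(mktemp -d)
key="$keydir/pidata_rsa"

# -N "" sets an empty passphrase so a cron job can use the key unattended;
# -C adds a label; -q suppresses the progress output.
ssh-keygen -q -t rsa -b 4096 -N "" -C "pidata rsync key" -f "$key"

# Two files now exist: the private key (stays on the Pi) and the .pub file
# (the part that gets uploaded to the webserver).
ls "$keydir"

# ssh-keygen -y re-derives the public key from the private key. Comparing the
# key type and key data shows the two files really are a matching pair.
derived=$(ssh-keygen -y -f "$key" | awk '{print $1, $2}')
published=$(awk '{print $1, $2}' "$key.pub")
if [ "$derived" = "$published" ]; then pair_matches=yes; else pair_matches=no; fi

echo "pair matches: $pair_matches"
rm -rf "$keydir"
```

The public key still has to be added to the authorised keys on the host before it can be used, which is the part the webcam article walks through.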
 
Once you have the key pair installed, we are ready to move on and build the rsync command to sync your files.
 

Building the rsync command line

Following the above, I can immediately throw in a few things. 
 
rsync -aP /home/pi/consolewdfiles/ username@example.com:/home/.../public_html/pidata
 
  • For username@example.com, put in your own username and your site's hostname. In my case my host only allowed the cPanel user to log on this way, so it was that user.
  • For /home/.../public_html/pidata put in the location of the directory you want the files to be synced to. The easiest way to find this is to go to the file manager on your site's control panel (eg cPanel), locate the directory and look at the top of the page. It will show the location you are at, and it probably looks similar to the one I've provided.
 
The above is enough to run in the Pi's terminal and you will then get prompted for a password. Once you enter the password it will copy all the files and folders in your consolewdfiles folder up to your server. You can try this if you want to test the command line, and then delete the files on your server afterwards.
 
Again, you probably don't want to enter the password each time, so let's add in the ssh commands. If you are interested there is more info on SSH here. For our scenario this part looks like this:
 
ssh -i /home/pi/.ssh/pidata_rsa
 
This says: open an SSH session using the specified identity file (-i), followed by the location of the private key you created. Adding that into the rsync command line above we get:
 
rsync -aPe "ssh -i /home/pi/.ssh/pidata_rsa" /home/pi/consolewdfiles/ username@example.com:/home/.../public_html/pidata
 

Using rsync --include and --exclude

Finally, you probably don't want to copy the whole directory up. rsync only copies updated files on later runs, but you still end up with a full copy of your consolewdfiles directory sitting on the server, which is a bit of a waste of space (backup, anyone?).
 
About all that you need for your website will be any .txt, .htm, and .html files. This will get any data files your template can use, and any specifically created webpages, up to your remote server. We can do that as follows:
 
--include '*.txt' --include '*.htm*' --exclude '*' --exclude '*/'
 
This tells rsync to include any files ending in .txt, any files whose extension starts with .htm (which covers both .htm and .html), and to exclude everything else, including subdirectories. Because '*' already matches directories, the final --exclude '*/' is redundant but harmless. You should be able to modify this to suit your needs.
 
Note that the order of the rules is important. rsync checks each file against the rules from left to right, and the first rule that matches wins. If you put the exclude rules first, every file would match '*' and be excluded before the include rules were ever consulted!
 
Putting this back into our rsync command we now get:
 
rsync -aPe "ssh -i /home/pi/.ssh/pidata_rsa" --include '*.txt' --include '*.htm*' --exclude '*' --exclude '*/' /home/pi/consolewdfiles/ username@example.com:/home/.../public_html/pidata
 
If you run this command from the Pi Terminal, it will then send the .txt, .htm, and .html files up to the server. Run it again and it will only send the updated files up to the server.
 

Automating and tidying up.

To automate this, you can use a cron job. I also wrote about this in an article on getting the webcam images up to the server, so you can read further about how to do that here. I'm not a big fan of having long, complicated command lines in my crontab, so I prefer to put everything into a bash script that I can call from the crontab. It also lets me add comments to the script that will help me troubleshoot should I get stuck further down the track.
 
Create a file in your consolewdfiles folder called rsync_data.sh (for example), and add your rsync command to it. Mine looks similar to the following:
 
#!/bin/bash
# Print each command before it runs, so it shows up in the log
set -x
# rsync command to sync files
rsync -aPe "ssh -i /home/pi/.ssh/pidata_rsa" --include '*.txt' --include '*.htm*' --exclude '*' --exclude '*/' /home/pi/consolewdfiles/ username@example.com:/home/.../public_html/pidata
 
Then save the file.
 
Go to the Raspberry Pi Terminal and type in:
crontab -e
 
Add the following to the bottom of the file:
* * * * * /bin/bash /home/pi/consolewdfiles/rsync_data.sh >> /home/pi/templogs/wddatacron.log 2>&1
 
The five asterisks tell cron to run the command every minute. Change this if you need to. There is more detail here.
  • /bin/bash is the location of bash, which will run the script. Usually this doesn't need changing
  • >> /home/pi/templogs/wddatacron.log 2>&1 appends the script's output and errors to a log file, which is handy initially to see if there are any errors. The templogs directory must already exist. You can remove this bit when you are happy with your results, and no longer need the logs
 
If your crontab editor is nano, press Ctrl-X to exit, then Y and Enter to save as you go out. 
 
And that's it. You'll now have your files heading up to your website once a minute. There is still a bit of tidying that can be done. For example, instead of listing all the conditions for the includes and excludes, you can load them into two files and pass those to rsync with the --include-from and --exclude-from options. That would add complexity, but make the command line a bit shorter.
 
One potential issue with this approach is that if clientraw.txt is being written at the same moment rsync reads it, you may end up with an incomplete file being sent to the server. I haven't yet seen any issues with this, and I would expect things to tidy themselves up on the next sync. Still, there is a risk that the data may not populate correctly onto the website for a short period of time.
 
Chris, the forum administrator at weather-watch.com, has come up with a solution for this. It involves setting up a "watch" directory: when files in it are updated, they are sent to the webserver in a way that avoids the broken file issue. You can read more about it here.
 
  

 
 
 
 
 
 
