SHARK-FSW Data Transfer

Last modified by Kerwin Olfers on 2023/12/13 12:11

Prerequisites

Access to Shark

 A suitable destination

  • Your personal P-drive has limited storage capacity.
  • The J-drive, on the other hand, offers more storage space, usually in /Public/ResearchData/. There you will find folders for the various institutions and sections. If you do not have access (including write permissions) to a folder there, you can request it via the ISSC (Workgroup storage, 'Research data' bulk storage). To get access to a folder owned by a colleague or supervisor, they need to submit the request. Also keep in mind that J-drive storage is relatively expensive at ~450 euros per TB per year. These costs are billed to your section. For very large datasets and long-term archiving, consider external and/or Open Science oriented options (e.g. DANS).
  • OneDrive / Google Drive are not suitable options, as these are not secure.

(Optional) Access to the university sftp gateway

  • This is needed to be able to access the J-drive remotely, and will be required for transferring data using the Rclone or Rsync methods.
  • Access can be requested through the ISSC.
  • To check whether you have access:
    • Open any terminal (e.g. PowerShell or MobaXterm on Windows, or Terminal on Mac)
    • Run: sftp [ULCN_username]@sftpgw.leidenuniv.nl
    • You will be prompted for your ULCN password
    • If you have access, you should be able to connect and browse through the directories (e.g. using cd, ls, dir etcetera)

Understanding your data

  • When transferring / copying your data, especially if you have the intent or expectation that at some point you need to transfer it back to the cluster to resume analysis, it is very important that you understand the nature of your data.
  • For instance, if your data includes sparse files, symlinks that are important for the functioning of your analysis, very long filenames, or many levels of nested folders, this may need to be accounted for. Different transfer programs / protocols handle these with varying success and may need specific configuration. When in doubt, please contact labsupport.
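As a rough aid for the points above, the following sketch summarises a few properties of a directory that can matter for a transfer. It is not an official SHARK tool: the function name inspect_data is made up for this example, and it assumes GNU find (for -printf), as found on most Linux systems. The sparse-file count is only an indication, since compressed filesystems can also report files as smaller on disk than their nominal size.

```shell
#!/bin/sh
# Sketch: report symlinks, nesting depth, name length and possibly sparse
# files below a directory. inspect_data is a hypothetical helper name.
inspect_data() {
    dir="$1"
    # Number of symbolic links below the directory.
    symlinks=$(find "$dir" -type l | wc -l)
    # Deepest nesting level relative to the top directory.
    max_depth=$(find "$dir" -mindepth 1 -printf '%d\n' | sort -n | tail -n 1)
    # Length of the longest file or directory name (name only, not the full path).
    longest_name=$(find "$dir" -mindepth 1 -printf '%f\n' |
        awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }')
    # Files whose allocated size is smaller than their nominal size (%S < 1).
    sparse=$(find "$dir" -type f -printf '%S\n' |
        awk '$1 < 1 { n++ } END { print n + 0 }')
    echo "symlinks=$symlinks"
    echo "max_depth=${max_depth:-0}"
    echo "longest_name=$longest_name"
    echo "possibly_sparse_files=$sparse"
}

# Usage (replace the path with your own data directory):
#   inspect_data /path/to/your/data
```

If any of the numbers stand out (for example many symlinks or a non-zero sparse count), check how your chosen transfer tool handles them before starting a large transfer.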

Choose a Transfer Method

There are various ways to transfer data between SHARK and the FSW network. Which works best will depend on the OS you are using (Windows, Mac or Linux), whether you are working from the FSW network (i.e. on a workplace desktop or laptop) or from outside (e.g. from home), and perhaps most importantly the size, number and types of files you want to transfer. Below are some suggestions for various situations.

If you need to transfer:

  • A limited number and size of files (< 1000 files and < 1GB total) AND you are working from a PC or laptop with access to the J-drive (e.g. while at the FSW or connected to eduvpn):
    • (Windows) Use the file-transfer options of MobaXterm
    • (Windows/Mac/Linux) Use the directory mount options of X2Go if you prefer a GUI and/or are familiar with X2Go. Otherwise, use an sftp client (like FileZilla) or sftp via the terminal and use that to log on to SHARK.
    • (Windows) Use WinSCP if you prefer a GUI and/or are already familiar with WinSCP.
  • Large datasets (many and/or large files), use the SFTP gateway, with either of the below methods:
    • (Any OS) Use RClone method if you have no specific requirements for the transfer.
    • (Any OS) Use Rsync method if you are already familiar with Rsync and/or you need more control (e.g. with regards to sparse files or symlinks).

Generally, for large transfers of data to be archived with no special considerations, we recommend the RClone method.
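To see on which side of the thresholds above your data falls, a small sketch like the one below can count the files and sum their sizes. The function name suggest_method and the two variables are made up for this example, GNU find is assumed, and 1 GB is interpreted here as 1024^3 bytes, which is an assumption.

```shell
#!/bin/sh
# Sketch: suggest a transfer route based on file count and total size.
MAX_FILES=1000
MAX_BYTES=1073741824   # 1 GB taken as 1024^3 bytes (assumption)

suggest_method() {
    dir="$1"
    # Count regular files below the directory.
    n_files=$(find "$dir" -type f | wc -l)
    # Sum file sizes in bytes; printf avoids scientific notation in awk.
    total_bytes=$(find "$dir" -type f -printf '%s\n' |
        awk '{ s += $1 } END { printf "%.0f\n", s }')
    echo "files=$n_files bytes=$total_bytes"
    if [ "$n_files" -lt "$MAX_FILES" ] && [ "$total_bytes" -lt "$MAX_BYTES" ]; then
        echo "suggestion: small transfer, a GUI or sftp client should be fine"
    else
        echo "suggestion: large transfer, use the SFTP gateway with Rclone or Rsync"
    fi
}

# Usage (replace the path with your own data directory):
#   suggest_method /path/to/your/data
```

The outcome is only a starting point: the OS you use and where you work from still determine which of the listed tools is available to you.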

Check SFTP Folders & Rights

To transfer files via the SFTP gateway, you need to have requested gateway rights. To check whether you have those and to get the correct path to the destination folder, you can use the steps below (using MobaXterm or any terminal).

Using MobaXterm (Windows)

MobaXterm has built in graphical sftp support, which allows you to easily navigate the folders and get the right path.

  • Open MobaXterm
  • Sessions > New Session > SFTP
  • remote host: sftpgw.leidenuniv.nl   user: ULCN name
  • Press OK
  • If the SFTP session doesn't start right away, you can start it from the left menu
  • Navigate to the intended directory
  • Please note that the ResearchData directory can be accessed by going 'up' from the home directory to get to /winshare/Public/ResearchData/
  • Try to create an empty text file in the destination folder to check if you have write permissions.
    • If it does not work, i.e. you get a permission denied error, see if you can create a file there from a university PC/laptop through Windows File Explorer. If this also does not work, you do not have the correct permissions for this folder and the folder owner will need to contact ISSC to grant you permission. If you can create a file through Windows File Explorer, but not through the sftp connection, there is a misconfiguration in the folder. Please contact labsupport@fsw.leidenuniv.nl to have this resolved.
  • Copy the entire file path to somewhere you can retrieve it later (e.g. a notepad file). Note, if the path contains spaces, you need to put the whole path in between double quotes. E.g:
    "/winshare/Public/ResearchData/FSW/SOLO Labsupport"

1699625546989-140.gif

Using Terminal (Linux / macOS)

If you are not using MobaXterm you can also get the right path from any terminal.

  • Open a terminal (Optional: Log in to SHARK first).
  • Determine the path to the destination directory on the J-drive (preferably an empty folder). You can do this by logging into the sftp gateway:
    • sftp [ULCN_username]@sftpgw.leidenuniv.nl
    • and then using cd, dir and ls to navigate to the desired locations
    • once there, use pwd to get the full path
  • Most often your folder will be located in ResearchData, e.g.:
    • for ResearchData FSW: cd /winshare/Public/ResearchData/FSW
  • To navigate to directories that have spaces in their name, you can put the directory in double quotation marks, e.g.:
    • ResearchData FSW: cd "/winshare/Public/ResearchData/FSW/SOLO Labsupport"
    • It's helpful to copy this to a notepad / Word file etcetera, so you can use it later on.
  • Once you have the correct path, confirm that you have write permissions there, with the sftp connection open:
    • ! touch dummy.txt  (creates an empty text file on SHARK)
    • put dummy.txt  (copies the text file to the destination folder on J-Drive)
    • If it does not work, i.e. you get a permission denied error, see if you can create a file there from a university PC/laptop through Windows File Explorer. If this also does not work, you do not have the correct permissions for this folder and the folder owner will need to contact ISSC to grant you permission. If you can create a file through Windows File Explorer, but not through the sftp connection, there is a misconfiguration in the folder. Please contact labsupport@fsw.leidenuniv.nl to have this resolved.

1699626257087-378.gif

Tips:

  • Switching between Windows and Linux path notation can be confusing. Tools like https://universalpathconverter.com/ allow you to quickly convert between them.
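For simple cases the conversion can also be done in the terminal. The sketch below only swaps backslashes for forward slashes and, optionally, escapes spaces with a backslash for tools that expect that. Both function names are made up for this example, and the drive-specific prefix of the path still has to be adjusted by hand.

```shell
#!/bin/sh
# Sketch: convert Windows-style separators and escape spaces.

# Replace every backslash with a forward slash.
win_to_forward_slashes() {
    printf '%s\n' "$1" | tr '\\' '/'
}

# Put a backslash in front of every space.
escape_spaces() {
    printf '%s\n' "$1" | sed 's/ /\\ /g'
}

# Usage (single quotes keep the backslashes intact):
#   win_to_forward_slashes 'ResearchData\FSW\SOLO Labsupport'
#   escape_spaces 'ResearchData/FSW/SOLO Labsupport'
```

When a path is already wrapped in double quotes, escaping the spaces as well is usually unnecessary, so use one approach or the other.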

Transfer using RClone

Rclone is one module available on the cluster that we can use to transfer the data.

If you haven't done so yet, first check the Prerequisites and the SFTP folder rights.

Configure Rclone

Before we can use RClone, we need to configure it to work with the university sftp gateway.

  • Connect to SHARK, e.g. using MobaXterm on Windows, Terminal on Mac or X2Go on either. In case of the latter, open a terminal inside the X2Go environment.
  • Make a new configuration using the following command. Replace the highlighted [..] parts with your ULCN info and then run it:
    • rclone config create jdrive sftp host=sftpgw.leidenuniv.nl user=[ULCN_username]  pass=[ULCN_password] shell_type=unix
    • Note that jdrive is just a name we assign to this configuration. You can pick any other name and change the commands in the rest of the steps accordingly.
  • Test whether you can connect with rclone using:
    • rclone lsd jdrive:/
    • If all went well you should see something like
      1689171356173-103.png
  • Now try again, but with your destination directory. I recommend having a simple test file in the target directory (e.g. test.txt) and using ls, which lists files, instead of lsd, which lists only directories. E.g.:
    • rclone ls jdrive:/"winshare/Public/ResearchData/FSW/SOLO Labsupport/"
    • Keep in mind that spaces in the directory path should be handled with quotes around the path

Test Copy Command

  • To start, it's best to try to copy one small test file.
  • On SHARK, create a folder e.g. test/ and inside create a simple text document. E.g.
    • mkdir ./test/
    • echo "hello world" > ./test/myfile.txt
  • We can copy the file in the test directory to the target directory using:
    • rclone copy [SOURCE] [DESTINATION]
  • However, we can expand on this a bit to create a log file that will allow us to later check what has happened. In the example below the log file is called rclone_log_[current date and time].log (i.e. unique for each transfer command) and is stored in the current terminal directory.
    • rclone copy [SOURCE] [DESTINATION]  --log-level=INFO --log-file="rclone_log_$(date +"%Y_%m_%d_T%H%M%S").log"
  • Together, the copy command might look like
    • rclone copy ./test/ jdrive:/"winshare/Public/ResearchData/FSW/SOLO Labsupport" --log-level=INFO --log-file="rclone_log_$(date +"%Y_%m_%d_T%H%M%S").log"
  • Run the command
  • Check the target directory on the J-drive to see if the file has indeed been copied.
  • Check the log file on SHARK. It should look something like:
    1697814646785-407.png
    It should indicate which files have been transferred (or skipped), and how long it took.
  • NOTE: An alternative to the copy command is the sync command. The main difference is that sync also deletes files in the destination directory if they are not present in the source.
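The copy command with its timestamped log file can be wrapped in a small function so the long line does not have to be retyped. This is a sketch, not part of the official instructions: the function name copy_with_log and the DRY_RUN variable are made up for this example. The --dry-run flag is a standard rclone option that reports what would be copied without transferring anything.

```shell
#!/bin/sh
# Sketch: run "rclone copy" with a timestamped log file in the current directory.
# Set DRY_RUN=1 beforehand to only report what would be copied.
copy_with_log() {
    src="$1"
    dest="$2"
    log_file="rclone_log_$(date +%Y_%m_%d_T%H%M%S).log"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        rclone copy "$src" "$dest" --dry-run --log-level=INFO --log-file="$log_file"
    else
        rclone copy "$src" "$dest" --log-level=INFO --log-file="$log_file"
    fi
    echo "Log file: $(pwd)/$log_file"
}

# Usage, with the example paths from the steps above:
#   DRY_RUN=1
#   copy_with_log ./test/ "jdrive:/winshare/Public/ResearchData/FSW/SOLO Labsupport"
```

Quoting the whole destination, remote name included, handles the space in the folder name in the same way as the quotes used in the commands above.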

Troubleshooting

  • No write permission: if you see any errors mentioning a lack of write / edit permissions, it could be that 1) you did not specify the correct destination folder. Please note that you only have (write) access to folders on the J-drive that you have explicitly received access to (i.e. by request to the ISSC from you or the folder owner). 2) The destination folder is not configured properly (by ISSC). Follow the steps in "Check SFTP Folders & Rights" to see if you can write a file there manually. If not, contact labsupport@fsw.leidenuniv.nl
  • Log file missing: note that the log file is written in the current terminal directory from where you are running the copy command, and not the source directory where your to-be-transferred data is located nor the destination directory on the J-drive.

Final Steps

If all went well, you are now almost ready to transfer your actual data. Some final checks:

  • Make sure you understand your data, and if you have sparse files or symlinks that need to be kept intact, please configure the copy command accordingly (e.g. see here https://rclone.org/local/).
  • Adapt the copy command for the correct SOURCE directory, i.e. the files/directories you wish to transfer.
  • If your data is relatively large and/or you expect to turn off your laptop/computer or switch networks at some point during the transfer, it is advised to run the transfer command in a Screen. Processes running in Screen will continue to run when their window is not visible, even if you get disconnected.
    • In the SHARK terminal run:
      screen -S [SOME-NAME]
      example
      screen -S datatransfer
    • Start your data transfer command
    • Now 'detach' the screen by pressing CTRL+a, then d
    • You can now safely disconnect from SHARK
    • Next time you wish to check up on the transfer, you can 're-attach' the screen by running:
      screen -r
    • If you have multiple screens, you can choose which one to connect by appending its name:
      screen -r datatransfer
  • Once the transfer has completed, and you have checked the logs and the files on the J-drive, don't forget to remove the data from SHARK (if you no longer need it there)!

Transfer with Rsync

Preparing Rsync

These steps assume you have opened a local shell terminal in MobaXterm (see Getting MobaXTerm). Most Linux terminal commands you know will work here in the same way. Conveniently, MobaXterm also has a built-in Linux file syncing tool called Rsync. To make this work with SHARK we need to provide an Rsync command.

  • Paste the following command into Notepad or Word (or any text editor of your choice) and fill in the highlighted [..] parts. Note that this is all one line!

rsync -av -P --log-file="/drives/j/ResearchData/FSW/[folderpath to be synced to]/rsync_log-$(date +"%Y-%m-%d".txt)" -e 'ssh -o "ProxyCommand ssh [username]@145.88.35.10 -W %h:%p"' [username]@145.88.76.219:/exports/fsw/[source folder to be synced from] "/drives/j/ResearchData/FSW/[folderpath to be synced to]"

Sometimes the above command will not work. This often happens on Mac and/or when using different language settings or third-party terminal software, which may use a different style of double and single quotation marks. In such cases, please manually retype all the single and double quotation marks in the command and try again.

  • This command takes all the files and subfolders from the source directory on SHARK and syncs them to the target directory on the J drive. Files that are already present on the J drive and have the same file size and modification time will not be overwritten. This means that if your transfer gets interrupted (for instance because Citrix is closed, you lose network connection etc.), re-running this command should pick up the transfer right where you left off (without having to start from the beginning).
  • Replace all the highlighted [..] parts (also remove the [ ] symbols!)
    • [username]: should be your username on the cluster (e.g. for me it is kjfolfers)
    • [source folder to be synced from]: the path to the main folder you want to sync the files from. Rsync with the -av parameter works recursively, so subfolders will also be synced (e.g. for me that could be kjfolfers/transfertest). In Rsync there is a difference between ending the source directory with or without a slash. If you omit the slash, the source directory itself (in this case transfertest) will also be created in the destination folder. If you include the slash, the contents of the folder will be put directly into the destination folder.
    • [folderpath to be synced to]: the path to the folder on the J drive where you want to sync to. You need to have write access to this folder. You can check that by opening File Explorer, navigating to the folder and making a new text file there (e.g. for me such a folder could be Psychologie Cognitieve/olferskjf/sharktest/). Important: the folder paths used in the command have forward slashes "/" for folder separation, while Windows natively (e.g. in File Explorer) uses backslashes "\", which means you can't just copy the file path from Explorer. Additionally, for older versions of rsync/MobaXterm, if you have spaces in your folder names you need to prepend them with a backslash. For example, if your file path in Explorer looks like:
      "drives\ResearchData\FSW\Psychologie\My test folder" 
      the corrected version would be:
      "drives/ResearchData/FSW/Psychologie/My\ test\ folder/"
      If you receive errors regarding file paths, try including or excluding the backslash before spaces.
  • As an example, the full command could look like this (note the command should be one line, without enters):

rsync -av -P --log-file="/drives/j/ResearchData/FSW/Psychologie\ Cognitieve/olferskjf/sharktest/rsync_log-$(date +"%Y-%m-%d".txt)" -e 'ssh -o "ProxyCommand ssh kjfolfers@145.88.35.10 -W %h:%p"' kjfolfers@145.88.76.219:/exports/fsw/kjfolfers/transfertest/dir10MB "/drives/j/ResearchData/FSW/Psychologie\ Cognitieve/olferskjf/sharktest/"

  • Rsync can be run with many configurations. The one used above is pretty much the default setting for archiving (that is, copying all folders and files). However, in some cases you may want to use different parameters to better suit your needs. For instance, symbolic and hard-linked files and folders may need special treatment that the regular -av parameter does not give. Please consult the help pages for rsync and/or contact SOLO or the cluster admins for help if needed.

Running Rsync

The first time you use rsync, I highly recommend starting with a small test source folder with some files to see if everything works as expected!

  • [OPTIONAL] By adding the -n parameter to the command (e.g. rsync -av -n -e ...), you can do a "dry run": this shows a list of the files that would be synced, without actually transferring anything.
    1627991085389-383.png
  • [OPTIONAL] By including the -P parameter, you will get information about the transfer speeds and the file progress (this will make the output in the terminal a bit more complex though).
  • Copy and paste the command into the MobaXterm console (use right mouse click or middle click) and run it.
  • The first time you run this command, you will be asked to fill in your password for the SHARK cluster twice.
  • MobaXterm will ask you if you wish to save your password for later. If you do so, you will not have to fill it in the next time.
  • If you use the rsync command as specified in the instructions, a log file (rsync_log-[date].txt) will also be written to the specified target folder. This is especially helpful in case of very long transfers, when you may not notice the sync has ended (or the terminal / SURKO has already been closed). You can open the log file in any text editor to inspect the progress.
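Because the full rsync command is long and sensitive to quoting, it can help to wrap it in a function that takes the three variable parts as arguments. The sketch below reuses the addresses and the /exports/fsw/ prefix from the example command. The function name sync_from_shark and the DRY_RUN variable are made up for this example, and it has not been tested against every MobaXterm version.

```shell
#!/bin/sh
# Sketch: sync a folder from SHARK to a local destination folder.
# Arguments: cluster username, source folder below /exports/fsw/, destination.
# Set DRY_RUN=1 beforehand to add -n (list what would be synced, copy nothing).
sync_from_shark() {
    user="$1"
    src="$2"
    dest="$3"
    # Log file in the destination folder, named after the current date.
    log_file="${dest%/}/rsync_log-$(date +%Y-%m-%d).txt"
    rsync -av -P ${DRY_RUN:+-n} --log-file="$log_file" \
        -e "ssh -o \"ProxyCommand ssh ${user}@145.88.35.10 -W %h:%p\"" \
        "${user}@145.88.76.219:/exports/fsw/${src}" "$dest"
}

# Usage, with the example values from the text:
#   DRY_RUN=1
#   sync_from_shark kjfolfers kjfolfers/transfertest/dir10MB \
#       "/drives/j/ResearchData/FSW/Psychologie Cognitieve/olferskjf/sharktest/"
```

Passing the destination as a quoted argument lets the shell handle spaces in folder names. If your rsync version still complains about the path, fall back to the backslash notation described above.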

Optional: Keep Alive

If you are experiencing trouble with the computer or Citrix Desktop closing due to a timeout, you could try the following:

  • After starting the rsync command, open an additional local session in MobaXterm
  • Run the following command in the terminal:

while true; do echo -e `date` "\r\n" >> "/drives/p/Desktop/keepalive.txt"; sleep 60; done

  • The above command stores a timestamp in a text file every minute. It can help to keep the Citrix desktop alive. The transfer should even continue if the Citrix desktop is closed (if prompted, confirm that you want the session to remain active). Virtual desktops may still be shut down automatically after some time, but running the rsync command again should pick up the transfer where it was right before shutdown.

Copying with WinSCP

While this method may be a bit more user-friendly, given the graphical user interface, keep in mind that transfers with WinSCP will stop once you shut down your local FSW computer (or, when using Citrix, after a certain period of time). After that, you will need to restart the transfer manually, and it might be hard to keep track of which folders and files have already been copied successfully.

  • Log in on your FSW computer (physically or via Remote Desktop Protocol or via the Citrix Remote environment).
  • Make sure WinSCP is installed. If not, install it from the Software Center or download and unzip the portable (non-installer) version here. If you are using the Citrix environment, you can start the WinSCP application directly (you do not need the full desktop view).
  • Start WinSCP
  • Select "New Site"
  • Fill in the session information:
    • File Protocol: SFTP
    • Host Name: res-hpc-lo02.researchlumc.nl (or res-hpc-lo04.researchlumc.nl). Note: if neither works, try the IP address: 145.88.76.217 or 145.88.76.219
    • Port number: 22
    • User name / Password: your SHARK account details.

If you are using the Citrix Remote environment some extra steps are required:

  • Open the advanced menu
  • Go to Connection -> Tunnel
  • Select "Connect through SSH-tunnel"
  • Fill in:
    • Hostname: 145.88.35.10
    • Port number: 22
    • User name / Password: your SHARK details
  • Click OK
  • If you want, you can click "Save" to save these settings for future sessions.
  • Then click "Login"

1620381931803-835.png

  • When asked to approve a key or fingerprint, press OK.
  • You will now see your FSW folders and files in the left window and your SHARK folders and files in the right window.
  • To reach the various FSW folders (e.g. the J or P drive), use the dropdown menu (or for SHARK, the menu on the right side):

image-20210507120136-1.png

  • To transfer files you can now drag & drop them from one window to the other, or select them, right-click and press download. They will then be copied to the currently open folder in the opposite window.

Data Migration ALICE

For migrating your project from SHARK to ALICE, please see the manual here.

Getting MobaXTerm

  • If you don't have MobaXterm yet, download the portable edition of MobaXterm (https://mobaxterm.mobatek.net/download-home-edition.html) and unzip it, e.g. to a folder on the desktop.
  • Start the MobaXterm client by double-clicking the .exe file in the extracted folder.
  • If it is the first time you have started MobaXterm, go to Settings > Configuration > General.
    • For Persistent home directory, enter:  _MobaFolder_
    • For Persistent root, enter: _MobaFolder_\slash
    • Press OK at the bottom and restart MobaXterm if asked. You can ignore any warnings about network drive speed.
      1627987731297-585.png
  • Start a new local session in MobaXterm, using the "Start local terminal" button on the home screen, or go to "Session" in the top left, then select "Shell" and press OK.