Staging can be performed with a Slurm job script that uses the data-transfer nodes (copy partition) to copy the initial data from Acacia into the working directory on /scratch. Because the original data is packed in a .tar file, the script also extracts ("untars") it:

Listing 1. rclone: stageFromAcaciaTar.sh
#!/bin/bash --login

#---------------
#About this script
#stageFromAcaciaTar.sh : copies a tar object from Acacia and extracts it in the destination path

#---------------
#Requested resources:
#SBATCH --account=[yourProjectName]
#SBATCH --job-name=stageTar.rclone
#SBATCH --partition=copy
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --time=[requiredTime]
#SBATCH --export=NONE

#-----------------
#Loading the required modules
module load rclone/<version> #This example uses the rclone client

#-----------------
#Defining variables that will handle the names related to your access, buckets and stored objects in Acacia
profileName=<profileNameGivenToYourProfileOfAccessToAcacia>
bucketName=<bucketInAcaciaContainingTheData>
prefixPath=<prefixPathInBucketUsedToOrganiseTheData>
fullPathInAcacia="${profileName}:${bucketName}/${prefixPath}" #Note the colon(:) when using rclone

#-----------------
#Name of the file to be transferred and auxiliary dir to temporarily place it
tarFileName=<nameOfTheTarFileContainingInitialData>
auxiliaryDirForTars="$MYSCRATCH/tars"
echo "Checking that the auxiliary directory exists"
if ! [ -d "$auxiliaryDirForTars" ]; then
   echo "Trying to create the auxiliary directory as it does not exist"
   mkdir -p "$auxiliaryDirForTars"; exitcode=$?
   if [ $exitcode -ne 0 ]; then
      echo "The auxiliary directory $auxiliaryDirForTars does not exist and can't be created"
      echo "Exiting the script with non-zero code in order to inform job dependencies not to continue."
      exit 1
   fi
fi

#-----------------
#Working directory in scratch for the supercomputing job
workingDir="$MYSCRATCH/<workingDirectoryForSupercomputingJob>"
echo "Checking that the working directory exists"
if ! [ -d "$workingDir" ]; then
   echo "Trying to create the working directory as it does not exist"
   mkdir -p "$workingDir"; exitcode=$?
   if [ $exitcode -ne 0 ]; then
      echo "The working directory $workingDir does not exist and can't be created"
      echo "Exiting the script with non-zero code in order to inform job dependencies not to continue."
      exit 1
   fi
fi

#-----------------
#Check if Acacia definitions make sense, and if the object to transfer exists
echo "Checking that the profile exists"
rclone config show | grep "${profileName}" > /dev/null; exitcode=$?
if [ $exitcode -ne 0 ]; then
   echo "The given profileName=$profileName seems not to exist in the user configuration of rclone"
   echo "Exiting the script with non-zero code in order to inform job dependencies not to continue."
   exit 1
fi
echo "Checking that bucket exists and that you have read access"
rclone lsd "${profileName}:${bucketName}" > /dev/null; exitcode=$? #Note the colon(:) when using rclone
if [ $exitcode -ne 0 ]; then
   echo "The bucket name or the profile name may be wrong: ${profileName}:${bucketName}"
   echo "Exiting the script with non-zero code in order to inform job dependencies not to continue."
   exit 1
fi
echo "Checking if the file can be listed in Acacia"
listResult=$(rclone lsl "${fullPathInAcacia}/${tarFileName}")
if [ -z "$listResult" ]; then
   echo "Problems occurred during the listing of the file ${tarFileName}"
   echo "Check that the file exists in the fullPathInAcacia: ${fullPathInAcacia}/"
   echo "Exiting the script with non-zero code in order to inform job dependencies not to continue."
   exit 1
fi

#-----------------
#Perform the transfer of the tar file into the auxiliary directory and check that it succeeded
echo "Performing the transfer ... "
srun rclone copy "${fullPathInAcacia}/${tarFileName}" "${auxiliaryDirForTars}/"; exitcode=$?
if [ $exitcode -ne 0 ]; then
   echo "Problems occurred during the transfer of file ${tarFileName}"
   echo "Check that the file exists in the fullPathInAcacia: ${fullPathInAcacia}/"
   echo "Exiting the script with non-zero code in order to inform job dependencies not to continue."
   exit 1
fi

#-----------------
#Perform untarring with desired options into the working directory
#(the -z option assumes the tar file is gzip-compressed)
echo "Performing the untarring ... "
#tarOptions=( --strip-components 8 ) #Avoiding creation of some directories in the path
srun tar -xvzf "${auxiliaryDirForTars}/${tarFileName}" -C "$workingDir" "${tarOptions[@]}"; exitcode=$?
if [ $exitcode -ne 0 ]; then
   echo "Problems occurred during the untarring of file ${auxiliaryDirForTars}/${tarFileName}"
   echo "Exiting the script with non-zero code in order to inform job dependencies not to continue."
   exit 1
else
   echo "Removing the tar file as it has been successfully untarred"
   rm "${auxiliaryDirForTars}/${tarFileName}" #comment this line when debugging workflow
fi

#-----------------
## Final checks ....

#---------------
#Successfully finished
echo "Done"
exit 0
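
The listing exits with a non-zero code on any failure "in order to inform job dependencies not to continue". As a minimal sketch of how that could be used, the staging job can be chained to a follow-up job with a Slurm `afterok` dependency, so the follow-up starts only if staging finished with exit code 0. The function name `submitStagedWorkflow` and the script name `runSimulation.sh` are hypothetical and are not part of the original page.

```shell
#!/bin/bash
# Sketch only: submit a staging job, then a compute job that depends on it.
# submitStagedWorkflow and runSimulation.sh are hypothetical names used for illustration.
submitStagedWorkflow() {
   local stagingScript="$1"
   local computeScript="$2"
   local stagingJobId

   # --parsable makes sbatch print only the job ID (optionally followed by ";clusterName")
   stagingJobId=$(sbatch --parsable "$stagingScript") || return 1
   stagingJobId="${stagingJobId%%;*}" # keep only the numeric job ID

   # afterok: the compute job starts only if the staging job ended with exit code 0
   sbatch --dependency="afterok:${stagingJobId}" "$computeScript"
}

# Example use from a login node (commented out because it needs Slurm):
# submitStagedWorkflow stageFromAcaciaTar.sh runSimulation.sh
```

With this arrangement, an `exit 1` from any of the checks in the staging script leaves the dependent job unable to start, rather than letting it run on missing data.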


Note: Client support
  • Note the use of variables to store the names of directories, files, buckets, prefixes, objects, etc.
  • Also note the checks at each stage of the script, and that the output of most checking commands is redirected to /dev/null, because only their exit codes are of interest.
  • Messages from the mc client are too verbose when transferring files (even with the --quiet option), so in scripts we redirect its output to /dev/null when performing transfers with this client. For this reason we often find rclone a better choice of client for use in scripts.
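
The check-and-redirect pattern described in these points repeats several times in the listing. As a sketch under stated assumptions, it could be wrapped in a small helper that runs any command, discards its output, and reports only success or failure. The function name `runQuietCheck` is hypothetical; the original script writes each check out explicitly.

```shell
#!/bin/bash
# Hypothetical helper (not part of Listing 1): run a command, discard its output
# (both stdout and stderr), and report only whether it succeeded.
runQuietCheck() {
   local description="$1"
   shift
   echo "Checking: ${description}"
   if "$@" > /dev/null 2>&1; then
      return 0
   fi
   echo "Check failed: ${description}"
   return 1
}

# Example use inside a job script (commented out because it needs rclone and a configured profile):
# runQuietCheck "bucket exists and is readable" rclone lsd "${profileName}:${bucketName}" || exit 1
```

Unlike the listing, this sketch also discards error messages (`2>&1`), which keeps logs short but hides diagnostics; drop that redirection when debugging.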
