Deleting Large Numbers of Files on Lustre Filesystems
To delete a large volume of files on /scratch or /software, use munlink, a Lustre-specific command that deletes files without first performing a stat() operation.
Deletion of files is a permanent action
Deleting files on a Linux system cannot be undone. Take care before executing any deletion command.
rm may overload the metadata server when deleting a large number of files
Using the standard Linux command rm to delete multiple files on a Lustre filesystem is not recommended. The rm command generates a stat() operation for each file it removes, meaning that all of the attributes of the file (file type, owner, permissions, modification time, etc.) are returned from the metadata server. A large number of stat() operations can place an increased load on the metadata server, resulting in lower performance and instability of the filesystem.
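To make the per-file cost more concrete, the sketch below prints the kind of attributes a single stat() call returns, using the stat utility on a throwaway file. It assumes GNU coreutils stat (the -c format option); the function name show_attrs and the temporary file are illustrative only and not part of the recommended procedure.

```shell
#!/bin/bash
# Illustrative only: show the attributes that one stat() call fetches for a file.
# Assumes GNU coreutils `stat` (the -c FORMAT option).
show_attrs() {
    # %F file type, %U owner, %a octal permissions, %y modification time
    stat -c 'type=%F owner=%U mode=%a mtime=%y' -- "$1"
}

demo_file=$(mktemp)          # throwaway local file, not on /scratch
echo "data" > "$demo_file"
show_attrs "$demo_file"
rm -f -- "$demo_file"        # a single small file, so plain rm is fine here
```

On Lustre, each such lookup is answered by the metadata server, which is why repeating it for every file in a large tree adds up.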
Below is an example of our recommended two-step approach:
0. Set-up step: Open an interactive session on the copy partition, or prepare a script to be submitted to the copy partition, and use the commands described below.
I. First step: use the munlink command to delete all the files and soft links within a directory and its subdirectories (munlink deletes the files and links previously found by the find command):
$ find -P ./processor0 -type f -print0 -o -type l -print0 | xargs -0 munlink
Here is an overview of each step in that command:
find
This command searches the indicated directory and its subdirectories. The expression that follows defines a search for files and soft links.
-P
This option tells find never to follow symbolic links. This guarantees that the search stays within the indicated directory tree and that find does not look for files behind the links.
./processor0
This argument is the directory in which the search (and deletion) is performed.
-type f -o -type l
These options tell find to match anything that is a regular file (-type f) or (-o) a soft link (-type l; this is a lowercase letter "L"). Because of the -P option above, the links are not followed, so only the links themselves are removed, not the objects they point to.
-print0
This option sets the output format of the find command: each name is terminated by a null character instead of a newline. This format copes with unusual file names (containing spaces or newlines, for example) and ensures that they are passed intact to the following command (xargs), which is connected through the pipe. Note that two -print0 actions are needed, one on each side of the "or" (-o) operator described above.
| (the pipe, written as a single vertical bar)
The pipe connects two commands: the output of the previous command (find) becomes the input of the following command (xargs).
xargs -0
xargs converts the list of files it receives into arguments for whatever command is given to it (munlink in this case). The -0 flag tells xargs that the names are separated by null characters; if you use -print0 in the find command, you must use -0 in the xargs command.
munlink
This command deletes each file and soft link in the list without overloading the metadata server. In this case, the list is the one received by xargs.
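The need for a -print0 on each side of -o can be checked without deleting anything. The following is a minimal sketch that counts the entries find would hand to xargs, once with two -print0 actions and once with a single trailing one. The function names and the throwaway directory tree are hypothetical, and munlink is deliberately not called.

```shell
#!/bin/bash
# Count the files and soft links a find expression would pass to xargs.
# Nothing is deleted; entries are counted by their terminating NUL bytes.
count_targets() {
    # Files OR soft links, each branch with its own -print0.
    find -P "$1" -type f -print0 -o -type l -print0 | tr -cd '\0' | wc -c
}

count_with_single_print0() {
    # Only one -print0: the implicit "and" binds tighter than -o, so the
    # action belongs to the -type l branch and regular files are not printed.
    find -P "$1" -type f -o -type l -print0 | tr -cd '\0' | wc -c
}

# Throwaway tree for the demonstration (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/processor0/sub"
touch "$demo/processor0/a.dat" "$demo/processor0/sub/name with spaces.dat"
ln -s a.dat "$demo/processor0/link_to_a"

echo "two -print0 actions: $(count_targets "$demo/processor0")"
echo "one -print0 action:  $(count_with_single_print0 "$demo/processor0")"
rm -rf -- "$demo"    # small local temporary tree, so plain rm is fine here
```

Running count_targets on the real directory before the actual deletion is also a cheap way to confirm how many entries are about to be removed.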
II. Second step: remove the empty directories and subdirectories in the tree. Once all of the files and soft links are deleted, you can remove the empty directories with a similar command:
$ find -P ./processor0 -type d -empty -delete
Again, the find command searches the directory processor0 and all of its subdirectories for empty directories (-type d -empty), and the -delete action deletes those that are found. The -delete action implies the -depth option, so -depth does not need to be given explicitly to this find command. The -depth option instructs find to process each directory's contents before the directory itself, so the deepest branches of the directory tree are processed first.
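The depth-first behaviour can be observed on a small throwaway tree. This is a sketch only, assuming GNU find (for -empty and -delete); the function name remove_empty_dirs and the directory names are hypothetical.

```shell
#!/bin/bash
# Remove empty directories bottom-up, as in the second step.
# Assumes GNU find (for -empty and -delete).
remove_empty_dirs() {
    find -P "$1" -type d -empty -delete
}

# Throwaway tree for the demonstration (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/processor0/a/b/c" "$demo/processor0/keep"
touch "$demo/processor0/keep/result.dat"    # keeps "keep" non-empty

remove_empty_dirs "$demo/processor0"

# a/b/c is removed first, which empties a/b and then a; "keep" survives
# because it still contains a file.
find "$demo/processor0" -print
rm -rf -- "$demo"    # small local temporary tree, so plain rm is fine here
```

Because the deepest directories go first, a chain of nested empty directories disappears in a single pass, and the starting directory itself is removed too if it ends up empty.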
Use the given order of the options
Flags passed to find are evaluated as an expression, from left to right, so if you pass -delete before -type d -empty, find will attempt to delete EVERYTHING below your starting directory, not only the empty directories, because the remaining options are only evaluated afterwards.
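One cautious way to check an expression before running it for real is to put -print where -delete would go, so that find only lists what it matches. The sketch below assumes GNU find; the function name preview_empty_dirs and the throwaway tree are hypothetical.

```shell
#!/bin/bash
# Dry run: list directories that are empty right now, without deleting them.
# Assumes GNU find (for the -empty test).
preview_empty_dirs() {
    find -P "$1" -type d -empty -print
}

# Throwaway tree for the demonstration (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/processor0/a/b" "$demo/processor0/keep"
touch "$demo/processor0/keep/result.dat"

# Lists only .../processor0/a/b; "a" is not listed because it still holds "b".
preview_empty_dirs "$demo/processor0"
rm -rf -- "$demo"    # small local temporary tree, so plain rm is fine here
```

Note that the preview shows only directories that are already empty. Parent directories that become empty during the real run are removed as well, so the list is a lower bound on what -delete will remove.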
Always use munlink to delete simulation output
Always use munlink to clear directories that contain a large number of files. Doing so not only helps keep the filesystem stable, but is also faster than using the standard rm -rf command.
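For repeated use, the two steps can be wrapped in a small shell function. This is a sketch under stated assumptions rather than a site-provided tool: purge_tree is a hypothetical name, and the -r flag of xargs (skip the command when there is no input) is a GNU extension that is not part of the command shown earlier.

```shell
#!/bin/bash
# Sketch of both steps in one function. `purge_tree` is a hypothetical name.
purge_tree() {
    local target=$1

    # Refuse to run on anything that is not an existing directory.
    if [ ! -d "$target" ]; then
        echo "purge_tree: '$target' is not a directory" >&2
        return 1
    fi
    # munlink is only available where the Lustre client tools are installed.
    if ! command -v munlink >/dev/null 2>&1; then
        echo "purge_tree: munlink not found in PATH" >&2
        return 1
    fi

    # Step I: unlink files and soft links without following the links.
    # -r (GNU xargs, an assumption here) avoids calling munlink with no arguments.
    find -P "$target" -type f -print0 -o -type l -print0 | xargs -0 -r munlink || return 1

    # Step II: remove the now-empty directories, deepest first.
    find -P "$target" -type d -empty -delete
}
```

After sourcing the function in a session or script on the copy partition, a call such as purge_tree ./processor0 would run both steps in order. Deletion remains permanent, so double-check the argument before pressing Enter.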