pshell and Mediaflux
Overview
The pshell client can be used to send native service calls to the underlying Mediaflux server.
This permits more powerful features to be exploited, at the cost of more complexity.
The main complication is that you have to give up the familiar idea of files and folders and instead work with assets and namespaces.
An asset is metadata plus content (the content is the file itself), and namespaces form a virtual hierarchy that holds assets.
Asset metadata
Standard metadata
Mediaflux automatically extracts metadata from files when they are uploaded to the server and become an asset.
The recommended approach is to pick a file format that Mediaflux natively supports in terms of content analysis/metadata extraction.
The following lists are taken from http://www.arcitecta.com/Products/DataTypes
Geospatial file formats with geospatial metadata extraction include:
- NITF, ECW, CIB, MrSID, DTED, LIDAR, CADRG and ERDAS IMG, JPG2000, GeoTIFF, GeoJPG, GeoPDF.
Other file formats with other metadata extraction include:
- HDF, NetCDF, DICOM, BMP, GIF, JPG2000, JPEG, PCX, PNG, TIFF, DOC, DOCX, PDF, DPX.
To retrieve the default metadata for an asset, the following are equivalent:
asset.get :id "path=/projects/DMF-TEST/sample/10_2015-03-05_111059_DSC_0004.NEF"
asset.get :id 61080987
The structure (indenting) of the document displayed is important and indicates the XML path of the particular metadata item.
Most of the information (name, ctime, version, etc.) is standard file metadata. Custom metadata (if any) appears under the meta element.
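To illustrate how indentation maps to an XML path, here is a minimal Python sketch that walks an indented listing and prints the path of every item. The sample text and its four-space indent are illustrative assumptions, not captured server output, and xml_paths is a hypothetical helper rather than part of pshell.

```python
def xml_paths(listing):
    """Return (path, value) pairs for each "name = value" line of an indented listing."""
    stack = []    # (indent width, element name) for each open ancestor
    result = []
    for line in listing.splitlines():
        if "=" not in line:
            continue  # skip blank lines and elisions such as "..."
        indent = len(line) - len(line.lstrip())
        name, _, value = line.strip().partition("=")
        # Drop elements that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name.strip()))
        result.append(("/".join(n for _, n in stack), value.strip()))
    return result


# Hypothetical listing, indented by nesting depth.
sample = """\
geoshape = None { type=point datum=WGS84 }
    point = None
        latitude = None { dd=10 }
            deg = 10
        elevation = 100.0
"""

for path, value in xml_paths(sample):
    print(path, "->", value)
```

The deepest item in the sample resolves to geoshape/point/latitude/deg, which is the form of path used by the :xpath arguments later in this document.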
Custom metadata
Custom metadata can be used if the files to be ingested are not recognised by default, but it requires a little more effort.
Suppose a custom metadata document template, called mytype, defines a string called location.
Here is how to set the metadata value of location to a value of "Perth" using pshell:
asset.set :id 25480 :meta < :mytype < :location "Perth" > >
Installation of the custom XML document mytype can only be done by Pawsey administrators.
Importing metadata
Metadata can also be attached to uploaded files automatically.
This happens when uploads are performed using the import command instead of the usual put.
The import command looks for pairs of files of the form filename and filename.meta and attempts to populate filename with the metadata from filename.meta.
For example, suppose we had two files:
myfile.jpg
myfile.jpg.meta
where myfile.jpg.meta contained the following:
[asset]
geoshape/point/latitude = "10.0"
geoshape/point/longitude = "20.0"
geoshape/point/elevation = "100.0"
[mytype]
name = "the name"
details = "some more information"
The asset section contains built-in "first class" asset metadata entries, whereas the mytype section contains the custom metadata.
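When many files need sidecar metadata, the .meta files can be generated by a script. The following Python sketch reproduces the layout of the example above; render_meta and write_sidecar are hypothetical helper names, and the sketch assumes that every value is written in double quotes as in the example.

```python
from pathlib import Path


def render_meta(sections):
    """Render {section: {key: value}} in the layout of the example .meta file."""
    lines = []
    for section, entries in sections.items():
        lines.append(f"[{section}]")
        for key, value in entries.items():
            # Assumption: values are always double-quoted, as in the example.
            lines.append(f'{key} = "{value}"')
    return "\n".join(lines) + "\n"


def write_sidecar(data_file, sections):
    """Write <data_file>.meta alongside data_file and return its path."""
    meta_path = Path(str(data_file) + ".meta")
    meta_path.write_text(render_meta(sections))
    return meta_path


example = {
    "asset": {
        "geoshape/point/latitude": "10.0",
        "geoshape/point/longitude": "20.0",
        "geoshape/point/elevation": "100.0",
    },
    "mytype": {
        "name": "the name",
        "details": "some more information",
    },
}

print(render_meta(example), end="")
```

Calling write_sidecar("myfile.jpg", example) would create myfile.jpg.meta in the current directory, ready to be picked up alongside myfile.jpg.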
We can then import the file using pshell, e.g.:
import myfile.jpg
Then examine the metadata:
file myfile.jpg
This displays all the metadata, including:
geoshape = None { type=point datum=WGS84 }
    point = None
        latitude = None { dd=10 }
            deg = 10
            min = 0
            sec = 0
        longitude = None { dd=20 }
            deg = 20
            min = 0
            sec = 0
        elevation = 100.0
...
meta = None { stime=60812 }
    {mytype} = None { id=2 }
        name = "the name"
        details = "some more information"
These metadata items can be queried, as described below.
Simple asset queries
A useful Mediaflux command is asset.query, which performs a database search on asset metadata items specified by their XML path.
For example, to locate all files with a .PNG filename extension anywhere in the entire (visible) namespace hierarchy, use:
asset.query :where "name='*.PNG'"
Here, name refers to the same XML element that appears in the standard asset metadata output of asset.get.
To restrict the search to look only in /projects/myproject, use:
asset.query :where "namespace='/projects/myproject' and name='*.PNG'"
If you wish to search /projects/myproject and all folders underneath it, use:
asset.query :namespace "/projects/myproject" :where "name='*.PNG'"
Searching for all assets that were created after a certain date would be done as follows:
asset.query :namespace "/projects/myproject" :where "ctime>='01-Jan-2017'"
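If the cutoff date is computed in a script, it has to be rendered in the DD-Mon-YYYY form used above. Here is a small Python sketch; ctime_since is a hypothetical helper, and it assumes English three-letter month abbreviations, which is all the example above demonstrates.

```python
from datetime import date

# Spelled out explicitly so the result does not depend on the system locale.
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def ctime_since(day):
    """Return a :where clause matching assets created on or after the given day."""
    stamp = f"{day.day:02d}-{_MONTHS[day.month - 1]}-{day.year}"
    return f"ctime>='{stamp}'"


print(ctime_since(date(2017, 1, 1)))   # ctime>='01-Jan-2017'
```

The returned string is intended to be placed inside the double quotes of the :where argument.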
Similar searches could be done using a geometric bounding box if your data has geospatial information attached.
There is no equivalent database search for namespaces in Mediaflux. This is why, in pshell, ls * will return only files (assets) and not folders (namespaces) that match the pattern.
Advanced asset queries
Searching for custom metadata
Suppose your assets had some custom metadata attached to them that was described by the XML document mytype.
If one of the elements in mytype was location (which is of type string), then here is how to search for a particular value:
asset.query :namespace "/projects/myproject" :where "xpath(mytype/location)='Perth'"
Getting more information from the results
By default, asset.query returns only the unique asset identifier (id).
To get the name of the asset as well, use:
asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action get-name
To get arbitrary pieces of metadata, for example the name and checksum, use:
asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action get-value :xpath name :xpath content/csum
By default, Mediaflux labels every metadata item you request as value=, which makes it hard to tell the items apart when retrieving more than one.
To associate a better name with the returned metadata item, use something like:
asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action get-value :xpath -ename checksum content/csum
The presence of the -ename checksum attribute tells Mediaflux to return the result as checksum= instead of value=.
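When these queries are generated by a script, the pieces shown so far (:namespace, :where, :action and :xpath) can be assembled by a small helper. This is a Python sketch that only builds the command text for pasting into pshell; asset_query is a hypothetical name, and the sketch does not escape quote characters inside its arguments.

```python
def asset_query(where, namespace=None, action=None, xpaths=()):
    """Assemble an asset.query command line as a string.

    Each entry of xpaths is either a plain XML path or a (label, path)
    pair; a pair is rendered with the -ename attribute.
    """
    parts = ["asset.query"]
    if namespace is not None:
        parts.append(f':namespace "{namespace}"')
    parts.append(f':where "{where}"')
    if action is not None:
        parts.append(f":action {action}")
    for xpath in xpaths:
        if isinstance(xpath, tuple):
            label, path = xpath
            parts.append(f":xpath -ename {label} {path}")
        else:
            parts.append(f":xpath {xpath}")
    return " ".join(parts)


print(asset_query("name='*.PNG'",
                  namespace="/projects/myproject",
                  action="get-value",
                  xpaths=["name", ("checksum", "content/csum")]))
```

The call above requests the name under the default value= label and the checksum under checksum=, combining the two forms of :xpath described in this section.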
Simple actions on the results
The asset.query service can also aggregate the results of a search instead of just listing them.
For example, to count all PNG files and then to sum their sizes you would run, respectively:
asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action count
asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action sum :xpath content/size
Here the action value (count or sum) determines what to do with the search result, and content/size is the XML location of the file-size that is stored in the metadata of every asset that matches the query.
Arbitrary actions on the results
Mediaflux can also send the results of a query to another service call that takes asset identifiers as an input argument.
For example, to move all PNG files to a new location, /projects/myproject/png-files, we would use:
asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action pipe :service -name asset.move < :namespace /projects/myproject/png-files >
Here the action pipes the results to another service call, in this case asset.move. The destination namespace that asset.move requires is supplied in the shorthand XML document between < and >.
Naturally, similar constructions could be used to simply delete these files instead:
asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action pipe :service -name asset.destroy
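Because a piped asset.destroy acts on everything the query matches, it can be worth running the same query with the count action first. This Python sketch generates both commands from one namespace and where clause so that they cannot drift apart; preview_and_destroy is a hypothetical helper that only produces text and executes nothing.

```python
def preview_and_destroy(namespace, where):
    """Return (count_command, destroy_command) sharing an identical query."""
    base = f'asset.query :namespace "{namespace}" :where "{where}"'
    count_command = f"{base} :action count"
    destroy_command = f"{base} :action pipe :service -name asset.destroy"
    return count_command, destroy_command


count_cmd, destroy_cmd = preview_and_destroy("/projects/myproject", "name='*.PNG'")
print(count_cmd)     # run this first and check the number of matches
print(destroy_cmd)   # run this only if the count is what you expect
```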
Namespace manipulation
Recall that assets and namespaces are quite different objects in Mediaflux. None of the above queries will help in managing namespaces.
To change the name of the folder1 namespace to folder2 use:
asset.namespace.rename :namespace /projects/myproject/folder1 :name folder2
To move the namespace folder1 to a new parent location use:
asset.namespace.move :namespace /projects/myproject/folder1 :to /projects/myproject/newparent
To delete a namespace entirely, use:
asset.namespace.destroy :namespace /projects/myproject/folder1