Overview
The pshell client can be used to send native service calls to the underlying Mediaflux server.
This gives access to more powerful features, at the cost of more complexity.
The first complication is that you have to set aside the familiar idea of files and folders and instead work with assets and namespaces.
An asset is metadata plus content (the content is the file itself), and namespaces form a virtual hierarchy in which assets are organised.
Asset metadata
Standard metadata
Mediaflux automatically extracts metadata from files when they are uploaded to the server and become assets.
Where possible, the recommended approach is to choose a file format for which Mediaflux natively supports content analysis and metadata extraction.
The following lists are taken from http://www.arcitecta.com/Products/DataTypes.
Geospatial file formats with geospatial metadata extraction include:
- NITF, ECW, CIB, MrSID, DTED, LIDAR, CADRG, ERDAS IMG, JPG2000, GeoTIFF, GeoJPG, GeoPDF.
Other file formats with format-specific metadata extraction include:
- HDF, NetCDF, DICOM, BMP, GIF, JPG2000, JPEG, PCX, PNG, TIFF, DOC, DOCX, PDF, DPX.
To retrieve the default metadata for an asset, the following are equivalent:
asset.get :id "path=/projects/DMF-TEST/sample/10_2015-03-05_111059_DSC_0004.NEF"
asset.get :id 61080987
The structure (indenting) of the document displayed is important and indicates the XML path of the particular metadata item.
Most of the information (name, ctime, version, etc.) is standard file metadata. Custom metadata (if any) appears under the meta element.
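As the two commands above show, the :id argument accepts either a numeric asset id or a "path=..." reference. A minimal Python sketch of that choice follows; asset_get_command is a hypothetical helper that only formats the command text, not part of pshell or Mediaflux.

```python
def asset_get_command(ref):
    """Build an asset.get call from an asset id (int) or a namespace path (str).

    Hypothetical helper: it only formats the command text shown above.
    """
    if isinstance(ref, int):
        # Numeric asset ids are passed through unchanged.
        return f"asset.get :id {ref}"
    # Paths are wrapped as "path=..." and quoted, as in the example above.
    return f'asset.get :id "path={ref}"'


print(asset_get_command(61080987))
print(asset_get_command("/projects/DMF-TEST/sample/10_2015-03-05_111059_DSC_0004.NEF"))
```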
Custom metadata
Custom metadata can be used if the files to be ingested are in an unsupported format, but this requires a little more effort.
Suppose you wish to add custom metadata described by the XML document type mytype.
Here is how to set the value of a child element of mytype called location on the asset with id 25480:
asset.set :id 25480 :meta < :mytype < :location "Perth" > >
Installation of the custom XML document mytype can only be done by Pawsey administrators.
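The :meta < ... > argument in the asset.set example nests one shorthand XML element inside another. The Python sketch below produces the same text from a nested dict, which can make the nesting easier to see for deeper documents. to_shorthand and asset_set_command are hypothetical helpers, and the sketch assumes values never contain double quotes (it does not escape them).

```python
def to_shorthand(name, value):
    """Render one element in shorthand XML: a dict becomes a nested < ... > group."""
    if isinstance(value, dict):
        children = " ".join(to_shorthand(key, child) for key, child in value.items())
        return f":{name} < {children} >"
    # Leaf values are quoted; embedded double quotes are not escaped.
    return f':{name} "{value}"'


def asset_set_command(asset_id, meta):
    """Build an asset.set call that sets the given metadata on one asset."""
    return f"asset.set :id {asset_id} {to_shorthand('meta', meta)}"


print(asset_set_command(25480, {"mytype": {"location": "Perth"}}))
# asset.set :id 25480 :meta < :mytype < :location "Perth" > >
```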
Importing metadata
Metadata can also be attached to uploaded files automatically.
This happens when uploads are performed with the import command instead of the usual put.
import looks for pairs of files of the form filename and filename.meta, and attempts to populate filename with metadata from filename.meta.
For example, suppose we had two files:
myfile.jpg
myfile.jpg.meta
where myfile.jpg.meta contained the following:
[geoshape/point]
latitude = "10.0"
longitude = "20.0"
elevation = "100.0"

[mytype]
name = "the name"
details = "some more information"
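When many files need metadata, the .meta companions can be generated rather than written by hand. This is a minimal Python sketch, assuming the layout above ([section] headers followed by key = "value" lines, with a blank line between sections) is what import expects; format_meta and write_meta_file are hypothetical helpers.

```python
from pathlib import Path


def format_meta(sections):
    """Render {section: {key: value}} in the [section] / key = "value" layout shown above."""
    lines = []
    for section, fields in sections.items():
        lines.append(f"[{section}]")
        for key, value in fields.items():
            lines.append(f'{key} = "{value}"')
        lines.append("")  # assumed: a blank line separates sections
    return "\n".join(lines)


def write_meta_file(data_file, sections):
    """Write the companion <data_file>.meta file next to data_file."""
    meta_path = Path(f"{data_file}.meta")
    meta_path.write_text(format_meta(sections))
    return meta_path


meta = {
    "geoshape/point": {"latitude": "10.0", "longitude": "20.0", "elevation": "100.0"},
    "mytype": {"name": "the name", "details": "some more information"},
}
print(format_meta(meta))
```

Calling write_meta_file("myfile.jpg", meta) would create myfile.jpg.meta alongside the image, ready for import.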
We can then import the file using pshell, e.g.:
import myfile.jpg
Then examine the metadata:
file myfile.jpg
This displays all the metadata, including:
geoshape = None { type=point datum=WGS84 }
    point = None
        latitude = None { dd=10 }
            deg = 10
            min = 0
            sec = 0
        longitude = None { dd=20 }
            deg = 20
            min = 0
            sec = 0
        elevation = 100.0
...
meta = None { stime=60812 }
    {mytype} = None { id=2 }
        name = "the name"
        details = "some more information"
All of these items can then be queried, as described below.
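The latitude and longitude above appear both as decimal degrees (dd) and as deg/min/sec fields. Assuming those fields are the usual sexagesimal split of dd (the 10 and 20 values above are consistent with this), the Python sketch below shows the conversion; dd_to_dms is a hypothetical helper, not a Mediaflux function.

```python
def dd_to_dms(dd):
    """Split decimal degrees into (negative, degrees, minutes, seconds)."""
    negative = dd < 0
    # Convert to seconds of arc; rounding to 6 decimal places trims float noise.
    total_seconds = round(abs(dd) * 3600, 6)
    degrees = int(total_seconds // 3600)
    minutes = int((total_seconds % 3600) // 60)
    seconds = total_seconds % 60
    return negative, degrees, minutes, seconds


print(dd_to_dms(10.0))  # (False, 10, 0, 0.0)
print(dd_to_dms(20.0))  # (False, 20, 0, 0.0)
```

The sign is returned separately so that values between -1 and 0 degrees are not mistaken for positive ones.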
Simple asset queries
A useful Mediaflux command is asset.query, which performs a database search on asset metadata items, specified via their XML paths.
For example, to locate all files with a .PNG filename extension anywhere in the entire (visible) namespace hierarchy, use:
asset.query :where "name='*.PNG'"
Here, name refers to the same XML element that appears in the standard asset metadata output of asset.get.
To restrict the search to look only in /projects/Data Team, use:
asset.query :where "namespace='/projects/Data Team' and name='*.PNG'"
To search in /projects/Data Team and all folders underneath it, use:
asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'"
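The only difference between the last two queries is the namespace operator: = matches one namespace, while >= also matches everything below it. A minimal Python sketch that builds either form; name_query is a hypothetical helper, and it does not escape single quotes inside the namespace or pattern.

```python
def name_query(namespace, pattern, recursive=False):
    """Build an asset.query call matching `pattern` in `namespace`.

    recursive=False uses namespace='...' (that namespace only);
    recursive=True uses namespace>='...' (that namespace and everything below it).
    """
    op = ">=" if recursive else "="
    where = f"namespace{op}'{namespace}' and name='{pattern}'"
    return f'asset.query :where "{where}"'


print(name_query("/projects/Data Team", "*.PNG"))
print(name_query("/projects/Data Team", "*.PNG", recursive=True))
```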
To search for assets created on or after a certain date, use:
asset.query :where "namespace>='/projects/Data Team' and ctime>='01-Jan-2017'"
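The date in the ctime example is written as DD-Mon-YYYY. If dates are computed in a script (for example "the last 30 days"), Python's strftime can produce that form. This sketch assumes the default C locale, since %b gives locale-dependent month names; mf_date and created_since_query are hypothetical helpers.

```python
from datetime import date, timedelta


def mf_date(day):
    """Format a date as DD-Mon-YYYY, the style used in the ctime example above.

    %b is locale dependent; this assumes the default C locale (Jan, Feb, ...).
    """
    return day.strftime("%d-%b-%Y")


def created_since_query(namespace, day):
    """Build an asset.query for assets under `namespace` created on or after `day`."""
    where = f"namespace>='{namespace}' and ctime>='{mf_date(day)}'"
    return f'asset.query :where "{where}"'


print(created_since_query("/projects/Data Team", date(2017, 1, 1)))
# Assets created in the last 30 days (the output depends on today's date):
print(created_since_query("/projects/Data Team", date.today() - timedelta(days=30)))
```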
Similar searches could be done using a geometric bounding box if your data has geospatial information attached.
There is no equivalent database search for namespaces in Mediaflux. This is why, in pshell, ls * will return only files (assets) and not folders (namespaces) that match the pattern.
Advanced asset queries
Searching for custom metadata
Suppose your assets had some custom metadata attached to them that was described by the XML document mytype.
If one of the elements in mytype was location (which is of type string), then here is how to search for a particular value:
asset.query :where "xpath(mytype/location) = 'Perth' and namespace>='/projects/myproject'"
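The where clause above is simply two conditions joined with and. When queries are assembled in a script, it can help to build each condition separately and combine them. A minimal Python sketch; xpath_equals, namespace_under and where_all are hypothetical helpers that reproduce the clause syntax shown above and do not escape quotes.

```python
def xpath_equals(path, value):
    """Clause testing a custom metadata element, e.g. mytype/location."""
    return f"xpath({path}) = '{value}'"


def namespace_under(namespace):
    """Clause restricting results to a namespace and everything below it."""
    return f"namespace>='{namespace}'"


def where_all(*clauses):
    """Combine clauses so that every one must hold."""
    return " and ".join(clauses)


where = where_all(xpath_equals("mytype/location", "Perth"), namespace_under("/projects/myproject"))
print(f'asset.query :where "{where}"')
```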
Getting more information from the results
By default, asset.query returns only the unique asset identifier (id).
To get the name of the asset as well, use:
asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action get-name
To get arbitrary pieces of metadata, for example the name and checksum, use:
asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action get-value :xpath name :xpath content/csum
By default, Mediaflux labels every metadata item you request as value=. This can be confusing when retrieving multiple items.
To associate a better name with the returned metadata item, use something like:
asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action get-value :xpath -ename checksum content/csum
The presence of the -ename checksum attribute tells Mediaflux to return the result as checksum= instead of value=.
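When several items are requested, each :xpath can carry its own -ename label. The Python sketch below builds such a call from (label, xpath) pairs. It assumes that several labelled :xpath arguments can be combined in one get-value call, as the separate examples above suggest; get_value_command is a hypothetical helper.

```python
def get_value_command(where, fields):
    """Build an asset.query ... :action get-value call.

    `fields` is a list of (label, xpath) pairs; a label of None leaves the
    result reported as value=, otherwise -ename renames it.
    """
    parts = [f'asset.query :where "{where}" :action get-value']
    for label, xpath in fields:
        if label is None:
            parts.append(f":xpath {xpath}")
        else:
            parts.append(f":xpath -ename {label} {xpath}")
    return " ".join(parts)


pngs = "namespace>='/projects/Data Team' and name='*.PNG'"
print(get_value_command(pngs, [("name", "name"), ("checksum", "content/csum")]))
```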
Simple actions on the results
asset.query can also act directly on the results of a search.
For example, to count all PNG files stored in the entire Data Team project, or to sum their sizes, run respectively:
asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action count
asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action sum :xpath content/size
Here the action value (count or sum) determines what to do with the search result, and content/size is the XML location of the file-size that is stored in the metadata of every asset that matches the query.
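Conceptually, count returns the number of matching assets and sum adds up one metadata value across them. The Python sketch below is a local analogue over hypothetical in-memory records, not a Mediaflux call; it uses fnmatch.fnmatchcase for the *.PNG pattern, whose matching rules (such as case sensitivity) may differ from those Mediaflux applies.

```python
from fnmatch import fnmatchcase


def count_and_sum(assets, pattern, field="size"):
    """Local analogue of :action count and :action sum :xpath content/size.

    `assets` is a list of dicts standing in for asset metadata (hypothetical data).
    """
    matches = [a for a in assets if fnmatchcase(a["name"], pattern)]
    return len(matches), sum(a[field] for a in matches)


sample = [
    {"name": "a.PNG", "size": 100},
    {"name": "b.PNG", "size": 250},
    {"name": "c.jpg", "size": 75},
]
print(count_and_sum(sample, "*.PNG"))  # (2, 350)
```

Running the real query is usually preferable, since the server does the work without transferring per-asset metadata to the client.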
Arbitrary actions on the results
Mediaflux can also send the results of a query to another service call that takes asset identifiers as an input argument.
For example, to move all PNG files to a new location, /projects/Data Team/png-files, use:
asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action pipe :service -name asset.move < :namespace "/projects/Data Team/png-files" >
Here the action is to pipe the results to another service call, in this case asset.move, which takes the required destination namespace argument as a shorthand XML document.
Naturally, similar constructions could be used to simply delete these files instead:
asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action pipe :service -name asset.destroy
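Both piped commands share the same shape and differ only in the target service and its arguments. A minimal Python sketch that builds them; pipe_command is a hypothetical helper. It quotes every argument value so that a path containing a space, such as /projects/Data Team/png-files, is passed as a single argument.

```python
def pipe_command(where, service, args=None):
    """Build an asset.query that pipes every matching asset id to `service`.

    `args` holds extra arguments for the piped service; each value is quoted
    so that paths containing spaces stay one argument.
    """
    cmd = f'asset.query :where "{where}" :action pipe :service -name {service}'
    if args:
        inner = " ".join(f':{key} "{value}"' for key, value in args.items())
        cmd += f" < {inner} >"
    return cmd


pngs = "namespace>='/projects/Data Team' and name='*.PNG'"
print(pipe_command(pngs, "asset.move", {"namespace": "/projects/Data Team/png-files"}))
print(pipe_command(pngs, "asset.destroy"))
```

Printing a generated command before running it is a cheap way to confirm which service a query will pipe to, which matters most for asset.destroy.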
Namespace manipulation
Recall that assets and namespaces are quite different objects in Mediaflux. None of the above queries will help in managing namespaces.
To change the name of the sean namespace to sean2, use:
asset.namespace.rename :namespace "/projects/Data Team/sean" :name sean2
To move the namespace sean to a new parent location, use:
asset.namespace.move :namespace "/projects/Data Team/sean" :to "/projects/Data Team/admins"
To delete a namespace entirely, use:
asset.namespace.destroy :namespace "/projects/Data Team/sean"