pshell and Mediaflux


Overview

The pshell client can send native service calls to the underlying Mediaflux server.

This exposes more powerful features, at the cost of added complexity.

The first complication is that you have to give up the familiar idea of files and folders and instead work with assets and namespaces.

An asset is metadata plus content (the content being the file itself), and namespaces form a virtual hierarchy that holds assets.
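
As a rough illustration of how the file and folder view maps onto assets and namespaces, the Python sketch below splits a full path into its namespace part and its asset name. The path /projects/myproject/image.PNG and the helper name split_asset_path are hypothetical, chosen only for this example.

```python
import posixpath


def split_asset_path(path):
    """Split a full asset path into (namespace, asset name).

    The namespace plays the role of the folder and the asset name
    plays the role of the file name.
    """
    namespace, name = posixpath.split(path)
    return namespace, name


# Hypothetical path, used only for illustration.
namespace, name = split_asset_path("/projects/myproject/image.PNG")
print(namespace)  # /projects/myproject
print(name)       # image.PNG
```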


Asset metadata

Standard metadata

Mediaflux automatically extracts metadata from files when they are uploaded to the server and become assets.

The recommended approach is to pick a file format that Mediaflux natively supports in terms of content analysis/metadata extraction.

From http://www.arcitecta.com/Products/DataTypes

Geospatial file formats with geospatial metadata extraction include:

  • NITF, ECW, CIB, MrSID, DTED, LIDAR, CADRG and ERDAS IMG, JPG2000, GeoTIFF, GeoJPG, GeoPDF.

Other file formats with other metadata extraction include:

  • HDF, NetCDF, DICOM, BMP, GIF, JPG2000, JPEG, PCX, PNG, TIFF, DOC, DOCX, PDF, DPX.

To retrieve the default metadata for an asset, the following are equivalent:

asset.get :id "path=/projects/DMF-TEST/sample/10_2015-03-05_111059_DSC_0004.NEF"
asset.get :id 61080987
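
The two equivalent forms can be produced from a single helper when commands are generated by a script. This is a minimal Python sketch that only builds the command text; the function name asset_get_command is an assumption made for this example and is not part of pshell or Mediaflux.

```python
def asset_get_command(ref):
    """Build an asset.get call from a numeric id or a full asset path."""
    if isinstance(ref, int):
        # Numeric asset identifier, passed as-is.
        return f"asset.get :id {ref}"
    # Anything else is treated as a path and wrapped in the path= form.
    return f'asset.get :id "path={ref}"'


print(asset_get_command(61080987))
print(asset_get_command(
    "/projects/DMF-TEST/sample/10_2015-03-05_111059_DSC_0004.NEF"))
```

Each printed line is text that could be pasted at the pshell prompt.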

The structure (indenting) of the document displayed is important and indicates the XML path of the particular metadata item.

Most of the information (name, ctime, version, etc.) is standard file metadata. Custom metadata (if any) appears under the meta element.

Custom metadata

Custom metadata can be used if the files to be ingested are not recognised by default, but this requires a little more effort.

Suppose a custom metadata document template, called mytype, defines a string called location.

Here is how to set location to "Perth" using pshell:

asset.set :id 25480 :meta < :mytype < :location "Perth" > >
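
The angle brackets in the command above are the shorthand form of nested XML elements. As a sketch of how such arguments might be generated from a script, the Python below renders a nested dictionary into that shorthand. The helper names to_shorthand and asset_set_command are assumptions made for this example, and the sketch does not escape quote characters inside values.

```python
def to_shorthand(doc):
    """Render a nested dict as shorthand XML arguments.

    Nested dicts become ':name < ... >' and leaf values become
    ':name "value"'. Values containing double quotes are not handled.
    """
    parts = []
    for key, value in doc.items():
        if isinstance(value, dict):
            parts.append(f":{key} < {to_shorthand(value)} >")
        else:
            parts.append(f':{key} "{value}"')
    return " ".join(parts)


def asset_set_command(asset_id, meta):
    """Build an asset.set call that sets custom metadata on one asset."""
    args = to_shorthand({"meta": meta})
    return f"asset.set :id {asset_id} {args}"


print(asset_set_command(25480, {"mytype": {"location": "Perth"}}))
# asset.set :id 25480 :meta < :mytype < :location "Perth" > >
```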

Installation of the custom XML document mytype can only be done by Pawsey administrators.


Importing metadata

Metadata can also be attached automatically to files as they are uploaded.

This happens when uploads are performed with the import command instead of the usual put.

The import command looks for pairs of files named filename and filename.meta, and attempts to populate filename with the metadata from filename.meta.

For example, suppose we had two files:

myfile.jpg
myfile.jpg.meta


where myfile.jpg.meta contained the following:

[asset]
geoshape/point/latitude = "10.0"
geoshape/point/longitude = "20.0"
geoshape/point/elevation = "100.0"

[mytype]
name = "the name"
details = "some more information"


The asset section contains built-in "first class" asset metadata entries, whereas the mytype section contains the custom metadata.
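
When many files need companion .meta files, the text can be generated rather than written by hand. The Python sketch below renders the layout shown above from a dictionary of sections. The helper names render_meta and meta_name_for are assumptions for this example, and quoting every value simply mirrors the sample file; it is not a confirmed requirement of the format.

```python
def render_meta(sections):
    """Render {section: {key: value}} in the .meta layout shown above."""
    lines = []
    for section, entries in sections.items():
        lines.append(f"[{section}]")
        for key, value in entries.items():
            # Values are quoted to mirror the sample file (an assumption).
            lines.append(f'{key} = "{value}"')
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def meta_name_for(filename):
    """Name of the companion metadata file for a given data file."""
    return filename + ".meta"


text = render_meta({
    "asset": {
        "geoshape/point/latitude": "10.0",
        "geoshape/point/longitude": "20.0",
        "geoshape/point/elevation": "100.0",
    },
    "mytype": {
        "name": "the name",
        "details": "some more information",
    },
})
print(meta_name_for("myfile.jpg"))  # myfile.jpg.meta
print(text)
```

Writing the returned text to the name given by meta_name_for, next to the data file, produces the pair of files described above.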


We can then import the file using pshell, for example:

import myfile.jpg

Then examine the metadata:

file myfile.jpg

This displays all the metadata, including:

    geoshape = None    { type=point datum=WGS84 }
        point = None
            latitude = None    { dd=10 }
                deg = 10
                min = 0
                sec = 0
            longitude = None    { dd=20 }
                deg = 20
                min = 0
                sec = 0
            elevation = 100.0
...
    meta = None    { stime=60812 }
      {mytype} = None    { id=2 }
          name = "the name"
          details = "some more information"

These metadata items can then be queried, as described below.

Simple asset queries

A useful Mediaflux command is asset.query, which performs a database search on asset metadata items specified by their XML path.

For example, to locate all files with a .PNG filename extension anywhere in the (visible) namespace hierarchy, use:

asset.query :where "name='*.PNG'"

The metadata item name used here is the same XML element that appears in the standard asset metadata output of asset.get.


To restrict the search to look only in /projects/myproject, use:

asset.query :where "namespace='/projects/myproject' and name='*.PNG'"


If you wish to search /projects/myproject and all folders underneath it, use:

asset.query :namespace "/projects/myproject" :where "name='*.PNG'"


Searching for all assets that were created after a certain date would be done as follows:

asset.query :namespace "/projects/myproject" :where "ctime>='01-Jan-2017'"
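
If the cut-off date is computed by a script, the query text can be generated from a date object. The Python sketch below assumes that the day-month-year form used in the example above (01-Jan-2017) is the format to produce; the helper names mf_date and created_since_query are hypothetical. Month abbreviations are listed explicitly so that the output does not depend on the system locale.

```python
import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def mf_date(d):
    """Format a date as DD-Mon-YYYY, mirroring the example query."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


def created_since_query(namespace, since):
    """Build a query for assets created on or after the given date."""
    where = f"ctime>='{mf_date(since)}'"
    return f'asset.query :namespace "{namespace}" :where "{where}"'


print(created_since_query("/projects/myproject", datetime.date(2017, 1, 1)))
# asset.query :namespace "/projects/myproject" :where "ctime>='01-Jan-2017'"
```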

Similar searches could be done using a geometric bounding box if your data has geospatial information attached.

There is no equivalent database search for namespaces in Mediaflux. This is why, in pshell, ls * will return only files (assets) and not folders (namespaces) that match the pattern.


Advanced asset queries

Searching for custom metadata

Suppose your assets had some custom metadata attached to them that was described by the XML document mytype.

If one of the elements in mytype was location (which is of type string), then here is how to search for a particular value:

asset.query :namespace "/projects/myproject" :where "xpath(mytype/location)='Perth'"
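
Conditions on custom metadata can be combined with the standard ones using and, as in the earlier namespace example. The following Python sketch composes such where clauses; the helper names name_matches, xpath_equals and build_query are assumptions for this example, and values containing quote characters are not escaped.

```python
def name_matches(pattern):
    """Condition on the standard name element, e.g. name='*.PNG'."""
    return f"name='{pattern}'"


def xpath_equals(path, value):
    """Condition on a metadata element addressed by its XML path."""
    return f"xpath({path})='{value}'"


def build_query(namespace, *conditions):
    """Join the conditions with 'and' and build the asset.query call."""
    where = " and ".join(conditions)
    return f'asset.query :namespace "{namespace}" :where "{where}"'


print(build_query("/projects/myproject",
                  xpath_equals("mytype/location", "Perth")))
print(build_query("/projects/myproject",
                  name_matches("*.PNG"),
                  xpath_equals("mytype/location", "Perth")))
```

The first printed line reproduces the query above; the second narrows it to PNG files only.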


Getting more information from the results

By default, asset.query returns only the unique asset identifier (id).

To get the name of the asset as well, use:

asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action get-name

To get arbitrary pieces of metadata, for example the name and checksum, use:

asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action get-value :xpath name :xpath content/csum

By default, Mediaflux labels every requested metadata item as value=, which becomes hard to read when retrieving multiple items.

To associate a better name with the returned metadata item, use something like:

asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action get-value :xpath -ename checksum content/csum

The presence of the -ename checksum attribute tells Mediaflux to return the result as checksum= instead of value=.
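
When several items are requested, giving each one its own label keeps the output readable. As a sketch, the Python below builds the :xpath arguments from a list of (XML path, label) pairs, where a label of None leaves the default value= in place. The function name get_value_query is an assumption made for this example.

```python
def get_value_query(namespace, where, fields):
    """Build a get-value query from (xpath, label) pairs.

    A label of None requests the item without -ename, so it is
    returned under the default value= label.
    """
    args = []
    for xpath, label in fields:
        if label:
            args.append(f":xpath -ename {label} {xpath}")
        else:
            args.append(f":xpath {xpath}")
    head = f'asset.query :namespace "{namespace}" :where "{where}"'
    return f"{head} :action get-value " + " ".join(args)


print(get_value_query("/projects/myproject", "name='*.PNG'",
                      [("name", "filename"), ("content/csum", "checksum")]))
```

Here filename is an arbitrary label chosen for the example; any label that helps distinguish the returned items could be used.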

Simple actions on the results

The query function also has the ability to use the results of a search in certain ways.

For example, to count all PNG files and then to sum their sizes you would run, respectively:

asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action count
asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action sum :xpath content/size

Here the action value (count or sum) determines what to do with the search result, and content/size is the XML location of the file-size that is stored in the metadata of every asset that matches the query.
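
To make the two actions concrete, here is a local Python analogy of what they compute. The records are invented stand-ins for the metadata of three matching assets, and lookup is a hypothetical helper that follows an XML-style path through nested dictionaries. Nothing in this sketch talks to a server.

```python
# Invented metadata for three matching assets (illustration only).
records = [
    {"name": "a.PNG", "content": {"size": 1024}},
    {"name": "b.PNG", "content": {"size": 2048}},
    {"name": "c.PNG", "content": {"size": 512}},
]


def lookup(record, xpath):
    """Follow a slash-separated path such as content/size."""
    node = record
    for part in xpath.split("/"):
        node = node[part]
    return node


# Analogue of ':action count': how many assets matched.
count = len(records)

# Analogue of ':action sum :xpath content/size': total of one item.
total = sum(lookup(r, "content/size") for r in records)

print(count)  # 3
print(total)  # 3584
```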

Arbitrary actions on the results

Mediaflux can also send the results of a query to another service call that takes asset identifiers as an input argument.

For example, to move all PNG files to a new location, /projects/myproject/png-files, we would use:

asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action pipe :service -name asset.move < :namespace /projects/myproject/png-files >

Here our action is to pipe the results to another service call, in this case an asset.move, which is a service call that takes the required destination namespace argument in a shorthand XML document.

Naturally, similar constructions could be used to simply delete these files instead:

asset.query :namespace "/projects/myproject" :where "name='*.PNG'" :action pipe :service -name asset.destroy
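
Because a piped action applies to every asset the query matches, it can be useful to check the size of the match first. The Python sketch below builds two commands from the same namespace and where clause: a preview that uses the count action described earlier, and the pipe itself. The function name pipe_commands is an assumption made for this example.

```python
def pipe_commands(namespace, where, service, service_args=""):
    """Return (preview, command) for piping query results to a service.

    The preview counts the matching assets; the command pipes them to
    the named service, with optional shorthand XML arguments.
    """
    base = f'asset.query :namespace "{namespace}" :where "{where}"'
    preview = f"{base} :action count"
    command = f"{base} :action pipe :service -name {service}"
    if service_args:
        command += f" < {service_args} >"
    return preview, command


preview, move = pipe_commands(
    "/projects/myproject", "name='*.PNG'",
    "asset.move", ":namespace /projects/myproject/png-files")
print(preview)
print(move)

_, destroy = pipe_commands("/projects/myproject", "name='*.PNG'",
                           "asset.destroy")
print(destroy)
```

Running the preview before the piped command shows how many assets would be moved or destroyed.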


Namespace manipulation

Recall that assets and namespaces are quite different objects in Mediaflux. None of the above queries will help in managing namespaces.

To change the name of the folder1 namespace to folder2 use:

asset.namespace.rename :namespace /projects/myproject/folder1 :name folder2


To move the namespace folder1 to a new parent location use:

asset.namespace.move :namespace /projects/myproject/folder1 :to /projects/myproject/newparent


To delete a namespace entirely, use:

asset.namespace.destroy :namespace /projects/myproject/folder1
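
The difference between renaming and moving is easiest to see from the resulting paths. The Python sketch below computes the path a namespace would be expected to have after each operation, assuming that rename replaces only the last path component and that move places the namespace, with its name unchanged, under the new parent. The helper names renamed and moved are hypothetical, and the sketch only manipulates strings.

```python
import posixpath


def renamed(namespace, new_name):
    """Expected path after a rename: same parent, new last component."""
    return posixpath.join(posixpath.dirname(namespace), new_name)


def moved(namespace, new_parent):
    """Expected path after a move: new parent, same last component."""
    return posixpath.join(new_parent, posixpath.basename(namespace))


print(renamed("/projects/myproject/folder1", "folder2"))
# /projects/myproject/folder2
print(moved("/projects/myproject/folder1", "/projects/myproject/newparent"))
# /projects/myproject/newparent/folder1
```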