pshell and Mediaflux

Overview

The pshell client can send native service calls to the underlying Mediaflux server.

This exposes more powerful features, at the cost of added complexity.

The main conceptual change is that the familiar idea of files and folders gives way to assets and namespaces.

An asset is metadata plus content (the content is the file itself), and namespaces form a virtual hierarchy of assets.


Authentication and Delegation

If you do a lot of scripting against a remote Mediaflux server, it is undesirable to scatter your login details across many plain-text files. This is where delegation becomes useful. Delegate credentials are binary tokens that reside in the home directory of the user who creates them. They automatically perform a login on your behalf when you interact with Pawsey data storage. Here is an example of creating a delegate credential that lasts for approximately one month:

pawsey:/projects>delegate 30
Delegating until: 01-Jun-2016 14:13:15
 
pawsey:/projects>exit

From now on, until the delegate expires, there is no need to log in to data storage on the machine where this was run. Any pshell session started on that machine will automatically use the delegate to log in.
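When scheduling a renewal, it can help to predict when a delegate will lapse. The following is a minimal sketch that assumes the lifetime is counted as whole days from the moment of creation, which is consistent with the example output above but is not confirmed here. The function name and the creation time are hypothetical.

```python
from datetime import datetime, timedelta

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def delegate_expiry(created: datetime, days: int) -> str:
    """Return the expected expiry, laid out like the 'Delegating until:' line."""
    # Assumption: expiry = creation time + the requested number of days.
    expires = created + timedelta(days=days)
    return (f"{expires.day:02d}-{MONTHS[expires.month - 1]}-{expires.year} "
            f"{expires:%H:%M:%S}")

# Hypothetical creation time, chosen so that 30 days later matches the example.
print(delegate_expiry(datetime(2016, 5, 2, 14, 13, 15), 30))
```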

Finally, to destroy all delegate credentials, use:

pawsey:/projects>delegate off
 Delegate credentials removed.

Renewing the delegate token before it expires

You cannot create a new token while you are using an old token. So if you want to renew a token before it expires, you'll first need to delete the old token (use the "delegate off" command) and then log into the system using your username and password (use the "login" command). Once logged in like this, you can create a new delegate token as above.


Asset metadata

Standard metadata

Mediaflux automatically extracts metadata from files when they are uploaded to the server and become assets.

The recommended approach is to pick a file format that Mediaflux natively supports in terms of content analysis/metadata extraction.

The following lists are taken from http://www.arcitecta.com/Products/DataTypes

Geospatial file formats with geospatial metadata extraction include:

  • NITF, ECW, CIB, MrSID, DTED, LIDAR, CADRG and ERDAS IMG, JPG2000, GeoTIFF, GeoJPG, GeoPDF.

Other file formats with other metadata extraction include:

  • HDF, NetCDF, DICOM, BMP, GIF, JPG2000, JPEG, PCX, PNG, TIFF, DOC, DOCX, PDF, DPX.

To retrieve the default metadata for an asset, the following are equivalent:

asset.get :id "path=/projects/DMF-TEST/sample/10_2015-03-05_111059_DSC_0004.NEF"
asset.get :id 61080987

The structure (indenting) of the document displayed is important and indicates the XML path of the particular metadata item.

Most of the information (name, ctime, version, etc.) is standard file metadata. Custom metadata (if any) appears under the meta element.

Custom metadata

Custom metadata can be used if the files to be ingested are not recognised by default, but requires a little more effort.

Suppose a custom metadata document template, called mytype, defines a string called location.

Here is how to set location to "Perth" on the asset with id 25480 using pshell:

asset.set :id 25480 :meta < :mytype < :location "Perth" > >

Installation of the custom XML document mytype can only be done by Pawsey administrators.
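When many assets need the same kind of custom metadata, the asset.set line can be generated rather than typed. The sketch below only builds the command text in the shorthand shown above. The function name is hypothetical, and it assumes that values contain no double quotes, since the escaping rules are not covered here.

```python
def build_asset_set(asset_id: int, doc_type: str, fields: dict) -> str:
    """Build an asset.set command that fills one custom metadata document."""
    parts = []
    for element, value in fields.items():
        text = str(value)
        if '"' in text:
            # Assumption: quote escaping is out of scope, so refuse such values.
            raise ValueError(f"value for {element} contains a double quote")
        parts.append(f':{element} "{text}"')
    body = " ".join(parts)
    return f"asset.set :id {asset_id} :meta < :{doc_type} < {body} > >"

print(build_asset_set(25480, "mytype", {"location": "Perth"}))
```

The printed line can then be pasted into a pshell session.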


Importing metadata

Metadata can also be attached automatically as files are uploaded.

This happens when uploads are performed with the import command instead of the usual put.

import looks for pairs of files named filename and filename.meta and attempts to populate the asset created from filename with the metadata in filename.meta.

For example, suppose we had two files:

myfile.jpg
myfile.jpg.meta


where myfile.jpg.meta contained the following:

[asset]
geoshape/point/latitude = "10.0"
geoshape/point/longitude = "20.0"
geoshape/point/elevation = "100.0"

[mytype]
name = "the name"
details = "some more information"


The asset section contains built-in "first class" asset metadata entries, whereas the mytype section contains the custom metadata.
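For large uploads the .meta files are usually generated by a script. The following sketch renders text in the layout shown above from a nested dictionary. The function name is hypothetical, and it assumes every value is written as a double-quoted string, as in the example.

```python
def render_meta(sections: dict) -> str:
    """Render {section: {key: value}} as the text of a .meta file."""
    lines = []
    for section, entries in sections.items():
        lines.append(f"[{section}]")
        for key, value in entries.items():
            lines.append(f'{key} = "{value}"')
        lines.append("")  # blank line after each section
    return "\n".join(lines)

text = render_meta({
    "asset": {
        "geoshape/point/latitude": "10.0",
        "geoshape/point/longitude": "20.0",
        "geoshape/point/elevation": "100.0",
    },
    "mytype": {"name": "the name", "details": "some more information"},
})
print(text)
```

Writing the result to myfile.jpg.meta next to myfile.jpg would give the pair of files described above.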


We can then import the file using pshell, e.g.:

import myfile.jpg

Then examine the metadata:

file myfile.jpg

This displays all the metadata, including:

    geoshape = None    { type=point datum=WGS84 }
        point = None
            latitude = None    { dd=10 }
                deg = 10
                min = 0
                sec = 0
            longitude = None    { dd=20 }
                deg = 20
                min = 0
                sec = 0
            elevation = 100.0
...
    meta = None    { stime=60812 }
      {mytype} = None    { id=2 }
          name = "the name"
          details = "some more information"

This metadata can be queried, as described below.

Simple asset queries

A useful Mediaflux command is asset.query, which performs a database search on asset metadata items specified by their XML path.

For example, to locate all files with a .PNG filename extension anywhere in the entire (visible) namespace hierarchy, use:

asset.query :where "name='*.PNG'"

The metadata item name used here is the same XML element that appears in the standard asset metadata output of asset.get.


To restrict the search to look only in /projects/Data Team, use:

asset.query :where "namespace='/projects/Data Team' and name='*.PNG'"


If you wish to search /projects/Data Team and all folders underneath it, use:

asset.query :namespace "/projects/Data Team" :where "name='*.PNG'"


Searching for all assets that were created after a certain date would be done as follows:

asset.query :namespace "/projects/Data Team" :where "ctime>='01-Jan-2017'"
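In a script, the date literal is usually derived from a date object rather than typed by hand. The sketch below formats a date in the dd-Mon-yyyy layout used above and builds the matching query text. Both function names are hypothetical, and the month abbreviations are hard-coded so the result does not depend on the system locale.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

def mf_date(d: date) -> str:
    """Format a date like the literal in the example, e.g. 01-Jan-2017."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"

def created_since_query(namespace: str, since: date) -> str:
    """Build an asset.query for assets created on or after `since`."""
    where = f"ctime>='{mf_date(since)}'"
    return f'asset.query :namespace "{namespace}" :where "{where}"'

print(created_since_query("/projects/Data Team", date(2017, 1, 1)))
```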

Similar searches could be done using a geometric bounding box if your data has geospatial information attached.

There is no equivalent database search for namespaces in Mediaflux. This is why, in pshell, ls * will return only files (assets) and not folders (namespaces) that match the pattern.


Advanced asset queries

Searching for custom metadata

Suppose your assets had some custom metadata attached to them that was described by the XML document mytype.

If one of the elements in mytype was location (which is of type string), then here is how to search for a particular value:

asset.query :where "xpath(mytype/location) = 'Perth' and namespace>='/projects/myproject'"
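Conditions like these are joined with and, so a where clause can be assembled from separate pieces. The following is a minimal sketch with hypothetical helper names. It assumes values contain no quote characters, because quote escaping is not described here.

```python
def quote(value: str) -> str:
    """Wrap a value in single quotes, refusing values that contain quotes."""
    if "'" in value or '"' in value:
        raise ValueError(f"cannot safely quote {value!r}")
    return f"'{value}'"

def custom_metadata_where(doc_type: str, element: str,
                          value: str, namespace: str) -> str:
    """Match a custom metadata value at or below the given namespace."""
    clauses = [
        f"xpath({doc_type}/{element}) = {quote(value)}",
        f"namespace>={quote(namespace)}",
    ]
    return " and ".join(clauses)

where = custom_metadata_where("mytype", "location", "Perth", "/projects/myproject")
print(f'asset.query :where "{where}"')
```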


Getting more information from the results

By default, asset.query returns only the unique asset identifier (id).

To get the name of the asset as well, use:

asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action get-name


To get arbitrary pieces of metadata, for example the name and checksum, use:

asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action get-value :xpath name :xpath content/csum


By default, Mediaflux labels every requested metadata item as value=, which makes the output hard to read when retrieving multiple items.

To associate a better name with the returned metadata item, use something like:

asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action get-value :xpath -ename checksum content/csum

The -ename checksum attribute tells Mediaflux to return the result as checksum= instead of value=.
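When several items are requested, each :xpath argument can carry its own -ename label. The sketch below generates such a command from a mapping of labels to XML paths. The function name is hypothetical, and only the command text is produced.

```python
def get_value_query(where: str, xpaths: dict) -> str:
    """Build an asset.query that returns each XML path under its own label."""
    args = " ".join(
        f":xpath -ename {label} {path}" for label, path in xpaths.items()
    )
    return f'asset.query :where "{where}" :action get-value {args}'

where = "namespace>='/projects/Data Team' and name='*.PNG'"
print(get_value_query(where, {"name": "name", "checksum": "content/csum"}))
```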

Simple actions on the results

asset.query can also apply a simple aggregate action to the search results instead of listing them.

For example, to count all PNG files stored in the entire Data Team project or to sum their sizes you would run, respectively:

asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action count
asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action sum :xpath content/size

Here the action value (count or sum) determines what to do with the search result, and content/size is the XML location of the file size that is stored in the metadata of every asset that matches the query.
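As a local analogy only, the two actions correspond to counting the matching records and adding up one field across them. The sketch below does this over a made-up list of records. It is not how Mediaflux evaluates the query, and it assumes case-sensitive name matching, which is not stated here.

```python
from fnmatch import fnmatchcase

# Made-up records standing in for assets; size plays the role of content/size.
assets = [
    {"namespace": "/projects/Data Team", "name": "plot.PNG", "size": 1200},
    {"namespace": "/projects/Data Team/sean", "name": "scan.PNG", "size": 800},
    {"namespace": "/projects/Data Team/sean", "name": "notes.txt", "size": 50},
    {"namespace": "/projects/Other", "name": "logo.PNG", "size": 999},
]

def matches(asset: dict, root: str, pattern: str) -> bool:
    """True if the asset sits at or below root and its name fits the pattern."""
    ns = asset["namespace"]
    in_tree = ns == root or ns.startswith(root + "/")
    return in_tree and fnmatchcase(asset["name"], pattern)

selected = [a for a in assets if matches(a, "/projects/Data Team", "*.PNG")]
count = len(selected)                     # analogue of :action count
total = sum(a["size"] for a in selected)  # analogue of :action sum
print(count, total)
```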

Arbitrary actions on the results 

Mediaflux can also send the results of a query to another service call that takes asset identifiers as an input argument.

For example, to move all PNG files to a new location, /projects/Data Team/png-files, we would use:

asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action pipe :service -name asset.move < :namespace "/projects/Data Team/png-files" >

Here our action is to pipe the results to another service call, in this case an asset.move, which is a service call that takes the required destination namespace argument in a shorthand XML document.


Naturally, similar constructions could be used to simply delete these files instead:

asset.query :where "namespace>='/projects/Data Team' and name='*.PNG'" :action pipe :service -name asset.destroy
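Because a piped asset.destroy acts on every match, it is prudent to run the same search with the count action first. The sketch below derives both commands from a single where clause so they cannot drift apart. The function name is hypothetical, and it produces command text only.

```python
def preview_then_pipe(where: str, service: str) -> tuple:
    """Return (count command, pipe command) sharing the same where clause."""
    base = f'asset.query :where "{where}"'
    count_cmd = f"{base} :action count"
    pipe_cmd = f"{base} :action pipe :service -name {service}"
    return count_cmd, pipe_cmd

where = "namespace>='/projects/Data Team' and name='*.PNG'"
count_cmd, destroy_cmd = preview_then_pipe(where, "asset.destroy")
print(count_cmd)    # run this first and check the number of matches
print(destroy_cmd)  # run this only if the count looks right
```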


Namespace manipulation

Recall that assets and namespaces are quite different objects in Mediaflux. None of the above queries will help in managing namespaces.

To change the name of the sean namespace to sean2 use:

asset.namespace.rename :namespace "/projects/Data Team/sean" :name sean2


To move the namespace sean to a new parent location use:

asset.namespace.move :namespace "/projects/Data Team/sean" :to "/projects/Data Team/admins"


To delete a namespace entirely, use:

asset.namespace.destroy :namespace "/projects/Data Team/sean"
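Scripts that rename or move namespaces often need to know the resulting path for later commands. The sketch below computes it locally. The function names are hypothetical, and it assumes that a moved namespace keeps its own name under the new parent, as the wording above suggests.

```python
import posixpath

def renamed_path(namespace: str, new_name: str) -> str:
    """Path of a namespace after renaming it within the same parent."""
    return posixpath.join(posixpath.dirname(namespace), new_name)

def moved_path(namespace: str, new_parent: str) -> str:
    """Path of a namespace after moving it under a new parent."""
    # Assumption: the namespace keeps its own name under the new parent.
    return posixpath.join(new_parent, posixpath.basename(namespace))

print(renamed_path("/projects/Data Team/sean", "sean2"))
print(moved_path("/projects/Data Team/sean", "/projects/Data Team/admins"))
```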





