pshell and metadata

Mediaflux concepts

Mediaflux (ref) is the underlying storage platform that pshell communicates with.
Instead of the files and folders shown in pshell, Mediaflux has:
  • assets - the file content and its associated metadata
  • namespaces - the remote folder structure on the Mediaflux server

Mediaflux metadata:
  • is XML-based
  • has its own query language

The main commands, operating on assets and asset metadata, are:
  • asset.get - display metadata for an asset
  • asset.set - alter metadata for an asset
  • asset.query - find assets based on metadata queries

Mediaflux commands

These can be run in pshell or in the vendor's own client (aterm.jar).

Retrieving metadata

The general command is:

asset.get :id ASSET-ID
asset.get :id "path=FULL-PATH-TO-FILE"


To extract a specific piece of information, such as the checksum, use the optional :xpath argument:

pawsey:/projects/Data Team/testfiles>asset.get :id 69776098 :xpath content/csum
value = 384461DE
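
To make the role of the xpath argument concrete, here is a minimal Python sketch that applies the same relative path to a small XML fragment. The fragment is hypothetical: it contains only elements named elsewhere on this page (name, content/csum, content/size), and the name and size values are placeholders, not real server output. The helper get_value is likewise an illustration, not part of pshell or Mediaflux.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed asset document.
# Real asset.get output contains many more elements.
ASSET_XML = """
<asset id="69776098">
    <name>placeholder.jpg</name>
    <content>
        <csum>384461DE</csum>
        <size>0</size>
    </content>
</asset>
"""


def get_value(asset_xml: str, xpath: str):
    """Return the text found at a relative path such as 'content/csum'.

    Returns None when no element matches the path.
    """
    root = ET.fromstring(asset_xml)
    node = root.find(xpath)
    return None if node is None else node.text


print(get_value(ASSET_XML, "content/csum"))  # prints 384461DE
```

The path is relative to the asset element, which is why content/csum (and not asset/content/csum) selects the checksum.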


Populating metadata

The general command is:


asset.set :id ASSET-ID <ELEMENT AND VALUE>


For example, this is equivalent to a rename:


asset.set :id 69776160 :name "new filename"
version = 2    { changed-or-created=true stime=74452766 }

Note: a new version of the asset is created (version = 2 in the output above), reflecting the fact that the name has been altered.


A more complex example, where we assign a geospatial location (in this case a point) to an asset:

asset.set :id 69776160 :geoshape < :point < :latitude -31.95 :longitude 115.86 :elevation 10.0 > >
version = 3    { changed-or-created=true stime=74452768 }

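This last case is an example of specifying an XML document on the command line: each :element names a metadata item, and < ... > encloses its child elements, so the argument gives both the xpath and the value of each item.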

It is equivalent to an XML metadata document that looks like this:

<geoshape>
	<point>
		<latitude> -31.95 </latitude>
		<longitude> 115.86 </longitude>
		<elevation> 10.0 </elevation>
	</point>
</geoshape>
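
The correspondence between the two forms can be sketched in Python. The functions below are hypothetical helpers (not part of pshell or Mediaflux) that render one nested dictionary both as the command-line shorthand and as XML. String values are always wrapped in double quotes, following the quoted examples on this page; embedded quotes are not escaped, so treat this as an illustration only.

```python
import xml.etree.ElementTree as ET


def to_shorthand(doc: dict) -> str:
    """Render a nested dict in the ':element < ... >' command-line form."""
    parts = []
    for key, value in doc.items():
        if isinstance(value, dict):
            parts.append(f":{key} < {to_shorthand(value)} >")
        elif isinstance(value, str):
            parts.append(f':{key} "{value}"')
        else:
            parts.append(f":{key} {value}")
    return " ".join(parts)


def _element(tag: str, value) -> ET.Element:
    """Build an XML element, recursing into nested dicts."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            elem.append(_element(child_tag, child_value))
    else:
        elem.text = str(value)
    return elem


def to_xml(doc: dict) -> str:
    """Render a dict with a single top-level key as an XML string."""
    ((tag, value),) = doc.items()
    return ET.tostring(_element(tag, value), encoding="unicode")


geoshape = {
    "geoshape": {
        "point": {"latitude": -31.95, "longitude": 115.86, "elevation": 10.0}
    }
}

print("asset.set :id 69776160 " + to_shorthand(geoshape))
print(to_xml(geoshape))
```

The first line printed reproduces the asset.set command shown above; the second is the same document as compact XML without indentation.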


Slight digression - metadata templates

A doc type is a metadata template: it shows the metadata that can be queried and specified.

Asset is a special "first-order" doc type that is attached to everything. Custom doc types can also be made, and their metadata is added under the <meta> element.

The asset doc type can be described with:

asset.doc.type.describe :type asset

Items in the asset template, such as the asset name and the geoshape, are first-order metadata items and are handled with asset.get and asset.set as described above.

Custom metadata templates are slightly different: they sit under the <meta> element of the asset rather than at the top level. For example:

pawsey:/projects>asset.doc.type.describe :type csiro:seismic
...
    definition = None
        element = None    { type=string min-occurs=0 name=name }
        element = None    { type=string min-occurs=0 name=geometry }
        element = None    { type=string min-occurs=0 name=basin }
        element = None    { type=string min-occurs=0 name=sub-basin }
        element = None    { type=string min-occurs=0 name=data-type }
        element = None    { type=string min-occurs=0 name=vertical-scale }
        element = None    { type=string min-occurs=0 name=project }


Applying this template to a piece of data could be achieved as follows:

pawsey:/projects>asset.set :id 69776160 :meta < :csiro:seismic < :name "Perth" :geometry "sprawling" > >
version = 5    { changed-or-created=true stime=74452772 }

pawsey:/projects>asset.get :id 69776160 :xpath meta/csiro:seismic
value = Perth
value = sprawling

pawsey:/projects>asset.get :id 69776160 :xpath meta/csiro:seismic/name
value = Perth


Querying the metadata

The simple form of an asset query is:

asset.query :where "LOGICAL-EXPRESSION"


Here is an example of searching for assets in a directory that match a pattern:

pawsey:/projects>asset.query :where "namespace='/projects/Demo/sean' and (name='*.JPG' or name='*007*')" 
id = 69776131    { version=1 }
id = 69776132    { version=1 }
id = 69776133    { version=1 }
id = 69776134    { version=1 }
id = 69776137    { version=1 }
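
When queries are generated from a script, the :where string can be assembled programmatically. The sketch below is a hypothetical Python helper (where_clause and its parameters are not part of pshell or Mediaflux). The recursive flag switches to the namespace>= form used in later examples on this page, which is assumed here to also match sub-namespaces. Quotes inside values are not escaped.

```python
def where_clause(namespace: str, name_patterns: list, recursive: bool = False) -> str:
    """Build a :where expression from a namespace and one or more name patterns."""
    # '=' restricts to the namespace itself; '>=' is assumed to include sub-namespaces.
    operator = ">=" if recursive else "="
    names = " or ".join(f"name='{pattern}'" for pattern in name_patterns)
    if len(name_patterns) > 1:
        # Parenthesise alternatives so 'and' applies to the whole group.
        names = f"({names})"
    return f"namespace{operator}'{namespace}' and {names}"


expression = where_clause("/projects/Demo/sean", ["*.JPG", "*007*"])
print(f'asset.query :where "{expression}"')
```

This prints the same command as the example above, which makes it easy to check generated queries against ones that are known to work.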


Note that, for this folder, the query returns the same five assets as the pshell "simplified language" command:

pawsey:/projects/Demo/sean>ls *007*.jpg
5 items, 29 items per page, remote folder: /projects/Demo/sean
 69776131   | online  |  17.02 KB | i000765.jpg
 69776132   | online  |  17.21 KB | i000767.jpg
 69776133   | online  |     17 KB | i000769.jpg
 69776134   | online  |  19.09 KB | i000768.jpg
 69776137   | online  |  17.59 KB | i000766.jpg
Page 1 of 1, file filter ['*007*.jpg']: 


However, using the Mediaflux query language directly allows more sophisticated tasks to be performed.

Here's how we could retrieve the checksums of the above matches:

pawsey:/projects>asset.query :where "namespace>='/projects/Demo/sean' and name='*007*.jpg'" :action get-value :xpath content/csum
asset = None    { version=1 id=69776131 }
    value = 561CDB00
asset = None    { version=1 id=69776132 }
    value = F517536B
asset = None    { version=1 id=69776133 }
    value = 384461DE
asset = None    { version=1 id=69776134 }
    value = 35189B08
asset = None    { version=1 id=69776137 }
    value = 1D206A24
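
If the checksums are needed in a script, for example to compare against a local manifest, the text output can be parsed. The Python sketch below assumes the output layout is exactly the one shown above (an "asset = ..." line carrying id=..., followed by an indented "value = ..." line); parse_get_value is a hypothetical helper, and other pshell versions may format the output differently.

```python
import re

# Output copied from the asset.query example above.
OUTPUT = """\
asset = None    { version=1 id=69776131 }
    value = 561CDB00
asset = None    { version=1 id=69776132 }
    value = F517536B
asset = None    { version=1 id=69776133 }
    value = 384461DE
asset = None    { version=1 id=69776134 }
    value = 35189B08
asset = None    { version=1 id=69776137 }
    value = 1D206A24
"""


def parse_get_value(text: str) -> dict:
    """Map each asset id to the value printed on the line after it."""
    values = {}
    current_id = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("asset ="):
            # \b stops 'vid=' from being mistaken for 'id='.
            match = re.search(r"\bid=(\d+)", stripped)
            current_id = match.group(1) if match else None
        elif stripped.startswith("value =") and current_id is not None:
            values[current_id] = stripped.split("=", 1)[1].strip()
    return values


checksums = parse_get_value(OUTPUT)
for asset_id, csum in checksums.items():
    print(asset_id, csum)
```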


Here's how we could sum the content sizes of the files in the above match:

pawsey:/projects>asset.query :where "namespace>='/projects/Demo/sean' and name='*007*.jpg'" :action sum :xpath content/size
value = 87902    { nbe=5 }
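
As a sanity check, this total can be compared with the per-file sizes in the earlier ls listing. The Python sketch below assumes the listing rounds each size to two decimal places, so five files can differ from the exact total by at most 0.025 KB. Under that assumption the figures are consistent with 1 KB = 1000 bytes rather than 1024; this is an inference from the numbers on this page, not a documented property of pshell.

```python
from decimal import Decimal

# Sizes in KB as printed by 'ls' earlier on this page.
LISTED_KB = ["17.02", "17.21", "17", "19.09", "17.59"]
# Total in bytes as returned by ':action sum :xpath content/size'.
TOTAL_BYTES = 87902

listed_total = sum(Decimal(size) for size in LISTED_KB)
decimal_kb = Decimal(TOTAL_BYTES) / 1000  # 1 KB = 1000 bytes
binary_kb = Decimal(TOTAL_BYTES) / 1024   # 1 KB = 1024 bytes

# Worst-case rounding error: half of the last printed digit per file.
tolerance = Decimal("0.005") * len(LISTED_KB)

print("listed total (KB):", listed_total)
print("matches 1000-byte KB:", abs(listed_total - decimal_kb) <= tolerance)
print("matches 1024-byte KB:", abs(listed_total - binary_kb) <= tolerance)
```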


We could also move those files somewhere else by piping each match to the asset.move service:

pawsey:/projects/Demo/sean>mkdir newfolder

pawsey:/projects/Demo/sean>asset.query :where "namespace>='/projects/Demo/sean' and name='*007*.jpg'" :action pipe :service -name asset.move < :namespace /projects/Demo/sean/newfolder >

pawsey:/projects/Demo/sean>cd newfolder/
Remote: /projects/Demo/sean/newfolder

pawsey:/projects/Demo/sean/newfolder>ls
5 items, 29 items per page, remote folder: /projects/Demo/sean/newfolder
 69776131   | online  |  17.02 KB | i000765.jpg
 69776132   | online  |  17.21 KB | i000767.jpg
 69776133   | online  |     17 KB | i000769.jpg
 69776134   | online  |  19.09 KB | i000768.jpg
 69776137   | online  |  17.59 KB | i000766.jpg


Exercises

Exercise 1 - upload a file and then inspect the metadata in the system.

 Solution to exercise 1
pawsey:/projects/Demo/sean>put IMG_0009.jpg
Total files=1, transferring...  
Progress: 100% at 0.0 MB/s  
Completed.

pawsey:/projects/Demo/sean>asset.get :id "path=/projects/Demo/sean/IMG_0009.jpg"
asset = None    { version=1 id=69779872 vid=74452775 }
    type = image/jpeg
    namespace = /projects/Demo/sean
    path = /projects/Demo/sean/IMG_0009.jpg
    name = IMG_0009.jpg
...


Exercise - querying metadata

In the namespace /projects/Data Team/testfiles find the asset where the metadata element mf-note/note has a value equal to cat. Download the file and verify that it is a cute cat.

 Solution to exercise
pawsey:/projects>asset.query :where "namespace>='/projects/Data Team/testfiles' and mf-note/note='cat'"
id = 69776108    { version=2 }

pawsey:/projects>asset.get :id 69776108 :xpath name
value = IMG_0033.jpg

pawsey:/projects>get /projects/Data Team/testfiles/IMG_0033.jpg
Total files=1, transferring ...  
Progress=100%, rate=0.0 MB/s  
Completed.
pawsey:/projects>exit

iblis:~> open IMG_0033.jpg 


Exercise - adding metadata - import and/or asset.set?
Exercise - doing things with queries (ie pipes) - eg move all .jpg's to a folder