Mediaflux and pshell
Mediaflux concepts
- assets - the file content and associated metadata
- namespaces - the remote folder structure on the Mediaflux server
- metadata is XML-based
- has its own query language
- asset.get - display metadata for an asset
- asset.set - alter metadata for an asset
- asset.query - find assets based on metadata queries
Mediaflux commands
Retrieving metadata
The general command takes either an asset ID or the full path to the file:
asset.get :id ASSET-ID
asset.get :id "path=FULL-PATH-TO-FILE"
If we want to extract a specific piece of information, such as the checksum, we can use the optional :xpath argument:
pawsey:/projects>asset.get :id 69776098 :xpath content/csum
    value = 384461DE
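To illustrate what :xpath selects, the sketch below applies the same path expression to a simplified, hypothetical asset document using Python's standard xml.etree.ElementTree. The document layout (an <asset> root containing <content>/<csum>) is an assumption made for illustration; a real asset.get result contains many more elements and attributes.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, heavily simplified asset document: only the checksum from the
# example above is included; a real asset.get result has many more elements.
ASSET_XML = """
<asset id="69776098" version="5">
  <content>
    <csum>384461DE</csum>
  </content>
</asset>
"""


def get_xpath_value(asset_xml: str, xpath: str) -> Optional[str]:
    """Return the text of the first element matching a relative path, or None."""
    root = ET.fromstring(asset_xml.strip())
    # ElementTree understands a limited XPath subset, enough for "content/csum".
    element = root.find(xpath)
    return None if element is None else element.text


print(get_xpath_value(ASSET_XML, "content/csum"))  # prints 384461DE
```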
Populating metadata
The general form is:
asset.set :id ASSET-ID <ELEMENT AND VALUE>
For example, to rename asset 2958:
asset.set :id 2958 :name "new name"
    version="2" -changed-or-created="true" -id="2958" -stime="12095"
Note: a new version of the asset is created (version="2" in the output), reflecting the fact that the name has been altered.
A more complex example, where we assign a geospatial location (in this case a point) to an asset:
asset.set :id 69776160 :geoshape < :point < :latitude 31.95 :longitude 115.86 :elevation 10.0 > >
    version="3" -changed-or-created="true" -id="2958" -stime="12097"
The nested < > syntax in this last example describes an XML document that gives the element path and value of each metadata item.
It is equivalent to an XML metadata document that looks like this:
<geoshape>
    <point>
        <latitude>31.95</latitude>
        <longitude>115.86</longitude>
        <elevation>10.0</elevation>
    </point>
</geoshape>
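The correspondence between the nested < > argument syntax and the XML document above can be sketched in Python. The helper names to_mf_args and to_xml are hypothetical and not part of Mediaflux or pshell; the sketch assumes string values are simply wrapped in double quotes (embedded quotes are not escaped) and numbers are written as-is.

```python
import xml.etree.ElementTree as ET

# Nested metadata: keys are element names, dict values hold child elements.
GEOSHAPE = {
    "geoshape": {
        "point": {"latitude": 31.95, "longitude": 115.86, "elevation": 10.0}
    }
}


def to_mf_args(tree: dict) -> str:
    """Render nested metadata using the ':element < ... >' argument syntax."""
    parts = []
    for name, value in tree.items():
        if isinstance(value, dict):
            # Child elements are wrapped in angle brackets.
            parts.append(f":{name} < {to_mf_args(value)} >")
        elif isinstance(value, str):
            # Assumption: strings are double-quoted; embedded quotes are not escaped.
            parts.append(f':{name} "{value}"')
        else:
            parts.append(f":{name} {value}")
    return " ".join(parts)


def to_xml(tree: dict) -> str:
    """Render the same nested metadata as a compact XML string."""
    def build(parent: ET.Element, subtree: dict) -> None:
        for name, value in subtree.items():
            child = ET.SubElement(parent, name)
            if isinstance(value, dict):
                build(child, value)
            else:
                child.text = str(value)

    # A throwaway wrapper lets several top-level elements be built at once.
    wrapper = ET.Element("wrapper")
    build(wrapper, tree)
    return "".join(ET.tostring(child, encoding="unicode") for child in wrapper)


print(to_mf_args(GEOSHAPE))
print(to_xml(GEOSHAPE))
```

The same helper reproduces the argument used with the csiro:seismic template later in this section: to_mf_args({"meta": {"csiro:seismic": {"name": "Perth", "geometry": "sprawling"}}}) returns :meta < :csiro:seismic < :name "Perth" :geometry "sprawling" > >.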
Slight digression - metadata templates
- a document type (doc type) defines the metadata that can be specified and queried
- custom doc types can be created
- asset is a special "first-order" doc type that applies to every asset
- custom doc types are added under the asset's <meta> element
The definition of the asset doc type can be displayed with:
asset.doc.type.describe :type asset
Items defined in the asset doc type, such as the asset name and geoshape, are first-order metadata items; they are read and written with asset.get and asset.set as described above.
Custom metadata templates are slightly different, as they sit under the <meta> element of the asset, rather than at the top level.
pawsey:/projects>asset.doc.type.describe :type csiro:seismic
    ...
    definition
        element -type="string" -min-occurs="0" -name="name" -max-occurs="1"
        element -type="string" -min-occurs="0" -name="geometry" -max-occurs="1"
        element -type="string" -min-occurs="0" -name="basin" -max-occurs="1"
        element -type="string" -min-occurs="0" -name="sub-basin" -max-occurs="1"
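The min-occurs and max-occurs attributes constrain how many times each element may appear. The Python sketch below checks a set of values against the csiro:seismic definition shown above. It is a simplified, hypothetical stand-in (flat string elements only, no nesting or type checks) and not how Mediaflux itself validates documents; ElementDef and check_metadata are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ElementDef:
    """One element of a doc type definition (simplified: flat string elements)."""
    name: str
    min_occurs: int = 0
    max_occurs: int = 1


# Transcribed from the csiro:seismic definition above.
CSIRO_SEISMIC = [
    ElementDef("name"),
    ElementDef("geometry"),
    ElementDef("basin"),
    ElementDef("sub-basin"),
]


def check_metadata(definition: list[ElementDef], values: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the values fit the definition."""
    allowed = {d.name for d in definition}
    problems = [f"unknown element: {name}" for name in values if name not in allowed]
    for d in definition:
        count = len(values.get(d.name, []))
        if count < d.min_occurs:
            problems.append(f"{d.name}: needs at least {d.min_occurs}, got {count}")
        if count > d.max_occurs:
            problems.append(f"{d.name}: allows at most {d.max_occurs}, got {count}")
    return problems


print(check_metadata(CSIRO_SEISMIC, {"name": ["Perth"], "geometry": ["sprawling"]}))  # prints []
```

Because every element has min-occurs="0", any subset of the four elements is acceptable, but supplying the same element twice would exceed max-occurs="1".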
This template could be applied to an asset as follows:
pawsey:/projects>asset.set :id 69776098 :meta < :csiro:seismic < :name "Perth" :geometry "sprawling" > >
    version="5" -changed-or-created="false" -stime="76619960"
pawsey:/projects>asset.get :id 69776098 :xpath meta/csiro:seismic
    value="5"
    value="Perth"
    value="sprawling"
pawsey:/projects>asset.get :id 69776098 :xpath meta/csiro:seismic/name
    value="Perth"
Querying the metadata
The simplest form of an asset query is:
asset.query :where "LOGICAL-EXPRESSION"
Here is an example of searching a namespace for assets whose names match a pattern:
pawsey:/projects>asset.query :where "namespace='/projects/Data Team/testfiles' and name='*007*.jpg'"
    id="69776098" -version="5"
    id="69776099" -version="1"
    id="69776101" -version="1"
    id="69776103" -version="1"
    id="69776104" -version="1"
    id="70486890" -version="1"
    id="70486891" -version="1"
    id="70486892" -version="1"
    id="70486893" -version="1"
For comparison, a similar search can be run with the pshell "simplified language" ls command, which filters the current remote folder (here /projects/Demo/sean):
pawsey:/projects/Demo/sean>ls *007*.jpg
5 items, 29 items per page, remote folder: /projects/Demo/sean
69776131 | online | 17.02 KB | i000765.jpg
69776132 | online | 17.21 KB | i000767.jpg
69776133 | online | 17 KB | i000769.jpg
69776134 | online | 19.09 KB | i000768.jpg
69776137 | online | 17.59 KB | i000766.jpg
Page 1 of 1, file filter ['*007*.jpg']:
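The pattern *007*.jpg matches any name that contains 007 and ends in .jpg. The sketch below uses Python's fnmatch.fnmatchcase as a stand-in to show which names such a pattern accepts. It assumes that * behaves like a shell wildcard (any run of characters); note that fnmatch also treats ? and [...] specially, which may not match Mediaflux's rules, and the two extra file names are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# File names from the pshell listing above.
LISTED = ["i000765.jpg", "i000767.jpg", "i000769.jpg", "i000768.jpg", "i000766.jpg"]


def matching(names: list[str], pattern: str) -> list[str]:
    """Return the names matching a '*' wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Two hypothetical non-matching names are added to show what the filter rejects.
candidates = LISTED + ["i000865.jpg", "i000765.png"]
print(matching(candidates, "*007*.jpg"))
```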
However, the Mediaflux query language allows more sophisticated tasks to be performed.
The following queries use namespace>= rather than namespace=, so assets in child namespaces of /projects/Data Team/testfiles are matched as well. Here's how we could retrieve the checksums of the matching assets:
pawsey:/projects>asset.query :where "namespace>='/projects/Data Team/testfiles' and name='*007*.jpg'" :action get-value :xpath content/csum
    asset -version="5" -id="69776098"
        value="384461DE"
    asset -version="1" -id="69776099"
        value="561CDB00"
    asset -version="1" -id="69776101"
        value="F517536B"
    ...
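When these values are needed in another tool, they can be parsed from the text output. The sketch below assumes the layout shown above (an asset line carrying -version and -id, followed by value="..."); the regular expression and the checksums_by_id function are hypothetical helpers, and a different output layout would need a different pattern.

```python
from __future__ import annotations

import re

# get-value output from the query above (first three assets only).
OUTPUT = """
asset -version="5" -id="69776098"
    value="384461DE"
asset -version="1" -id="69776099"
    value="561CDB00"
asset -version="1" -id="69776101"
    value="F517536B"
"""

# One match per asset: version, ID, then the value on the same or next line.
PATTERN = re.compile(r'asset -version="(\d+)" -id="(\d+)"\s+value="([^"]*)"')


def checksums_by_id(text: str) -> dict[str, str]:
    """Map each asset ID to the value reported for it."""
    return {asset_id: value for _version, asset_id, value in PATTERN.findall(text)}


print(checksums_by_id(OUTPUT))
```

Because the pattern allows any whitespace before value=, it also handles the output when it is printed on a single line.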
Here's how we could sum the content sizes of the same matching assets:
pawsey:/projects>asset.query :where "namespace>='/projects/Data Team/testfiles' and name='*007*.jpg'" :action sum :xpath content/size
    value="143330" -nbe="9"
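The sum action reports a total together with -nbe. Assuming -nbe is the number of elements (here, assets) that contributed to the sum, which agrees with the nine IDs returned by the earlier query, and that content/size is in bytes, the average file size follows directly. The parse_sum helper below is a hypothetical sketch for the output format shown above.

```python
from __future__ import annotations

import re

# Output of the sum query above.
SUM_OUTPUT = 'value="143330" -nbe="9"'


def parse_sum(text: str) -> tuple[int, int]:
    """Return (total, count) from sum output; assumes -nbe is the element count."""
    match = re.search(r'value="(\d+)"\s+-nbe="(\d+)"', text)
    if match is None:
        raise ValueError(f"unexpected sum output: {text!r}")
    return int(match.group(1)), int(match.group(2))


total, count = parse_sum(SUM_OUTPUT)
# Assuming content/size is in bytes, this is the mean file size in bytes.
print(f"{count} assets, {total} bytes in total, {total / count:.1f} bytes each on average")
```

With the values above this prints an average of 15925.6 bytes, roughly in line with the 17 to 19 KB sizes shown by ls.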
We could also move those files somewhere else:
pawsey:/projects/Demo/sean>mkdir newfolder
pawsey:/projects/Demo/sean>asset.query :where "namespace>='/projects/Data Team/testfiles' and name='*007*.jpg'" :action pipe :service -name asset.move < :namespace /projects/Demo/sean/newfolder >
pawsey:/projects/Demo/sean>cd newfolder/
Remote: /projects/Demo/sean/newfolder
pawsey:/projects/Demo/sean/newfolder>ls
5 items, 29 items per page, remote folder: /projects/Demo/sean/newfolder
69776131 | online | 17.02 KB | i000765.jpg
69776132 | online | 17.21 KB | i000767.jpg
69776133 | online | 17 KB | i000769.jpg
69776134 | online | 19.09 KB | i000768.jpg
69776137 | online | 17.59 KB | i000766.jpg
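Conceptually, :action pipe runs the named service once for each matching asset, saving us from issuing the calls one by one. The sketch below spells out those individual calls for a list of IDs. It assumes that pipe supplies each matched asset's ID to asset.move as its :id argument; move_calls is a hypothetical helper, and in practice the single piped query above is the simpler approach.

```python
from __future__ import annotations

# IDs returned by the namespace query earlier in this section (first two shown).
MATCHED_IDS = ["69776098", "69776099"]


def move_calls(asset_ids: list[str], namespace: str) -> list[str]:
    """Build one asset.move call per asset, as :action pipe does conceptually."""
    # Assumption: pipe supplies each matched asset's ID as asset.move's :id argument.
    return [f'asset.move :id {asset_id} :namespace "{namespace}"' for asset_id in asset_ids]


for call in move_calls(MATCHED_IDS, "/projects/Demo/sean/newfolder"):
    print(call)
```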
Exercises
Exercise 1 - displaying metadata
Upload an image file (for example, one of the JPEG files from the previous exercises) and then inspect its metadata in the system.
Remember that you only have read-write access to /projects/Demo.
Exercise 2 - modifying metadata
Using the file uploaded above, add geospatial location metadata to the asset and confirm by displaying the metadata.
Exercise 3 - querying metadata
Recursively search the namespaces starting from /projects/Data Team to find the asset whose metadata element mf-note/note has the literal string value "cat". Download the file and verify that it is a cute cat.