Overview
- Mediaflux intro
- Import data with metadata
- Query data based on metadata searches
Material
Mediaflux concepts
Mediaflux (ref) is the underlying storage platform that pshell communicates with. The full capabilities of Mediaflux are out of scope here, but a few relevant items are covered below.
Mediaflux is a database on a filesystem (or filesystems) that can be queried in a manner somewhat comparable to SQL. The Mediaflux database is XML-based, so arguments and search terms are expressed in the language of XML.
Every file stored in a Mediaflux server gets transformed into an asset and has a unique ID reference - this is the ID reported by pshell in the previous introductory section.
An asset consists of metadata and a reference to where the actual file content is stored, i.e. the file-system path. Both the metadata and the file content can be versioned, and previous versions can be retrieved. If no particular version is specified, the most recent version is used. Multiple versions of file content count towards your usage/quota.
Namespaces are the Mediaflux name for the virtual folder structure that was introduced in the previous section.
Metadata format: this tells you what queries can be done. An asset's metadata can be displayed with:
Code Block |
---|
asset.get :id ASSET-ID
asset.get :id "path=FULL-PATH-TO-FILE" |
Metadata
Instead of files and folders (pshell) you have:
- assets - the file content and associated metadata
- namespaces - remote folder structure on the Mediaflux server
Mediaflux metadata:
- is XML-based
- has its own query language
The main commands, operating on assets and asset metadata, are:
- asset.get - display metadata for an asset
- asset.set - alter metadata for an asset
- asset.query - find assets based on metadata queries
Mediaflux commands
These can be run in pshell or in the vendor's own client (aterm.jar).
The general form of asset.get is:
Code Block |
---|
asset.get :id ASSET-ID
asset.get :id "path=FULL-PATH-TO-FILE" |
If we wanted to extract a specific piece of information, such as the checksum, we use the optional xpath argument:
Code Block |
---|
pawsey:/projects>asset.get :id 69776098 :xpath content/csum
value = 384461DE |
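The xpath argument selects one element from the asset's XML metadata. As a rough local analogy only (this is not a Mediaflux API), the Python sketch below applies the same path to a small hand-written XML fragment. The fragment's layout is an assumption inferred from the command above; only the asset ID and checksum come from the example, and the helper name get_value is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, heavily trimmed asset document. Only the id and checksum
# are taken from the example above; the real document has many more elements.
ASSET_XML = """
<asset id="69776098">
  <content>
    <csum>384461DE</csum>
  </content>
</asset>
""".strip()


def get_value(asset_xml: str, xpath: str) -> Optional[str]:
    """Return the text of the first element matching xpath, or None."""
    root = ET.fromstring(asset_xml)
    node = root.find(xpath)  # path is relative to the <asset> root element
    return None if node is None else node.text


print(get_value(ASSET_XML, "content/csum"))
```

A path that matches nothing returns None here; how the server reports a missing element is not covered by this sketch.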
The general form of asset.set is:
Code Block |
---|
asset.set :id ASSET-ID <ELEMENT AND VALUE> |
For example, this is equivalent to a rename:
Code Block |
---|
asset.set :id 2958 :name "new name"
version="2" -changed-or-created="true" -id="2958" -stime="12095" |
Note: a new version of the asset is created - reflecting the fact that the name has been altered.
A more complex example, where we assign a geospatial location (in this case a point) to an asset:
Code Block |
---|
asset.set :id 69776160 :geoshape < :point < :latitude 31.95 :longitude 115.86 :elevation 10.0 > >
version="3" -changed-or-created="true" -id="2958" -stime="12097" |
This last case is an example of specifying an XML document that details the xpath + value for metadata items.
It is equivalent to an XML metadata document that looks like this:
Code Block |
---|
<geoshape>
  <point>
    <latitude> 31.95 </latitude>
    <longitude> 115.86 </longitude>
    <elevation> 10.0 </elevation>
  </point>
</geoshape> |
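To make the correspondence between the two notations concrete, here is a minimal Python sketch that renders one nested dictionary both as the `:name < ... >` argument form and as an XML document. The function names to_aterm and to_xml are hypothetical helpers, and the quoting rule (strings in double quotes, numbers bare) is an assumption based only on the examples in this material.

```python
from xml.sax.saxutils import escape

# The geoshape example from above as nested Python data.
GEOSHAPE = {
    "geoshape": {
        "point": {"latitude": 31.95, "longitude": 115.86, "elevation": 10.0}
    }
}


def to_aterm(doc: dict) -> str:
    """Render nested data in the ':name < ... >' argument notation."""
    parts = []
    for name, value in doc.items():
        if isinstance(value, dict):
            # A nested element becomes ':name < children >'.
            parts.append(f":{name} < {to_aterm(value)} >")
        elif isinstance(value, str):
            # Assumption: string values are wrapped in double quotes.
            parts.append(f':{name} "{value}"')
        else:
            parts.append(f":{name} {value}")
    return " ".join(parts)


def to_xml(doc: dict, depth: int = 0) -> str:
    """Render the same nested data as an indented XML document."""
    pad = "  " * depth
    lines = []
    for name, value in doc.items():
        if isinstance(value, dict):
            lines.append(f"{pad}<{name}>")
            lines.append(to_xml(value, depth + 1))
            lines.append(f"{pad}</{name}>")
        else:
            lines.append(f"{pad}<{name}>{escape(str(value))}</{name}>")
    return "\n".join(lines)


print(to_aterm(GEOSHAPE))
print(to_xml(GEOSHAPE))
```

The first print reproduces the argument string used in the asset.set example; the second prints the equivalent XML layout.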
The doc type shows the metadata that can be queried and specified.
Custom doc types can be made.
Asset is a special "first-order" type attached to everything.
Custom doc types get added under the <meta> element.
Code Block |
---|
asset.doc.type.describe :type asset |
Items in the asset template, such as the asset name and the geoshape, are first-order metadata items. They are read and written with asset.get and asset.set as described above.
Custom metadata templates are slightly different: they sit under the <meta> element of the asset rather than at the top level.
Code Block |
---|
pawsey:/projects>asset.doc.type.describe :type csiro:seismic
...
definition
element -type="string" -min-occurs="0" -name="name" -max-occurs="1"
element -type="string" -min-occurs="0" -name="geometry" -max-occurs="1"
element -type="string" -min-occurs="0" -name="basin" -max-occurs="1"
element -type="string" -min-occurs="0" -name="sub-basin" -max-occurs="1" |
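Because the doc type lists the elements that may be set, it can be handy to check metadata against it before sending anything to the server. The following Python sketch parses the element lines exactly as printed above and reports any field names the doc type does not define. The helper names parse_elements and unknown_fields are hypothetical, and the parser assumes the one-attribute-per-flag text layout shown in the listing.

```python
import re

# Element lines copied from the asset.doc.type.describe output above.
DESCRIBE_OUTPUT = '''\
element -type="string" -min-occurs="0" -name="name" -max-occurs="1"
element -type="string" -min-occurs="0" -name="geometry" -max-occurs="1"
element -type="string" -min-occurs="0" -name="basin" -max-occurs="1"
element -type="string" -min-occurs="0" -name="sub-basin" -max-occurs="1"
'''

# Matches one attribute of the form -key="value".
ATTR = re.compile(r'-([\w-]+)="([^"]*)"')


def parse_elements(text: str) -> dict:
    """Map each element name to its attributes (type, min-occurs, ...)."""
    elements = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("element "):
            continue  # skip anything that is not an element definition
        attrs = dict(ATTR.findall(line))
        elements[attrs["name"]] = attrs
    return elements


def unknown_fields(metadata: dict, elements: dict) -> list:
    """Return field names in metadata that the doc type does not define."""
    return sorted(set(metadata) - set(elements))


elements = parse_elements(DESCRIBE_OUTPUT)
print(list(elements))
print(unknown_fields({"name": "Perth", "colour": "red"}, elements))
```

Here "colour" is a deliberately invalid field used only to show the check; it is not part of the csiro:seismic doc type.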
Applying this template to a piece of data could be achieved as follows:
Code Block |
---|
pawsey:/projects>asset.set :id 69776098 :meta < :csiro:seismic < :name "Perth" :geometry "sprawling" > >
version="5" -changed-or-created="false" -stime="76619960"
pawsey:/projects>asset.get :id 69776098 :xpath meta/csiro:seismic
value="5"
value="Perth"
value="sprawling"
pawsey:/projects>asset.get :id 69776098 :xpath meta/csiro:seismic/name
value="Perth" |
The simple form of an asset query is:
Code Block |
---|
asset.query :where "LOGICAL-EXPRESSION" |
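When queries are generated from a script, the logical expression is just a string. The sketch below is one possible way to assemble it in Python for the two conditions used in this material: a namespace test (`=` for a single namespace, `>=` to include sub-namespaces) and an optional name pattern. The function where_clause is a hypothetical helper, not part of pshell or Mediaflux, and it refuses values containing a single quote because the escaping rules are not covered here.

```python
from typing import Optional


def where_clause(namespace: str,
                 name_pattern: Optional[str] = None,
                 recursive: bool = False) -> str:
    """Build a LOGICAL-EXPRESSION string for asset.query :where."""
    for value in (namespace, name_pattern or ""):
        if "'" in value:
            # Assumption: quoting rules are unknown, so reject rather than guess.
            raise ValueError("single quotes are not supported in this sketch")
    op = ">=" if recursive else "="
    clause = f"namespace{op}'{namespace}'"
    if name_pattern:
        clause += f" and name='{name_pattern}'"
    return clause


# The full command line would then be: asset.query :where "<clause>"
print(where_clause("/projects/Data Team/testfiles", "*007*.jpg"))
print(where_clause("/projects/Data Team", recursive=True))
```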
Here is an example of searching for assets in a directory that match a pattern:
Code Block |
---|
pawsey:/projects>asset.query :where "namespace='/projects/Data Team/testfiles' and name='*007*.jpg'"
id="69776098" -version="5"
id="69776099" -version="1"
id="69776101" -version="1"
id="69776103" -version="1"
id="69776104" -version="1"
id="70486890" -version="1"
id="70486891" -version="1"
id="70486892" -version="1"
id="70486893" -version="1" |
Note that this is equivalent to the pshell "simplified language" command:
Code Block |
---|
pawsey:/projects/Demo/sean>ls *007*.jpg
5 items, 29 items per page, remote folder: /projects/Demo/sean
69776131 | online | 17.02 KB | i000765.jpg
69776132 | online | 17.21 KB | i000767.jpg
69776133 | online | 17 KB | i000769.jpg
69776134 | online | 19.09 KB | i000768.jpg
69776137 | online | 17.59 KB | i000766.jpg
Page 1 of 1, file filter ['*007*.jpg']: |
However, using Mediaflux language allows more sophisticated tasks to be performed.
Here's how we could retrieve the checksums of the above matches:
Code Block |
---|
pawsey:/projects>asset.query :where "namespace>='/projects/Data Team/testfiles' and name='*007*.jpg'" :action get-value :xpath content/csum
asset -version="5" -id="69776098"
value="384461DE"
asset -version="1" -id="69776099"
value="561CDB00"
asset -version="1" -id="69776101"
value="F517536B"
... |
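If the output of such a query is captured as text, it can be turned into a lookup table for later comparison against local files. The Python sketch below parses the asset/value line pairs in the layout printed above into a dictionary of asset ID to checksum. The function name parse_values is hypothetical, and the parser assumes each value line follows the asset line it belongs to.

```python
import re

# Sample text copied from the query output above.
QUERY_OUTPUT = '''\
asset -version="5" -id="69776098"
value="384461DE"
asset -version="1" -id="69776099"
value="561CDB00"
asset -version="1" -id="69776101"
value="F517536B"
'''

ASSET_LINE = re.compile(r'^asset\b.*-id="(\d+)"')
VALUE_LINE = re.compile(r'^value="([^"]*)"')


def parse_values(output: str) -> dict:
    """Map each asset id to the value reported on the following line."""
    result = {}
    current_id = None
    for raw in output.splitlines():
        line = raw.strip()
        asset = ASSET_LINE.match(line)
        if asset:
            current_id = asset.group(1)
            continue
        value = VALUE_LINE.match(line)
        if value and current_id is not None:
            result[current_id] = value.group(1)
    return result


checksums = parse_values(QUERY_OUTPUT)
print(checksums)
```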
Here's how we could sum the content sizes of the files in the above match:
Code Block |
---|
pawsey:/projects>asset.query :where "namespace>='/projects/Data Team/testfiles' and name='*007*.jpg'" :action sum :xpath content/size
value="143330" -nbe="9" |
We could also move those files somewhere else:
Code Block |
---|
pawsey:/projects/Demo/sean>mkdir newfolder
pawsey:/projects/Demo/sean>asset.query :where "namespace>='/projects/Data Team/testfiles' and name='*007*.jpg'" :action pipe :service -name asset.move < :namespace /projects/Demo/sean/newfolder >
pawsey:/projects/Demo/sean>cd newfolder/
Remote: /projects/Demo/sean/newfolder
pawsey:/projects/Demo/sean/newfolder>ls
5 items, 29 items per page, remote folder: /projects/Demo/sean/newfolder
69776131 | online | 17.02 KB | i000765.jpg
69776132 | online | 17.21 KB | i000767.jpg
69776133 | online | 17 KB | i000769.jpg
69776134 | online | 19.09 KB | i000768.jpg
69776137 | online | 17.59 KB | i000766.jpg |
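Conceptually, the move above runs a query and then calls a service once for every matching asset. The toy Python model below mimics that flow entirely in memory to show the idea. It does not talk to a server: the store dictionary and the functions query, move and pipe are stand-ins invented for this sketch, the asset IDs and names are taken from the listings above, and shell-style matching via fnmatch is an assumption about how the name pattern behaves.

```python
from fnmatch import fnmatchcase

# Toy in-memory stand-in for the server: asset id -> (namespace, name).
store = {
    69776131: ("/projects/Demo/sean", "i000765.jpg"),
    69776132: ("/projects/Demo/sean", "i000767.jpg"),
    69776133: ("/projects/Demo/sean", "i000769.jpg"),
    69776134: ("/projects/Demo/sean", "i000768.jpg"),
    69776137: ("/projects/Demo/sean", "i000766.jpg"),
}


def query(store, namespace, pattern):
    """Return ids of assets in namespace (or below) whose name matches."""
    return [
        asset_id
        for asset_id, (ns, name) in sorted(store.items())
        if (ns == namespace or ns.startswith(namespace + "/"))
        and fnmatchcase(name, pattern)
    ]


def move(store, asset_id, namespace):
    """Stand-in for a move service: change the namespace of one asset."""
    _, name = store[asset_id]
    store[asset_id] = (namespace, name)


def pipe(store, asset_ids, service, **args):
    """Call the service once per query result, passing the same arguments."""
    for asset_id in asset_ids:
        service(store, asset_id, **args)


matches = query(store, "/projects/Demo/sean", "*007*.jpg")
pipe(store, matches, move, namespace="/projects/Demo/sean/newfolder")
print(sorted(set(ns for ns, _ in store.values())))
```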
Upload an image file (for example one of the jpgs from the previous exercises) and then inspect the metadata in the system.
Remember that you only have read-write access to /projects/Demo.
Expand |
---|
title | Solution to exercise 1 |
---|
|
Code Block |
---|
pawsey:/projects/Demo/sean>put IMG_0009.jpg
Total files=1, transferring...
Progress: 100% at 0.0 MB/s
Completed.
pawsey:/projects/Demo/sean>asset.get :id "path=/projects/Data Team/testfiles/IMG_0009.jpg"
asset -version="1" -id="69776107" -vid="74447906"
type="image/jpeg"
namespace="/projects/Data Team/testfiles"
path="/projects/Data Team/testfiles/IMG_0009.jpg"
name="IMG_0009.jpg"
... |
|
Using the file uploaded above, add geospatial location metadata to the asset and confirm by displaying the metadata.
Expand |
---|
title | Solution to exercise 2 |
---|
|
Code Block |
---|
pawsey:/projects>asset.set :id 69776098 :geoshape < :point < :latitude 17.0 :longitude 178.0 :elevation 30.0 > >
version="6" -changed-or-created="true" -stime="76628158"
pawsey:/projects>asset.get :id 69776098 :xpath geoshape
value
geoshape -type="point" -datum="WGS84"
point
latitude="17"
longitude="178"
elevation="30.0" |
|
Recursively search namespaces starting from /projects/Data Team to find the asset where the metadata element mf-note/note has a literal string value equal to cat. Download the file and verify that it is a cute cat.
Expand |
---|
title | Solution to exercise 3 |
---|
|
Code Block |
---|
pawsey:/projects>asset.query :where "namespace>='/projects/Data Team' and mf-note/note='cat'"
id="69776108" -version="2"
pawsey:/projects>asset.get :id 69776108 :xpath namespace :xpath name
value="/projects/Data Team/testfiles"
value="IMG_0033.jpg"
pawsey:/projects>get /projects/Data Team/testfiles/IMG_0033.jpg
Total files=1, transferring ...
Progress=100%, rate=0.0 MB/s
Completed.
pawsey:/projects>exit
iblis:~> open IMG_0033.jpg |
|