Metadata Creation on Managed Storage

This page explains how to attach custom metadata to an object on Magenta (an early development object store) and Acacia (a production system that is not yet available). Metadata are a useful way to add information other than the object name (filename) to an object. The default size limit for metadata on Ceph, the system that provides Magenta and Acacia, is 2 KB.

In the case of a Ceph object store, metadata take the form of arbitrary key-value pairs.
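As a rough check before uploading, the sketch below estimates whether a set of key-value pairs stays under the 2 KB limit. It assumes the limit is counted the way AWS S3 documents it (the UTF-8 byte length of every key plus every value, with 2 KB taken as 2048 bytes); Ceph's exact accounting may differ, and the helper names are our own.

```python
# Assumption: the limit is counted as in AWS S3, i.e. the sum of the UTF-8
# byte lengths of every key and value. Ceph's exact accounting may differ.
METADATA_LIMIT_BYTES = 2 * 1024


def metadata_size(metadata: dict[str, str]) -> int:
    """Total size in bytes of all keys and values, UTF-8 encoded."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in metadata.items())


def fits_limit(metadata: dict[str, str], limit: int = METADATA_LIMIT_BYTES) -> bool:
    """True if the pairs fit under the assumed 2 KB limit."""
    return metadata_size(metadata) <= limit


pairs = {"Origin": "0.520919612092538,0.866779815858503"}
print(metadata_size(pairs), fits_limit(pairs))  # 41 True
```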

Example

In this example we have a file containing an image from NASA's Lunar Reconnaissance Orbiter. Along with the image, the file also contains data describing it, such as the coordinate system used for the observation and the resolution of the camera. We would like to store some of this information with the file as object metadata.

Properties of our image stored within the file
Driver: GTiff/GeoTIFF
Files: results/M1157749492RExxM1293037365LE-median-DRG.tif
Size is 3023, 22173
Coordinate System is:
ENGCRS["Sinusoidal MOON",
    EDATUM[""],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
Data axis to CRS axis mapping: 1,2
Origin = (0.520919612092538,0.866779815858503)
Pixel Size = (0.000043455233543,-0.000043455233543)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (   0.5209196,   0.8667798)
Lower Left  (   0.5209196,  -0.0967531)
Upper Right (   0.6522848,   0.8667798)
Lower Right (   0.6522848,  -0.0967531)
Center      (   0.5866022,   0.3850134)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
  NoData Value=-9999
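The listing above looks like the text output of GDAL's gdalinfo utility. If that output is saved as text, a small parser like the hypothetical one below can pull out the fields we store later in this example (origin, pixel size and corner coordinates) as "x,y" strings. It is a sketch that assumes the exact line layout shown above, not a general gdalinfo parser.

```python
import re

# Matches "(x, y)" with optional spaces; values may be negative decimals.
PAIR = r"\(\s*([-\d.]+)\s*,\s*([-\d.]+)\s*\)"

# Hypothetical mapping from our chosen metadata keys to the listing's lines.
PATTERNS = {
    "Origin": re.compile(r"^Origin = " + PAIR),
    "PixelSize": re.compile(r"^Pixel Size = " + PAIR),
    "UpperLeft": re.compile(r"^Upper Left\s+" + PAIR),
    "LowerLeft": re.compile(r"^Lower Left\s+" + PAIR),
    "UpperRight": re.compile(r"^Upper Right\s+" + PAIR),
    "LowerRight": re.compile(r"^Lower Right\s+" + PAIR),
    "Center": re.compile(r"^Center\s+" + PAIR),
}


def extract_metadata(info_text: str) -> dict[str, str]:
    """Map each chosen field to an 'x,y' string, e.g. '0.5209196,0.8667798'."""
    metadata = {}
    for line in info_text.splitlines():
        for key, pattern in PATTERNS.items():
            match = pattern.match(line.strip())
            if match:
                metadata[key] = f"{match.group(1)},{match.group(2)}"
    return metadata


sample = """Origin = (0.520919612092538,0.866779815858503)
Upper Left  (   0.5209196,   0.8667798)"""
print(extract_metadata(sample))
# {'Origin': '0.520919612092538,0.866779815858503', 'UpperLeft': '0.5209196,0.8667798'}
```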

Let's start by using the mc tool to interact with the object store from the command line (a brief introduction on how to set this up for Magenta can be found here).

Our first step is to create a bucket, if one doesn't already exist, using the mb command and a name for our bucket:

Creating a bucket
> mc mb ceph/metadata-demo
Bucket created successfully `ceph/metadata-demo`.

The next step is to upload the file with the cp command and set the metadata with the --attr flag, followed by the key-value pairs, the location of the file and the bucket we want to upload it to. The information we've chosen to store with our object is the corner coordinate fields (including Center), the origin and the pixel size. Note that the metadata is enclosed in single quotes ('') and the key-value pairs are separated by semicolons:

Uploading and setting the metadata
> mc cp --attr 'PixelSize=0.000043455233543,-0.000043455233543;LowerRight=0.6522848,-0.0967531;UpperLeft=0.5209196,0.8667798;LowerLeft=0.5209196,-0.0967531;Origin=0.520919612092538,0.866779815858503;UpperRight=0.6522848,0.8667798;Center=0.5866022,0.3850134' results/M1157749492RExxM1293037365LE-median-DRG.tif ceph/metadata-demo/
...037365LE-median-DRG.tif:  152.67 MiB / 152.67 MiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 1.94 MiB/s 1m18s
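Typing a long --attr string by hand is error-prone. The sketch below, with hypothetical helper names, assembles the same "Key=Value;Key=Value" string from a Python dictionary and builds the mc cp argument list. It assumes keys and values must not contain ";" and keys must not contain "=", since those characters act as separators in the string above.

```python
import shlex


def build_attr(metadata: dict[str, str]) -> str:
    """Join pairs into the 'Key=Value;Key=Value' form passed to mc cp --attr."""
    for key, value in metadata.items():
        # Assumption: ';' separates pairs and '=' separates a key from its value
        if ";" in key or "=" in key or ";" in value:
            raise ValueError(f"unsupported character in {key!r}={value!r}")
    return ";".join(f"{key}={value}" for key, value in metadata.items())


def build_mc_cp(source: str, target: str, metadata: dict[str, str]) -> list[str]:
    """Return the mc cp argument list for uploading source with metadata."""
    return ["mc", "cp", "--attr", build_attr(metadata), source, target]


cmd = build_mc_cp(
    "results/M1157749492RExxM1293037365LE-median-DRG.tif",
    "ceph/metadata-demo/",
    {"Origin": "0.520919612092538,0.866779815858503",
     "PixelSize": "0.000043455233543,-0.000043455233543"},
)
# Show the equivalent shell command (the --attr value ends up in single quotes)
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run the upload without going through a shell, so no extra quoting is needed; shlex.join is only used here to display the equivalent command line.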

We can verify that our metadata is there by using the stat command followed by the object's path:

Using mc stat to verify the metadata
> mc stat ceph/metadata-demo/M1157749492RExxM1293037365LE-median-DRG.tif
Name      : M1157749492RExxM1293037365LE-median-DRG.tif
Date      : 2021-09-23 12:42:42 AWST
Size      : 153 MiB
ETag      : 99e7f81b403c6c3aff4feab9d5f135df-10
Type      : file
Metadata  :
  X-Amz-Meta-Origin    : 0.520919612092538,0.866779815858503
  X-Amz-Meta-Pixelsize : 0.000043455233543,-0.000043455233543
  X-Amz-Meta-Center    : 0.5866022,0.3850134
  X-Amz-Meta-Lowerleft : 0.5209196,-0.0967531
  X-Amz-Meta-Upperleft : 0.5209196,0.8667798
  X-Amz-Meta-Upperright: 0.6522848,0.8667798
  Content-Type         : image/tiff
  X-Amz-Meta-Lowerright: 0.6522848,-0.0967531

Notice that the user-defined metadata keys have been given the prefix X-Amz-Meta- and their capitalisation has changed. For example, LowerRight becomes X-Amz-Meta-Lowerright.
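The displayed keys appear to follow HTTP header canonicalisation: the first letter of each hyphen-separated word is upper case and the rest lower case. The sketch below reproduces that display form and, because the original capitalisation is not preserved, looks values up case-insensitively. Both helpers are illustrative and assume the behaviour seen in the mc stat output above.

```python
from typing import Optional

PREFIX = "x-amz-meta-"


def displayed_key(user_key: str) -> str:
    """E.g. 'LowerRight' -> 'X-Amz-Meta-Lowerright', as mc stat shows it."""
    header = PREFIX + user_key
    # Upper-case the first letter of each '-'-separated word, lower-case the rest
    return "-".join(word[:1].upper() + word[1:].lower() for word in header.split("-"))


def lookup(metadata: dict[str, str], user_key: str) -> Optional[str]:
    """Find a value by its original key, ignoring case and any x-amz-meta- prefix."""
    wanted = user_key.lower()
    for key, value in metadata.items():
        name = key.lower()
        if name.startswith(PREFIX):
            name = name[len(PREFIX):]
        if name == wanted:
            return value
    return None


stat_metadata = {"X-Amz-Meta-Lowerright": "0.6522848,-0.0967531"}
print(displayed_key("LowerRight"))          # X-Amz-Meta-Lowerright
print(lookup(stat_metadata, "LowerRight"))  # 0.6522848,-0.0967531
```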

We've now successfully uploaded a file to the object store and added metadata. 

Programmatic alternative

Rather than using the command line, we can interact with Ceph programmatically with any library that supports S3. For example, using boto3 in Python one would do:

Interacting with Ceph using a Boto3 call
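import boto3
# ENDPOINT_URL is a placeholder for your Ceph endpoint; credentials come from your usual boto3 configuration
s3 = boto3.client('s3', endpoint_url='ENDPOINT_URL')
s3.upload_file(
    'FILE_NAME', 'BUCKET_NAME', 'OBJECT_NAME',
    ExtraArgs={'Metadata': {'mykey': 'myvalue'}}
)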

That call uploads the file and sets the metadata at the same time.

More details can be found in the boto3 documentation.
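To check the result programmatically, much as mc stat does on the command line, an S3 HeadObject request returns an object's headers without downloading its data. The sketch below assumes s3 is a boto3 S3 client like the one created above; boto3 returns user metadata in the response's "Metadata" dictionary with the x-amz-meta- prefix removed. The helper names are our own, and keys are compared case-insensitively because their capitalisation may change on the way through.

```python
def read_metadata(s3, bucket: str, key: str) -> dict[str, str]:
    """Return an object's user-defined metadata via a HeadObject request."""
    response = s3.head_object(Bucket=bucket, Key=key)
    # boto3 returns user metadata without the x-amz-meta- prefix
    return response.get("Metadata", {})


def metadata_matches(s3, bucket: str, key: str, expected: dict[str, str]) -> bool:
    """True if every expected pair is stored, comparing keys case-insensitively."""
    stored = {k.lower(): v for k, v in read_metadata(s3, bucket, key).items()}
    return all(stored.get(k.lower()) == v for k, v in expected.items())
```

With a real client this might be called as metadata_matches(s3, 'BUCKET_NAME', 'OBJECT_NAME', {'mykey': 'myvalue'}), using the same placeholder names as the upload example.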

External links