Metadata Creation on Managed Storage
This page explains how to attach custom metadata to an object on Magenta (the early development object store) and Acacia (the production system, not yet available). Metadata are a useful way to attach information to an object beyond its name (filename). The default size limit for metadata on Ceph (the system that provides Magenta and Acacia) is 2KB.
In the case of a Ceph object store, metadata take the form of arbitrary key-value pairs.
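Because the metadata are plain key-value pairs with a size limit, it can be handy to check a set of pairs before uploading. The following is a minimal sketch in Python. It assumes the 2KB limit applies to the combined UTF-8 size of all keys and values, which is how the limit is commonly counted for S3-style stores but has not been confirmed here for Ceph. The names `METADATA_LIMIT_BYTES`, `metadata_size` and `fits_limit` are illustrative only.

```python
# Assumption: the limit covers the combined UTF-8 size of all keys and values.
METADATA_LIMIT_BYTES = 2 * 1024  # the 2KB default mentioned above


def metadata_size(metadata):
    """Return the combined UTF-8 byte length of all keys and values."""
    return sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in metadata.items()
    )


def fits_limit(metadata, limit=METADATA_LIMIT_BYTES):
    """Return True if the metadata is within the (assumed) size limit."""
    return metadata_size(metadata) <= limit


example = {"Origin": "0.520919612092538,0.866779815858503"}
print(metadata_size(example), fits_limit(example))
```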
Example
In this example we have a file containing an image from NASA's Lunar Reconnaissance Orbiter. Along with the image, the file also contains data describing the image, such as the coordinate system used for the observation and the resolution of the camera. We would like to store some of this information along with the file as object metadata.
Driver: GTiff/GeoTIFF
Files: results/M1157749492RExxM1293037365LE-median-DRG.tif
Size is 3023, 22173
Coordinate System is:
ENGCRS["Sinusoidal MOON",
    EDATUM[""],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]
Data axis to CRS axis mapping: 1,2
Origin = (0.520919612092538,0.866779815858503)
Pixel Size = (0.000043455233543,-0.000043455233543)
Metadata:
  AREA_OR_POINT=Area
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  ( 0.5209196, 0.8667798)
Lower Left  ( 0.5209196, -0.0967531)
Upper Right ( 0.6522848, 0.8667798)
Lower Right ( 0.6522848, -0.0967531)
Center      ( 0.5866022, 0.3850134)
Band 1 Block=256x256 Type=Float32, ColorInterp=Gray
  NoData Value=-9999
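Copying these values by hand is error-prone, so one option is to pull them out of the text listing with a small script. The sketch below is one possible approach and not part of the original workflow: it assumes the listing is available as a string, and the helper name `parse_pair` and the key names in `labels` are hypothetical. Values are kept as strings so that no digits are lost.

```python
import re


def parse_pair(text, label):
    """Find 'label = (x,y)' or 'label ( x, y)' and return 'x,y', or None."""
    pattern = re.escape(label) + r"\s*=?\s*\(\s*([-\d.]+)\s*,\s*([-\d.]+)\s*\)"
    match = re.search(pattern, text)
    return f"{match.group(1)},{match.group(2)}" if match else None


# A few lines taken from the listing above.
sample = """Origin = (0.520919612092538,0.866779815858503)
Pixel Size = (0.000043455233543,-0.000043455233543)
Upper Left ( 0.5209196, 0.8667798)
Lower Right ( 0.6522848, -0.0967531)"""

# Map the metadata key we want to the label used in the listing.
labels = {
    "Origin": "Origin",
    "PixelSize": "Pixel Size",
    "UpperLeft": "Upper Left",
    "LowerRight": "Lower Right",
}
metadata = {key: parse_pair(sample, label) for key, label in labels.items()}
print(metadata)
```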
Let's start by using the mc tool to interact with the object store from the command line (a brief introduction on how to set this up for Magenta can be found here).
Our first step is to create a bucket, if one doesn't already exist, using the mb command and a name for our bucket:
> mc mb ceph/metadata-demo
Bucket created successfully `ceph/metadata-demo`.
The next step is to upload the file with the cp command, setting the metadata with the --attr flag followed by the key-value pairs, the location of the file and the bucket we want to upload it to. The information we've chosen to store with our object is the corner coordinate fields, the origin and the pixel size. Note that the metadata is written inside single quotes ('') and the key-value pairs are separated by semicolons:
> mc cp --attr 'PixelSize=0.000043455233543,-0.000043455233543;LowerRight=0.6522848,-0.0967531;UpperLeft=0.5209196,0.8667798;LowerLeft=0.5209196,-0.0967531;Origin=0.520919612092538,0.866779815858503;UpperRight=0.6522848,0.8667798;Center=0.5866022,0.3850134' results/M1157749492RExxM1293037365LE-median-DRG.tif ceph/metadata-demo/
...037365LE-median-DRG.tif: 152.67 MiB / 152.67 MiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 1.94 MiB/s 1m18s
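With many fields, the semicolon-separated string is easier to generate than to type. As a rough sketch, the Python helper below builds the value for the --attr flag from a dictionary and prints a shell-quoted command. The function name `build_attr` is hypothetical, and rejecting `;` and `=` inside keys or values is a cautious assumption made here, since the original text does not say how mc treats such characters.

```python
import shlex


def build_attr(metadata):
    """Join key=value pairs with semicolons, the form used with mc cp --attr above."""
    for key, value in metadata.items():
        # Assumption: avoid the separator characters inside keys and values.
        if ";" in key or "=" in key or ";" in value:
            raise ValueError(f"unsupported character in pair {key!r}={value!r}")
    return ";".join(f"{key}={value}" for key, value in metadata.items())


metadata = {
    "Origin": "0.520919612092538,0.866779815858503",
    "PixelSize": "0.000043455233543,-0.000043455233543",
}
command = [
    "mc", "cp", "--attr", build_attr(metadata),
    "results/M1157749492RExxM1293037365LE-median-DRG.tif",
    "ceph/metadata-demo/",
]
# shlex.join quotes the attribute string so it can be pasted into a shell.
print(shlex.join(command))
```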
We can verify that our metadata is there by using the stat command followed by our object name:
> mc stat ceph/metadata-demo/M1157749492RExxM1293037365LE-median-DRG.tif
Name      : M1157749492RExxM1293037365LE-median-DRG.tif
Date      : 2021-09-23 12:42:42 AWST
Size      : 153 MiB
ETag      : 99e7f81b403c6c3aff4feab9d5f135df-10
Type      : file
Metadata  :
  X-Amz-Meta-Origin    : 0.520919612092538,0.866779815858503
  X-Amz-Meta-Pixelsize : 0.000043455233543,-0.000043455233543
  X-Amz-Meta-Center    : 0.5866022,0.3850134
  X-Amz-Meta-Lowerleft : 0.5209196,-0.0967531
  X-Amz-Meta-Upperleft : 0.5209196,0.8667798
  X-Amz-Meta-Upperright: 0.6522848,0.8667798
  Content-Type         : image/tiff
  X-Amz-Meta-Lowerright: 0.6522848,-0.0967531
Notice that the user-defined metadata fields have the prefix X-Amz-Meta- and that the capitalisation of each key has been changed: only its first letter remains upper case. For example, LowerRight becomes X-Amz-Meta-Lowerright.
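The renaming seen in the stat output can be reproduced with a few lines of Python. This is only a sketch that mimics the observed result (each hyphen-separated part capitalised, the rest lower-cased); it is an assumption that this matches what the tools do internally, and the function name `displayed_key` is hypothetical.

```python
def displayed_key(user_key, prefix="X-Amz-Meta-"):
    """Mimic the key form seen in the mc stat output above."""
    parts = (prefix + user_key).split("-")
    # str.capitalize upper-cases the first character and lower-cases the rest.
    return "-".join(part.capitalize() for part in parts)


for key in ["LowerRight", "PixelSize", "Origin"]:
    print(key, "->", displayed_key(key))
```

One practical consequence, under the same assumption, is that keys differing only in case (for example `LowerRight` and `lowerright`) would end up with the same displayed name, so it is safer to choose keys that are distinct regardless of case.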
We've now successfully uploaded a file to the object store and added metadata.
Programmatic alternative
Rather than use the command line, we can interact with Ceph programmatically; this is supported by any library that can interact with S3. As an example, using boto3 in Python one would simply do:
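import boto3

# Create an S3 client; for Ceph, also pass the endpoint_url and credentials of your object store.
s3 = boto3.client('s3')

# Upload the file and attach the metadata in a single call.
s3.upload_file(
    'FILE_NAME',
    'BUCKET_NAME',
    'OBJECT_NAME',
    ExtraArgs={'Metadata': {'mykey': 'myvalue'}},
)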
That call uploads the file and sets the metadata at the same time.
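To check the result from Python, the equivalent of the stat command is a `head_object` request, whose response includes a `Metadata` dictionary. The sketch below wraps that call in a hypothetical helper, `read_metadata`, that accepts any boto3 S3 client; the endpoint URL shown in the comments is a placeholder, not a real address. Keys normally come back without the X-Amz-Meta- prefix, and their case may differ from what was supplied.

```python
def read_metadata(s3_client, bucket, key):
    """Return the user-defined metadata of an object as a dict."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    # 'Metadata' holds the user-defined key-value pairs.
    return response.get("Metadata", {})


# Usage (requires boto3 and access to the object store; values are placeholders):
#   import boto3
#   s3 = boto3.client("s3", endpoint_url="https://OBJECT_STORE_ENDPOINT")
#   print(read_metadata(s3, "BUCKET_NAME", "OBJECT_NAME"))
```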
More details can be found in the boto3 documentation.