MetaCat Client#

The MetaCat client consists of a Command Line Interface (CLI) and a Python API, which is described in a separate section. The CLI is built on top of the Python API. When you install the MetaCat client, both the CLI and the Python API are installed.

Installation#

You will need Python 3.7 or newer.

The preferred way to install the client is with pip:

$ pip install metacat-client --user
$ pip3 install metacat-client --user

Alternatively, it can be installed from GitHub:

$ git clone https://github.com/fermitools/metacat.git
$ cd metacat
$ python setup.py install --user

When installing MetaCat client using pip or setup.py, pay attention to messages like this:

WARNING: The script metacat is installed in '/Users/user-local/Library/Python/3.10/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.

Make sure the location where the metacat command is installed is in your PATH:

$ export PATH=/Users/user-local/Library/Python/3.10/bin:$PATH

If you use your own Python installation, e.g. Anaconda or Miniconda, then you can do this instead:

$ python setup.py install

General CLI command syntax#

In general, the command looks like this:

$ metacat [-s <server URL>] [-a <auth server URL>] <command> [command options] [arguments ...]

-a is used to specify the URL for the authentication server. It is used only for authentication commands. -s option specifies the server URL. Alternatively, you can define the METACAT_AUTH_SERVER_URL and METACAT_SERVER_URL environment variables:

$ export METACAT_SERVER_URL="http://server:port/path"
$ export METACAT_AUTH_SERVER_URL="http://auth_server:port/auth_path"
$ metacat <command group> <command> [command options] [arguments ...]

Versions#

To quickly check the connectivity to the MetaCat server and see what software versions are used on the server and the client sides, use the version command:

$ metacat version
MetaCat Server URL:         https://metacat.fnal.gov:9443/dune_meta_demo/app
Authentication server URL:  https://metacat.fnal.gov:8143/auth/dune
Server version:             3.9.1
Client version:             3.9.1

User Authentication#

The main purpose of the MetaCat authentication commands is to obtain a MetaCat authentication token and store it in the MetaCat token library located at ~/.metacat_tokens. The library may contain multiple tokens, one per MetaCat server instance the user communicates with. The instances are identified by their URLs.

The authentication token generated by MetaCat should not be confused with a WLCG token issued by the VO token issuer. A WLCG token, issued by a recognized VO token issuer, can be used by the client as one of the supported authenticators to obtain a MetaCat authentication token.

To obtain a new token, use the metacat auth login command. Currently, 2 authentication mechanisms are implemented: password and X.509 certificates. Either an LDAP password or a MetaCat server “local” password can be used with password authentication. The X.509 method supports both X.509 certificates and proxies.

A token obtained with the CLI metacat auth login command can be used by both the CLI and the API until it expires. When the MetaCat authentication token expires, the client must obtain a new token to continue using MetaCat.

Password authentication#

MetaCat checks user passwords against 2 sources: LDAP and the MetaCat database. If LDAP is configured, MetaCat will always pass the password supplied by the user to LDAP for verification. In addition, a user can have another password, which is hashed and stored in the MetaCat database.

To obtain a new token using password authentication, use the following command:

$ metacat auth login -m password <username>

X.509 authentication#

MetaCat supports X.509 authentication. To enable X.509 authentication, the user has to add their DN to their user record stored in the MetaCat users database. This can be done by the user or by a MetaCat admin using the MetaCat GUI. The MetaCat CLI offers a convenience command that lets the user view the DN from their certificate the way MetaCat sees it:

$ metacat auth mydn -c my_cert.pem -k my_key.pem
CN=UID:jjohnson,CN=John Johnson,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org

Once the DN is added to the MetaCat user record, the user can use the following commands:

$ metacat auth login -m x509 -c <cert file> -k <key file> <username>
$ metacat auth login -m x509 -c <proxy file> <username>

Environment variables X509_USER_CERT, X509_USER_KEY and X509_USER_PROXY can be used instead of -c and -k options:

$ export X509_USER_PROXY=~/user_proxy
$ metacat auth login -m x509 <username>

Note that MetaCat ignores all CN fields of the DN with numeric values. So if the DN looks like this:

CN=UID:jjohnson,CN=John Johnson,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org,CN=5674

then adding the following DNs to the database has exactly the same effect:

CN=UID:jjohnson,CN=John Johnson,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org,CN=5674
CN=UID:jjohnson,CN=John Johnson,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org,CN=57673
CN=UID:jjohnson,CN=John Johnson,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org,CN=5674,CN=1234
CN=UID:jjohnson,CN=John Johnson,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org
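The rule about numeric CN fields can be illustrated with a short Python sketch. This is not MetaCat's actual implementation: the helper name strip_numeric_cns is hypothetical, and the sketch assumes a plain comma-separated DN without escaped commas.

```python
def strip_numeric_cns(dn: str) -> str:
    """Drop CN fields whose value is purely numeric (hypothetical helper)."""
    kept = []
    for field in dn.split(","):
        key, _, value = field.strip().partition("=")
        if key.upper() == "CN" and value.isdigit():
            continue  # numeric CN field: ignored when comparing DNs
        kept.append(f"{key}={value}")
    return ",".join(kept)


base = "CN=UID:jjohnson,CN=John Johnson,OU=People,O=Fermi National Accelerator Laboratory,C=US,DC=cilogon,DC=org"
variants = [base, base + ",CN=5674", base + ",CN=57673", base + ",CN=5674,CN=1234"]

# All variants reduce to the same DN once numeric CN fields are dropped
reduced = {strip_numeric_cns(v) for v in variants}
print(len(reduced))  # 1
```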

WLCG token / SciToken authentication#

MetaCat supports WLCG token authentication. The MetaCat client will look for the token in the following standard locations, in this order:

  1. BEARER_TOKEN environment variable value

  2. contents of a file pointed to by the BEARER_TOKEN_FILE environment variable

  3. if XDG_RUNTIME_DIR environment variable is defined:

    1. if ID environment variable is defined, contents of the file $XDG_RUNTIME_DIR/bt_u$ID

    2. if ID is not defined, contents of the file:

      $XDG_RUNTIME_DIR/bt_u<effective uid of the process>
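The lookup order above can be sketched in Python as follows. This is an illustration of the listed steps only, not the client's own code: the function name find_bearer_token is hypothetical, and falling through to the next location when a file is missing is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_bearer_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first token found in the documented locations, or None."""
    # 1. BEARER_TOKEN holds the token itself
    token = env.get("BEARER_TOKEN")
    if token:
        return token.strip()

    # 2. BEARER_TOKEN_FILE points to a file containing the token
    token_file = env.get("BEARER_TOKEN_FILE")
    if token_file and Path(token_file).is_file():
        return Path(token_file).read_text().strip()

    # 3. $XDG_RUNTIME_DIR/bt_u<ID>, where ID comes from the environment
    #    or, if ID is not defined, from the effective uid of the process
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if runtime_dir:
        uid = env.get("ID") or str(os.geteuid())
        candidate = Path(runtime_dir) / f"bt_u{uid}"
        if candidate.is_file():
            return candidate.read_text().strip()
    return None
```

Passing the environment as a mapping makes the search order easy to try out without changing the real process environment.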

To use a WLCG token stored in one of the standard locations, use:

$ metacat auth login -m token <username>

Alternatively, you can specify the token value or the location of the token file explicitly:

$ metacat auth login -m token (-t|--token) <serialized token> <username>
$ metacat auth login -m token (-t|--token) <file with serialized token> <username>

Listing available MetaCat authentication tokens#

Once the MetaCat client obtains a MetaCat authentication token using one of the authentication mechanisms listed above, it stores the token in its own token library, indexed by the MetaCat server URL. This way, the client can communicate with several MetaCat instances, using the corresponding token for each.

To see available MetaCat authentication tokens:

$ metacat auth list

To export a token to a file or to stdout:

$ metacat auth export [-o|--out <token file>] [<token id>|<server url>]

On successful authentication, the following command will show your username and the token expiration:

$ metacat auth whoami [-t <token file>]
User:    jdoe
Expires: Fri Jul 20 12:35:10 2022

Namespaces#

$ metacat namespace create <namespace>                     # create namespace owned by me
$ metacat namespace create -o <owner_role> <namespace>     # create namespace owned by a role
$ metacat namespace show <namespace>

To list existing namespaces:

$ metacat namespace list [options] <pattern>
    <pattern> is a UNIX shell style pattern (*?[])
    -u|--user <username>        - list namespaces owned by the user
    -d                          - exclude namespaces owned by the user via a role
    -r|--role <role>            - list namespaces owned by the role

Parameter Categories#

To list existing parameter categories:

$ metacat category list [options] [<root category>]
          -j|--json           - print as JSON

To get particular category information:

$ metacat category show [options] <category>
          -j|--json           - print as JSON

Datasets#

To create a dataset in a namespace or to modify the dataset content or metadata, the user must be an owner of the dataset’s namespace, either directly or through a role.

Creating a dataset#

$ metacat dataset create [<options>] <namespace>:<name> [<description>]

    -f|--flags (monotonic|frozen)               - optional, dataset flags
    -m|--metadata '<JSON expression>'
    -m|--metadata <JSON file>
    -m|--metadata -                             - read metadata as JSON from stdin
    -q|--query '<MQL file query>'               - run the query and add files to the dataset
    -q|--query <file_with_query>                - run the query and add files to the dataset
    -q|--query -                                - read the query from stdin
    -j|--json                                   - print dataset information as JSON

A multi-word description does not have to be put in quotes. E.g., the following two commands are equivalent:

$ metacat dataset create scope:name Carefully selected files
$ metacat dataset create scope:name "Carefully selected files"

Removing a dataset#

$ metacat dataset remove <namespace>:<name>

To remove a dataset, the user has to be an owner of the dataset namespace either directly or through a role.

Adding files to dataset#

$ metacat dataset add-files [options] <dataset namespace>:<dataset name>

    add files by DIDs or namespace/names or MQL query

    -f|--files (<did>|<file id>)[,...]          - dids and fids can be mixed
    -f|--files <file with DIDs or file ids>     - one did or fid per line
    -f|--files <JSON file>                      - list of dictionaries:
                                                    { "fid": ...} or
                                                    { "namespace": ..., "name":... } or
                                                    { "did":... }
    -f|--files -                                - read file list from stdin

    add files selected by a query
    -q|--query "<MQL query>"
    -q|--query <file>                           - read query from the file
    -q|--query -                                - read query from stdin

To add files which match an MQL query, use -q option.

An alternative way to add files matching a query is to pipe the output of the query command into dataset add-files:

$ metacat query -i "files from scope:dataset1 where x.y = 123" | metacat dataset add-files -f - scope:dataset2

Using -q can be faster because piping involves sending the file list to the client and back to the server, whereas -q does not send the list of files.

Note that it is not an error to attempt to add a file if it is already included in the dataset.

To add files to a dataset, the user has to be an owner of the dataset namespace either directly or through a role. A user can add any files to a dataset regardless of the file’s namespace ownership.

Removing files from dataset#

$ metacat dataset remove-files [options] <dataset namespace>:<dataset name>

    remove files by DIDs or namespace/names
    -f|--files (<did>|<file id>)[,...]          - dids and fids can be mixed
    -f|--files <file with DIDs or file ids>     - one did or fid per line
    -f|--files <JSON file>                      - list of dictionaries:
                                                    { "fid": ...} or
                                                    { "namespace": ..., "name":... } or
                                                    { "did":... }
    -f|--files -                                - read file list from stdin

    remove files selected by a query
    -q|--query "<MQL query>"
    -q|--query <file>                           - read query from the file
    -q|--query -                                - read query from stdin

The command parameters are the same as for add-files.

If the dataset is frozen or monotonic, the command will return an error.

To remove files from a dataset, the user has to be an owner of the dataset namespace either directly or through a role. A user can remove any files from a dataset regardless of the file’s namespace ownership.

Listing existing datasets#

$ metacat dataset list [<options>] [[<namespace pattern>:]<name pattern>]
        -l|--long           - detailed output
        -c|--file-counts    - include file counts if detailed output

Namespace and name patterns are UNIX ls style patterns (recognizing *?[]). Examples:

$ metacat dataset list 'production:*.[0-3].dat'
$ metacat dataset list '*:A*'
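To get a feel for what such patterns select, Python's fnmatch module implements the same *, ? and [] wildcards. The sketch below is only an approximation run on made-up dataset names; how the MetaCat server actually evaluates the namespace and name patterns is not shown here.

```python
from fnmatch import fnmatchcase

# Made-up dataset names, used only for illustration
names = ["production:run.1.dat", "production:run.7.dat", "test:Alpha", "test:beta"]


def matching(pattern, candidates):
    """Return the candidates matching a shell-style pattern (case-sensitive)."""
    return [c for c in candidates if fnmatchcase(c, pattern)]


print(matching("production:*.[0-3].dat", names))  # ['production:run.1.dat']
print(matching("*:A*", names))                    # ['test:Alpha']
```

Quoting the pattern on the command line keeps the shell from expanding the wildcards against local file names before metacat sees them.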

When using -l option, user can also use -c to request dataset file counts. In this case, it may take additional time to calculate the file counts for large datasets.

Updating a dataset metadata and flags#

$ metacat dataset update <options> <namespace>:<name> [<description>]
        -M|--monotonic (yes|no) - set/reset monotonic flag
        -F|--frozen (yes|no)    - set/reset frozen flag
        -r|--replace            - replace metadata, otherwise update
        -m|--metadata <JSON file with metadata>
        -m|--metadata '<JSON expression>'
        -j|--json               - print updated dataset information as JSON

Listing files in the dataset#

$ metacat dataset files [<options>] <dataset namespace>:<dataset name>
        -m|--with-metadata      - include file metadata
        -j                      - as JSON

Adding/removing subsets to/from a dataset#

$ metacat dataset add-subset <parent dataset namespace>:<parent name> <child dataset namespace>:<child name> [<child dataset namespace>:<child name> ...]

When adding a dataset to another dataset, MetaCat checks whether the operation would create a cycle in the ancestor/descendant relationship and, if so, refuses to perform it.
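The kind of check described here can be sketched as a graph reachability test. The code below is a minimal illustration, assuming the subset relationships are available as a dictionary; the function and variable names are hypothetical and the server's real implementation may differ.

```python
def would_create_cycle(children, parent, child):
    """True if making `child` a subset of `parent` would close a loop.

    `children` maps a dataset DID to the DIDs of its direct subsets.
    """
    if parent == child:
        return True
    # A cycle appears if `parent` is already reachable from `child`
    stack, seen = [child], set()
    while stack:
        current = stack.pop()
        if current == parent:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, ()))
    return False


subsets = {"ns:a": ["ns:b"], "ns:b": ["ns:c"]}
print(would_create_cycle(subsets, "ns:c", "ns:a"))  # True: a -> b -> c -> a
print(would_create_cycle(subsets, "ns:a", "ns:c"))  # False
```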

Files#

Auto-naming#

When declaring new files to MetaCat, it is sometimes useful to have MetaCat generate file names according to a naming scheme. To do that, specify the auto_name attribute instead of the name file attribute. The auto-name is a text string with fields that the MetaCat server replaces with actual values at the time of the declaration. The following fields are recognized and are substituted in the following order:

  • $clock3 - lower 3 digits of UNIX timestamp in milliseconds as integer (milliseconds portion of the timestamp)

  • $clock6 - lower 6 digits of UNIX timestamp in milliseconds as integer

  • $clock - entire UNIX timestamp in milliseconds as integer

  • $uuid8 - 8 hex digits of a random UUID

  • $uuid16 - 16 hex digits of a random UUID

  • $uuid - 32 hex digits of a random UUID

  • $fid - file ID

For example, the pattern file_$uuid8_$clock6.dat may generate a file name like file_13d79a37_601828.dat.
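The substitution order matters because $clock is a prefix of $clock3 and $clock6, and $uuid is a prefix of $uuid8 and $uuid16. A minimal Python sketch of the substitution, under stated assumptions, is shown below. It is not the server's code: the function name is hypothetical, zero-padding of the clock fields is assumed, and so is the choice of the leading hex digits for $uuid8 and $uuid16.

```python
import time
import uuid


def expand_auto_name(pattern, fid, clock_ms=None, uuid_hex=None):
    """Substitute auto-name fields in the documented order (illustrative only)."""
    clock_ms = int(time.time() * 1000) if clock_ms is None else clock_ms
    uuid_hex = uuid.uuid4().hex if uuid_hex is None else uuid_hex
    substitutions = [
        ("$clock3", "%03d" % (clock_ms % 1000)),      # assumption: zero-padded
        ("$clock6", "%06d" % (clock_ms % 1000000)),   # assumption: zero-padded
        ("$clock", str(clock_ms)),
        ("$uuid8", uuid_hex[:8]),                     # assumption: leading digits
        ("$uuid16", uuid_hex[:16]),                   # assumption: leading digits
        ("$uuid", uuid_hex),
        ("$fid", fid),
    ]
    # Longer field names come first, so "$clock" never clobbers "$clock6"
    for field, value in substitutions:
        pattern = pattern.replace(field, value)
    return pattern


name = expand_auto_name(
    "file_$uuid8_$clock6.dat",
    fid="abc123",
    clock_ms=1700000601828,
    uuid_hex="13d79a37" + "0" * 24,
)
print(name)  # file_13d79a37_601828.dat
```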

Declare a single file#

When declaring a new file, the file has to be added to an existing dataset.

To declare a file, create a JSON file with file metadata, without any file attributes such as namespace, name, size, etc. e.g.:

{
    "math.pi": 3.14,
    "processing.status": "done",
    "processing.version": "1.3.5"
}

then declare the file specifying file attributes and the metadata as part of the command line:

# positional arguments: <file namespace>:<file name>, then <dataset namespace>:<dataset name>
$ metacat file declare -m metadata.json \
    --size 2048 \
    test:file_123_test.data \
    test:dataset_a

An alternative way to declare a file is to create a JSON file description containing both the file metadata and the file attributes, like this:

{
    "namespace":    "production",
    "name":         "file_123.data",
    "size":         1024,
    "metadata": {
        "math.pi": 3.14,
        "processing.status": "done",
        "processing.version": "1.3.5"
    },
    "parents": [ {"fid": "abc123"} ]
}

The following file attributes can be specified:

fid (optional)

File ID for the new file. It must be unique within the MetaCat instance. If unspecified, MetaCat will assign the hexadecimal representation of a random UUID (32 hex digits) as the file ID.

namespace (optional)

Namespace for the file. If unspecified, the default namespace specified with -N will be used.

name (optional)

File name. The file name must be unique within the namespace. If unspecified, the name will be auto-generated or the file ID will be used as the name.

auto_name (optional)

Auto-name pattern.

If neither name nor auto_name is provided, the file ID will be used as the file name.

size (required)

File size in bytes.

metadata (optional)

File metadata as a dictionary.

parents (optional)

List of dictionaries, one dictionary per parent file, in one of 3 formats:

  • { "did": "<namespace>:<name>" }

  • { "namespace": "…", "name": "…" }

  • { "fid": "<file id>" }

Individual parent dictionaries do not have to be in the same format. Specifying parents as a list of file id strings instead of dictionaries is deprecated.
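A file description of this kind can also be generated programmatically. The Python sketch below builds the dictionary from the attributes listed above and checks the two rules stated in this section: size is required, and each parent uses one of the 3 dictionary formats. The helper name make_file_description is hypothetical and the checks are a local convenience, not the server's validation.

```python
import json


def make_file_description(size, namespace=None, name=None, auto_name=None,
                          fid=None, metadata=None, parents=None):
    """Build a file description dictionary from the attributes listed above."""
    if not isinstance(size, int) or size < 0:
        raise ValueError("size is required and must be a non-negative integer")
    description = {"size": size}
    optional = {"namespace": namespace, "name": name, "auto_name": auto_name,
                "fid": fid, "metadata": metadata}
    # Leave out attributes that were not given
    description.update({k: v for k, v in optional.items() if v is not None})
    if parents:
        for parent in parents:
            if set(parent) not in ({"did"}, {"fid"}, {"namespace", "name"}):
                raise ValueError(f"unsupported parent format: {parent!r}")
        description["parents"] = list(parents)
    return description


desc = make_file_description(
    1024, namespace="production", name="file_123.data",
    metadata={"math.pi": 3.14}, parents=[{"fid": "abc123"}],
)
print(json.dumps(desc, indent=4))
```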

Once the file description is ready, it can be used with -f option:

$ metacat file declare \
        -f <JSON file description> \
        [other options] \
        [[<file namespace>:]<file name>] [<dataset namespace>:]<dataset name>

Also, the user can combine the two methods by using -f option with some file attributes specified in the command line. In this case attribute values from the command line will override corresponding values from the JSON file. For example:

# --size overrides the description: the file size will be set to 2048 instead of 1024
# test:file_123_test.data is the file namespace and name to use
$ metacat file declare -f my_file.json \
    --size 2048 \
    test:file_123_test.data \
    test:dataset_a

In this case, file namespace/name do not have to be specified in the command line as long as the file description has those attributes specified, e.g:

# the file size will be set to 2048 instead of 1024;
# the file namespace/name will be taken from the file description
$ metacat file declare -f my_file.json \
    --size 2048 \
    test:dataset_a

Complete set of options for this command is:

$ metacat file declare [options] [[<file namespace>:]<filename>] [<dataset namespace>:]<dataset name>

      -d|--dry-run                        - dry run: run all the checks but stop short of actual file declaration
      -N|--namespace <default namespace>
      -f|--file-description <JSON file>   - JSON file with description, including file attributes and metadata

      The following options can be used to override the values coming from the file description (-f)

      -s|--size <size>                    - file size
      -c|--checksums <type>:<value>[,...] - checksums
      -p|--parents <parent>[,...]         - parents can be specified with their file ids or DIDs.
                                            if the item contains colon ':', it is interpreted as DID
      -m|--metadata <JSON metadata file>  - if unspecified, file will be declared with empty metadata
      -a|--auto-name [[<namespace>:]<pattern>]   - generate file name automatically

      -j|--json                           - print results as JSON
      -v|--verbose                        - verbose output

      --sample                            - print JSON file description sample

Declare multiple files#

When declaring multiple files, the command accepts the path to a JSON file. The file must contain a JSON representation of a list of file descriptions like this:

[
    {
        "namespace":"namespace",    # optional - use -N to specify default
        "name":"name",              # optional
        "auto_name":"pattern",      # optional
        "fid":"...",                # optional - if missing, a new one will be generated. If specified, must be unique
        "metadata": { ... },        # optional
        "parents":  [ ... ],        # optional, list of dictionaries, one dictionary per parent
        "size":   1234              # required - size of the file in bytes
    },
    ...
]

You can get a sample of the JSON file:

$ metacat file declare-sample

Once you have the JSON file with the file descriptions, you can declare the files:

$ metacat file declare-many [options] <file list JSON file> [<dataset namespace>:]<dataset name>
Declare multiple files:
      -d|--dry-run                        - dry run: run all the checks but stop short of actual file declaration
      -j|--json                           - print results as JSON
      -N|--namespace <default namespace>
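The file list can be produced with a short script. The sketch below builds one description per local file, taking the name from the path and the size from the file system, and writes the result as strict JSON (the "#" annotations in the sample above are explanatory and are not part of JSON). The function names and the idea of sharing one metadata dictionary across all files are assumptions made for illustration.

```python
import json
import os


def describe_files(paths, namespace, metadata=None):
    """Build one file description per existing local file."""
    return [
        {
            "namespace": namespace,
            "name": os.path.basename(path),
            "size": os.path.getsize(path),   # required attribute, in bytes
            "metadata": dict(metadata or {}),
        }
        for path in paths
    ]


def write_file_list(paths, namespace, out_path, metadata=None):
    """Write the list of file descriptions to out_path as JSON."""
    with open(out_path, "w") as out:
        json.dump(describe_files(paths, namespace, metadata), out, indent=4)
```

The resulting file could then be passed to metacat file declare-many as the <file list JSON file> argument.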

Listing datasets the file is in#

This command will print the namespace/name of all the datasets the file is in. Currently, the listing is not recursive.

$ metacat file datasets [-j|-p] -i <file id>
$ metacat file datasets [-j|-p] <namespace>:<name>
  -p pretty-print the list of datasets
  -j print the dataset list as JSON
  otherwise print <namespace>:<name> for each dataset

Updating Metadata for Files#

If you want to make similar changes to the metadata of multiple files, you can use the update-meta subcommand. This subcommand allows you to perform one of 3 functions on multiple files:

  • Add missing metadata values

  • Update existing metadata values

  • Replace entire file metadata with new values

To use this command, first, you will need to create a JSON file with metadata values you want to add, update or replace:

{
    "x": 3.14,
    "run_type": "calibration"
}

Then make a list of the files you want to update and write it into a text file, one file entry per line. Each file entry can be either the file DID ("namespace:name") or the file id.

Then run the metacat file update-meta command.

If the files are specified with their DIDs, then use -n option:

$ metacat file update-meta -n <namespace>:<name>[,...] metadata.json
$ metacat file update-meta -n <file with DIDs> metadata.json
$ metacat file update-meta -n - metadata.json             # read file namespace:name's from stdin

If the files are specified with their file ids, then use -i option:

$ metacat file update-meta -i <file_id>[,...] metadata.json
$ metacat file update-meta -i <file with file ids> metadata.json
$ metacat file update-meta -i - metadata.json             # read file ids from stdin

If you want to replace the entire metadata of the files with a new dictionary instead of adding or updating a few parameters, use the -r option, e.g.:

$ metacat file update-meta -r -i <file_id>[,...] metadata.json
$ metacat file update-meta -r -n <file with DIDs> metadata.json
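Because DIDs go with -n and file ids go with -i, a mixed list has to be split before it is used. A minimal Python sketch of that step follows; it assumes that an entry containing a colon is a DID, which is the rule this document states for parent specifications, and the helper name is hypothetical.

```python
def split_entries(lines):
    """Separate "namespace:name" DIDs from file ids, skipping blank lines."""
    dids, fids = [], []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        (dids if ":" in entry else fids).append(entry)
    return dids, fids


dids, fids = split_entries(["test:file_a.dat", "", "0a1b2c3d4e5f", "test:file_b.dat"])
print(",".join(dids))  # value for -n
print(",".join(fids))  # value for -i
```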

Updating Single File Attributes and Metadata#

The update subcommand works with a single file, but it allows you to update file attributes such as:

  • file size

  • checksums dictionary

  • provenance (parents and/or children)

in addition to the file metadata. The command works in 2 modes:

  • replace - checksums dictionary, metadata and parents and children lists will be replaced with new values

  • add/update - checksums dictionary and metadata will be updated with new values, specified parents/children will be added to existing lists

The command will modify only those attributes included in the command and will not affect other attributes. For example:

# update size and add/update adler32 checksum for the file:
$ metacat file update -s 12345 -k adler32:1234abcd my_scope:my_file.hdf5

# remove any checksums from the file:
$ metacat file update -r -k - my_scope:my_file.hdf5

# replace children for the file:
$ metacat file update -r -c my_scope:derived_a.hdf5,my_scope:derived_b.hdf5 my_scope:my_file.hdf5

# add/update metadata values from JSON file:
$ metacat file update -m meta.json my_scope:my_file.hdf5

# replace metadata:
$ metacat file update -r -m '{"math.pi":3.1415, "format":"hdf5"}' my_scope:my_file.hdf5

# read all the updates from a JSON file:
$ metacat file update -u update.json my_scope:my_file.hdf5
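The difference between the two modes can be modelled on plain Python dictionaries. The sketch below only mirrors the behaviour described in this section (dictionaries are merged or replaced, lists are extended or replaced, untouched attributes stay as they are); it is not how the server stores or updates files, and the function name is hypothetical.

```python
def apply_update(current, updates, replace=False):
    """Return file attributes after an update in add/update or replace mode."""
    result = dict(current)
    for key, value in updates.items():
        if key == "size":
            result["size"] = value                 # plain value: always overwritten
        elif key in ("checksums", "metadata"):
            base = {} if replace else dict(current.get(key, {}))
            base.update(value)                     # add new keys, update existing ones
            result[key] = base
        elif key in ("parents", "children"):
            base = [] if replace else list(current.get(key, []))
            base.extend(v for v in value if v not in base)
            result[key] = base
        else:
            raise KeyError(f"unexpected attribute: {key}")
    return result


current = {"size": 1, "checksums": {"adler32": "1234abcd"}, "parents": ["p1"]}

updated = apply_update(current, {"checksums": {"md5": "ff"}, "parents": ["p2"]})
print(updated["checksums"])  # {'adler32': '1234abcd', 'md5': 'ff'}
print(updated["parents"])    # ['p1', 'p2']

replaced = apply_update(current, {"checksums": {}}, replace=True)
print(replaced["checksums"])  # {}
```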

The command has more options:

$ metacat file update --help
metacat file update [options] (<file namespace>:<file name>|<file id>)

  -d|--dry-run
  -v|--verbose
  -r|--replace                        - Replace metadata, checksums, parents and children
                                        otherwise update metadata, checksums, add parents and children.
                                        Applies to -k, -p, -c, -m, -f options
  -j|--json                           - print updated file attributes as JSON. Otherwise - as Python pprint

  -u|--updates <JSON file>            - JSON file with file attributes to be updated as a dictionary.
                                        The following keys are accepted:
                                            size: int,
                                            checksums: dict,
                                            metadata: dict,
                                            parents: list of strings,
                                            children: list of strings

  -s|--size <size>                    - file size
  -k|--checksums <type>:<value>[,...] - checksums
  -m|--metadata <JSON metadata file>  - metadata
  -m|--metadata '<JSON dictionary>'   - inline metadata
  -m|--metadata -                     - read metadata dictionary as JSON from stdin
  -p|--parents <parent>[,...]         - parents can be specified with their file ids or DIDs.
                                        if the item contains colon ':', it is interpreted as DID
  -p|--parents -                      - use '-' with -r to remove all parents
  -c|--children <child>[,...]         - children can be specified with their file ids or DIDs.
                                        if the item contains colon ':', it is interpreted as DID
  -c|--children -                     - use '-' with -r to remove all children

  If -u is used together with some individual attributes options, the attributes from the -u file will
  be updated with those coming from the individual attribute options first.

Moving files into another namespace#

To move a set of files to another namespace, use the move subcommand. There are 2 ways to specify the list of files to move:

  • explicitly listing their DIDs or file ids

  • selecting files using an MQL query

To specify files explicitly, use -f option:

$ metacat file move -n <target namespace> -f <file list specification>

The file list can be specified in one of the following ways:

-f|--files <file namespace>:<file name>[,...]   - list of DIDs
-f|--files <file id>[,...]                      - list of file ids
-f|--files <file>                               - read the list of DIDs or file ids from a text file
-f|--files <JSON file>                          - read the list from JSON file
-f|--files -                                    - read the list from stdin

To use an MQL query:

$ metacat file move -n <target namespace> <inline query>
$ metacat file move -n <target namespace> -q <query source>

-q|--query <file>                           - read query from the file
-q|--query -                                - read query from stdin

When using this command, keep in mind that the operation is slow because it involves updating not only the file data but also several indexes in the database. Depending on the number of files to move, it can take hours to complete.

The user has to own (directly or through a role) both the source namespace of each file and the destination namespace. The command will move the files the user is authorized to move and print errors for those the user does not have permission to move.

Retrieving#

Retrieving single file metadata

$ metacat file show [<options>] (-i <file id>|<namespace>:<name>)
  -m|--meta-only            - print file metadata only
  -n|--name-only            - print file namespace, name only
  -d|--id-only              - print file id only

  -j|--json                 - as JSON
  -p|--pretty               - pretty-print information

  -l|--lineage|--provenance (p|c)        - parents or children instead of the file itself
  -I|--ids                               - for parents and children, print file ids instead of namespace/names

Validation#

Sometimes it is desirable to validate metadata without actually declaring a file. One way of doing this is to use the dry run mode of the file declaration command. Another way is to use the metacat validate command:

$ metacat validate [options] <JSON file with metadata>
  -d <dataset namespace>:<dataset name>           - if specified, validate the file metadata against the dataset requirements
  -q                                              - quiet - do not print anything, just use exit status to signal results

To use the command, create a JSON file with file metadata only and use the command to validate it. The metadata will be validated against all the parameter category constraints and, if the target dataset for the file is specified with -d, against the dataset metadata requirements. The command will exit with 0 (success) status if the metadata is valid. Otherwise it will print the violations found and exit with error status 1. -q can be used to suppress any error printing, to have the command quietly exit with 0 or 1 status.
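Since the result is reported through the exit status, the command is easy to call from a script. The Python sketch below wraps it with subprocess and uses only the -q and -d options described above. It assumes that metacat is on PATH and that the server URL is already configured; the function name and the command parameter, which exists only to make the wrapper easy to try out, are hypothetical.

```python
import subprocess


def metadata_is_valid(metadata_path, dataset=None, command=("metacat", "validate")):
    """Run `metacat validate -q` and translate its exit status into a bool."""
    args = list(command) + ["-q"]        # quiet: rely on the exit status only
    if dataset:
        args += ["-d", dataset]          # <dataset namespace>:<dataset name>
    args.append(metadata_path)
    return subprocess.run(args).returncode == 0
```

A caller could then write, for example, metadata_is_valid("metadata.json", dataset="test:dataset_a") and branch on the result.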

Metadata Categories#

Existing parameter categories can be listed using:

$ metacat category list
.
DUNE
DUNE_MC
ivm
...

Information about an individual category can be printed using:

$ metacat category show ivm
Path:             ivm
Description:      ivm test category
Owner user:       ivm
Owner role:
Creator:          ivm
Created at:       2022-09-27 10:51:19 UTC
Restricted:       no
Constraints:
  counter                                         int [0 - ]
  done                                        boolean
  odds                                            int (1, 3, 5, 7)
  pi                                            float [3.0 - 4.0]
  word                                           text ~ '[A-Z].*'
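To make the constraint notation concrete, the Python sketch below re-implements the five constraints shown for the ivm category and checks a metadata dictionary against them. It is a local illustration only: the constraints are transcribed by hand, parameter names are given without a category prefix, the text pattern is assumed to apply to the whole value, and the server's actual validation rules may differ.

```python
import re

# Constraints transcribed by hand from the `category show ivm` output above
CONSTRAINTS = {
    "counter": {"type": int, "min": 0},
    "done": {"type": bool},
    "odds": {"type": int, "values": (1, 3, 5, 7)},
    "pi": {"type": float, "min": 3.0, "max": 4.0},
    "word": {"type": str, "pattern": r"[A-Z].*"},
}


def violations(params, constraints=CONSTRAINTS):
    """Return a list of human-readable constraint violations."""
    problems = []
    for name, value in params.items():
        rule = constraints.get(name)
        if rule is None:
            continue  # no constraint defined for this parameter
        expected = rule["type"]
        # bool is a subclass of int in Python, so reject it for numeric types
        wrong_type = not isinstance(value, expected) or (
            expected is not bool and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(f"{name}: expected {expected.__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            problems.append(f"{name}: below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{name}: above {rule['max']}")
        if "values" in rule and value not in rule["values"]:
            problems.append(f"{name}: not one of {rule['values']}")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            problems.append(f"{name}: does not match {rule['pattern']}")
    return problems


print(violations({"counter": 5, "done": True, "odds": 3, "pi": 3.14, "word": "Hello"}))  # []
print(violations({"odds": 4, "pi": 5.0, "word": "hello"}))
```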

Query#

MetaCat queries are written in Metadata Query Language.

$ metacat query [<options>] "<MQL query>"
$ metacat query [<options>] -f <MQL query file>

Options:
        -t|--timeout <timeout in seconds>   - request timeout - useful for long running queries (default 600)
        -j|--json                           - print raw JSON output
        -p|--pretty                         - pretty-print metadata
        -l|--line                           - print all metadata on single line (good for grepping, ignored with -j and -p)
        -i|--ids                            - print file ids instead of names
           --summary (count|keys)           - print only summary information
                                                  count: file count and total size
                                                  keys: list of all top level metadata keys for selected files
        -s                                  - shortcut for --summary count
             -2|--1024                      - for count, print sizes in KiB, GiB (1024, ...), instead of powers of 1000 (KB, GB, ...)
        -m|--metadata [<field>,...]         - print metadata fields
                                              overrides --summary
        -m|--metadata all                   - print all metadata fields
                                              overrides --summary
        -P|--with-provenance                - include provenance information
        -N|--namespace=<default namespace>  - default namespace for the query
        -S|--save-as=<namespace>:<name>     - save files as a new dataset
        -A|--add-to=<namespace>:<name>      - add files to an existing dataset
        -r|--include-retired-files          - include retired files into the query results
        -b|--batch_size N                   - request results in batches of N files

        -x|--explain                        - do not run the query, show resulting SQL only

The --batch_size N option is useful for queries that otherwise time out.

Named Queries#

MetaCat allows storing a query in the database under a namespace/name and then reusing the same query as part of another query. For example, one can save a query like this:

files from production:data_2021
    where data.format = "hd5"

Let’s say they saved it as joeuser:hd5_files. Then they can run the query by name as is:

query joeuser:hd5_files

or as part of a more complex query:

query joeuser:hd5_files where app.version in ("2.3", "2.3")
union (
    query joeuser:hd5_files,
    files from mc:mc_2021 where data.format = "hd5"
)

MetaCat provides basic named query management commands:

To create a named query:

$ metacat named_query create [options] <namespace>:<name> <MQL query>         - inline query
$ metacat named_query create [options] -f|--file <file> <namespace>:<name>    - read query from file
$ metacat named_query create [options] <namespace>:<name>                     - read query from stdin

Options:
    -u|--update                -- update if the named query exists

To list existing named queries:

$ metacat named_query list [<options>]

Options:
    -n|--namespace                          - include queries from the namespace only
    -j|--json                               - as JSON

To show a named query:

$ metacat named_query show [<options>] <namespace>:<name>

Options:
    -j|--json                               - as JSON
    -v|--verbose                            - verbose output. Otherwise - print query source only

To search for named queries:

$ metacat named_query search ...

    search <inline query>                                   - inline query
    search -q|--query <query file>                          - read query from file
    search -q|--query -                                     - read query from stdin

    Options:
        -f|--format (json|pretty|names)                     - output format

Named query search uses a subset of MQL to specify the search criteria. Here are some examples:

queries matching my_namespace:favorite_*
queries matching regexp my_namespace:"prod_202[0-3]"

To include the query metadata into the search criteria, add where clause:

queries matching my_namespace:favorite_*
    where file.quality > 1 and file.type = "hdf5"