Datasets

Arkimet supports several dataset types, each offering different performance and indexing features to match the various ways in which data is stored and queried.

Dataset configuration

Datasets are configured with simple key = value configuration options.
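
For example, a dataset's configuration might contain lines like these (a minimal sketch; the values are illustrative):

  type = iseg
  format = grib
  step = daily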

These are all the supported options (a combined example follows the list):

  • archive age: data older than this number of days will be moved to the dataset archive during maintenance.

  • delete age: data older than this number of days will be deleted during maintenance.

  • eatmydata: disable fsync/fdatasync operations while writing data to the dataset, and disable SQLite's journaling and other data integrity features. This makes acquiring data very fast, but an interrupted or concurrent import may cause data corruption.

  • format: format of data in the dataset (one of grib, bufr, odimh5, vm2).

  • index: comma-separated list of metadata types to index for faster queries.

  • path: path to the dataset, or URL for remote datasets.

  • replace: when yes, importing duplicate data will replace the existing version. When no, importing duplicate data will be rejected. When usn, importing duplicate BUFR data will replace the existing version only if its BUFR Update Sequence Number is greater than the one currently in the dataset. A replace leaves the old data in the segment and appends the new data at the end, updating the index to refer to the new data. As with deleted data, disk space is only reclaimed by running arki-check --repack.

  • restrict: comma-separated list of names that have access to the dataset. This allows filtering datasets with the --restrict option on the command line.

  • smallfiles: yes or no. When yes, the file contents are also saved in the index, to speed up extraction of data with tiny payloads like vm2.

  • step: segmentation step for the dataset (one of daily, weekly, biweekly, monthly, and yearly).

  • type: dataset type (one of iseg, simple, error, duplicates, remote, outbound, discard, file).

  • unique: comma-separated list of metadata types that, taken together, uniquely identify a data item in the dataset.
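
A fuller sketch, combining several of the options above with hypothetical values, could describe a GRIB dataset stored in daily segments, with duplicate detection and automatic archiving:

  type = iseg
  format = grib
  step = daily
  unique = reftime, origin, product, level, timerange, area
  index = origin, product, level
  replace = no
  archive age = 365

With such a configuration, an import whose reftime, origin, product, level, timerange and area all match existing data is rejected as a duplicate, and data older than a year is moved to the dataset archive during maintenance.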