Dataset archives
Datasets can distinguish between online and archived data, and store the two in different formats. Online data can be stored in a format with detailed indexing, like iseg (see iseg dataset format), while archived data can be stored in a format with fewer features and less overhead, like simple (see simple dataset format).
Archives are stored in the .archive/ subdirectory of a dataset. The archive age configuration value can be used to automate archiving: during repack, segments that contain only data older than the given age are moved to .archive/.
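The selection rule can be sketched in Python as follows. This is only an illustration of the rule described above, not arkimet's implementation: the function name, the segment names and the mapping of segments to time ranges are hypothetical, and the age is assumed to be expressed in days.

```python
from datetime import datetime, timedelta


def segments_to_archive(segments, archive_age_days, now):
    """Return the names of segments that contain only data older than the cutoff.

    `segments` maps a segment name to the (oldest, newest) reference times of
    the data it contains. Assumption: the archive age is a number of days.
    """
    cutoff = now - timedelta(days=archive_age_days)
    # A segment qualifies only if even its newest datum is older than the cutoff
    return sorted(name for name, (_, newest) in segments.items() if newest < cutoff)


# Hypothetical segments with the time range of the data they hold
segments = {
    "2023/01-01.grib": (datetime(2023, 1, 1), datetime(2023, 1, 1, 23)),
    "2023/06-30.grib": (datetime(2023, 6, 30), datetime(2023, 7, 1)),
}
print(segments_to_archive(segments, 180, now=datetime(2023, 12, 1)))
# prints ['2023/01-01.grib']
```

The second segment is kept online because part of its data is newer than the cutoff, even though its oldest datum is not.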
Data is moved to .archive/ as entire segments, after the online part has been checked and repacked. This should ensure that archived segments contain no duplicates and need no further repacking.
The .archive/ directory contains several subdirectories, each holding a dataset in simple format. Queries on the archives query each of these datasets in sequence, ordered by name, with last always kept at the end.
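The query order described above can be sketched as a small Python function. The function name and the example archive names are hypothetical, and plain string ordering is assumed for "ordered by name".

```python
def archive_query_order(names):
    """Sort archive names alphabetically, keeping "last" at the end."""
    ordered = sorted(name for name in names if name != "last")
    if "last" in names:
        ordered.append("last")
    return ordered


print(archive_query_order(["last", "2021", "2019", "offsite"]))
# prints ['2019', '2021', 'offsite', 'last']
```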
Segments that get archived are moved from the online dataset to the .archive/last dataset, and can be manually moved from last to any other subdirectory of .archive/. arkimet will only ever move segments to .archive/last, so the other subdirectories can be maintained with procedures external to arkimet, with no interference.
Online archives
Online archives are the default type of archive, created automatically as .archive/last during maintenance. They are instances of simple datasets, and all the simple dataset documentation applies to them.
Offline archives
Offline archives are archives whose data has been moved to external media. The only thing left in place is a .summary file that describes the data that would be there.
It is possible to bring an offline archive online by copying, linking or mounting it next to its .summary file. If both the .summary file and the archive directory are present, arkimet will ignore the .summary file when reading, and will ignore the archive directory when checking. This has the effect of making an archive read-only when the $archivename.summary file is present.
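The combinations described above can be summarised with a short Python sketch that inspects the filesystem. It only restates the rules in this section and is not how arkimet itself is implemented; the function name, the state labels and the example archive names are hypothetical.

```python
import tempfile
from pathlib import Path


def archive_state(archives_root, name):
    """Classify an archive by which of $name/ and $name.summary exist."""
    root = Path(archives_root)
    has_dir = (root / name).is_dir()
    has_summary = (root / f"{name}.summary").is_file()
    if has_dir and has_summary:
        # Data is read from the directory, but checks skip it
        return "online, read-only"
    if has_dir:
        return "online"
    if has_summary:
        # Only the description of the data is available
        return "offline"
    return "missing"


# Build a throwaway .archive/ layout to exercise the three cases
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2019").mkdir()
    (root / "2019.summary").write_bytes(b"")
    (root / "2020.summary").write_bytes(b"")
    (root / "last").mkdir()
    states = {name: archive_state(root, name) for name in ("2019", "2020", "last")}

print(states)
```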
To run a check/fix/repack operation on an offline archive, bring it online, remove the $archivename.summary file, run the check/fix/repack operation, and copy $archivename/summary to $archivename.summary to mark it read-only again.
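The procedure above can be sketched in Python, assuming the archive has already been brought online. The function name is hypothetical, and the maintenance step is passed in as a callable because the exact command to run depends on the installation; this is a sketch of the sequence of steps, not a tool shipped with arkimet.

```python
import shutil
import tempfile
from pathlib import Path


def maintain_offline_archive(archives_root, name, operation):
    """Run `operation` on an archive, then mark it read-only again."""
    root = Path(archives_root)
    archive_dir = root / name
    marker = root / f"{name}.summary"
    if not archive_dir.is_dir():
        raise FileNotFoundError(f"{archive_dir} is not online: copy, link or mount it first")
    # Remove $archivename.summary so that checks no longer skip the directory
    marker.unlink(missing_ok=True)  # requires Python 3.8 or later
    # Hypothetical hook: the caller supplies the check/fix/repack step
    operation(archive_dir)
    # Copy $archivename/summary to $archivename.summary to mark it read-only again
    shutil.copyfile(archive_dir / "summary", marker)


# Exercise the sequence on a throwaway layout with a stand-in operation
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2019").mkdir()
    (root / "2019" / "summary").write_bytes(b"refreshed summary")
    (root / "2019.summary").write_bytes(b"stale summary")
    seen = []
    maintain_offline_archive(root, "2019", lambda path: seen.append(path.name))
    marker_bytes = (root / "2019.summary").read_bytes()

print(seen, marker_bytes)
```

If the operation raises an exception, the marker file is left removed, so the archive stays writable until the problem is fixed and the summary is copied back.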