arkimet.dataset package

Submodules

arkimet.dataset.http module

arkimet.dataset.http.expand_remote_query(remotes: arkimet.cfg.Sections, query: str) → str

Expand aliases on the query for all remote datasets given.

An exception is raised if some remotes have conflicting alias definitions.

arkimet.dataset.http.get_alias_database(url: str) → arkimet.cfg.Sections

Read the alias database for the server at the given URL

arkimet.dataset.http.load_cfg_sections(url: str) → arkimet.cfg.Sections

Read the configuration of the datasets at the given URL
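As a sketch of how these functions combine, the following fetches both the dataset configuration and the alias database from a server; the URL argument and the wrapper function name are illustrative:

```python
def fetch_remote_config(url: str):
    """Fetch the dataset configuration and alias database of a server."""
    import arkimet
    # Configuration of every dataset published at this URL
    sections = arkimet.dataset.http.load_cfg_sections(url)
    # Matcher aliases defined by the same server
    aliases = arkimet.dataset.http.get_alias_database(url)
    return sections, aliases
```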

Module contents

class arkimet.dataset.Checker

Check functions for an arkimet dataset.

TODO: document

Examples:

TODO: add examples
check(reporter: Any=None, segment_filter: Union[arkimet.Matcher, str]=, offline: bool=True, online: bool=True, readonly: bool=True, accurate: bool=False)

Perform checking/fixing on the dataset

repack(reporter: Any=None, segment_filter: Union[arkimet.Matcher, str]=, offline: bool=True, online: bool=True, readonly: bool=True, accurate: bool=False)

Perform repacking on the dataset

segment_state(reporter: Any=None, segment_filter: Union[arkimet.Matcher, str]=, offline: bool=True, online: bool=True, readonly: bool=True, accurate: bool=False, time_override: int=None) → Dict[str, str]

Compute the state of each segment in the archive
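A minimal maintenance sketch, assuming a Session whose pool already contains a dataset; the dataset name and the helper function are illustrative:

```python
def run_maintenance(session, name="dsname"):
    """Check a dataset read-only and report the state of its segments."""
    with session.dataset_checker(name=name) as checker:
        # Dry run: report problems without modifying any segment
        checker.check(readonly=True)
        # Map each segment to a string describing its state
        return checker.segment_state()
```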

class arkimet.dataset.Dataset

A dataset in arkimet. It provides information about the dataset configuration, and allows creating readers, writers, and checkers to work with the dataset.

You can avoid the intermediate step of accessing Dataset objects by calling the arkimet.dataset.Session functions arkimet.dataset.Session.dataset_reader(), arkimet.dataset.Session.dataset_writer(), and arkimet.dataset.Session.dataset_checker() directly.

For the cases where it is useful to instantiate a Dataset and pass it around, this class is available, matching the C++ API.

Examples:

with session.dataset("dsname") as dataset:
    print(dataset.name)
    with dataset.reader() as reader:
        return reader.query_data()
checker() → arkimet.dataset.Checker

return a checker for this dataset

config

dataset configuration as an arkimet.cfg.Section object

name

dataset name

reader() → arkimet.dataset.Reader

return a reader for this dataset

writer() → arkimet.dataset.Writer

return a writer for this dataset

exception arkimet.dataset.ImportDuplicateError

The item to import already exists on the dataset

exception arkimet.dataset.ImportError

Base class for dataset import errors

exception arkimet.dataset.ImportFailedError

The import process failed on this metadata

class arkimet.dataset.Reader

Read functions for an arkimet dataset.

TODO: document

Examples:

TODO: add examples
query_bytes(matcher: Union[arkimet.Matcher, str] = None, with_data: bool = False, sort: str = None, data_start_hook: Callable[[], None] = None, postprocess: str = None, metadata_report: str = None, summary_report: str = None, file: Union[int, BinaryIO] = None, progress=None) → Union[None, bytes]

query a dataset, piping results to a file

Parameters
  • matcher – the matcher string to filter data to return.

  • with_data – if True, also load data together with the metadata.

  • sort – string with the desired sort order of results.

  • data_start_hook – function called before sending the data to the file

  • postprocess – name of a postprocessor to use to filter data server-side

  • metadata_report – name of the server-side report function to run on results metadata

  • summary_report – name of the server-side report function to run on results summary

  • file – the output file. The file can be a file-like object, or an integer file or socket handle. If missing, data is returned in a bytes object

  • progress – an object with 3 methods: start(expected_count: int=0, expected_bytes: int=0), update(count: int, bytes: int), and done(total_count: int, total_bytes: int), to call for progress updates.
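For example, a sketch that streams the raw data of matching results into a file on disk; the helper name is a placeholder, and the query is whatever matcher expression fits your data:

```python
def export_data(reader, path, query=None):
    """Stream the raw data of matching results into a file."""
    with open(path, "wb") as fd:
        # with_data=True makes the raw data available for output;
        # passing file= sends results there instead of returning bytes
        reader.query_bytes(query, with_data=True, file=fd)
```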

query_data(matcher: Union[arkimet.Matcher, str] = None, with_data: bool = False, sort: str = None, on_metadata: Callable[[arkimet.Metadata], Optional[bool]] = None, progress=None) → Union[None, List[arkimet.Metadata]]

query a dataset, processing the resulting metadata one by one

Parameters
  • matcher – the matcher string to filter data to return.

  • with_data – if True, also load data together with the metadata.

  • sort – string with the desired sort order of results.

  • on_metadata – a function called on each metadata, with the Metadata object as its only argument. Return None or True to continue processing results, False to stop.

  • progress – an object with 3 methods: start(expected_count: int=0, expected_bytes: int=0), update(count: int, bytes: int), and done(total_count: int, total_bytes: int), to call for progress updates.
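A sketch of incremental processing with on_metadata, stopping early after a fixed number of results; the limit and the helper name are illustrative:

```python
def collect_first(reader, query=None, limit=1000):
    """Collect up to `limit` metadata items without loading the rest."""
    collected = []

    def on_metadata(md):
        collected.append(md)
        # True (or None) continues the scan; False stops it early
        return len(collected) < limit

    reader.query_data(matcher=query, on_metadata=on_metadata)
    return collected
```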

query_summary(matcher: Union[arkimet.Matcher, str] = None, summary: arkimet.Summary = None, progress=None) → arkimet.Summary

query a dataset, returning an arkimet.Summary with the results

Parameters
  • matcher – the matcher string to filter data to return.

  • summary – if not None, add results to this arkimet.Summary and return it, instead of creating a new one.
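Passing an existing Summary makes it possible to accumulate results across several datasets; a sketch, with the dataset names as placeholders:

```python
def merged_summary(session, names, query=None):
    """Build one Summary covering several datasets in the session pool."""
    import arkimet
    summary = arkimet.Summary()
    for name in names:
        with session.dataset_reader(name=name) as reader:
            # With summary= the results are added to the existing
            # Summary instead of a newly created one
            reader.query_summary(query, summary=summary)
    return summary
```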

class arkimet.dataset.Session

Shared configuration, aliases, and working data used to work with arkimet datasets.

A Session stores alias information, and preloads arkimet.dataset.Dataset objects.

Adding a remote dataset to the Session dataset pool will download the alias database of its server and merge it into the current one, raising an error in case of inconsistencies.

Datasets in the pool can be referred to by name. It is also possible to create datasets given their configuration, without adding them to the pool.

Session is also used to instantiate matchers using its alias database.

Examples:

# Long version
with arkimet.dataset.Session() as session:
    for section in config.values():
        session.add_dataset(section)
    matcher = session.matcher("level:ground")
    with session.dataset("dsname") as dataset:
        with dataset.reader() as reader:
            result = reader.query_data(matcher)

# Short version
with arkimet.dataset.Session() as session:
    for section in config.values():
        session.add_dataset(section)
    with session.dataset_reader(name="dsname") as reader:
        result = reader.query_data("level:ground")
add_dataset(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]])

add a dataset to the Session pool

If a dataset with the same name already exists in the pool, it raises an exception.

If the dataset is remote, the aliases used by the remote server will be added to the session alias database. If different servers define some aliases differently, it raises an exception.

dataset(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Dataset

return a Dataset given its configuration, or its name in the session pool

dataset_checker(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Checker

return a dataset checker given its configuration, or its name in the session pool

dataset_pool_size() → int

return how many datasets are in the dataset pool

dataset_reader(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Reader

return a dataset reader given its configuration, or its name in the session pool

dataset_writer(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Writer

return a dataset writer given its configuration, or its name in the session pool

datasets() → List[arkimet.dataset.Dataset]

return a list of all datasets in the session pool

expand_query(query: str) → str

expand aliases in an Arkimet query, returning the same query without use of aliases

get_alias_database() → arkimet.cfg.Sections

return matcher alias database for this session

has_dataset(name: str) → bool

check if the dataset pool has a dataset with the given name

has_datasets() → bool

return True if the session contains datasets in the dataset pool

load_aliases(aliases: Union[str, arkimet.cfg.Sections])

add the given set of aliases to the alias database in this session

matcher(query: str) → arkimet.Matcher

parse an arkimet matcher expression, using the aliases in this session, and return the Matcher object for it
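A sketch combining load_aliases(), expand_query(), and matcher(); the alias configuration is whatever your deployment defines, and the helper name is illustrative:

```python
def parse_with_aliases(session, alias_cfg, query):
    """Parse a query that may use locally defined matcher aliases."""
    # alias_cfg can be a string or an arkimet.cfg.Sections
    session.load_aliases(alias_cfg)
    # The expanded form spells out the aliases explicitly
    expanded = session.expand_query(query)
    return session.matcher(expanded)
```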

merged() → arkimet.dataset.Dataset

return a merged dataset querying all datasets in this session

querymacro(name: str, macro: str) → arkimet.dataset.Dataset

create a QueryMacro dataset querying datasets from this session’s pool

class arkimet.dataset.SessionTimeOverride

A Session variant that overrides the current time, for example to test time-dependent archive operations.

TODO: document

Examples:

TODO: add examples
class arkimet.dataset.Writer

Write functions for an arkimet dataset.

TODO: document

Examples:

TODO: add examples
acquire(md: arki.Metadata, replace: str = None, drop_cached_data_on_commit: bool = False)

Acquire the given metadata item (and related data) in this dataset

After acquiring the data successfully, the data can be retrieved from the dataset. Also, information such as the dataset name and the id of the data in the dataset are added to the Metadata object.

If the import failed, a subclass of arkimet.dataset.ImportError is raised.
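A sketch of a single import that treats duplicates as a non-fatal outcome; the helper name is illustrative:

```python
def import_one(writer, md):
    """Acquire one metadata item, tolerating duplicates."""
    import arkimet
    try:
        writer.acquire(md)
    except arkimet.dataset.ImportDuplicateError:
        # The same data already exists in the dataset
        return False
    writer.flush()
    return True
```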

acquire_batch(md: Iterable[arkimet.Metadata], replace: str = None, drop_cached_data_on_commit: bool = False) → Tuple[str]

Acquire the given metadata items (and related data) in this dataset

After acquiring the data successfully, the data can be retrieved from the dataset. Also, information such as the dataset name and the id of the data in the dataset are added to the Metadata objects.

No exception is raised in case of import failures. The function returns a tuple with the same length as the input sequence of metadata, with a string for each item representing its outcome: “ok”, “duplicate”, or “error”.
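Since acquire_batch() reports outcomes instead of raising, the caller can tally them; a sketch, with the helper name as a placeholder:

```python
def import_batch(writer, mds):
    """Acquire many items at once and count each kind of outcome."""
    outcomes = writer.acquire_batch(mds)
    counts = {"ok": 0, "duplicate": 0, "error": 0}
    # One outcome string per input item, in the same order
    for outcome in outcomes:
        counts[outcome] += 1
    writer.flush()
    return counts
```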

flush()

Flush pending changes to disk

arkimet.dataset.read_config(pathname: str) → arkimet.cfg.Section

Read the configuration of a dataset at the given path or URL

arkimet.dataset.read_configs(pathname: str) → arkimet.cfg.Sections

Read the merged dataset configuration at the given path or URL
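As a closing sketch, these functions can feed a session pool directly; the paths and the helper name are placeholders:

```python
def load_datasets(session, paths):
    """Read dataset configurations from paths or URLs into a session."""
    import arkimet
    for path in paths:
        # Each path yields a single arkimet.cfg.Section
        section = arkimet.dataset.read_config(path)
        session.add_dataset(section)
    return session
```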