arkimet.dataset package¶
Submodules¶
arkimet.dataset.http module¶
- arkimet.dataset.http.expand_remote_query(remotes: arkimet.cfg.Sections, query: str) → str ¶
Expand aliases in the query for all the given remote datasets.
An exception is raised if remotes have conflicting alias definitions.
- arkimet.dataset.http.get_alias_database(url: str) → arki.cfg.Sections ¶
Read the alias database of the server at the given URL
- arkimet.dataset.http.load_cfg_sections(url: str) → arki.cfg.Sections ¶
Read the configuration of the datasets at the given URL
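The functions above can be combined to discover what a remote server publishes. A minimal sketch, not taken from the official documentation; the URL is a placeholder and a running arkimet server is needed to actually call this:

```python
def load_remote_datasets(url):
    """Read dataset configuration and matcher aliases from an arkimet server."""
    import arkimet.dataset.http as http

    # Configuration of all datasets published at the given URL
    sections = http.load_cfg_sections(url)
    # Alias database of the same server
    aliases = http.get_alias_database(url)
    return sections, aliases
```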
Module contents¶
- class arkimet.dataset.Checker¶
Check functions for an arkimet dataset.
TODO: document
Examples:
TODO: add examples
- check(reporter: Any = None, segment_filter: Union[arkimet.Matcher, str] = "", offline: bool = True, online: bool = True, readonly: bool = True, accurate: bool = False)¶
Perform checking/fixing on the dataset
- repack(reporter: Any = None, segment_filter: Union[arkimet.Matcher, str] = "", offline: bool = True, online: bool = True, readonly: bool = True, accurate: bool = False)¶
Perform repacking on the dataset
- segment_state(reporter: Any = None, segment_filter: Union[arkimet.Matcher, str] = "", offline: bool = True, online: bool = True, readonly: bool = True, accurate: bool = False, time_override: int = None) → Dict[str, str] ¶
Compute the state of each segment in the archive
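A hedged sketch of how a Checker might be used from a Session; `config` is assumed to hold a valid dataset configuration, and running this needs an arkimet installation with a dataset on disk:

```python
def check_dataset(config, fix_problems=False):
    """Run a check pass on a dataset and return the state of its segments."""
    import arkimet

    with arkimet.dataset.Session() as session:
        checker = session.dataset_checker(cfg=config)
        # readonly=True only reports problems; readonly=False lets the
        # checker fix what it can
        checker.check(readonly=not fix_problems)
        # Map of segment name to its computed state string
        return checker.segment_state()
```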
- class arkimet.dataset.Dataset¶
A dataset in arkimet. It provides information about the dataset configuration, and allows creating readers, writers, and checkers to work with the dataset.
You can avoid the intermediate step of accessing Dataset objects by calling arkimet.dataset.Session.dataset_reader(), arkimet.dataset.Session.dataset_writer(), and arkimet.dataset.Session.dataset_checker() directly on arkimet.dataset.Session.
For the cases where it is useful to instantiate a Dataset and pass it around, this class is available, matching the C++ API.
Examples:
with session.dataset("dsname") as dataset:
    print(dataset.name)
    with dataset.reader() as reader:
        return reader.query_data()
- checker() → arkimet.dataset.Checker ¶
return a checker for this dataset
- config¶
dataset configuration, as an arkimet.cfg.Section object
- name¶
dataset name
- reader() → arkimet.dataset.Reader ¶
return a reader for this dataset
- writer() → arkimet.dataset.Writer ¶
return a writer for this dataset
- exception arkimet.dataset.ImportDuplicateError¶
The item to import already exists on the dataset
- exception arkimet.dataset.ImportError¶
Base class for dataset import errors
- exception arkimet.dataset.ImportFailedError¶
The import process failed on this metadata
- class arkimet.dataset.Reader¶
Read functions for an arkimet dataset.
TODO: document
Examples:
TODO: add examples
- query_bytes(matcher: Union[arki.Matcher, str] = None, with_data: bool = False, sort: str = None, data_start_hook: Callable[[], None] = None, postprocess: str = None, metadata_report: str = None, summary_report: str = None, file: Union[int, BinaryIO] = None, progress=None) → Union[None, bytes] ¶
query a dataset, piping results to a file
- Parameters
matcher – the matcher string to filter data to return.
with_data – if True, also load data together with the metadata.
sort – string with the desired sort order of results.
data_start_hook – function called before sending the data to the file
postprocess – name of a postprocessor to use to filter data server-side
metadata_report – name of the server-side report function to run on results metadata
summary_report – name of the server-side report function to run on results summary
file – the output file. The file can be a file-like object, or an integer file or socket handle. If missing, data is returned in a bytes object
progress – an object with three methods, start(expected_count: int = 0, expected_bytes: int = 0), update(count: int, bytes: int), and done(total_count: int, total_bytes: int), called to report progress updates.
- query_data(matcher: Union[arki.Matcher, str] = None, with_data: bool = False, sort: str = None, on_metadata: Callable[[arki.Metadata], Optional[bool]] = None, progress=None) → Union[None, List[arki.Metadata]] ¶
query a dataset, processing the resulting metadata one by one
- Parameters
matcher – the matcher string to filter data to return.
with_data – if True, also load data together with the metadata.
sort – string with the desired sort order of results.
on_metadata – a function called on each metadata, with the Metadata object as its only argument. Return None or True to continue processing results, False to stop.
progress – an object with three methods, start(expected_count: int = 0, expected_bytes: int = 0), update(count: int, bytes: int), and done(total_count: int, total_bytes: int), called to report progress updates.
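The progress argument needs no arkimet types: any object exposing the three methods works. A minimal sketch that accumulates totals; whether update() receives increments or running totals is an assumption here (increments are assumed):

```python
class QueryProgress:
    """Minimal progress observer suitable as the `progress` argument
    of the Reader query functions."""

    def __init__(self):
        self.expected_count = 0
        self.expected_bytes = 0
        self.count = 0
        self.bytes = 0
        self.finished = False

    def start(self, expected_count: int = 0, expected_bytes: int = 0):
        # Called once before the query starts producing results
        self.expected_count = expected_count
        self.expected_bytes = expected_bytes

    def update(self, count: int, bytes: int):
        # Assumed to be called periodically with increments
        self.count += count
        self.bytes += bytes

    def done(self, total_count: int, total_bytes: int):
        # Called once at the end with the final totals
        self.count = total_count
        self.bytes = total_bytes
        self.finished = True
```

An instance would then be passed as `reader.query_data("...", progress=QueryProgress())`.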
- query_summary(matcher: Union[arki.Matcher, str] = None, summary: arkimet.Summary = None, progress=None) → arkimet.Summary ¶
query a dataset, returning an arkimet.Summary with the results
- Parameters
matcher – the matcher string to filter data to return.
summary – if not None, add results to this arkimet.Summary and return it, instead of creating a new one.
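A hedged sketch of the three Reader query methods together; the dataset name and the matcher expression are placeholders, and a configured Session is needed to run this:

```python
def query_examples(session, dsname):
    """Illustrate query_data, query_bytes and query_summary on one reader."""
    import sys

    with session.dataset_reader(name=dsname) as reader:
        # In-memory list of metadata, with data payloads loaded
        mds = reader.query_data("origin:GRIB1", with_data=True)
        # Stream raw results to a file descriptor instead of materializing them
        reader.query_bytes("origin:GRIB1", file=sys.stdout.fileno())
        # Aggregate summary of everything matching
        summary = reader.query_summary("origin:GRIB1")
        return mds, summary
```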
- class arkimet.dataset.Session¶
Shared configuration, aliases, and working data used to work with arkimet datasets.
A Session stores alias information and preloads arkimet.dataset.Dataset objects.
Adding a remote dataset to the Session dataset pool downloads the alias database of its server and merges it into the current one, raising an error in case of inconsistencies.
Datasets in the pool can be referred to by name. It is also possible to create datasets from their configuration, without adding them to the pool.
Session is also used to instantiate matchers using its alias database.
Examples:
# Long version
with arkimet.dataset.Session() as session:
    for section in config.values():
        session.add_dataset(section)
    matcher = session.matcher("level:ground")
    with session.dataset("dsname") as dataset:
        with dataset.reader() as reader:
            result = reader.query_data(matcher)

# Short version
with arkimet.dataset.Session() as session:
    for section in config.values():
        session.add_dataset(section)
    with session.dataset_reader("dsname") as reader:
        result = reader.query_data("level:ground")
- add_dataset(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]])¶
add a dataset to the Session pool
If a dataset with the same name already exists in the pool, it raises an exception.
If the dataset is remote, the aliases used by the remote server will be added to the session alias database. If different servers define some aliases differently, it raises an exception.
- dataset(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Dataset ¶
return a Dataset given its configuration, or its name in the pool
- dataset_checker(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Checker ¶
return a dataset checker given its configuration, or its name in the pool
- dataset_pool_size() → int ¶
return how many datasets are in the dataset pool
- dataset_reader(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Reader ¶
return a dataset reader given its configuration, or its name in the pool
- dataset_writer(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Writer ¶
return a dataset writer given its configuration, or its name in the pool
- datasets() → List[arkimet.dataset.Dataset] ¶
return a list of all datasets in the session pool
- expand_query(query: str) → str ¶
expand aliases in an arkimet query, returning an equivalent query that does not use aliases
- get_alias_database() → arkimet.cfg.Sections ¶
return the matcher alias database for this session
- has_dataset(name: str) → bool ¶
check if the dataset pool has a dataset with the given name
- has_datasets() → bool ¶
return True if the session has datasets in the dataset pool
- load_aliases(aliases: Union[str, arkimet.cfg.Sections])¶
add the given set of aliases to the alias database of this session
- matcher(query: str) → arkimet.Matcher ¶
parse an arkimet matcher expression, using the aliases in this session, and return the resulting Matcher object
- merged() → arkimet.dataset.Dataset ¶
return a merged dataset querying all datasets in this session’s pool
- querymacro(name: str, macro: str) → arkimet.dataset.Dataset ¶
create a QueryMacro dataset querying datasets from this session’s pool
- class arkimet.dataset.SessionTimeOverride¶
A Session with an overridden notion of the current time.
TODO: document
Examples:
TODO: add examples
- class arkimet.dataset.Writer¶
Write functions for an arkimet dataset.
TODO: document
Examples:
TODO: add examples
- acquire(md: arki.Metadata, replace: str = None, drop_cached_data_on_commit: bool = False)¶
Acquire the given metadata item (and related data) in this dataset
After acquiring the data successfully, the data can be retrieved from the dataset. Also, information such as the dataset name and the id of the data in the dataset are added to the Metadata object.
If the import failed, a subclass of arkimet.dataset.ImportError is raised.
- acquire_batch(md: Iterable[arkimet.Metadata], replace: str = None, drop_cached_data_on_commit: bool = False) → Tuple[str] ¶
Acquire the given metadata items (and related data) in this dataset
After acquiring the data successfully, the data can be retrieved from the dataset. Also, information such as the dataset name and the id of the data in the dataset are added to the Metadata objects.
No exception is raised in case of import failures. The function returns a tuple with the same length as the input sequence of metadata; each element is a string representing the outcome of the corresponding import: “ok”, “duplicate”, or “error”.
- flush()¶
Flush pending changes to disk
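A hedged sketch of importing data with a Writer, plus an illustration of the acquire_batch() return shape; the outcome tuple below is hand-written, and running the function needs a configured Session:

```python
from collections import Counter

def import_metadata(session, dsname, mds):
    """Import a sequence of metadata into a dataset from the session pool."""
    writer = session.dataset_writer(name=dsname)
    # One outcome string per input metadata, in the same order
    outcomes = writer.acquire_batch(mds)
    # Persist pending changes
    writer.flush()
    return outcomes

# Illustration of the return shape (hand-written example values):
outcomes = ("ok", "ok", "duplicate", "error")
counts = Counter(outcomes)  # tally outcomes per kind
```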
- arkimet.dataset.read_config(pathname: str) → arki.cfg.Section ¶
Read the configuration of a dataset at the given path or URL
- arkimet.dataset.read_configs(pathname: str) → arki.cfg.Sections ¶
Read the merged dataset configuration at the given path or URL
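A hedged sketch tying these module-level functions to a Session; the path is a placeholder and an arkimet installation is needed to run it:

```python
def open_dataset_from_path(path):
    """Read a local dataset configuration and add it to a fresh Session."""
    import arkimet

    # read_config returns a single arki.cfg.Section for one dataset
    cfg = arkimet.dataset.read_config(path)
    with arkimet.dataset.Session() as session:
        session.add_dataset(cfg)
        return session.dataset_pool_size()
```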