arkimet.dataset package¶
Submodules¶
arkimet.dataset.http module¶
- arkimet.dataset.http.expand_remote_query(remotes: arkimet.cfg.Sections, query: str) → str ¶
Expand aliases in the query for all the given remote datasets.
An exception is raised if remotes have conflicting alias definitions.
- arkimet.dataset.http.get_alias_database(url: str) → arki.cfg.Sections ¶
Read the alias database of the server at the given URL
- arkimet.dataset.http.load_cfg_sections(url: str) → arki.cfg.Sections ¶
Read the configuration of the datasets at the given URL
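The functions above can be combined to discover what a remote server publishes. A minimal sketch, not taken from the official documentation; the URL is a placeholder and a running arkimet server is needed to actually call this:

```python
def load_remote_datasets(url):
    """Read dataset configuration and matcher aliases from an arkimet server."""
    import arkimet.dataset.http as http

    # Configuration of all datasets published at the given URL
    sections = http.load_cfg_sections(url)
    # Alias database of the same server
    aliases = http.get_alias_database(url)
    return sections, aliases
```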
Module contents¶
- class arkimet.dataset.Checker¶
Check functions for an arkimet dataset.
TODO: document
Examples:
TODO: add examples
- check(reporter: Any = None, segment_filter: Union[arkimet.Matcher, str] = "", offline: bool = True, online: bool = True, readonly: bool = True, accurate: bool = False)¶
Perform checking/fixing on the dataset
- repack(reporter: Any = None, segment_filter: Union[arkimet.Matcher, str] = "", offline: bool = True, online: bool = True, readonly: bool = True, accurate: bool = False)¶
Perform repacking on the dataset
- segment_state(reporter: Any = None, segment_filter: Union[arkimet.Matcher, str] = "", offline: bool = True, online: bool = True, readonly: bool = True, accurate: bool = False, time_override: int = None) → Dict[str, str] ¶
Compute the state of each segment in the archive
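A hedged sketch of how a Checker might be used from a Session; `config` is assumed to hold a valid dataset configuration, and running this needs an arkimet installation with a dataset on disk:

```python
def check_dataset(config, fix_problems=False):
    """Run a check pass on a dataset and return the state of its segments."""
    import arkimet

    with arkimet.dataset.Session() as session:
        checker = session.dataset_checker(cfg=config)
        # readonly=True only reports problems; readonly=False lets the
        # checker fix what it can
        checker.check(readonly=not fix_problems)
        # Map of segment name to its computed state string
        return checker.segment_state()
```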
- class arkimet.dataset.Dataset¶
A dataset in arkimet. It provides information about the dataset configuration, and allows creating readers, writers, and checkers to work with the dataset.
You can avoid the intermediate step of accessing Dataset objects by calling arkimet.dataset.Session.dataset_reader(), arkimet.dataset.Session.dataset_writer(), and arkimet.dataset.Session.dataset_checker() directly on arkimet.dataset.Session.
For the cases where it is useful to instantiate a Dataset and pass it around, this class is available, matching the C++ API.
Examples:
with session.dataset("dsname") as dataset:
    print(dataset.name)
    with dataset.reader() as reader:
        return reader.query_data()
- checker() → arkimet.dataset.Checker ¶
return a checker for this dataset
- config¶
dataset configuration, as an arkimet.cfg.Section object
- name¶
dataset name
- reader() → arkimet.dataset.Reader ¶
return a reader for this dataset
- writer() → arkimet.dataset.Writer ¶
return a writer for this dataset
- exception arkimet.dataset.ImportDuplicateError¶
The item to import already exists on the dataset
- exception arkimet.dataset.ImportError¶
Base class for dataset import errors
- exception arkimet.dataset.ImportFailedError¶
The import process failed on this metadata
- class arkimet.dataset.Reader¶
Read functions for an arkimet dataset.
TODO: document
Examples:
TODO: add examples
- query_bytes(matcher: Union[arki.Matcher, str] = None, with_data: bool = False, sort: str = None, data_start_hook: Callable[[], None] = None, postprocess: str = None, metadata_report: str = None, summary_report: str = None, file: Union[int, BinaryIO] = None, progress=None) → Union[None, bytes] ¶
query a dataset, piping results to a file
- Parameters
matcher – the matcher string to filter data to return.
with_data – if True, also load data together with the metadata.
sort – string with the desired sort order of results.
data_start_hook – function called before sending the data to the file
postprocess – name of a postprocessor to use to filter data server-side
metadata_report – name of the server-side report function to run on results metadata
summary_report – name of the server-side report function to run on results summary
file – the output file. The file can be a file-like object, or an integer file or socket handle. If missing, data is returned in a bytes object
progress – an object with three methods, start(expected_count: int = 0, expected_bytes: int = 0), update(count: int, bytes: int), and done(total_count: int, total_bytes: int), called to report progress updates.
- query_data(matcher: Union[arki.Matcher, str] = None, with_data: bool = False, sort: str = None, on_metadata: Callable[[arki.Metadata], Optional[bool]] = None, progress=None) → Union[None, List[arki.Metadata]] ¶
query a dataset, processing the resulting metadata one by one
- Parameters
matcher – the matcher string to filter data to return.
with_data – if True, also load data together with the metadata.
sort – string with the desired sort order of results.
on_metadata – a function called on each metadata, with the Metadata object as its only argument. Return None or True to continue processing results, False to stop.
progress – an object with three methods, start(expected_count: int = 0, expected_bytes: int = 0), update(count: int, bytes: int), and done(total_count: int, total_bytes: int), called to report progress updates.
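The progress argument needs no arkimet types: any object exposing the three methods works. A minimal sketch that accumulates totals; whether update() receives increments or running totals is an assumption here (increments are assumed):

```python
class QueryProgress:
    """Minimal progress observer suitable as the `progress` argument
    of the Reader query functions."""

    def __init__(self):
        self.expected_count = 0
        self.expected_bytes = 0
        self.count = 0
        self.bytes = 0
        self.finished = False

    def start(self, expected_count: int = 0, expected_bytes: int = 0):
        # Called once before the query starts producing results
        self.expected_count = expected_count
        self.expected_bytes = expected_bytes

    def update(self, count: int, bytes: int):
        # Assumed to be called periodically with increments
        self.count += count
        self.bytes += bytes

    def done(self, total_count: int, total_bytes: int):
        # Called once at the end with the final totals
        self.count = total_count
        self.bytes = total_bytes
        self.finished = True
```

An instance would then be passed as `reader.query_data("...", progress=QueryProgress())`.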
- query_summary(matcher: Union[arki.Matcher, str] = None, summary: arkimet.Summary = None, progress=None) → arkimet.Summary ¶
query a dataset, returning an arkimet.Summary with the results
- Parameters
matcher – the matcher string to filter data to return.
summary – if not None, add results to this arkimet.Summary and return it, instead of creating a new one.
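A hedged sketch of the three Reader query methods together; the dataset name and the matcher expression are placeholders, and a configured Session is needed to run this:

```python
def query_examples(session, dsname):
    """Illustrate query_data, query_bytes and query_summary on one reader."""
    import sys

    with session.dataset_reader(name=dsname) as reader:
        # In-memory list of metadata, with data payloads loaded
        mds = reader.query_data("origin:GRIB1", with_data=True)
        # Stream raw results to a file descriptor instead of materializing them
        reader.query_bytes("origin:GRIB1", file=sys.stdout.fileno())
        # Aggregate summary of everything matching
        summary = reader.query_summary("origin:GRIB1")
        return mds, summary
```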
- class arkimet.dataset.Session¶
Shared configuration, aliases, and working data used to work with arkimet datasets.
A Session stores alias information and preloads arkimet.dataset.Dataset objects.
Adding a remote dataset to the Session dataset pool downloads the alias database of its server and merges it into the current one, raising an error in case of inconsistencies.
Datasets in the pool can be referred to by name. It is also possible to create datasets from their configuration, without adding them to the pool.
Session is also used to instantiate matchers using its alias database.
Examples:
# Long version
with arkimet.dataset.Session() as session:
    for section in config.values():
        session.add_dataset(section)
    matcher = session.matcher("level:ground")
    with session.dataset("dsname") as dataset:
        with dataset.reader() as reader:
            result = reader.query_data(matcher)

# Short version
with arkimet.dataset.Session() as session:
    for section in config.values():
        session.add_dataset(section)
    with session.dataset_reader("dsname") as reader:
        result = reader.query_data("level:ground")
- add_dataset(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]])¶
add a dataset to the Session pool
If a dataset with the same name already exists in the pool, it raises an exception.
If the dataset is remote, the aliases used by the remote server will be added to the session alias database. If different servers define some aliases differently, it raises an exception.
- dataset(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Dataset ¶
return a Dataset given its configuration, or its name in the pool
- dataset_checker(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Checker ¶
return a dataset checker given its configuration, or its name in the pool
- dataset_pool_size() → int ¶
return how many datasets are in the dataset pool
- dataset_reader(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Reader ¶
return a dataset reader given its configuration, or its name in the pool
- dataset_writer(cfg: Union[str, arkimet.cfg.Section, Dict[str, str]] = None, name: str = None) → arkimet.dataset.Writer ¶
return a dataset writer given its configuration, or its name in the pool
- datasets() → List[arkimet.dataset.Dataset] ¶
return a list of all datasets in the session pool
- expand_query(query: str) → str ¶
expand aliases in an arkimet query, returning an equivalent query that does not use aliases
- get_alias_database() → arkimet.cfg.Sections ¶
return the matcher alias database for this session
- has_dataset(name: str) → bool ¶
check if the dataset pool has a dataset with the given name
- has_datasets() → bool ¶
return True if the session has datasets in the dataset pool
- load_aliases(aliases: Union[str, arkimet.cfg.Sections])¶
add the given set of aliases to the alias database of this session
- matcher(query: str) → arkimet.Matcher ¶
parse an arkimet matcher expression, using the aliases in this session, and return the resulting Matcher object
- merged() → arkimet.dataset.Dataset ¶
return a merged dataset querying all datasets in this session’s pool
- querymacro(name: str, macro: str) → arkimet.dataset.Dataset ¶
create a QueryMacro dataset querying datasets from this session’s pool
- class arkimet.dataset.SessionTimeOverride¶
A Session with an overridden notion of the current time.
TODO: document
Examples:
TODO: add examples
- class arkimet.dataset.Writer¶
Write functions for an arkimet dataset.
TODO: document
Examples:
TODO: add examples
- acquire(md: arki.Metadata, replace: str = None, drop_cached_data_on_commit: bool = False)¶
Acquire the given metadata item (and related data) in this dataset
After acquiring the data successfully, the data can be retrieved from the dataset. Also, information such as the dataset name and the id of the data in the dataset are added to the Metadata object.
If the import failed, a subclass of arkimet.dataset.ImportError is raised.
- acquire_batch(md: Iterable[arkimet.Metadata], replace: str = None, drop_cached_data_on_commit: bool = False) → Tuple[str] ¶
Acquire the given metadata items (and related data) in this dataset
After acquiring the data successfully, the data can be retrieved from the dataset. Also, information such as the dataset name and the id of the data in the dataset are added to the Metadata objects.
No exception is raised in case of import failures. The function returns a tuple with the same length as the input sequence of metadata; each element is a string representing the outcome of the corresponding import: “ok”, “duplicate”, or “error”.
- flush()¶
Flush pending changes to disk
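A hedged sketch of importing data with a Writer, plus an illustration of the acquire_batch() return shape; the outcome tuple below is hand-written, and running the function needs a configured Session:

```python
from collections import Counter

def import_metadata(session, dsname, mds):
    """Import a sequence of metadata into a dataset from the session pool."""
    writer = session.dataset_writer(name=dsname)
    # One outcome string per input metadata, in the same order
    outcomes = writer.acquire_batch(mds)
    # Persist pending changes
    writer.flush()
    return outcomes

# Illustration of the return shape (hand-written example values):
outcomes = ("ok", "ok", "duplicate", "error")
counts = Counter(outcomes)  # tally outcomes per kind
```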
- arkimet.dataset.read_config(pathname: str) → arki.cfg.Section ¶
Read the configuration of a dataset at the given path or URL
- arkimet.dataset.read_configs(pathname: str) → arki.cfg.Sections ¶
Read the merged dataset configuration at the given path or URL
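A hedged sketch tying these module-level functions to a Session; the path is a placeholder and an arkimet installation is needed to run it:

```python
def open_dataset_from_path(path):
    """Read a local dataset configuration and add it to a fresh Session."""
    import arkimet

    # read_config returns a single arki.cfg.Section for one dataset
    cfg = arkimet.dataset.read_config(path)
    with arkimet.dataset.Session() as session:
        session.add_dataset(cfg)
        return session.dataset_pool_size()
```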