Featurizer Real Time Data Recording

Featurizer provides two main ways to get input data into SVOE: Data Ingest Pipeline and Real Time Data Recording. This section describes the latter.

When Featurizer runs in streaming mode, users can specify which features and data sources they want to store.

Config and CLI

WIP

Block Writer

BlockWriter temporarily stores all real-time events in memory and periodically dumps them to storage. It relies on a Compactor to determine how the events are grouped into blocks.
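
The buffer-then-dump behavior described above can be illustrated with a minimal sketch. This is not the actual BlockWriter: the class name BufferingWriterSketch and the split_fn and store_fn callbacks are hypothetical stand-ins for the Compactor and the block storage, and the sketch assumes each split index marks the start of a new block. Scheduling of the periodic flush is left to the caller.

```python
from typing import Any, Callable, Dict, List

# Hypothetical callback types standing in for a Compactor and for block storage.
SplitFn = Callable[[str, List[Any]], List[int]]
StoreFn = Callable[[str, List[Any]], None]


class BufferingWriterSketch:
    """Illustrative only: buffers events per feature and writes them out as blocks."""

    def __init__(self, split_fn: SplitFn, store_fn: StoreFn):
        self.split_fn = split_fn
        self.store_fn = store_fn
        self.buffers: Dict[str, List[Any]] = {}

    def append(self, feature_key: str, event: Any) -> None:
        # Keep the event in memory until the next flush.
        self.buffers.setdefault(feature_key, []).append(event)

    def flush(self) -> int:
        # Cut each buffer at the split indexes and hand every block to storage.
        blocks_written = 0
        for feature_key, events in self.buffers.items():
            if not events:
                continue
            # Assumption: each split index is the start of a new block.
            bounds = [0] + self.split_fn(feature_key, events) + [len(events)]
            for start, end in zip(bounds, bounds[1:]):
                if start < end:
                    self.store_fn(feature_key, events[start:end])
                    blocks_written += 1
        self.buffers.clear()
        return blocks_written


# Usage: split every two events and collect the "stored" blocks in a list.
stored = []
writer = BufferingWriterSketch(
    split_fn=lambda key, events: list(range(2, len(events), 2)),
    store_fn=lambda key, block: stored.append((key, block)),
)
for i in range(5):
    writer.append("trades", {"seq": i})
blocks_written = writer.flush()
```

Passing the grouping logic in as a callback mirrors the division of work in the text: the writer only buffers and dumps, while the decision of where blocks begin and end lives elsewhere.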

Compactors

Classes derived from Compactor contain the logic for splitting in-memory events into blocks, which are later put in block storage. Subclasses are expected to override compaction_split_indexes, which defines how in-memory events are grouped into blocks.

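from typing import List

# Feature and Event are Featurizer's own types.


class Compactor:
    def __init__(self, config):
        self.config = config

    def compaction_split_indexes(self, feature: Feature, events: List[Event]) -> List[int]:
        # Subclasses return the indexes at which the in-memory events are split into blocks.
        raise NotImplementedError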

The default is MemoryBasedCompactor. It groups events into blocks of the same in-memory size, which is provided by the user.
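
As a rough illustration of size-based grouping, the sketch below shows one way a compaction_split_indexes override could work. It is not the actual MemoryBasedCompactor: the class SizeBasedCompactorSketch, the max_block_bytes and size_of parameters, the Event stand-in type, and the helper split_into_blocks are all hypothetical. It also assumes that each returned index marks the position where a new block starts, which the source does not state.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]  # stand-in for Featurizer's Event type (assumption)


class SizeBasedCompactorSketch:
    """Illustrative only: starts a new block whenever the byte budget would be exceeded."""

    def __init__(self, max_block_bytes: int, size_of: Callable[[Event], int]):
        self.max_block_bytes = max_block_bytes  # hypothetical user-provided block size
        self.size_of = size_of  # hypothetical per-event size estimator

    def compaction_split_indexes(self, feature: Any, events: List[Event]) -> List[int]:
        split_indexes: List[int] = []
        current_bytes = 0
        for i, event in enumerate(events):
            event_bytes = self.size_of(event)
            # Start a new block before this event if adding it would exceed the budget.
            if current_bytes > 0 and current_bytes + event_bytes > self.max_block_bytes:
                split_indexes.append(i)
                current_bytes = 0
            current_bytes += event_bytes
        return split_indexes


def split_into_blocks(events: List[Event], split_indexes: List[int]) -> List[List[Event]]:
    # Assumption: each split index is the start of a new block.
    if not events:
        return []
    bounds = [0] + split_indexes + [len(events)]
    return [events[start:end] for start, end in zip(bounds, bounds[1:])]


# Usage: ten events of 40 bytes each with a 100-byte budget give blocks of two events.
events = [{"seq": i} for i in range(10)]
compactor = SizeBasedCompactorSketch(max_block_bytes=100, size_of=lambda event: 40)
indexes = compactor.compaction_split_indexes(feature=None, events=events)
blocks = split_into_blocks(events, indexes)
```

The size estimator is injected rather than hard-coded because measuring the true in-memory footprint of nested Python objects is not straightforward; a real implementation may measure size differently.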