Featurizer Real Time Data Recording
To get input data into SVOE, Featurizer provides two main methods: Data Ingest Pipeline and Real Time Data Recording. This section describes the latter.
When Featurizer runs in streaming mode, users can specify which features and data sources to store.
Config and CLI
WIP
Block Writer
BlockWriter temporarily stores all real-time events in memory and periodically dumps them to storage. It expects a Compactor to determine the event grouping logic.
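The buffer-then-dump behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual BlockWriter implementation: the class name BufferingWriter, the split_fn and store_fn callables, and the dict-based Event stand-in are all hypothetical, and the sketch assumes that split indexes mark the positions where a new block begins. The periodic trigger is left to the caller.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a real-time event; not the Featurizer Event type.
Event = Dict[str, Any]


class BufferingWriter:
    """Illustrative only: keeps events in memory and dumps them in blocks."""

    def __init__(
        self,
        split_fn: Callable[[List[Event]], List[int]],
        store_fn: Callable[[List[Event]], None],
    ):
        # split_fn plays the role of a compactor: it returns the indexes at
        # which a new block starts (an assumption made for this sketch).
        self._split_fn = split_fn
        # store_fn stands in for whatever persists a block to storage.
        self._store_fn = store_fn
        self._buffer: List[Event] = []

    def append(self, event: Event) -> None:
        self._buffer.append(event)

    def flush(self) -> int:
        """Dump buffered events as blocks; return the number of blocks stored."""
        events, self._buffer = self._buffer, []
        if not events:
            return 0
        bounds = [0] + list(self._split_fn(events)) + [len(events)]
        blocks = [events[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
        for block in blocks:
            self._store_fn(block)
        return len(blocks)


stored: List[List[Event]] = []
writer = BufferingWriter(split_fn=lambda events: [2], store_fn=stored.append)
for i in range(3):
    writer.append({"value": i})
blocks_written = writer.flush()
```

In this sketch the caller decides when to call flush(), for example from a timer, which corresponds to the periodic dump mentioned above.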
Compactors
Classes derived from Compactor contain the logic for splitting in-memory events into blocks, which are later written to block storage. Users are expected to override compaction_split_indexes, which defines how in-memory events are grouped into blocks:
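from typing import List

class Compactor:
    def __init__(self, config):
        self.config = config

    def compaction_split_indexes(self, feature: Feature, events: List[Event]) -> List[int]:
        # Subclasses override this to define how in-memory events are grouped into blocks.
        raise NotImplementedError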
The default is MemoryBasedCompactor. It groups events into blocks of the same in-memory size, which is provided by the user.
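To make the idea of size-based grouping concrete, here is a minimal sketch. It is not the MemoryBasedCompactor source: the function names, the max_block_bytes parameter, the dict-based Event stand-in, and the use of sys.getsizeof as a rough size estimate are all assumptions, as is the convention that each returned index marks where a new block begins.

```python
import sys
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a real-time event; not the Featurizer Event type.
Event = Dict[str, Any]


def size_based_split_indexes(
    events: List[Event],
    max_block_bytes: int,
    size_of: Callable[[Event], int] = sys.getsizeof,
) -> List[int]:
    """Return indexes at which a new block starts (assumed convention)."""
    split_indexes: List[int] = []
    current_bytes = 0
    for i, event in enumerate(events):
        event_bytes = size_of(event)
        # Start a new block when adding this event would exceed the budget,
        # unless the current block is still empty.
        if current_bytes > 0 and current_bytes + event_bytes > max_block_bytes:
            split_indexes.append(i)
            current_bytes = 0
        current_bytes += event_bytes
    return split_indexes


def split_into_blocks(events: List[Event], split_indexes: List[int]) -> List[List[Event]]:
    """Cut the event list at the given indexes, skipping empty blocks."""
    bounds = [0] + list(split_indexes) + [len(events)]
    return [events[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


events = [{"value": i} for i in range(5)]
# A fixed per-event size keeps the example deterministic.
indexes = size_based_split_indexes(events, max_block_bytes=25, size_of=lambda e: 10)
blocks = split_into_blocks(events, indexes)
```

A greedy pass like this yields blocks that stay under the size budget whenever each single event fits; an event larger than the budget ends up alone in its own block.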