Home

Overview

SVOE is a low-code framework providing scalable and highly configurable pipelines for streaming and batch feature engineering, predictive model training, real-time inference and backtesting. Built on top of Ray, the framework lets you build and scale your custom pipelines from a multi-core laptop to a cluster of thousands of nodes.

SVOE was originally built to accommodate a typical financial data research workflow (i.e. for Quant Researchers) with specific data models in mind (trades, quotes, order book updates, etc.), which is why some examples come from this domain. The framework itself is domain-agnostic, however, and its components can easily be generalised and used in other fields that rely on real-time, time-series-based data processing and simulation (anomaly detection, sales forecasting, etc.).

diagram

How does it work?

SVOE consists of three main components, each providing a set of tools for a typical Quant/ML engineer workflow:

Featurizer helps define, calculate, store and manage real-time/offline (batch) features. It uses a custom stream processing engine (Ray Actors + ZeroMQ) and a Kappa architecture to calculate offline features using online pipelines
Trainer allows training predictive models in a distributed setting using popular ML libraries (XGBoost, PyTorch)
Backtester is used to validate and test predictive models along with user-defined logic (e.g. trading strategies in the financial domain)
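
The Kappa-architecture idea mentioned for Featurizer (offline features computed by the same pipeline that serves online data) can be illustrated with a minimal plain-Python sketch. This is not SVOE's API: the event shape and the names `rolling_mean`, `replay` and `live_feed` are hypothetical and exist only for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape (an assumption, not an SVOE data model):
# a (timestamp, price) pair.
Event = Tuple[float, float]


def rolling_mean(events: Iterable[Event], window: int) -> Iterator[Tuple[float, float]]:
    """Streaming feature: mean of the last `window` prices, emitted per event."""
    buf = deque(maxlen=window)
    for ts, price in events:
        buf.append(price)
        yield ts, sum(buf) / len(buf)


def replay(stored: List[Event]) -> Iterator[Event]:
    """Offline path: replay stored events in timestamp order as a stream."""
    yield from sorted(stored)


def live_feed() -> Iterator[Event]:
    """Online path stand-in: events arriving one by one in time order."""
    for event in [(1.0, 10.0), (2.0, 11.0), (3.0, 12.0)]:
        yield event


# Historical data may be stored out of order; replay() restores stream order.
history = [(3.0, 12.0), (1.0, 10.0), (2.0, 11.0)]

# The same feature function serves both paths, so there is no separate batch logic.
offline = list(rolling_mean(replay(history), window=2))
online = list(rolling_mean(live_feed(), window=2))
```

Because both paths run the identical function, a feature computed over replayed history matches the value that would have been produced live for the same events.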

You can read more in the docs.

Why use SVOE?

Easy-to-use, standardized and flexible data and computation models for unified batch and stream computations - seamlessly switch between real-time and historical data for feature engineering, ML training and backtesting
Low code, modularity and configurability - define reusable components such as FeatureDefinition, DataSourceDefinition, FeaturizerConfig, TrainerConfig, BacktesterConfig etc. to easily run your experiments
Avoid train-predict inconsistency - Featurizer uses the same feature definition for real-time inference and batch training
No need for external data infra/DWH - Featurizer Storage lets you store and catalog computed features in any object storage while keeping the index in any SQL backend, and provides a Data Exploration API
Ray integration - SVOE runs wherever Ray runs (everywhere!)
MLflow integration - store, retrieve and analyze your ML models with the MLflow API
Cloud / Kubernetes ready - use KubeRay or native Ray on AWS to scale out your workloads in the cloud
Easily integrates with orchestrators (Airflow, Luigi, Prefect) - SVOE provides basic Airflow Operators for each component to easily orchestrate your workflows
Real-time inference without MLOps burden - no need to maintain model containerization pipelines, FastAPI services and model registries. Deploy with a simple Python API or YAML using InferenceLoop
Designed for high-volume, low-granularity data - as an example, when used in the financial domain, unlike existing financial ML frameworks which use only OHLCV as a base data model, SVOE's Featurizer provides flexible tools to use and customize any data source (ticks, trades, book updates, etc.) and build streaming and historical features
Minimized number of external dependencies - SVOE is built using Ray Core primitives and has no heavyweight external dependencies (stream processors, distributed computing engines, storage systems, etc.), which allows for easy deployment and maintenance and minimizes costly data transfers. The only dependency is an SQL database of the user's choice. And it's all Python!
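
The storage layout described in the list above (feature data in object storage, an index in a SQL backend) can be sketched as follows. This is only an illustration under stated assumptions, not SVOE's actual schema or API: a local temporary directory stands in for the object store, in-memory SQLite stands in for the SQL backend, and the table `feature_blocks` and the functions `store_block` and `load_range` are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Stand-in for an object storage bucket (assumption: a local temp directory).
bucket = Path(tempfile.mkdtemp())

# Stand-in for the SQL index (assumption: in-memory SQLite, hypothetical schema).
index = sqlite3.connect(":memory:")
index.execute(
    "CREATE TABLE feature_blocks (feature TEXT, start_ts REAL, end_ts REAL, path TEXT)"
)


def store_block(feature: str, rows: list) -> None:
    """Write one block of [timestamp, value] rows and register it in the index."""
    start_ts, end_ts = rows[0][0], rows[-1][0]
    path = bucket / f"{feature}_{start_ts}_{end_ts}.json"
    path.write_text(json.dumps(rows))
    index.execute(
        "INSERT INTO feature_blocks VALUES (?, ?, ?, ?)",
        (feature, start_ts, end_ts, str(path)),
    )


def load_range(feature: str, start_ts: float, end_ts: float) -> list:
    """Use the index to find overlapping blocks, then read only those objects."""
    cursor = index.execute(
        "SELECT path FROM feature_blocks "
        "WHERE feature = ? AND end_ts >= ? AND start_ts <= ? ORDER BY start_ts",
        (feature, start_ts, end_ts),
    )
    rows = []
    for (path,) in cursor:
        rows.extend(json.loads(Path(path).read_text()))
    # Trim rows that fall outside the requested time range.
    return [row for row in rows if start_ts <= row[0] <= end_ts]


store_block("mid_price", [[1.0, 100.0], [2.0, 101.0]])
store_block("mid_price", [[3.0, 102.0], [4.0, 103.0]])
result = load_range("mid_price", 2.0, 3.0)
```

The point of the split is that range queries touch only the small SQL index first, so the bulk data can live in any cheap object store.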

Please refer to Installation and Quick Start for more details.