Quick start

This example uses a scenario common in financial market simulation, but the framework is not limited to financial data and works with any scenario the user provides. The three steps below build a simple mid-price prediction model based on past price and volatility.

Run Featurizer to construct mid-price and volatility features from partial order book updates at 1-second granularity, using the mid-price 5 seconds ahead as the prediction target (label).

Define featurizer-config.yaml

start_date: '2023-02-01 10:00:00'
end_date: '2023-02-01 11:00:00'
label_feature_index: 0
label_lookahead: '5s'
features_to_store: [0, 1]
feature_configs:
  - feature_definition: price.mid_price_fd.MidPriceFD
    name: mid_price
    params:
      data_source: &id001
        - exchange: BINANCE
          instrument_type: spot
          symbol: BTC-USDT
      feature:
        sampling: 1s
  - feature_definition: volatility.volatility_stddev_fd.VolatilityStddevFD
    params:
      data_source: *id001
      feature:
        sampling: 1s

See MidPriceFD and VolatilityStddevFD for implementation details
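To make the three columns this config produces more concrete, here is a minimal stdlib-only sketch. It is not the Featurizer implementation: the helper names, the volatility window, and the sample quotes are hypothetical, and it assumes that the label is simply the label feature's value taken `label_lookahead` into the future (5 samples ahead at 1s sampling).

```python
from statistics import pstdev
from typing import List, Optional, Tuple


def mid_price(best_bid: float, best_ask: float) -> float:
    """Midpoint between the best bid and the best ask."""
    return (best_bid + best_ask) / 2.0


def rolling_stddev(values: List[float], window: int) -> List[float]:
    """Population standard deviation over a trailing window.

    The window is shorter at the start of the series, so the first value is 0.0.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(pstdev(values[start:i + 1]))
    return out


def lookahead_label(values: List[float], steps: int) -> List[Optional[float]]:
    """Label at index i is the value `steps` samples later, or None if unknown."""
    n = len(values)
    return [values[i + steps] if i + steps < n else None for i in range(n)]


# Hypothetical (bid, ask) quotes sampled once per second
quotes: List[Tuple[float, float]] = [
    (100.0, 102.0), (101.0, 103.0), (102.0, 104.0), (103.0, 105.0),
    (104.0, 106.0), (105.0, 107.0), (106.0, 108.0),
]

mids = [mid_price(bid, ask) for bid, ask in quotes]
vol = rolling_stddev(mids, window=2)      # window size is an assumption
labels = lookahead_label(mids, steps=5)   # '5s' lookahead at 1s sampling

for row in zip(mids, vol, labels):
    print(row)
```

Rows whose label would fall past the end of the data have no target and would normally be dropped before training.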

Run Featurizer

CLI

svoe featurizer run <path_to_config> --ray-address <addr> --parallelism <num-workers>

Python API

Featurizer.run(path=<path_to_config>, ray_address=<addr>, parallelism=<num_workers>)

Once the calculation is finished, load the sampled FeatureLabelSet dataframe to your local client

CLI

svoe featurizer get-data --every-n <every_nth_row>

This produces

      timestamp  receipt_timestamp  label_mid_price-mid_price  mid_price-mid_price  feature_VolatilityStddevFD_62271b09-volatility
0     1.675234e+09       1.675234e+09                  23084.800            23084.435                                        0.000547
1     1.675234e+09       1.675234e+09                  23083.760            23084.355                                        0.040003
2     1.675234e+09       1.675234e+09                  23083.505            23084.635                                        0.117757
3     1.675234e+09       1.675234e+09                  23084.610            23085.020                                        0.257091
4     1.675234e+09       1.675234e+09                  23084.725            23084.800                                        0.242034
...            ...                ...                        ...                  ...                                             ...

We can also visualize the results

CLI

svoe featurizer plot --every-n <every_nth_row>

Once the FeatureLabelSet is calculated and loaded in cluster memory, use Trainer to train an XGBoost model that predicts the mid-price 5 seconds ahead, validate the model, tune hyperparameters, and pick the best model.

Define config

xgboost:
  params:
    tree_method: 'approx'
    objective: 'reg:squarederror'
    eval_metric: [ 'logloss', 'error' ]
  num_boost_rounds: 10
  train_valid_test_split: [0.5, 0.3]
num_workers: 3
tuner_config:
  param_space:
    params:
      max_depth:
        randint:
          lower: 2
          upper: 8
      min_child_weight:
        randint:
          lower: 1
          upper: 10
  num_samples: 8
  metric: 'train-logloss'
  mode: 'min'
max_concurrent_trials: 3
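One way to read `train_valid_test_split: [0.5, 0.3]` is as the train and validation fractions, with the remaining 20% left for testing. The sketch below works under that assumption and also assumes the split is chronological (no shuffling), which is the usual choice for time series; neither is confirmed by the Trainer itself, and `split_by_fractions` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def split_by_fractions(
    rows: Sequence, train_frac: float, valid_frac: float
) -> Tuple[List, List, List]:
    """Split rows in their original order into train, valid and test parts.

    Whatever is left after the train and valid fractions becomes the test set.
    """
    if train_frac < 0 or valid_frac < 0 or train_frac + valid_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    n = len(rows)
    # round() guards against float artifacts such as 2.9999999999999996
    train_end = round(n * train_frac)
    valid_end = train_end + round(n * valid_frac)
    return list(rows[:train_end]), list(rows[train_end:valid_end]), list(rows[valid_end:])


rows = list(range(10))  # stand-in for 10 time-ordered FeatureLabelSet rows
train, valid, test = split_by_fractions(rows, 0.5, 0.3)
print(len(train), len(valid), len(test))
```

Keeping the order intact means the model is always validated and tested on data that comes after the data it was trained on.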

Run Trainer

CLI

svoe trainer run --config-path <config-path> --ray-address <addr>

Python API

config = TrainerConfig.load_config(config_path)
trainer_manager = TrainerManager(config=config, ray_address=ray_address)
trainer_manager.run(trainer_run_id='sample-run-id', tags={})

Visualize predictions

CLI

svoe trainer predictions --model-uri <model-uri>

Select best model

CLI

svoe trainer best-model --metric-name valid-logloss --mode min

Python API

best_model_uri = mlflow_client.get_best_checkpoint_uri(metric_name=metric_name, experiment_name=experiment_name, mode=mode)
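Conceptually, tuning followed by best-model selection is a search over sampled hyperparameters that keeps the trial with the lowest (or highest) metric. The stdlib sketch below mirrors the `param_space`, `num_samples` and `mode` values from the trainer config. It is only an illustration: `stand_in_metric` is a hypothetical placeholder for training and scoring a model, and the upper bounds are treated as exclusive, as in Ray Tune's `tune.randint`.

```python
import random
from typing import Dict, List, Tuple

# Mirrors tuner_config.param_space: (lower inclusive, upper exclusive)
PARAM_SPACE = {"max_depth": (2, 8), "min_child_weight": (1, 10)}

Trial = Tuple[Dict[str, int], float]


def sample_params(rng: random.Random) -> Dict[str, int]:
    """Draw one integer per hyperparameter, uniformly within its bounds."""
    return {name: rng.randrange(lo, hi) for name, (lo, hi) in PARAM_SPACE.items()}


def stand_in_metric(params: Dict[str, int]) -> float:
    """Hypothetical placeholder; a real trial would train a model and score it."""
    return abs(params["max_depth"] - 5) + 0.1 * abs(params["min_child_weight"] - 3)


def random_search(num_samples: int, mode: str, seed: int = 0) -> Tuple[Trial, List[Trial]]:
    """Evaluate num_samples random trials and return (best trial, all trials)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        params = sample_params(rng)
        trials.append((params, stand_in_metric(params)))
    pick = min if mode == "min" else max
    return pick(trials, key=lambda trial: trial[1]), trials


best, trials = random_search(num_samples=8, mode="min")
print("best params:", best[0], "metric:", best[1])
```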

In this example, Backtester is used in the context of financial markets, so the user-defined logic is expressed as a trading strategy. The same approach extends to any other scenario the user wants to emulate. Once we have the best model, we can plug it into a class derived from BaseStrategy and run Backtester.

Define config

featurizer_config_path: featurizer-config.yaml
inference_config:
  model_uri: <your-best-model-uri>
  predictor_class_name: 'XGBoostPredictor'
  num_replicas: <number-of-predictor-replicas> 
simulation_class_name: 'backtester.strategy.ml_strategy.MLStrategy'
simulation_params:
  buy_delta: 0
  sell_delta: 0
user_defined_params:
  portfolio_config: <portfolio_config>
  tradable_instruments_params:
    - exchange: 'BINANCE'
      instrument_type: 'spot'
      symbol: 'BTC-USDT'

See MLStrategy for example implementation
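As a rough illustration of how a prediction could drive a strategy, the sketch below turns a predicted mid-price into a trade signal. This is a guess at what `buy_delta` and `sell_delta` might mean (minimum expected move required before acting), not the actual MLStrategy logic; `decide` is a hypothetical helper. The sample prices come from the first two rows of the FeatureLabelSet output above, with the label standing in for a model prediction.

```python
from typing import Optional


def decide(
    current_mid: float,
    predicted_mid: float,
    buy_delta: float = 0.0,
    sell_delta: float = 0.0,
) -> Optional[str]:
    """Return 'BUY', 'SELL' or None based on the expected price move.

    Assumed semantics: buy when the predicted rise exceeds buy_delta,
    sell when the predicted fall exceeds sell_delta, otherwise do nothing.
    """
    expected_move = predicted_mid - current_mid
    if expected_move > buy_delta:
        return "BUY"
    if -expected_move > sell_delta:
        return "SELL"
    return None


# (current mid-price, mid-price 5 seconds ahead) from the sample output
print(decide(23084.435, 23084.800))  # price expected to rise
print(decide(23084.355, 23083.760))  # price expected to fall
```

With both deltas set to 0, as in the config, any predicted move in either direction triggers a signal; larger deltas would filter out small predicted moves.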

Run Backtester

CLI

svoe backtester run --config-path <config-path> --ray-address <addr> --num-workers <num-workers>

Python API

config = BacktesterConfig.load_config(config_path)
backtester = Backtester.from_config(config)
backtester.run_remotely(ray_address=ray_address, num_workers=num_workers)

This will run a distributed event-driven backtest using features and models defined earlier

Get statistics with

Python API

stats = backtester.get_stats()