Analyze OPARA

Mine the 4.6M pre-computed experiment results — no experiments needed.

The OGAL benchmark archived at OPARA (DOI:10.25532/OPARA-862) contains all results from the paper arXiv:2506.03817. Reproducing these experiments from scratch requires significant computational resources (multiple CPU-weeks on an HPC cluster with hundreds of parallel SLURM jobs). However, you can skip the expensive computation entirely by downloading the pre-computed results and running only the lightweight evaluation scripts.


Getting Started

1. Get the Data

wget -c -O full_exp_jan.zip \
  "https://opara.zih.tu-dresden.de/bitstreams/38951489-5076-4544-a99b-c20dddfc2c6b/download"
unzip full_exp_jan.zip -d /path/to/results/

2. Setup Environment

git clone https://github.com/jgonsior/olympic-games-of-active-learning.git
cd olympic-games-of-active-learning
conda create --name ogal --file conda-linux-64.lock && conda activate ogal && poetry install
cp .server_access_credentials.cfg.example .server_access_credentials.cfg
# edit .server_access_credentials.cfg → set OUTPUT_PATH and DATASETS_PATH under [LOCAL]
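The two required keys can be set like this (a minimal sketch; the `[LOCAL]` section and the `OUTPUT_PATH`/`DATASETS_PATH` key names come from the example file above, while the paths shown are placeholders to replace with your own):

```ini
[LOCAL]
# Where the extracted OPARA results live (and where evaluation output is written)
OUTPUT_PATH = /path/to/results
# Where the benchmark datasets are stored
DATASETS_PATH = /path/to/datasets
```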

3. Generate Leaderboard

python -m eva_scripts.final_leaderboard --EXP_TITLE full_exp_jan

This produces plots/final_leaderboard/rank_sparse_zero_full_auc_weighted_f1-score.parquet — the main strategy ranking table from the paper (Table 1). The leaderboard is a matrix of strategies × datasets, where each cell contains the strategy's rank on that dataset. Lower average rank = better strategy overall.
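The ranking logic can be illustrated on a toy matrix (synthetic data; the real parquet uses the benchmark's own strategy and dataset names):

```python
import pandas as pd

# Toy leaderboard: rows = strategies, columns = datasets,
# each cell = the strategy's rank on that dataset (synthetic values)
lb = pd.DataFrame(
    {"Iris": [1, 3, 2], "wine": [2, 3, 1], "seeds": [1, 2, 3]},
    index=["margin", "random", "entropy"],
)

# Average rank per strategy across datasets; lower is better
avg_rank = lb.mean(axis=1).sort_values()
print(avg_rank)  # margin first (avg rank ~1.33), random last (~2.67)
```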

4. Load and Explore

import pandas as pd

# Load leaderboard
lb = pd.read_parquet("/path/to/results/full_exp_jan/plots/final_leaderboard/rank_sparse_zero_full_auc_weighted_f1-score.parquet")
# Rows are strategies, columns are datasets, so average rank per strategy is a row-wise mean
print("Top strategies:", lb.mean(axis=1).sort_values().head(5))

# Load completed experiments
done = pd.read_csv("/path/to/results/full_exp_jan/05_done_workload.csv")
print(f"Total experiments: {len(done):,}")
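The workload index can be sliced before touching any per-cycle files. A sketch on synthetic rows that reuse the column names from the Key Columns table (the ID values here are made up):

```python
import pandas as pd

# Synthetic stand-in for 05_done_workload.csv (illustrative values only)
done = pd.DataFrame({
    "EXP_UNIQUE_ID": [101, 102, 103, 104],
    "EXP_STRATEGY": [1, 1, 2, 2],
    "EXP_DATASET": [5, 6, 5, 6],
    "EXP_BATCH_SIZE": [1, 5, 10, 50],
})

# How many completed runs per strategy?
runs_per_strategy = done.groupby("EXP_STRATEGY").size()
print(runs_per_strategy)

# Restrict to one dataset before loading its per-cycle metric files
subset = done[done["EXP_DATASET"] == 5]
print(f"Experiments on dataset 5: {len(subset)}")
```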

Starter Recipe A: Compare Strategies Across Datasets

import pandas as pd
import os

RESULTS_DIR = "/path/to/results"  # Should match OUTPUT_PATH in .server_access_credentials.cfg

# Load time series (auto-generated by evaluation scripts if missing)
ts = pd.read_parquet(f"{RESULTS_DIR}/full_exp_jan/_TS/full_auc_weighted_f1-score.parquet")

# Average performance by strategy
by_strategy = ts.groupby('EXP_STRATEGY')['metric_value'].agg(['mean', 'std'])
print(by_strategy.sort_values('mean', ascending=False).head(10))

# Performance by batch size
by_batch = ts.groupby('EXP_BATCH_SIZE')['metric_value'].mean()
print(by_batch)

If _TS/*.parquet is missing, run any evaluation script (e.g., python -m eva_scripts.final_leaderboard --EXP_TITLE full_exp_jan); these files are generated automatically the first time an evaluation script needs them.


Starter Recipe B: Load Per-Cycle Metrics

# Load per-cycle accuracy for a specific strategy/dataset
accuracy = pd.read_csv(
    f"{RESULTS_DIR}/full_exp_jan/ALIPY_RANDOM/Iris/accuracy.csv.xz",
    compression='xz'
)

# Join with workload to get hyperparameters
done = pd.read_csv(f"{RESULTS_DIR}/full_exp_jan/05_done_workload.csv")
merged = done.merge(accuracy, on="EXP_UNIQUE_ID")
print(merged.head())
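A common next step is turning per-cycle metrics into a learning curve by averaging across experiments. A sketch on synthetic data, assuming one row per experiment and one column per AL cycle (inspect the real accuracy.csv.xz header to confirm its layout before relying on this):

```python
import pandas as pd

# Synthetic per-cycle accuracy: one row per experiment, one column per cycle.
# (This wide layout is an assumption -- check the actual file first.)
accuracy = pd.DataFrame({
    "EXP_UNIQUE_ID": [101, 102, 103],
    "0": [0.50, 0.55, 0.45],
    "1": [0.70, 0.65, 0.60],
    "2": [0.80, 0.85, 0.75],
})

# Mean learning curve across experiments (column-wise mean over cycles)
curve = accuracy.drop(columns="EXP_UNIQUE_ID").mean()
print(curve)  # accuracy rising from 0.50 at cycle 0 to 0.80 at cycle 2
```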

Key Files

| File | What It Contains |
| --- | --- |
| 05_done_workload.csv | Workload index of all 4.6M completed experiments |
| <STRATEGY>/<DATASET>/accuracy.csv.xz | Per-cycle accuracy |
| <STRATEGY>/<DATASET>/weighted_f1-score.csv.xz | Per-cycle weighted F1 score |
| _TS/*.parquet | Aggregated time series (generated) |
| plots/final_leaderboard/*.parquet | Strategy rankings (generated) |

Key Columns
| Column | Description |
| --- | --- |
| EXP_UNIQUE_ID | Primary key |
| EXP_DATASET | Dataset integer ID, assigned sequentially from 1 based on the ordering of datasets in resources/kaggle_datasets.yaml followed by resources/openml_datasets.yaml. See datasets/__init__.py. |
| EXP_STRATEGY | Strategy enum; see AL_STRATEGY in resources/data_types.py (e.g., 1=ALIPY_RANDOM, 2=ALIPY_UNCERTAINTY_LC) |
| EXP_LEARNER_MODEL | Learner model enum; see LEARNER_MODEL in resources/data_types.py (e.g., 1=RF, 5=RBF_SVM, 8=MLP) |
| EXP_BATCH_SIZE | Query batch size: 1, 5, 10, 20, 50, or 100 |
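Enum IDs can be mapped back to readable names for analysis. The mappings below are a hypothetical illustration built only from the examples above; the authoritative values are the AL_STRATEGY and LEARNER_MODEL enums in resources/data_types.py:

```python
import pandas as pd

# Illustrative ID-to-name mappings (hypothetical subset; see resources/data_types.py
# for the real AL_STRATEGY and LEARNER_MODEL enum values)
STRATEGY_NAMES = {1: "ALIPY_RANDOM", 2: "ALIPY_UNCERTAINTY_LC"}
LEARNER_NAMES = {1: "RF", 5: "RBF_SVM", 8: "MLP"}

done = pd.DataFrame({
    "EXP_UNIQUE_ID": [101, 102],
    "EXP_STRATEGY": [1, 2],
    "EXP_LEARNER_MODEL": [1, 5],
})

# Attach human-readable columns alongside the integer IDs
done["strategy_name"] = done["EXP_STRATEGY"].map(STRATEGY_NAMES)
done["learner_name"] = done["EXP_LEARNER_MODEL"].map(LEARNER_NAMES)
print(done[["strategy_name", "learner_name"]])
```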

For data format details and correlation metric definitions, see Reference.


Next Steps

| Goal | Page |
| --- | --- |
| Reproduce paper figures / run experiments | Reproduce & Run |
| Research ideas from the data | Research Ideas |
| Data formats, correlation metrics, terminology | Reference |