Analyze OPARA

Mine the 4.6M pre-computed experiment results — no experiments needed.

The OGAL benchmark archived at OPARA (DOI:10.25532/OPARA-862) contains all results from the paper arXiv:2506.03817. Reproducing these experiments from scratch requires significant computational resources (multiple CPU-weeks on an HPC cluster with hundreds of parallel SLURM jobs). However, you can skip the expensive computation entirely by downloading the pre-computed results and running only the lightweight evaluation scripts.


Getting Started

1. Get the Data

wget -c -O full_exp_jan.zip \
  "https://opara.zih.tu-dresden.de/bitstreams/38951489-5076-4544-a99b-c20dddfc2c6b/download"
unzip full_exp_jan.zip -d /path/to/results/

2. Setup Environment

git clone https://github.com/jgonsior/olympic-games-of-active-learning.git
cd olympic-games-of-active-learning
conda create --name ogal --file conda-linux-64.lock && conda activate ogal && poetry install
cp .server_access_credentials.cfg.example .server_access_credentials.cfg
# edit .server_access_credentials.cfg → set OUTPUT_PATH and DATASETS_PATH under [LOCAL]
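The two required keys can be set like this (a minimal sketch; the `[LOCAL]` section and the `OUTPUT_PATH`/`DATASETS_PATH` key names come from the example file above, while the paths shown are placeholders to replace with your own):

```ini
[LOCAL]
# Where the extracted OPARA results live (and where evaluation output is written)
OUTPUT_PATH = /path/to/results
# Where the benchmark datasets are stored
DATASETS_PATH = /path/to/datasets
```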

3. Generate Leaderboard

python -m eva_scripts.final_leaderboard --EXP_TITLE full_exp_jan

This produces plots/final_leaderboard/rank_sparse_zero_full_auc_weighted_f1-score.parquet — the main strategy ranking table from the paper (Table 1). The leaderboard is a matrix of strategies × datasets, where each cell contains the strategy's rank on that dataset. Lower average rank = better strategy overall.
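The ranking logic can be illustrated on a toy matrix (synthetic data; the real parquet uses the benchmark's own strategy and dataset names):

```python
import pandas as pd

# Toy leaderboard: rows = strategies, columns = datasets,
# each cell = the strategy's rank on that dataset (synthetic values)
lb = pd.DataFrame(
    {"Iris": [1, 3, 2], "wine": [2, 3, 1], "seeds": [1, 2, 3]},
    index=["margin", "random", "entropy"],
)

# Average rank per strategy across datasets; lower is better
avg_rank = lb.mean(axis=1).sort_values()
print(avg_rank)  # margin first (avg rank ~1.33), random last (~2.67)
```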

4. Load and Explore

import pandas as pd

# Load leaderboard
lb = pd.read_parquet("/path/to/results/full_exp_jan/plots/final_leaderboard/rank_sparse_zero_full_auc_weighted_f1-score.parquet")
# Rows are strategies, columns are datasets, so average rank per strategy is a row-wise mean
print("Top strategies:", lb.mean(axis=1).sort_values().head(5))

# Load completed experiments
done = pd.read_csv("/path/to/results/full_exp_jan/05_done_workload.csv")
print(f"Total experiments: {len(done):,}")
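The workload index can be sliced before touching any per-cycle files. A sketch on synthetic rows that reuse the column names from the Key Columns table (the ID values here are made up):

```python
import pandas as pd

# Synthetic stand-in for 05_done_workload.csv (illustrative values only)
done = pd.DataFrame({
    "EXP_UNIQUE_ID": [101, 102, 103, 104],
    "EXP_STRATEGY": [1, 1, 2, 2],
    "EXP_DATASET": [5, 6, 5, 6],
    "EXP_BATCH_SIZE": [1, 5, 10, 50],
})

# How many completed runs per strategy?
runs_per_strategy = done.groupby("EXP_STRATEGY").size()
print(runs_per_strategy)

# Restrict to one dataset before loading its per-cycle metric files
subset = done[done["EXP_DATASET"] == 5]
print(f"Experiments on dataset 5: {len(subset)}")
```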

Starter Recipe A: Compare Strategies Across Datasets

import pandas as pd
import os

RESULTS_DIR = "/path/to/results"  # Should match OUTPUT_PATH in .server_access_credentials.cfg

# Load time series (auto-generated by evaluation scripts if missing)
ts = pd.read_parquet(f"{RESULTS_DIR}/full_exp_jan/_TS/full_auc_weighted_f1-score.parquet")

# Average performance by strategy
by_strategy = ts.groupby('EXP_STRATEGY')['metric_value'].agg(['mean', 'std'])
print(by_strategy.sort_values('mean', ascending=False).head(10))

# Performance by batch size
by_batch = ts.groupby('EXP_BATCH_SIZE')['metric_value'].mean()
print(by_batch)

If _TS/*.parquet is missing, run any evaluation script (e.g., python -m eva_scripts.final_leaderboard --EXP_TITLE full_exp_jan); these files are generated automatically the first time an evaluation script needs them.


Starter Recipe B: Load Per-Cycle Metrics

# Load per-cycle accuracy for a specific strategy/dataset
accuracy = pd.read_csv(
    f"{RESULTS_DIR}/full_exp_jan/ALIPY_RANDOM/Iris/accuracy.csv.xz",
    compression='xz'
)

# Join with workload to get hyperparameters
done = pd.read_csv(f"{RESULTS_DIR}/full_exp_jan/05_done_workload.csv")
merged = done.merge(accuracy, on="EXP_UNIQUE_ID")
print(merged.head())
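A common next step is turning per-cycle metrics into a learning curve by averaging across experiments. A sketch on synthetic data, assuming one row per experiment and one column per AL cycle (inspect the real accuracy.csv.xz header to confirm its layout before relying on this):

```python
import pandas as pd

# Synthetic per-cycle accuracy: one row per experiment, one column per cycle.
# (This wide layout is an assumption -- check the actual file first.)
accuracy = pd.DataFrame({
    "EXP_UNIQUE_ID": [101, 102, 103],
    "0": [0.50, 0.55, 0.45],
    "1": [0.70, 0.65, 0.60],
    "2": [0.80, 0.85, 0.75],
})

# Mean learning curve across experiments (column-wise mean over cycles)
curve = accuracy.drop(columns="EXP_UNIQUE_ID").mean()
print(curve)  # accuracy rising from 0.50 at cycle 0 to 0.80 at cycle 2
```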

Key Files

| File | What It Contains |
| --- | --- |
| 05_done_workload.csv | Workload index of all 4.6M completed experiments |
| <STRATEGY>/<DATASET>/accuracy.csv.xz | Per-cycle accuracy |
| <STRATEGY>/<DATASET>/weighted_f1-score.csv.xz | Per-cycle weighted F1 score |
| _TS/*.parquet | Aggregated time series (generated) |
| plots/final_leaderboard/*.parquet | Strategy rankings (generated) |

Key Columns
| Column | Description |
| --- | --- |
| EXP_UNIQUE_ID | Primary key |
| EXP_DATASET | Dataset integer ID, assigned sequentially from 1 based on the ordering of datasets in resources/kaggle_datasets.yaml followed by resources/openml_datasets.yaml. See datasets/__init__.py. |
| EXP_STRATEGY | Strategy enum; see AL_STRATEGY in resources/data_types.py (e.g., 1=ALIPY_RANDOM, 2=ALIPY_UNCERTAINTY_LC) |
| EXP_LEARNER_MODEL | Learner model enum; see LEARNER_MODEL in resources/data_types.py (e.g., 1=RF, 5=RBF_SVM, 8=MLP) |
| EXP_BATCH_SIZE | Query batch size: 1, 5, 10, 20, 50, or 100 |
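Enum IDs can be mapped back to readable names for analysis. The mappings below are a hypothetical illustration built only from the examples above; the authoritative values are the AL_STRATEGY and LEARNER_MODEL enums in resources/data_types.py:

```python
import pandas as pd

# Illustrative ID-to-name mappings (hypothetical subset; see resources/data_types.py
# for the real AL_STRATEGY and LEARNER_MODEL enum values)
STRATEGY_NAMES = {1: "ALIPY_RANDOM", 2: "ALIPY_UNCERTAINTY_LC"}
LEARNER_NAMES = {1: "RF", 5: "RBF_SVM", 8: "MLP"}

done = pd.DataFrame({
    "EXP_UNIQUE_ID": [101, 102],
    "EXP_STRATEGY": [1, 2],
    "EXP_LEARNER_MODEL": [1, 5],
})

# Attach human-readable columns alongside the integer IDs
done["strategy_name"] = done["EXP_STRATEGY"].map(STRATEGY_NAMES)
done["learner_name"] = done["EXP_LEARNER_MODEL"].map(LEARNER_NAMES)
print(done[["strategy_name", "learner_name"]])
```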

For data format details and correlation metric definitions, see Reference.


Next Steps

| Goal | Page |
| --- | --- |
| Reproduce paper figures / run experiments | Reproduce & Run |
| Research ideas from the data | Research Ideas |
| Data formats, correlation metrics, terminology | Reference |