Analyze OPARA¶
Mine the 4.6M pre-computed experiment results — no experiments needed.
The OGAL benchmark archived at OPARA (DOI:10.25532/OPARA-862) contains all results from the paper arXiv:2506.03817. Reproducing these experiments from scratch requires significant computational resources (multiple CPU-weeks on an HPC cluster with hundreds of parallel SLURM jobs). However, you can skip the expensive computation entirely by downloading the pre-computed results and running only the lightweight evaluation scripts.
Getting Started¶
1. Get the Data¶
wget -c -O full_exp_jan.zip \
"https://opara.zih.tu-dresden.de/bitstreams/38951489-5076-4544-a99b-c20dddfc2c6b/download"
unzip full_exp_jan.zip -d /path/to/results/
2. Setup Environment¶
git clone https://github.com/jgonsior/olympic-games-of-active-learning.git
cd olympic-games-of-active-learning
conda create --name ogal --file conda-linux-64.lock && conda activate ogal && poetry install
cp .server_access_credentials.cfg.example .server_access_credentials.cfg
# edit .server_access_credentials.cfg → set OUTPUT_PATH and DATASETS_PATH under [LOCAL]
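The shipped `.server_access_credentials.cfg.example` is the authoritative template; a minimal sketch of what the `[LOCAL]` section might look like (the paths here are placeholders, not values from the archive):

```ini
[LOCAL]
OUTPUT_PATH = /path/to/results
DATASETS_PATH = /path/to/datasets
```

`OUTPUT_PATH` should point at the directory where you unzipped the download in step 1.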
3. Generate Leaderboard¶
Run the leaderboard evaluation script:

python -m eva_scripts.final_leaderboard --EXP_TITLE full_exp_jan

This produces plots/final_leaderboard/rank_sparse_zero_full_auc_weighted_f1-score.parquet, the main strategy-ranking table from the paper (Table 1). The leaderboard is a matrix of strategies × datasets: each cell contains the strategy's rank on that dataset, and a lower average rank means a better strategy overall.
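As a toy illustration of how such a rank matrix is aggregated (the strategy and dataset names below are invented, not values from the archive):

```python
import pandas as pd

# Toy rank matrix: rows = datasets, columns = strategies,
# each cell = the strategy's rank on that dataset (1 = best).
ranks = pd.DataFrame(
    {"random": [3, 3, 2], "uncertainty": [1, 2, 1], "coreset": [2, 1, 3]},
    index=["Iris", "wine", "digits"],
)

# Average rank per strategy; lower = better overall.
avg_rank = ranks.mean(axis=0).sort_values()
print(avg_rank)
```

This mirrors the `lb.mean(axis=0)` call used on the real leaderboard parquet in step 4.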
4. Load and Explore¶
import pandas as pd
# Load leaderboard
lb = pd.read_parquet("/path/to/results/full_exp_jan/plots/final_leaderboard/rank_sparse_zero_full_auc_weighted_f1-score.parquet")
# Average rank per strategy (lower = better)
print("Top strategies:", lb.mean(axis=0).sort_values().head(5))
# Load completed experiments
done = pd.read_csv("/path/to/results/full_exp_jan/05_done_workload.csv")
print(f"Total experiments: {len(done):,}")
Starter Recipe A: Compare Strategies Across Datasets¶
import pandas as pd
RESULTS_DIR = "/path/to/results"  # should match OUTPUT_PATH in .server_access_credentials.cfg
# Load time series (auto-generated by evaluation scripts if missing)
ts = pd.read_parquet(f"{RESULTS_DIR}/full_exp_jan/_TS/full_auc_weighted_f1-score.parquet")
# Average performance by strategy
by_strategy = ts.groupby('EXP_STRATEGY')['metric_value'].agg(['mean', 'std'])
print(by_strategy.sort_values('mean', ascending=False).head(10))
# Performance by batch size
by_batch = ts.groupby('EXP_BATCH_SIZE')['metric_value'].mean()
print(by_batch)
If _TS/*.parquet is missing: run any evaluation script (e.g., python -m eva_scripts.final_leaderboard --EXP_TITLE full_exp_jan); the _TS/*.parquet files are regenerated automatically when needed.
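A per-dataset ranking like the step-3 leaderboard can also be approximated directly from the time series. A minimal sketch with invented data (the column names `EXP_STRATEGY`, `EXP_DATASET`, and `metric_value` follow the recipe above, but the values are made up):

```python
import pandas as pd

# Invented long-format results: one row per (strategy, dataset) score.
ts = pd.DataFrame({
    "EXP_STRATEGY": ["random", "uncertainty", "random", "uncertainty"],
    "EXP_DATASET":  ["Iris",   "Iris",        "wine",   "wine"],
    "metric_value": [0.80,     0.90,          0.70,     0.75],
})

# Mean score per strategy/dataset, then rank strategies within each
# dataset (rank 1 = highest mean metric on that dataset).
mean_scores = (
    ts.groupby(["EXP_STRATEGY", "EXP_DATASET"])["metric_value"]
      .mean()
      .unstack()
)
rank_matrix = mean_scores.rank(ascending=False, axis=0)
print(rank_matrix)
```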
Starter Recipe B: Load Per-Cycle Metrics¶
import pandas as pd

RESULTS_DIR = "/path/to/results"  # same path as in Recipe A

# Load per-cycle accuracy for a specific strategy/dataset
accuracy = pd.read_csv(
    f"{RESULTS_DIR}/full_exp_jan/ALIPY_RANDOM/Iris/accuracy.csv.xz",
    compression='xz'
)
# Join with workload to get hyperparameters
done = pd.read_csv(f"{RESULTS_DIR}/full_exp_jan/05_done_workload.csv")
merged = done.merge(accuracy, on="EXP_UNIQUE_ID")
print(merged.head())
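A common way to summarize such a per-cycle curve into a single number is the normalized area under the learning curve (the "auc" in the metric names above). A sketch with invented per-cycle accuracies, not actual archive values:

```python
import numpy as np

# Invented per-cycle accuracies for one experiment (AL cycles 0..4).
per_cycle_acc = np.array([0.60, 0.72, 0.80, 0.84, 0.86])

# Trapezoidal area under the curve, normalized by the number of
# intervals so a constant accuracy of 1.0 would yield an AUC of 1.0.
trapezoids = (per_cycle_acc[:-1] + per_cycle_acc[1:]) / 2
auc = trapezoids.sum() / (len(per_cycle_acc) - 1)
print(round(auc, 4))
```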
Key Files¶
| File | What It Contains |
|---|---|
| `05_done_workload.csv` | 4.6M completed experiments (workload index) |
| `<STRATEGY>/<DATASET>/accuracy.csv.xz` | Per-cycle accuracy |
| `<STRATEGY>/<DATASET>/weighted_f1-score.csv.xz` | Per-cycle F1 |
| `_TS/*.parquet` | Aggregated time series (generated) |
| `plots/final_leaderboard/*.parquet` | Strategy rankings (generated) |
Key Columns

| Column | Description |
|---|---|
| `EXP_UNIQUE_ID` | Primary key |
| `EXP_DATASET` | Dataset integer ID, assigned sequentially from 1 at runtime based on the ordering of datasets in `resources/kaggle_datasets.yaml` followed by `resources/openml_datasets.yaml`; see `datasets/__init__.py` |
| `EXP_STRATEGY` | Strategy enum; see `AL_STRATEGY` in `resources/data_types.py` (e.g., 1=ALIPY_RANDOM, 2=ALIPY_UNCERTAINTY_LC) |
| `EXP_LEARNER_MODEL` | Learner model enum; see `LEARNER_MODEL` in `resources/data_types.py` (e.g., 1=RF, 5=RBF_SVM, 8=MLP) |
| `EXP_BATCH_SIZE` | 1, 5, 10, 20, 50, 100 |
For data format details and correlation metric definitions, see Reference.
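Because columns like `EXP_STRATEGY` are stored as integers, a small lookup table makes exploration more readable. A sketch using only the example enum pairs listed above (the complete mapping lives in `resources/data_types.py`; the workload rows here are invented):

```python
import pandas as pd

# Partial mapping, taken from the examples in the table above; the full
# enum is AL_STRATEGY in resources/data_types.py of the OGAL repository.
STRATEGY_NAMES = {1: "ALIPY_RANDOM", 2: "ALIPY_UNCERTAINTY_LC"}

# Invented stand-in for rows of 05_done_workload.csv.
done = pd.DataFrame({"EXP_UNIQUE_ID": [101, 102], "EXP_STRATEGY": [1, 2]})
done["strategy_name"] = done["EXP_STRATEGY"].map(STRATEGY_NAMES)
print(done)
```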
Next Steps¶
| Goal | Page |
|---|---|
| Reproduce paper figures / Run experiments | Reproduce & Run |
| Research ideas from the data | Research Ideas |
| Data formats, correlation metrics, terminology | Reference |