
CLI Reference

All commands are Hydra-configured. Key-value overrides (key=value) are passed directly on the CLI.

saffron <command> [overrides...]
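To make the override shape concrete, here is a minimal sketch of how a list of key=value arguments splits into a mapping. It is only an illustration: Hydra's real override parser also handles typing, nesting, and config groups, and the `parse_overrides` helper and the example values are hypothetical, not part of saffron.

```python
def parse_overrides(args):
    """Split 'key=value' strings into a dict; values are kept as raw strings."""
    overrides = {}
    for arg in args:
        # partition splits on the first '=' only, so values may contain '='.
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        overrides[key] = value
    return overrides


# Illustrative values only; the out_dir path is an assumption.
example = parse_overrides(["model=rf3", "hooks=rf3_default", "out_dir=outputs/collect"])
print(example)
```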

collect

Runs model inference with forward hooks attached. Writes activations to <out_dir>/activations/activations.h5.

saffron collect model=<rf3|rfd3> \
    hooks=<group> \
    inputs=<path/to/inputs.json> \
    out_dir=<output_dir>
| Override | Required | Description |
| --- | --- | --- |
| model | | rf3 or rfd3 |
| hooks | | Hook config group: rf3_default, rfd3_partial |
| steering | | Steering config group: none, sae_block12_f639, … |
| inputs | | Path to inputs JSON (per-design list) |
| out_dir | | Output directory |
| inference_engine | | Override the Hydra inference engine config |

hooks= and steering= resolve from sae/src/sae/configs/{hooks,steering}/, using the same config-group mechanism as inference_engine=.


train

Trains a Matryoshka BatchTopK SAE on collected activations.

saffron train \
    activations_path=<path/to/activations.h5> \
    hook_name=<block6|block8|block12|> \
    [steps=20000] [batch_size=4096]
| Override | Default | Description |
| --- | --- | --- |
| activations_path | | HDF5 file from saffron collect |
| hook_name | | Which hook's activations to train on |
| architecture | matryoshka_batch_top_k | SAE architecture |
| steps | 20000 | Training steps |
| batch_size | 4096 | Batch size |
| device | cuda:0 | Torch device |
| use_wandb | false | Enable W&B logging |
| out_dir | auto-timestamped | Checkpoint output directory |
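For readers unfamiliar with the BatchTopK part of the architecture name, the sketch below shows the selection rule as the term is commonly used: rather than keeping the top k latents per sample, it keeps the k × batch_size largest activations across the whole batch. This is a pure-Python illustration under that assumption, not saffron's implementation; `batch_top_k` is a hypothetical helper, and the Matryoshka (nested dictionary) part is not shown.

```python
def batch_top_k(activations, k):
    """Keep the k * batch_size largest activations across the batch; zero the rest.

    `activations` is a list of rows (one per sample), assumed to be
    non-negative pre-selection latents (e.g. post-ReLU).
    """
    batch_size = len(activations)
    budget = k * batch_size
    # Flatten to (value, row, col) so entries compete across the whole batch.
    flat = [(v, i, j) for i, row in enumerate(activations) for j, v in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    out = [[0.0] * len(row) for row in activations]
    for v, i, j in flat[:budget]:
        if v > 0:  # never "activate" a non-positive entry
            out[i][j] = v
    return out


acts = [[0.9, 0.1, 0.8], [0.2, 0.05, 0.3]]
sparse = batch_top_k(acts, k=1)
print(sparse)
```

With k=1 and two samples the budget is two activations, and both survivors come from the first sample: individual samples can have more or fewer than k active latents, with k holding only on average.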

eval

Scores SAE features: activation frequency, top residues, per-feature AUROC.

saffron eval \
    checkpoint_path=<path/to/final.pt> \
    activations_path=<path/to/activations.h5> \
    hook_name=<block12> \
    [metadata_dir=<dir>]
| Override | Default | Description |
| --- | --- | --- |
| checkpoint_path | | SAE checkpoint .pt |
| activations_path | | HDF5 activations file |
| hook_name | | Hook to evaluate |
| metadata_dir | | Dir with per-design labels (for AUROC) |
| num_features | 12 | Features to display per analysis |
| top_residues | 5 | Top residues per feature |
| eval_batch_size | 4096 | Eval batch size |
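As a reminder of what a per-feature AUROC measures, the sketch below computes it from its pairwise definition: the probability that a randomly chosen positive design scores higher on the feature than a randomly chosen negative one, with ties counting as one half. It assumes binary per-design labels and one scalar score per design; how saffron eval aggregates residue-level activations into that score is not stated here, and `auroc` is a hypothetical helper.

```python
def auroc(scores, labels):
    """AUROC via pairwise comparison of positive and negative scores (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy per-design feature scores and labels (illustrative data only).
feature_scores = [0.9, 0.8, 0.3, 0.1]
design_labels = [1, 0, 1, 0]
print(auroc(feature_scores, design_labels))
```

The quadratic loop is fine for a sketch; a rank-based formulation gives the same value in O(n log n) for large label sets.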

screen

Runs a trained detector bundle on new sequences or structures. Alias for detect screen.

saffron screen model=<rf3|rfd3> \
    inputs=<path/to/inputs.json> \
    detector=<path/to/detector_bundle.pt> \
    out_dir=<output_dir>

steer

Identical dispatch to collect. Adds or ablates directions in activation space at each diffusion step.

saffron steer model=<rf3|rfd3> \
    hooks=<group> \
    steering=<group> \
    inputs=<path/to/inputs.json> \
    out_dir=<output_dir>

See Steering for steering config group reference.
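The two operations named above, adding and ablating a direction, can be sketched on a single activation vector as follows. This is a generic illustration of the arithmetic, assuming addition means x + αv and ablation means removing the projection of x onto v; the helper names and the scale parameter `alpha` are hypothetical and not taken from the saffron steering configs.

```python
def add_direction(x, v, alpha):
    """Shift activation x along direction v by a scale alpha."""
    return [xi + alpha * vi for xi, vi in zip(x, v)]


def ablate_direction(x, v):
    """Remove the component of x that lies along v (orthogonal projection)."""
    norm_sq = sum(vi * vi for vi in v)
    if norm_sq == 0:
        raise ValueError("direction must be non-zero")
    coeff = sum(xi * vi for xi, vi in zip(x, v)) / norm_sq
    return [xi - coeff * vi for xi, vi in zip(x, v)]


x = [3.0, 4.0]
v = [1.0, 0.0]
print(add_direction(x, v, alpha=2.0))
print(ablate_direction(x, v))
```

After ablation the result has zero dot product with v, which is the property a hook would rely on when suppressing a feature direction.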


compute_steering_vector

Computes mean(positives) − mean(negatives) per hook from two activation sets. Outputs one .pt per hook.

saffron compute_steering_vector \
    inputs=<path/to/diff_config.yaml>

diff_config.yaml schema:

activations_h5: outputs/collect/train/activations/activations.h5
labels_csv: data_pipelines/vaxijen/labels.tsv
positive_label: 1
negative_label: 0
hooks: [block6, block8, block12]
normalize: unit  # unit | none
out_dir: outputs/steering/vectors/haz_minus_ben
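The difference-of-means computation for a single hook can be sketched as below. It assumes that normalize: unit means scaling the difference to unit L2 norm, which the schema does not spell out, and it works on plain lists rather than on the HDF5 file; `mean_vector` and `steering_vector` are hypothetical helpers, not saffron functions.

```python
import math


def mean_vector(rows):
    """Element-wise mean over a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positives, negatives, normalize="unit"):
    """mean(positives) - mean(negatives), optionally scaled to unit L2 norm."""
    diff = [p - q for p, q in zip(mean_vector(positives), mean_vector(negatives))]
    if normalize == "unit":
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0:  # leave an all-zero difference untouched
            diff = [d / norm for d in diff]
    elif normalize != "none":
        raise ValueError(f"unknown normalize mode: {normalize!r}")
    return diff


# Toy activations for one hook (illustrative data only).
positives = [[4.0, 0.0], [6.0, 0.0]]  # mean is [5.0, 0.0]
negatives = [[1.0, 3.0], [3.0, 5.0]]  # mean is [2.0, 4.0]
print(steering_vector(positives, negatives, normalize="none"))
print(steering_vector(positives, negatives, normalize="unit"))
```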