Writing experiment configs¶
This page shows the common patterns for expressing a study — measuring an application, adding
interference, chaining runs. For the exhaustive field list (every global_options key, defaults,
allocation/scheduling rules), see the Configuration schema.
Recall the victim / aggressor model: a victim is measured and runs to
completion (end: ""); an aggressor generates interference and is force-killed once the
victims finish (end: "f").
Pattern 1 — a single measured application (baseline)¶
The simplest experiment: one victim, collected, run on all nodes. Use this to establish a baseline before introducing interference.
{
  "global_options": { "name": "baseline", "numnodes": "4", "ppn": "1" },
  "experiments": {
    "a2a_alone": {
      "apps": {
        "0": { "path": "a2a_comm_only.py", "args": "-msgsize 8192 -iter 1000",
               "collect": true, "start": "0", "end": "" }
      }
    }
  }
}
Pattern 2 — victim vs aggressor (the core use case)¶
Run a measured victim alongside an aggressor to quantify interference. Use partitioned
allocation so each gets its own nodes, and mark the aggressor end: "f" so it's killed as soon as
the victim finishes.
{
  "global_options": {
    "name": "interference",
    "numnodes": "8", "ppn": "1",
    "allocationmode": "p", "partitionsplit": "50:50", "partitionlayout": "i"
  },
  "experiments": {
    "a2a_vs_g500": {
      "apps": {
        "0": { "path": "a2a_comm_only.py", "args": "-msgsize 8192 -iter 1000",
               "collect": true, "start": "0", "end": "", "partition": 0 },
        "1": { "path": "graph500/g500_wrapper.py", "args": "",
               "collect": false, "start": "0", "end": "f", "partition": 1 }
      }
    }
  }
}
partitionlayout: "i" interleaves the two partitions' nodes so the victim and aggressor share the
network fabric — exactly the contention you usually want to measure.
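To make the effect of interleaving concrete, here is a small illustration for the 50:50 case above. It is only a sketch: the function name `illustrate_split` and the node names are hypothetical, and the exact node-to-partition assignment CRAB performs is an assumption here (see the Configuration schema for the real rules).

```python
def illustrate_split(nodes, interleaved):
    """Split nodes 50:50 into (partition 0, partition 1), for illustration only."""
    if interleaved:
        # Assumed behaviour: alternate nodes between the two partitions.
        return nodes[0::2], nodes[1::2]
    # Contiguous alternative: first half to partition 0, second half to partition 1.
    half = len(nodes) // 2
    return nodes[:half], nodes[half:]

nodes = [f"node{i}" for i in range(8)]  # hypothetical node names for numnodes = 8
victim, aggressor = illustrate_split(nodes, interleaved=True)
print("victim:   ", victim)
print("aggressor:", aggressor)
```

With alternating placement the victim's nodes sit next to the aggressor's, which is why the two workloads are more likely to share links than with two contiguous blocks.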
Pattern 3 — sequential / dependent runs¶
Use start: "sN" to launch an application only after application N has finished. This chains
applications in sequence within one experiment.
"apps": {
  "0": { "path": "a2a_comm_only.py", "args": "-msgsize 8 -iter 1000", "collect": true, "start": "0", "end": "" },
  "1": { "path": "a2a_comm_only.py", "args": "-msgsize 1024 -iter 1000", "collect": true, "start": "s0", "end": "" },
  "2": { "path": "a2a_comm_only.py", "args": "-msgsize 65536 -iter 1000", "collect": true, "start": "s1", "end": "" }
}
Here app 1 starts when app 0 ends, and app 2 when app 1 ends — a message-size sweep run one after
another. (This is the shape used in examples/lorenzo/cong_analysis.json.)
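For longer sweeps, writing the chain by hand gets tedious. The sketch below generates the same "apps" shape from a list of message sizes; the helper name `sequential_sweep` is hypothetical and not part of CRAB, and it assumes only the fields shown in the pattern above.

```python
import json

def sequential_sweep(path, msg_sizes, iters=1000):
    """Build an "apps" mapping in which each app starts after the previous one ends."""
    apps = {}
    for i, size in enumerate(msg_sizes):
        apps[str(i)] = {
            "path": path,
            "args": f"-msgsize {size} -iter {iters}",
            "collect": True,
            # The first app starts at time 0; each later app waits on app i-1.
            "start": "0" if i == 0 else f"s{i - 1}",
            "end": "",
        }
    return apps

apps = sequential_sweep("a2a_comm_only.py", [8, 1024, 65536])
print(json.dumps({"apps": apps}, indent=2))
```

The printed object can be pasted into an experiment entry; adding a size to the list extends the chain without renumbering the `start` values by hand.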
Pattern 4 — timed termination¶
Give end a number to stop an application after a fixed number of seconds — useful for a
long-running aggressor you want bounded regardless of the victim:
"1": { "path": "graph500/g500_wrapper.py", "args": "", "collect": false, "start": "5", "end": "60" }
This app starts 5 seconds in and is terminated after 60 seconds.
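The `start` and `end` values seen so far can be summarized in a small decoder. This is a reading aid based only on the forms used on this page (a number of seconds, "sN", "", and "f"); the function names are hypothetical, and CRAB's own parser may accept other forms.

```python
import re

def describe_start(start):
    """Explain a "start" value: a dependency ("sN") or a delay in seconds."""
    match = re.fullmatch(r"s(\d+)", start)
    if match:
        return f"starts after app {match.group(1)} finishes"
    return f"starts {float(start):g} s into the experiment"

def describe_end(end):
    """Explain an "end" value: "", "f", or a number of seconds."""
    if end == "":
        return "runs to completion"
    if end == "f":
        return "force-killed once the victims finish"
    return f"terminated after {float(end):g} s"

for start, end in [("0", ""), ("0", "f"), ("s0", ""), ("5", "60")]:
    print(f"start={start!r}, end={end!r}: {describe_start(start)}; {describe_end(end)}")
```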
Pattern 5 — several experiments in one run¶
Put multiple named experiments in one config to run a baseline and a co-run back-to-back; they
execute sequentially and each writes its own results. Use local_options to vary settings per
experiment.
"experiments": {
  "baseline": {
    "description": "victim alone",
    "apps": {
      "0": { "path": "a2a_comm_only.py", "args": "-msgsize 8192 -iter 1000", "collect": true, "start": "0", "end": "" }
    }
  },
  "with_aggressor": {
    "description": "victim + aggressor",
    "local_options": { "allocationmode": "p", "partitionsplit": "50:50" },
    "apps": {
      "0": { "path": "a2a_comm_only.py", "args": "-msgsize 8192 -iter 1000", "collect": true, "start": "0", "end": "", "partition": 0 },
      "1": { "path": "graph500/g500_wrapper.py", "args": "", "collect": false, "start": "0", "end": "f", "partition": 1 }
    }
  }
}
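One way to think about `local_options` is as an overlay on `global_options`. The sketch below assumes a simple key-by-key override, which is an assumption about CRAB's merge behaviour rather than a documented rule; `effective_options` and the config name "study" are hypothetical.

```python
def effective_options(config, experiment_name):
    """Return global_options overlaid with the experiment's local_options."""
    merged = dict(config.get("global_options", {}))
    experiment = config["experiments"][experiment_name]
    # Assumption: a key in local_options replaces the same key in global_options.
    merged.update(experiment.get("local_options", {}))
    return merged

config = {
    "global_options": {"name": "study", "numnodes": "8", "ppn": "1"},
    "experiments": {
        "baseline": {"apps": {}},
        "with_aggressor": {
            "local_options": {"allocationmode": "p", "partitionsplit": "50:50"},
            "apps": {},
        },
    },
}

for name in config["experiments"]:
    print(name, effective_options(config, name))
```

Under this reading, "baseline" runs with the global settings unchanged, while "with_aggressor" additionally gets partitioned allocation.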
Choosing an allocation mode¶
| Mode | Value | Use when |
|---|---|---|
| Linear | l (default) | Each app should get a contiguous block of nodes. |
| Interleaved | i | Apps should be spread round-robin across nodes (more fabric sharing). |
| Partitioned | p | You want explicit victim/aggressor partitions, sized with partitionsplit. |
In partitioned mode, an app's partition field decides which partition it lands in. If the field
is omitted, a collected app defaults to partition 0 and any other app to partition 1. See Allocation fields.
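The default-partition rule can be written as a one-line decision. The helper name `partition_of` is hypothetical; the rule itself is the one stated above. Treating a missing `collect` key as false is an assumption.

```python
def partition_of(app):
    """Return the partition an app lands in under partitioned allocation."""
    if "partition" in app:
        return app["partition"]
    # Default: collected apps (victims) go to partition 0, all others to partition 1.
    # Assumption: an app without a "collect" key counts as not collected.
    return 0 if app.get("collect", False) else 1

victim = {"path": "a2a_comm_only.py", "collect": True}
aggressor = {"path": "graph500/g500_wrapper.py", "collect": False}
pinned = {"path": "a2a_comm_only.py", "collect": True, "partition": 1}

print(partition_of(victim), partition_of(aggressor), partition_of(pinned))
```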
Tuning convergence¶
CRAB repeats each experiment until its convergence-target metrics stabilize. The relevant knobs:
- minruns / maxruns: the floor and ceiling on repetitions.
- alpha / beta: the confidence level and the relative width threshold (CI width < beta × mean).
- convergeall: require every metric to converge, not just those flagged conv in the wrapper.
- timeout: a wall-clock cap that ends the experiment regardless of convergence.
Start with the defaults (minruns: 10, maxruns: 20, alpha/beta: 0.05) and tighten beta
if you need narrower intervals. See Convergence.
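The stopping rule (CI width < beta × mean) can be sketched for a single metric as follows. This is an illustration, not CRAB's implementation: it assumes a normal-approximation interval and reads alpha as the significance level (alpha = 0.05 giving a 95% interval); the function names are hypothetical.

```python
import math
import statistics

def ci_width(samples, alpha=0.05):
    """Full width of a normal-approximation confidence interval for the mean."""
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = statistics.stdev(samples) / math.sqrt(len(samples))
    return 2 * z * standard_error

def has_converged(samples, alpha=0.05, beta=0.05, minruns=10):
    """True once at least minruns samples exist and CI width < beta * mean."""
    if len(samples) < minruns:
        return False
    return ci_width(samples, alpha) < beta * statistics.mean(samples)

stable = [100.0, 100.1, 99.9] * 4   # 12 runs with very little spread
noisy = [50.0, 150.0] * 6           # 12 runs with large spread
print(has_converged(stable), has_converged(noisy))
```

A smaller beta demands a narrower interval relative to the mean, so it typically takes more repetitions before the check passes; maxruns and timeout still bound the experiment if it never does.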
Browse real examples¶
The examples/ directory has working configs per system (examples/leonardo/, examples/local/,
examples/cluster_di/, examples/lorenzo/) covering baselines, interleaved tests, sequential
sweeps, and aggressor mixes — a good starting point to copy and adapt. A representative
end-to-end example is examples/leonardo/blink_noise_study.json (Blink interference sweep with
partitioned victim/aggressor allocation).