ReasonBlocks ships three built-in monitor weight profiles (coding, pr_review, and qa) that pre-tune which trajectory signals matter most for different kinds of agent work. You pick a profile via monitor_task_profile, and optionally override individual weights with monitor_weights if you need finer control.
The six monitors and their default weights
The monitor suite runs six heuristic detectors on every step. Each returns a score from 0.0 (healthy) to 1.0 (maximum badness). The weighted sum of all six scores forms the composite health signal that drives steering decisions.
The default weights (DEFAULT_WEIGHTS from monitors/suite.py) are:
| Monitor | Weight | What it detects |
|---|---|---|
| streak | 0.35 | Same action called consecutively; the strongest loop signal |
| call_count | 0.15 | Total tool-call budget consumed as a fraction of the run limit |
| edit_revert | 0.15 | Edit-fail-edit thrashing or content-similar reverts on the same file |
| test_repeat | 0.15 | Same normalized test failure signature appearing repeatedly |
| diversity | 0.10 | Collapsed tool exploration: the last 5 calls use ≤2 distinct tools |
| hedge | 0.10 | Rising hedging density or explicit retraction phrases in reasoning |
The streak monitor carries the most weight because repetitive identical actions are the most reliable indicator of a stuck agent. call_count, edit_revert, and test_repeat share the second tier (0.15 each) because budget overruns, file-level thrashing, and repeated test failures are common secondary symptoms.
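Because the composite is a plain weighted sum, it can be checked by hand. A minimal sketch, assuming per-monitor scores arrive as a dict keyed by monitor name; composite_score is a hypothetical helper, not a ReasonBlocks function:

```python
# Default weights as listed in the table (DEFAULT_WEIGHTS in monitors/suite.py).
DEFAULT_WEIGHTS = {
    "streak": 0.35,
    "call_count": 0.15,
    "edit_revert": 0.15,
    "test_repeat": 0.15,
    "diversity": 0.10,
    "hedge": 0.10,
}


def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Hypothetical helper: weighted sum of per-monitor scores in [0, 1].

    Monitors missing from `scores` count as 0.0 (healthy).
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items())


# A stuck agent: maximal streak, moderate budget use, nothing else wrong.
scores = {"streak": 1.0, "call_count": 0.4}
print(round(composite_score(scores, DEFAULT_WEIGHTS), 2))  # 0.35 * 1.0 + 0.15 * 0.4 = 0.41
```

Since the six default weights sum to 1.0, the composite also stays within [0, 1].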
The fire threshold
A monitor is considered fired when its individual score is at or above DEFAULT_FIRE_THRESHOLD = 0.6. Fired monitors are listed in step_log[n].monitors_fired and reported in the dashboard. The composite score (the weighted sum of all six) is what determines whether the middleware injects steering guidance; the individual fire threshold is used for reporting and for gating E1 retrieval.
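The separation matters because a monitor can fire while the composite barely moves. A sketch using the same dict-of-scores assumption; fired_monitors is a hypothetical helper, and because the composite level that actually triggers steering is not given on this page, the sketch only computes the number:

```python
DEFAULT_FIRE_THRESHOLD = 0.6

DEFAULT_WEIGHTS = {
    "streak": 0.35,
    "call_count": 0.15,
    "edit_revert": 0.15,
    "test_repeat": 0.15,
    "diversity": 0.10,
    "hedge": 0.10,
}


def fired_monitors(scores: dict[str, float], threshold: float = DEFAULT_FIRE_THRESHOLD) -> list[str]:
    """Names of monitors whose score is at or above the fire threshold."""
    return [name for name, score in scores.items() if score >= threshold]


scores = {
    "streak": 0.0,
    "call_count": 0.2,
    "edit_revert": 0.0,
    "test_repeat": 0.6,  # exactly at the threshold, so it fires
    "diversity": 0.0,
    "hedge": 0.0,
}
fired = fired_monitors(scores)
composite = sum(DEFAULT_WEIGHTS[name] * score for name, score in scores.items())
print(fired)  # ['test_repeat']
print(round(composite, 2))  # 0.15 * 0.2 + 0.15 * 0.6 = 0.12
```

Here test_repeat shows up in monitors_fired, yet the composite is only 0.12.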
Built-in task profiles
You set the active profile with monitor_task_profile on the ReasonBlocks constructor or on ReasonBlocksConfig. Each profile assigns the following weights:
| Monitor | coding (default) | pr_review | qa |
|---|---|---|---|
| streak | 0.35 | 0.35 | 0.35 |
| call_count | 0.15 | 0.20 | 0.20 |
| edit_revert | 0.15 | 0.05 | 0.05 |
| test_repeat | 0.15 | 0.05 | 0.20 |
| diversity | 0.10 | 0.20 | 0.10 |
| hedge | 0.10 | 0.15 | 0.10 |
coding is the default. It weights edit_revert and test_repeat equally with call_count because coding agents frequently edit files and run tests, and thrashing on either is a strong failure signal.
pr_review reduces edit_revert and test_repeat to near-zero because a PR review agent legitimately reads many files in sequence without calling test runners — those signals would fire constantly on healthy behavior. It raises diversity and hedge instead, because a reviewer that stops exploring new files or becomes uncertain is the failure shape to detect.
qa raises test_repeat back up because QA agents genuinely run tests in tight loops, and repeated identical failures are the primary thing to catch. It also raises call_count because QA runs tend to be long.
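The three profiles can be written down as plain data, which makes it easy to confirm that each sums to 1.0 and to compare how one trajectory scores under each. PROFILES and profile_weights are illustrative names, not the library's internals:

```python
# Profile weights copied from the table; the data structure itself is illustrative.
PROFILES = {
    "coding": {"streak": 0.35, "call_count": 0.15, "edit_revert": 0.15,
               "test_repeat": 0.15, "diversity": 0.10, "hedge": 0.10},
    "pr_review": {"streak": 0.35, "call_count": 0.20, "edit_revert": 0.05,
                  "test_repeat": 0.05, "diversity": 0.20, "hedge": 0.15},
    "qa": {"streak": 0.35, "call_count": 0.20, "edit_revert": 0.05,
           "test_repeat": 0.20, "diversity": 0.10, "hedge": 0.10},
}


def profile_weights(name: str = "coding") -> dict[str, float]:
    """Return a copy of a profile's weights; coding is the default."""
    if name not in PROFILES:
        raise ValueError(f"unknown profile {name!r}; expected one of {sorted(PROFILES)}")
    return dict(PROFILES[name])


# Same trajectory under each profile: repeated identical test failures,
# half of the call budget consumed, every other monitor healthy.
scores = {"test_repeat": 1.0, "call_count": 0.5}
for name in PROFILES:
    weights = profile_weights(name)
    total = sum(weights[monitor] * score for monitor, score in scores.items())
    print(name, round(total, 3))
```

qa produces the largest composite for this trajectory (0.30), coding the next (0.225), and pr_review the smallest (0.15), in line with the intent of each profile.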
Overriding individual weights
Use monitor_weights (on ReasonBlocksConfig or the ReasonBlocks constructor) or pass a weights dict directly to evaluate_all when you want to nudge the profile without replacing it entirely. The dict is applied on top of the profile, so you only need to list the monitors you want to change.
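The override rules fit in a few lines. This is a hypothetical sketch (merge_weights is not a ReasonBlocks function): it treats "recognized" as "present in the profile", and it does not renormalize the merged weights, because this page does not say whether ReasonBlocks does.

```python
CODING = {"streak": 0.35, "call_count": 0.15, "edit_revert": 0.15,
          "test_repeat": 0.15, "diversity": 0.10, "hedge": 0.10}


def merge_weights(profile: dict[str, float], overrides: dict[str, float]) -> dict[str, float]:
    """Hypothetical sketch of layering monitor_weights on top of a profile."""
    merged = dict(profile)  # unspecified monitors fall through to the profile
    for name, weight in overrides.items():
        if name not in profile:
            continue  # unknown monitor names are silently dropped
        merged[name] = max(0.0, weight)  # negative weights are clamped to 0
    return merged


merged = merge_weights(CODING, {"streak": 0.20, "hedge": -0.5, "typo_monitor": 0.9})
print(merged["streak"], merged["hedge"], "typo_monitor" in merged)  # 0.2 0.0 False
```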
Unspecified monitor names fall through to the active profile. Unknown monitor names (ones the server does not recognize) are silently dropped. Negative weights are clamped to 0.
Boosting streak sensitivity
The streak monitor reaches its maximum score of 1.0 when the same action is called five or more times in a row (cap=5 in score_streak). If your agent legitimately calls a tool several times in succession during normal operation, you can reduce its weight rather than disable it entirely.
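To see what a lower weight buys, here is a sketch of a streak scorer and its contribution to the composite. Only the cap of 5 reaching 1.0 comes from this page; the linear ramp below the cap and the treatment of actions as plain strings are assumptions, and streak_score is a hypothetical stand-in for score_streak.

```python
def streak_score(actions: list[str], cap: int = 5) -> float:
    """Hypothetical stand-in for score_streak.

    Counts the trailing run of identical actions and ramps linearly from
    0.0 (no repeat) to 1.0 once the run reaches `cap` calls. Only the cap
    of 5 reaching 1.0 is documented; the linear ramp is assumed.
    """
    if not actions:
        return 0.0
    run = 1
    for previous in reversed(actions[:-1]):
        if previous != actions[-1]:
            break
        run += 1
    return (min(run, cap) - 1) / (cap - 1)


# A legitimate burst: the agent reads five files in a row with the same tool.
actions = ["search", "read_file", "read_file", "read_file", "read_file", "read_file"]
score = streak_score(actions)  # five identical calls reach the cap

for weight in (0.35, 0.20):  # default weight vs. a reduced override
    print(f"streak weight {weight:.2f} contributes {weight * score:.2f}")
```

Lowering the weight shrinks the streak's share of the composite, but the monitor still lands in monitors_fired once its individual score reaches 0.6, because firing depends on the score, not the weight.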
Reducing hedge sensitivity
The hedge monitor scores 1.0 on any explicit retraction phrase ("I was wrong", "never mind", "disregard that") and otherwise scores by rising hedging density. If your agent frequently uses hedging language as a stylistic choice rather than as a sign of confusion, reduce the hedge weight.
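A toy hedge scorer shows how stylistic hedging can saturate the signal on a healthy agent. The phrase lists, the 10x scaling, and the use of single-step density (rather than the rising density the real monitor tracks) are all assumptions; hedge_score is hypothetical.

```python
import re

# Hypothetical vocabularies; the real monitor's phrase lists are not documented here.
RETRACTIONS = ("i was wrong", "never mind", "disregard that")
HEDGE_WORDS = {"maybe", "perhaps", "possibly", "probably", "might", "seems"}


def hedge_score(reasoning: str) -> float:
    """Toy hedge score for one step of reasoning text.

    Returns 1.0 if any retraction phrase appears; otherwise returns the share
    of hedge words, scaled so that 10% hedge words saturates at 1.0.
    """
    text = reasoning.lower()
    if any(phrase in text for phrase in RETRACTIONS):
        return 1.0
    words = re.findall(r"[a-z']+", text)
    if not words:
        return 0.0
    density = sum(word in HEDGE_WORDS for word in words) / len(words)
    return min(1.0, density * 10)


# A healthy agent whose house style is hedgy still saturates the score.
stylistic = "This might work; perhaps the config is probably fine."
score = hedge_score(stylistic)
for weight in (0.10, 0.05):  # coding default vs. a reduced override
    print(f"hedge weight {weight:.2f} contributes {weight * score:.2f}")
```

Halving the weight halves the constant offset such an agent adds to every step's composite, while retractions still register at full score.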
Evaluating the suite locally
You can run the monitor suite directly against a step list without making any API calls, which is useful for debugging or unit testing. The result contains:
- scores: per-monitor float scores in [0, 1]
- total: weighted composite score
- fired: list of monitor names whose score is at or above DEFAULT_FIRE_THRESHOLD (0.6)
- weights: the weights that were actually used
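In unit tests it can help to assert the documented invariants of a result rather than exact numbers. A sketch, assuming the result is a dict with the four keys listed (it may instead be an object with attributes); check_result and the example values are hypothetical, not real evaluate_all output.

```python
DEFAULT_FIRE_THRESHOLD = 0.6


def check_result(result: dict, threshold: float = DEFAULT_FIRE_THRESHOLD, tol: float = 1e-9) -> None:
    """Assert the invariants this page documents for a local evaluation result."""
    scores, weights = result["scores"], result["weights"]
    assert all(0.0 <= s <= 1.0 for s in scores.values()), "scores must lie in [0, 1]"
    expected_total = sum(weights.get(name, 0.0) * s for name, s in scores.items())
    assert abs(result["total"] - expected_total) <= tol, "total must be the weighted sum"
    expected_fired = sorted(name for name, s in scores.items() if s >= threshold)
    assert sorted(result["fired"]) == expected_fired, "fired must match the threshold"


# Hand-built example with the documented shape (not real evaluate_all output).
example = {
    "scores": {"streak": 0.8, "call_count": 0.2, "edit_revert": 0.0,
               "test_repeat": 0.0, "diversity": 0.6, "hedge": 0.0},
    "weights": {"streak": 0.35, "call_count": 0.15, "edit_revert": 0.15,
                "test_repeat": 0.15, "diversity": 0.10, "hedge": 0.10},
    "total": 0.35 * 0.8 + 0.15 * 0.2 + 0.10 * 0.6,
    "fired": ["streak", "diversity"],
}
check_result(example)  # passes silently; a violated invariant raises AssertionError
```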