Documentation Index
Fetch the complete documentation index at: https://reasonblocks.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
TokenSavingMiddleware is an optional, domain-agnostic middleware that reduces token consumption in long-running agent trajectories. It provides two independent mechanisms: tool-output compression and early-exit nudging. Both levers are on by default and can be toggled independently. A third, opt-in mechanism — perplexity-based word-level compression — is available when you supply a classifier.
Failures inside the middleware hook are logged and swallowed. The middleware never interrupts the agent loop.
TokenSavingMiddleware stacks alongside ReasonBlocksMiddleware rather than being embedded inside it. You can use either independently.

Constructor
Minimum character length a ToolMessage body must reach before it is compressed. Messages shorter than this threshold are left unchanged.
Number of characters to keep from the start of a tool output when compressing. The head tends to contain the most actionable content.
Number of characters to keep from the end of a tool output when compressing. The tail often contains closing context, error messages, or final values.
Number of the most recent ToolMessage objects to exempt from compression. These are the messages the agent is actively reasoning about; compressing them would degrade step quality.
Minimum number of model calls that must have occurred before an early-exit nudge can be injected. This prevents the nudge from firing on short, healthy runs.
The text injected as a HumanMessage when an early-exit nudge fires. The default message instructs the agent to stop investigating and submit its current best answer. Override this to match your agent’s specific submission instructions.
A function (steps: list[dict]) -> dict[str, float] that evaluates the agent’s trajectory and returns monitor scores keyed by signal name. The middleware checks the "streak", "hedge", and "diversity" keys to decide whether to fire the early-exit nudge. Pass default_suite_signals to use the built-in 6-monitor suite. When None, the early-exit lever is disabled even if enable_early_exit=True.
Whether to enable head+tail tool-output compression. Set to False to disable compression entirely while keeping the early-exit lever active.
Whether to enable the early-exit nudge. Set to False to disable the nudge entirely while keeping compression active.
Whether to enable word-level perplexity-based compression. Off by default. Requires perplexity_classifier to be set; if perplexity_classifier is None and this is True, no perplexity compression occurs.
A WordClassifier callable — (words: list[str]) -> list[bool] — that returns a keep/drop decision for each word. Use make_anthropic_classifier() to build one backed by a small Anthropic model, or supply your own heuristic. Required when enable_perplexity_compression=True.
Messages from fewer than this many model calls ago are considered “recent” and are excluded from perplexity compression. Keeps the agent’s most active context at full fidelity.
Messages from between perplexity_recent_cutoff and this many calls ago are in the “mid” tier and compressed at perplexity_keep_ratio_mid. Messages older than this are in the “old” tier.
Target fraction of words to keep in “mid” tier messages (3–9 model calls ago). 0.55 means the classifier aims to keep roughly 55% of words.
Target fraction of words to keep in “old” tier messages (10+ model calls ago). More aggressive than the mid tier.
The number of words per window passed to the classifier in a single call. Larger windows give the classifier more context but cost more tokens per call.
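The tiering scheme above can be sketched as a small helper. This is an illustrative re-implementation, not the library's source; the function name is hypothetical, and the default cutoffs (3 and 10) are inferred from the "3–9 calls ago" and "10+ calls ago" ranges stated above.

```python
def tier_for_age(calls_ago: int, recent_cutoff: int = 3, old_cutoff: int = 10) -> str:
    """Map a message's age (in model calls) to a perplexity-compression tier.

    Mirrors the tiering described above:
    - "recent": excluded from perplexity compression entirely
    - "mid":    compressed at the mid-tier keep ratio
    - "old":    compressed more aggressively at the old-tier keep ratio
    """
    if calls_ago < recent_cutoff:
        return "recent"
    if calls_ago < old_cutoff:
        return "mid"
    return "old"
```

With the assumed defaults, a message from 2 calls ago stays at full fidelity, one from 5 calls ago lands in the mid tier, and one from 12 calls ago lands in the old tier.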
Stats attribute
Every TokenSavingMiddleware instance exposes a stats attribute of type TokenSavingStats that accumulates counters across all before_model calls.
TokenSavingStats dataclass
TokenSavingStats is a plain dataclass. All fields default to 0.
Running count of head+tail compressions applied to ToolMessage objects.
Total characters removed across all head+tail compressions.
Number of times the early-exit nudge was injected into the message history.
Number of word-level perplexity compressions applied. Only increments when enable_perplexity_compression=True.
Total characters removed by word-level perplexity compression.
Number of times a cached compression decision was reused instead of calling the classifier again. Cache keys are (message_id, target_keep_ratio).

Standalone utilities
compress_tool_output()
Head+tail truncates a single tool output string when it exceeds a character threshold. Returns the content unchanged if it is within the threshold. You can call this directly when you want to compress a string outside of the middleware lifecycle.
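The head+tail rule is simple enough to sketch in full. The following is a simplified re-implementation for illustration, not the library's source; the default numbers for the threshold and head/tail sizes are assumptions, and only the omission-notice format follows the documented return value.

```python
def head_tail_compress(
    content: str,
    threshold: int = 2000,   # assumed default, not from the docs
    head_chars: int = 1000,  # assumed default
    tail_chars: int = 500,   # assumed default
) -> str:
    """Sketch of head+tail truncation.

    Returns the string unchanged when it is at or below `threshold`;
    otherwise keeps the first `head_chars` and last `tail_chars`
    characters around an omission notice.
    """
    if len(content) <= threshold:
        return content
    omitted = len(content) - head_chars - tail_chars
    head, tail = content[:head_chars], content[-tail_chars:]
    return f"{head}\n\n[... {omitted} chars truncated ...]\n\n{tail}"
```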
The tool output string to compress.
Character length above which compression is applied. Strings at or below this length are returned unchanged.
Characters to keep from the start of the string.
Characters to keep from the end of the string.
The original string if it’s within the threshold, otherwise a head + omission notice + tail string of the form "{head}\n\n[... N chars truncated ...]\n\n{tail}".

make_anthropic_classifier()
Wraps an anthropic.Anthropic-compatible client as a WordClassifier for use with perplexity-based compression. The classifier asks a small Anthropic model to label each word keep or drop (LLMLingua-2 style, prompt-only — not true log-probability perplexity).
Falls back to the built-in heuristic classifier on any failure (parse error, timeout, rate limit), so the middleware never breaks because of a classifier error.
An anthropic.Anthropic-compatible client instance. Must expose a client.messages.create() method with the standard Anthropic Messages API signature.
The model used to classify words. A small, fast model such as Haiku is recommended to keep classification costs low.
The fraction of words the classifier should aim to keep. This value is included in the system prompt so the model can calibrate its labeling. 0.5 means aim for roughly 50% retention.
A WordClassifier callable with signature (words: list[str]) -> list[bool]. Pass this to TokenSavingMiddleware(perplexity_classifier=...).

default_suite_signals()
Runs the built-in 6-monitor suite over a list of agent steps and returns per-monitor scores as a dict. This is the default signals_fn for the early-exit lever.
A list of step dicts in the format produced by build_steps_from_messages(). Each dict has keys: step_index, action, action_input, thought, observation, is_error.
A dict mapping monitor names to float scores. The early-exit lever checks "streak", "hedge", and "diversity" keys specifically.

build_steps_from_messages()
Converts a LangChain message history into the step dict format expected by default_suite_signals() and the monitor suite. Pairs each AIMessage’s tool calls with their matching ToolMessage objects via tool_call_id.
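The pairing logic can be sketched with plain dicts standing in for LangChain's AIMessage and ToolMessage (the dict shapes here are stand-ins; only the output step keys and the tool_call_id matching come from the documentation).

```python
def build_steps(messages: list[dict]) -> list[dict]:
    """Sketch of pairing each AI tool call with its tool result via tool_call_id."""
    # Index tool results by the id of the call that produced them.
    results = {m["tool_call_id"]: m for m in messages if m["type"] == "tool"}
    steps = []
    for msg in messages:
        if msg["type"] != "ai":
            continue
        # One step per tool call, so an AIMessage with several calls yields several steps.
        for call in msg.get("tool_calls", []):
            result = results.get(call["id"], {})
            steps.append({
                "step_index": len(steps),
                "action": call["name"],
                "action_input": call.get("args", {}),
                "thought": msg.get("content", ""),
                "observation": result.get("content", ""),
                "is_error": result.get("is_error", False),
            })
    return steps
```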
A list of LangChain messages (AIMessage, ToolMessage, HumanMessage, etc.) representing the agent’s trajectory so far.
A list of step dicts, one per AIMessage (or one per tool call when an AIMessage has multiple tool calls). Each dict contains: