# RTLLMEntropy (Entropy-based boundary detection)
## Idea
RTLLMEntropy segments a reasoning trace by identifying local increases/decreases in the language model’s predictive uncertainty.
Intuitively, when the model transitions between reasoning steps (e.g., after a sentence boundary), the distribution over next tokens can change abruptly. This engine operationalizes that via token-level entropy computed under forced decoding of the trace.
The method is model-driven but does not require generating text: it runs a single pass over the trace and computes an uncertainty signal.
## Method (high-level)
Given a trace `x = (t_1, …, t_n)` tokenized under a causal LM, the engine proceeds in five steps (a minimal end-to-end sketch follows the list):
- **Forced decoding pass.**
  For each position `i`, compute the next-token distribution `p(· | t_<i)` and its entropy:

  $$ H_i = -\sum_v p(v \mid t_{<i}) \log p(v \mid t_{<i}) $$

  The true next token `t_i` is then forced as input for the next step.
- **Local window comparison.**
  For each candidate boundary position `i`, compare the mean entropy in a preceding and a following window: `prev = mean(H_{i-window : i})`, `next = mean(H_{i : i+window})`, and `delta = next - prev`.
- **Candidate filtering at punctuation.**
  Deltas are computed only at punctuation anchors (hard punctuation), i.e. when the previous token corresponds to one of `.` `!` `?` `\n`.
- **Z-normalization + thresholding.**
  The delta values are z-scored (`delta_z`), and a threshold is set as a percentile (quantile) of `delta_z`. A boundary is inserted when `delta_z < threshold`, i.e. when the entropy shift is sufficiently "low" relative to the distribution (see Notes below on interpretation).
- **Offsets output.**
  The boundary token positions are converted into character offsets and returned as `(start, end)` segments. Labels are currently returned as `"UNK"` for all segments.
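
The following is a minimal, self-contained sketch of this pipeline using Hugging Face `transformers`. It is illustrative rather than the engine's actual implementation: the function name `entropy_boundaries` is hypothetical, it scores all positions with a single teacher-forced forward pass (equivalent to stepping token by token with forced decoding), and it omits the chat-template wrapping and KV-cache handling described below. It also assumes a fast tokenizer so that character offsets are available.

```python
import numpy as np
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer


def entropy_boundaries(trace: str, model_name: str = "Qwen/Qwen2.5-7B-Instruct",
                       window: int = 30, quantile: int = 10) -> list[int]:
    """Return character offsets of detected boundaries (illustrative sketch)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, device_map="auto", torch_dtype="auto"
    )
    enc = tok(trace, return_tensors="pt", return_offsets_mapping=True)
    input_ids = enc["input_ids"].to(model.device)
    char_spans = enc["offset_mapping"][0].tolist()  # token index -> (start, end) chars

    # Step 1 -- forced decoding pass. One forward call with teacher forcing
    # scores every position at once; logits[j] parameterize the distribution
    # over the token at position j+1.
    with torch.no_grad():
        logits = model(input_ids).logits[0]          # (seq_len, vocab)
    log_p = F.log_softmax(logits.float(), dim=-1)
    ent = -(log_p.exp() * log_p).sum(-1).cpu().numpy()

    # Align so that H[i] is the entropy of p(. | t_<i), i.e. the distribution
    # that predicted token i (H[0] has no context; placeholder value).
    H = np.empty_like(ent)
    H[0], H[1:] = ent[0], ent[:-1]

    # Steps 2-3 -- windowed entropy deltas, computed at hard-punctuation anchors only.
    cand, deltas = [], []
    for i in range(window, len(H) - window):
        prev_tok = tok.decode(int(input_ids[0, i - 1]))
        if prev_tok.strip() in {".", "!", "?"} or prev_tok.endswith("\n"):
            cand.append(i)
            deltas.append(H[i:i + window].mean() - H[i - window:i].mean())
    if not cand:
        return []

    # Step 4 -- z-score the deltas and threshold at the given percentile.
    deltas = np.asarray(deltas)
    delta_z = (deltas - deltas.mean()) / (deltas.std() + 1e-8)
    threshold = np.percentile(delta_z, quantile)

    # Step 5 -- map boundary token positions back to character offsets.
    return [char_spans[i][0] for i, dz in zip(cand, delta_z) if dz < threshold]
```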
## Models used
The engine supports (and was designed to run with) the following chat/instruction-tuned causal LMs:
- `Qwen/Qwen2.5-7B-Instruct-1M`
- `Qwen/Qwen2.5-7B-Instruct`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`
Implementation details:
- Uses `AutoModelForCausalLM` and `AutoTokenizer`.
- Uses `tokenizer.apply_chat_template(...)` to wrap the trace with a `system_prompt`.
- Runs with `device_map="auto"` and `torch_dtype="auto"`.
If your experiments used a subset of these models, list the exact ones in the paper/docs for reproducibility.
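
For reference, these details correspond to the standard `transformers` loading pattern sketched below; the exact message roles used when wrapping the trace are an assumption made for illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # place weights across available devices automatically
    torch_dtype="auto",  # use the dtype recorded in the checkpoint config
)

# Wrap the trace with the system prompt via the model's chat template.
# The role layout below is an assumption; the engine may structure it differently.
messages = [
    {"role": "system", "content": "You are a helpful assistant that reads reasoning traces."},
    {"role": "user", "content": "First step... Then second step... Finally conclude."},
]
input_ids = tok.apply_chat_template(messages, return_tensors="pt").to(model.device)
```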
## Key parameters
- `system_prompt: str`
  System instruction used in the chat template. This affects token distributions and therefore the entropy signal.
- `model_name: str`
  One of the supported model identifiers above.
- `window: int` (default: `30` in `_segment`, `6` in `_trace_pass`)
  Window size used to compute the local mean entropies before and after a candidate boundary.
- `quantile: int` (default: `10`)
  Percentile used to set the threshold over `delta_z`. Lower values typically imply a more conservative threshold (fewer splits); higher values typically imply more permissive splitting, subject to the sign convention (see Notes).
- `max_kv_tokens: int` (default: `512`)
  Maximum length of cached context for the LM KV cache. If exceeded, the cache is truncated.
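
To illustrate what `max_kv_tokens` implies, here is a hypothetical truncation helper (not the engine's code), assuming the legacy tuple-of-`(key, value)` cache format where each tensor has shape `(batch, heads, seq_len, head_dim)`. Naive truncation like this ignores positional alignment for rotary-embedding models; the actual engine may handle that differently.

```python
def truncate_kv(past_key_values, max_kv_tokens: int = 512):
    """Keep only the most recent `max_kv_tokens` cache entries (illustrative)."""
    if past_key_values is None:
        return None
    seq_len = past_key_values[0][0].shape[2]  # (batch, heads, seq_len, head_dim)
    if seq_len <= max_kv_tokens:
        return past_key_values
    return tuple(
        (k[:, :, -max_kv_tokens:, :], v[:, :, -max_kv_tokens:, :])
        for k, v in past_key_values
    )
```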
## Usage
```python
from rt_seg import RTSeg, RTLLMEntropy

trace = "First step... Then second step... Finally conclude."

segmentor = RTSeg(
    engines=RTLLMEntropy,
    seg_base_unit="clause",  # base unit may be ignored by this engine; included for API consistency
)

offsets, labels = segmentor(
    trace,
    system_prompt="You are a helpful assistant that reads reasoning traces.",
    model_name="Qwen/Qwen2.5-7B-Instruct",
    window=30,
    quantile=10,
    max_kv_tokens=512,
)

segments = [trace[s:e] for s, e in offsets]
```
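
Each offset can then be paired with its label for quick inspection (labels are all `"UNK"` at present):

```python
for (s, e), label in zip(offsets, labels):
    print(f"[{label}] {trace[s:e]!r}")
```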