RTLLMForcedDecoderBased (Forced-decoding with separator-token competition)
Idea
RTLLMForcedDecoderBased performs segmentation by explicitly testing whether the model would prefer to emit a segmentation marker at each position in the trace.
Instead of relying on a scalar uncertainty signal (entropy) or the likelihood of the observed token (surprisal), this engine compares:
- the log-probability of the true next token (forced decoding), vs.
- the log-probability of one or more separator candidates (e.g., a special token `<|seg|>` and, optionally, `"Step"` if it is a single token under the tokenizer).
If separator candidates become relatively more likely than the true continuation—especially around punctuation—the engine inserts a boundary.
This can be interpreted as a lightweight proxy for:
“Would the model naturally insert a step marker here?”
Method (high-level)
Separator candidates
The engine augments the tokenizer with a special separator token (default: `<|seg|>`), and also attempts to include `"Step"` / `" Step"` if they are single-token under the tokenizer.
Let S be the set of separator candidate token IDs.
Pass 1: calibrate a threshold (score distribution)
For each position i in the trace, under forced decoding:
- Compute the log-probability of the true next token and of the best separator candidate:
  \[ \text{true\_logp}_i = \log p(t_i \mid \text{context}), \qquad \text{sep\_logp}_i = \max_{s \in S} \log p(s \mid \text{context}) \]
- Compute a gap:
  \[ \text{gap}_i = \text{sep\_logp}_i - \text{true\_logp}_i \]
  A larger gap means the separator is more competitive.
- Add a punctuation prior based on the last emitted character:
  - hard punctuation (`.` `!` `?` `;` `:` `\n`) → `sep_bias = 1.0`
  - soft punctuation (`,` `\t`) → `sep_bias ≈ 0.5` (first pass) / `0.2` (second pass)
  - otherwise → `sep_bias = 0.0`
- Normalize `gap` over the trace using z-scoring, then compute:
  \[ \text{score}_i = \alpha \cdot \text{gap\_z}_i + \beta \cdot \text{sep\_bias}_i \]
- Choose a threshold as a percentile (`quantile`) of the scores:
  \[ \tau = \text{percentile}(\text{score},\ q = \text{quantile}) \]
This produces (τ, gap_mean, gap_std) for reuse.
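The pass-1 calibration above can be sketched as follows. This is a minimal illustration, not the engine's actual code: the `true_logps`/`sep_logps` arrays and the standalone `sep_bias` helper are hypothetical stand-ins for quantities computed during forced decoding.

```python
import numpy as np

def sep_bias(last_char: str, first_pass: bool = True) -> float:
    """Punctuation prior from the last emitted character."""
    if last_char in ".!?;:\n":
        return 1.0                          # hard punctuation
    if last_char in ",\t":
        return 0.5 if first_pass else 0.2   # soft punctuation
    return 0.0

def calibrate_threshold(true_logps, sep_logps, biases,
                        alpha=1.0, beta=1.5, quantile=90.0):
    """Pass 1: z-score the separator-vs-true gap, mix in the
    punctuation prior, and pick the threshold as a percentile."""
    gap = np.asarray(sep_logps) - np.asarray(true_logps)
    gap_mean, gap_std = gap.mean(), gap.std() + 1e-8
    gap_z = (gap - gap_mean) / gap_std
    score = alpha * gap_z + beta * np.asarray(biases)
    tau = np.percentile(score, quantile)
    return tau, gap_mean, gap_std
```

Returning `(tau, gap_mean, gap_std)` matches the reuse of these statistics in the second pass.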
Pass 2: segmentation with token insertion
Perform a second forced-decoding pass. At each step compute the same score, and if:
- `score > τ`, and
- the model is not already emitting a separator, and
- the previous emitted token was not `<|seg|>`,
then the engine inserts the separator token into the model context (without consuming a trace token) and records a boundary at the current character cursor.
Finally, convert boundaries into (start, end) character offsets.
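The final boundary-to-offset conversion can be sketched as below; `boundaries_to_offsets` is a hypothetical helper name, assuming boundaries are recorded as character cursors into the trace.

```python
def boundaries_to_offsets(boundaries, text_len):
    """Turn recorded character-cursor boundaries into contiguous
    (start, end) spans covering the whole trace."""
    cuts = [0] + [b for b in sorted(set(boundaries)) if 0 < b < text_len] + [text_len]
    return list(zip(cuts, cuts[1:]))

# Example: two boundaries over a 50-character trace.
boundaries_to_offsets([14, 35], 50)  # → [(0, 14), (14, 35), (35, 50)]
```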
Models used
The engine supports (as implemented):
- `Qwen/Qwen2.5-7B-Instruct-1M`
- `Qwen/Qwen2.5-7B-Instruct`
- `mistralai/Mixtral-8x7B-Instruct-v0.1`
Implementation details:
- Uses `AutoModelForCausalLM` and `AutoTokenizer`.
- Uses `tokenizer.apply_chat_template(...)` with a `system_prompt`.
- Adds `sep_tok` as an additional special token and calls `model.resize_token_embeddings(...)`.
- Uses `device_map="auto"` and `torch_dtype="auto"`.
For reproducibility in the paper: report the exact model(s), the `system_prompt`, and whether `"Step"` is used as a separator candidate for that model/tokenizer (it depends on tokenization).
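The tokenizer-dependent single-token check can be sketched as follows. `separator_candidate_ids` is a hypothetical helper, and `ToyTokenizer` is an invented stand-in exposing a small subset of the Hugging Face tokenizer interface (`encode`, `convert_tokens_to_ids`); a real tokenizer would be used in practice.

```python
def separator_candidate_ids(tokenizer, sep_tok="<|seg|>"):
    """Collect separator candidate token IDs: the special sep token,
    plus "Step" / " Step" only when each encodes to a single token."""
    ids = {tokenizer.convert_tokens_to_ids(sep_tok)}
    for cand in ("Step", " Step"):
        toks = tokenizer.encode(cand, add_special_tokens=False)
        if len(toks) == 1:  # keep only single-token candidates
            ids.add(toks[0])
    return sorted(ids)

class ToyTokenizer:
    """Invented stand-in; real code would use AutoTokenizer."""
    vocab = {"<|seg|>": 100, "Step": 7}
    def convert_tokens_to_ids(self, tok):
        return self.vocab[tok]
    def encode(self, text, add_special_tokens=False):
        # Unknown strings split into multiple tokens here.
        return [self.vocab[text]] if text in self.vocab else [1, 2]
```

Under this toy vocabulary, `" Step"` splits into two tokens and is dropped, while `"Step"` survives as a candidate.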
Key parameters
- `system_prompt: str`
  System instruction used in chat formatting.
- `model_name: str`
  One of the supported model identifiers above.
- `sep_tok: str` (default: `"<|seg|>"`)
  Special separator token added to the tokenizer and used for insertion.
- `alpha: float` (default: `1.0`)
  Weight on the normalized gap term (separator-vs-true competition).
- `beta: float` (default: `2.0` in `_segment`, `1.5` in internal passes)
  Weight on the punctuation prior.
- `quantile: float` (default: `90.0`)
  Percentile used to choose the insertion threshold. Larger values typically mean fewer insertions (more conservative).
- `max_kv_tokens: int` (default: `512`)
  Caps the KV cache length via `DynamicCache.crop(...)`.
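The conservativeness of `quantile` can be illustrated numerically. The scores below are synthetic draws, not real model output; the point is only that a higher percentile raises τ and leaves fewer positions above it.

```python
import numpy as np

def n_insertions(scores, quantile):
    """Count positions whose score clears the percentile threshold."""
    tau = np.percentile(scores, quantile)
    return int((scores > tau).sum())

rng = np.random.default_rng(0)
scores = rng.normal(size=1000)  # synthetic stand-in for per-position scores

# Raising the percentile raises tau, so fewer positions clear it.
low_q = n_insertions(scores, 80.0)   # roughly the top 20% of positions
high_q = n_insertions(scores, 95.0)  # roughly the top 5% of positions
```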
Usage

```python
from rt_seg import RTSeg
from rt_seg import RTLLMForcedDecoderBased

trace = "First step... Then second step... Finally conclude."

segmentor = RTSeg(
    engines=RTLLMForcedDecoderBased,
    seg_base_unit="clause",  # included for API consistency; this engine operates token-wise
)

offsets, labels = segmentor(
    trace,
    system_prompt="You are a helpful assistant that reads reasoning traces.",
    model_name="Qwen/Qwen2.5-7B-Instruct",
    sep_tok="<|seg|>",
    alpha=1.0,
    beta=2.0,
    quantile=90.0,
    max_kv_tokens=512,
)

segments = [trace[s:e] for s, e in offsets]
```