RTEntailmentBasedSegmentation (Coherence via NLI / entailment scoring)

Idea

RTEntailmentBasedSegmentation segments a reasoning trace by measuring local discourse coherence using a Natural Language Inference (NLI) model.
Instead of relying on embedding similarity, it evaluates whether the next unit (sentence/clause) is entailed by, or at least consistent with, the recent context of the current segment.

When coherence drops below a dynamic threshold, the engine starts a new segment. This makes the method sensitive to logical/semantic continuity rather than topical similarity alone.

Method (high-level)

Given base units \(u_1, \dots, u_m\) (sentences or clauses):

1. Base segmentation: Compute base offsets via SegBase.get_base_offsets(trace, seg_base_unit=...), then extract the unit strings.
2. Local context construction: Maintain the current segment in current_segment. For the next unit \(u_i\), build a short context from the last two units in the current segment, joined with whitespace:
   \[ c_i = u_{i-2} \,\Vert\, u_{i-1} \]
3. NLI-based coherence score: Feed (premise=context, hypothesis=next_unit) into an NLI model and compute class probabilities:
   - \(p(\text{entailment})\)
   - \(p(\text{neutral})\)
   - optionally \(p(\text{contradiction})\), which is not used in the final score
   The engine defines a scalar coherence score, clipped to \([0,1]\):
   \[ \text{score} = p(\text{entailment}) + 0.4 \cdot p(\text{neutral}) \]
4. Adaptive thresholding: Maintain a running average coherence score running_avg_score (initialized to 0.85). Compute a dynamic threshold:
   - dynamic_threshold = max(min_threshold, running_avg_score - tolerance)
   Accept the next unit if:
   - score >= dynamic_threshold
5. Lookahead ("bridge/outlier") heuristic: If the current unit fails, check the next unit \(u_{i+1}\) against the same context. If \(u_{i+1}\) is coherent, keep the current unit to preserve flow (treat it as a bridge/outlier).
6. Segment emission: If coherence fails (and lookahead does not rescue the unit), finalize the current segment offset and start a new segment at the current unit.
7. Return the final character offsets and "UNK" labels.
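
The steps above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the engine's actual code. The names coherence_score, segment_units, and toy_score are hypothetical, the NLI model is replaced by a pluggable scoring function, and the sketch returns unit-index ranges rather than character offsets. The description does not say how running_avg_score is updated, so the cumulative mean used here (updated only when a unit passes on its own score) is an assumption, as is using a single unit as context when the segment has only one.

```python
from typing import Callable, List, Tuple

ScoreFn = Callable[[str, str], float]  # (premise, hypothesis) -> coherence in [0, 1]


def coherence_score(p_entailment: float, p_neutral: float) -> float:
    """score = p(entailment) + 0.4 * p(neutral), clipped to [0, 1]."""
    return max(0.0, min(1.0, p_entailment + 0.4 * p_neutral))


def segment_units(
    units: List[str],
    score_fn: ScoreFn,
    tolerance: float = 0.15,
    min_threshold: float = 0.25,
) -> List[Tuple[int, int]]:
    """Return (start, end) unit-index ranges, end exclusive."""
    if not units:
        return []
    segments: List[Tuple[int, int]] = []
    start = 0            # index of the first unit in the current segment
    running_avg = 0.85   # initial value stated in the description
    n_scores = 1         # ASSUMPTION: the initial value counts as one sample

    for i in range(1, len(units)):
        # Context: up to the last two units of the current segment.
        context = " ".join(units[max(start, i - 2):i])
        threshold = max(min_threshold, running_avg - tolerance)

        score = score_fn(context, units[i])
        passed = score >= threshold
        keep = passed
        if not passed and i + 1 < len(units):
            # Lookahead: if the following unit fits the same context,
            # treat the current unit as a bridge/outlier and keep it.
            keep = score_fn(context, units[i + 1]) >= threshold

        if passed:
            # ASSUMPTION: cumulative mean over directly accepted scores.
            n_scores += 1
            running_avg += (score - running_avg) / n_scores
        if not keep:
            segments.append((start, i))
            start = i

    segments.append((start, len(units)))
    return segments


def toy_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for the NLI model, for demonstration only:
    high if any context unit shares the hypothesis' first letter."""
    return 0.9 if any(u[0] == hypothesis[0] for u in premise.split()) else 0.1


print(segment_units(["a1", "a2", "b1", "b2"], toy_score))
print(segment_units(["a1", "a2", "x", "a3", "a4"], toy_score))
```

In the first call the toy scorer rejects "b1" and the lookahead rejects "b2", so a split occurs. In the second call "x" fails, but "a3" fits the same context, so "x" is kept as a bridge and no split occurs.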

Models used

Although the load_model type annotation lists causal LMs, the implementation uses:

AutoModelForSequenceClassification
AutoTokenizer

The default model_name in _segment is:

MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7

This is an NLI/XNLI-style model that provides logits over NLI classes.

Note (implementation detail): The NLI label index mapping is assumed in _predict_coherence (entailment = probs[0], neutral = probs[1]). If you switch models, verify the class order in config.id2label, as different checkpoints may order labels differently (e.g., contradiction/neutral/entailment).
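
One way to guard against a mismatched label order is to look the indices up by name instead of hard-coding them. The helper below is a minimal sketch under that assumption. The name resolve_nli_indices is hypothetical and not part of the package. It takes a plain mapping shaped like config.id2label, so it can be checked without loading a model.

```python
from typing import Dict, Mapping, Union


def resolve_nli_indices(id2label: Mapping[Union[int, str], str]) -> Dict[str, int]:
    """Map NLI class names to their logit indices, ignoring case.

    Raises ValueError if any of the three NLI classes is missing, e.g. when a
    checkpoint only exposes generic names such as "LABEL_0".
    """
    wanted = ("entailment", "neutral", "contradiction")
    found: Dict[str, int] = {}
    for idx, name in id2label.items():
        key = str(name).strip().lower()
        if key in wanted:
            found[key] = int(idx)
    missing = [w for w in wanted if w not in found]
    if missing:
        raise ValueError(f"NLI labels not found in id2label: {missing}")
    return found


# Two label orders that occur in practice; the lookup handles both.
order_a = {0: "entailment", 1: "neutral", 2: "contradiction"}
order_b = {0: "CONTRADICTION", 1: "NEUTRAL", 2: "ENTAILMENT"}
print(resolve_nli_indices(order_a)["entailment"])
print(resolve_nli_indices(order_b)["entailment"])
```

With a loaded model, the mapping would come from model.config.id2label, and the returned indices would replace the fixed probs[0] and probs[1] positions.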

Key parameters

seg_base_unit: Literal["sent", "clause"]
Base unit granularity for coherence testing.
model_name: str
NLI sequence classification model (default above).
tolerance: float (default: 0.15)
How far the coherence score may fall below the running average before triggering a split.
Larger tolerance → fewer splits; smaller tolerance → more splits.
min_threshold: float (default: 0.25)
Absolute floor on the dynamic threshold, preventing very low thresholds when the running average drops.
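
To see how tolerance and min_threshold interact, the threshold formula from the method description can be evaluated directly. The function name dynamic_threshold below is a hypothetical standalone helper written for illustration. Only the formula itself comes from the description.

```python
def dynamic_threshold(
    running_avg_score: float,
    tolerance: float = 0.15,
    min_threshold: float = 0.25,
) -> float:
    """dynamic_threshold = max(min_threshold, running_avg_score - tolerance)."""
    return max(min_threshold, running_avg_score - tolerance)


# At the initial running average of 0.85, the default tolerance gives a
# threshold near 0.70; a larger tolerance lowers it (fewer splits).
print(round(dynamic_threshold(0.85), 2))
print(round(dynamic_threshold(0.85, tolerance=0.30), 2))

# When the running average is low, the floor takes over.
print(dynamic_threshold(0.30))
```

The last call shows the floor at work: 0.30 - 0.15 is below min_threshold, so the threshold stays at 0.25.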

Usage

from rt_seg import RTSeg
from rt_seg import RTEntailmentBasedSegmentation

trace = "..."

segmentor = RTSeg(
    engines=RTEntailmentBasedSegmentation,
    seg_base_unit="sent",
)

offsets, labels = segmentor(
    trace,
    seg_base_unit="sent",
    model_name="MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
    tolerance=0.15,
    min_threshold=0.25,
)

segments = [trace[s:e] for s, e in offsets]