RTNewLine (Paragraph-based segmentation via blank lines)

Idea

RTNewLine is a minimal rule-based baseline that segments a reasoning trace using formatting structure rather than lexical or model-based cues.

It splits the trace into segments at blank-line boundaries, i.e., occurrences of \n\n, which often correspond to paragraph breaks, step breaks, or deliberate spacing inserted by the generator or annotator.

This engine is particularly useful as:

a fast baseline,
a preprocessing heuristic when traces already contain meaningful line breaks,
a robust fallback in environments without model dependencies.

Method (high-level)

Find segment start positions Identify all positions in the trace immediately following either:
- the start of the string (\A), or
- a blank-line delimiter (\n\n)
Concretely, the engine collects:
- positions = [m.end() for m in re.finditer(r'\n\n|\A', trace)]
Create consecutive spans Pair each start position with the next start position (or end of trace for the last segment):
- offsets = zip(positions, positions[1:] + [len(trace)])
Return offsets Emit character offsets and assign "UNK" labels to all segments.

Models used

None. This engine is purely regex-based.

Parameters

RTNewLine does not require any configuration parameters. Any **kwargs are ignored.

Usage

from rt_seg import RTSeg
from rt_seg import RTNewLine

trace = "Step 1: ...\n\nStep 2: ...\n\nFinal: ..."

segmentor = RTSeg(engines=RTNewLine)
offsets, labels = segmentor(trace)

segments = [trace[s:e] for s, e in offsets]
for seg in segments:
    print("---")
    print(seg)