RTNewLine (Paragraph-based segmentation via blank lines)
Idea
RTNewLine is a minimal rule-based baseline that segments a reasoning trace using formatting structure rather than lexical or model-based cues.
It splits the trace into segments at blank-line boundaries, i.e., occurrences of \n\n, which often correspond to paragraph breaks, step breaks, or deliberate spacing inserted by the generator or annotator.
This engine is particularly useful as:
- a fast baseline,
- a preprocessing heuristic when traces already contain meaningful line breaks,
- a robust fallback in environments without model dependencies.
Method (high-level)
- Find segment start positions. Identify all positions in the trace immediately following either:
  - the start of the string (\A), or
  - a blank-line delimiter (\n\n)
  Concretely, the engine collects:
  positions = [m.end() for m in re.finditer(r'\n\n|\A', trace)]
- Create consecutive spans. Pair each start position with the next start position (or the end of the trace for the last segment):
  offsets = zip(positions, positions[1:] + [len(trace)])
- Return offsets. Emit character offsets and assign "UNK" labels to all segments.
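The three steps above can be put together in a small standalone sketch. It uses only the standard library and the two expressions quoted in the steps. The function names newline_offsets and newline_segment are hypothetical helpers chosen for illustration and are not part of rt_seg.

```python
import re
from typing import List, Tuple


def newline_offsets(trace: str) -> List[Tuple[int, int]]:
    """Return (start, end) character spans split at blank-line delimiters."""
    # Segment starts: position 0 (matched by \A) and the position
    # immediately after each "\n\n" delimiter.
    positions = [m.end() for m in re.finditer(r"\n\n|\A", trace)]
    # Pair each start with the next start; the last span ends at len(trace).
    return list(zip(positions, positions[1:] + [len(trace)]))


def newline_segment(trace: str) -> Tuple[List[Tuple[int, int]], List[str]]:
    """Hypothetical wrapper: offsets plus one "UNK" label per segment."""
    offsets = newline_offsets(trace)
    labels = ["UNK"] * len(offsets)
    return offsets, labels


trace = "Step 1: a\n\nStep 2: b\n\nFinal: c"
offsets, labels = newline_segment(trace)
print(offsets)  # [(0, 11), (11, 22), (22, 30)]
print(labels)   # ['UNK', 'UNK', 'UNK']
```

In this sketch each start position is the end of a delimiter match, so every span except the last includes its trailing \n\n, and concatenating the spans reproduces the trace exactly. A single \n does not create a boundary.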
Models used
None. This engine is purely regex-based.
Parameters
RTNewLine does not require any configuration parameters. Any **kwargs are ignored.
Usage
from rt_seg import RTSeg
from rt_seg import RTNewLine
trace = "Step 1: ...\n\nStep 2: ...\n\nFinal: ..."
segmentor = RTSeg(engines=RTNewLine)
offsets, labels = segmentor(trace)
segments = [trace[s:e] for s, e in offsets]
for seg in segments:
    print("---")
    print(seg)
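If the returned offsets follow the construction described under Method, each span except the last ends with its \n\n delimiter, so printed segments carry extra blank lines. The sketch below shows one possible way to tighten the spans after segmentation. trim_offsets is a hypothetical helper, not part of rt_seg, and the offsets are recomputed here with the regex from Method rather than obtained from the library.

```python
import re


def trim_offsets(trace, offsets):
    """Shrink each span to exclude surrounding whitespace; drop blank spans."""
    trimmed = []
    for start, end in offsets:
        segment = trace[start:end]
        # Count leading and trailing whitespace characters in this span.
        lead = len(segment) - len(segment.lstrip())
        trail = len(segment) - len(segment.rstrip())
        if lead < len(segment):  # skip empty or whitespace-only spans
            trimmed.append((start + lead, end - trail))
    return trimmed


trace = "Step 1: ...\n\nStep 2: ...\n\nFinal: ..."
# Assumption: offsets built the same way as in the Method section.
positions = [m.end() for m in re.finditer(r"\n\n|\A", trace)]
offsets = list(zip(positions, positions[1:] + [len(trace)]))

clean = trim_offsets(trace, offsets)
print(clean)  # [(0, 11), (13, 24), (26, 36)]
print([trace[s:e] for s, e in clean])
```

The trimmed spans still index into the original trace, so they remain usable as character offsets. Unlike the untrimmed spans, they no longer cover the trace contiguously.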