Jiahang Lin <sup>1</sup> <sup>∗‡</sup>, Shichun Liu <sup>1</sup> <sup>∗‡</sup>, Chengjun Pan <sup>2</sup> <sup>∗‡</sup>, Lizhi Lin <sup>3</sup>,  
Shihan Dou <sup>1</sup>, Xuanjing Huang <sup>1</sup>, Hang Yan <sup>3</sup>, Zhenhua Han <sup>3</sup> <sup>†</sup>, Tao Gui <sup>1</sup> <sup>†</sup>  
<sup>1</sup> Fudan University   <sup>2</sup> Peking University   <sup>3</sup> Shanghai Qiji Zhifeng Co., Ltd

###### Abstract

Harnesses are now central to coding-agent performance, mediating how models interact with tools and execution environments. Yet harness engineering remains a manual craft, because automating it faces a heterogeneous action space across editable components, voluminous trajectories that bury actionable signal, and edits whose effect is hard to attribute. We introduce Agentic Harness Engineering (AHE), a closed loop that addresses these challenges through three matched observability pillars: ❶ *component observability* gives every editable harness component a file-level representation so the action space is explicit and revertible; ❷ *experience observability* distills millions of raw trajectory tokens into a layered, drill-down evidence corpus that an evolving agent can actually consume; and ❸ *decision observability* pairs every edit with a self-declared prediction, later verified against the next round’s task-level outcomes. Together, these pillars turn every edit into a falsifiable contract, so harness evolution proceeds autonomously without collapsing into trial-and-error. Empirically, ten AHE iterations lift pass@1 on Terminal-Bench 2 from 69.7% to 77.0%, surpassing the human-designed harness Codex-CLI (71.9%) and the self-evolving baselines ACE and TF-GRPO. The frozen harness transfers without re-evolution: on SWE-bench-verified it tops aggregate success at $12\%$ fewer tokens than the seed, and on Terminal-Bench 2 it yields $+5.1$ to $+10.1$  pp cross-family gains across three alternate model families, indicating the evolved components encode general engineering experience rather than benchmark-specific tuning. Ablations localize the gain to tools, middleware, and long-term memory rather than the system prompt, suggesting factual harness structure transfers while prose-level strategy does not. These results position observability-driven evolution as a practical pathway to keep coding-agent harnesses continually improving alongside their base models.

<sup>†</sup>![Refer to caption](https://arxiv.org/html/2604.25850v3/x1.png)

Figure 1: AHE evolves a bash-only seed past every human-designed and self-evolving baseline on Terminal-Bench 2. All three role agents share one base model, isolating the gain to harness edits rather than analyzer or editor capability.

## 1 Introduction

Coding agents are increasingly deployed on long-horizon software-engineering tasks, with measurable progress on issue resolution over real-world code repositories [^14] [^46] [^7] and multi-step terminal workflows [^21]. In practice, such progress relies not only on the underlying language model, but equally on the surrounding engineering components: the system prompt that shapes work style, the tools that expose the file system and shell, and the middleware that controls context, execution, and recovery. This collection of model-external, editable components is collectively referred to as the agent’s *harness* [^30] [^18] [^42] [^45] [^33] [^31].

Harness design materially shifts task completion on long-horizon coding benchmarks, even with the base model held fixed [^40] [^42], making harness engineering a first-class lever for improving coding agents. Moreover, the optimal harness is model-specific: a harness tuned for one base model often underperforms on another and must be re-adapted as the base model changes. In current practice, this adaptation is performed manually—developers inspect trajectories, identify recurring failure patterns, and hand-craft edits across prompts, tools, middleware, and skills. Yet as base models advance rapidly [^39] [^38] [^44] [^6] [^36] [^35], this manual loop struggles to keep pace, creating a widening gap between model capability and the harness needed to realize it [^33].

An intuitive direction is to automate this loop with an evolution agent that optimizes harness components based on experience [^1] [^49] [^4]. However, few existing approaches jointly evolve the full set of editable components [^16]; most focus on a single component, typically the prompt [^32] [^50] [^20], skills [^19] [^43], or an in-context playbook [^49]. Jointly evolving multiple components end-to-end faces two structural obstacles: long, unstructured trajectories yield little actionable signal, and tightly coupled harness frameworks make edits beyond the prompt error-prone. This leaves the central question of agent-driven harness evolution open: How can an evolution agent jointly and stably evolve all editable components of a coding agent’s harness?

Our central insight is that this question is bottlenecked by *observability*, not by agent capability: once the evolution agent receives structured context over a clear action space, it can reliably converge on better harness designs [^34] [^53]. We implement this in Agentic Harness Engineering (AHE, Figure 2), a closed loop driven by three observability pillars: ❶ *component observability* via a decoupled harness that exposes seven editable component types as files, so each failure pattern maps cleanly to a single component class; ❷ *experience observability* via a layered, drill-down evidence corpus distilled from millions of raw trajectory tokens, so the evolver consumes structured root causes rather than raw logs; and ❸ *decision observability* via a change manifest that pairs every edit with a self-declared prediction, later verified against the next round’s task-level outcomes, so each edit becomes a falsifiable contract and ineffective ones are reverted at file granularity.

We empirically validate AHE on Terminal-Bench 2 [^21]: ten iterations lift pass@1 from 69.7% to 77.0%, surpassing the human-designed Codex CLI [^25] and the self-evolving baselines ACE [^49] and TF-GRPO [^4]. Without further evolution, the frozen harness transfers to SWE-bench-verified [^14], and across three alternate base-model families it yields consistent pass@1 gains of $+5.1$ to $+10.1$ pp, with the largest on bases further from saturation, suggesting that AHE encodes coordination patterns that less-saturated models lean on more heavily. A component ablation pinpoints where this gain lives: tools, middleware, and long-term memory each carry the improvement on their own, while the system prompt alone regresses, indicating that factual harness structure transfers across tasks and models whereas prose-level strategy does not.

This paper makes three contributions:

- We formulate *agent-driven harness evolution* for coding agents and propose AHE, which identifies *observability across components, trajectories, and decisions* as the design pivot and turns every harness edit into a falsifiable, file-level contract through three observability pillars: a decoupled component substrate, a layered trajectory-distillation pipeline, and a change manifest whose self-declared predictions are verified by next-round task deltas.
- We empirically show that AHE lifts pass@1 on Terminal-Bench 2 from 69.7% to 77.0%, surpasses human-designed and automated baselines, and produces a frozen harness that transfers across benchmarks and base-model families.
- Our analysis reveals two limits of agent-driven evolution: harness components interact non-additively, so stacking effective edits caps the aggregate gain; and the loop’s self-attribution is reliable for fixes but blind to regressions, pinpointing regression foresight as the clearest direction for future self-evolution loops.

## 2 Related Work

### 2.1 Harness Engineering and Evaluation for Coding Agents

Harness engineering refers to the practice of designing the system surrounding the model, including its tools, interfaces, memory, execution constraints, and feedback loops, which together shape what an agent can do on long-horizon tasks [^30] [^18] [^40] [^3] [^33] [^31]. Concretely, the harness mediates how the model perceives and acts on its environment: it exposes the action and observation interfaces over which tool-augmented reasoning unfolds [^3], custom agent-computer interfaces for repository navigation, file editing, and command execution [^45], as well as sandboxed execution and orchestration support that keep long-horizon runs reproducible [^42].

Verifying that such systems actually help has driven the parallel maturation of coding-agent evaluation along two axes: task horizon and environmental realism. Coverage extends from short-horizon function-level benchmarks focused on contamination and freshness control [^52] [^12], through repository-scale executable patch resolution [^14] [^46] [^7], to multi-hour, terminal-driven workflows that exercise long-horizon, realistic execution [^22] [^5] [^21]. A parallel infrastructure track packages executable runtimes and verifiers around these benchmarks [^28] [^13] [^47], whose attention to reproducible, traceable, and verifiable execution directly motivates the observation system AHE builds on.

### 2.2 Automated Optimization of LLM Agents

Approaches to automated agent optimization differ in what evidence the optimizer observes and what it can edit. Some revise the agent’s own outputs through episodic critique and reflection [^20] [^32] [^9]. Others target prompts and instructions [^15]: structured playbooks [^49], semantic-advantage priors [^4], jointly optimized instruction-demonstration pipelines for multi-stage programs [^27], and reflective updates driven by Pareto-frontier traces [^1]. A separate line edits program structure itself, in the form of skill libraries [^41], scored program and agent archives evolved through mutation [^24] [^11], and graph-structured workflows searched or learned from rollouts [^48] [^51].

AHE tunes the full harness as a combinatorial whole rather than a single editable surface, so cross-component trade-offs become legible to the optimizer. It also keeps the human prior minimal, leaving methodology for the optimizer to discover from rollouts rather than fixing it by hand. We describe the substrate, trajectory analysis, and iteration that realize these choices in Section 3.

## 3 Method

AHE turns harness optimization into a closed loop driven by another agent, with the base model held fixed and only the explicit harness edited. Our design principle is that every phase of this loop must be *observable*: AHE faithfully records the artifacts each phase produces (the harness components an iteration writes, the rollout trajectories it generates, the edit decisions it commits) and represents them in structured, layered forms that another agent can read and act on.

Three observability layers implement this principle. Component observability (§3.1) is realized by a decoupled, file-level harness substrate that maps each failure pattern to a single component class. Experience observability (§3.2) is realized by a layered evidence corpus distilled from raw rollouts and indexed for drill-down access. Decision observability (§3.3) is realized by a change manifest that pairs every edit with a self-declared prediction the next round verifies. The three layers compose into the iteration of Algorithm 1, which runs unattended round after round.

### 3.1 NexAU: an editable, decoupled harness substrate

![Refer to caption](https://arxiv.org/html/2604.25850v3/x2.png)

Figure 2: The AHE pipeline links three observable surfaces into one closed loop. Components, rollout experience, and edit decisions each surface as structured artifacts another agent reads, and every edit becomes a falsifiable prediction the next round verifies.

We instantiate the harness $H$ on the NexAU framework [^23] [^37], which exposes seven orthogonal component types as explicit files at fixed mount points in a single workspace: system prompt, tool description, tool implementation, middleware, skill, sub-agent configuration, and long-term memory. The component types are loosely coupled, so adding a middleware does not require editing the system prompt, and adding a skill does not require touching any tool.

This decoupling is what realizes component observability: each failure pattern maps to a single component class, giving the evolve agent a clean action space and localizing every pass-rate change to one file rather than scattering it across hundreds of lines of unstructured prompt prose. Each logical edit becomes one commit on the workspace’s git history, which yields file-level diffs and rollback granularity for free.

Our seed harness $H_{0}$ is deliberately minimal: a single shell-execution tool, no middleware, no skills, no sub-agents. A seed already fitted to the target benchmark would contaminate every subsequent edit’s attribution, since we could not tell whether a gain came from the loop or from the seed. The minimal seed forces every component AHE adds to earn its place against measured rollouts.

### 3.2 Agent Debugger: layered trajectory evidence

We generate $k$ traces for each task in a benchmark using a harness $H$, which may contain errors resulting from the deficiencies of the harness that can be acted on, but scattered across millions of tokens of raw messages. To extract insights from agent trajectories and enable experience observability, we apply Agent Debugger [^17] framework to use an agent to explore trajectories framed as a navigable, file-based environment where each trajectory message lives in its own file and is reached through generic shell and scripting tools. Traces with the same query are placed in one environment, and the debugger is required to analyze the root cause of the failure or the success pattern, which is stored in *per-task analysis* report for each task. The analysis also includes pass/fail status of the task to ground the Evolve Agent. Finally, a *benchmark-level overview* is aggregated from every report into a single document as an entry point for every iteration.

In addition to these reports, we also provide *original* traces in case the agents need to verify the claims in the reports. The traces are provided both in raw form and lightly processed to remove unnecessary content. All of these content is provided as files allowing progressive disclosure [^29] which saves on tokens and enable better agent decisions.

### 3.3 Evolve Agent: evidence-driven, auditable edits

The Evolve Agent closes the AHE loop. In each round it reads the layered evidence corpus produced by the Agent Debugger, decides which harness components to add, modify, or remove, applies those edits to the workspace, and records the reasoning behind every edit. Two constraints govern these edits, and together they realize decision observability: every edit becomes a falsifiable, file-level claim recorded in a versioned manifest, and the next round’s verdict either confirms or reverts it.

The first constraint is controllability: the Evolve Agent writes only inside the harness workspace, while the runs directory, tracer, verifier, and LLM configuration are read-only, and the seed system prompt (Appendix B.1) is marked non-deletable. These restrictions block the shortcuts an unconstrained self-modifier would take, such as disabling the verifier, swapping the model, or raising the reasoning budget, and keep every recorded gain attributable to harness edits.

The second constraint is that every change is evidence-driven and ships with a recorded prediction. Each edit attaches a manifest entry that names the failure evidence, the inferred root cause, the targeted fix, and a predicted impact comprising both expected fixes and at-risk regressions; this manifest is the loop’s evidence ledger (see Appendix B.2). In the next round, the loop intersects the predicted-fix and predicted-regression sets with the observed task-level deltas to produce a per-edit verdict. Each edit thereby becomes falsifiable by the next evaluation, which replaces rationale-driven self-justification with a measurable contract between rounds.

Algorithm 1 AHE outer loop.

seed harness $H_{0}$, base model $M$, benchmark $D$, rollouts per task $k$, max iterations $N$

 $H_{\text{best}}\leftarrow H_{0}$

for $t=1$ to $N$ do

   $T_{t}\leftarrow\textsc{Rollout}(M,H_{t-1},D,k)$ $\triangleright$ phase 1: $k$ rollouts per task

   $\widetilde{T}_{t}\leftarrow\textsc{Clean}(T_{t})$ $\triangleright$ phase 2: drop base64, dedup tool output

  if $t\geq 2$ then $\triangleright$ phase 3: attribute prior manifest, then rollback

    $V_{t}\leftarrow\textsc{Attribute}(C_{t-1},T_{t-1},T_{t})$     $H_{t-1}\leftarrow\textsc{Rollback}(H_{t-1},V_{t})$

  else

    $V_{t}\leftarrow\emptyset$

  end if

   $R_{t}\leftarrow\textsc{AgentDebugger}(\widetilde{T}_{t})$ $\triangleright$ phase 4: layered distillation

   $(H_{t},C_{t})\leftarrow\textsc{Evolve}(H_{t-1},R_{t},V_{t})$ $\triangleright$ phase 5: workspace edits + new manifest

   $\textsc{Commit}(H_{t},C_{t},t)$ $\triangleright$ phase 6: tag iteration in git

  if $\textsc{Pass@1}(T_{t})>\textsc{Pass@1}(H_{\text{best}})$ then $H_{\text{best}}\leftarrow H_{t}$

  end if

end for

return $H_{\text{best}}$

Algorithm 1 composes the three substrates into one iteration: rollout, clean, attribute the prior manifest and revert rejected edits, distill, edit, commit. We run $k\geq 2$ rollouts per task so each task carries a pass-rate signal, which stabilizes pass@1 and lets partial-pass tasks anchor comparative diagnosis. Attribution runs *before* distillation, so its verdict lands inside the evidence corpus and binds each prior manifest entry as a contract rather than a rationale. A one-shot explore agent (Appendix B.3) runs in parallel with iteration $1$ to seed a small number of reusable skills from the NexAU source and public coding-agent references. These skills receive no special protection: from iteration $2$ onward the Evolve Agent may keep, refine, or remove them based on observed rollouts.

## 4 Experiments

We organize our empirical study around three questions: where AHE sits on the map of existing approaches to harness design, whether what it produces is portable beyond its optimization target, and what inside the loop drives the gain.

<svg id="S4.p2.pic1" height="427.63" overflow="visible" version="1.1" viewBox="0 0 600 427.63" width="600"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,427.63) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#9999FF;" fill="#9999FF" fill-opacity="1.0"><path style="stroke:none" d="M 0 3.46 L 0 424.17 C 0 426.08 1.55 427.63 3.46 427.63 L 596.54 427.63 C 598.45 427.63 600 426.08 600 424.17 L 600 3.46 C 600 1.55 598.45 0 596.54 0 L 3.46 0 C 1.55 0 0 1.55 0 3.46 Z"></path></g><g style="--ltx-fill-color:#F7F7FF;" fill="#F7F7FF" fill-opacity="1.0"><path style="stroke:none" d="M 0.69 3.46 L 0.69 126.6 L 599.31 126.6 L 599.31 3.46 C 599.31 1.93 598.07 0.69 596.54 0.69 L 3.46 0.69 C 1.93 0.69 0.69 1.93 0.69 3.46 Z"></path></g><g style="--ltx-fill-color:#E6E6FF;" fill="#E6E6FF" fill-opacity="1.0"><path style="stroke:none" d="M 0.69 127.29 L 0.69 424.17 C 0.69 425.7 1.93 426.94 3.46 426.94 L 596.54 426.94 C 598.07 426.94 599.31 425.7 599.31 424.17 L 599.31 127.29 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 12.93 413.51)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:41.49em;--ltx-fo-height:0.69em;--ltx-fo-depth:20.4em;" width="574.14" height="291.77" transform="matrix(1 0 0 -1 0 9.49)" overflow="visible" color="#000000"><span id="S4.p2.pic1.1.1.1.1.1" style="width:36.08em;"><span id="S4.p2.pic1.1.1.1.1.1.1"><span id="S4.p2.pic1.1.1.1.1.1.1.1">Research Questions</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 12.93 107.64)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:41.49em;--ltx-fo-height:0.69em;--ltx-fo-depth:7.04em;" width="574.14" height="106.97" transform="matrix(1 0 0 -1 0 9.49)" overflow="visible" color="#000000"><span id="S4.p2.pic1.2.2.2.1.1" style="width:41.49em;"><span id="S4.I1"><span id="S4.I1.i1" style="list-style-type:none;">1. <span id="S4.I1.i1.p1"><span id="S4.I1.i1.p1.1"><span id="S4.I1.i1.p1.1.1">RQ1</span> (§4.2)<span id="S4.I1.i1.p1.1.2">: Why agentic harness engineering, rather than human-engineered harnesses or other automated methods?</span></span></span></span> <span id="S4.I1.i2" style="list-style-type:none;padding-top:2.0pt;">2. <span id="S4.I1.i2.p1"><span id="S4.I1.i2.p1.1"><span id="S4.I1.i2.p1.1.1">RQ2</span> (§4.3)<span id="S4.I1.i2.p1.1.2">: Does agentic harness engineering overfit to its optimization target?</span></span></span></span> <span id="S4.I1.i3" style="list-style-type:none;padding-top:2.0pt;">3. <span id="S4.I1.i3.p1"><span id="S4.I1.i3.p1.1"><span id="S4.I1.i3.p1.1.1">RQ3</span> (§4.4)<span id="S4.I1.i3.p1.1.2">: What inside AHE drives its gains, and how reliable is the loop’s self-attribution?</span></span></span></span></span></span></foreignObject></g></g></svg>

### 4.1 Setup

##### Evaluation.

We drive evolution on the full 89 tasks of Terminal-Bench 2 [^21], split as 4 easy, 55 medium, and 30 hard, with per-task timeout extended to 1 hour. For cross-benchmark transfer we evaluate the AHE harness on SWE-bench-verified [^14], 500 tasks across seven repositories. We report two metrics per configuration: pass@1, the mean binary success rate over $k$ rollouts per task; and tokens/trial, the mean per-trial total of prompt plus completion tokens across all LLM calls, in thousands. Infrastructure-aborted or timed-out trials count as failures under pass@1 (matching the official terminal-bench leaderboard) and are excluded from token means to avoid truncated figures. Runtime infrastructure (framework, dispatcher, sandbox, tracer, and concurrency) is detailed in Appendix A.

##### Models.

For both the evolution loop and the main experiment of §4.2, all three role agents (the Code Agent, the Agent Debugger, and the Evolve Agent) share one base model, GPT-5.4 [^26] at the high reasoning setting. For cross-model transfer (§4.3), we re-evaluate the Code Agent on five alternate bases: GPT-5.4 at medium and xhigh reasoning, qwen-3.6-plus [^38] [^44], gemini-3.1-flash-lite-preview [^8], and deepseek-v4-flash [^6].

### 4.2 RQ1: Main Results

Table 1: Pass@1 on Terminal-Bench 2 across 89 tasks, by official difficulty. NexAU <sub>0</sub> is the shared seed; ACE, TF-GRPO, and AHE are three self-evolution loops layered on top of it. Bold marks the best per column; ties are all bold.

<table><tbody><tr><th>Method</th><td>All</td><td>Easy</td><td>Med.</td><td>Hard</td></tr><tr><th></th><td>89</td><td>4</td><td>55</td><td>30</td></tr><tr><th colspan="5">Human-designed harness</th></tr><tr><th>opencode</th><td>47.2%</td><td>75.0%</td><td>52.7%</td><td>33.3%</td></tr><tr><th>terminus-2</th><td>62.9%</td><td>75.0%</td><td>74.5%</td><td>40.0%</td></tr><tr><th>Codex</th><td>71.9%</td><td>75.0%</td><td>80.0%</td><td>56.7%</td></tr><tr><th colspan="5">Self-evolved from NexAU <sub>0</sub></th></tr><tr><th>NexAU <sub>0</sub></th><td>69.7%</td><td>87.5%</td><td>78.2%</td><td>51.7%</td></tr><tr><th>ACE</th><td>68.9%</td><td>91.7%</td><td>78.2%</td><td>48.9%</td></tr><tr><th>TF-GRPO</th><td>72.3%</td><td>100.0%</td><td>79.4%</td><td>55.6%</td></tr><tr><th>AHE</th><td>77.0%</td><td>100.0%</td><td>88.2%</td><td>53.3%</td></tr></tbody></table>

We run a single AHE campaign of ten iterations from the bash-only NexAU <sub>0</sub> seed (§3.1), with $k{=}2$ rollouts per task per iteration on Terminal-Bench 2, finishing in roughly 32 hours; the best resulting configuration is reported as AHE. The two self-evolve baselines ACE [^49] and TF-GRPO [^4] start from the same NexAU <sub>0</sub> seed.

##### AHE outperforms both human-designed and self-evolve baselines.

AHE outperforms every baseline on our panel: three human-designed harnesses, opencode [^2], terminus-2 [^10], and Codex-CLI [^25], and the two self-evolve baselines ACE and TF-GRPO. Figure 1 shows the gain accumulates across iterations, with continued evolution pushing pass@1 further above the NexAU <sub>0</sub> seed. By difficulty, the only exception is the Hard tier, where AHE marginally trails Codex-CLI. We trace this gap to interference between AHE’s components on long-horizon tasks rather than to a missing capability: swapping AHE’s long-term memory alone into the NexAU <sub>0</sub> seed, without the other AHE components, already surpasses Codex-CLI on Hard (§4.4.1).

##### Prompt-only self-evolution misses the components that carry AHE’s gain.

The gaps to ACE and TF-GRPO trace to a layer mismatch. ACE distills natural-language playbooks the agent reads in-context, and TF-GRPO is a trajectory-feedback variant of GRPO that reinforces successful tool sequences; starting from the same NexAU <sub>0</sub> seed as AHE, neither method opens the surrounding scaffolding to edits. AHE jointly evolves system prompt, tools, middleware, and long-term memory across iterations, and §4.4.1 quantifies which of these layers carries the improvement: swapping in AHE’s tools, middleware, or long-term memory alone yields $+3.3$, $+2.2$, and $+5.6$  pp, while the system prompt alone is $-2.3$  pp. The harness components ACE and TF-GRPO never edit are exactly where the gain lives.

### 4.3 RQ2: Transfer to Unseen Tasks and Base Models

AHE’s harness is evolved on Terminal-Bench 2 with GPT-5.4 high. We probe whether it encodes general coding-agent experience or overfits to that target by re-using the workspace as-is, without further evolution, in two off-target settings: a different task surface (SWE-bench-verified) and four alternate base models.

Table 2: Cross-benchmark transfer on SWE-bench-verified. ACE, TF-GRPO, and AHE share the NexAU <sub>0</sub> seed and differ only in their self-evolution loop; all four columns run on GPT-5.4. AHE and the two self-evolve baselines are evolved on Terminal-Bench 2 and evaluated without in-domain re-evolution. Per-column bold marks the best; ties are all bold.

<table><tbody><tr><td></td><th></th><th colspan="4">Success rate <math><semantics><mo>↑</mo> <annotation>\uparrow</annotation></semantics></math></th><th colspan="4">Tokens k <math><semantics><mo>↓</mo> <annotation>\downarrow</annotation></semantics></math></th></tr><tr><th>Repo</th><th><math><semantics><mi>N</mi> <annotation>N</annotation></semantics></math></th><th>ACE</th><th>TF-GRPO</th><th>NexAU <sub>0</sub></th><th>AHE</th><th>ACE</th><th>TF-GRPO</th><th>NexAU <sub>0</sub></th><th>AHE</th></tr><tr><th>All</th><th>500</th><th>74.6%</th><th>74.2%</th><th>75.2%</th><th>75.6%</th><th>679</th><th>582</th><th>526</th><th>461</th></tr><tr><td>django</td><td>231</td><td>79.2%</td><td>78.8%</td><td>79.2%</td><td>81.0%</td><td>707</td><td>583</td><td>527</td><td>484</td></tr><tr><td>sympy</td><td>75</td><td>69.3%</td><td>68.0%</td><td>70.7%</td><td>70.7%</td><td>602</td><td>572</td><td>494</td><td>479</td></tr><tr><td>sphinx-doc</td><td>44</td><td>61.4%</td><td>65.9%</td><td>68.2%</td><td>70.5%</td><td>990</td><td>848</td><td>731</td><td>656</td></tr><tr><td>matplotlib</td><td>34</td><td>70.6%</td><td>70.6%</td><td>73.5%</td><td>73.5%</td><td>622</td><td>530</td><td>486</td><td>391</td></tr><tr><td>scikit-learn</td><td>32</td><td>93.8%</td><td>93.8%</td><td>93.8%</td><td>87.5%</td><td>451</td><td>378</td><td>307</td><td>257</td></tr><tr><td>pydata</td><td>22</td><td>77.3%</td><td>77.3%</td><td>77.3%</td><td>72.7%</td><td>563</td><td>516</td><td>386</td><td>338</td></tr><tr><td>astropy</td><td>22</td><td>59.1%</td><td>59.1%</td><td>54.5%</td><td>50.0%</td><td>546</td><td>470</td><td>667</td><td>277</td></tr></tbody></table>

##### Cross-benchmark transfer.

We re-point the AHE harness at SWE-bench-verified against the seed and the two self-evolve baselines (NexAU <sub>0</sub>, ACE, TF-GRPO) under identical infrastructure (Table 2).

ACE and TF-GRPO both regress below the untouched NexAU <sub>0</sub> seed in aggregate success while spending $11\%$ to $29\%$ more tokens than the seed: the playbook ACE injects and the trajectory distribution TF-GRPO reinforces were distilled on terminal-bench traces and ride the prompt at every model call, so on a different task surface that text adds cost without reshaping the underlying policy.

AHE instead achieves the highest aggregate, with the seed-relative gain concentrating on django and sphinx-doc, the two largest and most token-expensive repositories whose multi-step edit-and-verify loop matches the structure AHE’s tools, middleware, and long-term memory compress on Terminal-Bench 2. Marginal regressions appear only on the three smallest repositories, consistent with pass@1 variance on small repos exceeding the per-repo gain. AHE also cuts aggregate tokens by $32\%$ against ACE, $21\%$ against TF-GRPO, and $12\%$ against the seed: encoding behavior in tools, middleware, and memory rather than in the prompt avoids the per-call re-derivation cost that prompt-only baselines pay.

![Refer to caption](https://arxiv.org/html/2604.25850v3/x3.png)

Figure 3: Cross-model transfer on Terminal-Bench 2, 89 tasks. The AHE workspace evolved on GPT-5.4 high is re-evaluated on each base without further evolution, paired against the NexAU 0 seed on the same base.

##### Cross-model transfer.

We re-evaluate both the NexAU <sub>0</sub> seed and AHE on the five alternate bases listed in §4.1. Figure 3 reports five positive pass@1 gains from $+2.3$ to $+10.1$  pp.

Cross-family gains dominate within-family ones: deepseek-v4-flash moves $+10.1$  pp from $51.7\%$ to $61.8\%$, qwen-3.6-plus $+6.3$  pp from $56.2\%$ to $62.5\%$, and gemini-3.1-flash-lite-preview $+5.1$  pp from $36.5\%$ to $41.6\%$, all above the $+2.3$  pp on GPT-5.4 medium and xhigh. We read this as bases further from saturation leaning more on the coordination patterns AHE has fixed inside tools, middleware, and long-term memory, while a stronger base re-derives the same coordination from its prompt at low marginal cost.

Within one family the profile is non-monotone: $+2.3$  pp on medium, $+7.3$  pp on high from §4.2, and $+2.3$  pp on xhigh. AHE’s step budget and per-task timeout were fitted to GPT-5.4 high during evolution; medium has more time-per-step slack but loses a reasoning tier of raw capability, while xhigh pushes more trials past the per-task timeout, which our pass@1 convention (§4.1) counts as failures. Either direction discounts the gain.

The load-bearing finding is that all five gains land positive: the AHE workspace is not specific to one provider’s idioms or one reasoning depth. Their magnitude tracks the evolution operating point rather than raw base capability, so we treat the timeout-budget coupling as a generalization hazard discussed in our [Limitations](#Sx1 "In Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses") section.

### 4.4 RQ3: Analysis

We analyze the loop along two architectural choices that §3 places weight on: decomposed components (§4.4.1) and self-declared attribution (§4.4.2).

#### 4.4.1 RQ3a: where value accumulates across components

Table 3: Component-level ablations on Terminal-Bench 2. Each “+ X only” row swaps a single AHE component into the NexAU <sub>0</sub> seed: long-term memory, tool set, middleware, or system prompt. Per-column best is bolded.

| Variant | All | Easy | Medium | Hard |
| --- | --- | --- | --- | --- |
|  | 89 tasks | 4 tasks | 55 tasks | 30 tasks |
| NexAU <sub>0</sub> | 69.7% | 87.5% | 78.2% | 51.7% |
| \+ memory only | 75.3% | 50.0% | 83.6% | 63.3% |
| \+ tool only | 73.0% | 75.0% | 87.3% | 46.7% |
| \+ middleware only | 71.9% | 100.0% | 81.8% | 50.0% |
| \+ system\_prompt only | 67.4% | 75.0% | 78.2% | 46.7% |
| AHE full | 77.0% | 100.0% | 88.2% | 53.3% |

Table 3 decomposes the AHE gain at the component level. Each “+ X only” row takes the NexAU <sub>0</sub> seed and swaps in one component from the fully evolved AHE configuration, namely long-term memory, tools, middleware, or system prompt, leaving the other three at their seed defaults. Three of the four single-component variants outperform the seed; the system-prompt swap is the only regression.

##### Each component owns a different failure surface.

Memory adds 12 boundary-case lessons (performance margin, queued-over-limit cancellation, evaluator-style closure, source-packaging layout); on Hard the lessons lift it above full AHE, while on Easy they reduce to superfluous re-verification. Tools become a 1364-line shell that auto-surfaces contract hints from files near each command; on Medium it lands within $0.9$  pp of full AHE, while on Hard a built-in publish guard closes the loop too early. Middleware adds a finish-hook that forces one evaluator-isomorphic closure check; on Easy it clears every task, while on Hard it inflates turn count. The system prompt encodes 79 lines of universal discipline whose executability depends on the other three; inserted alone it scores $-2.3$  pp aggregate.

##### Components interact non-additively, capping the aggregate gain.

The three positive single-component gains sum to $+11.1$  pp against full AHE’s $+7.3$  pp, and on Hard the memory-only variant exceeds full AHE: memory, middleware, and the system prompt all push toward the same closure-style verification, so stacking them spends turns on redundant re-checks within the long-horizon budget. Since the evolve agent optimises an aggregate dominated by 55 Medium tasks, it converges to a Medium-heavy trade-off that returns part of the Hard memory effect, and we leave interaction-aware evolution to future work.

#### 4.4.2 RQ3b: how reliably the loop’s self-attribution tracks reality

Each evolution round, our evolve model produces a change manifest naming which Terminal-Bench 2 tasks it expects to fix in the next round and which it flags at risk of regression. We compare the round- $N{-}1$ prediction against the round- $N$ ground truth, computing standard precision and recall over the 89 tasks separately for fixes and regressions.

##### Evidence-driven targeting.

The fix panel of Figure˜4 shows the evolve model’s targeting is evidence-driven rather than guesswork. Cross-iteration fix-precision of 33.7% and fix-recall of 51.4% sit roughly 5x above the random-prediction baselines of 6.5% and 10.6%, so each harness edit lands on a real, agent-anticipated target rather than on an arbitrary subset of the panel.

![Refer to caption](https://arxiv.org/html/2604.25850v3/x4.png)

Figure 4: Cross-iteration mean precision and recall of the evolve model’s self-predictions across 9 evaluation rounds of the GPT-5.4 AHE loop on Terminal-Bench 2, alongside the random-prediction baseline. Left: fix predictions. Right: regression predictions.

##### Regression blindness.

The regression panel tells the opposite story: cross-iteration regression-precision of 11.8% and regression-recall of 11.1% sit only about 2x above their random baselines of 5.6% and 5.4%, so most upcoming regressions go unforeseen. The agent can justify why an edit should help, but it cannot reliably name the tasks the same edit is about to break, which is what produces the non-monotone steps in the evolution curve of §4.2. Closing this gap is the clearest direction for future self-evolution loops. Appendix˜D gives the per-round breakdown.

## 5 Conclusion

We introduced Agentic Harness Engineering (AHE), an observability-driven loop that turns a coding agent’s harness into a learnable adaptation surface while the base model remains fixed. AHE exposes components as files, distills rollouts into a layered evidence corpus, and binds each edit to a falsifiable next-round prediction; ten iterations lift pass@1 on Terminal-Bench 2 from 69.7% to 77.0%, and the frozen harness transfers to SWE-bench-verified and three alternate model families. We see harness-level evolution as a complementary axis to model-side training: an externalized, auditable surface where coding-agent experience can accumulate.

## Limitations

This work studies a promising but high-variance setting, and the scope of our claims should be interpreted accordingly.

##### Benchmark scope.

Our evaluation drives evolution on Terminal-Bench 2 and probes transfer on SWE-bench-verified. Even though the frozen harness transfers to a second task surface and to three alternate base-model families, broader programming languages, repository-scale deployments, and human-in-the-loop workflows remain untested.

##### Evolution operating point.

AHE’s step budget and per-task timeout were fitted to GPT-5.4 high during evolution, so cross-model transfer numbers conflate harness portability with operating-point coupling—within one family the gain is non-monotone across reasoning tiers (§4.3). Untangling these factors will require re-running the loop under multiple operating points.

##### Self-modification governance.

AHE bounds edits to a workspace, attributes every change in a versioned manifest, and rolls back ineffective edits at file granularity, but it does not provide a complete guardrail stack. Long-horizon harness cleanup and stronger misuse prevention remain incomplete, and AHE should be viewed as a controlled research prototype rather than a fully mature autonomous self-improvement system.

## References

## Appendix A Experimental Setup: Full Details

This appendix expands the condensed Setup in §4.1 with the formal metric definitions and the runtime infrastructure.

##### Seed agent.

The seed configuration, denoted NexAU <sub>0</sub>, is a simple code agent built on the NexAU framework of §3.1 that exposes only the bash tool to the model, with no skills, no middleware, and no long-term memory. Every iteration of the AHE outer loop edits this workspace, so all reported gains are measured against NexAU <sub>0</sub> as the common starting point.

##### Runtime infrastructure.

All runs use the NexAU framework of §3.1 to instantiate the coding agent. Harbor dispatches tasks, isolates each rollout, and verifies pass/fail. Every rollout runs inside a fresh E2B remote sandbox, so shell side-effects cannot leak between tasks. InMemoryTracer records trajectories and mirrors them to Langfuse. The Agent Debugger executes at concurrency 16 with a 600-second per-task timeout.

##### Terminal-bench difficulty labels.

The official terminal-bench-2 leaderboard <sup>0</sup> partitions the 89-task subset into 4 easy, 55 medium, and 30 hard tasks.

##### pass@1.

For a configuration on a task set $D$ with $k$ rollouts per task, let $r_{i,j}\in\{0,1\}$ denote the binary reward of rollout $j$ on task $i$. The pass@1 score is the mean

$$
\mathrm{pass@1}=\frac{1}{k|D|}\sum_{i=1}^{|D|}\sum_{j=1}^{k}r_{i,j}.
$$

Trials that terminate on an infrastructure exception, such as a sandbox crash or API timeout, contribute $r=0$ rather than being dropped, a strictly harsher convention than discarding failures that keeps our numbers comparable to the official terminal-bench leaderboard. The rollout count $k$ varies across experiments; each table states it explicitly.

##### Token cost and Succ/Mtok.

For token cost we count every LLM call as prompt plus completion across the rollout and report the mean over completed trials in thousands, denoted Tokens k; infrastructure-aborted trials are excluded to avoid truncated figures. To compare configurations that trade accuracy for cost we combine the two via

$$
\mathrm{Succ/Mtok}=\frac{\mathrm{pass@1}\times 10^{6}}{\mathrm{mean\ tokens\ per\ trial}},
$$

the expected number of successes per million tokens. The main paper reports pass@1 and Tokens k separately so each axis stays legible; Table 4 folds them into Succ/Mtok per repository on SWE-bench-verified, derived from the pass@1 and Tokens k columns of Table 2.

Table 4: Cost-efficiency on SWE-bench-verified, reported as Succ/Mtok, the expected successes per million tokens. Values are derived from Table 2 as $\mathrm{pass@1}\times 10^{3}/\text{Tokens k}$. Higher is better. Per-row bold marks the best.

| Repo | $N$ | ACE | TF-GRPO | NexAU <sub>0</sub> | AHE |
| --- | --- | --- | --- | --- | --- |
| All | 500 | 1.10 | 1.27 | 1.43 | 1.64 |
| django | 231 | 1.12 | 1.35 | 1.50 | 1.67 |
| sympy | 75 | 1.15 | 1.19 | 1.43 | 1.48 |
| sphinx-doc | 44 | 0.62 | 0.78 | 0.93 | 1.07 |
| matplotlib | 34 | 1.14 | 1.33 | 1.51 | 1.88 |
| scikit-learn | 32 | 2.08 | 2.48 | 3.06 | 3.40 |
| pydata | 22 | 1.37 | 1.50 | 2.00 | 2.15 |
| astropy | 22 | 1.08 | 1.26 | 0.82 | 1.81 |

## Appendix B Prompts and Configurations

This appendix gathers the prompts that drive the AHE outer loop together with the seed code agent’s system prompt. The five blocks below reproduce the literal contents of the corresponding files in the public repository at [https://github.com/china-qijizhifeng/agentic-harness-engineering](https://github.com/china-qijizhifeng/agentic-harness-engineering) as of the commit that produced the experiments in Section 4. Jinja-style {{ var }} placeholders are filled in by the harness at runtime.

### B.1 Code Agent Seed System Prompt

The seed system prompt loaded into NexAU <sub>0</sub> at iteration 1. It is intentionally minimal: a single tool, three behavioral rules, and three runtime-injected variables. Every iteration after iteration 1 may append rules to this file, and the case study in Appendix C traces the first such append.

<svg id="A2.SS1.p2.pic1" height="3849.99" overflow="visible" version="1.1" viewBox="0 0 600 3849.99" width="600"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,3849.99) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 3845.23 C 0 3847.86 2.13 3849.99 4.77 3849.99 L 595.23 3849.99 C 597.87 3849.99 600 3847.86 600 3845.23 L 600 4.77 C 600 2.13 597.87 0 595.23 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F8FCFF;" fill="#F8FCFF" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 3482.06 L 599.17 3482.06 L 599.17 4.77 C 599.17 2.59 597.41 0.83 595.23 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 3482.89 L 0.83 3845.23 C 0.83 3847.4 2.59 3849.16 4.77 3849.16 L 595.23 3849.16 C 597.41 3849.16 599.17 3847.4 599.17 3845.23 L 599.17 3482.89 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 3841.05)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:41.87em;--ltx-fo-height:0.3em;--ltx-fo-depth:25.6em;" width="579.4" height="358.4" transform="matrix(1 0 0 -1 0 4.17)" overflow="visible" color="#FFFFFF"><span id="A2.SS1.p2.pic1.1.1.1.1.1" style="width:46.21em;"><span id="A2.SS1.p2.pic1.1.1.1.1.1.1"><span id="A2.SS1.p2.pic1.1.1.1.1.1.1.1" style="font-size:70%;">code_agent_simple/systemprompt.md</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 3465.06)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:41.87em;--ltx-fo-height:0.64em;--ltx-fo-depth:249.77em;" width="579.4" height="3465.05" transform="matrix(1 0 0 -1 0 8.92)" overflow="visible" color="#000000"><span id="A2.SS1.p2.pic1.2.2.2.1.1" style="width:41.87em;"><span id="A2.SS1.p2.pic1.2.2.2.1.1.1"><a href="data:text/plain;base64,WW91IHNvbHZlIHNvZnR3YXJlIHRhc2tzIGluIGEgbm9uLWludGVyYWN0aXZlIHNldHRpbmcuIFlvdXIgb25seSB0b29sIGlzICoqYHJ1bl9zaGVsbF9jb21tYW5kYCoqOiB1c2UgdGhlIHNoZWxsIHRvIGluc3BlY3QgdGhlIHJlcG8sIGVkaXQgZmlsZXMsIHJ1biBidWlsZHMvdGVzdHMsIGFuZCBmaW5pc2ggdGhlIHdvcmsuIERvIG5vdCBhc2sgdGhlIHVzZXIgcXVlc3Rpb25zLgoKLSBQcmVmZXIgc2hvcnQgcmVwbGllczsgdXNlIHRoZSB0b29sIGZvciBhY3Rpb25zLgotIEJlZm9yZSBjb21tYW5kcyB0aGF0IGRlbGV0ZSBvciBvdmVyd3JpdGUgaW1wb3J0YW50IGRhdGEsIHN0YXRlIGJyaWVmbHkgd2hhdCB0aGV5IGRvLgotIExvbmctcnVubmluZyBwcm9jZXNzZXM6IHVzZSBgaXNfYmFja2dyb3VuZDogdHJ1ZWAgb24gYHJ1bl9zaGVsbF9jb21tYW5kYCAoZG8gbm90IHVzZSBgJmAgaW4gdGhlIGNvbW1hbmQgc3RyaW5nKS4KCkRhdGU6IHt7IGRhdGUgfX0KVXNlcm5hbWU6IHt7IHVzZXJuYW1lIH19CldvcmtpbmcgRGlyOiB7eyB3b3JraW5nX2RpcmVjdG9yeSB9fQ==" download="">⬇</a> <span id="lstnumberx1"><span id="lstnumberx1.1" style="font-size:70%;">You</span> <span id="lstnumberx1.3" style="font-size:70%;">solve</span> <span id="lstnumberx1.5" style="font-size:70%;">software</span> <span id="lstnumberx1.7" style="font-size:70%;">tasks</span> <span id="lstnumberx1.9" style="font-size:70%;">in</span> <span id="lstnumberx1.11" style="font-size:70%;">a</span> <span id="lstnumberx1.13" style="font-size:70%;">non</span> <span id="lstnumberx1.14" style="font-size:70%;">-</span> <span id="lstnumberx1.15" style="font-size:70%;">interactive</span> <span id="lstnumberx1.17" style="font-size:70%;">setting</span><span id="lstnumberx1.18" style="font-size:70%;">.</span><span id="lstnumberx1.20" style="font-size:70%;">Your</span> <span id="lstnumberx1.22" style="font-size:70%;">only</span> <span id="lstnumberx1.24" style="font-size:70%;">tool</span> <span id="lstnumberx1.26" style="font-size:70%;">is</span> <span id="lstnumberx1.28" style="font-size:70%;">**`</span> <span id="lstnumberx1.29" style="font-size:70%;">run_shell_command</span> <span id="lstnumberx1.30" style="font-size:70%;">`**:</span><span id="lstnumberx1.32" style="font-size:70%;">use</span> <span id="lstnumberx1.34" style="font-size:70%;">the</span> <span id="lstnumberx1.36" style="font-size:70%;">shell</span> <span id="lstnumberx1.38" style="font-size:70%;">to</span> <span id="lstnumberx1.40" style="font-size:70%;">inspect</span> <span id="lstnumberx1.42" style="font-size:70%;">the</span> <span id="lstnumberx1.44" style="font-size:70%;">repo</span><span id="lstnumberx1.45" style="font-size:70%;">,</span><span id="lstnumberx1.47" style="font-size:70%;">edit</span> <span id="lstnumberx1.49" style="font-size:70%;">files</span><span id="lstnumberx1.50" style="font-size:70%;">,</span><span id="lstnumberx1.52" style="font-size:70%;">run</span> <span id="lstnumberx1.54" style="font-size:70%;">builds</span> <span id="lstnumberx1.55" style="font-size:70%;">/</span> <span id="lstnumberx1.56" style="font-size:70%;">tests</span><span id="lstnumberx1.57" style="font-size:70%;">,</span><span id="lstnumberx1.59" style="font-size:70%;">and</span> <span id="lstnumberx1.61" style="font-size:70%;">finish</span> <span id="lstnumberx1.63" style="font-size:70%;">the</span> <span id="lstnumberx1.65" style="font-size:70%;">work</span><span id="lstnumberx1.66" style="font-size:70%;">.</span><span id="lstnumberx1.68" style="font-size:70%;">Do</span> <span id="lstnumberx1.70" style="font-size:70%;">not</span> <span id="lstnumberx1.72" style="font-size:70%;">ask</span> <span id="lstnumberx1.74" style="font-size:70%;">the</span> <span id="lstnumberx1.76" style="font-size:70%;">user</span> <span id="lstnumberx1.78" style="font-size:70%;">questions</span><span id="lstnumberx1.79" style="font-size:70%;">.</span></span> <span id="lstnumberx3"><span id="lstnumberx3.1" style="font-size:70%;">-</span> <span id="lstnumberx3.3" style="font-size:70%;">Prefer</span> <span id="lstnumberx3.5" style="font-size:70%;">short</span> <span id="lstnumberx3.7" style="font-size:70%;">replies</span><span id="lstnumberx3.8" style="font-size:70%;">;</span><span id="lstnumberx3.10" style="font-size:70%;">use</span> <span id="lstnumberx3.12" style="font-size:70%;">the</span> <span id="lstnumberx3.14" style="font-size:70%;">tool</span> <span id="lstnumberx3.16" style="font-size:70%;">for</span> <span id="lstnumberx3.18" style="font-size:70%;">actions</span><span id="lstnumberx3.19" style="font-size:70%;">.</span></span> <span id="lstnumberx4"><span id="lstnumberx4.1" style="font-size:70%;">-</span> <span id="lstnumberx4.3" style="font-size:70%;">Before</span> <span id="lstnumberx4.5" style="font-size:70%;">commands</span> <span id="lstnumberx4.7" style="font-size:70%;">that</span> <span id="lstnumberx4.9" style="font-size:70%;">delete</span> <span id="lstnumberx4.11" style="font-size:70%;">or</span> <span id="lstnumberx4.13" style="font-size:70%;">overwrite</span> <span id="lstnumberx4.15" style="font-size:70%;">important</span> <span id="lstnumberx4.17" style="font-size:70%;">data</span><span id="lstnumberx4.18" style="font-size:70%;">,</span><span id="lstnumberx4.20" style="font-size:70%;">state</span> <span id="lstnumberx4.22" style="font-size:70%;">briefly</span> <span id="lstnumberx4.24" style="font-size:70%;">what</span> <span id="lstnumberx4.26" style="font-size:70%;">they</span> <span id="lstnumberx4.28" style="font-size:70%;">do</span><span id="lstnumberx4.29" style="font-size:70%;">.</span></span> <span id="lstnumberx5"><span id="lstnumberx5.1" style="font-size:70%;">-</span> <span id="lstnumberx5.3" style="font-size:70%;">Long</span> <span id="lstnumberx5.4" style="font-size:70%;">-</span> <span id="lstnumberx5.5" style="font-size:70%;">running</span> <span id="lstnumberx5.7" style="font-size:70%;">processes</span><span id="lstnumberx5.8" style="font-size:70%;">:</span><span id="lstnumberx5.10" style="font-size:70%;">use</span> <span id="lstnumberx5.12" style="font-size:70%;">`</span> <span id="lstnumberx5.13" style="font-size:70%;">is_background</span><span id="lstnumberx5.14" style="font-size:70%;">:</span><span id="lstnumberx5.16" style="font-size:70%;">true</span> <span id="lstnumberx5.17" style="font-size:70%;">`</span> <span id="lstnumberx5.19" style="font-size:70%;">on</span> <span id="lstnumberx5.21" style="font-size:70%;">`</span> <span id="lstnumberx5.22" style="font-size:70%;">run_shell_command</span> <span id="lstnumberx5.23" style="font-size:70%;">`</span> <span id="lstnumberx5.25" style="font-size:70%;">(</span><span id="lstnumberx5.26" style="font-size:70%;">do</span> <span id="lstnumberx5.28" style="font-size:70%;">not</span> <span id="lstnumberx5.30" style="font-size:70%;">use</span> <span id="lstnumberx5.32" style="font-size:70%;">`&amp;`</span> <span id="lstnumberx5.34" style="font-size:70%;">in</span> <span id="lstnumberx5.36" style="font-size:70%;">the</span> <span id="lstnumberx5.38" style="font-size:70%;">command</span> <span id="lstnumberx5.40" style="font-size:70%;">string</span><span id="lstnumberx5.41" style="font-size:70%;">).</span></span> <span id="lstnumberx7"><span id="lstnumberx7.1" style="font-size:70%;">Date</span><span id="lstnumberx7.2" style="font-size:70%;">:</span><span id="lstnumberx7.4" style="font-size:70%;">{{</span> <span id="lstnumberx7.6" style="font-size:70%;">date</span> <span id="lstnumberx7.8" style="font-size:70%;">}}</span> </span><span id="lstnumberx8"><span id="lstnumberx8.1" style="font-size:70%;">Username</span><span id="lstnumberx8.2" style="font-size:70%;">:</span><span id="lstnumberx8.4" style="font-size:70%;">{{</span> <span id="lstnumberx8.6" style="font-size:70%;">username</span> <span id="lstnumberx8.8" style="font-size:70%;">}}</span> </span><span id="lstnumberx9"><span id="lstnumberx9.1" style="font-size:70%;">Working</span> <span id="lstnumberx9.3" style="font-size:70%;">Dir</span><span id="lstnumberx9.4" style="font-size:70%;">:</span><span id="lstnumberx9.6" style="font-size:70%;">{{</span> <span id="lstnumberx9.8" style="font-size:70%;">working_directory</span> <span id="lstnumberx9.10" style="font-size:70%;">}}</span></span></span></span></foreignObject></g></g></svg>

### B.2 Evolve Agent Prompt

The Evolve Agent’s system prompt encodes the three hard contracts described in Section 3: workspace-only controllability, evidence-driven changes, and the change-manifest deliverable. It also embeds the directory layout the agent must reason over and the JSON shape of the manifest.

<svg id="A2.SS2.p2.pic1" height="85283.51" overflow="visible" version="1.1" viewBox="0 0 600 85283.51" width="600"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,85283.51) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 85278.74 C 0 85281.37 2.13 85283.51 4.77 85283.51 L 595.23 85283.51 C 597.87 85283.51 600 85281.37 600 85278.74 L 600 4.77 C 600 2.13 597.87 0 595.23 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F8FCFF;" fill="#F8FCFF" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 84959.85 L 599.17 84959.85 L 599.17 4.77 C 599.17 2.59 597.41 0.83 595.23 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 84960.68 L 0.83 85278.74 C 0.83 85280.91 2.59 85282.68 4.77 85282.68 L 595.23 85282.68 C 597.41 85282.68 599.17 85280.91 599.17 85278.74 L 599.17 84960.68 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 22666.37)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:41.87em;--ltx-fo-height:0.3em;--ltx-fo-depth:22.4em;" width="579.4" height="314.12" transform="matrix(1 0 0 -1 0 4.17)" overflow="visible" color="#FFFFFF"><span id="A2.SS2.p2.pic1.1.1.1.1.1" style="width:46.21em;"><span id="A2.SS2.p2.pic1.1.1.1.1.1.1"><span id="A2.SS2.p2.pic1.1.1.1.1.1.1.1" style="font-size:70%;">evolve_agent/evolve_prompt.md</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 22661.62)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:41.87em;--ltx-fo-height:0.64em;--ltx-fo-depth:6138.17em;" width="579.4" height="84942.85" transform="matrix(1 0 0 -1 0 8.92)" overflow="visible" color="#000000"><span id="A2.SS2.p2.pic1.2.2.2.1.1" style="width:41.87em;"><span id="A2.SS2.p2.pic1.2.2.2.1.1.1"><a href="data:text/plain;base64,eyUgc2V0IHdzID0gd29ya3NwYWNlX3BhdGggaWYgd29ya3NwYWNlX3BhdGggaXMgZGVmaW5lZCBlbHNlICJ3b3Jrc3BhY2UiICV9CllvdSBhcmUgdGhlIE5leEFVIEV2b2x1dGlvbiBFbmdpbmUgLS0gYSBtZXRhLWFnZW50IHRoYXQgaXRlcmF0ZXMgb24gYSBjb2RpbmcgYWdlbnQncyBoYXJuZXNzIHRvIG1heGltaXplICoqcGFzc0AxKiogKHNpbmdsZS1hdHRlbXB0IHN1Y2Nlc3MgcmF0ZSkgdGhyb3VnaCBldmlkZW5jZS1iYXNlZCBleHBlcmltZW50YXRpb24uIFlvdSBtYXkgbW9kaWZ5IGV4aXN0aW5nIGNvbXBvbmVudHMgb3IgY3JlYXRlIG5ldyBvbmVzICh0b29scywgbWlkZGxld2FyZSwgc2tpbGxzLCBzdWItYWdlbnRzLCBldGMuKSBhcyBuZWVkZWQuCgoKIyBDb3JlIFByaW5jaXBsZXMKCiMjIDEuIENvbnRyb2xsYWJpbGl0eQoKT25seSBgd29ya3NwYWNlL2AgaXMgeW91ciBwbGF5Z3JvdW5kLiBFdmVyeXRoaW5nIGVsc2UgaXMgcmVhZC1vbmx5IG9yIG9mZi1saW1pdHMuCgotIE1vZGlmeSBPTkxZIGZpbGVzIHVuZGVyIGB3b3Jrc3BhY2UvYAotIGBydW5zL2AgaXMgUkVBRCBPTkxZIC0tIHVzZSBpdCBmb3IgYW5hbHlzaXMsIG5ldmVyIHdyaXRlIHRvIGl0Ci0gRG8gTk9UIG1vZGlmeSBMTE0gY29uZmlnLCB0cmFjZXIsIHZlcmlmaWVyLCBvciBhbnkgaW5mcmFzdHJ1Y3R1cmUKLSBEbyBOT1QgZGVsZXRlIE9SSUdJTkFMIHN5c3RlbSBwcm9tcHQgcnVsZXMgKHRob3NlIGluIGl0ZXJhdGlvbiAxJ3MgYGlucHV0L3dvcmtzcGFjZS9gKQotIEZ1bGwgc2FmZXR5IGNvbnN0cmFpbnRzIGFyZSBhdCB0aGUgZW5kIG9mIHRoaXMgZG9jdW1lbnQKCiMjIDIuIEV2aWRlbmNlLURyaXZlbgoKKipFdmVyeSBjaGFuZ2UgbXVzdCBiZSB0cmFjZWFibGUgdG8gc3BlY2lmaWMgZmFpbHVyZSBldmlkZW5jZS4qKiBEbyBub3QgbWFrZSBjaGFuZ2VzIGJhc2VkIG9uIGludHVpdGlvbiwgc3BlY3VsYXRpb24sIG9yICJiZXN0IHByYWN0aWNlcyIgYWxvbmUuCgoqKkJlZm9yZSBtYWtpbmcgYW55IGNoYW5nZSwgeW91IG11c3QgaGF2ZToqKgoxLiAqKkZhaWx1cmUgZXZpZGVuY2UqKiAtLSB3aGljaCB0YXNrcyBmYWlsZWQsIGFuZCB3aGF0IHNwZWNpZmljYWxseSB3ZW50IHdyb25nIChmcm9tIGFuYWx5c2lzIHJlcG9ydHMgb3IgdHJhY2VzKQoyLiAqKlJvb3QgY2F1c2UqKiAtLSB3aHkgaXQgZmFpbGVkLCBub3QganVzdCB3aGF0IGZhaWxlZAozLiAqKlRhcmdldGVkIGZpeCoqIC0tIGEgY2hhbmdlIHRoYXQgZGlyZWN0bHkgYWRkcmVzc2VzIHRoZSByb290IGNhdXNlCjQuICoqUHJlZGljdGVkIGltcGFjdCoqIC0tIHdoaWNoIHRhc2tzIHRoaXMgc2hvdWxkIGZpeCwgYW5kIHdoaWNoIHRhc2tzIG1pZ2h0IGJlIGF0IHJpc2sKCgojIEVudmlyb25tZW50Cgp7JSBpZiB3cyAhPSAid29ya3NwYWNlIiAlfQo+ICoqV09SS1NQQUNFIFBBVEgqKjogWW91ciB3b3Jrc3BhY2UgaXMgYXQgYHt7IHdzIH19L2AgaW5zdGVhZCBvZiBgd29ya3NwYWNlL2AuIEFsbCBgd29ya3NwYWNlL2AgcmVmZXJlbmNlcyBiZWxvdyBhcHBseSB0byBge3sgd3MgfX0vYC4gVXNlIGB7eyB3cyB9fS9gIGluIGZpbGUgb3BlcmF0aW9ucywgZ2l0IGNvbW1hbmRzLCBhbmQgdGhlIHZhbGlkYXRpb24gY29tbWFuZC4KeyUgZW5kaWYgJX0KCj4gKipMb29wIGNvbnZlbnRpb24gKElNUE9SVEFOVCAtLSByZWFkIGJlZm9yZSBhbmFseXppbmcgYHJ1bnMvYCk6KioKPiBZb3UgYXJlIGN1cnJlbnRseSBpbiBsb29wICoqaXRlcmF0aW9uIGB7eyBpdGVyYXRpb24gfX1gKiouIEVhY2ggYHJ1bnMvaXRlcmF0aW9uX05OTi9gIGZvbGRlciBtaXhlcyAqKnR3byoqIGdlbmVyYXRpb25zIG9mIHdvcms6Cj4gLSBgaW5wdXQvYCBob2xkcyB3aGF0ICoqdGhlIHByZXZpb3VzIGxvb3AgKE5OTi0xKSoqIHByb2R1Y2VkIC0tIHRoaXMgaXMgdGhlIHdvcmtzcGFjZSB0aGF0IHdhcyBqdXN0IGV2YWx1YXRlZCB0aGlzIGxvb3AuIFRoZSBiZW5jaG1hcmssIGFuYWx5c2lzLCBhbmQgY2hhbmdlX2V2YWx1YXRpb24gaW5zaWRlIGBpbnB1dC9gIGFsbCBkZXNjcmliZSB0aGUgKipwcmV2aW91cyBsb29wJ3MqKiBjaGFuZ2VzLCBub3QgeW91cnMuCj4gLSBgZXZvbHZlL2AgaG9sZHMgd2hhdCAqKnRoaXMgbG9vcCAoTk5OKSoqIHdpbGwgcHJvZHVjZSAtLSB5b3VyIG5ldyBjaGFuZ2VzLCB3aGljaCB0aGUgbmV4dCBsb29wIChOTk4rMSkgd2lsbCBldmFsdWF0ZS4KPgo+IENvbmNyZXRlbHk6IHdoZW4geW91ciBxdWVyeSBzYXlzICJJdGVyYXRpb24ge3sgaXRlcmF0aW9uIH19IGV2YWx1YXRpb24gY29tcGxldGVkIiwgaXQgbWVhbnMgdGhlIGV2YWwgb2YgKippdGVyYXRpb24ge3sgaXRlcmF0aW9uIC0gMSB9fSdzIGNoYW5nZXMqKiBpcyBkb25lIChiYXNlbGluZSBpZiBge3sgaXRlcmF0aW9uIH19YCA9IDEpLiBZb3UgYXJlIG5vdyBtYWtpbmcgY2hhbmdlcyB0aGF0IHdpbGwgYmUgbGFiZWxlZCBpdGVyYXRpb24gYHt7IGl0ZXJhdGlvbiB9fWAgYW5kIGV2YWx1YXRlZCBuZXh0IGxvb3AuCgpgYGAKLi8gICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIyB3b3JrX2RpciA9IGV4cGVyaW1lbnQgcm9vdAp8LS0ge3sgd3MgfX0vICAgICAgICAgICAgICAgICAgICAgICAgICAjICogTU9ESUZZIHRoZXNlIGZpbGVzCnwgICB8LS0gY29kZV9hZ2VudC55YW1sICAgICAgICAgICAgICAgICMgQWdlbnQgY29uZmlnICh0b29scywgbWlkZGxld2FyZSwgcGFyYW1zLCBzdWItYWdlbnRzKQp8ICAgfC0tIHN5c3RlbXByb21wdC5tZCAgICAgICAgICAgICAgICAjIFN5c3RlbSBwcm9tcHQgKEppbmphIHRlbXBsYXRlKQp8ICAgfC0tIExvbmdUZXJtTUVNT1JZLm1kICAgICAgICAgICAgICAjIExvbmctdGVybSBtZW1vcnkgKHBlcnNpc3RlbnQgY3Jvc3Mtc2Vzc2lvbiBrbm93bGVkZ2UsIE1PRElGSUFCTEUpCnwgICB8LS0gU2hvcnRUZXJtTUVNT1JZLm1kICAgICAgICAgICAgICMgU2hvcnQtdGVybSBtZW1vcnkgKG1hbmFnZWQgYnkgY29kZSBhZ2VudCBhdCBydW50aW1lLCBETyBOT1QgTU9ESUZZKQp8ICAgfC0tIHRvb2xfZGVzY3JpcHRpb25zLyAgICAgICAgICAgICAjIFRvb2wgWUFNTCBkZWZpbml0aW9ucwp8ICAgfC0tIHRvb2xzLyAgICAgICAgICAgICAgICAgICAgICAgICAjIFRvb2wgUHl0aG9uIGltcGxlbWVudGF0aW9ucwp8ICAgfC0tIG1pZGRsZXdhcmUvICAgICAgICAgICAgICAgICAgICAjIE1pZGRsZXdhcmUgUHl0aG9uIGltcGxlbWVudGF0aW9ucwp8ICAgfC0tIHNraWxscy8gICAgICAgICAgICAgICAgICAgICAgICAjIFNraWxsIHBhY2thZ2VzCnwgICBgLS0gc3ViX2FnZW50cy8gICAgICAgICAgICAgICAgICAgICMgU3ViLWFnZW50IGNvbmZpZ3MgKG9wdGlvbmFsLCB5b3UgbWF5IGNyZWF0ZSkKfAp8LS0gcnVucy8gICAgICAgICAgICAgICAgICAgICAgICAgICAgICAjICogUkVBRCBPTkxZCnwgICBgLS0gaXRlcmF0aW9uX05OTi8KfCAgICAgICB8LS0gaW5wdXQvICAgICAgICAgICAgICAgICAgICAgIyBFdmVyeXRoaW5nIHRoaXMgaXRlcmF0aW9uIHN0YXJ0cyB3aXRoCnwgICAgICAgfCAgIHwtLSB3b3Jrc3BhY2UvICAgICAgICAgICAgICMgV29ya3NwYWNlIGJlaW5nIGV2YWx1YXRlZCB0aGlzIGxvb3AKfCAgICAgICB8ICAgfC0tIGJlbmNobWFyay8gICAgICAgICAgICAgIyBFdmFsIHJlc3VsdHMgZm9yIHRoZSB3b3Jrc3BhY2UgYWJvdmUKfCAgICAgICB8ICAgfCAgIGAtLSB7dGltZXN0YW1wfS8KfCAgICAgICB8ICAgfCAgICAgICB8LS0gcmVzdWx0Lmpzb24KfCAgICAgICB8ICAgfCAgICAgICBgLS0ge3Rhc2tfbmFtZX1fX3tpZH0vCnwgICAgICAgfCAgIHwgICAgICAgICAgIHwtLSBhZ2VudC9uZXhhdS50eHQKfCAgICAgICB8ICAgfCAgICAgICAgICAgfC0tIGFnZW50L25leGF1X2luX21lbW9yeV90cmFjZXIuY2xlYW5lZC5qc29uCnwgICAgICAgfCAgIHwgICAgICAgICAgIGAtLSB2ZXJpZmllci9yZXdhcmQudHh0CnwgICAgICAgfCAgIHwtLSBhbmFseXNpcy8gICAgICAgICAgICAgICMgKiogUHJlLWJ1aWx0IGZhaWx1cmUvc3VjY2VzcyBhbmFseXNpcyAoUkVBRCBUSElTIEZJUlNUKQp8ICAgICAgIHwgICB8ICAgfC0tIG92ZXJ2aWV3Lm1kCnwgICAgICAgfCAgIHwgICBgLS0gZGV0YWlsL3t0YXNrX25hbWV9Lm1kCnwgICAgICAgfCAgIHwtLSB2YXJpYW50X3NlbGVjdGlvbi5qc29uCnwgICAgICAgfCAgIGAtLSBjaGFuZ2VfZXZhbHVhdGlvbi5qc29uCnwgICAgICAgYC0tIGV2b2x2ZS8gICAgICAgICAgICAgICAgICAgICMgWU9VUiBvdXRwdXRzIHRoaXMgbG9vcAp8ICAgICAgICAgICB8LS0gZXZvbHZlX3N1bW1hcnkubWQKfCAgICAgICAgICAgfC0tIGNoYW5nZV9tYW5pZmVzdC5qc29uCnwgICAgICAgICAgIGAtLSB2YXJpYW50X04vCnwgICAgICAgICAgICAgICB8LS0gd29ya3NwYWNlLwp8ICAgICAgICAgICAgICAgYC0tIGV2b2x2ZV90cmFjZS5qc29uCnwKfC0tIGV2b2x1dGlvbl9oaXN0b3J5Lm1kICAgICAgICAgICAgICAgIyBDdW11bGF0aXZlIGhpc3Rvcnkgb2YgYWxsIGl0ZXJhdGlvbnMgKFJFQUQpCmAtLSBjb25maWdfc25hcHNob3QueWFtbCAgICAgICAgICAgICAgICMgSW5pdGlhbCBjb25maWcgKFJFQUQgT05MWSkKYGBgCgoKIyBDb21wb25lbnRzCgojIyBBdmFpbGFibGUgQ29tcG9uZW50IFR5cGVzCgp8IENvbXBvbmVudCB8IEZpbGVzIHwgQ2hhcmFjdGVyaXN0aWNzIHwgV2hlbiB0byB1c2UgfAp8LS0tLS0tLS0tLS18LS0tLS0tLXwtLS0tLS0tLS0tLS0tLS0tfC0tLS0tLS0tLS0tLS18CnwgKipTeXN0ZW0gUHJvbXB0KiogfCBgd29ya3NwYWNlL3N5c3RlbXByb21wdC5tZGAgfCBBZHZpc29yeSAtLSBhcHBsaWVzIHRvIGFsbCB0YXNrcyB8IEJlaGF2aW9yYWwgcnVsZXMsIHdvcmtmbG93IGd1aWRhbmNlIHwKfCAqKlRvb2wgRGVzY3JpcHRpb24qKiB8IGB3b3Jrc3BhY2UvdG9vbF9kZXNjcmlwdGlvbnMvKi50b29sLnlhbWxgIHwgQ28tbG9jYXRlZCB3aXRoIHRvb2wgLS0gbW9kZWwgcmVhZHMgd2hlbiBjYWxsaW5nIHwgQ2xhcmlmeSB0b29sIHVzYWdlLCBhZGQgZXhhbXBsZXMsIHdhcm4gYWJvdXQgcGl0ZmFsbHMgfAp8ICoqVG9vbCBJbXBsZW1lbnRhdGlvbioqIHwgYHdvcmtzcGFjZS90b29scy9gIHwgQ29udHJvbHMgdG9vbCBiZWhhdmlvciBkaXJlY3RseSB8IE5ldyBjYXBhYmlsaXRpZXMsIHNtYXJ0ZXIgZXJyb3IgaGFuZGxpbmcsIG91dHB1dCBmb3JtYXR0aW5nIHwKfCAqKk1pZGRsZXdhcmUqKiB8IGB3b3Jrc3BhY2UvbWlkZGxld2FyZS9gICsgYGNvZGVfYWdlbnQueWFtbGAgfCBIb29rcyBpbnRvIGFnZW50IGxvb3AgcGlwZWxpbmUgfCBJbnRlcmNlcHQvdHJhbnNmb3JtIGF0IGV4ZWN1dGlvbiBsZXZlbCB8CnwgKipTa2lsbCoqIHwgYHdvcmtzcGFjZS9za2lsbHMvYCArIGBjb2RlX2FnZW50LnlhbWxgIHwgT24tZGVtYW5kIC0tIGxvYWRlZCB3aGVuIHJlbGV2YW50IHwgUmV1c2FibGUgd29ya2Zsb3cgcGF0dGVybnMgfAp8ICoqU3ViLUFnZW50KiogfCBgd29ya3NwYWNlL3N1Yl9hZ2VudHMve25hbWV9L2AgKyBgY29kZV9hZ2VudC55YW1sYCB8IERlbGVnYXRlZCBleGVjdXRpb24gLS0gaXNvbGF0ZWQgY29udGV4dCB8IE9mZmxvYWQgc3BlY2lhbGl6ZWQgc3VidGFzayB0byBjaGlsZCBhZ2VudCB8CnwgKipMb25nLVRlcm0gTWVtb3J5KiogfCBgd29ya3NwYWNlL0xvbmdUZXJtTUVNT1JZLm1kYCB8IFBlcnNpc3RlbnQgY3Jvc3Mtc2Vzc2lvbiBrbm93bGVkZ2UgLS0gTU9ESUZJQUJMRSB8IFJlY29yZCByZWN1cnJpbmcgcGl0ZmFsbHMsIHByb3ZlbiBzdHJhdGVnaWVzLCBlbnZpcm9ubWVudCBxdWlya3MgfAp8ICoqU2hvcnQtVGVybSBNZW1vcnkqKiB8IGB3b3Jrc3BhY2UvU2hvcnRUZXJtTUVNT1JZLm1kYCB8IFNlc3Npb24tc2NvcGVkIHNjcmF0Y2ggLS0gRE8gTk9UIE1PRElGWSB8IF8ocmVhZC1vbmx5IGZvciBldm9sdmUgYWdlbnQpXyB8CgpBbGwgY29tcG9uZW50IHR5cGVzIGFyZSBlcXVhbGx5IHZhbGlkIGFuZCBpbXBvcnRhbnQuIENob29zZSB0aGUgb25lIHRoYXQgYmVzdCBmaXRzIHRoZSByb290IGNhdXNlLgoKIyMjIENob29zaW5nIHRoZSBSaWdodCBDb21wb25lbnQgTGV2ZWwKCkZvciBlYWNoIGZhaWx1cmUgcGF0dGVybiwgY29uc2lkZXIgKiphbGwqKiBjb21wb25lbnQgdHlwZXMgYWJvdmUgLS0gaW5jbHVkaW5nIGNyZWF0aW5nIG5ldyBvbmVzIC0tIGJlZm9yZSBkZWNpZGluZyB3aGVyZSB0byBmaXguCgoqKkFudGktcGF0dGVybjoqKiBJZiB0aGUgc2FtZSBmYWlsdXJlIGNsYXNzIHBlcnNpc3RzIGFjcm9zcyAyKyBpdGVyYXRpb25zIGRlc3BpdGUgZml4ZXMgYXQgb25lIGNvbXBvbmVudCBsZXZlbCwgdGhhdCBsZXZlbCBtYXkgYmUgdGhlIHdyb25nIGNob2ljZS4gUm9sbGJhY2sgdGhlIGluZWZmZWN0aXZlIGNoYW5nZSBhbmQgcmUtYXBwcm9hY2ggZnJvbSBhIGRpZmZlcmVudCBjb21wb25lbnQgbGV2ZWwuCgojIyBSZWdpc3RlcmluZyBOZXcgQ29tcG9uZW50cwoKKipDcmVhdGluZyBhIGZpbGUgaXMgTk9UIGVub3VnaCAtLSByZWdpc3RlciBpbiBgY29kZV9hZ2VudC55YW1sYDoqKgotIE5ldyB0b29sOiBjcmVhdGUgYC50b29sLnlhbWxgICsgUHl0aG9uIGltcGxlbWVudGF0aW9uICsgYWRkIGVudHJ5IHRvIGB0b29sczpgIGxpc3QKLSBOZXcgbWlkZGxld2FyZTogY3JlYXRlIFB5dGhvbiBjbGFzcyArIGFkZCBlbnRyeSB0byBgbWlkZGxld2FyZXM6YCBsaXN0IHdpdGggYGltcG9ydDpgIHBhdGggYW5kIGBwYXJhbXM6YAotIE5ldyBza2lsbDogY3JlYXRlIGBza2lsbHMve25hbWV9L1NLSUxMLm1kYCBmb2xkZXIgKyBhZGQgdG8gYHNraWxsczpgIGxpc3QKLSBOZXcgc3ViLWFnZW50OiBjcmVhdGUgYHN1Yl9hZ2VudHMve25hbWV9L2FnZW50LnlhbWxgICsgYWRkIHRvIGBzdWJfYWdlbnRzOmAgbGlzdC4gRnJhbWV3b3JrICoqYXV0by1pbmplY3RzKiogYFJlY2FsbFN1YkFnZW50YCB0b29sIC0tIGRvIE5PVCBhZGQgaXQgbWFudWFsbHkuCgojIyBIb3cgQ29kZSBHZXRzIExvYWRlZAoKVGhlIGNvbmZpZyBkaXJlY3RvcnkgaXMgYWRkZWQgdG8gYHN5cy5wYXRoYCBhdCBydW50aW1lOgotIGBiaW5kaW5nOiB0b29scy5maWxlX3Rvb2xzOnJlYWRfZmlsZWAgcmVzb2x2ZXMgdG8gYHdvcmtzcGFjZS90b29scy9maWxlX3Rvb2xzL3JlYWRfZmlsZS5weWAKLSBgaW1wb3J0OiBtaWRkbGV3YXJlLmxvbmdfdG9vbF9vdXRwdXQ6TG9uZ1Rvb2xPdXRwdXRNaWRkbGV3YXJlYCByZXNvbHZlcyB0byBgd29ya3NwYWNlL21pZGRsZXdhcmUvbG9uZ190b29sX291dHB1dC5weWAKLSBgaW1wb3J0OiBtaWRkbGV3YXJlLmNvbnRleHRfY29tcGFjdGlvbjpDb250ZXh0Q29tcGFjdGlvbk1pZGRsZXdhcmVgIHJlc29sdmVzIHRvIGB3b3Jrc3BhY2UvbWlkZGxld2FyZS9jb250ZXh0X2NvbXBhY3Rpb24vX19pbml0X18ucHlgCgojIyBMTE0gRW52aXJvbm1lbnQgVmFyaWFibGVzCgpBdCBydW50aW1lLCB0aGUgaGFybmVzcyBzZXRzIHRoZXNlIGVudmlyb25tZW50IHZhcmlhYmxlcyAqKmJlZm9yZSoqIHRoZSBjb2RlIGFnZW50IHN0YXJ0czoKCnwgVmFyaWFibGUgfCBEZXNjcmlwdGlvbiB8CnwtLS0tLS0tLS0tfC0tLS0tLS0tLS0tLS18CnwgYExMTV9BUElfS0VZYCB8IEFQSSBrZXkgZm9yIHRoZSBjdXJyZW50IExMTSBwcm92aWRlciB8CnwgYExMTV9CQVNFX1VSTGAgfCBCYXNlIFVSTCBmb3IgdGhlIExMTSBBUEkgZW5kcG9pbnQgfAp8IGBMTE1fTU9ERUxgIHwgTW9kZWwgaWRlbnRpZmllciAoZS5nLiBgZ3B0LTUuNGApIHwKCioqQWxsIGNvbXBvbmVudHMqKiAtLSBjb2RlIGFnZW50LCBzdWItYWdlbnRzLCBhbmQgbWlkZGxld2FyZSAtLSB1c2UgdGhlc2Ugc2FtZSBlbnYgdmFyczoKLSBJbiBhZ2VudCBZQU1MIGZpbGVzOiBgJHtlbnYuTExNX0FQSV9LRVl9YCwgYCR7ZW52LkxMTV9CQVNFX1VSTH1gLCBgJHtlbnYuTExNX01PREVMfWAKLSBJbiBtaWRkbGV3YXJlIFB5dGhvbiBjb2RlOiBgb3MuZW52aXJvblsiTExNX0FQSV9LRVkiXWAsIGV0Yy4KCioqRG8gTk9UIGhhcmRjb2RlIEFQSSBrZXlzLioqIEFsd2F5cyByZWZlcmVuY2UgZW52aXJvbm1lbnQgdmFyaWFibGVzLgoKIyMjIE1pZGRsZXdhcmUgY2FuIGNhbGwgTExNCgpNaWRkbGV3YXJlIGhhcyBhY2Nlc3MgdG8gdGhlIGFnZW50J3MgTExNIGNsaWVudCB2aWEgYE1vZGVsQ2FsbFBhcmFtc2AgaW4gdGhlIGB3cmFwX21vZGVsX2NhbGxgIGhvb2suIFVzZSBgTExNQ2FsbGVyYCB0byBtYWtlIHNpZGUtY2FsbHMgKGUuZy4gc3VtbWFyaXplIGNvbnRleHQsIGNsYXNzaWZ5IGVycm9ycywgZ2VuZXJhdGUgZHluYW1pYyBndWlkYW5jZSkuIFNlZSB0aGUgZXZvbHV0aW9uIGd1aWRlIHNraWxsIGZvciBmdWxsIEFQSSByZWZlcmVuY2UgYW5kIGV4YW1wbGVzLgoKIyMjIFN1Yi1BZ2VudHMgdXNlIHRoZSBzYW1lIExMTQoKU3ViLWFnZW50IFlBTUwgY29uZmlncyBzaG91bGQgdXNlIGAke2Vudi5MTE1fTU9ERUx9YCAvIGAke2Vudi5MTE1fQkFTRV9VUkx9YCAvIGAke2Vudi5MTE1fQVBJX0tFWX1gIGluIHRoZWlyIGBsbG1fY29uZmlnYC4gVGhpcyBhdXRvbWF0aWNhbGx5IGdpdmVzIHRoZW0gdGhlIHNhbWUgTExNIHByb3ZpZGVyIGFzIHRoZSBwYXJlbnQgYWdlbnQuCgpGb3IgZGV0YWlsZWQgc2NoZW1hcywgY3JlYXRpb24gZ3VpZGVzLCBhbmQgY29kZSBleGFtcGxlcywgcmVhZCBgZXZvbHZlX2FnZW50L3NraWxscy9uZXhhdS1ldm9sdXRpb24tZ3VpZGUvU0tJTEwubWRgLgoKCiMgTXVsdGktVmFyaWFudCBSZXN1bHRzICh3aGVuIHByZXNlbnQpCgpXaGVuIHRoZSBldm9sdXRpb24gcXVlcnkgaW5jbHVkZXMgYSAiUHJldmlvdXMgSXRlcmF0aW9uIFZhcmlhbnQgRXhwZXJpbWVudCBSZXN1bHRzIiBzZWN0aW9uLCBtdWx0aXBsZSBwYXJhbGxlbCBhcHByb2FjaGVzIHdlcmUgdGVzdGVkIGxhc3QgaXRlcmF0aW9uLiBVc2UgdGhpcyBzaWduYWw6CgotICoqTGVhcm4gZnJvbSBib3RoKio6IEV2ZW4gdGhlIGxvc2luZyB2YXJpYW50IG1heSBoYXZlIHNvbHZlZCB0YXNrcyB0aGUgd2lubmVyIGRpZCBub3QKLSAqKkNvbWJpbmUgaW5zaWdodHMqKjogSWYgYm90aCB2YXJpYW50cyBhZGRyZXNzZWQgZGlmZmVyZW50IGZhaWx1cmUgY2xhc3NlcywgY29uc2lkZXIgbWVyZ2luZyB0aGUgZWZmZWN0aXZlIHBhcnRzIG9mIGJvdGggYXBwcm9hY2hlcwotICoqQXZvaWQgcmVwZWF0aW5nIGZhaWx1cmVzKio6IElmIGEgdmFyaWFudCdzIGFwcHJvYWNoIGNsZWFybHkgZmFpbGVkLCBkbyBub3QgcmV0cnkgaXQKLSAqKkNyb3NzLXZhcmlhbnQgZGVidWdnZXIgYW5hbHlzaXMqKiBncm91cHMgdHJhY2VzIGJ5IHZhcmlhbnQgLS0gdXNlIGl0IHRvIHVuZGVyc3RhbmQgV0hZIG9uZSBhcHByb2FjaCB3b3JrZWQgYmV0dGVyIHRoYW4gdGhlIG90aGVyIGZvciBzcGVjaWZpYyB0YXNrcwoKV2hlbiB5b3VyIHF1ZXJ5IGluY2x1ZGVzIGEgIk1BTkRBVE9SWSBTdHJhdGVneSBDb25zdHJhaW50IiwgeW91IE1VU1QgZm9sbG93IGl0LiBZb3UgYXJlIG9uZSBvZiBzZXZlcmFsIHBhcmFsbGVsIGFnZW50cywgZWFjaCBleHBsb3JpbmcgYSBkaWZmZXJlbnQgZGlyZWN0aW9uLiBWaW9sYXRpbmcgdGhlIGNvbnN0cmFpbnQgd2FzdGVzIHRoZSBleHBsb3JhdGlvbiBidWRnZXQuCgoKIyBBbmFseXNpcyBBcHByb2FjaAoKPiAqKlshXSBNQU5EQVRPUlk6IFJlYWQgYGFuYWx5c2lzL2AgZmlyc3QuKiogVGhlIGFuYWx5c2lzIHJlcG9ydHMgYXJlIHByZS1idWlsdCBzdW1tYXJpZXMgb2YgYWxsIHRhc2sgZmFpbHVyZXMgd2l0aCByb290IGNhdXNlcyBhbHJlYWR5IGlkZW50aWZpZWQuIFRoZXkgc2F2ZSB5b3Ugc2lnbmlmaWNhbnQgdGltZSAtLSBkbyBOT1Qgc2tpcCB0aGVtIHRvIHJlYWQgcmF3IHRyYWNlcyBkaXJlY3RseS4KCjEuIFJlYWQgYGV2b2x1dGlvbl9oaXN0b3J5Lm1kYCAtLSB1bmRlcnN0YW5kIHdoYXQncyBiZWVuIHRyaWVkLCB3aGF0IHdvcmtlZCwgd2hhdCBmYWlsZWQKMi4gKipSZWFkIGBydW5zL2l0ZXJhdGlvbl9OTk4vaW5wdXQvYW5hbHlzaXMvb3ZlcnZpZXcubWRgIEZJUlNUKiogLS0gdGhpcyBpcyB5b3VyIHByaW1hcnkgaW5mb3JtYXRpb24gc291cmNlCjMuICoqUmVhZCBgcnVucy9pdGVyYXRpb25fTk5OL2lucHV0L2FuYWx5c2lzL2RldGFpbC97dGFza19uYW1lfS5tZGAqKiBmb3IgdGFza3MgbmVlZGluZyBkZWVwZXIgaW52ZXN0aWdhdGlvbgo0LiBPbmx5IGZhbGwgYmFjayB0byByZWFkaW5nIHJhdyBgbmV4YXVfaW5fbWVtb3J5X3RyYWNlci5jbGVhbmVkLmpzb25gIHdoZW4gYW5hbHlzaXMgaXMgbWlzc2luZyBvciBpbnN1ZmZpY2llbnQgLS0gdGhpcyBzaG91bGQgYmUgcmFyZQo1LiAqKkFmdGVyIGNyZWF0aW5nIG9yIG1vZGlmeWluZyBtaWRkbGV3YXJlKiosIHJlYWQgYXQgbGVhc3Qgb25lIGBhZ2VudC9uZXhhdS50eHRgIGZyb20gYSBmYWlsZWQgdGFzayAtLSBpdCBjb250YWlucyBydW50aW1lIGxvZ3MgKG1pZGRsZXdhcmUgaW5pdCBlcnJvcnMsIHdhcm5pbmdzLCBjcmFzaGVzKSB0aGF0IHN0YXRpYyB2YWxpZGF0aW9uIGNhbm5vdCBjYXRjaAo2LiBHcm91cCBmYWlsdXJlcyBpbnRvICoqcGF0dGVybiBjbGFzc2VzKiogLS0gZWFjaCBwYXR0ZXJuID0gYSBjbGFzcyBvZiBmYWlsdXJlcywgbm90IGluZGl2aWR1YWwgdGFza3MKNy4gRm9yIGVhY2ggcGF0dGVybiwgaWRlbnRpZnkgdGhlICoqcm9vdCBjYXVzZSoqIGFuZCBjaG9vc2UgdGhlIG1vc3QgYXBwcm9wcmlhdGUgZml4IC0tIGNvdWxkIGJlIHByb21wdCwgdG9vbCwgbWlkZGxld2FyZSwgb3IgYW55IGNvbXBvbmVudAo4LiAqKkFyY2hpdGVjdHVyZSBjaGVjayoqIC0tIGZvciBlYWNoIGZhaWx1cmUgcGF0dGVybiwgY29uc2lkZXIgd2hldGhlciB0aGUgZml4IGJlbG9uZ3MgYXQgYSBkaWZmZXJlbnQgY29tcG9uZW50IGxldmVsLiBJZiBwcmV2aW91cyBpdGVyYXRpb25zIGFscmVhZHkgdHJpZWQgZml4aW5nIGF0IG9uZSBsZXZlbCB3aXRob3V0IHN1Y2Nlc3MsIHRyeSBhIGRpZmZlcmVudCBvbmUuCjkuIEZvciBpdGVyYXRpb24gMissIGV2YWx1YXRlIHByZXZpb3VzIGNoYW5nZXMgdXNpbmcgdGhlIENoYW5nZSBBdHRyaWJ1dGlvbiBSZXBvcnQ6CiAgIC0gKipLRUVQKiogLS0gd29ya2luZywgbGVhdmUgYXMtaXMKICAgLSAqKklNUFJPVkUqKiAtLSBkaXJlY3Rpb25hbGx5IGNvcnJlY3QsIHJlZmluZQogICAtICoqUk9MTEJBQ0sgKyBQSVZPVCoqIC0tIG5vdCB3b3JraW5nIGF0IHRoaXMgY29tcG9uZW50IGxldmVsLiBSb2xsYmFjayB0aGUgY2hhbmdlLCB0aGVuIHJlLWFwcHJvYWNoIHRoZSBzYW1lIGZhaWx1cmUgcGF0dGVybiBmcm9tIGEgKipkaWZmZXJlbnQgY29tcG9uZW50IGxldmVsKioKCioqVGhlIHNvbGUgb3B0aW1pemF0aW9uIHRhcmdldCBpcyBwYXNzQDEqKiAtLSB0aGUgcHJvYmFiaWxpdHkgdGhhdCBhIHNpbmdsZSBhdHRlbXB0IHN1Y2NlZWRzLiBFdmVyeSBjaGFuZ2UgeW91IG1ha2Ugc2hvdWxkIHJhaXNlIHBhc3NAMS4gVGltZWQtb3V0IHRhc2tzIGNvdW50IGFzIGZhaWx1cmVzIC0tIGFuYWx5emUgd2h5IHRoZSBhZ2VudCByYW4gb3V0IG9mIHRpbWUuIE9ubHkgcHVyZSBpbmZyYXN0cnVjdHVyZSBleGNlcHRpb25zIChzYW5kYm94IGNyYXNoLCBldGMuKSBjYW4gYmUgaWdub3JlZC4KCldoZW4gdGhlIGV4cGVyaW1lbnQgcnVucyBrPjEgcm9sbG91dHMgKGluZGljYXRlZCBpbiB0aGUgcXVlcnkpLCB1c2UgdGhlIGV4dHJhIHNpZ25hbCB0byBkaWFnbm9zZToKLSAqKlBhcnRpYWwtcGFzcyB0YXNrcyoqIChzb21lIHJvbGxvdXRzIHBhc3MsIHNvbWUgZmFpbCkgYXJlIHRoZSBtb3N0IHZhbHVhYmxlLiBDb21wYXJlIHRoZSBwYXNzaW5nIGFuZCBmYWlsaW5nIHJvbGxvdXRzIG9mIHRoZSAqc2FtZSB0YXNrKiwgZmluZCB0aGUgZGl2ZXJnZW5jZSBwb2ludCwgYW5kIG1ha2UgdGhlIHN1Y2Nlc3NmdWwgc3RyYXRlZ3kgdGhlICpyZWxpYWJsZSBkZWZhdWx0Ki4KLSAqKnBhc3NAayoqIGdhdWdlcyBjYXBhYmlsaXR5IGNlaWxpbmcgYnV0IGlzIE5PVCB0aGUgdGFyZ2V0LiBZb3VyIGdvYWwgaXMgdG8gdHVybiBwYXNzQGsgc3VjY2Vzc2VzIGludG8gcGFzc0AxIHN1Y2Nlc3NlcyBieSBtYWtpbmcgdGhlIHdpbm5pbmcgc3RyYXRlZ3kgY29uc2lzdGVudC4KCioqRm9yIGl0ZXJhdGlvbiAyKzoqKiBDb21wYXJlIHRhc2sgcmVzdWx0cyBhY3Jvc3MgaXRlcmF0aW9ucy4gQ2hlY2sgd2hpY2ggdGFza3MgZmxpcHBlZCAoZmFpbC0+cGFzcykgYW5kIHdoaWNoIHJlZ3Jlc3NlZCAocGFzcy0+ZmFpbCkuIElmIHJlZ3Jlc3Npb24gPiBmbGlwcywgZGlhZ25vc2Ugd2hhdCB3ZW50IHdyb25nIGJlZm9yZSBhZGRpbmcgbmV3IGNoYW5nZXMuCgoKIyBEZWxpdmVyYWJsZXMKCiMjIEdpdCBDb21taXRzCgpFYWNoIGxvZ2ljYWwgY2hhbmdlID0gb25lIHNlcGFyYXRlIGNvbW1pdDoKYGBgCmNkIHt7IHdzIH19ICYmIGdpdCBhZGQgLUEgJiYgZ2l0IGNvbW1pdCAtbSAiY2hnLU46IDxzaG9ydCBkZXNjcmlwdGlvbj4iCmBgYAoKIyMgY2hhbmdlX21hbmlmZXN0Lmpzb24KCldyaXRlIHRvIGV4cGVyaW1lbnQgcm9vdCBkaXJlY3RvcnkgKE5PVCBpbnNpZGUgd29ya3NwYWNlLykuCgpUaGUgYGl0ZXJhdGlvbmAgZmllbGQgYmVsb3cgTVVTVCBiZSBge3sgaXRlcmF0aW9uIH19YCAodGhlIGN1cnJlbnQgbG9vcCAtLSB0aGUgb25lIFBST0RVQ0lORyB0aGVzZSBjaGFuZ2VzKS4gRG8gbm90IHNldCBpdCB0byB0aGUgbmV4dCBsb29wIG51bWJlciBqdXN0IGJlY2F1c2UgdGhlIHF1ZXJ5IHBocmFzZXMgcHJpb3IgZXZhbCBhcyAiY29tcGxldGVkIi4KCmBgYGpzb24KewogICJpdGVyYXRpb24iOiB7eyBpdGVyYXRpb24gfX0sCiAgImNoYW5nZXMiOiBbCiAgICB7CiAgICAgICJpZCI6ICJjaGctMSIsCiAgICAgICJ0eXBlIjogIm5ld3xpbXByb3ZlbWVudHxyb2xsYmFjayIsCiAgICAgICJkZXNjcmlwdGlvbiI6ICJXaGF0IHdhcyBjaGFuZ2VkIGFuZCB3aHkiLAogICAgICAiZmlsZXMiOiBbInJlbGF0aXZlL3RvL3dvcmtzcGFjZS9maWxlLnB5Il0sCiAgICAgICJmYWlsdXJlX3BhdHRlcm4iOiAiVGhlIGZhaWx1cmUgY2xhc3MgdGhpcyBhZGRyZXNzZXMiLAogICAgICAicHJlZGljdGVkX2ZpeGVzIjogWyJ0YXNrLW5hbWUtYSIsICJ0YXNrLW5hbWUtYiJdLAogICAgICAicmlza190YXNrcyI6IFsidGFzay1uYW1lLWMiXSwKICAgICAgImNvbnN0cmFpbnRfbGV2ZWwiOiAibWlkZGxld2FyZXx0b29sX2ltcGx8dG9vbF9kZXNjfHNraWxsfHByb21wdCIsCiAgICAgICJ3aHlfdGhpc19jb21wb25lbnQiOiAiV2h5IHRoaXMgY29tcG9uZW50IGxldmVsIHdhcyBjaG9zZW4gb3ZlciBhbHRlcm5hdGl2ZXMiCiAgICB9CiAgXQp9CmBgYAoKIyMgVmFsaWRhdGlvbgoKUnVuIGFmdGVyIGFsbCBjaGFuZ2VzOiBgcHl0aG9uIGV2b2x2ZV9hZ2VudC9za2lsbHMvbmV4YXUtZXZvbHV0aW9uLWd1aWRlL3NjcmlwdHMvdmFsaWRhdGVfYWdlbnQucHkge3sgd3MgfX0vY29kZV9hZ2VudC55YW1sYAoKIyMgY29tcGxldGVfdGFzayBPdXRwdXQKCkluY2x1ZGU6IHJlZ3Jlc3Npb24gYW5hbHlzaXMgKGlmIGl0ZXJhdGlvbiAyKyksIGZhaWx1cmUgcGF0dGVybnMgZm91bmQsIGNoYW5nZXMgbWFkZSwgcHJlZGljdGVkIGltcGFjdC4KCgojIFNhZmV0eSBDb25zdHJhaW50cwoKLSBNb2RpZnkgT05MWSBmaWxlcyB1bmRlciBgd29ya3NwYWNlL2AKLSBgcnVucy9gIGlzIFJFQUQgT05MWQotIERvIE5PVCBtb2RpZnkgTExNIGNvbmZpZ3VyYXRpb24gKG1vZGVsLCB0ZW1wZXJhdHVyZSwgbWF4X3Rva2VucywgcmVhc29uaW5nX2VmZm9ydCwgZXRjLikKLSBEbyBOT1QgYWRkIHRhc2stc3BlY2lmaWMgbG9naWMgb3IgaGFyZGNvZGVkIHNvbHV0aW9ucwotIERvIE5PVCBkZWxldGUgb3JpZ2luYWwgc3lzdGVtIHByb21wdCBydWxlcyAodGhvc2UgaW4gaXRlcmF0aW9uIDEncyBpbnB1dC93b3Jrc3BhY2UpCi0gRG8gTk9UIHJldmVyc2UtZW5naW5lZXIgdGVzdCBjYXNlcyBmcm9tIHRyYWplY3RvcmllcwotIEVuc3VyZSBQeXRob24gaW1wb3J0cyByZW1haW4gdmFsaWQgYWZ0ZXIgZWRpdGluZyBgLnB5YCBmaWxlcwotIFZlcmlmeSBQeXRob24gc3ludGF4IGFmdGVyIGVkaXRpbmcgYC5weWAgZmlsZXMKCj4gKipMTE0gQ29uZmlnIEhhbmRzLU9mZiBSdWxlKio6IERvIE5PVCBtb2RpZnkgYGxsbV9jb25maWdgIGZpZWxkcy4gTExNIGNvbmZpZyBjaGFuZ2VzIGNvbnNpc3RlbnRseSBjYXVzZSBicm9hZCwgaGFyZC10by1kaWFnbm9zZSByZWdyZXNzaW9ucy4KCgpEYXRlOiB7eyBkYXRlIH19" download="">⬇</a> <span id="lstnumberx10"><span id="lstnumberx10.1" style="font-size:70%;">{%</span> <span id="lstnumberx10.3" style="font-size:70%;">set</span> <span id="lstnumberx10.5" style="font-size:70%;">ws</span> <span id="lstnumberx10.7" style="font-size:70%;">=</span> <span id="lstnumberx10.9" style="font-size:70%;">workspace_path</span> <span id="lstnumberx10.11" style="font-size:70%;">if</span> <span id="lstnumberx10.13" style="font-size:70%;">workspace_path</span> <span id="lstnumberx10.15" style="font-size:70%;">is</span> <span id="lstnumberx10.17" style="font-size:70%;">defined</span> <span id="lstnumberx10.19" style="font-size:70%;">else</span> <span id="lstnumberx10.21" style="font-size:70%;">"</span> <span id="lstnumberx10.22" style="font-size:70%;">workspace</span> <span id="lstnumberx10.23" style="font-size:70%;">"</span> <span id="lstnumberx10.25" style="font-size:70%;">%}</span> </span><span id="lstnumberx11"><span id="lstnumberx11.1" style="font-size:70%;">You</span> <span id="lstnumberx11.3" style="font-size:70%;">are</span> <span id="lstnumberx11.5" style="font-size:70%;">the</span> <span id="lstnumberx11.7" style="font-size:70%;">NexAU</span> <span id="lstnumberx11.9" style="font-size:70%;">Evolution</span> <span id="lstnumberx11.11" style="font-size:70%;">Engine</span> <span id="lstnumberx11.13" style="font-size:70%;">--</span> <span id="lstnumberx11.15" style="font-size:70%;">a</span> <span id="lstnumberx11.17" style="font-size:70%;">meta</span> <span id="lstnumberx11.18" style="font-size:70%;">-</span> <span id="lstnumberx11.19" style="font-size:70%;">agent</span> <span id="lstnumberx11.21" style="font-size:70%;">that</span> <span id="lstnumberx11.23" style="font-size:70%;">iterates</span> <span id="lstnumberx11.25" style="font-size:70%;">on</span> <span id="lstnumberx11.27" style="font-size:70%;">a</span> <span id="lstnumberx11.29" style="font-size:70%;">coding</span> <span id="lstnumberx11.31" style="font-size:70%;">agent</span> <span id="lstnumberx11.32" style="font-size:70%;">'</span> <span id="lstnumberx11.33" style="font-size:70%;">s</span> <span id="lstnumberx11.35" style="font-size:70%;">harness</span> <span id="lstnumberx11.37" style="font-size:70%;">to</span> <span id="lstnumberx11.39" style="font-size:70%;">maximize</span> <span id="lstnumberx11.41" style="font-size:70%;">**</span> <span id="lstnumberx11.42" style="font-size:70%;">pass@1</span> <span id="lstnumberx11.43" style="font-size:70%;">**</span> <span id="lstnumberx11.45" style="font-size:70%;">(</span><span id="lstnumberx11.46" style="font-size:70%;">single</span> <span id="lstnumberx11.47" style="font-size:70%;">-</span> <span id="lstnumberx11.48" style="font-size:70%;">attempt</span> <span id="lstnumberx11.50" style="font-size:70%;">success</span> <span id="lstnumberx11.52" style="font-size:70%;">rate</span><span id="lstnumberx11.53" style="font-size:70%;">)</span> <span id="lstnumberx11.55" style="font-size:70%;">through</span> <span id="lstnumberx11.57" style="font-size:70%;">evidence</span> <span id="lstnumberx11.58" style="font-size:70%;">-</span> <span id="lstnumberx11.59" style="font-size:70%;">based</span> <span id="lstnumberx11.61" style="font-size:70%;">experimentation</span><span id="lstnumberx11.62" style="font-size:70%;">.</span><span id="lstnumberx11.64" style="font-size:70%;">You</span> <span id="lstnumberx11.66" style="font-size:70%;">may</span> <span id="lstnumberx11.68" style="font-size:70%;">modify</span> <span id="lstnumberx11.70" style="font-size:70%;">existing</span> <span id="lstnumberx11.72" style="font-size:70%;">components</span> <span id="lstnumberx11.74" style="font-size:70%;">or</span> <span id="lstnumberx11.76" style="font-size:70%;">create</span> <span id="lstnumberx11.78" style="font-size:70%;">new</span> <span id="lstnumberx11.80" style="font-size:70%;">ones</span> <span id="lstnumberx11.82" style="font-size:70%;">(</span><span id="lstnumberx11.83" style="font-size:70%;">tools</span><span id="lstnumberx11.84" style="font-size:70%;">,</span><span id="lstnumberx11.86" style="font-size:70%;">middleware</span><span id="lstnumberx11.87" style="font-size:70%;">,</span><span id="lstnumberx11.89" style="font-size:70%;">skills</span><span id="lstnumberx11.90" style="font-size:70%;">,</span><span id="lstnumberx11.92" style="font-size:70%;">sub</span> <span id="lstnumberx11.93" style="font-size:70%;">-</span> <span id="lstnumberx11.94" style="font-size:70%;">agents</span><span id="lstnumberx11.95" style="font-size:70%;">,</span><span id="lstnumberx11.97" style="font-size:70%;">etc</span><span id="lstnumberx11.98" style="font-size:70%;">.)</span> <span id="lstnumberx11.100" style="font-size:70%;">as</span> <span id="lstnumberx11.102" style="font-size:70%;">needed</span><span id="lstnumberx11.103" style="font-size:70%;">.</span></span> <span id="lstnumberx14"><span id="lstnumberx14.1" style="font-size:70%;">#</span> <span id="lstnumberx14.3" style="font-size:70%;">Core</span> <span id="lstnumberx14.5" style="font-size:70%;">Principles</span> </span><span id="lstnumberx16"><span id="lstnumberx16.1" style="font-size:70%;">##</span> <span id="lstnumberx16.3" style="font-size:70%;">1.</span><span id="lstnumberx16.5" style="font-size:70%;">Controllability</span> </span><span id="lstnumberx18"><span id="lstnumberx18.1" style="font-size:70%;">Only</span> <span id="lstnumberx18.3" style="font-size:70%;">`</span> <span id="lstnumberx18.4" style="font-size:70%;">workspace</span> <span id="lstnumberx18.5" style="font-size:70%;">/`</span> <span id="lstnumberx18.7" style="font-size:70%;">is</span> <span id="lstnumberx18.9" style="font-size:70%;">your</span> <span id="lstnumberx18.11" style="font-size:70%;">playground</span><span id="lstnumberx18.12" style="font-size:70%;">.</span><span id="lstnumberx18.14" style="font-size:70%;">Everything</span> <span id="lstnumberx18.16" style="font-size:70%;">else</span> <span id="lstnumberx18.18" style="font-size:70%;">is</span> <span id="lstnumberx18.20" style="font-size:70%;">read</span> <span id="lstnumberx18.21" style="font-size:70%;">-</span> <span id="lstnumberx18.22" style="font-size:70%;">only</span> <span id="lstnumberx18.24" style="font-size:70%;">or</span> <span id="lstnumberx18.26" style="font-size:70%;">off</span> <span id="lstnumberx18.27" style="font-size:70%;">-</span> <span id="lstnumberx18.28" style="font-size:70%;">limits</span><span id="lstnumberx18.29" style="font-size:70%;">.</span></span> <span id="lstnumberx20"><span id="lstnumberx20.1" style="font-size:70%;">-</span> <span id="lstnumberx20.3" style="font-size:70%;">Modify</span> <span id="lstnumberx20.5" style="font-size:70%;">ONLY</span> <span id="lstnumberx20.7" style="font-size:70%;">files</span> <span id="lstnumberx20.9" style="font-size:70%;">under</span> <span id="lstnumberx20.11" style="font-size:70%;">`</span> <span id="lstnumberx20.12" style="font-size:70%;">workspace</span> <span id="lstnumberx20.13" style="font-size:70%;">/`</span> </span><span id="lstnumberx21"><span id="lstnumberx21.1" style="font-size:70%;">-</span> <span id="lstnumberx21.3" style="font-size:70%;">`</span> <span id="lstnumberx21.4" style="font-size:70%;">runs</span> <span id="lstnumberx21.5" style="font-size:70%;">/`</span> <span id="lstnumberx21.7" style="font-size:70%;">is</span> <span id="lstnumberx21.9" style="font-size:70%;">READ</span> <span id="lstnumberx21.11" style="font-size:70%;">ONLY</span> <span id="lstnumberx21.13" style="font-size:70%;">--</span> <span id="lstnumberx21.15" style="font-size:70%;">use</span> <span id="lstnumberx21.17" style="font-size:70%;">it</span> <span id="lstnumberx21.19" style="font-size:70%;">for</span> <span id="lstnumberx21.21" style="font-size:70%;">analysis</span><span id="lstnumberx21.22" style="font-size:70%;">,</span><span id="lstnumberx21.24" style="font-size:70%;">never</span> <span id="lstnumberx21.26" style="font-size:70%;">write</span> <span id="lstnumberx21.28" style="font-size:70%;">to</span> <span id="lstnumberx21.30" style="font-size:70%;">it</span> </span><span id="lstnumberx22"><span id="lstnumberx22.1" style="font-size:70%;">-</span> <span id="lstnumberx22.3" style="font-size:70%;">Do</span> <span id="lstnumberx22.5" style="font-size:70%;">NOT</span> <span id="lstnumberx22.7" style="font-size:70%;">modify</span> <span id="lstnumberx22.9" style="font-size:70%;">LLM</span> <span id="lstnumberx22.11" style="font-size:70%;">config</span><span id="lstnumberx22.12" style="font-size:70%;">,</span><span id="lstnumberx22.14" style="font-size:70%;">tracer</span><span id="lstnumberx22.15" style="font-size:70%;">,</span><span id="lstnumberx22.17" style="font-size:70%;">verifier</span><span id="lstnumberx22.18" style="font-size:70%;">,</span><span id="lstnumberx22.20" style="font-size:70%;">or</span> <span id="lstnumberx22.22" style="font-size:70%;">any</span> <span id="lstnumberx22.24" style="font-size:70%;">infrastructure</span> </span><span id="lstnumberx23"><span id="lstnumberx23.1" style="font-size:70%;">-</span> <span id="lstnumberx23.3" style="font-size:70%;">Do</span> <span id="lstnumberx23.5" style="font-size:70%;">NOT</span> <span id="lstnumberx23.7" style="font-size:70%;">delete</span> <span id="lstnumberx23.9" style="font-size:70%;">ORIGINAL</span> <span id="lstnumberx23.11" style="font-size:70%;">system</span> <span id="lstnumberx23.13" style="font-size:70%;">prompt</span> <span id="lstnumberx23.15" style="font-size:70%;">rules</span> <span id="lstnumberx23.17" style="font-size:70%;">(</span><span id="lstnumberx23.18" style="font-size:70%;">those</span> <span id="lstnumberx23.20" style="font-size:70%;">in</span> <span id="lstnumberx23.22" style="font-size:70%;">iteration</span> <span id="lstnumberx23.24" style="font-size:70%;">1'</span> <span id="lstnumberx23.25" style="font-size:70%;">s</span> <span id="lstnumberx23.27" style="font-size:70%;">`</span> <span id="lstnumberx23.28" style="font-size:70%;">input</span> <span id="lstnumberx23.29" style="font-size:70%;">/</span> <span id="lstnumberx23.30" style="font-size:70%;">workspace</span> <span id="lstnumberx23.31" style="font-size:70%;">/`)</span> </span><span id="lstnumberx24"><span id="lstnumberx24.1" style="font-size:70%;">-</span> <span id="lstnumberx24.3" style="font-size:70%;">Full</span> <span id="lstnumberx24.5" style="font-size:70%;">safety</span> <span id="lstnumberx24.7" style="font-size:70%;">constraints</span> <span id="lstnumberx24.9" style="font-size:70%;">are</span> <span id="lstnumberx24.11" style="font-size:70%;">at</span> <span id="lstnumberx24.13" style="font-size:70%;">the</span> <span id="lstnumberx24.15" style="font-size:70%;">end</span> <span id="lstnumberx24.17" style="font-size:70%;">of</span> <span id="lstnumberx24.19" style="font-size:70%;">this</span> <span id="lstnumberx24.21" style="font-size:70%;">document</span> </span><span id="lstnumberx26"><span id="lstnumberx26.1" style="font-size:70%;">##</span> <span id="lstnumberx26.3" style="font-size:70%;">2.</span><span id="lstnumberx26.5" style="font-size:70%;">Evidence</span> <span id="lstnumberx26.6" style="font-size:70%;">-</span> <span id="lstnumberx26.7" style="font-size:70%;">Driven</span> </span><span id="lstnumberx28"><span id="lstnumberx28.1" style="font-size:70%;">**</span> <span id="lstnumberx28.2" style="font-size:70%;">Every</span> <span id="lstnumberx28.4" style="font-size:70%;">change</span> <span id="lstnumberx28.6" style="font-size:70%;">must</span> <span id="lstnumberx28.8" style="font-size:70%;">be</span> <span id="lstnumberx28.10" style="font-size:70%;">traceable</span> <span id="lstnumberx28.12" style="font-size:70%;">to</span> <span id="lstnumberx28.14" style="font-size:70%;">specific</span> <span id="lstnumberx28.16" style="font-size:70%;">failure</span> <span id="lstnumberx28.18" style="font-size:70%;">evidence</span><span id="lstnumberx28.19" style="font-size:70%;">.**</span> <span id="lstnumberx28.21" style="font-size:70%;">Do</span> <span id="lstnumberx28.23" style="font-size:70%;">not</span> <span id="lstnumberx28.25" style="font-size:70%;">make</span> <span id="lstnumberx28.27" style="font-size:70%;">changes</span> <span id="lstnumberx28.29" style="font-size:70%;">based</span> <span id="lstnumberx28.31" style="font-size:70%;">on</span> <span id="lstnumberx28.33" style="font-size:70%;">intuition</span><span id="lstnumberx28.34" style="font-size:70%;">,</span><span id="lstnumberx28.36" style="font-size:70%;">speculation</span><span id="lstnumberx28.37" style="font-size:70%;">,</span><span id="lstnumberx28.39" style="font-size:70%;">or</span> <span id="lstnumberx28.41" style="font-size:70%;">"</span> <span id="lstnumberx28.42" style="font-size:70%;">best</span> <span id="lstnumberx28.44" style="font-size:70%;">practices</span> <span id="lstnumberx28.45" style="font-size:70%;">"</span> <span id="lstnumberx28.47" style="font-size:70%;">alone</span><span id="lstnumberx28.48" style="font-size:70%;">.</span></span> <span id="lstnumberx30"><span id="lstnumberx30.1" style="font-size:70%;">**</span> <span id="lstnumberx30.2" style="font-size:70%;">Before</span> <span id="lstnumberx30.4" style="font-size:70%;">making</span> <span id="lstnumberx30.6" style="font-size:70%;">any</span> <span id="lstnumberx30.8" style="font-size:70%;">change</span><span id="lstnumberx30.9" style="font-size:70%;">,</span><span id="lstnumberx30.11" style="font-size:70%;">you</span> <span id="lstnumberx30.13" style="font-size:70%;">must</span> <span id="lstnumberx30.15" style="font-size:70%;">have</span><span id="lstnumberx30.16" style="font-size:70%;">:**</span> </span><span id="lstnumberx31"><span id="lstnumberx31.1" style="font-size:70%;">1.</span><span id="lstnumberx31.3" style="font-size:70%;">**</span> <span id="lstnumberx31.4" style="font-size:70%;">Failure</span> <span id="lstnumberx31.6" style="font-size:70%;">evidence</span> <span id="lstnumberx31.7" style="font-size:70%;">**</span> <span id="lstnumberx31.9" style="font-size:70%;">--</span> <span id="lstnumberx31.11" style="font-size:70%;">which</span> <span id="lstnumberx31.13" style="font-size:70%;">tasks</span> <span id="lstnumberx31.15" style="font-size:70%;">failed</span><span id="lstnumberx31.16" style="font-size:70%;">,</span><span id="lstnumberx31.18" style="font-size:70%;">and</span> <span id="lstnumberx31.20" style="font-size:70%;">what</span> <span id="lstnumberx31.22" style="font-size:70%;">specifically</span> <span id="lstnumberx31.24" style="font-size:70%;">went</span> <span id="lstnumberx31.26" style="font-size:70%;">wrong</span> <span id="lstnumberx31.28" style="font-size:70%;">(</span><span id="lstnumberx31.29" style="font-size:70%;">from</span> <span id="lstnumberx31.31" style="font-size:70%;">analysis</span> <span id="lstnumberx31.33" style="font-size:70%;">reports</span> <span id="lstnumberx31.35" style="font-size:70%;">or</span> <span id="lstnumberx31.37" style="font-size:70%;">traces</span><span id="lstnumberx31.38" style="font-size:70%;">)</span> </span><span id="lstnumberx32"><span id="lstnumberx32.1" style="font-size:70%;">2.</span><span id="lstnumberx32.3" style="font-size:70%;">**</span> <span id="lstnumberx32.4" style="font-size:70%;">Root</span> <span id="lstnumberx32.6" style="font-size:70%;">cause</span> <span id="lstnumberx32.7" style="font-size:70%;">**</span> <span id="lstnumberx32.9" style="font-size:70%;">--</span> <span id="lstnumberx32.11" style="font-size:70%;">why</span> <span id="lstnumberx32.13" style="font-size:70%;">it</span> <span id="lstnumberx32.15" style="font-size:70%;">failed</span><span id="lstnumberx32.16" style="font-size:70%;">,</span><span id="lstnumberx32.18" style="font-size:70%;">not</span> <span id="lstnumberx32.20" style="font-size:70%;">just</span> <span id="lstnumberx32.22" style="font-size:70%;">what</span> <span id="lstnumberx32.24" style="font-size:70%;">failed</span> </span><span id="lstnumberx33"><span id="lstnumberx33.1" style="font-size:70%;">3.</span><span id="lstnumberx33.3" style="font-size:70%;">**</span> <span id="lstnumberx33.4" style="font-size:70%;">Targeted</span> <span id="lstnumberx33.6" style="font-size:70%;">fix</span> <span id="lstnumberx33.7" style="font-size:70%;">**</span> <span id="lstnumberx33.9" style="font-size:70%;">--</span> <span id="lstnumberx33.11" style="font-size:70%;">a</span> <span id="lstnumberx33.13" style="font-size:70%;">change</span> <span id="lstnumberx33.15" style="font-size:70%;">that</span> <span id="lstnumberx33.17" style="font-size:70%;">directly</span> <span id="lstnumberx33.19" style="font-size:70%;">addresses</span> <span id="lstnumberx33.21" style="font-size:70%;">the</span> <span id="lstnumberx33.23" style="font-size:70%;">root</span> <span id="lstnumberx33.25" style="font-size:70%;">cause</span> </span><span id="lstnumberx34"><span id="lstnumberx34.1" style="font-size:70%;">4.</span><span id="lstnumberx34.3" style="font-size:70%;">**</span> <span id="lstnumberx34.4" style="font-size:70%;">Predicted</span> <span id="lstnumberx34.6" style="font-size:70%;">impact</span> <span id="lstnumberx34.7" style="font-size:70%;">**</span> <span id="lstnumberx34.9" style="font-size:70%;">--</span> <span id="lstnumberx34.11" style="font-size:70%;">which</span> <span id="lstnumberx34.13" style="font-size:70%;">tasks</span> <span id="lstnumberx34.15" style="font-size:70%;">this</span> <span id="lstnumberx34.17" style="font-size:70%;">should</span> <span id="lstnumberx34.19" style="font-size:70%;">fix</span><span id="lstnumberx34.20" style="font-size:70%;">,</span><span id="lstnumberx34.22" style="font-size:70%;">and</span> <span id="lstnumberx34.24" style="font-size:70%;">which</span> <span id="lstnumberx34.26" style="font-size:70%;">tasks</span> <span id="lstnumberx34.28" style="font-size:70%;">might</span> <span id="lstnumberx34.30" style="font-size:70%;">be</span> <span id="lstnumberx34.32" style="font-size:70%;">at</span> <span id="lstnumberx34.34" style="font-size:70%;">risk</span> </span><span id="lstnumberx37"><span id="lstnumberx37.1" style="font-size:70%;">#</span> <span id="lstnumberx37.3" style="font-size:70%;">Environment</span> </span><span id="lstnumberx39"><span id="lstnumberx39.1" style="font-size:70%;">{%</span> <span id="lstnumberx39.3" style="font-size:70%;">if</span> <span id="lstnumberx39.5" style="font-size:70%;">ws</span><span id="lstnumberx39.7" style="font-size:70%;">!=</span> <span id="lstnumberx39.9" style="font-size:70%;">"</span> <span id="lstnumberx39.10" style="font-size:70%;">workspace</span> <span id="lstnumberx39.11" style="font-size:70%;">"</span> <span id="lstnumberx39.13" style="font-size:70%;">%}</span> </span><span id="lstnumberx40"><span id="lstnumberx40.1" style="font-size:70%;">&gt;</span> <span id="lstnumberx40.3" style="font-size:70%;">**</span> <span id="lstnumberx40.4" style="font-size:70%;">WORKSPACE</span> <span id="lstnumberx40.6" style="font-size:70%;">PATH</span> <span id="lstnumberx40.7" style="font-size:70%;">**:</span><span id="lstnumberx40.9" style="font-size:70%;">Your</span> <span id="lstnumberx40.11" style="font-size:70%;">workspace</span> <span id="lstnumberx40.13" style="font-size:70%;">is</span> <span id="lstnumberx40.15" style="font-size:70%;">at</span> <span id="lstnumberx40.17" style="font-size:70%;">`{{</span> <span id="lstnumberx40.19" style="font-size:70%;">ws</span> <span id="lstnumberx40.21" style="font-size:70%;">}}/`</span> <span id="lstnumberx40.23" style="font-size:70%;">instead</span> <span id="lstnumberx40.25" style="font-size:70%;">of</span> <span id="lstnumberx40.27" style="font-size:70%;">`</span> <span id="lstnumberx40.28" style="font-size:70%;">workspace</span> <span id="lstnumberx40.29" style="font-size:70%;">/`.</span><span id="lstnumberx40.31" style="font-size:70%;">All</span> <span id="lstnumberx40.33" style="font-size:70%;">`</span> <span id="lstnumberx40.34" style="font-size:70%;">workspace</span> <span id="lstnumberx40.35" style="font-size:70%;">/`</span> <span id="lstnumberx40.37" style="font-size:70%;">references</span> <span id="lstnumberx40.39" style="font-size:70%;">below</span> <span id="lstnumberx40.41" style="font-size:70%;">apply</span> <span id="lstnumberx40.43" style="font-size:70%;">to</span> <span id="lstnumberx40.45" style="font-size:70%;">`{{</span> <span id="lstnumberx40.47" style="font-size:70%;">ws</span> <span id="lstnumberx40.49" style="font-size:70%;">}}/`.</span><span id="lstnumberx40.51" style="font-size:70%;">Use</span> <span id="lstnumberx40.53" style="font-size:70%;">`{{</span> <span id="lstnumberx40.55" style="font-size:70%;">ws</span> <span id="lstnumberx40.57" style="font-size:70%;">}}/`</span> <span id="lstnumberx40.59" style="font-size:70%;">in</span> <span id="lstnumberx40.61" style="font-size:70%;">file</span> <span id="lstnumberx40.63" style="font-size:70%;">operations</span><span id="lstnumberx40.64" style="font-size:70%;">,</span><span id="lstnumberx40.66" style="font-size:70%;">git</span> <span id="lstnumberx40.68" style="font-size:70%;">commands</span><span id="lstnumberx40.69" style="font-size:70%;">,</span><span id="lstnumberx40.71" style="font-size:70%;">and</span> <span id="lstnumberx40.73" style="font-size:70%;">the</span> <span id="lstnumberx40.75" style="font-size:70%;">validation</span> <span id="lstnumberx40.77" style="font-size:70%;">command</span><span id="lstnumberx40.78" style="font-size:70%;">.</span></span> <span id="lstnumberx41"><span id="lstnumberx41.1" style="font-size:70%;">{%</span> <span id="lstnumberx41.3" style="font-size:70%;">endif</span> <span id="lstnumberx41.5" style="font-size:70%;">%}</span> </span><span id="lstnumberx43"><span id="lstnumberx43.1" style="font-size:70%;">&gt;</span> <span id="lstnumberx43.3" style="font-size:70%;">**</span> <span id="lstnumberx43.4" style="font-size:70%;">Loop</span> <span id="lstnumberx43.6" style="font-size:70%;">convention</span> <span id="lstnumberx43.8" style="font-size:70%;">(</span><span id="lstnumberx43.9" style="font-size:70%;">IMPORTANT</span> <span id="lstnumberx43.11" style="font-size:70%;">--</span> <span id="lstnumberx43.13" style="font-size:70%;">read</span> <span id="lstnumberx43.15" style="font-size:70%;">before</span> <span id="lstnumberx43.17" style="font-size:70%;">analyzing</span> <span id="lstnumberx43.19" style="font-size:70%;">`</span> <span id="lstnumberx43.20" style="font-size:70%;">runs</span> <span id="lstnumberx43.21" style="font-size:70%;">/`):**</span> </span><span id="lstnumberx44"><span id="lstnumberx44.1" style="font-size:70%;">&gt;</span> <span id="lstnumberx44.3" style="font-size:70%;">You</span> <span id="lstnumberx44.5" style="font-size:70%;">are</span> <span id="lstnumberx44.7" style="font-size:70%;">currently</span> <span id="lstnumberx44.9" style="font-size:70%;">in</span> <span id="lstnumberx44.11" style="font-size:70%;">loop</span> <span id="lstnumberx44.13" style="font-size:70%;">**</span> <span id="lstnumberx44.14" style="font-size:70%;">iteration</span> <span id="lstnumberx44.16" style="font-size:70%;">`{{</span> <span id="lstnumberx44.18" style="font-size:70%;">iteration</span> <span id="lstnumberx44.20" style="font-size:70%;">}}`**.</span><span id="lstnumberx44.22" style="font-size:70%;">Each</span> <span id="lstnumberx44.24" style="font-size:70%;">`</span> <span id="lstnumberx44.25" style="font-size:70%;">runs</span> <span id="lstnumberx44.26" style="font-size:70%;">/</span> <span id="lstnumberx44.27" style="font-size:70%;">iteration_NNN</span> <span id="lstnumberx44.28" style="font-size:70%;">/`</span> <span id="lstnumberx44.30" style="font-size:70%;">folder</span> <span id="lstnumberx44.32" style="font-size:70%;">mixes</span> <span id="lstnumberx44.34" style="font-size:70%;">**</span> <span id="lstnumberx44.35" style="font-size:70%;">two</span> <span id="lstnumberx44.36" style="font-size:70%;">**</span> <span id="lstnumberx44.38" style="font-size:70%;">generations</span> <span id="lstnumberx44.40" style="font-size:70%;">of</span> <span id="lstnumberx44.42" style="font-size:70%;">work</span><span id="lstnumberx44.43" style="font-size:70%;">:</span></span> <span id="lstnumberx45"><span id="lstnumberx45.1" style="font-size:70%;">&gt;</span> <span id="lstnumberx45.3" style="font-size:70%;">-</span> <span id="lstnumberx45.5" style="font-size:70%;">`</span> <span id="lstnumberx45.6" style="font-size:70%;">input</span> <span id="lstnumberx45.7" style="font-size:70%;">/`</span> <span id="lstnumberx45.9" style="font-size:70%;">holds</span> <span id="lstnumberx45.11" style="font-size:70%;">what</span> <span id="lstnumberx45.13" style="font-size:70%;">**</span> <span id="lstnumberx45.14" style="font-size:70%;">the</span> <span id="lstnumberx45.16" style="font-size:70%;">previous</span> <span id="lstnumberx45.18" style="font-size:70%;">loop</span> <span id="lstnumberx45.20" style="font-size:70%;">(</span><span id="lstnumberx45.21" style="font-size:70%;">NNN</span> <span id="lstnumberx45.22" style="font-size:70%;">-1)**</span> <span id="lstnumberx45.24" style="font-size:70%;">produced</span> <span id="lstnumberx45.26" style="font-size:70%;">--</span> <span id="lstnumberx45.28" style="font-size:70%;">this</span> <span id="lstnumberx45.30" style="font-size:70%;">is</span> <span id="lstnumberx45.32" style="font-size:70%;">the</span> <span id="lstnumberx45.34" style="font-size:70%;">workspace</span> <span id="lstnumberx45.36" style="font-size:70%;">that</span> <span id="lstnumberx45.38" style="font-size:70%;">was</span> <span id="lstnumberx45.40" style="font-size:70%;">just</span> <span id="lstnumberx45.42" style="font-size:70%;">evaluated</span> <span id="lstnumberx45.44" style="font-size:70%;">this</span> <span id="lstnumberx45.46" style="font-size:70%;">loop</span><span id="lstnumberx45.47" style="font-size:70%;">.</span><span id="lstnumberx45.49" style="font-size:70%;">The</span> <span id="lstnumberx45.51" style="font-size:70%;">benchmark</span><span id="lstnumberx45.52" style="font-size:70%;">,</span><span id="lstnumberx45.54" style="font-size:70%;">analysis</span><span id="lstnumberx45.55" style="font-size:70%;">,</span><span id="lstnumberx45.57" style="font-size:70%;">and</span> <span id="lstnumberx45.59" style="font-size:70%;">change_evaluation</span> <span id="lstnumberx45.61" style="font-size:70%;">inside</span> <span id="lstnumberx45.63" style="font-size:70%;">`</span> <span id="lstnumberx45.64" style="font-size:70%;">input</span> <span id="lstnumberx45.65" style="font-size:70%;">/`</span> <span id="lstnumberx45.67" style="font-size:70%;">all</span> <span id="lstnumberx45.69" style="font-size:70%;">describe</span> <span id="lstnumberx45.71" style="font-size:70%;">the</span> <span id="lstnumberx45.73" style="font-size:70%;">**</span> <span id="lstnumberx45.74" style="font-size:70%;">previous</span> <span id="lstnumberx45.76" style="font-size:70%;">loop</span> <span id="lstnumberx45.77" style="font-size:70%;">'</span> <span id="lstnumberx45.78" style="font-size:70%;">s</span> <span id="lstnumberx45.79" style="font-size:70%;">**</span> <span id="lstnumberx45.81" style="font-size:70%;">changes</span><span id="lstnumberx45.82" style="font-size:70%;">,</span><span id="lstnumberx45.84" style="font-size:70%;">not</span> <span id="lstnumberx45.86" style="font-size:70%;">yours</span><span id="lstnumberx45.87" style="font-size:70%;">.</span></span> <span id="lstnumberx46"><span id="lstnumberx46.1" style="font-size:70%;">&gt;</span> <span id="lstnumberx46.3" style="font-size:70%;">-</span> <span id="lstnumberx46.5" style="font-size:70%;">`</span> <span id="lstnumberx46.6" style="font-size:70%;">evolve</span> <span id="lstnumberx46.7" style="font-size:70%;">/`</span> <span id="lstnumberx46.9" style="font-size:70%;">holds</span> <span id="lstnumberx46.11" style="font-size:70%;">what</span> <span id="lstnumberx46.13" style="font-size:70%;">**</span> <span id="lstnumberx46.14" style="font-size:70%;">this</span> <span id="lstnumberx46.16" style="font-size:70%;">loop</span> <span id="lstnumberx46.18" style="font-size:70%;">(</span><span id="lstnumberx46.19" style="font-size:70%;">NNN</span><span id="lstnumberx46.20" style="font-size:70%;">)**</span> <span id="lstnumberx46.22" style="font-size:70%;">will</span> <span id="lstnumberx46.24" style="font-size:70%;">produce</span> <span id="lstnumberx46.26" style="font-size:70%;">--</span> <span id="lstnumberx46.28" style="font-size:70%;">your</span> <span id="lstnumberx46.30" style="font-size:70%;">new</span> <span id="lstnumberx46.32" style="font-size:70%;">changes</span><span id="lstnumberx46.33" style="font-size:70%;">,</span><span id="lstnumberx46.35" style="font-size:70%;">which</span> <span id="lstnumberx46.37" style="font-size:70%;">the</span> <span id="lstnumberx46.39" style="font-size:70%;">next</span> <span id="lstnumberx46.41" style="font-size:70%;">loop</span> <span id="lstnumberx46.43" style="font-size:70%;">(</span><span id="lstnumberx46.44" style="font-size:70%;">NNN</span> <span id="lstnumberx46.45" style="font-size:70%;">+1)</span> <span id="lstnumberx46.47" style="font-size:70%;">will</span> <span id="lstnumberx46.49" style="font-size:70%;">evaluate</span><span id="lstnumberx46.50" style="font-size:70%;">.</span></span> <span id="lstnumberx47"><span id="lstnumberx47.1" style="font-size:70%;">&gt;</span> </span><span id="lstnumberx48"><span id="lstnumberx48.1" style="font-size:70%;">&gt;</span> <span id="lstnumberx48.3" style="font-size:70%;">Concretely</span><span id="lstnumberx48.4" style="font-size:70%;">:</span><span id="lstnumberx48.6" style="font-size:70%;">when</span> <span id="lstnumberx48.8" style="font-size:70%;">your</span> <span id="lstnumberx48.10" style="font-size:70%;">query</span> <span id="lstnumberx48.12" style="font-size:70%;">says</span> <span id="lstnumberx48.14" style="font-size:70%;">"</span> <span id="lstnumberx48.15" style="font-size:70%;">Iteration</span> <span id="lstnumberx48.17" style="font-size:70%;">{{</span> <span id="lstnumberx48.19" style="font-size:70%;">iteration</span> <span id="lstnumberx48.21" style="font-size:70%;">}}</span> <span id="lstnumberx48.23" style="font-size:70%;">evaluation</span> <span id="lstnumberx48.25" style="font-size:70%;">completed</span> <span id="lstnumberx48.26" style="font-size:70%;">",</span><span id="lstnumberx48.28" style="font-size:70%;">it</span> <span id="lstnumberx48.30" style="font-size:70%;">means</span> <span id="lstnumberx48.32" style="font-size:70%;">the</span> <span id="lstnumberx48.34" style="font-size:70%;">eval</span> <span id="lstnumberx48.36" style="font-size:70%;">of</span> <span id="lstnumberx48.38" style="font-size:70%;">**</span> <span id="lstnumberx48.39" style="font-size:70%;">iteration</span> <span id="lstnumberx48.41" style="font-size:70%;">{{</span> <span id="lstnumberx48.43" style="font-size:70%;">iteration</span> <span id="lstnumberx48.45" style="font-size:70%;">-</span> <span id="lstnumberx48.47" style="font-size:70%;">1</span> <span id="lstnumberx48.49" style="font-size:70%;">}}'</span> <span id="lstnumberx48.50" style="font-size:70%;">s</span> <span id="lstnumberx48.52" style="font-size:70%;">changes</span> <span id="lstnumberx48.53" style="font-size:70%;">**</span> <span id="lstnumberx48.55" style="font-size:70%;">is</span> <span id="lstnumberx48.57" style="font-size:70%;">done</span> <span id="lstnumberx48.59" style="font-size:70%;">(</span><span id="lstnumberx48.60" style="font-size:70%;">baseline</span> <span id="lstnumberx48.62" style="font-size:70%;">if</span> <span id="lstnumberx48.64" style="font-size:70%;">`{{</span> <span id="lstnumberx48.66" style="font-size:70%;">iteration</span> <span id="lstnumberx48.68" style="font-size:70%;">}}`</span> <span id="lstnumberx48.70" style="font-size:70%;">=</span> <span id="lstnumberx48.72" style="font-size:70%;">1).</span><span id="lstnumberx48.74" style="font-size:70%;">You</span> <span id="lstnumberx48.76" style="font-size:70%;">are</span> <span id="lstnumberx48.78" style="font-size:70%;">now</span> <span id="lstnumberx48.80" style="font-size:70%;">making</span> <span id="lstnumberx48.82" style="font-size:70%;">changes</span> <span id="lstnumberx48.84" style="font-size:70%;">that</span> <span id="lstnumberx48.86" style="font-size:70%;">will</span> <span id="lstnumberx48.88" style="font-size:70%;">be</span> <span id="lstnumberx48.90" style="font-size:70%;">labeled</span> <span id="lstnumberx48.92" style="font-size:70%;">iteration</span> <span id="lstnumberx48.94" style="font-size:70%;">`{{</span> <span id="lstnumberx48.96" style="font-size:70%;">iteration</span> <span id="lstnumberx48.98" style="font-size:70%;">}}`</span> <span id="lstnumberx48.100" style="font-size:70%;">and</span> <span id="lstnumberx48.102" style="font-size:70%;">evaluated</span> <span id="lstnumberx48.104" style="font-size:70%;">next</span> <span id="lstnumberx48.106" style="font-size:70%;">loop</span><span id="lstnumberx48.107" style="font-size:70%;">.</span></span> <span id="lstnumberx50"><span id="lstnumberx50.1" style="font-size:70%;">```</span> </span><span id="lstnumberx51"><span id="lstnumberx51.1" style="font-size:70%;">./</span> <span id="lstnumberx51.3" style="font-size:70%;">#</span> <span id="lstnumberx51.5" style="font-size:70%;">work_dir</span> <span id="lstnumberx51.7" style="font-size:70%;">=</span> <span id="lstnumberx51.9" style="font-size:70%;">experiment</span> <span id="lstnumberx51.11" style="font-size:70%;">root</span> </span><span id="lstnumberx52"><span id="lstnumberx52.1" style="font-size:70%;">|--</span> <span id="lstnumberx52.3" style="font-size:70%;">{{</span> <span id="lstnumberx52.5" style="font-size:70%;">ws</span> <span id="lstnumberx52.7" style="font-size:70%;">}}/</span> <span id="lstnumberx52.9" style="font-size:70%;">#</span> <span id="lstnumberx52.11" style="font-size:70%;">*</span> <span id="lstnumberx52.13" style="font-size:70%;">MODIFY</span> <span id="lstnumberx52.15" style="font-size:70%;">these</span> <span id="lstnumberx52.17" style="font-size:70%;">files</span> </span><span id="lstnumberx53"><span id="lstnumberx53.1" style="font-size:70%;">|</span> <span id="lstnumberx53.3" style="font-size:70%;">|--</span> <span id="lstnumberx53.5" style="font-size:70%;">code_agent</span><span id="lstnumberx53.6" style="font-size:70%;">.</span><span id="lstnumberx53.7" style="font-size:70%;">yaml</span> <span id="lstnumberx53.9" style="font-size:70%;">#</span> <span id="lstnumberx53.11" style="font-size:70%;">Agent</span> <span id="lstnumberx53.13" style="font-size:70%;">config</span> <span id="lstnumberx53.15" style="font-size:70%;">(</span><span id="lstnumberx53.16" style="font-size:70%;">tools</span><span id="lstnumberx53.17" style="font-size:70%;">,</span><span id="lstnumberx53.19" style="font-size:70%;">middleware</span><span id="lstnumberx53.20" style="font-size:70%;">,</span><span id="lstnumberx53.22" style="font-size:70%;">params</span><span id="lstnumberx53.23" style="font-size:70%;">,</span><span id="lstnumberx53.25" style="font-size:70%;">sub</span> <span id="lstnumberx53.26" style="font-size:70%;">-</span> <span id="lstnumberx53.27" style="font-size:70%;">agents</span><span id="lstnumberx53.28" style="font-size:70%;">)</span> </span><span id="lstnumberx54"><span id="lstnumberx54.1" style="font-size:70%;">|</span> <span id="lstnumberx54.3" style="font-size:70%;">|--</span> <span id="lstnumberx54.5" style="font-size:70%;">systemprompt</span><span id="lstnumberx54.6" style="font-size:70%;">.</span><span id="lstnumberx54.7" style="font-size:70%;">md</span> <span id="lstnumberx54.9" style="font-size:70%;">#</span> <span id="lstnumberx54.11" style="font-size:70%;">System</span> <span id="lstnumberx54.13" style="font-size:70%;">prompt</span> <span id="lstnumberx54.15" style="font-size:70%;">(</span><span id="lstnumberx54.16" style="font-size:70%;">Jinja</span> <span id="lstnumberx54.18" style="font-size:70%;">template</span><span id="lstnumberx54.19" style="font-size:70%;">)</span> </span><span id="lstnumberx55"><span id="lstnumberx55.1" style="font-size:70%;">|</span> <span id="lstnumberx55.3" style="font-size:70%;">|--</span> <span id="lstnumberx55.5" style="font-size:70%;">LongTermMEMORY</span><span id="lstnumberx55.6" style="font-size:70%;">.</span><span id="lstnumberx55.7" style="font-size:70%;">md</span> <span id="lstnumberx55.9" style="font-size:70%;">#</span> <span id="lstnumberx55.11" style="font-size:70%;">Long</span> <span id="lstnumberx55.12" style="font-size:70%;">-</span> <span id="lstnumberx55.13" style="font-size:70%;">term</span> <span id="lstnumberx55.15" style="font-size:70%;">memory</span> <span id="lstnumberx55.17" style="font-size:70%;">(</span><span id="lstnumberx55.18" style="font-size:70%;">persistent</span> <span id="lstnumberx55.20" style="font-size:70%;">cross</span> <span id="lstnumberx55.21" style="font-size:70%;">-</span> <span id="lstnumberx55.22" style="font-size:70%;">session</span> <span id="lstnumberx55.24" style="font-size:70%;">knowledge</span><span id="lstnumberx55.25" style="font-size:70%;">,</span><span id="lstnumberx55.27" style="font-size:70%;">MODIFIABLE</span><span id="lstnumberx55.28" style="font-size:70%;">)</span> </span><span id="lstnumberx56"><span id="lstnumberx56.1" style="font-size:70%;">|</span> <span id="lstnumberx56.3" style="font-size:70%;">|--</span> <span id="lstnumberx56.5" style="font-size:70%;">ShortTermMEMORY</span><span id="lstnumberx56.6" style="font-size:70%;">.</span><span id="lstnumberx56.7" style="font-size:70%;">md</span> <span id="lstnumberx56.9" style="font-size:70%;">#</span> <span id="lstnumberx56.11" style="font-size:70%;">Short</span> <span id="lstnumberx56.12" style="font-size:70%;">-</span> <span id="lstnumberx56.13" style="font-size:70%;">term</span> <span id="lstnumberx56.15" style="font-size:70%;">memory</span> <span id="lstnumberx56.17" style="font-size:70%;">(</span><span id="lstnumberx56.18" style="font-size:70%;">managed</span> <span id="lstnumberx56.20" style="font-size:70%;">by</span> <span id="lstnumberx56.22" style="font-size:70%;">code</span> <span id="lstnumberx56.24" style="font-size:70%;">agent</span> <span id="lstnumberx56.26" style="font-size:70%;">at</span> <span id="lstnumberx56.28" style="font-size:70%;">runtime</span><span id="lstnumberx56.29" style="font-size:70%;">,</span><span id="lstnumberx56.31" style="font-size:70%;">DO</span> <span id="lstnumberx56.33" style="font-size:70%;">NOT</span> <span id="lstnumberx56.35" style="font-size:70%;">MODIFY</span><span id="lstnumberx56.36" style="font-size:70%;">)</span> </span><span id="lstnumberx57"><span id="lstnumberx57.1" style="font-size:70%;">|</span> <span id="lstnumberx57.3" style="font-size:70%;">|--</span> <span id="lstnumberx57.5" style="font-size:70%;">tool_descriptions</span> <span id="lstnumberx57.6" style="font-size:70%;">/</span> <span id="lstnumberx57.8" style="font-size:70%;">#</span> <span id="lstnumberx57.10" style="font-size:70%;">Tool</span> <span id="lstnumberx57.12" style="font-size:70%;">YAML</span> <span id="lstnumberx57.14" style="font-size:70%;">definitions</span> </span><span id="lstnumberx58"><span id="lstnumberx58.1" style="font-size:70%;">|</span> <span id="lstnumberx58.3" style="font-size:70%;">|--</span> <span id="lstnumberx58.5" style="font-size:70%;">tools</span> <span id="lstnumberx58.6" style="font-size:70%;">/</span> <span id="lstnumberx58.8" style="font-size:70%;">#</span> <span id="lstnumberx58.10" style="font-size:70%;">Tool</span> <span id="lstnumberx58.12" style="font-size:70%;">Python</span> <span id="lstnumberx58.14" style="font-size:70%;">implementations</span> </span><span id="lstnumberx59"><span id="lstnumberx59.1" style="font-size:70%;">|</span> <span id="lstnumberx59.3" style="font-size:70%;">|--</span> <span id="lstnumberx59.5" style="font-size:70%;">middleware</span> <span id="lstnumberx59.6" style="font-size:70%;">/</span> <span id="lstnumberx59.8" style="font-size:70%;">#</span> <span id="lstnumberx59.10" style="font-size:70%;">Middleware</span> <span id="lstnumberx59.12" style="font-size:70%;">Python</span> <span id="lstnumberx59.14" style="font-size:70%;">implementations</span> </span><span id="lstnumberx60"><span id="lstnumberx60.1" style="font-size:70%;">|</span> <span id="lstnumberx60.3" style="font-size:70%;">|--</span> <span id="lstnumberx60.5" style="font-size:70%;">skills</span> <span id="lstnumberx60.6" style="font-size:70%;">/</span> <span id="lstnumberx60.8" style="font-size:70%;">#</span> <span id="lstnumberx60.10" style="font-size:70%;">Skill</span> <span id="lstnumberx60.12" style="font-size:70%;">packages</span> </span><span id="lstnumberx61"><span id="lstnumberx61.1" style="font-size:70%;">|</span> <span id="lstnumberx61.3" style="font-size:70%;">`--</span> <span id="lstnumberx61.5" style="font-size:70%;">sub_agents</span> <span id="lstnumberx61.6" style="font-size:70%;">/</span> <span id="lstnumberx61.8" style="font-size:70%;">#</span> <span id="lstnumberx61.10" style="font-size:70%;">Sub</span> <span id="lstnumberx61.11" style="font-size:70%;">-</span> <span id="lstnumberx61.12" style="font-size:70%;">agent</span> <span id="lstnumberx61.14" style="font-size:70%;">configs</span> <span id="lstnumberx61.16" style="font-size:70%;">(</span><span id="lstnumberx61.17" style="font-size:70%;">optional</span><span id="lstnumberx61.18" style="font-size:70%;">,</span><span id="lstnumberx61.20" style="font-size:70%;">you</span> <span id="lstnumberx61.22" style="font-size:70%;">may</span> <span id="lstnumberx61.24" style="font-size:70%;">create</span><span id="lstnumberx61.25" style="font-size:70%;">)</span> </span><span id="lstnumberx62"><span id="lstnumberx62.1" style="font-size:70%;">|</span> </span><span id="lstnumberx63"><span id="lstnumberx63.1" style="font-size:70%;">|--</span> <span id="lstnumberx63.3" style="font-size:70%;">runs</span> <span id="lstnumberx63.4" style="font-size:70%;">/</span> <span id="lstnumberx63.6" style="font-size:70%;">#</span> <span id="lstnumberx63.8" style="font-size:70%;">*</span> <span id="lstnumberx63.10" style="font-size:70%;">READ</span> <span id="lstnumberx63.12" style="font-size:70%;">ONLY</span> </span><span id="lstnumberx64"><span id="lstnumberx64.1" style="font-size:70%;">|</span> <span id="lstnumberx64.3" style="font-size:70%;">`--</span> <span id="lstnumberx64.5" style="font-size:70%;">iteration_NNN</span> <span id="lstnumberx64.6" style="font-size:70%;">/</span> </span><span id="lstnumberx65"><span id="lstnumberx65.1" style="font-size:70%;">|</span> <span id="lstnumberx65.3" style="font-size:70%;">|--</span> <span id="lstnumberx65.5" style="font-size:70%;">input</span> <span id="lstnumberx65.6" style="font-size:70%;">/</span> <span id="lstnumberx65.8" style="font-size:70%;">#</span> <span id="lstnumberx65.10" style="font-size:70%;">Everything</span> <span id="lstnumberx65.12" style="font-size:70%;">this</span> <span id="lstnumberx65.14" style="font-size:70%;">iteration</span> <span id="lstnumberx65.16" style="font-size:70%;">starts</span> <span id="lstnumberx65.18" style="font-size:70%;">with</span> </span><span id="lstnumberx66"><span id="lstnumberx66.1" style="font-size:70%;">|</span> <span id="lstnumberx66.3" style="font-size:70%;">|</span> <span id="lstnumberx66.5" style="font-size:70%;">|--</span> <span id="lstnumberx66.7" style="font-size:70%;">workspace</span> <span id="lstnumberx66.8" style="font-size:70%;">/</span> <span id="lstnumberx66.10" style="font-size:70%;">#</span> <span id="lstnumberx66.12" style="font-size:70%;">Workspace</span> <span id="lstnumberx66.14" style="font-size:70%;">being</span> <span id="lstnumberx66.16" style="font-size:70%;">evaluated</span> <span id="lstnumberx66.18" style="font-size:70%;">this</span> <span id="lstnumberx66.20" style="font-size:70%;">loop</span> </span><span id="lstnumberx67"><span id="lstnumberx67.1" style="font-size:70%;">|</span> <span id="lstnumberx67.3" style="font-size:70%;">|</span> <span id="lstnumberx67.5" style="font-size:70%;">|--</span> <span id="lstnumberx67.7" style="font-size:70%;">benchmark</span> <span id="lstnumberx67.8" style="font-size:70%;">/</span> <span id="lstnumberx67.10" style="font-size:70%;">#</span> <span id="lstnumberx67.12" style="font-size:70%;">Eval</span> <span id="lstnumberx67.14" style="font-size:70%;">results</span> <span id="lstnumberx67.16" style="font-size:70%;">for</span> <span id="lstnumberx67.18" style="font-size:70%;">the</span> <span id="lstnumberx67.20" style="font-size:70%;">workspace</span> <span id="lstnumberx67.22" style="font-size:70%;">above</span> </span><span id="lstnumberx68"><span id="lstnumberx68.1" style="font-size:70%;">|</span> <span id="lstnumberx68.3" style="font-size:70%;">|</span> <span id="lstnumberx68.5" style="font-size:70%;">|</span> <span id="lstnumberx68.7" style="font-size:70%;">`--</span> <span id="lstnumberx68.9" style="font-size:70%;">{</span> <span id="lstnumberx68.10" style="font-size:70%;">timestamp</span> <span id="lstnumberx68.11" style="font-size:70%;">}/</span> </span><span id="lstnumberx69"><span id="lstnumberx69.1" style="font-size:70%;">|</span> <span id="lstnumberx69.3" style="font-size:70%;">|</span> <span id="lstnumberx69.5" style="font-size:70%;">|</span> <span id="lstnumberx69.7" style="font-size:70%;">|--</span> <span id="lstnumberx69.9" style="font-size:70%;">result</span><span id="lstnumberx69.10" style="font-size:70%;">.</span><span id="lstnumberx69.11" style="font-size:70%;">json</span> </span><span id="lstnumberx70"><span id="lstnumberx70.1" style="font-size:70%;">|</span> <span id="lstnumberx70.3" style="font-size:70%;">|</span> <span id="lstnumberx70.5" style="font-size:70%;">|</span> <span id="lstnumberx70.7" style="font-size:70%;">`--</span> <span id="lstnumberx70.9" style="font-size:70%;">{</span> <span id="lstnumberx70.10" style="font-size:70%;">task_name</span> <span id="lstnumberx70.11" style="font-size:70%;">}</span> <span id="lstnumberx70.12" style="font-size:70%;">__</span> <span id="lstnumberx70.13" style="font-size:70%;">{</span> <span id="lstnumberx70.14" style="font-size:70%;">id</span> <span id="lstnumberx70.15" style="font-size:70%;">}/</span> </span><span id="lstnumberx71"><span id="lstnumberx71.1" style="font-size:70%;">|</span> <span id="lstnumberx71.3" style="font-size:70%;">|</span> <span id="lstnumberx71.5" style="font-size:70%;">|</span> <span id="lstnumberx71.7" style="font-size:70%;">|--</span> <span id="lstnumberx71.9" style="font-size:70%;">agent</span> <span id="lstnumberx71.10" style="font-size:70%;">/</span> <span id="lstnumberx71.11" style="font-size:70%;">nexau</span><span id="lstnumberx71.12" style="font-size:70%;">.</span><span id="lstnumberx71.13" style="font-size:70%;">txt</span> </span><span id="lstnumberx72"><span id="lstnumberx72.1" style="font-size:70%;">|</span> <span id="lstnumberx72.3" style="font-size:70%;">|</span> <span id="lstnumberx72.5" style="font-size:70%;">|</span> <span id="lstnumberx72.7" style="font-size:70%;">|--</span> <span id="lstnumberx72.9" style="font-size:70%;">agent</span> <span id="lstnumberx72.10" style="font-size:70%;">/</span> <span id="lstnumberx72.11" style="font-size:70%;">nexau_in_memory_tracer</span><span id="lstnumberx72.12" style="font-size:70%;">.</span><span id="lstnumberx72.13" style="font-size:70%;">cleaned</span><span id="lstnumberx72.14" style="font-size:70%;">.</span><span id="lstnumberx72.15" style="font-size:70%;">json</span> </span><span id="lstnumberx73"><span id="lstnumberx73.1" style="font-size:70%;">|</span> <span id="lstnumberx73.3" style="font-size:70%;">|</span> <span id="lstnumberx73.5" style="font-size:70%;">|</span> <span id="lstnumberx73.7" style="font-size:70%;">`--</span> <span id="lstnumberx73.9" style="font-size:70%;">verifier</span> <span id="lstnumberx73.10" style="font-size:70%;">/</span> <span id="lstnumberx73.11" style="font-size:70%;">reward</span><span id="lstnumberx73.12" style="font-size:70%;">.</span><span id="lstnumberx73.13" style="font-size:70%;">txt</span> </span><span id="lstnumberx74"><span id="lstnumberx74.1" style="font-size:70%;">|</span> <span id="lstnumberx74.3" style="font-size:70%;">|</span> <span id="lstnumberx74.5" style="font-size:70%;">|--</span> <span id="lstnumberx74.7" style="font-size:70%;">analysis</span> <span id="lstnumberx74.8" style="font-size:70%;">/</span> <span id="lstnumberx74.10" style="font-size:70%;">#</span> <span id="lstnumberx74.12" style="font-size:70%;">**</span> <span id="lstnumberx74.14" style="font-size:70%;">Pre</span> <span id="lstnumberx74.15" style="font-size:70%;">-</span> <span id="lstnumberx74.16" style="font-size:70%;">built</span> <span id="lstnumberx74.18" style="font-size:70%;">failure</span> <span id="lstnumberx74.19" style="font-size:70%;">/</span> <span id="lstnumberx74.20" style="font-size:70%;">success</span> <span id="lstnumberx74.22" style="font-size:70%;">analysis</span> <span id="lstnumberx74.24" style="font-size:70%;">(</span><span id="lstnumberx74.25" style="font-size:70%;">READ</span> <span id="lstnumberx74.27" style="font-size:70%;">THIS</span> <span id="lstnumberx74.29" style="font-size:70%;">FIRST</span><span id="lstnumberx74.30" style="font-size:70%;">)</span> </span><span id="lstnumberx75"><span id="lstnumberx75.1" style="font-size:70%;">|</span> <span id="lstnumberx75.3" style="font-size:70%;">|</span> <span id="lstnumberx75.5" style="font-size:70%;">|</span> <span id="lstnumberx75.7" style="font-size:70%;">|--</span> <span id="lstnumberx75.9" style="font-size:70%;">overview</span><span id="lstnumberx75.10" style="font-size:70%;">.</span><span id="lstnumberx75.11" style="font-size:70%;">md</span> </span><span id="lstnumberx76"><span id="lstnumberx76.1" style="font-size:70%;">|</span> <span id="lstnumberx76.3" style="font-size:70%;">|</span> <span id="lstnumberx76.5" style="font-size:70%;">|</span> <span id="lstnumberx76.7" style="font-size:70%;">`--</span> <span id="lstnumberx76.9" style="font-size:70%;">detail</span> <span id="lstnumberx76.10" style="font-size:70%;">/{</span> <span id="lstnumberx76.11" style="font-size:70%;">task_name</span> <span id="lstnumberx76.12" style="font-size:70%;">}.</span><span id="lstnumberx76.13" style="font-size:70%;">md</span> </span><span id="lstnumberx77"><span id="lstnumberx77.1" style="font-size:70%;">|</span> <span id="lstnumberx77.3" style="font-size:70%;">|</span> <span id="lstnumberx77.5" style="font-size:70%;">|--</span> <span id="lstnumberx77.7" style="font-size:70%;">variant_selection</span><span id="lstnumberx77.8" style="font-size:70%;">.</span><span id="lstnumberx77.9" style="font-size:70%;">json</span> </span><span id="lstnumberx78"><span id="lstnumberx78.1" style="font-size:70%;">|</span> <span id="lstnumberx78.3" style="font-size:70%;">|</span> <span id="lstnumberx78.5" style="font-size:70%;">`--</span> <span id="lstnumberx78.7" style="font-size:70%;">change_evaluation</span><span id="lstnumberx78.8" style="font-size:70%;">.</span><span id="lstnumberx78.9" style="font-size:70%;">json</span> </span><span id="lstnumberx79"><span id="lstnumberx79.1" style="font-size:70%;">|</span> <span id="lstnumberx79.3" style="font-size:70%;">`--</span> <span id="lstnumberx79.5" style="font-size:70%;">evolve</span> <span id="lstnumberx79.6" style="font-size:70%;">/</span> <span id="lstnumberx79.8" style="font-size:70%;">#</span> <span id="lstnumberx79.10" style="font-size:70%;">YOUR</span> <span id="lstnumberx79.12" style="font-size:70%;">outputs</span> <span id="lstnumberx79.14" style="font-size:70%;">this</span> <span id="lstnumberx79.16" style="font-size:70%;">loop</span> </span><span id="lstnumberx80"><span id="lstnumberx80.1" style="font-size:70%;">|</span> <span id="lstnumberx80.3" style="font-size:70%;">|--</span> <span id="lstnumberx80.5" style="font-size:70%;">evolve_summary</span><span id="lstnumberx80.6" style="font-size:70%;">.</span><span id="lstnumberx80.7" style="font-size:70%;">md</span> </span><span id="lstnumberx81"><span id="lstnumberx81.1" style="font-size:70%;">|</span> <span id="lstnumberx81.3" style="font-size:70%;">|--</span> <span id="lstnumberx81.5" style="font-size:70%;">change_manifest</span><span id="lstnumberx81.6" style="font-size:70%;">.</span><span id="lstnumberx81.7" style="font-size:70%;">json</span> </span><span id="lstnumberx82"><span id="lstnumberx82.1" style="font-size:70%;">|</span> <span id="lstnumberx82.3" style="font-size:70%;">`--</span> <span id="lstnumberx82.5" style="font-size:70%;">variant_N</span> <span id="lstnumberx82.6" style="font-size:70%;">/</span> </span><span id="lstnumberx83"><span id="lstnumberx83.1" style="font-size:70%;">|</span> <span id="lstnumberx83.3" style="font-size:70%;">|--</span> <span id="lstnumberx83.5" style="font-size:70%;">workspace</span> <span id="lstnumberx83.6" style="font-size:70%;">/</span> </span><span id="lstnumberx84"><span id="lstnumberx84.1" style="font-size:70%;">|</span> <span id="lstnumberx84.3" style="font-size:70%;">`--</span> <span id="lstnumberx84.5" style="font-size:70%;">evolve_trace</span><span id="lstnumberx84.6" style="font-size:70%;">.</span><span id="lstnumberx84.7" style="font-size:70%;">json</span> </span><span id="lstnumberx85"><span id="lstnumberx85.1" style="font-size:70%;">|</span> </span><span id="lstnumberx86"><span id="lstnumberx86.1" style="font-size:70%;">|--</span> <span id="lstnumberx86.3" style="font-size:70%;">evolution_history</span><span id="lstnumberx86.4" style="font-size:70%;">.</span><span id="lstnumberx86.5" style="font-size:70%;">md</span> <span id="lstnumberx86.7" style="font-size:70%;">#</span> <span id="lstnumberx86.9" style="font-size:70%;">Cumulative</span> <span id="lstnumberx86.11" style="font-size:70%;">history</span> <span id="lstnumberx86.13" style="font-size:70%;">of</span> <span id="lstnumberx86.15" style="font-size:70%;">all</span> <span id="lstnumberx86.17" style="font-size:70%;">iterations</span> <span id="lstnumberx86.19" style="font-size:70%;">(</span><span id="lstnumberx86.20" style="font-size:70%;">READ</span><span id="lstnumberx86.21" style="font-size:70%;">)</span> </span><span id="lstnumberx87"><span id="lstnumberx87.1" style="font-size:70%;">`--</span> <span id="lstnumberx87.3" style="font-size:70%;">config_snapshot</span><span id="lstnumberx87.4" style="font-size:70%;">.</span><span id="lstnumberx87.5" style="font-size:70%;">yaml</span> <span id="lstnumberx87.7" style="font-size:70%;">#</span> <span id="lstnumberx87.9" style="font-size:70%;">Initial</span> <span id="lstnumberx87.11" style="font-size:70%;">config</span> <span id="lstnumberx87.13" style="font-size:70%;">(</span><span id="lstnumberx87.14" style="font-size:70%;">READ</span> <span id="lstnumberx87.16" style="font-size:70%;">ONLY</span><span id="lstnumberx87.17" style="font-size:70%;">)</span> </span><span id="lstnumberx88"><span id="lstnumberx88.1" style="font-size:70%;">```</span> </span><span id="lstnumberx91"><span id="lstnumberx91.1" style="font-size:70%;">#</span> <span id="lstnumberx91.3" style="font-size:70%;">Components</span> </span><span id="lstnumberx93"><span id="lstnumberx93.1" style="font-size:70%;">##</span> <span id="lstnumberx93.3" style="font-size:70%;">Available</span> <span id="lstnumberx93.5" style="font-size:70%;">Component</span> <span id="lstnumberx93.7" style="font-size:70%;">Types</span> </span><span id="lstnumberx95"><span id="lstnumberx95.1" style="font-size:70%;">|</span> <span id="lstnumberx95.3" style="font-size:70%;">Component</span> <span id="lstnumberx95.5" style="font-size:70%;">|</span> <span id="lstnumberx95.7" style="font-size:70%;">Files</span> <span id="lstnumberx95.9" style="font-size:70%;">|</span> <span id="lstnumberx95.11" style="font-size:70%;">Characteristics</span> <span id="lstnumberx95.13" style="font-size:70%;">|</span> <span id="lstnumberx95.15" style="font-size:70%;">When</span> <span id="lstnumberx95.17" style="font-size:70%;">to</span> <span id="lstnumberx95.19" style="font-size:70%;">use</span> <span id="lstnumberx95.21" style="font-size:70%;">|</span> </span><span id="lstnumberx96"><span id="lstnumberx96.1" style="font-size:70%;">|-----------|-------|----------------|-------------|</span> </span><span id="lstnumberx97"><span id="lstnumberx97.1" style="font-size:70%;">|</span> <span id="lstnumberx97.3" style="font-size:70%;">**</span> <span id="lstnumberx97.4" style="font-size:70%;">System</span> <span id="lstnumberx97.6" style="font-size:70%;">Prompt</span> <span id="lstnumberx97.7" style="font-size:70%;">**</span> <span id="lstnumberx97.9" style="font-size:70%;">|</span> <span id="lstnumberx97.11" style="font-size:70%;">`</span> <span id="lstnumberx97.12" style="font-size:70%;">workspace</span> <span id="lstnumberx97.13" style="font-size:70%;">/</span> <span id="lstnumberx97.14" style="font-size:70%;">systemprompt</span><span id="lstnumberx97.15" style="font-size:70%;">.</span><span id="lstnumberx97.16" style="font-size:70%;">md</span> <span id="lstnumberx97.17" style="font-size:70%;">`</span> <span id="lstnumberx97.19" style="font-size:70%;">|</span> <span id="lstnumberx97.21" style="font-size:70%;">Advisory</span> <span id="lstnumberx97.23" style="font-size:70%;">--</span> <span id="lstnumberx97.25" style="font-size:70%;">applies</span> <span id="lstnumberx97.27" style="font-size:70%;">to</span> <span id="lstnumberx97.29" style="font-size:70%;">all</span> <span id="lstnumberx97.31" style="font-size:70%;">tasks</span> <span id="lstnumberx97.33" style="font-size:70%;">|</span> <span id="lstnumberx97.35" style="font-size:70%;">Behavioral</span> <span id="lstnumberx97.37" style="font-size:70%;">rules</span><span id="lstnumberx97.38" style="font-size:70%;">,</span><span id="lstnumberx97.40" style="font-size:70%;">workflow</span> <span id="lstnumberx97.42" style="font-size:70%;">guidance</span> <span id="lstnumberx97.44" style="font-size:70%;">|</span> </span><span id="lstnumberx98"><span id="lstnumberx98.1" style="font-size:70%;">|</span> <span id="lstnumberx98.3" style="font-size:70%;">**</span> <span id="lstnumberx98.4" style="font-size:70%;">Tool</span> <span id="lstnumberx98.6" style="font-size:70%;">Description</span> <span id="lstnumberx98.7" style="font-size:70%;">**</span> <span id="lstnumberx98.9" style="font-size:70%;">|</span> <span id="lstnumberx98.11" style="font-size:70%;">`</span> <span id="lstnumberx98.12" style="font-size:70%;">workspace</span> <span id="lstnumberx98.13" style="font-size:70%;">/</span> <span id="lstnumberx98.14" style="font-size:70%;">tool_descriptions</span> <span id="lstnumberx98.15" style="font-size:70%;">/*.</span><span id="lstnumberx98.16" style="font-size:70%;">tool</span><span id="lstnumberx98.17" style="font-size:70%;">.</span><span id="lstnumberx98.18" style="font-size:70%;">yaml</span> <span id="lstnumberx98.19" style="font-size:70%;">`</span> <span id="lstnumberx98.21" style="font-size:70%;">|</span> <span id="lstnumberx98.23" style="font-size:70%;">Co</span> <span id="lstnumberx98.24" style="font-size:70%;">-</span> <span id="lstnumberx98.25" style="font-size:70%;">located</span> <span id="lstnumberx98.27" style="font-size:70%;">with</span> <span id="lstnumberx98.29" style="font-size:70%;">tool</span> <span id="lstnumberx98.31" style="font-size:70%;">--</span> <span id="lstnumberx98.33" style="font-size:70%;">model</span> <span id="lstnumberx98.35" style="font-size:70%;">reads</span> <span id="lstnumberx98.37" style="font-size:70%;">when</span> <span id="lstnumberx98.39" style="font-size:70%;">calling</span> <span id="lstnumberx98.41" style="font-size:70%;">|</span> <span id="lstnumberx98.43" style="font-size:70%;">Clarify</span> <span id="lstnumberx98.45" style="font-size:70%;">tool</span> <span id="lstnumberx98.47" style="font-size:70%;">usage</span><span id="lstnumberx98.48" style="font-size:70%;">,</span><span id="lstnumberx98.50" style="font-size:70%;">add</span> <span id="lstnumberx98.52" style="font-size:70%;">examples</span><span id="lstnumberx98.53" style="font-size:70%;">,</span><span id="lstnumberx98.55" style="font-size:70%;">warn</span> <span id="lstnumberx98.57" style="font-size:70%;">about</span> <span id="lstnumberx98.59" style="font-size:70%;">pitfalls</span> <span id="lstnumberx98.61" style="font-size:70%;">|</span> </span><span id="lstnumberx99"><span id="lstnumberx99.1" style="font-size:70%;">|</span> <span id="lstnumberx99.3" style="font-size:70%;">**</span> <span id="lstnumberx99.4" style="font-size:70%;">Tool</span> <span id="lstnumberx99.6" style="font-size:70%;">Implementation</span> <span id="lstnumberx99.7" style="font-size:70%;">**</span> <span id="lstnumberx99.9" style="font-size:70%;">|</span> <span id="lstnumberx99.11" style="font-size:70%;">`</span> <span id="lstnumberx99.12" style="font-size:70%;">workspace</span> <span id="lstnumberx99.13" style="font-size:70%;">/</span> <span id="lstnumberx99.14" style="font-size:70%;">tools</span> <span id="lstnumberx99.15" style="font-size:70%;">/`</span> <span id="lstnumberx99.17" style="font-size:70%;">|</span> <span id="lstnumberx99.19" style="font-size:70%;">Controls</span> <span id="lstnumberx99.21" style="font-size:70%;">tool</span> <span id="lstnumberx99.23" style="font-size:70%;">behavior</span> <span id="lstnumberx99.25" style="font-size:70%;">directly</span> <span id="lstnumberx99.27" style="font-size:70%;">|</span> <span id="lstnumberx99.29" style="font-size:70%;">New</span> <span id="lstnumberx99.31" style="font-size:70%;">capabilities</span><span id="lstnumberx99.32" style="font-size:70%;">,</span><span id="lstnumberx99.34" style="font-size:70%;">smarter</span> <span id="lstnumberx99.36" style="font-size:70%;">error</span> <span id="lstnumberx99.38" style="font-size:70%;">handling</span><span id="lstnumberx99.39" style="font-size:70%;">,</span><span id="lstnumberx99.41" style="font-size:70%;">output</span> <span id="lstnumberx99.43" style="font-size:70%;">formatting</span> <span id="lstnumberx99.45" style="font-size:70%;">|</span> </span><span id="lstnumberx100"><span id="lstnumberx100.1" style="font-size:70%;">|</span> <span id="lstnumberx100.3" style="font-size:70%;">**</span> <span id="lstnumberx100.4" style="font-size:70%;">Middleware</span> <span id="lstnumberx100.5" style="font-size:70%;">**</span> <span id="lstnumberx100.7" style="font-size:70%;">|</span> <span id="lstnumberx100.9" style="font-size:70%;">`</span> <span id="lstnumberx100.10" style="font-size:70%;">workspace</span> <span id="lstnumberx100.11" style="font-size:70%;">/</span> <span id="lstnumberx100.12" style="font-size:70%;">middleware</span> <span id="lstnumberx100.13" style="font-size:70%;">/`</span> <span id="lstnumberx100.15" style="font-size:70%;">+</span> <span id="lstnumberx100.17" style="font-size:70%;">`</span> <span id="lstnumberx100.18" style="font-size:70%;">code_agent</span><span id="lstnumberx100.19" style="font-size:70%;">.</span><span id="lstnumberx100.20" style="font-size:70%;">yaml</span> <span id="lstnumberx100.21" style="font-size:70%;">`</span> <span id="lstnumberx100.23" style="font-size:70%;">|</span> <span id="lstnumberx100.25" style="font-size:70%;">Hooks</span> <span id="lstnumberx100.27" style="font-size:70%;">into</span> <span id="lstnumberx100.29" style="font-size:70%;">agent</span> <span id="lstnumberx100.31" style="font-size:70%;">loop</span> <span id="lstnumberx100.33" style="font-size:70%;">pipeline</span> <span id="lstnumberx100.35" style="font-size:70%;">|</span> <span id="lstnumberx100.37" style="font-size:70%;">Intercept</span> <span id="lstnumberx100.38" style="font-size:70%;">/</span> <span id="lstnumberx100.39" style="font-size:70%;">transform</span> <span id="lstnumberx100.41" style="font-size:70%;">at</span> <span id="lstnumberx100.43" style="font-size:70%;">execution</span> <span id="lstnumberx100.45" style="font-size:70%;">level</span> <span id="lstnumberx100.47" style="font-size:70%;">|</span> </span><span id="lstnumberx101"><span id="lstnumberx101.1" style="font-size:70%;">|</span> <span id="lstnumberx101.3" style="font-size:70%;">**</span> <span id="lstnumberx101.4" style="font-size:70%;">Skill</span> <span id="lstnumberx101.5" style="font-size:70%;">**</span> <span id="lstnumberx101.7" style="font-size:70%;">|</span> <span id="lstnumberx101.9" style="font-size:70%;">`</span> <span id="lstnumberx101.10" style="font-size:70%;">workspace</span> <span id="lstnumberx101.11" style="font-size:70%;">/</span> <span id="lstnumberx101.12" style="font-size:70%;">skills</span> <span id="lstnumberx101.13" style="font-size:70%;">/`</span> <span id="lstnumberx101.15" style="font-size:70%;">+</span> <span id="lstnumberx101.17" style="font-size:70%;">`</span> <span id="lstnumberx101.18" style="font-size:70%;">code_agent</span><span id="lstnumberx101.19" style="font-size:70%;">.</span><span id="lstnumberx101.20" style="font-size:70%;">yaml</span> <span id="lstnumberx101.21" style="font-size:70%;">`</span> <span id="lstnumberx101.23" style="font-size:70%;">|</span> <span id="lstnumberx101.25" style="font-size:70%;">On</span> <span id="lstnumberx101.26" style="font-size:70%;">-</span> <span id="lstnumberx101.27" style="font-size:70%;">demand</span> <span id="lstnumberx101.29" style="font-size:70%;">--</span> <span id="lstnumberx101.31" style="font-size:70%;">loaded</span> <span id="lstnumberx101.33" style="font-size:70%;">when</span> <span id="lstnumberx101.35" style="font-size:70%;">relevant</span> <span id="lstnumberx101.37" style="font-size:70%;">|</span> <span id="lstnumberx101.39" style="font-size:70%;">Reusable</span> <span id="lstnumberx101.41" style="font-size:70%;">workflow</span> <span id="lstnumberx101.43" style="font-size:70%;">patterns</span> <span id="lstnumberx101.45" style="font-size:70%;">|</span> </span><span id="lstnumberx102"><span id="lstnumberx102.1" style="font-size:70%;">|</span> <span id="lstnumberx102.3" style="font-size:70%;">**</span> <span id="lstnumberx102.4" style="font-size:70%;">Sub</span> <span id="lstnumberx102.5" style="font-size:70%;">-</span> <span id="lstnumberx102.6" style="font-size:70%;">Agent</span> <span id="lstnumberx102.7" style="font-size:70%;">**</span> <span id="lstnumberx102.9" style="font-size:70%;">|</span> <span id="lstnumberx102.11" style="font-size:70%;">`</span> <span id="lstnumberx102.12" style="font-size:70%;">workspace</span> <span id="lstnumberx102.13" style="font-size:70%;">/</span> <span id="lstnumberx102.14" style="font-size:70%;">sub_agents</span> <span id="lstnumberx102.15" style="font-size:70%;">/{</span> <span id="lstnumberx102.16" style="font-size:70%;">name</span> <span id="lstnumberx102.17" style="font-size:70%;">}/`</span> <span id="lstnumberx102.19" style="font-size:70%;">+</span> <span id="lstnumberx102.21" style="font-size:70%;">`</span> <span id="lstnumberx102.22" style="font-size:70%;">code_agent</span><span id="lstnumberx102.23" style="font-size:70%;">.</span><span id="lstnumberx102.24" style="font-size:70%;">yaml</span> <span id="lstnumberx102.25" style="font-size:70%;">`</span> <span id="lstnumberx102.27" style="font-size:70%;">|</span> <span id="lstnumberx102.29" style="font-size:70%;">Delegated</span> <span id="lstnumberx102.31" style="font-size:70%;">execution</span> <span id="lstnumberx102.33" style="font-size:70%;">--</span> <span id="lstnumberx102.35" style="font-size:70%;">isolated</span> <span id="lstnumberx102.37" style="font-size:70%;">context</span> <span id="lstnumberx102.39" style="font-size:70%;">|</span> <span id="lstnumberx102.41" style="font-size:70%;">Offload</span> <span id="lstnumberx102.43" style="font-size:70%;">specialized</span> <span id="lstnumberx102.45" style="font-size:70%;">subtask</span> <span id="lstnumberx102.47" style="font-size:70%;">to</span> <span id="lstnumberx102.49" style="font-size:70%;">child</span> <span id="lstnumberx102.51" style="font-size:70%;">agent</span> <span id="lstnumberx102.53" style="font-size:70%;">|</span> </span><span id="lstnumberx103"><span id="lstnumberx103.1" style="font-size:70%;">|</span> <span id="lstnumberx103.3" style="font-size:70%;">**</span> <span id="lstnumberx103.4" style="font-size:70%;">Long</span> <span id="lstnumberx103.5" style="font-size:70%;">-</span> <span id="lstnumberx103.6" style="font-size:70%;">Term</span> <span id="lstnumberx103.8" style="font-size:70%;">Memory</span> <span id="lstnumberx103.9" style="font-size:70%;">**</span> <span id="lstnumberx103.11" style="font-size:70%;">|</span> <span id="lstnumberx103.13" style="font-size:70%;">`</span> <span id="lstnumberx103.14" style="font-size:70%;">workspace</span> <span id="lstnumberx103.15" style="font-size:70%;">/</span> <span id="lstnumberx103.16" style="font-size:70%;">LongTermMEMORY</span><span id="lstnumberx103.17" style="font-size:70%;">.</span><span id="lstnumberx103.18" style="font-size:70%;">md</span> <span id="lstnumberx103.19" style="font-size:70%;">`</span> <span id="lstnumberx103.21" style="font-size:70%;">|</span> <span id="lstnumberx103.23" style="font-size:70%;">Persistent</span> <span id="lstnumberx103.25" style="font-size:70%;">cross</span> <span id="lstnumberx103.26" style="font-size:70%;">-</span> <span id="lstnumberx103.27" style="font-size:70%;">session</span> <span id="lstnumberx103.29" style="font-size:70%;">knowledge</span> <span id="lstnumberx103.31" style="font-size:70%;">--</span> <span id="lstnumberx103.33" style="font-size:70%;">MODIFIABLE</span> <span id="lstnumberx103.35" style="font-size:70%;">|</span> <span id="lstnumberx103.37" style="font-size:70%;">Record</span> <span id="lstnumberx103.39" style="font-size:70%;">recurring</span> <span id="lstnumberx103.41" style="font-size:70%;">pitfalls</span><span id="lstnumberx103.42" style="font-size:70%;">,</span><span id="lstnumberx103.44" style="font-size:70%;">proven</span> <span id="lstnumberx103.46" style="font-size:70%;">strategies</span><span id="lstnumberx103.47" style="font-size:70%;">,</span><span id="lstnumberx103.49" style="font-size:70%;">environment</span> <span id="lstnumberx103.51" style="font-size:70%;">quirks</span> <span id="lstnumberx103.53" style="font-size:70%;">|</span> </span><span id="lstnumberx104"><span id="lstnumberx104.1" style="font-size:70%;">|</span> <span id="lstnumberx104.3" style="font-size:70%;">**</span> <span id="lstnumberx104.4" style="font-size:70%;">Short</span> <span id="lstnumberx104.5" style="font-size:70%;">-</span> <span id="lstnumberx104.6" style="font-size:70%;">Term</span> <span id="lstnumberx104.8" style="font-size:70%;">Memory</span> <span id="lstnumberx104.9" style="font-size:70%;">**</span> <span id="lstnumberx104.11" style="font-size:70%;">|</span> <span id="lstnumberx104.13" style="font-size:70%;">`</span> <span id="lstnumberx104.14" style="font-size:70%;">workspace</span> <span id="lstnumberx104.15" style="font-size:70%;">/</span> <span id="lstnumberx104.16" style="font-size:70%;">ShortTermMEMORY</span><span id="lstnumberx104.17" style="font-size:70%;">.</span><span id="lstnumberx104.18" style="font-size:70%;">md</span> <span id="lstnumberx104.19" style="font-size:70%;">`</span> <span id="lstnumberx104.21" style="font-size:70%;">|</span> <span id="lstnumberx104.23" style="font-size:70%;">Session</span> <span id="lstnumberx104.24" style="font-size:70%;">-</span> <span id="lstnumberx104.25" style="font-size:70%;">scoped</span> <span id="lstnumberx104.27" style="font-size:70%;">scratch</span> <span id="lstnumberx104.29" style="font-size:70%;">--</span> <span id="lstnumberx104.31" style="font-size:70%;">DO</span> <span id="lstnumberx104.33" style="font-size:70%;">NOT</span> <span id="lstnumberx104.35" style="font-size:70%;">MODIFY</span> <span id="lstnumberx104.37" style="font-size:70%;">|</span> <span id="lstnumberx104.39" style="font-size:70%;">_</span> <span id="lstnumberx104.40" style="font-size:70%;">(</span><span id="lstnumberx104.41" style="font-size:70%;">read</span> <span id="lstnumberx104.42" style="font-size:70%;">-</span> <span id="lstnumberx104.43" style="font-size:70%;">only</span> <span id="lstnumberx104.45" style="font-size:70%;">for</span> <span id="lstnumberx104.47" style="font-size:70%;">evolve</span> <span id="lstnumberx104.49" style="font-size:70%;">agent</span><span id="lstnumberx104.50" style="font-size:70%;">)</span> <span id="lstnumberx104.51" style="font-size:70%;">_</span> <span id="lstnumberx104.53" style="font-size:70%;">|</span> </span><span id="lstnumberx106"><span id="lstnumberx106.1" style="font-size:70%;">All</span> <span id="lstnumberx106.3" style="font-size:70%;">component</span> <span id="lstnumberx106.5" style="font-size:70%;">types</span> <span id="lstnumberx106.7" style="font-size:70%;">are</span> <span id="lstnumberx106.9" style="font-size:70%;">equally</span> <span id="lstnumberx106.11" style="font-size:70%;">valid</span> <span id="lstnumberx106.13" style="font-size:70%;">and</span> <span id="lstnumberx106.15" style="font-size:70%;">important</span><span id="lstnumberx106.16" style="font-size:70%;">.</span><span id="lstnumberx106.18" style="font-size:70%;">Choose</span> <span id="lstnumberx106.20" style="font-size:70%;">the</span> <span id="lstnumberx106.22" style="font-size:70%;">one</span> <span id="lstnumberx106.24" style="font-size:70%;">that</span> <span id="lstnumberx106.26" style="font-size:70%;">best</span> <span id="lstnumberx106.28" style="font-size:70%;">fits</span> <span id="lstnumberx106.30" style="font-size:70%;">the</span> <span id="lstnumberx106.32" style="font-size:70%;">root</span> <span id="lstnumberx106.34" style="font-size:70%;">cause</span><span id="lstnumberx106.35" style="font-size:70%;">.</span></span> <span id="lstnumberx108"><span id="lstnumberx108.1" style="font-size:70%;">###</span> <span id="lstnumberx108.3" style="font-size:70%;">Choosing</span> <span id="lstnumberx108.5" style="font-size:70%;">the</span> <span id="lstnumberx108.7" style="font-size:70%;">Right</span> <span id="lstnumberx108.9" style="font-size:70%;">Component</span> <span id="lstnumberx108.11" style="font-size:70%;">Level</span> </span><span id="lstnumberx110"><span id="lstnumberx110.1" style="font-size:70%;">For</span> <span id="lstnumberx110.3" style="font-size:70%;">each</span> <span id="lstnumberx110.5" style="font-size:70%;">failure</span> <span id="lstnumberx110.7" style="font-size:70%;">pattern</span><span id="lstnumberx110.8" style="font-size:70%;">,</span><span id="lstnumberx110.10" style="font-size:70%;">consider</span> <span id="lstnumberx110.12" style="font-size:70%;">**</span> <span id="lstnumberx110.13" style="font-size:70%;">all</span> <span id="lstnumberx110.14" style="font-size:70%;">**</span> <span id="lstnumberx110.16" style="font-size:70%;">component</span> <span id="lstnumberx110.18" style="font-size:70%;">types</span> <span id="lstnumberx110.20" style="font-size:70%;">above</span> <span id="lstnumberx110.22" style="font-size:70%;">--</span> <span id="lstnumberx110.24" style="font-size:70%;">including</span> <span id="lstnumberx110.26" style="font-size:70%;">creating</span> <span id="lstnumberx110.28" style="font-size:70%;">new</span> <span id="lstnumberx110.30" style="font-size:70%;">ones</span> <span id="lstnumberx110.32" style="font-size:70%;">--</span> <span id="lstnumberx110.34" style="font-size:70%;">before</span> <span id="lstnumberx110.36" style="font-size:70%;">deciding</span> <span id="lstnumberx110.38" style="font-size:70%;">where</span> <span id="lstnumberx110.40" style="font-size:70%;">to</span> <span id="lstnumberx110.42" style="font-size:70%;">fix</span><span id="lstnumberx110.43" style="font-size:70%;">.</span></span> <span id="lstnumberx112"><span id="lstnumberx112.1" style="font-size:70%;">**</span> <span id="lstnumberx112.2" style="font-size:70%;">Anti</span> <span id="lstnumberx112.3" style="font-size:70%;">-</span> <span id="lstnumberx112.4" style="font-size:70%;">pattern</span><span id="lstnumberx112.5" style="font-size:70%;">:**</span> <span id="lstnumberx112.7" style="font-size:70%;">If</span> <span id="lstnumberx112.9" style="font-size:70%;">the</span> <span id="lstnumberx112.11" style="font-size:70%;">same</span> <span id="lstnumberx112.13" style="font-size:70%;">failure</span> <span id="lstnumberx112.15" style="font-size:70%;">class</span> <span id="lstnumberx112.17" style="font-size:70%;">persists</span> <span id="lstnumberx112.19" style="font-size:70%;">across</span> <span id="lstnumberx112.21" style="font-size:70%;">2+</span> <span id="lstnumberx112.23" style="font-size:70%;">iterations</span> <span id="lstnumberx112.25" style="font-size:70%;">despite</span> <span id="lstnumberx112.27" style="font-size:70%;">fixes</span> <span id="lstnumberx112.29" style="font-size:70%;">at</span> <span id="lstnumberx112.31" style="font-size:70%;">one</span> <span id="lstnumberx112.33" style="font-size:70%;">component</span> <span id="lstnumberx112.35" style="font-size:70%;">level</span><span id="lstnumberx112.36" style="font-size:70%;">,</span><span id="lstnumberx112.38" style="font-size:70%;">that</span> <span id="lstnumberx112.40" style="font-size:70%;">level</span> <span id="lstnumberx112.42" style="font-size:70%;">may</span> <span id="lstnumberx112.44" style="font-size:70%;">be</span> <span id="lstnumberx112.46" style="font-size:70%;">the</span> <span id="lstnumberx112.48" style="font-size:70%;">wrong</span> <span id="lstnumberx112.50" style="font-size:70%;">choice</span><span id="lstnumberx112.51" style="font-size:70%;">.</span><span id="lstnumberx112.53" style="font-size:70%;">Rollback</span> <span id="lstnumberx112.55" style="font-size:70%;">the</span> <span id="lstnumberx112.57" style="font-size:70%;">ineffective</span> <span id="lstnumberx112.59" style="font-size:70%;">change</span> <span id="lstnumberx112.61" style="font-size:70%;">and</span> <span id="lstnumberx112.63" style="font-size:70%;">re</span> <span id="lstnumberx112.64" style="font-size:70%;">-</span> <span id="lstnumberx112.65" style="font-size:70%;">approach</span> <span id="lstnumberx112.67" style="font-size:70%;">from</span> <span id="lstnumberx112.69" style="font-size:70%;">a</span> <span id="lstnumberx112.71" style="font-size:70%;">different</span> <span id="lstnumberx112.73" style="font-size:70%;">component</span> <span id="lstnumberx112.75" style="font-size:70%;">level</span><span id="lstnumberx112.76" style="font-size:70%;">.</span></span> <span id="lstnumberx114"><span id="lstnumberx114.1" style="font-size:70%;">##</span> <span id="lstnumberx114.3" style="font-size:70%;">Registering</span> <span id="lstnumberx114.5" style="font-size:70%;">New</span> <span id="lstnumberx114.7" style="font-size:70%;">Components</span> </span><span id="lstnumberx116"><span id="lstnumberx116.1" style="font-size:70%;">**</span> <span id="lstnumberx116.2" style="font-size:70%;">Creating</span> <span id="lstnumberx116.4" style="font-size:70%;">a</span> <span id="lstnumberx116.6" style="font-size:70%;">file</span> <span id="lstnumberx116.8" style="font-size:70%;">is</span> <span id="lstnumberx116.10" style="font-size:70%;">NOT</span> <span id="lstnumberx116.12" style="font-size:70%;">enough</span> <span id="lstnumberx116.14" style="font-size:70%;">--</span> <span id="lstnumberx116.16" style="font-size:70%;">register</span> <span id="lstnumberx116.18" style="font-size:70%;">in</span> <span id="lstnumberx116.20" style="font-size:70%;">`</span> <span id="lstnumberx116.21" style="font-size:70%;">code_agent</span><span id="lstnumberx116.22" style="font-size:70%;">.</span><span id="lstnumberx116.23" style="font-size:70%;">yaml</span> <span id="lstnumberx116.24" style="font-size:70%;">`:**</span> </span><span id="lstnumberx117"><span id="lstnumberx117.1" style="font-size:70%;">-</span> <span id="lstnumberx117.3" style="font-size:70%;">New</span> <span id="lstnumberx117.5" style="font-size:70%;">tool</span><span id="lstnumberx117.6" style="font-size:70%;">:</span><span id="lstnumberx117.8" style="font-size:70%;">create</span> <span id="lstnumberx117.10" style="font-size:70%;">`.</span><span id="lstnumberx117.11" style="font-size:70%;">tool</span><span id="lstnumberx117.12" style="font-size:70%;">.</span><span id="lstnumberx117.13" style="font-size:70%;">yaml</span> <span id="lstnumberx117.14" style="font-size:70%;">`</span> <span id="lstnumberx117.16" style="font-size:70%;">+</span> <span id="lstnumberx117.18" style="font-size:70%;">Python</span> <span id="lstnumberx117.20" style="font-size:70%;">implementation</span> <span id="lstnumberx117.22" style="font-size:70%;">+</span> <span id="lstnumberx117.24" style="font-size:70%;">add</span> <span id="lstnumberx117.26" style="font-size:70%;">entry</span> <span id="lstnumberx117.28" style="font-size:70%;">to</span> <span id="lstnumberx117.30" style="font-size:70%;">`</span> <span id="lstnumberx117.31" style="font-size:70%;">tools</span><span id="lstnumberx117.32" style="font-size:70%;">:`</span> <span id="lstnumberx117.34" style="font-size:70%;">list</span> </span><span id="lstnumberx118"><span id="lstnumberx118.1" style="font-size:70%;">-</span> <span id="lstnumberx118.3" style="font-size:70%;">New</span> <span id="lstnumberx118.5" style="font-size:70%;">middleware</span><span id="lstnumberx118.6" style="font-size:70%;">:</span><span id="lstnumberx118.8" style="font-size:70%;">create</span> <span id="lstnumberx118.10" style="font-size:70%;">Python</span> <span id="lstnumberx118.12" style="font-size:70%;">class</span> <span id="lstnumberx118.14" style="font-size:70%;">+</span> <span id="lstnumberx118.16" style="font-size:70%;">add</span> <span id="lstnumberx118.18" style="font-size:70%;">entry</span> <span id="lstnumberx118.20" style="font-size:70%;">to</span> <span id="lstnumberx118.22" style="font-size:70%;">`</span> <span id="lstnumberx118.23" style="font-size:70%;">middlewares</span><span id="lstnumberx118.24" style="font-size:70%;">:`</span> <span id="lstnumberx118.26" style="font-size:70%;">list</span> <span id="lstnumberx118.28" style="font-size:70%;">with</span> <span id="lstnumberx118.30" style="font-size:70%;">`</span> <span id="lstnumberx118.31" style="font-size:70%;">import</span><span id="lstnumberx118.32" style="font-size:70%;">:`</span> <span id="lstnumberx118.34" style="font-size:70%;">path</span> <span id="lstnumberx118.36" style="font-size:70%;">and</span> <span id="lstnumberx118.38" style="font-size:70%;">`</span> <span id="lstnumberx118.39" style="font-size:70%;">params</span><span id="lstnumberx118.40" style="font-size:70%;">:`</span> </span><span id="lstnumberx119"><span id="lstnumberx119.1" style="font-size:70%;">-</span> <span id="lstnumberx119.3" style="font-size:70%;">New</span> <span id="lstnumberx119.5" style="font-size:70%;">skill</span><span id="lstnumberx119.6" style="font-size:70%;">:</span><span id="lstnumberx119.8" style="font-size:70%;">create</span> <span id="lstnumberx119.10" style="font-size:70%;">`</span> <span id="lstnumberx119.11" style="font-size:70%;">skills</span> <span id="lstnumberx119.12" style="font-size:70%;">/{</span> <span id="lstnumberx119.13" style="font-size:70%;">name</span> <span id="lstnumberx119.14" style="font-size:70%;">}/</span> <span id="lstnumberx119.15" style="font-size:70%;">SKILL</span><span id="lstnumberx119.16" style="font-size:70%;">.</span><span id="lstnumberx119.17" style="font-size:70%;">md</span> <span id="lstnumberx119.18" style="font-size:70%;">`</span> <span id="lstnumberx119.20" style="font-size:70%;">folder</span> <span id="lstnumberx119.22" style="font-size:70%;">+</span> <span id="lstnumberx119.24" style="font-size:70%;">add</span> <span id="lstnumberx119.26" style="font-size:70%;">to</span> <span id="lstnumberx119.28" style="font-size:70%;">`</span> <span id="lstnumberx119.29" style="font-size:70%;">skills</span><span id="lstnumberx119.30" style="font-size:70%;">:`</span> <span id="lstnumberx119.32" style="font-size:70%;">list</span> </span><span id="lstnumberx120"><span id="lstnumberx120.1" style="font-size:70%;">-</span> <span id="lstnumberx120.3" style="font-size:70%;">New</span> <span id="lstnumberx120.5" style="font-size:70%;">sub</span> <span id="lstnumberx120.6" style="font-size:70%;">-</span> <span id="lstnumberx120.7" style="font-size:70%;">agent</span><span id="lstnumberx120.8" style="font-size:70%;">:</span><span id="lstnumberx120.10" style="font-size:70%;">create</span> <span id="lstnumberx120.12" style="font-size:70%;">`</span> <span id="lstnumberx120.13" style="font-size:70%;">sub_agents</span> <span id="lstnumberx120.14" style="font-size:70%;">/{</span> <span id="lstnumberx120.15" style="font-size:70%;">name</span> <span id="lstnumberx120.16" style="font-size:70%;">}/</span> <span id="lstnumberx120.17" style="font-size:70%;">agent</span><span id="lstnumberx120.18" style="font-size:70%;">.</span><span id="lstnumberx120.19" style="font-size:70%;">yaml</span> <span id="lstnumberx120.20" style="font-size:70%;">`</span> <span id="lstnumberx120.22" style="font-size:70%;">+</span> <span id="lstnumberx120.24" style="font-size:70%;">add</span> <span id="lstnumberx120.26" style="font-size:70%;">to</span> <span id="lstnumberx120.28" style="font-size:70%;">`</span> <span id="lstnumberx120.29" style="font-size:70%;">sub_agents</span><span id="lstnumberx120.30" style="font-size:70%;">:`</span> <span id="lstnumberx120.32" style="font-size:70%;">list</span><span id="lstnumberx120.33" style="font-size:70%;">.</span><span id="lstnumberx120.35" style="font-size:70%;">Framework</span> <span id="lstnumberx120.37" style="font-size:70%;">**</span> <span id="lstnumberx120.38" style="font-size:70%;">auto</span> <span id="lstnumberx120.39" style="font-size:70%;">-</span> <span id="lstnumberx120.40" style="font-size:70%;">injects</span> <span id="lstnumberx120.41" style="font-size:70%;">**</span> <span id="lstnumberx120.43" style="font-size:70%;">`</span> <span id="lstnumberx120.44" style="font-size:70%;">RecallSubAgent</span> <span id="lstnumberx120.45" style="font-size:70%;">`</span> <span id="lstnumberx120.47" style="font-size:70%;">tool</span> <span id="lstnumberx120.49" style="font-size:70%;">--</span> <span id="lstnumberx120.51" style="font-size:70%;">do</span> <span id="lstnumberx120.53" style="font-size:70%;">NOT</span> <span id="lstnumberx120.55" style="font-size:70%;">add</span> <span id="lstnumberx120.57" style="font-size:70%;">it</span> <span id="lstnumberx120.59" style="font-size:70%;">manually</span><span id="lstnumberx120.60" style="font-size:70%;">.</span></span> <span id="lstnumberx122"><span id="lstnumberx122.1" style="font-size:70%;">##</span> <span id="lstnumberx122.3" style="font-size:70%;">How</span> <span id="lstnumberx122.5" style="font-size:70%;">Code</span> <span id="lstnumberx122.7" style="font-size:70%;">Gets</span> <span id="lstnumberx122.9" style="font-size:70%;">Loaded</span> </span><span id="lstnumberx124"><span id="lstnumberx124.1" style="font-size:70%;">The</span> <span id="lstnumberx124.3" style="font-size:70%;">config</span> <span id="lstnumberx124.5" style="font-size:70%;">directory</span> <span id="lstnumberx124.7" style="font-size:70%;">is</span> <span id="lstnumberx124.9" style="font-size:70%;">added</span> <span id="lstnumberx124.11" style="font-size:70%;">to</span> <span id="lstnumberx124.13" style="font-size:70%;">`</span> <span id="lstnumberx124.14" style="font-size:70%;">sys</span><span id="lstnumberx124.15" style="font-size:70%;">.</span><span id="lstnumberx124.16" style="font-size:70%;">path</span> <span id="lstnumberx124.17" style="font-size:70%;">`</span> <span id="lstnumberx124.19" style="font-size:70%;">at</span> <span id="lstnumberx124.21" style="font-size:70%;">runtime</span><span id="lstnumberx124.22" style="font-size:70%;">:</span></span> <span id="lstnumberx125"><span id="lstnumberx125.1" style="font-size:70%;">-</span> <span id="lstnumberx125.3" style="font-size:70%;">`</span> <span id="lstnumberx125.4" style="font-size:70%;">binding</span><span id="lstnumberx125.5" style="font-size:70%;">:</span><span id="lstnumberx125.7" style="font-size:70%;">tools</span><span id="lstnumberx125.8" style="font-size:70%;">.</span><span id="lstnumberx125.9" style="font-size:70%;">file_tools</span><span id="lstnumberx125.10" style="font-size:70%;">:</span><span id="lstnumberx125.11" style="font-size:70%;">read_file</span> <span id="lstnumberx125.12" style="font-size:70%;">`</span> <span id="lstnumberx125.14" style="font-size:70%;">resolves</span> <span id="lstnumberx125.16" style="font-size:70%;">to</span> <span id="lstnumberx125.18" style="font-size:70%;">`</span> <span id="lstnumberx125.19" style="font-size:70%;">workspace</span> <span id="lstnumberx125.20" style="font-size:70%;">/</span> <span id="lstnumberx125.21" style="font-size:70%;">tools</span> <span id="lstnumberx125.22" style="font-size:70%;">/</span> <span id="lstnumberx125.23" style="font-size:70%;">file_tools</span> <span id="lstnumberx125.24" style="font-size:70%;">/</span> <span id="lstnumberx125.25" style="font-size:70%;">read_file</span><span id="lstnumberx125.26" style="font-size:70%;">.</span><span id="lstnumberx125.27" style="font-size:70%;">py</span> <span id="lstnumberx125.28" style="font-size:70%;">`</span> </span><span id="lstnumberx126"><span id="lstnumberx126.1" style="font-size:70%;">-</span> <span id="lstnumberx126.3" style="font-size:70%;">`</span> <span id="lstnumberx126.4" style="font-size:70%;">import</span><span id="lstnumberx126.5" style="font-size:70%;">:</span><span id="lstnumberx126.7" style="font-size:70%;">middleware</span><span id="lstnumberx126.8" style="font-size:70%;">.</span><span id="lstnumberx126.9" style="font-size:70%;">long_tool_output</span><span id="lstnumberx126.10" style="font-size:70%;">:</span><span id="lstnumberx126.11" style="font-size:70%;">LongToolOutputMiddleware</span> <span id="lstnumberx126.12" style="font-size:70%;">`</span> <span id="lstnumberx126.14" style="font-size:70%;">resolves</span> <span id="lstnumberx126.16" style="font-size:70%;">to</span> <span id="lstnumberx126.18" style="font-size:70%;">`</span> <span id="lstnumberx126.19" style="font-size:70%;">workspace</span> <span id="lstnumberx126.20" style="font-size:70%;">/</span> <span id="lstnumberx126.21" style="font-size:70%;">middleware</span> <span id="lstnumberx126.22" style="font-size:70%;">/</span> <span id="lstnumberx126.23" style="font-size:70%;">long_tool_output</span><span id="lstnumberx126.24" style="font-size:70%;">.</span><span id="lstnumberx126.25" style="font-size:70%;">py</span> <span id="lstnumberx126.26" style="font-size:70%;">`</span> </span><span id="lstnumberx127"><span id="lstnumberx127.1" style="font-size:70%;">-</span> <span id="lstnumberx127.3" style="font-size:70%;">`</span> <span id="lstnumberx127.4" style="font-size:70%;">import</span><span id="lstnumberx127.5" style="font-size:70%;">:</span><span id="lstnumberx127.7" style="font-size:70%;">middleware</span><span id="lstnumberx127.8" style="font-size:70%;">.</span><span id="lstnumberx127.9" style="font-size:70%;">context_compaction</span><span id="lstnumberx127.10" style="font-size:70%;">:</span><span id="lstnumberx127.11" style="font-size:70%;">ContextCompactionMiddleware</span> <span id="lstnumberx127.12" style="font-size:70%;">`</span> <span id="lstnumberx127.14" style="font-size:70%;">resolves</span> <span id="lstnumberx127.16" style="font-size:70%;">to</span> <span id="lstnumberx127.18" style="font-size:70%;">`</span> <span id="lstnumberx127.19" style="font-size:70%;">workspace</span> <span id="lstnumberx127.20" style="font-size:70%;">/</span> <span id="lstnumberx127.21" style="font-size:70%;">middleware</span> <span id="lstnumberx127.22" style="font-size:70%;">/</span> <span id="lstnumberx127.23" style="font-size:70%;">context_compaction</span> <span id="lstnumberx127.24" style="font-size:70%;">/</span> <span id="lstnumberx127.25" style="font-size:70%;">__init__</span><span id="lstnumberx127.26" style="font-size:70%;">.</span><span id="lstnumberx127.27" style="font-size:70%;">py</span> <span id="lstnumberx127.28" style="font-size:70%;">`</span> </span><span id="lstnumberx129"><span id="lstnumberx129.1" style="font-size:70%;">##</span> <span id="lstnumberx129.3" style="font-size:70%;">LLM</span> <span id="lstnumberx129.5" style="font-size:70%;">Environment</span> <span id="lstnumberx129.7" style="font-size:70%;">Variables</span> </span><span id="lstnumberx131"><span id="lstnumberx131.1" style="font-size:70%;">At</span> <span id="lstnumberx131.3" style="font-size:70%;">runtime</span><span id="lstnumberx131.4" style="font-size:70%;">,</span><span id="lstnumberx131.6" style="font-size:70%;">the</span> <span id="lstnumberx131.8" style="font-size:70%;">harness</span> <span id="lstnumberx131.10" style="font-size:70%;">sets</span> <span id="lstnumberx131.12" style="font-size:70%;">these</span> <span id="lstnumberx131.14" style="font-size:70%;">environment</span> <span id="lstnumberx131.16" style="font-size:70%;">variables</span> <span id="lstnumberx131.18" style="font-size:70%;">**</span> <span id="lstnumberx131.19" style="font-size:70%;">before</span> <span id="lstnumberx131.20" style="font-size:70%;">**</span> <span id="lstnumberx131.22" style="font-size:70%;">the</span> <span id="lstnumberx131.24" style="font-size:70%;">code</span> <span id="lstnumberx131.26" style="font-size:70%;">agent</span> <span id="lstnumberx131.28" style="font-size:70%;">starts</span><span id="lstnumberx131.29" style="font-size:70%;">:</span></span> <span id="lstnumberx133"><span id="lstnumberx133.1" style="font-size:70%;">|</span> <span id="lstnumberx133.3" style="font-size:70%;">Variable</span> <span id="lstnumberx133.5" style="font-size:70%;">|</span> <span id="lstnumberx133.7" style="font-size:70%;">Description</span> <span id="lstnumberx133.9" style="font-size:70%;">|</span> </span><span id="lstnumberx134"><span id="lstnumberx134.1" style="font-size:70%;">|----------|-------------|</span> </span><span id="lstnumberx135"><span id="lstnumberx135.1" style="font-size:70%;">|</span> <span id="lstnumberx135.3" style="font-size:70%;">`</span> <span id="lstnumberx135.4" style="font-size:70%;">LLM_API_KEY</span> <span id="lstnumberx135.5" style="font-size:70%;">`</span> <span id="lstnumberx135.7" style="font-size:70%;">|</span> <span id="lstnumberx135.9" style="font-size:70%;">API</span> <span id="lstnumberx135.11" style="font-size:70%;">key</span> <span id="lstnumberx135.13" style="font-size:70%;">for</span> <span id="lstnumberx135.15" style="font-size:70%;">the</span> <span id="lstnumberx135.17" style="font-size:70%;">current</span> <span id="lstnumberx135.19" style="font-size:70%;">LLM</span> <span id="lstnumberx135.21" style="font-size:70%;">provider</span> <span id="lstnumberx135.23" style="font-size:70%;">|</span> </span><span id="lstnumberx136"><span id="lstnumberx136.1" style="font-size:70%;">|</span> <span id="lstnumberx136.3" style="font-size:70%;">`</span> <span id="lstnumberx136.4" style="font-size:70%;">LLM_BASE_URL</span> <span id="lstnumberx136.5" style="font-size:70%;">`</span> <span id="lstnumberx136.7" style="font-size:70%;">|</span> <span id="lstnumberx136.9" style="font-size:70%;">Base</span> <span id="lstnumberx136.11" style="font-size:70%;">URL</span> <span id="lstnumberx136.13" style="font-size:70%;">for</span> <span id="lstnumberx136.15" style="font-size:70%;">the</span> <span id="lstnumberx136.17" style="font-size:70%;">LLM</span> <span id="lstnumberx136.19" style="font-size:70%;">API</span> <span id="lstnumberx136.21" style="font-size:70%;">endpoint</span> <span id="lstnumberx136.23" style="font-size:70%;">|</span> </span><span id="lstnumberx137"><span id="lstnumberx137.1" style="font-size:70%;">|</span> <span id="lstnumberx137.3" style="font-size:70%;">`</span> <span id="lstnumberx137.4" style="font-size:70%;">LLM_MODEL</span> <span id="lstnumberx137.5" style="font-size:70%;">`</span> <span id="lstnumberx137.7" style="font-size:70%;">|</span> <span id="lstnumberx137.9" style="font-size:70%;">Model</span> <span id="lstnumberx137.11" style="font-size:70%;">identifier</span> <span id="lstnumberx137.13" style="font-size:70%;">(</span><span id="lstnumberx137.14" style="font-size:70%;">e</span><span id="lstnumberx137.15" style="font-size:70%;">.</span><span id="lstnumberx137.16" style="font-size:70%;">g</span><span id="lstnumberx137.17" style="font-size:70%;">.</span><span id="lstnumberx137.19" style="font-size:70%;">`</span> <span id="lstnumberx137.20" style="font-size:70%;">gpt</span> <span id="lstnumberx137.21" style="font-size:70%;">-5.4`)</span> <span id="lstnumberx137.23" style="font-size:70%;">|</span> </span><span id="lstnumberx139"><span id="lstnumberx139.1" style="font-size:70%;">**</span> <span id="lstnumberx139.2" style="font-size:70%;">All</span> <span id="lstnumberx139.4" style="font-size:70%;">components</span> <span id="lstnumberx139.5" style="font-size:70%;">**</span> <span id="lstnumberx139.7" style="font-size:70%;">--</span> <span id="lstnumberx139.9" style="font-size:70%;">code</span> <span id="lstnumberx139.11" style="font-size:70%;">agent</span><span id="lstnumberx139.12" style="font-size:70%;">,</span><span id="lstnumberx139.14" style="font-size:70%;">sub</span> <span id="lstnumberx139.15" style="font-size:70%;">-</span> <span id="lstnumberx139.16" style="font-size:70%;">agents</span><span id="lstnumberx139.17" style="font-size:70%;">,</span><span id="lstnumberx139.19" style="font-size:70%;">and</span> <span id="lstnumberx139.21" style="font-size:70%;">middleware</span> <span id="lstnumberx139.23" style="font-size:70%;">--</span> <span id="lstnumberx139.25" style="font-size:70%;">use</span> <span id="lstnumberx139.27" style="font-size:70%;">these</span> <span id="lstnumberx139.29" style="font-size:70%;">same</span> <span id="lstnumberx139.31" style="font-size:70%;">env</span> <span id="lstnumberx139.33" style="font-size:70%;">vars</span><span id="lstnumberx139.34" style="font-size:70%;">:</span></span> <span id="lstnumberx140"><span id="lstnumberx140.1" style="font-size:70%;">-</span> <span id="lstnumberx140.3" style="font-size:70%;">In</span> <span id="lstnumberx140.5" style="font-size:70%;">agent</span> <span id="lstnumberx140.7" style="font-size:70%;">YAML</span> <span id="lstnumberx140.9" style="font-size:70%;">files</span><span id="lstnumberx140.10" style="font-size:70%;">:</span><span id="lstnumberx140.12" style="font-size:70%;">`</span> <span id="lstnumberx140.13" style="font-size:70%;">$</span> <span id="lstnumberx140.14" style="font-size:70%;">{</span> <span id="lstnumberx140.15" style="font-size:70%;">env</span><span id="lstnumberx140.16" style="font-size:70%;">.</span><span id="lstnumberx140.17" style="font-size:70%;">LLM_API_KEY</span> <span id="lstnumberx140.18" style="font-size:70%;">}`,</span><span id="lstnumberx140.20" style="font-size:70%;">`</span> <span id="lstnumberx140.21" style="font-size:70%;">$</span> <span id="lstnumberx140.22" style="font-size:70%;">{</span> <span id="lstnumberx140.23" style="font-size:70%;">env</span><span id="lstnumberx140.24" style="font-size:70%;">.</span><span id="lstnumberx140.25" style="font-size:70%;">LLM_BASE_URL</span> <span id="lstnumberx140.26" style="font-size:70%;">}`,</span><span id="lstnumberx140.28" style="font-size:70%;">`</span> <span id="lstnumberx140.29" style="font-size:70%;">$</span> <span id="lstnumberx140.30" style="font-size:70%;">{</span> <span id="lstnumberx140.31" style="font-size:70%;">env</span><span id="lstnumberx140.32" style="font-size:70%;">.</span><span id="lstnumberx140.33" style="font-size:70%;">LLM_MODEL</span> <span id="lstnumberx140.34" style="font-size:70%;">}`</span> </span><span id="lstnumberx141"><span id="lstnumberx141.1" style="font-size:70%;">-</span> <span id="lstnumberx141.3" style="font-size:70%;">In</span> <span id="lstnumberx141.5" style="font-size:70%;">middleware</span> <span id="lstnumberx141.7" style="font-size:70%;">Python</span> <span id="lstnumberx141.9" style="font-size:70%;">code</span><span id="lstnumberx141.10" style="font-size:70%;">:</span><span id="lstnumberx141.12" style="font-size:70%;">`</span> <span id="lstnumberx141.13" style="font-size:70%;">os</span><span id="lstnumberx141.14" style="font-size:70%;">.</span><span id="lstnumberx141.15" style="font-size:70%;">environ</span> <span id="lstnumberx141.16" style="font-size:70%;">["</span> <span id="lstnumberx141.17" style="font-size:70%;">LLM_API_KEY</span> <span id="lstnumberx141.18" style="font-size:70%;">"]`,</span><span id="lstnumberx141.20" style="font-size:70%;">etc</span><span id="lstnumberx141.21" style="font-size:70%;">.</span></span> <span id="lstnumberx143"><span id="lstnumberx143.1" style="font-size:70%;">**</span> <span id="lstnumberx143.2" style="font-size:70%;">Do</span> <span id="lstnumberx143.4" style="font-size:70%;">NOT</span> <span id="lstnumberx143.6" style="font-size:70%;">hardcode</span> <span id="lstnumberx143.8" style="font-size:70%;">API</span> <span id="lstnumberx143.10" style="font-size:70%;">keys</span><span id="lstnumberx143.11" style="font-size:70%;">.**</span> <span id="lstnumberx143.13" style="font-size:70%;">Always</span> <span id="lstnumberx143.15" style="font-size:70%;">reference</span> <span id="lstnumberx143.17" style="font-size:70%;">environment</span> <span id="lstnumberx143.19" style="font-size:70%;">variables</span><span id="lstnumberx143.20" style="font-size:70%;">.</span></span> <span id="lstnumberx145"><span id="lstnumberx145.1" style="font-size:70%;">###</span> <span id="lstnumberx145.3" style="font-size:70%;">Middleware</span> <span id="lstnumberx145.5" style="font-size:70%;">can</span> <span id="lstnumberx145.7" style="font-size:70%;">call</span> <span id="lstnumberx145.9" style="font-size:70%;">LLM</span> </span><span id="lstnumberx147"><span id="lstnumberx147.1" style="font-size:70%;">Middleware</span> <span id="lstnumberx147.3" style="font-size:70%;">has</span> <span id="lstnumberx147.5" style="font-size:70%;">access</span> <span id="lstnumberx147.7" style="font-size:70%;">to</span> <span id="lstnumberx147.9" style="font-size:70%;">the</span> <span id="lstnumberx147.11" style="font-size:70%;">agent</span> <span id="lstnumberx147.12" style="font-size:70%;">'</span> <span id="lstnumberx147.13" style="font-size:70%;">s</span> <span id="lstnumberx147.15" style="font-size:70%;">LLM</span> <span id="lstnumberx147.17" style="font-size:70%;">client</span> <span id="lstnumberx147.19" style="font-size:70%;">via</span> <span id="lstnumberx147.21" style="font-size:70%;">`</span> <span id="lstnumberx147.22" style="font-size:70%;">ModelCallParams</span> <span id="lstnumberx147.23" style="font-size:70%;">`</span> <span id="lstnumberx147.25" style="font-size:70%;">in</span> <span id="lstnumberx147.27" style="font-size:70%;">the</span> <span id="lstnumberx147.29" style="font-size:70%;">`</span> <span id="lstnumberx147.30" style="font-size:70%;">wrap_model_call</span> <span id="lstnumberx147.31" style="font-size:70%;">`</span> <span id="lstnumberx147.33" style="font-size:70%;">hook</span><span id="lstnumberx147.34" style="font-size:70%;">.</span><span id="lstnumberx147.36" style="font-size:70%;">Use</span> <span id="lstnumberx147.38" style="font-size:70%;">`</span> <span id="lstnumberx147.39" style="font-size:70%;">LLMCaller</span> <span id="lstnumberx147.40" style="font-size:70%;">`</span> <span id="lstnumberx147.42" style="font-size:70%;">to</span> <span id="lstnumberx147.44" style="font-size:70%;">make</span> <span id="lstnumberx147.46" style="font-size:70%;">side</span> <span id="lstnumberx147.47" style="font-size:70%;">-</span> <span id="lstnumberx147.48" style="font-size:70%;">calls</span> <span id="lstnumberx147.50" style="font-size:70%;">(</span><span id="lstnumberx147.51" style="font-size:70%;">e</span><span id="lstnumberx147.52" style="font-size:70%;">.</span><span id="lstnumberx147.53" style="font-size:70%;">g</span><span id="lstnumberx147.54" style="font-size:70%;">.</span><span id="lstnumberx147.56" style="font-size:70%;">summarize</span> <span id="lstnumberx147.58" style="font-size:70%;">context</span><span id="lstnumberx147.59" style="font-size:70%;">,</span><span id="lstnumberx147.61" style="font-size:70%;">classify</span> <span id="lstnumberx147.63" style="font-size:70%;">errors</span><span id="lstnumberx147.64" style="font-size:70%;">,</span><span id="lstnumberx147.66" style="font-size:70%;">generate</span> <span id="lstnumberx147.68" style="font-size:70%;">dynamic</span> <span id="lstnumberx147.70" style="font-size:70%;">guidance</span><span id="lstnumberx147.71" style="font-size:70%;">).</span><span id="lstnumberx147.73" style="font-size:70%;">See</span> <span id="lstnumberx147.75" style="font-size:70%;">the</span> <span id="lstnumberx147.77" style="font-size:70%;">evolution</span> <span id="lstnumberx147.79" style="font-size:70%;">guide</span> <span id="lstnumberx147.81" style="font-size:70%;">skill</span> <span id="lstnumberx147.83" style="font-size:70%;">for</span> <span id="lstnumberx147.85" style="font-size:70%;">full</span> <span id="lstnumberx147.87" style="font-size:70%;">API</span> <span id="lstnumberx147.89" style="font-size:70%;">reference</span> <span id="lstnumberx147.91" style="font-size:70%;">and</span> <span id="lstnumberx147.93" style="font-size:70%;">examples</span><span id="lstnumberx147.94" style="font-size:70%;">.</span></span> <span id="lstnumberx149"><span id="lstnumberx149.1" style="font-size:70%;">###</span> <span id="lstnumberx149.3" style="font-size:70%;">Sub</span> <span id="lstnumberx149.4" style="font-size:70%;">-</span> <span id="lstnumberx149.5" style="font-size:70%;">Agents</span> <span id="lstnumberx149.7" style="font-size:70%;">use</span> <span id="lstnumberx149.9" style="font-size:70%;">the</span> <span id="lstnumberx149.11" style="font-size:70%;">same</span> <span id="lstnumberx149.13" style="font-size:70%;">LLM</span> </span><span id="lstnumberx151"><span id="lstnumberx151.1" style="font-size:70%;">Sub</span> <span id="lstnumberx151.2" style="font-size:70%;">-</span> <span id="lstnumberx151.3" style="font-size:70%;">agent</span> <span id="lstnumberx151.5" style="font-size:70%;">YAML</span> <span id="lstnumberx151.7" style="font-size:70%;">configs</span> <span id="lstnumberx151.9" style="font-size:70%;">should</span> <span id="lstnumberx151.11" style="font-size:70%;">use</span> <span id="lstnumberx151.13" style="font-size:70%;">`</span> <span id="lstnumberx151.14" style="font-size:70%;">$</span> <span id="lstnumberx151.15" style="font-size:70%;">{</span> <span id="lstnumberx151.16" style="font-size:70%;">env</span><span id="lstnumberx151.17" style="font-size:70%;">.</span><span id="lstnumberx151.18" style="font-size:70%;">LLM_MODEL</span> <span id="lstnumberx151.19" style="font-size:70%;">}`</span> <span id="lstnumberx151.21" style="font-size:70%;">/</span> <span id="lstnumberx151.23" style="font-size:70%;">`</span> <span id="lstnumberx151.24" style="font-size:70%;">$</span> <span id="lstnumberx151.25" style="font-size:70%;">{</span> <span id="lstnumberx151.26" style="font-size:70%;">env</span><span id="lstnumberx151.27" style="font-size:70%;">.</span><span id="lstnumberx151.28" style="font-size:70%;">LLM_BASE_URL</span> <span id="lstnumberx151.29" style="font-size:70%;">}`</span> <span id="lstnumberx151.31" style="font-size:70%;">/</span> <span id="lstnumberx151.33" style="font-size:70%;">`</span> <span id="lstnumberx151.34" style="font-size:70%;">$</span> <span id="lstnumberx151.35" style="font-size:70%;">{</span> <span id="lstnumberx151.36" style="font-size:70%;">env</span><span id="lstnumberx151.37" style="font-size:70%;">.</span><span id="lstnumberx151.38" style="font-size:70%;">LLM_API_KEY</span> <span id="lstnumberx151.39" style="font-size:70%;">}`</span> <span id="lstnumberx151.41" style="font-size:70%;">in</span> <span id="lstnumberx151.43" style="font-size:70%;">their</span> <span id="lstnumberx151.45" style="font-size:70%;">`</span> <span id="lstnumberx151.46" style="font-size:70%;">llm_config</span> <span id="lstnumberx151.47" style="font-size:70%;">`.</span><span id="lstnumberx151.49" style="font-size:70%;">This</span> <span id="lstnumberx151.51" style="font-size:70%;">automatically</span> <span id="lstnumberx151.53" style="font-size:70%;">gives</span> <span id="lstnumberx151.55" style="font-size:70%;">them</span> <span id="lstnumberx151.57" style="font-size:70%;">the</span> <span id="lstnumberx151.59" style="font-size:70%;">same</span> <span id="lstnumberx151.61" style="font-size:70%;">LLM</span> <span id="lstnumberx151.63" style="font-size:70%;">provider</span> <span id="lstnumberx151.65" style="font-size:70%;">as</span> <span id="lstnumberx151.67" style="font-size:70%;">the</span> <span id="lstnumberx151.69" style="font-size:70%;">parent</span> <span id="lstnumberx151.71" style="font-size:70%;">agent</span><span id="lstnumberx151.72" style="font-size:70%;">.</span></span> <span id="lstnumberx153"><span id="lstnumberx153.1" style="font-size:70%;">For</span> <span id="lstnumberx153.3" style="font-size:70%;">detailed</span> <span id="lstnumberx153.5" style="font-size:70%;">schemas</span><span id="lstnumberx153.6" style="font-size:70%;">,</span><span id="lstnumberx153.8" style="font-size:70%;">creation</span> <span id="lstnumberx153.10" style="font-size:70%;">guides</span><span id="lstnumberx153.11" style="font-size:70%;">,</span><span id="lstnumberx153.13" style="font-size:70%;">and</span> <span id="lstnumberx153.15" style="font-size:70%;">code</span> <span id="lstnumberx153.17" style="font-size:70%;">examples</span><span id="lstnumberx153.18" style="font-size:70%;">,</span><span id="lstnumberx153.20" style="font-size:70%;">read</span> <span id="lstnumberx153.22" style="font-size:70%;">`</span> <span id="lstnumberx153.23" style="font-size:70%;">evolve_agent</span> <span id="lstnumberx153.24" style="font-size:70%;">/</span> <span id="lstnumberx153.25" style="font-size:70%;">skills</span> <span id="lstnumberx153.26" style="font-size:70%;">/</span> <span id="lstnumberx153.27" style="font-size:70%;">nexau</span> <span id="lstnumberx153.28" style="font-size:70%;">-</span> <span id="lstnumberx153.29" style="font-size:70%;">evolution</span> <span id="lstnumberx153.30" style="font-size:70%;">-</span> <span id="lstnumberx153.31" style="font-size:70%;">guide</span> <span id="lstnumberx153.32" style="font-size:70%;">/</span> <span id="lstnumberx153.33" style="font-size:70%;">SKILL</span><span id="lstnumberx153.34" style="font-size:70%;">.</span><span id="lstnumberx153.35" style="font-size:70%;">md</span> <span id="lstnumberx153.36" style="font-size:70%;">`.</span></span> <span id="lstnumberx156"><span id="lstnumberx156.1" style="font-size:70%;">#</span> <span id="lstnumberx156.3" style="font-size:70%;">Multi</span> <span id="lstnumberx156.4" style="font-size:70%;">-</span> <span id="lstnumberx156.5" style="font-size:70%;">Variant</span> <span id="lstnumberx156.7" style="font-size:70%;">Results</span> <span id="lstnumberx156.9" style="font-size:70%;">(</span><span id="lstnumberx156.10" style="font-size:70%;">when</span> <span id="lstnumberx156.12" style="font-size:70%;">present</span><span id="lstnumberx156.13" style="font-size:70%;">)</span> </span><span id="lstnumberx158"><span id="lstnumberx158.1" style="font-size:70%;">When</span> <span id="lstnumberx158.3" style="font-size:70%;">the</span> <span id="lstnumberx158.5" style="font-size:70%;">evolution</span> <span id="lstnumberx158.7" style="font-size:70%;">query</span> <span id="lstnumberx158.9" style="font-size:70%;">includes</span> <span id="lstnumberx158.11" style="font-size:70%;">a</span> <span id="lstnumberx158.13" style="font-size:70%;">"</span> <span id="lstnumberx158.14" style="font-size:70%;">Previous</span> <span id="lstnumberx158.16" style="font-size:70%;">Iteration</span> <span id="lstnumberx158.18" style="font-size:70%;">Variant</span> <span id="lstnumberx158.20" style="font-size:70%;">Experiment</span> <span id="lstnumberx158.22" style="font-size:70%;">Results</span> <span id="lstnumberx158.23" style="font-size:70%;">"</span> <span id="lstnumberx158.25" style="font-size:70%;">section</span><span id="lstnumberx158.26" style="font-size:70%;">,</span><span id="lstnumberx158.28" style="font-size:70%;">multiple</span> <span id="lstnumberx158.30" style="font-size:70%;">parallel</span> <span id="lstnumberx158.32" style="font-size:70%;">approaches</span> <span id="lstnumberx158.34" style="font-size:70%;">were</span> <span id="lstnumberx158.36" style="font-size:70%;">tested</span> <span id="lstnumberx158.38" style="font-size:70%;">last</span> <span id="lstnumberx158.40" style="font-size:70%;">iteration</span><span id="lstnumberx158.41" style="font-size:70%;">.</span><span id="lstnumberx158.43" style="font-size:70%;">Use</span> <span id="lstnumberx158.45" style="font-size:70%;">this</span> <span id="lstnumberx158.47" style="font-size:70%;">signal</span><span id="lstnumberx158.48" style="font-size:70%;">:</span></span> <span id="lstnumberx160"><span id="lstnumberx160.1" style="font-size:70%;">-</span> <span id="lstnumberx160.3" style="font-size:70%;">**</span> <span id="lstnumberx160.4" style="font-size:70%;">Learn</span> <span id="lstnumberx160.6" style="font-size:70%;">from</span> <span id="lstnumberx160.8" style="font-size:70%;">both</span> <span id="lstnumberx160.9" style="font-size:70%;">**:</span><span id="lstnumberx160.11" style="font-size:70%;">Even</span> <span id="lstnumberx160.13" style="font-size:70%;">the</span> <span id="lstnumberx160.15" style="font-size:70%;">losing</span> <span id="lstnumberx160.17" style="font-size:70%;">variant</span> <span id="lstnumberx160.19" style="font-size:70%;">may</span> <span id="lstnumberx160.21" style="font-size:70%;">have</span> <span id="lstnumberx160.23" style="font-size:70%;">solved</span> <span id="lstnumberx160.25" style="font-size:70%;">tasks</span> <span id="lstnumberx160.27" style="font-size:70%;">the</span> <span id="lstnumberx160.29" style="font-size:70%;">winner</span> <span id="lstnumberx160.31" style="font-size:70%;">did</span> <span id="lstnumberx160.33" style="font-size:70%;">not</span> </span><span id="lstnumberx161"><span id="lstnumberx161.1" style="font-size:70%;">-</span> <span id="lstnumberx161.3" style="font-size:70%;">**</span> <span id="lstnumberx161.4" style="font-size:70%;">Combine</span> <span id="lstnumberx161.6" style="font-size:70%;">insights</span> <span id="lstnumberx161.7" style="font-size:70%;">**:</span><span id="lstnumberx161.9" style="font-size:70%;">If</span> <span id="lstnumberx161.11" style="font-size:70%;">both</span> <span id="lstnumberx161.13" style="font-size:70%;">variants</span> <span id="lstnumberx161.15" style="font-size:70%;">addressed</span> <span id="lstnumberx161.17" style="font-size:70%;">different</span> <span id="lstnumberx161.19" style="font-size:70%;">failure</span> <span id="lstnumberx161.21" style="font-size:70%;">classes</span><span id="lstnumberx161.22" style="font-size:70%;">,</span><span id="lstnumberx161.24" style="font-size:70%;">consider</span> <span id="lstnumberx161.26" style="font-size:70%;">merging</span> <span id="lstnumberx161.28" style="font-size:70%;">the</span> <span id="lstnumberx161.30" style="font-size:70%;">effective</span> <span id="lstnumberx161.32" style="font-size:70%;">parts</span> <span id="lstnumberx161.34" style="font-size:70%;">of</span> <span id="lstnumberx161.36" style="font-size:70%;">both</span> <span id="lstnumberx161.38" style="font-size:70%;">approaches</span> </span><span id="lstnumberx162"><span id="lstnumberx162.1" style="font-size:70%;">-</span> <span id="lstnumberx162.3" style="font-size:70%;">**</span> <span id="lstnumberx162.4" style="font-size:70%;">Avoid</span> <span id="lstnumberx162.6" style="font-size:70%;">repeating</span> <span id="lstnumberx162.8" style="font-size:70%;">failures</span> <span id="lstnumberx162.9" style="font-size:70%;">**:</span><span id="lstnumberx162.11" style="font-size:70%;">If</span> <span id="lstnumberx162.13" style="font-size:70%;">a</span> <span id="lstnumberx162.15" style="font-size:70%;">variant</span> <span id="lstnumberx162.16" style="font-size:70%;">'</span> <span id="lstnumberx162.17" style="font-size:70%;">s</span> <span id="lstnumberx162.19" style="font-size:70%;">approach</span> <span id="lstnumberx162.21" style="font-size:70%;">clearly</span> <span id="lstnumberx162.23" style="font-size:70%;">failed</span><span id="lstnumberx162.24" style="font-size:70%;">,</span><span id="lstnumberx162.26" style="font-size:70%;">do</span> <span id="lstnumberx162.28" style="font-size:70%;">not</span> <span id="lstnumberx162.30" style="font-size:70%;">retry</span> <span id="lstnumberx162.32" style="font-size:70%;">it</span> </span><span id="lstnumberx163"><span id="lstnumberx163.1" style="font-size:70%;">-</span> <span id="lstnumberx163.3" style="font-size:70%;">**</span> <span id="lstnumberx163.4" style="font-size:70%;">Cross</span> <span id="lstnumberx163.5" style="font-size:70%;">-</span> <span id="lstnumberx163.6" style="font-size:70%;">variant</span> <span id="lstnumberx163.8" style="font-size:70%;">debugger</span> <span id="lstnumberx163.10" style="font-size:70%;">analysis</span> <span id="lstnumberx163.11" style="font-size:70%;">**</span> <span id="lstnumberx163.13" style="font-size:70%;">groups</span> <span id="lstnumberx163.15" style="font-size:70%;">traces</span> <span id="lstnumberx163.17" style="font-size:70%;">by</span> <span id="lstnumberx163.19" style="font-size:70%;">variant</span> <span id="lstnumberx163.21" style="font-size:70%;">--</span> <span id="lstnumberx163.23" style="font-size:70%;">use</span> <span id="lstnumberx163.25" style="font-size:70%;">it</span> <span id="lstnumberx163.27" style="font-size:70%;">to</span> <span id="lstnumberx163.29" style="font-size:70%;">understand</span> <span id="lstnumberx163.31" style="font-size:70%;">WHY</span> <span id="lstnumberx163.33" style="font-size:70%;">one</span> <span id="lstnumberx163.35" style="font-size:70%;">approach</span> <span id="lstnumberx163.37" style="font-size:70%;">worked</span> <span id="lstnumberx163.39" style="font-size:70%;">better</span> <span id="lstnumberx163.41" style="font-size:70%;">than</span> <span id="lstnumberx163.43" style="font-size:70%;">the</span> <span id="lstnumberx163.45" style="font-size:70%;">other</span> <span id="lstnumberx163.47" style="font-size:70%;">for</span> <span id="lstnumberx163.49" style="font-size:70%;">specific</span> <span id="lstnumberx163.51" style="font-size:70%;">tasks</span> </span><span id="lstnumberx165"><span id="lstnumberx165.1" style="font-size:70%;">When</span> <span id="lstnumberx165.3" style="font-size:70%;">your</span> <span id="lstnumberx165.5" style="font-size:70%;">query</span> <span id="lstnumberx165.7" style="font-size:70%;">includes</span> <span id="lstnumberx165.9" style="font-size:70%;">a</span> <span id="lstnumberx165.11" style="font-size:70%;">"</span> <span id="lstnumberx165.12" style="font-size:70%;">MANDATORY</span> <span id="lstnumberx165.14" style="font-size:70%;">Strategy</span> <span id="lstnumberx165.16" style="font-size:70%;">Constraint</span> <span id="lstnumberx165.17" style="font-size:70%;">",</span><span id="lstnumberx165.19" style="font-size:70%;">you</span> <span id="lstnumberx165.21" style="font-size:70%;">MUST</span> <span id="lstnumberx165.25" style="font-size:70%;">it</span><span id="lstnumberx165.26" style="font-size:70%;">.</span><span id="lstnumberx165.28" style="font-size:70%;">You</span> <span id="lstnumberx165.30" style="font-size:70%;">are</span> <span id="lstnumberx165.32" style="font-size:70%;">one</span> <span id="lstnumberx165.34" style="font-size:70%;">of</span> <span id="lstnumberx165.36" style="font-size:70%;">several</span> <span id="lstnumberx165.38" style="font-size:70%;">parallel</span> <span id="lstnumberx165.40" style="font-size:70%;">agents</span><span id="lstnumberx165.41" style="font-size:70%;">,</span><span id="lstnumberx165.43" style="font-size:70%;">each</span> <span id="lstnumberx165.45" style="font-size:70%;">exploring</span> <span id="lstnumberx165.47" style="font-size:70%;">a</span> <span id="lstnumberx165.49" style="font-size:70%;">different</span> <span id="lstnumberx165.51" style="font-size:70%;">direction</span><span id="lstnumberx165.52" style="font-size:70%;">.</span><span id="lstnumberx165.54" style="font-size:70%;">Violating</span> <span id="lstnumberx165.56" style="font-size:70%;">the</span> <span id="lstnumberx165.58" style="font-size:70%;">constraint</span> <span id="lstnumberx165.60" style="font-size:70%;">wastes</span> <span id="lstnumberx165.62" style="font-size:70%;">the</span> <span id="lstnumberx165.64" style="font-size:70%;">exploration</span> <span id="lstnumberx165.66" style="font-size:70%;">budget</span><span id="lstnumberx165.67" style="font-size:70%;">.</span></span> <span id="lstnumberx168"><span id="lstnumberx168.1" style="font-size:70%;">#</span> <span id="lstnumberx168.3" style="font-size:70%;">Analysis</span> <span id="lstnumberx168.5" style="font-size:70%;">Approach</span> </span><span id="lstnumberx170"><span id="lstnumberx170.1" style="font-size:70%;">&gt;</span> <span id="lstnumberx170.3" style="font-size:70%;">**[!]</span> <span id="lstnumberx170.5" style="font-size:70%;">MANDATORY</span><span id="lstnumberx170.6" style="font-size:70%;">:</span><span id="lstnumberx170.8" style="font-size:70%;">Read</span> <span id="lstnumberx170.10" style="font-size:70%;">`</span> <span id="lstnumberx170.11" style="font-size:70%;">analysis</span> <span id="lstnumberx170.12" style="font-size:70%;">/`</span> <span id="lstnumberx170.14" style="font-size:70%;">first</span><span id="lstnumberx170.15" style="font-size:70%;">.**</span> <span id="lstnumberx170.17" style="font-size:70%;">The</span> <span id="lstnumberx170.19" style="font-size:70%;">analysis</span> <span id="lstnumberx170.21" style="font-size:70%;">reports</span> <span id="lstnumberx170.23" style="font-size:70%;">are</span> <span id="lstnumberx170.25" style="font-size:70%;">pre</span> <span id="lstnumberx170.26" style="font-size:70%;">-</span> <span id="lstnumberx170.27" style="font-size:70%;">built</span> <span id="lstnumberx170.29" style="font-size:70%;">summaries</span> <span id="lstnumberx170.31" style="font-size:70%;">of</span> <span id="lstnumberx170.33" style="font-size:70%;">all</span> <span id="lstnumberx170.35" style="font-size:70%;">task</span> <span id="lstnumberx170.37" style="font-size:70%;">failures</span> <span id="lstnumberx170.39" style="font-size:70%;">with</span> <span id="lstnumberx170.41" style="font-size:70%;">root</span> <span id="lstnumberx170.43" style="font-size:70%;">causes</span> <span id="lstnumberx170.45" style="font-size:70%;">already</span> <span id="lstnumberx170.47" style="font-size:70%;">identified</span><span id="lstnumberx170.48" style="font-size:70%;">.</span><span id="lstnumberx170.50" style="font-size:70%;">They</span> <span id="lstnumberx170.52" style="font-size:70%;">save</span> <span id="lstnumberx170.54" style="font-size:70%;">you</span> <span id="lstnumberx170.56" style="font-size:70%;">significant</span> <span id="lstnumberx170.58" style="font-size:70%;">time</span> <span id="lstnumberx170.60" style="font-size:70%;">--</span> <span id="lstnumberx170.62" style="font-size:70%;">do</span> <span id="lstnumberx170.64" style="font-size:70%;">NOT</span> <span id="lstnumberx170.66" style="font-size:70%;">skip</span> <span id="lstnumberx170.68" style="font-size:70%;">them</span> <span id="lstnumberx170.70" style="font-size:70%;">to</span> <span id="lstnumberx170.72" style="font-size:70%;">read</span> <span id="lstnumberx170.74" style="font-size:70%;">raw</span> <span id="lstnumberx170.76" style="font-size:70%;">traces</span> <span id="lstnumberx170.78" style="font-size:70%;">directly</span><span id="lstnumberx170.79" style="font-size:70%;">.</span></span> <span id="lstnumberx172"><span id="lstnumberx172.1" style="font-size:70%;">1.</span><span id="lstnumberx172.3" style="font-size:70%;">Read</span> <span id="lstnumberx172.5" style="font-size:70%;">`</span> <span id="lstnumberx172.6" style="font-size:70%;">evolution_history</span><span id="lstnumberx172.7" style="font-size:70%;">.</span><span id="lstnumberx172.8" style="font-size:70%;">md</span> <span id="lstnumberx172.9" style="font-size:70%;">`</span> <span id="lstnumberx172.11" style="font-size:70%;">--</span> <span id="lstnumberx172.13" style="font-size:70%;">understand</span> <span id="lstnumberx172.15" style="font-size:70%;">what</span> <span id="lstnumberx172.16" style="font-size:70%;">'</span> <span id="lstnumberx172.17" style="font-size:70%;">s</span> <span id="lstnumberx172.19" style="font-size:70%;">been</span> <span id="lstnumberx172.21" style="font-size:70%;">tried</span><span id="lstnumberx172.22" style="font-size:70%;">,</span><span id="lstnumberx172.24" style="font-size:70%;">what</span> <span id="lstnumberx172.26" style="font-size:70%;">worked</span><span id="lstnumberx172.27" style="font-size:70%;">,</span><span id="lstnumberx172.29" style="font-size:70%;">what</span> <span id="lstnumberx172.31" style="font-size:70%;">failed</span> </span><span id="lstnumberx173"><span id="lstnumberx173.1" style="font-size:70%;">2.</span><span id="lstnumberx173.3" style="font-size:70%;">**</span> <span id="lstnumberx173.4" style="font-size:70%;">Read</span> <span id="lstnumberx173.6" style="font-size:70%;">`</span> <span id="lstnumberx173.7" style="font-size:70%;">runs</span> <span id="lstnumberx173.8" style="font-size:70%;">/</span> <span id="lstnumberx173.9" style="font-size:70%;">iteration_NNN</span> <span id="lstnumberx173.10" style="font-size:70%;">/</span> <span id="lstnumberx173.11" style="font-size:70%;">input</span> <span id="lstnumberx173.12" style="font-size:70%;">/</span> <span id="lstnumberx173.13" style="font-size:70%;">analysis</span> <span id="lstnumberx173.14" style="font-size:70%;">/</span> <span id="lstnumberx173.15" style="font-size:70%;">overview</span><span id="lstnumberx173.16" style="font-size:70%;">.</span><span id="lstnumberx173.17" style="font-size:70%;">md</span> <span id="lstnumberx173.18" style="font-size:70%;">`</span> <span id="lstnumberx173.20" style="font-size:70%;">FIRST</span> <span id="lstnumberx173.21" style="font-size:70%;">**</span> <span id="lstnumberx173.23" style="font-size:70%;">--</span> <span id="lstnumberx173.25" style="font-size:70%;">this</span> <span id="lstnumberx173.27" style="font-size:70%;">is</span> <span id="lstnumberx173.29" style="font-size:70%;">your</span> <span id="lstnumberx173.31" style="font-size:70%;">primary</span> <span id="lstnumberx173.33" style="font-size:70%;">information</span> <span id="lstnumberx173.35" style="font-size:70%;">source</span> </span><span id="lstnumberx174"><span id="lstnumberx174.1" style="font-size:70%;">3.</span><span id="lstnumberx174.3" style="font-size:70%;">**</span> <span id="lstnumberx174.4" style="font-size:70%;">Read</span> <span id="lstnumberx174.6" style="font-size:70%;">`</span> <span id="lstnumberx174.7" style="font-size:70%;">runs</span> <span id="lstnumberx174.8" style="font-size:70%;">/</span> <span id="lstnumberx174.9" style="font-size:70%;">iteration_NNN</span> <span id="lstnumberx174.10" style="font-size:70%;">/</span> <span id="lstnumberx174.11" style="font-size:70%;">input</span> <span id="lstnumberx174.12" style="font-size:70%;">/</span> <span id="lstnumberx174.13" style="font-size:70%;">analysis</span> <span id="lstnumberx174.14" style="font-size:70%;">/</span> <span id="lstnumberx174.15" style="font-size:70%;">detail</span> <span id="lstnumberx174.16" style="font-size:70%;">/{</span> <span id="lstnumberx174.17" style="font-size:70%;">task_name</span> <span id="lstnumberx174.18" style="font-size:70%;">}.</span><span id="lstnumberx174.19" style="font-size:70%;">md</span> <span id="lstnumberx174.20" style="font-size:70%;">`**</span> <span id="lstnumberx174.22" style="font-size:70%;">for</span> <span id="lstnumberx174.24" style="font-size:70%;">tasks</span> <span id="lstnumberx174.26" style="font-size:70%;">needing</span> <span id="lstnumberx174.28" style="font-size:70%;">deeper</span> <span id="lstnumberx174.30" style="font-size:70%;">investigation</span> </span><span id="lstnumberx175"><span id="lstnumberx175.1" style="font-size:70%;">4.</span><span id="lstnumberx175.3" style="font-size:70%;">Only</span> <span id="lstnumberx175.5" style="font-size:70%;">fall</span> <span id="lstnumberx175.7" style="font-size:70%;">back</span> <span id="lstnumberx175.9" style="font-size:70%;">to</span> <span id="lstnumberx175.11" style="font-size:70%;">reading</span> <span id="lstnumberx175.13" style="font-size:70%;">raw</span> <span id="lstnumberx175.15" style="font-size:70%;">`</span> <span id="lstnumberx175.16" style="font-size:70%;">nexau_in_memory_tracer</span><span id="lstnumberx175.17" style="font-size:70%;">.</span><span id="lstnumberx175.18" style="font-size:70%;">cleaned</span><span id="lstnumberx175.19" style="font-size:70%;">.</span><span id="lstnumberx175.20" style="font-size:70%;">json</span> <span id="lstnumberx175.21" style="font-size:70%;">`</span> <span id="lstnumberx175.23" style="font-size:70%;">when</span> <span id="lstnumberx175.25" style="font-size:70%;">analysis</span> <span id="lstnumberx175.27" style="font-size:70%;">is</span> <span id="lstnumberx175.29" style="font-size:70%;">missing</span> <span id="lstnumberx175.31" style="font-size:70%;">or</span> <span id="lstnumberx175.33" style="font-size:70%;">insufficient</span> <span id="lstnumberx175.35" style="font-size:70%;">--</span> <span id="lstnumberx175.37" style="font-size:70%;">this</span> <span id="lstnumberx175.39" style="font-size:70%;">should</span> <span id="lstnumberx175.41" style="font-size:70%;">be</span> <span id="lstnumberx175.43" style="font-size:70%;">rare</span> </span><span id="lstnumberx176"><span id="lstnumberx176.1" style="font-size:70%;">5.</span><span id="lstnumberx176.3" style="font-size:70%;">**</span> <span id="lstnumberx176.4" style="font-size:70%;">After</span> <span id="lstnumberx176.6" style="font-size:70%;">creating</span> <span id="lstnumberx176.8" style="font-size:70%;">or</span> <span id="lstnumberx176.10" style="font-size:70%;">modifying</span> <span id="lstnumberx176.12" style="font-size:70%;">middleware</span> <span id="lstnumberx176.13" style="font-size:70%;">**,</span><span id="lstnumberx176.15" style="font-size:70%;">read</span> <span id="lstnumberx176.17" style="font-size:70%;">at</span> <span id="lstnumberx176.19" style="font-size:70%;">least</span> <span id="lstnumberx176.21" style="font-size:70%;">one</span> <span id="lstnumberx176.23" style="font-size:70%;">`</span> <span id="lstnumberx176.24" style="font-size:70%;">agent</span> <span id="lstnumberx176.25" style="font-size:70%;">/</span> <span id="lstnumberx176.26" style="font-size:70%;">nexau</span><span id="lstnumberx176.27" style="font-size:70%;">.</span><span id="lstnumberx176.28" style="font-size:70%;">txt</span> <span id="lstnumberx176.29" style="font-size:70%;">`</span> <span id="lstnumberx176.31" style="font-size:70%;">from</span> <span id="lstnumberx176.33" style="font-size:70%;">a</span> <span id="lstnumberx176.35" style="font-size:70%;">failed</span> <span id="lstnumberx176.37" style="font-size:70%;">task</span> <span id="lstnumberx176.39" style="font-size:70%;">--</span> <span id="lstnumberx176.41" style="font-size:70%;">it</span> <span id="lstnumberx176.43" style="font-size:70%;">contains</span> <span id="lstnumberx176.45" style="font-size:70%;">runtime</span> <span id="lstnumberx176.47" style="font-size:70%;">logs</span> <span id="lstnumberx176.49" style="font-size:70%;">(</span><span id="lstnumberx176.50" style="font-size:70%;">middleware</span> <span id="lstnumberx176.52" style="font-size:70%;">init</span> <span id="lstnumberx176.54" style="font-size:70%;">errors</span><span id="lstnumberx176.55" style="font-size:70%;">,</span><span id="lstnumberx176.57" style="font-size:70%;">warnings</span><span id="lstnumberx176.58" style="font-size:70%;">,</span><span id="lstnumberx176.60" style="font-size:70%;">crashes</span><span id="lstnumberx176.61" style="font-size:70%;">)</span> <span id="lstnumberx176.63" style="font-size:70%;">that</span> <span id="lstnumberx176.65" style="font-size:70%;">static</span> <span id="lstnumberx176.67" style="font-size:70%;">validation</span> <span id="lstnumberx176.69" style="font-size:70%;">cannot</span> <span id="lstnumberx176.71" style="font-size:70%;">catch</span> </span><span id="lstnumberx177"><span id="lstnumberx177.1" style="font-size:70%;">6.</span><span id="lstnumberx177.3" style="font-size:70%;">Group</span> <span id="lstnumberx177.5" style="font-size:70%;">failures</span> <span id="lstnumberx177.7" style="font-size:70%;">into</span> <span id="lstnumberx177.9" style="font-size:70%;">**</span> <span id="lstnumberx177.10" style="font-size:70%;">pattern</span> <span id="lstnumberx177.12" style="font-size:70%;">classes</span> <span id="lstnumberx177.13" style="font-size:70%;">**</span> <span id="lstnumberx177.15" style="font-size:70%;">--</span> <span id="lstnumberx177.17" style="font-size:70%;">each</span> <span id="lstnumberx177.19" style="font-size:70%;">pattern</span> <span id="lstnumberx177.21" style="font-size:70%;">=</span> <span id="lstnumberx177.23" style="font-size:70%;">a</span> <span id="lstnumberx177.25" style="font-size:70%;">class</span> <span id="lstnumberx177.27" style="font-size:70%;">of</span> <span id="lstnumberx177.29" style="font-size:70%;">failures</span><span id="lstnumberx177.30" style="font-size:70%;">,</span><span id="lstnumberx177.32" style="font-size:70%;">not</span> <span id="lstnumberx177.34" style="font-size:70%;">individual</span> <span id="lstnumberx177.36" style="font-size:70%;">tasks</span> </span><span id="lstnumberx178"><span id="lstnumberx178.1" style="font-size:70%;">7.</span><span id="lstnumberx178.3" style="font-size:70%;">For</span> <span id="lstnumberx178.5" style="font-size:70%;">each</span> <span id="lstnumberx178.7" style="font-size:70%;">pattern</span><span id="lstnumberx178.8" style="font-size:70%;">,</span><span id="lstnumberx178.10" style="font-size:70%;">identify</span> <span id="lstnumberx178.12" style="font-size:70%;">the</span> <span id="lstnumberx178.14" style="font-size:70%;">**</span> <span id="lstnumberx178.15" style="font-size:70%;">root</span> <span id="lstnumberx178.17" style="font-size:70%;">cause</span> <span id="lstnumberx178.18" style="font-size:70%;">**</span> <span id="lstnumberx178.20" style="font-size:70%;">and</span> <span id="lstnumberx178.22" style="font-size:70%;">choose</span> <span id="lstnumberx178.24" style="font-size:70%;">the</span> <span id="lstnumberx178.26" style="font-size:70%;">most</span> <span id="lstnumberx178.28" style="font-size:70%;">appropriate</span> <span id="lstnumberx178.30" style="font-size:70%;">fix</span> <span id="lstnumberx178.32" style="font-size:70%;">--</span> <span id="lstnumberx178.34" style="font-size:70%;">could</span> <span id="lstnumberx178.36" style="font-size:70%;">be</span> <span id="lstnumberx178.38" style="font-size:70%;">prompt</span><span id="lstnumberx178.39" style="font-size:70%;">,</span><span id="lstnumberx178.41" style="font-size:70%;">tool</span><span id="lstnumberx178.42" style="font-size:70%;">,</span><span id="lstnumberx178.44" style="font-size:70%;">middleware</span><span id="lstnumberx178.45" style="font-size:70%;">,</span><span id="lstnumberx178.47" style="font-size:70%;">or</span> <span id="lstnumberx178.49" style="font-size:70%;">any</span> <span id="lstnumberx178.51" style="font-size:70%;">component</span> </span><span id="lstnumberx179"><span id="lstnumberx179.1" style="font-size:70%;">8.</span><span id="lstnumberx179.3" style="font-size:70%;">**</span> <span id="lstnumberx179.4" style="font-size:70%;">Architecture</span> <span id="lstnumberx179.6" style="font-size:70%;">check</span> <span id="lstnumberx179.7" style="font-size:70%;">**</span> <span id="lstnumberx179.9" style="font-size:70%;">--</span> <span id="lstnumberx179.11" style="font-size:70%;">for</span> <span id="lstnumberx179.13" style="font-size:70%;">each</span> <span id="lstnumberx179.15" style="font-size:70%;">failure</span> <span id="lstnumberx179.17" style="font-size:70%;">pattern</span><span id="lstnumberx179.18" style="font-size:70%;">,</span><span id="lstnumberx179.20" style="font-size:70%;">consider</span> <span id="lstnumberx179.22" style="font-size:70%;">whether</span> <span id="lstnumberx179.24" style="font-size:70%;">the</span> <span id="lstnumberx179.26" style="font-size:70%;">fix</span> <span id="lstnumberx179.28" style="font-size:70%;">belongs</span> <span id="lstnumberx179.30" style="font-size:70%;">at</span> <span id="lstnumberx179.32" style="font-size:70%;">a</span> <span id="lstnumberx179.34" style="font-size:70%;">different</span> <span id="lstnumberx179.36" style="font-size:70%;">component</span> <span id="lstnumberx179.38" style="font-size:70%;">level</span><span id="lstnumberx179.39" style="font-size:70%;">.</span><span id="lstnumberx179.41" style="font-size:70%;">If</span> <span id="lstnumberx179.43" style="font-size:70%;">previous</span> <span id="lstnumberx179.45" style="font-size:70%;">iterations</span> <span id="lstnumberx179.47" style="font-size:70%;">already</span> <span id="lstnumberx179.49" style="font-size:70%;">tried</span> <span id="lstnumberx179.51" style="font-size:70%;">fixing</span> <span id="lstnumberx179.53" style="font-size:70%;">at</span> <span id="lstnumberx179.55" style="font-size:70%;">one</span> <span id="lstnumberx179.57" style="font-size:70%;">level</span> <span id="lstnumberx179.59" style="font-size:70%;">without</span> <span id="lstnumberx179.61" style="font-size:70%;">success</span><span id="lstnumberx179.62" style="font-size:70%;">,</span><span id="lstnumberx179.64" style="font-size:70%;">try</span> <span id="lstnumberx179.66" style="font-size:70%;">a</span> <span id="lstnumberx179.68" style="font-size:70%;">different</span> <span id="lstnumberx179.70" style="font-size:70%;">one</span><span id="lstnumberx179.71" style="font-size:70%;">.</span></span> <span id="lstnumberx180"><span id="lstnumberx180.1" style="font-size:70%;">9.</span><span id="lstnumberx180.3" style="font-size:70%;">For</span> <span id="lstnumberx180.5" style="font-size:70%;">iteration</span> <span id="lstnumberx180.7" style="font-size:70%;">2+,</span><span id="lstnumberx180.9" style="font-size:70%;">evaluate</span> <span id="lstnumberx180.11" style="font-size:70%;">previous</span> <span id="lstnumberx180.13" style="font-size:70%;">changes</span> <span id="lstnumberx180.15" style="font-size:70%;">using</span> <span id="lstnumberx180.17" style="font-size:70%;">the</span> <span id="lstnumberx180.19" style="font-size:70%;">Change</span> <span id="lstnumberx180.21" style="font-size:70%;">Attribution</span> <span id="lstnumberx180.23" style="font-size:70%;">Report</span><span id="lstnumberx180.24" style="font-size:70%;">:</span></span> <span id="lstnumberx181"><span id="lstnumberx181.2" style="font-size:70%;">-</span> <span id="lstnumberx181.4" style="font-size:70%;">**</span> <span id="lstnumberx181.5" style="font-size:70%;">KEEP</span> <span id="lstnumberx181.6" style="font-size:70%;">**</span> <span id="lstnumberx181.8" style="font-size:70%;">--</span> <span id="lstnumberx181.10" style="font-size:70%;">working</span><span id="lstnumberx181.11" style="font-size:70%;">,</span><span id="lstnumberx181.13" style="font-size:70%;">leave</span> <span id="lstnumberx181.15" style="font-size:70%;">as</span> <span id="lstnumberx181.16" style="font-size:70%;">-</span> <span id="lstnumberx181.17" style="font-size:70%;">is</span> </span><span id="lstnumberx182"><span id="lstnumberx182.2" style="font-size:70%;">-</span> <span id="lstnumberx182.4" style="font-size:70%;">**</span> <span id="lstnumberx182.5" style="font-size:70%;">IMPROVE</span> <span id="lstnumberx182.6" style="font-size:70%;">**</span> <span id="lstnumberx182.8" style="font-size:70%;">--</span> <span id="lstnumberx182.10" style="font-size:70%;">directionally</span> <span id="lstnumberx182.12" style="font-size:70%;">correct</span><span id="lstnumberx182.13" style="font-size:70%;">,</span><span id="lstnumberx182.15" style="font-size:70%;">refine</span> </span><span id="lstnumberx183"><span id="lstnumberx183.2" style="font-size:70%;">-</span> <span id="lstnumberx183.4" style="font-size:70%;">**</span> <span id="lstnumberx183.5" style="font-size:70%;">ROLLBACK</span> <span id="lstnumberx183.7" style="font-size:70%;">+</span> <span id="lstnumberx183.9" style="font-size:70%;">PIVOT</span> <span id="lstnumberx183.10" style="font-size:70%;">**</span> <span id="lstnumberx183.12" style="font-size:70%;">--</span> <span id="lstnumberx183.14" style="font-size:70%;">not</span> <span id="lstnumberx183.16" style="font-size:70%;">working</span> <span id="lstnumberx183.18" style="font-size:70%;">at</span> <span id="lstnumberx183.20" style="font-size:70%;">this</span> <span id="lstnumberx183.22" style="font-size:70%;">component</span> <span id="lstnumberx183.24" style="font-size:70%;">level</span><span id="lstnumberx183.25" style="font-size:70%;">.</span><span id="lstnumberx183.27" style="font-size:70%;">Rollback</span> <span id="lstnumberx183.29" style="font-size:70%;">the</span> <span id="lstnumberx183.31" style="font-size:70%;">change</span><span id="lstnumberx183.32" style="font-size:70%;">,</span><span id="lstnumberx183.34" style="font-size:70%;">then</span> <span id="lstnumberx183.36" style="font-size:70%;">re</span> <span id="lstnumberx183.37" style="font-size:70%;">-</span> <span id="lstnumberx183.38" style="font-size:70%;">approach</span> <span id="lstnumberx183.40" style="font-size:70%;">the</span> <span id="lstnumberx183.42" style="font-size:70%;">same</span> <span id="lstnumberx183.44" style="font-size:70%;">failure</span> <span id="lstnumberx183.46" style="font-size:70%;">pattern</span> <span id="lstnumberx183.48" style="font-size:70%;">from</span> <span id="lstnumberx183.50" style="font-size:70%;">a</span> <span id="lstnumberx183.52" style="font-size:70%;">**</span> <span id="lstnumberx183.53" style="font-size:70%;">different</span> <span id="lstnumberx183.55" style="font-size:70%;">component</span> <span id="lstnumberx183.57" style="font-size:70%;">level</span> <span id="lstnumberx183.58" style="font-size:70%;">**</span> </span><span id="lstnumberx185"><span id="lstnumberx185.1" style="font-size:70%;">**</span> <span id="lstnumberx185.2" style="font-size:70%;">The</span> <span id="lstnumberx185.4" style="font-size:70%;">sole</span> <span id="lstnumberx185.6" style="font-size:70%;">optimization</span> <span id="lstnumberx185.8" style="font-size:70%;">target</span> <span id="lstnumberx185.10" style="font-size:70%;">is</span> <span id="lstnumberx185.12" style="font-size:70%;">pass@1</span> <span id="lstnumberx185.13" style="font-size:70%;">**</span> <span id="lstnumberx185.15" style="font-size:70%;">--</span> <span id="lstnumberx185.17" style="font-size:70%;">the</span> <span id="lstnumberx185.19" style="font-size:70%;">probability</span> <span id="lstnumberx185.21" style="font-size:70%;">that</span> <span id="lstnumberx185.23" style="font-size:70%;">a</span> <span id="lstnumberx185.25" style="font-size:70%;">single</span> <span id="lstnumberx185.27" style="font-size:70%;">attempt</span> <span id="lstnumberx185.29" style="font-size:70%;">succeeds</span><span id="lstnumberx185.30" style="font-size:70%;">.</span><span id="lstnumberx185.32" style="font-size:70%;">Every</span> <span id="lstnumberx185.34" style="font-size:70%;">change</span> <span id="lstnumberx185.36" style="font-size:70%;">you</span> <span id="lstnumberx185.38" style="font-size:70%;">make</span> <span id="lstnumberx185.40" style="font-size:70%;">should</span> <span id="lstnumberx185.42" style="font-size:70%;">raise</span> <span id="lstnumberx185.44" style="font-size:70%;">pass@1</span><span id="lstnumberx185.45" style="font-size:70%;">.</span><span id="lstnumberx185.47" style="font-size:70%;">Timed</span> <span id="lstnumberx185.48" style="font-size:70%;">-</span> <span id="lstnumberx185.49" style="font-size:70%;">out</span> <span id="lstnumberx185.51" style="font-size:70%;">tasks</span> <span id="lstnumberx185.53" style="font-size:70%;">count</span> <span id="lstnumberx185.55" style="font-size:70%;">as</span> <span id="lstnumberx185.57" style="font-size:70%;">failures</span> <span id="lstnumberx185.59" style="font-size:70%;">--</span> <span id="lstnumberx185.61" style="font-size:70%;">analyze</span> <span id="lstnumberx185.63" style="font-size:70%;">why</span> <span id="lstnumberx185.65" style="font-size:70%;">the</span> <span id="lstnumberx185.67" style="font-size:70%;">agent</span> <span id="lstnumberx185.69" style="font-size:70%;">ran</span> <span id="lstnumberx185.71" style="font-size:70%;">out</span> <span id="lstnumberx185.73" style="font-size:70%;">of</span> <span id="lstnumberx185.75" style="font-size:70%;">time</span><span id="lstnumberx185.76" style="font-size:70%;">.</span><span id="lstnumberx185.78" style="font-size:70%;">Only</span> <span id="lstnumberx185.80" style="font-size:70%;">pure</span> <span id="lstnumberx185.82" style="font-size:70%;">infrastructure</span> <span id="lstnumberx185.84" style="font-size:70%;">exceptions</span> <span id="lstnumberx185.86" style="font-size:70%;">(</span><span id="lstnumberx185.87" style="font-size:70%;">sandbox</span> <span id="lstnumberx185.89" style="font-size:70%;">crash</span><span id="lstnumberx185.90" style="font-size:70%;">,</span><span id="lstnumberx185.92" style="font-size:70%;">etc</span><span id="lstnumberx185.93" style="font-size:70%;">.)</span> <span id="lstnumberx185.95" style="font-size:70%;">can</span> <span id="lstnumberx185.97" style="font-size:70%;">be</span> <span id="lstnumberx185.99" style="font-size:70%;">ignored</span><span id="lstnumberx185.100" style="font-size:70%;">.</span></span> <span id="lstnumberx187"><span id="lstnumberx187.1" style="font-size:70%;">When</span> <span id="lstnumberx187.3" style="font-size:70%;">the</span> <span id="lstnumberx187.5" style="font-size:70%;">experiment</span> <span id="lstnumberx187.7" style="font-size:70%;">runs</span> <span id="lstnumberx187.9" style="font-size:70%;">k</span> <span id="lstnumberx187.10" style="font-size:70%;">&gt;1</span> <span id="lstnumberx187.12" style="font-size:70%;">rollouts</span> <span id="lstnumberx187.14" style="font-size:70%;">(</span><span id="lstnumberx187.15" style="font-size:70%;">indicated</span> <span id="lstnumberx187.17" style="font-size:70%;">in</span> <span id="lstnumberx187.19" style="font-size:70%;">the</span> <span id="lstnumberx187.21" style="font-size:70%;">query</span><span id="lstnumberx187.22" style="font-size:70%;">),</span><span id="lstnumberx187.24" style="font-size:70%;">use</span> <span id="lstnumberx187.26" style="font-size:70%;">the</span> <span id="lstnumberx187.28" style="font-size:70%;">extra</span> <span id="lstnumberx187.30" style="font-size:70%;">signal</span> <span id="lstnumberx187.32" style="font-size:70%;">to</span> <span id="lstnumberx187.34" style="font-size:70%;">diagnose</span><span id="lstnumberx187.35" style="font-size:70%;">:</span></span> <span id="lstnumberx188"><span id="lstnumberx188.1" style="font-size:70%;">-</span> <span id="lstnumberx188.3" style="font-size:70%;">**</span> <span id="lstnumberx188.4" style="font-size:70%;">Partial</span> <span id="lstnumberx188.5" style="font-size:70%;">-</span> <span id="lstnumberx188.6" style="font-size:70%;">pass</span> <span id="lstnumberx188.8" style="font-size:70%;">tasks</span> <span id="lstnumberx188.9" style="font-size:70%;">**</span> <span id="lstnumberx188.11" style="font-size:70%;">(</span><span id="lstnumberx188.12" style="font-size:70%;">some</span> <span id="lstnumberx188.14" style="font-size:70%;">rollouts</span> <span id="lstnumberx188.16" style="font-size:70%;">pass</span><span id="lstnumberx188.17" style="font-size:70%;">,</span><span id="lstnumberx188.19" style="font-size:70%;">some</span> <span id="lstnumberx188.21" style="font-size:70%;">fail</span><span id="lstnumberx188.22" style="font-size:70%;">)</span> <span id="lstnumberx188.24" style="font-size:70%;">are</span> <span id="lstnumberx188.26" style="font-size:70%;">the</span> <span id="lstnumberx188.28" style="font-size:70%;">most</span> <span id="lstnumberx188.30" style="font-size:70%;">valuable</span><span id="lstnumberx188.31" style="font-size:70%;">.</span><span id="lstnumberx188.33" style="font-size:70%;">Compare</span> <span id="lstnumberx188.35" style="font-size:70%;">the</span> <span id="lstnumberx188.37" style="font-size:70%;">passing</span> <span id="lstnumberx188.39" style="font-size:70%;">and</span> <span id="lstnumberx188.41" style="font-size:70%;">failing</span> <span id="lstnumberx188.43" style="font-size:70%;">rollouts</span> <span id="lstnumberx188.45" style="font-size:70%;">of</span> <span id="lstnumberx188.47" style="font-size:70%;">the</span> <span id="lstnumberx188.49" style="font-size:70%;">*</span> <span id="lstnumberx188.50" style="font-size:70%;">same</span> <span id="lstnumberx188.52" style="font-size:70%;">task</span> <span id="lstnumberx188.53" style="font-size:70%;">*,</span><span id="lstnumberx188.55" style="font-size:70%;">find</span> <span id="lstnumberx188.57" style="font-size:70%;">the</span> <span id="lstnumberx188.59" style="font-size:70%;">divergence</span> <span id="lstnumberx188.61" style="font-size:70%;">point</span><span id="lstnumberx188.62" style="font-size:70%;">,</span><span id="lstnumberx188.64" style="font-size:70%;">and</span> <span id="lstnumberx188.66" style="font-size:70%;">make</span> <span id="lstnumberx188.68" style="font-size:70%;">the</span> <span id="lstnumberx188.70" style="font-size:70%;">successful</span> <span id="lstnumberx188.72" style="font-size:70%;">strategy</span> <span id="lstnumberx188.74" style="font-size:70%;">the</span> <span id="lstnumberx188.76" style="font-size:70%;">*</span> <span id="lstnumberx188.77" style="font-size:70%;">reliable</span> <span id="lstnumberx188.79" style="font-size:70%;">default</span> <span id="lstnumberx188.80" style="font-size:70%;">*.</span></span> <span id="lstnumberx189"><span id="lstnumberx189.1" style="font-size:70%;">-</span> <span id="lstnumberx189.3" style="font-size:70%;">**</span> <span id="lstnumberx189.4" style="font-size:70%;">pass@k</span> <span id="lstnumberx189.5" style="font-size:70%;">**</span> <span id="lstnumberx189.7" style="font-size:70%;">gauges</span> <span id="lstnumberx189.9" style="font-size:70%;">capability</span> <span id="lstnumberx189.11" style="font-size:70%;">ceiling</span> <span id="lstnumberx189.13" style="font-size:70%;">but</span> <span id="lstnumberx189.15" style="font-size:70%;">is</span> <span id="lstnumberx189.17" style="font-size:70%;">NOT</span> <span id="lstnumberx189.19" style="font-size:70%;">the</span> <span id="lstnumberx189.21" style="font-size:70%;">target</span><span id="lstnumberx189.22" style="font-size:70%;">.</span><span id="lstnumberx189.24" style="font-size:70%;">Your</span> <span id="lstnumberx189.26" style="font-size:70%;">goal</span> <span id="lstnumberx189.28" style="font-size:70%;">is</span> <span id="lstnumberx189.30" style="font-size:70%;">to</span> <span id="lstnumberx189.32" style="font-size:70%;">turn</span> <span id="lstnumberx189.34" style="font-size:70%;">pass@k</span> <span id="lstnumberx189.36" style="font-size:70%;">successes</span> <span id="lstnumberx189.38" style="font-size:70%;">into</span> <span id="lstnumberx189.40" style="font-size:70%;">pass@1</span> <span id="lstnumberx189.42" style="font-size:70%;">successes</span> <span id="lstnumberx189.44" style="font-size:70%;">by</span> <span id="lstnumberx189.46" style="font-size:70%;">making</span> <span id="lstnumberx189.48" style="font-size:70%;">the</span> <span id="lstnumberx189.50" style="font-size:70%;">winning</span> <span id="lstnumberx189.52" style="font-size:70%;">strategy</span> <span id="lstnumberx189.54" style="font-size:70%;">consistent</span><span id="lstnumberx189.55" style="font-size:70%;">.</span></span> <span id="lstnumberx191"><span id="lstnumberx191.1" style="font-size:70%;">**</span> <span id="lstnumberx191.2" style="font-size:70%;">For</span> <span id="lstnumberx191.4" style="font-size:70%;">iteration</span> <span id="lstnumberx191.6" style="font-size:70%;">2+:**</span> <span id="lstnumberx191.8" style="font-size:70%;">Compare</span> <span id="lstnumberx191.10" style="font-size:70%;">task</span> <span id="lstnumberx191.12" style="font-size:70%;">results</span> <span id="lstnumberx191.14" style="font-size:70%;">across</span> <span id="lstnumberx191.16" style="font-size:70%;">iterations</span><span id="lstnumberx191.17" style="font-size:70%;">.</span><span id="lstnumberx191.19" style="font-size:70%;">Check</span> <span id="lstnumberx191.21" style="font-size:70%;">which</span> <span id="lstnumberx191.23" style="font-size:70%;">tasks</span> <span id="lstnumberx191.25" style="font-size:70%;">flipped</span> <span id="lstnumberx191.27" style="font-size:70%;">(</span><span id="lstnumberx191.28" style="font-size:70%;">fail</span> <span id="lstnumberx191.29" style="font-size:70%;">-&gt;</span> <span id="lstnumberx191.30" style="font-size:70%;">pass</span><span id="lstnumberx191.31" style="font-size:70%;">)</span> <span id="lstnumberx191.33" style="font-size:70%;">and</span> <span id="lstnumberx191.35" style="font-size:70%;">which</span> <span id="lstnumberx191.37" style="font-size:70%;">regressed</span> <span id="lstnumberx191.39" style="font-size:70%;">(</span><span id="lstnumberx191.40" style="font-size:70%;">pass</span> <span id="lstnumberx191.41" style="font-size:70%;">-&gt;</span> <span id="lstnumberx191.42" style="font-size:70%;">fail</span><span id="lstnumberx191.43" style="font-size:70%;">).</span><span id="lstnumberx191.45" style="font-size:70%;">If</span> <span id="lstnumberx191.47" style="font-size:70%;">regression</span> <span id="lstnumberx191.49" style="font-size:70%;">&gt;</span> <span id="lstnumberx191.51" style="font-size:70%;">flips</span><span id="lstnumberx191.52" style="font-size:70%;">,</span><span id="lstnumberx191.54" style="font-size:70%;">diagnose</span> <span id="lstnumberx191.56" style="font-size:70%;">what</span> <span id="lstnumberx191.58" style="font-size:70%;">went</span> <span id="lstnumberx191.60" style="font-size:70%;">wrong</span> <span id="lstnumberx191.62" style="font-size:70%;">before</span> <span id="lstnumberx191.64" style="font-size:70%;">adding</span> <span id="lstnumberx191.66" style="font-size:70%;">new</span> <span id="lstnumberx191.68" style="font-size:70%;">changes</span><span id="lstnumberx191.69" style="font-size:70%;">.</span></span> <span id="lstnumberx194"><span id="lstnumberx194.1" style="font-size:70%;">#</span> <span id="lstnumberx194.3" style="font-size:70%;">Deliverables</span> </span><span id="lstnumberx196"><span id="lstnumberx196.1" style="font-size:70%;">##</span> <span id="lstnumberx196.3" style="font-size:70%;">Git</span> <span id="lstnumberx196.5" style="font-size:70%;">Commits</span> </span><span id="lstnumberx198"><span id="lstnumberx198.1" style="font-size:70%;">Each</span> <span id="lstnumberx198.3" style="font-size:70%;">logical</span> <span id="lstnumberx198.5" style="font-size:70%;">change</span> <span id="lstnumberx198.7" style="font-size:70%;">=</span> <span id="lstnumberx198.9" style="font-size:70%;">one</span> <span id="lstnumberx198.11" style="font-size:70%;">separate</span> <span id="lstnumberx198.13" style="font-size:70%;">commit</span><span id="lstnumberx198.14" style="font-size:70%;">:</span></span> <span id="lstnumberx199"><span id="lstnumberx199.1" style="font-size:70%;">```</span> </span><span id="lstnumberx200"><span id="lstnumberx200.1" style="font-size:70%;">cd</span> <span id="lstnumberx200.3" style="font-size:70%;">{{</span> <span id="lstnumberx200.5" style="font-size:70%;">ws</span> <span id="lstnumberx200.7" style="font-size:70%;">}}</span> <span id="lstnumberx200.9" style="font-size:70%;">&amp;&amp;</span> <span id="lstnumberx200.11" style="font-size:70%;">git</span> <span id="lstnumberx200.13" style="font-size:70%;">add</span> <span id="lstnumberx200.15" style="font-size:70%;">-</span> <span id="lstnumberx200.16" style="font-size:70%;">A</span> <span id="lstnumberx200.18" style="font-size:70%;">&amp;&amp;</span> <span id="lstnumberx200.20" style="font-size:70%;">git</span> <span id="lstnumberx200.22" style="font-size:70%;">commit</span> <span id="lstnumberx200.24" style="font-size:70%;">-</span> <span id="lstnumberx200.25" style="font-size:70%;">m</span> <span id="lstnumberx200.27" style="font-size:70%;">"</span> <span id="lstnumberx200.28" style="font-size:70%;">chg</span> <span id="lstnumberx200.29" style="font-size:70%;">-</span> <span id="lstnumberx200.30" style="font-size:70%;">N</span><span id="lstnumberx200.31" style="font-size:70%;">:</span><span id="lstnumberx200.33" style="font-size:70%;">&lt;</span> <span id="lstnumberx200.34" style="font-size:70%;">short</span> <span id="lstnumberx200.36" style="font-size:70%;">description</span> <span id="lstnumberx200.37" style="font-size:70%;">&gt;"</span> </span><span id="lstnumberx201"><span id="lstnumberx201.1" style="font-size:70%;">```</span> </span><span id="lstnumberx203"><span id="lstnumberx203.1" style="font-size:70%;">##</span> <span id="lstnumberx203.3" style="font-size:70%;">change_manifest</span><span id="lstnumberx203.4" style="font-size:70%;">.</span><span id="lstnumberx203.5" style="font-size:70%;">json</span> </span><span id="lstnumberx205"><span id="lstnumberx205.1" style="font-size:70%;">Write</span> <span id="lstnumberx205.3" style="font-size:70%;">to</span> <span id="lstnumberx205.5" style="font-size:70%;">experiment</span> <span id="lstnumberx205.7" style="font-size:70%;">root</span> <span id="lstnumberx205.9" style="font-size:70%;">directory</span> <span id="lstnumberx205.11" style="font-size:70%;">(</span><span id="lstnumberx205.12" style="font-size:70%;">NOT</span> <span id="lstnumberx205.14" style="font-size:70%;">inside</span> <span id="lstnumberx205.16" style="font-size:70%;">workspace</span> <span id="lstnumberx205.17" style="font-size:70%;">/).</span></span> <span id="lstnumberx207"><span id="lstnumberx207.1" style="font-size:70%;">The</span> <span id="lstnumberx207.3" style="font-size:70%;">`</span> <span id="lstnumberx207.4" style="font-size:70%;">iteration</span> <span id="lstnumberx207.5" style="font-size:70%;">`</span> <span id="lstnumberx207.7" style="font-size:70%;">field</span> <span id="lstnumberx207.9" style="font-size:70%;">below</span> <span id="lstnumberx207.11" style="font-size:70%;">MUST</span> <span id="lstnumberx207.13" style="font-size:70%;">be</span> <span id="lstnumberx207.15" style="font-size:70%;">`{{</span> <span id="lstnumberx207.17" style="font-size:70%;">iteration</span> <span id="lstnumberx207.19" style="font-size:70%;">}}`</span> <span id="lstnumberx207.21" style="font-size:70%;">(</span><span id="lstnumberx207.22" style="font-size:70%;">the</span> <span id="lstnumberx207.24" style="font-size:70%;">current</span> <span id="lstnumberx207.26" style="font-size:70%;">loop</span> <span id="lstnumberx207.28" style="font-size:70%;">--</span> <span id="lstnumberx207.30" style="font-size:70%;">the</span> <span id="lstnumberx207.32" style="font-size:70%;">one</span> <span id="lstnumberx207.34" style="font-size:70%;">PRODUCING</span> <span id="lstnumberx207.36" style="font-size:70%;">these</span> <span id="lstnumberx207.38" style="font-size:70%;">changes</span><span id="lstnumberx207.39" style="font-size:70%;">).</span><span id="lstnumberx207.41" style="font-size:70%;">Do</span> <span id="lstnumberx207.43" style="font-size:70%;">not</span> <span id="lstnumberx207.45" style="font-size:70%;">set</span> <span id="lstnumberx207.47" style="font-size:70%;">it</span> <span id="lstnumberx207.49" style="font-size:70%;">to</span> <span id="lstnumberx207.51" style="font-size:70%;">the</span> <span id="lstnumberx207.53" style="font-size:70%;">next</span> <span id="lstnumberx207.55" style="font-size:70%;">loop</span> <span id="lstnumberx207.57" style="font-size:70%;">number</span> <span id="lstnumberx207.59" style="font-size:70%;">just</span> <span id="lstnumberx207.61" style="font-size:70%;">because</span> <span id="lstnumberx207.63" style="font-size:70%;">the</span> <span id="lstnumberx207.65" style="font-size:70%;">query</span> <span id="lstnumberx207.67" style="font-size:70%;">phrases</span> <span id="lstnumberx207.69" style="font-size:70%;">prior</span> <span id="lstnumberx207.71" style="font-size:70%;">eval</span> <span id="lstnumberx207.73" style="font-size:70%;">as</span> <span id="lstnumberx207.75" style="font-size:70%;">"</span> <span id="lstnumberx207.76" style="font-size:70%;">completed</span> <span id="lstnumberx207.77" style="font-size:70%;">".</span></span> <span id="lstnumberx209"><span id="lstnumberx209.1" style="font-size:70%;">```</span> <span id="lstnumberx209.2" style="font-size:70%;">json</span> </span><span id="lstnumberx210"><span id="lstnumberx210.1" style="font-size:70%;">{</span> </span><span id="lstnumberx211"><span id="lstnumberx211.2" style="font-size:70%;">"</span> <span id="lstnumberx211.3" style="font-size:70%;">iteration</span> <span id="lstnumberx211.4" style="font-size:70%;">":</span><span id="lstnumberx211.6" style="font-size:70%;">{{</span> <span id="lstnumberx211.8" style="font-size:70%;">iteration</span> <span id="lstnumberx211.10" style="font-size:70%;">}},</span></span> <span id="lstnumberx212"><span id="lstnumberx212.2" style="font-size:70%;">"</span> <span id="lstnumberx212.3" style="font-size:70%;">changes</span> <span id="lstnumberx212.4" style="font-size:70%;">":</span><span id="lstnumberx212.6" style="font-size:70%;">[</span></span> <span id="lstnumberx213"><span id="lstnumberx213.2" style="font-size:70%;">{</span> </span><span id="lstnumberx214"><span id="lstnumberx214.2" style="font-size:70%;">"</span> <span id="lstnumberx214.3" style="font-size:70%;">id</span> <span id="lstnumberx214.4" style="font-size:70%;">":</span><span id="lstnumberx214.6" style="font-size:70%;">"</span> <span id="lstnumberx214.7" style="font-size:70%;">chg</span> <span id="lstnumberx214.8" style="font-size:70%;">-1",</span></span> <span id="lstnumberx215"><span id="lstnumberx215.2" style="font-size:70%;">"</span> <span id="lstnumberx215.3" style="font-size:70%;">type</span> <span id="lstnumberx215.4" style="font-size:70%;">":</span><span id="lstnumberx215.6" style="font-size:70%;">"</span> <span id="lstnumberx215.7" style="font-size:70%;">new</span> <span id="lstnumberx215.8" style="font-size:70%;">|</span> <span id="lstnumberx215.9" style="font-size:70%;">improvement</span> <span id="lstnumberx215.10" style="font-size:70%;">|</span> <span id="lstnumberx215.11" style="font-size:70%;">rollback</span> <span id="lstnumberx215.12" style="font-size:70%;">",</span></span> <span id="lstnumberx216"><span id="lstnumberx216.2" style="font-size:70%;">"</span> <span id="lstnumberx216.3" style="font-size:70%;">description</span> <span id="lstnumberx216.4" style="font-size:70%;">":</span><span id="lstnumberx216.6" style="font-size:70%;">"</span> <span id="lstnumberx216.7" style="font-size:70%;">What</span> <span id="lstnumberx216.9" style="font-size:70%;">was</span> <span id="lstnumberx216.11" style="font-size:70%;">changed</span> <span id="lstnumberx216.13" style="font-size:70%;">and</span> <span id="lstnumberx216.15" style="font-size:70%;">why</span> <span id="lstnumberx216.16" style="font-size:70%;">",</span></span> <span id="lstnumberx217"><span id="lstnumberx217.2" style="font-size:70%;">"</span> <span id="lstnumberx217.3" style="font-size:70%;">files</span> <span id="lstnumberx217.4" style="font-size:70%;">":</span><span id="lstnumberx217.6" style="font-size:70%;">["</span> <span id="lstnumberx217.7" style="font-size:70%;">relative</span> <span id="lstnumberx217.8" style="font-size:70%;">/</span> <span id="lstnumberx217.9" style="font-size:70%;">to</span> <span id="lstnumberx217.10" style="font-size:70%;">/</span> <span id="lstnumberx217.11" style="font-size:70%;">workspace</span> <span id="lstnumberx217.12" style="font-size:70%;">/</span> <span id="lstnumberx217.13" style="font-size:70%;">file</span><span id="lstnumberx217.14" style="font-size:70%;">.</span><span id="lstnumberx217.15" style="font-size:70%;">py</span> <span id="lstnumberx217.16" style="font-size:70%;">"],</span></span> <span id="lstnumberx218"><span id="lstnumberx218.2" style="font-size:70%;">"</span> <span id="lstnumberx218.3" style="font-size:70%;">failure_pattern</span> <span id="lstnumberx218.4" style="font-size:70%;">":</span><span id="lstnumberx218.6" style="font-size:70%;">"</span> <span id="lstnumberx218.7" style="font-size:70%;">The</span> <span id="lstnumberx218.9" style="font-size:70%;">failure</span> <span id="lstnumberx218.11" style="font-size:70%;">class</span> <span id="lstnumberx218.13" style="font-size:70%;">this</span> <span id="lstnumberx218.15" style="font-size:70%;">addresses</span> <span id="lstnumberx218.16" style="font-size:70%;">",</span></span> <span id="lstnumberx219"><span id="lstnumberx219.2" style="font-size:70%;">"</span> <span id="lstnumberx219.3" style="font-size:70%;">predicted_fixes</span> <span id="lstnumberx219.4" style="font-size:70%;">":</span><span id="lstnumberx219.6" style="font-size:70%;">["</span> <span id="lstnumberx219.7" style="font-size:70%;">task</span> <span id="lstnumberx219.8" style="font-size:70%;">-</span> <span id="lstnumberx219.9" style="font-size:70%;">name</span> <span id="lstnumberx219.10" style="font-size:70%;">-</span> <span id="lstnumberx219.11" style="font-size:70%;">a</span> <span id="lstnumberx219.12" style="font-size:70%;">",</span><span id="lstnumberx219.14" style="font-size:70%;">"</span> <span id="lstnumberx219.15" style="font-size:70%;">task</span> <span id="lstnumberx219.16" style="font-size:70%;">-</span> <span id="lstnumberx219.17" style="font-size:70%;">name</span> <span id="lstnumberx219.18" style="font-size:70%;">-</span> <span id="lstnumberx219.19" style="font-size:70%;">b</span> <span id="lstnumberx219.20" style="font-size:70%;">"],</span></span> <span id="lstnumberx220"><span id="lstnumberx220.2" style="font-size:70%;">"</span> <span id="lstnumberx220.3" style="font-size:70%;">risk_tasks</span> <span id="lstnumberx220.4" style="font-size:70%;">":</span><span id="lstnumberx220.6" style="font-size:70%;">["</span> <span id="lstnumberx220.7" style="font-size:70%;">task</span> <span id="lstnumberx220.8" style="font-size:70%;">-</span> <span id="lstnumberx220.9" style="font-size:70%;">name</span> <span id="lstnumberx220.10" style="font-size:70%;">-</span> <span id="lstnumberx220.11" style="font-size:70%;">c</span> <span id="lstnumberx220.12" style="font-size:70%;">"],</span></span> <span id="lstnumberx221"><span id="lstnumberx221.2" style="font-size:70%;">"</span> <span id="lstnumberx221.3" style="font-size:70%;">constraint_level</span> <span id="lstnumberx221.4" style="font-size:70%;">":</span><span id="lstnumberx221.6" style="font-size:70%;">"</span> <span id="lstnumberx221.7" style="font-size:70%;">middleware</span> <span id="lstnumberx221.8" style="font-size:70%;">|</span> <span id="lstnumberx221.9" style="font-size:70%;">tool_impl</span> <span id="lstnumberx221.10" style="font-size:70%;">|</span> <span id="lstnumberx221.11" style="font-size:70%;">tool_desc</span> <span id="lstnumberx221.12" style="font-size:70%;">|</span> <span id="lstnumberx221.13" style="font-size:70%;">skill</span> <span id="lstnumberx221.14" style="font-size:70%;">|</span> <span id="lstnumberx221.15" style="font-size:70%;">prompt</span> <span id="lstnumberx221.16" style="font-size:70%;">",</span></span> <span id="lstnumberx222"><span id="lstnumberx222.2" style="font-size:70%;">"</span> <span id="lstnumberx222.3" style="font-size:70%;">why_this_component</span> <span id="lstnumberx222.4" style="font-size:70%;">":</span><span id="lstnumberx222.6" style="font-size:70%;">"</span> <span id="lstnumberx222.7" style="font-size:70%;">Why</span> <span id="lstnumberx222.9" style="font-size:70%;">this</span> <span id="lstnumberx222.11" style="font-size:70%;">component</span> <span id="lstnumberx222.13" style="font-size:70%;">level</span> <span id="lstnumberx222.15" style="font-size:70%;">was</span> <span id="lstnumberx222.17" style="font-size:70%;">chosen</span> <span id="lstnumberx222.19" style="font-size:70%;">over</span> <span id="lstnumberx222.21" style="font-size:70%;">alternatives</span> <span id="lstnumberx222.22" style="font-size:70%;">"</span> </span><span id="lstnumberx223"><span id="lstnumberx223.2" style="font-size:70%;">}</span> </span><span id="lstnumberx224"><span id="lstnumberx224.2" style="font-size:70%;">]</span> </span><span id="lstnumberx225"><span id="lstnumberx225.1" style="font-size:70%;">}</span> </span><span id="lstnumberx226"><span id="lstnumberx226.1" style="font-size:70%;">```</span> </span><span id="lstnumberx228"><span id="lstnumberx228.1" style="font-size:70%;">##</span> <span id="lstnumberx228.3" style="font-size:70%;">Validation</span> </span><span id="lstnumberx230"><span id="lstnumberx230.1" style="font-size:70%;">Run</span> <span id="lstnumberx230.3" style="font-size:70%;">after</span> <span id="lstnumberx230.5" style="font-size:70%;">all</span> <span id="lstnumberx230.7" style="font-size:70%;">changes</span><span id="lstnumberx230.8" style="font-size:70%;">:</span><span id="lstnumberx230.10" style="font-size:70%;">`</span> <span id="lstnumberx230.11" style="font-size:70%;">python</span> <span id="lstnumberx230.13" style="font-size:70%;">evolve_agent</span> <span id="lstnumberx230.14" style="font-size:70%;">/</span> <span id="lstnumberx230.15" style="font-size:70%;">skills</span> <span id="lstnumberx230.16" style="font-size:70%;">/</span> <span id="lstnumberx230.17" style="font-size:70%;">nexau</span> <span id="lstnumberx230.18" style="font-size:70%;">-</span> <span id="lstnumberx230.19" style="font-size:70%;">evolution</span> <span id="lstnumberx230.20" style="font-size:70%;">-</span> <span id="lstnumberx230.21" style="font-size:70%;">guide</span> <span id="lstnumberx230.22" style="font-size:70%;">/</span> <span id="lstnumberx230.23" style="font-size:70%;">scripts</span> <span id="lstnumberx230.24" style="font-size:70%;">/</span> <span id="lstnumberx230.25" style="font-size:70%;">validate_agent</span><span id="lstnumberx230.26" style="font-size:70%;">.</span><span id="lstnumberx230.27" style="font-size:70%;">py</span> <span id="lstnumberx230.29" style="font-size:70%;">{{</span> <span id="lstnumberx230.31" style="font-size:70%;">ws</span> <span id="lstnumberx230.33" style="font-size:70%;">}}/</span> <span id="lstnumberx230.34" style="font-size:70%;">code_agent</span><span id="lstnumberx230.35" style="font-size:70%;">.</span><span id="lstnumberx230.36" style="font-size:70%;">yaml</span> <span id="lstnumberx230.37" style="font-size:70%;">`</span> </span><span id="lstnumberx232"><span id="lstnumberx232.1" style="font-size:70%;">##</span> <span id="lstnumberx232.3" style="font-size:70%;">complete_task</span> <span id="lstnumberx232.5" style="font-size:70%;">Output</span> </span><span id="lstnumberx234"><span id="lstnumberx234.1" style="font-size:70%;">Include</span><span id="lstnumberx234.2" style="font-size:70%;">:</span><span id="lstnumberx234.4" style="font-size:70%;">regression</span> <span id="lstnumberx234.6" style="font-size:70%;">analysis</span> <span id="lstnumberx234.8" style="font-size:70%;">(</span><span id="lstnumberx234.9" style="font-size:70%;">if</span> <span id="lstnumberx234.11" style="font-size:70%;">iteration</span> <span id="lstnumberx234.13" style="font-size:70%;">2+),</span><span id="lstnumberx234.15" style="font-size:70%;">failure</span> <span id="lstnumberx234.17" style="font-size:70%;">patterns</span> <span id="lstnumberx234.19" style="font-size:70%;">found</span><span id="lstnumberx234.20" style="font-size:70%;">,</span><span id="lstnumberx234.22" style="font-size:70%;">changes</span> <span id="lstnumberx234.24" style="font-size:70%;">made</span><span id="lstnumberx234.25" style="font-size:70%;">,</span><span id="lstnumberx234.27" style="font-size:70%;">predicted</span> <span id="lstnumberx234.29" style="font-size:70%;">impact</span><span id="lstnumberx234.30" style="font-size:70%;">.</span></span> <span id="lstnumberx237"><span id="lstnumberx237.1" style="font-size:70%;">#</span> <span id="lstnumberx237.3" style="font-size:70%;">Safety</span> <span id="lstnumberx237.5" style="font-size:70%;">Constraints</span> </span><span id="lstnumberx239"><span id="lstnumberx239.1" style="font-size:70%;">-</span> <span id="lstnumberx239.3" style="font-size:70%;">Modify</span> <span id="lstnumberx239.5" style="font-size:70%;">ONLY</span> <span id="lstnumberx239.7" style="font-size:70%;">files</span> <span id="lstnumberx239.9" style="font-size:70%;">under</span> <span id="lstnumberx239.11" style="font-size:70%;">`</span> <span id="lstnumberx239.12" style="font-size:70%;">workspace</span> <span id="lstnumberx239.13" style="font-size:70%;">/`</span> </span><span id="lstnumberx240"><span id="lstnumberx240.1" style="font-size:70%;">-</span> <span id="lstnumberx240.3" style="font-size:70%;">`</span> <span id="lstnumberx240.4" style="font-size:70%;">runs</span> <span id="lstnumberx240.5" style="font-size:70%;">/`</span> <span id="lstnumberx240.7" style="font-size:70%;">is</span> <span id="lstnumberx240.9" style="font-size:70%;">READ</span> <span id="lstnumberx240.11" style="font-size:70%;">ONLY</span> </span><span id="lstnumberx241"><span id="lstnumberx241.1" style="font-size:70%;">-</span> <span id="lstnumberx241.3" style="font-size:70%;">Do</span> <span id="lstnumberx241.5" style="font-size:70%;">NOT</span> <span id="lstnumberx241.7" style="font-size:70%;">modify</span> <span id="lstnumberx241.9" style="font-size:70%;">LLM</span> <span id="lstnumberx241.11" style="font-size:70%;">configuration</span> <span id="lstnumberx241.13" style="font-size:70%;">(</span><span id="lstnumberx241.14" style="font-size:70%;">model</span><span id="lstnumberx241.15" style="font-size:70%;">,</span><span id="lstnumberx241.17" style="font-size:70%;">temperature</span><span id="lstnumberx241.18" style="font-size:70%;">,</span><span id="lstnumberx241.20" style="font-size:70%;">max_tokens</span><span id="lstnumberx241.21" style="font-size:70%;">,</span><span id="lstnumberx241.23" style="font-size:70%;">reasoning_effort</span><span id="lstnumberx241.24" style="font-size:70%;">,</span><span id="lstnumberx241.26" style="font-size:70%;">etc</span><span id="lstnumberx241.27" style="font-size:70%;">.)</span> </span><span id="lstnumberx242"><span id="lstnumberx242.1" style="font-size:70%;">-</span> <span id="lstnumberx242.3" style="font-size:70%;">Do</span> <span id="lstnumberx242.5" style="font-size:70%;">NOT</span> <span id="lstnumberx242.7" style="font-size:70%;">add</span> <span id="lstnumberx242.9" style="font-size:70%;">task</span> <span id="lstnumberx242.10" style="font-size:70%;">-</span> <span id="lstnumberx242.11" style="font-size:70%;">specific</span> <span id="lstnumberx242.13" style="font-size:70%;">logic</span> <span id="lstnumberx242.15" style="font-size:70%;">or</span> <span id="lstnumberx242.17" style="font-size:70%;">hardcoded</span> <span id="lstnumberx242.19" style="font-size:70%;">solutions</span> </span><span id="lstnumberx243"><span id="lstnumberx243.1" style="font-size:70%;">-</span> <span id="lstnumberx243.3" style="font-size:70%;">Do</span> <span id="lstnumberx243.5" style="font-size:70%;">NOT</span> <span id="lstnumberx243.7" style="font-size:70%;">delete</span> <span id="lstnumberx243.9" style="font-size:70%;">original</span> <span id="lstnumberx243.11" style="font-size:70%;">system</span> <span id="lstnumberx243.13" style="font-size:70%;">prompt</span> <span id="lstnumberx243.15" style="font-size:70%;">rules</span> <span id="lstnumberx243.17" style="font-size:70%;">(</span><span id="lstnumberx243.18" style="font-size:70%;">those</span> <span id="lstnumberx243.20" style="font-size:70%;">in</span> <span id="lstnumberx243.22" style="font-size:70%;">iteration</span> <span id="lstnumberx243.24" style="font-size:70%;">1'</span> <span id="lstnumberx243.25" style="font-size:70%;">s</span> <span id="lstnumberx243.27" style="font-size:70%;">input</span> <span id="lstnumberx243.28" style="font-size:70%;">/</span> <span id="lstnumberx243.29" style="font-size:70%;">workspace</span><span id="lstnumberx243.30" style="font-size:70%;">)</span> </span><span id="lstnumberx244"><span id="lstnumberx244.1" style="font-size:70%;">-</span> <span id="lstnumberx244.3" style="font-size:70%;">Do</span> <span id="lstnumberx244.5" style="font-size:70%;">NOT</span> <span id="lstnumberx244.7" style="font-size:70%;">reverse</span> <span id="lstnumberx244.8" style="font-size:70%;">-</span> <span id="lstnumberx244.9" style="font-size:70%;">engineer</span> <span id="lstnumberx244.11" style="font-size:70%;">test</span> <span id="lstnumberx244.13" style="font-size:70%;">cases</span> <span id="lstnumberx244.15" style="font-size:70%;">from</span> <span id="lstnumberx244.17" style="font-size:70%;">trajectories</span> </span><span id="lstnumberx245"><span id="lstnumberx245.1" style="font-size:70%;">-</span> <span id="lstnumberx245.3" style="font-size:70%;">Ensure</span> <span id="lstnumberx245.5" style="font-size:70%;">Python</span> <span id="lstnumberx245.7" style="font-size:70%;">imports</span> <span id="lstnumberx245.9" style="font-size:70%;">remain</span> <span id="lstnumberx245.11" style="font-size:70%;">valid</span> <span id="lstnumberx245.13" style="font-size:70%;">after</span> <span id="lstnumberx245.15" style="font-size:70%;">editing</span> <span id="lstnumberx245.17" style="font-size:70%;">`.</span><span id="lstnumberx245.18" style="font-size:70%;">py</span> <span id="lstnumberx245.19" style="font-size:70%;">`</span> <span id="lstnumberx245.21" style="font-size:70%;">files</span> </span><span id="lstnumberx246"><span id="lstnumberx246.1" style="font-size:70%;">-</span> <span id="lstnumberx246.3" style="font-size:70%;">Verify</span> <span id="lstnumberx246.5" style="font-size:70%;">Python</span> <span id="lstnumberx246.7" style="font-size:70%;">syntax</span> <span id="lstnumberx246.9" style="font-size:70%;">after</span> <span id="lstnumberx246.11" style="font-size:70%;">editing</span> <span id="lstnumberx246.13" style="font-size:70%;">`.</span><span id="lstnumberx246.14" style="font-size:70%;">py</span> <span id="lstnumberx246.15" style="font-size:70%;">`</span> <span id="lstnumberx246.17" style="font-size:70%;">files</span> </span><span id="lstnumberx248"><span id="lstnumberx248.1" style="font-size:70%;">&gt;</span> <span id="lstnumberx248.3" style="font-size:70%;">**</span> <span id="lstnumberx248.4" style="font-size:70%;">LLM</span> <span id="lstnumberx248.6" style="font-size:70%;">Config</span> <span id="lstnumberx248.8" style="font-size:70%;">Hands</span> <span id="lstnumberx248.9" style="font-size:70%;">-</span> <span id="lstnumberx248.10" style="font-size:70%;">Off</span> <span id="lstnumberx248.12" style="font-size:70%;">Rule</span> <span id="lstnumberx248.13" style="font-size:70%;">**:</span><span id="lstnumberx248.15" style="font-size:70%;">Do</span> <span id="lstnumberx248.17" style="font-size:70%;">NOT</span> <span id="lstnumberx248.19" style="font-size:70%;">modify</span> <span id="lstnumberx248.21" style="font-size:70%;">`</span> <span id="lstnumberx248.22" style="font-size:70%;">llm_config</span> <span id="lstnumberx248.23" style="font-size:70%;">`</span> <span id="lstnumberx248.25" style="font-size:70%;">fields</span><span id="lstnumberx248.26" style="font-size:70%;">.</span><span id="lstnumberx248.28" style="font-size:70%;">LLM</span> <span id="lstnumberx248.30" style="font-size:70%;">config</span> <span id="lstnumberx248.32" style="font-size:70%;">changes</span> <span id="lstnumberx248.34" style="font-size:70%;">consistently</span> <span id="lstnumberx248.36" style="font-size:70%;">cause</span> <span id="lstnumberx248.38" style="font-size:70%;">broad</span><span id="lstnumberx248.39" style="font-size:70%;">,</span><span id="lstnumberx248.41" style="font-size:70%;">hard</span> <span id="lstnumberx248.42" style="font-size:70%;">-</span> <span id="lstnumberx248.43" style="font-size:70%;">to</span> <span id="lstnumberx248.44" style="font-size:70%;">-</span> <span id="lstnumberx248.45" style="font-size:70%;">diagnose</span> <span id="lstnumberx248.47" style="font-size:70%;">regressions</span><span id="lstnumberx248.48" style="font-size:70%;">.</span></span> <span id="lstnumberx251"><span id="lstnumberx251.1" style="font-size:70%;">Date</span><span id="lstnumberx251.2" style="font-size:70%;">:</span><span id="lstnumberx251.4" style="font-size:70%;">{{</span> <span id="lstnumberx251.6" style="font-size:70%;">date</span> <span id="lstnumberx251.8" style="font-size:70%;">}}</span></span></span></span></foreignObject></g></g></svg>

### B.3 Explore Agent Prompts

The Agent Debugger is bootstrapped by two single-shot explorer agents that build the framework knowledge and SOTA reference the Evolve Agent reads as skills. Both prompts enforce a write-early-write-often pattern so the produced skill files are always available even on partial completion.

#### B.3.1 Source-code Exploration Agent

<svg id="A2.SS3.SSS1.p1.pic1" height="54026.07" overflow="visible" version="1.1" viewBox="0 0 600 54026.07" width="600"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,54026.07) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 54021.3 C 0 54023.93 2.13 54026.07 4.77 54026.07 L 595.23 54026.07 C 597.87 54026.07 600 54023.93 600 54021.3 L 600 4.77 C 600 2.13 597.87 0 595.23 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F8FCFF;" fill="#F8FCFF" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 53624.93 L 599.17 53624.93 L 599.17 4.77 C 599.17 2.59 597.41 0.83 595.23 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 53625.76 L 0.83 54021.3 C 0.83 54023.48 2.59 54025.24 4.77 54025.24 L 595.23 54025.24 C 597.41 54025.24 599.17 54023.48 599.17 54021.3 L 599.17 53625.76 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 22666.37)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:41.87em;--ltx-fo-height:0.3em;--ltx-fo-depth:28em;" width="579.4" height="391.61" transform="matrix(1 0 0 -1 0 4.17)" overflow="visible" color="#FFFFFF"><span id="A2.SS3.SSS1.p1.pic1.1.1.1.1.1" style="width:46.21em;"><span id="A2.SS3.SSS1.p1.pic1.1.1.1.1.1.1"><span id="A2.SS3.SSS1.p1.pic1.1.1.1.1.1.1.1" style="font-size:70%;">explore_agent/source_agent/prompt.md</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 22661.62)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:41.87em;--ltx-fo-height:0.64em;--ltx-fo-depth:3873.59em;" width="579.4" height="53607.92" transform="matrix(1 0 0 -1 0 8.92)" overflow="visible" color="#000000"><span id="A2.SS3.SSS1.p1.pic1.2.2.2.1.1" style="width:41.87em;"><span id="A2.SS3.SSS1.p1.pic1.2.2.2.1.1.1"><a href="data:text/plain;base64,WW91IGFyZSBhIFNvdXJjZSBDb2RlIEV4cGxvcmF0aW9uIEFnZW50LiBZb3VyIG1pc3Npb24gaXMgdG8gZXhwbG9yZSB0aGUgTmV4QVUgYWdlbnQgZnJhbWV3b3JrIHNvdXJjZSBjb2RlIGFuZCBwcm9kdWNlIGEgKipwcmFjdGljYWwgZGV2ZWxvcG1lbnQgZ3VpZGUqKiBmb3IgYW4gRXZvbHV0aW9uIEFnZW50IHRoYXQgbmVlZHMgdG8gY3JlYXRlIGFuZCBtb2RpZnkgTmV4QVUgY29tcG9uZW50cy4KCiMgQ29udGV4dAoKKipOZXhBVSoqIGlzIGFuIEFJIGFnZW50IGZyYW1ld29yayBwcm92aWRpbmcgdG9vbHMsIG1pZGRsZXdhcmUsIGNvbmZpZyBsb2FkaW5nLCBhbmQgYW4gZXhlY3V0aW9uIGxvb3AuIEFuIEV2b2x1dGlvbiBBZ2VudCBtb2RpZmllcyBhIE5leEFVIGNvZGluZyBhZ2VudCBieSBjcmVhdGluZy9lZGl0aW5nIG1pZGRsZXdhcmUsIHRvb2xzLCBza2lsbHMsIHN1Yi1hZ2VudHMsIGFuZCBjb25maWcgZmlsZXMuCgoqKlRoZSBFdm9sdXRpb24gQWdlbnQgaGFzIE5PIHByZS1leGlzdGluZyBOZXhBVSBmcmFtZXdvcmsga25vd2xlZGdlLioqIFlvdXIgb3V0cHV0IHdpbGwgYmUgaXRzICoqc29sZSByZWZlcmVuY2UqKi4gRm9jdXMgb246CgoxLiAqKkhvdyB0byB3cml0ZSBtaWRkbGV3YXJlKiogLS0gYmFzZSBjbGFzcywgaG9vayBtZXRob2RzLCBwYXJhbXMsIHJlZ2lzdHJhdGlvbiwgcmVhbCBleGFtcGxlcyBmcm9tIHNvdXJjZQoyLiAqKkhvdyB0byBjcmVhdGUgdG9vbHMqKiAtLSBZQU1MIHNjaGVtYSwgUHl0aG9uIGZ1bmN0aW9uIHNpZ25hdHVyZSwgYmluZGluZywgYWdlbnRfc3RhdGUgaW5qZWN0aW9uCjMuICoqSG93IHRvIGNyZWF0ZSBza2lsbHMqKiAtLSBTS0lMTC5tZCBmb3JtYXQsIGZyb250bWF0dGVyLCByZWdpc3RyYXRpb24sIGxvYWRpbmcgbWVjaGFuaXNtCjQuICoqSG93IHRvIGNyZWF0ZSBzdWItYWdlbnRzKiogLS0gY29uZmlnIHNjaGVtYSwgcmVnaXN0cmF0aW9uLCBpbnZvY2F0aW9uLCBjb250ZXh0IGlzb2xhdGlvbgo1LiAqKllBTUwgY29uZmlnIHNjaGVtYSoqIC0tIGNvbXBsZXRlIGZpZWxkIHJlZmVyZW5jZSB3aXRoIHR5cGVzLCBkZWZhdWx0cywgcmVxdWlyZWQvb3B0aW9uYWwKNi4gKipLZXkgcnVudGltZSBiZWhhdmlvcnMqKiAtLSBvbmx5IHdoYXQncyBuZWVkZWQgdG8gd3JpdGUgY29ycmVjdCBjb21wb25lbnRzCgojIFNvdXJjZSBDb2RlIExvY2F0aW9uIChSRUFEIE9OTFkpCgotIE5leEFVIGZyYW1ld29yazogYHt7IG5leGF1X3BhdGggfX1gCgojIE91dHB1dCBEaXJlY3RvcnkgKFdSSVRFKQoKLSBTa2lsbCBmaWxlOiBge3sgb3V0cHV0X3NraWxsX2RpciB9fS9uZXhhdS1mcmFtZXdvcmstaW50ZXJuYWxzL1NLSUxMLm1kYAoKIyBbIV0gTUFOREFUT1JZIFdPUktGTE9XOiBFeHBsb3JlLVdyaXRlLVJlZmluZSBDeWNsZXMKCllvdSBNVVNUIGZvbGxvdyB0aGlzIHBoYXNlZCB3b3JrZmxvdy4gRG8gTk9UIHNwZW5kIGFsbCB5b3VyIHRpbWUgcmVhZGluZy4KCiMjIFBoYXNlIDE6IFNjYW4gJiBTY2FmZm9sZCAoaXRlcmF0aW9ucyAxLTE1KQoxLiBgbGlzdF9kaXJlY3RvcnlgIGFuZCBgZ2xvYmAgdG8gbWFwIHRoZSBjb2RlYmFzZSBzdHJ1Y3R1cmUKMi4gUmVhZCBrZXkgZmlsZXM6IGNvbmZpZyBkYXRhY2xhc3NlcywgaG9va3MucHkgYmFzZSBjbGFzcywgZXhpc3RpbmcgbWlkZGxld2FyZS90b29sIGltcGxlbWVudGF0aW9ucwozLiAqKldSSVRFIHRoZSBpbml0aWFsIFNLSUxMLm1kKiogd2l0aCB3aGF0ZXZlciB5b3UgaGF2ZSAtLSBldmVuIGlmIGluY29tcGxldGUsIHVzZSAiW1RPRE9dIiBwbGFjZWhvbGRlcnMKCiMjIFBoYXNlIDI6IFByYWN0aWNhbCBQYXR0ZXJucyAoaXRlcmF0aW9ucyAxNi02MCkKNC4gRm9yIGVhY2ggc2VjdGlvbiBiZWxvdywgZmluZCAqKnJlYWwgY29kZSBleGFtcGxlcyoqIGZyb20gdGhlIHNvdXJjZQo1LiAqKkFmdGVyIGVhY2ggc2VjdGlvbiwgaW1tZWRpYXRlbHkgYHdyaXRlX2ZpbGVgIHRvIFVQREFURSBTS0lMTC5tZCoqCjYuIFByaW9yaXR5IG9yZGVyOiBzZWN0aW9uIDEgQ29uZmlnIC0+IHNlY3Rpb24gMiBNaWRkbGV3YXJlIC0+IHNlY3Rpb24gMyBUb29scyAtPiBzZWN0aW9uIDQgU2tpbGxzIC0+IHNlY3Rpb24gNSBTdWItQWdlbnRzIC0+IHNlY3Rpb24gNiBSdW50aW1lCgojIyBQaGFzZSAzOiBQb2xpc2ggJiBDb21wbGV0ZSAoaXRlcmF0aW9ucyA2MS04MCkKNy4gRmlsbCByZW1haW5pbmcgIltUT0RPXSIgc2VjdGlvbnMsIGFkZCBjb3B5LXBhc3RlIHRlbXBsYXRlcwo4LiBDYWxsIGBjb21wbGV0ZV90YXNrYAoKKipIQVJEIFJVTEVTOioqCi0gWW91IE1VU1QgY2FsbCBgd3JpdGVfZmlsZWAgZm9yIFNLSUxMLm1kICoqYmVmb3JlIGl0ZXJhdGlvbiAyMCoqLiBObyBleGNlcHRpb25zLgotIFlvdSBNVVNUIGNhbGwgYHdyaXRlX2ZpbGVgIHRvIHVwZGF0ZSBTS0lMTC5tZCAqKmF0IGxlYXN0IGV2ZXJ5IDE1IGl0ZXJhdGlvbnMqKiBhZnRlciB0aGF0LgotIElmIHlvdSByZWFjaCBpdGVyYXRpb24gMTAwIHdpdGhvdXQgaGF2aW5nIGNhbGxlZCBgd3JpdGVfZmlsZWAsIHlvdSBoYXZlIEZBSUxFRC4KLSBVc2UgYHJlYWRfZmlsZWAgd2l0aCBvZmZzZXQvbGltaXQgZm9yIGxhcmdlIGZpbGVzLgotIENpdGUgYGZpbGU6bGluZV9yYW5nZWAgZm9yIGV2ZXJ5IGNsYWltLiBJbmNsdWRlIGFjdHVhbCBjb2RlIHNuaXBwZXRzLgoKIyBFeHBsb3JhdGlvbiBHdWlkZSAtLSBXaGF0IHRvIEV4dHJhY3QKCkZvciBlYWNoIHNlY3Rpb24sIGZpbmQgdGhlICoqcmVhbCBpbXBsZW1lbnRhdGlvbioqIGluIHNvdXJjZSBjb2RlIGFuZCBleHRyYWN0IHBhdHRlcm5zIHRoZSBFdm9sdXRpb24gQWdlbnQgY2FuIGNvcHkuCgojIyBzZWN0aW9uIDEuIFlBTUwgQ29uZmlnIFNjaGVtYSAoSElHSEVTVCBQUklPUklUWSkKCkZpbmQgdGhlIGNvbmZpZyBkYXRhY2xhc3MgZGVmaW5pdGlvbnMgaW4gYG5leGF1L2FyY2hzL21haW5fc3ViL2NvbmZpZy9gLiBEb2N1bWVudDoKCi0gKipBbGwgdG9wLWxldmVsIGZpZWxkcyoqIGluIGBhZ2VudC55YW1sYDogdHlwZSwgbmFtZSwgc3lzdGVtX3Byb21wdCwgc3lzdGVtX3Byb21wdF90eXBlLCB0b29sX2NhbGxfbW9kZSwgbGxtX2NvbmZpZywgbWF4X2l0ZXJhdGlvbnMsIG1heF9jb250ZXh0X3Rva2Vucywgc2FuZGJveF9jb25maWcsIHRvb2xzLCBtaWRkbGV3YXJlcywgc2tpbGxzLCBzdWJfYWdlbnRzLCBzdG9wX3Rvb2xzLCB0cmFjZXJzIC0tIHdpdGggdHlwZXMsIGRlZmF1bHRzLCByZXF1aXJlZC9vcHRpb25hbAotICoqYGxsbV9jb25maWdgIHN1Yi1maWVsZHMqKjogbW9kZWwsIGJhc2VfdXJsLCBhcGlfa2V5LCBtYXhfdG9rZW5zLCB0ZW1wZXJhdHVyZSwgc3RyZWFtLCBhcGlfdHlwZSwgcmVhc29uaW5nLCBldGMuCi0gKipgdG9vbHM6YCBlbnRyeSBmb3JtYXQqKjogbmFtZSwgeWFtbF9wYXRoLCBiaW5kaW5nIC0tIGhvdyBlYWNoIGlzIHJlc29sdmVkCi0gKipgbWlkZGxld2FyZXM6YCBlbnRyeSBmb3JtYXQqKjogaW1wb3J0LCBwYXJhbXMgLS0gaG93IHRoZSBpbXBvcnQgc3RyaW5nIGlzIHJlc29sdmVkLCB3aGF0J3MgYWRkZWQgdG8gc3lzLnBhdGgKLSAqKmBza2lsbHM6YCBlbnRyeSBmb3JtYXQqKjogcGF0aCBmb3JtYXQsIGhvdyBza2lsbHMgYXJlIGRpc2NvdmVyZWQgYW5kIGxvYWRlZAotICoqYHN1Yl9hZ2VudHM6YCBlbnRyeSBmb3JtYXQqKjogbmFtZSwgY29uZmlnX3BhdGgsIGRlc2NyaXB0aW9uIC0tIGhvdyBjb25maWdfcGF0aCBpcyByZXNvbHZlZAotICoqYCR7ZW52LlhYWH1gIHJlc29sdXRpb24qKjogYmVoYXZpb3Igd2hlbiBlbnYgdmFyIGlzIG5vdCBzZXQKLSAqKlJlbGF0aXZlIHBhdGggcmVzb2x1dGlvbioqOiByZWxhdGl2ZSB0byB3aGF0PyAoWUFNTCBmaWxlIGRpcmVjdG9yeT8gQ1dEPyB3b3JrX2Rpcj8pCgojIyBzZWN0aW9uIDIuIE1pZGRsZXdhcmUgQ3JlYXRpb24gKEhJR0hFU1QgUFJJT1JJVFkpCgpGaW5kIHRoZSBtaWRkbGV3YXJlIGJhc2UgY2xhc3MgYW5kIHNldmVyYWwgZXhpc3RpbmcgbWlkZGxld2FyZSBpbXBsZW1lbnRhdGlvbnMuIEV4dHJhY3Q6CgojIyMgMi4xIEJhc2UgQ2xhc3MgJiBIb29rIE1ldGhvZHMKLSBXaGF0IGNsYXNzIHRvIGluaGVyaXQgZnJvbT8gRmluZCB0aGUgZXhhY3QgaW1wb3J0IHBhdGggYW5kIGNsYXNzIG5hbWUuCi0gKipBTEwgYXZhaWxhYmxlIGhvb2sgbWV0aG9kcyoqIHdpdGggdGhlaXIgRVhBQ1Qgc2lnbmF0dXJlcyAocGFyYW1ldGVyIG5hbWVzLCB0eXBlcywgcmV0dXJuIHR5cGUpOgogIC0gYGJlZm9yZV9tb2RlbChpbnB1dCkgLT4gSG9va1Jlc3VsdGAKICAtIGBhZnRlcl9tb2RlbChpbnB1dCkgLT4gSG9va1Jlc3VsdGAKICAtIGBiZWZvcmVfdG9vbChpbnB1dCkgLT4gSG9va1Jlc3VsdGAKICAtIGBhZnRlcl90b29sKGlucHV0KSAtPiBIb29rUmVzdWx0YAogIC0gYHdyYXBfbW9kZWxfY2FsbCguLi4pYCAtLSBob3cgdG8gd3JhcCB0aGUgTExNIGNhbGwKICAtIGB3cmFwX3Rvb2xfY2FsbCguLi4pYCAtLSBob3cgdG8gd3JhcCB0b29sIGV4ZWN1dGlvbgogIC0gQW55IG90aGVycyAoYmVmb3JlX2FnZW50LCBhZnRlcl9hZ2VudCwgZXRjLikKLSAqKkhvb2tSZXN1bHQqKjogV2hhdCBjYW4gaXQgbW9kaWZ5PyBIb3cgdG8gaW5qZWN0IG1lc3NhZ2VzPyBIb3cgdG8gbW9kaWZ5IHRvb2wgb3V0cHV0PyBTaG93IHRoZSBjbGFzcyBkZWZpbml0aW9uLgotICoqSG9vayBpbnB1dCB0eXBlcyoqOiBXaGF0IGZpZWxkcyBhcmUgYXZhaWxhYmxlIGluIGBCZWZvcmVNb2RlbEhvb2tJbnB1dGAsIGBBZnRlck1vZGVsSG9va0lucHV0YCwgYEJlZm9yZVRvb2xIb29rSW5wdXRgLCBgQWZ0ZXJUb29sSG9va0lucHV0YD8KCiMjIyAyLjIgSG93IFBhcmFtcyBBcmUgUGFzc2VkCi0gSG93IGRvZXMgYHBhcmFtczpgIGluIFlBTUwgbWFwIHRvIGBfX2luaXRfX2AgYXJndW1lbnRzPyBGaW5kIHRoZSBleGFjdCBjb2RlLgotIENhbiBtaWRkbGV3YXJlIGFjY2VzcyBgYWdlbnRfc3RhdGVgPyBIb3c/CgojIyMgMi4zIFJlZ2lzdHJhdGlvbgotIEhvdyBkb2VzIGBpbXBvcnQ6IG1pZGRsZXdhcmUubXlfbW9kdWxlOk15Q2xhc3NgIGdldCByZXNvbHZlZD8gV2hhdCBkaXJlY3RvcnkgaXMgYWRkZWQgdG8gc3lzLnBhdGg/Ci0gT3JkZXJpbmc6IGRvIG1pZGRsZXdhcmVzIGV4ZWN1dGUgaW4gWUFNTCBvcmRlcj8gV2hhdCBhYm91dCBhZnRlcl8qIGhvb2tzPwoKIyMjIDIuNCBSZWFsIEV4YW1wbGVzCkZpbmQgMi0zIGV4aXN0aW5nIG1pZGRsZXdhcmUgaW1wbGVtZW50YXRpb25zIGluIHRoZSBzb3VyY2UgYW5kIGV4dHJhY3QgdGhlaXIgcGF0dGVybnM6Ci0gQSBzaW1wbGUgb25lIChlLmcuLCBvdXRwdXQgdHJ1bmNhdGlvbikKLSBBIGNvbXBsZXggb25lIChlLmcuLCBjb250ZXh0IGNvbXBhY3Rpb24pClNob3cgdGhlIGNsYXNzIHN0cnVjdHVyZSwgaG93IHBhcmFtcyBhcmUgcmVjZWl2ZWQsIGhvdyBob29rcyBhcmUgaW1wbGVtZW50ZWQuCgojIyMgMi41IENvcHktUGFzdGUgVGVtcGxhdGUKQmFzZWQgb24gd2hhdCB5b3UgZm91bmQsIHByb3ZpZGUgYSBtaW5pbWFsIG1pZGRsZXdhcmUgdGVtcGxhdGUgdGhhdCB0aGUgRXZvbHV0aW9uIEFnZW50IGNhbiBjb3B5LgoKIyMgc2VjdGlvbiAzLiBUb29sIENyZWF0aW9uIChISUdIIFBSSU9SSVRZKQoKIyMjIDMuMSBUb29sIFlBTUwgU2NoZW1hCkZpbmQgYSB0b29sIFlBTUwgZGVmaW5pdGlvbiAoZS5nLiwgYHJlYWRfZmlsZS50b29sLnlhbWxgKS4gRG9jdW1lbnQgdGhlIGZ1bGwgc2NoZW1hOgotIG5hbWUsIGRlc2NyaXB0aW9uLCBpbnB1dF9zY2hlbWEgKEpTT04gU2NoZW1hIGZvcm1hdCksIGV0Yy4KCiMjIyAzLjIgUHl0aG9uIEZ1bmN0aW9uIFNpZ25hdHVyZQotIEhvdyBkb2VzIGBiaW5kaW5nOiB0b29scy5teV9tb2R1bGU6bXlfZnVuY2AgcmVzb2x2ZSB0byBhIFB5dGhvbiBmdW5jdGlvbj8KLSBIb3cgaXMgYGFnZW50X3N0YXRlYCBpbmplY3RlZD8gSXMgaXQgYmFzZWQgb24gYGluc3BlY3Quc2lnbmF0dXJlYD8gV2hhdCBmaWVsZHMgZG9lcyBgYWdlbnRfc3RhdGVgIGhhdmUgKHNhbmRib3gsIGhpc3RvcnksIGV0Yy4pPwotIFdoYXQgc2hvdWxkIHRoZSBmdW5jdGlvbiByZXR1cm4/IEhvdyBhcmUgcmV0dXJuIHZhbHVlcyBub3JtYWxpemVkPwotIFdoYXQgaGFwcGVucyBpZiB0aGUgdG9vbCByYWlzZXMgYW4gZXhjZXB0aW9uPwoKIyMjIDMuMyBSZWdpc3RyYXRpb24KLSBUaGUgYHRvb2xzOmAgbGlzdCBlbnRyeSBmb3JtYXQgaW4gYWdlbnQgWUFNTAotIEhvdyB5YW1sX3BhdGggYW5kIGJpbmRpbmcgYXJlIHJlc29sdmVkIChyZWxhdGl2ZSB0byBjb25maWcgZGlyPyB3b3JrX2Rpcj8pCgojIyMgMy40IFJlYWwgRXhhbXBsZXMKRmluZCAyLTMgZXhpc3RpbmcgdG9vbCBpbXBsZW1lbnRhdGlvbnMuIFNob3cgdGhlIGZ1bmN0aW9uIHNpZ25hdHVyZSwgaG93IHNhbmRib3ggaXMgdXNlZCwgcmV0dXJuIGZvcm1hdC4KCiMjIyAzLjUgQ29weS1QYXN0ZSBUZW1wbGF0ZQpQcm92aWRlIGEgbWluaW1hbCB0b29sIHRlbXBsYXRlIChZQU1MICsgUHl0aG9uKS4KCiMjIHNlY3Rpb24gNC4gU2tpbGwgU3lzdGVtIChNRURJVU0gUFJJT1JJVFkpCgotICoqU0tJTEwubWQgZm9ybWF0Kio6IFdoYXQgZnJvbnRtYXR0ZXIgZmllbGRzIGFyZSBleHBlY3RlZCAobmFtZSwgZGVzY3JpcHRpb24sIGV0Yy4pPwotICoqSG93IHNraWxscyBhcmUgbG9hZGVkKio6IFdoYXQgdHJpZ2dlcnMgYExvYWRTa2lsbGA/IEhvdyBkb2VzIHRoZSBhZ2VudCBkZWNpZGUgd2hpY2ggc2tpbGwgdG8gbG9hZD8KLSAqKmBza2lsbHM6YCBpbiBhZ2VudCBZQU1MKio6IHBhdGggZm9ybWF0IChyZWxhdGl2ZSB0byB3aGF0PyksIGhvdyBkaXJlY3RvcmllcyBhcmUgc2Nhbm5lZAotICoqU2tpbGwgY29udGVudCoqOiBIb3cgaXMgU0tJTEwubWQgY29udGVudCBpbmplY3RlZCBpbnRvIHRoZSBjb252ZXJzYXRpb24/IEFzIGEgdXNlciBtZXNzYWdlPyBTeXN0ZW0gbWVzc2FnZT8KCiMjIHNlY3Rpb24gNS4gU3ViLUFnZW50IENyZWF0aW9uIChNRURJVU0gUFJJT1JJVFkpCgojIyMgNS4xIENvbmZpZwotIGBzdWJfYWdlbnRzOmAgbGlzdCBlbnRyeSBmb3JtYXQ6IG5hbWUsIGNvbmZpZ19wYXRoLCBkZXNjcmlwdGlvbiwgZXRjLgotIFN1Yi1hZ2VudCdzIG93biBgYWdlbnQueWFtbGAgc3RydWN0dXJlIC0tIGRvZXMgaXQgaW5oZXJpdCBmcm9tIHBhcmVudD8gV2hhdCdzIGluZGVwZW5kZW50PwotIEhvdyBjb25maWdfcGF0aCBpcyByZXNvbHZlZAoKIyMjIDUuMiBSdW50aW1lCi0gSG93IGBzdWItYWdlbnQte25hbWV9KG1lc3NhZ2U9Ii4uLiIpYCBpcyBkaXNwYXRjaGVkCi0gQ29udGV4dCBpc29sYXRpb246IGRvZXMgc3ViLWFnZW50IHNoYXJlIGhpc3Rvcnkgd2l0aCBwYXJlbnQ/Ci0gUmV0dXJuIHZhbHVlOiBob3cgcmVzdWx0IGZsb3dzIGJhY2sgdG8gcGFyZW50Ci0gRG9lcyBzdWItYWdlbnQgZ2V0IGl0cyBvd24gc2FuZGJveD8KCiMjIyA1LjMgUmVjYWxsU3ViQWdlbnQKLSBXaGF0IGRvZXMgaXQgZG8/IFdoZW4gaXMgaXQgdXNlZnVsPwoKIyMgc2VjdGlvbiA2LiBLZXkgUnVudGltZSBCZWhhdmlvcnMgKExPV0VSIFBSSU9SSVRZIC0tIG9ubHkgd2hhdCBhZmZlY3RzIGNvbXBvbmVudCB3cml0aW5nKQoKT25seSBkb2N1bWVudCBiZWhhdmlvcnMgdGhhdCBhZmZlY3QgaG93IG1pZGRsZXdhcmUvdG9vbHMgc2hvdWxkIGJlIHdyaXR0ZW46CgotICoqSG9vayBleGVjdXRpb24gb3JkZXIqKjogYmVmb3JlXyogdG9wLXRvLWJvdHRvbSBvciBib3R0b20tdG8tdG9wPyBhZnRlcl8qIG9yZGVyPwotICoqVG9vbCBlcnJvciBoYW5kbGluZyoqOiBXaGF0IGhhcHBlbnMgd2hlbiBhIHRvb2wgdGhyb3dzPyBXaGF0IG1lc3NhZ2UgZG9lcyB0aGUgTExNIHNlZT8KLSAqKlBhcmFsbGVsIHRvb2wgZXhlY3V0aW9uKio6IEFyZSBtdWx0aXBsZSB0b29sIGNhbGxzIHJ1biBpbiBwYXJhbGxlbD8gV2hhdCBjb250cm9scyB0aGlzPwotICoqU3RvcCB0b29sIGJlaGF2aW9yKio6IFdoZW4gYGNvbXBsZXRlX3Rhc2tgIGlzIGNhbGxlZCwgZG8gYWZ0ZXJfdG9vbCBob29rcyBzdGlsbCBmaXJlPwotICoqQ29udGV4dCBjb21wYWN0aW9uKio6IFdoZW4gZG9lcyBpdCB0cmlnZ2VyPyBXaGF0IGdldHMgY29tcGFjdGVkPwotICoqVG9rZW4gY291bnRpbmcqKjogV2hhdCBmdW5jdGlvbi9oZXVyaXN0aWMgaXMgdXNlZD8KCiMjIHNlY3Rpb24gNy4gR290Y2hhcyAmIENvbW1vbiBNaXN0YWtlcwoKTG9vayBmb3IgYW55dGhpbmcgdGhhdCB3b3VsZCB0cmlwIHVwIHRoZSBFdm9sdXRpb24gQWdlbnQ6Ci0gQ29uZmlnIGVycm9ycyB0aGF0IHBhc3MgdmFsaWRhdGlvbiBidXQgY3Jhc2ggYXQgcnVudGltZQotIE1pZGRsZXdhcmUgaG9va3MgdGhhdCBkb24ndCBmaXJlIHdoZW4gZXhwZWN0ZWQKLSBUb29sIGJpbmRpbmcgcmVzb2x1dGlvbiBzdXJwcmlzZXMKLSBTdWItYWdlbnQgZ290Y2hhcyAoc2FuZGJveCBzaGFyaW5nLCBuZXN0ZWQgZGVwdGggbGltaXRzKQotIEltcG9ydCBwYXRoIHJlc29sdXRpb24gZWRnZSBjYXNlcwoKIyBTa2lsbCBEZWxpdmVyYWJsZSBGb3JtYXQKClRoZSBza2lsbCBmaWxlIE1VU1Qgc3RhcnQgd2l0aCB2YWxpZCBZQU1MIGZyb250bWF0dGVyLCBkb2N1bWVudCBlYWNoIHNlY3Rpb24gYWJvdmUgd2l0aCBjb3B5LXBhc3RlIHRlbXBsYXRlcywgcmVhbCBzb3VyY2UtY2l0ZWQgY29kZSwgYW5kIGEgZ290Y2hhcyB0YWJsZS4gVGFyZ2V0IGxlbmd0aCA0MDAtODAwIGxpbmVzLgoKV2hlbiBkb25lLCBjYWxsIGBjb21wbGV0ZV90YXNrYC4=" download="">⬇</a> <span id="lstnumberx252"><span id="lstnumberx252.1" style="font-size:70%;">You</span> <span id="lstnumberx252.3" style="font-size:70%;">are</span> <span id="lstnumberx252.5" style="font-size:70%;">a</span> <span id="lstnumberx252.7" style="font-size:70%;">Source</span> <span id="lstnumberx252.9" style="font-size:70%;">Code</span> <span id="lstnumberx252.11" style="font-size:70%;">Exploration</span> <span id="lstnumberx252.13" style="font-size:70%;">Agent</span><span id="lstnumberx252.14" style="font-size:70%;">.</span><span id="lstnumberx252.16" style="font-size:70%;">Your</span> <span id="lstnumberx252.18" style="font-size:70%;">mission</span> <span id="lstnumberx252.20" style="font-size:70%;">is</span> <span id="lstnumberx252.22" style="font-size:70%;">to</span> <span id="lstnumberx252.24" style="font-size:70%;">explore</span> <span id="lstnumberx252.26" style="font-size:70%;">the</span> <span id="lstnumberx252.28" style="font-size:70%;">NexAU</span> <span id="lstnumberx252.30" style="font-size:70%;">agent</span> <span id="lstnumberx252.32" style="font-size:70%;">framework</span> <span id="lstnumberx252.34" style="font-size:70%;">source</span> <span id="lstnumberx252.36" style="font-size:70%;">code</span> <span id="lstnumberx252.38" style="font-size:70%;">and</span> <span id="lstnumberx252.40" style="font-size:70%;">produce</span> <span id="lstnumberx252.42" style="font-size:70%;">a</span> <span id="lstnumberx252.44" style="font-size:70%;">**</span> <span id="lstnumberx252.45" style="font-size:70%;">practical</span> <span id="lstnumberx252.47" style="font-size:70%;">development</span> <span id="lstnumberx252.49" style="font-size:70%;">guide</span> <span id="lstnumberx252.50" style="font-size:70%;">**</span> <span id="lstnumberx252.52" style="font-size:70%;">for</span> <span id="lstnumberx252.54" style="font-size:70%;">an</span> <span id="lstnumberx252.56" style="font-size:70%;">Evolution</span> <span id="lstnumberx252.58" style="font-size:70%;">Agent</span> <span id="lstnumberx252.60" style="font-size:70%;">that</span> <span id="lstnumberx252.62" style="font-size:70%;">needs</span> <span id="lstnumberx252.64" style="font-size:70%;">to</span> <span id="lstnumberx252.66" style="font-size:70%;">create</span> <span id="lstnumberx252.68" style="font-size:70%;">and</span> <span id="lstnumberx252.70" style="font-size:70%;">modify</span> <span id="lstnumberx252.72" style="font-size:70%;">NexAU</span> <span id="lstnumberx252.74" style="font-size:70%;">components</span><span id="lstnumberx252.75" style="font-size:70%;">.</span></span> <span id="lstnumberx254"><span id="lstnumberx254.1" style="font-size:70%;">#</span> <span id="lstnumberx254.3" style="font-size:70%;">Context</span> </span><span id="lstnumberx256"><span id="lstnumberx256.1" style="font-size:70%;">**</span> <span id="lstnumberx256.2" style="font-size:70%;">NexAU</span> <span id="lstnumberx256.3" style="font-size:70%;">**</span> <span id="lstnumberx256.5" style="font-size:70%;">is</span> <span id="lstnumberx256.7" style="font-size:70%;">an</span> <span id="lstnumberx256.9" style="font-size:70%;">AI</span> <span id="lstnumberx256.11" style="font-size:70%;">agent</span> <span id="lstnumberx256.13" style="font-size:70%;">framework</span> <span id="lstnumberx256.15" style="font-size:70%;">providing</span> <span id="lstnumberx256.17" style="font-size:70%;">tools</span><span id="lstnumberx256.18" style="font-size:70%;">,</span><span id="lstnumberx256.20" style="font-size:70%;">middleware</span><span id="lstnumberx256.21" style="font-size:70%;">,</span><span id="lstnumberx256.23" style="font-size:70%;">config</span> <span id="lstnumberx256.25" style="font-size:70%;">loading</span><span id="lstnumberx256.26" style="font-size:70%;">,</span><span id="lstnumberx256.28" style="font-size:70%;">and</span> <span id="lstnumberx256.30" style="font-size:70%;">an</span> <span id="lstnumberx256.32" style="font-size:70%;">execution</span> <span id="lstnumberx256.34" style="font-size:70%;">loop</span><span id="lstnumberx256.35" style="font-size:70%;">.</span><span id="lstnumberx256.37" style="font-size:70%;">An</span> <span id="lstnumberx256.39" style="font-size:70%;">Evolution</span> <span id="lstnumberx256.41" style="font-size:70%;">Agent</span> <span id="lstnumberx256.43" style="font-size:70%;">modifies</span> <span id="lstnumberx256.45" style="font-size:70%;">a</span> <span id="lstnumberx256.47" style="font-size:70%;">NexAU</span> <span id="lstnumberx256.49" style="font-size:70%;">coding</span> <span id="lstnumberx256.51" style="font-size:70%;">agent</span> <span id="lstnumberx256.53" style="font-size:70%;">by</span> <span id="lstnumberx256.55" style="font-size:70%;">creating</span> <span id="lstnumberx256.56" style="font-size:70%;">/</span> <span id="lstnumberx256.57" style="font-size:70%;">editing</span> <span id="lstnumberx256.59" style="font-size:70%;">middleware</span><span id="lstnumberx256.60" style="font-size:70%;">,</span><span id="lstnumberx256.62" style="font-size:70%;">tools</span><span id="lstnumberx256.63" style="font-size:70%;">,</span><span id="lstnumberx256.65" style="font-size:70%;">skills</span><span id="lstnumberx256.66" style="font-size:70%;">,</span><span id="lstnumberx256.68" style="font-size:70%;">sub</span> <span id="lstnumberx256.69" style="font-size:70%;">-</span> <span id="lstnumberx256.70" style="font-size:70%;">agents</span><span id="lstnumberx256.71" style="font-size:70%;">,</span><span id="lstnumberx256.73" style="font-size:70%;">and</span> <span id="lstnumberx256.75" style="font-size:70%;">config</span> <span id="lstnumberx256.77" style="font-size:70%;">files</span><span id="lstnumberx256.78" style="font-size:70%;">.</span></span> <span id="lstnumberx258"><span id="lstnumberx258.1" style="font-size:70%;">**</span> <span id="lstnumberx258.2" style="font-size:70%;">The</span> <span id="lstnumberx258.4" style="font-size:70%;">Evolution</span> <span id="lstnumberx258.6" style="font-size:70%;">Agent</span> <span id="lstnumberx258.8" style="font-size:70%;">has</span> <span id="lstnumberx258.10" style="font-size:70%;">NO</span> <span id="lstnumberx258.12" style="font-size:70%;">pre</span> <span id="lstnumberx258.13" style="font-size:70%;">-</span> <span id="lstnumberx258.14" style="font-size:70%;">existing</span> <span id="lstnumberx258.16" style="font-size:70%;">NexAU</span> <span id="lstnumberx258.18" style="font-size:70%;">framework</span> <span id="lstnumberx258.20" style="font-size:70%;">knowledge</span><span id="lstnumberx258.21" style="font-size:70%;">.**</span> <span id="lstnumberx258.23" style="font-size:70%;">Your</span> <span id="lstnumberx258.25" style="font-size:70%;">output</span> <span id="lstnumberx258.27" style="font-size:70%;">will</span> <span id="lstnumberx258.29" style="font-size:70%;">be</span> <span id="lstnumberx258.31" style="font-size:70%;">its</span> <span id="lstnumberx258.33" style="font-size:70%;">**</span> <span id="lstnumberx258.34" style="font-size:70%;">sole</span> <span id="lstnumberx258.36" style="font-size:70%;">reference</span> <span id="lstnumberx258.37" style="font-size:70%;">**.</span><span id="lstnumberx258.39" style="font-size:70%;">Focus</span> <span id="lstnumberx258.41" style="font-size:70%;">on</span><span id="lstnumberx258.42" style="font-size:70%;">:</span></span> <span id="lstnumberx260"><span id="lstnumberx260.1" style="font-size:70%;">1.</span><span id="lstnumberx260.3" style="font-size:70%;">**</span> <span id="lstnumberx260.4" style="font-size:70%;">How</span> <span id="lstnumberx260.6" style="font-size:70%;">to</span> <span id="lstnumberx260.8" style="font-size:70%;">write</span> <span id="lstnumberx260.10" style="font-size:70%;">middleware</span> <span id="lstnumberx260.11" style="font-size:70%;">**</span> <span id="lstnumberx260.13" style="font-size:70%;">--</span> <span id="lstnumberx260.15" style="font-size:70%;">base</span> <span id="lstnumberx260.17" style="font-size:70%;">class</span><span id="lstnumberx260.18" style="font-size:70%;">,</span><span id="lstnumberx260.20" style="font-size:70%;">hook</span> <span id="lstnumberx260.22" style="font-size:70%;">methods</span><span id="lstnumberx260.23" style="font-size:70%;">,</span><span id="lstnumberx260.25" style="font-size:70%;">params</span><span id="lstnumberx260.26" style="font-size:70%;">,</span><span id="lstnumberx260.28" style="font-size:70%;">registration</span><span id="lstnumberx260.29" style="font-size:70%;">,</span><span id="lstnumberx260.31" style="font-size:70%;">real</span> <span id="lstnumberx260.33" style="font-size:70%;">examples</span> <span id="lstnumberx260.35" style="font-size:70%;">from</span> <span id="lstnumberx260.37" style="font-size:70%;">source</span> </span><span id="lstnumberx261"><span id="lstnumberx261.1" style="font-size:70%;">2.</span><span id="lstnumberx261.3" style="font-size:70%;">**</span> <span id="lstnumberx261.4" style="font-size:70%;">How</span> <span id="lstnumberx261.6" style="font-size:70%;">to</span> <span id="lstnumberx261.8" style="font-size:70%;">create</span> <span id="lstnumberx261.10" style="font-size:70%;">tools</span> <span id="lstnumberx261.11" style="font-size:70%;">**</span> <span id="lstnumberx261.13" style="font-size:70%;">--</span> <span id="lstnumberx261.15" style="font-size:70%;">YAML</span> <span id="lstnumberx261.17" style="font-size:70%;">schema</span><span id="lstnumberx261.18" style="font-size:70%;">,</span><span id="lstnumberx261.20" style="font-size:70%;">Python</span> <span id="lstnumberx261.22" style="font-size:70%;">function</span> <span id="lstnumberx261.24" style="font-size:70%;">signature</span><span id="lstnumberx261.25" style="font-size:70%;">,</span><span id="lstnumberx261.27" style="font-size:70%;">binding</span><span id="lstnumberx261.28" style="font-size:70%;">,</span><span id="lstnumberx261.30" style="font-size:70%;">agent_state</span> <span id="lstnumberx261.32" style="font-size:70%;">injection</span> </span><span id="lstnumberx262"><span id="lstnumberx262.1" style="font-size:70%;">3.</span><span id="lstnumberx262.3" style="font-size:70%;">**</span> <span id="lstnumberx262.4" style="font-size:70%;">How</span> <span id="lstnumberx262.6" style="font-size:70%;">to</span> <span id="lstnumberx262.8" style="font-size:70%;">create</span> <span id="lstnumberx262.10" style="font-size:70%;">skills</span> <span id="lstnumberx262.11" style="font-size:70%;">**</span> <span id="lstnumberx262.13" style="font-size:70%;">--</span> <span id="lstnumberx262.15" style="font-size:70%;">SKILL</span><span id="lstnumberx262.16" style="font-size:70%;">.</span><span id="lstnumberx262.17" style="font-size:70%;">md</span> <span id="lstnumberx262.19" style="font-size:70%;">format</span><span id="lstnumberx262.20" style="font-size:70%;">,</span><span id="lstnumberx262.22" style="font-size:70%;">frontmatter</span><span id="lstnumberx262.23" style="font-size:70%;">,</span><span id="lstnumberx262.25" style="font-size:70%;">registration</span><span id="lstnumberx262.26" style="font-size:70%;">,</span><span id="lstnumberx262.28" style="font-size:70%;">loading</span> <span id="lstnumberx262.30" style="font-size:70%;">mechanism</span> </span><span id="lstnumberx263"><span id="lstnumberx263.1" style="font-size:70%;">4.</span><span id="lstnumberx263.3" style="font-size:70%;">**</span> <span id="lstnumberx263.4" style="font-size:70%;">How</span> <span id="lstnumberx263.6" style="font-size:70%;">to</span> <span id="lstnumberx263.8" style="font-size:70%;">create</span> <span id="lstnumberx263.10" style="font-size:70%;">sub</span> <span id="lstnumberx263.11" style="font-size:70%;">-</span> <span id="lstnumberx263.12" style="font-size:70%;">agents</span> <span id="lstnumberx263.13" style="font-size:70%;">**</span> <span id="lstnumberx263.15" style="font-size:70%;">--</span> <span id="lstnumberx263.17" style="font-size:70%;">config</span> <span id="lstnumberx263.19" style="font-size:70%;">schema</span><span id="lstnumberx263.20" style="font-size:70%;">,</span><span id="lstnumberx263.22" style="font-size:70%;">registration</span><span id="lstnumberx263.23" style="font-size:70%;">,</span><span id="lstnumberx263.25" style="font-size:70%;">invocation</span><span id="lstnumberx263.26" style="font-size:70%;">,</span><span id="lstnumberx263.28" style="font-size:70%;">context</span> <span id="lstnumberx263.30" style="font-size:70%;">isolation</span> </span><span id="lstnumberx264"><span id="lstnumberx264.1" style="font-size:70%;">5.</span><span id="lstnumberx264.3" style="font-size:70%;">**</span> <span id="lstnumberx264.4" style="font-size:70%;">YAML</span> <span id="lstnumberx264.6" style="font-size:70%;">config</span> <span id="lstnumberx264.8" style="font-size:70%;">schema</span> <span id="lstnumberx264.9" style="font-size:70%;">**</span> <span id="lstnumberx264.11" style="font-size:70%;">--</span> <span id="lstnumberx264.13" style="font-size:70%;">complete</span> <span id="lstnumberx264.15" style="font-size:70%;">field</span> <span id="lstnumberx264.17" style="font-size:70%;">reference</span> <span id="lstnumberx264.19" style="font-size:70%;">with</span> <span id="lstnumberx264.21" style="font-size:70%;">types</span><span id="lstnumberx264.22" style="font-size:70%;">,</span><span id="lstnumberx264.24" style="font-size:70%;">defaults</span><span id="lstnumberx264.25" style="font-size:70%;">,</span><span id="lstnumberx264.27" style="font-size:70%;">required</span> <span id="lstnumberx264.28" style="font-size:70%;">/</span> <span id="lstnumberx264.29" style="font-size:70%;">optional</span> </span><span id="lstnumberx265"><span id="lstnumberx265.1" style="font-size:70%;">6.</span><span id="lstnumberx265.3" style="font-size:70%;">**</span> <span id="lstnumberx265.4" style="font-size:70%;">Key</span> <span id="lstnumberx265.6" style="font-size:70%;">runtime</span> <span id="lstnumberx265.8" style="font-size:70%;">behaviors</span> <span id="lstnumberx265.9" style="font-size:70%;">**</span> <span id="lstnumberx265.11" style="font-size:70%;">--</span> <span id="lstnumberx265.13" style="font-size:70%;">only</span> <span id="lstnumberx265.15" style="font-size:70%;">what</span> <span id="lstnumberx265.16" style="font-size:70%;">'</span> <span id="lstnumberx265.17" style="font-size:70%;">s</span> <span id="lstnumberx265.19" style="font-size:70%;">needed</span> <span id="lstnumberx265.21" style="font-size:70%;">to</span> <span id="lstnumberx265.23" style="font-size:70%;">write</span> <span id="lstnumberx265.25" style="font-size:70%;">correct</span> <span id="lstnumberx265.27" style="font-size:70%;">components</span> </span><span id="lstnumberx267"><span id="lstnumberx267.1" style="font-size:70%;">#</span> <span id="lstnumberx267.3" style="font-size:70%;">Source</span> <span id="lstnumberx267.5" style="font-size:70%;">Code</span> <span id="lstnumberx267.7" style="font-size:70%;">Location</span> <span id="lstnumberx267.9" style="font-size:70%;">(</span><span id="lstnumberx267.10" style="font-size:70%;">READ</span> <span id="lstnumberx267.12" style="font-size:70%;">ONLY</span><span id="lstnumberx267.13" style="font-size:70%;">)</span> </span><span id="lstnumberx269"><span id="lstnumberx269.1" style="font-size:70%;">-</span> <span id="lstnumberx269.3" style="font-size:70%;">NexAU</span> <span id="lstnumberx269.5" style="font-size:70%;">framework</span><span id="lstnumberx269.6" style="font-size:70%;">:</span><span id="lstnumberx269.8" style="font-size:70%;">`{{</span> <span id="lstnumberx269.10" style="font-size:70%;">nexau_path</span> <span id="lstnumberx269.12" style="font-size:70%;">}}`</span> </span><span id="lstnumberx271"><span id="lstnumberx271.1" style="font-size:70%;">#</span> <span id="lstnumberx271.3" style="font-size:70%;">Output</span> <span id="lstnumberx271.5" style="font-size:70%;">Directory</span> <span id="lstnumberx271.7" style="font-size:70%;">(</span><span id="lstnumberx271.8" style="font-size:70%;">WRITE</span><span id="lstnumberx271.9" style="font-size:70%;">)</span> </span><span id="lstnumberx273"><span id="lstnumberx273.1" style="font-size:70%;">-</span> <span id="lstnumberx273.3" style="font-size:70%;">Skill</span> <span id="lstnumberx273.5" style="font-size:70%;">file</span><span id="lstnumberx273.6" style="font-size:70%;">:</span><span id="lstnumberx273.8" style="font-size:70%;">`{{</span> <span id="lstnumberx273.10" style="font-size:70%;">output_skill_dir</span> <span id="lstnumberx273.12" style="font-size:70%;">}}/</span> <span id="lstnumberx273.13" style="font-size:70%;">nexau</span> <span id="lstnumberx273.14" style="font-size:70%;">-</span> <span id="lstnumberx273.15" style="font-size:70%;">framework</span> <span id="lstnumberx273.16" style="font-size:70%;">-</span> <span id="lstnumberx273.17" style="font-size:70%;">internals</span> <span id="lstnumberx273.18" style="font-size:70%;">/</span> <span id="lstnumberx273.19" style="font-size:70%;">SKILL</span><span id="lstnumberx273.20" style="font-size:70%;">.</span><span id="lstnumberx273.21" style="font-size:70%;">md</span> <span id="lstnumberx273.22" style="font-size:70%;">`</span> </span><span id="lstnumberx275"><span id="lstnumberx275.1" style="font-size:70%;">#</span> <span id="lstnumberx275.3" style="font-size:70%;">[!]</span> <span id="lstnumberx275.5" style="font-size:70%;">MANDATORY</span> <span id="lstnumberx275.7" style="font-size:70%;">WORKFLOW</span><span id="lstnumberx275.8" style="font-size:70%;">:</span><span id="lstnumberx275.10" style="font-size:70%;">Explore</span> <span id="lstnumberx275.11" style="font-size:70%;">-</span> <span id="lstnumberx275.12" style="font-size:70%;">Write</span> <span id="lstnumberx275.13" style="font-size:70%;">-</span> <span id="lstnumberx275.14" style="font-size:70%;">Refine</span> <span id="lstnumberx275.16" style="font-size:70%;">Cycles</span> </span><span id="lstnumberx279"><span id="lstnumberx279.1" style="font-size:70%;">##</span> <span id="lstnumberx279.3" style="font-size:70%;">Phase</span> <span id="lstnumberx279.5" style="font-size:70%;">1:</span><span id="lstnumberx279.7" style="font-size:70%;">Scan</span> <span id="lstnumberx279.9" style="font-size:70%;">&amp;</span> <span id="lstnumberx279.11" style="font-size:70%;">Scaffold</span> <span id="lstnumberx279.13" style="font-size:70%;">(</span><span id="lstnumberx279.14" style="font-size:70%;">iterations</span> <span id="lstnumberx279.16" style="font-size:70%;">1-15)</span> </span><span id="lstnumberx280"><span id="lstnumberx280.1" style="font-size:70%;">1.</span><span id="lstnumberx280.3" style="font-size:70%;">`</span> <span id="lstnumberx280.4" style="font-size:70%;">list_directory</span> <span id="lstnumberx280.5" style="font-size:70%;">`</span> <span id="lstnumberx280.7" style="font-size:70%;">and</span> <span id="lstnumberx280.9" style="font-size:70%;">`</span> <span id="lstnumberx280.10" style="font-size:70%;">glob</span> <span id="lstnumberx280.11" style="font-size:70%;">`</span> <span id="lstnumberx280.13" style="font-size:70%;">to</span> <span id="lstnumberx280.15" style="font-size:70%;">map</span> <span id="lstnumberx280.17" style="font-size:70%;">the</span> <span id="lstnumberx280.19" style="font-size:70%;">codebase</span> <span id="lstnumberx280.21" style="font-size:70%;">structure</span> </span><span id="lstnumberx281"><span id="lstnumberx281.1" style="font-size:70%;">2.</span><span id="lstnumberx281.3" style="font-size:70%;">Read</span> <span id="lstnumberx281.5" style="font-size:70%;">key</span> <span id="lstnumberx281.7" style="font-size:70%;">files</span><span id="lstnumberx281.8" style="font-size:70%;">:</span><span id="lstnumberx281.10" style="font-size:70%;">config</span> <span id="lstnumberx281.12" style="font-size:70%;">dataclasses</span><span id="lstnumberx281.13" style="font-size:70%;">,</span><span id="lstnumberx281.15" style="font-size:70%;">hooks</span><span id="lstnumberx281.16" style="font-size:70%;">.</span><span id="lstnumberx281.17" style="font-size:70%;">py</span> <span id="lstnumberx281.19" style="font-size:70%;">base</span> <span id="lstnumberx281.21" style="font-size:70%;">class</span><span id="lstnumberx281.22" style="font-size:70%;">,</span><span id="lstnumberx281.24" style="font-size:70%;">existing</span> <span id="lstnumberx281.26" style="font-size:70%;">middleware</span> <span id="lstnumberx281.27" style="font-size:70%;">/</span> <span id="lstnumberx281.28" style="font-size:70%;">tool</span> <span id="lstnumberx281.30" style="font-size:70%;">implementations</span> </span><span id="lstnumberx282"><span id="lstnumberx282.1" style="font-size:70%;">3.</span><span id="lstnumberx282.3" style="font-size:70%;">**</span> <span id="lstnumberx282.4" style="font-size:70%;">WRITE</span> <span id="lstnumberx282.6" style="font-size:70%;">the</span> <span id="lstnumberx282.8" style="font-size:70%;">initial</span> <span id="lstnumberx282.10" style="font-size:70%;">SKILL</span><span id="lstnumberx282.11" style="font-size:70%;">.</span><span id="lstnumberx282.12" style="font-size:70%;">md</span> <span id="lstnumberx282.13" style="font-size:70%;">**</span> <span id="lstnumberx282.15" style="font-size:70%;">with</span> <span id="lstnumberx282.17" style="font-size:70%;">whatever</span> <span id="lstnumberx282.19" style="font-size:70%;">you</span> <span id="lstnumberx282.21" style="font-size:70%;">have</span> <span id="lstnumberx282.23" style="font-size:70%;">--</span> <span id="lstnumberx282.25" style="font-size:70%;">even</span> <span id="lstnumberx282.27" style="font-size:70%;">if</span> <span id="lstnumberx282.29" style="font-size:70%;">incomplete</span><span id="lstnumberx282.30" style="font-size:70%;">,</span><span id="lstnumberx282.32" style="font-size:70%;">use</span> <span id="lstnumberx282.34" style="font-size:70%;">"[</span><span id="lstnumberx282.35" style="font-size:70%;">TODO</span><span id="lstnumberx282.36" style="font-size:70%;">]"</span> <span id="lstnumberx282.38" style="font-size:70%;">placeholders</span> </span><span id="lstnumberx284"><span id="lstnumberx284.1" style="font-size:70%;">##</span> <span id="lstnumberx284.3" style="font-size:70%;">Phase</span> <span id="lstnumberx284.5" style="font-size:70%;">2:</span><span id="lstnumberx284.7" style="font-size:70%;">Practical</span> <span id="lstnumberx284.9" style="font-size:70%;">Patterns</span> <span id="lstnumberx284.11" style="font-size:70%;">(</span><span id="lstnumberx284.12" style="font-size:70%;">iterations</span> <span id="lstnumberx284.14" style="font-size:70%;">16-60)</span> </span><span id="lstnumberx285"><span id="lstnumberx285.1" style="font-size:70%;">4.</span><span id="lstnumberx285.3" style="font-size:70%;">For</span> <span id="lstnumberx285.5" style="font-size:70%;">each</span> <span id="lstnumberx285.7" style="font-size:70%;">section</span> <span id="lstnumberx285.9" style="font-size:70%;">below</span><span id="lstnumberx285.10" style="font-size:70%;">,</span><span id="lstnumberx285.12" style="font-size:70%;">find</span> <span id="lstnumberx285.14" style="font-size:70%;">**</span> <span id="lstnumberx285.15" style="font-size:70%;">real</span> <span id="lstnumberx285.17" style="font-size:70%;">code</span> <span id="lstnumberx285.19" style="font-size:70%;">examples</span> <span id="lstnumberx285.20" style="font-size:70%;">**</span> <span id="lstnumberx285.22" style="font-size:70%;">from</span> <span id="lstnumberx285.24" style="font-size:70%;">the</span> <span id="lstnumberx285.26" style="font-size:70%;">source</span> </span><span id="lstnumberx286"><span id="lstnumberx286.1" style="font-size:70%;">5.</span><span id="lstnumberx286.3" style="font-size:70%;">**</span> <span id="lstnumberx286.4" style="font-size:70%;">After</span> <span id="lstnumberx286.6" style="font-size:70%;">each</span> <span id="lstnumberx286.8" style="font-size:70%;">section</span><span id="lstnumberx286.9" style="font-size:70%;">,</span><span id="lstnumberx286.11" style="font-size:70%;">immediately</span> <span id="lstnumberx286.13" style="font-size:70%;">`</span> <span id="lstnumberx286.14" style="font-size:70%;">write_file</span> <span id="lstnumberx286.15" style="font-size:70%;">`</span> <span id="lstnumberx286.17" style="font-size:70%;">to</span> <span id="lstnumberx286.19" style="font-size:70%;">UPDATE</span> <span id="lstnumberx286.21" style="font-size:70%;">SKILL</span><span id="lstnumberx286.22" style="font-size:70%;">.</span><span id="lstnumberx286.23" style="font-size:70%;">md</span> <span id="lstnumberx286.24" style="font-size:70%;">**</span> </span><span id="lstnumberx287"><span id="lstnumberx287.1" style="font-size:70%;">6.</span><span id="lstnumberx287.3" style="font-size:70%;">Priority</span> <span id="lstnumberx287.5" style="font-size:70%;">order</span><span id="lstnumberx287.6" style="font-size:70%;">:</span><span id="lstnumberx287.8" style="font-size:70%;">section</span> <span id="lstnumberx287.10" style="font-size:70%;">1</span> <span id="lstnumberx287.12" style="font-size:70%;">Config</span> <span id="lstnumberx287.14" style="font-size:70%;">-&gt;</span> <span id="lstnumberx287.16" style="font-size:70%;">section</span> <span id="lstnumberx287.18" style="font-size:70%;">2</span> <span id="lstnumberx287.20" style="font-size:70%;">Middleware</span> <span id="lstnumberx287.22" style="font-size:70%;">-&gt;</span> <span id="lstnumberx287.24" style="font-size:70%;">section</span> <span id="lstnumberx287.26" style="font-size:70%;">3</span> <span id="lstnumberx287.28" style="font-size:70%;">Tools</span> <span id="lstnumberx287.30" style="font-size:70%;">-&gt;</span> <span id="lstnumberx287.32" style="font-size:70%;">section</span> <span id="lstnumberx287.34" style="font-size:70%;">4</span> <span id="lstnumberx287.36" style="font-size:70%;">Skills</span> <span id="lstnumberx287.38" style="font-size:70%;">-&gt;</span> <span id="lstnumberx287.40" style="font-size:70%;">section</span> <span id="lstnumberx287.42" style="font-size:70%;">5</span> <span id="lstnumberx287.44" style="font-size:70%;">Sub</span> <span id="lstnumberx287.45" style="font-size:70%;">-</span> <span id="lstnumberx287.46" style="font-size:70%;">Agents</span> <span id="lstnumberx287.48" style="font-size:70%;">-&gt;</span> <span id="lstnumberx287.50" style="font-size:70%;">section</span> <span id="lstnumberx287.52" style="font-size:70%;">6</span> <span id="lstnumberx287.54" style="font-size:70%;">Runtime</span> </span><span id="lstnumberx289"><span id="lstnumberx289.1" style="font-size:70%;">##</span> <span id="lstnumberx289.3" style="font-size:70%;">Phase</span> <span id="lstnumberx289.5" style="font-size:70%;">3:</span><span id="lstnumberx289.7" style="font-size:70%;">Polish</span> <span id="lstnumberx289.9" style="font-size:70%;">&amp;</span> <span id="lstnumberx289.11" style="font-size:70%;">Complete</span> <span id="lstnumberx289.13" style="font-size:70%;">(</span><span id="lstnumberx289.14" style="font-size:70%;">iterations</span> <span id="lstnumberx289.16" style="font-size:70%;">61-80)</span> </span><span id="lstnumberx290"><span id="lstnumberx290.1" style="font-size:70%;">7.</span><span id="lstnumberx290.3" style="font-size:70%;">Fill</span> <span id="lstnumberx290.5" style="font-size:70%;">remaining</span> <span id="lstnumberx290.7" style="font-size:70%;">"[</span><span id="lstnumberx290.8" style="font-size:70%;">TODO</span><span id="lstnumberx290.9" style="font-size:70%;">]"</span> <span id="lstnumberx290.11" style="font-size:70%;">sections</span><span id="lstnumberx290.12" style="font-size:70%;">,</span><span id="lstnumberx290.14" style="font-size:70%;">add</span> <span id="lstnumberx290.16" style="font-size:70%;">copy</span> <span id="lstnumberx290.17" style="font-size:70%;">-</span> <span id="lstnumberx290.18" style="font-size:70%;">paste</span> <span id="lstnumberx290.20" style="font-size:70%;">templates</span> </span><span id="lstnumberx291"><span id="lstnumberx291.1" style="font-size:70%;">8.</span><span id="lstnumberx291.3" style="font-size:70%;">Call</span> <span id="lstnumberx291.5" style="font-size:70%;">`</span> <span id="lstnumberx291.6" style="font-size:70%;">complete_task</span> <span id="lstnumberx291.7" style="font-size:70%;">`</span> </span><span id="lstnumberx293"><span id="lstnumberx293.1" style="font-size:70%;">**</span> <span id="lstnumberx293.2" style="font-size:70%;">HARD</span> <span id="lstnumberx293.4" style="font-size:70%;">RULES</span><span id="lstnumberx293.5" style="font-size:70%;">:**</span> </span><span id="lstnumberx294"><span id="lstnumberx294.1" style="font-size:70%;">-</span> <span id="lstnumberx294.3" style="font-size:70%;">You</span> <span id="lstnumberx294.5" style="font-size:70%;">MUST</span> <span id="lstnumberx294.7" style="font-size:70%;">call</span> <span id="lstnumberx294.9" style="font-size:70%;">`</span> <span id="lstnumberx294.10" style="font-size:70%;">write_file</span> <span id="lstnumberx294.11" style="font-size:70%;">`</span> <span id="lstnumberx294.13" style="font-size:70%;">for</span> <span id="lstnumberx294.15" style="font-size:70%;">SKILL</span><span id="lstnumberx294.16" style="font-size:70%;">.</span><span id="lstnumberx294.17" style="font-size:70%;">md</span> <span id="lstnumberx294.19" style="font-size:70%;">**</span> <span id="lstnumberx294.20" style="font-size:70%;">before</span> <span id="lstnumberx294.22" style="font-size:70%;">iteration</span> <span id="lstnumberx294.24" style="font-size:70%;">20**.</span><span id="lstnumberx294.26" style="font-size:70%;">No</span> <span id="lstnumberx294.28" style="font-size:70%;">exceptions</span><span id="lstnumberx294.29" style="font-size:70%;">.</span></span> <span id="lstnumberx295"><span id="lstnumberx295.1" style="font-size:70%;">-</span> <span id="lstnumberx295.3" style="font-size:70%;">You</span> <span id="lstnumberx295.5" style="font-size:70%;">MUST</span> <span id="lstnumberx295.7" style="font-size:70%;">call</span> <span id="lstnumberx295.9" style="font-size:70%;">`</span> <span id="lstnumberx295.10" style="font-size:70%;">write_file</span> <span id="lstnumberx295.11" style="font-size:70%;">`</span> <span id="lstnumberx295.13" style="font-size:70%;">to</span> <span id="lstnumberx295.15" style="font-size:70%;">update</span> <span id="lstnumberx295.17" style="font-size:70%;">SKILL</span><span id="lstnumberx295.18" style="font-size:70%;">.</span><span id="lstnumberx295.19" style="font-size:70%;">md</span> <span id="lstnumberx295.21" style="font-size:70%;">**</span> <span id="lstnumberx295.22" style="font-size:70%;">at</span> <span id="lstnumberx295.24" style="font-size:70%;">least</span> <span id="lstnumberx295.26" style="font-size:70%;">every</span> <span id="lstnumberx295.28" style="font-size:70%;">15</span> <span id="lstnumberx295.30" style="font-size:70%;">iterations</span> <span id="lstnumberx295.31" style="font-size:70%;">**</span> <span id="lstnumberx295.33" style="font-size:70%;">after</span> <span id="lstnumberx295.35" style="font-size:70%;">that</span><span id="lstnumberx295.36" style="font-size:70%;">.</span></span> <span id="lstnumberx296"><span id="lstnumberx296.1" style="font-size:70%;">-</span> <span id="lstnumberx296.3" style="font-size:70%;">If</span> <span id="lstnumberx296.5" style="font-size:70%;">you</span> <span id="lstnumberx296.7" style="font-size:70%;">reach</span> <span id="lstnumberx296.9" style="font-size:70%;">iteration</span> <span id="lstnumberx296.11" style="font-size:70%;">100</span> <span id="lstnumberx296.13" style="font-size:70%;">without</span> <span id="lstnumberx296.15" style="font-size:70%;">having</span> <span id="lstnumberx296.17" style="font-size:70%;">called</span> <span id="lstnumberx296.19" style="font-size:70%;">`</span> <span id="lstnumberx296.20" style="font-size:70%;">write_file</span> <span id="lstnumberx296.21" style="font-size:70%;">`,</span><span id="lstnumberx296.23" style="font-size:70%;">you</span> <span id="lstnumberx296.25" style="font-size:70%;">have</span> <span id="lstnumberx296.27" style="font-size:70%;">FAILED</span><span id="lstnumberx296.28" style="font-size:70%;">.</span></span> <span id="lstnumberx297"><span id="lstnumberx297.1" style="font-size:70%;">-</span> <span id="lstnumberx297.3" style="font-size:70%;">Use</span> <span id="lstnumberx297.5" style="font-size:70%;">`</span> <span id="lstnumberx297.6" style="font-size:70%;">read_file</span> <span id="lstnumberx297.7" style="font-size:70%;">`</span> <span id="lstnumberx297.9" style="font-size:70%;">with</span> <span id="lstnumberx297.11" style="font-size:70%;">offset</span> <span id="lstnumberx297.12" style="font-size:70%;">/</span> <span id="lstnumberx297.13" style="font-size:70%;">limit</span> <span id="lstnumberx297.15" style="font-size:70%;">for</span> <span id="lstnumberx297.17" style="font-size:70%;">large</span> <span id="lstnumberx297.19" style="font-size:70%;">files</span><span id="lstnumberx297.20" style="font-size:70%;">.</span></span> <span id="lstnumberx298"><span id="lstnumberx298.1" style="font-size:70%;">-</span> <span id="lstnumberx298.3" style="font-size:70%;">Cite</span> <span id="lstnumberx298.5" style="font-size:70%;">`</span> <span id="lstnumberx298.6" style="font-size:70%;">file</span><span id="lstnumberx298.7" style="font-size:70%;">:</span><span id="lstnumberx298.8" style="font-size:70%;">line_range</span> <span id="lstnumberx298.9" style="font-size:70%;">`</span> <span id="lstnumberx298.11" style="font-size:70%;">for</span> <span id="lstnumberx298.13" style="font-size:70%;">every</span> <span id="lstnumberx298.15" style="font-size:70%;">claim</span><span id="lstnumberx298.16" style="font-size:70%;">.</span><span id="lstnumberx298.18" style="font-size:70%;">Include</span> <span id="lstnumberx298.20" style="font-size:70%;">actual</span> <span id="lstnumberx298.22" style="font-size:70%;">code</span> <span id="lstnumberx298.24" style="font-size:70%;">snippets</span><span id="lstnumberx298.25" style="font-size:70%;">.</span></span> <span id="lstnumberx300"><span id="lstnumberx300.1" style="font-size:70%;">#</span> <span id="lstnumberx300.3" style="font-size:70%;">Exploration</span> <span id="lstnumberx300.5" style="font-size:70%;">Guide</span> <span id="lstnumberx300.7" style="font-size:70%;">--</span> <span id="lstnumberx300.9" style="font-size:70%;">What</span> <span id="lstnumberx300.11" style="font-size:70%;">to</span> <span id="lstnumberx300.13" style="font-size:70%;">Extract</span> </span><span id="lstnumberx302"><span id="lstnumberx302.1" style="font-size:70%;">For</span> <span id="lstnumberx302.3" style="font-size:70%;">each</span> <span id="lstnumberx302.5" style="font-size:70%;">section</span><span id="lstnumberx302.6" style="font-size:70%;">,</span><span id="lstnumberx302.8" style="font-size:70%;">find</span> <span id="lstnumberx302.10" style="font-size:70%;">the</span> <span id="lstnumberx302.12" style="font-size:70%;">**</span> <span id="lstnumberx302.13" style="font-size:70%;">real</span> <span id="lstnumberx302.15" style="font-size:70%;">implementation</span> <span id="lstnumberx302.16" style="font-size:70%;">**</span> <span id="lstnumberx302.18" style="font-size:70%;">in</span> <span id="lstnumberx302.20" style="font-size:70%;">source</span> <span id="lstnumberx302.22" style="font-size:70%;">code</span> <span id="lstnumberx302.24" style="font-size:70%;">and</span> <span id="lstnumberx302.26" style="font-size:70%;">extract</span> <span id="lstnumberx302.28" style="font-size:70%;">patterns</span> <span id="lstnumberx302.30" style="font-size:70%;">the</span> <span id="lstnumberx302.32" style="font-size:70%;">Evolution</span> <span id="lstnumberx302.34" style="font-size:70%;">Agent</span> <span id="lstnumberx302.36" style="font-size:70%;">can</span> <span id="lstnumberx302.38" style="font-size:70%;">copy</span><span id="lstnumberx302.39" style="font-size:70%;">.</span></span> <span id="lstnumberx304"><span id="lstnumberx304.1" style="font-size:70%;">##</span> <span id="lstnumberx304.3" style="font-size:70%;">section</span> <span id="lstnumberx304.5" style="font-size:70%;">1.</span><span id="lstnumberx304.7" style="font-size:70%;">YAML</span> <span id="lstnumberx304.9" style="font-size:70%;">Config</span> <span id="lstnumberx304.11" style="font-size:70%;">Schema</span> <span id="lstnumberx304.13" style="font-size:70%;">(</span><span id="lstnumberx304.14" style="font-size:70%;">HIGHEST</span> <span id="lstnumberx304.16" style="font-size:70%;">PRIORITY</span><span id="lstnumberx304.17" style="font-size:70%;">)</span> </span><span id="lstnumberx306"><span id="lstnumberx306.1" style="font-size:70%;">Find</span> <span id="lstnumberx306.3" style="font-size:70%;">the</span> <span id="lstnumberx306.5" style="font-size:70%;">config</span> <span id="lstnumberx306.7" style="font-size:70%;">dataclass</span> <span id="lstnumberx306.9" style="font-size:70%;">definitions</span> <span id="lstnumberx306.11" style="font-size:70%;">in</span> <span id="lstnumberx306.13" style="font-size:70%;">`</span> <span id="lstnumberx306.14" style="font-size:70%;">nexau</span> <span id="lstnumberx306.15" style="font-size:70%;">/</span> <span id="lstnumberx306.16" style="font-size:70%;">archs</span> <span id="lstnumberx306.17" style="font-size:70%;">/</span> <span id="lstnumberx306.18" style="font-size:70%;">main_sub</span> <span id="lstnumberx306.19" style="font-size:70%;">/</span> <span id="lstnumberx306.20" style="font-size:70%;">config</span> <span id="lstnumberx306.21" style="font-size:70%;">/`.</span><span id="lstnumberx306.23" style="font-size:70%;">Document</span><span id="lstnumberx306.24" style="font-size:70%;">:</span></span> <span id="lstnumberx308"><span id="lstnumberx308.1" style="font-size:70%;">-</span> <span id="lstnumberx308.3" style="font-size:70%;">**</span> <span id="lstnumberx308.4" style="font-size:70%;">All</span> <span id="lstnumberx308.6" style="font-size:70%;">top</span> <span id="lstnumberx308.7" style="font-size:70%;">-</span> <span id="lstnumberx308.8" style="font-size:70%;">level</span> <span id="lstnumberx308.10" style="font-size:70%;">fields</span> <span id="lstnumberx308.11" style="font-size:70%;">**</span> <span id="lstnumberx308.13" style="font-size:70%;">in</span> <span id="lstnumberx308.15" style="font-size:70%;">`</span> <span id="lstnumberx308.16" style="font-size:70%;">agent</span><span id="lstnumberx308.17" style="font-size:70%;">.</span><span id="lstnumberx308.18" style="font-size:70%;">yaml</span> <span id="lstnumberx308.19" style="font-size:70%;">`:</span><span id="lstnumberx308.21" style="font-size:70%;">type</span><span id="lstnumberx308.22" style="font-size:70%;">,</span><span id="lstnumberx308.24" style="font-size:70%;">name</span><span id="lstnumberx308.25" style="font-size:70%;">,</span><span id="lstnumberx308.27" style="font-size:70%;">system_prompt</span><span id="lstnumberx308.28" style="font-size:70%;">,</span><span id="lstnumberx308.30" style="font-size:70%;">system_prompt_type</span><span id="lstnumberx308.31" style="font-size:70%;">,</span><span id="lstnumberx308.33" style="font-size:70%;">tool_call_mode</span><span id="lstnumberx308.34" style="font-size:70%;">,</span><span id="lstnumberx308.36" style="font-size:70%;">llm_config</span><span id="lstnumberx308.37" style="font-size:70%;">,</span><span id="lstnumberx308.39" style="font-size:70%;">max_iterations</span><span id="lstnumberx308.40" style="font-size:70%;">,</span><span id="lstnumberx308.42" style="font-size:70%;">max_context_tokens</span><span id="lstnumberx308.43" style="font-size:70%;">,</span><span id="lstnumberx308.45" style="font-size:70%;">sandbox_config</span><span id="lstnumberx308.46" style="font-size:70%;">,</span><span id="lstnumberx308.48" style="font-size:70%;">tools</span><span id="lstnumberx308.49" style="font-size:70%;">,</span><span id="lstnumberx308.51" style="font-size:70%;">middlewares</span><span id="lstnumberx308.52" style="font-size:70%;">,</span><span id="lstnumberx308.54" style="font-size:70%;">skills</span><span id="lstnumberx308.55" style="font-size:70%;">,</span><span id="lstnumberx308.57" style="font-size:70%;">sub_agents</span><span id="lstnumberx308.58" style="font-size:70%;">,</span><span id="lstnumberx308.60" style="font-size:70%;">stop_tools</span><span id="lstnumberx308.61" style="font-size:70%;">,</span><span id="lstnumberx308.63" style="font-size:70%;">tracers</span> <span id="lstnumberx308.65" style="font-size:70%;">--</span> <span id="lstnumberx308.67" style="font-size:70%;">with</span> <span id="lstnumberx308.69" style="font-size:70%;">types</span><span id="lstnumberx308.70" style="font-size:70%;">,</span><span id="lstnumberx308.72" style="font-size:70%;">defaults</span><span id="lstnumberx308.73" style="font-size:70%;">,</span><span id="lstnumberx308.75" style="font-size:70%;">required</span> <span id="lstnumberx308.76" style="font-size:70%;">/</span> <span id="lstnumberx308.77" style="font-size:70%;">optional</span> </span><span id="lstnumberx309"><span id="lstnumberx309.1" style="font-size:70%;">-</span> <span id="lstnumberx309.3" style="font-size:70%;">**`</span> <span id="lstnumberx309.4" style="font-size:70%;">llm_config</span> <span id="lstnumberx309.5" style="font-size:70%;">`</span> <span id="lstnumberx309.7" style="font-size:70%;">sub</span> <span id="lstnumberx309.8" style="font-size:70%;">-</span> <span id="lstnumberx309.9" style="font-size:70%;">fields</span> <span id="lstnumberx309.10" style="font-size:70%;">**:</span><span id="lstnumberx309.12" style="font-size:70%;">model</span><span id="lstnumberx309.13" style="font-size:70%;">,</span><span id="lstnumberx309.15" style="font-size:70%;">base_url</span><span id="lstnumberx309.16" style="font-size:70%;">,</span><span id="lstnumberx309.18" style="font-size:70%;">api_key</span><span id="lstnumberx309.19" style="font-size:70%;">,</span><span id="lstnumberx309.21" style="font-size:70%;">max_tokens</span><span id="lstnumberx309.22" style="font-size:70%;">,</span><span id="lstnumberx309.24" style="font-size:70%;">temperature</span><span id="lstnumberx309.25" style="font-size:70%;">,</span><span id="lstnumberx309.27" style="font-size:70%;">stream</span><span id="lstnumberx309.28" style="font-size:70%;">,</span><span id="lstnumberx309.30" style="font-size:70%;">api_type</span><span id="lstnumberx309.31" style="font-size:70%;">,</span><span id="lstnumberx309.33" style="font-size:70%;">reasoning</span><span id="lstnumberx309.34" style="font-size:70%;">,</span><span id="lstnumberx309.36" style="font-size:70%;">etc</span><span id="lstnumberx309.37" style="font-size:70%;">.</span></span> <span id="lstnumberx310"><span id="lstnumberx310.1" style="font-size:70%;">-</span> <span id="lstnumberx310.3" style="font-size:70%;">**`</span> <span id="lstnumberx310.4" style="font-size:70%;">tools</span><span id="lstnumberx310.5" style="font-size:70%;">:`</span> <span id="lstnumberx310.7" style="font-size:70%;">entry</span> <span id="lstnumberx310.9" style="font-size:70%;">format</span> <span id="lstnumberx310.10" style="font-size:70%;">**:</span><span id="lstnumberx310.12" style="font-size:70%;">name</span><span id="lstnumberx310.13" style="font-size:70%;">,</span><span id="lstnumberx310.15" style="font-size:70%;">yaml_path</span><span id="lstnumberx310.16" style="font-size:70%;">,</span><span id="lstnumberx310.18" style="font-size:70%;">binding</span> <span id="lstnumberx310.20" style="font-size:70%;">--</span> <span id="lstnumberx310.22" style="font-size:70%;">how</span> <span id="lstnumberx310.24" style="font-size:70%;">each</span> <span id="lstnumberx310.26" style="font-size:70%;">is</span> <span id="lstnumberx310.28" style="font-size:70%;">resolved</span> </span><span id="lstnumberx311"><span id="lstnumberx311.1" style="font-size:70%;">-</span> <span id="lstnumberx311.3" style="font-size:70%;">**`</span> <span id="lstnumberx311.4" style="font-size:70%;">middlewares</span><span id="lstnumberx311.5" style="font-size:70%;">:`</span> <span id="lstnumberx311.7" style="font-size:70%;">entry</span> <span id="lstnumberx311.9" style="font-size:70%;">format</span> <span id="lstnumberx311.10" style="font-size:70%;">**:</span><span id="lstnumberx311.12" style="font-size:70%;">import</span><span id="lstnumberx311.13" style="font-size:70%;">,</span><span id="lstnumberx311.15" style="font-size:70%;">params</span> <span id="lstnumberx311.17" style="font-size:70%;">--</span> <span id="lstnumberx311.19" style="font-size:70%;">how</span> <span id="lstnumberx311.21" style="font-size:70%;">the</span> <span id="lstnumberx311.23" style="font-size:70%;">import</span> <span id="lstnumberx311.25" style="font-size:70%;">string</span> <span id="lstnumberx311.27" style="font-size:70%;">is</span> <span id="lstnumberx311.29" style="font-size:70%;">resolved</span><span id="lstnumberx311.30" style="font-size:70%;">,</span><span id="lstnumberx311.32" style="font-size:70%;">what</span> <span id="lstnumberx311.33" style="font-size:70%;">'</span> <span id="lstnumberx311.34" style="font-size:70%;">s</span> <span id="lstnumberx311.36" style="font-size:70%;">added</span> <span id="lstnumberx311.38" style="font-size:70%;">to</span> <span id="lstnumberx311.40" style="font-size:70%;">sys</span><span id="lstnumberx311.41" style="font-size:70%;">.</span><span id="lstnumberx311.42" style="font-size:70%;">path</span> </span><span id="lstnumberx312"><span id="lstnumberx312.1" style="font-size:70%;">-</span> <span id="lstnumberx312.3" style="font-size:70%;">**`</span> <span id="lstnumberx312.4" style="font-size:70%;">skills</span><span id="lstnumberx312.5" style="font-size:70%;">:`</span> <span id="lstnumberx312.7" style="font-size:70%;">entry</span> <span id="lstnumberx312.9" style="font-size:70%;">format</span> <span id="lstnumberx312.10" style="font-size:70%;">**:</span><span id="lstnumberx312.12" style="font-size:70%;">path</span> <span id="lstnumberx312.14" style="font-size:70%;">format</span><span id="lstnumberx312.15" style="font-size:70%;">,</span><span id="lstnumberx312.17" style="font-size:70%;">how</span> <span id="lstnumberx312.19" style="font-size:70%;">skills</span> <span id="lstnumberx312.21" style="font-size:70%;">are</span> <span id="lstnumberx312.23" style="font-size:70%;">discovered</span> <span id="lstnumberx312.25" style="font-size:70%;">and</span> <span id="lstnumberx312.27" style="font-size:70%;">loaded</span> </span><span id="lstnumberx313"><span id="lstnumberx313.1" style="font-size:70%;">-</span> <span id="lstnumberx313.3" style="font-size:70%;">**`</span> <span id="lstnumberx313.4" style="font-size:70%;">sub_agents</span><span id="lstnumberx313.5" style="font-size:70%;">:`</span> <span id="lstnumberx313.7" style="font-size:70%;">entry</span> <span id="lstnumberx313.9" style="font-size:70%;">format</span> <span id="lstnumberx313.10" style="font-size:70%;">**:</span><span id="lstnumberx313.12" style="font-size:70%;">name</span><span id="lstnumberx313.13" style="font-size:70%;">,</span><span id="lstnumberx313.15" style="font-size:70%;">config_path</span><span id="lstnumberx313.16" style="font-size:70%;">,</span><span id="lstnumberx313.18" style="font-size:70%;">description</span> <span id="lstnumberx313.20" style="font-size:70%;">--</span> <span id="lstnumberx313.22" style="font-size:70%;">how</span> <span id="lstnumberx313.24" style="font-size:70%;">config_path</span> <span id="lstnumberx313.26" style="font-size:70%;">is</span> <span id="lstnumberx313.28" style="font-size:70%;">resolved</span> </span><span id="lstnumberx314"><span id="lstnumberx314.1" style="font-size:70%;">-</span> <span id="lstnumberx314.3" style="font-size:70%;">**`</span> <span id="lstnumberx314.4" style="font-size:70%;">$</span> <span id="lstnumberx314.5" style="font-size:70%;">{</span> <span id="lstnumberx314.6" style="font-size:70%;">env</span><span id="lstnumberx314.7" style="font-size:70%;">.</span><span id="lstnumberx314.8" style="font-size:70%;">XXX</span> <span id="lstnumberx314.9" style="font-size:70%;">}`</span> <span id="lstnumberx314.11" style="font-size:70%;">resolution</span> <span id="lstnumberx314.12" style="font-size:70%;">**:</span><span id="lstnumberx314.14" style="font-size:70%;">behavior</span> <span id="lstnumberx314.16" style="font-size:70%;">when</span> <span id="lstnumberx314.18" style="font-size:70%;">env</span> <span id="lstnumberx314.20" style="font-size:70%;">var</span> <span id="lstnumberx314.22" style="font-size:70%;">is</span> <span id="lstnumberx314.24" style="font-size:70%;">not</span> <span id="lstnumberx314.26" style="font-size:70%;">set</span> </span><span id="lstnumberx315"><span id="lstnumberx315.1" style="font-size:70%;">-</span> <span id="lstnumberx315.3" style="font-size:70%;">**</span> <span id="lstnumberx315.4" style="font-size:70%;">Relative</span> <span id="lstnumberx315.6" style="font-size:70%;">path</span> <span id="lstnumberx315.8" style="font-size:70%;">resolution</span> <span id="lstnumberx315.9" style="font-size:70%;">**:</span><span id="lstnumberx315.11" style="font-size:70%;">relative</span> <span id="lstnumberx315.13" style="font-size:70%;">to</span> <span id="lstnumberx315.15" style="font-size:70%;">what</span><span id="lstnumberx315.16" style="font-size:70%;">?</span><span id="lstnumberx315.18" style="font-size:70%;">(</span><span id="lstnumberx315.19" style="font-size:70%;">YAML</span> <span id="lstnumberx315.21" style="font-size:70%;">file</span> <span id="lstnumberx315.23" style="font-size:70%;">directory</span><span id="lstnumberx315.24" style="font-size:70%;">?</span><span id="lstnumberx315.26" style="font-size:70%;">CWD</span><span id="lstnumberx315.27" style="font-size:70%;">?</span><span id="lstnumberx315.29" style="font-size:70%;">work_dir</span><span id="lstnumberx315.30" style="font-size:70%;">?)</span> </span><span id="lstnumberx317"><span id="lstnumberx317.1" style="font-size:70%;">##</span> <span id="lstnumberx317.3" style="font-size:70%;">section</span> <span id="lstnumberx317.5" style="font-size:70%;">2.</span><span id="lstnumberx317.7" style="font-size:70%;">Middleware</span> <span id="lstnumberx317.9" style="font-size:70%;">Creation</span> <span id="lstnumberx317.11" style="font-size:70%;">(</span><span id="lstnumberx317.12" style="font-size:70%;">HIGHEST</span> <span id="lstnumberx317.14" style="font-size:70%;">PRIORITY</span><span id="lstnumberx317.15" style="font-size:70%;">)</span> </span><span id="lstnumberx319"><span id="lstnumberx319.1" style="font-size:70%;">Find</span> <span id="lstnumberx319.3" style="font-size:70%;">the</span> <span id="lstnumberx319.5" style="font-size:70%;">middleware</span> <span id="lstnumberx319.7" style="font-size:70%;">base</span> <span id="lstnumberx319.9" style="font-size:70%;">class</span> <span id="lstnumberx319.11" style="font-size:70%;">and</span> <span id="lstnumberx319.13" style="font-size:70%;">several</span> <span id="lstnumberx319.15" style="font-size:70%;">existing</span> <span id="lstnumberx319.17" style="font-size:70%;">middleware</span> <span id="lstnumberx319.19" style="font-size:70%;">implementations</span><span id="lstnumberx319.20" style="font-size:70%;">.</span><span id="lstnumberx319.22" style="font-size:70%;">Extract</span><span id="lstnumberx319.23" style="font-size:70%;">:</span></span> <span id="lstnumberx321"><span id="lstnumberx321.1" style="font-size:70%;">###</span> <span id="lstnumberx321.3" style="font-size:70%;">2.1</span> <span id="lstnumberx321.5" style="font-size:70%;">Base</span> <span id="lstnumberx321.7" style="font-size:70%;">Class</span> <span id="lstnumberx321.9" style="font-size:70%;">&amp;</span> <span id="lstnumberx321.11" style="font-size:70%;">Hook</span> <span id="lstnumberx321.13" style="font-size:70%;">Methods</span> </span><span id="lstnumberx322"><span id="lstnumberx322.1" style="font-size:70%;">-</span> <span id="lstnumberx322.3" style="font-size:70%;">What</span> <span id="lstnumberx322.5" style="font-size:70%;">class</span> <span id="lstnumberx322.7" style="font-size:70%;">to</span> <span id="lstnumberx322.9" style="font-size:70%;">inherit</span> <span id="lstnumberx322.11" style="font-size:70%;">from</span><span id="lstnumberx322.12" style="font-size:70%;">?</span><span id="lstnumberx322.14" style="font-size:70%;">Find</span> <span id="lstnumberx322.16" style="font-size:70%;">the</span> <span id="lstnumberx322.18" style="font-size:70%;">exact</span> <span id="lstnumberx322.20" style="font-size:70%;">import</span> <span id="lstnumberx322.22" style="font-size:70%;">path</span> <span id="lstnumberx322.24" style="font-size:70%;">and</span> <span id="lstnumberx322.26" style="font-size:70%;">class</span> <span id="lstnumberx322.28" style="font-size:70%;">name</span><span id="lstnumberx322.29" style="font-size:70%;">.</span></span> <span id="lstnumberx323"><span id="lstnumberx323.1" style="font-size:70%;">-</span> <span id="lstnumberx323.3" style="font-size:70%;">**</span> <span id="lstnumberx323.4" style="font-size:70%;">ALL</span> <span id="lstnumberx323.6" style="font-size:70%;">available</span> <span id="lstnumberx323.8" style="font-size:70%;">hook</span> <span id="lstnumberx323.10" style="font-size:70%;">methods</span> <span id="lstnumberx323.11" style="font-size:70%;">**</span> <span id="lstnumberx323.13" style="font-size:70%;">with</span> <span id="lstnumberx323.15" style="font-size:70%;">their</span> <span id="lstnumberx323.17" style="font-size:70%;">EXACT</span> <span id="lstnumberx323.19" style="font-size:70%;">signatures</span> <span id="lstnumberx323.21" style="font-size:70%;">(</span><span id="lstnumberx323.22" style="font-size:70%;">parameter</span> <span id="lstnumberx323.24" style="font-size:70%;">names</span><span id="lstnumberx323.25" style="font-size:70%;">,</span><span id="lstnumberx323.27" style="font-size:70%;">types</span><span id="lstnumberx323.28" style="font-size:70%;">,</span><span id="lstnumberx323.30" style="font-size:70%;">return</span> <span id="lstnumberx323.32" style="font-size:70%;">type</span><span id="lstnumberx323.33" style="font-size:70%;">):</span></span> <span id="lstnumberx324"><span id="lstnumberx324.2" style="font-size:70%;">-</span> <span id="lstnumberx324.4" style="font-size:70%;">`</span> <span id="lstnumberx324.5" style="font-size:70%;">before_model</span> <span id="lstnumberx324.6" style="font-size:70%;">(</span><span id="lstnumberx324.7" style="font-size:70%;">input</span><span id="lstnumberx324.8" style="font-size:70%;">)</span> <span id="lstnumberx324.10" style="font-size:70%;">-&gt;</span> <span id="lstnumberx324.12" style="font-size:70%;">HookResult</span> <span id="lstnumberx324.13" style="font-size:70%;">`</span> </span><span id="lstnumberx325"><span id="lstnumberx325.2" style="font-size:70%;">-</span> <span id="lstnumberx325.4" style="font-size:70%;">`</span> <span id="lstnumberx325.5" style="font-size:70%;">after_model</span> <span id="lstnumberx325.6" style="font-size:70%;">(</span><span id="lstnumberx325.7" style="font-size:70%;">input</span><span id="lstnumberx325.8" style="font-size:70%;">)</span> <span id="lstnumberx325.10" style="font-size:70%;">-&gt;</span> <span id="lstnumberx325.12" style="font-size:70%;">HookResult</span> <span id="lstnumberx325.13" style="font-size:70%;">`</span> </span><span id="lstnumberx326"><span id="lstnumberx326.2" style="font-size:70%;">-</span> <span id="lstnumberx326.4" style="font-size:70%;">`</span> <span id="lstnumberx326.5" style="font-size:70%;">before_tool</span> <span id="lstnumberx326.6" style="font-size:70%;">(</span><span id="lstnumberx326.7" style="font-size:70%;">input</span><span id="lstnumberx326.8" style="font-size:70%;">)</span> <span id="lstnumberx326.10" style="font-size:70%;">-&gt;</span> <span id="lstnumberx326.12" style="font-size:70%;">HookResult</span> <span id="lstnumberx326.13" style="font-size:70%;">`</span> </span><span id="lstnumberx327"><span id="lstnumberx327.2" style="font-size:70%;">-</span> <span id="lstnumberx327.4" style="font-size:70%;">`</span> <span id="lstnumberx327.5" style="font-size:70%;">after_tool</span> <span id="lstnumberx327.6" style="font-size:70%;">(</span><span id="lstnumberx327.7" style="font-size:70%;">input</span><span id="lstnumberx327.8" style="font-size:70%;">)</span> <span id="lstnumberx327.10" style="font-size:70%;">-&gt;</span> <span id="lstnumberx327.12" style="font-size:70%;">HookResult</span> <span id="lstnumberx327.13" style="font-size:70%;">`</span> </span><span id="lstnumberx328"><span id="lstnumberx328.2" style="font-size:70%;">-</span> <span id="lstnumberx328.4" style="font-size:70%;">`</span> <span id="lstnumberx328.5" style="font-size:70%;">wrap_model_call</span> <span id="lstnumberx328.6" style="font-size:70%;">(...)`</span> <span id="lstnumberx328.8" style="font-size:70%;">--</span> <span id="lstnumberx328.10" style="font-size:70%;">how</span> <span id="lstnumberx328.12" style="font-size:70%;">to</span> <span id="lstnumberx328.14" style="font-size:70%;">wrap</span> <span id="lstnumberx328.16" style="font-size:70%;">the</span> <span id="lstnumberx328.18" style="font-size:70%;">LLM</span> <span id="lstnumberx328.20" style="font-size:70%;">call</span> </span><span id="lstnumberx329"><span id="lstnumberx329.2" style="font-size:70%;">-</span> <span id="lstnumberx329.4" style="font-size:70%;">`</span> <span id="lstnumberx329.5" style="font-size:70%;">wrap_tool_call</span> <span id="lstnumberx329.6" style="font-size:70%;">(...)`</span> <span id="lstnumberx329.8" style="font-size:70%;">--</span> <span id="lstnumberx329.10" style="font-size:70%;">how</span> <span id="lstnumberx329.12" style="font-size:70%;">to</span> <span id="lstnumberx329.14" style="font-size:70%;">wrap</span> <span id="lstnumberx329.16" style="font-size:70%;">tool</span> <span id="lstnumberx329.18" style="font-size:70%;">execution</span> </span><span id="lstnumberx330"><span id="lstnumberx330.2" style="font-size:70%;">-</span> <span id="lstnumberx330.4" style="font-size:70%;">Any</span> <span id="lstnumberx330.6" style="font-size:70%;">others</span> <span id="lstnumberx330.8" style="font-size:70%;">(</span><span id="lstnumberx330.9" style="font-size:70%;">before_agent</span><span id="lstnumberx330.10" style="font-size:70%;">,</span><span id="lstnumberx330.12" style="font-size:70%;">after_agent</span><span id="lstnumberx330.13" style="font-size:70%;">,</span><span id="lstnumberx330.15" style="font-size:70%;">etc</span><span id="lstnumberx330.16" style="font-size:70%;">.)</span> </span><span id="lstnumberx331"><span id="lstnumberx331.1" style="font-size:70%;">-</span> <span id="lstnumberx331.3" style="font-size:70%;">**</span> <span id="lstnumberx331.4" style="font-size:70%;">HookResult</span> <span id="lstnumberx331.5" style="font-size:70%;">**:</span><span id="lstnumberx331.7" style="font-size:70%;">What</span> <span id="lstnumberx331.9" style="font-size:70%;">can</span> <span id="lstnumberx331.11" style="font-size:70%;">it</span> <span id="lstnumberx331.13" style="font-size:70%;">modify</span><span id="lstnumberx331.14" style="font-size:70%;">?</span><span id="lstnumberx331.16" style="font-size:70%;">How</span> <span id="lstnumberx331.18" style="font-size:70%;">to</span> <span id="lstnumberx331.20" style="font-size:70%;">inject</span> <span id="lstnumberx331.22" style="font-size:70%;">messages</span><span id="lstnumberx331.23" style="font-size:70%;">?</span><span id="lstnumberx331.25" style="font-size:70%;">How</span> <span id="lstnumberx331.27" style="font-size:70%;">to</span> <span id="lstnumberx331.29" style="font-size:70%;">modify</span> <span id="lstnumberx331.31" style="font-size:70%;">tool</span> <span id="lstnumberx331.33" style="font-size:70%;">output</span><span id="lstnumberx331.34" style="font-size:70%;">?</span><span id="lstnumberx331.36" style="font-size:70%;">Show</span> <span id="lstnumberx331.38" style="font-size:70%;">the</span> <span id="lstnumberx331.40" style="font-size:70%;">class</span> <span id="lstnumberx331.42" style="font-size:70%;">definition</span><span id="lstnumberx331.43" style="font-size:70%;">.</span></span> <span id="lstnumberx332"><span id="lstnumberx332.1" style="font-size:70%;">-</span> <span id="lstnumberx332.3" style="font-size:70%;">**</span> <span id="lstnumberx332.4" style="font-size:70%;">Hook</span> <span id="lstnumberx332.6" style="font-size:70%;">input</span> <span id="lstnumberx332.8" style="font-size:70%;">types</span> <span id="lstnumberx332.9" style="font-size:70%;">**:</span><span id="lstnumberx332.11" style="font-size:70%;">What</span> <span id="lstnumberx332.13" style="font-size:70%;">fields</span> <span id="lstnumberx332.15" style="font-size:70%;">are</span> <span id="lstnumberx332.17" style="font-size:70%;">available</span> <span id="lstnumberx332.19" style="font-size:70%;">in</span> <span id="lstnumberx332.21" style="font-size:70%;">`</span> <span id="lstnumberx332.22" style="font-size:70%;">BeforeModelHookInput</span> <span id="lstnumberx332.23" style="font-size:70%;">`,</span><span id="lstnumberx332.25" style="font-size:70%;">`</span> <span id="lstnumberx332.26" style="font-size:70%;">AfterModelHookInput</span> <span id="lstnumberx332.27" style="font-size:70%;">`,</span><span id="lstnumberx332.29" style="font-size:70%;">`</span> <span id="lstnumberx332.30" style="font-size:70%;">BeforeToolHookInput</span> <span id="lstnumberx332.31" style="font-size:70%;">`,</span><span id="lstnumberx332.33" style="font-size:70%;">`</span> <span id="lstnumberx332.34" style="font-size:70%;">AfterToolHookInput</span> <span id="lstnumberx332.35" style="font-size:70%;">`?</span></span> <span id="lstnumberx334"><span id="lstnumberx334.1" style="font-size:70%;">###</span> <span id="lstnumberx334.3" style="font-size:70%;">2.2</span> <span id="lstnumberx334.5" style="font-size:70%;">How</span> <span id="lstnumberx334.7" style="font-size:70%;">Params</span> <span id="lstnumberx334.9" style="font-size:70%;">Are</span> <span id="lstnumberx334.11" style="font-size:70%;">Passed</span> </span><span id="lstnumberx335"><span id="lstnumberx335.1" style="font-size:70%;">-</span> <span id="lstnumberx335.3" style="font-size:70%;">How</span> <span id="lstnumberx335.5" style="font-size:70%;">does</span> <span id="lstnumberx335.7" style="font-size:70%;">`</span> <span id="lstnumberx335.8" style="font-size:70%;">params</span><span id="lstnumberx335.9" style="font-size:70%;">:`</span> <span id="lstnumberx335.11" style="font-size:70%;">in</span> <span id="lstnumberx335.13" style="font-size:70%;">YAML</span> <span id="lstnumberx335.15" style="font-size:70%;">map</span> <span id="lstnumberx335.17" style="font-size:70%;">to</span> <span id="lstnumberx335.19" style="font-size:70%;">`</span> <span id="lstnumberx335.20" style="font-size:70%;">__init__</span> <span id="lstnumberx335.21" style="font-size:70%;">`</span> <span id="lstnumberx335.23" style="font-size:70%;">arguments</span><span id="lstnumberx335.24" style="font-size:70%;">?</span><span id="lstnumberx335.26" style="font-size:70%;">Find</span> <span id="lstnumberx335.28" style="font-size:70%;">the</span> <span id="lstnumberx335.30" style="font-size:70%;">exact</span> <span id="lstnumberx335.32" style="font-size:70%;">code</span><span id="lstnumberx335.33" style="font-size:70%;">.</span></span> <span id="lstnumberx336"><span id="lstnumberx336.1" style="font-size:70%;">-</span> <span id="lstnumberx336.3" style="font-size:70%;">Can</span> <span id="lstnumberx336.5" style="font-size:70%;">middleware</span> <span id="lstnumberx336.7" style="font-size:70%;">access</span> <span id="lstnumberx336.9" style="font-size:70%;">`</span> <span id="lstnumberx336.10" style="font-size:70%;">agent_state</span> <span id="lstnumberx336.11" style="font-size:70%;">`?</span><span id="lstnumberx336.13" style="font-size:70%;">How</span><span id="lstnumberx336.14" style="font-size:70%;">?</span></span> <span id="lstnumberx338"><span id="lstnumberx338.1" style="font-size:70%;">###</span> <span id="lstnumberx338.3" style="font-size:70%;">2.3</span> <span id="lstnumberx338.5" style="font-size:70%;">Registration</span> </span><span id="lstnumberx339"><span id="lstnumberx339.1" style="font-size:70%;">-</span> <span id="lstnumberx339.3" style="font-size:70%;">How</span> <span id="lstnumberx339.5" style="font-size:70%;">does</span> <span id="lstnumberx339.7" style="font-size:70%;">`</span> <span id="lstnumberx339.8" style="font-size:70%;">import</span><span id="lstnumberx339.9" style="font-size:70%;">:</span><span id="lstnumberx339.11" style="font-size:70%;">middleware</span><span id="lstnumberx339.12" style="font-size:70%;">.</span><span id="lstnumberx339.13" style="font-size:70%;">my_module</span><span id="lstnumberx339.14" style="font-size:70%;">:</span><span id="lstnumberx339.15" style="font-size:70%;">MyClass</span> <span id="lstnumberx339.16" style="font-size:70%;">`</span> <span id="lstnumberx339.18" style="font-size:70%;">get</span> <span id="lstnumberx339.20" style="font-size:70%;">resolved</span><span id="lstnumberx339.21" style="font-size:70%;">?</span><span id="lstnumberx339.23" style="font-size:70%;">What</span> <span id="lstnumberx339.25" style="font-size:70%;">directory</span> <span id="lstnumberx339.27" style="font-size:70%;">is</span> <span id="lstnumberx339.29" style="font-size:70%;">added</span> <span id="lstnumberx339.31" style="font-size:70%;">to</span> <span id="lstnumberx339.33" style="font-size:70%;">sys</span><span id="lstnumberx339.34" style="font-size:70%;">.</span><span id="lstnumberx339.35" style="font-size:70%;">path</span><span id="lstnumberx339.36" style="font-size:70%;">?</span></span> <span id="lstnumberx340"><span id="lstnumberx340.1" style="font-size:70%;">-</span> <span id="lstnumberx340.3" style="font-size:70%;">Ordering</span><span id="lstnumberx340.4" style="font-size:70%;">:</span><span id="lstnumberx340.6" style="font-size:70%;">do</span> <span id="lstnumberx340.8" style="font-size:70%;">middlewares</span> <span id="lstnumberx340.10" style="font-size:70%;">execute</span> <span id="lstnumberx340.12" style="font-size:70%;">in</span> <span id="lstnumberx340.14" style="font-size:70%;">YAML</span> <span id="lstnumberx340.16" style="font-size:70%;">order</span><span id="lstnumberx340.17" style="font-size:70%;">?</span><span id="lstnumberx340.19" style="font-size:70%;">What</span> <span id="lstnumberx340.21" style="font-size:70%;">about</span> <span id="lstnumberx340.23" style="font-size:70%;">after_</span> <span id="lstnumberx340.24" style="font-size:70%;">*</span> <span id="lstnumberx340.26" style="font-size:70%;">hooks</span><span id="lstnumberx340.27" style="font-size:70%;">?</span></span> <span id="lstnumberx342"><span id="lstnumberx342.1" style="font-size:70%;">###</span> <span id="lstnumberx342.3" style="font-size:70%;">2.4</span> <span id="lstnumberx342.5" style="font-size:70%;">Real</span> <span id="lstnumberx342.7" style="font-size:70%;">Examples</span> </span><span id="lstnumberx343"><span id="lstnumberx343.1" style="font-size:70%;">Find</span> <span id="lstnumberx343.3" style="font-size:70%;">2-3</span> <span id="lstnumberx343.5" style="font-size:70%;">existing</span> <span id="lstnumberx343.7" style="font-size:70%;">middleware</span> <span id="lstnumberx343.9" style="font-size:70%;">implementations</span> <span id="lstnumberx343.11" style="font-size:70%;">in</span> <span id="lstnumberx343.13" style="font-size:70%;">the</span> <span id="lstnumberx343.15" style="font-size:70%;">source</span> <span id="lstnumberx343.17" style="font-size:70%;">and</span> <span id="lstnumberx343.19" style="font-size:70%;">extract</span> <span id="lstnumberx343.21" style="font-size:70%;">their</span> <span id="lstnumberx343.23" style="font-size:70%;">patterns</span><span id="lstnumberx343.24" style="font-size:70%;">:</span></span> <span id="lstnumberx344"><span id="lstnumberx344.1" style="font-size:70%;">-</span> <span id="lstnumberx344.3" style="font-size:70%;">A</span> <span id="lstnumberx344.5" style="font-size:70%;">simple</span> <span id="lstnumberx344.7" style="font-size:70%;">one</span> <span id="lstnumberx344.9" style="font-size:70%;">(</span><span id="lstnumberx344.10" style="font-size:70%;">e</span><span id="lstnumberx344.11" style="font-size:70%;">.</span><span id="lstnumberx344.12" style="font-size:70%;">g</span><span id="lstnumberx344.13" style="font-size:70%;">.,</span><span id="lstnumberx344.15" style="font-size:70%;">output</span> <span id="lstnumberx344.17" style="font-size:70%;">truncation</span><span id="lstnumberx344.18" style="font-size:70%;">)</span> </span><span id="lstnumberx345"><span id="lstnumberx345.1" style="font-size:70%;">-</span> <span id="lstnumberx345.3" style="font-size:70%;">A</span> <span id="lstnumberx345.5" style="font-size:70%;">complex</span> <span id="lstnumberx345.7" style="font-size:70%;">one</span> <span id="lstnumberx345.9" style="font-size:70%;">(</span><span id="lstnumberx345.10" style="font-size:70%;">e</span><span id="lstnumberx345.11" style="font-size:70%;">.</span><span id="lstnumberx345.12" style="font-size:70%;">g</span><span id="lstnumberx345.13" style="font-size:70%;">.,</span><span id="lstnumberx345.15" style="font-size:70%;">context</span> <span id="lstnumberx345.17" style="font-size:70%;">compaction</span><span id="lstnumberx345.18" style="font-size:70%;">)</span> </span><span id="lstnumberx346"><span id="lstnumberx346.1" style="font-size:70%;">Show</span> <span id="lstnumberx346.3" style="font-size:70%;">the</span> <span id="lstnumberx346.5" style="font-size:70%;">class</span> <span id="lstnumberx346.7" style="font-size:70%;">structure</span><span id="lstnumberx346.8" style="font-size:70%;">,</span><span id="lstnumberx346.10" style="font-size:70%;">how</span> <span id="lstnumberx346.12" style="font-size:70%;">params</span> <span id="lstnumberx346.14" style="font-size:70%;">are</span> <span id="lstnumberx346.16" style="font-size:70%;">received</span><span id="lstnumberx346.17" style="font-size:70%;">,</span><span id="lstnumberx346.19" style="font-size:70%;">how</span> <span id="lstnumberx346.21" style="font-size:70%;">hooks</span> <span id="lstnumberx346.23" style="font-size:70%;">are</span> <span id="lstnumberx346.25" style="font-size:70%;">implemented</span><span id="lstnumberx346.26" style="font-size:70%;">.</span></span> <span id="lstnumberx348"><span id="lstnumberx348.1" style="font-size:70%;">###</span> <span id="lstnumberx348.3" style="font-size:70%;">2.5</span> <span id="lstnumberx348.5" style="font-size:70%;">Copy</span> <span id="lstnumberx348.6" style="font-size:70%;">-</span> <span id="lstnumberx348.7" style="font-size:70%;">Paste</span> <span id="lstnumberx348.9" style="font-size:70%;">Template</span> </span><span id="lstnumberx349"><span id="lstnumberx349.1" style="font-size:70%;">Based</span> <span id="lstnumberx349.3" style="font-size:70%;">on</span> <span id="lstnumberx349.5" style="font-size:70%;">what</span> <span id="lstnumberx349.7" style="font-size:70%;">you</span> <span id="lstnumberx349.9" style="font-size:70%;">found</span><span id="lstnumberx349.10" style="font-size:70%;">,</span><span id="lstnumberx349.12" style="font-size:70%;">provide</span> <span id="lstnumberx349.14" style="font-size:70%;">a</span> <span id="lstnumberx349.16" style="font-size:70%;">minimal</span> <span id="lstnumberx349.18" style="font-size:70%;">middleware</span> <span id="lstnumberx349.20" style="font-size:70%;">template</span> <span id="lstnumberx349.22" style="font-size:70%;">that</span> <span id="lstnumberx349.24" style="font-size:70%;">the</span> <span id="lstnumberx349.26" style="font-size:70%;">Evolution</span> <span id="lstnumberx349.28" style="font-size:70%;">Agent</span> <span id="lstnumberx349.30" style="font-size:70%;">can</span> <span id="lstnumberx349.32" style="font-size:70%;">copy</span><span id="lstnumberx349.33" style="font-size:70%;">.</span></span> <span id="lstnumberx351"><span id="lstnumberx351.1" style="font-size:70%;">##</span> <span id="lstnumberx351.3" style="font-size:70%;">section</span> <span id="lstnumberx351.5" style="font-size:70%;">3.</span><span id="lstnumberx351.7" style="font-size:70%;">Tool</span> <span id="lstnumberx351.9" style="font-size:70%;">Creation</span> <span id="lstnumberx351.11" style="font-size:70%;">(</span><span id="lstnumberx351.12" style="font-size:70%;">HIGH</span> <span id="lstnumberx351.14" style="font-size:70%;">PRIORITY</span><span id="lstnumberx351.15" style="font-size:70%;">)</span> </span><span id="lstnumberx353"><span id="lstnumberx353.1" style="font-size:70%;">###</span> <span id="lstnumberx353.3" style="font-size:70%;">3.1</span> <span id="lstnumberx353.5" style="font-size:70%;">Tool</span> <span id="lstnumberx353.7" style="font-size:70%;">YAML</span> <span id="lstnumberx353.9" style="font-size:70%;">Schema</span> </span><span id="lstnumberx354"><span id="lstnumberx354.1" style="font-size:70%;">Find</span> <span id="lstnumberx354.3" style="font-size:70%;">a</span> <span id="lstnumberx354.5" style="font-size:70%;">tool</span> <span id="lstnumberx354.7" style="font-size:70%;">YAML</span> <span id="lstnumberx354.9" style="font-size:70%;">definition</span> <span id="lstnumberx354.11" style="font-size:70%;">(</span><span id="lstnumberx354.12" style="font-size:70%;">e</span><span id="lstnumberx354.13" style="font-size:70%;">.</span><span id="lstnumberx354.14" style="font-size:70%;">g</span><span id="lstnumberx354.15" style="font-size:70%;">.,</span><span id="lstnumberx354.17" style="font-size:70%;">`</span> <span id="lstnumberx354.18" style="font-size:70%;">read_file</span><span id="lstnumberx354.19" style="font-size:70%;">.</span><span id="lstnumberx354.20" style="font-size:70%;">tool</span><span id="lstnumberx354.21" style="font-size:70%;">.</span><span id="lstnumberx354.22" style="font-size:70%;">yaml</span> <span id="lstnumberx354.23" style="font-size:70%;">`).</span><span id="lstnumberx354.25" style="font-size:70%;">Document</span> <span id="lstnumberx354.27" style="font-size:70%;">the</span> <span id="lstnumberx354.29" style="font-size:70%;">full</span> <span id="lstnumberx354.31" style="font-size:70%;">schema</span><span id="lstnumberx354.32" style="font-size:70%;">:</span></span> <span id="lstnumberx355"><span id="lstnumberx355.1" style="font-size:70%;">-</span> <span id="lstnumberx355.3" style="font-size:70%;">name</span><span id="lstnumberx355.4" style="font-size:70%;">,</span><span id="lstnumberx355.6" style="font-size:70%;">description</span><span id="lstnumberx355.7" style="font-size:70%;">,</span><span id="lstnumberx355.9" style="font-size:70%;">input_schema</span> <span id="lstnumberx355.11" style="font-size:70%;">(</span><span id="lstnumberx355.12" style="font-size:70%;">JSON</span> <span id="lstnumberx355.14" style="font-size:70%;">Schema</span> <span id="lstnumberx355.16" style="font-size:70%;">format</span><span id="lstnumberx355.17" style="font-size:70%;">),</span><span id="lstnumberx355.19" style="font-size:70%;">etc</span><span id="lstnumberx355.20" style="font-size:70%;">.</span></span> <span id="lstnumberx357"><span id="lstnumberx357.1" style="font-size:70%;">###</span> <span id="lstnumberx357.3" style="font-size:70%;">3.2</span> <span id="lstnumberx357.5" style="font-size:70%;">Python</span> <span id="lstnumberx357.7" style="font-size:70%;">Function</span> <span id="lstnumberx357.9" style="font-size:70%;">Signature</span> </span><span id="lstnumberx358"><span id="lstnumberx358.1" style="font-size:70%;">-</span> <span id="lstnumberx358.3" style="font-size:70%;">How</span> <span id="lstnumberx358.5" style="font-size:70%;">does</span> <span id="lstnumberx358.7" style="font-size:70%;">`</span> <span id="lstnumberx358.8" style="font-size:70%;">binding</span><span id="lstnumberx358.9" style="font-size:70%;">:</span><span id="lstnumberx358.11" style="font-size:70%;">tools</span><span id="lstnumberx358.12" style="font-size:70%;">.</span><span id="lstnumberx358.13" style="font-size:70%;">my_module</span><span id="lstnumberx358.14" style="font-size:70%;">:</span><span id="lstnumberx358.15" style="font-size:70%;">my_func</span> <span id="lstnumberx358.16" style="font-size:70%;">`</span> <span id="lstnumberx358.18" style="font-size:70%;">resolve</span> <span id="lstnumberx358.20" style="font-size:70%;">to</span> <span id="lstnumberx358.22" style="font-size:70%;">a</span> <span id="lstnumberx358.24" style="font-size:70%;">Python</span> <span id="lstnumberx358.26" style="font-size:70%;">function</span><span id="lstnumberx358.27" style="font-size:70%;">?</span></span> <span id="lstnumberx359"><span id="lstnumberx359.1" style="font-size:70%;">-</span> <span id="lstnumberx359.3" style="font-size:70%;">How</span> <span id="lstnumberx359.5" style="font-size:70%;">is</span> <span id="lstnumberx359.7" style="font-size:70%;">`</span> <span id="lstnumberx359.8" style="font-size:70%;">agent_state</span> <span id="lstnumberx359.9" style="font-size:70%;">`</span> <span id="lstnumberx359.11" style="font-size:70%;">injected</span><span id="lstnumberx359.12" style="font-size:70%;">?</span><span id="lstnumberx359.14" style="font-size:70%;">Is</span> <span id="lstnumberx359.16" style="font-size:70%;">it</span> <span id="lstnumberx359.18" style="font-size:70%;">based</span> <span id="lstnumberx359.20" style="font-size:70%;">on</span> <span id="lstnumberx359.22" style="font-size:70%;">`</span> <span id="lstnumberx359.23" style="font-size:70%;">inspect</span><span id="lstnumberx359.24" style="font-size:70%;">.</span><span id="lstnumberx359.25" style="font-size:70%;">signature</span> <span id="lstnumberx359.26" style="font-size:70%;">`?</span><span id="lstnumberx359.28" style="font-size:70%;">What</span> <span id="lstnumberx359.30" style="font-size:70%;">fields</span> <span id="lstnumberx359.32" style="font-size:70%;">does</span> <span id="lstnumberx359.34" style="font-size:70%;">`</span> <span id="lstnumberx359.35" style="font-size:70%;">agent_state</span> <span id="lstnumberx359.36" style="font-size:70%;">`</span> <span id="lstnumberx359.38" style="font-size:70%;">have</span> <span id="lstnumberx359.40" style="font-size:70%;">(</span><span id="lstnumberx359.41" style="font-size:70%;">sandbox</span><span id="lstnumberx359.42" style="font-size:70%;">,</span><span id="lstnumberx359.44" style="font-size:70%;">history</span><span id="lstnumberx359.45" style="font-size:70%;">,</span><span id="lstnumberx359.47" style="font-size:70%;">etc</span><span id="lstnumberx359.48" style="font-size:70%;">.)?</span></span> <span id="lstnumberx360"><span id="lstnumberx360.1" style="font-size:70%;">-</span> <span id="lstnumberx360.3" style="font-size:70%;">What</span> <span id="lstnumberx360.5" style="font-size:70%;">should</span> <span id="lstnumberx360.7" style="font-size:70%;">the</span> <span id="lstnumberx360.9" style="font-size:70%;">function</span> <span id="lstnumberx360.11" style="font-size:70%;">return</span><span id="lstnumberx360.12" style="font-size:70%;">?</span><span id="lstnumberx360.14" style="font-size:70%;">How</span> <span id="lstnumberx360.16" style="font-size:70%;">are</span> <span id="lstnumberx360.18" style="font-size:70%;">return</span> <span id="lstnumberx360.20" style="font-size:70%;">values</span> <span id="lstnumberx360.22" style="font-size:70%;">normalized</span><span id="lstnumberx360.23" style="font-size:70%;">?</span></span> <span id="lstnumberx361"><span id="lstnumberx361.1" style="font-size:70%;">-</span> <span id="lstnumberx361.3" style="font-size:70%;">What</span> <span id="lstnumberx361.5" style="font-size:70%;">happens</span> <span id="lstnumberx361.7" style="font-size:70%;">if</span> <span id="lstnumberx361.9" style="font-size:70%;">the</span> <span id="lstnumberx361.11" style="font-size:70%;">tool</span> <span id="lstnumberx361.13" style="font-size:70%;">raises</span> <span id="lstnumberx361.15" style="font-size:70%;">an</span> <span id="lstnumberx361.17" style="font-size:70%;">exception</span><span id="lstnumberx361.18" style="font-size:70%;">?</span></span> <span id="lstnumberx363"><span id="lstnumberx363.1" style="font-size:70%;">###</span> <span id="lstnumberx363.3" style="font-size:70%;">3.3</span> <span id="lstnumberx363.5" style="font-size:70%;">Registration</span> </span><span id="lstnumberx364"><span id="lstnumberx364.1" style="font-size:70%;">-</span> <span id="lstnumberx364.3" style="font-size:70%;">The</span> <span id="lstnumberx364.5" style="font-size:70%;">`</span> <span id="lstnumberx364.6" style="font-size:70%;">tools</span><span id="lstnumberx364.7" style="font-size:70%;">:`</span> <span id="lstnumberx364.9" style="font-size:70%;">list</span> <span id="lstnumberx364.11" style="font-size:70%;">entry</span> <span id="lstnumberx364.13" style="font-size:70%;">format</span> <span id="lstnumberx364.15" style="font-size:70%;">in</span> <span id="lstnumberx364.17" style="font-size:70%;">agent</span> <span id="lstnumberx364.19" style="font-size:70%;">YAML</span> </span><span id="lstnumberx365"><span id="lstnumberx365.1" style="font-size:70%;">-</span> <span id="lstnumberx365.3" style="font-size:70%;">How</span> <span id="lstnumberx365.5" style="font-size:70%;">yaml_path</span> <span id="lstnumberx365.7" style="font-size:70%;">and</span> <span id="lstnumberx365.9" style="font-size:70%;">binding</span> <span id="lstnumberx365.11" style="font-size:70%;">are</span> <span id="lstnumberx365.13" style="font-size:70%;">resolved</span> <span id="lstnumberx365.15" style="font-size:70%;">(</span><span id="lstnumberx365.16" style="font-size:70%;">relative</span> <span id="lstnumberx365.18" style="font-size:70%;">to</span> <span id="lstnumberx365.20" style="font-size:70%;">config</span> <span id="lstnumberx365.22" style="font-size:70%;">dir</span><span id="lstnumberx365.23" style="font-size:70%;">?</span><span id="lstnumberx365.25" style="font-size:70%;">work_dir</span><span id="lstnumberx365.26" style="font-size:70%;">?)</span> </span><span id="lstnumberx367"><span id="lstnumberx367.1" style="font-size:70%;">###</span> <span id="lstnumberx367.3" style="font-size:70%;">3.4</span> <span id="lstnumberx367.5" style="font-size:70%;">Real</span> <span id="lstnumberx367.7" style="font-size:70%;">Examples</span> </span><span id="lstnumberx368"><span id="lstnumberx368.1" style="font-size:70%;">Find</span> <span id="lstnumberx368.3" style="font-size:70%;">2-3</span> <span id="lstnumberx368.5" style="font-size:70%;">existing</span> <span id="lstnumberx368.7" style="font-size:70%;">tool</span> <span id="lstnumberx368.9" style="font-size:70%;">implementations</span><span id="lstnumberx368.10" style="font-size:70%;">.</span><span id="lstnumberx368.12" style="font-size:70%;">Show</span> <span id="lstnumberx368.14" style="font-size:70%;">the</span> <span id="lstnumberx368.16" style="font-size:70%;">function</span> <span id="lstnumberx368.18" style="font-size:70%;">signature</span><span id="lstnumberx368.19" style="font-size:70%;">,</span><span id="lstnumberx368.21" style="font-size:70%;">how</span> <span id="lstnumberx368.23" style="font-size:70%;">sandbox</span> <span id="lstnumberx368.25" style="font-size:70%;">is</span> <span id="lstnumberx368.27" style="font-size:70%;">used</span><span id="lstnumberx368.28" style="font-size:70%;">,</span><span id="lstnumberx368.30" style="font-size:70%;">return</span> <span id="lstnumberx368.32" style="font-size:70%;">format</span><span id="lstnumberx368.33" style="font-size:70%;">.</span></span> <span id="lstnumberx370"><span id="lstnumberx370.1" style="font-size:70%;">###</span> <span id="lstnumberx370.3" style="font-size:70%;">3.5</span> <span id="lstnumberx370.5" style="font-size:70%;">Copy</span> <span id="lstnumberx370.6" style="font-size:70%;">-</span> <span id="lstnumberx370.7" style="font-size:70%;">Paste</span> <span id="lstnumberx370.9" style="font-size:70%;">Template</span> </span><span id="lstnumberx371"><span id="lstnumberx371.1" style="font-size:70%;">Provide</span> <span id="lstnumberx371.3" style="font-size:70%;">a</span> <span id="lstnumberx371.5" style="font-size:70%;">minimal</span> <span id="lstnumberx371.7" style="font-size:70%;">tool</span> <span id="lstnumberx371.9" style="font-size:70%;">template</span> <span id="lstnumberx371.11" style="font-size:70%;">(</span><span id="lstnumberx371.12" style="font-size:70%;">YAML</span> <span id="lstnumberx371.14" style="font-size:70%;">+</span> <span id="lstnumberx371.16" style="font-size:70%;">Python</span><span id="lstnumberx371.17" style="font-size:70%;">).</span></span> <span id="lstnumberx373"><span id="lstnumberx373.1" style="font-size:70%;">##</span> <span id="lstnumberx373.3" style="font-size:70%;">section</span> <span id="lstnumberx373.5" style="font-size:70%;">4.</span><span id="lstnumberx373.7" style="font-size:70%;">Skill</span> <span id="lstnumberx373.9" style="font-size:70%;">System</span> <span id="lstnumberx373.11" style="font-size:70%;">(</span><span id="lstnumberx373.12" style="font-size:70%;">MEDIUM</span> <span id="lstnumberx373.14" style="font-size:70%;">PRIORITY</span><span id="lstnumberx373.15" style="font-size:70%;">)</span> </span><span id="lstnumberx375"><span id="lstnumberx375.1" style="font-size:70%;">-</span> <span id="lstnumberx375.3" style="font-size:70%;">**</span> <span id="lstnumberx375.4" style="font-size:70%;">SKILL</span><span id="lstnumberx375.5" style="font-size:70%;">.</span><span id="lstnumberx375.6" style="font-size:70%;">md</span> <span id="lstnumberx375.8" style="font-size:70%;">format</span> <span id="lstnumberx375.9" style="font-size:70%;">**:</span><span id="lstnumberx375.11" style="font-size:70%;">What</span> <span id="lstnumberx375.13" style="font-size:70%;">frontmatter</span> <span id="lstnumberx375.15" style="font-size:70%;">fields</span> <span id="lstnumberx375.17" style="font-size:70%;">are</span> <span id="lstnumberx375.19" style="font-size:70%;">expected</span> <span id="lstnumberx375.21" style="font-size:70%;">(</span><span id="lstnumberx375.22" style="font-size:70%;">name</span><span id="lstnumberx375.23" style="font-size:70%;">,</span><span id="lstnumberx375.25" style="font-size:70%;">description</span><span id="lstnumberx375.26" style="font-size:70%;">,</span><span id="lstnumberx375.28" style="font-size:70%;">etc</span><span id="lstnumberx375.29" style="font-size:70%;">.)?</span></span> <span id="lstnumberx376"><span id="lstnumberx376.1" style="font-size:70%;">-</span> <span id="lstnumberx376.3" style="font-size:70%;">**</span> <span id="lstnumberx376.4" style="font-size:70%;">How</span> <span id="lstnumberx376.6" style="font-size:70%;">skills</span> <span id="lstnumberx376.8" style="font-size:70%;">are</span> <span id="lstnumberx376.10" style="font-size:70%;">loaded</span> <span id="lstnumberx376.11" style="font-size:70%;">**:</span><span id="lstnumberx376.13" style="font-size:70%;">What</span> <span id="lstnumberx376.15" style="font-size:70%;">triggers</span> <span id="lstnumberx376.17" style="font-size:70%;">`</span> <span id="lstnumberx376.18" style="font-size:70%;">LoadSkill</span> <span id="lstnumberx376.19" style="font-size:70%;">`?</span><span id="lstnumberx376.21" style="font-size:70%;">How</span> <span id="lstnumberx376.23" style="font-size:70%;">does</span> <span id="lstnumberx376.25" style="font-size:70%;">the</span> <span id="lstnumberx376.27" style="font-size:70%;">agent</span> <span id="lstnumberx376.29" style="font-size:70%;">decide</span> <span id="lstnumberx376.31" style="font-size:70%;">which</span> <span id="lstnumberx376.33" style="font-size:70%;">skill</span> <span id="lstnumberx376.35" style="font-size:70%;">to</span> <span id="lstnumberx376.37" style="font-size:70%;">load</span><span id="lstnumberx376.38" style="font-size:70%;">?</span></span> <span id="lstnumberx377"><span id="lstnumberx377.1" style="font-size:70%;">-</span> <span id="lstnumberx377.3" style="font-size:70%;">**`</span> <span id="lstnumberx377.4" style="font-size:70%;">skills</span><span id="lstnumberx377.5" style="font-size:70%;">:`</span> <span id="lstnumberx377.7" style="font-size:70%;">in</span> <span id="lstnumberx377.9" style="font-size:70%;">agent</span> <span id="lstnumberx377.11" style="font-size:70%;">YAML</span> <span id="lstnumberx377.12" style="font-size:70%;">**:</span><span id="lstnumberx377.14" style="font-size:70%;">path</span> <span id="lstnumberx377.16" style="font-size:70%;">format</span> <span id="lstnumberx377.18" style="font-size:70%;">(</span><span id="lstnumberx377.19" style="font-size:70%;">relative</span> <span id="lstnumberx377.21" style="font-size:70%;">to</span> <span id="lstnumberx377.23" style="font-size:70%;">what</span><span id="lstnumberx377.24" style="font-size:70%;">?),</span><span id="lstnumberx377.26" style="font-size:70%;">how</span> <span id="lstnumberx377.28" style="font-size:70%;">directories</span> <span id="lstnumberx377.30" style="font-size:70%;">are</span> <span id="lstnumberx377.32" style="font-size:70%;">scanned</span> </span><span id="lstnumberx378"><span id="lstnumberx378.1" style="font-size:70%;">-</span> <span id="lstnumberx378.3" style="font-size:70%;">**</span> <span id="lstnumberx378.4" style="font-size:70%;">Skill</span> <span id="lstnumberx378.6" style="font-size:70%;">content</span> <span id="lstnumberx378.7" style="font-size:70%;">**:</span><span id="lstnumberx378.9" style="font-size:70%;">How</span> <span id="lstnumberx378.11" style="font-size:70%;">is</span> <span id="lstnumberx378.13" style="font-size:70%;">SKILL</span><span id="lstnumberx378.14" style="font-size:70%;">.</span><span id="lstnumberx378.15" style="font-size:70%;">md</span> <span id="lstnumberx378.17" style="font-size:70%;">content</span> <span id="lstnumberx378.19" style="font-size:70%;">injected</span> <span id="lstnumberx378.21" style="font-size:70%;">into</span> <span id="lstnumberx378.23" style="font-size:70%;">the</span> <span id="lstnumberx378.25" style="font-size:70%;">conversation</span><span id="lstnumberx378.26" style="font-size:70%;">?</span><span id="lstnumberx378.28" style="font-size:70%;">As</span> <span id="lstnumberx378.30" style="font-size:70%;">a</span> <span id="lstnumberx378.32" style="font-size:70%;">user</span> <span id="lstnumberx378.34" style="font-size:70%;">message</span><span id="lstnumberx378.35" style="font-size:70%;">?</span><span id="lstnumberx378.37" style="font-size:70%;">System</span> <span id="lstnumberx378.39" style="font-size:70%;">message</span><span id="lstnumberx378.40" style="font-size:70%;">?</span></span> <span id="lstnumberx380"><span id="lstnumberx380.1" style="font-size:70%;">##</span> <span id="lstnumberx380.3" style="font-size:70%;">section</span> <span id="lstnumberx380.5" style="font-size:70%;">5.</span><span id="lstnumberx380.7" style="font-size:70%;">Sub</span> <span id="lstnumberx380.8" style="font-size:70%;">-</span> <span id="lstnumberx380.9" style="font-size:70%;">Agent</span> <span id="lstnumberx380.11" style="font-size:70%;">Creation</span> <span id="lstnumberx380.13" style="font-size:70%;">(</span><span id="lstnumberx380.14" style="font-size:70%;">MEDIUM</span> <span id="lstnumberx380.16" style="font-size:70%;">PRIORITY</span><span id="lstnumberx380.17" style="font-size:70%;">)</span> </span><span id="lstnumberx382"><span id="lstnumberx382.1" style="font-size:70%;">###</span> <span id="lstnumberx382.3" style="font-size:70%;">5.1</span> <span id="lstnumberx382.5" style="font-size:70%;">Config</span> </span><span id="lstnumberx383"><span id="lstnumberx383.1" style="font-size:70%;">-</span> <span id="lstnumberx383.3" style="font-size:70%;">`</span> <span id="lstnumberx383.4" style="font-size:70%;">sub_agents</span><span id="lstnumberx383.5" style="font-size:70%;">:`</span> <span id="lstnumberx383.7" style="font-size:70%;">list</span> <span id="lstnumberx383.9" style="font-size:70%;">entry</span> <span id="lstnumberx383.11" style="font-size:70%;">format</span><span id="lstnumberx383.12" style="font-size:70%;">:</span><span id="lstnumberx383.14" style="font-size:70%;">name</span><span id="lstnumberx383.15" style="font-size:70%;">,</span><span id="lstnumberx383.17" style="font-size:70%;">config_path</span><span id="lstnumberx383.18" style="font-size:70%;">,</span><span id="lstnumberx383.20" style="font-size:70%;">description</span><span id="lstnumberx383.21" style="font-size:70%;">,</span><span id="lstnumberx383.23" style="font-size:70%;">etc</span><span id="lstnumberx383.24" style="font-size:70%;">.</span></span> <span id="lstnumberx384"><span id="lstnumberx384.1" style="font-size:70%;">-</span> <span id="lstnumberx384.3" style="font-size:70%;">Sub</span> <span id="lstnumberx384.4" style="font-size:70%;">-</span> <span id="lstnumberx384.5" style="font-size:70%;">agent</span> <span id="lstnumberx384.6" style="font-size:70%;">'</span> <span id="lstnumberx384.7" style="font-size:70%;">s</span> <span id="lstnumberx384.9" style="font-size:70%;">own</span> <span id="lstnumberx384.11" style="font-size:70%;">`</span> <span id="lstnumberx384.12" style="font-size:70%;">agent</span><span id="lstnumberx384.13" style="font-size:70%;">.</span><span id="lstnumberx384.14" style="font-size:70%;">yaml</span> <span id="lstnumberx384.15" style="font-size:70%;">`</span> <span id="lstnumberx384.17" style="font-size:70%;">structure</span> <span id="lstnumberx384.19" style="font-size:70%;">--</span> <span id="lstnumberx384.21" style="font-size:70%;">does</span> <span id="lstnumberx384.23" style="font-size:70%;">it</span> <span id="lstnumberx384.25" style="font-size:70%;">inherit</span> <span id="lstnumberx384.27" style="font-size:70%;">from</span> <span id="lstnumberx384.29" style="font-size:70%;">parent</span><span id="lstnumberx384.30" style="font-size:70%;">?</span><span id="lstnumberx384.32" style="font-size:70%;">What</span> <span id="lstnumberx384.33" style="font-size:70%;">'</span> <span id="lstnumberx384.34" style="font-size:70%;">s</span> <span id="lstnumberx384.36" style="font-size:70%;">independent</span><span id="lstnumberx384.37" style="font-size:70%;">?</span></span> <span id="lstnumberx385"><span id="lstnumberx385.1" style="font-size:70%;">-</span> <span id="lstnumberx385.3" style="font-size:70%;">How</span> <span id="lstnumberx385.5" style="font-size:70%;">config_path</span> <span id="lstnumberx385.7" style="font-size:70%;">is</span> <span id="lstnumberx385.9" style="font-size:70%;">resolved</span> </span><span id="lstnumberx387"><span id="lstnumberx387.1" style="font-size:70%;">###</span> <span id="lstnumberx387.3" style="font-size:70%;">5.2</span> <span id="lstnumberx387.5" style="font-size:70%;">Runtime</span> </span><span id="lstnumberx388"><span id="lstnumberx388.1" style="font-size:70%;">-</span> <span id="lstnumberx388.3" style="font-size:70%;">How</span> <span id="lstnumberx388.5" style="font-size:70%;">`</span> <span id="lstnumberx388.6" style="font-size:70%;">sub</span> <span id="lstnumberx388.7" style="font-size:70%;">-</span> <span id="lstnumberx388.8" style="font-size:70%;">agent</span> <span id="lstnumberx388.9" style="font-size:70%;">-{</span> <span id="lstnumberx388.10" style="font-size:70%;">name</span> <span id="lstnumberx388.11" style="font-size:70%;">}(</span><span id="lstnumberx388.12" style="font-size:70%;">message</span> <span id="lstnumberx388.13" style="font-size:70%;">="...")`</span> <span id="lstnumberx388.15" style="font-size:70%;">is</span> <span id="lstnumberx388.17" style="font-size:70%;">dispatched</span> </span><span id="lstnumberx390"><span id="lstnumberx390.1" style="font-size:70%;">-</span> <span id="lstnumberx390.3" style="font-size:70%;">Return</span> <span id="lstnumberx390.5" style="font-size:70%;">value</span><span id="lstnumberx390.6" style="font-size:70%;">:</span><span id="lstnumberx390.8" style="font-size:70%;">how</span> <span id="lstnumberx390.10" style="font-size:70%;">result</span> <span id="lstnumberx390.12" style="font-size:70%;">flows</span> <span id="lstnumberx390.14" style="font-size:70%;">back</span> <span id="lstnumberx390.16" style="font-size:70%;">to</span> <span id="lstnumberx390.18" style="font-size:70%;">parent</span> </span><span id="lstnumberx391"><span id="lstnumberx391.1" style="font-size:70%;">-</span> <span id="lstnumberx391.3" style="font-size:70%;">Does</span> <span id="lstnumberx391.5" style="font-size:70%;">sub</span> <span id="lstnumberx391.6" style="font-size:70%;">-</span> <span id="lstnumberx391.7" style="font-size:70%;">agent</span> <span id="lstnumberx391.9" style="font-size:70%;">get</span> <span id="lstnumberx391.11" style="font-size:70%;">its</span> <span id="lstnumberx391.13" style="font-size:70%;">own</span> <span id="lstnumberx391.15" style="font-size:70%;">sandbox</span><span id="lstnumberx391.16" style="font-size:70%;">?</span></span> <span id="lstnumberx393"><span id="lstnumberx393.1" style="font-size:70%;">###</span> <span id="lstnumberx393.3" style="font-size:70%;">5.3</span> <span id="lstnumberx393.5" style="font-size:70%;">RecallSubAgent</span> </span><span id="lstnumberx394"><span id="lstnumberx394.1" style="font-size:70%;">-</span> <span id="lstnumberx394.3" style="font-size:70%;">What</span> <span id="lstnumberx394.5" style="font-size:70%;">does</span> <span id="lstnumberx394.7" style="font-size:70%;">it</span> <span id="lstnumberx394.9" style="font-size:70%;">do</span><span id="lstnumberx394.10" style="font-size:70%;">?</span><span id="lstnumberx394.12" style="font-size:70%;">When</span> <span id="lstnumberx394.14" style="font-size:70%;">is</span> <span id="lstnumberx394.16" style="font-size:70%;">it</span> <span id="lstnumberx394.18" style="font-size:70%;">useful</span><span id="lstnumberx394.19" style="font-size:70%;">?</span></span> <span id="lstnumberx396"><span id="lstnumberx396.1" style="font-size:70%;">##</span> <span id="lstnumberx396.3" style="font-size:70%;">section</span> <span id="lstnumberx396.5" style="font-size:70%;">6.</span><span id="lstnumberx396.7" style="font-size:70%;">Key</span> <span id="lstnumberx396.9" style="font-size:70%;">Runtime</span> <span id="lstnumberx396.11" style="font-size:70%;">Behaviors</span> <span id="lstnumberx396.13" style="font-size:70%;">(</span><span id="lstnumberx396.14" style="font-size:70%;">LOWER</span> <span id="lstnumberx396.16" style="font-size:70%;">PRIORITY</span> <span id="lstnumberx396.18" style="font-size:70%;">--</span> <span id="lstnumberx396.20" style="font-size:70%;">only</span> <span id="lstnumberx396.22" style="font-size:70%;">what</span> <span id="lstnumberx396.24" style="font-size:70%;">affects</span> <span id="lstnumberx396.26" style="font-size:70%;">component</span> <span id="lstnumberx396.28" style="font-size:70%;">writing</span><span id="lstnumberx396.29" style="font-size:70%;">)</span> </span><span id="lstnumberx398"><span id="lstnumberx398.1" style="font-size:70%;">Only</span> <span id="lstnumberx398.3" style="font-size:70%;">document</span> <span id="lstnumberx398.5" style="font-size:70%;">behaviors</span> <span id="lstnumberx398.7" style="font-size:70%;">that</span> <span id="lstnumberx398.9" style="font-size:70%;">affect</span> <span id="lstnumberx398.11" style="font-size:70%;">how</span> <span id="lstnumberx398.13" style="font-size:70%;">middleware</span> <span id="lstnumberx398.14" style="font-size:70%;">/</span> <span id="lstnumberx398.15" style="font-size:70%;">tools</span> <span id="lstnumberx398.17" style="font-size:70%;">should</span> <span id="lstnumberx398.19" style="font-size:70%;">be</span> <span id="lstnumberx398.21" style="font-size:70%;">written</span><span id="lstnumberx398.22" style="font-size:70%;">:</span></span> <span id="lstnumberx400"><span id="lstnumberx400.1" style="font-size:70%;">-</span> <span id="lstnumberx400.3" style="font-size:70%;">**</span> <span id="lstnumberx400.4" style="font-size:70%;">Hook</span> <span id="lstnumberx400.6" style="font-size:70%;">execution</span> <span id="lstnumberx400.8" style="font-size:70%;">order</span> <span id="lstnumberx400.9" style="font-size:70%;">**:</span><span id="lstnumberx400.11" style="font-size:70%;">before_</span> <span id="lstnumberx400.12" style="font-size:70%;">*</span> <span id="lstnumberx400.14" style="font-size:70%;">top</span> <span id="lstnumberx400.15" style="font-size:70%;">-</span> <span id="lstnumberx400.16" style="font-size:70%;">to</span> <span id="lstnumberx400.17" style="font-size:70%;">-</span> <span id="lstnumberx400.18" style="font-size:70%;">bottom</span> <span id="lstnumberx400.20" style="font-size:70%;">or</span> <span id="lstnumberx400.22" style="font-size:70%;">bottom</span> <span id="lstnumberx400.23" style="font-size:70%;">-</span> <span id="lstnumberx400.24" style="font-size:70%;">to</span> <span id="lstnumberx400.25" style="font-size:70%;">-</span> <span id="lstnumberx400.26" style="font-size:70%;">top</span><span id="lstnumberx400.27" style="font-size:70%;">?</span><span id="lstnumberx400.29" style="font-size:70%;">after_</span> <span id="lstnumberx400.30" style="font-size:70%;">*</span> <span id="lstnumberx400.32" style="font-size:70%;">order</span><span id="lstnumberx400.33" style="font-size:70%;">?</span></span> <span id="lstnumberx401"><span id="lstnumberx401.1" style="font-size:70%;">-</span> <span id="lstnumberx401.3" style="font-size:70%;">**</span> <span id="lstnumberx401.4" style="font-size:70%;">Tool</span> <span id="lstnumberx401.6" style="font-size:70%;">error</span> <span id="lstnumberx401.8" style="font-size:70%;">handling</span> <span id="lstnumberx401.9" style="font-size:70%;">**:</span><span id="lstnumberx401.11" style="font-size:70%;">What</span> <span id="lstnumberx401.13" style="font-size:70%;">happens</span> <span id="lstnumberx401.15" style="font-size:70%;">when</span> <span id="lstnumberx401.17" style="font-size:70%;">a</span> <span id="lstnumberx401.19" style="font-size:70%;">tool</span> <span id="lstnumberx401.21" style="font-size:70%;">throws</span><span id="lstnumberx401.22" style="font-size:70%;">?</span><span id="lstnumberx401.24" style="font-size:70%;">What</span> <span id="lstnumberx401.26" style="font-size:70%;">message</span> <span id="lstnumberx401.28" style="font-size:70%;">does</span> <span id="lstnumberx401.30" style="font-size:70%;">the</span> <span id="lstnumberx401.32" style="font-size:70%;">LLM</span> <span id="lstnumberx401.34" style="font-size:70%;">see</span><span id="lstnumberx401.35" style="font-size:70%;">?</span></span> <span id="lstnumberx402"><span id="lstnumberx402.1" style="font-size:70%;">-</span> <span id="lstnumberx402.3" style="font-size:70%;">**</span> <span id="lstnumberx402.4" style="font-size:70%;">Parallel</span> <span id="lstnumberx402.6" style="font-size:70%;">tool</span> <span id="lstnumberx402.8" style="font-size:70%;">execution</span> <span id="lstnumberx402.9" style="font-size:70%;">**:</span><span id="lstnumberx402.11" style="font-size:70%;">Are</span> <span id="lstnumberx402.13" style="font-size:70%;">multiple</span> <span id="lstnumberx402.15" style="font-size:70%;">tool</span> <span id="lstnumberx402.17" style="font-size:70%;">calls</span> <span id="lstnumberx402.19" style="font-size:70%;">run</span> <span id="lstnumberx402.21" style="font-size:70%;">in</span> <span id="lstnumberx402.23" style="font-size:70%;">parallel</span><span id="lstnumberx402.24" style="font-size:70%;">?</span><span id="lstnumberx402.26" style="font-size:70%;">What</span> <span id="lstnumberx402.28" style="font-size:70%;">controls</span> <span id="lstnumberx402.30" style="font-size:70%;">this</span><span id="lstnumberx402.31" style="font-size:70%;">?</span></span> <span id="lstnumberx403"><span id="lstnumberx403.1" style="font-size:70%;">-</span> <span id="lstnumberx403.3" style="font-size:70%;">**</span> <span id="lstnumberx403.4" style="font-size:70%;">Stop</span> <span id="lstnumberx403.6" style="font-size:70%;">tool</span> <span id="lstnumberx403.8" style="font-size:70%;">behavior</span> <span id="lstnumberx403.9" style="font-size:70%;">**:</span><span id="lstnumberx403.11" style="font-size:70%;">When</span> <span id="lstnumberx403.13" style="font-size:70%;">`</span> <span id="lstnumberx403.14" style="font-size:70%;">complete_task</span> <span id="lstnumberx403.15" style="font-size:70%;">`</span> <span id="lstnumberx403.17" style="font-size:70%;">is</span> <span id="lstnumberx403.19" style="font-size:70%;">called</span><span id="lstnumberx403.20" style="font-size:70%;">,</span><span id="lstnumberx403.22" style="font-size:70%;">do</span> <span id="lstnumberx403.24" style="font-size:70%;">after_tool</span> <span id="lstnumberx403.26" style="font-size:70%;">hooks</span> <span id="lstnumberx403.28" style="font-size:70%;">still</span> <span id="lstnumberx403.30" style="font-size:70%;">fire</span><span id="lstnumberx403.31" style="font-size:70%;">?</span></span> <span id="lstnumberx404"><span id="lstnumberx404.1" style="font-size:70%;">-</span> <span id="lstnumberx404.3" style="font-size:70%;">**</span> <span id="lstnumberx404.4" style="font-size:70%;">Context</span> <span id="lstnumberx404.6" style="font-size:70%;">compaction</span> <span id="lstnumberx404.7" style="font-size:70%;">**:</span><span id="lstnumberx404.9" style="font-size:70%;">When</span> <span id="lstnumberx404.11" style="font-size:70%;">does</span> <span id="lstnumberx404.13" style="font-size:70%;">it</span> <span id="lstnumberx404.15" style="font-size:70%;">trigger</span><span id="lstnumberx404.16" style="font-size:70%;">?</span><span id="lstnumberx404.18" style="font-size:70%;">What</span> <span id="lstnumberx404.20" style="font-size:70%;">gets</span> <span id="lstnumberx404.22" style="font-size:70%;">compacted</span><span id="lstnumberx404.23" style="font-size:70%;">?</span></span> <span id="lstnumberx405"><span id="lstnumberx405.1" style="font-size:70%;">-</span> <span id="lstnumberx405.3" style="font-size:70%;">**</span> <span id="lstnumberx405.4" style="font-size:70%;">Token</span> <span id="lstnumberx405.6" style="font-size:70%;">counting</span> <span id="lstnumberx405.7" style="font-size:70%;">**:</span><span id="lstnumberx405.9" style="font-size:70%;">What</span> <span id="lstnumberx405.11" style="font-size:70%;">function</span> <span id="lstnumberx405.12" style="font-size:70%;">/</span> <span id="lstnumberx405.13" style="font-size:70%;">heuristic</span> <span id="lstnumberx405.15" style="font-size:70%;">is</span> <span id="lstnumberx405.17" style="font-size:70%;">used</span><span id="lstnumberx405.18" style="font-size:70%;">?</span></span> <span id="lstnumberx407"><span id="lstnumberx407.1" style="font-size:70%;">##</span> <span id="lstnumberx407.3" style="font-size:70%;">section</span> <span id="lstnumberx407.5" style="font-size:70%;">7.</span><span id="lstnumberx407.7" style="font-size:70%;">Gotchas</span> <span id="lstnumberx407.9" style="font-size:70%;">&amp;</span> <span id="lstnumberx407.11" style="font-size:70%;">Common</span> <span id="lstnumberx407.13" style="font-size:70%;">Mistakes</span> </span><span id="lstnumberx409"><span id="lstnumberx409.1" style="font-size:70%;">Look</span> <span id="lstnumberx409.3" style="font-size:70%;">for</span> <span id="lstnumberx409.5" style="font-size:70%;">anything</span> <span id="lstnumberx409.7" style="font-size:70%;">that</span> <span id="lstnumberx409.9" style="font-size:70%;">would</span> <span id="lstnumberx409.11" style="font-size:70%;">trip</span> <span id="lstnumberx409.13" style="font-size:70%;">up</span> <span id="lstnumberx409.15" style="font-size:70%;">the</span> <span id="lstnumberx409.17" style="font-size:70%;">Evolution</span> <span id="lstnumberx409.19" style="font-size:70%;">Agent</span><span id="lstnumberx409.20" style="font-size:70%;">:</span></span> <span id="lstnumberx410"><span id="lstnumberx410.1" style="font-size:70%;">-</span> <span id="lstnumberx410.3" style="font-size:70%;">Config</span> <span id="lstnumberx410.5" style="font-size:70%;">errors</span> <span id="lstnumberx410.7" style="font-size:70%;">that</span> <span id="lstnumberx410.9" style="font-size:70%;">pass</span> <span id="lstnumberx410.11" style="font-size:70%;">validation</span> <span id="lstnumberx410.13" style="font-size:70%;">but</span> <span id="lstnumberx410.15" style="font-size:70%;">crash</span> <span id="lstnumberx410.17" style="font-size:70%;">at</span> <span id="lstnumberx410.19" style="font-size:70%;">runtime</span> </span><span id="lstnumberx411"><span id="lstnumberx411.1" style="font-size:70%;">-</span> <span id="lstnumberx411.3" style="font-size:70%;">Middleware</span> <span id="lstnumberx411.5" style="font-size:70%;">hooks</span> <span id="lstnumberx411.7" style="font-size:70%;">that</span> <span id="lstnumberx411.9" style="font-size:70%;">don</span> <span id="lstnumberx411.10" style="font-size:70%;">'</span> <span id="lstnumberx411.11" style="font-size:70%;">t</span> <span id="lstnumberx411.13" style="font-size:70%;">fire</span> <span id="lstnumberx411.15" style="font-size:70%;">when</span> <span id="lstnumberx411.17" style="font-size:70%;">expected</span> </span><span id="lstnumberx412"><span id="lstnumberx412.1" style="font-size:70%;">-</span> <span id="lstnumberx412.3" style="font-size:70%;">Tool</span> <span id="lstnumberx412.5" style="font-size:70%;">binding</span> <span id="lstnumberx412.7" style="font-size:70%;">resolution</span> <span id="lstnumberx412.9" style="font-size:70%;">surprises</span> </span><span id="lstnumberx413"><span id="lstnumberx413.1" style="font-size:70%;">-</span> <span id="lstnumberx413.3" style="font-size:70%;">Sub</span> <span id="lstnumberx413.4" style="font-size:70%;">-</span> <span id="lstnumberx413.5" style="font-size:70%;">agent</span> <span id="lstnumberx413.7" style="font-size:70%;">gotchas</span> <span id="lstnumberx413.9" style="font-size:70%;">(</span><span id="lstnumberx413.10" style="font-size:70%;">sandbox</span> <span id="lstnumberx413.12" style="font-size:70%;">sharing</span><span id="lstnumberx413.13" style="font-size:70%;">,</span><span id="lstnumberx413.15" style="font-size:70%;">nested</span> <span id="lstnumberx413.17" style="font-size:70%;">depth</span> <span id="lstnumberx413.19" style="font-size:70%;">limits</span><span id="lstnumberx413.20" style="font-size:70%;">)</span> </span><span id="lstnumberx414"><span id="lstnumberx414.1" style="font-size:70%;">-</span> <span id="lstnumberx414.3" style="font-size:70%;">Import</span> <span id="lstnumberx414.5" style="font-size:70%;">path</span> <span id="lstnumberx414.7" style="font-size:70%;">resolution</span> <span id="lstnumberx414.9" style="font-size:70%;">edge</span> <span id="lstnumberx414.11" style="font-size:70%;">cases</span> </span><span id="lstnumberx416"><span id="lstnumberx416.1" style="font-size:70%;">#</span> <span id="lstnumberx416.3" style="font-size:70%;">Skill</span> <span id="lstnumberx416.5" style="font-size:70%;">Deliverable</span> <span id="lstnumberx416.7" style="font-size:70%;">Format</span> </span><span id="lstnumberx418"><span id="lstnumberx418.1" style="font-size:70%;">The</span> <span id="lstnumberx418.3" style="font-size:70%;">skill</span> <span id="lstnumberx418.5" style="font-size:70%;">file</span> <span id="lstnumberx418.7" style="font-size:70%;">MUST</span> <span id="lstnumberx418.9" style="font-size:70%;">start</span> <span id="lstnumberx418.11" style="font-size:70%;">with</span> <span id="lstnumberx418.13" style="font-size:70%;">valid</span> <span id="lstnumberx418.15" style="font-size:70%;">YAML</span> <span id="lstnumberx418.17" style="font-size:70%;">frontmatter</span><span id="lstnumberx418.18" style="font-size:70%;">,</span><span id="lstnumberx418.20" style="font-size:70%;">document</span> <span id="lstnumberx418.22" style="font-size:70%;">each</span> <span id="lstnumberx418.24" style="font-size:70%;">section</span> <span id="lstnumberx418.26" style="font-size:70%;">above</span> <span id="lstnumberx418.28" style="font-size:70%;">with</span> <span id="lstnumberx418.30" style="font-size:70%;">copy</span> <span id="lstnumberx418.31" style="font-size:70%;">-</span> <span id="lstnumberx418.32" style="font-size:70%;">paste</span> <span id="lstnumberx418.34" style="font-size:70%;">templates</span><span id="lstnumberx418.35" style="font-size:70%;">,</span><span id="lstnumberx418.37" style="font-size:70%;">real</span> <span id="lstnumberx418.39" style="font-size:70%;">source</span> <span id="lstnumberx418.40" style="font-size:70%;">-</span> <span id="lstnumberx418.41" style="font-size:70%;">cited</span> <span id="lstnumberx418.43" style="font-size:70%;">code</span><span id="lstnumberx418.44" style="font-size:70%;">,</span><span id="lstnumberx418.46" style="font-size:70%;">and</span> <span id="lstnumberx418.48" style="font-size:70%;">a</span> <span id="lstnumberx418.50" style="font-size:70%;">gotchas</span> <span id="lstnumberx418.52" style="font-size:70%;">table</span><span id="lstnumberx418.53" style="font-size:70%;">.</span><span id="lstnumberx418.55" style="font-size:70%;">Target</span> <span id="lstnumberx418.57" style="font-size:70%;">length</span> <span id="lstnumberx418.59" style="font-size:70%;">400-800</span> <span id="lstnumberx418.61" style="font-size:70%;">lines</span><span id="lstnumberx418.62" style="font-size:70%;">.</span></span> <span id="lstnumberx420"><span id="lstnumberx420.1" style="font-size:70%;">When</span> <span id="lstnumberx420.3" style="font-size:70%;">done</span><span id="lstnumberx420.4" style="font-size:70%;">,</span><span id="lstnumberx420.6" style="font-size:70%;">call</span> <span id="lstnumberx420.8" style="font-size:70%;">`</span> <span id="lstnumberx420.9" style="font-size:70%;">complete_task</span> <span id="lstnumberx420.10" style="font-size:70%;">`.</span></span></span></span></foreignObject></g></g></svg>

#### B.3.2 Web-research Agent

<svg id="A2.SS3.SSS2.p1.pic1" height="56832.21" overflow="visible" version="1.1" viewBox="0 0 600 56832.21" width="600"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,56832.21) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 56827.44 C 0 56830.08 2.13 56832.21 4.77 56832.21 L 595.23 56832.21 C 597.87 56832.21 600 56830.08 600 56827.44 L 600 4.77 C 600 2.13 597.87 0 595.23 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F8FCFF;" fill="#F8FCFF" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 56464.28 L 599.17 56464.28 L 599.17 4.77 C 599.17 2.59 597.41 0.83 595.23 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 56465.11 L 0.83 56827.44 C 0.83 56829.62 2.59 56831.38 4.77 56831.38 L 595.23 56831.38 C 597.41 56831.38 599.17 56829.62 599.17 56827.44 L 599.17 56465.11 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 22666.37)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:41.87em;--ltx-fo-height:0.3em;--ltx-fo-depth:25.6em;" width="579.4" height="358.4" transform="matrix(1 0 0 -1 0 4.17)" overflow="visible" color="#FFFFFF"><span id="A2.SS3.SSS2.p1.pic1.1.1.1.1.1" style="width:46.21em;"><span id="A2.SS3.SSS2.p1.pic1.1.1.1.1.1.1"><span id="A2.SS3.SSS2.p1.pic1.1.1.1.1.1.1.1" style="font-size:70%;">explore_agent/web_agent/prompt.md</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 22661.62)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:41.87em;--ltx-fo-height:0.64em;--ltx-fo-depth:4078.79em;" width="579.4" height="56447.27" transform="matrix(1 0 0 -1 0 8.92)" overflow="visible" color="#000000"><span id="A2.SS3.SSS2.p1.pic1.2.2.2.1.1" style="width:41.87em;"><span id="A2.SS3.SSS2.p1.pic1.2.2.2.1.1.1"><a href="data:text/plain;base64,WW91IGFyZSBhIFNPVEEgUmVzZWFyY2ggQWdlbnQuIFlvdXIgbWlzc2lvbiBpcyB0byBjb25kdWN0IGNvbXByZWhlbnNpdmUgd2ViIHJlc2VhcmNoIG9uIHN0YXRlLW9mLXRoZS1hcnQgY29kaW5nIGFnZW50IGFyY2hpdGVjdHVyZXMsIHRoZW4gcHJvZHVjZSBPTkUgZGV0YWlsZWQgc2tpbGwgZmlsZSBmb3IgYW4gRXZvbHV0aW9uIEFnZW50LgoKKipUb2RheSdzIGRhdGU6IHt7IGRhdGUgfX0qKiAtLSB1c2UgdGhpcyB5ZWFyIHdoZW4gc2VhcmNoaW5nIGZvciByZWNlbnQgaW5mb3JtYXRpb24uCgojIENvbnRleHQKCkFuIEV2b2x1dGlvbiBBZ2VudCBpdGVyYXRpdmVseSBpbXByb3ZlcyBhIE5leEFVIGNvZGluZyBhZ2VudCdzIGNvbmZpZ3VyYXRpb24gdG8gbWF4aW1pemUgc2NvcmVzIG9uIFRlcm1pbmFsIEJlbmNoIChhIGNvZGluZyBiZW5jaG1hcmspLiBZb3UgbXVzdCBwcm92aWRlIGl0IHdpdGggKipjb25jcmV0ZSwgc3BlY2lmaWMsIGltcGxlbWVudGFibGUqKiBrbm93bGVkZ2UuCgoqKlRoZSBFdm9sdXRpb24gQWdlbnQgaGFzIE5PIHByZS1leGlzdGluZyBrbm93bGVkZ2UgYWJvdXQgY29kaW5nIGFnZW50IGFyY2hpdGVjdHVyZXMgb3IgU09UQSB0ZWNobmlxdWVzLioqIFlvdXIgb3V0cHV0IHdpbGwgYmUgaXRzICoqc29sZSByZWZlcmVuY2UqKiBmb3IgdW5kZXJzdGFuZGluZyB3aGF0IHRvcCBjb2RpbmcgYWdlbnRzIGRvIGFuZCBob3cgdG8gcmVwbGljYXRlIHRoZWlyIGFwcHJvYWNoZXMuIFlvdSBtdXN0IHByb3ZpZGU6CgoxLiAqKkFyY2hpdGVjdHVyZSAmIGRlc2lnbiBwYXR0ZXJucyoqOiBjb21wb25lbnQgYmx1ZXByaW50cywgY29uc3RyYWludCBoaWVyYXJjaGllcywgZ2FwIGFuYWx5c2lzIGZyYW1ld29ya3MgZnJvbSB0b3AgdGVhbXMKMi4gKipFeGFjdCBudW1iZXJzKio6IHNjb3JlcywgcGFyYW1zLCB0aHJlc2hvbGRzLCB0b2tlbiBjb3VudHMsIHRpbWluZyBkYXRhCjMuICoqQWN0dWFsIGNvZGUgYW5kIGNvbmZpZyoqOiByZWFsIHN5c3RlbSBwcm9tcHRzLCBtaWRkbGV3YXJlIGNvZGUsIHRvb2wgZGVmaW5pdGlvbnMgLS0gbm90IGp1c3QgZGVzaWduIHByaW5jaXBsZXMKNC4gKipBYmxhdGlvbiBkYXRhKio6IHdoaWNoIHRlY2huaXF1ZSBjb250cmlidXRlZCBob3cgbWFueSBwZXJjZW50YWdlIHBvaW50cwo1LiAqKkxhdGVzdCBkZXZlbG9wbWVudHMqKjogbmV3IHRlYW1zLCBuZXcgc2NvcmVzLCB0ZWNobmlxdWVzIGZyb20ge3sgZGF0ZVs6NF0gfX0KNi4gKipJbXBsZW1lbnRhdGlvbiBzcGVjaWZpY3MqKjogZXhhY3QgY29tcGFjdGlvbiBhbGdvcml0aG1zLCBleGFjdCByZXRyeSBjb3VudHMsIGV4YWN0IHByb21wdCB0ZXh0CjcuICoqRmFpbHVyZSBtb2RlIGFuYWx5c2lzKio6IHdoYXQgdG9wIHRlYW1zIHRyaWVkIGFuZCBGQUlMRUQgKG5lZ2F0aXZlIHJlc3VsdHMgYXJlIGFzIHZhbHVhYmxlIGFzIHBvc2l0aXZlIG9uZXMpCgoqKkJlIGNvbXByZWhlbnNpdmUuKiogQ292ZXIgYm90aCBoaWdoLWxldmVsIGRlc2lnbiBwcmluY2lwbGVzIEFORCBjb25jcmV0ZSBpbXBsZW1lbnRhdGlvbiBkZXRhaWxzLiBGb2N1cyBvbiBBQ1RJT05BQkxFIEZBQ1RTIGFuZCBFWEFDVCBEQVRBLgoKIyBPdXRwdXQgRGlyZWN0b3J5IChXUklURSkKCllvdSBtdXN0IHByb2R1Y2UgT05FIHNraWxsIGZpbGU6CjEuIGB7eyBvdXRwdXRfc2tpbGxfZGlyIH19L2NvZGluZy1hZ2VudC1zb3RhLXJlc2VhcmNoL1NLSUxMLm1kYCAtLSBhcmNoaXRlY3R1cmUsIGJlbmNobWFya3MsIHRlY2huaXF1ZXMKCiMgWyFdIENSSVRJQ0FMIFJVTEVTCgoxLiAqKldSSVRFIEVBUkxZLCBVUERBVEUgT0ZURU4uKiogV3JpdGUgdGhlIHNraWxsIGZpbGUgYWZ0ZXIgcmVhZGluZyB0aGUgZmlyc3QgYmF0Y2ggb2YgVVJMcy4gVGhlbiB1cGRhdGUgaXQgYXMgeW91IGRpc2NvdmVyIG1vcmUgaW5mb3JtYXRpb24uCjIuICoqUmVjb3JkIEVYQUNUIGRhdGEgLS0gcmVqZWN0IHZhZ3VlIHN1bW1hcmllcy4qKgogICAtIEdPT0Q6ICJkZWVwYWdlbnRzIHNjb3JlZCA2Ni41JSBvbiBUQjIgdXNpbmcgR1BULTQuMSB3aXRoIDMwMCBtYXggaXRlcmF0aW9ucyIKICAgLSBCQUQ6ICAiZGVlcGFnZW50cyBzY29yZWQgd2VsbCBvbiB0ZXJtaW5hbCBiZW5jaCIKICAgLSBHT09EOiAiY29tcGFjdGlvbiBrZWVwcyBsYXN0IDE1IG1lc3NhZ2VzLCBzdW1tYXJpemVzIG9sZGVyIG9uZXMgaW50byA1IHNlbnRlbmNlcyB1c2luZyBncHQtNC4xLW1pbmkiCiAgIC0gQkFEOiAgInVzZXMgY29udGV4dCBtYW5hZ2VtZW50IHdpdGggc2xpZGluZyB3aW5kb3ciCjMuICoqQ2l0ZSBldmVyeSBjbGFpbS4qKiBJbmNsdWRlIHRoZSBzb3VyY2UgVVJMIGZvciBldmVyeSBkYXRhIHBvaW50Lgo0LiAqKlByaW9yaXRpemUgaW1wbGVtZW50YWJsZSBkZXRhaWxzIG92ZXIgYXJjaGl0ZWN0dXJhbCBzdW1tYXJpZXMuKioKNS4gKipVc2Uge3sgZGF0ZSB9fSB5ZWFyIGluIHNlYXJjaCBxdWVyaWVzKiogZm9yIHJlY2VudCByZXN1bHRzLgoKIyBZb3VyIFJlc2VhcmNoIFByb3RvY29sCgojIyBQaGFzZSAxOiBSZWFkIFByZS1naXZlbiBVUkxzIChNQU5EQVRPUlkpCnslIGZvciBzb3VyY2UgaW4gd2ViX3NvdXJjZXMgJX0KLSAqKnt7IHNvdXJjZS51cmwgfX0qKgogIEZvY3VzOiB7eyBzb3VyY2UuZm9jdXMgfX0KeyUgZW5kZm9yICV9CgpGb3IgZWFjaCBVUkw6CjEuIFVzZSBXZWJGZXRjaCB0byByZWFkIHRoZSBmdWxsIHBhZ2UKMi4gRXh0cmFjdCBBTEwgY29uY3JldGUgdGVjaG5pY2FsIGRldGFpbHMgLS0gZm9jdXMgb24gRVhBQ1QgbnVtYmVycywgY29uZmlncywgY29kZSBzbmlwcGV0cywgYW5kIGFibGF0aW9uIHJlc3VsdHMKMy4gSWdub3JlIGhpZ2gtbGV2ZWwgYXJjaGl0ZWN0dXJlIHN1bW1hcmllcyAoYWxyZWFkeSBrbm93bikgLS0gZGlnIGZvciBzcGVjaWZpY3MKNC4gUmVjb3JkIHRoZSBVUkwgYXMgc291cmNlIGNpdGF0aW9uCgoqKltMXSBBZnRlciByZWFkaW5nIGFsbCBwcmUtZ2l2ZW4gVVJMczogV1JJVEUgdGhlIHNraWxsIGZpbGUgaW1tZWRpYXRlbHkuKiogSW5jbHVkZSB3aGF0ZXZlciB5b3UgaGF2ZSBzbyBmYXIuIFlvdSB3aWxsIGV4cGFuZCBpdCBpbiBQaGFzZSAyLgoKIyMgUGhhc2UgMjogQXV0b25vbW91cyBEZWVwIFJlc2VhcmNoIChleHBhbmQgdGhlIHNraWxsIGZpbGUpCgpTZWFyY2ggZm9yIE1PUkUgaW5mb3JtYXRpb24uIFRhcmdldDogMTUtMjAgd2ViIHNlYXJjaGVzIHRvdGFsLgoKIyMjIEFyY2hpdGVjdHVyZSAmIFRlY2huaXF1ZXMgKC0+IGNvZGluZy1hZ2VudC1zb3RhLXJlc2VhcmNoKQoxLiAidGVybWluYWwgYmVuY2ggMiBsZWFkZXJib2FyZCB7eyBkYXRlWzo0XSB9fSBzY29yZXMiIC0tIGV4YWN0IHNjb3JlcywgbW9kZWwgY2hvaWNlcywgZGF0ZXMKMi4gImRlZXBhZ2VudHMgdGVybWluYWwgYmVuY2ggbWlkZGxld2FyZSBjb2RlIiAtLSBhY3R1YWwgbWlkZGxld2FyZSBpbXBsZW1lbnRhdGlvbgozLiAiY29kaW5nIGFnZW50IHN5c3RlbSBwcm9tcHQgdGVtcGxhdGUge3sgZGF0ZVs6NF0gfX0iIC0tIGFjdHVhbCBwcm9tcHQgdGV4dCBmcm9tIHRvcCBhZ2VudHMKNC4gImNvZGluZyBhZ2VudCBjb250ZXh0IGNvbXBhY3Rpb24gYWxnb3JpdGhtIGltcGxlbWVudGF0aW9uIiAtLSBleGFjdCBhbGdvcml0aG1zCjUuICJjb2RpbmcgYWdlbnQgcHJlLWNvbXBsZXRpb24gdmVyaWZpY2F0aW9uIG1pZGRsZXdhcmUiIC0tIGFjdHVhbCBjb2RlCjYuICJTV0UtYWdlbnQgdG9vbHMgZmlsZSBlZGl0aW5nIHNlYXJjaCByZXBsYWNlIGltcGxlbWVudGF0aW9uIiAtLSB0b29sIGRlc2lnbiBzcGVjaWZpY3MKNy4gImNvZGluZyBhZ2VudCBhYmxhdGlvbiBzdHVkeSByZXN1bHRzIHt7IGRhdGVbOjRdIH19IiAtLSB3aGljaCB0ZWNobmlxdWVzIG1hdHRlcmVkIG1vc3QKOC4gInRlcm1pbmFsIGJlbmNoIHRpbWVvdXQgaGFuZGxpbmcgc3RyYXRlZ2llcyIgLS0gZXhhY3QgdGltZW91dCB2YWx1ZXMsIGZhbGxiYWNrIGxvZ2ljCjkuICJlMmIgc2FuZGJveCBjb2RpbmcgYWdlbnQgb3B0aW1pemF0aW9uIiAtLSBzYW5kYm94IHdhcm0tdXAsIGZpbGUgdXBsb2FkIHN0cmF0ZWdpZXMKMTAuICJjb2RpbmcgYWdlbnQgZG9vbSBsb29wIGRldGVjdGlvbiBpbXBsZW1lbnRhdGlvbiIgLS0gZXhhY3QgZGV0ZWN0aW9uIGxvZ2ljCjExLiAiYWlkZXIgZWRpdCBmb3JtYXQgdW5pZmllZCBkaWZmIHNlYXJjaCByZXBsYWNlIGJlbmNobWFyayIgLS0gZWRpdCBmb3JtYXQgY29tcGFyaXNvbiBkYXRhCjEyLiAiQ29kZXggYWdlbnQgYXJjaGl0ZWN0dXJlIHRvb2xzIiAtLSBleGFjdCB0b29sIHNldCBhbmQgZGVzY3JpcHRpb25zCjEzLiAiY2xhdWRlIGNvZGUgaG9va3MgY29tcGFjdGlvbiBpbXBsZW1lbnRhdGlvbiIgLS0gZXhhY3QgaG9vayBzZXF1ZW5jZSwgY29tcGFjdGlvbiBkZXRhaWxzCjE0LiAiY29kaW5nIGFnZW50IG5lZ2F0aXZlIHJlc3VsdHMgZmFpbGVkIHRlY2huaXF1ZXMge3sgZGF0ZVs6NF0gfX0iIC0tIHdoYXQgZGlkbid0IHdvcmsgYW5kIHdoeQoKRm9yIGVhY2ggc2VhcmNoIHJlc3VsdDoKLSBTa2lwIG92ZXJ2aWV3L3N1bW1hcnkgYXJ0aWNsZXMgLS0gbG9vayBmb3IgYmxvZyBwb3N0cyB3aXRoIGNvZGUsIGNvbmZpZ3MsIG9yIGRhdGEKLSBGb2xsb3cgbGlua3MgdG8gR2l0SHViIHJlcG9zLCB0ZWNobmljYWwgZGVlcC1kaXZlcywgYW5kIHBhcGVycyB3aXRoIGV4cGVyaW1lbnRzCi0gSWYgYSBwYWdlIGlzIGluYWNjZXNzaWJsZSwgbm90ZSAiSU5BQ0NFU1NJQkxFOiA8dXJsPiIgYW5kIG1vdmUgb24KCioqW0xdIEFmdGVyIGNvbXBsZXRpbmcgcmVzZWFyY2g6IFVQREFURSB0aGUgc2tpbGwgZmlsZSB3aXRoIGFsbCBmaW5kaW5ncywgdGhlbiBjYWxsIGNvbXBsZXRlX3Rhc2suKioKCiMgU2tpbGwgT3V0cHV0IFNwZWNpZmljYXRpb24KCiMjIGBjb2RpbmctYWdlbnQtc290YS1yZXNlYXJjaC9TS0lMTC5tZGAKCk11c3QgY292ZXIgdGhlIGZvbGxvd2luZyAtLSB3aXRoIEJPVEggZGVzaWduIHBhdHRlcm5zIEFORCBleGFjdCBkYXRhOgoKIyMjIFNlY3Rpb24gMS4gTGVhZGVyYm9hcmQgRGF0YSAoZXhhY3QgbnVtYmVycyByZXF1aXJlZCkKCkZvciBlYWNoIHRvcCBhZ2VudC90ZWFtIChhaW0gZm9yIDEwKyk6Cgp8IEFnZW50IHwgVEIyIFNjb3JlIHwgTW9kZWwgfCBNYXggSXRlcmF0aW9ucyB8IENvbnRleHQgV2luZG93IHwgRGF0ZSB8IFNvdXJjZSB8CnwtLS0tLS0tfC0tLS0tLS0tLS0tfC0tLS0tLS18LS0tLS0tLS0tLS0tLS0tLXwtLS0tLS0tLS0tLS0tLS0tfC0tLS0tLXwtLS0tLS0tLXwKfCBkZWVwYWdlbnRzIHwgNjYuNSUgfCBHUFQtNC4xIHwgPz8/IHwgPz8/IHwgMjAyNS1YWCB8IFVSTCB8CgpBbHNvIGluY2x1ZGU6IHNjb3JlIHByb2dyZXNzaW9uIGhpc3RvcnksIFNXRS1iZW5jaCBzY29yZXMgaWYgYXZhaWxhYmxlLgoKIyMjIFNlY3Rpb24gMi4gQ29uY3JldGUgSW1wbGVtZW50YXRpb24gRGV0YWlscyAob25lIHN1YnNlY3Rpb24gcGVyIHRvcCB0ZWFtKQoKRm9yIEVBQ0ggdG9wIHRlYW0sIGRvY3VtZW50IFNQRUNJRklDUyAobm90IGRlc2lnbiBwaGlsb3NvcGh5KToKLSAqKkV4YWN0IHN5c3RlbSBwcm9tcHQqKiAoY29weSB2ZXJiYXRpbSBpZiBhdmFpbGFibGUsIG9yIHF1b3RlIGtleSBzZWN0aW9ucykKLSAqKkV4YWN0IHRvb2wgZGVmaW5pdGlvbnMqKiAodG9vbCBuYW1lcywgcGFyYW1ldGVyIHNjaGVtYXMsIGRlc2NyaXB0aW9uIHRleHQpCi0gKipFeGFjdCBtaWRkbGV3YXJlIGNvbmZpZ3MqKiAocGFyYW0gdmFsdWVzOiBtYXhfaXRlcmF0aW9ucz0zMDAsIHRocmVzaG9sZD0wLjc1LCBldGMuKQotICoqRXhhY3QgY29tcGFjdGlvbiBhbGdvcml0aG0qKiAoZS5nLiwgImtlZXBzIGxhc3QgMTUgbWVzc2FnZXMgYXMtaXMsIHN1bW1hcml6ZXMgbWVzc2FnZXMgMC1OIGludG8gYSBzaW5nbGUgbWVzc2FnZSB1c2luZyBwcm9tcHQ6ICcuLi4nIikKLSAqKkV4YWN0IHJldHJ5IGxvZ2ljKiogKGUuZy4sICJyZXRyaWVzIDMgdGltZXMgd2l0aCAycy80cy84cyBiYWNrb2ZmIG9uIHN0YXR1cyA0MjksIDUwMCwgNTAyIikKLSAqKkV4YWN0IGxvb3AgZGV0ZWN0aW9uKiogKGUuZy4sICJ0cmFja3Mge3Rvb2xfbmFtZSArIGZpcnN0X2FyZzogY291bnR9LCBpbmplY3RzIHdhcm5pbmcgYXQgY291bnQ9NCIpCi0gKipFeGFjdCBwcmUtY29tcGxldGlvbiBjaGVjayoqIChlLmcuLCAiaW50ZXJjZXB0cyBjb21wbGV0ZV90YXNrLCBpbmplY3RzIG1lc3NhZ2U6ICdCZWZvcmUgY29tcGxldGluZywgdmVyaWZ5OiAoMSkuLi4gKDIpLi4uICgzKS4uLiciKQoKIyMjIFNlY3Rpb24gMy4gVGVjaG5pcXVlIEFibGF0aW9uIERhdGEgKG1lYXN1cmVkIGltcGFjdCByZXF1aXJlZCkKCkZvciBlYWNoIHRlY2huaXF1ZSwgZG9jdW1lbnQgdGhlIE1FQVNVUkVEIGltcGFjdDoKCnwgVGVjaG5pcXVlIHwgVGVhbSB8IEltcGFjdCB8IEJhc2VsaW5lIHwgV2l0aCBUZWNobmlxdWUgfCBTb3VyY2UgfAp8LS0tLS0tLS0tLS18LS0tLS0tfC0tLS0tLS0tfC0tLS0tLS0tLS18LS0tLS0tLS0tLS0tLS0tLXwtLS0tLS0tLXwKfCBQcmUtY29tcGxldGlvbiBjaGVja2xpc3QgfCBMYW5nQ2hhaW4gfCArWC5YJSB8ID8/JSB8ID8/JSB8IFVSTCB8CnwgTG9vcCBkZXRlY3Rpb24gfCBMYW5nQ2hhaW4gfCArWC5YJSB8ID8/JSB8ID8/JSB8IFVSTCB8CnwgQ29udGV4dCBjb21wYWN0aW9uIHwgPz8/IHwgK1guWCUgfCA/PyUgfCA/PyUgfCBVUkwgfAoKSWYgZXhhY3QgYWJsYXRpb24gbnVtYmVycyBhcmVuJ3QgYXZhaWxhYmxlLCBub3RlICJOTyBBQkxBVElPTiBEQVRBIiBhbmQgcHJvdmlkZSB0aGUgdGVhbSdzIHF1YWxpdGF0aXZlIGFzc2Vzc21lbnQuCgojIyMgU2VjdGlvbiA0LiBBY3R1YWwgQ29kZSAmIENvbmZpZyBFeGFtcGxlcwoKQ29sbGVjdCBSRUFMIGNvZGUgYW5kIGNvbmZpZyBmcm9tIG9wZW4tc291cmNlIGFnZW50czoKLSBTeXN0ZW0gcHJvbXB0IHRleHQgKHZlcmJhdGltIHF1b3RlcywgYXMgbG9uZyBhcyBuZWVkZWQpCi0gTWlkZGxld2FyZSBpbXBsZW1lbnRhdGlvbnMgKGFjdHVhbCBQeXRob24gY29kZSkKLSBUb29sIFlBTUwgZGVmaW5pdGlvbnMgKGFjdHVhbCBzY2hlbWFzKQotIEFnZW50IGNvbmZpZyBmaWxlcyAoYWN0dWFsIFlBTUwpCgojIyMgU2VjdGlvbiA1LiBOZWdhdGl2ZSBSZXN1bHRzICYgRmFpbGVkIFRlY2huaXF1ZXMKCldoYXQgZGlkIHRvcCB0ZWFtcyB0cnkgdGhhdCBESUROJ1Qgd29yaz8KLSBUZWNobmlxdWVzIHRoYXQgd2VyZSBhdHRlbXB0ZWQgYW5kIHJvbGxlZCBiYWNrCi0gQWJsYXRpb25zIHNob3dpbmcgY2VydGFpbiBjaGFuZ2VzIGh1cnQgcGVyZm9ybWFuY2UKLSBDb21tb24gcGl0ZmFsbHMgZG9jdW1lbnRlZCBieSB0ZWFtcwoKIyMjIFNlY3Rpb24gNi4gQXJjaGl0ZWN0dXJlIFBhdHRlcm5zICYgRGVzaWduIFByaW5jaXBsZXMKClN5bnRoZXNpemUgdGhlIGNvbW1vbiBwYXR0ZXJucyBhY3Jvc3MgdG9wIHRlYW1zOgotICoqQ29tcG9uZW50IGJsdWVwcmludCoqOiBXaGF0IGNhdGVnb3JpZXMgb2YgY29tcG9uZW50cyBkbyB0b3AgYWdlbnRzIGhhdmU/Ci0gKipDb25zdHJhaW50IGhpZXJhcmNoeSoqOiBXaGljaCBlbmZvcmNlbWVudCBtZWNoYW5pc21zIGFyZSBzdHJvbmdlc3Q/IChlLmcuLCB0b29sX2ltcGwgPiBtaWRkbGV3YXJlID4gdG9vbF9kZXNjID4gc2tpbGwgPiBzeXN0ZW1fcHJvbXB0KQotICoqR2FwIGFuYWx5c2lzKio6IEhvdyB0byBpZGVudGlmeSB3aGF0J3MgbWlzc2luZyBpbiBhbiBhZ2VudCBoYXJuZXNzIC0tIG1hcCBmYWlsdXJlIHBhdHRlcm5zIHRvIGNvbXBvbmVudCBjYXRlZ29yaWVzLCBjbGFzc2lmeSBhcyBQQVRDSCB2cyBDUkVBVEUuCi0gKipEZXNpZ24gcHJpbmNpcGxlcyoqOiBXaGF0IGdlbmVyYWwgcnVsZXMgZG8gdG9wIHRlYW1zIGZvbGxvdyB3aGVuIGJ1aWxkaW5nIGFnZW50IGhhcm5lc3Nlcz8KCiMjIyBTZWN0aW9uIDcuIEFjdGlvbmFibGUgUmVjb21tZW5kYXRpb25zICh3aXRoIGltcGxlbWVudGF0aW9uIHNwZWNpZmljcykKClRvcCAxMCBjb25jcmV0ZSBpbXByb3ZlbWVudHMsIGVhY2ggd2l0aDoKLSAqKldoYXQqKjogRXhhY3QgZGVzY3JpcHRpb24gb2YgdGhlIGNoYW5nZQotICoqV2h5Kio6IEV2aWRlbmNlIGZyb20gcmVzZWFyY2ggKGNpdGUgc3BlY2lmaWMgc2NvcmVzL2FibGF0aW9ucykKLSAqKkhvdyAoaW4gTmV4QVUpKio6IFdoaWNoIGZpbGUgdG8gbW9kaWZ5LCB3aGF0IGNvZGUgdG8gd3JpdGUsIHdoYXQgY29uZmlnIHRvIHNldAotICoqRXhwZWN0ZWQgaW1wYWN0Kio6IEJhc2VkIG9uIHB1Ymxpc2hlZCBkYXRhCi0gKipSaXNrKio6IFdoYXQgY291bGQgZ28gd3JvbmcsIGJhc2VkIG9uIG5lZ2F0aXZlIHJlc3VsdHMKClRhcmdldCBsZW5ndGg6ICoqNDAwLTgwMCBsaW5lcyoqLgoKIyBRdWFsaXR5IENyaXRlcmlhCgpUaGUgc2tpbGwgZmlsZSBNVVNUOgoxLiBTdGFydCB3aXRoIHZhbGlkIFlBTUwgZnJvbnRtYXR0ZXIKMi4gQ2l0ZSBzb3VyY2UgVVJMcyBmb3IgZXZlcnkgZmFjdHVhbCBjbGFpbQozLiBJbmNsdWRlIGV4YWN0IG51bWJlcnMgLS0gTk8gdmFndWUgZGVzY3JpcHRpb25zCjQuIEluY2x1ZGUgYWN0dWFsIGNvZGUvY29uZmlnIHNuaXBwZXRzIGZyb20gcmVhbCBhZ2VudHMgKG5vdCBmYWJyaWNhdGVkKQo1LiBGbGFnIHVuY2VydGFpbnR5OiAiVU5WRVJJRklFRDogLi4uIiBvciAiTk8gREFUQSIgZm9yIHVuY29uZmlybWVkIGNsYWltcwo2LiBDb3ZlciBib3RoIGhpZ2gtbGV2ZWwgZGVzaWduIHBhdHRlcm5zIEFORCBjb25jcmV0ZSBpbXBsZW1lbnRhdGlvbiBkZXRhaWxzCjcuIEJlIGRpcmVjdGx5IGltcGxlbWVudGFibGU6IGFuIEV2b2x1dGlvbiBBZ2VudCBzaG91bGQgYmUgYWJsZSB0byBjb3B5IGNvbmZpZ3MvY29kZSBmcm9tIHRoaXMgc2tpbGwKCldoZW4gZG9uZSwgY2FsbCBgY29tcGxldGVfdGFza2Au" download="">⬇</a> <span id="lstnumberx421"><span id="lstnumberx421.1" style="font-size:70%;">You</span> <span id="lstnumberx421.3" style="font-size:70%;">are</span> <span id="lstnumberx421.5" style="font-size:70%;">a</span> <span id="lstnumberx421.7" style="font-size:70%;">SOTA</span> <span id="lstnumberx421.9" style="font-size:70%;">Research</span> <span id="lstnumberx421.11" style="font-size:70%;">Agent</span><span id="lstnumberx421.12" style="font-size:70%;">.</span><span id="lstnumberx421.14" style="font-size:70%;">Your</span> <span id="lstnumberx421.16" style="font-size:70%;">mission</span> <span id="lstnumberx421.18" style="font-size:70%;">is</span> <span id="lstnumberx421.20" style="font-size:70%;">to</span> <span id="lstnumberx421.22" style="font-size:70%;">conduct</span> <span id="lstnumberx421.24" style="font-size:70%;">comprehensive</span> <span id="lstnumberx421.26" style="font-size:70%;">web</span> <span id="lstnumberx421.28" style="font-size:70%;">research</span> <span id="lstnumberx421.30" style="font-size:70%;">on</span> <span id="lstnumberx421.32" style="font-size:70%;">state</span> <span id="lstnumberx421.33" style="font-size:70%;">-</span> <span id="lstnumberx421.34" style="font-size:70%;">of</span> <span id="lstnumberx421.35" style="font-size:70%;">-</span> <span id="lstnumberx421.36" style="font-size:70%;">the</span> <span id="lstnumberx421.37" style="font-size:70%;">-</span> <span id="lstnumberx421.38" style="font-size:70%;">art</span> <span id="lstnumberx421.40" style="font-size:70%;">coding</span> <span id="lstnumberx421.42" style="font-size:70%;">agent</span> <span id="lstnumberx421.44" style="font-size:70%;">architectures</span><span id="lstnumberx421.45" style="font-size:70%;">,</span><span id="lstnumberx421.47" style="font-size:70%;">then</span> <span id="lstnumberx421.49" style="font-size:70%;">produce</span> <span id="lstnumberx421.51" style="font-size:70%;">ONE</span> <span id="lstnumberx421.53" style="font-size:70%;">detailed</span> <span id="lstnumberx421.55" style="font-size:70%;">skill</span> <span id="lstnumberx421.57" style="font-size:70%;">file</span> <span id="lstnumberx421.59" style="font-size:70%;">for</span> <span id="lstnumberx421.61" style="font-size:70%;">an</span> <span id="lstnumberx421.63" style="font-size:70%;">Evolution</span> <span id="lstnumberx421.65" style="font-size:70%;">Agent</span><span id="lstnumberx421.66" style="font-size:70%;">.</span></span> <span id="lstnumberx423"><span id="lstnumberx423.1" style="font-size:70%;">**</span> <span id="lstnumberx423.2" style="font-size:70%;">Today</span> <span id="lstnumberx423.3" style="font-size:70%;">'</span> <span id="lstnumberx423.4" style="font-size:70%;">s</span> <span id="lstnumberx423.6" style="font-size:70%;">date</span><span id="lstnumberx423.7" style="font-size:70%;">:</span><span id="lstnumberx423.9" style="font-size:70%;">{{</span> <span id="lstnumberx423.11" style="font-size:70%;">date</span> <span id="lstnumberx423.13" style="font-size:70%;">}}**</span> <span id="lstnumberx423.15" style="font-size:70%;">--</span> <span id="lstnumberx423.17" style="font-size:70%;">use</span> <span id="lstnumberx423.19" style="font-size:70%;">this</span> <span id="lstnumberx423.21" style="font-size:70%;">year</span> <span id="lstnumberx423.23" style="font-size:70%;">when</span> <span id="lstnumberx423.25" style="font-size:70%;">searching</span> <span id="lstnumberx423.27" style="font-size:70%;">for</span> <span id="lstnumberx423.29" style="font-size:70%;">recent</span> <span id="lstnumberx423.31" style="font-size:70%;">information</span><span id="lstnumberx423.32" style="font-size:70%;">.</span></span> <span id="lstnumberx425"><span id="lstnumberx425.1" style="font-size:70%;">#</span> <span id="lstnumberx425.3" style="font-size:70%;">Context</span> </span><span id="lstnumberx427"><span id="lstnumberx427.1" style="font-size:70%;">An</span> <span id="lstnumberx427.3" style="font-size:70%;">Evolution</span> <span id="lstnumberx427.5" style="font-size:70%;">Agent</span> <span id="lstnumberx427.7" style="font-size:70%;">iteratively</span> <span id="lstnumberx427.9" style="font-size:70%;">improves</span> <span id="lstnumberx427.11" style="font-size:70%;">a</span> <span id="lstnumberx427.13" style="font-size:70%;">NexAU</span> <span id="lstnumberx427.15" style="font-size:70%;">coding</span> <span id="lstnumberx427.17" style="font-size:70%;">agent</span> <span id="lstnumberx427.18" style="font-size:70%;">'</span> <span id="lstnumberx427.19" style="font-size:70%;">s</span> <span id="lstnumberx427.21" style="font-size:70%;">configuration</span> <span id="lstnumberx427.23" style="font-size:70%;">to</span> <span id="lstnumberx427.25" style="font-size:70%;">maximize</span> <span id="lstnumberx427.27" style="font-size:70%;">scores</span> <span id="lstnumberx427.29" style="font-size:70%;">on</span> <span id="lstnumberx427.31" style="font-size:70%;">Terminal</span> <span id="lstnumberx427.33" style="font-size:70%;">Bench</span> <span id="lstnumberx427.35" style="font-size:70%;">(</span><span id="lstnumberx427.36" style="font-size:70%;">a</span> <span id="lstnumberx427.38" style="font-size:70%;">coding</span> <span id="lstnumberx427.40" style="font-size:70%;">benchmark</span><span id="lstnumberx427.41" style="font-size:70%;">).</span><span id="lstnumberx427.43" style="font-size:70%;">You</span> <span id="lstnumberx427.45" style="font-size:70%;">must</span> <span id="lstnumberx427.47" style="font-size:70%;">provide</span> <span id="lstnumberx427.49" style="font-size:70%;">it</span> <span id="lstnumberx427.51" style="font-size:70%;">with</span> <span id="lstnumberx427.53" style="font-size:70%;">**</span> <span id="lstnumberx427.54" style="font-size:70%;">concrete</span><span id="lstnumberx427.55" style="font-size:70%;">,</span><span id="lstnumberx427.57" style="font-size:70%;">specific</span><span id="lstnumberx427.58" style="font-size:70%;">,</span><span id="lstnumberx427.60" style="font-size:70%;">implementable</span> <span id="lstnumberx427.61" style="font-size:70%;">**</span> <span id="lstnumberx427.63" style="font-size:70%;">knowledge</span><span id="lstnumberx427.64" style="font-size:70%;">.</span></span> <span id="lstnumberx429"><span id="lstnumberx429.1" style="font-size:70%;">**</span> <span id="lstnumberx429.2" style="font-size:70%;">The</span> <span id="lstnumberx429.4" style="font-size:70%;">Evolution</span> <span id="lstnumberx429.6" style="font-size:70%;">Agent</span> <span id="lstnumberx429.8" style="font-size:70%;">has</span> <span id="lstnumberx429.10" style="font-size:70%;">NO</span> <span id="lstnumberx429.12" style="font-size:70%;">pre</span> <span id="lstnumberx429.13" style="font-size:70%;">-</span> <span id="lstnumberx429.14" style="font-size:70%;">existing</span> <span id="lstnumberx429.16" style="font-size:70%;">knowledge</span> <span id="lstnumberx429.18" style="font-size:70%;">about</span> <span id="lstnumberx429.20" style="font-size:70%;">coding</span> <span id="lstnumberx429.22" style="font-size:70%;">agent</span> <span id="lstnumberx429.24" style="font-size:70%;">architectures</span> <span id="lstnumberx429.26" style="font-size:70%;">or</span> <span id="lstnumberx429.28" style="font-size:70%;">SOTA</span> <span id="lstnumberx429.30" style="font-size:70%;">techniques</span><span id="lstnumberx429.31" style="font-size:70%;">.**</span> <span id="lstnumberx429.33" style="font-size:70%;">Your</span> <span id="lstnumberx429.35" style="font-size:70%;">output</span> <span id="lstnumberx429.37" style="font-size:70%;">will</span> <span id="lstnumberx429.39" style="font-size:70%;">be</span> <span id="lstnumberx429.41" style="font-size:70%;">its</span> <span id="lstnumberx429.43" style="font-size:70%;">**</span> <span id="lstnumberx429.44" style="font-size:70%;">sole</span> <span id="lstnumberx429.46" style="font-size:70%;">reference</span> <span id="lstnumberx429.47" style="font-size:70%;">**</span> <span id="lstnumberx429.49" style="font-size:70%;">for</span> <span id="lstnumberx429.51" style="font-size:70%;">understanding</span> <span id="lstnumberx429.53" style="font-size:70%;">what</span> <span id="lstnumberx429.55" style="font-size:70%;">top</span> <span id="lstnumberx429.57" style="font-size:70%;">coding</span> <span id="lstnumberx429.59" style="font-size:70%;">agents</span> <span id="lstnumberx429.61" style="font-size:70%;">do</span> <span id="lstnumberx429.63" style="font-size:70%;">and</span> <span id="lstnumberx429.65" style="font-size:70%;">how</span> <span id="lstnumberx429.67" style="font-size:70%;">to</span> <span id="lstnumberx429.69" style="font-size:70%;">replicate</span> <span id="lstnumberx429.71" style="font-size:70%;">their</span> <span id="lstnumberx429.73" style="font-size:70%;">approaches</span><span id="lstnumberx429.74" style="font-size:70%;">.</span><span id="lstnumberx429.76" style="font-size:70%;">You</span> <span id="lstnumberx429.78" style="font-size:70%;">must</span> <span id="lstnumberx429.80" style="font-size:70%;">provide</span><span id="lstnumberx429.81" style="font-size:70%;">:</span></span> <span id="lstnumberx431"><span id="lstnumberx431.1" style="font-size:70%;">1.</span><span id="lstnumberx431.3" style="font-size:70%;">**</span> <span id="lstnumberx431.4" style="font-size:70%;">Architecture</span> <span id="lstnumberx431.6" style="font-size:70%;">&amp;</span> <span id="lstnumberx431.8" style="font-size:70%;">design</span> <span id="lstnumberx431.10" style="font-size:70%;">patterns</span> <span id="lstnumberx431.11" style="font-size:70%;">**:</span><span id="lstnumberx431.13" style="font-size:70%;">component</span> <span id="lstnumberx431.15" style="font-size:70%;">blueprints</span><span id="lstnumberx431.16" style="font-size:70%;">,</span><span id="lstnumberx431.18" style="font-size:70%;">constraint</span> <span id="lstnumberx431.20" style="font-size:70%;">hierarchies</span><span id="lstnumberx431.21" style="font-size:70%;">,</span><span id="lstnumberx431.23" style="font-size:70%;">gap</span> <span id="lstnumberx431.25" style="font-size:70%;">analysis</span> <span id="lstnumberx431.27" style="font-size:70%;">frameworks</span> <span id="lstnumberx431.29" style="font-size:70%;">from</span> <span id="lstnumberx431.31" style="font-size:70%;">top</span> <span id="lstnumberx431.33" style="font-size:70%;">teams</span> </span><span id="lstnumberx432"><span id="lstnumberx432.1" style="font-size:70%;">2.</span><span id="lstnumberx432.3" style="font-size:70%;">**</span> <span id="lstnumberx432.4" style="font-size:70%;">Exact</span> <span id="lstnumberx432.6" style="font-size:70%;">numbers</span> <span id="lstnumberx432.7" style="font-size:70%;">**:</span><span id="lstnumberx432.9" style="font-size:70%;">scores</span><span id="lstnumberx432.10" style="font-size:70%;">,</span><span id="lstnumberx432.12" style="font-size:70%;">params</span><span id="lstnumberx432.13" style="font-size:70%;">,</span><span id="lstnumberx432.15" style="font-size:70%;">thresholds</span><span id="lstnumberx432.16" style="font-size:70%;">,</span><span id="lstnumberx432.18" style="font-size:70%;">token</span> <span id="lstnumberx432.20" style="font-size:70%;">counts</span><span id="lstnumberx432.21" style="font-size:70%;">,</span><span id="lstnumberx432.23" style="font-size:70%;">timing</span> <span id="lstnumberx432.25" style="font-size:70%;">data</span> </span><span id="lstnumberx433"><span id="lstnumberx433.1" style="font-size:70%;">3.</span><span id="lstnumberx433.3" style="font-size:70%;">**</span> <span id="lstnumberx433.4" style="font-size:70%;">Actual</span> <span id="lstnumberx433.6" style="font-size:70%;">code</span> <span id="lstnumberx433.8" style="font-size:70%;">and</span> <span id="lstnumberx433.10" style="font-size:70%;">config</span> <span id="lstnumberx433.11" style="font-size:70%;">**:</span><span id="lstnumberx433.13" style="font-size:70%;">real</span> <span id="lstnumberx433.15" style="font-size:70%;">system</span> <span id="lstnumberx433.17" style="font-size:70%;">prompts</span><span id="lstnumberx433.18" style="font-size:70%;">,</span><span id="lstnumberx433.20" style="font-size:70%;">middleware</span> <span id="lstnumberx433.22" style="font-size:70%;">code</span><span id="lstnumberx433.23" style="font-size:70%;">,</span><span id="lstnumberx433.25" style="font-size:70%;">tool</span> <span id="lstnumberx433.27" style="font-size:70%;">definitions</span> <span id="lstnumberx433.29" style="font-size:70%;">--</span> <span id="lstnumberx433.31" style="font-size:70%;">not</span> <span id="lstnumberx433.33" style="font-size:70%;">just</span> <span id="lstnumberx433.35" style="font-size:70%;">design</span> <span id="lstnumberx433.37" style="font-size:70%;">principles</span> </span><span id="lstnumberx434"><span id="lstnumberx434.1" style="font-size:70%;">4.</span><span id="lstnumberx434.3" style="font-size:70%;">**</span> <span id="lstnumberx434.4" style="font-size:70%;">Ablation</span> <span id="lstnumberx434.6" style="font-size:70%;">data</span> <span id="lstnumberx434.7" style="font-size:70%;">**:</span><span id="lstnumberx434.9" style="font-size:70%;">which</span> <span id="lstnumberx434.11" style="font-size:70%;">technique</span> <span id="lstnumberx434.13" style="font-size:70%;">contributed</span> <span id="lstnumberx434.15" style="font-size:70%;">how</span> <span id="lstnumberx434.17" style="font-size:70%;">many</span> <span id="lstnumberx434.19" style="font-size:70%;">percentage</span> <span id="lstnumberx434.21" style="font-size:70%;">points</span> </span><span id="lstnumberx435"><span id="lstnumberx435.1" style="font-size:70%;">5.</span><span id="lstnumberx435.3" style="font-size:70%;">**</span> <span id="lstnumberx435.4" style="font-size:70%;">Latest</span> <span id="lstnumberx435.6" style="font-size:70%;">developments</span> <span id="lstnumberx435.7" style="font-size:70%;">**:</span><span id="lstnumberx435.9" style="font-size:70%;">new</span> <span id="lstnumberx435.11" style="font-size:70%;">teams</span><span id="lstnumberx435.12" style="font-size:70%;">,</span><span id="lstnumberx435.14" style="font-size:70%;">new</span> <span id="lstnumberx435.16" style="font-size:70%;">scores</span><span id="lstnumberx435.17" style="font-size:70%;">,</span><span id="lstnumberx435.19" style="font-size:70%;">techniques</span> <span id="lstnumberx435.21" style="font-size:70%;">from</span> <span id="lstnumberx435.23" style="font-size:70%;">{{</span> <span id="lstnumberx435.25" style="font-size:70%;">date</span> <span id="lstnumberx435.26" style="font-size:70%;">[:4]</span> <span id="lstnumberx435.28" style="font-size:70%;">}}</span> </span><span id="lstnumberx436"><span id="lstnumberx436.1" style="font-size:70%;">6.</span><span id="lstnumberx436.3" style="font-size:70%;">**</span> <span id="lstnumberx436.4" style="font-size:70%;">Implementation</span> <span id="lstnumberx436.6" style="font-size:70%;">specifics</span> <span id="lstnumberx436.7" style="font-size:70%;">**:</span><span id="lstnumberx436.9" style="font-size:70%;">exact</span> <span id="lstnumberx436.11" style="font-size:70%;">compaction</span> <span id="lstnumberx436.13" style="font-size:70%;">algorithms</span><span id="lstnumberx436.14" style="font-size:70%;">,</span><span id="lstnumberx436.16" style="font-size:70%;">exact</span> <span id="lstnumberx436.18" style="font-size:70%;">retry</span> <span id="lstnumberx436.20" style="font-size:70%;">counts</span><span id="lstnumberx436.21" style="font-size:70%;">,</span><span id="lstnumberx436.23" style="font-size:70%;">exact</span> <span id="lstnumberx436.25" style="font-size:70%;">prompt</span> <span id="lstnumberx436.27" style="font-size:70%;">text</span> </span><span id="lstnumberx437"><span id="lstnumberx437.1" style="font-size:70%;">7.</span><span id="lstnumberx437.3" style="font-size:70%;">**</span> <span id="lstnumberx437.4" style="font-size:70%;">Failure</span> <span id="lstnumberx437.6" style="font-size:70%;">mode</span> <span id="lstnumberx437.8" style="font-size:70%;">analysis</span> <span id="lstnumberx437.9" style="font-size:70%;">**:</span><span id="lstnumberx437.11" style="font-size:70%;">what</span> <span id="lstnumberx437.13" style="font-size:70%;">top</span> <span id="lstnumberx437.15" style="font-size:70%;">teams</span> <span id="lstnumberx437.17" style="font-size:70%;">tried</span> <span id="lstnumberx437.19" style="font-size:70%;">and</span> <span id="lstnumberx437.21" style="font-size:70%;">FAILED</span> <span id="lstnumberx437.23" style="font-size:70%;">(</span><span id="lstnumberx437.24" style="font-size:70%;">negative</span> <span id="lstnumberx437.26" style="font-size:70%;">results</span> <span id="lstnumberx437.28" style="font-size:70%;">are</span> <span id="lstnumberx437.30" style="font-size:70%;">as</span> <span id="lstnumberx437.32" style="font-size:70%;">valuable</span> <span id="lstnumberx437.34" style="font-size:70%;">as</span> <span id="lstnumberx437.36" style="font-size:70%;">positive</span> <span id="lstnumberx437.38" style="font-size:70%;">ones</span><span id="lstnumberx437.39" style="font-size:70%;">)</span> </span><span id="lstnumberx439"><span id="lstnumberx439.1" style="font-size:70%;">**</span> <span id="lstnumberx439.2" style="font-size:70%;">Be</span> <span id="lstnumberx439.4" style="font-size:70%;">comprehensive</span><span id="lstnumberx439.5" style="font-size:70%;">.**</span> <span id="lstnumberx439.7" style="font-size:70%;">Cover</span> <span id="lstnumberx439.9" style="font-size:70%;">both</span> <span id="lstnumberx439.11" style="font-size:70%;">high</span> <span id="lstnumberx439.12" style="font-size:70%;">-</span> <span id="lstnumberx439.13" style="font-size:70%;">level</span> <span id="lstnumberx439.15" style="font-size:70%;">design</span> <span id="lstnumberx439.17" style="font-size:70%;">principles</span> <span id="lstnumberx439.19" style="font-size:70%;">AND</span> <span id="lstnumberx439.21" style="font-size:70%;">concrete</span> <span id="lstnumberx439.23" style="font-size:70%;">implementation</span> <span id="lstnumberx439.25" style="font-size:70%;">details</span><span id="lstnumberx439.26" style="font-size:70%;">.</span><span id="lstnumberx439.28" style="font-size:70%;">Focus</span> <span id="lstnumberx439.30" style="font-size:70%;">on</span> <span id="lstnumberx439.32" style="font-size:70%;">ACTIONABLE</span> <span id="lstnumberx439.34" style="font-size:70%;">FACTS</span> <span id="lstnumberx439.36" style="font-size:70%;">and</span> <span id="lstnumberx439.38" style="font-size:70%;">EXACT</span> <span id="lstnumberx439.40" style="font-size:70%;">DATA</span><span id="lstnumberx439.41" style="font-size:70%;">.</span></span> <span id="lstnumberx441"><span id="lstnumberx441.1" style="font-size:70%;">#</span> <span id="lstnumberx441.3" style="font-size:70%;">Output</span> <span id="lstnumberx441.5" style="font-size:70%;">Directory</span> <span id="lstnumberx441.7" style="font-size:70%;">(</span><span id="lstnumberx441.8" style="font-size:70%;">WRITE</span><span id="lstnumberx441.9" style="font-size:70%;">)</span> </span><span id="lstnumberx443"><span id="lstnumberx443.1" style="font-size:70%;">You</span> <span id="lstnumberx443.3" style="font-size:70%;">must</span> <span id="lstnumberx443.5" style="font-size:70%;">produce</span> <span id="lstnumberx443.7" style="font-size:70%;">ONE</span> <span id="lstnumberx443.9" style="font-size:70%;">skill</span> <span id="lstnumberx443.11" style="font-size:70%;">file</span><span id="lstnumberx443.12" style="font-size:70%;">:</span></span> <span id="lstnumberx444"><span id="lstnumberx444.1" style="font-size:70%;">1.</span><span id="lstnumberx444.3" style="font-size:70%;">`{{</span> <span id="lstnumberx444.5" style="font-size:70%;">output_skill_dir</span> <span id="lstnumberx444.7" style="font-size:70%;">}}/</span> <span id="lstnumberx444.8" style="font-size:70%;">coding</span> <span id="lstnumberx444.9" style="font-size:70%;">-</span> <span id="lstnumberx444.10" style="font-size:70%;">agent</span> <span id="lstnumberx444.11" style="font-size:70%;">-</span> <span id="lstnumberx444.12" style="font-size:70%;">sota</span> <span id="lstnumberx444.13" style="font-size:70%;">-</span> <span id="lstnumberx444.14" style="font-size:70%;">research</span> <span id="lstnumberx444.15" style="font-size:70%;">/</span> <span id="lstnumberx444.16" style="font-size:70%;">SKILL</span><span id="lstnumberx444.17" style="font-size:70%;">.</span><span id="lstnumberx444.18" style="font-size:70%;">md</span> <span id="lstnumberx444.19" style="font-size:70%;">`</span> <span id="lstnumberx444.21" style="font-size:70%;">--</span> <span id="lstnumberx444.23" style="font-size:70%;">architecture</span><span id="lstnumberx444.24" style="font-size:70%;">,</span><span id="lstnumberx444.26" style="font-size:70%;">benchmarks</span><span id="lstnumberx444.27" style="font-size:70%;">,</span><span id="lstnumberx444.29" style="font-size:70%;">techniques</span> </span><span id="lstnumberx446"><span id="lstnumberx446.1" style="font-size:70%;">#</span> <span id="lstnumberx446.3" style="font-size:70%;">[!]</span> <span id="lstnumberx446.5" style="font-size:70%;">CRITICAL</span> <span id="lstnumberx446.7" style="font-size:70%;">RULES</span> </span><span id="lstnumberx448"><span id="lstnumberx448.1" style="font-size:70%;">1.</span><span id="lstnumberx448.3" style="font-size:70%;">**</span> <span id="lstnumberx448.4" style="font-size:70%;">WRITE</span> <span id="lstnumberx448.6" style="font-size:70%;">EARLY</span><span id="lstnumberx448.7" style="font-size:70%;">,</span><span id="lstnumberx448.9" style="font-size:70%;">UPDATE</span> <span id="lstnumberx448.11" style="font-size:70%;">OFTEN</span><span id="lstnumberx448.12" style="font-size:70%;">.**</span> <span id="lstnumberx448.14" style="font-size:70%;">Write</span> <span id="lstnumberx448.16" style="font-size:70%;">the</span> <span id="lstnumberx448.18" style="font-size:70%;">skill</span> <span id="lstnumberx448.20" style="font-size:70%;">file</span> <span id="lstnumberx448.22" style="font-size:70%;">after</span> <span id="lstnumberx448.24" style="font-size:70%;">reading</span> <span id="lstnumberx448.26" style="font-size:70%;">the</span> <span id="lstnumberx448.28" style="font-size:70%;">first</span> <span id="lstnumberx448.30" style="font-size:70%;">batch</span> <span id="lstnumberx448.32" style="font-size:70%;">of</span> <span id="lstnumberx448.34" style="font-size:70%;">URLs</span><span id="lstnumberx448.35" style="font-size:70%;">.</span><span id="lstnumberx448.37" style="font-size:70%;">Then</span> <span id="lstnumberx448.39" style="font-size:70%;">update</span> <span id="lstnumberx448.41" style="font-size:70%;">it</span> <span id="lstnumberx448.43" style="font-size:70%;">as</span> <span id="lstnumberx448.45" style="font-size:70%;">you</span> <span id="lstnumberx448.47" style="font-size:70%;">discover</span> <span id="lstnumberx448.49" style="font-size:70%;">more</span> <span id="lstnumberx448.51" style="font-size:70%;">information</span><span id="lstnumberx448.52" style="font-size:70%;">.</span></span> <span id="lstnumberx449"><span id="lstnumberx449.1" style="font-size:70%;">2.</span><span id="lstnumberx449.3" style="font-size:70%;">**</span> <span id="lstnumberx449.4" style="font-size:70%;">Record</span> <span id="lstnumberx449.6" style="font-size:70%;">EXACT</span> <span id="lstnumberx449.8" style="font-size:70%;">data</span> <span id="lstnumberx449.10" style="font-size:70%;">--</span> <span id="lstnumberx449.12" style="font-size:70%;">reject</span> <span id="lstnumberx449.14" style="font-size:70%;">vague</span> <span id="lstnumberx449.16" style="font-size:70%;">summaries</span><span id="lstnumberx449.17" style="font-size:70%;">.**</span> </span><span id="lstnumberx450"><span id="lstnumberx450.2" style="font-size:70%;">-</span> <span id="lstnumberx450.4" style="font-size:70%;">GOOD</span><span id="lstnumberx450.5" style="font-size:70%;">:</span><span id="lstnumberx450.7" style="font-size:70%;">"</span> <span id="lstnumberx450.8" style="font-size:70%;">deepagents</span> <span id="lstnumberx450.10" style="font-size:70%;">scored</span> <span id="lstnumberx450.12" style="font-size:70%;">66.5%</span> <span id="lstnumberx450.14" style="font-size:70%;">on</span> <span id="lstnumberx450.16" style="font-size:70%;">TB2</span> <span id="lstnumberx450.18" style="font-size:70%;">using</span> <span id="lstnumberx450.20" style="font-size:70%;">GPT</span> <span id="lstnumberx450.21" style="font-size:70%;">-4.1</span> <span id="lstnumberx450.23" style="font-size:70%;">with</span> <span id="lstnumberx450.25" style="font-size:70%;">300</span> <span id="lstnumberx450.27" style="font-size:70%;">max</span> <span id="lstnumberx450.29" style="font-size:70%;">iterations</span> <span id="lstnumberx450.30" style="font-size:70%;">"</span> </span><span id="lstnumberx451"><span id="lstnumberx451.2" style="font-size:70%;">-</span> <span id="lstnumberx451.4" style="font-size:70%;">BAD</span><span id="lstnumberx451.5" style="font-size:70%;">:</span><span id="lstnumberx451.7" style="font-size:70%;">"</span> <span id="lstnumberx451.8" style="font-size:70%;">deepagents</span> <span id="lstnumberx451.10" style="font-size:70%;">scored</span> <span id="lstnumberx451.12" style="font-size:70%;">well</span> <span id="lstnumberx451.14" style="font-size:70%;">on</span> <span id="lstnumberx451.16" style="font-size:70%;">terminal</span> <span id="lstnumberx451.18" style="font-size:70%;">bench</span> <span id="lstnumberx451.19" style="font-size:70%;">"</span> </span><span id="lstnumberx452"><span id="lstnumberx452.2" style="font-size:70%;">-</span> <span id="lstnumberx452.4" style="font-size:70%;">GOOD</span><span id="lstnumberx452.5" style="font-size:70%;">:</span><span id="lstnumberx452.7" style="font-size:70%;">"</span> <span id="lstnumberx452.8" style="font-size:70%;">compaction</span> <span id="lstnumberx452.10" style="font-size:70%;">keeps</span> <span id="lstnumberx452.12" style="font-size:70%;">last</span> <span id="lstnumberx452.14" style="font-size:70%;">15</span> <span id="lstnumberx452.16" style="font-size:70%;">messages</span><span id="lstnumberx452.17" style="font-size:70%;">,</span><span id="lstnumberx452.19" style="font-size:70%;">summarizes</span> <span id="lstnumberx452.21" style="font-size:70%;">older</span> <span id="lstnumberx452.23" style="font-size:70%;">ones</span> <span id="lstnumberx452.25" style="font-size:70%;">into</span> <span id="lstnumberx452.27" style="font-size:70%;">5</span> <span id="lstnumberx452.29" style="font-size:70%;">sentences</span> <span id="lstnumberx452.31" style="font-size:70%;">using</span> <span id="lstnumberx452.33" style="font-size:70%;">gpt</span> <span id="lstnumberx452.34" style="font-size:70%;">-4.1-</span> <span id="lstnumberx452.35" style="font-size:70%;">mini</span> <span id="lstnumberx452.36" style="font-size:70%;">"</span> </span><span id="lstnumberx453"><span id="lstnumberx453.2" style="font-size:70%;">-</span> <span id="lstnumberx453.4" style="font-size:70%;">BAD</span><span id="lstnumberx453.5" style="font-size:70%;">:</span><span id="lstnumberx453.7" style="font-size:70%;">"</span> <span id="lstnumberx453.8" style="font-size:70%;">uses</span> <span id="lstnumberx453.10" style="font-size:70%;">context</span> <span id="lstnumberx453.12" style="font-size:70%;">management</span> <span id="lstnumberx453.14" style="font-size:70%;">with</span> <span id="lstnumberx453.16" style="font-size:70%;">sliding</span> <span id="lstnumberx453.18" style="font-size:70%;">window</span> <span id="lstnumberx453.19" style="font-size:70%;">"</span> </span><span id="lstnumberx454"><span id="lstnumberx454.1" style="font-size:70%;">3.</span><span id="lstnumberx454.3" style="font-size:70%;">**</span> <span id="lstnumberx454.4" style="font-size:70%;">Cite</span> <span id="lstnumberx454.6" style="font-size:70%;">every</span> <span id="lstnumberx454.8" style="font-size:70%;">claim</span><span id="lstnumberx454.9" style="font-size:70%;">.**</span> <span id="lstnumberx454.11" style="font-size:70%;">Include</span> <span id="lstnumberx454.13" style="font-size:70%;">the</span> <span id="lstnumberx454.15" style="font-size:70%;">source</span> <span id="lstnumberx454.17" style="font-size:70%;">URL</span> <span id="lstnumberx454.19" style="font-size:70%;">for</span> <span id="lstnumberx454.21" style="font-size:70%;">every</span> <span id="lstnumberx454.23" style="font-size:70%;">data</span> <span id="lstnumberx454.25" style="font-size:70%;">point</span><span id="lstnumberx454.26" style="font-size:70%;">.</span></span> <span id="lstnumberx455"><span id="lstnumberx455.1" style="font-size:70%;">4.</span><span id="lstnumberx455.3" style="font-size:70%;">**</span> <span id="lstnumberx455.4" style="font-size:70%;">Prioritize</span> <span id="lstnumberx455.6" style="font-size:70%;">implementable</span> <span id="lstnumberx455.8" style="font-size:70%;">details</span> <span id="lstnumberx455.10" style="font-size:70%;">over</span> <span id="lstnumberx455.12" style="font-size:70%;">architectural</span> <span id="lstnumberx455.14" style="font-size:70%;">summaries</span><span id="lstnumberx455.15" style="font-size:70%;">.**</span> </span><span id="lstnumberx456"><span id="lstnumberx456.1" style="font-size:70%;">5.</span><span id="lstnumberx456.3" style="font-size:70%;">**</span> <span id="lstnumberx456.4" style="font-size:70%;">Use</span> <span id="lstnumberx456.6" style="font-size:70%;">{{</span> <span id="lstnumberx456.8" style="font-size:70%;">date</span> <span id="lstnumberx456.10" style="font-size:70%;">}}</span> <span id="lstnumberx456.12" style="font-size:70%;">year</span> <span id="lstnumberx456.14" style="font-size:70%;">in</span> <span id="lstnumberx456.16" style="font-size:70%;">search</span> <span id="lstnumberx456.18" style="font-size:70%;">queries</span> <span id="lstnumberx456.19" style="font-size:70%;">**</span> <span id="lstnumberx456.21" style="font-size:70%;">for</span> <span id="lstnumberx456.23" style="font-size:70%;">recent</span> <span id="lstnumberx456.25" style="font-size:70%;">results</span><span id="lstnumberx456.26" style="font-size:70%;">.</span></span> <span id="lstnumberx458"><span id="lstnumberx458.1" style="font-size:70%;">#</span> <span id="lstnumberx458.3" style="font-size:70%;">Your</span> <span id="lstnumberx458.5" style="font-size:70%;">Research</span> <span id="lstnumberx458.7" style="font-size:70%;">Protocol</span> </span><span id="lstnumberx460"><span id="lstnumberx460.1" style="font-size:70%;">##</span> <span id="lstnumberx460.3" style="font-size:70%;">Phase</span> <span id="lstnumberx460.5" style="font-size:70%;">1:</span><span id="lstnumberx460.7" style="font-size:70%;">Read</span> <span id="lstnumberx460.9" style="font-size:70%;">Pre</span> <span id="lstnumberx460.10" style="font-size:70%;">-</span> <span id="lstnumberx460.11" style="font-size:70%;">given</span> <span id="lstnumberx460.13" style="font-size:70%;">URLs</span> <span id="lstnumberx460.15" style="font-size:70%;">(</span><span id="lstnumberx460.16" style="font-size:70%;">MANDATORY</span><span id="lstnumberx460.17" style="font-size:70%;">)</span> </span><span id="lstnumberx461"><span id="lstnumberx461.1" style="font-size:70%;">{%</span> <span id="lstnumberx461.3" style="font-size:70%;">for</span> <span id="lstnumberx461.5" style="font-size:70%;">source</span> <span id="lstnumberx461.7" style="font-size:70%;">in</span> <span id="lstnumberx461.9" style="font-size:70%;">web_sources</span> <span id="lstnumberx461.11" style="font-size:70%;">%}</span> </span><span id="lstnumberx462"><span id="lstnumberx462.1" style="font-size:70%;">-</span> <span id="lstnumberx462.3" style="font-size:70%;">**{{</span> <span id="lstnumberx462.5" style="font-size:70%;">source</span><span id="lstnumberx462.6" style="font-size:70%;">.</span><span id="lstnumberx462.7" style="font-size:70%;">url</span> <span id="lstnumberx462.9" style="font-size:70%;">}}**</span> </span><span id="lstnumberx463"><span id="lstnumberx463.2" style="font-size:70%;">Focus</span><span id="lstnumberx463.3" style="font-size:70%;">:</span><span id="lstnumberx463.5" style="font-size:70%;">{{</span> <span id="lstnumberx463.7" style="font-size:70%;">source</span><span id="lstnumberx463.8" style="font-size:70%;">.</span><span id="lstnumberx463.9" style="font-size:70%;">focus</span> <span id="lstnumberx463.11" style="font-size:70%;">}}</span> </span><span id="lstnumberx464"><span id="lstnumberx464.1" style="font-size:70%;">{%</span> <span id="lstnumberx464.3" style="font-size:70%;">endfor</span> <span id="lstnumberx464.5" style="font-size:70%;">%}</span> </span><span id="lstnumberx466"><span id="lstnumberx466.1" style="font-size:70%;">For</span> <span id="lstnumberx466.3" style="font-size:70%;">each</span> <span id="lstnumberx466.5" style="font-size:70%;">URL</span><span id="lstnumberx466.6" style="font-size:70%;">:</span></span> <span id="lstnumberx467"><span id="lstnumberx467.1" style="font-size:70%;">1.</span><span id="lstnumberx467.3" style="font-size:70%;">Use</span> <span id="lstnumberx467.5" style="font-size:70%;">WebFetch</span> <span id="lstnumberx467.7" style="font-size:70%;">to</span> <span id="lstnumberx467.9" style="font-size:70%;">read</span> <span id="lstnumberx467.11" style="font-size:70%;">the</span> <span id="lstnumberx467.13" style="font-size:70%;">full</span> <span id="lstnumberx467.15" style="font-size:70%;">page</span> </span><span id="lstnumberx468"><span id="lstnumberx468.1" style="font-size:70%;">2.</span><span id="lstnumberx468.3" style="font-size:70%;">Extract</span> <span id="lstnumberx468.5" style="font-size:70%;">ALL</span> <span id="lstnumberx468.7" style="font-size:70%;">concrete</span> <span id="lstnumberx468.9" style="font-size:70%;">technical</span> <span id="lstnumberx468.11" style="font-size:70%;">details</span> <span id="lstnumberx468.13" style="font-size:70%;">--</span> <span id="lstnumberx468.15" style="font-size:70%;">focus</span> <span id="lstnumberx468.17" style="font-size:70%;">on</span> <span id="lstnumberx468.19" style="font-size:70%;">EXACT</span> <span id="lstnumberx468.21" style="font-size:70%;">numbers</span><span id="lstnumberx468.22" style="font-size:70%;">,</span><span id="lstnumberx468.24" style="font-size:70%;">configs</span><span id="lstnumberx468.25" style="font-size:70%;">,</span><span id="lstnumberx468.27" style="font-size:70%;">code</span> <span id="lstnumberx468.29" style="font-size:70%;">snippets</span><span id="lstnumberx468.30" style="font-size:70%;">,</span><span id="lstnumberx468.32" style="font-size:70%;">and</span> <span id="lstnumberx468.34" style="font-size:70%;">ablation</span> <span id="lstnumberx468.36" style="font-size:70%;">results</span> </span><span id="lstnumberx469"><span id="lstnumberx469.1" style="font-size:70%;">3.</span><span id="lstnumberx469.3" style="font-size:70%;">Ignore</span> <span id="lstnumberx469.5" style="font-size:70%;">high</span> <span id="lstnumberx469.6" style="font-size:70%;">-</span> <span id="lstnumberx469.7" style="font-size:70%;">level</span> <span id="lstnumberx469.9" style="font-size:70%;">architecture</span> <span id="lstnumberx469.11" style="font-size:70%;">summaries</span> <span id="lstnumberx469.13" style="font-size:70%;">(</span><span id="lstnumberx469.14" style="font-size:70%;">already</span> <span id="lstnumberx469.16" style="font-size:70%;">known</span><span id="lstnumberx469.17" style="font-size:70%;">)</span> <span id="lstnumberx469.19" style="font-size:70%;">--</span> <span id="lstnumberx469.21" style="font-size:70%;">dig</span> <span id="lstnumberx469.23" style="font-size:70%;">for</span> <span id="lstnumberx469.25" style="font-size:70%;">specifics</span> </span><span id="lstnumberx470"><span id="lstnumberx470.1" style="font-size:70%;">4.</span><span id="lstnumberx470.3" style="font-size:70%;">Record</span> <span id="lstnumberx470.5" style="font-size:70%;">the</span> <span id="lstnumberx470.7" style="font-size:70%;">URL</span> <span id="lstnumberx470.9" style="font-size:70%;">as</span> <span id="lstnumberx470.11" style="font-size:70%;">source</span> <span id="lstnumberx470.13" style="font-size:70%;">citation</span> </span><span id="lstnumberx472"><span id="lstnumberx472.1" style="font-size:70%;">**[</span><span id="lstnumberx472.2" style="font-size:70%;">L</span><span id="lstnumberx472.3" style="font-size:70%;">]</span> <span id="lstnumberx472.5" style="font-size:70%;">After</span> <span id="lstnumberx472.7" style="font-size:70%;">reading</span> <span id="lstnumberx472.9" style="font-size:70%;">all</span> <span id="lstnumberx472.11" style="font-size:70%;">pre</span> <span id="lstnumberx472.12" style="font-size:70%;">-</span> <span id="lstnumberx472.13" style="font-size:70%;">given</span> <span id="lstnumberx472.15" style="font-size:70%;">URLs</span><span id="lstnumberx472.16" style="font-size:70%;">:</span><span id="lstnumberx472.18" style="font-size:70%;">WRITE</span> <span id="lstnumberx472.20" style="font-size:70%;">the</span> <span id="lstnumberx472.22" style="font-size:70%;">skill</span> <span id="lstnumberx472.24" style="font-size:70%;">file</span> <span id="lstnumberx472.26" style="font-size:70%;">immediately</span><span id="lstnumberx472.27" style="font-size:70%;">.**</span> <span id="lstnumberx472.29" style="font-size:70%;">Include</span> <span id="lstnumberx472.31" style="font-size:70%;">whatever</span> <span id="lstnumberx472.33" style="font-size:70%;">you</span> <span id="lstnumberx472.35" style="font-size:70%;">have</span> <span id="lstnumberx472.37" style="font-size:70%;">so</span> <span id="lstnumberx472.39" style="font-size:70%;">far</span><span id="lstnumberx472.40" style="font-size:70%;">.</span><span id="lstnumberx472.42" style="font-size:70%;">You</span> <span id="lstnumberx472.44" style="font-size:70%;">will</span> <span id="lstnumberx472.46" style="font-size:70%;">expand</span> <span id="lstnumberx472.48" style="font-size:70%;">it</span> <span id="lstnumberx472.50" style="font-size:70%;">in</span> <span id="lstnumberx472.52" style="font-size:70%;">Phase</span> <span id="lstnumberx472.54" style="font-size:70%;">2.</span></span> <span id="lstnumberx474"><span id="lstnumberx474.1" style="font-size:70%;">##</span> <span id="lstnumberx474.3" style="font-size:70%;">Phase</span> <span id="lstnumberx474.5" style="font-size:70%;">2:</span><span id="lstnumberx474.7" style="font-size:70%;">Autonomous</span> <span id="lstnumberx474.9" style="font-size:70%;">Deep</span> <span id="lstnumberx474.11" style="font-size:70%;">Research</span> <span id="lstnumberx474.13" style="font-size:70%;">(</span><span id="lstnumberx474.14" style="font-size:70%;">expand</span> <span id="lstnumberx474.16" style="font-size:70%;">the</span> <span id="lstnumberx474.18" style="font-size:70%;">skill</span> <span id="lstnumberx474.20" style="font-size:70%;">file</span><span id="lstnumberx474.21" style="font-size:70%;">)</span> </span><span id="lstnumberx476"><span id="lstnumberx476.1" style="font-size:70%;">Search</span> <span id="lstnumberx476.3" style="font-size:70%;">for</span> <span id="lstnumberx476.5" style="font-size:70%;">MORE</span> <span id="lstnumberx476.7" style="font-size:70%;">information</span><span id="lstnumberx476.8" style="font-size:70%;">.</span><span id="lstnumberx476.10" style="font-size:70%;">Target</span><span id="lstnumberx476.11" style="font-size:70%;">:</span><span id="lstnumberx476.13" style="font-size:70%;">15-20</span> <span id="lstnumberx476.15" style="font-size:70%;">web</span> <span id="lstnumberx476.17" style="font-size:70%;">searches</span> <span id="lstnumberx476.19" style="font-size:70%;">total</span><span id="lstnumberx476.20" style="font-size:70%;">.</span></span> <span id="lstnumberx478"><span id="lstnumberx478.1" style="font-size:70%;">###</span> <span id="lstnumberx478.3" style="font-size:70%;">Architecture</span> <span id="lstnumberx478.5" style="font-size:70%;">&amp;</span> <span id="lstnumberx478.7" style="font-size:70%;">Techniques</span> <span id="lstnumberx478.9" style="font-size:70%;">(-&gt;</span> <span id="lstnumberx478.11" style="font-size:70%;">coding</span> <span id="lstnumberx478.12" style="font-size:70%;">-</span> <span id="lstnumberx478.13" style="font-size:70%;">agent</span> <span id="lstnumberx478.14" style="font-size:70%;">-</span> <span id="lstnumberx478.15" style="font-size:70%;">sota</span> <span id="lstnumberx478.16" style="font-size:70%;">-</span> <span id="lstnumberx478.17" style="font-size:70%;">research</span><span id="lstnumberx478.18" style="font-size:70%;">)</span> </span><span id="lstnumberx479"><span id="lstnumberx479.1" style="font-size:70%;">1.</span><span id="lstnumberx479.3" style="font-size:70%;">"</span> <span id="lstnumberx479.4" style="font-size:70%;">terminal</span> <span id="lstnumberx479.6" style="font-size:70%;">bench</span> <span id="lstnumberx479.8" style="font-size:70%;">2</span> <span id="lstnumberx479.10" style="font-size:70%;">leaderboard</span> <span id="lstnumberx479.12" style="font-size:70%;">{{</span> <span id="lstnumberx479.14" style="font-size:70%;">date</span> <span id="lstnumberx479.15" style="font-size:70%;">[:4]</span> <span id="lstnumberx479.17" style="font-size:70%;">}}</span> <span id="lstnumberx479.19" style="font-size:70%;">scores</span> <span id="lstnumberx479.20" style="font-size:70%;">"</span> <span id="lstnumberx479.22" style="font-size:70%;">--</span> <span id="lstnumberx479.24" style="font-size:70%;">exact</span> <span id="lstnumberx479.26" style="font-size:70%;">scores</span><span id="lstnumberx479.27" style="font-size:70%;">,</span><span id="lstnumberx479.29" style="font-size:70%;">model</span> <span id="lstnumberx479.31" style="font-size:70%;">choices</span><span id="lstnumberx479.32" style="font-size:70%;">,</span><span id="lstnumberx479.34" style="font-size:70%;">dates</span> </span><span id="lstnumberx480"><span id="lstnumberx480.1" style="font-size:70%;">2.</span><span id="lstnumberx480.3" style="font-size:70%;">"</span> <span id="lstnumberx480.4" style="font-size:70%;">deepagents</span> <span id="lstnumberx480.6" style="font-size:70%;">terminal</span> <span id="lstnumberx480.8" style="font-size:70%;">bench</span> <span id="lstnumberx480.10" style="font-size:70%;">middleware</span> <span id="lstnumberx480.12" style="font-size:70%;">code</span> <span id="lstnumberx480.13" style="font-size:70%;">"</span> <span id="lstnumberx480.15" style="font-size:70%;">--</span> <span id="lstnumberx480.17" style="font-size:70%;">actual</span> <span id="lstnumberx480.19" style="font-size:70%;">middleware</span> <span id="lstnumberx480.21" style="font-size:70%;">implementation</span> </span><span id="lstnumberx481"><span id="lstnumberx481.1" style="font-size:70%;">3.</span><span id="lstnumberx481.3" style="font-size:70%;">"</span> <span id="lstnumberx481.4" style="font-size:70%;">coding</span> <span id="lstnumberx481.6" style="font-size:70%;">agent</span> <span id="lstnumberx481.8" style="font-size:70%;">system</span> <span id="lstnumberx481.10" style="font-size:70%;">prompt</span> <span id="lstnumberx481.12" style="font-size:70%;">template</span> <span id="lstnumberx481.14" style="font-size:70%;">{{</span> <span id="lstnumberx481.16" style="font-size:70%;">date</span> <span id="lstnumberx481.17" style="font-size:70%;">[:4]</span> <span id="lstnumberx481.19" style="font-size:70%;">}}"</span> <span id="lstnumberx481.21" style="font-size:70%;">--</span> <span id="lstnumberx481.23" style="font-size:70%;">actual</span> <span id="lstnumberx481.25" style="font-size:70%;">prompt</span> <span id="lstnumberx481.27" style="font-size:70%;">text</span> <span id="lstnumberx481.29" style="font-size:70%;">from</span> <span id="lstnumberx481.31" style="font-size:70%;">top</span> <span id="lstnumberx481.33" style="font-size:70%;">agents</span> </span><span id="lstnumberx482"><span id="lstnumberx482.1" style="font-size:70%;">4.</span><span id="lstnumberx482.3" style="font-size:70%;">"</span> <span id="lstnumberx482.4" style="font-size:70%;">coding</span> <span id="lstnumberx482.6" style="font-size:70%;">agent</span> <span id="lstnumberx482.8" style="font-size:70%;">context</span> <span id="lstnumberx482.10" style="font-size:70%;">compaction</span> <span id="lstnumberx482.12" style="font-size:70%;">algorithm</span> <span id="lstnumberx482.14" style="font-size:70%;">implementation</span> <span id="lstnumberx482.15" style="font-size:70%;">"</span> <span id="lstnumberx482.17" style="font-size:70%;">--</span> <span id="lstnumberx482.19" style="font-size:70%;">exact</span> <span id="lstnumberx482.21" style="font-size:70%;">algorithms</span> </span><span id="lstnumberx483"><span id="lstnumberx483.1" style="font-size:70%;">5.</span><span id="lstnumberx483.3" style="font-size:70%;">"</span> <span id="lstnumberx483.4" style="font-size:70%;">coding</span> <span id="lstnumberx483.6" style="font-size:70%;">agent</span> <span id="lstnumberx483.8" style="font-size:70%;">pre</span> <span id="lstnumberx483.9" style="font-size:70%;">-</span> <span id="lstnumberx483.10" style="font-size:70%;">completion</span> <span id="lstnumberx483.12" style="font-size:70%;">verification</span> <span id="lstnumberx483.14" style="font-size:70%;">middleware</span> <span id="lstnumberx483.15" style="font-size:70%;">"</span> <span id="lstnumberx483.17" style="font-size:70%;">--</span> <span id="lstnumberx483.19" style="font-size:70%;">actual</span> <span id="lstnumberx483.21" style="font-size:70%;">code</span> </span><span id="lstnumberx484"><span id="lstnumberx484.1" style="font-size:70%;">6.</span><span id="lstnumberx484.3" style="font-size:70%;">"</span> <span id="lstnumberx484.4" style="font-size:70%;">SWE</span> <span id="lstnumberx484.5" style="font-size:70%;">-</span> <span id="lstnumberx484.6" style="font-size:70%;">agent</span> <span id="lstnumberx484.8" style="font-size:70%;">tools</span> <span id="lstnumberx484.10" style="font-size:70%;">file</span> <span id="lstnumberx484.12" style="font-size:70%;">editing</span> <span id="lstnumberx484.14" style="font-size:70%;">search</span> <span id="lstnumberx484.16" style="font-size:70%;">replace</span> <span id="lstnumberx484.18" style="font-size:70%;">implementation</span> <span id="lstnumberx484.19" style="font-size:70%;">"</span> <span id="lstnumberx484.21" style="font-size:70%;">--</span> <span id="lstnumberx484.23" style="font-size:70%;">tool</span> <span id="lstnumberx484.25" style="font-size:70%;">design</span> <span id="lstnumberx484.27" style="font-size:70%;">specifics</span> </span><span id="lstnumberx485"><span id="lstnumberx485.1" style="font-size:70%;">7.</span><span id="lstnumberx485.3" style="font-size:70%;">"</span> <span id="lstnumberx485.4" style="font-size:70%;">coding</span> <span id="lstnumberx485.6" style="font-size:70%;">agent</span> <span id="lstnumberx485.8" style="font-size:70%;">ablation</span> <span id="lstnumberx485.10" style="font-size:70%;">study</span> <span id="lstnumberx485.12" style="font-size:70%;">results</span> <span id="lstnumberx485.14" style="font-size:70%;">{{</span> <span id="lstnumberx485.16" style="font-size:70%;">date</span> <span id="lstnumberx485.17" style="font-size:70%;">[:4]</span> <span id="lstnumberx485.19" style="font-size:70%;">}}"</span> <span id="lstnumberx485.21" style="font-size:70%;">--</span> <span id="lstnumberx485.23" style="font-size:70%;">which</span> <span id="lstnumberx485.25" style="font-size:70%;">techniques</span> <span id="lstnumberx485.27" style="font-size:70%;">mattered</span> <span id="lstnumberx485.29" style="font-size:70%;">most</span> </span><span id="lstnumberx486"><span id="lstnumberx486.1" style="font-size:70%;">8.</span><span id="lstnumberx486.3" style="font-size:70%;">"</span> <span id="lstnumberx486.4" style="font-size:70%;">terminal</span> <span id="lstnumberx486.6" style="font-size:70%;">bench</span> <span id="lstnumberx486.8" style="font-size:70%;">timeout</span> <span id="lstnumberx486.10" style="font-size:70%;">handling</span> <span id="lstnumberx486.12" style="font-size:70%;">strategies</span> <span id="lstnumberx486.13" style="font-size:70%;">"</span> <span id="lstnumberx486.15" style="font-size:70%;">--</span> <span id="lstnumberx486.17" style="font-size:70%;">exact</span> <span id="lstnumberx486.19" style="font-size:70%;">timeout</span> <span id="lstnumberx486.21" style="font-size:70%;">values</span><span id="lstnumberx486.22" style="font-size:70%;">,</span><span id="lstnumberx486.24" style="font-size:70%;">fallback</span> <span id="lstnumberx486.26" style="font-size:70%;">logic</span> </span><span id="lstnumberx487"><span id="lstnumberx487.1" style="font-size:70%;">9.</span><span id="lstnumberx487.3" style="font-size:70%;">"</span> <span id="lstnumberx487.4" style="font-size:70%;">e2b</span> <span id="lstnumberx487.6" style="font-size:70%;">sandbox</span> <span id="lstnumberx487.8" style="font-size:70%;">coding</span> <span id="lstnumberx487.10" style="font-size:70%;">agent</span> <span id="lstnumberx487.12" style="font-size:70%;">optimization</span> <span id="lstnumberx487.13" style="font-size:70%;">"</span> <span id="lstnumberx487.15" style="font-size:70%;">--</span> <span id="lstnumberx487.17" style="font-size:70%;">sandbox</span> <span id="lstnumberx487.19" style="font-size:70%;">warm</span> <span id="lstnumberx487.20" style="font-size:70%;">-</span> <span id="lstnumberx487.21" style="font-size:70%;">up</span><span id="lstnumberx487.22" style="font-size:70%;">,</span><span id="lstnumberx487.24" style="font-size:70%;">file</span> <span id="lstnumberx487.26" style="font-size:70%;">upload</span> <span id="lstnumberx487.28" style="font-size:70%;">strategies</span> </span><span id="lstnumberx488"><span id="lstnumberx488.1" style="font-size:70%;">10.</span><span id="lstnumberx488.3" style="font-size:70%;">"</span> <span id="lstnumberx488.4" style="font-size:70%;">coding</span> <span id="lstnumberx488.6" style="font-size:70%;">agent</span> <span id="lstnumberx488.8" style="font-size:70%;">doom</span> <span id="lstnumberx488.10" style="font-size:70%;">loop</span> <span id="lstnumberx488.12" style="font-size:70%;">detection</span> <span id="lstnumberx488.14" style="font-size:70%;">implementation</span> <span id="lstnumberx488.15" style="font-size:70%;">"</span> <span id="lstnumberx488.17" style="font-size:70%;">--</span> <span id="lstnumberx488.19" style="font-size:70%;">exact</span> <span id="lstnumberx488.21" style="font-size:70%;">detection</span> <span id="lstnumberx488.23" style="font-size:70%;">logic</span> </span><span id="lstnumberx489"><span id="lstnumberx489.1" style="font-size:70%;">11.</span><span id="lstnumberx489.3" style="font-size:70%;">"</span> <span id="lstnumberx489.4" style="font-size:70%;">aider</span> <span id="lstnumberx489.6" style="font-size:70%;">edit</span> <span id="lstnumberx489.8" style="font-size:70%;">format</span> <span id="lstnumberx489.10" style="font-size:70%;">unified</span> <span id="lstnumberx489.12" style="font-size:70%;">diff</span> <span id="lstnumberx489.14" style="font-size:70%;">search</span> <span id="lstnumberx489.16" style="font-size:70%;">replace</span> <span id="lstnumberx489.18" style="font-size:70%;">benchmark</span> <span id="lstnumberx489.19" style="font-size:70%;">"</span> <span id="lstnumberx489.21" style="font-size:70%;">--</span> <span id="lstnumberx489.23" style="font-size:70%;">edit</span> <span id="lstnumberx489.25" style="font-size:70%;">format</span> <span id="lstnumberx489.27" style="font-size:70%;">comparison</span> <span id="lstnumberx489.29" style="font-size:70%;">data</span> </span><span id="lstnumberx490"><span id="lstnumberx490.1" style="font-size:70%;">12.</span><span id="lstnumberx490.3" style="font-size:70%;">"</span> <span id="lstnumberx490.4" style="font-size:70%;">Codex</span> <span id="lstnumberx490.6" style="font-size:70%;">agent</span> <span id="lstnumberx490.8" style="font-size:70%;">architecture</span> <span id="lstnumberx490.10" style="font-size:70%;">tools</span> <span id="lstnumberx490.11" style="font-size:70%;">"</span> <span id="lstnumberx490.13" style="font-size:70%;">--</span> <span id="lstnumberx490.15" style="font-size:70%;">exact</span> <span id="lstnumberx490.17" style="font-size:70%;">tool</span> <span id="lstnumberx490.19" style="font-size:70%;">set</span> <span id="lstnumberx490.21" style="font-size:70%;">and</span> <span id="lstnumberx490.23" style="font-size:70%;">descriptions</span> </span><span id="lstnumberx491"><span id="lstnumberx491.1" style="font-size:70%;">13.</span><span id="lstnumberx491.3" style="font-size:70%;">"</span> <span id="lstnumberx491.4" style="font-size:70%;">claude</span> <span id="lstnumberx491.6" style="font-size:70%;">code</span> <span id="lstnumberx491.8" style="font-size:70%;">hooks</span> <span id="lstnumberx491.10" style="font-size:70%;">compaction</span> <span id="lstnumberx491.12" style="font-size:70%;">implementation</span> <span id="lstnumberx491.13" style="font-size:70%;">"</span> <span id="lstnumberx491.15" style="font-size:70%;">--</span> <span id="lstnumberx491.17" style="font-size:70%;">exact</span> <span id="lstnumberx491.19" style="font-size:70%;">hook</span> <span id="lstnumberx491.21" style="font-size:70%;">sequence</span><span id="lstnumberx491.22" style="font-size:70%;">,</span><span id="lstnumberx491.24" style="font-size:70%;">compaction</span> <span id="lstnumberx491.26" style="font-size:70%;">details</span> </span><span id="lstnumberx492"><span id="lstnumberx492.1" style="font-size:70%;">14.</span><span id="lstnumberx492.3" style="font-size:70%;">"</span> <span id="lstnumberx492.4" style="font-size:70%;">coding</span> <span id="lstnumberx492.6" style="font-size:70%;">agent</span> <span id="lstnumberx492.8" style="font-size:70%;">negative</span> <span id="lstnumberx492.10" style="font-size:70%;">results</span> <span id="lstnumberx492.12" style="font-size:70%;">failed</span> <span id="lstnumberx492.14" style="font-size:70%;">techniques</span> <span id="lstnumberx492.16" style="font-size:70%;">{{</span> <span id="lstnumberx492.18" style="font-size:70%;">date</span> <span id="lstnumberx492.19" style="font-size:70%;">[:4]</span> <span id="lstnumberx492.21" style="font-size:70%;">}}"</span> <span id="lstnumberx492.23" style="font-size:70%;">--</span> <span id="lstnumberx492.25" style="font-size:70%;">what</span> <span id="lstnumberx492.27" style="font-size:70%;">didn</span> <span id="lstnumberx492.28" style="font-size:70%;">'</span> <span id="lstnumberx492.29" style="font-size:70%;">t</span> <span id="lstnumberx492.31" style="font-size:70%;">work</span> <span id="lstnumberx492.33" style="font-size:70%;">and</span> <span id="lstnumberx492.35" style="font-size:70%;">why</span> </span><span id="lstnumberx494"><span id="lstnumberx494.1" style="font-size:70%;">For</span> <span id="lstnumberx494.3" style="font-size:70%;">each</span> <span id="lstnumberx494.5" style="font-size:70%;">search</span> <span id="lstnumberx494.7" style="font-size:70%;">result</span><span id="lstnumberx494.8" style="font-size:70%;">:</span></span> <span id="lstnumberx495"><span id="lstnumberx495.1" style="font-size:70%;">-</span> <span id="lstnumberx495.3" style="font-size:70%;">Skip</span> <span id="lstnumberx495.5" style="font-size:70%;">overview</span> <span id="lstnumberx495.6" style="font-size:70%;">/</span> <span id="lstnumberx495.7" style="font-size:70%;">summary</span> <span id="lstnumberx495.9" style="font-size:70%;">articles</span> <span id="lstnumberx495.11" style="font-size:70%;">--</span> <span id="lstnumberx495.13" style="font-size:70%;">look</span> <span id="lstnumberx495.15" style="font-size:70%;">for</span> <span id="lstnumberx495.17" style="font-size:70%;">blog</span> <span id="lstnumberx495.19" style="font-size:70%;">posts</span> <span id="lstnumberx495.21" style="font-size:70%;">with</span> <span id="lstnumberx495.23" style="font-size:70%;">code</span><span id="lstnumberx495.24" style="font-size:70%;">,</span><span id="lstnumberx495.26" style="font-size:70%;">configs</span><span id="lstnumberx495.27" style="font-size:70%;">,</span><span id="lstnumberx495.29" style="font-size:70%;">or</span> <span id="lstnumberx495.31" style="font-size:70%;">data</span> </span><span id="lstnumberx497"><span id="lstnumberx497.1" style="font-size:70%;">-</span> <span id="lstnumberx497.3" style="font-size:70%;">If</span> <span id="lstnumberx497.5" style="font-size:70%;">a</span> <span id="lstnumberx497.7" style="font-size:70%;">page</span> <span id="lstnumberx497.9" style="font-size:70%;">is</span> <span id="lstnumberx497.11" style="font-size:70%;">inaccessible</span><span id="lstnumberx497.12" style="font-size:70%;">,</span><span id="lstnumberx497.14" style="font-size:70%;">note</span> <span id="lstnumberx497.16" style="font-size:70%;">"</span> <span id="lstnumberx497.17" style="font-size:70%;">INACCESSIBLE</span><span id="lstnumberx497.18" style="font-size:70%;">:</span><span id="lstnumberx497.20" style="font-size:70%;">&lt;</span> <span id="lstnumberx497.21" style="font-size:70%;">url</span> <span id="lstnumberx497.22" style="font-size:70%;">&gt;"</span> <span id="lstnumberx497.24" style="font-size:70%;">and</span> <span id="lstnumberx497.26" style="font-size:70%;">move</span> <span id="lstnumberx497.28" style="font-size:70%;">on</span> </span><span id="lstnumberx499"><span id="lstnumberx499.1" style="font-size:70%;">**[</span><span id="lstnumberx499.2" style="font-size:70%;">L</span><span id="lstnumberx499.3" style="font-size:70%;">]</span> <span id="lstnumberx499.5" style="font-size:70%;">After</span> <span id="lstnumberx499.7" style="font-size:70%;">completing</span> <span id="lstnumberx499.9" style="font-size:70%;">research</span><span id="lstnumberx499.10" style="font-size:70%;">:</span><span id="lstnumberx499.12" style="font-size:70%;">UPDATE</span> <span id="lstnumberx499.14" style="font-size:70%;">the</span> <span id="lstnumberx499.16" style="font-size:70%;">skill</span> <span id="lstnumberx499.18" style="font-size:70%;">file</span> <span id="lstnumberx499.20" style="font-size:70%;">with</span> <span id="lstnumberx499.22" style="font-size:70%;">all</span> <span id="lstnumberx499.24" style="font-size:70%;">findings</span><span id="lstnumberx499.25" style="font-size:70%;">,</span><span id="lstnumberx499.27" style="font-size:70%;">then</span> <span id="lstnumberx499.29" style="font-size:70%;">call</span> <span id="lstnumberx499.31" style="font-size:70%;">complete_task</span><span id="lstnumberx499.32" style="font-size:70%;">.**</span> </span><span id="lstnumberx501"><span id="lstnumberx501.1" style="font-size:70%;">#</span> <span id="lstnumberx501.3" style="font-size:70%;">Skill</span> <span id="lstnumberx501.5" style="font-size:70%;">Output</span> <span id="lstnumberx501.7" style="font-size:70%;">Specification</span> </span><span id="lstnumberx503"><span id="lstnumberx503.1" style="font-size:70%;">##</span> <span id="lstnumberx503.3" style="font-size:70%;">`</span> <span id="lstnumberx503.4" style="font-size:70%;">coding</span> <span id="lstnumberx503.5" style="font-size:70%;">-</span> <span id="lstnumberx503.6" style="font-size:70%;">agent</span> <span id="lstnumberx503.7" style="font-size:70%;">-</span> <span id="lstnumberx503.8" style="font-size:70%;">sota</span> <span id="lstnumberx503.9" style="font-size:70%;">-</span> <span id="lstnumberx503.10" style="font-size:70%;">research</span> <span id="lstnumberx503.11" style="font-size:70%;">/</span> <span id="lstnumberx503.12" style="font-size:70%;">SKILL</span><span id="lstnumberx503.13" style="font-size:70%;">.</span><span id="lstnumberx503.14" style="font-size:70%;">md</span> <span id="lstnumberx503.15" style="font-size:70%;">`</span> </span><span id="lstnumberx505"><span id="lstnumberx505.1" style="font-size:70%;">Must</span> <span id="lstnumberx505.3" style="font-size:70%;">cover</span> <span id="lstnumberx505.5" style="font-size:70%;">the</span> <span id="lstnumberx505.7" style="font-size:70%;">following</span> <span id="lstnumberx505.9" style="font-size:70%;">--</span> <span id="lstnumberx505.11" style="font-size:70%;">with</span> <span id="lstnumberx505.13" style="font-size:70%;">BOTH</span> <span id="lstnumberx505.15" style="font-size:70%;">design</span> <span id="lstnumberx505.17" style="font-size:70%;">patterns</span> <span id="lstnumberx505.19" style="font-size:70%;">AND</span> <span id="lstnumberx505.21" style="font-size:70%;">exact</span> <span id="lstnumberx505.23" style="font-size:70%;">data</span><span id="lstnumberx505.24" style="font-size:70%;">:</span></span> <span id="lstnumberx507"><span id="lstnumberx507.1" style="font-size:70%;">###</span> <span id="lstnumberx507.3" style="font-size:70%;">Section</span> <span id="lstnumberx507.5" style="font-size:70%;">1.</span><span id="lstnumberx507.7" style="font-size:70%;">Leaderboard</span> <span id="lstnumberx507.9" style="font-size:70%;">Data</span> <span id="lstnumberx507.11" style="font-size:70%;">(</span><span id="lstnumberx507.12" style="font-size:70%;">exact</span> <span id="lstnumberx507.14" style="font-size:70%;">numbers</span> <span id="lstnumberx507.16" style="font-size:70%;">required</span><span id="lstnumberx507.17" style="font-size:70%;">)</span> </span><span id="lstnumberx509"><span id="lstnumberx509.1" style="font-size:70%;">For</span> <span id="lstnumberx509.3" style="font-size:70%;">each</span> <span id="lstnumberx509.5" style="font-size:70%;">top</span> <span id="lstnumberx509.7" style="font-size:70%;">agent</span> <span id="lstnumberx509.8" style="font-size:70%;">/</span> <span id="lstnumberx509.9" style="font-size:70%;">team</span> <span id="lstnumberx509.11" style="font-size:70%;">(</span><span id="lstnumberx509.12" style="font-size:70%;">aim</span> <span id="lstnumberx509.14" style="font-size:70%;">for</span> <span id="lstnumberx509.16" style="font-size:70%;">10+):</span></span> <span id="lstnumberx511"><span id="lstnumberx511.1" style="font-size:70%;">|</span> <span id="lstnumberx511.3" style="font-size:70%;">Agent</span> <span id="lstnumberx511.5" style="font-size:70%;">|</span> <span id="lstnumberx511.7" style="font-size:70%;">TB2</span> <span id="lstnumberx511.9" style="font-size:70%;">Score</span> <span id="lstnumberx511.11" style="font-size:70%;">|</span> <span id="lstnumberx511.13" style="font-size:70%;">Model</span> <span id="lstnumberx511.15" style="font-size:70%;">|</span> <span id="lstnumberx511.17" style="font-size:70%;">Max</span> <span id="lstnumberx511.19" style="font-size:70%;">Iterations</span> <span id="lstnumberx511.21" style="font-size:70%;">|</span> <span id="lstnumberx511.23" style="font-size:70%;">Context</span> <span id="lstnumberx511.25" style="font-size:70%;">Window</span> <span id="lstnumberx511.27" style="font-size:70%;">|</span> <span id="lstnumberx511.29" style="font-size:70%;">Date</span> <span id="lstnumberx511.31" style="font-size:70%;">|</span> <span id="lstnumberx511.33" style="font-size:70%;">Source</span> <span id="lstnumberx511.35" style="font-size:70%;">|</span> </span><span id="lstnumberx512"><span id="lstnumberx512.1" style="font-size:70%;">|-------|-----------|-------|----------------|----------------|------|--------|</span> </span><span id="lstnumberx513"><span id="lstnumberx513.1" style="font-size:70%;">|</span> <span id="lstnumberx513.3" style="font-size:70%;">deepagents</span> <span id="lstnumberx513.5" style="font-size:70%;">|</span> <span id="lstnumberx513.7" style="font-size:70%;">66.5%</span> <span id="lstnumberx513.9" style="font-size:70%;">|</span> <span id="lstnumberx513.11" style="font-size:70%;">GPT</span> <span id="lstnumberx513.12" style="font-size:70%;">-4.1</span> <span id="lstnumberx513.14" style="font-size:70%;">|</span><span id="lstnumberx513.16" style="font-size:70%;">???</span><span id="lstnumberx513.18" style="font-size:70%;">|</span><span id="lstnumberx513.20" style="font-size:70%;">???</span><span id="lstnumberx513.22" style="font-size:70%;">|</span> <span id="lstnumberx513.24" style="font-size:70%;">2025-</span> <span id="lstnumberx513.25" style="font-size:70%;">XX</span> <span id="lstnumberx513.27" style="font-size:70%;">|</span> <span id="lstnumberx513.29" style="font-size:70%;">URL</span> <span id="lstnumberx513.31" style="font-size:70%;">|</span> </span><span id="lstnumberx515"><span id="lstnumberx515.1" style="font-size:70%;">Also</span> <span id="lstnumberx515.3" style="font-size:70%;">include</span><span id="lstnumberx515.4" style="font-size:70%;">:</span><span id="lstnumberx515.6" style="font-size:70%;">score</span> <span id="lstnumberx515.8" style="font-size:70%;">progression</span> <span id="lstnumberx515.10" style="font-size:70%;">history</span><span id="lstnumberx515.11" style="font-size:70%;">,</span><span id="lstnumberx515.13" style="font-size:70%;">SWE</span> <span id="lstnumberx515.14" style="font-size:70%;">-</span> <span id="lstnumberx515.15" style="font-size:70%;">bench</span> <span id="lstnumberx515.17" style="font-size:70%;">scores</span> <span id="lstnumberx515.19" style="font-size:70%;">if</span> <span id="lstnumberx515.21" style="font-size:70%;">available</span><span id="lstnumberx515.22" style="font-size:70%;">.</span></span> <span id="lstnumberx517"><span id="lstnumberx517.1" style="font-size:70%;">###</span> <span id="lstnumberx517.3" style="font-size:70%;">Section</span> <span id="lstnumberx517.5" style="font-size:70%;">2.</span><span id="lstnumberx517.7" style="font-size:70%;">Concrete</span> <span id="lstnumberx517.9" style="font-size:70%;">Implementation</span> <span id="lstnumberx517.11" style="font-size:70%;">Details</span> <span id="lstnumberx517.13" style="font-size:70%;">(</span><span id="lstnumberx517.14" style="font-size:70%;">one</span> <span id="lstnumberx517.16" style="font-size:70%;">subsection</span> <span id="lstnumberx517.18" style="font-size:70%;">per</span> <span id="lstnumberx517.20" style="font-size:70%;">top</span> <span id="lstnumberx517.22" style="font-size:70%;">team</span><span id="lstnumberx517.23" style="font-size:70%;">)</span> </span><span id="lstnumberx519"><span id="lstnumberx519.1" style="font-size:70%;">For</span> <span id="lstnumberx519.3" style="font-size:70%;">EACH</span> <span id="lstnumberx519.5" style="font-size:70%;">top</span> <span id="lstnumberx519.7" style="font-size:70%;">team</span><span id="lstnumberx519.8" style="font-size:70%;">,</span><span id="lstnumberx519.10" style="font-size:70%;">document</span> <span id="lstnumberx519.12" style="font-size:70%;">SPECIFICS</span> <span id="lstnumberx519.14" style="font-size:70%;">(</span><span id="lstnumberx519.15" style="font-size:70%;">not</span> <span id="lstnumberx519.17" style="font-size:70%;">design</span> <span id="lstnumberx519.19" style="font-size:70%;">philosophy</span><span id="lstnumberx519.20" style="font-size:70%;">):</span></span> <span id="lstnumberx520"><span id="lstnumberx520.1" style="font-size:70%;">-</span> <span id="lstnumberx520.3" style="font-size:70%;">**</span> <span id="lstnumberx520.4" style="font-size:70%;">Exact</span> <span id="lstnumberx520.6" style="font-size:70%;">system</span> <span id="lstnumberx520.8" style="font-size:70%;">prompt</span> <span id="lstnumberx520.9" style="font-size:70%;">**</span> <span id="lstnumberx520.11" style="font-size:70%;">(</span><span id="lstnumberx520.12" style="font-size:70%;">copy</span> <span id="lstnumberx520.14" style="font-size:70%;">verbatim</span> <span id="lstnumberx520.16" style="font-size:70%;">if</span> <span id="lstnumberx520.18" style="font-size:70%;">available</span><span id="lstnumberx520.19" style="font-size:70%;">,</span><span id="lstnumberx520.21" style="font-size:70%;">or</span> <span id="lstnumberx520.23" style="font-size:70%;">quote</span> <span id="lstnumberx520.25" style="font-size:70%;">key</span> <span id="lstnumberx520.27" style="font-size:70%;">sections</span><span id="lstnumberx520.28" style="font-size:70%;">)</span> </span><span id="lstnumberx521"><span id="lstnumberx521.1" style="font-size:70%;">-</span> <span id="lstnumberx521.3" style="font-size:70%;">**</span> <span id="lstnumberx521.4" style="font-size:70%;">Exact</span> <span id="lstnumberx521.6" style="font-size:70%;">tool</span> <span id="lstnumberx521.8" style="font-size:70%;">definitions</span> <span id="lstnumberx521.9" style="font-size:70%;">**</span> <span id="lstnumberx521.11" style="font-size:70%;">(</span><span id="lstnumberx521.12" style="font-size:70%;">tool</span> <span id="lstnumberx521.14" style="font-size:70%;">names</span><span id="lstnumberx521.15" style="font-size:70%;">,</span><span id="lstnumberx521.17" style="font-size:70%;">parameter</span> <span id="lstnumberx521.19" style="font-size:70%;">schemas</span><span id="lstnumberx521.20" style="font-size:70%;">,</span><span id="lstnumberx521.22" style="font-size:70%;">description</span> <span id="lstnumberx521.24" style="font-size:70%;">text</span><span id="lstnumberx521.25" style="font-size:70%;">)</span> </span><span id="lstnumberx522"><span id="lstnumberx522.1" style="font-size:70%;">-</span> <span id="lstnumberx522.3" style="font-size:70%;">**</span> <span id="lstnumberx522.4" style="font-size:70%;">Exact</span> <span id="lstnumberx522.6" style="font-size:70%;">middleware</span> <span id="lstnumberx522.8" style="font-size:70%;">configs</span> <span id="lstnumberx522.9" style="font-size:70%;">**</span> <span id="lstnumberx522.11" style="font-size:70%;">(</span><span id="lstnumberx522.12" style="font-size:70%;">param</span> <span id="lstnumberx522.14" style="font-size:70%;">values</span><span id="lstnumberx522.15" style="font-size:70%;">:</span><span id="lstnumberx522.17" style="font-size:70%;">max_iterations</span> <span id="lstnumberx522.18" style="font-size:70%;">=300,</span><span id="lstnumberx522.20" style="font-size:70%;">threshold</span> <span id="lstnumberx522.21" style="font-size:70%;">=0.75,</span><span id="lstnumberx522.23" style="font-size:70%;">etc</span><span id="lstnumberx522.24" style="font-size:70%;">.)</span> </span><span id="lstnumberx523"><span id="lstnumberx523.1" style="font-size:70%;">-</span> <span id="lstnumberx523.3" style="font-size:70%;">**</span> <span id="lstnumberx523.4" style="font-size:70%;">Exact</span> <span id="lstnumberx523.6" style="font-size:70%;">compaction</span> <span id="lstnumberx523.8" style="font-size:70%;">algorithm</span> <span id="lstnumberx523.9" style="font-size:70%;">**</span> <span id="lstnumberx523.11" style="font-size:70%;">(</span><span id="lstnumberx523.12" style="font-size:70%;">e</span><span id="lstnumberx523.13" style="font-size:70%;">.</span><span id="lstnumberx523.14" style="font-size:70%;">g</span><span id="lstnumberx523.15" style="font-size:70%;">.,</span><span id="lstnumberx523.17" style="font-size:70%;">"</span> <span id="lstnumberx523.18" style="font-size:70%;">keeps</span> <span id="lstnumberx523.20" style="font-size:70%;">last</span> <span id="lstnumberx523.22" style="font-size:70%;">15</span> <span id="lstnumberx523.24" style="font-size:70%;">messages</span> <span id="lstnumberx523.26" style="font-size:70%;">as</span> <span id="lstnumberx523.27" style="font-size:70%;">-</span> <span id="lstnumberx523.28" style="font-size:70%;">is</span><span id="lstnumberx523.29" style="font-size:70%;">,</span><span id="lstnumberx523.31" style="font-size:70%;">summarizes</span> <span id="lstnumberx523.33" style="font-size:70%;">messages</span> <span id="lstnumberx523.35" style="font-size:70%;">0-</span> <span id="lstnumberx523.36" style="font-size:70%;">N</span> <span id="lstnumberx523.38" style="font-size:70%;">into</span> <span id="lstnumberx523.40" style="font-size:70%;">a</span> <span id="lstnumberx523.42" style="font-size:70%;">single</span> <span id="lstnumberx523.44" style="font-size:70%;">message</span> <span id="lstnumberx523.46" style="font-size:70%;">using</span> <span id="lstnumberx523.48" style="font-size:70%;">prompt</span><span id="lstnumberx523.49" style="font-size:70%;">:</span><span id="lstnumberx523.51" style="font-size:70%;">'...'")</span> </span><span id="lstnumberx524"><span id="lstnumberx524.1" style="font-size:70%;">-</span> <span id="lstnumberx524.3" style="font-size:70%;">**</span> <span id="lstnumberx524.4" style="font-size:70%;">Exact</span> <span id="lstnumberx524.6" style="font-size:70%;">retry</span> <span id="lstnumberx524.8" style="font-size:70%;">logic</span> <span id="lstnumberx524.9" style="font-size:70%;">**</span> <span id="lstnumberx524.11" style="font-size:70%;">(</span><span id="lstnumberx524.12" style="font-size:70%;">e</span><span id="lstnumberx524.13" style="font-size:70%;">.</span><span id="lstnumberx524.14" style="font-size:70%;">g</span><span id="lstnumberx524.15" style="font-size:70%;">.,</span><span id="lstnumberx524.17" style="font-size:70%;">"</span> <span id="lstnumberx524.18" style="font-size:70%;">retries</span> <span id="lstnumberx524.20" style="font-size:70%;">3</span> <span id="lstnumberx524.22" style="font-size:70%;">times</span> <span id="lstnumberx524.24" style="font-size:70%;">with</span> <span id="lstnumberx524.26" style="font-size:70%;">2</span> <span id="lstnumberx524.27" style="font-size:70%;">s</span> <span id="lstnumberx524.28" style="font-size:70%;">/4</span> <span id="lstnumberx524.29" style="font-size:70%;">s</span> <span id="lstnumberx524.30" style="font-size:70%;">/8</span> <span id="lstnumberx524.31" style="font-size:70%;">s</span> <span id="lstnumberx524.33" style="font-size:70%;">backoff</span> <span id="lstnumberx524.35" style="font-size:70%;">on</span> <span id="lstnumberx524.37" style="font-size:70%;">status</span> <span id="lstnumberx524.39" style="font-size:70%;">429,</span><span id="lstnumberx524.41" style="font-size:70%;">500,</span><span id="lstnumberx524.43" style="font-size:70%;">502")</span> </span><span id="lstnumberx525"><span id="lstnumberx525.1" style="font-size:70%;">-</span> <span id="lstnumberx525.3" style="font-size:70%;">**</span> <span id="lstnumberx525.4" style="font-size:70%;">Exact</span> <span id="lstnumberx525.6" style="font-size:70%;">loop</span> <span id="lstnumberx525.8" style="font-size:70%;">detection</span> <span id="lstnumberx525.9" style="font-size:70%;">**</span> <span id="lstnumberx525.11" style="font-size:70%;">(</span><span id="lstnumberx525.12" style="font-size:70%;">e</span><span id="lstnumberx525.13" style="font-size:70%;">.</span><span id="lstnumberx525.14" style="font-size:70%;">g</span><span id="lstnumberx525.15" style="font-size:70%;">.,</span><span id="lstnumberx525.17" style="font-size:70%;">"</span> <span id="lstnumberx525.18" style="font-size:70%;">tracks</span> <span id="lstnumberx525.20" style="font-size:70%;">{</span> <span id="lstnumberx525.21" style="font-size:70%;">tool_name</span> <span id="lstnumberx525.23" style="font-size:70%;">+</span> <span id="lstnumberx525.25" style="font-size:70%;">first_arg</span><span id="lstnumberx525.26" style="font-size:70%;">:</span><span id="lstnumberx525.28" style="font-size:70%;">count</span> <span id="lstnumberx525.29" style="font-size:70%;">},</span><span id="lstnumberx525.31" style="font-size:70%;">injects</span> <span id="lstnumberx525.33" style="font-size:70%;">warning</span> <span id="lstnumberx525.35" style="font-size:70%;">at</span> <span id="lstnumberx525.37" style="font-size:70%;">count</span> <span id="lstnumberx525.38" style="font-size:70%;">=4")</span> </span><span id="lstnumberx526"><span id="lstnumberx526.1" style="font-size:70%;">-</span> <span id="lstnumberx526.3" style="font-size:70%;">**</span> <span id="lstnumberx526.4" style="font-size:70%;">Exact</span> <span id="lstnumberx526.6" style="font-size:70%;">pre</span> <span id="lstnumberx526.7" style="font-size:70%;">-</span> <span id="lstnumberx526.8" style="font-size:70%;">completion</span> <span id="lstnumberx526.10" style="font-size:70%;">check</span> <span id="lstnumberx526.11" style="font-size:70%;">**</span> <span id="lstnumberx526.13" style="font-size:70%;">(</span><span id="lstnumberx526.14" style="font-size:70%;">e</span><span id="lstnumberx526.15" style="font-size:70%;">.</span><span id="lstnumberx526.16" style="font-size:70%;">g</span><span id="lstnumberx526.17" style="font-size:70%;">.,</span><span id="lstnumberx526.19" style="font-size:70%;">"</span> <span id="lstnumberx526.20" style="font-size:70%;">intercepts</span> <span id="lstnumberx526.22" style="font-size:70%;">complete_task</span><span id="lstnumberx526.23" style="font-size:70%;">,</span><span id="lstnumberx526.25" style="font-size:70%;">injects</span> <span id="lstnumberx526.27" style="font-size:70%;">message</span><span id="lstnumberx526.28" style="font-size:70%;">:</span><span id="lstnumberx526.30" style="font-size:70%;">'</span> <span id="lstnumberx526.31" style="font-size:70%;">Before</span> <span id="lstnumberx526.33" style="font-size:70%;">completing</span><span id="lstnumberx526.34" style="font-size:70%;">,</span><span id="lstnumberx526.36" style="font-size:70%;">verify</span><span id="lstnumberx526.37" style="font-size:70%;">:</span><span id="lstnumberx526.39" style="font-size:70%;">(1)...</span><span id="lstnumberx526.41" style="font-size:70%;">(2)...</span><span id="lstnumberx526.43" style="font-size:70%;">(3)...'")</span> </span><span id="lstnumberx528"><span id="lstnumberx528.1" style="font-size:70%;">###</span> <span id="lstnumberx528.3" style="font-size:70%;">Section</span> <span id="lstnumberx528.5" style="font-size:70%;">3.</span><span id="lstnumberx528.7" style="font-size:70%;">Technique</span> <span id="lstnumberx528.9" style="font-size:70%;">Ablation</span> <span id="lstnumberx528.11" style="font-size:70%;">Data</span> <span id="lstnumberx528.13" style="font-size:70%;">(</span><span id="lstnumberx528.14" style="font-size:70%;">measured</span> <span id="lstnumberx528.16" style="font-size:70%;">impact</span> <span id="lstnumberx528.18" style="font-size:70%;">required</span><span id="lstnumberx528.19" style="font-size:70%;">)</span> </span><span id="lstnumberx530"><span id="lstnumberx530.1" style="font-size:70%;">For</span> <span id="lstnumberx530.3" style="font-size:70%;">each</span> <span id="lstnumberx530.5" style="font-size:70%;">technique</span><span id="lstnumberx530.6" style="font-size:70%;">,</span><span id="lstnumberx530.8" style="font-size:70%;">document</span> <span id="lstnumberx530.10" style="font-size:70%;">the</span> <span id="lstnumberx530.12" style="font-size:70%;">MEASURED</span> <span id="lstnumberx530.14" style="font-size:70%;">impact</span><span id="lstnumberx530.15" style="font-size:70%;">:</span></span> <span id="lstnumberx532"><span id="lstnumberx532.1" style="font-size:70%;">|</span> <span id="lstnumberx532.3" style="font-size:70%;">Technique</span> <span id="lstnumberx532.5" style="font-size:70%;">|</span> <span id="lstnumberx532.7" style="font-size:70%;">Team</span> <span id="lstnumberx532.9" style="font-size:70%;">|</span> <span id="lstnumberx532.11" style="font-size:70%;">Impact</span> <span id="lstnumberx532.13" style="font-size:70%;">|</span> <span id="lstnumberx532.15" style="font-size:70%;">Baseline</span> <span id="lstnumberx532.17" style="font-size:70%;">|</span> <span id="lstnumberx532.19" style="font-size:70%;">With</span> <span id="lstnumberx532.21" style="font-size:70%;">Technique</span> <span id="lstnumberx532.23" style="font-size:70%;">|</span> <span id="lstnumberx532.25" style="font-size:70%;">Source</span> <span id="lstnumberx532.27" style="font-size:70%;">|</span> </span><span id="lstnumberx533"><span id="lstnumberx533.1" style="font-size:70%;">|-----------|------|--------|----------|----------------|--------|</span> </span><span id="lstnumberx534"><span id="lstnumberx534.1" style="font-size:70%;">|</span> <span id="lstnumberx534.3" style="font-size:70%;">Pre</span> <span id="lstnumberx534.4" style="font-size:70%;">-</span> <span id="lstnumberx534.5" style="font-size:70%;">completion</span> <span id="lstnumberx534.7" style="font-size:70%;">checklist</span> <span id="lstnumberx534.9" style="font-size:70%;">|</span> <span id="lstnumberx534.11" style="font-size:70%;">LangChain</span> <span id="lstnumberx534.13" style="font-size:70%;">|</span> <span id="lstnumberx534.15" style="font-size:70%;">+</span> <span id="lstnumberx534.16" style="font-size:70%;">X</span><span id="lstnumberx534.17" style="font-size:70%;">.</span><span id="lstnumberx534.18" style="font-size:70%;">X</span> <span id="lstnumberx534.19" style="font-size:70%;">%</span> <span id="lstnumberx534.21" style="font-size:70%;">|</span><span id="lstnumberx534.23" style="font-size:70%;">??%</span> <span id="lstnumberx534.25" style="font-size:70%;">|</span><span id="lstnumberx534.27" style="font-size:70%;">??%</span> <span id="lstnumberx534.29" style="font-size:70%;">|</span> <span id="lstnumberx534.31" style="font-size:70%;">URL</span> <span id="lstnumberx534.33" style="font-size:70%;">|</span> </span><span id="lstnumberx535"><span id="lstnumberx535.1" style="font-size:70%;">|</span> <span id="lstnumberx535.3" style="font-size:70%;">Loop</span> <span id="lstnumberx535.5" style="font-size:70%;">detection</span> <span id="lstnumberx535.7" style="font-size:70%;">|</span> <span id="lstnumberx535.9" style="font-size:70%;">LangChain</span> <span id="lstnumberx535.11" style="font-size:70%;">|</span> <span id="lstnumberx535.13" style="font-size:70%;">+</span> <span id="lstnumberx535.14" style="font-size:70%;">X</span><span id="lstnumberx535.15" style="font-size:70%;">.</span><span id="lstnumberx535.16" style="font-size:70%;">X</span> <span id="lstnumberx535.17" style="font-size:70%;">%</span> <span id="lstnumberx535.19" style="font-size:70%;">|</span><span id="lstnumberx535.21" style="font-size:70%;">??%</span> <span id="lstnumberx535.23" style="font-size:70%;">|</span><span id="lstnumberx535.25" style="font-size:70%;">??%</span> <span id="lstnumberx535.27" style="font-size:70%;">|</span> <span id="lstnumberx535.29" style="font-size:70%;">URL</span> <span id="lstnumberx535.31" style="font-size:70%;">|</span> </span><span id="lstnumberx536"><span id="lstnumberx536.1" style="font-size:70%;">|</span> <span id="lstnumberx536.3" style="font-size:70%;">Context</span> <span id="lstnumberx536.5" style="font-size:70%;">compaction</span> <span id="lstnumberx536.7" style="font-size:70%;">|</span><span id="lstnumberx536.9" style="font-size:70%;">???</span><span id="lstnumberx536.11" style="font-size:70%;">|</span> <span id="lstnumberx536.13" style="font-size:70%;">+</span> <span id="lstnumberx536.14" style="font-size:70%;">X</span><span id="lstnumberx536.15" style="font-size:70%;">.</span><span id="lstnumberx536.16" style="font-size:70%;">X</span> <span id="lstnumberx536.17" style="font-size:70%;">%</span> <span id="lstnumberx536.19" style="font-size:70%;">|</span><span id="lstnumberx536.21" style="font-size:70%;">??%</span> <span id="lstnumberx536.23" style="font-size:70%;">|</span><span id="lstnumberx536.25" style="font-size:70%;">??%</span> <span id="lstnumberx536.27" style="font-size:70%;">|</span> <span id="lstnumberx536.29" style="font-size:70%;">URL</span> <span id="lstnumberx536.31" style="font-size:70%;">|</span> </span><span id="lstnumberx538"><span id="lstnumberx538.1" style="font-size:70%;">If</span> <span id="lstnumberx538.3" style="font-size:70%;">exact</span> <span id="lstnumberx538.5" style="font-size:70%;">ablation</span> <span id="lstnumberx538.7" style="font-size:70%;">numbers</span> <span id="lstnumberx538.9" style="font-size:70%;">aren</span> <span id="lstnumberx538.10" style="font-size:70%;">'</span> <span id="lstnumberx538.11" style="font-size:70%;">t</span> <span id="lstnumberx538.13" style="font-size:70%;">available</span><span id="lstnumberx538.14" style="font-size:70%;">,</span><span id="lstnumberx538.16" style="font-size:70%;">note</span> <span id="lstnumberx538.18" style="font-size:70%;">"</span> <span id="lstnumberx538.19" style="font-size:70%;">NO</span> <span id="lstnumberx538.21" style="font-size:70%;">ABLATION</span> <span id="lstnumberx538.23" style="font-size:70%;">DATA</span> <span id="lstnumberx538.24" style="font-size:70%;">"</span> <span id="lstnumberx538.26" style="font-size:70%;">and</span> <span id="lstnumberx538.28" style="font-size:70%;">provide</span> <span id="lstnumberx538.30" style="font-size:70%;">the</span> <span id="lstnumberx538.32" style="font-size:70%;">team</span> <span id="lstnumberx538.33" style="font-size:70%;">'</span> <span id="lstnumberx538.34" style="font-size:70%;">s</span> <span id="lstnumberx538.36" style="font-size:70%;">qualitative</span> <span id="lstnumberx538.38" style="font-size:70%;">assessment</span><span id="lstnumberx538.39" style="font-size:70%;">.</span></span> <span id="lstnumberx540"><span id="lstnumberx540.1" style="font-size:70%;">###</span> <span id="lstnumberx540.3" style="font-size:70%;">Section</span> <span id="lstnumberx540.5" style="font-size:70%;">4.</span><span id="lstnumberx540.7" style="font-size:70%;">Actual</span> <span id="lstnumberx540.9" style="font-size:70%;">Code</span> <span id="lstnumberx540.11" style="font-size:70%;">&amp;</span> <span id="lstnumberx540.13" style="font-size:70%;">Config</span> <span id="lstnumberx540.15" style="font-size:70%;">Examples</span> </span><span id="lstnumberx542"><span id="lstnumberx542.1" style="font-size:70%;">Collect</span> <span id="lstnumberx542.3" style="font-size:70%;">REAL</span> <span id="lstnumberx542.5" style="font-size:70%;">code</span> <span id="lstnumberx542.7" style="font-size:70%;">and</span> <span id="lstnumberx542.9" style="font-size:70%;">config</span> <span id="lstnumberx542.11" style="font-size:70%;">from</span> <span id="lstnumberx542.13" style="font-size:70%;">open</span> <span id="lstnumberx542.14" style="font-size:70%;">-</span> <span id="lstnumberx542.15" style="font-size:70%;">source</span> <span id="lstnumberx542.17" style="font-size:70%;">agents</span><span id="lstnumberx542.18" style="font-size:70%;">:</span></span> <span id="lstnumberx543"><span id="lstnumberx543.1" style="font-size:70%;">-</span> <span id="lstnumberx543.3" style="font-size:70%;">System</span> <span id="lstnumberx543.5" style="font-size:70%;">prompt</span> <span id="lstnumberx543.7" style="font-size:70%;">text</span> <span id="lstnumberx543.9" style="font-size:70%;">(</span><span id="lstnumberx543.10" style="font-size:70%;">verbatim</span> <span id="lstnumberx543.12" style="font-size:70%;">quotes</span><span id="lstnumberx543.13" style="font-size:70%;">,</span><span id="lstnumberx543.15" style="font-size:70%;">as</span> <span id="lstnumberx543.17" style="font-size:70%;">long</span> <span id="lstnumberx543.19" style="font-size:70%;">as</span> <span id="lstnumberx543.21" style="font-size:70%;">needed</span><span id="lstnumberx543.22" style="font-size:70%;">)</span> </span><span id="lstnumberx544"><span id="lstnumberx544.1" style="font-size:70%;">-</span> <span id="lstnumberx544.3" style="font-size:70%;">Middleware</span> <span id="lstnumberx544.5" style="font-size:70%;">implementations</span> <span id="lstnumberx544.7" style="font-size:70%;">(</span><span id="lstnumberx544.8" style="font-size:70%;">actual</span> <span id="lstnumberx544.10" style="font-size:70%;">Python</span> <span id="lstnumberx544.12" style="font-size:70%;">code</span><span id="lstnumberx544.13" style="font-size:70%;">)</span> </span><span id="lstnumberx545"><span id="lstnumberx545.1" style="font-size:70%;">-</span> <span id="lstnumberx545.3" style="font-size:70%;">Tool</span> <span id="lstnumberx545.5" style="font-size:70%;">YAML</span> <span id="lstnumberx545.7" style="font-size:70%;">definitions</span> <span id="lstnumberx545.9" style="font-size:70%;">(</span><span id="lstnumberx545.10" style="font-size:70%;">actual</span> <span id="lstnumberx545.12" style="font-size:70%;">schemas</span><span id="lstnumberx545.13" style="font-size:70%;">)</span> </span><span id="lstnumberx546"><span id="lstnumberx546.1" style="font-size:70%;">-</span> <span id="lstnumberx546.3" style="font-size:70%;">Agent</span> <span id="lstnumberx546.5" style="font-size:70%;">config</span> <span id="lstnumberx546.7" style="font-size:70%;">files</span> <span id="lstnumberx546.9" style="font-size:70%;">(</span><span id="lstnumberx546.10" style="font-size:70%;">actual</span> <span id="lstnumberx546.12" style="font-size:70%;">YAML</span><span id="lstnumberx546.13" style="font-size:70%;">)</span> </span><span id="lstnumberx548"><span id="lstnumberx548.1" style="font-size:70%;">###</span> <span id="lstnumberx548.3" style="font-size:70%;">Section</span> <span id="lstnumberx548.5" style="font-size:70%;">5.</span><span id="lstnumberx548.7" style="font-size:70%;">Negative</span> <span id="lstnumberx548.9" style="font-size:70%;">Results</span> <span id="lstnumberx548.11" style="font-size:70%;">&amp;</span> <span id="lstnumberx548.13" style="font-size:70%;">Failed</span> <span id="lstnumberx548.15" style="font-size:70%;">Techniques</span> </span><span id="lstnumberx550"><span id="lstnumberx550.1" style="font-size:70%;">What</span> <span id="lstnumberx550.3" style="font-size:70%;">did</span> <span id="lstnumberx550.5" style="font-size:70%;">top</span> <span id="lstnumberx550.7" style="font-size:70%;">teams</span> <span id="lstnumberx550.9" style="font-size:70%;">try</span> <span id="lstnumberx550.11" style="font-size:70%;">that</span> <span id="lstnumberx550.13" style="font-size:70%;">DIDN</span> <span id="lstnumberx550.14" style="font-size:70%;">'</span> <span id="lstnumberx550.15" style="font-size:70%;">T</span> <span id="lstnumberx550.17" style="font-size:70%;">work</span><span id="lstnumberx550.18" style="font-size:70%;">?</span></span> <span id="lstnumberx551"><span id="lstnumberx551.1" style="font-size:70%;">-</span> <span id="lstnumberx551.3" style="font-size:70%;">Techniques</span> <span id="lstnumberx551.5" style="font-size:70%;">that</span> <span id="lstnumberx551.7" style="font-size:70%;">were</span> <span id="lstnumberx551.9" style="font-size:70%;">attempted</span> <span id="lstnumberx551.11" style="font-size:70%;">and</span> <span id="lstnumberx551.13" style="font-size:70%;">rolled</span> <span id="lstnumberx551.15" style="font-size:70%;">back</span> </span><span id="lstnumberx552"><span id="lstnumberx552.1" style="font-size:70%;">-</span> <span id="lstnumberx552.3" style="font-size:70%;">Ablations</span> <span id="lstnumberx552.5" style="font-size:70%;">showing</span> <span id="lstnumberx552.7" style="font-size:70%;">certain</span> <span id="lstnumberx552.9" style="font-size:70%;">changes</span> <span id="lstnumberx552.11" style="font-size:70%;">hurt</span> <span id="lstnumberx552.13" style="font-size:70%;">performance</span> </span><span id="lstnumberx553"><span id="lstnumberx553.1" style="font-size:70%;">-</span> <span id="lstnumberx553.3" style="font-size:70%;">Common</span> <span id="lstnumberx553.5" style="font-size:70%;">pitfalls</span> <span id="lstnumberx553.7" style="font-size:70%;">documented</span> <span id="lstnumberx553.9" style="font-size:70%;">by</span> <span id="lstnumberx553.11" style="font-size:70%;">teams</span> </span><span id="lstnumberx555"><span id="lstnumberx555.1" style="font-size:70%;">###</span> <span id="lstnumberx555.3" style="font-size:70%;">Section</span> <span id="lstnumberx555.5" style="font-size:70%;">6.</span><span id="lstnumberx555.7" style="font-size:70%;">Architecture</span> <span id="lstnumberx555.9" style="font-size:70%;">Patterns</span> <span id="lstnumberx555.11" style="font-size:70%;">&amp;</span> <span id="lstnumberx555.13" style="font-size:70%;">Design</span> <span id="lstnumberx555.15" style="font-size:70%;">Principles</span> </span><span id="lstnumberx557"><span id="lstnumberx557.1" style="font-size:70%;">Synthesize</span> <span id="lstnumberx557.3" style="font-size:70%;">the</span> <span id="lstnumberx557.5" style="font-size:70%;">common</span> <span id="lstnumberx557.7" style="font-size:70%;">patterns</span> <span id="lstnumberx557.9" style="font-size:70%;">across</span> <span id="lstnumberx557.11" style="font-size:70%;">top</span> <span id="lstnumberx557.13" style="font-size:70%;">teams</span><span id="lstnumberx557.14" style="font-size:70%;">:</span></span> <span id="lstnumberx558"><span id="lstnumberx558.1" style="font-size:70%;">-</span> <span id="lstnumberx558.3" style="font-size:70%;">**</span> <span id="lstnumberx558.4" style="font-size:70%;">Component</span> <span id="lstnumberx558.6" style="font-size:70%;">blueprint</span> <span id="lstnumberx558.7" style="font-size:70%;">**:</span><span id="lstnumberx558.9" style="font-size:70%;">What</span> <span id="lstnumberx558.11" style="font-size:70%;">categories</span> <span id="lstnumberx558.13" style="font-size:70%;">of</span> <span id="lstnumberx558.15" style="font-size:70%;">components</span> <span id="lstnumberx558.17" style="font-size:70%;">do</span> <span id="lstnumberx558.19" style="font-size:70%;">top</span> <span id="lstnumberx558.21" style="font-size:70%;">agents</span> <span id="lstnumberx558.23" style="font-size:70%;">have</span><span id="lstnumberx558.24" style="font-size:70%;">?</span></span> <span id="lstnumberx559"><span id="lstnumberx559.1" style="font-size:70%;">-</span> <span id="lstnumberx559.3" style="font-size:70%;">**</span> <span id="lstnumberx559.4" style="font-size:70%;">Constraint</span> <span id="lstnumberx559.6" style="font-size:70%;">hierarchy</span> <span id="lstnumberx559.7" style="font-size:70%;">**:</span><span id="lstnumberx559.9" style="font-size:70%;">Which</span> <span id="lstnumberx559.11" style="font-size:70%;">enforcement</span> <span id="lstnumberx559.13" style="font-size:70%;">mechanisms</span> <span id="lstnumberx559.15" style="font-size:70%;">are</span> <span id="lstnumberx559.17" style="font-size:70%;">strongest</span><span id="lstnumberx559.18" style="font-size:70%;">?</span><span id="lstnumberx559.20" style="font-size:70%;">(</span><span id="lstnumberx559.21" style="font-size:70%;">e</span><span id="lstnumberx559.22" style="font-size:70%;">.</span><span id="lstnumberx559.23" style="font-size:70%;">g</span><span id="lstnumberx559.24" style="font-size:70%;">.,</span><span id="lstnumberx559.26" style="font-size:70%;">tool_impl</span> <span id="lstnumberx559.28" style="font-size:70%;">&gt;</span> <span id="lstnumberx559.30" style="font-size:70%;">middleware</span> <span id="lstnumberx559.32" style="font-size:70%;">&gt;</span> <span id="lstnumberx559.34" style="font-size:70%;">tool_desc</span> <span id="lstnumberx559.36" style="font-size:70%;">&gt;</span> <span id="lstnumberx559.38" style="font-size:70%;">skill</span> <span id="lstnumberx559.40" style="font-size:70%;">&gt;</span> <span id="lstnumberx559.42" style="font-size:70%;">system_prompt</span><span id="lstnumberx559.43" style="font-size:70%;">)</span> </span><span id="lstnumberx560"><span id="lstnumberx560.1" style="font-size:70%;">-</span> <span id="lstnumberx560.3" style="font-size:70%;">**</span> <span id="lstnumberx560.4" style="font-size:70%;">Gap</span> <span id="lstnumberx560.6" style="font-size:70%;">analysis</span> <span id="lstnumberx560.7" style="font-size:70%;">**:</span><span id="lstnumberx560.9" style="font-size:70%;">How</span> <span id="lstnumberx560.11" style="font-size:70%;">to</span> <span id="lstnumberx560.13" style="font-size:70%;">identify</span> <span id="lstnumberx560.15" style="font-size:70%;">what</span> <span id="lstnumberx560.16" style="font-size:70%;">'</span> <span id="lstnumberx560.17" style="font-size:70%;">s</span> <span id="lstnumberx560.19" style="font-size:70%;">missing</span> <span id="lstnumberx560.21" style="font-size:70%;">in</span> <span id="lstnumberx560.23" style="font-size:70%;">an</span> <span id="lstnumberx560.25" style="font-size:70%;">agent</span> <span id="lstnumberx560.27" style="font-size:70%;">harness</span> <span id="lstnumberx560.29" style="font-size:70%;">--</span> <span id="lstnumberx560.31" style="font-size:70%;">map</span> <span id="lstnumberx560.33" style="font-size:70%;">failure</span> <span id="lstnumberx560.35" style="font-size:70%;">patterns</span> <span id="lstnumberx560.37" style="font-size:70%;">to</span> <span id="lstnumberx560.39" style="font-size:70%;">component</span> <span id="lstnumberx560.41" style="font-size:70%;">categories</span><span id="lstnumberx560.42" style="font-size:70%;">,</span><span id="lstnumberx560.44" style="font-size:70%;">classify</span> <span id="lstnumberx560.46" style="font-size:70%;">as</span> <span id="lstnumberx560.48" style="font-size:70%;">PATCH</span> <span id="lstnumberx560.50" style="font-size:70%;">vs</span> <span id="lstnumberx560.52" style="font-size:70%;">CREATE</span><span id="lstnumberx560.53" style="font-size:70%;">.</span></span> <span id="lstnumberx563"><span id="lstnumberx563.1" style="font-size:70%;">###</span> <span id="lstnumberx563.3" style="font-size:70%;">Section</span> <span id="lstnumberx563.5" style="font-size:70%;">7.</span><span id="lstnumberx563.7" style="font-size:70%;">Actionable</span> <span id="lstnumberx563.9" style="font-size:70%;">Recommendations</span> <span id="lstnumberx563.11" style="font-size:70%;">(</span><span id="lstnumberx563.12" style="font-size:70%;">with</span> <span id="lstnumberx563.14" style="font-size:70%;">implementation</span> <span id="lstnumberx563.16" style="font-size:70%;">specifics</span><span id="lstnumberx563.17" style="font-size:70%;">)</span> </span><span id="lstnumberx565"><span id="lstnumberx565.1" style="font-size:70%;">Top</span> <span id="lstnumberx565.3" style="font-size:70%;">10</span> <span id="lstnumberx565.5" style="font-size:70%;">concrete</span> <span id="lstnumberx565.7" style="font-size:70%;">improvements</span><span id="lstnumberx565.8" style="font-size:70%;">,</span><span id="lstnumberx565.10" style="font-size:70%;">each</span> <span id="lstnumberx565.12" style="font-size:70%;">with</span><span id="lstnumberx565.13" style="font-size:70%;">:</span></span> <span id="lstnumberx566"><span id="lstnumberx566.1" style="font-size:70%;">-</span> <span id="lstnumberx566.3" style="font-size:70%;">**</span> <span id="lstnumberx566.4" style="font-size:70%;">What</span> <span id="lstnumberx566.5" style="font-size:70%;">**:</span><span id="lstnumberx566.7" style="font-size:70%;">Exact</span> <span id="lstnumberx566.9" style="font-size:70%;">description</span> <span id="lstnumberx566.11" style="font-size:70%;">of</span> <span id="lstnumberx566.13" style="font-size:70%;">the</span> <span id="lstnumberx566.15" style="font-size:70%;">change</span> </span><span id="lstnumberx567"><span id="lstnumberx567.1" style="font-size:70%;">-</span> <span id="lstnumberx567.3" style="font-size:70%;">**</span> <span id="lstnumberx567.4" style="font-size:70%;">Why</span> <span id="lstnumberx567.5" style="font-size:70%;">**:</span><span id="lstnumberx567.7" style="font-size:70%;">Evidence</span> <span id="lstnumberx567.9" style="font-size:70%;">from</span> <span id="lstnumberx567.11" style="font-size:70%;">research</span> <span id="lstnumberx567.13" style="font-size:70%;">(</span><span id="lstnumberx567.14" style="font-size:70%;">cite</span> <span id="lstnumberx567.16" style="font-size:70%;">specific</span> <span id="lstnumberx567.18" style="font-size:70%;">scores</span> <span id="lstnumberx567.19" style="font-size:70%;">/</span> <span id="lstnumberx567.20" style="font-size:70%;">ablations</span><span id="lstnumberx567.21" style="font-size:70%;">)</span> </span><span id="lstnumberx568"><span id="lstnumberx568.1" style="font-size:70%;">-</span> <span id="lstnumberx568.3" style="font-size:70%;">**</span> <span id="lstnumberx568.4" style="font-size:70%;">How</span> <span id="lstnumberx568.6" style="font-size:70%;">(</span><span id="lstnumberx568.7" style="font-size:70%;">in</span> <span id="lstnumberx568.9" style="font-size:70%;">NexAU</span><span id="lstnumberx568.10" style="font-size:70%;">)**:</span><span id="lstnumberx568.12" style="font-size:70%;">Which</span> <span id="lstnumberx568.14" style="font-size:70%;">file</span> <span id="lstnumberx568.16" style="font-size:70%;">to</span> <span id="lstnumberx568.18" style="font-size:70%;">modify</span><span id="lstnumberx568.19" style="font-size:70%;">,</span><span id="lstnumberx568.21" style="font-size:70%;">what</span> <span id="lstnumberx568.23" style="font-size:70%;">code</span> <span id="lstnumberx568.25" style="font-size:70%;">to</span> <span id="lstnumberx568.27" style="font-size:70%;">write</span><span id="lstnumberx568.28" style="font-size:70%;">,</span><span id="lstnumberx568.30" style="font-size:70%;">what</span> <span id="lstnumberx568.32" style="font-size:70%;">config</span> <span id="lstnumberx568.34" style="font-size:70%;">to</span> <span id="lstnumberx568.36" style="font-size:70%;">set</span> </span><span id="lstnumberx569"><span id="lstnumberx569.1" style="font-size:70%;">-</span> <span id="lstnumberx569.3" style="font-size:70%;">**</span> <span id="lstnumberx569.4" style="font-size:70%;">Expected</span> <span id="lstnumberx569.6" style="font-size:70%;">impact</span> <span id="lstnumberx569.7" style="font-size:70%;">**:</span><span id="lstnumberx569.9" style="font-size:70%;">Based</span> <span id="lstnumberx569.11" style="font-size:70%;">on</span> <span id="lstnumberx569.13" style="font-size:70%;">published</span> <span id="lstnumberx569.15" style="font-size:70%;">data</span> </span><span id="lstnumberx570"><span id="lstnumberx570.1" style="font-size:70%;">-</span> <span id="lstnumberx570.3" style="font-size:70%;">**</span> <span id="lstnumberx570.4" style="font-size:70%;">Risk</span> <span id="lstnumberx570.5" style="font-size:70%;">**:</span><span id="lstnumberx570.7" style="font-size:70%;">What</span> <span id="lstnumberx570.9" style="font-size:70%;">could</span> <span id="lstnumberx570.11" style="font-size:70%;">go</span> <span id="lstnumberx570.13" style="font-size:70%;">wrong</span><span id="lstnumberx570.14" style="font-size:70%;">,</span><span id="lstnumberx570.16" style="font-size:70%;">based</span> <span id="lstnumberx570.18" style="font-size:70%;">on</span> <span id="lstnumberx570.20" style="font-size:70%;">negative</span> <span id="lstnumberx570.22" style="font-size:70%;">results</span> </span><span id="lstnumberx572"><span id="lstnumberx572.1" style="font-size:70%;">Target</span> <span id="lstnumberx572.3" style="font-size:70%;">length</span><span id="lstnumberx572.4" style="font-size:70%;">:</span><span id="lstnumberx572.6" style="font-size:70%;">**400-800</span> <span id="lstnumberx572.8" style="font-size:70%;">lines</span> <span id="lstnumberx572.9" style="font-size:70%;">**.</span></span> <span id="lstnumberx574"><span id="lstnumberx574.1" style="font-size:70%;">#</span> <span id="lstnumberx574.3" style="font-size:70%;">Quality</span> <span id="lstnumberx574.5" style="font-size:70%;">Criteria</span> </span><span id="lstnumberx576"><span id="lstnumberx576.1" style="font-size:70%;">The</span> <span id="lstnumberx576.3" style="font-size:70%;">skill</span> <span id="lstnumberx576.5" style="font-size:70%;">file</span> <span id="lstnumberx576.7" style="font-size:70%;">MUST</span><span id="lstnumberx576.8" style="font-size:70%;">:</span></span> <span id="lstnumberx577"><span id="lstnumberx577.1" style="font-size:70%;">1.</span><span id="lstnumberx577.3" style="font-size:70%;">Start</span> <span id="lstnumberx577.5" style="font-size:70%;">with</span> <span id="lstnumberx577.7" style="font-size:70%;">valid</span> <span id="lstnumberx577.9" style="font-size:70%;">YAML</span> <span id="lstnumberx577.11" style="font-size:70%;">frontmatter</span> </span><span id="lstnumberx578"><span id="lstnumberx578.1" style="font-size:70%;">2.</span><span id="lstnumberx578.3" style="font-size:70%;">Cite</span> <span id="lstnumberx578.5" style="font-size:70%;">source</span> <span id="lstnumberx578.7" style="font-size:70%;">URLs</span> <span id="lstnumberx578.9" style="font-size:70%;">for</span> <span id="lstnumberx578.11" style="font-size:70%;">every</span> <span id="lstnumberx578.13" style="font-size:70%;">factual</span> <span id="lstnumberx578.15" style="font-size:70%;">claim</span> </span><span id="lstnumberx579"><span id="lstnumberx579.1" style="font-size:70%;">3.</span><span id="lstnumberx579.3" style="font-size:70%;">Include</span> <span id="lstnumberx579.5" style="font-size:70%;">exact</span> <span id="lstnumberx579.7" style="font-size:70%;">numbers</span> <span id="lstnumberx579.9" style="font-size:70%;">--</span> <span id="lstnumberx579.11" style="font-size:70%;">NO</span> <span id="lstnumberx579.13" style="font-size:70%;">vague</span> <span id="lstnumberx579.15" style="font-size:70%;">descriptions</span> </span><span id="lstnumberx580"><span id="lstnumberx580.1" style="font-size:70%;">4.</span><span id="lstnumberx580.3" style="font-size:70%;">Include</span> <span id="lstnumberx580.5" style="font-size:70%;">actual</span> <span id="lstnumberx580.7" style="font-size:70%;">code</span> <span id="lstnumberx580.8" style="font-size:70%;">/</span> <span id="lstnumberx580.9" style="font-size:70%;">config</span> <span id="lstnumberx580.11" style="font-size:70%;">snippets</span> <span id="lstnumberx580.13" style="font-size:70%;">from</span> <span id="lstnumberx580.15" style="font-size:70%;">real</span> <span id="lstnumberx580.17" style="font-size:70%;">agents</span> <span id="lstnumberx580.19" style="font-size:70%;">(</span><span id="lstnumberx580.20" style="font-size:70%;">not</span> <span id="lstnumberx580.22" style="font-size:70%;">fabricated</span><span id="lstnumberx580.23" style="font-size:70%;">)</span> </span><span id="lstnumberx581"><span id="lstnumberx581.1" style="font-size:70%;">5.</span><span id="lstnumberx581.3" style="font-size:70%;">Flag</span> <span id="lstnumberx581.5" style="font-size:70%;">uncertainty</span><span id="lstnumberx581.6" style="font-size:70%;">:</span><span id="lstnumberx581.8" style="font-size:70%;">"</span> <span id="lstnumberx581.9" style="font-size:70%;">UNVERIFIED</span><span id="lstnumberx581.10" style="font-size:70%;">:</span><span id="lstnumberx581.12" style="font-size:70%;">..."</span> <span id="lstnumberx581.14" style="font-size:70%;">or</span> <span id="lstnumberx581.16" style="font-size:70%;">"</span> <span id="lstnumberx581.17" style="font-size:70%;">NO</span> <span id="lstnumberx581.19" style="font-size:70%;">DATA</span> <span id="lstnumberx581.20" style="font-size:70%;">"</span> <span id="lstnumberx581.22" style="font-size:70%;">for</span> <span id="lstnumberx581.24" style="font-size:70%;">unconfirmed</span> <span id="lstnumberx581.26" style="font-size:70%;">claims</span> </span><span id="lstnumberx582"><span id="lstnumberx582.1" style="font-size:70%;">6.</span><span id="lstnumberx582.3" style="font-size:70%;">Cover</span> <span id="lstnumberx582.5" style="font-size:70%;">both</span> <span id="lstnumberx582.7" style="font-size:70%;">high</span> <span id="lstnumberx582.8" style="font-size:70%;">-</span> <span id="lstnumberx582.9" style="font-size:70%;">level</span> <span id="lstnumberx582.11" style="font-size:70%;">design</span> <span id="lstnumberx582.13" style="font-size:70%;">patterns</span> <span id="lstnumberx582.15" style="font-size:70%;">AND</span> <span id="lstnumberx582.17" style="font-size:70%;">concrete</span> <span id="lstnumberx582.19" style="font-size:70%;">implementation</span> <span id="lstnumberx582.21" style="font-size:70%;">details</span> </span><span id="lstnumberx583"><span id="lstnumberx583.1" style="font-size:70%;">7.</span><span id="lstnumberx583.3" style="font-size:70%;">Be</span> <span id="lstnumberx583.5" style="font-size:70%;">directly</span> <span id="lstnumberx583.7" style="font-size:70%;">implementable</span><span id="lstnumberx583.8" style="font-size:70%;">:</span><span id="lstnumberx583.10" style="font-size:70%;">an</span> <span id="lstnumberx583.12" style="font-size:70%;">Evolution</span> <span id="lstnumberx583.14" style="font-size:70%;">Agent</span> <span id="lstnumberx583.16" style="font-size:70%;">should</span> <span id="lstnumberx583.18" style="font-size:70%;">be</span> <span id="lstnumberx583.20" style="font-size:70%;">able</span> <span id="lstnumberx583.22" style="font-size:70%;">to</span> <span id="lstnumberx583.24" style="font-size:70%;">copy</span> <span id="lstnumberx583.26" style="font-size:70%;">configs</span> <span id="lstnumberx583.27" style="font-size:70%;">/</span> <span id="lstnumberx583.28" style="font-size:70%;">code</span> <span id="lstnumberx583.30" style="font-size:70%;">from</span> <span id="lstnumberx583.32" style="font-size:70%;">this</span> <span id="lstnumberx583.34" style="font-size:70%;">skill</span> </span><span id="lstnumberx585"><span id="lstnumberx585.1" style="font-size:70%;">When</span> <span id="lstnumberx585.3" style="font-size:70%;">done</span><span id="lstnumberx585.4" style="font-size:70%;">,</span><span id="lstnumberx585.6" style="font-size:70%;">call</span> <span id="lstnumberx585.8" style="font-size:70%;">`</span> <span id="lstnumberx585.9" style="font-size:70%;">complete_task</span> <span id="lstnumberx585.10" style="font-size:70%;">`.</span></span></span></span></foreignObject></g></g></svg>

## Appendix C Qualitative Case Study

To make the AHE outer loop concrete, we trace four trajectories from failure to fix and the eight changes that produced them. The four trajectories correspond to the four peaks in the best-so-far curve of Figure 1: trajectory 1 to peak 1 at iteration 2, trajectory 2 to peak 2 at iteration 5, trajectory 3 to peak 3 at iteration 6, and trajectory 4 to peak 4 at iteration 8. We split the case study into two parts. Section C.1 narrates the failing-versus-passing rollouts for each of the four trajectories. Section C.2 documents the chg-\* manifest entries shipped by the Evolve Agent on each of the four winning rounds. Trajectory visualizations for trajectories 1 and 3 appear in Figures 5 and 6; the four manifest figures appear in Figures 7, 8, 9, and 10. Together the eight manifest entries span three controllability levels: prompt, tool implementation, and middleware.

### C.1 Trajectories: failing versus passing rollouts

#### C.1.1 Trajectory 1: db-wal-recovery

##### The task.

db-wal-recovery asks the agent to reconstruct a SQLite database from a corrupted write-ahead log file, abbreviated WAL, by applying both new-row inserts and value updates encoded in the WAL, and to emit the reconstructed table as /app/recovered.json. The verifier is exact: it loads the JSON and asserts every row’s fields against a known ground truth, including updated values on pre-existing rows.

##### Trajectory before and after the iteration-2 changes.

On the NexAU <sub>0</sub> seed the task passed 1 of 2 rollouts. The failing rollout, summarized in the left column of Figure 5, recovered the WAL bytes from a stale shell buffer, invented the missing rows from a guessed pattern, missed that the WAL also encoded mutations to pre-existing rows, and submitted on a self-check that only counted entries. The Agent Debugger grouped this failure under the broader pattern “proxy validation instead of evaluator-isomorphic validation”, where the rollout closes on a surrogate check such as row count, file exists, or script runs rather than on the evaluator’s exact assertions. After the iteration-2 changes are installed, four of the eight new rules fire on this trajectory and are listed in the middle column of Figure 5, each mapped left to the failure step it catches and right to the corresponding step in the passing rollout. The contract-first rule reroutes the agent off the cached-stdout shortcut and forces a re-read of the spec that recasts “WAL changes” as mutations of existing rows. The no-overfit rule blocks the value = id times 100 extrapolation from 5 visible samples. The mirror-the-evaluator rule replaces the json length == 11 self-check with an end-state sweep that asserts the same fields the hidden verifier asserts. db-wal-recovery then passes 2/2 on the next evaluation and remains 2/2 across every subsequent iteration of the run. The Evolve Agent’s predicted\_fixes field for chg-1 did not list db-wal-recovery; the edit was proposed for a different cluster of partial-pass tasks, yet its general phrasing carried it across, illustrating how AHE converts a single-task symptom into a reusable harness rule.

<svg id="A3.F5.pic1" height="551.12" overflow="visible" version="1.1" viewBox="0 0 600 551.12" width="600"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,551.12) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#808080;" fill="#808080" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.63 L 0 546.49 C 0 549.05 2.07 551.12 4.63 551.12 L 595.37 551.12 C 597.93 551.12 600 549.05 600 546.49 L 600 4.63 C 600 2.07 597.93 0 595.37 0 L 4.63 0 C 2.07 0 0 2.07 0 4.63 Z"></path></g><g style="--ltx-fill-color:#F7F7F7;" fill="#F7F7F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.69 4.63 L 0.69 37.09 L 599.31 37.09 L 599.31 4.63 C 599.31 2.45 597.55 0.69 595.37 0.69 L 4.63 0.69 C 2.45 0.69 0.69 2.45 0.69 4.63 Z"></path></g><g style="--ltx-fill-color:#737373;" fill="#737373" fill-opacity="1.0"><path style="stroke:none" d="M 0.69 37.78 L 0.69 546.49 C 0.69 548.67 2.45 550.43 4.63 550.43 L 595.37 550.43 C 597.55 550.43 599.31 548.67 599.31 546.49 L 599.31 37.78 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.16 539.85)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:52.55em;--ltx-fo-height:0.6em;--ltx-fo-depth:45.16em;" width="579.67" height="504.78" transform="matrix(1 0 0 -1 0 6.65)" overflow="visible" color="#FFFFFF"><span id="A3.F5.pic1.2.2.2.1.1" style="width:46.23em;"><span id="A3.F5.pic1.2.2.2.1.1.1"><span id="A3.F5.pic1.2.2.2.1.1.1.1" style="font-size:70%;">Shared prefix, both rollouts, same random seed</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.16 10.66)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:52.55em;--ltx-fo-height:1.66em;--ltx-fo-depth:0.17em;" width="579.67" height="20.22" transform="matrix(1 0 0 -1 0 18.33)" overflow="visible" color="#000000"><span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="width:52.55em;"><span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="font-size:70%;">S1.</span> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">ls /app</span> <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5" style="font-size:70%;">main.db</span><span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6" style="font-size:70%;">,</span> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7" style="font-size:70%;">main.db-wal</span> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8" style="font-size:70%;">|  S2.</span> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.9" style="font-size:70%;">xxd /app/main.db-wal</span> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.10" style="font-size:70%;">reveals an</span> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.11" style="font-size:70%;">0x42</span> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.12" style="font-size:70%;">XOR pattern   |  S3.&nbsp;First</span> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.13" style="font-size:70%;">sqlite3</span> <span id="A3.F5.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.14" style="font-size:70%;">call auto-checkpoints, the WAL file silently disappears</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F5.pic2" height="1116.85" overflow="visible" style="vertical-align:-558.43px" version="1.1" viewBox="0 0 196.31 1116.85" width="196.31"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,1116.85) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 1112.08 C 0 1114.72 2.13 1116.85 4.77 1116.85 L 191.54 1116.85 C 194.18 1116.85 196.31 1114.72 196.31 1112.08 L 196.31 4.77 C 196.31 2.13 194.18 0 191.54 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F3FAFE;" fill="#F3FAFE" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 602.54 L 195.48 602.54 L 195.48 4.77 C 195.48 2.59 193.72 0.83 191.54 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 603.37 L 0.83 1112.08 C 0.83 1114.26 2.59 1116.02 4.77 1116.02 L 191.54 1116.02 C 193.72 1116.02 195.48 1114.26 195.48 1112.08 L 195.48 603.37 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 1105.44)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:15.93em;--ltx-fo-height:0.6em;--ltx-fo-depth:45.16em;" width="175.71" height="504.78" transform="matrix(1 0 0 -1 0 6.65)" overflow="visible" color="#FFFFFF"><span id="A3.F5.pic2.4.4.4.1.1" style="width:14.01em;"><span id="A3.F5.pic2.4.4.4.1.1.1"><span id="A3.F5.pic2.4.4.4.1.1.1.1" style="font-size:70%;">Before <span id="A3.F5.pic2.4.4.4.1.1.1.1.1">chg-1</span>, NexAU <sub id="A3.F5.pic2.4.4.4.1.1.1.1.2">0</sub> seed, iteration 1, reward 0.0</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 10.8)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:15.93em;--ltx-fo-height:52.91em;--ltx-fo-depth:0.17em;" width="175.71" height="585.53" transform="matrix(1 0 0 -1 0 583.65)" overflow="visible" color="#000000"><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3" style="width:15.93em;"><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4"><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Divergence: invent the missing rows</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5"><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.1" style="font-size:70%;">F1.&nbsp;XORs the</span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.2" style="font-size:70%;">cached</span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.4" style="font-size:70%;">xxd</span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.5" style="font-size:70%;">stdout, raw WAL bytes are already gone</span></span> <span id="A3.F5.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><span id="A3.F5.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">F2.&nbsp;Reads the 5 visible rows, then</span> <span id="A3.F5.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3" style="font-size:70%;">assumes</span> <span id="A3.F5.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4" style="font-size:70%;">the missing rows follow</span> <span id="A3.F5.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="font-size:70%;">value = id <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\times"><semantics><mo>×</mo> <annotation encoding="application/x-tex">\times</annotation></semantics></math> 100</span></span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6"><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.1" style="font-size:70%;">F3.</span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.2" style="font-size:70%;">INSERT OR REPLACE</span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.3" style="font-size:70%;">rows 6 to 11 with guessed values, writes</span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.4" style="font-size:70%;">recovered.json</span></span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7"><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.1" style="font-size:70%;">F4.&nbsp;Self-check</span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.2" style="font-size:70%;">json length == 11</span><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.3" style="font-size:70%;">, returns yes, stops here</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8"><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Outcome</span></span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.9"><span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.9.1" style="font-size:70%;">Submitted values:</span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.9.2" style="font-size:70%;">100, 200, 300, …, 1100</span></span> <span id="A3.F5.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><span id="A3.F5.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.1" style="font-size:70%;">Hidden verifier asserts</span> <span id="A3.F5.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="font-size:70%;">value == 150</span> <span id="A3.F5.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3" style="font-size:70%;">on</span> <span id="A3.F5.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4" style="font-size:70%;">id == 1</span> <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F5.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.6" style="font-size:70%;">AssertionError</span></span> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F5.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.2" style="font-size:70%;">2 of 7 tests fail, reward 0</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F5.pic3" height="834.68" overflow="visible" style="vertical-align:-417.34px" version="1.1" viewBox="0 0 196.31 834.68" width="196.31"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,834.68) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#595959;" fill="#595959" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 829.91 C 0 832.54 2.13 834.68 4.77 834.68 L 191.54 834.68 C 194.18 834.68 196.31 832.54 196.31 829.91 L 196.31 4.77 C 196.31 2.13 194.18 0 191.54 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F7F7F7;" fill="#F7F7F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 496.69 L 195.48 496.69 L 195.48 4.77 C 195.48 2.59 193.72 0.83 191.54 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#404040;" fill="#404040" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 497.52 L 0.83 829.91 C 0.83 832.08 2.59 833.85 4.77 833.85 L 191.54 833.85 C 193.72 833.85 195.48 832.08 195.48 829.91 L 195.48 497.52 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 825.74)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:15.93em;--ltx-fo-height:0.38em;--ltx-fo-depth:29.4em;" width="175.71" height="328.46" transform="matrix(1 0 0 -1 0 4.17)" overflow="visible" color="#FFFFFF"><span id="A3.F5.pic3.5.5.5.1.1" style="width:14.01em;"><span id="A3.F5.pic3.5.5.5.1.1.1"><span id="A3.F5.pic3.5.5.5.1.1.1.1" style="font-size:70%;">chg-1 <span id="A3.F5.pic3.5.5.5.1.1.1.1.1">rules that close each gap</span></span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 8.92)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:15.93em;--ltx-fo-height:43.48em;--ltx-fo-depth:0em;" width="175.71" height="479.68" transform="matrix(1 0 0 -1 0 479.68)" overflow="visible" color="#000000"><span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4" style="width:15.93em;"><span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.5"><span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.5.1" style="font-size:70%;--ltx-fg-color:#262626;">R1.&nbsp;Contract first.</span> <span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.5.2" style="font-size:70%;">Tests and verifier scripts are the source of truth, not shell history.</span></span> <span id="A3.F5.pic3.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo style="--ltx-fg-color:#666666;" mathcolor="#666666" mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F5.pic3.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="font-size:70%;--ltx-fg-color:#666666;">catches F1: cached stdout is not the contract.</span></span> <span style="width:433.6pt;height:0.2pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.6"><span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.6.1" style="font-size:70%;--ltx-fg-color:#262626;">R5.&nbsp;Generalize, do not overfit visible samples.</span></span> <span id="A3.F5.pic3.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo style="--ltx-fg-color:#666666;" mathcolor="#666666" mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F5.pic3.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.1" style="font-size:70%;--ltx-fg-color:#666666;">catches F2: 5 rows are too few to infer the missing 6.</span></span> <span style="width:433.6pt;height:0.2pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.7"><span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.7.1" style="font-size:70%;--ltx-fg-color:#262626;">R1, second clause.</span> <span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.7.2" style="font-size:70%;">The contract names forbidden extras and multiple-answer requirements.</span></span> <span id="A3.F5.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo style="--ltx-fg-color:#666666;" mathcolor="#666666" mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F5.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.1" style="font-size:70%;--ltx-fg-color:#666666;">catches F3: rereading the spec exposes “WAL changes” as mutations of existing rows.</span></span> <span style="width:433.6pt;height:0.2pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8"><span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.1" style="font-size:70%;--ltx-fg-color:#262626;">R2 + R8.&nbsp;Mirror the evaluator before finishing.</span> <span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.2" style="font-size:70%;">Run an end-state acceptance sweep, trust the failing check over a theory, do not substitute a self-invented proxy metric.</span></span> <span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo style="--ltx-fg-color:#666666;" mathcolor="#666666" mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F5.pic3.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.1" style="font-size:70%;--ltx-fg-color:#666666;">catches F4: row count is not the verifier’s check.</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F5.pic4" height="1349.31" overflow="visible" style="vertical-align:-674.66px" version="1.1" viewBox="0 0 196.31 1349.31" width="196.31"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,1349.31) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 1344.54 C 0 1347.18 2.13 1349.31 4.77 1349.31 L 191.54 1349.31 C 194.18 1349.31 196.31 1347.18 196.31 1344.54 L 196.31 4.77 C 196.31 2.13 194.18 0 191.54 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#E6F0F7;" fill="#E6F0F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 868.21 L 195.48 868.21 L 195.48 4.77 C 195.48 2.59 193.72 0.83 191.54 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 869.04 L 0.83 1344.54 C 0.83 1346.72 2.59 1348.48 4.77 1348.48 L 191.54 1348.48 C 193.72 1348.48 195.48 1346.72 195.48 1344.54 L 195.48 869.04 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 1337.9)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:15.93em;--ltx-fo-height:0.6em;--ltx-fo-depth:42.15em;" width="175.71" height="471.57" transform="matrix(1 0 0 -1 0 6.65)" overflow="visible" color="#FFFFFF"><span id="A3.F5.pic4.5.5.5.1.1" style="width:14.01em;"><span id="A3.F5.pic4.5.5.5.1.1.1"><span id="A3.F5.pic4.5.5.5.1.1.1.1" style="font-size:70%;">After <span id="A3.F5.pic4.5.5.5.1.1.1.1.1">chg-1</span>, same seed, iteration 2, reward 1.0</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 10.8)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:15.93em;--ltx-fo-height:76.99em;--ltx-fo-depth:0.17em;" width="175.71" height="851.2" transform="matrix(1 0 0 -1 0 849.32)" overflow="visible" color="#000000"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4" style="width:15.93em;"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.5"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.5.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Divergence: re-read the contract, recover the bytes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.6"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.6.1" style="font-size:70%;">P1.&nbsp;Re-reads task spec verbatim, treats “WAL changes” as</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.6.2" style="font-size:70%;">mutations of existing rows</span></span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.7"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.7.1" style="font-size:70%;">P2.</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.7.2" style="font-size:70%;">find / -name "*.wal"</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.7.3" style="font-size:70%;">returns empty, switches to raw-disk recovery</span></span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.1" style="font-size:70%;">P3.&nbsp;Carves</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.2" style="font-size:70%;">/dev/vda</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.3" style="font-size:70%;">at block 203050, XORs with</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.4" style="font-size:70%;">0x42</span><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.5" style="font-size:70%;">, writes back</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.6" style="font-size:70%;">/app/main.db-wal</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.7" style="font-size:70%;">with valid magic</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.8.8" style="font-size:70%;">377f0682</span></span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.9"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.9.1" style="font-size:70%;">P4.</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.9.2" style="font-size:70%;">sqlite3</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.9.3" style="font-size:70%;">now reports 11 rows with</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.9.4" style="font-size:70%;">value = 150, 250, 300, …</span></span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.10"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.10.1" style="font-size:70%;">P5.&nbsp;Final acceptance sweep mirrors the verifier:</span></span> <span id="A3.F5.pic4.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F5.pic4.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3" style="font-size:70%;">wal_magic == 377f0682</span></span> <span id="A3.F5.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F5.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3" style="font-size:70%;">json length == 11</span><span id="A3.F5.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4" style="font-size:70%;">,</span> <span id="A3.F5.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5" style="font-size:70%;">sorted ids == 1..11</span></span> <span id="A3.F5.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F5.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3" style="font-size:70%;">json rows == db rows</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.11"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.11.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Outcome</span></span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.12"><span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.12.1" style="font-size:70%;">Submitted values:</span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.12.2" style="font-size:70%;">150, 250, 300, …, 1100</span></span> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F5.pic4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.4.2" style="font-size:70%;">7 of 7 tests pass, reward 1</span></span></span></foreignObject></g></g></svg>

Figure 5: Three-column trajectory comparison for db-wal-recovery before and after chg-1. Both rollouts share the same random seed and the same first three steps S1 to S3, summarized in the banner above the columns. The left column lists the four divergence steps F1 to F4 of the failing rollout. The middle column lists the four chg-1 rules out of eight that fire on this trajectory, each annotated with the failure step it catches. The right column lists the corresponding steps P1 to P5 of the passing rollout. Each F to R to P chain reads across one row of the figure: a failure mode, the rule that names and forbids that failure mode, and the step the rule produces in the passing rollout. chg-1 is a 68-line append to workspace/systemprompt.md with no mention of SQLite, WAL, or db-wal-recovery; the full manifest entry appears in Figure 7.

#### C.1.2 Trajectory 2: path-tracing

The first trajectory shows a single round of evolution flipping one task. The second shows how the iteration-5 round, which targeted a cross-task “post-validation state destruction” regression, raised the score on tasks the evolve agent had not necessarily named, including path-tracing.

##### The task.

path-tracing asks the agent to implement a path tracer that renders a scene description into /app/reconstructed.ppm. The verifier reads that single output file and compares it pixel-for-pixel against a reference image; nothing else in the working tree is read.

##### Trajectory before and after the iteration-5 changes.

At iteration 4 the task scored 0/2. The shared failure mode in both rollouts was a four-step sequence: the agent rendered a correct /app/reconstructed.ppm, ran a self-check that confirmed the image matched a structural acceptance criterion, then issued a sweeping cleanup command of the form rm -rf /app/image /app/reconstructed.ppm /app/scratch as a final tidy-up step, and submitted on the shell exit code of that cleanup. The verifier subsequently found no reconstructed.ppm on disk and rejected the rollout. The seed harness’s prompt advice against “destroying verified state” was already present, but no execution-time mechanism enforced it. At iteration 5 path-tracing flips from 0/2 to 2/2. In both passing rollouts the agent reaches the same render-and-self-check state as before, then issues the cleanup; the shell guard intercepts it with a message naming /app/reconstructed.ppm as protected, the agent acknowledges the message and finishes without rerunning the cleanup, and the verifier finds the correct file on disk. The same iteration-5 round also recovers polyglot-rust-c and large-scale-text-editing, both listed in the change-manifest’s predicted\_fixes. configure-git-webserver, also predicted, recovers only partially at iteration 5 because its failure mode involves a state reset path that the iteration-5 guard still treats as overrideable; that gap is closed by the iteration-8 changes described in trajectory 4.

#### C.1.3 Trajectory 3: mcmc-sampling-stan

The first two trajectories each used a prompt-and-tool pair. The third shows two harness components from different controllability levels, a tool-level publish-state guard and a step-spanning middleware, working together to flip a task that had been failing for five iterations. Figure 6 summarizes the before-and-after rollouts.

##### The task.

mcmc-sampling-stan asks the agent to install rstan 2.32.7, fit a hierarchical beta-binomial model to 30 observations, and write the posterior means of alpha and beta to two text files. The verifier installs the package itself and reruns the agent’s analysis.R end-to-end, then asserts alpha lies in \[2.84, 2.91\] and beta lies in \[16.1, 16.7\].

##### Trajectory before and after the iteration-6 changes.

The task scored 0/2 from iteration 1 through iteration 5. The shared failure mode, summarized in the left column of Figure 6, is a proxy-then-skip pattern in five steps: the agent computes an independent grid-integration estimate of the posterior, writes those numbers as the deliverable, fires the real MCMC sampling as a background job, kills it before completion to “preserve the already-created deliverables”, and submits on a final sweep that only checks the files exist and parse as numbers. The verifier then reruns analysis.R from scratch; the unconverged sampler produces values around 1e19, far outside the expected range. None of the prior rounds catches this trajectory: the iteration-2 prompt edit names a contract-first principle but the agent already believes the grid integration is a faithful contract; the iteration-5 publish-state guard protects the deliverable files but treats analysis.R itself as an unprotected scratch artifact. After the iteration-6 changes are installed, both rollouts run analysis.R at the full iter = 100000 to completion, cross-check against an independent scratch full run in /tmp, and publish the converged values via the new override token; the right column of Figure 6 traces the passing rollout. The task passes 6/6 verifier tests in both rollouts and stays 2/2 for the next four iterations. The converged values land at alpha approximately 2.872, beta approximately 16.43, near the centers of the expected ranges. The same iteration-6 round also benefits sam-cell-seg, query-optimize, caffe-cifar-10, dna-assembly, and train-fasttext, all of which match one or more of the seven middleware patterns.

<svg id="A3.F6.pic1" height="551.66" overflow="visible" version="1.1" viewBox="0 0 600 551.66" width="600"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,551.66) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#808080;" fill="#808080" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.63 L 0 547.03 C 0 549.59 2.07 551.66 4.63 551.66 L 595.37 551.66 C 597.93 551.66 600 549.59 600 547.03 L 600 4.63 C 600 2.07 597.93 0 595.37 0 L 4.63 0 C 2.07 0 0 2.07 0 4.63 Z"></path></g><g style="--ltx-fill-color:#F7F7F7;" fill="#F7F7F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.69 4.63 L 0.69 37.62 L 599.31 37.62 L 599.31 4.63 C 599.31 2.45 597.55 0.69 595.37 0.69 L 4.63 0.69 C 2.45 0.69 0.69 2.45 0.69 4.63 Z"></path></g><g style="--ltx-fill-color:#737373;" fill="#737373" fill-opacity="1.0"><path style="stroke:none" d="M 0.69 38.32 L 0.69 547.03 C 0.69 549.2 2.45 550.97 4.63 550.97 L 595.37 550.97 C 597.55 550.97 599.31 549.2 599.31 547.03 L 599.31 38.32 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.16 540.38)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:52.55em;--ltx-fo-height:0.6em;--ltx-fo-depth:45.16em;" width="579.67" height="504.78" transform="matrix(1 0 0 -1 0 6.65)" overflow="visible" color="#FFFFFF"><span id="A3.F6.pic1.2.2.2.1.1" style="width:46.23em;"><span id="A3.F6.pic1.2.2.2.1.1.1"><span id="A3.F6.pic1.2.2.2.1.1.1.1" style="font-size:70%;">Shared prefix, both rollouts, same random seed</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.16 11.2)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:52.55em;--ltx-fo-height:1.66em;--ltx-fo-depth:0.22em;" width="579.67" height="20.76" transform="matrix(1 0 0 -1 0 18.33)" overflow="visible" color="#000000"><span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="width:52.55em;"><span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="font-size:70%;">S1.</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">ls /app</span> <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5" style="font-size:70%;">data.csv</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6" style="font-size:70%;">with 30 rows of columns</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7" style="font-size:70%;">y</span><span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8" style="font-size:70%;">,</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.9" style="font-size:70%;">n</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.10" style="font-size:70%;">|  S2.&nbsp;Install</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.11" style="font-size:70%;">rstan 2.32.7</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.12" style="font-size:70%;">from CRAN as a long background job   |  S3.&nbsp;Author</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.13" style="font-size:70%;">hierarchical_model.stan</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.14" style="font-size:70%;">and</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.15" style="font-size:70%;">analysis.R</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.16" style="font-size:70%;">with</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.17" style="font-size:70%;">chains = 4</span><span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.18" style="font-size:70%;">,</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.19" style="font-size:70%;">iter = 100000</span><span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.20" style="font-size:70%;">,</span> <span id="A3.F6.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.21" style="font-size:70%;">seed = 1</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F6.pic2" height="1371.45" overflow="visible" style="vertical-align:-685.73px" version="1.1" viewBox="0 0 196.31 1371.45" width="196.31"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,1371.45) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 1366.68 C 0 1369.32 2.13 1371.45 4.77 1371.45 L 191.54 1371.45 C 194.18 1371.45 196.31 1369.32 196.31 1366.68 L 196.31 4.77 C 196.31 2.13 194.18 0 191.54 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F3FAFE;" fill="#F3FAFE" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 801.79 L 195.48 801.79 L 195.48 4.77 C 195.48 2.59 193.72 0.83 191.54 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 802.62 L 0.83 1366.68 C 0.83 1368.86 2.59 1370.62 4.77 1370.62 L 191.54 1370.62 C 193.72 1370.62 195.48 1368.86 195.48 1366.68 L 195.48 802.62 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 1360.04)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:15.93em;--ltx-fo-height:0.6em;--ltx-fo-depth:50.17em;" width="175.71" height="560.13" transform="matrix(1 0 0 -1 0 6.65)" overflow="visible" color="#FFFFFF"><span id="A3.F6.pic2.4.4.4.1.1" style="width:14.01em;"><span id="A3.F6.pic2.4.4.4.1.1.1"><span id="A3.F6.pic2.4.4.4.1.1.1.1" style="font-size:70%;">Before iteration 6 changes, iteration 5, reward 0.0</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 10.8)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:15.93em;--ltx-fo-height:70.97em;--ltx-fo-depth:0.17em;" width="175.71" height="784.78" transform="matrix(1 0 0 -1 0 782.9)" overflow="visible" color="#000000"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3" style="width:15.93em;"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Divergence: trust the proxy, skip the real run</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F6.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><span id="A3.F6.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3" style="font-size:70%;">F1.&nbsp;Runs an</span> <span id="A3.F6.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4" style="font-size:70%;">independent</span> <span id="A3.F6.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5" style="font-size:70%;">R grid integration of the marginal posterior, gets</span> <span id="A3.F6.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="font-size:70%;">alpha <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\approx"><semantics><mo>≈</mo> <annotation encoding="application/x-tex">\approx</annotation></semantics></math> 2.876</span><span id="A3.F6.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.6" style="font-size:70%;">,</span> <span id="A3.F6.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="font-size:70%;">beta <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\approx"><semantics><mo>≈</mo> <annotation encoding="application/x-tex">\approx</annotation></semantics></math> 16.375</span></span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.1" style="font-size:70%;">F2.&nbsp;Writes those grid values into</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.2" style="font-size:70%;">/app/posterior_alpha_mean.txt</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.3" style="font-size:70%;">and</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.4" style="font-size:70%;">/app/posterior_beta_mean.txt</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.5" style="font-size:70%;">as the deliverable</span></span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.1" style="font-size:70%;">F3.&nbsp;Fires</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.2" style="font-size:70%;">Rscript /app/analysis.R</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.3" style="font-size:70%;">as a background job, polls every 30s</span></span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.1" style="font-size:70%;">F4.&nbsp;After about 3 minutes,</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.2" style="font-size:70%;">kills</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.3" style="font-size:70%;">the unfinished sampling to “preserve the already-created deliverables”</span></span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8.1" style="font-size:70%;">F5.&nbsp;Final sweep only checks files exist and parse as numbers, returns yes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.9"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.9.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Outcome</span></span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.1" style="font-size:70%;">Verifier reruns</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.2" style="font-size:70%;">analysis.R</span><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.3" style="font-size:70%;">; the actual MCMC chain diverges</span></span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.11"><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.11.1" style="font-size:70%;">Submitted:</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.11.2" style="font-size:70%;">alpha = 1.28e19</span><span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.11.3" style="font-size:70%;">,</span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.11.4" style="font-size:70%;">beta = 2.60e17</span></span> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F6.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.2" style="font-size:70%;">2 of 6 tests fail, reward 0</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F6.pic3" height="1003.69" overflow="visible" style="vertical-align:-501.85px" version="1.1" viewBox="0 0 196.31 1003.69" width="196.31"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,1003.69) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#595959;" fill="#595959" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 998.93 C 0 1001.56 2.13 1003.69 4.77 1003.69 L 191.54 1003.69 C 194.18 1003.69 196.31 1001.56 196.31 998.93 L 196.31 4.77 C 196.31 2.13 194.18 0 191.54 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F7F7F7;" fill="#F7F7F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 564.99 L 195.48 564.99 L 195.48 4.77 C 195.48 2.59 193.72 0.83 191.54 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#404040;" fill="#404040" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 565.82 L 0.83 998.93 C 0.83 1001.1 2.59 1002.86 4.77 1002.86 L 191.54 1002.86 C 193.72 1002.86 195.48 1001.1 195.48 998.93 L 195.48 565.82 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 992.28)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:15.93em;--ltx-fo-height:0.6em;--ltx-fo-depth:38.3em;" width="175.71" height="429.17" transform="matrix(1 0 0 -1 0 6.65)" overflow="visible" color="#FFFFFF"><span id="A3.F6.pic3.4.4.4.1.1" style="width:14.01em;"><span id="A3.F6.pic3.4.4.4.1.1.1"><span id="A3.F6.pic3.4.4.4.1.1.1.1" style="font-size:70%;">Iteration 6 changes that close each gap</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 10.8)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:15.93em;--ltx-fo-height:49.5em;--ltx-fo-depth:0.17em;" width="175.71" height="547.98" transform="matrix(1 0 0 -1 0 546.1)" overflow="visible" color="#000000"><span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3" style="width:15.93em;"><span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4"><span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4.1" style="font-size:70%;--ltx-fg-color:#262626;">Middleware <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4.1.1">chg-2</span>.</span> <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4.2" style="font-size:70%;">Pattern catalog flags “inline or self-written proxy validator instead of the named evaluator”. The risk hint is injected into the next model turn.</span></span> <span id="A3.F6.pic3.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo style="--ltx-fg-color:#666666;" mathcolor="#666666" mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F6.pic3.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="font-size:70%;--ltx-fg-color:#666666;">catches F1, F2, F4: the grid integration is a proxy for the named MCMC pipeline, and the kill of <span id="A3.F6.pic3.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="--ltx-fg-color:#666666;">analysis.R</span> keeps that proxy in place. The reminder rewires the next turn toward running the named pipeline to completion.</span></span> <span style="width:433.6pt;height:0.2pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5"><span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.1" style="font-size:70%;--ltx-fg-color:#262626;">Middleware <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.1.1">chg-2</span>, second pattern.</span> <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.2" style="font-size:70%;">Catalog also flags “shallow validation” and “benchmark run with no explicit golden or threshold comparator”.</span></span> <span id="A3.F6.pic3.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo style="--ltx-fg-color:#666666;" mathcolor="#666666" mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F6.pic3.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.1" style="font-size:70%;--ltx-fg-color:#666666;">catches F5: a file-existence sweep without a tolerance comparator on the verifier’s named outputs is forbidden, and an independent re-run with cross-check is required instead.</span></span> <span style="width:433.6pt;height:0.2pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6"><span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.1" style="font-size:70%;--ltx-fg-color:#262626;">Publish-state guard <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.1.1">chg-1</span>.</span> <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.2" style="font-size:70%;">Once a script entrypoint is tied to the named evaluator and a final check has passed, that script and its consumed files become protected; cleanup or rerun requires the explicit</span> <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.3" style="font-size:70%;">ALLOW_POST_SUCCESS_RESET</span> <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.4" style="font-size:70%;">token.</span></span> <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo style="--ltx-fg-color:#666666;" mathcolor="#666666" mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F6.pic3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.1" style="font-size:70%;--ltx-fg-color:#666666;">visible at P4 and P5: the override token at every successful submit is evidence the guard is engaged, not silently bypassed.</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F6.pic4" height="1592.84" overflow="visible" style="vertical-align:-796.42px" version="1.1" viewBox="0 0 196.31 1592.84" width="196.31"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,1592.84) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 1588.08 C 0 1590.71 2.13 1592.84 4.77 1592.84 L 191.54 1592.84 C 194.18 1592.84 196.31 1590.71 196.31 1588.08 L 196.31 4.77 C 196.31 2.13 194.18 0 191.54 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#E6F0F7;" fill="#E6F0F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 912.49 L 195.48 912.49 L 195.48 4.77 C 195.48 2.59 193.72 0.83 191.54 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 913.32 L 0.83 1588.08 C 0.83 1590.25 2.59 1592.01 4.77 1592.01 L 191.54 1592.01 C 193.72 1592.01 195.48 1590.25 195.48 1588.08 L 195.48 913.32 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 1581.43)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:15.93em;--ltx-fo-height:0.6em;--ltx-fo-depth:60.21em;" width="175.71" height="670.82" transform="matrix(1 0 0 -1 0 6.65)" overflow="visible" color="#FFFFFF"><span id="A3.F6.pic4.4.4.4.1.1" style="width:14.01em;"><span id="A3.F6.pic4.4.4.4.1.1.1"><span id="A3.F6.pic4.4.4.4.1.1.1.1" style="font-size:70%;">After iteration 6 changes, same seed, iteration 6, reward 1.0</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 10.8)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:15.93em;--ltx-fo-height:81.01em;--ltx-fo-depth:0.17em;" width="175.71" height="895.48" transform="matrix(1 0 0 -1 0 893.6)" overflow="visible" color="#000000"><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3" style="width:15.93em;"><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4"><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Divergence: drive the evaluator pipeline to convergence</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5"><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.1" style="font-size:70%;">P1.&nbsp;Smoke-tests</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.2" style="font-size:70%;">analysis.R</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.3" style="font-size:70%;">with overrides</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.4" style="font-size:70%;">STAN_ITER=2000</span><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.5" style="font-size:70%;">,</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.6" style="font-size:70%;">STAN_WARMUP=1000</span><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.7" style="font-size:70%;">, confirms compilation and end-to-end output</span></span> <span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3" style="font-size:70%;">P2.&nbsp;Runs</span> <span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4" style="font-size:70%;">analysis.R</span> <span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5" style="font-size:70%;">at the full</span> <span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.6" style="font-size:70%;">iter = 100000</span> <span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.7" style="font-size:70%;">and</span> <span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.8" style="font-size:70%;">waits for completion</span><span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9" style="font-size:70%;">, gets</span> <span id="A3.F6.pic4.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="font-size:70%;">alpha <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\approx"><semantics><mo>≈</mo> <annotation encoding="application/x-tex">\approx</annotation></semantics></math> 2.872</span><span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.10" style="font-size:70%;">,</span> <span id="A3.F6.pic4.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="font-size:70%;">beta <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\approx"><semantics><mo>≈</mo> <annotation encoding="application/x-tex">\approx</annotation></semantics></math> 16.43</span></span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6"><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.1" style="font-size:70%;">P3.&nbsp;Reruns the same script in</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.2" style="font-size:70%;">/tmp</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.3" style="font-size:70%;">as an independent scratch copy, both copies agree to 3 significant figures</span></span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7"><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.1" style="font-size:70%;">P4.&nbsp;Publishes the cross-validated values with the new</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.2" style="font-size:70%;">ALLOW_POST_SUCCESS_RESET</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.3" style="font-size:70%;">override required by the publish-state guard</span></span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8"><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8.1" style="font-size:70%;">P5.&nbsp;Cleans the unrequested</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8.2" style="font-size:70%;">hierarchical_model.rds</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8.3" style="font-size:70%;">cache, reruns the final</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8.4" style="font-size:70%;">/app</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8.5" style="font-size:70%;">acceptance sweep</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.9"><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.9.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Outcome</span></span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10"><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.1" style="font-size:70%;">Submitted:</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.2" style="font-size:70%;">alpha = 2.872</span><span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.3" style="font-size:70%;">,</span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.4" style="font-size:70%;">beta = 16.43</span></span> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\rightarrow"><semantics><mo mathsize="0.700em" stretchy="false">→</mo> <annotation encoding="application/x-tex">\rightarrow</annotation></semantics></math> <span id="A3.F6.pic4.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.2" style="font-size:70%;">6 of 6 tests pass, reward 1</span></span></span></foreignObject></g></g></svg>

Figure 6: Three-column trajectory comparison for mcmc-sampling-stan before and after the two harness changes shipped at the start of iteration 6: the tool-level publish-state guard chg-1 at commit ff0cf3d and the middleware-level execution-risk hints chg-2 at commit 9651986, whose full manifest entry appears in Figure 9. The banner shows the shared prefix S1 to S3. The left column lists the five divergence steps F1 to F5 of the failing rollout at iteration 5. The middle column lists the iteration-6 components that fire on this trajectory, each annotated with the failure steps it catches. The right column lists the corresponding steps P1 to P5 of the passing rollout at iteration 6. The task stays 2/2 for the next four evaluation rounds.

#### C.1.4 Trajectory 4: configure-git-webserver

The fourth trajectory shows the evolve agent doubling back on its own prior decisions. By iteration 7 the publish-state guard had been carried over for three rounds, the middleware for two, and the score had regressed from 75.8 to 73.0. Rather than roll either back, the iteration-7 round patched a loophole in the guard and a salience gap in the middleware; both patches turn out to be load-bearing for configure-git-webserver.

##### The task.

configure-git-webserver asks the agent to set up a git repository under /git/server, configure a webserver that serves the working tree under /git/www, deploy a hello-world page, and produce a configuration in which the externally observable URL returns the expected content. The verifier issues an HTTP request from outside the agent’s shell and reads the response body.

##### Trajectory before and after the iteration-8 changes.

At iteration 7 the task scored 0/2. The failing rollout reached a fully working deployment, ran a curl-against-localhost self-check that returned the right body, and then issued two cleanup commands prefixed with ALLOW\_POST\_SUCCESS\_RESET: one deletion of /git/www/hello.html and one reset of /git/server/refs/heads/master to an empty state, both rationalized as “leaving a clean repo for grading”. The shell tool’s iteration-5 guard caught these as overrideable resets and let them through once the override token was attached. The external verifier then received a 404 and rejected the rollout. git-multibranch failed in iteration 7 for the same structural reason. In parallel, polyglot-c-py and pytorch-model-recovery failed at iteration 7 with a different but related symptom: the iteration-6 middleware had already emitted the right warnings about clean-layout violation and inline-helper validation, but the warnings were appended only to the tool output, and on the very next model turn the agent ignored them and published. After the iteration-8 changes are installed, configure-git-webserver flips from 0/2 to 2/2. Both rollouts reach the same successful deployment as before, attempt the same overrideable cleanup commands, and have them refused at the shell layer with hard-block messages naming the protected web root and protected ref; the agent acknowledges the messages, drops the cleanup, and submits the live state. git-multibranch flips along the same path. polyglot-c-py, polyglot-rust-c, pytorch-model-recovery, and mteb-retrieve flip via the middleware path: in each, the FRAMEWORK reminder injected before the next model turn carries enough salience for the agent to fix the violation rather than publish over it. Iteration 8’s overall score lands at 76.97, the run’s high-water mark on Figure LABEL:fig:evolution-curve, and the single biggest jump of the run.

### C.2 Changes shipped on the four winning rounds

#### C.2.1 Iteration 2: prompt rules and shell-timeout argument

The Evolve Agent’s response after iteration 1 was two changes. Change chg-1 at commit c0b8a05 is a 68-line append to workspace/systemprompt.md with no mention of SQLite, WAL, or db-wal-recovery; the appended block contains eight numbered rules covering acceptance-contract extraction, evaluator mirroring, minimal-edit semantics, candidate scoring, generalization, time budgeting, end-state readiness, and a stop rule. Change chg-2 at commit 169c34c is a tool-implementation edit that exposes the shell timeout as a per-call argument with a higher ceiling, addressing a class of failures in which the seed harness silently truncated long-running setup commands. Both manifest entries appear in Figure 7.

<svg id="A3.F7.pic1" height="782.69" overflow="visible" style="vertical-align:-391.35px" version="1.1" viewBox="0 0 297.23 782.69" width="297.23"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,782.69) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 777.92 C 0 780.56 2.13 782.69 4.77 782.69 L 292.47 782.69 C 295.1 782.69 297.23 780.56 297.23 777.92 L 297.23 4.77 C 297.23 2.13 295.1 0 292.47 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F3FAFE;" fill="#F3FAFE" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 303.93 L 296.4 303.93 L 296.4 4.77 C 296.4 2.59 294.64 0.83 292.47 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 304.76 L 0.83 777.92 C 0.83 780.1 2.59 781.86 4.77 781.86 L 292.47 781.86 C 294.64 781.86 296.4 780.1 296.4 777.92 L 296.4 304.76 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 773.62)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:25.08em;--ltx-fo-height:0.39em;--ltx-fo-depth:42.15em;" width="276.63" height="469.23" transform="matrix(1 0 0 -1 0 4.3)" overflow="visible" color="#FFFFFF"><span id="A3.F7.pic1.2.2.2.1.1" style="width:22.06em;"><span id="A3.F7.pic1.2.2.2.1.1.1"><span id="A3.F7.pic1.2.2.2.1.1.1.1" style="font-size:70%;">chg-1, iteration 1, commit <span id="A3.F7.pic1.2.2.2.1.1.1.1.1">c0b8a05</span>, level: prompt</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 11.07)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:25.08em;--ltx-fo-height:25.81em;--ltx-fo-depth:0.2em;" width="276.63" height="286.92" transform="matrix(1 0 0 -1 0 284.77)" overflow="visible" color="#000000"><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="width:25.08em;"><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2"><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Files</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">workspace/systemprompt.md</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3"><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1" style="font-size:70%;--ltx-fg-color:#1B262C;">What changed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4"><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.1" style="font-size:70%;">Appended a contract-first workflow of eight numbered rules covering acceptance-contract extraction, evaluator mirroring, minimal-edit semantics, candidate scoring, generalization, time budgeting, end-state readiness, and a stop rule. No SQLite, WAL, or task-specific keywords.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5"><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Failure pattern fixed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6"><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6.1" style="font-size:70%;">Agent submitted on a self-invented proxy check such as row count or file exists, instead of reproducing the evaluator’s literal assertions.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7"><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Predicted fixes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8"><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.1" style="font-size:70%;">14 tasks. Examples:</span> <span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.2" style="font-size:70%;">configure-git-webserver</span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.3" style="font-size:70%;">,</span> <span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.4" style="font-size:70%;">query-optimize</span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.5" style="font-size:70%;">,</span> <span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.6" style="font-size:70%;">mteb-retrieve</span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.7" style="font-size:70%;">,</span> <span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.8" style="font-size:70%;">train-fasttext</span><span id="A3.F7.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.9" style="font-size:70%;">.</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F7.pic2" height="771.62" overflow="visible" style="vertical-align:-385.81px" version="1.1" viewBox="0 0 297.23 771.62" width="297.23"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,771.62) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 766.85 C 0 769.49 2.13 771.62 4.77 771.62 L 292.47 771.62 C 295.1 771.62 297.23 769.49 297.23 766.85 L 297.23 4.77 C 297.23 2.13 295.1 0 292.47 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#E6F0F7;" fill="#E6F0F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 315 L 296.4 315 L 296.4 4.77 C 296.4 2.59 294.64 0.83 292.47 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 315.83 L 0.83 766.85 C 0.83 769.03 2.59 770.79 4.77 770.79 L 292.47 770.79 C 294.64 770.79 296.4 769.03 296.4 766.85 L 296.4 315.83 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 762.55)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:25.08em;--ltx-fo-height:0.39em;--ltx-fo-depth:40.14em;" width="276.63" height="447.09" transform="matrix(1 0 0 -1 0 4.3)" overflow="visible" color="#FFFFFF"><span id="A3.F7.pic2.3.3.3.1.1" style="width:22.06em;"><span id="A3.F7.pic2.3.3.3.1.1.1"><span id="A3.F7.pic2.3.3.3.1.1.1.1" style="font-size:70%;">chg-2, iteration 1, commit <span id="A3.F7.pic2.3.3.3.1.1.1.1.1">169c34c</span>, level: tool</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 11.07)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:25.08em;--ltx-fo-height:26.82em;--ltx-fo-depth:0.2em;" width="276.63" height="297.99" transform="matrix(1 0 0 -1 0 295.84)" overflow="visible" color="#000000"><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="width:25.08em;"><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3"><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Files</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F7.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">tool_descriptions/run_shell_command.tool.yaml</span></span> <span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="font-size:70%;">tools/shell_tools/run_shell_command.py</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4"><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">What changed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5"><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5.1" style="font-size:70%;">Exposed a per-call</span> <span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5.2" style="font-size:70%;">timeout_ms</span> <span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5.3" style="font-size:70%;">on the shell tool, added background-execution guidance, and appended a timeout-recovery hint to timed-out shell output so the agent can switch to short probes plus background jobs instead of sitting on the default 5 minute wait.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.6"><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.6.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Failure pattern fixed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.7"><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.7.1" style="font-size:70%;">Agent burned rollout budget on long foreground installs and sleep-poll loops, repeatedly hitting the default 5 minute timeout.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.8"><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.8.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Predicted fixes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9"><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.1" style="font-size:70%;">8 tasks. Examples:</span> <span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.2" style="font-size:70%;">compile-compcert</span><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.3" style="font-size:70%;">,</span> <span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.4" style="font-size:70%;">regex-chess</span><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.5" style="font-size:70%;">,</span> <span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.6" style="font-size:70%;">adaptive-rejection-sampler</span><span id="A3.F7.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.7" style="font-size:70%;">.</span></span></span></foreignObject></g></g></svg>

Figure 7: Two change-manifest entries written in iteration 1, one editing the system prompt and one editing the shell tool. Both appear in the same change\_manifest.json produced by the evolve agent, then enter Phase 3 of the next round as binding contracts that the attribution check rolls back if their predicted fixes do not materialize.

#### C.2.2 Iteration 5: publish-state mechanism (prompt rules + shell-tool guard)

The Evolve Agent shipped two complementary changes at the iteration-4 boundary, both written for iteration 5. Change chg-7 at commit 3ba3a90 edits workspace/systemprompt.md together with the descriptor of run\_shell\_command; it adds three rules to the harness’s working memory: a publish-state rule that names the post-acceptance filesystem state as the deliverable surface, a scratch-directory rule for tasks with constrained delivery layouts, and a literal-output rule for DSL, config, and script outputs in which equivalence is judged at the byte level. Change chg-8 at commit 4e0aab9 edits workspace/tools/shell\_tools/run\_shell\_command.py; it installs a stateful publish-state guard inside the shell tool with three behaviors. First, when the shell observes a successful evaluator-style final check, it parses the acceptance command for explicit file paths and roots and records them as protected. Second, when a later command would delete a protected output or reset a protected root, the guard intercepts the command before execution and returns a targeted error explaining which protected target is at risk. Third, the guard accepts an explicit ALLOW\_POST\_SUCCESS\_RESET token from the agent that downgrades the block to a warning and forces the agent to revalidate before submitting. The two changes are paired by design: chg-7 tells the model what publish state is, chg-8 stops the agent from destroying it even when the model forgets the rule. Both manifest entries appear in Figure 8.

<svg id="A3.F8.pic1" height="1048.36" overflow="visible" style="vertical-align:-524.18px" version="1.1" viewBox="0 0 297.23 1048.36" width="297.23"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,1048.36) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 1043.59 C 0 1046.23 2.13 1048.36 4.77 1048.36 L 292.47 1048.36 C 295.1 1048.36 297.23 1046.23 297.23 1043.59 L 297.23 4.77 C 297.23 2.13 295.1 0 292.47 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F3FAFE;" fill="#F3FAFE" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 370.35 L 296.4 370.35 L 296.4 4.77 C 296.4 2.59 294.64 0.83 292.47 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 371.18 L 0.83 1043.59 C 0.83 1045.77 2.59 1047.53 4.77 1047.53 L 292.47 1047.53 C 294.64 1047.53 296.4 1045.77 296.4 1043.59 L 296.4 371.18 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 1039.29)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:25.08em;--ltx-fo-height:0.39em;--ltx-fo-depth:60.21em;" width="276.63" height="668.48" transform="matrix(1 0 0 -1 0 4.3)" overflow="visible" color="#FFFFFF"><span id="A3.F8.pic1.3.3.3.1.1" style="width:22.06em;"><span id="A3.F8.pic1.3.3.3.1.1.1"><span id="A3.F8.pic1.3.3.3.1.1.1.1" style="font-size:70%;">chg-7, iteration 5, commit <span id="A3.F8.pic1.3.3.3.1.1.1.1.1">3ba3a90</span>, level: prompt + tool descriptor</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 11.07)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:25.08em;--ltx-fo-height:31.84em;--ltx-fo-depth:0.2em;" width="276.63" height="353.34" transform="matrix(1 0 0 -1 0 351.19)" overflow="visible" color="#000000"><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="width:25.08em;"><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3"><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Files</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F8.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">workspace/systemprompt.md</span></span> <span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="font-size:70%;">tool_descriptions/run_shell_command.tool.yaml</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4"><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4.1" style="font-size:70%;--ltx-fg-color:#1B262C;">What changed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5"><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5.1" style="font-size:70%;">Appended three rules to the harness’s working memory. Publish-state rule: once an evaluator-style final check passes, the resulting filesystem and service state is the deliverable surface and must not be reset to “look clean”. Scratch-directory rule: place exploratory artifacts under</span> <span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5.2" style="font-size:70%;">/tmp</span> <span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5.3" style="font-size:70%;">or a scratch path the verifier ignores. Literal-output rule: for DSL, config, or script outputs with byte-level contracts, validate equality at the byte level.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.6"><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.6.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Failure pattern fixed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.7"><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.7.1" style="font-size:70%;">Agent reached an evaluator-passing state, then issued sweeping cleanup or rewrote outputs to “tidy up”, leaving the verifier with no deliverable.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.8"><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.8.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Predicted fixes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9"><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.1" style="font-size:70%;">4 tasks. Examples:</span> <span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.2" style="font-size:70%;">path-tracing</span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.3" style="font-size:70%;">,</span> <span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.4" style="font-size:70%;">configure-git-webserver</span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.5" style="font-size:70%;">,</span> <span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.6" style="font-size:70%;">polyglot-rust-c</span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.7" style="font-size:70%;">,</span> <span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.8" style="font-size:70%;">large-scale-text-editing</span><span id="A3.F8.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.9" style="font-size:70%;">.</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F8.pic2" height="993.01" overflow="visible" style="vertical-align:-496.51px" version="1.1" viewBox="0 0 297.23 993.01" width="297.23"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,993.01) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 988.25 C 0 990.88 2.13 993.01 4.77 993.01 L 292.47 993.01 C 295.1 993.01 297.23 990.88 297.23 988.25 L 297.23 4.77 C 297.23 2.13 295.1 0 292.47 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#E6F0F7;" fill="#E6F0F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 370.35 L 296.4 370.35 L 296.4 4.77 C 296.4 2.59 294.64 0.83 292.47 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 371.18 L 0.83 988.25 C 0.83 990.42 2.59 992.18 4.77 992.18 L 292.47 992.18 C 294.64 992.18 296.4 990.42 296.4 988.25 L 296.4 371.18 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 983.94)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:25.08em;--ltx-fo-height:0.39em;--ltx-fo-depth:55.19em;" width="276.63" height="613.13" transform="matrix(1 0 0 -1 0 4.3)" overflow="visible" color="#FFFFFF"><span id="A3.F8.pic2.2.2.2.1.1" style="width:22.06em;"><span id="A3.F8.pic2.2.2.2.1.1.1"><span id="A3.F8.pic2.2.2.2.1.1.1.1" style="font-size:70%;">chg-8, iteration 5, commit <span id="A3.F8.pic2.2.2.2.1.1.1.1.1">4e0aab9</span>, level: tool implementation</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 11.07)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:25.08em;--ltx-fo-height:31.84em;--ltx-fo-depth:0.2em;" width="276.63" height="353.34" transform="matrix(1 0 0 -1 0 351.19)" overflow="visible" color="#000000"><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="width:25.08em;"><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2"><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Files</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">tools/shell_tools/run_shell_command.py</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3"><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">What changed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4"><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.1" style="font-size:70%;">Installed a stateful publish-state guard inside the shell tool. After a successful evaluator-style final check, the guard parses the acceptance command for explicit file paths and roots and records them as protected. Later destructive commands that would delete a protected output or reset a protected root are intercepted before execution and returned as a targeted error. An explicit</span> <span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.2" style="font-size:70%;">ALLOW_POST_SUCCESS_RESET</span> <span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.3" style="font-size:70%;">token can downgrade the block to a warning, after which the agent must re-validate before submit.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5"><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Failure pattern fixed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6"><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6.1" style="font-size:70%;">Even with the prompt rule in place, the agent still issued destructive cleanup commands after publish-state. Execution-time enforcement at the shell tool is the most direct interlock.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7"><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Predicted fixes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8"><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.1" style="font-size:70%;">Same 4 tasks; load-bearing on</span> <span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.2" style="font-size:70%;">path-tracing</span><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.3" style="font-size:70%;">, whose F4 is the</span> <span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.4" style="font-size:70%;">rm -rf</span> <span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.5" style="font-size:70%;">of</span> <span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.6" style="font-size:70%;">/app/reconstructed.ppm</span><span id="A3.F8.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.7" style="font-size:70%;">.</span></span></span></foreignObject></g></g></svg>

Figure 8: The two change-manifest entries written together at the iteration-4 boundary and shipped as the iteration-5 harness. chg-7 names the publish-state rule in the system prompt and tool descriptor; chg-8 installs the execution-time interlock inside the shell tool. The pair flips path-tracing on the next round.

#### C.2.3 Iteration 6: protected entrypoints and execution-risk middleware

The Evolve Agent shipped two complementary changes for iteration 6. Change chg-1 at commit ff0cf3d extends the publish-state guard so that script entrypoints tied to the named evaluator become protected after a passing check, with an explicit ALLOW\_POST\_SUCCESS\_RESET token required to override; the token at every successful submit in the passing rollout is the externally visible evidence that the guard is engaged, not silently bypassed. Change chg-2 at commit 9651986 introduces the ExecutionRiskHintsMiddleware; the middleware watches the live sequence of shell commands and tool outputs and emits a targeted note when it detects any of seven cross-step risk patterns: shallow validation that relies on -h, py\_compile, or pure existence checks; localhost-only service validation when the contract names an external endpoint; inline or self-written proxy validators replacing a named evaluator; lower-level model or internal API access when the contract names a specific wrapper; benchmark checks with no explicit golden or threshold comparator; repeated long runs that have already exhausted budget for a known failure mode; and repeated retries against the same error. The two patterns relevant to trajectory 3 are inline-proxy validation and shallow validation, which together cover the F1 to F5 sequence: the grid-integration proxy and the kill of analysis.R are the proxy-validator pattern, and the file-existence sweep without a tolerance comparator is the shallow-validation pattern. The shell tool change covers F4 specifically: with analysis.R now protected, the kill becomes a guarded action that requires the override token and forces a revalidation pass before submit. Both manifest entries appear in Figure 9.

<svg id="A3.F9.pic1" height="970.87" overflow="visible" style="vertical-align:-485.44px" version="1.1" viewBox="0 0 297.23 970.87" width="297.23"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,970.87) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 966.11 C 0 968.74 2.13 970.87 4.77 970.87 L 292.47 970.87 C 295.1 970.87 297.23 968.74 297.23 966.11 L 297.23 4.77 C 297.23 2.13 295.1 0 292.47 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F3FAFE;" fill="#F3FAFE" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 348.21 L 296.4 348.21 L 296.4 4.77 C 296.4 2.59 294.64 0.83 292.47 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 349.04 L 0.83 966.11 C 0.83 968.28 2.59 970.04 4.77 970.04 L 292.47 970.04 C 294.64 970.04 296.4 968.28 296.4 966.11 L 296.4 349.04 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 961.8)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:25.08em;--ltx-fo-height:0.39em;--ltx-fo-depth:55.19em;" width="276.63" height="613.13" transform="matrix(1 0 0 -1 0 4.3)" overflow="visible" color="#FFFFFF"><span id="A3.F9.pic1.3.3.3.1.1" style="width:22.06em;"><span id="A3.F9.pic1.3.3.3.1.1.1"><span id="A3.F9.pic1.3.3.3.1.1.1.1" style="font-size:70%;">chg-1, iteration 6, commit <span id="A3.F9.pic1.3.3.3.1.1.1.1.1">ff0cf3d</span>, level: tool implementation</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 11.07)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:25.08em;--ltx-fo-height:29.83em;--ltx-fo-depth:0.2em;" width="276.63" height="331.2" transform="matrix(1 0 0 -1 0 329.05)" overflow="visible" color="#000000"><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="width:25.08em;"><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3"><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.3.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Files</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F9.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">tools/shell_tools/run_shell_command.py</span></span> <span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="font-size:70%;">tool_descriptions/run_shell_command.tool.yaml</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4"><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.4.1" style="font-size:70%;--ltx-fg-color:#1B262C;">What changed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5"><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.5.1" style="font-size:70%;">Extended publish-state target extraction to include script entrypoints and explicitly referenced final-check files, on top of the deliverable files and roots already covered by iteration 5. After a successful evaluator-style final check, the guard now blocks rewriting protected files and rerunning protected generator scripts, in addition to the deletion and root-reset cases.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.6"><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.6.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Failure pattern fixed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.7"><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.7.1" style="font-size:70%;">Agent reached publish-state with a converged generator script, then re-ran or rewrote the script as a “tidy up” pass, invalidating the verified output; this is the F4 step of</span> <span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.7.2" style="font-size:70%;">mcmc-sampling-stan</span><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.7.3" style="font-size:70%;">.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.8"><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.8.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Predicted fixes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9"><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.1" style="font-size:70%;">mcmc-sampling-stan</span> <span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.2" style="font-size:70%;">plus residual “validated then mutate” cases such as</span> <span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.3" style="font-size:70%;">configure-git-webserver</span><span id="A3.F9.pic1.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.9.4" style="font-size:70%;">.</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F9.pic2" height="948.73" overflow="visible" style="vertical-align:-474.37px" version="1.1" viewBox="0 0 297.23 948.73" width="297.23"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,948.73) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 943.97 C 0 946.6 2.13 948.73 4.77 948.73 L 292.47 948.73 C 295.1 948.73 297.23 946.6 297.23 943.97 L 297.23 4.77 C 297.23 2.13 295.1 0 292.47 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#E6F0F7;" fill="#E6F0F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 425.69 L 296.4 425.69 L 296.4 4.77 C 296.4 2.59 294.64 0.83 292.47 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 426.52 L 0.83 943.97 C 0.83 946.14 2.59 947.9 4.77 947.9 L 292.47 947.9 C 294.64 947.9 296.4 946.14 296.4 943.97 L 296.4 426.52 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 939.66)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:25.08em;--ltx-fo-height:0.39em;--ltx-fo-depth:46.16em;" width="276.63" height="513.51" transform="matrix(1 0 0 -1 0 4.3)" overflow="visible" color="#FFFFFF"><span id="A3.F9.pic2.4.4.4.1.1" style="width:22.06em;"><span id="A3.F9.pic2.4.4.4.1.1.1"><span id="A3.F9.pic2.4.4.4.1.1.1.1" style="font-size:70%;">chg-2, iteration 6, commit <span id="A3.F9.pic2.4.4.4.1.1.1.1.1">9651986</span>, level: middleware</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 11.07)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:25.08em;--ltx-fo-height:36.85em;--ltx-fo-depth:0.2em;" width="276.63" height="408.69" transform="matrix(1 0 0 -1 0 406.53)" overflow="visible" color="#000000"><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3" style="width:25.08em;"><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4"><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.4.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Files</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F9.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">workspace/code_agent.yaml</span></span> <span id="A3.F9.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F9.pic2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2.2" style="font-size:70%;">workspace/middleware/__init__.py</span></span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.2" style="font-size:70%;">workspace/middleware/execution_risk_hints.py</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5"><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.5.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">What changed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6"><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.1" style="font-size:70%;">Registered a new</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.2" style="font-size:70%;">ExecutionRiskHintsMiddleware</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.3" style="font-size:70%;">via an</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.4" style="font-size:70%;">AfterToolHook</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.5" style="font-size:70%;">that scans every shell command and result, accumulates lightweight state across steps, and queues a targeted reminder when the live history matches one of seven risk patterns: shallow validation via</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.6" style="font-size:70%;">--help</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.7" style="font-size:70%;">or</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.8" style="font-size:70%;">py_compile</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.6.9" style="font-size:70%;">or existence-only checks; localhost-only service check while the contract names an external interface; inline or self-written proxy validator instead of the named evaluator; low-level model API call bypassing the official wrapper; benchmark run with no explicit golden or threshold comparator; repeated long timeouts on the same command shape; repeated retries hitting the same error signature. Reminders are deduplicated and capped per rollout.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7"><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.7.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Failure pattern fixed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8"><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.8.1" style="font-size:70%;">Cross-step behaviors that only become obvious from the live command history, which prompt-only rules cannot react to in time.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.9"><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.9.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Predicted fixes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10"><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.1" style="font-size:70%;">6 tasks. Examples:</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.2" style="font-size:70%;">caffe-cifar-10</span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.3" style="font-size:70%;">,</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.4" style="font-size:70%;">sam-cell-seg</span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.5" style="font-size:70%;">,</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.6" style="font-size:70%;">mteb-retrieve</span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.7" style="font-size:70%;">,</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.8" style="font-size:70%;">dna-assembly</span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.9" style="font-size:70%;">,</span> <span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.10" style="font-size:70%;">train-fasttext</span><span id="A3.F9.pic2.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.3.10.11" style="font-size:70%;">.</span></span></span></foreignObject></g></g></svg>

Figure 9: The two change-manifest entries shipped as the iteration-6 harness. chg-1 extends the iteration-5 publish-state guard from deliverable files to script entrypoints, the missing piece that protects analysis.R in mcmc-sampling-stan. chg-2 introduces the first cross-step component in this run, namely the ExecutionRiskHintsMiddleware watching the live command history for seven risk patterns.

#### C.2.4 Iteration 8: hard blocks and FRAMEWORK reminders

The Evolve Agent shipped two changes for iteration 8 that explicitly keep the prior architecture and patch its weak points. Change chg-1 at commit ca35f53 edits workspace/tools/shell\_tools/run\_shell\_command.py and upgrades two soft reasons to hard blocks: deletion of any non-/tmp protected output is now a hard block, and reset of any non-/tmp protected root is now a hard block. The ALLOW\_POST\_SUCCESS\_RESET token can still downgrade other classes of post-success interlocks but can no longer wipe verified live deliverables or empty live roots. Change chg-2 at commit a4a4a29 edits workspace/middleware/execution\_risk\_hints.py and adds three behaviors. First, a new before\_model hook promotes any execution-risk note emitted on the previous step into a FRAMEWORK reminder visible in the next model turn, so the warning becomes part of the reasoning context rather than text appended after the tool output. Second, the middleware infers two contract types once per task from the user request: clean-layout or single-file delivery contracts, and official-wrapper or named-revision contracts. Third, the middleware adds two contract-aware after-tool heuristics: a warning when the agent compiles or builds inside a clean-layout live tree, and a warning when the contract names an official wrapper or revision but the command uses a raw SentenceTransformer or AutoModel style API instead. Both changes are deliberately scoped: chg-1 prevents the destructive shell command itself, chg-2 makes the right warning impossible to overlook on the very next model turn. Both manifest entries appear in Figure 10.

<svg id="A3.F10.pic1" height="970.87" overflow="visible" style="vertical-align:-485.44px" version="1.1" viewBox="0 0 297.23 970.87" width="297.23"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,970.87) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 966.11 C 0 968.74 2.13 970.87 4.77 970.87 L 292.47 970.87 C 295.1 970.87 297.23 968.74 297.23 966.11 L 297.23 4.77 C 297.23 2.13 295.1 0 292.47 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#F3FAFE;" fill="#F3FAFE" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 348.21 L 296.4 348.21 L 296.4 4.77 C 296.4 2.59 294.64 0.83 292.47 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#1B262C;" fill="#1B262C" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 349.04 L 0.83 966.11 C 0.83 968.28 2.59 970.04 4.77 970.04 L 292.47 970.04 C 294.64 970.04 296.4 968.28 296.4 966.11 L 296.4 349.04 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 961.8)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:25.08em;--ltx-fo-height:0.39em;--ltx-fo-depth:55.19em;" width="276.63" height="613.13" transform="matrix(1 0 0 -1 0 4.3)" overflow="visible" color="#FFFFFF"><span id="A3.F10.pic1.2.2.2.1.1" style="width:22.06em;"><span id="A3.F10.pic1.2.2.2.1.1.1"><span id="A3.F10.pic1.2.2.2.1.1.1.1" style="font-size:70%;">chg-1, iteration 8, commit <span id="A3.F10.pic1.2.2.2.1.1.1.1.1">ca35f53</span>, level: tool implementation</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 11.07)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:25.08em;--ltx-fo-height:29.83em;--ltx-fo-depth:0.2em;" width="276.63" height="331.2" transform="matrix(1 0 0 -1 0 329.05)" overflow="visible" color="#000000"><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="width:25.08em;"><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2"><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Files</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">tools/shell_tools/run_shell_command.py</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3"><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1" style="font-size:70%;--ltx-fg-color:#1B262C;">What changed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4"><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.1" style="font-size:70%;">Upgraded two soft reasons to hard blocks. Deletion of any non-</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.2" style="font-size:70%;">/tmp</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.3" style="font-size:70%;">protected output is now a hard block. Reset of any non-</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.4" style="font-size:70%;">/tmp</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.5" style="font-size:70%;">protected root to an empty state is also a hard block. The</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.6" style="font-size:70%;">ALLOW_POST_SUCCESS_RESET</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.7" style="font-size:70%;">token still exists for other classes of post-success interlock but can no longer wipe verified live deliverables or empty live roots.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5"><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Failure pattern fixed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6"><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6.1" style="font-size:70%;">Agent attached the override token to delete</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6.2" style="font-size:70%;">/git/www/hello.html</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6.3" style="font-size:70%;">and reset</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6.4" style="font-size:70%;">/git/server/refs/heads/master</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6.5" style="font-size:70%;">after a successful deployment check, “returning to a clean repo”; verifier then 404s.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7"><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7.1" style="font-size:70%;--ltx-fg-color:#1B262C;">Predicted fixes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8"><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.1" style="font-size:70%;">2 tasks. Examples:</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.2" style="font-size:70%;">configure-git-webserver</span><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.3" style="font-size:70%;">,</span> <span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.4" style="font-size:70%;">git-multibranch</span><span id="A3.F10.pic1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.5" style="font-size:70%;">.</span></span></span></foreignObject></g></g></svg>

<svg id="A3.F10.pic2" height="926.6" overflow="visible" style="vertical-align:-463.3px" version="1.1" viewBox="0 0 297.23 926.6" width="297.23"><g style="--ltx-stroke-color:#000000;--ltx-fill-color:#000000;" transform="translate(0,926.6) matrix(1 0 0 -1 0 0)" fill="#000000" stroke="#000000" stroke-width="0.4pt"><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0 4.77 L 0 921.83 C 0 924.46 2.13 926.6 4.77 926.6 L 292.47 926.6 C 295.1 926.6 297.23 924.46 297.23 921.83 L 297.23 4.77 C 297.23 2.13 295.1 0 292.47 0 L 4.77 0 C 2.13 0 0 2.13 0 4.77 Z"></path></g><g style="--ltx-fill-color:#E6F0F7;" fill="#E6F0F7" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 4.77 L 0.83 403.55 L 296.4 403.55 L 296.4 4.77 C 296.4 2.59 294.64 0.83 292.47 0.83 L 4.77 0.83 C 2.59 0.83 0.83 2.59 0.83 4.77 Z"></path></g><g style="--ltx-fill-color:#0F4C75;" fill="#0F4C75" fill-opacity="1.0"><path style="stroke:none" d="M 0.83 404.38 L 0.83 921.83 C 0.83 924 2.59 925.77 4.77 925.77 L 292.47 925.77 C 294.64 925.77 296.4 924 296.4 921.83 L 296.4 404.38 Z"></path></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 917.52)"><foreignObject style="--ltx-fg-color:#FFFFFF;--ltx-fo-width:25.08em;--ltx-fo-height:0.39em;--ltx-fo-depth:46.16em;" width="276.63" height="513.51" transform="matrix(1 0 0 -1 0 4.3)" overflow="visible" color="#FFFFFF"><span id="A3.F10.pic2.2.2.2.1.1" style="width:22.06em;"><span id="A3.F10.pic2.2.2.2.1.1.1"><span id="A3.F10.pic2.2.2.2.1.1.1.1" style="font-size:70%;">chg-2, iteration 8, commit <span id="A3.F10.pic2.2.2.2.1.1.1.1.1">a4a4a29</span>, level: middleware</span></span> </span></foreignObject></g><g fill-opacity="1.0" transform="matrix(1.0 0.0 0.0 1.0 10.3 11.07)"><foreignObject style="--ltx-fg-color:#000000;--ltx-fo-width:25.08em;--ltx-fo-height:34.85em;--ltx-fo-depth:0.2em;" width="276.63" height="386.55" transform="matrix(1 0 0 -1 0 384.4)" overflow="visible" color="#000000"><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1" style="width:25.08em;"><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2"><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Files</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1"><math xmlns="http://www.w3.org/1998/Math/MathML" display="inline" data-latex="\bullet"><semantics><mo mathsize="0.700em">∙</mo> <annotation encoding="application/x-tex">\bullet</annotation></semantics></math> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.2" style="font-size:70%;">workspace/middleware/execution_risk_hints.py</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3"><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.3.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">What changed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4"><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.1" style="font-size:70%;">Added a</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.2" style="font-size:70%;">BeforeModelHook</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.3" style="font-size:70%;">that promotes any execution-risk note emitted on the previous step into a FRAMEWORK reminder visible at the top of the next model turn, so warnings enter the reasoning context rather than trail after the tool output. Added one-time per-task contract inference for clean-layout or single-file delivery contracts and official-wrapper or named-revision contracts. Added two new after-tool heuristics: a warning when the agent compiles or builds inside a clean-layout live tree, and a warning when the contract names an official wrapper but the command uses a raw</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.4" style="font-size:70%;">SentenceTransformer</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.5" style="font-size:70%;">or</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.6" style="font-size:70%;">AutoModel</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.4.7" style="font-size:70%;">style API instead.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5"><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.5.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Failure pattern fixed</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6"><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.6.1" style="font-size:70%;">Iteration-6 middleware emitted the right warnings but only into tool output; the agent often made the publish/stop decision on the next model turn and ignored them. Salience promotion plus contract-aware heuristics close this gap.</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7"><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.7.1" style="font-size:70%;--ltx-fg-color:#0F4C75;">Predicted fixes</span></span> <span style="width:433.6pt;height:0.3pt;--ltx-bg-color:black;display:inline-block;"></span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8"><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.1" style="font-size:70%;">4 tasks. Examples:</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.2" style="font-size:70%;">polyglot-c-py</span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.3" style="font-size:70%;">,</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.4" style="font-size:70%;">polyglot-rust-c</span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.5" style="font-size:70%;">,</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.6" style="font-size:70%;">mteb-retrieve</span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.7" style="font-size:70%;">,</span> <span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.8" style="font-size:70%;">pytorch-model-recovery</span><span id="A3.F10.pic2.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.8.9" style="font-size:70%;">.</span></span></span></foreignObject></g></g></svg>

Figure 10: Two change-manifest entries written together at the iteration-7 boundary and shipped as the iteration-8 harness. chg-1 hardens the existing publish-state shell guard so that the override token can no longer wipe verified live deliverables. chg-2 makes execution-risk warnings impossible to overlook at the next model turn and adds two contract-aware heuristics. Both are deliberately scoped: chg-1 prevents the destructive command itself, chg-2 fixes the salience gap of the iteration-6 middleware.

### C.3 Reading the change-manifest figures

The trajectories above track individual edits through individual tasks. The change-manifest carries each edit along with its predicted fixes, predicted regressions, and constraint level into Phase 3 of the next iteration, where the attribution check decides whether to keep or roll it back. One manifest figure is attached to each of the four winning rounds, all in the same Files / What changed / Failure pattern fixed / Predicted fixes layout. Figure 7 shows iteration 2’s prompt edit and shell-tool edit written together in the seed round. Figure 8 shows iteration 5’s prompt-and-descriptor rule and shell-guard installation that introduce the publish-state mechanism. Figure 9 shows iteration 6’s extension of the publish-state guard to script entrypoints and the introduction of the cross-step ExecutionRiskHintsMiddleware. Figure 10 shows iteration 8’s keep-and-improve patches that close the override-token loophole on the guard and promote middleware reminders into a FRAMEWORK note visible at the next model turn. Together the four figures cover three of the four constraint levels the evolve agent uses, namely prompt, tool implementation, and middleware, all written in the same JSON shape and all subject to the same automatic rollback if their predicted fixes do not appear.

## Appendix D Per-round Self-attribution Breakdown

This appendix expands the aggregate self-attribution result of §4.4.2 with a per-round breakdown across the four fix/regression by precision/recall panels.

Figures˜11 and 12 show the per-round breakdown across the four fix/regression by precision/recall panels. Bars decompose each denominator, predicted for precision and actual for recall, into deep-blue TP versus pale FP or FN; the dashed line traces the metric on the right-hand $0$ to $100\%$ axis, and the solid line shows contemporaneous pass@1. Fix-precision and fix-recall both swing from near-zero to near-saturation across rounds, so the evolve model’s causal attribution for its own improvements is informative if noisy. Regression predictions instead stay near the floor, below $25\%$ on most rounds: across the 9 rounds the agent issued 43 unique regression predictions and only 5 landed, giving cumulative $P=11.6\%$, while 40 regressions the agent did not foresee actually occurred, giving cumulative $R=11.1\%$.

![Refer to caption](https://arxiv.org/html/2604.25850v3/x6.png)

Figure 11: Per-round fix predictions. Left: precision. Right: recall. Bars decompose each denominator into TP versus FP or FN; lines overlay the metric and contemporaneous pass@1.

![Refer to caption](https://arxiv.org/html/2604.25850v3/x8.png)

Figure 12: Per-round regression predictions. Left: precision. Right: recall. Same encoding as Fig. 11.

[^1]: L. A. Agrawal, S. Tan, D. Soylu, N. Ziems, R. Khare, K. Opsahl-Ong, A. Singhvi, H. Shandilya, M. J. Ryan, M. Jiang, C. Potts, K. Sen, A. Dimakis, I. Stoica, D. Klein, M. Zaharia, and O. Khattab (2025-10) GEPA: reflective prompt evolution can outperform reinforcement learning. In The Fourteenth International Conference on Learning Representations, External Links: [Link](https://openreview.net/forum?id=RQm2KQTM5r) Cited by: §1, §2.2.

[^2]: Anomaly (2025) Opencode: the open source coding agent.. External Links: [Link](https://github.com/anomalyco/opencode) Cited by: §4.2.

[^3]: Anthropic (2025) Claude-code. External Links: [Link](https://github.com/anthropics/claude-code) Cited by: §2.1.

[^4]: Y. Cai, S. Cai, Y. Shi, Z. Xu, L. Chen, Y. Qin, X. Tan, G. Li, Z. Li, H. Lin, Y. Mao, K. Li, and X. Sun (2025-10) Training-free group relative policy optimization. arXiv. External Links: 2510.08191, [Document](https://dx.doi.org/10.48550/arXiv.2510.08191), [Link](http://arxiv.org/abs/2510.08191) Cited by: §1, §1, §2.2, §4.2.

[^5]: J. S. Chan, N. Chowdhury, O. Jaffe, J. Aung, D. Sherburn, E. Mays, G. Starace, K. Liu, L. Maksin, T. Patwardhan, A. Madry, and L. Weng (2024-10) MLE-bench: evaluating machine learning agents on machine learning engineering. In The Thirteenth International Conference on Learning Representations, External Links: [Link](https://openreview.net/forum?id=6s5uXNWGIh) Cited by: §2.1.

[^6]: DeepSeek-AI (2026-04) DeepSeek-v4: towards highly efficient million-token context intelligence. External Links: [Link](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf) Cited by: §1, §4.1.

[^7]: X. Deng, J. Da, E. Pan, Y. Y. He, C. Ide, K. Garg, N. Lauffer, A. Park, C. Rane, K. Sampath, M. Krishnan, S. R. Kundurthy, S. M. Hendryx, Z. Wang, C. B. C. Zhang, N. Jacobson, B. Liu, and B. Kenstler (2025-10) SWE-bench pro: can ai agents solve long-horizon software engineering tasks?. External Links: [Link](https://openreview.net/forum?id=9R2iUHhVfr) Cited by: §1, §2.1.

[^8]: Google (2026-03) Gemini-3-1-flash-lite-model-card. External Links: [Link](https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Flash-Lite-Model-Card.pdf) Cited by: §4.1.

[^9]: H. Guo, K. Lv, Q. Guo, T. Liang, Z. Xi, D. Song, Q. Zhang, Y. Sun, K. Chen, X. Qiu, and T. Gui (2025-07) CritiQ: mining data quality criteria from human preferences. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), W. Che, J. Nabende, E. Shutova, and M. T. Pilehvar (Eds.), Vienna, Austria, pp. 16240–16261. External Links: [Document](https://dx.doi.org/10.18653/v1/2025.acl-long.792), [Link](https://aclanthology.org/2025.acl-long.792/), ISBN 979-8-89176-251-0 Cited by: §2.2.

[^10]: Harbor (2026) Terminus-2. External Links: [Link](https://www.harborframework.com/docs/agents/terminus-2) Cited by: §4.2.

[^11]: S. Hu, C. Lu, and J. Clune (2024-10) Automated design of agentic systems. In The Thirteenth International Conference on Learning Representations, External Links: [Link](https://openreview.net/forum?id=t9U3LW7JVX) Cited by: §2.2.

[^12]: N. Jain, K. Han, A. Gu, W. Li, F. Yan, T. Zhang, S. Wang, A. Solar-Lezama, K. Sen, and I. Stoica (2024-10) LiveCodeBench: holistic and contamination free evaluation of large language models for code. In The Thirteenth International Conference on Learning Representations, External Links: [Link](https://openreview.net/forum?id=chfJJYC3iL) Cited by: §2.1.

[^13]: N. Jain, J. Singh, M. Shetty, T. Zhang, L. Zheng, K. Sen, and I. Stoica (2025-08) R2E-gym: procedural environment generation and hybrid verifiers for scaling open-weights swe agents. In Second Conference on Language Modeling, External Links: [Link](https://openreview.net/forum?id=7evvwwdo3z#discussion) Cited by: §2.1.

[^14]: C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. R. Narasimhan (2023-10) SWE-bench: can language models resolve real-world github issues?. In The Twelfth International Conference on Learning Representations, External Links: [Link](https://openreview.net/forum?id=VTF8yNQM66) Cited by: §1, §1, §2.1, §4.1.

[^15]: O. Khattab, A. Singhvi, P. Maheshwari, Z. Zhang, K. Santhanam, S. Vardhamanan, S. Haq, A. Sharma, T. T. Joshi, H. Moazam, H. Miller, M. Zaharia, and C. Potts (2023-10) DSPy: compiling declarative language model calls into self-improving pipelines. arXiv. External Links: 2310.03714, [Document](https://dx.doi.org/10.48550/arXiv.2310.03714), [Link](http://arxiv.org/abs/2310.03714) Cited by: §2.2.

[^16]: Y. Lee, R. Nair, Q. Zhang, K. Lee, O. Khattab, and C. Finn (2026-03) Meta-harness: end-to-end optimization of model harnesses. arXiv. External Links: 2603.28052, [Document](https://dx.doi.org/10.48550/arXiv.2603.28052), [Link](http://arxiv.org/abs/2603.28052) Cited by: §1.

[^17]: L. Lin (2026-02) Agent debugger: understanding agent trajectory with agentic workflows - dawning road. External Links: [Link](https://dawning-road.github.io/blog/agent-debugger) Cited by: §3.2.

[^18]: R. Lopopolo (2026-02) Harness engineering: leveraging codex in an agent-first world. External Links: [Link](https://openai.com/zh-Hans-CN/index/harness-engineering/) Cited by: §1, §2.1.

[^19]: Z. Ma, S. Yang, Y. Ji, X. Wang, Y. Wang, Y. Hu, T. Huang, and X. Chu (2026-04) SkillClaw: let skills evolve collectively with agentic evolver. arXiv. External Links: 2604.08377, [Document](https://dx.doi.org/10.48550/arXiv.2604.08377), [Link](http://arxiv.org/abs/2604.08377) Cited by: §1.

[^20]: A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang, S. Gupta, B. P. Majumder, K. Hermann, S. Welleck, A. Yazdanbakhsh, and P. Clark (2023-11) Self-refine: iterative refinement with self-feedback. In Thirty-Seventh Conference on Neural Information Processing Systems, External Links: [Link](https://openreview.net/forum?id=S37hOerQLB) Cited by: §1, §2.2.

[^21]: M. A. Merrill, A. G. Shaw, N. Carlini, B. Li, H. Raj, I. Bercovich, L. Shi, J. Y. Shin, T. Walshe, E. K. Buchanan, J. Shen, G. Ye, H. Lin, J. Poulos, M. Wang, M. Nezhurina, J. Jitsev, D. Lu, O. M. Mastromichalakis, Z. Xu, Z. Chen, Y. Liu, R. Zhang, L. L. Chen, A. Kashyap, J. Uslu, J. Li, J. Wu, M. Yan, S. Bian, V. Sharma, K. Sun, S. Dillmann, A. Anand, A. Lanpouthakoun, B. Koopah, C. Hu, E. Guha, G. H. S. Dreiman, J. Zhu, K. Krauth, L. Zhong, N. Muennighoff, R. Amanfu, S. Tan, S. Pimpalgaonkar, T. Aggarwal, X. Lin, X. Lan, X. Zhao, Y. Liang, Y. Wang, Z. Wang, C. Zhou, D. Heineman, H. Liu, H. Trivedi, J. Yang, J. Lin, M. Shetty, M. Yang, N. Omi, N. Raoof, S. Li, T. Y. Zhuo, W. Lin, Y. Dai, Y. Wang, W. Chai, S. Zhou, D. Wahdany, Z. She, J. Hu, Z. Dong, Y. Zhu, S. Cui, A. Saiyed, A. Kolbeinsson, J. Hu, C. M. Rytting, R. Marten, Y. Wang, A. Dimakis, A. Konwinski, and L. Schmidt (2026-01) Terminal-bench: benchmarking agents on hard, realistic tasks in command line interfaces. arXiv. External Links: 2601.11868, [Document](https://dx.doi.org/10.48550/arXiv.2601.11868), [Link](http://arxiv.org/abs/2601.11868) Cited by: §1, §1, §2.1, §4.1.

[^22]: S. Miserendino, M. Wang, T. Patwardhan, and J. Heidecke (2025-06) SWE-lancer: can frontier llms earn $1 million from real-world freelance software engineering?. In Forty-Second International Conference on Machine Learning, External Links: [Link](https://openreview.net/forum?id=xZXhFg43EI) Cited by: §2.1.

[^23]: Nex-AGI (2025) NexAU (au for agent universe), a general-purpose agent framework for building intelligent agents with tool capabilities.. External Links: [Link](https://github.com/nex-agi/NexAU) Cited by: §3.1.

[^24]: A. Novikov, N. Vũ, M. Eisenberger, E. Dupont, P. Huang, A. Z. Wagner, S. Shirobokov, B. Kozlovskii, F. J. R. Ruiz, A. Mehrabian, M. P. Kumar, A. See, S. Chaudhuri, G. Holland, A. Davies, S. Nowozin, P. Kohli, and M. Balog (2025-06) AlphaEvolve: a coding agent for scientific and algorithmic discovery. arXiv. External Links: 2506.13131, [Document](https://dx.doi.org/10.48550/arXiv.2506.13131), [Link](http://arxiv.org/abs/2506.13131) Cited by: §2.2.

[^25]: OpenAI (2025) Codex cli. External Links: [Link](https://developers.openai.com/codex/cli) Cited by: §1, §4.2.

[^26]: OpenAI (2026-03) Introducing gpt-5.4. External Links: [Link](https://openai.com/index/introducing-gpt-5-4/) Cited by: §4.1.

[^27]: K. Opsahl-Ong, M. J. Ryan, J. Purtell, D. Broman, C. Potts, M. Zaharia, and O. Khattab (2024-11) Optimizing instructions and demonstrations for multi-stage language model programs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Y. Al-Onaizan, M. Bansal, and Y. Chen (Eds.), Miami, Florida, USA, pp. 9340–9366. External Links: [Document](https://dx.doi.org/10.18653/v1/2024.emnlp-main.525), [Link](https://aclanthology.org/2024.emnlp-main.525/) Cited by: §2.2.

[^28]: J. Pan, X. Wang, G. Neubig, N. Jaitly, H. Ji, A. Suhr, and Y. Zhang (2025-06) Training software engineering agents and verifiers with swe-gym. In Forty-Second International Conference on Machine Learning, External Links: [Link](https://openreview.net/forum?id=Cq1BNvHx74) Cited by: §2.1.

[^29]: P. Rajasekaran, E. Dixon, C. Ryan, J. Hadfield, R. Ayub, H. Moran, C. Rueb, C. Jennings, M. Vorwerck, S. Ritchie, and M. Vo (2025-09) Effective context engineering for ai agents. External Links: [Link](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) Cited by: §3.2.

[^30]: P. Rajasekaran (2026-03) Harness design for long-running application development. External Links: [Link](https://www.anthropic.com/engineering/harness-design-long-running-apps) Cited by: §1, §2.1.

[^31]: N. Research (2026) Hermes agent — the agent that grows with you. External Links: [Link](https://hermes-agent.nousresearch.com/) Cited by: §1, §2.1.

[^32]: N. Shinn, F. Cassano, A. Gopinath, K. R. Narasimhan, and S. Yao (2023-11) Reflexion: language agents with verbal reinforcement learning. In Thirty-Seventh Conference on Neural Information Processing Systems, External Links: [Link](https://openreview.net/forum?id=vAElhFcKW6) Cited by: §1, §2.2.

[^33]: P. Steinberger (2026-02) OpenClaw — personal ai assistant. External Links: [Link](https://openclaw.ai/) Cited by: §1, §1, §2.1.

[^34]: R. Sutton (2019-03) The bitter lesson. External Links: [Link](https://www.cs.utexas.edu/%CB%9Ceunsol/courses/data/bitter_lesson.pdf) Cited by: §1.

[^35]: K. Team, T. Bai, Y. Bai, Y. Bao, S. H. Cai, Y. Cao, Y. Charles, H. S. Che, C. Chen, G. Chen, H. Chen, J. Chen, J. Chen, J. Chen, J. Chen, K. Chen, L. Chen, R. Chen, X. Chen, Y. Chen, Y. Chen, Y. Chen, Y. Chen, Y. Chen, Y. Chen, Y. Chen, Y. Chen, Z. Chen, Z. Chen, D. Cheng, M. Chu, J. Cui, J. Deng, M. Diao, H. Ding, M. Dong, M. Dong, Y. Dong, Y. Dong, A. Du, C. Du, D. Du, L. Du, Y. Du, Y. Fan, S. Fang, Q. Feng, Y. Feng, G. Fu, K. Fu, H. Gao, T. Gao, Y. Ge, S. Geng, C. Gong, X. Gong, Z. Gongque, Q. Gu, X. Gu, Y. Gu, L. Guan, Y. Guo, X. Hao, W. He, W. He, Y. He, C. Hong, H. Hu, J. Hu, Y. Hu, Z. Hu, K. Huang, R. Huang, W. Huang, Z. Huang, T. Jiang, Z. Jiang, X. Jin, Y. Jing, G. Lai, A. Li, C. Li, C. Li, F. Li, G. Li, G. Li, H. Li, H. Li, J. Li, J. Li, J. Li, L. Li, M. Li, W. Li, W. Li, X. Li, X. Li, Y. Li, Y. Li, Y. Li, Y. Li, Z. Li, Z. Li, W. Liao, J. Lin, X. Lin, Z. Lin, Z. Lin, C. Liu, C. Liu, H. Liu, L. Liu, S. Liu, S. Liu, S. Liu, T. Liu, T. Liu, W. Liu, X. Liu, Y. Liu, Y. Liu, Y. Liu, Y. Liu, Y. Liu, Z. Liu, Z. Liu, E. Lu, H. Lu, Z. Lu, J. Luo, T. Luo, Y. Luo, L. Ma, Y. Ma, S. Mao, Y. Mei, X. Men, F. Meng, Z. Meng, Y. Miao, M. Ni, K. Ouyang, S. Pan, B. Pang, Y. Qian, R. Qin, Z. Qin, J. Qiu, B. Qu, Z. Shang, Y. Shao, T. Shen, Z. Shen, J. Shi, L. Shi, S. Shi, F. Song, P. Song, T. Song, X. Song, H. Su, J. Su, Z. Su, L. Sui, J. Sun, J. Sun, T. Sun, F. Sung, Y. Tai, C. Tang, H. Tang, X. Tang, Z. Tang, J. Tao, S. Teng, C. Tian, P. Tian, A. Wang, B. Wang, C. Wang, C. Wang, C. Wang, D. Wang, D. Wang, D. Wang, F. Wang, H. Wang, H. Wang, H. Wang, H. Wang, H. Wang, J. Wang, J. Wang, J. Wang, K. Wang, L. Wang, Q. Wang, S. Wang, S. Wang, S. Wang, W. Wang, X. Wang, X. Wang, Y. Wang, Y. Wang, Y. Wang, Y. Wang, Y. Wang, Y. Wang, Z. Wang, Z. Wang, Z. Wang, Z. Wang, Z. Wang, Z. Wang, C. Wei, M. Wei, C. Wen, Z. Wen, C. Wu, H. Wu, J. Wu, R. Wu, W. Wu, Y. Wu, Y. Wu, Y. Wu, Z. Wu, C. Xiao, J. Xie, X. Xie, Y. Xie, Y. Xin, B. Xing, B. Xu, J. Xu, J. Xu, J. Xu, L. H. Xu, L. Xu, S. Xu, W. Xu, X. Xu, X. Xu, Y. Xu, Y. Xu, Y. Xu, Z. Xu, Z. Xu, J. Yan, Y. Yan, G. Yang, H. Yang, J. Yang, K. Yang, N. Yang, R. Yang, X. Yang, X. Yang, Y. Yang, Y. Yang, Y. Yang, Z. Yang, Z. Yang, Z. Yang, H. Yao, D. Ye, W. Ye, Z. Ye, B. Yin, C. Yu, L. Yu, T. Yu, T. Yu, E. Yuan, M. Yuan, X. Yuan, Y. Yue, W. Zeng, D. Zha, H. Zhan, D. Zhang, H. Zhang, J. Zhang, P. Zhang, Q. Zhang, R. Zhang, X. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, Y. Zhang, Z. Zhang, C. Zhao, F. Zhao, J. Zhao, S. Zhao, X. Zhao, Y. Zhao, Z. Zhao, H. Zheng, R. Zheng, S. Zheng, T. Zheng, J. Zhong, L. Zhong, W. Zhong, M. Zhou, R. Zhou, X. Zhou, Z. Zhou, J. Zhu, L. Zhu, X. Zhu, Y. Zhu, Z. Zhu, J. Zhuang, W. Zhuang, Y. Zou, and X. Zu (2026-02) Kimi k2.5: visual agentic intelligence. arXiv. External Links: 2602.02276, [Document](https://dx.doi.org/10.48550/arXiv.2602.02276), [Link](http://arxiv.org/abs/2602.02276) Cited by: §1.

[^36]: K. Team (2026-04) Kimi k2.6 tech blog: advancing open-source coding. External Links: [Link](https://www.kimi.com/blog/kimi-k2-6) Cited by: §1.

[^37]: N. Team, Y. Cai, L. Chen, Q. Chen, Y. Ding, L. Fan, W. Fu, Y. Gao, H. Guo, P. Guo, Z. Han, Z. He, H. Hu, K. Hu, S. Hua, T. Huai, B. Huang, L. Ji, Z. Jiang, Z. Lei, B. Li, J. Lin, L. Lin, J. Liu, S. Liu, Z. Liu, Y. Ni, P. Qian, Y. Shen, Q. Shi, W. Shu, P. Sun, Y. Suo, T. Tang, B. Tian, G. Wang, J. Wang, P. Wang, Z. Xi, H. Yan, J. Yang, Z. Yang, T. Yao, G. Ye, Q. Yu, S. Zhang, X. Zhang, Y. Zhang, J. Zhao, M. Zheng, R. Zheng, E. Zhou, J. Zhou, M. Zhou, Y. Zhou, T. Gui, Y. Zheng, X. Chen, J. Zhou, S. Feng, Q. Chen, L. He, Q. Zhang, X. Huang, and X. Qiu (2025-12) Nex-n1: agentic models trained via a unified ecosystem for large-scale environment construction. arXiv. External Links: 2512.04987, [Document](https://dx.doi.org/10.48550/arXiv.2512.04987), [Link](http://arxiv.org/abs/2512.04987) Cited by: §3.1.

[^38]: Q. Team (2026-04) Qwen3.6-plus: towards real world agents. External Links: [Link](https://qwenlm.github.io/blog/qwen3.6/) Cited by: §1, §4.1.

[^39]: X. M. Team (2026-04) MiMo-v2.5-pro. External Links: [Link](https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro) Cited by: §1.

[^40]: V. Trivedy (2026-02) Improving deep agents with harness engineering. External Links: [Link](https://www.langchain.com/blog/improving-deep-agents-with-harness-engineering) Cited by: §1, §2.1.

[^41]: G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar (2023-10) Voyager: an open-ended embodied agent with large language models. arXiv. External Links: 2305.16291, [Document](https://dx.doi.org/10.48550/arXiv.2305.16291), [Link](http://arxiv.org/abs/2305.16291) Cited by: §2.2.

[^42]: X. Wang, B. Li, Y. Song, F. F. Xu, X. Tang, M. Zhuge, J. Pan, Y. Song, B. Li, J. Singh, H. H. Tran, F. Li, R. Ma, M. Zheng, B. Qian, Y. Shao, N. Muennighoff, Y. Zhang, B. Hui, J. Lin, R. Brennan, H. Peng, H. Ji, and G. Neubig (2025-04) OpenHands: an open platform for ai software developers as generalist agents. arXiv. External Links: 2407.16741, [Document](https://dx.doi.org/10.48550/arXiv.2407.16741), [Link](http://arxiv.org/abs/2407.16741) Cited by: §1, §1, §2.1.

[^43]: P. Xia, J. Chen, H. Wang, J. Liu, K. Zeng, Y. Wang, S. Han, Y. Zhou, X. Zhao, H. Chen, Z. Zheng, C. Xie, and H. Yao (2026-02) SkillRL: evolving agents via recursive skill-augmented reinforcement learning. arXiv. External Links: 2602.08234, [Document](https://dx.doi.org/10.48550/arXiv.2602.08234), [Link](http://arxiv.org/abs/2602.08234) Cited by: §1.

[^44]: A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lv, C. Zheng, D. Liu, F. Zhou, F. Huang, F. Hu, H. Ge, H. Wei, H. Lin, J. Tang, J. Yang, J. Tu, J. Zhang, J. Yang, J. Yang, J. Zhou, J. Zhou, J. Lin, K. Dang, K. Bao, K. Yang, L. Yu, L. Deng, M. Li, M. Xue, M. Li, P. Zhang, P. Wang, Q. Zhu, R. Men, R. Gao, S. Liu, S. Luo, T. Li, T. Tang, W. Yin, X. Ren, X. Wang, X. Zhang, X. Ren, Y. Fan, Y. Su, Y. Zhang, Y. Zhang, Y. Wan, Y. Liu, Z. Wang, Z. Cui, Z. Zhang, Z. Zhou, and Z. Qiu (2025-05) Qwen3 technical report. arXiv. External Links: 2505.09388, [Document](https://dx.doi.org/10.48550/arXiv.2505.09388), [Link](http://arxiv.org/abs/2505.09388) Cited by: §1, §4.1.

[^45]: J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. R. Narasimhan, and O. Press (2024-11) SWE-agent: agent-computer interfaces enable automated software engineering. In The Thirty-Eighth Annual Conference on Neural Information Processing Systems, External Links: [Link](https://openreview.net/forum?id=mXpq6ut8J3&referrer=%5Bthe%20profile%20of%20Shunyu%20Yao%5D\(%2Fprofile%3Fid%3D%CB%9CShunyu_Yao1\)) Cited by: §1, §2.1.

[^46]: J. Yang, C. E. Jimenez, A. L. Zhang, K. Lieret, J. Yang, X. Wu, O. Press, N. Muennighoff, G. Synnaeve, K. R. Narasimhan, D. Yang, S. Wang, and O. Press (2024-10) SWE-bench multimodal: do ai systems generalize to visual software domains?. In The Thirteenth International Conference on Learning Representations, External Links: [Link](https://openreview.net/forum?id=riTiq3i21b) Cited by: §1, §2.1.

[^47]: Y. Zeng, S. Li, D. Dong, R. Xu, Z. Chen, L. Zheng, Y. Li, Z. Zhou, H. Zhao, L. Tian, H. Xiao, T. Zhu, L. Hao, and J. Wu (2026-02) SWE-hub: a unified production system for scalable, executable software engineering tasks. arXiv. External Links: 2603.00575, [Document](https://dx.doi.org/10.48550/arXiv.2603.00575), [Link](http://arxiv.org/abs/2603.00575) Cited by: §2.1.

[^48]: J. Zhang, J. Xiang, Z. Yu, F. Teng, X. Chen, J. Chen, M. Zhuge, X. Cheng, S. Hong, J. Wang, B. Zheng, B. Liu, Y. Luo, and C. Wu (2024-10) AFlow: automating agentic workflow generation. In The Thirteenth International Conference on Learning Representations, External Links: [Link](https://openreview.net/forum?id=z5uVAKwmjf) Cited by: §2.2.

[^49]: Q. Zhang, C. Hu, S. Upasani, B. Ma, F. Hong, V. Kamanuru, J. Rainton, C. Wu, M. Ji, H. Li, U. Thakker, J. Zou, and K. Olukotun (2025-10) Agentic context engineering: evolving contexts for self-improving language models. In The Fourteenth International Conference on Learning Representations, External Links: [Link](https://openreview.net/forum?id=eC4ygDs02R) Cited by: §1, §1, §2.2, §4.2.

[^50]: A. Zhao, D. Huang, Q. Xu, M. Lin, Y. Liu, and G. Huang (2024-12) ExpeL: llm agents are experiential learners. arXiv. External Links: 2308.10144, [Document](https://dx.doi.org/10.48550/arXiv.2308.10144), [Link](http://arxiv.org/abs/2308.10144) Cited by: §1.

[^51]: W. Zhou, Y. Ou, S. Ding, L. Li, J. Wu, T. Wang, J. Chen, S. Wang, X. Xu, N. Zhang, H. Chen, and Y. E. Jiang (2024-06) Symbolic learning enables self-evolving agents. arXiv. External Links: 2406.18532, [Document](https://dx.doi.org/10.48550/arXiv.2406.18532), [Link](http://arxiv.org/abs/2406.18532) Cited by: §2.2.

[^52]: T. Y. Zhuo, V. M. Chien, J. Chim, H. Hu, W. Yu, R. Widyasari, I. N. B. Yusuf, H. Zhan, J. He, I. Paul, S. Brunner, C. Gong, J. Hoang, A. R. Zebaze, X. Hong, W. Li, J. Kaddour, M. Xu, Z. Zhang, P. Yadav, N. Jain, A. Gu, Z. Cheng, J. Liu, Q. Liu, Z. Wang, B. Hui, N. Muennighoff, D. Lo, D. Fried, X. Du, H. de Vries, and L. V. Werra (2024-10) BigCodeBench: benchmarking code generation with diverse function calls and complex instructions. In The Thirteenth International Conference on Learning Representations, External Links: [Link](https://openreview.net/forum?id=YrycTjllL0) Cited by: §2.1.

[^53]: G. Zunic (2026-04) The bitter lesson of agent harnesses. External Links: [Link](https://browser-use.com/posts/bitter-lesson-agent-harnesses) Cited by: §1.
