### Submitted Proposals
### Crimson Leaf -- Task ee0c11c4-33d0-49ae-a8e1-f9ab2c34e35b
Date: 2026-04-29
Status: AWAITING DAVID'S APPROVAL
Summary: This proposal outlines the development of the Foreman Probe, a standardized suite of model probe tasks for benchmarking LLM reasoning and instruction-following. It fills a gap in internal evaluation by providing a controlled environment for stress-testing model performance. Unlike previous iterations, it focuses on the Foreman's own task-creation logic, with the aim of scaling task difficulty higher.