Data Analyst Harness
The Data Analyst harness extends the Generic harness with capabilities for data analysis: SQL databases, persistent cross-session memory, rich visualization via OpenUI, and a curated knowledge scaffold. Its system prompt implements a structured 6-step analysis pipeline inspired by OpenAI’s Kepler data agent and the open-source Dash project.
When to Use
- Natural-language data analysis (ask questions, get SQL + charts)
- Interactive data exploration with visualization
- Agents that learn from corrections and remember them across sessions
- Analytics workflows grounded in curated knowledge bases (table docs, business rules, validated SQL)
Configuration
| Property | Value |
|---|---|
| Type | data-analyst |
| System Prompt | Structured 6-step analysis pipeline |
| Default Model | None (inherits from agent or organization) |
Analysis Pipeline
The system prompt guides the agent through six steps on every data question:
- Recall — Search persistent memory for corrections, column mappings, and business definitions from earlier sessions
- Inspect — Use `sql_schema` to verify table structure before writing SQL
- Plan — State the query plan: tables, joins, filters, expected grain, and potential pitfalls
- Execute & Validate — Run the query, then validate (zero rows? duplicates? NULL aggregations?). Self-correct if results look wrong
- Visualize — Summarize findings in plain language, then render charts and tables via OpenUI
- Learn — Use `remember` to save corrections and patterns for future sessions
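The Execute & Validate step can be sketched with plain SQLite; the table, data, and checks below are illustrative stand-ins for what the harness's `sql_query` tool returns, not its actual API:

```python
import sqlite3

# Minimal sketch of step 4 (Execute & Validate) against a session-scoped
# SQLite database. Table and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("toys", 120.0), ("books", 80.0), ("toys", 60.0)],
)

rows = conn.execute(
    "SELECT category, SUM(revenue) AS total FROM sales GROUP BY category"
).fetchall()

# The checks the prompt calls out: zero rows, NULL aggregations, and
# duplicate grain all signal a query that needs a second look.
assert rows, "zero rows returned -- check filters and joins"
assert all(total is not None for _, total in rows), "NULL aggregation"
categories = [c for c, _ in rows]
assert len(categories) == len(set(categories)), "duplicate grain"
print(rows)
```

If any assertion fires, the agent would revise the query rather than report the result, which is the self-correction behavior the pipeline asks for.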
This mirrors the six-layer context pattern described in OpenAI’s data agent blog post and implemented by Dash.
Bundled Capabilities
All Generic harness capabilities plus:
| Capability | What it provides |
|---|---|
| Session SQL Database | sql_execute, sql_query, sql_schema — session-scoped SQLite databases that auto-create on first write |
| Persistent Memory | remember, recall, forget — cross-session memory with passive recall (8 memories auto-injected per turn) |
| OpenUI | Rich interactive charts, tables, dashboards, and KPI cards rendered inline in chat |
| Todo List | write_todos — track multi-step analysis tasks |
| Data Knowledge | Mounts /knowledge/ scaffold with directories for table docs, business rules, and validated SQL patterns |
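The "auto-create on first write" behavior of the session SQL database rests on standard SQLite semantics: connecting to a nonexistent path creates the file. A minimal sketch, where the `session.db` path is hypothetical:

```python
import os
import sqlite3
import tempfile

# Sketch of auto-creation: sqlite3.connect creates the database file if it
# does not exist, so the session database materializes on first use.
db_path = os.path.join(tempfile.mkdtemp(), "session.db")
assert not os.path.exists(db_path)

conn = sqlite3.connect(db_path)  # file is created here
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES ('first write creates the db')")
conn.commit()

assert os.path.exists(db_path)
count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
print(count)
```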
Knowledge Files
The harness mounts a /knowledge/ directory scaffold in every session:
```
/knowledge/
  tables/README.md     # Add one .md per table: columns, types, gotchas
  business/README.md   # Add metric definitions, business rules, domain terms
  queries/README.md    # Add validated .sql files as reusable templates
```

These files are read-only scaffolds. Populate them with your organization’s curated knowledge to ground the agent’s SQL generation in reality. The agent reads these files before writing any SQL query.
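As a sketch of how the scaffold might be consumed, the snippet below builds a toy knowledge tree in a temp directory (standing in for the /knowledge/ mount) and collects its files into a context map; the file names and contents are invented for illustration:

```python
import pathlib
import tempfile

# Build a toy /knowledge/ scaffold. In the harness this is mounted read-only;
# here a temp directory stands in for it.
root = pathlib.Path(tempfile.mkdtemp()) / "knowledge"
for sub in ("tables", "business", "queries"):
    (root / sub).mkdir(parents=True)
(root / "tables" / "orders.md").write_text("revenue is net of refunds")
(root / "queries" / "monthly_revenue.sql").write_text(
    "SELECT strftime('%Y-%m', ordered_at) AS month, SUM(revenue) "
    "FROM orders GROUP BY 1"
)

# Gather every table doc and validated query so it can ground SQL generation.
context = {
    p.relative_to(root).as_posix(): p.read_text()
    for p in sorted(root.rglob("*"))
    if p.is_file()
}
print(list(context))
```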
Combined with persistent memory (which accumulates corrections automatically), this implements the layered context pattern:
| Layer | Source |
|---|---|
| Table usage & schema | sql_schema tool + /knowledge/tables/ |
| Business annotations | /knowledge/business/ + AGENTS.md |
| Validated queries | /knowledge/queries/ |
| Institutional knowledge | MCP servers (Slack, Notion, Confluence) |
| Learning memory | remember / recall tools |
| Runtime context | sql_query / sql_execute |
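The learning-memory layer can be sketched as a toy in-memory store; the real `remember` / `recall` tools persist across sessions and use proper retrieval, so the naive keyword match below is only illustrative:

```python
# Toy stand-in for the remember / recall tools: save a correction, then
# surface relevant memories on a later turn (passive recall injects up to 8).
memories: list[str] = []

def remember(note: str) -> None:
    memories.append(note)

def recall(query: str, limit: int = 8) -> list[str]:
    # Naive keyword overlap instead of the harness's real retrieval.
    words = query.lower().split()
    hits = [m for m in memories if any(w in m.lower() for w in words)]
    return hits[:limit]

remember("revenue column is net of refunds")
remember("use ordered_at, not created_at, for monthly cohorts")
print(recall("how is revenue defined?"))
```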
Example Session
User: Load this CSV and tell me which product category has the highest revenue
Agent:
- [recalls relevant memories]
- [inspects any existing schema]
- [creates table, imports data]
- [runs SELECT category, SUM(revenue) ... GROUP BY category]
- [validates: 5 categories, no NULLs, totals match]
- [renders bar chart via OpenUI]
- [remembers: "revenue column is net of refunds"]

See Also
- Generic Harness — the parent harness this extends
- Capabilities overview — full capability catalog including memory and OpenUI
- Harnesses feature guide — harness selection and API management