
Data Analyst Harness

The Data Analyst harness extends the Generic harness with capabilities for data analysis: SQL databases, persistent cross-session memory, rich visualization via OpenUI, and a curated knowledge scaffold. Its system prompt implements a structured 6-step analysis pipeline inspired by OpenAI’s Kepler data agent and the open-source Dash project.

Use it for:

  • Natural-language data analysis (ask questions, get SQL + charts)
  • Interactive data exploration with visualization
  • Agents that learn from corrections and remember them across sessions
  • Analytics workflows grounded in curated knowledge bases (table docs, business rules, validated SQL)
| Property | Value |
| --- | --- |
| Type | `data-analyst` |
| System Prompt | Structured 6-step analysis pipeline |
| Default Model | None (inherits from agent or organization) |

The system prompt guides the agent through six steps on every data question:

  1. Recall — Search persistent memory for corrections, column mappings, and business definitions from earlier sessions
  2. Inspect — Use sql_schema to verify table structure before writing SQL
  3. Plan — State the query plan: tables, joins, filters, expected grain, and potential pitfalls
  4. Execute & Validate — Run the query, then validate (zero rows? duplicates? NULL aggregations?). Self-correct if results look wrong
  5. Visualize — Summarize findings in plain language, then render charts and tables via OpenUI
  6. Learn — Use remember to save corrections and patterns for future sessions
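The six steps above can be sketched as a single analysis function. This is a toy illustration, not the harness internals: the tool calls (`recall`, `sql_schema`, `sql_execute`, `remember`) are replaced with plain Python over an in-memory SQLite database, and the `sales` table and memory contents are invented for the example.

```python
import sqlite3

def analyze(question, conn, memory):
    # 1. Recall: search persistent memory for prior corrections.
    notes = [m for m in memory if any(w in m for w in question.lower().split())]

    # 2. Inspect: verify table structure before writing SQL.
    schema = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type='table'").fetchall()

    # 3. Plan: state tables, expected grain, and pitfalls.
    plan = {"tables": [name for name, _ in schema],
            "grain": "one row per category"}

    # 4. Execute & validate: run the query, then sanity-check the result.
    rows = conn.execute(
        "SELECT category, SUM(revenue) AS total FROM sales GROUP BY category"
    ).fetchall()
    assert rows, "zero rows - check filters"
    assert all(total is not None for _, total in rows), "NULL aggregation"

    # 5. Visualize: return a plain-language summary (OpenUI would chart it).
    top = max(rows, key=lambda r: r[1])
    summary = f"Top category: {top[0]} ({top[1]:.2f})"

    # 6. Learn: save anything worth remembering for future sessions.
    memory.append("revenue totals validated against sales table")
    return summary, notes, plan

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("books", 120.0), ("games", 340.5), ("books", 80.0)])
memory = ["revenue column is net of refunds"]
print(analyze("which category has the highest revenue?", conn, memory)[0])
```

The point of the loop is that validation (step 4) happens before anything is shown to the user, and learning (step 6) happens whether or not the user asks for it.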

This mirrors the six-layer context pattern described in OpenAI’s data agent blog post and implemented by Dash.

All Generic harness capabilities plus:

| Capability | What it provides |
| --- | --- |
| Session SQL Database | `sql_execute`, `sql_query`, `sql_schema` — session-scoped SQLite databases that auto-create on first write |
| Persistent Memory | `remember`, `recall`, `forget` — cross-session memory with passive recall (8 memories auto-injected per turn) |
| OpenUI | Rich interactive charts, tables, dashboards, and KPI cards rendered inline in chat |
| Todo List | `write_todos` — track multi-step analysis tasks |
| Data Knowledge | Mounts a `/knowledge/` scaffold with directories for table docs, business rules, and validated SQL patterns |
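A session-scoped SQLite database that auto-creates on first write behaves like an ordinary database file that springs into existence when first touched. The sketch below approximates what the three SQL tools might look like; the tool names come from the table above, but the internals shown here are an assumption for illustration.

```python
import os
import sqlite3
import tempfile

# A session-scoped database: the file is created on first use.
session_dir = tempfile.mkdtemp()
db_path = os.path.join(session_dir, "session.db")

def sql_execute(stmt, params=()):
    # Write path: connecting auto-creates the database file.
    with sqlite3.connect(db_path) as conn:
        conn.execute(stmt, params)

def sql_query(query):
    # Read path: run a SELECT and return all rows.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()

def sql_schema():
    # Schema inspection: list tables and their DDL from sqlite_master.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            "SELECT name, sql FROM sqlite_master WHERE type='table'").fetchall()

sql_execute("CREATE TABLE metrics (name TEXT, value REAL)")
sql_execute("INSERT INTO metrics VALUES (?, ?)", ("dau", 1250.0))
print(sql_schema())
print(sql_query("SELECT * FROM metrics"))
```

Because the database lives under a per-session directory, nothing written here leaks into other sessions; cross-session state goes through `remember` instead.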

The harness mounts a /knowledge/ directory scaffold in every session:

    /knowledge/
      tables/README.md      # Add one .md per table: columns, types, gotchas
      business/README.md    # Add metric definitions, business rules, domain terms
      queries/README.md     # Add validated .sql files as reusable templates

The scaffold is read-only to the agent; populate it yourself with your organization's curated knowledge to ground the agent's SQL generation in reality. The agent reads these files before writing any SQL query.
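Grounding amounts to concatenating the curated files into the agent's context before it writes SQL. A minimal sketch of such a loader, assuming the directory layout above (the `load_knowledge` helper itself is hypothetical, not a harness API):

```python
import os
import tempfile

def load_knowledge(root):
    """Collect table docs, business rules, and validated SQL templates
    from the /knowledge/ scaffold into one context string."""
    context = []
    for sub in ("tables", "business", "queries"):
        folder = os.path.join(root, sub)
        if not os.path.isdir(folder):
            continue
        for fname in sorted(os.listdir(folder)):
            if fname.endswith((".md", ".sql")):
                with open(os.path.join(folder, fname)) as f:
                    context.append(f"## {sub}/{fname}\n{f.read()}")
    return "\n\n".join(context)

# Build a throwaway scaffold to demonstrate (invented table doc).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "tables"))
with open(os.path.join(root, "tables", "orders.md"), "w") as f:
    f.write("orders: one row per line item, not per order (gotcha!)")
print(load_knowledge(root))
```

A one-line gotcha like "one row per line item, not per order" is exactly the kind of grain mistake that silently produces wrong aggregates, which is why table docs pay for themselves quickly.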

Combined with persistent memory (which accumulates corrections automatically), this implements the layered context pattern:

| Layer | Source |
| --- | --- |
| Table usage & schema | `sql_schema` tool + `/knowledge/tables/` |
| Business annotations | `/knowledge/business/` + AGENTS.md |
| Validated queries | `/knowledge/queries/` |
| Institutional knowledge | MCP servers (Slack, Notion, Confluence) |
| Learning memory | `remember` / `recall` tools |
| Runtime context | `sql_query` / `sql_execute` |
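The learning-memory layer can be pictured as a small keyword store: `remember` appends, `forget` removes, and passive recall injects the best matches each turn. The toy below uses a JSON file and word-overlap scoring; the real storage and ranking are not documented here, so treat this purely as a mental model.

```python
import json
import os
import tempfile

MEM_PATH = os.path.join(tempfile.mkdtemp(), "memory.json")

def _load():
    if os.path.exists(MEM_PATH):
        with open(MEM_PATH) as f:
            return json.load(f)
    return []

def remember(note):
    # Append a correction or pattern to the persistent store.
    notes = _load()
    notes.append(note)
    with open(MEM_PATH, "w") as f:
        json.dump(notes, f)

def forget(substring):
    # Drop every memory containing the substring.
    notes = [n for n in _load() if substring not in n]
    with open(MEM_PATH, "w") as f:
        json.dump(notes, f)

def recall(query, limit=8):
    # Passive recall: rank by keyword overlap, inject the top `limit`
    # (mirroring the 8-memories-per-turn behavior noted above).
    words = set(query.lower().split())
    scored = sorted(_load(),
                    key=lambda n: -len(words & set(n.lower().split())))
    return scored[:limit]

remember("revenue column is net of refunds")
remember("fiscal year starts in February")
print(recall("how is revenue defined?"))
```

The key property is that memory survives the session: the next conversation starts with yesterday's corrections already in context, so the agent does not repeat a mistake the user has already fixed once.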
Example session:

    User:  Load this CSV and tell me which product category has the highest revenue
    Agent: [recalls relevant memories] [inspects any existing schema]
           [creates table, imports data]
           [runs SELECT category, SUM(revenue) ... GROUP BY category]
           [validates: 5 categories, no NULLs, totals match]
           [renders bar chart via OpenUI]
           [remembers: "revenue column is net of refunds"]
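The CSV-loading turn boils down to: read the header, create a table, bulk-insert the rows, aggregate, and validate before answering. A compact sketch of that sequence, with an invented three-row CSV standing in for the user's upload:

```python
import csv
import io
import sqlite3

# Invented sample standing in for the user's uploaded CSV.
csv_text = "category,revenue\nbooks,120.0\ngames,340.5\nbooks,80.0\n"

conn = sqlite3.connect(":memory:")
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # ['category', 'revenue']

# Create the table, then bulk-insert the parsed rows.
conn.execute("CREATE TABLE sales (category TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 ((cat, float(rev)) for cat, rev in reader))

rows = conn.execute(
    "SELECT category, SUM(revenue) FROM sales GROUP BY category").fetchall()

# Validate before reporting: non-empty, no NULL aggregates, totals match.
assert rows and all(total is not None for _, total in rows)
assert abs(sum(t for _, t in rows) - 540.5) < 1e-9

print(max(rows, key=lambda r: r[1]))  # highest-revenue category
```

The validation asserts mirror step 4 of the pipeline: the agent checks for empty results, NULL aggregations, and a total that reconciles with the raw data before rendering any chart.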