IEEE VIS 2026 · Short Paper

Echo: Visual Steering of AI-Assisted EHR Cohort Construction

An interactive visual analytics system for constructing, inspecting, and comparing EHR cohorts through AI-assisted SQL generation and coordinated views.

Echo interface showing the Criteria Graph, SQL Console, and Cohort Analysis views
My Role
First Author (co-equal)

Complete frontend design, from brainstorming through high-fidelity implementation. Information design system, visual encoding, design objectives, formative research.

Team

Vincent J. Zhang (co-equal first author)
Lingfei Qian · Vipina K. Keloth
Yujia Zhou · Na Hong
Hua Xu · Huan He
Yale School of Medicine

Venue

IEEE VIS 2026 · Short Paper
Status: Under Review

Stack

Vue.js · Vite · Vue Flow
FastAPI · Claude Opus · OMOP CDM

What is Echo?

Echo is a visual analytics system that lets clinical researchers manage cohort definitions on an interactive Criteria Graph, inspect and edit AI-generated SQL at the criterion level, and compare alternative cohort definitions side by side through coordinated views. It integrates a Text-to-SQL agent into the authoring workflow so researchers can focus on short natural-language descriptions for each criterion rather than writing SQL directly.

Criteria Graph
Text-to-SQL Agent
Multi-cohort Comparison
OMOP CDM

Design Leadership

I designed the system, not just the interface

Every core design decision in Echo came from me: the choice to represent eligibility criteria as an editable directed graph rather than a flat checklist, the visual language distinguishing inclusion from exclusion, how the four views coordinate and update in lockstep, and the information hierarchy that keeps complex cohort logic readable without overwhelming expert users. I then implemented the entire frontend myself in Vue.js and Vue Flow.

Formative Research

Led formative discussions with clinical informatics experts to understand how researchers think about cohort construction, which directly shaped the three design objectives and the four visualization tasks.

Criteria Graph Encoding

Designed the DAG representation from scratch: node types, inclusion vs. exclusion color logic, N-count propagation at each edge, branching from shared upstream criteria, and the operator node visual language for AND, OR, and At-Least-N.

Coordinated View Architecture

Designed how the Criteria Graph, SQL Console, Cohort Analysis panel, and Agent Insights relate to each other: what each view shows, what triggers updates, and how the workspace keeps a single shared state across all four panes.

Frontend Implementation

Built the complete frontend in Vue.js, Vite, and Vue Flow, implementing the canvas interactions, real-time N-count updates, SQL synchronization, and the cohort comparison panel from design to production code.

The Challenge

Eligibility criteria are complex and interrelated. When an LLM generates the entire cohort definition as a single SQL query, the output is fast to produce but nearly impossible to inspect. When it generates criteria one at a time, local quality improves, but global coherence and cross-criterion relationships are lost.

Hard to Inspect

A monolithic SQL query generated from the full cohort definition gives researchers no way to verify whether each criterion was correctly captured and grounded to clinical concepts.

Hard to Refine

Adjusting a single threshold or criterion requires regenerating the entire query, losing the context, rationale, and incremental decisions from previous edits.

Hard to Compare

Existing tools offer no unified workspace for authoring and comparing multiple alternative cohort definitions side by side, making exploratory study design slow and error-prone.

Design Objectives

Through formative discussions with domain experts in clinical oncology, systematic review, and public health, three core design objectives shaped Echo's visual design.

DO1 · Inspectable

Make cohort eligibility logic inspectable and verifiable at the criterion level, so experts can confirm whether each criterion matches their clinical intent before execution.

DO2 · Refinable

Support incremental, structured refinement rather than whole-query regeneration. Changes to individual criteria should recompile only the affected subgraph.

DO3 · Comparable

Surface how each criterion reshapes the resulting cohort in size, demographics, and comorbidity profile, so researchers see the downstream impact of every design decision.

System Architecture

Fig 1. System architecture of Echo, organized into four components: Frontend, Backend, Text2SQL Agent, and Database.

Frontend

The user-facing visual workflow built with Vue.js, Vite, and Vue Flow. Exposes the Criteria Graph canvas, SQL Console, Cohort Analysis panel, and Agent Insights in a unified workspace.

Backend

A FastAPI service that mediates between the Frontend, the Text2SQL Agent, and the databases. Manages workspace state, criteria-to-SQL translation, and cohort analytics.

Text2SQL Agent

Built on a Claude Code Agent. Translates natural-language criterion descriptions into SQL predicates grounded to OMOP CDM concepts via vocabulary lookup and schema-aware generation.

Database

MongoDB stores workspace and project state. PostgreSQL holds the OMOP CDM clinical data instance used for cohort execution and N-count computation.

Visualization Design

Echo's interface is composed of a Criteria Graph (A), a Console for criterion-level SQL inspection and editing (B), and three Cohort Analysis views (C). Every view reacts to what happens on the Criteria Graph, keeping the visual authoring surface and the underlying query in shared state.

Fig 2. Echo with four cohorts active. The Criteria Graph (A) occupies the center canvas; the SQL and Agent Console (B) and Cohort Analysis panel (C) are docked to the right.

A · Criteria Graph

Rather than treating eligibility criteria as a flat checklist, Echo transforms the entire eligibility-criterion specification into an editable directed graph: criterion logic becomes nodes and dependency relations become directed edges. The graph flows top-to-bottom from a shared dataset root node, mirroring the researcher's mental model of sequential patient filtering. Each layer narrows the cohort, and a node's vertical position corresponds to its depth in the filter chain.

Criterion Nodes

Rounded rectangles encoding an eligibility condition and its intermediate cohort count. Inclusion predicates receive green badges; exclusion predicates receive red badges, so researchers can verify that each criterion matches their clinical intent at a glance.

Logical Operator Nodes

Rounded rectangles encoding AND, OR, and a quantified At-Least-N operator that captures cardinality constraints across multiple criteria.

Target Cohort Nodes

Pill-shaped bars closing each subgraph with the final patient count and a slot-assignment control that designates the cohort as Cohort A, Cohort B, or unassigned.
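
The semantics of the three operator nodes can be sketched on plain sets of patient IDs. This is a minimal illustration, not Echo's implementation: the function names are hypothetical, and in Echo the operators presumably compile to SQL rather than run in memory.

```python
from collections import Counter
from typing import Iterable, Set


def op_and(inputs: Iterable[Set[int]]) -> Set[int]:
    """Patients who satisfy every upstream criterion."""
    sets = list(inputs)
    return set.intersection(*sets) if sets else set()


def op_or(inputs: Iterable[Set[int]]) -> Set[int]:
    """Patients who satisfy at least one upstream criterion."""
    return set().union(*inputs)


def op_at_least(inputs: Iterable[Set[int]], n: int) -> Set[int]:
    """Patients who satisfy at least n of the upstream criteria."""
    hits = Counter(pid for members in inputs for pid in members)
    return {pid for pid, count in hits.items() if count >= n}


# Hypothetical patient-ID sets for three upstream criterion nodes.
a, b, c = {1, 2, 3, 4}, {2, 3, 5}, {3, 4, 5}

all_three = op_and([a, b, c])              # {3}
any_of = op_or([a, b, c])                  # {1, 2, 3, 4, 5}
two_or_more = op_at_least([a, b, c], 2)    # {2, 3, 4, 5}
```

Under this reading, AND and OR are the two extremes of At-Least-N: `n` equal to the number of inputs gives AND, and `n = 1` gives OR.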

From natural language to graph. A researcher types a free-form cohort description into the Criteria Agent chat. The agent extracts discrete clinical predicates, grounds each one to an OMOP concept, instantiates the corresponding nodes on the graph, and emits the SQL alongside them. Because the graph and its SQL always appear together, the translation from natural-language intent to executable query is immediately inspectable rather than hidden inside the agent.

Interactivity. Users can freely add nodes, draw edges, and drag nodes to reorganize the canvas. When a user modifies a criterion node, Echo localizes the change to that node's SQL predicate and recompiles only the affected subgraph. N counts propagate immediately through all downstream nodes and drop-off percentages update at each edge, giving researchers live feedback on the impact of every edit.
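
One way to picture the localized recompile is a cached graph in which an edit invalidates only the edited node and its descendants. The sketch below makes several assumptions: `Criterion`, `CriteriaGraph`, and the `evaluations` log are hypothetical names, each node has a single parent, and a precomputed set of matching patient IDs stands in for executing the node's SQL predicate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Criterion:
    name: str
    matches: Set[int]              # patients satisfying the predicate (stand-in for running its SQL)
    exclude: bool = False          # True for exclusion criteria
    parent: Optional[str] = None   # None means the node hangs off the dataset root


class CriteriaGraph:
    def __init__(self, root_patients: Set[int]) -> None:
        self.root = set(root_patients)
        self.nodes: Dict[str, Criterion] = {}
        self.cache: Dict[str, Set[int]] = {}
        self.evaluations: List[str] = []   # log of nodes that were actually recomputed

    def add(self, node: Criterion) -> None:
        self.nodes[node.name] = node

    def upstream(self, name: str) -> Set[int]:
        parent = self.nodes[name].parent
        return self.root if parent is None else self.cohort(parent)

    def cohort(self, name: str) -> Set[int]:
        """Intermediate cohort at a node, reusing cached results where possible."""
        if name not in self.cache:
            node, incoming = self.nodes[name], self.upstream(name)
            self.cache[name] = incoming - node.matches if node.exclude else incoming & node.matches
            self.evaluations.append(name)
        return self.cache[name]

    def edit(self, name: str, matches: Set[int]) -> None:
        """Change one criterion and invalidate only that node and its descendants."""
        self.nodes[name].matches = set(matches)
        stack = [name]
        while stack:
            current = stack.pop()
            self.cache.pop(current, None)
            stack.extend(n.name for n in self.nodes.values() if n.parent == current)

    def drop_off(self, name: str) -> float:
        """Percentage of incoming patients removed at this node."""
        n_in, n_out = len(self.upstream(name)), len(self.cohort(name))
        return 0.0 if n_in == 0 else 100.0 * (n_in - n_out) / n_in


g = CriteriaGraph(set(range(1, 11)))
g.add(Criterion("adult", {1, 2, 3, 4, 5, 6, 7, 8}))
g.add(Criterion("diabetes", {2, 4, 6, 8, 10}, parent="adult"))
g.add(Criterion("pregnant", {2}, exclude=True, parent="diabetes"))
g.add(Criterion("elderly", {7, 8, 9}, parent="adult"))   # sibling branch off "adult"

initial = {name: len(g.cohort(name)) for name in g.nodes}
g.evaluations.clear()

g.edit("diabetes", {2, 4, 6})                             # tighten a single criterion
updated = {name: len(g.cohort(name)) for name in g.nodes}
recomputed = list(g.evaluations)                          # the edited node and its descendant only
```

In this toy graph the edit touches `diabetes` and its descendant `pregnant`, while `adult` and the sibling branch `elderly` are served from the cache.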

B · Console

The Console exposes the technical details of the selected criterion node: its short name, original natural-language description, generated SQL predicate, execution preview, and activity history.

Text2SQL workflow. When a user enters a criterion, the Text2SQL Agent generates a short node label and translates the description into a SQL predicate grounded to OMOP CDM concepts. Executing the node returns intermediate cohort counts on the graph and preview rows in the Console.

Correction support. Users can directly edit the generated SQL predicate when the automated translation does not match their intent. Echo treats the edited SQL as the source of truth for the next execution and records the change in the activity log, making the translation from criterion text to executable query inspectable rather than hidden.
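
As a rough illustration of criterion-level SQL with user corrections, the sketch below composes per-criterion subqueries into one cohort count and lets an edited predicate override the generated one. Everything here is an assumption made for illustration: the `CriterionSQL` class, the composition through `IN` / `NOT IN`, the in-memory SQLite database standing in for PostgreSQL, and the concept IDs `111` and `222`, which are placeholders rather than real OMOP concepts. Only the table and column names follow the OMOP CDM.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CriterionSQL:
    label: str
    generated_sql: str                 # subquery returning person_id, as an agent might emit it
    exclude: bool = False
    edited_sql: Optional[str] = None   # the user's correction, if any

    @property
    def effective_sql(self) -> str:
        # An edited predicate takes precedence over the generated one.
        return self.edited_sql or self.generated_sql


def compile_cohort(criteria: List[CriterionSQL]) -> str:
    clauses = [
        f"person_id {'NOT IN' if c.exclude else 'IN'} ({c.effective_sql})"
        for c in criteria
    ]
    where = " AND ".join(clauses) or "1=1"
    return f"SELECT COUNT(*) FROM person WHERE {where}"


def count(conn: sqlite3.Connection, criteria: List[CriterionSQL]) -> int:
    return conn.execute(compile_cohort(criteria)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER);
CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
INSERT INTO person VALUES (1, 1950), (2, 1980), (3, 1995), (4, 2012);
INSERT INTO condition_occurrence VALUES (1, 111), (2, 111), (3, 222), (4, 111);
""")

born_before = CriterionSQL(
    "Born 2000 or earlier",
    "SELECT person_id FROM person WHERE year_of_birth <= 2000",
)
has_condition = CriterionSQL(
    "Has condition 111",
    "SELECT person_id FROM condition_occurrence WHERE condition_concept_id = 111",
)

n_generated = count(conn, [born_before, has_condition])   # persons 1 and 2

# The user tightens the birth-year predicate directly in the SQL.
born_before.edited_sql = "SELECT person_id FROM person WHERE year_of_birth <= 1960"
n_edited = count(conn, [born_before, has_condition])      # person 1 only
```

Keeping the generated and edited predicates side by side also makes it straightforward to log what changed, which is the kind of record an activity history would need.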

Cohort Analysis panel. The Cohort Analysis panel occupies the lower portion of the right-side column and is always visible regardless of node selection. It opens with a TARGET COHORTS checklist with one row per cohort on the graph, each row showing a color swatch, the cohort name, and the final N count. The checklist controls which cohorts participate in the three views below. All views update in response to graph edits and Execute events.

C · Cohort Analysis Views

Once cohorts are constructed on the Criteria Graph, Echo provides three lightweight views that report basic descriptive statistics for at-a-glance comparison.

C1 · Cohort Overlap

An UpSet plot shows the exclusive and shared patient subsets across the selected cohorts, so the number of patients in each intersection can be compared proportionally. Before committing to a definition, researchers can see whether alternative definitions capture meaningfully different populations, and spot unexpected overlap that would otherwise take separate queries to discover.
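
The quantity behind each UpSet bar is an exclusive intersection: the patients who belong to exactly one particular combination of cohorts. A minimal sketch of that computation follows, assuming each cohort is available as a set of patient IDs; the function name and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def exclusive_intersections(cohorts: Dict[str, Set[int]]) -> Dict[FrozenSet[str], int]:
    """Count patients by the exact combination of cohorts they belong to."""
    counts: Counter = Counter()
    for pid in set().union(*cohorts.values()):
        signature = frozenset(name for name, members in cohorts.items() if pid in members)
        counts[signature] += 1
    return dict(counts)


cohorts = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {4, 6},
}
bars = exclusive_intersections(cohorts)
```

Because every patient falls into exactly one combination, the bar heights sum to the number of distinct patients across the selected cohorts.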

C2 · Cohort Profile

Each cohort is shown as a card with its final patient count and the inclusion and exclusion criteria contributed by its graph path. Exclusion criteria are rendered in red so the two predicate types are visually distinct. Below the cards, a metrics table reports basic descriptive statistics across selected cohorts, organized into Demographics (N patients, average age, percentage male) and Comorbidity, with a Range column summarizing cross-cohort spread for each metric.
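
The metrics table can be sketched as one row per metric with a column per cohort plus a Range column. The sketch assumes that Range means the maximum minus the minimum across cohorts, which the description above does not spell out; the `Patient` record, the function names, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Patient:
    age: int
    male: bool


def profile(patients: List[Patient]) -> Dict[str, float]:
    """Basic descriptive statistics for one cohort."""
    n = len(patients)
    return {
        "n_patients": float(n),
        "avg_age": sum(p.age for p in patients) / n if n else 0.0,
        "pct_male": 100.0 * sum(p.male for p in patients) / n if n else 0.0,
    }


def metrics_table(cohorts: Dict[str, List[Patient]]) -> Dict[str, Dict[str, float]]:
    """One row per metric, one column per cohort, plus a cross-cohort range.

    Expects at least one cohort.
    """
    profiles = {name: profile(members) for name, members in cohorts.items()}
    table: Dict[str, Dict[str, float]] = {}
    for metric in ("n_patients", "avg_age", "pct_male"):
        row = {name: stats[metric] for name, stats in profiles.items()}
        row["range"] = max(row.values()) - min(row.values())   # assumed definition of Range
        table[metric] = row
    return table


table = metrics_table({
    "Cohort A": [Patient(60, True), Patient(70, False)],
    "Cohort B": [Patient(50, True), Patient(60, True), Patient(70, True), Patient(80, False)],
})
```

In this example the two cohorts share the same average age but differ in size and sex balance, which is the kind of contrast a Range column is meant to surface.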

C3 · Agent Insights

LLM-generated observations across the selected cohorts, grounded in the computed statistics rather than external medical knowledge to limit hallucination risk. Insights refresh whenever the Criteria Graph is edited or re-executed.

Fig 3. Echo's Cohort Analysis Views: (A) Cohort Overlap UpSet plot showing exclusive and shared patient subsets across cohorts, and (B) Cohort Profile listing each cohort's final count and inclusion/exclusion criteria with a comparative metrics table.

Coordinated Interaction

The Criteria Graph is the central interactive surface, and every other view reacts to it. Selecting a node immediately highlights the corresponding entries in the Console and the Cohort Analysis views, so the user can see the SQL, the patient overlap, the demographic profile, and the AI-generated insights tied to that exact criterion. Editing a criterion or clicking Execute propagates in a single step: per-node N counts update on the graph, and the Cohort Overlap, Cohort Profile, and Insights panels re-render to reflect the new cohorts. Choosing which cohorts to compare is a graph-level interaction rather than a per-panel filter. A Compare selector toggles each cohort between active and parked, and all three analysis views switch in lockstep. The user steers the graph, and the rest of the interface follows.
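
The lockstep behavior described above resembles a single shared store that notifies every subscribed view on each change. The sketch below is only a language-neutral illustration of that pattern in Python; Echo's frontend is built in Vue.js, and the `Workspace` class, its methods, and the view names are hypothetical.

```python
from typing import Callable, Dict, List, Set


class Workspace:
    """Shared state: which cohorts are active for comparison."""

    def __init__(self) -> None:
        self.active_cohorts: Set[str] = set()
        self._listeners: List[Callable[["Workspace"], None]] = []

    def subscribe(self, listener: Callable[["Workspace"], None]) -> None:
        self._listeners.append(listener)

    def toggle(self, cohort: str) -> None:
        """Flip a cohort between active and parked, then notify every view."""
        self.active_cohorts ^= {cohort}   # symmetric difference flips membership
        for listener in self._listeners:
            listener(self)


# Each "view" records the cohort set it last rendered and how often it rendered.
last_rendered: Dict[str, Set[str]] = {}
render_count: Dict[str, int] = {}


def make_view(name: str) -> Callable[[Workspace], None]:
    def render(ws: Workspace) -> None:
        last_rendered[name] = set(ws.active_cohorts)
        render_count[name] = render_count.get(name, 0) + 1
    return render


ws = Workspace()
for view in ("overlap", "profile", "insights"):
    ws.subscribe(make_view(view))

ws.toggle("Cohort A")   # activate A
ws.toggle("Cohort B")   # activate B
ws.toggle("Cohort A")   # park A again
```

Because the views read from one store instead of keeping their own filters, they cannot drift apart: after the three toggles, all of them show only Cohort B.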

Expert Assessment

We conducted an informal assessment with two domain experts, one clinical oncology faculty member and one data scientist with clinical trials experience, using Echo on a synthetic MIMIC-IV ICU dataset in OMOP CDM format. Each session lasted approximately 90 minutes and included a walkthrough of Echo, free exploration, and a semi-structured interview.

Key Finding

Participants successfully constructed and compared target cohorts. The Criteria Graph helped them grasp the overall logic structure of a cohort definition, and visual edits were perceived as low-friction. One participant went beyond the assigned task to fork several additional sub-cohorts, exploring alternative criterion combinations they had not planned in advance.

Logic Made Visible

Seeing criteria as a graph made it easy to follow which filters applied first and how inclusion and exclusion logic was organized across the cohort definition.

Low-Friction Iteration

Visual edits felt fast and recoverable. That convenience led to exploratory sub-cohort derivation the researchers had not explicitly planned, suggesting that Echo supports open-ended study design.

Identified Future Work

SQL correctness still requires human verification for concept-code mappings. Decomposing complex eligibility criteria into individual nodes is currently a manual step. Query execution can be slow on complex queries.