IEEE VIS 2026 · Short Paper

Echo: Visual Steering of AI-Assisted EHR Cohort Construction

An interactive visual analytics system for constructing, inspecting, and comparing EHR cohorts through AI-assisted SQL generation and coordinated views.

Echo interface showing the Criteria Graph, SQL Console, and Cohort Analysis views

My Role

First Author (co-equal)

Complete frontend design, from brainstorming through high-fidelity implementation. Information design system, visual encoding, design objectives, formative research.

Team

Vincent J. Zhang (co-equal first author)
Lingfei Qian · Vipina K. Keloth
Yujia Zhou · Na Hong
Hua Xu · Huan He
Yale School of Medicine

Venue

IEEE VIS 2026 · Short Paper
Status: Under Review

Stack

Vue.js · Vite · Vue Flow
FastAPI · Claude Opus · OMOP CDM

What is Echo?

Echo is a visual analytics system that lets clinical researchers manage cohort definitions on an interactive Criteria Graph, inspect and edit AI-generated SQL at the criterion level, and compare alternative cohort definitions side by side through coordinated views. It integrates a Text-to-SQL agent into the authoring workflow so researchers can focus on short natural-language descriptions for each criterion rather than writing SQL directly.

Criteria Graph

Text-to-SQL Agent

Multi-cohort Comparison

OMOP CDM

Design Leadership

End-to-end design ownership across the system

I led the core design decisions in Echo: representing eligibility criteria as an editable directed graph rather than a flat checklist, defining the visual language distinguishing inclusion from exclusion, structuring how the four views coordinate and update in lockstep, and shaping the information hierarchy that keeps complex cohort logic readable without overwhelming expert users. I then implemented the frontend in Vue.js and Vue Flow.

Formative Research

Led formative discussions with clinical informatics experts to understand how researchers think about cohort construction, which directly shaped the three design objectives and the four visualization tasks.

Criteria Graph Encoding

Designed the DAG representation from scratch: node types, inclusion vs. exclusion color logic, N-count propagation at each edge, branching from shared upstream criteria, and the operator node visual language for AND, OR, and At-Least-N.

Coordinated View Architecture

Designed how the Criteria Graph, SQL Console, Cohort Analysis panel, and Agent Insights relate to each other: what each view shows, what triggers updates, and how the workspace stays in shared state across all four panes.

Frontend Implementation

Built the complete frontend in Vue.js, Vite, and Vue Flow, implementing the canvas interactions, real-time N-count updates, SQL synchronization, and the cohort comparison panel from design to production code.

The Challenge

Eligibility criteria are complex and interrelated. When an LLM generates the entire cohort as a single SQL query, the output is faster to produce but nearly impossible to inspect. When it generates criteria one at a time, local quality improves but global coherence and cross-criterion relationships are lost.

Hard to Inspect

A monolithic SQL query generated from the full cohort definition gives researchers no way to verify whether each criterion was correctly captured and grounded to clinical concepts.

Hard to Refine

Adjusting a single threshold or criterion requires regenerating the entire query, losing the context, rationale, and incremental decisions from previous edits.

Hard to Compare

Existing tools offer no unified workspace for authoring and comparing multiple alternative cohort definitions side by side, making exploratory study design slow and error-prone.

Design Objectives

Through formative discussions with domain experts in clinical oncology, systematic review, and public health, three core design objectives shaped Echo's visual design.

DO1 · Inspectable

Make cohort eligibility logic inspectable and verifiable at the criterion level, so experts can confirm whether each criterion matches their clinical intent before execution.

DO2 · Refinable

Support incremental, structured refinement rather than whole-query regeneration. Changes to individual criteria should recompile only the affected subgraph.

DO3 · Comparable

Surface how each criterion reshapes the resulting cohort in size, demographics, and comorbidity profile, so researchers see the downstream impact of every design decision.

System Architecture

Fig 1. System architecture of Echo organized into four components.

Frontend

The user-facing visual workflow built with Vue.js, Vite, and Vue Flow. Exposes the Criteria Graph canvas, SQL Console, Cohort Analysis panel, and Agent Insights in a unified workspace.

Backend

A FastAPI service that mediates between the Frontend, the Text2SQL Agent, and the databases. Manages workspace state, criteria-to-SQL translation, and cohort analytics.

Text2SQL Agent

Built on a Claude Code Agent. Translates natural-language criterion descriptions into SQL predicates grounded to OMOP CDM concepts via vocabulary lookup and schema-aware generation.

Database

MongoDB stores workspace and project state. PostgreSQL holds the OMOP CDM clinical data instance used for cohort execution and N-count computation.

Visualization Design

Echo's interface is composed of a Criteria Graph (A), a Console for criterion-level SQL inspection and editing (B), and three Cohort Analysis views (C). Every view reacts to what happens on the Criteria Graph, keeping the visual authoring surface and the underlying query in shared state.

Fig 2. Echo with four cohorts active. The Criteria Graph (A) occupies the center canvas; the SQL and Agent Console (B) and Cohort Analysis panel (C) are docked to the right.

A · Criteria Graph

Rather than treating eligibility criteria as a flat checklist, Echo transforms the entire eligibility-criterion specification into an editable directed graph: criterion logic becomes nodes and dependency relations become directed edges. The graph flows top-to-bottom from a shared dataset root node, mirroring the researcher's mental model of sequential patient filtering. Each layer narrows the cohort, and a node's vertical position corresponds to its depth in the filter chain.

Criterion Nodes

Rounded rectangles encoding an eligibility condition and its intermediate cohort count. Inclusion predicates receive green badges; exclusion predicates receive red badges, so researchers can verify that each criterion matches their clinical intent at a glance.

Logical Operator Nodes

Rounded rectangles encoding AND, OR, and a quantified At-Least-N operator that captures cardinality constraints across multiple criteria.

Target Cohort Nodes

Pill-shaped bars closing each subgraph with the final patient count and a slot-assignment control that designates the cohort as Cohort A, Cohort B, or unassigned.
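
As a rough illustration of the operator semantics described above, the sketch below evaluates AND, OR, and At-Least-N over sets of patient IDs. The `combine` helper, the operator strings, and the set-based representation are assumptions made for illustration only; Echo itself compiles criteria to SQL rather than manipulating in-memory sets.

```python
from collections import Counter
from typing import Iterable, Set


def combine(operator: str, inputs: Iterable[Set[int]], n: int = 1) -> Set[int]:
    """Combine the patient-ID sets of several criteria with a logical operator.

    Hypothetical helper: 'AND' keeps patients matching every input,
    'OR' keeps patients matching any input, and 'AT_LEAST_N' keeps
    patients matching at least `n` of the inputs.
    """
    sets = [set(s) for s in inputs]
    if not sets:
        return set()
    if operator == "AND":
        return set.intersection(*sets)
    if operator == "OR":
        return set.union(*sets)
    if operator == "AT_LEAST_N":
        # Count how many input criteria each patient satisfies.
        hits = Counter(pid for s in sets for pid in s)
        return {pid for pid, count in hits.items() if count >= n}
    raise ValueError(f"unknown operator: {operator}")


# Toy patient sets for three criteria (made-up IDs).
diabetes = {1, 2, 3, 4}
hypertension = {2, 3, 5}
ckd = {3, 4, 5, 6}

print(sorted(combine("AND", [diabetes, hypertension, ckd])))            # [3]
print(sorted(combine("OR", [diabetes, hypertension, ckd])))             # [1, 2, 3, 4, 5, 6]
print(sorted(combine("AT_LEAST_N", [diabetes, hypertension, ckd], 2)))  # [2, 3, 4, 5]
```

Under this reading, At-Least-N with `n` equal to 1 behaves like OR, and with `n` equal to the number of inputs it behaves like AND.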

From natural language to graph. A researcher types a free-form cohort description into the Criteria Agent chat. The agent extracts discrete clinical predicates, grounds each one to an OMOP concept, instantiates the corresponding nodes on the graph, and emits the SQL alongside them. Because the graph and its SQL always appear together, the translation from natural-language intent to executable query is immediately inspectable rather than hidden inside the agent.
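
The grounding step can be pictured with a small sketch. It assumes a toy, in-memory `concept` table holding a subset of the OMOP CDM vocabulary columns and looks up standard Condition concepts by name before building a predicate over `condition_occurrence`. The concept IDs are placeholders, and `lookup_condition_concepts` and `condition_predicate` are hypothetical helpers; how Echo's agent actually performs vocabulary lookup is not specified here.

```python
import sqlite3
from typing import List

# Toy vocabulary table; concept IDs are placeholders, not real OMOP IDs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept (concept_id INTEGER, concept_name TEXT, "
    "domain_id TEXT, standard_concept TEXT)"
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?)",
    [
        (1001, "Type 2 diabetes mellitus", "Condition", "S"),
        (1002, "Type 1 diabetes mellitus", "Condition", "S"),
        (1003, "Essential hypertension", "Condition", "S"),
        (1004, "Type 2 diabetes mellitus (retired)", "Condition", None),
    ],
)


def lookup_condition_concepts(conn: sqlite3.Connection, phrase: str) -> List[int]:
    """Return standard Condition concept IDs whose name contains `phrase`."""
    rows = conn.execute(
        "SELECT concept_id FROM concept "
        "WHERE domain_id = 'Condition' AND standard_concept = 'S' "
        "AND lower(concept_name) LIKE ? ORDER BY concept_id",
        (f"%{phrase.lower()}%",),
    )
    return [row[0] for row in rows]


def condition_predicate(concept_ids: List[int]) -> str:
    """Build a criterion-level SQL predicate over condition_occurrence."""
    id_list = ", ".join(str(cid) for cid in concept_ids)
    return (
        "person_id IN (SELECT person_id FROM condition_occurrence "
        f"WHERE condition_concept_id IN ({id_list}))"
    )


ids = lookup_condition_concepts(conn, "type 2 diabetes")
predicate = condition_predicate(ids)
print(ids)        # [1001]
print(predicate)
```

The non-standard row (1004) matches the phrase but is filtered out, which is the kind of mapping decision a researcher would want to see at the criterion level.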

Interactivity. Users can freely add nodes, draw edges, and drag nodes to reorganize the canvas. When a user modifies a criterion node, Echo localizes the change to that node's SQL predicate and recompiles only the affected subgraph. N counts propagate immediately through all downstream nodes and drop-off percentages update at each edge, giving researchers live feedback on the impact of every edit.
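
A minimal sketch of localized recomputation, assuming a simplified graph in which every criterion has a single parent (no operator nodes) and each node's matching patients are already known as a set. The `Criterion` class and the `descendants`, `recompute`, and `drop_off` helpers are hypothetical names, not Echo's API; a full DAG with operator nodes would need a topological ordering instead of the breadth-first walk used here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Criterion:
    """Hypothetical criterion node in a single-parent filter tree."""
    parent: Optional[str]   # None marks the dataset root
    kind: str               # "root", "include", or "exclude"
    matches: Set[int]       # patient IDs satisfying the predicate


def descendants(graph: Dict[str, Criterion], start: str) -> List[str]:
    """Return `start` plus every downstream node, parents before children."""
    order, frontier = [start], [start]
    while frontier:
        current = frontier.pop(0)
        children = [name for name, node in graph.items() if node.parent == current]
        order.extend(children)
        frontier.extend(children)
    return order


def recompute(graph: Dict[str, Criterion], cohorts: Dict[str, Set[int]],
              edited: str) -> Dict[str, Set[int]]:
    """Refresh cohorts only for the edited node and its downstream subgraph."""
    for name in descendants(graph, edited):
        node = graph[name]
        if node.parent is None:
            cohorts[name] = set(node.matches)
        elif node.kind == "include":
            cohorts[name] = cohorts[node.parent] & node.matches
        else:  # exclusion removes matching patients from the parent cohort
            cohorts[name] = cohorts[node.parent] - node.matches
    return cohorts


def drop_off(graph: Dict[str, Criterion], cohorts: Dict[str, Set[int]],
             name: str) -> float:
    """Percentage of the parent cohort lost on the edge into `name`."""
    parent = graph[name].parent
    if parent is None or not cohorts[parent]:
        return 0.0
    before, after = len(cohorts[parent]), len(cohorts[name])
    return 100.0 * (before - after) / before


graph = {
    "dataset": Criterion(None, "root", set(range(1, 11))),
    "adults": Criterion("dataset", "include", {1, 2, 3, 4, 5, 6, 7, 8}),
    "diabetes": Criterion("adults", "include", {1, 2, 3, 4, 9}),
    "no_ckd": Criterion("diabetes", "exclude", {4, 10}),
    "hypertension": Criterion("adults", "include", {1, 5, 6, 7}),
}
cohorts = recompute(graph, {}, "dataset")  # initial full execution

# Edit one criterion; only it and its downstream nodes are recomputed.
graph["diabetes"].matches = {1, 2, 9}
touched = descendants(graph, "diabetes")
recompute(graph, cohorts, "diabetes")
print(touched)                                                # ['diabetes', 'no_ckd']
print(len(cohorts["no_ckd"]), drop_off(graph, cohorts, "diabetes"))  # 2 75.0
```

The sibling branch (`hypertension`) is never revisited after the edit, which is the property the localized recompile relies on.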

B · Console

The Console exposes the technical details of the selected criterion node: its short name, original natural-language description, generated SQL predicate, execution preview, and activity history.

Text2SQL workflow. When a user enters a criterion, the Text2SQL Agent generates a short node label and translates the description into a SQL predicate grounded to OMOP CDM concepts. Executing the node returns intermediate cohort counts on the graph and preview rows in the Console.

Correction support. Users can directly edit the generated SQL predicate when the automated translation does not match their intent. Echo treats the edited SQL as the source of truth for the next execution and records the change in the activity log, making the translation from criterion text to executable query inspectable rather than hidden.
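
The "edited SQL as source of truth" behavior could look roughly like the following sketch. It assumes a hypothetical `CriterionSQL` record and an `execute` helper running against a toy in-memory `person` table (only `person_id` and `year_of_birth`, borrowed from the OMOP CDM); none of these names come from Echo's codebase, and a real system would validate predicates before interpolating them into a query.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CriterionSQL:
    """Hypothetical per-node record shown in the Console."""
    name: str
    description: str
    generated_sql: str
    edited_sql: Optional[str] = None
    activity: List[str] = field(default_factory=list)

    def edit(self, new_sql: str) -> None:
        """Record a manual correction; it overrides the generated predicate."""
        self.edited_sql = new_sql
        self.activity.append(f"manual edit: {new_sql}")

    @property
    def effective_sql(self) -> str:
        """The predicate used on the next execution."""
        return self.edited_sql if self.edited_sql is not None else self.generated_sql


def execute(conn: sqlite3.Connection, node: CriterionSQL,
            preview_limit: int = 2) -> Tuple[int, List[int]]:
    """Run the node's predicate and return (count, preview of person IDs)."""
    # Sketch only: the predicate is interpolated without validation.
    sql = f"SELECT person_id FROM person WHERE {node.effective_sql} ORDER BY person_id"
    ids = [row[0] for row in conn.execute(sql)]
    node.activity.append(f"executed: n={len(ids)}")
    return len(ids), ids[:preview_limit]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, 1950), (2, 1962), (3, 1975), (4, 1990)])

node = CriterionSQL(
    name="Born before 1970",
    description="Patients born before 1970",
    generated_sql="year_of_birth < 1980",   # pretend the translation was off
)
n_before, preview_before = execute(conn, node)   # 3 patients, preview [1, 2]
node.edit("year_of_birth < 1970")                # researcher corrects the predicate
n_after, preview_after = execute(conn, node)     # 2 patients, preview [1, 2]
print(n_before, n_after, node.activity)
```

Keeping the generated predicate alongside the edit means the activity history can show both what the agent proposed and what the researcher changed.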

Cohort Analysis panel. The Cohort Analysis panel occupies the lower portion of the right-side column and stays visible regardless of node selection. It opens with a TARGET COHORTS checklist with one row per cohort on the graph; each row shows a color swatch, the cohort name, and the final N count. The checklist controls which cohorts participate in the three views below. All views update in response to graph edits and Execute events.

C · Cohort Analysis Views

Once cohorts are constructed on the Criteria Graph, Echo provides three lightweight views that report basic descriptive statistics for at-a-glance comparison.

C1 · Cohort Overlap

An UpSet plot shows the exclusive and shared patient subsets across the selected cohorts, so the number of patients in every intersection can be compared proportionally. Researchers can see whether alternative definitions capture meaningfully different populations before committing to one, and unexpected overlap surfaces without the separate queries it would otherwise take to discover.
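
The data behind an UpSet plot is a count per exclusive intersection. The sketch below computes those counts from cohort membership sets; `exclusive_intersections` is a hypothetical helper and the patient IDs are made up, so this shows the general technique rather than Echo's implementation.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set


def exclusive_intersections(cohorts: Dict[str, Set[int]]) -> Dict[FrozenSet[str], int]:
    """Count patients by the exact combination of cohorts they belong to."""
    counts: Counter = Counter()
    all_patients = set().union(*cohorts.values())
    for pid in all_patients:
        # The signature is the set of cohort names containing this patient.
        signature = frozenset(name for name, members in cohorts.items() if pid in members)
        counts[signature] += 1
    return dict(counts)


cohorts = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 5, 6}}
sizes = exclusive_intersections(cohorts)

# One bar per exclusive subset, largest first.
for signature, n in sorted(sizes.items(), key=lambda item: (-item[1], sorted(item[0]))):
    print(" & ".join(sorted(signature)), n)
```

Because each patient falls into exactly one signature, the bar counts sum to the number of distinct patients across the selected cohorts.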

C2 · Cohort Profile

Each cohort is shown as a card with its final patient count and the inclusion and exclusion criteria contributed by its graph path. Exclusion criteria are rendered in red so the two predicate types are visually distinct. Below the cards, a metrics table reports basic descriptive statistics across selected cohorts, organized into Demographics (N patients, average age, percentage male) and Comorbidity, with a Range column summarizing cross-cohort spread for each metric.
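
A small sketch of how such a metrics table might be assembled. The patient records are invented, `profile` and `metrics_table` are hypothetical helpers, and the Range column is assumed here to be a (min, max) pair across cohorts; the exact definition Echo uses is not stated.

```python
from statistics import mean
from typing import Dict, Set, Tuple

# Invented patient records: age in years and recorded sex.
patients = {
    1: {"age": 70, "sex": "M"},
    2: {"age": 60, "sex": "F"},
    3: {"age": 50, "sex": "M"},
    4: {"age": 40, "sex": "F"},
}


def profile(cohort: Set[int]) -> Dict[str, float]:
    """Basic demographics for one cohort (assumes the cohort is non-empty)."""
    ages = [patients[pid]["age"] for pid in cohort]
    males = sum(1 for pid in cohort if patients[pid]["sex"] == "M")
    return {
        "n": len(cohort),
        "avg_age": mean(ages),
        "pct_male": 100.0 * males / len(cohort),
    }


def metrics_table(cohorts: Dict[str, Set[int]]) -> Dict[str, Dict[str, object]]:
    """One row per metric, one column per cohort, plus a (min, max) range."""
    profiles = {name: profile(members) for name, members in cohorts.items()}
    table: Dict[str, Dict[str, object]] = {}
    for metric in ("n", "avg_age", "pct_male"):
        row: Dict[str, object] = {name: p[metric] for name, p in profiles.items()}
        values = [p[metric] for p in profiles.values()]
        spread: Tuple[float, float] = (min(values), max(values))
        row["range"] = spread
        table[metric] = row
    return table


table = metrics_table({"Cohort A": {1, 2, 3, 4}, "Cohort B": {1, 3}})
for metric, row in table.items():
    print(metric, row)
```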

C3 · Agent Insights

LLM-generated observations across the selected cohorts, grounded in the computed statistics rather than external medical knowledge to limit hallucination risk. Insights refresh whenever the Criteria Graph is edited or re-executed.

Fig 3. Echo's Cohort Analysis Views: (A) Cohort Overlap UpSet plot showing exclusive and shared patient subsets across cohorts, and (B) Cohort Profile listing each cohort's final count and inclusion/exclusion criteria with a comparative metrics table.

Coordinated Interaction

The Criteria Graph is the central interactive surface, and every other view reacts to it. Selecting a node immediately highlights the corresponding entries in the Console and the Cohort Analysis views, so the user can see the SQL, the patient overlap, the demographic profile, and the AI-generated insights tied to that exact criterion. Editing a criterion or clicking Execute propagates in a single step: per-node N counts update on the graph, and the Cohort Overlap, Cohort Profile, and Insights panels re-render to reflect the new cohorts. Choosing which cohorts to compare is a graph-level interaction rather than a per-panel filter. A Compare selector toggles each cohort between active and parked, and all three analysis views switch in lockstep. The user steers the graph, and the rest of the interface follows.
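
The lockstep behavior can be sketched as a shared store that publishes every graph-level change to all subscribed views. This is a language-agnostic illustration written in Python; Echo's frontend is built in Vue.js, and the `Workspace` class, event names, and view names below are assumptions rather than its actual state-management code.

```python
from typing import Callable, List, Optional, Set


class Workspace:
    """Hypothetical shared store: the graph publishes, the views subscribe."""

    def __init__(self) -> None:
        self.selected_node: Optional[str] = None
        self.active_cohorts: Set[str] = set()
        self._subscribers: List[Callable[[str, "Workspace"], None]] = []

    def subscribe(self, callback: Callable[[str, "Workspace"], None]) -> None:
        self._subscribers.append(callback)

    def _publish(self, event: str) -> None:
        # Every view receives every event, so they cannot drift apart.
        for callback in self._subscribers:
            callback(event, self)

    def select_node(self, node_id: str) -> None:
        self.selected_node = node_id
        self._publish("node_selected")

    def toggle_cohort(self, name: str) -> None:
        """Switch a cohort between active and parked."""
        self.active_cohorts ^= {name}
        self._publish("cohorts_changed")


rendered: List[str] = []


def make_view(view_name: str) -> Callable[[str, Workspace], None]:
    """Create a stand-in view that logs what it would re-render."""
    def on_change(event: str, ws: Workspace) -> None:
        rendered.append(f"{view_name}: {event} -> {sorted(ws.active_cohorts)}")
    return on_change


workspace = Workspace()
for view in ("overlap", "profile", "insights"):
    workspace.subscribe(make_view(view))

workspace.toggle_cohort("Cohort A")
workspace.toggle_cohort("Cohort B")
workspace.toggle_cohort("Cohort A")  # park Cohort A again

print(rendered[-3:])
```

Keeping the comparison selection in one store, rather than as a filter inside each panel, is what lets all three analysis views switch together.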

Expert Assessment

We conducted an informal assessment with two domain experts (a clinical oncology faculty member and a data scientist with clinical trials experience), who used Echo on a synthetic MIMIC-IV ICU dataset in OMOP CDM format. Each session lasted approximately 90 minutes and included a walkthrough of Echo, free exploration, and a semi-structured interview.

Key Finding

Participants successfully constructed and compared target cohorts. The Criteria Graph helped them grasp the overall logic structure of a cohort definition, and visual edits were perceived as low-friction. One participant went beyond the assigned task to fork several additional sub-cohorts, exploring alternative criterion combinations they had not planned in advance.

Logic Made Visible

Seeing criteria as a graph made it easy to follow which filters applied first and how inclusion and exclusion logic was organized across the cohort definition.

Low-Friction Iteration

Visual edits felt fast and recoverable. That convenience enabled exploratory sub-cohort derivation that had not been explicitly planned, suggesting that the design supports open-ended study design.

Identified Future Work

SQL correctness still requires human verification for concept-code mappings. Decomposing complex eligibility criteria into individual nodes is currently a manual step. Execution can be slow for complex queries.

Why This Matters

Echo addresses two trends in U.S. clinical research that often pull against each other. AI-assisted cohort tools are getting faster at generating eligibility logic. At the same time, clinicians and researchers still need to inspect, modify, and trust that logic before they can use it.

Bridging an AI interpretability gap

Text-to-SQL and AI-assisted cohort tools can now generate eligibility logic faster than ever, but the output is typically a single opaque query that researchers cannot inspect, modify, or trust without rewriting from scratch. The Criteria Graph paradigm makes AI-generated cohort logic readable, editable, and auditable at the criterion level, so clinical researchers can adopt AI assistance without giving up control over the cohort definition.

Peer-reviewed dissemination

Echo is documented in an IEEE VIS 2026 short paper (currently under review), where I serve as co-equal first author. The paper formalizes the Criteria Graph paradigm and the coordinated multi-view architecture so the design can be referenced and reused by other research groups working on AI-assisted clinical-research interfaces.

Reusable beyond Echo

The design patterns introduced here are not specific to clinical-trial recruitment. The DAG-based eligibility representation, criterion-level SQL inspection, N-count propagation across the graph, and coordinated cohort-comparison views generalize to other AI-assisted research workflows where users need to inspect and refine AI-generated logic step by step.

Research context

Echo was built within biomedical AI research at the Yale School of Medicine, Department of Biomedical Informatics and Data Science. It is part of broader work on making AI-assisted biomedical systems (clinical NLP, EHR cohort tools, NIH Common Data Element navigation) interpretable and adoptable for U.S. researchers and clinicians.