Electronic health records (EHRs) have been widely championed as a revolutionary aid in
clinical trial recruitment—offering efficiency, scale, and access to rich clinical history. Yet
beneath this veneer lies a fractured reality: one where fragmented, biased, and often opaque
data systems can silently and systematically exclude patients from opportunities. Heavy
reliance on structured EHR data, fragmented systems, and free-text blind spots can
undermine equitable trial enrollment, highlighting the need for sponsors to address the gap
between digital eligibility and actual representation.
EHRs undeniably enhance trial workflows. A systematic review found consistent
improvements across recruitment, screening, and data collection when EHRs were
integrated into clinical trials. Structured data (e.g., labs and coded diagnoses) is invaluable,
but studies show it captures only 50–70% of the information needed to resolve trial eligibility
criteria [1]. Meanwhile, across healthcare as a whole, nearly 80% of all data still resides in
unstructured formats like clinical notes, imaging, and narratives [2].
With such a large share of healthcare data trapped in unstructured formats, the limitations
of current approaches are clear. Patient eligibility often depends on nuanced details (e.g.,
disease stage, comorbidities, medication history, or social context) that may appear only
in free-text notes, radiology images, or scanned documents. Identifying these signals
requires staff to manually read and interpret clinical notes, a task that is both time-intensive
and error-prone. With research teams already stretched thin, this additional burden can
mean eligibility reviews are rushed or incomplete, leaving qualified patients overlooked.
Looking ahead, large language models (LLMs) and natural language processing (NLP) offer
the promise of curating and analyzing unstructured data at scale. But this efficiency comes
with new risks: if the underlying notes contain inconsistencies, shorthand, or biased
language, those blind spots are baked into the algorithms. Without careful oversight, NLP-
driven tools could automate exclusion rather than broaden access.
Bias in EHR-driven recruitment doesn’t only stem from what is missing; it also emerges from
how information is recorded. Clinical notes often include stigmatizing or subjective
language, and when that language is fed into algorithmic screening tools, it can distort
outcomes. Research has shown that biased descriptors in notes can directly affect model
performance, with Black patients disproportionately harmed when mortality-prediction
models were trained on such data. What may feel like a small annotation in a chart can,
when scaled through automation, have profound effects on who is identified as eligible.
Another layer of bias comes from the absence of granular data. Informal diagnoses, disease
progression, or social context often remain uncaptured in structured fields, meaning
patients who technically qualify may never be flagged. In these cases, the issue is that they
exist in ways the recruitment systems cannot interpret.
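One way to picture this failure mode is a screening rule that treats a missing structured value as "needs review" instead of as a silent exclusion. The sketch below is illustrative only: the field names (age, egfr), the thresholds, and the three-state outcome are hypothetical assumptions, not drawn from any particular trial protocol or EHR product.

```python
from enum import Enum


class Screen(Enum):
    MEETS = "meets"
    FAILS = "fails"
    NEEDS_REVIEW = "needs_review"


def screen_min_value(record, field, minimum):
    """Check one structured numeric field against a minimum.

    A missing value is routed to manual review rather than counted as a failure.
    """
    value = record.get(field)
    if value is None:
        return Screen.NEEDS_REVIEW
    return Screen.MEETS if value >= minimum else Screen.FAILS


def screen_patient(record, criteria):
    """Combine per-criterion results: any failure excludes, any gap triggers review."""
    results = [screen_min_value(record, f, m) for f, m in criteria.items()]
    if Screen.FAILS in results:
        return Screen.FAILS
    if Screen.NEEDS_REVIEW in results:
        return Screen.NEEDS_REVIEW
    return Screen.MEETS


# Hypothetical criteria and records, for illustration only.
criteria = {"age": 18, "egfr": 30}
patients = {
    "p1": {"age": 54, "egfr": 72},    # all structured values present and passing
    "p2": {"age": 61, "egfr": None},  # lab value missing from structured fields
    "p3": {"age": 47, "egfr": 22},    # structured value present and failing
}
outcomes = {pid: screen_patient(rec, criteria) for pid, rec in patients.items()}
```

Under these assumptions, the patient with the missing lab value lands in a review queue instead of disappearing from the candidate list, which is the distinction the paragraph above draws between "not eligible" and "not interpretable."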
Finally, the structure of EHRs themselves contributes to inequity. These systems were not
originally designed with research readiness in mind, and as a result they carry selection bias,
missingness, and time-orientation issues that limit causal inference and representation.
Taken together, these factors create a feedback loop: underrepresented populations are
under-captured in the data, algorithms trained on that data reflect those gaps, and trial
enrollment continues to skew away from the very diversity sponsors aim to achieve.
Closing the gap between digital eligibility and real representation requires intentional design.
Sponsors and sites must move beyond an overreliance on structured fields by incorporating
methods to unlock unstructured data in a way that is accurate, safe, and unbiased. NLP and
LLMs can help surface critical details hidden in notes and reports, but they must be deployed
with careful oversight to avoid automating existing biases.
Equity also depends on better data foundations. Standardized demographic fields,
consistent use of research-ready data elements, and mechanisms for patients to self-report
or correct their information are all essential. Just as important is interoperability: linking
records across fragmented systems ensures care histories are complete, particularly for
patients who move between community clinics, academic centers, and safety-net
providers.
Moreover, accountability has to be built into the process. Trial teams should track
representativeness metrics in real time and audit which populations are missing. By pairing
technical innovation with transparency and equity-focused design, EHR-driven recruitment
can fulfill its promise of broadening access to trials rather than reinforcing systemic blind
spots.
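As a rough illustration of what tracking representativeness could look like, the sketch below compares each group's share of enrollment with its share of a reference population and flags groups that fall below a cutoff. The group labels, counts, reference shares, and the 0.8 cutoff are all hypothetical assumptions chosen for the example; a real audit would define its own groups, reference population, and thresholds.

```python
from collections import Counter


def enrollment_to_reference_ratios(enrolled_groups, reference_shares):
    """Return each group's enrollment share divided by its reference share.

    A ratio near 1.0 means enrollment mirrors the reference population;
    a ratio well below 1.0 suggests the group is under-enrolled.
    """
    counts = Counter(enrolled_groups)
    total = sum(counts.values())
    ratios = {}
    for group, ref_share in reference_shares.items():
        enrolled_share = counts.get(group, 0) / total if total else 0.0
        ratios[group] = enrolled_share / ref_share
    return ratios


def underrepresented(ratios, threshold=0.8):
    """List groups whose ratio falls below an (arbitrary, illustrative) cutoff."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical enrollment log and reference shares, for illustration only.
enrolled = ["A"] * 60 + ["B"] * 30 + ["C"] * 10
reference = {"A": 0.5, "B": 0.3, "C": 0.2}

ratios = enrollment_to_reference_ratios(enrolled, reference)
flagged = underrepresented(ratios)
```

In this made-up example, group C makes up 10% of enrollment against a 20% reference share, so it is the only group flagged. Rerunning the same calculation as enrollment progresses is one simple way to surface a gap while there is still time to act on it.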
Electronic health records hold vast potential to streamline trial recruitment, but only if
implemented with intentional safeguards. Left unaddressed, fragmented data, bias-laden
notes, and hidden demographics can reinforce systemic inequities, further embedding blind
spots into eligibility. Sponsors must treat EHRs as complex and biased systems that require
thoughtful augmentation to ensure that EHR-driven recruitment expands access, rather than
narrows it. In doing so, EHRs can move toward their promise as a recruitment engine that
reflects the clinical diversity of real-world populations.
References
Denise N. Bronner, Ph.D. has roughly 15 years of organizational thought leadership experience within the global healthcare space and has held various roles in academia, consulting, pharma, and venture capital.
During her career, she has specialized in health equity, data-driven global therapy program strategy development, pitch and storytelling refinement, and identifying business opportunities within pharma.
Beyond her professional endeavors, she’s passionate about enhancing diversity in STEM fields, serving on advisory boards, participating as a judge in pitch/business competitions, and mentoring young professionals.
She holds a bachelor’s degree in Biological Sciences from Wayne State University, a Ph.D. in Microbiology & Immunology from the University of Michigan – Ann Arbor, and certification from the Venture Capital Executive Program from UC Berkeley Haas School of Business.
She is the founder of Empactful Ventures, which currently consults healthcare-focused startups and venture funds. She is also a member of the Clinical Leader editorial board, a Board Director for healthtech startup Naviday Health, and a Board Director for the patient advocacy group We Are ILL.