When the Record Doesn’t Tell the Whole Story

Electronic health records (EHRs) have been widely championed as a revolutionary aid in
clinical trial recruitment—offering efficiency, scale, and access to rich clinical history. Yet
beneath this veneer lies a fractured reality: one where fragmented, biased, and often opaque
data systems can silently and systematically exclude patients from opportunities. Heavy
reliance on structured EHR data, fragmented systems, and free-text blind spots can
undermine equitable trial enrollment, highlighting the need for sponsors to address the gap
between digital eligibility and actual representation.

The Achilles’ Heel

EHRs undeniably enhance trial workflows. A systematic review found consistent
improvements across recruitment, screening, and data collection when EHRs were
integrated into clinical trials. Structured data (e.g., labs and coded diagnoses) is invaluable,
but studies show it captures only 50–70% of the information needed to resolve trial eligibility
criteria [1]. Meanwhile, across healthcare as a whole, nearly 80% of all data still resides in
unstructured formats like clinical notes, imaging, and narratives [2].

With such a large share of healthcare data trapped in unstructured formats, the limitations
of current approaches are clear. Patient eligibility often depends on nuanced details (e.g.,
disease stage, comorbidities, medication history, or social context) that can sometimes
appear in free-text notes, radiology images, or scanned documents. Identifying these signals
requires staff to manually read and interpret clinical notes, a task that is both time-intensive
and error-prone. With research teams already stretched thin, this additional burden can
mean eligibility reviews are rushed or incomplete, leaving qualified patients overlooked.

Looking ahead, large language models (LLMs) and natural language processing (NLP) offer
the promise of curating and analyzing unstructured data at scale. But this efficiency comes
with new risks: if the underlying notes contain inconsistencies, shorthand, or biased
language, those blind spots are baked into the algorithms. Without careful oversight, NLP-
driven tools could automate exclusion rather than broaden access.
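The blind-spot risk described above can be made concrete with a minimal sketch. Assume a screening tool flags notes that assert metastatic disease (a hypothetical eligibility criterion) using keyword matching with crude negation handling. The shorthand case shows exactly how free-text blind spots become automated exclusion: the clinician's abbreviation never triggers the flag.

```python
import re

# Hypothetical eligibility signal: an assertion of "metastatic" disease in
# free text. Naive keyword matching misses shorthand like "mets" -- the
# free-text blind spot described above, now baked into the screening logic.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.]*\bmetastatic\b", re.I)
MENTION = re.compile(r"\bmetastatic\b", re.I)

def flag_metastatic(note: str) -> bool:
    """Return True if the note asserts (rather than negates) metastatic disease."""
    if NEGATION.search(note):
        return False
    return bool(MENTION.search(note))

flag_metastatic("Imaging shows metastatic spread to the liver.")  # flagged
flag_metastatic("No evidence of metastatic disease.")             # correctly skipped
flag_metastatic("Pt with mets to bone.")                          # missed: shorthand blind spot
```

The third case is the failure mode the paragraph warns about: the patient's status is documented, but in a form the tool cannot interpret, so the exclusion is silent and systematic.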

Bias Amplified

Bias in EHR-driven recruitment doesn’t only stem from what is missing; it also emerges from
how information is recorded. Clinical notes often include stigmatizing or subjective
language, and when that language is fed into algorithmic screening tools, it can distort
outcomes. Research has shown that biased descriptors in notes can directly affect model
performance, with Black patients disproportionately harmed when mortality-prediction
models were trained on such data. What may feel like a small annotation in a chart can,
when scaled through automation, have profound effects on who is identified as eligible.

Another layer of bias comes from the absence of granular data. Informal diagnoses, disease
progression, or social context often remain uncaptured in structured fields, meaning
patients who technically qualify may never be flagged. In these cases, the issue is not that
eligible patients don't exist; it is that they exist in ways the recruitment systems cannot
interpret.

Finally, the structure of EHRs themselves contributes to inequity. These systems were not
originally designed with research readiness in mind, and as a result they carry selection bias,
missingness, and time-orientation issues that limit causal inference and representation.
Taken together, these factors create a feedback loop: underrepresented populations are
under-captured in the data, algorithms trained on that data reflect those gaps, and trial
enrollment continues to skew away from the very diversity sponsors aim to achieve.

Bridging the Gap

Closing the gap between digital eligibility and real representation requires intentional design.
Sponsors and sites must move beyond an overreliance on structured fields by incorporating
methods to unlock unstructured data in a way that is accurate, safe, and unbiased. NLP and
LLMs can help surface critical details hidden in notes and reports, but they must be deployed
with careful oversight to avoid automating existing biases.

Equity also depends on better data foundations. Standardized demographic fields,
consistent use of research-ready data elements, and mechanisms for patients to self-report
or correct their information are all essential. Just as important is interoperability: linking
records across fragmented systems ensures care histories are complete, particularly for
patients who move between community clinics, academic centers, and safety-net
providers.
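One way to picture the interoperability point is a minimal record-linkage sketch. The field names, identifiers, and matching rule here are illustrative assumptions: real linkage typically uses probabilistic matching and strict privacy safeguards, but even deterministic matching on normalized fields shows how a care history scattered across a community clinic and a hospital can be reassembled.

```python
from collections import defaultdict

def link_key(record: dict) -> tuple:
    """Normalize the fields used for matching (illustrative: name + DOB)."""
    return (record["name"].strip().lower(), record["dob"])

def link_records(*sources):
    """Group records from multiple systems that share a normalized key,
    reassembling a fragmented care history into one view per patient."""
    linked = defaultdict(list)
    for source in sources:
        for rec in source:
            linked[link_key(rec)].append(rec)
    return linked

# Hypothetical fragments of one patient's history held by two systems:
clinic = [{"name": "Jane Doe ", "dob": "1980-01-02", "dx": "E11.9"}]
hospital = [{"name": "jane doe", "dob": "1980-01-02", "dx": "I10"}]
patient_view = link_records(clinic, hospital)
# Both records now sit under a single key, so an eligibility review sees
# the complete history rather than either fragment alone.
```

Without the normalization step, the trailing space and capitalization differences would leave the two fragments unlinked, which is precisely how incomplete histories arise for patients who move between providers.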

Moreover, accountability has to be built into the process. Trial teams should track
representativeness metrics in real time and audit which populations are missing. By pairing
technical innovation with transparency and equity-focused design, EHR-driven recruitment
can fulfill its promise of broadening access to trials rather than reinforcing systemic blind
spots.
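The representativeness tracking suggested above can be sketched as a simple audit: compare each group's share of enrollment against its share of a reference population, and flag groups falling well below parity. The group labels, reference shares, and 0.8 threshold are illustrative assumptions, not a standard.

```python
def representation_ratios(enrolled: dict[str, int],
                          reference_share: dict[str, float]) -> dict[str, float]:
    """Ratio of each group's enrolled share to its reference-population share.
    A ratio near 1.0 means the cohort mirrors the reference population."""
    total = sum(enrolled.values())
    return {
        group: round((enrolled.get(group, 0) / total) / share, 2)
        for group, share in reference_share.items()
    }

def flag_underrepresented(ratios: dict[str, float],
                          threshold: float = 0.8) -> list[str]:
    """Groups whose representation ratio falls below an audit threshold."""
    return [group for group, ratio in ratios.items() if ratio < threshold]

# Hypothetical mid-trial snapshot against illustrative population shares:
ratios = representation_ratios(
    {"Group A": 70, "Group B": 20, "Group C": 10},
    {"Group A": 0.5, "Group B": 0.3, "Group C": 0.2},
)
flag_underrepresented(ratios)  # identifies the groups the audit should examine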

Intentional Design

Electronic health records hold vast potential to streamline trial recruitment, but only if
implemented with intentional safeguards. Left unaddressed, fragmented data, bias-laden
notes, and hidden demographics can reinforce systemic inequities, further embedding blind
spots into eligibility decisions. Sponsors must treat EHRs as complex and biased systems that
require thoughtful augmentation to ensure that EHR-driven recruitment expands access
rather than narrows it. In doing so, EHRs can move toward their promise of a recruitment
engine that reflects the clinical diversity of real-world populations.

References

  1. What Impact has DS & Technologies Associated with it had on Pharmaceutical R&D?
  2. Managing Unstructured Big Data in Healthcare System
