From Observational Evidence to Regulatory Acceptance
The FDA's interest in real-world data (RWD) and real-world evidence (RWE) as inputs to regulatory decision-making predates the current wave of EHR infrastructure investment by nearly a decade. The 21st Century Cures Act of 2016 formally directed FDA to develop a framework for evaluating whether RWE could be used to support approval of new indications for approved drugs and to satisfy post-approval study requirements. FDA's subsequent Real-World Evidence Program — launched in 2018 and operating through a series of framework documents, pilot programs, and final and draft guidances — has progressively clarified the conditions under which RWD meets the evidentiary standards FDA requires for regulatory use.
For clinical trial sponsors, the practical relevance of this framework operates at two levels. Most directly, it expands the types of evidence that can inform FDA submissions, with implications for supplemental NDA/BLA submissions, post-approval commitments, and certain rare disease approvals. Less directly, but still operationally important, regulatory acceptance of RWD as evidence has parallel implications for how RWD, specifically EHR-derived data, can be used in the trial design and patient identification phases that precede submission.
FDA's Framework: What It Says and What It Doesn't
FDA has published several guidance documents directly relevant to RWD use in clinical evidence, including the guidance on assessing EHR and medical claims data to support regulatory decision-making (issued in draft in 2021 and finalized in 2024), the guidance on submitting documents that use RWD and RWE (finalized in 2022), the guidance on considerations for the use of RWD and RWE to support regulatory decision-making (issued in draft in 2021 and finalized in 2023), and ongoing framework publications from the RWE Program pilots. Taken together, these establish several key principles that sponsors and clinical teams should understand.
First, FDA distinguishes sharply between RWD that is fit for purpose and the assumption that EHR data is automatically research-quality data. The EHR and claims guidance is explicit on this point: EHR data collected for clinical care may not capture every data element needed for a clinical trial endpoint with the precision and completeness that regulatory review requires. This is FDA's formal acknowledgment of what clinical informaticists have known for years: EHR data quality is highly variable and context-dependent.
Second, as applied to primary efficacy and safety evidence, FDA's RWE framework currently covers mainly observational study designs and certain pragmatic trial designs that incorporate usual-care data collection. Sponsors should not read the framework as a signal that conventional randomized controlled trials can be replaced by EHR-based observational analyses for primary efficacy claims. That remains the exception rather than the rule, with rare disease and unmet-need situations representing the most common accepted use cases.
Third, and most practically relevant for enrollment operations, FDA's framework for using RWD to inform trial design — including natural history studies, external control arms, and patient population characterization — is substantially more permissive than its framework for using RWD as primary efficacy evidence. The bar for using EHR-derived data to understand disease prevalence, characterize a patient population, or inform eligibility criterion design is considerably lower than the bar for using that same data as an evidentiary record in a regulatory submission.
Natural History and Patient Population Characterization
The most established and least controversial use of RWD in the clinical trial context is natural history study design — using RWD to characterize the untreated or standard-of-care progression of a disease, to establish what patient outcomes look like without the investigational treatment, and to define the patient population that will be targeted for enrollment.
For rare disease programs, where prospective natural history data collection is often infeasible due to small population size and long disease timelines, FDA has explicitly recognized RWD-derived natural history data as a legitimate input to trial design. Several approved rare disease drugs have relied on external control arms derived from registries, claims data, and EHR cohorts to contextualize efficacy results, a use case that FDA's 2019 draft guidance on rare diseases specifically accommodates.
For more common therapeutic areas, EHR-derived population characterization serves a different but equally valuable function at the trial design stage: informing the realistic prevalence of specific eligibility criteria combinations in the target patient population. This informs screen failure rate projections, enrollment timeline planning, and site selection decisions by providing data on what fraction of patients with the primary diagnosis also meet the secondary biomarker, organ function, and treatment history criteria that the protocol will require.
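The criteria-combination estimate described above can be sketched as a simple eligibility funnel. This is a minimal illustration under stated assumptions, not a validated feasibility tool: the Patient fields, the thresholds (eGFR of 60, two prior lines), and the eight-patient toy cohort are all hypothetical, and a real analysis would run against a de-identified EHR extract with its own data quality caveats.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Patient:
    # Hypothetical fields; a real cohort would come from a de-identified EHR extract.
    biomarker_positive: bool
    egfr: float        # kidney function, mL/min/1.73 m^2
    prior_lines: int   # prior lines of therapy


Criterion = Tuple[str, Callable[[Patient], bool]]


def eligibility_funnel(cohort: List[Patient], criteria: List[Criterion]):
    """Apply criteria in order and report how much of the cohort survives each step.

    Returns a list of (criterion name, remaining count, fraction of starting cohort).
    """
    total = len(cohort)
    remaining = list(cohort)
    rows = []
    for name, passes in criteria:
        remaining = [p for p in remaining if passes(p)]
        fraction = len(remaining) / total if total else 0.0
        rows.append((name, len(remaining), fraction))
    return rows


# Toy cohort: every patient is assumed to carry the primary diagnosis already.
cohort = [
    Patient(True, 75.0, 1),
    Patient(True, 45.0, 0),
    Patient(False, 90.0, 1),
    Patient(True, 82.0, 3),
    Patient(False, 50.0, 4),
    Patient(True, 66.0, 2),
    Patient(True, 95.0, 0),
    Patient(False, 70.0, 1),
]

# Illustrative secondary criteria (biomarker, organ function, treatment history).
criteria = [
    ("biomarker positive", lambda p: p.biomarker_positive),
    ("eGFR >= 60", lambda p: p.egfr >= 60.0),
    ("<= 2 prior lines", lambda p: p.prior_lines <= 2),
]

funnel = eligibility_funnel(cohort, criteria)
for name, count, fraction in funnel:
    print(f"{name:20s} {count:3d} {fraction:.1%}")

# Share of diagnosed patients who would fail screening if approached unfiltered.
projected_screen_failure = 1.0 - funnel[-1][2]
print(f"projected screen failure rate: {projected_screen_failure:.1%}")
```

Applying the criteria in sequence, rather than only computing the final intersection, shows which criterion removes the most patients, which is often the more useful number when debating whether a protocol criterion can be relaxed.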
External Control Arms: The Regulatory Pathway and Its Constraints
External control arms using real-world historical data have been accepted by FDA as supporting evidence in several oncology drug approvals, typically in rare tumor types or heavily pre-treated populations where a concurrent randomized control arm is not feasible. FDA's Oncology Center of Excellence, through its real-world evidence program and related initiatives, has engaged the question of when RWD-based external controls are scientifically and regulatorily defensible.
The constraints on external control arms are significant and should be understood clearly before investing in the design. FDA's primary concern with external controls is confounding — the patients in a historical RWD cohort may differ systematically from trial participants in ways that are not fully captured in the EHR data. Factors like performance status documentation quality, the completeness of prior treatment history records, and time-period effects (changes in standard of care over the historical control period) all introduce potential confounding that FDA reviewers will scrutinize carefully.
FDA's framework for external controls generally requires sponsors to pre-specify the RWD source and selection criteria, characterize data quality and completeness, and conduct sensitivity analyses addressing potential confounders. This is not a lightweight validation exercise — it is a substantial methodological commitment that should be planned for in protocol design, not retrofitted at submission time.
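Two of the checks described above, data completeness and baseline comparability, can be illustrated with a short sketch. The standardized mean difference used here is a common balance diagnostic from the observational research literature, swapped in for illustration; the source does not say FDA prescribes it, and the often-cited 0.1 threshold is a rule of thumb rather than a regulatory standard. The function names and toy values are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[float]]) -> float:
    """Fraction of records in which the field is populated (not None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def standardized_mean_difference(trial: Sequence[float], external: Sequence[float]) -> float:
    """(mean_trial - mean_external) / sqrt((var_trial + var_external) / 2).

    Uses sample variances; each group needs at least two observations.
    """
    pooled_sd = math.sqrt((statistics.variance(trial) + statistics.variance(external)) / 2)
    if pooled_sd == 0:
        raise ValueError("both groups have zero variance; SMD is undefined")
    return (statistics.mean(trial) - statistics.mean(external)) / pooled_sd


# Toy data: baseline age in the trial arm and in the external RWD cohort.
trial_age = [60.0, 62.0, 64.0]
external_age = [66.0, 70.0, 74.0]

# Toy data: performance status as recorded in the external cohort (None = undocumented).
ecog_external = [1, None, 0, None]

smd_age = standardized_mean_difference(trial_age, external_age)
ecog_completeness = completeness(ecog_external)

print(f"SMD for age: {smd_age:.2f}")
print(f"ECOG completeness in external cohort: {ecog_completeness:.0%}")
```

A large absolute SMD on a prognostic covariate, or low completeness on a field such as performance status, is the kind of finding that would need to be addressed in the pre-specified analysis plan and the sensitivity analyses rather than discovered at submission.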
RWD for Patient Identification: The Enrollment Operations Use Case
Separate from its use in regulatory evidence — and this distinction matters for how sponsors plan their data strategies — EHR-derived RWD has a well-established and operationally important use in patient identification for clinical trial enrollment. This is not a regulatory submission use case. It is a pre-enrollment operations use case: using de-identified patient population data from EHR systems to identify individuals who match a trial's inclusion and exclusion criteria, before formal protocol screening begins.
This use of RWD does not require the fit-for-purpose evidentiary standards that FDA applies to regulatory submissions. It requires different standards: de-identification standards consistent with HIPAA's Safe Harbor or Expert Determination methods, data quality sufficient to support accurate phenotype matching, and data use agreement terms that permit use of the data for this purpose. These frameworks are distinct from the FDA RWE framework, and they require their own careful compliance planning.
The operational benefit is substantial. EHR phenotype matching at the patient identification stage can reduce the screen failure rate by filtering out patients who fail clear eligibility criteria before coordinator time is invested in formal screening. The RWD use case here is operational rather than evidentiary, but it rests on the same data infrastructure investments that enable the regulatory evidence use cases.
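To make the Safe Harbor reference above concrete, the sketch below generalizes three of the identifier types that method addresses: ZIP codes are truncated to their first three digits (or replaced with 000 where the three-digit area has 20,000 or fewer residents), ages above 89 are collapsed into a single 90-or-older category, and dates are reduced to the year. It is deliberately partial. Safe Harbor lists 18 identifier categories, and the record layout, function names, and the low-population prefix set here are hypothetical placeholders rather than real Census-derived values.

```python
from datetime import date
from typing import Dict, Set


def generalize_zip(zip_code: str, low_population_prefixes: Set[str]) -> str:
    """Keep the first three digits, or return '000' for low-population areas."""
    prefix = zip_code[:3]
    return "000" if prefix in low_population_prefixes else prefix


def generalize_age(age: int) -> str:
    """Collapse ages above 89 into a single '90+' category."""
    return "90+" if age >= 90 else str(age)


def deidentify(record: Dict[str, object], low_population_prefixes: Set[str]) -> Dict[str, object]:
    """Return a generalized copy containing only the three handled fields.

    Any field not explicitly handled is dropped rather than passed through.
    """
    return {
        "zip3": generalize_zip(str(record["zip"]), low_population_prefixes),
        "age": generalize_age(int(record["age"])),
        "dx_year": record["dx_date"].year,  # keep the year only
    }


# Placeholder set; real values must be derived from current Census data.
LOW_POPULATION_PREFIXES = {"123"}

older_patient = {"zip": "12345", "age": 93, "dx_date": date(2021, 5, 14)}
younger_patient = {"zip": "98765", "age": 54, "dx_date": date(2019, 11, 2)}

print(deidentify(older_patient, LOW_POPULATION_PREFIXES))
print(deidentify(younger_patient, LOW_POPULATION_PREFIXES))
```

Building the output from an explicit allow-list of fields, instead of deleting known identifiers from the input, is a conservative design choice: a new column added to the source extract stays out of the de-identified set until someone decides how to handle it.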
21 CFR Part 11 and the Audit Trail Requirement
Where EHR-derived data does enter the regulatory submission context — as source documentation for trial data, as contemporaneous record supporting case report form entries, or as electronic records subject to FDA inspection — 21 CFR Part 11 requirements apply. Part 11 establishes requirements for electronic records and electronic signatures used in clinical investigations: audit trails for electronic record modifications, access controls, system validation documentation, and record retention requirements.
EHR systems vary substantially in their native compliance with Part 11 requirements. Health system EHRs are designed primarily for clinical care documentation, not for clinical research documentation — and the audit trail requirements for research-quality data are more stringent than those that typically apply to routine clinical documentation. Sponsors using EHR data in a manner that brings it within the scope of Part 11 need to assess whether the specific EHR systems at their sites meet Part 11 requirements for the specific data elements being used, and whether additional controls or documentation are required.
We're not suggesting that standard EHR data fails Part 11 categorically — many modern EHR implementations include audit trail and access control features that meet Part 11 requirements for specific use cases. The point is that compliance cannot be assumed; it requires explicit assessment of each system against the specific regulatory context in which the data will be used.
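As an illustration of what an audit trail for electronic record modifications involves, the sketch below records who changed what and when, keeps the prior value alongside the new one so that changes do not obscure earlier entries, and never edits an entry in place. The hash chaining is an added tamper-evidence design choice for this example, not something the source or Part 11 is claimed to require, and the class and field names are hypothetical. A sketch like this does not make a system Part 11 compliant; it only shows the shape of the data.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class AuditEntry:
    timestamp: str             # computer-generated, UTC, ISO 8601
    user_id: str
    record_id: str
    action: str                # "create", "modify", or "delete"
    field_name: str
    old_value: Optional[str]   # prior value is preserved, never overwritten
    new_value: Optional[str]
    prev_hash: str
    entry_hash: str


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditTrail:
    """Append-only log; existing entries are never updated or removed."""

    def __init__(self) -> None:
        self._entries: List[AuditEntry] = []

    def append(self, user_id: str, record_id: str, action: str, field_name: str,
               old_value: Optional[str], new_value: Optional[str]) -> AuditEntry:
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user_id,
            "record_id": record_id,
            "action": action,
            "field_name": field_name,
            "old_value": old_value,
            "new_value": new_value,
            "prev_hash": self._entries[-1].entry_hash if self._entries else GENESIS_HASH,
        }
        entry = AuditEntry(**body, entry_hash=_digest(body))
        self._entries.append(entry)
        return entry

    def entries(self) -> Tuple[AuditEntry, ...]:
        return tuple(self._entries)

    def verify(self) -> bool:
        """Recompute every hash and confirm each entry links to the one before it."""
        prev = GENESIS_HASH
        for entry in self._entries:
            body = asdict(entry)
            stored = body.pop("entry_hash")
            if body["prev_hash"] != prev or _digest(body) != stored:
                return False
            prev = stored
        return True


trail = AuditTrail()
trail.append("coordinator_01", "subject-0007", "create", "ecog_status", None, "1")
trail.append("coordinator_02", "subject-0007", "modify", "ecog_status", "1", "0")
print(len(trail.entries()), trail.verify())
```

When assessing a site's EHR, the useful questions map onto the same fields: whether the system captures the user, the timestamp, and the prior value for the specific data elements the trial relies on, and whether those entries can be altered after the fact.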
The Trajectory: Where FDA's RWE Program Is Heading
FDA's RWE Program has been operating as an ongoing, iterative policy development exercise since 2018, with new guidances, pilot program findings, and framework updates published regularly. The trajectory is toward expanded use — particularly for supplemental indications in already-approved therapeutic areas, for pragmatic clinical trials that embed research procedures within usual care, and for post-approval safety and effectiveness studies where the research question can be adequately addressed with observational data.
For sponsors designing Phase II/III programs in 2025, the practical implication is that RWD infrastructure investments — EHR integration capability, de-identification pipelines, FHIR data exchange — now serve multiple functions simultaneously: they support patient identification for enrollment, they enable external control arm design for small populations, and they create the data infrastructure that post-approval commitments increasingly require. The enrollment operations investment and the regulatory evidence investment are increasingly the same investment. That convergence is one of the more important structural changes in how sponsors should think about clinical data infrastructure over the course of a development program.