Ninety-one days. That's the median lead time we measured when we ran our N-drug graph analysis retrospectively against FAERS signals that were formally confirmed between 2019 and 2022. If our model had been running live on the quarterly data during that period, it would have flagged the interaction pattern a median of 91 days before the traditional disproportionality methods, running on the same data, crossed their detection thresholds.
I want to be careful about what we're claiming and what we're not. This is a retrospective validation exercise, not a prospective clinical trial. We designed the methodology to be as honest as possible about its limitations, and I think it's worth walking through how we set it up, what we found, and where the numbers should be taken with appropriate skepticism.
How We Defined "Confirmed Signal"
The hardest methodological choice in a lead time study is defining what counts as ground truth confirmation. We used three criteria for a signal to enter our validation set:
- The drug-event association appears in a FAERS quarterly drug safety update, a label change, or an FDA safety communication issued between January 2019 and December 2022
- The association was classified as "new" or "strengthened" in the regulatory action — signals that were simply reconfirmed from prior years were excluded
- The drug had at least 500 total ICSRs in FAERS at the time of the regulatory action
This gave us a working set of 312 confirmed signals (drug-event pairs). The 500-ICSR minimum was a practical cut: below that count, statistical comparison between methods becomes unreliable because we can't distinguish methodological differences from small-sample noise. The cut did introduce a bias toward drugs with higher reporting volume (the validation set skews toward products with larger market share), and we flag that as a limitation throughout.
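The three inclusion criteria amount to a simple filter over candidate drug-event pairs. The sketch below is illustrative only: the `CandidateSignal` record, its field names, and the `is_eligible` / `build_validation_set` helpers are hypothetical and do not reflect a FAERS schema or our actual pipeline.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record for one candidate drug-event pair; field names are
# illustrative and do not correspond to any FAERS or FDA data schema.
@dataclass(frozen=True)
class CandidateSignal:
    drug: str
    event: str
    action_date: date          # date of the safety update, label change, or communication
    classification: str        # "new", "strengthened", or "reconfirmed"
    icsr_count_at_action: int  # total ICSRs for the drug in FAERS on action_date

WINDOW_START = date(2019, 1, 1)
WINDOW_END = date(2022, 12, 31)
MIN_ICSRS = 500

def is_eligible(s: CandidateSignal) -> bool:
    """Apply the three inclusion criteria to one candidate."""
    in_window = WINDOW_START <= s.action_date <= WINDOW_END
    new_or_strengthened = s.classification in {"new", "strengthened"}
    enough_reports = s.icsr_count_at_action >= MIN_ICSRS
    return in_window and new_or_strengthened and enough_reports

def build_validation_set(candidates: list[CandidateSignal]) -> list[CandidateSignal]:
    """Keep only candidates meeting every criterion, preserving input order."""
    return [s for s in candidates if is_eligible(s)]
```

Keeping each criterion as a named boolean makes it easy to report how many candidates each one excluded, which is useful when documenting selection bias like the volume skew noted above.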
Simulating the Retrospective Environment
Lead time measurement requires that you never let the model "see" data from after the quarter you're testing. For each of the 312 confirmed signals, we identified the FAERS quarterly release that preceded the regulatory confirmation by at least one full quarter and reconstructed the database as it existed at that point. The model was run on that snapshot, not on any subsequent data.
This temporal isolation is critical and also annoying to implement. FAERS has a non-trivial rate of late submission: a report received in Q4 of one year may describe a patient whose therapy started two years earlier. We had to decide whether "database as of quarter X" means reports received by quarter X or reports with adverse event dates prior to quarter X. We used receipt date, which is standard in signal detection practice, while acknowledging that receipt-date cutoffs systematically exclude late-submitted reports that, had they arrived sooner, might have pushed a signal across the detection threshold earlier.
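A minimal sketch of the two snapshot definitions, assuming each report carries a receipt date and an event date. The `Report` record and the helper names are hypothetical. The event-date variant is shown only for contrast: it admits reports that arrived after the cutoff, which is exactly the future information a retrospective test must not see.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical minimal report record; real FAERS extracts carry many more fields.
@dataclass(frozen=True)
class Report:
    report_id: str
    receipt_date: date  # when the report was received
    event_date: date    # when the adverse event occurred

def quarter_end(year: int, quarter: int) -> date:
    """Last calendar day of the given quarter (1-4)."""
    month = quarter * 3
    # Quarters end on Mar 31, Jun 30, Sep 30, and Dec 31.
    day = 31 if month in (3, 12) else 30
    return date(year, month, day)

def snapshot_by_receipt(reports: list[Report], year: int, quarter: int) -> list[Report]:
    """Reports the database actually contained at quarter end (receipt-date cutoff)."""
    cutoff = quarter_end(year, quarter)
    return [r for r in reports if r.receipt_date <= cutoff]

def snapshot_by_event(reports: list[Report], year: int, quarter: int) -> list[Report]:
    """Contrast only: every report whose event predates the cutoff, regardless of
    when it arrived. This leaks later-received reports into the snapshot."""
    cutoff = quarter_end(year, quarter)
    return [r for r in reports if r.event_date <= cutoff]
```

A report with a 2019 event date received in late 2021 appears in the event-date snapshot for Q1 2020 but not in the receipt-date one, which is the late-submission cost described above.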
For the comparison methods, we ran PRR with the EMA's standard threshold (PRR ≥ 2, chi-squared ≥ 4, minimum 3 cases) and ROR with lower 95% confidence interval ≥ 1 and a minimum case count. We also ran BCPNN with IC025 > 0 (the lower end of the 95% credible interval for IC above zero). For our graph-based analysis, we used relative drug co-occurrence density metrics that we've described elsewhere, with the detection threshold calibrated to a fixed false positive rate approximating the PRR threshold's specificity on held-out historical data.
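For readers less familiar with the comparator methods, here is a sketch of the three threshold rules computed from a standard 2×2 report table. Several details are assumptions rather than our exact configuration: the chi-squared uses a Yates correction, the ROR minimum case count is a caller-supplied parameter (the value isn't stated above), and `ic025` uses a widely used closed-form approximation instead of the full BCPNN posterior. `Table2x2` and all function names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Table2x2:
    a: int  # reports mentioning the drug and the event
    b: int  # reports mentioning the drug but not the event
    c: int  # reports mentioning the event but not the drug
    d: int  # reports mentioning neither

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

def prr(t: Table2x2) -> float:
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))

def prr_flag(t: Table2x2) -> bool:
    """PRR >= 2, chi-squared >= 4 (Yates-corrected here), at least 3 cases."""
    if t.a < 3 or t.c == 0 or t.b + t.d == 0:
        return False
    diff = max(0.0, abs(t.a * t.d - t.b * t.c) - t.n / 2)
    chi2 = t.n * diff**2 / ((t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d))
    return prr(t) >= 2 and chi2 >= 4

def ror_flag(t: Table2x2, min_cases: int) -> bool:
    """Lower bound of the 95% CI for the ROR >= 1, with a caller-chosen case minimum."""
    if t.a < min_cases or min(t.a, t.b, t.c, t.d) == 0:
        return False
    ror = (t.a * t.d) / (t.b * t.c)
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    return math.exp(math.log(ror) - 1.96 * se) >= 1

def ic025(t: Table2x2) -> float:
    """Closed-form approximation to the lower 95% credibility bound of the IC."""
    expected = (t.a + t.b) * (t.a + t.c) / t.n
    observed = t.a + 0.5  # shrinkage toward zero for small counts
    ic = math.log2(observed / (expected + 0.5))
    return ic - 3.3 * observed**-0.5 - 2 * observed**-1.5

def bcpnn_flag(t: Table2x2) -> bool:
    return ic025(t) > 0
```

All three rules look at one drug-event pair at a time, which is the pairwise view the graph analysis is being compared against.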
The 91-Day Number and How It Distributes
The median 91-day lead time across the 312 signals masks a distribution that is considerably more informative than the median alone. The distribution is right-skewed: the mode is closer to 45 days, but a subset of signals showed lead times of 180 days or more, and that long tail is why the median sits well above the mode. These are the cases where the interaction signal was visible in the co-occurrence structure of the database well before enough individual reports had accumulated to cross the pairwise threshold.
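A short sketch of how per-signal lead time and the mode/median/mean comparison can be computed. It assumes lead time is the difference between the dates each method first flagged a pair. The helper names and the 30-day bin width used as a crude mode are hypothetical choices, not our production code.

```python
from collections import Counter
from datetime import date
from statistics import mean, median

def lead_time_days(graph_first_flag: date, comparator_first_flag: date) -> int:
    """Days by which the graph method beat the comparator (negative if it lagged)."""
    return (comparator_first_flag - graph_first_flag).days

def modal_bin(values: list[int], width: int = 30) -> int:
    """Lower edge of the most populated fixed-width bin: a crude mode for
    continuous data. Ties go to the bin encountered first."""
    counts = Counter((v // width) * width for v in values)
    return max(counts, key=counts.__getitem__)

def summarize(lead_times: list[int]) -> dict[str, float]:
    return {
        "modal_bin_start": modal_bin(lead_times),
        "median": median(lead_times),
        "mean": mean(lead_times),
    }
```

With made-up lead times `[30, 40, 45, 50, 55, 100, 120, 190, 200, 240]`, the modal bin starts at 30 days, the median is 77.5, and the mean is 107. The tail moves the mean most, the median less, and leaves the mode where it was.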
Breaking the results down by adverse event category:
- Hepatotoxicity signals: median lead time 104 days. These showed the largest gains, consistent with our hypothesis that hepatic enzyme-mediated interactions leave a clear co-occurrence footprint in FAERS even before individual drug reports are flagged
- QT prolongation/cardiac arrhythmia signals: median 87 days. Similar pattern — the co-medication structure in reports involving QT-related events is distinctive enough that the graph analysis picks up the pattern early
- Hematologic events (cytopenias, bleeding): median 71 days. Smaller gain, likely because many drugs with hematologic AEs are used in cleaner monotherapy contexts where co-occurrence patterns are less informative
- CNS events (encephalopathy, seizure, psychiatric): median 52 days. Smallest gain. CNS adverse events in FAERS tend to involve complex multifactorial histories that make co-occurrence patterns noisier
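The category breakdown is a grouped median. The sketch below, using a hypothetical `median_by_category` helper, also returns the stratum size, because a median from a thin stratum deserves more skepticism than one from a large stratum.

```python
from collections import defaultdict
from statistics import median

def median_by_category(records: list[tuple[str, int]]) -> dict[str, tuple[float, int]]:
    """Median lead time and sample size per adverse event category.
    records: (category, lead_time_days) pairs, one per confirmed signal."""
    groups: dict[str, list[int]] = defaultdict(list)
    for category, lead in records:
        groups[category].append(lead)
    # Report n alongside each median so small strata are visible.
    return {cat: (median(leads), len(leads)) for cat, leads in groups.items()}
```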
This pattern makes sense pharmacologically. The signals where we gain the most lead time are precisely the adverse event categories most associated with pharmacokinetic drug-drug interactions — hepatotoxicity and QT prolongation — where the co-medication context is mechanistically relevant.
False Positives: The Number That Matters More
A lead time result without a false positive rate is meaningless. If the graph analysis flags 10× more false positives than PRR, the earlier detection doesn't help — it just moves the workload problem upstream and buries the true signals in noise.
We counted as a false positive any signal flagged by graph analysis that did not result in a regulatory action within the 24-month follow-up window. On the validation set, with the graph analysis calibrated to match PRR's specificity, its false positive rate was about 15% higher than PRR's. That is, for every 100 signals PRR generates that don't become confirmed, graph analysis generates approximately 115. That's a modest specificity cost for the lead time gain.
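The false positive comparison reduces to counting flags without a timely regulatory action for each method and taking the ratio. In this sketch the names are hypothetical, "24 months" is approximated as 730 days, and an action dated before the flag does not count as confirmation; all three are assumptions.

```python
from datetime import date, timedelta

Pair = tuple[str, str]  # (drug, event)

FOLLOW_UP = timedelta(days=730)  # approximates the 24-month window

def unconfirmed_flags(flags: dict[Pair, date], actions: dict[Pair, date],
                      follow_up: timedelta = FOLLOW_UP) -> int:
    """Count flagged pairs with no regulatory action within follow_up of the flag."""
    count = 0
    for pair, flagged_on in flags.items():
        acted_on = actions.get(pair)
        confirmed = acted_on is not None and flagged_on <= acted_on <= flagged_on + follow_up
        if not confirmed:
            count += 1
    return count

def relative_false_positives(graph_flags: dict[Pair, date], prr_flags: dict[Pair, date],
                             actions: dict[Pair, date]) -> float:
    """Graph-method false positives per PRR false positive (1.15 means 115 per 100)."""
    prr_fp = unconfirmed_flags(prr_flags, actions)
    if prr_fp == 0:
        raise ValueError("PRR produced no unconfirmed flags; ratio undefined")
    return unconfirmed_flags(graph_flags, actions) / prr_fp
```

Because confirmation is judged only within the window, any true signal whose regulatory action lands after it is scored as a false positive.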
We want to be clear that 24-month follow-up is an imperfect confirmation criterion. Some true drug-interaction signals take longer than 24 months to reach regulatory action. Others are pharmacologically real but not practically important enough to prompt regulatory action at all. Our false positive estimate is therefore conservative: we are counting some true positives as false positives, either because they had not yet been confirmed when follow-up closed or because they never warranted formal action.
What the 312-Signal Analysis Doesn't Tell You
We've been careful throughout our internal review to identify what this analysis can't support as conclusions:
It doesn't generalize to all therapeutic areas equally. The validation set is disproportionately drawn from cardiovascular, oncology, and infectious disease drugs — the areas with highest FAERS report volume. Whether the lead time advantage holds in lower-volume areas like rare disease or pediatric neurology is an open question we can't answer with this data.
It doesn't account for the signal assessment burden. Earlier detection only translates to better patient safety outcomes if the PV team has the capacity to act on signals that are flagged earlier. If quarterly review bandwidth is the bottleneck — and for many smaller PV teams it is — detecting signals 91 days sooner only helps if you can also assess them 91 days sooner. This is an organizational question, not a technical one.
It doesn't measure clinical outcome improvement directly. We are measuring time-to-detection, not time-to-label-change or reduction in adverse event incidence. Those downstream outcomes depend on many factors beyond signal detection — regulatory review timelines, labeling negotiation, prescriber behavior change — that are outside the scope of what we were testing.
Why We Published This Internally Before Building Into the Product
One of the things we spent considerable time on early in building TrialVyx was deciding whether to treat our validation methodology as a marketing claim or as a scientific question. The difference matters: a marketing claim gets shaped to look as favorable as possible; a scientific question gets examined for the ways it might be wrong.
We ran the analysis with the intention of publishing the methodology and the limitations alongside the headline number. If the lead time advantage turned out to be smaller than we expected, or concentrated in areas that matter less, that would be important to know before building a product around it — and before asking PV teams to change their workflows based on it.
The 91-day number held up under scrutiny. The distribution pattern was consistent with our mechanistic hypotheses about where graph analysis should and shouldn't outperform pairwise methods. The false positive cost was smaller than we feared. But the appropriate epistemic posture is still "promising retrospective result that deserves prospective validation," not "proven 3-month advantage." That distinction matters for how PV teams should weight this evidence when making decisions about their signal detection workflows.
We are actively working with early-access teams to design prospective validation protocols. That data will look different from retrospective analysis in ways we don't fully anticipate yet, and we expect to revise our understanding of where the real lead time gains are when it comes in.