Automating ICSR Triage: A Case Study from a Generic Manufacturer PV Department

The following describes a workflow change in a pharmacovigilance (PV) department that we followed over approximately 14 months. The company is a generic pharmaceutical manufacturer with a US and EU portfolio of 23 marketed products. At the start of this period their PV team consisted of 7 scientists: 3 with case processing as their primary function, 2 focused on signal management, 1 handling aggregate reporting, and a VP-level pharmacovigilance lead who reviewed escalations and signed off on regulatory submissions. We refer to the company as "Generix" throughout.

Generix was not in crisis. They were compliant. Their 15-day expedited reporting was consistently on time, their PSUR schedule was maintained, and their signal management program met applicable ICH E2C(R2) requirements. What they had was a capacity problem: intake volume for individual case safety reports (ICSRs) had grown 40% over the prior two years as they expanded into new EU markets, and the three case-processing scientists were routinely working at the edge of their bandwidth, with no slack for unexpected volume spikes such as a product recall, a cluster of serious cases following a formulation change, or a busy influenza season that drove background ADR reporting up across their portfolio.

The Baseline Measurement

Before implementing any changes, the PV lead asked us to shadow the case triage workflow for three weeks and document where time actually went. We want to be specific about what "triage" meant in their workflow because the term gets used loosely.

At Generix, ICSR triage meant: receiving a report from their case intake sources (primarily the 15-day partner network, their patient support program, and a literature surveillance feed), making the initial classification decisions required before a case enters their safety database as a formal ICSR, and completing enough data entry to make the case reviewable by the medical reviewer for seriousness and expectedness assessment.

That initial classification and entry step took an average of 4.2 hours per case, measured across 63 cases in the 3-week observation period. The range was substantial: from 1.8 hours for straightforward consumer-reported non-serious cases with a single preferred term (PT) to 7.4 hours for complex serious cases with multiple concomitant medications, incomplete reporter information, and PT-level coding uncertainty.

The 4.2-hour average broke down roughly as follows:

Duplicate check and source reconciliation: 48 minutes on average. Generix received cases through three intake channels, and the same case occasionally appeared in multiple channels with different reporter information. Manual deduplication required checking narrative similarity, patient descriptors, and event dates across sources.
MedDRA coding: 62 minutes average, ranging from 15 minutes for straightforward events to over 3 hours for multi-symptom narratives where the appropriate level of coding specificity was unclear.
Product identification and license number reconciliation: 31 minutes average. With 23 products across multiple markets, cases didn't always arrive with clearly identified product identifiers. Mapping reported product names to their internal safety database product records required cross-referencing a product master list.
Data entry and completeness assessment: 47 minutes average, including the structured follow-up request generation for incomplete cases.
Intake classification (serious/non-serious, listedness): 34 minutes average.
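
As a quick check on the breakdown, the five averages can be summed directly. The sketch below uses only the figures listed above; the dictionary keys are our own labels, not Generix terminology. The itemized total comes to 222 minutes, about 3.7 hours, so roughly half an hour of the 4.2-hour average is not attributed to any single sub-task, which fits the breakdown being approximate.

```python
# Average minutes per triage sub-task, as reported in the breakdown above.
# The key names are illustrative labels only.
subtask_minutes = {
    "duplicate_check": 48,
    "meddra_coding": 62,
    "product_identification": 31,
    "data_entry": 47,
    "intake_classification": 34,
}

total_minutes = sum(subtask_minutes.values())
total_hours = total_minutes / 60

print(f"Itemized total: {total_minutes} min ({total_hours:.1f} h)")

# Share of the itemized time taken by each sub-task.
for name, minutes in subtask_minutes.items():
    print(f"{name}: {minutes / total_minutes:.0%} of itemized time")
```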

Where Automation Was Applicable

Working from the time breakdown, it became clear that three of the five triage sub-tasks had meaningful automation potential and two did not.

Duplicate detection was the clearest candidate. The logic for recognizing a probable duplicate — same patient, same drug, same event, similar dates with minor variation — is well-defined and doesn't require clinical judgment. Generix's manual process involved a scientist reviewing cases side-by-side and making a judgment call. An automated matching algorithm using demographic profile similarity scoring and narrative NLP could surface probable duplicates for human confirmation rather than requiring the scientist to discover them through search.
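
To make the matching idea concrete, here is a minimal sketch of a pairwise similarity score. It is not Generix's implementation: the field names, weights, and tolerance windows are all assumptions chosen for illustration, and character-level string similarity from Python's standard `difflib` stands in for the narrative NLP component. The product names and case data are invented.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class CaseReport:
    sex: str
    age: int
    product: str
    event: str
    onset: date
    narrative: str


# Hypothetical weights (they sum to 1.0); a real system would tune them
# against pairs that scientists have already confirmed as duplicates.
WEIGHTS = {"sex": 0.10, "age": 0.15, "product": 0.20,
           "event": 0.20, "onset": 0.15, "narrative": 0.20}


def duplicate_score(a: CaseReport, b: CaseReport) -> float:
    """Return a similarity score between 0 and 1 for two intake reports."""
    parts = {
        "sex": float(a.sex.lower() == b.sex.lower()),
        # Tolerate small age discrepancies: full credit at 0, none at 5+ years.
        "age": max(0.0, 1.0 - abs(a.age - b.age) / 5),
        "product": float(a.product.lower() == b.product.lower()),
        "event": float(a.event.lower() == b.event.lower()),
        # Similar dates with minor variation: no credit at 14+ days apart.
        "onset": max(0.0, 1.0 - abs((a.onset - b.onset).days) / 14),
        # Stand-in for narrative NLP: plain character-level similarity.
        "narrative": SequenceMatcher(
            None, a.narrative.lower(), b.narrative.lower()
        ).ratio(),
    }
    return sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


partner = CaseReport("F", 64, "Examplestatin", "Myalgia", date(2023, 3, 2),
                     "Patient reported muscle pain two weeks after starting therapy.")
support = CaseReport("f", 65, "examplestatin", "myalgia", date(2023, 3, 3),
                     "Pt reports muscle pain approx two weeks after starting treatment.")
other = CaseReport("M", 41, "Samplepril", "Cough", date(2023, 1, 10),
                   "Dry cough noted by physician at routine visit.")

print(round(duplicate_score(partner, support), 2))  # high: probable duplicate
print(round(duplicate_score(partner, other), 2))    # low: unrelated case
```

The point of a graded score rather than an exact-match rule is that the same case arriving through two channels rarely agrees on every field, so each descriptor contributes partial evidence and the scientist only confirms what the score surfaces.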

Product identification and license number reconciliation was similarly bounded. The variation in how reporters described products was finite and catalogable — brand name variants, strength misspecifications, country-specific trade names. A lookup table backed by fuzzy string matching could handle 70-80% of cases without human intervention, flagging only the genuinely ambiguous ones.
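
A lookup of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the product names and record IDs are invented, the 0.85 cutoff is arbitrary, and `difflib.get_close_matches` from the Python standard library is one of several possible fuzzy matchers. The key behavior is that a name is resolved automatically only when all close matches point to the same product record; anything else goes to a scientist.

```python
from difflib import get_close_matches

# Hypothetical product master list: known name variant -> internal record ID.
PRODUCT_MASTER = {
    "examplestatin 10 mg tablets": "PRD-001",
    "examplestatin 20 mg tablets": "PRD-002",
    "samplepril 5 mg tablets": "PRD-003",
    "sampleprilo 5 mg": "PRD-003",  # invented country-specific trade name
}


def resolve_product(reported_name, cutoff=0.85):
    """Return (record_id, status); status is 'exact', 'fuzzy', or 'review'."""
    # Normalize case and whitespace before any comparison.
    key = " ".join(reported_name.lower().split())
    if key in PRODUCT_MASTER:
        return PRODUCT_MASTER[key], "exact"

    candidates = get_close_matches(key, list(PRODUCT_MASTER), n=2, cutoff=cutoff)
    record_ids = {PRODUCT_MASTER[name] for name in candidates}
    if len(record_ids) == 1:
        # Every close match maps to the same product record.
        return record_ids.pop(), "fuzzy"
    # No close match, or close matches disagree: flag for human review.
    return None, "review"


print(resolve_product("Samplepril  5 mg tablets"))    # exact after normalization
print(resolve_product("samplepril 5mg tablets"))      # resolved by fuzzy match
print(resolve_product("examplestatin 10mg tablets"))  # close to two strengths
print(resolve_product("unknown product"))             # no plausible match
```

In this sketch a misspelled strength that sits close to two different records is deliberately sent to review rather than guessed, which matches the idea of flagging only the genuinely ambiguous cases.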

Data entry and completeness assessment had partial automation potential. Structured fields — date of birth, event onset date, reporter type, country — could be populated directly from structured intake forms. Follow-up request generation for missing information was largely templated and product-specific, with the variable content being the specific missing fields rather than the clinical framing.
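
The prefill and follow-up steps can be sketched together, since the follow-up request is driven by whichever structured fields came back empty. The field list below is taken from the examples in the paragraph above; the function names, the template wording, and the sample intake form are hypothetical.

```python
# Structured fields named in the text; the identifiers themselves are our own.
REQUIRED_FIELDS = ["date_of_birth", "event_onset_date", "reporter_type", "country"]

# Hypothetical template: fixed framing, variable list of missing fields.
FOLLOW_UP_TEMPLATE = (
    "Regarding your report on {product}: to complete our safety assessment, "
    "could you please provide the following information: {missing}?"
)


def prefill_case(intake_form):
    """Copy structured fields from an intake form; return (case, missing_fields)."""
    case, missing = {}, []
    for field in REQUIRED_FIELDS:
        value = str(intake_form.get(field) or "").strip()
        if value:
            case[field] = value
        else:
            missing.append(field)
    return case, missing


def follow_up_request(product, missing):
    """Draft a follow-up request, or return None when nothing is missing."""
    if not missing:
        return None
    readable = ", ".join(name.replace("_", " ") for name in missing)
    return FOLLOW_UP_TEMPLATE.format(product=product, missing=readable)


form = {"event_onset_date": "2023-03-02", "reporter_type": "consumer", "country": " "}
case, missing = prefill_case(form)
print(missing)
print(follow_up_request("Examplestatin", missing))
```

Keeping the draft as plain text makes it straightforward for a scientist to edit the wording before it is sent.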

MedDRA coding and seriousness/expectedness assessment were not automated. Coding requires clinical judgment, particularly for consumer-reported cases where the narrative uses lay language that maps ambiguously onto MedDRA preferred terms. A case describing "felt like my heart was racing and I couldn't catch my breath" requires a scientist to determine whether the appropriate coding is Palpitations, Tachycardia, Dyspnoea, or a combination, and that determination has downstream regulatory consequences. Seriousness assessment against the ICH E2A seriousness criteria similarly requires clinical interpretation. Our position is that these are human judgment tasks: NLP is not currently reliable enough for regulatory-grade decisions, so they are not candidates for replacement by automation.

Implementation and the Adjusted Workflow

Generix implemented structured automation on the three automatable sub-tasks over a 6-month period. The implementation was deliberately staged: they started with product identification reconciliation because the failure mode (an incorrect mapping) was detectable in downstream review, then moved to duplicate detection, then data entry prefill.

The approach to duplicate detection used probabilistic matching across patient-level descriptors with a confidence threshold. Pairs scoring above the threshold were flagged as probable duplicates for a 10-minute scientist review rather than a full manual search. Pairs scoring below it proceeded as distinct cases. The threshold was set conservatively to minimize false dismissals: a wrongly flagged pair cost only a short review, whereas a missed duplicate would enter the database as a separate case. Because flagged pairs went to a scientist for confirmation rather than being merged automatically, two distinct cases were never merged on the algorithm's output alone.
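
The routing logic around the threshold is simple enough to sketch. The threshold value, the tuple layout, and the function name below are assumptions; the case study does not report the actual threshold, and treating a score exactly at the threshold as "flag for review" is our choice.

```python
# Hypothetical value; the actual threshold used at Generix is not reported.
REVIEW_THRESHOLD = 0.60


def route_candidate_pairs(scored_pairs, threshold=REVIEW_THRESHOLD):
    """Split (case_a, case_b, score) tuples into a review queue and pass-throughs."""
    review_queue, distinct = [], []
    for case_a, case_b, score in scored_pairs:
        if score >= threshold:
            # Probable duplicate: a scientist confirms or rejects; no auto-merge.
            review_queue.append((case_a, case_b, score))
        else:
            # Below threshold: both cases proceed as distinct ICSRs.
            distinct.append((case_a, case_b, score))
    # Most likely duplicates first.
    review_queue.sort(key=lambda pair: pair[2], reverse=True)
    return review_queue, distinct


pairs = [("A", "B", 0.92), ("A", "C", 0.55), ("D", "E", 0.61)]

review, distinct = route_candidate_pairs(pairs)
print(len(review), len(distinct))

# A more conservative (lower) threshold trades extra reviews for fewer misses.
review_low, distinct_low = route_candidate_pairs(pairs, threshold=0.50)
print(len(review_low), len(distinct_low))
```

Lowering the threshold moves borderline pairs into the review queue, which is the direction a team would push it when a missed duplicate is costlier than an extra short review.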

After 6 months of operation, we measured average triage time at 47 minutes per case across a comparison set of 78 cases processed over three weeks. The reduction from 4.2 hours to 47 minutes came primarily from eliminating duplicate search time and product reconciliation time, plus reduced data entry time from prefilled structured fields. MedDRA coding time decreased only marginally: the automation flag for ambiguous terms helped scientists prioritize attention but didn't reduce the coding work itself.

What the Time Savings Actually Enabled

This is the part of the case study we think matters more than the efficiency numbers. The 3 case-processing scientists recovered an average of roughly 2.5 scientist-hours per case processed. At their ICSR intake volume (approximately 140 cases per quarter at the time of post-implementation measurement), that is approximately 350 hours per quarter returned to the team.

The PV lead had a deliberate decision to make about how to deploy that capacity. She chose not to reduce headcount. Instead, she had one of the three case-processing scientists rotate 60% of their time into signal management support — specifically, the case narrative review and case series analysis that the signal management scientists had been doing themselves because case processing was too constrained to assist. The two remaining case-processing scientists maintained the ICSR volume with comfortable compliance margin.

The signal management benefit was the intended outcome but was harder to quantify. What the team reported was that the signal assessment cycle — from initial disproportionality flag to completed signal assessment report — shortened meaningfully because preliminary case series analysis could be completed within the same review cycle rather than waiting for the next quarterly period.

What Didn't Work as Expected

We want to be honest about where the implementation fell short of initial expectations.

The MedDRA coding suggestion tool that was piloted in parallel with the structured automation delivered inconsistent value. For high-volume, straightforward events, it suggested appropriate PTs accurately. For the complex, low-frequency cases that consumed the most coding time, which were precisely the cases where coding support would have been most valuable, it consistently underperformed. The team ultimately disabled the coding suggestion layer after three months and returned to pure manual coding for all cases. The lesson here is that automation that performs well on easy cases and poorly on hard ones doesn't relieve bottlenecks; it risks introducing errors exactly where the stakes are highest.

Follow-up request generation also underperformed. While the templates were accurate, reporters found automated follow-up communications less engaging than scientist-authored requests and response rates declined. The team moved to hybrid generation — automated draft plus a brief scientist personalization step — which partially recovered response rates. The overhead of the personalization step reduced the time saving in this sub-task.

The net result was still strongly positive, and the team remained at the reduced average of under an hour per case. But the path there had detours that a more cautious initial automation scope might have avoided.