Every fraud detection system that a P&C carrier deploys faces a fundamental tension between two error types. A false positive — flagging a legitimate claim as potentially fraudulent — delays payment to a claimant who is owed money, consumes SIU referral capacity on non-productive reviews, and creates an unfair-settlement-practices exposure under state DOI rules. A false negative — passing a fraudulent claim through to payment without review — results in a paid loss that may never be recovered. The carrier's job is not to maximize detection at the expense of payment accuracy. It is to calibrate precision and recall to fit its specific operational and regulatory environment.
This article examines that calibration — what precision and recall actually mean in fraud flagging, why the default setting for most P&C auto carriers should favor precision over recall, and what the SIU referral economics look like when the model is miscalibrated in either direction.
Precision and Recall: What the Terms Mean on a Claims Floor
In the fraud detection context, precision is the fraction of flagged claims that are genuinely fraudulent (or worth SIU review). Recall is the fraction of genuinely fraudulent claims that were flagged. A model with 90% precision and 40% recall flags a small number of claims, and nearly all of them warrant review, but it misses 60% of the fraud in the portfolio. A model with 40% precision and 85% recall generates many flags and catches most fraud, but 60% of its SIU referrals turn out to be legitimate claims whose payment was delayed and whose claimants were inconvenienced.
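The two definitions reduce to simple ratios over confusion-matrix counts. The sketch below is illustrative only: the portfolio of 100 fraudulent claims and the flag counts are hypothetical numbers chosen to reproduce the two model profiles described above.

```python
from typing import Tuple


def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> Tuple[float, float]:
    """Return (precision, recall) from confusion-matrix counts."""
    flagged = true_positives + false_positives        # everything sent to SIU
    actual_fraud = true_positives + false_negatives   # all fraud in the portfolio
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual_fraud if actual_fraud else 0.0
    return precision, recall


# Hypothetical portfolio containing 100 fraudulent claims.
# High-precision model: 44 flags, 40 of them fraudulent, 60 frauds missed.
high_precision = precision_recall(true_positives=40, false_positives=4,
                                  false_negatives=60)
# High-recall model: 212 flags, 85 of them fraudulent, 15 frauds missed.
high_recall = precision_recall(true_positives=85, false_positives=127,
                               false_negatives=15)

print(f"high-precision model: precision={high_precision[0]:.0%}, recall={high_precision[1]:.0%}")
print(f"high-recall model:    precision={high_recall[0]:.0%}, recall={high_recall[1]:.0%}")
```

Note that the high-recall profile needs nearly five times as many referrals (212 versus 44) to roughly double the number of fraudulent claims caught.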
Insurance fraud detection literature frequently cites recall as the primary success metric: "we caught 85% of fraudulent claims." This framing makes sense in a context where SIU capacity is unlimited and payment delay has no cost. In a real P&C carrier operation, neither condition holds.
Why High-Precision, Lower-Recall Is the Right Setting for Most Carriers
SIU referral has a cost that is rarely captured in fraud detection ROI analyses. A referral requires an SIU investigator to open a file, review the anomaly signal, conduct phone or field investigation (for higher-suspicion cases), and document a disposition. The fully-loaded cost of a routine SIU referral review — phone investigation only, no field work — runs $150–$300 at industry-average staffing. A field investigation can reach $800–$1,500. These costs are incurred whether or not the referral results in a denial or reduction.
At a carrier processing 1,200 personal auto claims per month, a low-precision model that flags 12% of files generates 144 referrals per month. At a 35% precision rate, about 50 of those referrals involve genuinely suspicious claims and about 94 involve legitimate claims being investigated. At $150–$300 per phone-only review, those false-positive referrals consume roughly $14,000–$28,000 in SIU cost per month, delay payment on legitimate claims, and expose the carrier to DOI scrutiny on prompt-payment compliance.
A high-precision model calibrated to flag 4% of files (48 referrals per month) with 75% precision produces 36 genuine referrals and 12 false positives. SIU capacity is concentrated on the files most likely to yield recoveries. The 12 false positives still represent 12 inappropriate delays, but the exposure is materially lower. The tradeoff is that the high-precision model catches fewer total fraudulent claims, but the fraudulent claims it misses are predominantly the marginal cases where evidence is weakest and successful SIU outcomes are least likely anyway.
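The referral arithmetic for both calibrations can be reproduced in a few lines. This is a minimal sketch using the claim volume, flag rates, precision rates, and phone-review cost range quoted above; the class and method names are illustrative. Counts are rounded to whole claims, so they can differ by a claim from hand-rounded figures.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ReferralScenario:
    monthly_claims: int
    flag_rate: float   # share of files referred to SIU
    precision: float   # share of referrals that are genuinely suspicious

    def referrals(self) -> int:
        return round(self.monthly_claims * self.flag_rate)

    def true_positives(self) -> int:
        return round(self.referrals() * self.precision)

    def false_positives(self) -> int:
        return self.referrals() - self.true_positives()

    def false_positive_cost(self, low: int = 150, high: int = 300) -> Tuple[int, int]:
        """Monthly SIU cost range (USD) spent reviewing legitimate claims,
        assuming phone-only reviews at the per-referral cost range above."""
        fp = self.false_positives()
        return fp * low, fp * high


low_precision = ReferralScenario(monthly_claims=1200, flag_rate=0.12, precision=0.35)
high_precision = ReferralScenario(monthly_claims=1200, flag_rate=0.04, precision=0.75)

for name, s in [("low-precision", low_precision), ("high-precision", high_precision)]:
    lo, hi = s.false_positive_cost()
    print(f"{name}: {s.referrals()} referrals, {s.true_positives()} genuine, "
          f"{s.false_positives()} false positives, ${lo:,}-${hi:,} wasted per month")
```

Under these inputs the false-positive spend falls from roughly $14,000–$28,000 per month to $1,800–$3,600, while genuine referrals fall from about 50 to 36.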
We're not saying low recall is always acceptable — carriers in high-fraud geographic markets (certain Florida metro areas, specific California zip codes for soft-tissue BI, staged-accident corridors in New Jersey) may find that the frequency and severity of fraud justify higher referral rates and lower precision thresholds. The calibration is market-specific, not universal. But the default intuition — "flag more claims, you'll catch more fraud" — ignores the operational cost of every false positive in the queue.
The Fraud Indicator Signal Stack
Fraud flagging in personal auto claims does not operate from a single signal. Effective scoring models aggregate multiple indicator types, each with different predictive weight:
- Temporal indicators: Loss date shortly after policy inception (particularly new policies with high-value coverage that trigger shortly after binding); claim filed immediately after policy renewal; loss date clustering near end of policy period.
- Claimant and vehicle indicators: Policy address mismatching DMV registration address; multiple prior claims on the same policy or from the same claimant identity; vehicle VIN with mismatched title chain; vehicle age-to-coverage ratio anomalies (for example, comprehensive coverage on a high-mileage, low-value vehicle).
- Loss circumstance indicators: Unwitnessed single-vehicle collision; fire loss with recent policy lapse followed by reinstatement; total-loss threshold proximity on repair estimates (estimate suspiciously close to the carrier's total-loss threshold).
- Social network indicators: Claimant sharing address, phone, or attorney with prior fraud-flagged files; body shop overlap with previously investigated files.
- ISO ClaimSearch: Prior claim history across carriers, drawn from the ISO ClaimSearch database, the industry-standard multi-carrier claim history lookup. A high prior-loss frequency returned by ClaimSearch for a file is a foundational input signal.
A scoring model that weights these signals against a calibrated threshold produces a composite fraud score per file. The threshold at which a score triggers SIU referral is the precision/recall dial — moving it up reduces flags and increases precision; moving it down increases flags and recall. The optimization target is the precision level at which SIU referral yield (recovery per referral dollar spent) is maximized.
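A minimal sketch of the composite score and the threshold dial follows. Everything specific in it is an assumption for illustration: the indicator names, the integer point weights, and the threshold values are hypothetical, and a production model would fit its weights to historical SIU dispositions rather than hard-code them.

```python
from typing import Dict

# Hypothetical indicator weights in points (they sum to 100).
WEIGHTS: Dict[str, int] = {
    "loss_soon_after_inception": 25,         # temporal
    "address_mismatch": 15,                  # claimant and vehicle
    "unwitnessed_single_vehicle": 10,        # loss circumstance
    "shared_contact_with_flagged_file": 25,  # social network
    "high_claimsearch_prior_losses": 25,     # ISO ClaimSearch history
}


def fraud_score(indicators: Dict[str, bool]) -> int:
    """Sum the weights of the indicators present on a file (0 to 100)."""
    return sum(w for name, w in WEIGHTS.items() if indicators.get(name, False))


def refer_to_siu(indicators: Dict[str, bool], threshold: int) -> bool:
    """The threshold is the precision/recall dial: raise it and fewer files are referred."""
    return fraud_score(indicators) >= threshold


claim_file = {
    "loss_soon_after_inception": True,
    "high_claimsearch_prior_losses": True,
}

print(fraud_score(claim_file))                 # 50
print(refer_to_siu(claim_file, threshold=50))  # True: referred at the lower threshold
print(refer_to_siu(claim_file, threshold=60))  # False: passes at the stricter threshold
```

The same file is referred or paid depending only on where the threshold sits, which is why the threshold, not the signal list, is the calibration lever.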
A Calibration Scenario
A growing personal auto carrier in a three-state Northeast footprint reviewed its fraud model calibration after noticing that SIU referral volume had grown 40% year-over-year while SIU recovery had grown only 18%. The carrier's fraud model was generating signals on 9.8% of files with a measured precision rate (confirmed actionable referrals) of 31%. Adjusters were spending an average of 2.3 additional handling days on flagged files awaiting SIU disposition — directly impacting cycle time on nearly 10% of the portfolio.
After recalibrating the flagging threshold to target a precision rate of 68–72%, referral volume dropped to 4.1% of files. SIU recovery per dollar spent on investigation increased by 44% over the following two quarters. Average handling time on the previously-flagged cohort dropped back toward the unflagged baseline. The carrier did miss some fraud — recall fell from approximately 62% to roughly 38% — but the fraud it missed was concentrated in the lowest-scoring decile of the original flag population, where investigation outcomes were weakest.
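One way to approach a recalibration like this is to sweep candidate thresholds over closed, dispositioned files and take the lowest threshold that meets the precision target, since that choice keeps recall as high as the target allows. The sketch below assumes a list of (score, confirmed-fraud) pairs is available; the function name and the ten-file dataset are hypothetical and are not the carrier's data.

```python
from typing import Dict, List, Optional, Tuple


def pick_threshold(scored_files: List[Tuple[int, bool]],
                   target_precision: float) -> Optional[Dict[str, float]]:
    """Return metrics for the lowest score threshold whose flagged set meets
    the target precision, or None if no threshold does."""
    total_fraud = sum(1 for _, is_fraud in scored_files if is_fraud)
    # Candidate thresholds are the observed scores, tried from lowest to highest.
    for threshold in sorted({score for score, _ in scored_files}):
        flagged = [is_fraud for score, is_fraud in scored_files if score >= threshold]
        tp = sum(flagged)
        precision = tp / len(flagged)  # flagged is never empty here
        if precision >= target_precision:
            return {
                "threshold": threshold,
                "referral_rate": len(flagged) / len(scored_files),
                "precision": precision,
                "recall": tp / total_fraud if total_fraud else 0.0,
            }
    return None


# Hypothetical history: (composite score, confirmed actionable by SIU).
history = [(90, True), (85, True), (80, False), (70, True), (60, False),
           (50, True), (40, False), (30, False), (20, False), (10, False)]

result = pick_threshold(history, target_precision=0.70)
print(result)
```

On this toy history the 70% target lands on a threshold of 70, which refers 40% of files at 75% precision and 75% recall; the one fraudulent file it gives up scored 50, below the new cutoff.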
DOI Scrutiny and the Prompt-Payment Dimension
State DOI prompt-payment regulations (enacted under the framework of the NAIC Unfair Claims Settlement Practices Act, Model 900, or equivalent state rules) require carriers to complete claims investigations and pay or deny within defined windows. A fraud flag that triggers a 90-day SIU review on a claim that ultimately proves legitimate is not a neutral event from a regulatory standpoint. Several state DOIs have issued market conduct guidance specifically addressing over-broad SIU referral practices as a potential unfair-claims-settlement violation.
The carrier's internal standard should require that a fraud flag not pause the prompt-payment clock indefinitely — rather, it triggers an expedited SIU review with a defined maximum duration (typically 30 days for phone investigation, 45 days for field) before the file either proceeds to payment (legitimate) or proceeds to denial with a documented basis (actionable). SIU referral processes that lack defined review windows create both prompt-payment exposure and operational unpredictability in the adjuster queue.
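A defined review window is straightforward to enforce in the referral queue. The sketch below uses the 30-day and 45-day maximums described above and assumes they are counted in calendar days from the referral date; that counting rule, the function names, and the example dates are assumptions, and actual state deadlines may be defined differently.

```python
from datetime import date, timedelta

# Maximum SIU review durations in days (calendar days assumed).
MAX_REVIEW_DAYS = {"phone": 30, "field": 45}


def siu_review_deadline(referral_date: date, investigation_type: str) -> date:
    """Date by which the file must proceed to payment or documented denial."""
    return referral_date + timedelta(days=MAX_REVIEW_DAYS[investigation_type])


def is_overdue(referral_date: date, investigation_type: str, today: date) -> bool:
    """True when a flagged file has exceeded its maximum review window."""
    return today > siu_review_deadline(referral_date, investigation_type)


referred = date(2024, 3, 1)  # hypothetical referral date
print(siu_review_deadline(referred, "phone"))        # 2024-03-31
print(siu_review_deadline(referred, "field"))        # 2024-04-15
print(is_overdue(referred, "phone", date(2024, 4, 2)))  # True
```

A daily job that lists overdue files gives SIU supervisors and adjusters a shared view of which flagged claims must be resolved before they become prompt-payment exposure.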
The Working Principle
Fraud detection in P&C auto is a resource allocation problem as much as a detection problem. SIU capacity is finite. Each referral consumes capacity. The goal is to deploy that capacity on the files most likely to yield recoveries — which means designing the model around precision, not recall. Catch the right claims, not the most claims. The fraud your model misses because it falls below a well-calibrated threshold is, in expectation, fraud that would have cost more to investigate than to recover. That is not a failure of the model — it is the model working correctly.