Why your supplier risk score is probably wrong
A number typed in last quarter isn't risk intelligence. Four ways scores go wrong, and the three properties of one you can trust.
Pick a critical vendor and look at its risk score. Now ask one question: what would have to happen in the real world for that number to change today?
For most third-party risk programs the honest answer is "nothing." The score was computed from a questionnaire answered eight months ago, weighted by a tiering decision made at onboarding, and it will not move again until the next assessment cycle. It is not a measurement. It is a memory.
The four ways scores go wrong
- They are stale by design. An annual assessment cadence means your score is, on average, six months old. Certificates lapse, subprocessors change, key staff leave, breaches happen, and the number sits still. A score that cannot move between assessments is a snapshot wearing a gauge costume.
- They are self-reported at the worst moment. Questionnaires are answered by the vendor, during a sales cycle, by whoever was assigned the ticket. That does not make vendors liars; it makes the data optimistic at the margin, and the margin is where risk lives. A score built only on attestation measures the vendor's diligence at form-filling.
- They average away the signal. Most composite scores are weighted means, and means hide failure. A vendor that is excellent at twenty controls and catastrophic at one (say, no MFA on the admin plane) can average out to a comfortable amber. Some control failures should cap the score, not dilute into it. If your model cannot express "this single fact makes the vendor high-risk," it will systematically launder your worst exposures into the middle of the distribution.
- Nobody can explain them. Ask why a vendor is a 72 and watch what happens. If the answer is a shrug toward a black-box methodology, the team has already stopped trusting it, and a score the team does not trust does not get acted on. Scores do not fail loudly; they fail by being politely ignored.
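To make the averaging problem concrete, here is a minimal Python sketch. It assumes a 0 to 100 scale where higher means healthier, and it adds a hypothetical `critical` flag for controls whose failure should cap the composite rather than dilute into it. The names `ControlResult`, `weighted_mean` and `capped_score`, the fail threshold of 50, and the cap of 30 are all illustrative assumptions, not a recommended methodology.

```python
from dataclasses import dataclass


@dataclass
class ControlResult:
    name: str
    score: float            # 0-100, higher is healthier (assumed convention)
    weight: float = 1.0
    critical: bool = False  # hypothetical flag: failing this control caps the total


def weighted_mean(results: list[ControlResult]) -> float:
    """Plain weighted average: the model that lets one failure dilute away."""
    total_weight = sum(r.weight for r in results)
    return sum(r.score * r.weight for r in results) / total_weight


def capped_score(
    results: list[ControlResult],
    fail_threshold: float = 50.0,  # illustrative: below this, a control counts as failed
    cap: float = 30.0,             # illustrative: ceiling once any critical control fails
) -> tuple[float, list[str]]:
    """Weighted average, but any failed critical control caps the result.

    Returns the score plus the names of the critical controls that triggered
    the cap, so the reason for the number travels with it.
    """
    base = weighted_mean(results)
    triggered = [r.name for r in results if r.critical and r.score < fail_threshold]
    if triggered:
        return min(base, cap), triggered
    return base, triggered


# Twenty strong controls and one catastrophic gap: no MFA on the admin plane.
controls = [ControlResult(f"control_{i}", 90.0) for i in range(20)]
controls.append(ControlResult("admin_mfa", 0.0, critical=True))

print(round(weighted_mean(controls), 1))  # 85.7: the gap is averaged away
print(capped_score(controls))             # (30.0, ['admin_mfa'])
```

The weighted mean rewards the vendor for its twenty good controls; the capped version lets a single fact make the vendor high-risk, and it names that fact.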
What a score that means something looks like
None of this is an argument against scoring. Portfolios of hundreds of vendors need compression. It is an argument for scores with three properties:
- Live inputs. The score combines the last assessment with what has happened since: monitoring signals, certification status, incident history, open remediation items. When a control fails in the real world, the number moves the same day, and the movement routes a task to the vendor's owner.
- Evidence over attestation, weighted honestly. An answer backed by a current SOC 2 report, a pen-test summary, or a verified configuration outranks a bare "yes." Same questionnaire, different confidence, and the score should know the difference.
- Explainability on one screen. Anyone should be able to click the number and see exactly what built it: which assessment, which signals, which failures, which caps. Explainability is not a reporting nicety; it is the entire mechanism by which a team comes to trust, and therefore act on, the number.
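One way to combine the three properties is sketched below in Python. Every name here (`EVIDENCE_CONFIDENCE`, `Answer`, `Signal`, `ScoreBreakdown`, `score_vendor`) is hypothetical, the confidence multipliers are made-up illustrative values, and live signals are modeled as a flat point penalty, which is a deliberate simplification. The score is recomputed from current inputs on every call rather than stored, and it returns its own explanation alongside the number. Routing a task to the vendor's owner is left out.

```python
from dataclasses import dataclass, field

# Illustrative confidence multipliers (assumptions, not a standard): how much
# credit a passing answer earns depending on what backs it.
EVIDENCE_CONFIDENCE = {
    "verified_config": 1.0,
    "soc2_report": 0.9,
    "pentest_summary": 0.8,
    "attestation": 0.5,  # a bare "yes" with nothing behind it
}


@dataclass
class Answer:
    control: str
    passed: bool
    evidence: str  # key into EVIDENCE_CONFIDENCE


@dataclass
class Signal:
    description: str  # e.g. a lapsed certificate or a newly disclosed incident
    penalty: float    # points subtracted while the signal is active


@dataclass
class ScoreBreakdown:
    score: float
    lines: list[str] = field(default_factory=list)  # the "one screen" explanation


def score_vendor(answers: list[Answer], signals: list[Signal]) -> ScoreBreakdown:
    """Recompute from current inputs on every call instead of storing a number."""
    lines = []
    earned = 0.0
    for a in answers:
        # Passing answers earn credit scaled by the evidence behind them.
        credit = EVIDENCE_CONFIDENCE[a.evidence] if a.passed else 0.0
        earned += credit
        lines.append(f"{a.control}: passed={a.passed}, evidence={a.evidence}, credit={credit:.2f}")
    base = 100.0 * earned / len(answers)
    lines.append(f"assessment base: {base:.1f}")

    # Live signals move the number between assessments.
    score = base
    for s in signals:
        score -= s.penalty
        lines.append(f"signal: {s.description} (-{s.penalty:.1f})")
    score = max(0.0, score)
    lines.append(f"final: {score:.1f}")
    return ScoreBreakdown(score, lines)


answers = [
    Answer("encryption_at_rest", True, "verified_config"),
    Answer("admin_mfa", True, "attestation"),
]
# A signal that arrived after the assessment was completed.
signals = [Signal("ISO 27001 certificate lapsed", 10.0)]

result = score_vendor(answers, signals)
print("\n".join(result.lines))
```

Two vendors that answer "yes" to the same questions get different scores if only one of them backs the answers with evidence, and the printed breakdown shows exactly which answers, signals and penalties produced the final number.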
The test
Here is the uncomfortable diagnostic. Take your highest-scoring critical vendor and your lowest-scoring one, and ask the team which one they would actually worry about in an incident. If the answer contradicts the scores (and in many programs it quietly does), the scoring model is not measuring risk. It is measuring paperwork.
A number typed in last quarter is not risk intelligence. A number that moves when the world does, built from evidence, explainable to the person who has to defend it in front of the board: that is the difference between reporting risk and seeing it.