Sample Prevalence vs Global Prevalence


Thanks to Evan Fields and Mike McLaren for editorial feedback on this post.

In "Detecting Genetically Engineered Viruses With Metagenomic Sequencing" we have:

our best guess is that if this system were deployed at the scale of approximately $1.5M/y it could detect something genetically engineered that shed like SARS-CoV-2 before 0.2% of people in the monitored sewersheds had been infected.

I want to focus on the last bit: "in the monitored sewersheds". The idea is, if a system like this is tracking wastewater from New York City, its ability to raise an alert for a new pandemic will depend on how far along that pandemic is in that particular city. This is closely related to another question: what fraction of the global population would have to be infected before it could raise an alert?
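As a rough illustration of how prevalence in the monitored sewershed relates to global prevalence, here is a toy sketch in Python. Everything in it is an assumption made for illustration, not something from the linked post: the function names, pure exponential growth with a fixed doubling time, equal-sized cities, and the seeding days are all hypothetical. The only number taken from the quote above is the 0.2% threshold. It asks: at the moment the monitored city reaches 0.2% cumulative infections, what fraction of the whole population has been infected?

```python
import math


def city_prevalence(t, seed_day, initial_fraction, doubling_days):
    """Cumulative infected fraction in one city on day t (toy exponential growth)."""
    if t < seed_day:
        return 0.0
    # Cap at 1.0 so the toy model never infects more than everyone.
    return min(1.0, initial_fraction * 2 ** ((t - seed_day) / doubling_days))


def detection_day(seed_day, initial_fraction, doubling_days, threshold):
    """Day on which a city's cumulative infected fraction reaches the threshold."""
    return seed_day + doubling_days * math.log2(threshold / initial_fraction)


def global_prevalence_at_detection(seed_days, populations, monitored,
                                   initial_fraction=1e-7, doubling_days=3.0,
                                   threshold=0.002):
    """Population-weighted prevalence across all cities when the monitored
    city hits the detection threshold. All defaults are made-up values."""
    t = detection_day(seed_days[monitored], initial_fraction,
                      doubling_days, threshold)
    infected = sum(
        pop * city_prevalence(t, seed, initial_fraction, doubling_days)
        for pop, seed in zip(populations, seed_days)
    )
    return infected / sum(populations)


# Three hypothetical equal-sized cities, seeded 9 days apart.
seed_days = [0, 9, 18]
populations = [1, 1, 1]

for monitored in range(len(seed_days)):
    g = global_prevalence_at_detection(seed_days, populations, monitored)
    print(f"monitoring city {monitored}: global prevalence at detection = {g:.4%}")
```

With these made-up numbers, monitoring the first-seeded city raises the alert while global prevalence is still below 0.2%, and monitoring the last-seeded city raises it only after global prevalence is well above 0.2%. A realistic model would need travel-driven seeding, unequal populations, and growth that saturates, none of which this sketch attempts.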

There are two main considerations pushing in opposite directions, both based on the observation that the pandemic will be farther along in some places than others:

My guess is that with a single monitored city, even the optimal one (which one is that even?), your sample prevalence will significantly lag global prevalence in most pandemics, but by carefully choosing a few cities to monitor around the world you can probably get to where it leads global prevalence. But I would love to see some research and modeling on this: qualitative intuitions don't take us very far. Specifically:

If you know of good work on these sorts of modeling questions or are interested in collaborating on them, please get in touch! My work email is jeff at
