How Much Data From a Sequencing Run?

2025-06-25

Manufacturers often give optimistic estimates for how much data their systems produce, but performance in practice can be pretty different. What have we seen at the NAO?

We've worked with two main sample types, wastewater and nasal swabs, but Simon already wrote something up on swabs so I'm just going to look at wastewater here.

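We've sequenced with both Illumina and Oxford Nanopore (ONT), and the two main flow cells we've used are the Illumina 25B (advertised at 26B read pairs, 8,000 Gbp) and the ONT PromethION (advertised at 200 Gbp).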

Ignoring library prep consumables, labor, lab space, and machine cost, and looking just at the flow cells, if we took the manufacturers' numbers at face value this would be:

With 25B flow cells we've generally seen output meeting or exceeding the advertised 26B read pairs (8,000 Gbp). In our sequencing at BCL we've averaged 29.4B read pairs per sample (8,800 Gbp; n=37), while recently MU has been averaging 27.2B read pairs (8,200 Gbp; n=4, older runs were ~20% lower). It's great to be getting more than we expected!
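As a rough check on the Gbp figures above, here is a minimal sketch of the conversion from read pairs to raw bases. It assumes 2x150 bp paired-end reads, so 300 raw bases per read pair; that read length is my assumption, chosen to match the 300bp figure used later in this post, and the function and variable names are just for illustration.

```python
# Assumption: 2x150 bp paired-end reads, i.e. 300 raw bases per read pair.
BASES_PER_READ_PAIR = 300

# Advertised output for the 25B flow cell, in billions of read pairs.
ADVERTISED_READ_PAIRS_B = 26.0


def read_pairs_to_gbp(read_pairs_billions: float) -> float:
    """Convert billions of read pairs to Gbp of raw output."""
    # (billions of pairs) * (bases per pair) = billions of bases = Gbp
    return read_pairs_billions * BASES_PER_READ_PAIR


# Observed averages quoted in the post, in billions of read pairs.
observed = {"BCL (n=37)": 29.4, "MU (n=4)": 27.2}

for label, pairs_b in observed.items():
    gbp = read_pairs_to_gbp(pairs_b)
    ratio = pairs_b / ADVERTISED_READ_PAIRS_B
    print(f"{label}: {gbp:,.0f} Gbp raw, {ratio:.0%} of advertised read pairs")
```

Under that assumption the averages come out to about 8,820 and 8,160 Gbp, which round to the 8,800 and 8,200 Gbp quoted above.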

On the other hand, with PromethION flow cells we've generally seen just 3.3 Gbp (n=25) on wastewater. This is slightly higher than the 2.5 Gbp we've seen with nasal swabs, but still far below the advertised 200 Gbp. We don't know why our output is so much lower than advertised, but this is what we've seen so far.
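To put the size of that shortfall in numbers, here is a small sketch that compares the observed PromethION output to the 200 Gbp figure. The helper names are hypothetical; the inputs are just the numbers quoted above.

```python
# Output figure the post compares against, in Gbp per flow cell.
ADVERTISED_GBP = 200.0

# Observed averages quoted in the post, in Gbp per flow cell.
observed_gbp = {"wastewater (n=25)": 3.3, "nasal swabs": 2.5}


def fraction_of_advertised(observed: float, advertised: float = ADVERTISED_GBP) -> float:
    """Observed output as a fraction of the advertised output."""
    return observed / advertised


def shortfall_factor(observed: float, advertised: float = ADVERTISED_GBP) -> float:
    """How many times lower the observed output is than advertised."""
    return advertised / observed


for sample, gbp in observed_gbp.items():
    frac = fraction_of_advertised(gbp)
    factor = shortfall_factor(gbp)
    print(f"{sample}: {frac:.2%} of advertised, about {factor:.0f}x lower")
```

For wastewater this works out to about 1.65% of the advertised output, roughly a 60-fold shortfall; for nasal swabs it is 1.25%, an 80-fold shortfall.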

This would give us:

We're still not done, though, because while this is correct in terms of raw bases coming off the sequencer, with paired-end sequencing on short fragments like we have in wastewater a portion of many of your reads will be adapters. We see a median of 170bp after adapter trimming, out of an initial 300bp, which means we only retain ~60% of the raw bases. Accounting for this, we have:
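A minimal sketch of this adjustment, for the Illumina numbers only since the adapter argument here is about paired-end short reads. It treats the 170bp median as if it were the typical retained length per 300bp of raw sequence, which is a simplification on my part (a mean would be the right quantity), and the names are illustrative.

```python
RAW_BP = 300            # raw bases before trimming, as quoted in the post
MEDIAN_TRIMMED_BP = 170  # median bases left after adapter trimming


def retained_fraction(trimmed_bp: float, raw_bp: float) -> float:
    """Fraction of raw bases kept after adapter trimming."""
    return trimmed_bp / raw_bp


def usable_gbp(raw_gbp: float, fraction: float) -> float:
    """Raw output scaled down to the bases that survive trimming."""
    return raw_gbp * fraction


frac = retained_fraction(MEDIAN_TRIMMED_BP, RAW_BP)

# Observed raw Illumina output quoted in the post, in Gbp.
raw_output_gbp = {"BCL": 8800.0, "MU": 8200.0}

print(f"retained fraction: {frac:.1%}")
for label, raw in raw_output_gbp.items():
    print(f"{label}: {usable_gbp(raw, frac):,.0f} Gbp after trimming")
```

The ratio 170/300 is about 57%, which is where the ~60% above comes from; on 8,800 Gbp of raw output that leaves roughly 5,000 Gbp.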

Overall, Illumina is much more cost-effective for us with our current protocols. If we were able to get better results from ONT that would partially close the gap, but with a gap of nearly two orders of magnitude we'd need very large improvements.

