2023-09-28 qPCR Quality Control

Author

Dan Rice

Published

September 28, 2023

Objectives

Putting together a standard set of quality control checks and plots for qPCR. Includes standard curves and dilution series.

Preliminary work

This is the same data as in this analysis.

Data import

library(readr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(purrr)
library(tibble)
library(stringr)
library(ggplot2)
library(tidyr)
library(broom)
read_data <- function(dir, pattern, col_types) {
  list.files(
    dir,
    pattern = pattern,
    full.names = TRUE,
  ) |>
    map(function(f) {
      read_csv(f, skip = 23, col_types = col_types, ) |>
        mutate(plate = str_extract(basename(f), "(.*)_(.*)_[0-9]{8}_[0-9]{6}\\.csv", group = 1))
    }) |>
    list_rbind() |>
    separate_wider_regex(
      `Well Position`,
      c(well_row = "[A-Z]+", well_col = "[0-9]+"),
      cols_remove = FALSE,
    ) |>
    mutate(well_col = as.integer(well_col))
}

There were some wells mixed up between the intended plate layout and the actual plate for Norovirus. See Results doc.

corrected_samples <- tribble(
  ~`Well Position`, ~Sample,
  "F7",             "empty",
  "F8",             "empty",
  "F9",             "empty",
  "F10",            "empty",
  "F11",            "empty",
  "F12",            "empty",
  "G7",             "6C/100",
  "G8",             "6C/10",
  "G9",             "7C/100",
  "G10",            "7C/10",
  "G11",            "3C/100",
  "G12",            "3C/10",
  "H7",             "1C/100",
  "H8",             "1C/10",
  "H9",             "1C/100",
  "H10",            "1C/10",
  "H11",            "NTC",
  "H12",            "NTC",
) |>
  add_column(plate = "2023-09-14_Noro_Extractions")
correct_samples <- function(df, corrected_samples) {
  left_join(df, corrected_samples, by = join_by(`Well Position`, plate)) |>
    mutate(Sample = ifelse(is.na(Sample.y), Sample.x, Sample.y), .keep = "unused")
}
data_dir <-
  "~/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/"
metadata_file <- paste0(
  data_dir,
  "[2023-09-11] Extraction Experiment 2 templates and results",
  " - sampleMetadata.csv"
)
metadata <- read_csv(metadata_file) |> glimpse()
Rows: 21 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Sample_ID, Extraction_kit, Short_kit, Elution_format, Brand, NA_Target
dbl (2): Kit Batch, Elution_volume
lgl (1): FP_sampleID

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 21
Columns: 9
$ Sample_ID      <chr> "1A", "1B", "1C", "2A", "2B", "2C", "3A", "3B", "3C", "…
$ Extraction_kit <chr> "Zymo quick-RNA", "Zymo quick-RNA", "Zymo quick-RNA", "…
$ Short_kit      <chr> "1_ZR", "1_ZR", "1_ZR", "2_ZD", "2_ZD", "2_ZD", "3_IPL"…
$ `Kit Batch`    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3…
$ Elution_volume <dbl> 30, 30, 30, 100, 100, 100, 100, 100, 100, 60, 60, 60, 1…
$ Elution_format <chr> "15*2", "15*2", "15*2", "50*2", "50*2", "50*2", "50*2",…
$ Brand          <chr> "Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Invitr…
$ NA_Target      <chr> "RNA", "RNA", "RNA", "DNA+RNA", "DNA+RNA", "DNA+RNA", "…
$ FP_sampleID    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
col_types <- list(
  Target = col_character(),
  Cq = col_double()
)
res_data <- read_data(
  paste0(data_dir, "qpcr"),
  pattern = "Results",
  col_types = col_types
) |>
  correct_samples(corrected_samples) |>
  separate_wider_regex(
    Sample,
    c(Sample_ID = ".+", "/", dilution = "[0-9]+$"),
    too_few = "align_start",
  ) |>
  mutate(
    replicate = str_extract(Sample_ID, "[A-Z]$"),
    quantity = as.double(Sample_ID),
    dilution = as.integer(dilution) |> replace_na(1),
  ) |>
  left_join(metadata, by = join_by(Sample_ID)) |>
  glimpse()
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `quantity = as.double(Sample_ID)`.
Caused by warning:
! NAs introduced by coercion
Rows: 369
Columns: 35
$ Well                    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ well_row                <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "…
$ well_col                <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3…
$ `Well Position`         <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"…
$ Omit                    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
$ Target                  <chr> "Cov2", "Cov2", "Cov2", "Cov2", "Cov2", "Cov2"…
$ Task                    <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "U…
$ Reporter                <chr> "FAM", "FAM", "FAM", "FAM", "FAM", "FAM", "FAM…
$ Quencher                <chr> "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "N…
$ `Amp Status`            <chr> "AMP", "AMP", "AMP", "AMP", "AMP", "AMP", "NO_…
$ `Amp Score`             <dbl> 1.1701024, 1.0212594, 1.1604009, 1.1057531, 1.…
$ `Curve Quality`         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Result Quality Issues` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Cq                      <dbl> 32.44077, 35.24757, 33.72678, 34.44077, 34.704…
$ `Cq Confidence`         <dbl> 0.9451320, 0.9773032, 0.9759027, 0.9634213, 0.…
$ `Cq Mean`               <dbl> 33.80504, 33.80504, 33.80504, 34.31332, 34.313…
$ `Cq SD`                 <dbl> 1.4050310, 1.4050310, 1.4050310, 0.4683633, 0.…
$ `Auto Threshold`        <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ Threshold               <dbl> 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04…
$ `Auto Baseline`         <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ `Baseline Start`        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ `Baseline End`          <dbl> 29, 32, 30, 31, 31, 31, 31, 31, 31, 21, 21, 21…
$ plate                   <chr> "2023-09-14_Cov2_extractions", "2023-09-14_Cov…
$ Sample_ID               <chr> "1A", "1A", "1A", "3C", "3C", "3C", "6B", "6B"…
$ dilution                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ replicate               <chr> "A", "A", "A", "C", "C", "C", "B", "B", "B", N…
$ quantity                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1000, 1000…
$ Extraction_kit          <chr> "Zymo quick-RNA", "Zymo quick-RNA", "Zymo quic…
$ Short_kit               <chr> "1_ZR", "1_ZR", "1_ZR", "3_IPL", "3_IPL", "3_I…
$ `Kit Batch`             <dbl> 1, 1, 1, 1, 1, 1, 3, 3, 3, NA, NA, NA, 1, 1, 1…
$ Elution_volume          <dbl> 30, 30, 30, 100, 100, 100, 80, 80, 80, NA, NA,…
$ Elution_format          <chr> "15*2", "15*2", "15*2", "50*2", "50*2", "50*2"…
$ Brand                   <chr> "Zymo", "Zymo", "Zymo", "Invitrogen", "Invitro…
$ NA_Target               <chr> "RNA", "RNA", "RNA", "RNA", "RNA", "RNA", "RNA…
$ FP_sampleID             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
amp_data <- read_data(
  paste0(data_dir, "qpcr"),
  pattern = "Amplification Data",
  col_types = col_types
) |>
  correct_samples(corrected_samples) |>
  mutate(
    replicate = str_extract(Sample, "[A-Z]"),
    Sample_ID = str_split_i(Sample, "/", 1),
    dilution = str_split_i(Sample, "/", 2) |> as.integer() |> replace_na(1),
    quantity = as.double(Sample),
  ) |>
  left_join(metadata, by = join_by(Sample_ID)) |>
  glimpse()
Warning: The following named parsers don't match the column names: Cq
The following named parsers don't match the column names: Cq
The following named parsers don't match the column names: Cq
The following named parsers don't match the column names: Cq
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `quantity = as.double(Sample)`.
Caused by warning:
! NAs introduced by coercion
Rows: 14,760
Columns: 23
$ Well            <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ well_row        <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
$ well_col        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ `Well Position` <chr> "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", …
$ `Cycle Number`  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
$ Target          <chr> "Cov2", "Cov2", "Cov2", "Cov2", "Cov2", "Cov2", "Cov2"…
$ Rn              <dbl> 0.8770601, 0.8700446, 0.8562693, 0.8477768, 0.8438748,…
$ dRn             <dbl> 3.107913e-02, 2.618482e-02, 1.453075e-02, 8.159439e-03…
$ Omit            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE…
$ plate           <chr> "2023-09-14_Cov2_extractions", "2023-09-14_Cov2_extrac…
$ Sample          <chr> "1A", "1A", "1A", "1A", "1A", "1A", "1A", "1A", "1A", …
$ replicate       <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",…
$ Sample_ID       <chr> "1A", "1A", "1A", "1A", "1A", "1A", "1A", "1A", "1A", …
$ dilution        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ quantity        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Extraction_kit  <chr> "Zymo quick-RNA", "Zymo quick-RNA", "Zymo quick-RNA", …
$ Short_kit       <chr> "1_ZR", "1_ZR", "1_ZR", "1_ZR", "1_ZR", "1_ZR", "1_ZR"…
$ `Kit Batch`     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ Elution_volume  <dbl> 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30…
$ Elution_format  <chr> "15*2", "15*2", "15*2", "15*2", "15*2", "15*2", "15*2"…
$ Brand           <chr> "Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Zymo"…
$ NA_Target       <chr> "RNA", "RNA", "RNA", "RNA", "RNA", "RNA", "RNA", "RNA"…
$ FP_sampleID     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

Amplification curves

amp_data |>
  filter(!is.na(Extraction_kit)) |>
  ggplot(mapping = aes(
    x = `Cycle Number`,
    y = dRn,
    color = as.factor(dilution),
    group = Well
  )) +
  geom_line() +
  facet_grid(
    cols = vars(Extraction_kit), rows = vars(Target), scales = "free_y"
  )

amp_data |>
  filter(!is.na(quantity)) |>
  ggplot(mapping = aes(
    x = `Cycle Number`,
    y = dRn,
    color = as.factor(quantity),
    group = interaction(plate, Well),
  )) +
  geom_line() +
  facet_wrap(~Target, scales = "free_y")

Standard ruler

Maximum efficiency means doubling every cycle. The standard curve points here represent 10x dilutions. We can compare the amplification curves for the standard curve samples against a standard ruler: a set of idealized curves \(y(c) = y_0 2^c\) with \(y_0\) spaced \(10\times\) apart.

plot_amp <- function(data, color) {
  ggplot(data, aes(x = `Cycle Number`, y = dRn)) +
    geom_line(mapping = aes(
      color = as.factor({{ color }}),
      group = interaction(plate, Well),
    )) +
    scale_y_log10(limits = c(1e-3, 1e1))
}

ruler <- function(y0_from, num_rules) {
  y0 <- 10^seq(from = y0_from, by = -1, length.out = num_rules)
  rules <- crossing(`Cycle Number` = amp_data$`Cycle Number`, y0 = y0) |>
    mutate(dRn = y0 * 2^`Cycle Number`)
  geom_line(
    data = rules,
    mapping = aes(group = y0),
    color = "black"
  )
}

plot_amp_with_ruler <- function(target, y0_from, num_rules) {
  amp_data |>
    filter(!is.na(quantity), Target == target) |>
    plot_amp(quantity) +
    ruler(y0_from, num_rules) +
    labs(title = target)
}
plot_amp_with_ruler("16S", -6, 5)
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 11 rows containing missing values (`geom_line()`).
Warning: Removed 133 rows containing missing values (`geom_line()`).

plot_amp_with_ruler("CrA", -5, 5)
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 25 rows containing missing values (`geom_line()`).
Warning: Removed 133 rows containing missing values (`geom_line()`).

plot_amp_with_ruler("Noro", -8, 5)
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 53 rows containing missing values (`geom_line()`).
Warning: Removed 136 rows containing missing values (`geom_line()`).

plot_amp_with_ruler("Cov2", -9, 5)
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 93 rows containing missing values (`geom_line()`).
Warning: Removed 142 rows containing missing values (`geom_line()`).

dilution_kits <- c(
  "Invitrogen PureLink RNA",
  "QIAamp Viral RNA mini kit",
  "Qiagen AllPrep PowerViral DNA/RNA",
  "Zymo quick-RNA"
)
target <- "Noro"
amp_data |>
  filter(Target == target, is.element(Extraction_kit, dilution_kits)) |>
  plot_amp(dilution) +
  ruler(-9, 3) +
  facet_wrap(~Extraction_kit) +
  labs(title = target)
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 118 rows containing missing values (`geom_line()`).
Warning: Removed 80 rows containing missing values (`geom_line()`).

target <- "CrA"
amp_data |>
  filter(Target == target, is.element(Extraction_kit, dilution_kits)) |>
  plot_amp(dilution) +
  ruler(-7, 3) +
  facet_wrap(~Extraction_kit) +
  labs(title = target)
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 76 rows containing missing values (`geom_line()`).
Warning: Removed 80 rows containing missing values (`geom_line()`).