2023-07-18 Extraction Experiment 1 qPCR Analysis

Author

Dan Rice

Published

September 20, 2023

Objectives

Testing the efficacy of RNA extraction kits in influent samples. See experiment google doc.

Preliminary work

  • Someone else already exported qPCR results as CSV
  • I copied the file meta_samples.csv and manually added elution volumes that I found in this doc and saved the results in meta_samples_with_elution_volumes.csv.

Data import

library(readr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(purrr)
library(stringr)
library(ggplot2)
library(tidyr)
library(broom)
data_dir <-
  "~/airport/[2023-07-18] Extraction-kit comparison 1: Influent/qPCR results/"
filename_pattern <- "_Results_"

Not sure why I had to change skip from 23 to 22.

col_types <- list(
  Target = col_character(),
  Cq = col_double()
)
raw_data <- list.files(
  data_dir,
  pattern = filename_pattern,
  recursive = TRUE,
  full.names = TRUE,
) |>
  print() |>
  map(function(f) {
    read_csv(f,
      skip = 22,
      col_types = col_types,
    )
  }) |>
  list_rbind()
[1] "/Users/dan/airport/[2023-07-18] Extraction-kit comparison 1: Influent/qPCR results//Cov2/2023-07-26_Cov2-kits_Results_20230728 133121.csv"        
[2] "/Users/dan/airport/[2023-07-18] Extraction-kit comparison 1: Influent/qPCR results//CrA/2023-07-26_CrA-kits_Results_20230728 135806.csv"          
[3] "/Users/dan/airport/[2023-07-18] Extraction-kit comparison 1: Influent/qPCR results//Noro/2023-07-26_Noro_kits_Results_20230728 140336.csv"        
[4] "/Users/dan/airport/[2023-07-18] Extraction-kit comparison 1: Influent/qPCR results//phagemid/2023-07-26_Phagemid-kits_Results_20230728 141050.csv"
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
print(raw_data)
# A tibble: 333 × 21
    Well `Well Position` Omit  Sample  Target Task    Reporter Quencher
   <dbl> <chr>           <lgl> <chr>   <chr>  <chr>   <chr>    <chr>   
 1     1 A1              FALSE 1_ZR_A  Cov2   UNKNOWN FAM      NFQ-MGB 
 2     2 A2              FALSE 1_ZR_B  Cov2   UNKNOWN FAM      NFQ-MGB 
 3     3 A3              FALSE 1_ZR_C  Cov2   UNKNOWN FAM      NFQ-MGB 
 4     4 A4              FALSE 2_ZDR_A Cov2   UNKNOWN FAM      NFQ-MGB 
 5     5 A5              FALSE 2_ZDR_B Cov2   UNKNOWN FAM      NFQ-MGB 
 6     6 A6              FALSE 2_ZDR_C Cov2   UNKNOWN FAM      NFQ-MGB 
 7     7 A7              FALSE 3_IP_A  Cov2   UNKNOWN FAM      NFQ-MGB 
 8     8 A8              FALSE 3_IP_B  Cov2   UNKNOWN FAM      NFQ-MGB 
 9     9 A9              FALSE 3_IP_C  Cov2   UNKNOWN FAM      NFQ-MGB 
10    10 A10             FALSE 4_NV_A  Cov2   UNKNOWN FAM      NFQ-MGB 
# ℹ 323 more rows
# ℹ 13 more variables: `Amp Status` <chr>, `Amp Score` <dbl>,
#   `Curve Quality` <lgl>, `Result Quality Issues` <lgl>, Cq <dbl>,
#   `Cq Confidence` <dbl>, `Cq Mean` <dbl>, `Cq SD` <dbl>,
#   `Auto Threshold` <lgl>, Threshold <dbl>, `Auto Baseline` <lgl>,
#   `Baseline Start` <dbl>, `Baseline End` <dbl>
raw_data |> count(Target)
# A tibble: 4 × 2
  Target     n
  <chr>  <int>
1 Cov2      71
2 CrA       95
3 Noro      71
4 Phg       96

TODO: use other metadata table

metadata_file <- paste0(data_dir, "meta_samples_with_elution_volumes.csv")
metadata <- read_csv(metadata_file)
Rows: 27 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): sample_qPCR, Kit, treatment_group, LPA, other_treatment, Virus_spec...
dbl (2): group, Elution_volume

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(metadata)
Rows: 27
Columns: 9
$ sample_qPCR     <chr> "1_ZR_A", "1_ZR_B", "1_ZR_C", "2_ZDR_A", "2_ZDR_B", "2…
$ Kit             <chr> "Zymo RNA", "Zymo RNA", "Zymo RNA", "Zymo RNA/DNA", "Z…
$ treatment_group <chr> "ZQ-RNA", "ZQ-RNA", "ZQ-RNA", "ZQ-RNADNA", "ZQ-RNADNA"…
$ LPA             <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
$ group           <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ other_treatment <chr> "Shield", "Shield", "Shield", "Shield", "Shield", "Shi…
$ Virus_specific  <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", …
$ FP_Data         <chr> "No", "Yes", "No", "No", "Yes", "No", "No", "Yes", "No…
$ Elution_volume  <dbl> 15, 15, 15, 50, 50, 50, 100, 100, 100, 30, 30, 30, 100…
tidy_data <- raw_data |>
  mutate(
    replicate = str_extract(Sample, "[A-Z]$"),
  ) |>
  left_join(
    metadata,
    by = join_by(Sample == sample_qPCR)
  )
glimpse(tidy_data)
Rows: 333
Columns: 30
$ Well                    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ `Well Position`         <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"…
$ Omit                    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
$ Sample                  <chr> "1_ZR_A", "1_ZR_B", "1_ZR_C", "2_ZDR_A", "2_ZD…
$ Target                  <chr> "Cov2", "Cov2", "Cov2", "Cov2", "Cov2", "Cov2"…
$ Task                    <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "U…
$ Reporter                <chr> "FAM", "FAM", "FAM", "FAM", "FAM", "FAM", "FAM…
$ Quencher                <chr> "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "N…
$ `Amp Status`            <chr> "Amp", "Amp", "Amp", "Amp", "Amp", "Amp", "Amp…
$ `Amp Score`             <dbl> 1.1919015, 1.2241620, 1.3026766, 1.3064751, 1.…
$ `Curve Quality`         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Result Quality Issues` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Cq                      <dbl> 33.59224, 33.85453, 32.88852, 32.32325, 35.019…
$ `Cq Confidence`         <dbl> 0.9630252, 0.9623811, 0.9919308, 0.9907208, 0.…
$ `Cq Mean`               <dbl> 33.98352, 33.89186, 33.66943, 33.92028, 34.152…
$ `Cq SD`                 <dbl> 0.55335254, 0.05279505, 1.10437933, 2.25853860…
$ `Auto Threshold`        <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ Threshold               <dbl> 0.04098764, 0.04098764, 0.04098764, 0.04098764…
$ `Auto Baseline`         <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ `Baseline Start`        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ `Baseline End`          <dbl> 31, 30, 30, 29, 32, 31, 31, 31, 30, 32, 33, 31…
$ replicate               <chr> "A", "B", "C", "A", "B", "C", "A", "B", "C", "…
$ Kit                     <chr> "Zymo RNA", "Zymo RNA", "Zymo RNA", "Zymo RNA/…
$ treatment_group         <chr> "ZQ-RNA", "ZQ-RNA", "ZQ-RNA", "ZQ-RNADNA", "ZQ…
$ LPA                     <chr> "No", "No", "No", "No", "No", "No", "No", "No"…
$ group                   <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 3, 3, 3, 3…
$ other_treatment         <chr> "Shield", "Shield", "Shield", "Shield", "Shiel…
$ Virus_specific          <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No"…
$ FP_Data                 <chr> "No", "Yes", "No", "No", "Yes", "No", "No", "Y…
$ Elution_volume          <dbl> 15, 15, 15, 50, 50, 50, 100, 100, 100, 30, 30,…
tidy_data |>
  count(Sample, Kit, LPA, Target, replicate) |>
  print(n = Inf)
# A tibble: 126 × 6
    Sample      Kit                 LPA   Target replicate     n
    <chr>       <chr>               <chr> <chr>  <chr>     <int>
  1 1.0         <NA>                <NA>  Cov2   <NA>          3
  2 1.0         <NA>                <NA>  Noro   <NA>          3
  3 10.0        <NA>                <NA>  Cov2   <NA>          3
  4 10.0        <NA>                <NA>  Noro   <NA>          3
  5 100.0       <NA>                <NA>  Cov2   <NA>          3
  6 100.0       <NA>                <NA>  Noro   <NA>          3
  7 1000.0      <NA>                <NA>  Cov2   <NA>          3
  8 1000.0      <NA>                <NA>  Noro   <NA>          3
  9 10000.0     <NA>                <NA>  Cov2   <NA>          2
 10 10000.0     <NA>                <NA>  Noro   <NA>          2
 11 1_ZR_A      Zymo RNA            No    Cov2   A             2
 12 1_ZR_A      Zymo RNA            No    CrA    A             3
 13 1_ZR_A      Zymo RNA            No    Noro   A             2
 14 1_ZR_A      Zymo RNA            No    Phg    A             3
 15 1_ZR_B      Zymo RNA            No    Cov2   B             2
 16 1_ZR_B      Zymo RNA            No    CrA    B             3
 17 1_ZR_B      Zymo RNA            No    Noro   B             2
 18 1_ZR_B      Zymo RNA            No    Phg    B             3
 19 1_ZR_C      Zymo RNA            No    Cov2   C             2
 20 1_ZR_c      <NA>                <NA>  CrA    <NA>          3
 21 1_ZR_c      <NA>                <NA>  Noro   <NA>          2
 22 1_ZR_c      <NA>                <NA>  Phg    <NA>          3
 23 20          <NA>                <NA>  CrA    <NA>          3
 24 200         <NA>                <NA>  CrA    <NA>          3
 25 2000        <NA>                <NA>  CrA    <NA>          3
 26 20000       <NA>                <NA>  CrA    <NA>          3
 27 2_ZDR_A     Zymo RNA/DNA        No    Cov2   A             2
 28 2_ZDR_A     Zymo RNA/DNA        No    CrA    A             3
 29 2_ZDR_A     Zymo RNA/DNA        No    Noro   A             2
 30 2_ZDR_A     Zymo RNA/DNA        No    Phg    A             3
 31 2_ZDR_B     Zymo RNA/DNA        No    Cov2   B             2
 32 2_ZDR_B     Zymo RNA/DNA        No    CrA    B             2
 33 2_ZDR_B     Zymo RNA/DNA        No    Noro   B             2
 34 2_ZDR_B     Zymo RNA/DNA        No    Phg    B             3
 35 2_ZDR_C     Zymo RNA/DNA        No    Cov2   C             2
 36 2_ZDR_C     Zymo RNA/DNA        No    CrA    C             3
 37 2_ZDR_C     Zymo RNA/DNA        No    Noro   C             2
 38 2_ZDR_C     Zymo RNA/DNA        No    Phg    C             3
 39 3_IP_A      Invitrogen Purelink No    Cov2   A             2
 40 3_IP_A      Invitrogen Purelink No    CrA    A             3
 41 3_IP_A      Invitrogen Purelink No    Noro   A             2
 42 3_IP_A      Invitrogen Purelink No    Phg    A             3
 43 3_IP_B      Invitrogen Purelink No    Cov2   B             2
 44 3_IP_B      Invitrogen Purelink No    CrA    B             3
 45 3_IP_B      Invitrogen Purelink No    Noro   B             2
 46 3_IP_B      Invitrogen Purelink No    Phg    B             3
 47 3_IP_C      Invitrogen Purelink No    Cov2   C             2
 48 3_IP_C      Invitrogen Purelink No    CrA    C             3
 49 3_IP_C      Invitrogen Purelink No    Noro   C             2
 50 3_IP_C      Invitrogen Purelink No    Phg    C             3
 51 4_NV_A      Nucleospin Virus    No    Cov2   A             2
 52 4_NV_A      Nucleospin Virus    No    CrA    A             3
 53 4_NV_A      Nucleospin Virus    No    Noro   A             2
 54 4_NV_A      Nucleospin Virus    No    Phg    A             3
 55 4_NV_B      Nucleospin Virus    No    Cov2   B             2
 56 4_NV_B      Nucleospin Virus    No    CrA    B             3
 57 4_NV_B      Nucleospin Virus    No    Noro   B             2
 58 4_NV_B      Nucleospin Virus    No    Phg    B             3
 59 4_NV_C      Nucleospin Virus    No    Cov2   C             2
 60 4_NV_C      Nucleospin Virus    No    CrA    C             3
 61 4_NV_C      Nucleospin Virus    No    Noro   C             2
 62 4_NV_C      Nucleospin Virus    No    Phg    C             3
 63 5_QM_A      QIAamp MinElute     No    Cov2   A             2
 64 5_QM_A      QIAamp MinElute     No    CrA    A             3
 65 5_QM_A      QIAamp MinElute     No    Noro   A             2
 66 5_QM_A      QIAamp MinElute     No    Phg    A             3
 67 5_QM_B      QIAamp MinElute     No    Cov2   B             2
 68 5_QM_B      QIAamp MinElute     No    CrA    B             3
 69 5_QM_B      QIAamp MinElute     No    Noro   B             2
 70 5_QM_B      QIAamp MinElute     No    Phg    B             3
 71 5_QM_C      QIAamp MinElute     No    Cov2   C             2
 72 5_QM_C      QIAamp MinElute     No    CrA    C             3
 73 5_QM_C      QIAamp MinElute     No    Noro   C             2
 74 5_QM_C      QIAamp MinElute     No    Phg    C             3
 75 6_QVR_A     QIAamp Viral RNA    No    Cov2   A             2
 76 6_QVR_A     QIAamp Viral RNA    No    CrA    A             3
 77 6_QVR_A     QIAamp Viral RNA    No    Noro   A             2
 78 6_QVR_A     QIAamp Viral RNA    No    Phg    A             3
 79 6_QVR_B     QIAamp Viral RNA    No    Cov2   B             2
 80 6_QVR_B     QIAamp Viral RNA    No    CrA    B             3
 81 6_QVR_B     QIAamp Viral RNA    No    Noro   B             2
 82 6_QVR_B     QIAamp Viral RNA    No    Phg    B             3
 83 6_QVR_C     QIAamp Viral RNA    No    Cov2   C             2
 84 6_QVR_C     QIAamp Viral RNA    No    CrA    C             3
 85 6_QVR_C     QIAamp Viral RNA    No    Noro   C             2
 86 6_QVR_C     QIAamp Viral RNA    No    Phg    C             3
 87 7_NV-car_A  Nucleospin Virus    Yes   Cov2   A             2
 88 7_NV-car_A  Nucleospin Virus    Yes   CrA    A             3
 89 7_NV-car_A  Nucleospin Virus    Yes   Noro   A             2
 90 7_NV-car_A  Nucleospin Virus    Yes   Phg    A             3
 91 7_NV-car_B  Nucleospin Virus    Yes   Cov2   B             2
 92 7_NV-car_B  Nucleospin Virus    Yes   CrA    B             3
 93 7_NV-car_B  Nucleospin Virus    Yes   Noro   B             2
 94 7_NV-car_B  Nucleospin Virus    Yes   Phg    B             3
 95 7_NV-car_C  Nucleospin Virus    Yes   Cov2   C             2
 96 7_NV-car_C  Nucleospin Virus    Yes   CrA    C             3
 97 7_NV-car_C  Nucleospin Virus    Yes   Noro   C             2
 98 7_NV-car_C  Nucleospin Virus    Yes   Phg    C             3
 99 8_QM-car_A  QIAamp MinElute     Yes   Cov2   A             2
100 8_QM-car_A  QIAamp MinElute     Yes   CrA    A             3
101 8_QM-car_A  QIAamp MinElute     Yes   Noro   A             2
102 8_QM-car_A  QIAamp MinElute     Yes   Phg    A             3
103 8_QM-car_B  QIAamp MinElute     Yes   Cov2   B             2
104 8_QM-car_B  QIAamp MinElute     Yes   CrA    B             3
105 8_QM-car_B  QIAamp MinElute     Yes   Noro   B             2
106 8_QM-car_B  QIAamp MinElute     Yes   Phg    B             3
107 8_QM-car_C  QIAamp MinElute     Yes   Cov2   C             2
108 8_QM-car_C  QIAamp MinElute     Yes   CrA    C             3
109 8_QM-car_C  QIAamp MinElute     Yes   Noro   C             2
110 8_QM-car_C  QIAamp MinElute     Yes   Phg    C             3
111 9_QVR-car_A QIAamp Viral RNA    Yes   Cov2   A             2
112 9_QVR-car_A QIAamp Viral RNA    Yes   CrA    A             3
113 9_QVR-car_A QIAamp Viral RNA    Yes   Noro   A             2
114 9_QVR-car_A QIAamp Viral RNA    Yes   Phg    A             3
115 9_QVR-car_B QIAamp Viral RNA    Yes   Cov2   B             2
116 9_QVR-car_B QIAamp Viral RNA    Yes   CrA    B             3
117 9_QVR-car_B QIAamp Viral RNA    Yes   Noro   B             2
118 9_QVR-car_B QIAamp Viral RNA    Yes   Phg    B             3
119 9_QVR-car_C QIAamp Viral RNA    Yes   Cov2   C             2
120 9_QVR-car_C QIAamp Viral RNA    Yes   CrA    C             3
121 9_QVR-car_C QIAamp Viral RNA    Yes   Noro   C             2
122 9_QVR-car_C QIAamp Viral RNA    Yes   Phg    C             3
123 <NA>        <NA>                <NA>  Cov2   <NA>          3
124 <NA>        <NA>                <NA>  CrA    <NA>          3
125 <NA>        <NA>                <NA>  Noro   <NA>          3
126 <NA>        <NA>                <NA>  Phg    <NA>         15

Kit comparison

tidy_data |>
  filter(!is.na(Kit)) |>
  ggplot(mapping = aes(
    x = Cq,
    y = Kit,
    color = interaction(replicate, LPA),
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  facet_wrap(facets = ~Target, scales = "free_x")
Warning: Removed 15 rows containing non-finite values (`stat_summary()`).

Adjusting for elution volume

Assume the amplification efficiency is 100%, so that an increase in initial concentration by a factor of \(x\) decreases \(C_q\) by \(\log_{2}(x)\).

If a method has elution volume \(v\) and we dilute it to total volume \(V\), this reduces its final concentration by a factor \(v / V\). We can put different methods on the same footing by adding \(\log_{2}(v/V)\) to \(C_q\) (so that large elution volumes are penalized with a higher adjusted \(C_q\)).

final_volume <- 100
tidy_data |>
  mutate(elution_adjusted_Cq = Cq + log2(Elution_volume / final_volume)) |>
  filter(!is.na(Kit)) |>
  ggplot(mapping = aes(
    x = elution_adjusted_Cq,
    y = Kit,
    color = interaction(replicate, LPA),
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  facet_wrap(facets = ~Target, scales = "free_x")
Warning: Removed 15 rows containing non-finite values (`stat_summary()`).