2023-09-13 Extraction Experiment 2 qPCR Analysis

Author

Dan Rice

Published

September 15, 2023

Objectives

Testing the efficacy of RNA extraction kits in settled solid samples. See experiment google doc.

Preliminary work

  • Olivia put the .eds files in NAO qPCR data/Olivia on Google Drive and shared the folder with Dan.
  • Google Drive for desktop only syncs shared drives, not shared folders in other drives, so Dan figured out a work around. He made a shortcut to the shared folder in his own google drive so it would sync locally.
  • Opened up the .eds files in Design and Analysis locally and exported to .csv and saved those in the airport experiment folder on the main google drive.
  • Found a Google Sheet with metadata, downloaded as CSV and added the CSV back to the Google Drive

Data import

library(readr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(purrr)
library(stringr)
library(ggplot2)
library(tidyr)
library(broom)
data_dir <-
  "~/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/"
filename_pattern <- "Results"
col_types <- list(
  Target = col_character(),
  Cq = col_double()
)
raw_data <- list.files(
  paste0(data_dir, "qpcr"),
  pattern = filename_pattern,
  full.names = TRUE,
) |>
  print() |>
  map(function(f) {
    read_csv(f,
      skip = 23,
      col_types = col_types,
    )
  }) |>
  list_rbind()
[1] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-14_Cov2_extractions_Results_20230915_100200.csv"
[2] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-14_Noro_Extractions_Results_20230915_100255.csv"
[3] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-15_CrA_extractions_Results_20230915_105855.csv" 
[4] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-18_16S_extractions_Results_20230918_151203.csv" 
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
print(raw_data)
# A tibble: 369 × 21
    Well `Well Position` Omit  Sample Target Task     Reporter Quencher
   <dbl> <chr>           <lgl> <chr>  <chr>  <chr>    <chr>    <chr>   
 1     1 A1              FALSE 1A     Cov2   UNKNOWN  FAM      NFQ-MGB 
 2     2 A2              FALSE 1A     Cov2   UNKNOWN  FAM      NFQ-MGB 
 3     3 A3              FALSE 1A     Cov2   UNKNOWN  FAM      NFQ-MGB 
 4     4 A4              FALSE 3C     Cov2   UNKNOWN  FAM      NFQ-MGB 
 5     5 A5              FALSE 3C     Cov2   UNKNOWN  FAM      NFQ-MGB 
 6     6 A6              FALSE 3C     Cov2   UNKNOWN  FAM      NFQ-MGB 
 7     7 A7              FALSE 6B     Cov2   UNKNOWN  FAM      NFQ-MGB 
 8     8 A8              FALSE 6B     Cov2   UNKNOWN  FAM      NFQ-MGB 
 9     9 A9              FALSE 6B     Cov2   UNKNOWN  FAM      NFQ-MGB 
10    10 A10             FALSE 1000.0 Cov2   STANDARD FAM      NFQ-MGB 
# ℹ 359 more rows
# ℹ 13 more variables: `Amp Status` <chr>, `Amp Score` <dbl>,
#   `Curve Quality` <lgl>, `Result Quality Issues` <lgl>, Cq <dbl>,
#   `Cq Confidence` <dbl>, `Cq Mean` <dbl>, `Cq SD` <dbl>,
#   `Auto Threshold` <lgl>, Threshold <dbl>, `Auto Baseline` <lgl>,
#   `Baseline Start` <dbl>, `Baseline End` <dbl>

There were some wells mixed up between the intended plate layout and the actual plate for Norovirus. See Results doc.

corrected_samples <- tribble(
  ~`Well Position`, ~Sample,
  "F7",             "empty",
  "F8",             "empty",
  "F9",             "empty",
  "F10",            "empty",
  "F11",            "empty",
  "F12",            "empty",
  "G7",             "6C/100",
  "G8",             "6C/10",
  "G9",             "7C/100",
  "G10",            "7C/10",
  "G11",            "3C/100",
  "G12",            "3C/10",
  "H7",             "1C/100",
  "H8",             "1C/10",
  "H9",             "1C/100",
  "H10",            "1C/10",
  "H11",            "NTC",
  "H12",            "NTC",
)
corrected_samples$Target <- "Noro"
print(corrected_samples)
# A tibble: 18 × 3
   `Well Position` Sample Target
   <chr>           <chr>  <chr> 
 1 F7              empty  Noro  
 2 F8              empty  Noro  
 3 F9              empty  Noro  
 4 F10             empty  Noro  
 5 F11             empty  Noro  
 6 F12             empty  Noro  
 7 G7              6C/100 Noro  
 8 G8              6C/10  Noro  
 9 G9              7C/100 Noro  
10 G10             7C/10  Noro  
11 G11             3C/100 Noro  
12 G12             3C/10  Noro  
13 H7              1C/100 Noro  
14 H8              1C/10  Noro  
15 H9              1C/100 Noro  
16 H10             1C/10  Noro  
17 H11             NTC    Noro  
18 H12             NTC    Noro  
corrected_raw_data <- raw_data |>
  left_join(corrected_samples, by = join_by(`Well Position`, Target)) |>
  mutate(Sample = ifelse(is.na(Sample.y), Sample.x, Sample.y)) |>
  print()
# A tibble: 369 × 23
    Well `Well Position` Omit  Sample.x Target Task     Reporter Quencher
   <dbl> <chr>           <lgl> <chr>    <chr>  <chr>    <chr>    <chr>   
 1     1 A1              FALSE 1A       Cov2   UNKNOWN  FAM      NFQ-MGB 
 2     2 A2              FALSE 1A       Cov2   UNKNOWN  FAM      NFQ-MGB 
 3     3 A3              FALSE 1A       Cov2   UNKNOWN  FAM      NFQ-MGB 
 4     4 A4              FALSE 3C       Cov2   UNKNOWN  FAM      NFQ-MGB 
 5     5 A5              FALSE 3C       Cov2   UNKNOWN  FAM      NFQ-MGB 
 6     6 A6              FALSE 3C       Cov2   UNKNOWN  FAM      NFQ-MGB 
 7     7 A7              FALSE 6B       Cov2   UNKNOWN  FAM      NFQ-MGB 
 8     8 A8              FALSE 6B       Cov2   UNKNOWN  FAM      NFQ-MGB 
 9     9 A9              FALSE 6B       Cov2   UNKNOWN  FAM      NFQ-MGB 
10    10 A10             FALSE 1000.0   Cov2   STANDARD FAM      NFQ-MGB 
# ℹ 359 more rows
# ℹ 15 more variables: `Amp Status` <chr>, `Amp Score` <dbl>,
#   `Curve Quality` <lgl>, `Result Quality Issues` <lgl>, Cq <dbl>,
#   `Cq Confidence` <dbl>, `Cq Mean` <dbl>, `Cq SD` <dbl>,
#   `Auto Threshold` <lgl>, Threshold <dbl>, `Auto Baseline` <lgl>,
#   `Baseline Start` <dbl>, `Baseline End` <dbl>, Sample.y <chr>, Sample <chr>
corrected_raw_data |>
  count(Target, Sample) |>
  print(n = Inf)
# A tibble: 131 × 3
    Target Sample                    n
    <chr>  <chr>                 <int>
  1 16S    0.0034500000000000004     3
  2 16S    0.0345                    3
  3 16S    0.34500000000000003       3
  4 16S    1A                        3
  5 16S    1B                        3
  6 16S    1C                        3
  7 16S    2A                        3
  8 16S    2B                        3
  9 16S    2C                        3
 10 16S    3.45                      3
 11 16S    3.4500000000000004E-4     3
 12 16S    3A                        3
 13 16S    3B                        3
 14 16S    3C                        3
 15 16S    4A                        3
 16 16S    4B                        3
 17 16S    4C                        3
 18 16S    5A                        3
 19 16S    5B                        3
 20 16S    5C                        3
 21 16S    6A                        3
 22 16S    6B                        3
 23 16S    6C                        3
 24 16S    7A                        3
 25 16S    7B                        3
 26 16S    7C                        3
 27 16S    <NA>                      3
 28 Cov2   0.010000000000000002      3
 29 Cov2   0.1                       3
 30 Cov2   1.0                       3
 31 Cov2   10.0                      3
 32 Cov2   100.0                     3
 33 Cov2   1000.0                    3
 34 Cov2   1A                        3
 35 Cov2   1B                        3
 36 Cov2   1C                        3
 37 Cov2   2A                        3
 38 Cov2   2B                        3
 39 Cov2   2C                        3
 40 Cov2   3A                        3
 41 Cov2   3B                        3
 42 Cov2   3C                        3
 43 Cov2   3C/10                     2
 44 Cov2   3C/100                    2
 45 Cov2   4A                        3
 46 Cov2   4B                        3
 47 Cov2   4C                        3
 48 Cov2   5A                        3
 49 Cov2   5B                        3
 50 Cov2   5C                        3
 51 Cov2   6A                        3
 52 Cov2   6B                        3
 53 Cov2   6C                        3
 54 Cov2   6C/10                     2
 55 Cov2   6C/100                    2
 56 Cov2   7A                        3
 57 Cov2   7B                        3
 58 Cov2   7C                        3
 59 Cov2   7C/10                     2
 60 Cov2   7C/100                    2
 61 Cov2   <NA>                      3
 62 CrA    1A                        3
 63 CrA    1B                        3
 64 CrA    1C                        3
 65 CrA    1C/10                     2
 66 CrA    2A                        3
 67 CrA    2B                        3
 68 CrA    2C                        3
 69 CrA    3.0E7                     3
 70 CrA    3000.0                    3
 71 CrA    30000.0                   3
 72 CrA    300000.0                  3
 73 CrA    3000000.0                 3
 74 CrA    3A                        3
 75 CrA    3B                        3
 76 CrA    3C                        3
 77 CrA    3C/10                     2
 78 CrA    3C/100                    2
 79 CrA    4A                        3
 80 CrA    4B                        3
 81 CrA    4C                        3
 82 CrA    5A                        3
 83 CrA    5B                        3
 84 CrA    5C                        3
 85 CrA    6A                        3
 86 CrA    6B                        3
 87 CrA    6C                        3
 88 CrA    6C/10                     2
 89 CrA    6C/100                    2
 90 CrA    7A                        3
 91 CrA    7B                        3
 92 CrA    7C                        3
 93 CrA    7C/10                     2
 94 CrA    7C/100                    4
 95 CrA    <NA>                      2
 96 Noro   1.0                       3
 97 Noro   10.0                      3
 98 Noro   100.0                     3
 99 Noro   1000.0                    3
100 Noro   10000.0                   3
101 Noro   1A                        3
102 Noro   1B                        3
103 Noro   1C                        3
104 Noro   1C/10                     2
105 Noro   1C/100                    2
106 Noro   2A                        3
107 Noro   2B                        3
108 Noro   2C                        3
109 Noro   3A                        3
110 Noro   3B                        3
111 Noro   3C                        3
112 Noro   3C/10                     1
113 Noro   3C/100                    1
114 Noro   4A                        3
115 Noro   4B                        3
116 Noro   4C                        3
117 Noro   5A                        3
118 Noro   5B                        3
119 Noro   5C                        3
120 Noro   6A                        3
121 Noro   6B                        3
122 Noro   6C                        3
123 Noro   6C/10                     1
124 Noro   6C/100                    1
125 Noro   7A                        3
126 Noro   7B                        3
127 Noro   7C                        3
128 Noro   7C/10                     1
129 Noro   7C/100                    1
130 Noro   NTC                       2
131 Noro   empty                     6
metadata_file <- paste0(
  data_dir,
  "[2023-09-11] Extraction Experiment 2 templates and results",
  " - sampleMetadata.csv"
)
metadata <- read_csv(metadata_file)
Rows: 21 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Sample_ID, Extraction_kit, Short_kit, Elution_format, Brand, NA_Target
dbl (2): Kit Batch, Elution_volume
lgl (1): FP_sampleID

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(metadata)
Rows: 21
Columns: 9
$ Sample_ID      <chr> "1A", "1B", "1C", "2A", "2B", "2C", "3A", "3B", "3C", "…
$ Extraction_kit <chr> "Zymo quick-RNA", "Zymo quick-RNA", "Zymo quick-RNA", "…
$ Short_kit      <chr> "1_ZR", "1_ZR", "1_ZR", "2_ZD", "2_ZD", "2_ZD", "3_IPL"…
$ `Kit Batch`    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3…
$ Elution_volume <dbl> 30, 30, 30, 100, 100, 100, 100, 100, 100, 60, 60, 60, 1…
$ Elution_format <chr> "15*2", "15*2", "15*2", "50*2", "50*2", "50*2", "50*2",…
$ Brand          <chr> "Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Invitr…
$ NA_Target      <chr> "RNA", "RNA", "RNA", "DNA+RNA", "DNA+RNA", "DNA+RNA", "…
$ FP_sampleID    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
tidy_data <- corrected_raw_data |>
  mutate(
    Sample_ID = str_split_i(Sample, "/", 1),
    dilution = as.integer(str_split_i(Sample, "/", 2)),
    replicate = str_extract(Sample_ID, "[A-Z]$"),
    quantity = as.double(Sample),
  ) |>
  mutate(dilution = replace_na(dilution, 1)) |>
  left_join(
    metadata,
    by = join_by(Sample_ID)
  )
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `quantity = as.double(Sample)`.
Caused by warning:
! NAs introduced by coercion
glimpse(tidy_data)
Rows: 369
Columns: 35
$ Well                    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ `Well Position`         <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"…
$ Omit                    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
$ Sample.x                <chr> "1A", "1A", "1A", "3C", "3C", "3C", "6B", "6B"…
$ Target                  <chr> "Cov2", "Cov2", "Cov2", "Cov2", "Cov2", "Cov2"…
$ Task                    <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "U…
$ Reporter                <chr> "FAM", "FAM", "FAM", "FAM", "FAM", "FAM", "FAM…
$ Quencher                <chr> "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "N…
$ `Amp Status`            <chr> "AMP", "AMP", "AMP", "AMP", "AMP", "AMP", "NO_…
$ `Amp Score`             <dbl> 1.1701024, 1.0212594, 1.1604009, 1.1057531, 1.…
$ `Curve Quality`         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Result Quality Issues` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Cq                      <dbl> 32.44077, 35.24757, 33.72678, 34.44077, 34.704…
$ `Cq Confidence`         <dbl> 0.9451320, 0.9773032, 0.9759027, 0.9634213, 0.…
$ `Cq Mean`               <dbl> 33.80504, 33.80504, 33.80504, 34.31332, 34.313…
$ `Cq SD`                 <dbl> 1.4050310, 1.4050310, 1.4050310, 0.4683633, 0.…
$ `Auto Threshold`        <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ Threshold               <dbl> 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04…
$ `Auto Baseline`         <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ `Baseline Start`        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ `Baseline End`          <dbl> 29, 32, 30, 31, 31, 31, 31, 31, 31, 21, 21, 21…
$ Sample.y                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Sample                  <chr> "1A", "1A", "1A", "3C", "3C", "3C", "6B", "6B"…
$ Sample_ID               <chr> "1A", "1A", "1A", "3C", "3C", "3C", "6B", "6B"…
$ dilution                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ replicate               <chr> "A", "A", "A", "C", "C", "C", "B", "B", "B", N…
$ quantity                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1000, 1000…
$ Extraction_kit          <chr> "Zymo quick-RNA", "Zymo quick-RNA", "Zymo quic…
$ Short_kit               <chr> "1_ZR", "1_ZR", "1_ZR", "3_IPL", "3_IPL", "3_I…
$ `Kit Batch`             <dbl> 1, 1, 1, 1, 1, 1, 3, 3, 3, NA, NA, NA, 1, 1, 1…
$ Elution_volume          <dbl> 30, 30, 30, 100, 100, 100, 80, 80, 80, NA, NA,…
$ Elution_format          <chr> "15*2", "15*2", "15*2", "50*2", "50*2", "50*2"…
$ Brand                   <chr> "Zymo", "Zymo", "Zymo", "Invitrogen", "Invitro…
$ NA_Target               <chr> "RNA", "RNA", "RNA", "RNA", "RNA", "RNA", "RNA…
$ FP_sampleID             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
tidy_data |>
  count(Sample_ID, Target, dilution, replicate) |>
  print(n = Inf)
# A tibble: 131 × 5
    Sample_ID             Target dilution replicate     n
    <chr>                 <chr>     <int> <chr>     <int>
  1 0.0034500000000000004 16S           1 <NA>          3
  2 0.010000000000000002  Cov2          1 <NA>          3
  3 0.0345                16S           1 <NA>          3
  4 0.1                   Cov2          1 <NA>          3
  5 0.34500000000000003   16S           1 <NA>          3
  6 1.0                   Cov2          1 <NA>          3
  7 1.0                   Noro          1 <NA>          3
  8 10.0                  Cov2          1 <NA>          3
  9 10.0                  Noro          1 <NA>          3
 10 100.0                 Cov2          1 <NA>          3
 11 100.0                 Noro          1 <NA>          3
 12 1000.0                Cov2          1 <NA>          3
 13 1000.0                Noro          1 <NA>          3
 14 10000.0               Noro          1 <NA>          3
 15 1A                    16S           1 A             3
 16 1A                    Cov2          1 A             3
 17 1A                    CrA           1 A             3
 18 1A                    Noro          1 A             3
 19 1B                    16S           1 B             3
 20 1B                    Cov2          1 B             3
 21 1B                    CrA           1 B             3
 22 1B                    Noro          1 B             3
 23 1C                    16S           1 C             3
 24 1C                    Cov2          1 C             3
 25 1C                    CrA           1 C             3
 26 1C                    CrA          10 C             2
 27 1C                    Noro          1 C             3
 28 1C                    Noro         10 C             2
 29 1C                    Noro        100 C             2
 30 2A                    16S           1 A             3
 31 2A                    Cov2          1 A             3
 32 2A                    CrA           1 A             3
 33 2A                    Noro          1 A             3
 34 2B                    16S           1 B             3
 35 2B                    Cov2          1 B             3
 36 2B                    CrA           1 B             3
 37 2B                    Noro          1 B             3
 38 2C                    16S           1 C             3
 39 2C                    Cov2          1 C             3
 40 2C                    CrA           1 C             3
 41 2C                    Noro          1 C             3
 42 3.0E7                 CrA           1 <NA>          3
 43 3.45                  16S           1 <NA>          3
 44 3.4500000000000004E-4 16S           1 <NA>          3
 45 3000.0                CrA           1 <NA>          3
 46 30000.0               CrA           1 <NA>          3
 47 300000.0              CrA           1 <NA>          3
 48 3000000.0             CrA           1 <NA>          3
 49 3A                    16S           1 A             3
 50 3A                    Cov2          1 A             3
 51 3A                    CrA           1 A             3
 52 3A                    Noro          1 A             3
 53 3B                    16S           1 B             3
 54 3B                    Cov2          1 B             3
 55 3B                    CrA           1 B             3
 56 3B                    Noro          1 B             3
 57 3C                    16S           1 C             3
 58 3C                    Cov2          1 C             3
 59 3C                    Cov2         10 C             2
 60 3C                    Cov2        100 C             2
 61 3C                    CrA           1 C             3
 62 3C                    CrA          10 C             2
 63 3C                    CrA         100 C             2
 64 3C                    Noro          1 C             3
 65 3C                    Noro         10 C             1
 66 3C                    Noro        100 C             1
 67 4A                    16S           1 A             3
 68 4A                    Cov2          1 A             3
 69 4A                    CrA           1 A             3
 70 4A                    Noro          1 A             3
 71 4B                    16S           1 B             3
 72 4B                    Cov2          1 B             3
 73 4B                    CrA           1 B             3
 74 4B                    Noro          1 B             3
 75 4C                    16S           1 C             3
 76 4C                    Cov2          1 C             3
 77 4C                    CrA           1 C             3
 78 4C                    Noro          1 C             3
 79 5A                    16S           1 A             3
 80 5A                    Cov2          1 A             3
 81 5A                    CrA           1 A             3
 82 5A                    Noro          1 A             3
 83 5B                    16S           1 B             3
 84 5B                    Cov2          1 B             3
 85 5B                    CrA           1 B             3
 86 5B                    Noro          1 B             3
 87 5C                    16S           1 C             3
 88 5C                    Cov2          1 C             3
 89 5C                    CrA           1 C             3
 90 5C                    Noro          1 C             3
 91 6A                    16S           1 A             3
 92 6A                    Cov2          1 A             3
 93 6A                    CrA           1 A             3
 94 6A                    Noro          1 A             3
 95 6B                    16S           1 B             3
 96 6B                    Cov2          1 B             3
 97 6B                    CrA           1 B             3
 98 6B                    Noro          1 B             3
 99 6C                    16S           1 C             3
100 6C                    Cov2          1 C             3
101 6C                    Cov2         10 C             2
102 6C                    Cov2        100 C             2
103 6C                    CrA           1 C             3
104 6C                    CrA          10 C             2
105 6C                    CrA         100 C             2
106 6C                    Noro          1 C             3
107 6C                    Noro         10 C             1
108 6C                    Noro        100 C             1
109 7A                    16S           1 A             3
110 7A                    Cov2          1 A             3
111 7A                    CrA           1 A             3
112 7A                    Noro          1 A             3
113 7B                    16S           1 B             3
114 7B                    Cov2          1 B             3
115 7B                    CrA           1 B             3
116 7B                    Noro          1 B             3
117 7C                    16S           1 C             3
118 7C                    Cov2          1 C             3
119 7C                    Cov2         10 C             2
120 7C                    Cov2        100 C             2
121 7C                    CrA           1 C             3
122 7C                    CrA          10 C             2
123 7C                    CrA         100 C             4
124 7C                    Noro          1 C             3
125 7C                    Noro         10 C             1
126 7C                    Noro        100 C             1
127 NTC                   Noro          1 C             2
128 empty                 Noro          1 <NA>          6
129 <NA>                  16S           1 <NA>          3
130 <NA>                  Cov2          1 <NA>          3
131 <NA>                  CrA           1 <NA>          2
tidy_data |>
  count(`Result Quality Issues`)
# A tibble: 1 × 2
  `Result Quality Issues`     n
  <lgl>                   <int>
1 NA                        369

Kit comparison

With equal Cq axes

tidy_data |>
  filter(
    !is.na(Extraction_kit),
    dilution == 1,
  ) |>
  ggplot(mapping = aes(
    x = Cq,
    y = Extraction_kit,
    color = replicate,
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  facet_wrap(facets = ~Target)
Warning: Removed 4 rows containing non-finite values (`stat_summary()`).

With free Cq axes

tidy_data |>
  filter(
    !is.na(Extraction_kit),
    dilution == 1,
  ) |>
  ggplot(mapping = aes(
    x = Cq,
    y = Extraction_kit,
    color = replicate,
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  facet_wrap(facets = ~Target, scales = "free_x")
Warning: Removed 4 rows containing non-finite values (`stat_summary()`).

Check that the overlapping points really do have three values

tidy_data |>
  filter(!is.na(Extraction_kit), is.na(Cq)) |>
  count(Target, Extraction_kit, Sample_ID, dilution) |>
  print()
# A tibble: 9 × 5
  Target Extraction_kit                    Sample_ID dilution     n
  <chr>  <chr>                             <chr>        <int> <int>
1 Cov2   Invitrogen PureLink RNA           3C              10     2
2 Cov2   Invitrogen PureLink RNA           3C             100     2
3 Cov2   NucleoSpin Virus                  4B               1     2
4 Cov2   QIAamp Viral RNA mini kit         6A               1     1
5 Cov2   QIAamp Viral RNA mini kit         6C              10     1
6 Cov2   QIAamp Viral RNA mini kit         6C             100     2
7 Cov2   Qiagen AllPrep PowerViral DNA/RNA 7A               1     1
8 Cov2   Qiagen AllPrep PowerViral DNA/RNA 7C              10     2
9 Cov2   Qiagen AllPrep PowerViral DNA/RNA 7C             100     2
tidy_data |>
  filter(
    Extraction_kit == "Zymo quick-RNA",
    Target == "CrA", replicate == "A"
  ) |>
  print(width = Inf)
# A tibble: 3 × 35
   Well `Well Position` Omit  Sample.x Target Task    Reporter Quencher
  <dbl> <chr>           <lgl> <chr>    <chr>  <chr>   <chr>    <chr>   
1     1 A1              FALSE 1A       CrA    UNKNOWN FAM      NFQ-MGB 
2     2 A2              FALSE 1A       CrA    UNKNOWN FAM      NFQ-MGB 
3     3 A3              FALSE 1A       CrA    UNKNOWN FAM      NFQ-MGB 
  `Amp Status` `Amp Score` `Curve Quality` `Result Quality Issues`    Cq
  <chr>              <dbl> <lgl>           <lgl>                   <dbl>
1 AMP                 1.41 NA              NA                       21.1
2 AMP                 1.43 NA              NA                       21.0
3 AMP                 1.42 NA              NA                       21.1
  `Cq Confidence` `Cq Mean` `Cq SD` `Auto Threshold` Threshold `Auto Baseline`
            <dbl>     <dbl>   <dbl> <lgl>                <dbl> <lgl>          
1           0.989      21.1  0.0535 TRUE                 0.175 TRUE           
2           0.989      21.1  0.0535 TRUE                 0.175 TRUE           
3           0.989      21.1  0.0535 TRUE                 0.175 TRUE           
  `Baseline Start` `Baseline End` Sample.y Sample Sample_ID dilution replicate
             <dbl>          <dbl> <chr>    <chr>  <chr>        <int> <chr>    
1                3             16 <NA>     1A     1A               1 A        
2                3             15 <NA>     1A     1A               1 A        
3                3             16 <NA>     1A     1A               1 A        
  quantity Extraction_kit Short_kit `Kit Batch` Elution_volume Elution_format
     <dbl> <chr>          <chr>           <dbl>          <dbl> <chr>         
1       NA Zymo quick-RNA 1_ZR                1             30 15*2          
2       NA Zymo quick-RNA 1_ZR                1             30 15*2          
3       NA Zymo quick-RNA 1_ZR                1             30 15*2          
  Brand NA_Target FP_sampleID
  <chr> <chr>     <lgl>      
1 Zymo  RNA       NA         
2 Zymo  RNA       NA         
3 Zymo  RNA       NA         

Adjusting for elution volume

Assume the amplification efficiency is 100%, so that an increase in initial concentration by a factor of \(x\) decreases \(C_q\) by \(\log_{2}(x)\).

If a method has elution volume \(v\) and we dilute it to total volume \(V\), this reduces its final concentration by a factor \(v / V\). We can put different methods on the same footing by adding \(\log_{2}(v/V)\) to \(C_q\) (so that large elution volumes are penalized with a higher adjusted \(C_q\)).

final_volume <- 100
tidy_data |>
  mutate(elution_adjusted_Cq = Cq + log2(Elution_volume / final_volume)) |>
  filter(
    !is.na(Extraction_kit),
    dilution == 1,
  ) |>
  ggplot(mapping = aes(
    x = elution_adjusted_Cq,
    y = Extraction_kit,
    color = replicate,
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  facet_wrap(facets = ~Target, scales = "free_x")
Warning: Removed 4 rows containing non-finite values (`stat_summary()`).

Standard curves

tidy_data |>
  filter(
    Task == "STANDARD",
    dilution == 1,
  ) |>
  ggplot(mapping = aes(
    x = quantity,
    y = Cq,
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  geom_smooth(method = "lm") +
  scale_x_continuous(trans = "log10") +
  facet_wrap(facets = ~Target, scales = "free_x")
Warning: Removed 4 rows containing non-finite values (`stat_summary()`).
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).

Not sure what the units of the X-axis are

fits <- tibble()
for (target in unique(tidy_data$Target)) {
  fit <- lm(Cq ~ log10(quantity),
    data = filter(tidy_data, Task == "STANDARD", Target == target)
  ) |>
    tidy() |>
    mutate(Target = target, efficiency = 10^-(1 / estimate) - 1)
  fits <- bind_rows(fits, fit)
}
print(fits |> filter(term == "log10(quantity)"))
# A tibble: 4 × 7
  term            estimate std.error statistic  p.value Target efficiency
  <chr>              <dbl>     <dbl>     <dbl>    <dbl> <chr>       <dbl>
1 log10(quantity)    -2.61    0.143      -18.2 1.23e-10 Cov2         1.42
2 log10(quantity)    -2.85    0.108      -26.4 5.34e-12 Noro         1.24
3 log10(quantity)    -2.59    0.0703     -36.8 1.58e-14 CrA          1.43
4 log10(quantity)    -2.68    0.148      -18.1 1.32e-10 16S          1.36

These match the Design & Analysis software.

Dilution series

tidy_data |>
  filter(Task == "UNKNOWN", dilution > 1) |>
  count(Target, Extraction_kit, dilution) |>
  print(n = Inf)
# A tibble: 21 × 4
   Target Extraction_kit                    dilution     n
   <chr>  <chr>                                <int> <int>
 1 Cov2   Invitrogen PureLink RNA                 10     2
 2 Cov2   Invitrogen PureLink RNA                100     2
 3 Cov2   QIAamp Viral RNA mini kit               10     2
 4 Cov2   QIAamp Viral RNA mini kit              100     2
 5 Cov2   Qiagen AllPrep PowerViral DNA/RNA       10     2
 6 Cov2   Qiagen AllPrep PowerViral DNA/RNA      100     2
 7 CrA    Invitrogen PureLink RNA                 10     2
 8 CrA    Invitrogen PureLink RNA                100     2
 9 CrA    QIAamp Viral RNA mini kit               10     2
10 CrA    QIAamp Viral RNA mini kit              100     2
11 CrA    Qiagen AllPrep PowerViral DNA/RNA       10     2
12 CrA    Qiagen AllPrep PowerViral DNA/RNA      100     4
13 CrA    Zymo quick-RNA                          10     2
14 Noro   Invitrogen PureLink RNA                 10     1
15 Noro   Invitrogen PureLink RNA                100     1
16 Noro   QIAamp Viral RNA mini kit               10     1
17 Noro   QIAamp Viral RNA mini kit              100     1
18 Noro   Qiagen AllPrep PowerViral DNA/RNA       10     1
19 Noro   Qiagen AllPrep PowerViral DNA/RNA      100     1
20 Noro   Zymo quick-RNA                          10     1
21 Noro   Zymo quick-RNA                         100     1

Looks like we have dilution series for four kits.

dilution_kits <- c(
  "Invitrogen PureLink RNA",
  "QIAamp Viral RNA mini kit",
  "Qiagen AllPrep PowerViral DNA/RNA",
  "Zymo quick-RNA"
)
tidy_data |>
  filter(
    Task == "UNKNOWN",
    Target != "16S",
    is.element(Extraction_kit, dilution_kits)
  ) |>
  ggplot(mapping = aes(
    x = dilution,
    y = Cq,
  )) +
  geom_point(mapping = aes(color = replicate)) +
  geom_smooth(method = "lm") +
  scale_x_continuous(trans = "log10") +
  facet_grid(cols = vars(Extraction_kit), rows = vars(Target), scale = "free_y")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 13 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 13 rows containing missing values (`geom_point()`).

fits <- tibble()
for (target in unique(tidy_data$Target)) {
  for (kit in dilution_kits) {
    fit <- lm(Cq ~ log10(dilution),
      data = filter(tidy_data, Target == target, Extraction_kit == kit)
    ) |>
      tidy() |>
      mutate(
        Target = target,
        Extraction_kit = kit,
        # Changed sign because we have dilution factor not quantity
        efficiency = 10^(1 / estimate) - 1
      )
    fits <- bind_rows(fits, fit)
  }
}
print(fits |> filter(Target != "16S", term == "log10(dilution)"))
# A tibble: 12 × 8
   term  estimate std.error statistic   p.value Target Extraction_kit efficiency
   <chr>    <dbl>     <dbl>     <dbl>     <dbl> <chr>  <chr>               <dbl>
 1 log1…    NA      NA          NA    NA        Cov2   Invitrogen Pu…     NA    
 2 log1…     2.51    0.869       2.88  2.35e- 2 Cov2   QIAamp Viral …      1.51 
 3 log1…    NA      NA          NA    NA        Cov2   Qiagen AllPre…     NA    
 4 log1…    NA      NA          NA    NA        Cov2   Zymo quick-RNA     NA    
 5 log1…     2.92    0.0778     37.6   3.30e-11 Noro   Invitrogen Pu…      1.20 
 6 log1…     3.07    0.151      20.3   7.80e- 9 Noro   QIAamp Viral …      1.12 
 7 log1…     3.20    0.118      27.1   6.24e-10 Noro   Qiagen AllPre…      1.05 
 8 log1…     2.69    0.0772     34.8   1.31e-12 Noro   Zymo quick-RNA      1.35 
 9 log1…     3.59    0.205      17.5   2.24e- 9 CrA    Invitrogen Pu…      0.898
10 log1…     3.58    0.0852     42.0   1.68e-13 CrA    QIAamp Viral …      0.902
11 log1…     2.84    0.330       8.61  9.92e- 7 CrA    Qiagen AllPre…      1.25 
12 log1…     5.14    0.644       7.98  2.25e- 5 CrA    Zymo quick-RNA      0.565

Amplification curves

amp_data <- list.files(
  paste0(data_dir, "qpcr"),
  pattern = "Amplification Data",
  full.names = TRUE,
) |>
  map(function(f) {
    read_csv(f,
      skip = 23,
      col_types = col_types,
    )
  }) |>
  list_rbind() |>
  mutate(
    replicate = str_extract(Sample, "[A-Z]"),
    Sample_ID = str_split_i(Sample, "/", 1),
    dilution = as.integer(str_split_i(Sample, "/", 2)),
    quantity = as.double(Sample),
  ) |>
  mutate(dilution = replace_na(dilution, 1)) |>
  left_join(
    metadata,
    by = join_by(Sample_ID)
  )
Warning: The following named parsers don't match the column names: Cq
The following named parsers don't match the column names: Cq
The following named parsers don't match the column names: Cq
The following named parsers don't match the column names: Cq
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `quantity = as.double(Sample)`.
Caused by warning:
! NAs introduced by coercion
print(amp_data)
# A tibble: 14,760 × 20
    Well `Well Position` `Cycle Number` Target    Rn      dRn Sample Omit 
   <dbl> <chr>                    <dbl> <chr>  <dbl>    <dbl> <chr>  <lgl>
 1     1 A1                           1 Cov2   0.877  0.0311  1A     FALSE
 2     1 A1                           2 Cov2   0.870  0.0262  1A     FALSE
 3     1 A1                           3 Cov2   0.856  0.0145  1A     FALSE
 4     1 A1                           4 Cov2   0.848  0.00816 1A     FALSE
 5     1 A1                           5 Cov2   0.844  0.00638 1A     FALSE
 6     1 A1                           6 Cov2   0.838  0.00299 1A     FALSE
 7     1 A1                           7 Cov2   0.832 -0.00128 1A     FALSE
 8     1 A1                           8 Cov2   0.829 -0.00202 1A     FALSE
 9     1 A1                           9 Cov2   0.824 -0.00480 1A     FALSE
10     1 A1                          10 Cov2   0.821 -0.00546 1A     FALSE
# ℹ 14,750 more rows
# ℹ 12 more variables: replicate <chr>, Sample_ID <chr>, dilution <int>,
#   quantity <dbl>, Extraction_kit <chr>, Short_kit <chr>, `Kit Batch` <dbl>,
#   Elution_volume <dbl>, Elution_format <chr>, Brand <chr>, NA_Target <chr>,
#   FP_sampleID <lgl>
amp_data |>
  filter(!is.na(Extraction_kit)) |>
  ggplot(mapping = aes(
    x = `Cycle Number`,
    y = dRn,
    color = as.factor(dilution),
    group = Well
  )) +
  geom_line() +
  facet_grid(
    cols = vars(Extraction_kit), rows = vars(Target), scales = "free_y"
  )

Background fluorescence

raw_qpcr_data <- list.files(
  paste0(data_dir, "qpcr"),
  pattern = "_Multicomponent_",
  full.names = TRUE,
) |>
  print() |>
  map(function(f) {
    read_csv(f,
      skip = 23,
      col_types = col_types,
    )
  }) |>
  list_rbind() |>
  print()
[1] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-14_Cov2_extractions_Multicomponent_20230915_100200.csv"
[2] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-14_Noro_Extractions_Multicomponent_20230915_100255.csv"
[3] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-15_CrA_extractions_Multicomponent_20230915_105855.csv" 
[4] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-18_16S_extractions_Multicomponent_20230918_151203.csv" 
Warning: The following named parsers don't match the column names: Target, Cq
The following named parsers don't match the column names: Target, Cq
The following named parsers don't match the column names: Target, Cq
The following named parsers don't match the column names: Target, Cq
# A tibble: 14,760 × 7
    Well `Well Position` `Stage Number` `Step Number` `Cycle Number`     FAM
   <dbl> <chr>                    <dbl>         <dbl>          <dbl>   <dbl>
 1     1 A1                           2             2              1 326798.
 2     1 A1                           2             2              2 323795.
 3     1 A1                           2             2              3 318092 
 4     1 A1                           2             2              4 314307.
 5     1 A1                           2             2              5 312277.
 6     1 A1                           2             2              6 309855.
 7     1 A1                           2             2              7 307088.
 8     1 A1                           2             2              8 305802.
 9     1 A1                           2             2              9 303988.
10     1 A1                           2             2             10 302549.
# ℹ 14,750 more rows
# ℹ 1 more variable: ROX <dbl>

TODO

  • Boxplots
  • Standard curve data
  • Diluted samples
  • Amplification curves
  • Relabel flipped norovirus dilution samples
  • Investigate the threshold
  • Investigate the efficiency caluculations: compare amp curves, standard curves, and dilution series
  • Check amplification curves, including autothreshold in linear phase. (Possibly adjust)
  • Check negative controls (may need to use raw Rns rather than delta to avoid baseline subtraction issues)
  • Be aware of non-control samples with no amplification detected (no Cq values). Summary stats of which this is true for.
  • Plot baseline start and stop (determines the interval used for baseline subtraction)
  • Background fluorescence