2023-09-13 Extraction Experiment 2 qPCR Analysis

Author

Dan Rice

Published

September 15, 2023

Objectives

Testing the efficacy of RNA extraction kits in settled solid samples. See experiment google doc.

Preliminary work

Olivia put the .eds files in NAO qPCR data/Olivia on Google Drive and shared the folder with Dan.
Google Drive for desktop only syncs shared drives, not shared folders in other drives, so Dan figured out a work around. He made a shortcut to the shared folder in his own google drive so it would sync locally.
Opened up the .eds files in Design and Analysis locally and exported to .csv and saved those in the airport experiment folder on the main google drive.
Found a Google Sheet with metadata, downloaded as CSV and added the CSV back to the Google Drive

Data import

library(readr)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(purrr)
library(stringr)
library(ggplot2)
library(tidyr)
library(broom)

data_dir <-
  "~/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/"
filename_pattern <- "Results"

col_types <- list(
  Target = col_character(),
  Cq = col_double()
)
raw_data <- list.files(
  paste0(data_dir, "qpcr"),
  pattern = filename_pattern,
  full.names = TRUE,
) |>
  print() |>
  map(function(f) {
    read_csv(f,
      skip = 23,
      col_types = col_types,
    )
  }) |>
  list_rbind()

[1] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-14_Cov2_extractions_Results_20230915_100200.csv"
[2] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-14_Noro_Extractions_Results_20230915_100255.csv"
[3] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-15_CrA_extractions_Results_20230915_105855.csv" 
[4] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-18_16S_extractions_Results_20230918_151203.csv"

Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)

print(raw_data)

# A tibble: 369 × 21
    Well `Well Position` Omit  Sample Target Task     Reporter Quencher
   <dbl> <chr>           <lgl> <chr>  <chr>  <chr>    <chr>    <chr>   
 1     1 A1              FALSE 1A     Cov2   UNKNOWN  FAM      NFQ-MGB 
 2     2 A2              FALSE 1A     Cov2   UNKNOWN  FAM      NFQ-MGB 
 3     3 A3              FALSE 1A     Cov2   UNKNOWN  FAM      NFQ-MGB 
 4     4 A4              FALSE 3C     Cov2   UNKNOWN  FAM      NFQ-MGB 
 5     5 A5              FALSE 3C     Cov2   UNKNOWN  FAM      NFQ-MGB 
 6     6 A6              FALSE 3C     Cov2   UNKNOWN  FAM      NFQ-MGB 
 7     7 A7              FALSE 6B     Cov2   UNKNOWN  FAM      NFQ-MGB 
 8     8 A8              FALSE 6B     Cov2   UNKNOWN  FAM      NFQ-MGB 
 9     9 A9              FALSE 6B     Cov2   UNKNOWN  FAM      NFQ-MGB 
10    10 A10             FALSE 1000.0 Cov2   STANDARD FAM      NFQ-MGB 
# ℹ 359 more rows
# ℹ 13 more variables: `Amp Status` <chr>, `Amp Score` <dbl>,
#   `Curve Quality` <lgl>, `Result Quality Issues` <lgl>, Cq <dbl>,
#   `Cq Confidence` <dbl>, `Cq Mean` <dbl>, `Cq SD` <dbl>,
#   `Auto Threshold` <lgl>, Threshold <dbl>, `Auto Baseline` <lgl>,
#   `Baseline Start` <dbl>, `Baseline End` <dbl>

There were some wells mixed up between the intended plate layout and the actual plate for Norovirus. See Results doc.

corrected_samples <- tribble(
  ~`Well Position`, ~Sample,
  "F7",             "empty",
  "F8",             "empty",
  "F9",             "empty",
  "F10",            "empty",
  "F11",            "empty",
  "F12",            "empty",
  "G7",             "6C/100",
  "G8",             "6C/10",
  "G9",             "7C/100",
  "G10",            "7C/10",
  "G11",            "3C/100",
  "G12",            "3C/10",
  "H7",             "1C/100",
  "H8",             "1C/10",
  "H9",             "1C/100",
  "H10",            "1C/10",
  "H11",            "NTC",
  "H12",            "NTC",
)
corrected_samples$Target <- "Noro"
print(corrected_samples)

# A tibble: 18 × 3
   `Well Position` Sample Target
   <chr>           <chr>  <chr> 
 1 F7              empty  Noro  
 2 F8              empty  Noro  
 3 F9              empty  Noro  
 4 F10             empty  Noro  
 5 F11             empty  Noro  
 6 F12             empty  Noro  
 7 G7              6C/100 Noro  
 8 G8              6C/10  Noro  
 9 G9              7C/100 Noro  
10 G10             7C/10  Noro  
11 G11             3C/100 Noro  
12 G12             3C/10  Noro  
13 H7              1C/100 Noro  
14 H8              1C/10  Noro  
15 H9              1C/100 Noro  
16 H10             1C/10  Noro  
17 H11             NTC    Noro  
18 H12             NTC    Noro

corrected_raw_data <- raw_data |>
  left_join(corrected_samples, by = join_by(`Well Position`, Target)) |>
  mutate(Sample = ifelse(is.na(Sample.y), Sample.x, Sample.y)) |>
  print()

# A tibble: 369 × 23
    Well `Well Position` Omit  Sample.x Target Task     Reporter Quencher
   <dbl> <chr>           <lgl> <chr>    <chr>  <chr>    <chr>    <chr>   
 1     1 A1              FALSE 1A       Cov2   UNKNOWN  FAM      NFQ-MGB 
 2     2 A2              FALSE 1A       Cov2   UNKNOWN  FAM      NFQ-MGB 
 3     3 A3              FALSE 1A       Cov2   UNKNOWN  FAM      NFQ-MGB 
 4     4 A4              FALSE 3C       Cov2   UNKNOWN  FAM      NFQ-MGB 
 5     5 A5              FALSE 3C       Cov2   UNKNOWN  FAM      NFQ-MGB 
 6     6 A6              FALSE 3C       Cov2   UNKNOWN  FAM      NFQ-MGB 
 7     7 A7              FALSE 6B       Cov2   UNKNOWN  FAM      NFQ-MGB 
 8     8 A8              FALSE 6B       Cov2   UNKNOWN  FAM      NFQ-MGB 
 9     9 A9              FALSE 6B       Cov2   UNKNOWN  FAM      NFQ-MGB 
10    10 A10             FALSE 1000.0   Cov2   STANDARD FAM      NFQ-MGB 
# ℹ 359 more rows
# ℹ 15 more variables: `Amp Status` <chr>, `Amp Score` <dbl>,
#   `Curve Quality` <lgl>, `Result Quality Issues` <lgl>, Cq <dbl>,
#   `Cq Confidence` <dbl>, `Cq Mean` <dbl>, `Cq SD` <dbl>,
#   `Auto Threshold` <lgl>, Threshold <dbl>, `Auto Baseline` <lgl>,
#   `Baseline Start` <dbl>, `Baseline End` <dbl>, Sample.y <chr>, Sample <chr>

corrected_raw_data |>
  count(Target, Sample) |>
  print(n = Inf)

# A tibble: 131 × 3
    Target Sample                    n
    <chr>  <chr>                 <int>
  1 16S    0.0034500000000000004     3
  2 16S    0.0345                    3
  3 16S    0.34500000000000003       3
  4 16S    1A                        3
  5 16S    1B                        3
  6 16S    1C                        3
  7 16S    2A                        3
  8 16S    2B                        3
  9 16S    2C                        3
 10 16S    3.45                      3
 11 16S    3.4500000000000004E-4     3
 12 16S    3A                        3
 13 16S    3B                        3
 14 16S    3C                        3
 15 16S    4A                        3
 16 16S    4B                        3
 17 16S    4C                        3
 18 16S    5A                        3
 19 16S    5B                        3
 20 16S    5C                        3
 21 16S    6A                        3
 22 16S    6B                        3
 23 16S    6C                        3
 24 16S    7A                        3
 25 16S    7B                        3
 26 16S    7C                        3
 27 16S    <NA>                      3
 28 Cov2   0.010000000000000002      3
 29 Cov2   0.1                       3
 30 Cov2   1.0                       3
 31 Cov2   10.0                      3
 32 Cov2   100.0                     3
 33 Cov2   1000.0                    3
 34 Cov2   1A                        3
 35 Cov2   1B                        3
 36 Cov2   1C                        3
 37 Cov2   2A                        3
 38 Cov2   2B                        3
 39 Cov2   2C                        3
 40 Cov2   3A                        3
 41 Cov2   3B                        3
 42 Cov2   3C                        3
 43 Cov2   3C/10                     2
 44 Cov2   3C/100                    2
 45 Cov2   4A                        3
 46 Cov2   4B                        3
 47 Cov2   4C                        3
 48 Cov2   5A                        3
 49 Cov2   5B                        3
 50 Cov2   5C                        3
 51 Cov2   6A                        3
 52 Cov2   6B                        3
 53 Cov2   6C                        3
 54 Cov2   6C/10                     2
 55 Cov2   6C/100                    2
 56 Cov2   7A                        3
 57 Cov2   7B                        3
 58 Cov2   7C                        3
 59 Cov2   7C/10                     2
 60 Cov2   7C/100                    2
 61 Cov2   <NA>                      3
 62 CrA    1A                        3
 63 CrA    1B                        3
 64 CrA    1C                        3
 65 CrA    1C/10                     2
 66 CrA    2A                        3
 67 CrA    2B                        3
 68 CrA    2C                        3
 69 CrA    3.0E7                     3
 70 CrA    3000.0                    3
 71 CrA    30000.0                   3
 72 CrA    300000.0                  3
 73 CrA    3000000.0                 3
 74 CrA    3A                        3
 75 CrA    3B                        3
 76 CrA    3C                        3
 77 CrA    3C/10                     2
 78 CrA    3C/100                    2
 79 CrA    4A                        3
 80 CrA    4B                        3
 81 CrA    4C                        3
 82 CrA    5A                        3
 83 CrA    5B                        3
 84 CrA    5C                        3
 85 CrA    6A                        3
 86 CrA    6B                        3
 87 CrA    6C                        3
 88 CrA    6C/10                     2
 89 CrA    6C/100                    2
 90 CrA    7A                        3
 91 CrA    7B                        3
 92 CrA    7C                        3
 93 CrA    7C/10                     2
 94 CrA    7C/100                    4
 95 CrA    <NA>                      2
 96 Noro   1.0                       3
 97 Noro   10.0                      3
 98 Noro   100.0                     3
 99 Noro   1000.0                    3
100 Noro   10000.0                   3
101 Noro   1A                        3
102 Noro   1B                        3
103 Noro   1C                        3
104 Noro   1C/10                     2
105 Noro   1C/100                    2
106 Noro   2A                        3
107 Noro   2B                        3
108 Noro   2C                        3
109 Noro   3A                        3
110 Noro   3B                        3
111 Noro   3C                        3
112 Noro   3C/10                     1
113 Noro   3C/100                    1
114 Noro   4A                        3
115 Noro   4B                        3
116 Noro   4C                        3
117 Noro   5A                        3
118 Noro   5B                        3
119 Noro   5C                        3
120 Noro   6A                        3
121 Noro   6B                        3
122 Noro   6C                        3
123 Noro   6C/10                     1
124 Noro   6C/100                    1
125 Noro   7A                        3
126 Noro   7B                        3
127 Noro   7C                        3
128 Noro   7C/10                     1
129 Noro   7C/100                    1
130 Noro   NTC                       2
131 Noro   empty                     6

metadata_file <- paste0(
  data_dir,
  "[2023-09-11] Extraction Experiment 2 templates and results",
  " - sampleMetadata.csv"
)
metadata <- read_csv(metadata_file)

Rows: 21 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Sample_ID, Extraction_kit, Short_kit, Elution_format, Brand, NA_Target
dbl (2): Kit Batch, Elution_volume
lgl (1): FP_sampleID

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(metadata)

Rows: 21
Columns: 9
$ Sample_ID      <chr> "1A", "1B", "1C", "2A", "2B", "2C", "3A", "3B", "3C", "…
$ Extraction_kit <chr> "Zymo quick-RNA", "Zymo quick-RNA", "Zymo quick-RNA", "…
$ Short_kit      <chr> "1_ZR", "1_ZR", "1_ZR", "2_ZD", "2_ZD", "2_ZD", "3_IPL"…
$ `Kit Batch`    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3…
$ Elution_volume <dbl> 30, 30, 30, 100, 100, 100, 100, 100, 100, 60, 60, 60, 1…
$ Elution_format <chr> "15*2", "15*2", "15*2", "50*2", "50*2", "50*2", "50*2",…
$ Brand          <chr> "Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Zymo", "Invitr…
$ NA_Target      <chr> "RNA", "RNA", "RNA", "DNA+RNA", "DNA+RNA", "DNA+RNA", "…
$ FP_sampleID    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

tidy_data <- corrected_raw_data |>
  mutate(
    Sample_ID = str_split_i(Sample, "/", 1),
    dilution = as.integer(str_split_i(Sample, "/", 2)),
    replicate = str_extract(Sample_ID, "[A-Z]$"),
    quantity = as.double(Sample),
  ) |>
  mutate(dilution = replace_na(dilution, 1)) |>
  left_join(
    metadata,
    by = join_by(Sample_ID)
  )

Warning: There was 1 warning in `mutate()`.
ℹ In argument: `quantity = as.double(Sample)`.
Caused by warning:
! NAs introduced by coercion

glimpse(tidy_data)

Rows: 369
Columns: 35
$ Well                    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ `Well Position`         <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"…
$ Omit                    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
$ Sample.x                <chr> "1A", "1A", "1A", "3C", "3C", "3C", "6B", "6B"…
$ Target                  <chr> "Cov2", "Cov2", "Cov2", "Cov2", "Cov2", "Cov2"…
$ Task                    <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "U…
$ Reporter                <chr> "FAM", "FAM", "FAM", "FAM", "FAM", "FAM", "FAM…
$ Quencher                <chr> "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "N…
$ `Amp Status`            <chr> "AMP", "AMP", "AMP", "AMP", "AMP", "AMP", "NO_…
$ `Amp Score`             <dbl> 1.1701024, 1.0212594, 1.1604009, 1.1057531, 1.…
$ `Curve Quality`         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Result Quality Issues` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Cq                      <dbl> 32.44077, 35.24757, 33.72678, 34.44077, 34.704…
$ `Cq Confidence`         <dbl> 0.9451320, 0.9773032, 0.9759027, 0.9634213, 0.…
$ `Cq Mean`               <dbl> 33.80504, 33.80504, 33.80504, 34.31332, 34.313…
$ `Cq SD`                 <dbl> 1.4050310, 1.4050310, 1.4050310, 0.4683633, 0.…
$ `Auto Threshold`        <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ Threshold               <dbl> 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04…
$ `Auto Baseline`         <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ `Baseline Start`        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ `Baseline End`          <dbl> 29, 32, 30, 31, 31, 31, 31, 31, 31, 21, 21, 21…
$ Sample.y                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Sample                  <chr> "1A", "1A", "1A", "3C", "3C", "3C", "6B", "6B"…
$ Sample_ID               <chr> "1A", "1A", "1A", "3C", "3C", "3C", "6B", "6B"…
$ dilution                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ replicate               <chr> "A", "A", "A", "C", "C", "C", "B", "B", "B", N…
$ quantity                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 1000, 1000…
$ Extraction_kit          <chr> "Zymo quick-RNA", "Zymo quick-RNA", "Zymo quic…
$ Short_kit               <chr> "1_ZR", "1_ZR", "1_ZR", "3_IPL", "3_IPL", "3_I…
$ `Kit Batch`             <dbl> 1, 1, 1, 1, 1, 1, 3, 3, 3, NA, NA, NA, 1, 1, 1…
$ Elution_volume          <dbl> 30, 30, 30, 100, 100, 100, 80, 80, 80, NA, NA,…
$ Elution_format          <chr> "15*2", "15*2", "15*2", "50*2", "50*2", "50*2"…
$ Brand                   <chr> "Zymo", "Zymo", "Zymo", "Invitrogen", "Invitro…
$ NA_Target               <chr> "RNA", "RNA", "RNA", "RNA", "RNA", "RNA", "RNA…
$ FP_sampleID             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

tidy_data |>
  count(Sample_ID, Target, dilution, replicate) |>
  print(n = Inf)

# A tibble: 131 × 5
    Sample_ID             Target dilution replicate     n
    <chr>                 <chr>     <int> <chr>     <int>
  1 0.0034500000000000004 16S           1 <NA>          3
  2 0.010000000000000002  Cov2          1 <NA>          3
  3 0.0345                16S           1 <NA>          3
  4 0.1                   Cov2          1 <NA>          3
  5 0.34500000000000003   16S           1 <NA>          3
  6 1.0                   Cov2          1 <NA>          3
  7 1.0                   Noro          1 <NA>          3
  8 10.0                  Cov2          1 <NA>          3
  9 10.0                  Noro          1 <NA>          3
 10 100.0                 Cov2          1 <NA>          3
 11 100.0                 Noro          1 <NA>          3
 12 1000.0                Cov2          1 <NA>          3
 13 1000.0                Noro          1 <NA>          3
 14 10000.0               Noro          1 <NA>          3
 15 1A                    16S           1 A             3
 16 1A                    Cov2          1 A             3
 17 1A                    CrA           1 A             3
 18 1A                    Noro          1 A             3
 19 1B                    16S           1 B             3
 20 1B                    Cov2          1 B             3
 21 1B                    CrA           1 B             3
 22 1B                    Noro          1 B             3
 23 1C                    16S           1 C             3
 24 1C                    Cov2          1 C             3
 25 1C                    CrA           1 C             3
 26 1C                    CrA          10 C             2
 27 1C                    Noro          1 C             3
 28 1C                    Noro         10 C             2
 29 1C                    Noro        100 C             2
 30 2A                    16S           1 A             3
 31 2A                    Cov2          1 A             3
 32 2A                    CrA           1 A             3
 33 2A                    Noro          1 A             3
 34 2B                    16S           1 B             3
 35 2B                    Cov2          1 B             3
 36 2B                    CrA           1 B             3
 37 2B                    Noro          1 B             3
 38 2C                    16S           1 C             3
 39 2C                    Cov2          1 C             3
 40 2C                    CrA           1 C             3
 41 2C                    Noro          1 C             3
 42 3.0E7                 CrA           1 <NA>          3
 43 3.45                  16S           1 <NA>          3
 44 3.4500000000000004E-4 16S           1 <NA>          3
 45 3000.0                CrA           1 <NA>          3
 46 30000.0               CrA           1 <NA>          3
 47 300000.0              CrA           1 <NA>          3
 48 3000000.0             CrA           1 <NA>          3
 49 3A                    16S           1 A             3
 50 3A                    Cov2          1 A             3
 51 3A                    CrA           1 A             3
 52 3A                    Noro          1 A             3
 53 3B                    16S           1 B             3
 54 3B                    Cov2          1 B             3
 55 3B                    CrA           1 B             3
 56 3B                    Noro          1 B             3
 57 3C                    16S           1 C             3
 58 3C                    Cov2          1 C             3
 59 3C                    Cov2         10 C             2
 60 3C                    Cov2        100 C             2
 61 3C                    CrA           1 C             3
 62 3C                    CrA          10 C             2
 63 3C                    CrA         100 C             2
 64 3C                    Noro          1 C             3
 65 3C                    Noro         10 C             1
 66 3C                    Noro        100 C             1
 67 4A                    16S           1 A             3
 68 4A                    Cov2          1 A             3
 69 4A                    CrA           1 A             3
 70 4A                    Noro          1 A             3
 71 4B                    16S           1 B             3
 72 4B                    Cov2          1 B             3
 73 4B                    CrA           1 B             3
 74 4B                    Noro          1 B             3
 75 4C                    16S           1 C             3
 76 4C                    Cov2          1 C             3
 77 4C                    CrA           1 C             3
 78 4C                    Noro          1 C             3
 79 5A                    16S           1 A             3
 80 5A                    Cov2          1 A             3
 81 5A                    CrA           1 A             3
 82 5A                    Noro          1 A             3
 83 5B                    16S           1 B             3
 84 5B                    Cov2          1 B             3
 85 5B                    CrA           1 B             3
 86 5B                    Noro          1 B             3
 87 5C                    16S           1 C             3
 88 5C                    Cov2          1 C             3
 89 5C                    CrA           1 C             3
 90 5C                    Noro          1 C             3
 91 6A                    16S           1 A             3
 92 6A                    Cov2          1 A             3
 93 6A                    CrA           1 A             3
 94 6A                    Noro          1 A             3
 95 6B                    16S           1 B             3
 96 6B                    Cov2          1 B             3
 97 6B                    CrA           1 B             3
 98 6B                    Noro          1 B             3
 99 6C                    16S           1 C             3
100 6C                    Cov2          1 C             3
101 6C                    Cov2         10 C             2
102 6C                    Cov2        100 C             2
103 6C                    CrA           1 C             3
104 6C                    CrA          10 C             2
105 6C                    CrA         100 C             2
106 6C                    Noro          1 C             3
107 6C                    Noro         10 C             1
108 6C                    Noro        100 C             1
109 7A                    16S           1 A             3
110 7A                    Cov2          1 A             3
111 7A                    CrA           1 A             3
112 7A                    Noro          1 A             3
113 7B                    16S           1 B             3
114 7B                    Cov2          1 B             3
115 7B                    CrA           1 B             3
116 7B                    Noro          1 B             3
117 7C                    16S           1 C             3
118 7C                    Cov2          1 C             3
119 7C                    Cov2         10 C             2
120 7C                    Cov2        100 C             2
121 7C                    CrA           1 C             3
122 7C                    CrA          10 C             2
123 7C                    CrA         100 C             4
124 7C                    Noro          1 C             3
125 7C                    Noro         10 C             1
126 7C                    Noro        100 C             1
127 NTC                   Noro          1 C             2
128 empty                 Noro          1 <NA>          6
129 <NA>                  16S           1 <NA>          3
130 <NA>                  Cov2          1 <NA>          3
131 <NA>                  CrA           1 <NA>          2

tidy_data |>
  count(`Result Quality Issues`)

# A tibble: 1 × 2
  `Result Quality Issues`     n
  <lgl>                   <int>
1 NA                        369

Kit comparison

With equal Cq axes

tidy_data |>
  filter(
    !is.na(Extraction_kit),
    dilution == 1,
  ) |>
  ggplot(mapping = aes(
    x = Cq,
    y = Extraction_kit,
    color = replicate,
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  facet_wrap(facets = ~Target)

Warning: Removed 4 rows containing non-finite values (`stat_summary()`).

With free Cq axes

tidy_data |>
  filter(
    !is.na(Extraction_kit),
    dilution == 1,
  ) |>
  ggplot(mapping = aes(
    x = Cq,
    y = Extraction_kit,
    color = replicate,
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  facet_wrap(facets = ~Target, scales = "free_x")

Warning: Removed 4 rows containing non-finite values (`stat_summary()`).

Check that the overlapping points really do have three values

tidy_data |>
  filter(!is.na(Extraction_kit), is.na(Cq)) |>
  count(Target, Extraction_kit, Sample_ID, dilution) |>
  print()

# A tibble: 9 × 5
  Target Extraction_kit                    Sample_ID dilution     n
  <chr>  <chr>                             <chr>        <int> <int>
1 Cov2   Invitrogen PureLink RNA           3C              10     2
2 Cov2   Invitrogen PureLink RNA           3C             100     2
3 Cov2   NucleoSpin Virus                  4B               1     2
4 Cov2   QIAamp Viral RNA mini kit         6A               1     1
5 Cov2   QIAamp Viral RNA mini kit         6C              10     1
6 Cov2   QIAamp Viral RNA mini kit         6C             100     2
7 Cov2   Qiagen AllPrep PowerViral DNA/RNA 7A               1     1
8 Cov2   Qiagen AllPrep PowerViral DNA/RNA 7C              10     2
9 Cov2   Qiagen AllPrep PowerViral DNA/RNA 7C             100     2

tidy_data |>
  filter(
    Extraction_kit == "Zymo quick-RNA",
    Target == "CrA", replicate == "A"
  ) |>
  print(width = Inf)

# A tibble: 3 × 35
   Well `Well Position` Omit  Sample.x Target Task    Reporter Quencher
  <dbl> <chr>           <lgl> <chr>    <chr>  <chr>   <chr>    <chr>   
1     1 A1              FALSE 1A       CrA    UNKNOWN FAM      NFQ-MGB 
2     2 A2              FALSE 1A       CrA    UNKNOWN FAM      NFQ-MGB 
3     3 A3              FALSE 1A       CrA    UNKNOWN FAM      NFQ-MGB 
  `Amp Status` `Amp Score` `Curve Quality` `Result Quality Issues`    Cq
  <chr>              <dbl> <lgl>           <lgl>                   <dbl>
1 AMP                 1.41 NA              NA                       21.1
2 AMP                 1.43 NA              NA                       21.0
3 AMP                 1.42 NA              NA                       21.1
  `Cq Confidence` `Cq Mean` `Cq SD` `Auto Threshold` Threshold `Auto Baseline`
            <dbl>     <dbl>   <dbl> <lgl>                <dbl> <lgl>          
1           0.989      21.1  0.0535 TRUE                 0.175 TRUE           
2           0.989      21.1  0.0535 TRUE                 0.175 TRUE           
3           0.989      21.1  0.0535 TRUE                 0.175 TRUE           
  `Baseline Start` `Baseline End` Sample.y Sample Sample_ID dilution replicate
             <dbl>          <dbl> <chr>    <chr>  <chr>        <int> <chr>    
1                3             16 <NA>     1A     1A               1 A        
2                3             15 <NA>     1A     1A               1 A        
3                3             16 <NA>     1A     1A               1 A        
  quantity Extraction_kit Short_kit `Kit Batch` Elution_volume Elution_format
     <dbl> <chr>          <chr>           <dbl>          <dbl> <chr>         
1       NA Zymo quick-RNA 1_ZR                1             30 15*2          
2       NA Zymo quick-RNA 1_ZR                1             30 15*2          
3       NA Zymo quick-RNA 1_ZR                1             30 15*2          
  Brand NA_Target FP_sampleID
  <chr> <chr>     <lgl>      
1 Zymo  RNA       NA         
2 Zymo  RNA       NA         
3 Zymo  RNA       NA

Adjusting for elution volume

Assume the amplification efficiency is 100%, so that an increase in initial concentration by a factor of \(x\) decreases \(C_q\) by \(\log_{2}(x)\).

If a method has elution volume \(v\) and we dilute it to total volume \(V\), this reduces its final concentration by a factor \(v / V\). We can put different methods on the same footing by adding \(\log_{2}(v/V)\) to \(C_q\) (so that large elution volumes are penalized with a higher adjusted \(C_q\)).

final_volume <- 100
tidy_data |>
  mutate(elution_adjusted_Cq = Cq + log2(Elution_volume / final_volume)) |>
  filter(
    !is.na(Extraction_kit),
    dilution == 1,
  ) |>
  ggplot(mapping = aes(
    x = elution_adjusted_Cq,
    y = Extraction_kit,
    color = replicate,
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  facet_wrap(facets = ~Target, scales = "free_x")

Warning: Removed 4 rows containing non-finite values (`stat_summary()`).

Standard curves

tidy_data |>
  filter(
    Task == "STANDARD",
    dilution == 1,
  ) |>
  ggplot(mapping = aes(
    x = quantity,
    y = Cq,
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  geom_smooth(method = "lm") +
  scale_x_continuous(trans = "log10") +
  facet_wrap(facets = ~Target, scales = "free_x")

Warning: Removed 4 rows containing non-finite values (`stat_summary()`).

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 4 rows containing non-finite values (`stat_smooth()`).

Not sure what the units of the X-axis are

fits <- tibble()
for (target in unique(tidy_data$Target)) {
  fit <- lm(Cq ~ log10(quantity),
    data = filter(tidy_data, Task == "STANDARD", Target == target)
  ) |>
    tidy() |>
    mutate(Target = target, efficiency = 10^-(1 / estimate) - 1)
  fits <- bind_rows(fits, fit)
}
print(fits |> filter(term == "log10(quantity)"))

# A tibble: 4 × 7
  term            estimate std.error statistic  p.value Target efficiency
  <chr>              <dbl>     <dbl>     <dbl>    <dbl> <chr>       <dbl>
1 log10(quantity)    -2.61    0.143      -18.2 1.23e-10 Cov2         1.42
2 log10(quantity)    -2.85    0.108      -26.4 5.34e-12 Noro         1.24
3 log10(quantity)    -2.59    0.0703     -36.8 1.58e-14 CrA          1.43
4 log10(quantity)    -2.68    0.148      -18.1 1.32e-10 16S          1.36

These match the Design & Analysis software.

Dilution series

tidy_data |>
  filter(Task == "UNKNOWN", dilution > 1) |>
  count(Target, Extraction_kit, dilution) |>
  print(n = Inf)

# A tibble: 21 × 4
   Target Extraction_kit                    dilution     n
   <chr>  <chr>                                <int> <int>
 1 Cov2   Invitrogen PureLink RNA                 10     2
 2 Cov2   Invitrogen PureLink RNA                100     2
 3 Cov2   QIAamp Viral RNA mini kit               10     2
 4 Cov2   QIAamp Viral RNA mini kit              100     2
 5 Cov2   Qiagen AllPrep PowerViral DNA/RNA       10     2
 6 Cov2   Qiagen AllPrep PowerViral DNA/RNA      100     2
 7 CrA    Invitrogen PureLink RNA                 10     2
 8 CrA    Invitrogen PureLink RNA                100     2
 9 CrA    QIAamp Viral RNA mini kit               10     2
10 CrA    QIAamp Viral RNA mini kit              100     2
11 CrA    Qiagen AllPrep PowerViral DNA/RNA       10     2
12 CrA    Qiagen AllPrep PowerViral DNA/RNA      100     4
13 CrA    Zymo quick-RNA                          10     2
14 Noro   Invitrogen PureLink RNA                 10     1
15 Noro   Invitrogen PureLink RNA                100     1
16 Noro   QIAamp Viral RNA mini kit               10     1
17 Noro   QIAamp Viral RNA mini kit              100     1
18 Noro   Qiagen AllPrep PowerViral DNA/RNA       10     1
19 Noro   Qiagen AllPrep PowerViral DNA/RNA      100     1
20 Noro   Zymo quick-RNA                          10     1
21 Noro   Zymo quick-RNA                         100     1

Looks like we have dilution series for four kits.

dilution_kits <- c(
  "Invitrogen PureLink RNA",
  "QIAamp Viral RNA mini kit",
  "Qiagen AllPrep PowerViral DNA/RNA",
  "Zymo quick-RNA"
)
tidy_data |>
  filter(
    Task == "UNKNOWN",
    Target != "16S",
    is.element(Extraction_kit, dilution_kits)
  ) |>
  ggplot(mapping = aes(
    x = dilution,
    y = Cq,
  )) +
  geom_point(mapping = aes(color = replicate)) +
  geom_smooth(method = "lm") +
  scale_x_continuous(trans = "log10") +
  facet_grid(cols = vars(Extraction_kit), rows = vars(Target), scale = "free_y")

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 13 rows containing non-finite values (`stat_smooth()`).

Warning: Removed 13 rows containing missing values (`geom_point()`).

fits <- tibble()
for (target in unique(tidy_data$Target)) {
  for (kit in dilution_kits) {
    fit <- lm(Cq ~ log10(dilution),
      data = filter(tidy_data, Target == target, Extraction_kit == kit)
    ) |>
      tidy() |>
      mutate(
        Target = target,
        Extraction_kit = kit,
        # Changed sign because we have dilution factor not quantity
        efficiency = 10^(1 / estimate) - 1
      )
    fits <- bind_rows(fits, fit)
  }
}
print(fits |> filter(Target != "16S", term == "log10(dilution)"))

# A tibble: 12 × 8
   term  estimate std.error statistic   p.value Target Extraction_kit efficiency
   <chr>    <dbl>     <dbl>     <dbl>     <dbl> <chr>  <chr>               <dbl>
 1 log1…    NA      NA          NA    NA        Cov2   Invitrogen Pu…     NA    
 2 log1…     2.51    0.869       2.88  2.35e- 2 Cov2   QIAamp Viral …      1.51 
 3 log1…    NA      NA          NA    NA        Cov2   Qiagen AllPre…     NA    
 4 log1…    NA      NA          NA    NA        Cov2   Zymo quick-RNA     NA    
 5 log1…     2.92    0.0778     37.6   3.30e-11 Noro   Invitrogen Pu…      1.20 
 6 log1…     3.07    0.151      20.3   7.80e- 9 Noro   QIAamp Viral …      1.12 
 7 log1…     3.20    0.118      27.1   6.24e-10 Noro   Qiagen AllPre…      1.05 
 8 log1…     2.69    0.0772     34.8   1.31e-12 Noro   Zymo quick-RNA      1.35 
 9 log1…     3.59    0.205      17.5   2.24e- 9 CrA    Invitrogen Pu…      0.898
10 log1…     3.58    0.0852     42.0   1.68e-13 CrA    QIAamp Viral …      0.902
11 log1…     2.84    0.330       8.61  9.92e- 7 CrA    Qiagen AllPre…      1.25 
12 log1…     5.14    0.644       7.98  2.25e- 5 CrA    Zymo quick-RNA      0.565

Amplification curves

amp_data <- list.files(
  paste0(data_dir, "qpcr"),
  pattern = "Amplification Data",
  full.names = TRUE,
) |>
  map(function(f) {
    read_csv(f,
      skip = 23,
      col_types = col_types,
    )
  }) |>
  list_rbind() |>
  mutate(
    replicate = str_extract(Sample, "[A-Z]"),
    Sample_ID = str_split_i(Sample, "/", 1),
    dilution = as.integer(str_split_i(Sample, "/", 2)),
    quantity = as.double(Sample),
  ) |>
  mutate(dilution = replace_na(dilution, 1)) |>
  left_join(
    metadata,
    by = join_by(Sample_ID)
  )

Warning: The following named parsers don't match the column names: Cq
The following named parsers don't match the column names: Cq
The following named parsers don't match the column names: Cq
The following named parsers don't match the column names: Cq

Warning: There was 1 warning in `mutate()`.
ℹ In argument: `quantity = as.double(Sample)`.
Caused by warning:
! NAs introduced by coercion

print(amp_data)

# A tibble: 14,760 × 20
    Well `Well Position` `Cycle Number` Target    Rn      dRn Sample Omit 
   <dbl> <chr>                    <dbl> <chr>  <dbl>    <dbl> <chr>  <lgl>
 1     1 A1                           1 Cov2   0.877  0.0311  1A     FALSE
 2     1 A1                           2 Cov2   0.870  0.0262  1A     FALSE
 3     1 A1                           3 Cov2   0.856  0.0145  1A     FALSE
 4     1 A1                           4 Cov2   0.848  0.00816 1A     FALSE
 5     1 A1                           5 Cov2   0.844  0.00638 1A     FALSE
 6     1 A1                           6 Cov2   0.838  0.00299 1A     FALSE
 7     1 A1                           7 Cov2   0.832 -0.00128 1A     FALSE
 8     1 A1                           8 Cov2   0.829 -0.00202 1A     FALSE
 9     1 A1                           9 Cov2   0.824 -0.00480 1A     FALSE
10     1 A1                          10 Cov2   0.821 -0.00546 1A     FALSE
# ℹ 14,750 more rows
# ℹ 12 more variables: replicate <chr>, Sample_ID <chr>, dilution <int>,
#   quantity <dbl>, Extraction_kit <chr>, Short_kit <chr>, `Kit Batch` <dbl>,
#   Elution_volume <dbl>, Elution_format <chr>, Brand <chr>, NA_Target <chr>,
#   FP_sampleID <lgl>

amp_data |>
  filter(!is.na(Extraction_kit)) |>
  ggplot(mapping = aes(
    x = `Cycle Number`,
    y = dRn,
    color = as.factor(dilution),
    group = Well
  )) +
  geom_line() +
  facet_grid(
    cols = vars(Extraction_kit), rows = vars(Target), scales = "free_y"
  )

Background fluorescence

raw_qpcr_data <- list.files(
  paste0(data_dir, "qpcr"),
  pattern = "_Multicomponent_",
  full.names = TRUE,
) |>
  print() |>
  map(function(f) {
    read_csv(f,
      skip = 23,
      col_types = col_types,
    )
  }) |>
  list_rbind() |>
  print()

[1] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-14_Cov2_extractions_Multicomponent_20230915_100200.csv"
[2] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-14_Noro_Extractions_Multicomponent_20230915_100255.csv"
[3] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-15_CrA_extractions_Multicomponent_20230915_105855.csv" 
[4] "/Users/dan/airport/[2023-09-06] Extraction-kit comparison 2: Settled Solids/qpcr/2023-09-18_16S_extractions_Multicomponent_20230918_151203.csv"

Warning: The following named parsers don't match the column names: Target, Cq
The following named parsers don't match the column names: Target, Cq
The following named parsers don't match the column names: Target, Cq
The following named parsers don't match the column names: Target, Cq

# A tibble: 14,760 × 7
    Well `Well Position` `Stage Number` `Step Number` `Cycle Number`     FAM
   <dbl> <chr>                    <dbl>         <dbl>          <dbl>   <dbl>
 1     1 A1                           2             2              1 326798.
 2     1 A1                           2             2              2 323795.
 3     1 A1                           2             2              3 318092 
 4     1 A1                           2             2              4 314307.
 5     1 A1                           2             2              5 312277.
 6     1 A1                           2             2              6 309855.
 7     1 A1                           2             2              7 307088.
 8     1 A1                           2             2              8 305802.
 9     1 A1                           2             2              9 303988.
10     1 A1                           2             2             10 302549.
# ℹ 14,750 more rows
# ℹ 1 more variable: ROX <dbl>

TODO

~~Boxplots~~
~~Standard curve data~~
~~Diluted samples~~
~~Amplification curves~~
Relabel flipped norovirus dilution samples
Investigate the threshold
Investigate the efficiency caluculations: compare amp curves, standard curves, and dilution series
Check amplification curves, including autothreshold in linear phase. (Possibly adjust)
Check negative controls (may need to use raw Rns rather than delta to avoid baseline subtraction issues)
Be aware of non-control samples with no amplification detected (no Cq values). Summary stats of which this is true for.
Plot baseline start and stop (determines the interval used for baseline subtraction)
Background fluorescence