2023-10-24 Analysis of sludge dissociation test qPCR

Author

Dan Rice

Published

October 24, 2023

Objectives

See Google Doc

Preliminary work

Exported csv files from Olivia’s eds file uploads. Also exported metadata google sheets as CSV

Data import

library(here)
here() starts at /Users/dan/notebook
library(readr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(purrr)
library(stringr)
library(ggplot2)
library(tidyr)
library(broom)
library(ggh4x)
get_plate <- function(f) {
  str_extract(basename(f),
    "(.*)_(.*)_[0-9]{8}_[0-9]{6}\\.csv",
    group = 1
  )
}
data_dir <- here("~", "airport")
experiments <- c("[2023-10-18] New Combined Protocol run-through")
filename_pattern <- "_Results_"
col_types <- list(
  Target = col_character(),
  Cq = col_double(),
  `Treatment Group` = col_character()
)
raw_data <- list.files(
  map_chr(experiments, function(exp) {
    here(data_dir, exp, "qpcr")
  }),
  pattern = filename_pattern,
  recursive = TRUE,
  full.names = TRUE,
) |>
  print() |>
  map(function(f) {
    read_csv(f, skip = 23, col_types = col_types) |>
      mutate(plate = get_plate(f))
  }) |>
  list_rbind() |>
  glimpse()
[1] "/Users/dan/airport/[2023-10-18] New Combined Protocol run-through/qpcr/2023-10-24_Cov2Noro_Results_20231024_143610.csv"
[2] "/Users/dan/airport/[2023-10-18] New Combined Protocol run-through/qpcr/2023-10-24_CrA16S_Results_20231024_143657.csv"  
[3] "/Users/dan/airport/[2023-10-18] New Combined Protocol run-through/qpcr/2023-10-24_PMMoV_Results_20231024_160308.csv"   
Warning: The following named parsers don't match the column names: Treatment
Group
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Warning: The following named parsers don't match the column names: Treatment
Group
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Warning: The following named parsers don't match the column names: Treatment
Group
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 223
Columns: 22
$ Well                    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ `Well Position`         <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"…
$ Omit                    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
$ Sample                  <chr> "1A", "1A", "1A", "3C", "3C", "3C", "1A", "1A"…
$ Target                  <chr> "Cov2", "Cov2", "Cov2", "Cov2", "Cov2", "Cov2"…
$ Task                    <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "U…
$ Reporter                <chr> "FAM", "FAM", "FAM", "FAM", "FAM", "FAM", "FAM…
$ Quencher                <chr> "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "N…
$ `Amp Status`            <chr> "AMP", "AMP", "AMP", "AMP", "AMP", "AMP", "AMP…
$ `Amp Score`             <dbl> 1.285299, 1.261658, 1.291325, 1.307973, 1.3246…
$ `Curve Quality`         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Result Quality Issues` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Cq                      <dbl> 33.47677, 33.83380, 33.69172, 33.08026, 32.584…
$ `Cq Confidence`         <dbl> 0.9706938, 0.9862888, 0.9894744, 0.9870891, 0.…
$ `Cq Mean`               <dbl> 33.66743, 33.66743, 33.66743, 32.88368, 32.883…
$ `Cq SD`                 <dbl> 0.17974764, 0.17974764, 0.17974764, 0.26309028…
$ `Auto Threshold`        <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ Threshold               <dbl> 0.09301463, 0.09301463, 0.09301463, 0.09301463…
$ `Auto Baseline`         <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ `Baseline Start`        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ `Baseline End`          <dbl> 29, 29, 29, 29, 28, 28, 24, 25, 24, 24, 24, 24…
$ plate                   <chr> "2023-10-24_Cov2Noro", "2023-10-24_Cov2Noro", …
metadata_file <- here(
  data_dir,
  experiments[1],
  "metadata.csv"
)
metadata <- experiments |>
  map(function(exp) {
    read_csv(here(data_dir, exp, "metadata.csv"), col_types = col_types)
  }) |>
  list_rbind() |>
  glimpse()
Warning: The following named parsers don't match the column names: Target, Cq
Rows: 9
Columns: 12
$ Sample_ID               <chr> "1A", "1B", "1C", "2A", "2B", "2C", "3A", "3B"…
$ `Treatment Group`       <chr> "1", "1", "1", "2", "2", "2", "3", "3", "3"
$ Dissocation             <chr> "Vortex 5m", "Vortex 5m", "Vortex 5m", "Sonica…
$ `Vortex time (min)`     <dbl> 5, 5, 5, 0, 0, 0, 1, 1, 1
$ `Sonication time (min)` <dbl> 0, 0, 0, 1, 1, 1, 1, 1, 1
$ CollectionDate          <date> 2023-10-17, 2023-10-17, 2023-10-17, 2023-10-17…
$ Source                  <chr> "Sludge", "Sludge", "Sludge", "Sludge", "Sludg…
$ `Filtration time (sec)` <dbl> 10, 10, 10, 10, 10, 60, 15, 10, 10
$ `CP Time`               <time> 03:37:00, 04:03:00, 04:00:00, 04:27:00, 04:00:…
$ `Shield Added (uL)`     <dbl> 400, 400, 400, 400, 400, 400, 400, 400, 400
$ `Qubit RNA`             <dbl> 24.2, 24.2, 18.5, 27.1, 27.7, 98.0, 100.0, 32…
$ `Qubit DNA`             <chr> "40", "36.4", "29.9", "59.2", "41.9", "97.2", …
tidy_data <- raw_data |>
  separate_wider_regex(
    `Well Position`,
    c(well_row = "[A-Z]+", well_col = "[0-9]+"),
    cols_remove = FALSE,
  ) |>
  left_join(metadata, by = join_by(Sample == Sample_ID)) |>
  mutate(Target = if_else(Target == "PMMV", "PMMoV", Target)) |>
  glimpse()
Rows: 223
Columns: 35
$ Well                    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ well_row                <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "…
$ well_col                <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "…
$ `Well Position`         <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"…
$ Omit                    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
$ Sample                  <chr> "1A", "1A", "1A", "3C", "3C", "3C", "1A", "1A"…
$ Target                  <chr> "Cov2", "Cov2", "Cov2", "Cov2", "Cov2", "Cov2"…
$ Task                    <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "U…
$ Reporter                <chr> "FAM", "FAM", "FAM", "FAM", "FAM", "FAM", "FAM…
$ Quencher                <chr> "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "N…
$ `Amp Status`            <chr> "AMP", "AMP", "AMP", "AMP", "AMP", "AMP", "AMP…
$ `Amp Score`             <dbl> 1.285299, 1.261658, 1.291325, 1.307973, 1.3246…
$ `Curve Quality`         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Result Quality Issues` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Cq                      <dbl> 33.47677, 33.83380, 33.69172, 33.08026, 32.584…
$ `Cq Confidence`         <dbl> 0.9706938, 0.9862888, 0.9894744, 0.9870891, 0.…
$ `Cq Mean`               <dbl> 33.66743, 33.66743, 33.66743, 32.88368, 32.883…
$ `Cq SD`                 <dbl> 0.17974764, 0.17974764, 0.17974764, 0.26309028…
$ `Auto Threshold`        <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ Threshold               <dbl> 0.09301463, 0.09301463, 0.09301463, 0.09301463…
$ `Auto Baseline`         <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ `Baseline Start`        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ `Baseline End`          <dbl> 29, 29, 29, 29, 28, 28, 24, 25, 24, 24, 24, 24…
$ plate                   <chr> "2023-10-24_Cov2Noro", "2023-10-24_Cov2Noro", …
$ `Treatment Group`       <chr> "1", "1", "1", "3", "3", "3", "1", "1", "1", "…
$ Dissocation             <chr> "Vortex 5m", "Vortex 5m", "Vortex 5m", "Vortex…
$ `Vortex time (min)`     <dbl> 5, 5, 5, 1, 1, 1, 5, 5, 5, 1, 1, 1, 5, 5, 5, N…
$ `Sonication time (min)` <dbl> 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, N…
$ CollectionDate          <date> 2023-10-17, 2023-10-17, 2023-10-17, 2023-10-1…
$ Source                  <chr> "Sludge", "Sludge", "Sludge", "Sludge", "Sludg…
$ `Filtration time (sec)` <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10…
$ `CP Time`               <time> 03:37:00, 03:37:00, 03:37:00, 04:13:00, 04:13…
$ `Shield Added (uL)`     <dbl> 400, 400, 400, 400, 400, 400, 400, 400, 400, 4…
$ `Qubit RNA`             <dbl> 24.2, 24.2, 24.2, 25.7, 25.7, 25.7, 24.2, 24.2…
$ `Qubit DNA`             <chr> "40", "40", "40", "29.2", "29.2", "29.2", "40"…
amp_data <- list.files(
  map_chr(experiments, function(exp) {
    here(data_dir, exp, "qpcr")
  }),
  pattern = "Amplification Data",
  recursive = TRUE,
  full.names = TRUE,
) |>
  print() |>
  map(function(f) {
    read_csv(f,
      skip = 23,
      col_types = col_types,
    ) |>
      mutate(plate = get_plate(f))
  }) |>
  list_rbind() |>
  mutate(Target = if_else(Target == "PMMV", "PMMoV", Target)) |>
  left_join(tidy_data,
    by = join_by(plate, Well, `Well Position`, Sample, Omit, Target)
  ) |>
  glimpse()
[1] "/Users/dan/airport/[2023-10-18] New Combined Protocol run-through/qpcr/2023-10-24_Cov2Noro_Amplification Data_20231024_143610.csv"
[2] "/Users/dan/airport/[2023-10-18] New Combined Protocol run-through/qpcr/2023-10-24_CrA16S_Amplification Data_20231024_143657.csv"  
[3] "/Users/dan/airport/[2023-10-18] New Combined Protocol run-through/qpcr/2023-10-24_PMMoV_Amplification Data_20231024_160308.csv"   
Warning: The following named parsers don't match the column names: Cq, Treatment Group
The following named parsers don't match the column names: Cq, Treatment Group
The following named parsers don't match the column names: Cq, Treatment Group
Rows: 8,920
Columns: 38
$ Well                    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ `Well Position`         <chr> "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1"…
$ `Cycle Number`          <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,…
$ Target                  <chr> "Cov2", "Cov2", "Cov2", "Cov2", "Cov2", "Cov2"…
$ Rn                      <dbl> 0.4810576, 0.4760514, 0.4685638, 0.4619383, 0.…
$ dRn                     <dbl> 0.0255262889, 0.0208546813, 0.0137017026, 0.00…
$ Sample                  <chr> "1A", "1A", "1A", "1A", "1A", "1A", "1A", "1A"…
$ Omit                    <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALS…
$ plate                   <chr> "2023-10-24_Cov2Noro", "2023-10-24_Cov2Noro", …
$ well_row                <chr> "A", "A", "A", "A", "A", "A", "A", "A", "A", "…
$ well_col                <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "…
$ Task                    <chr> "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "U…
$ Reporter                <chr> "FAM", "FAM", "FAM", "FAM", "FAM", "FAM", "FAM…
$ Quencher                <chr> "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "NFQ-MGB", "N…
$ `Amp Status`            <chr> "AMP", "AMP", "AMP", "AMP", "AMP", "AMP", "AMP…
$ `Amp Score`             <dbl> 1.285299, 1.285299, 1.285299, 1.285299, 1.2852…
$ `Curve Quality`         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ `Result Quality Issues` <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ Cq                      <dbl> 33.47677, 33.47677, 33.47677, 33.47677, 33.476…
$ `Cq Confidence`         <dbl> 0.9706938, 0.9706938, 0.9706938, 0.9706938, 0.…
$ `Cq Mean`               <dbl> 33.66743, 33.66743, 33.66743, 33.66743, 33.667…
$ `Cq SD`                 <dbl> 0.1797476, 0.1797476, 0.1797476, 0.1797476, 0.…
$ `Auto Threshold`        <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ Threshold               <dbl> 0.09301463, 0.09301463, 0.09301463, 0.09301463…
$ `Auto Baseline`         <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
$ `Baseline Start`        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
$ `Baseline End`          <dbl> 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29, 29…
$ `Treatment Group`       <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "…
$ Dissocation             <chr> "Vortex 5m", "Vortex 5m", "Vortex 5m", "Vortex…
$ `Vortex time (min)`     <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
$ `Sonication time (min)` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ CollectionDate          <date> 2023-10-17, 2023-10-17, 2023-10-17, 2023-10-1…
$ Source                  <chr> "Sludge", "Sludge", "Sludge", "Sludge", "Sludg…
$ `Filtration time (sec)` <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10…
$ `CP Time`               <time> 03:37:00, 03:37:00, 03:37:00, 03:37:00, 03:37…
$ `Shield Added (uL)`     <dbl> 400, 400, 400, 400, 400, 400, 400, 400, 400, 4…
$ `Qubit RNA`             <dbl> 24.2, 24.2, 24.2, 24.2, 24.2, 24.2, 24.2, 24.2…
$ `Qubit DNA`             <chr> "40", "40", "40", "40", "40", "40", "40", "40"…

Quality control

tidy_data |> count(Task, is.na(Cq))
# A tibble: 4 × 3
  Task     `is.na(Cq)`     n
  <chr>    <lgl>       <int>
1 NTC      FALSE           2
2 NTC      TRUE           11
3 STANDARD FALSE          75
4 UNKNOWN  FALSE         135
tidy_data |>
  filter(Task == "NTC", !is.na(Cq)) |>
  glimpse()
Rows: 2
Columns: 35
$ Well                    <dbl> 82, 83
$ well_row                <chr> "G", "G"
$ well_col                <chr> "10", "11"
$ `Well Position`         <chr> "G10", "G11"
$ Omit                    <lgl> FALSE, FALSE
$ Sample                  <chr> NA, NA
$ Target                  <chr> "16S", "16S"
$ Task                    <chr> "NTC", "NTC"
$ Reporter                <chr> "FAM", "FAM"
$ Quencher                <chr> "NFQ-MGB", "NFQ-MGB"
$ `Amp Status`            <chr> "AMP", "AMP"
$ `Amp Score`             <dbl> 1.322672, 1.322650
$ `Curve Quality`         <lgl> NA, NA
$ `Result Quality Issues` <lgl> NA, NA
$ Cq                      <dbl> 30.11587, 30.00533
$ `Cq Confidence`         <dbl> 0.9847199, 0.9853924
$ `Cq Mean`               <dbl> 30.0606, 30.0606
$ `Cq SD`                 <dbl> 0.07816425, 0.07816425
$ `Auto Threshold`        <lgl> TRUE, TRUE
$ Threshold               <dbl> 0.2087637, 0.2087637
$ `Auto Baseline`         <lgl> TRUE, TRUE
$ `Baseline Start`        <dbl> 3, 3
$ `Baseline End`          <dbl> 24, 24
$ plate                   <chr> "2023-10-24_CrA16S", "2023-10-24_CrA16S"
$ `Treatment Group`       <chr> NA, NA
$ Dissocation             <chr> NA, NA
$ `Vortex time (min)`     <dbl> NA, NA
$ `Sonication time (min)` <dbl> NA, NA
$ CollectionDate          <date> NA, NA
$ Source                  <chr> NA, NA
$ `Filtration time (sec)` <dbl> NA, NA
$ `CP Time`               <time> NA, NA
$ `Shield Added (uL)`     <dbl> NA, NA
$ `Qubit RNA`             <dbl> NA, NA
$ `Qubit DNA`             <chr> NA, NA
amp_data |>
  filter(Task == "NTC") |>
  ggplot(aes(x = `Cycle Number`, y = dRn)) +
  geom_line(mapping = aes(
    group = Well,
  )) +
  geom_line(mapping = aes(
    x = `Cycle Number`,
    y = Threshold
  ), color = "Grey") +
  scale_y_log10(limits = c(1e-3, 1e1)) +
  facet_wrap(~ interaction(plate, Target))
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 95 rows containing missing values (`geom_line()`).

There is the usual amplification of 16S from contamination.

All the amplification curves

amp_data |>
  ggplot(aes(x = `Cycle Number`, y = dRn)) +
  geom_line(mapping = aes(
    color = Task,
    group = Well,
  )) +
  geom_line(mapping = aes(
    x = `Cycle Number`,
    y = Threshold
  ), color = "Grey") +
  scale_y_log10(limits = c(1e-3, 1e1)) +
  facet_wrap(vars(Target))
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 139 rows containing missing values (`geom_line()`).

16S unknowns only

Figure out the one outlier curve.

tidy_data |>
  filter(Target == "16S", Task == "UNKNOWN") |>
  filter(Cq == max(Cq))
# A tibble: 1 × 35
   Well well_row well_col `Well Position` Omit  Sample Target Task    Reporter
  <dbl> <chr>    <chr>    <chr>           <lgl> <chr>  <chr>  <chr>   <chr>   
1     7 A        7        A7              FALSE 1A     16S    UNKNOWN FAM     
# ℹ 26 more variables: Quencher <chr>, `Amp Status` <chr>, `Amp Score` <dbl>,
#   `Curve Quality` <lgl>, `Result Quality Issues` <lgl>, Cq <dbl>,
#   `Cq Confidence` <dbl>, `Cq Mean` <dbl>, `Cq SD` <dbl>,
#   `Auto Threshold` <lgl>, Threshold <dbl>, `Auto Baseline` <lgl>,
#   `Baseline Start` <dbl>, `Baseline End` <dbl>, plate <chr>,
#   `Treatment Group` <chr>, Dissocation <chr>, `Vortex time (min)` <dbl>,
#   `Sonication time (min)` <dbl>, CollectionDate <date>, Source <chr>, …
amp_data |>
  filter(Target == "16S", Task == "UNKNOWN") |>
  ggplot(aes(x = `Cycle Number`, y = dRn)) +
  geom_line(mapping = aes(
    color = Well == 7,
    group = Well,
  )) +
  geom_line(mapping = aes(
    x = `Cycle Number`,
    y = Threshold
  ), color = "Grey") +
  scale_y_log10(limits = c(1e-3, 1e1))
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 31 rows containing missing values (`geom_line()`).

This one well looks weird:

amp_data |>
  filter(Target == "16S", Task == "UNKNOWN") |>
  ggplot(aes(x = `Cycle Number`, y = Rn)) +
  geom_line(mapping = aes(
    color = Well == 7,
    group = Well,
  )) +
  scale_y_log10() # limits = c(1e-3, 1e1))

PMMoV only

One technical replicate is an outlier for low Cq:

amp_data |>
  filter(Target == "PMMoV", Task == "UNKNOWN", `Treatment Group` == 3) |>
  ggplot(aes(x = `Cycle Number`, y = dRn)) +
  geom_line(mapping = aes(
    color = Sample,
    group = Well,
  )) +
  geom_line(mapping = aes(
    x = `Cycle Number`,
    y = Threshold
  ), color = "Grey") +
  scale_y_log10(limits = c(1e-3, 1e1))
Warning in self$trans$transform(x): NaNs produced
Warning: Transformation introduced infinite values in continuous y-axis
Warning: Removed 45 rows containing missing values (`geom_line()`).

Amp curve looks ok

Compare methods

Using ggh4x to fix the scales to have the same spacing. Currently doing manually, but should automate.

scales <- list(
  scale_x_continuous(limits = c(21, 24)),
  scale_x_continuous(limits = c(32, 35)),
  scale_x_continuous(limits = c(21, 24)),
  scale_x_continuous(limits = c(27, 30)),
  scale_x_continuous(limits = c(22, 25))
)
tidy_data |>
  filter(Task == "UNKNOWN") |>
  ggplot(mapping = aes(
    x = Cq,
    y = Dissocation,
    color = Sample,
  )) +
  stat_summary(
    fun.min = min,
    fun.max = max,
    fun = median,
    position = position_dodge(width = 0.2),
    size = 0.2
  ) +
  facet_wrap(facets = ~Target, scales = "free_x") +
  facetted_pos_scales(x = scales)