Class 18: Pertussis and the CMI-PB project

Cecilia Wang (A18625854)

Background

Pertussis (whopping cough) is a common lung infection caused by the bacteria B.Pertussis. It can infect everyone but is most deadly for infants (under 1 year of age)

CDC tracking data

The CDC tracks the number of Pertussis cases:

library(ggplot2)

plot for Year vs.Number of Cases

ggplot(cdc) +
  aes(Year,Cases) +
  geom_point() +
  geom_line()+ theme_light()

Q. Using the ggplot geom_vline() function add lines to your previous plot for the 1946 introduction of the wP vaccine and the 1996 switch to aP vaccine (see example in the hint below).

ggplot(cdc) +
  aes(Year,Cases) +
  geom_point() +
  geom_line()+
  geom_vline(xintercept = 1946, col="blue",lty=2)+
  geom_vline(xintercept = 1996, col="red",lty=2)+
  geom_vline(xintercept = 2020, col="purple",lty=2)+
  theme_light()

After introduce the wP vaccine, the number of cases per year significantly decreased. However, after introduce aP vaccine, the number of cases per year slightly increase than before. This could due to multiple reasons: less newborn getting vaccine; the aP vaccine is less effective than wP vaccine due to the fewer antigens; waning immunity (immunity from aP vaccine decline faster than wP vaccine).

Exporing CMI-PB Data

The CMI-PB project’s https://www.cmi-pb.org/ mission is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of Pertussis booster vaccination.

Basically, make a available large dataset on immune response to Pertussis. They use a booster vaccination as a proxy for Pertussis infection.

They make their data available as JSON format API. We can read this into R with the read_json() function from the jsonlite package.

library(jsonlite)
subject <- read_json("https://www.cmi-pb.org/api/subject", simplifyVector = TRUE) 
head(subject, 3)

  subject_id infancy_vac biological_sex              ethnicity  race
        1          wP         Female Not Hispanic or Latino White
        2          wP         Female Not Hispanic or Latino White
        3          wP         Female                Unknown White
  year_of_birth date_of_boost      dataset
  1986-01-01    2016-09-12 2020_dataset
  1968-01-01    2019-01-28 2020_dataset
  1983-01-01    2016-10-10 2020_dataset

Q. How many aP and wP individuals are there in this subject table?

Q. How many male and female are there?

Q. What is the break down of gender and race?

table(subject$infancy_vac)

aP wP 
87 85 

table(subject$biological_sex)

Female   Male 
   112     60 

table(subject$race, subject$biological_sex) 

                                            Female Male
  American Indian/Alaska Native                  0    1
  Asian                                         32   12
  Black or African American                      2    3
  More Than One Race                            15    4
  Native Hawaiian or Other Pacific Islander      1    1
  Unknown or Not Reported                       14    7
  White                                         48   32

specimen <- read_json("http://cmi-pb.org/api/v5_1/specimen", simplifyVector = TRUE)
ab_titer <- read_json("http://cmi-pb.org/api/v5_1/plasma_ab_titer", simplifyVector = TRUE)

head(specimen,3)

  specimen_id subject_id actual_day_relative_to_boost
         1          1                           -3
         2          1                            1
         3          1                            3
  planned_day_relative_to_boost specimen_type visit
                           0         Blood     1
                           1         Blood     2
                           3         Blood     3

head(ab_titer,3)

  specimen_id isotype is_antigen_specific antigen        MFI MFI_normalised
         1     IgE               FALSE   Total 1110.21154       2.493425
         1     IgE               FALSE   Total 2708.91616       2.493425
         1     IgG                TRUE      PT   68.56614       3.736992
   unit lower_limit_of_detection
UG/ML                 2.096133
IU/ML                29.170000
IU/ML                 0.530000

To make sense of all this data we need to “join” (a.k.a. “merge” or “link”) all these tables together. Only then will you know that a given Ab measurement (from the ab_titier table ) was collected on a certain data (from the spicement table ) from a certain wP or aP subject ( the subject table).

We can use dplyr and the *_join() family of functions to do this.

library(dplyr)

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Join specimen and subject tables to make a new merged data frame

meta <- inner_join(subject,specimen)

Joining with `by = join_by(subject_id)`

head(meta)

  subject_id infancy_vac biological_sex              ethnicity  race
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
  year_of_birth date_of_boost      dataset specimen_id
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           2
  1986-01-01    2016-09-12 2020_dataset           3
  1986-01-01    2016-09-12 2020_dataset           4
  1986-01-01    2016-09-12 2020_dataset           5
  1986-01-01    2016-09-12 2020_dataset           6
  actual_day_relative_to_boost planned_day_relative_to_boost specimen_type
                         -3                             0         Blood
                          1                             1         Blood
                          3                             3         Blood
                          7                             7         Blood
                         11                            14         Blood
                         32                            30         Blood
  visit
   1
   2
   3
   4
   5
   6

One more inner_join() to join ab_titer with our metadata.

abdata <- inner_join(meta,ab_titer)

Joining with `by = join_by(specimen_id)`

head(abdata)

  subject_id infancy_vac biological_sex              ethnicity  race
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
  year_of_birth date_of_boost      dataset specimen_id
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           1
  actual_day_relative_to_boost planned_day_relative_to_boost specimen_type
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
  visit isotype is_antigen_specific antigen        MFI MFI_normalised  unit
   1     IgE               FALSE   Total 1110.21154       2.493425 UG/ML
   1     IgE               FALSE   Total 2708.91616       2.493425 IU/ML
   1     IgG                TRUE      PT   68.56614       3.736992 IU/ML
   1     IgG                TRUE     PRN  332.12718       2.602350 IU/ML
   1     IgG                TRUE     FHA 1887.12263      34.050956 IU/ML
   1     IgE                TRUE     ACT    0.10000       1.000000 IU/ML
  lower_limit_of_detection
               2.096133
              29.170000
               0.530000
               6.205949
               4.679535
               2.816431

Q. How many different Ab “isotypes” are in this dataset?
Q. How many different “antigen” values are measured?

table(abdata$isotype)

  IgE   IgG  IgG1  IgG2  IgG3  IgG4 
 6698  7265 11993 12000 12000 12000 

table(abdata$antigen)

    ACT   BETV1      DT   FELD1     FHA  FIM2/3   LOLP1     LOS Measles     OVA 
   1970    1970    6318    1970    6712    6318    1970    1970    1970    6318 
    PD1     PRN      PT     PTM   Total      TT 
   1970    6712    6712    1970     788    6318 

Let’s focus on IgG isotype

igg <- abdata |> 
  filter(isotype=="IgG")
head(igg)

  subject_id infancy_vac biological_sex              ethnicity  race
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
        1          wP         Female Not Hispanic or Latino White
  year_of_birth date_of_boost      dataset specimen_id
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           1
  1986-01-01    2016-09-12 2020_dataset           2
  1986-01-01    2016-09-12 2020_dataset           2
  1986-01-01    2016-09-12 2020_dataset           2
  actual_day_relative_to_boost planned_day_relative_to_boost specimen_type
                         -3                             0         Blood
                         -3                             0         Blood
                         -3                             0         Blood
                          1                             1         Blood
                          1                             1         Blood
                          1                             1         Blood
  visit isotype is_antigen_specific antigen        MFI MFI_normalised  unit
   1     IgG                TRUE      PT   68.56614       3.736992 IU/ML
   1     IgG                TRUE     PRN  332.12718       2.602350 IU/ML
   1     IgG                TRUE     FHA 1887.12263      34.050956 IU/ML
   2     IgG                TRUE      PT   41.38442       2.255534 IU/ML
   2     IgG                TRUE     PRN  174.89761       1.370393 IU/ML
   2     IgG                TRUE     FHA  246.00957       4.438960 IU/ML
  lower_limit_of_detection
               0.530000
               6.205949
               4.679535
               0.530000
               6.205949
               4.679535

Make a plot of MFI-normalized values for all antigen values.

ggplot(igg) +
  aes(MFI_normalised, antigen) +
  geom_boxplot() 

The antigens “PT”, “PRN”, FIM2/3”, and “FHA” appear to have the widest range of values.

Q. Is there a differnce for these responses between aP and wP individuals?

ggplot(igg) +
  aes(MFI_normalised, antigen, col=infancy_vac) +
  geom_boxplot() 

ggplot(igg) +
  aes(MFI_normalised, antigen) +
  geom_boxplot() +
  facet_wrap(~infancy_vac)

Q. Is there a difference with time (i.e. before booster shot vs after booster shot)

ggplot(igg) +
  aes(MFI_normalised, antigen, col=infancy_vac) +
  geom_boxplot() +
  facet_wrap(~visit)

abdata.21 <- abdata %>% filter(dataset == "2021_dataset")

abdata.21 %>% 
  filter(isotype == "IgG",  antigen == "PT") %>%
  ggplot() +
    aes(x=planned_day_relative_to_boost,
        y=MFI_normalised,
        col=infancy_vac,
        group=subject_id) +
    geom_point() +
    geom_line() +
    geom_vline(xintercept=0, linetype="dashed") +
    geom_vline(xintercept=14, linetype="dashed") +
  labs(title="2021 dataset IgG PT",
       subtitle = "Dashed lines indicate day 0 (pre-boost) and 14 (apparent peak levels)") +
  geom_smooth(aes(group = infancy_vac, color = infancy_vac), se = FALSE)

`geom_smooth()` using method = 'loess' and formula = 'y ~ x'