epicontacts Class: Details Regarding the Data Structure for epicontacts Objects
The epicontacts data structure is useful for epidemiological network analysis of cases and contacts. Data partitioned as line list and contact list formats can be coerced to the epicontacts class in order to facilitate manipulation, visualization and analysis.
Using a simulated Ebola outbreak dataset from the outbreaks package, this vignette explores how to create an epicontacts object and use several generic methods to work with the data.
make_epicontacts()
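make_epicontacts() creates the epicontacts data structure. The function accepts arguments for the line list (linelist), the contact list (contacts), the columns that identify cases and contact pairs (id, from and to), and whether the contacts are directed (directed).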
Before creating an epicontacts object, it may be helpful to examine the structure of the line list and contact data. The example that follows uses the ebola_sim data loaded from the outbreaks package.
## List of 2
## $ linelist:'data.frame': 5888 obs. of 11 variables:
## ..$ case_id : chr [1:5888] "d1fafd" "53371b" "f5c3d8" "6c286a" ...
## ..$ generation : int [1:5888] 0 1 1 2 2 0 3 3 2 3 ...
## ..$ date_of_infection : Date[1:5888], format: NA "2014-04-09" ...
## ..$ date_of_onset : Date[1:5888], format: "2014-04-07" "2014-04-15" ...
## ..$ date_of_hospitalisation: Date[1:5888], format: "2014-04-17" "2014-04-20" ...
## ..$ date_of_outcome : Date[1:5888], format: "2014-04-19" NA ...
## ..$ outcome : Factor w/ 2 levels "Death","Recover": NA NA 2 1 2 NA 2 1 2 1 ...
## ..$ gender : Factor w/ 2 levels "f","m": 1 2 1 1 1 1 1 1 2 2 ...
## ..$ hospital : Factor w/ 11 levels "Connaught Hopital",..: 4 2 7 NA 7 NA 2 9 7 11 ...
## ..$ lon : num [1:5888] -13.2 -13.2 -13.2 -13.2 -13.2 ...
## ..$ lat : num [1:5888] 8.47 8.46 8.48 8.46 8.45 ...
## $ contacts:'data.frame': 3800 obs. of 3 variables:
## ..$ infector: chr [1:3800] "d1fafd" "cac51e" "f5c3d8" "0f58c4" ...
## ..$ case_id : chr [1:3800] "53371b" "f5c3d8" "0f58c4" "881bd4" ...
## ..$ source : Factor w/ 2 levels "funeral","other": 2 1 2 2 2 1 2 2 2 2 ...
ebola_sim is a list with two data frames, which contain the line list and the contacts, respectively. The line list data frame already has a unique identifier for cases in the first column, and the contacts data has the individual contacts represented in the first and second columns. Note that if the input data were not formatted as such, the id, from and to arguments allow for explicit definition of the columns that contain these attributes.
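As an illustration of the id, from and to arguments, the following is a minimal sketch that names the identifier columns explicitly instead of relying on their position. The column names are taken from the str() output above; the object name x_named is only a placeholder for this example, and the network is treated as directed, as in the rest of this vignette.

```r
library(outbreaks)
library(epicontacts)

# Inspect the raw inputs first: a list holding two data frames.
str(ebola_sim, max.level = 1)

# Name the identifier columns explicitly rather than relying on position.
# "case_id" and "infector" are the column names shown in the str() output.
x_named <- make_epicontacts(
  linelist = ebola_sim$linelist,
  contacts = ebola_sim$contacts,
  id = "case_id",     # unique case identifier in the line list
  from = "infector",  # origin of each contact
  to = "case_id",     # destination of each contact
  directed = TRUE
)

# The identifier columns are renamed on the way in.
names(x_named$linelist)[1]
names(x_named$contacts)[1:2]
```

Naming the columns is mostly useful when the identifiers are not in the first positions, since the call then no longer depends on column order.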
Assuming this network of contacts is directed, the following call to make_epicontacts() will generate an epicontacts object:
x <- make_epicontacts(linelist = ebola_sim$linelist, contacts = ebola_sim$contacts, directed = TRUE)
Use class() to confirm that make_epicontacts() worked:
## [1] "epicontacts"
At their core, epicontacts objects are list objects.
## [1] TRUE
As with other lists, the named elements of the epicontacts data structure can be easily accessed with the $ operator.
$linelist
## id generation date_of_infection date_of_onset date_of_hospitalisation
## 1 d1fafd 0 <NA> 2014-04-07 2014-04-17
## 2 53371b 1 2014-04-09 2014-04-15 2014-04-20
## 3 f5c3d8 1 2014-04-18 2014-04-21 2014-04-25
## 4 6c286a 2 <NA> 2014-04-27 2014-04-27
## 5 0f58c4 2 2014-04-22 2014-04-26 2014-04-29
## 6 49731d 0 2014-03-19 2014-04-25 2014-05-02
## date_of_outcome outcome gender hospital lon lat
## 1 2014-04-19 <NA> f Military Hospital -13.21799 8.473514
## 2 <NA> <NA> m Connaught Hospital -13.21491 8.464927
## 3 2014-04-30 Recover f other -13.22804 8.483356
## 4 2014-05-07 Death f <NA> -13.23112 8.464776
## 5 2014-05-17 Recover f other -13.21016 8.452143
## 6 2014-05-07 <NA> f <NA> -13.23443 8.468572
$contacts
## from to source
## 2 d1fafd 53371b other
## 3 cac51e f5c3d8 funeral
## 5 f5c3d8 0f58c4 other
## 8 0f58c4 881bd4 other
## 11 8508df 40ae5f other
## 12 127d83 f547d6 funeral
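The checks and element access described in this section can be reproduced with a few base R calls. The following is a minimal sketch; x is rebuilt here only so that the block runs on its own.

```r
library(outbreaks)
library(epicontacts)

x <- make_epicontacts(linelist = ebola_sim$linelist,
                      contacts = ebola_sim$contacts,
                      directed = TRUE)

class(x)     # the S3 class assigned by make_epicontacts()
is.list(x)   # the object is built on a plain list
names(x)     # the named elements available through `$`

head(x$linelist)   # first rows of the line list (nodes)
head(x$contacts)   # first rows of the contact list (edges)
```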
The epicontacts data structure enables some convenient implementations of “generic” functions in R. These functions (plot(), print(), summary(), etc.) behave differently depending on the class of the input.
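To make the idea of class-based dispatch concrete, here is a small sketch. The first call lists the methods registered for the epicontacts class; the second part defines a made-up class named "toy" (not part of any package) purely to show how a generic picks a method from the class of its input.

```r
library(epicontacts)

# List the S3 methods registered for the epicontacts class.
methods(class = "epicontacts")

# Dispatch in miniature, using a hypothetical class "toy".
obj <- structure(list(n = 3), class = "toy")

# summary() will call this method for any object of class "toy".
summary.toy <- function(object, ...) {
  paste("toy object with n =", object$n)
}

summary(obj)
```

The same mechanism is what lets print(), summary(), subset() and plot() behave differently when given an epicontacts object.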
print.epicontacts()
Entering the name of an object at the console (or calling the print() function explicitly) invokes the print method in R. For the epicontacts data structure, printing is conveniently trimmed to show how many cases (rows in the line list) and how many contacts (rows in the contact list) there are, as well as a glimpse of the first 10 rows of each data frame.
##
## /// Epidemiological Contacts //
##
## // class: epicontacts
## // 5,888 cases in linelist; 3,800 contacts; directed
##
## // linelist
##
## # A tibble: 5,888 × 11
## id generation date_of_infection date_of_onset date_of_hospitalisation
## <chr> <int> <date> <date> <date>
## 1 d1fafd 0 NA 2014-04-07 2014-04-17
## 2 53371b 1 2014-04-09 2014-04-15 2014-04-20
## 3 f5c3d8 1 2014-04-18 2014-04-21 2014-04-25
## 4 6c286a 2 NA 2014-04-27 2014-04-27
## 5 0f58c4 2 2014-04-22 2014-04-26 2014-04-29
## 6 49731d 0 2014-03-19 2014-04-25 2014-05-02
## 7 f9149b 3 NA 2014-05-03 2014-05-04
## 8 881bd4 3 2014-04-26 2014-05-01 2014-05-05
## 9 e66fa4 2 NA 2014-04-21 2014-05-06
## 10 20b688 3 NA 2014-05-05 2014-05-06
## # ℹ 5,878 more rows
## # ℹ 6 more variables: date_of_outcome <date>, outcome <fct>, gender <fct>,
## # hospital <fct>, lon <dbl>, lat <dbl>
##
## // contacts
##
## # A tibble: 3,800 × 3
## from to source
## <chr> <chr> <fct>
## 1 d1fafd 53371b other
## 2 cac51e f5c3d8 funeral
## 3 f5c3d8 0f58c4 other
## 4 0f58c4 881bd4 other
## 5 8508df 40ae5f other
## 6 127d83 f547d6 funeral
## 7 f5c3d8 d58402 other
## 8 20b688 d8a13d other
## 9 2ae019 a3c8b8 other
## 10 20b688 974bc1 other
## # ℹ 3,790 more rows
summary.epicontacts()
The summary method provides descriptive information about the dimensions of the line list and contact list and the relationship between them (i.e. how many IDs they share).
##
## /// Overview //
## // number of unique IDs in linelist: 5888
## // number of unique IDs in contacts: 5511
## // number of unique IDs in both: 4352
## // number of contacts: 3800
## // contacts with both cases in linelist: 56.868 %
##
## /// Degrees of the network //
## // in-degree summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.6895 1.0000 1.0000
##
## // out-degree summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.6895 1.0000 6.0000
##
## // in and out degree summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 1.379 2.000 7.000
##
## /// Attributes //
## // attributes in linelist:
## generation date_of_infection date_of_onset date_of_hospitalisation date_of_outcome outcome gender hospital lon lat
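The overview figures reported by the summary method can also be recomputed by hand, which may help clarify what each count means. This is a rough sketch using base R only; it is not how summary() is implemented internally, and the helper names (ids_linelist, ids_contacts, both_known) are chosen for this example.

```r
library(outbreaks)
library(epicontacts)

x <- make_epicontacts(linelist = ebola_sim$linelist,
                      contacts = ebola_sim$contacts,
                      directed = TRUE)

ids_linelist <- unique(x$linelist$id)
ids_contacts <- unique(c(x$contacts$from, x$contacts$to))

length(ids_linelist)                           # unique IDs in the line list
length(ids_contacts)                           # unique IDs in the contact list
length(intersect(ids_linelist, ids_contacts))  # IDs present in both
nrow(x$contacts)                               # number of contacts

# Share of contacts whose two ends both appear in the line list.
both_known <- x$contacts$from %in% ids_linelist &
  x$contacts$to %in% ids_linelist
mean(both_known)
```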
subset.epicontacts()
With this method, one can reduce the size of the epicontacts object by filtering rows based on explicit values in the line list (node) and contact list (edge) components. For more on how to parameterize the subset, see ?subset.epicontacts.
Note that this function returns an epicontacts object, which can in turn be passed to another generic method.
rokupafuneral <- subset(x,
                        node_attribute = list("hospital" = "Rokupa Hospital"),
                        edge_attribute = list("source" = "funeral"))
##
## /// Overview //
## // number of unique IDs in linelist: 443
## // number of unique IDs in contacts: 1019
## // number of unique IDs in both: 45
## // number of contacts: 572
## // contacts with both cases in linelist: 0 %
##
## /// Degrees of the network //
## // in-degree summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 1.0000 0.5613 1.0000 1.0000
##
## // out-degree summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.5613 1.0000 4.0000
##
## // in and out degree summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 1.123 1.000 4.000
##
## /// Attributes //
## // attributes in linelist:
## generation date_of_infection date_of_onset date_of_hospitalisation date_of_outcome outcome gender hospital lon lat
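Because the subset is itself an epicontacts object, it can be handed straight to the other generics. The sketch below repeats the subset call from above so that it runs on its own, then passes the result to class() and summary(); the table() call is just one way to check which contact sources remain.

```r
library(outbreaks)
library(epicontacts)

x <- make_epicontacts(linelist = ebola_sim$linelist,
                      contacts = ebola_sim$contacts,
                      directed = TRUE)

# Same subset as in the text: one hospital, funeral contacts only.
rokupafuneral <- subset(x,
                        node_attribute = list("hospital" = "Rokupa Hospital"),
                        edge_attribute = list("source" = "funeral"))

class(rokupafuneral)     # still an epicontacts object
summary(rokupafuneral)   # so the summary method applies to it

# Inspect which contact sources are left after filtering.
table(rokupafuneral$contacts$source)
```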
plot.epicontacts()
By default, passing an epicontacts object into the plot function is effectively the same as using vis_epicontacts(), and will generate an interactive visualization of the network of cases and contacts. Note that this method includes a number of options to customize the plot. For more, see ?vis_epicontacts.
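As a minimal sketch of the plot method, the example below plots the smaller subset built earlier rather than the full network, on the assumption that a few hundred nodes render more comfortably than several thousand. Only default options are used; the name p is a placeholder for this example.

```r
library(outbreaks)
library(epicontacts)

x <- make_epicontacts(linelist = ebola_sim$linelist,
                      contacts = ebola_sim$contacts,
                      directed = TRUE)

rokupafuneral <- subset(x,
                        node_attribute = list("hospital" = "Rokupa Hospital"),
                        edge_attribute = list("source" = "funeral"))

# Build the interactive network with default settings.
# Print p (or call plot() directly) in an interactive session to display it.
p <- plot(rokupafuneral)
class(p)
```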