GGplot 3: Advanced charts and graphs

Morley J Weston

This week, we’re going to be working on one larger project: re-creating and improving a spiral-shaped plot of COVID-19 cases, published in the New York Times several years ago.

The original graphic.

There were several criticisms of the chart.

  1. The x axis is longer for each year: the line for 2020 is shorter than the one for 2021, making it hard to compare infection rates over time.
  2. The spiral is off-center on the page, making the whole thing seem a bit janky.
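The first criticism can be checked numerically. The sketch below assumes the chart follows a simple Archimedean spiral (radius growing linearly with angle); the constants `a` and `b` and the helper `turn_length()` are made up for illustration and are not taken from the original graphic. It approximates the length of each full turn and shows that later turns are longer, so a day in a later year takes up more line than a day in an earlier one.

```r
# Archimedean spiral r = a + b * theta (a and b are arbitrary illustration values)
a <- 1
b <- 0.5 / (2 * pi)  # the radius grows by 0.5 per full turn

# Hypothetical helper: approximate the arc length of one full turn
# by summing many short straight segments along the curve
turn_length <- function(turn, n = 10000) {
  theta <- seq(2 * pi * (turn - 1), 2 * pi * turn, length.out = n)
  r <- a + b * theta
  x <- r * cos(theta)
  y <- r * sin(theta)
  sum(sqrt(diff(x)^2 + diff(y)^2))
}

# Length of the first, second and third turn ("year") of the spiral
lengths_by_turn <- sapply(1:3, turn_length)
lengths_by_turn
```

Each value in `lengths_by_turn` is larger than the one before it, which is the distortion the critics pointed out.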

While there have been several remixes of the original, today we’ll focus on recreating a version of the spiral chart.

This will involve integrating our data processing into the chart, as well as doing some data transformations to make ggplot do some things that aren’t “out of the box”.
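As a rough idea of what "integrating data processing into the chart" can look like, here is a minimal sketch on made-up data. The data frame `toy`, its `value` column and the derived columns are all hypothetical, and this is not necessarily the approach the rest of this tutorial takes. It pipes a `mutate()` step straight into `ggplot()` and uses `coord_polar()` to bend the x axis into a circle.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per day for 2020 and 2021
toy <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
)
toy$value <- seq_len(nrow(toy))

# The data processing (mutate) feeds directly into the plotting call
toy_plot <- toy |>
  mutate(
    day_of_year = as.integer(format(date, "%j")),  # 1 to 365 (or 366)
    year = format(date, "%Y")
  ) |>
  ggplot(aes(x = day_of_year, y = value, group = year)) +
  geom_line() +
  coord_polar()  # wrap the x axis around a circle
```

Printing `toy_plot` draws one ring-like line per year. Because the transformation lives in the same pipeline as the plot, changing the processing step immediately changes the chart.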

Getting the data

To start, we’ll use the COVID19 package to get some data on COVID-19 cases. You can find better, more up-to-date data elsewhere, but the data set gets really big and unwieldy.

R
install.packages("COVID19")

The package contains one function, covid19(), which returns a data frame with COVID-19 data. Let’s load the data and see what we’re working with.

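R
# tidyverse provides dplyr (filter, select, glimpse) and ggplot2
library(tidyverse)
library(COVID19)

# Download the full data set (every country); this needs an internet connection
covid_data <- covid19()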
R
covid_data |> glimpse()
Rows: 287,783
Columns: 47
$ id                                  <chr> "8320791a", "8320791a", "8320791a"…
$ date                                <date> 2020-03-13, 2020-03-14, 2020-03-1…
$ confirmed                           <int> 2, 2, 2, 2, 8, 9, 9, 10, 10, 13, 1…
$ deaths                              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ recovered                           <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ tests                               <int64> NA, NA, NA, NA, NA, NA, NA, NA, …
$ vaccines                            <int64> NA, NA, NA, NA, NA, NA, NA, NA, …
$ people_vaccinated                   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ people_fully_vaccinated             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ hosp                                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ icu                                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ vent                                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ school_closing                      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ workplace_closing                   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ cancel_events                       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ gatherings_restrictions             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ transport_closing                   <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ stay_home_restrictions              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ internal_movement_restrictions      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ international_movement_restrictions <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ information_campaigns               <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ testing_policy                      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ contact_tracing                     <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ facial_coverings                    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ vaccination_policy                  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ elderly_people_protection           <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ government_response_index           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ stringency_index                    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ containment_health_index            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ economic_support_index              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ administrative_area_level           <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ administrative_area_level_1         <chr> "Grand Princess", "Grand Princess"…
$ administrative_area_level_2         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ administrative_area_level_3         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ latitude                            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ longitude                           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ population                          <int> 3533, 3533, 3533, 3533, 3533, 3533…
$ iso_alpha_3                         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ iso_alpha_2                         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ iso_numeric                         <int> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ iso_currency                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ key_local                           <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ key_google_mobility                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ key_apple_mobility                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ key_jhu_csse                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ key_nuts                            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ key_gadm                            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA…
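If the full download feels too heavy, `covid19()` also accepts `country` and `level` arguments, so the request can be narrowed up front. The sketch below wraps that call in a helper; `get_swiss_covid()` is a hypothetical name introduced here, not something the package provides, and the tutorial itself continues with the full data set.

```r
library(COVID19)

# Hypothetical helper (not part of the COVID19 package): request only
# Switzerland at the country level (level = 1) instead of every country.
get_swiss_covid <- function() {
  covid19(country = "Switzerland", level = 1, verbose = FALSE)
}
```

Calling `ch_raw <- get_swiss_covid()` needs an internet connection and should return the same columns as above, restricted to Swiss rows.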

Looking back at the chart we’re trying to recreate, the only data we really need is the date and the number of confirmed cases. This data set lists every country, so we’ll filter it down to just one (I’m going to go with Switzerland) and then select only those two columns.

R
# Keep only the Swiss rows, then only the two columns the chart needs
ch_covid_data <- covid_data |>
  filter(administrative_area_level_1 == "Switzerland") |>
  select(date, confirmed)

ch_covid_data
           date confirmed
1    2020-01-22        NA
2    2020-01-23        NA
3    2020-01-24        NA
4    2020-01-25        NA
5    2020-01-26        NA
6    2020-01-27        NA
7    2020-01-28        NA
8    2020-01-29        NA
9    2020-01-30        NA
10   2020-01-31        NA
11   2020-02-01        NA
12   2020-02-02        NA
13   2020-02-03        NA
14   2020-02-04        NA
15   2020-02-05        NA
16   2020-02-06        NA
17   2020-02-07        NA
18   2020-02-08        NA
19   2020-02-09        NA
20   2020-02-10        NA
21   2020-02-11        NA
22   2020-02-12        NA
23   2020-02-13        NA
24   2020-02-14        NA
25   2020-02-15        NA
26   2020-02-16        NA
27   2020-02-17        NA
28   2020-02-18        NA
29   2020-02-19        NA
30   2020-02-20        NA
31   2020-02-21        NA
32   2020-02-22        NA
33   2020-02-23        NA
34   2020-02-24         1
35   2020-02-25         2
36   2020-02-26        12
37   2020-02-27        22
38   2020-02-28        32
39   2020-02-29        45
40   2020-03-01        57
41   2020-03-02        87
42   2020-03-03       120
43   2020-03-04       181
44   2020-03-05       243
45   2020-03-06       316
46   2020-03-07       365
47   2020-03-08       434
48   2020-03-09       625
49   2020-03-10       835
50   2020-03-11      1168
51   2020-03-12      1525
52   2020-03-13      1956
53   2020-03-14      2373
54   2020-03-15      2699
55   2020-03-16      3759
56   2020-03-17      4841
57   2020-03-18      6047
58   2020-03-19      6881
59   2020-03-20      8020
60   2020-03-21      8710
61   2020-03-22      9257
62   2020-03-23     10719
63   2020-03-24     11961
64   2020-03-25     13029
65   2020-03-26     14143
66   2020-03-27     15448
67   2020-03-28     16170
68   2020-03-29     16602
69   2020-03-30     17909
70   2020-03-31     19041
71   2020-04-01     20054
72   2020-04-02     20931
73   2020-04-03     21859
74   2020-04-04     22345
75   2020-04-05     22626
76   2020-04-06     23554
77   2020-04-07     24267
78   2020-04-08     24879
79   2020-04-09     25439
80   2020-04-10     25749
81   2020-04-11     26007
82   2020-04-12     26226
83   2020-04-13     26471
84   2020-04-14     26895
85   2020-04-15     27222
86   2020-04-16     27543
87   2020-04-17     27832
88   2020-04-18     27977
89   2020-04-19     28063
90   2020-04-20     28340
91   2020-04-21     28553
92   2020-04-22     28708
93   2020-04-23     28915
94   2020-04-24     29082
95   2020-04-25     29169
96   2020-04-26     29230
97   2020-04-27     29403
98   2020-04-28     29557
99   2020-04-29     29677
100  2020-04-30     29778
101  2020-05-01     29861
102  2020-05-02     29908
103  2020-05-03     29930
104  2020-05-04     30019
105  2020-05-05     30092
106  2020-05-06     30171
107  2020-05-07     30217
108  2020-05-08     30276
109  2020-05-09     30315
110  2020-05-10     30330
111  2020-05-11     30377
112  2020-05-12     30419
113  2020-05-13     30456
114  2020-05-14     30493
115  2020-05-15     30534
116  2020-05-16     30549
117  2020-05-17     30561
118  2020-05-18     30595
119  2020-05-19     30631
120  2020-05-20     30660
121  2020-05-21     30671
122  2020-05-22     30691
123  2020-05-23     30704
124  2020-05-24     30714
125  2020-05-25     30729
126  2020-05-26     30747
127  2020-05-27     30772
128  2020-05-28     30802
129  2020-05-29     30823
130  2020-05-30     30833
131  2020-05-31     30840
132  2020-06-01     30843
133  2020-06-02     30865
134  2020-06-03     30888
135  2020-06-04     30905
136  2020-06-05     30920
137  2020-06-06     30930
138  2020-06-07     30938
139  2020-06-08     30963
140  2020-06-09     30974
141  2020-06-10     30998
142  2020-06-11     31023
143  2020-06-12     31057
144  2020-06-13     31066
145  2020-06-14     31078
146  2020-06-15     31110
147  2020-06-16     31131
148  2020-06-17     31149
149  2020-06-18     31167
150  2020-06-19     31204
151  2020-06-20     31236
152  2020-06-21     31249
153  2020-06-22     31305
154  2020-06-23     31342
155  2020-06-24     31406
156  2020-06-25     31455
157  2020-06-26     31548
158  2020-06-27     31584
159  2020-06-28     31608
160  2020-06-29     31769
161  2020-06-30     31908
162  2020-07-01     32005
163  2020-07-02     32132
164  2020-07-03     32228
165  2020-07-04     32254
166  2020-07-05     32290
167  2020-07-06     32396
168  2020-07-07     32503
169  2020-07-08     32600
170  2020-07-09     32697
171  2020-07-10     32808
172  2020-07-11     32879
173  2020-07-12     32926
174  2020-07-13     33046
175  2020-07-14     33175
176  2020-07-15     33298
177  2020-07-16     33404
178  2020-07-17     33519
179  2020-07-18     33599
180  2020-07-19     33641
181  2020-07-20     33768
182  2020-07-21     33906
183  2020-07-22     34023
184  2020-07-23     34166
185  2020-07-24     34313
186  2020-07-25     34408
187  2020-07-26     34465
188  2020-07-27     34658
189  2020-07-28     34847
190  2020-07-29     35067
191  2020-07-30     35279
192  2020-07-31     35443
193  2020-08-01     35540
194  2020-08-02     35620
195  2020-08-03     35816
196  2020-08-04     35997
197  2020-08-05     36149
198  2020-08-06     36305
199  2020-08-07     36490
200  2020-08-08     36620
201  2020-08-09     36720
202  2020-08-10     37001
203  2020-08-11     37256
204  2020-08-12     37502
205  2020-08-13     37776
206  2020-08-14     38008
207  2020-08-15     38157
208  2020-08-16     38294
209  2020-08-17     38579
210  2020-08-18     38873
211  2020-08-19     39158
212  2020-08-20     39507
213  2020-08-21     39821
214  2020-08-22     40013
215  2020-08-23     40149
216  2020-08-24     40453
217  2020-08-25     40791
218  2020-08-26     41153
219  2020-08-27     41505
220  2020-08-28     41882
221  2020-08-29     42097
222  2020-08-30     42241
223  2020-08-31     42588
224  2020-09-01     42952
225  2020-09-02     43369
226  2020-09-03     43807
227  2020-09-04     44234
228  2020-09-05     44529
229  2020-09-06     44699
230  2020-09-07     45166
231  2020-09-08     45553
232  2020-09-09     46015
233  2020-09-10     46522
234  2020-09-11     47055
235  2020-09-12     47321
236  2020-09-13     47559
237  2020-09-14     48063
238  2020-09-15     48520
239  2020-09-16     49036
240  2020-09-17     49535
241  2020-09-18     49980
242  2020-09-19     50264
243  2020-09-20     50464
244  2020-09-21     50859
245  2020-09-22     51285
246  2020-09-23     51674
247  2020-09-24     51984
248  2020-09-25     52343
249  2020-09-26     52565
250  2020-09-27     52685
251  2020-09-28     53054
252  2020-09-29     53503
253  2020-09-30     54043
254  2020-10-01     54645
255  2020-10-02     55322
256  2020-10-03     55834
257  2020-10-04     56146
258  2020-10-05     57165
259  2020-10-06     58263
260  2020-10-07     59640
261  2020-10-08     61113
262  2020-10-09     62836
263  2020-10-10     64088
264  2020-10-11     64899
265  2020-10-12     67457
266  2020-10-13     70145
267  2020-10-14     73143
268  2020-10-15     76537
269  2020-10-16     80217
270  2020-10-17     82596
271  2020-10-18     84380
272  2020-10-19     89778
273  2020-10-20     95456
274  2020-10-21    101912
275  2020-10-22    108684
276  2020-10-23    116186
277  2020-10-24    120848
278  2020-10-25    124058
279  2020-10-26    134185
280  2020-10-27    143929
281  2020-10-28    153208
282  2020-10-29    162182
283  2020-10-30    171125
284  2020-10-31    176253
285  2020-11-01    179879
286  2020-11-02    190396
287  2020-11-03    199993
288  2020-11-04    208705
289  2020-11-05    216532
290  2020-11-06    223908
291  2020-11-07    228095
292  2020-11-08    231074
293  2020-11-09    239753
294  2020-11-10    247142
295  2020-11-11    253501
296  2020-11-12    259464
297  2020-11-13    265145
298  2020-11-14    268349
299  2020-11-15    270664
300  2020-11-16    277242
301  2020-11-17    282658
302  2020-11-18    287513
303  2020-11-19    292003
304  2020-11-20    296331
305  2020-11-21    298904
306  2020-11-22    300776
307  2020-11-23    306210
308  2020-11-24    310643
309  2020-11-25    314936
310  2020-11-26    318885
311  2020-11-27    322952
312  2020-11-28    325262
313  2020-11-29    326953
314  2020-11-30    332220
315  2020-12-01    336472
316  2020-12-02    340895
317  2020-12-03    345214
318  2020-12-04    349773
319  2020-12-05    352291
320  2020-12-06    354230
321  2020-12-07    360057
322  2020-12-08    364726
323  2020-12-09    369966
324  2020-12-10    374603
325  2020-12-11    379281
326  2020-12-12    382144
327  2020-12-13    384230
328  2020-12-14    390145
329  2020-12-15    395311
330  2020-12-16    400129
331  2020-12-17    404515
332  2020-12-18    408857
333  2020-12-19    411534
334  2020-12-20    413489
335  2020-12-21    419112
336  2020-12-22    423718
337  2020-12-23    428193
338  2020-12-24    431241
339  2020-12-25    432962
340  2020-12-26    435319
341  2020-12-27    437435
342  2020-12-28    442470
343  2020-12-29    446933
344  2020-12-30    451142
345  2020-12-31    454197
346  2021-01-01    455881
347  2021-01-02    458608
348  2021-01-03    460833
349  2021-01-04    466120
350  2021-01-05    470298
351  2021-01-06    473652
352  2021-01-07    476829
353  2021-01-08    479884
354  2021-01-09    481570
355  2021-01-10    482866
356  2021-01-11    486333
357  2021-01-12    488944
358  2021-01-13    491435
359  2021-01-14    493674
360  2021-01-15    495832
361  2021-01-16    497169
362  2021-01-17    498063
363  2021-01-18    501006
364  2021-01-19    503340
365  2021-01-20    505618
366  2021-01-21    507675
367  2021-01-22    509716
368  2021-01-23    510817
369  2021-01-24    511548
370  2021-01-25    514040
371  2021-01-26    515878
372  2021-01-27    517753
373  2021-01-28    519534
374  2021-01-29    521301
375  2021-01-30    522321
376  2021-01-31    523064
377  2021-02-01    525229
378  2021-02-02    526881
379  2021-02-03    528614
380  2021-02-04    530055
381  2021-02-05    531561
382  2021-02-06    532448
383  2021-02-07    533121
384  2021-02-08    534872
385  2021-02-09    536222
386  2021-02-10    537481
387  2021-02-11    538651
388  2021-02-12    539781
389  2021-02-13    540500
390  2021-02-14    540985
391  2021-02-15    542290
392  2021-02-16    543427
393  2021-02-17    544642
394  2021-02-18    545651
395  2021-02-19    546745
396  2021-02-20    547436
397  2021-02-21    547964
398  2021-02-22    549371
399  2021-02-23    550628
400  2021-02-24    551705
401  2021-02-25    552824
402  2021-02-26    553904
403  2021-02-27    554627
404  2021-02-28    555174
405  2021-03-01    556593
406  2021-03-02    557823
407  2021-03-03    559063
408  2021-03-04    560128
409  2021-03-05    561407
410  2021-03-06    562173
411  2021-03-07    562755
412  2021-03-08    564457
413  2021-03-09    565845
414  2021-03-10    567230
415  2021-03-11    568570
416  2021-03-12    569912
417  2021-03-13    570835
418  2021-03-14    571538
419  2021-03-15    573461
420  2021-03-16    575115
421  2021-03-17    576940
422  2021-03-18    578587
423  2021-03-19    580132
424  2021-03-20    581244
425  2021-03-21    582032
426  2021-03-22    584295
427  2021-03-23    586289
428  2021-03-24    588220
429  2021-03-25    590362
430  2021-03-26    592361
431  2021-03-27    593568
432  2021-03-28    594562
433  2021-03-29    597094
434  2021-03-30    599187
435  2021-03-31    601319
436  2021-04-01    603304
437  2021-04-02    604509
438  2021-04-03    605826
439  2021-04-04    606848
440  2021-04-05    608138
441  2021-04-06    610828
442  2021-04-07    613215
443  2021-04-08    615705
444  2021-04-09    618121
445  2021-04-10    619803
446  2021-04-11    620984
447  2021-04-12    623726
448  2021-04-13    626101
449  2021-04-14    628315
450  2021-04-15    630462
451  2021-04-16    632583
452  2021-04-17    634013
453  2021-04-18    635061
454  2021-04-19    637805
455  2021-04-20    640196
456  2021-04-21    642505
457  2021-04-22    644770
458  2021-04-23    646935
459  2021-04-24    648390
460  2021-04-25    649578
461  2021-04-26    652019
462  2021-04-27    654057
463  2021-04-28    656066
464  2021-04-29    657777
465  2021-04-30    659454
466  2021-05-01    660613
467  2021-05-02    661445
468  2021-05-03    663462
469  2021-05-04    665197
470  2021-05-05    666845
471  2021-05-06    668329
472  2021-05-07    669902
473  2021-05-08    670963
474  2021-05-09    671742
475  2021-05-10    673461
476  2021-05-11    674844
477  2021-05-12    676285
478  2021-05-13    676957
479  2021-05-14    678126
480  2021-05-15    678928
481  2021-05-16    679581
482  2021-05-17    681252
483  2021-05-18    682596
484  2021-05-19    683808
485  2021-05-20    684848
486  2021-05-21    685832
487  2021-05-22    686455
488  2021-05-23    686925
489  2021-05-24    687506
490  2021-05-25    688654
491  2021-05-26    689577
492  2021-05-27    690426
493  2021-05-28    691177
494  2021-05-29    691677
495  2021-05-30    692034
496  2021-05-31    692888
497  2021-06-01    693540
498  2021-06-02    694091
499  2021-06-03    694559
500  2021-06-04    695050
501  2021-06-05    695363
502  2021-06-06    695550
503  2021-06-07    696134
504  2021-06-08    696551
505  2021-06-09    696926
506  2021-06-10    697230
507  2021-06-11    697548
508  2021-06-12    697742
509  2021-06-13    697863
510  2021-06-14    698191
511  2021-06-15    698424
512  2021-06-16    698615
513  2021-06-17    698771
514  2021-06-18    698920
515  2021-06-19    699023
516  2021-06-20    699101
517  2021-06-21    699266
518  2021-06-22    699388
519  2021-06-23    699515
520  2021-06-24    699628
521  2021-06-25    699726
522  2021-06-26    699786
523  2021-06-27    699845
524  2021-06-28    699995
525  2021-06-29    700110
526  2021-06-30    700269
527  2021-07-01    700438
528  2021-07-02    700603
529  2021-07-03    700732
530  2021-07-04    700871
531  2021-07-05    701181
532  2021-07-06    701470
533  2021-07-07    701790
534  2021-07-08    702106
535  2021-07-09    702433
536  2021-07-10    702688
537  2021-07-11    702918
538  2021-07-12    703489
539  2021-07-13    704009
540  2021-07-14    704599
541  2021-07-15    705222
542  2021-07-16    705867
543  2021-07-17    706321
544  2021-07-18    706716
545  2021-07-19    707573
546  2021-07-20    708310
547  2021-07-21    709184
548  2021-07-22    709908
549  2021-07-23    710651
550  2021-07-24    711189
551  2021-07-25    711669
552  2021-07-26    712620
553  2021-07-27    713360
554  2021-07-28    714145
555  2021-07-29    714927
556  2021-07-30    715730
557  2021-07-31    716386
558  2021-08-01    716921
559  2021-08-02    718155
560  2021-08-03    719145
561  2021-08-04    720215
562  2021-08-05    721335
563  2021-08-06    722488
564  2021-08-07    723534
565  2021-08-08    724543
566  2021-08-09    726863
567  2021-08-10    728893
568  2021-08-11    730978
569  2021-08-12    733145
570  2021-08-13    735353
571  2021-08-14    737032
572  2021-08-15    738683
573  2021-08-16    742634
574  2021-08-17    745472
575  2021-08-18    748182
576  2021-08-19    750694
577  2021-08-20    753254
578  2021-08-21    755002
579  2021-08-22    756635
580  2021-08-23    760486
581  2021-08-24    763402
582  2021-08-25    766167
583  2021-08-26    768861
584  2021-08-27    771457
585  2021-08-28    773134
586  2021-08-29    774519
587  2021-08-30    778081
588  2021-08-31    781055
589  2021-09-01    783997
590  2021-09-02    787048
591  2021-09-03    790123
592  2021-09-04    791883
593  2021-09-05    793282
594  2021-09-06    797277
595  2021-09-07    800321
596  2021-09-08    803381
597  2021-09-09    806150
598  2021-09-10    808772
599  2021-09-11    810323
600  2021-09-12    811429
601  2021-09-13    814193
602  2021-09-14    816489
603  2021-09-15    818731
604  2021-09-16    820729
605  2021-09-17    822420
606  2021-09-18    823447
607  2021-09-19    824149
608  2021-09-20    826029
609  2021-09-21    827732
610  2021-09-22    829263
611  2021-09-23    830641
612  2021-09-24    831984
613  2021-09-25    832870
614  2021-09-26    833477
615  2021-09-27    835138
616  2021-09-28    836346
617  2021-09-29    837533
618  2021-09-30    838727
619  2021-10-01    839865
620  2021-10-02    840651
621  2021-10-03    841124
622  2021-10-04    842403
623  2021-10-05    843437
624  2021-10-06    844446
625  2021-10-07    845456
626  2021-10-08    846376
627  2021-10-09    847014
628  2021-10-10    847495
629  2021-10-11    848667
630  2021-10-12    849631
631  2021-10-13    850594
632  2021-10-14    851551
633  2021-10-15    852536
634  2021-10-16    853236
635  2021-10-17    853916
636  2021-10-18    855487
637  2021-10-19    856776
638  2021-10-20    858206
639  2021-10-21    859579
640  2021-10-22    860931
641  2021-10-23    861885
642  2021-10-24    862612
643  2021-10-25    864636
644  2021-10-26    866335
645  2021-10-27    868142
646  2021-10-28    869945
647  2021-10-29    871809
648  2021-10-30    873068
649  2021-10-31    874149
650  2021-11-01    876492
651  2021-11-02    879473
652  2021-11-03    882309
653  2021-11-04    885151
654  2021-11-05    887906
655  2021-11-06    889837
656  2021-11-07    891374
657  2021-11-08    895578
658  2021-11-09    899316
659  2021-11-10    903285
660  2021-11-11    907237
661  2021-11-12    911381
662  2021-11-13    914090
663  2021-11-14    916255
664  2021-11-15    922388
665  2021-11-16    928014
666  2021-11-17    934195
667  2021-11-18    940249
668  2021-11-19    946443
669  2021-11-20    950580
670  2021-11-21    953983
671  2021-11-22    963147
672  2021-11-23    971150
673  2021-11-24    979619
674  2021-11-25    987606
675  2021-11-26    995573
676  2021-11-27   1000835
677  2021-11-28   1005077
678  2021-11-29   1016577
679  2021-11-30   1026576
680  2021-12-01   1036991
681  2021-12-02   1047144
682  2021-12-03   1056945
683  2021-12-04   1063202
684  2021-12-05   1068140
685  2021-12-06   1080802
686  2021-12-07   1091837
687  2021-12-08   1101788
688  2021-12-09   1112208
689  2021-12-10   1121964
690  2021-12-11   1128053
691  2021-12-12   1132448
692  2021-12-13   1144877
693  2021-12-14   1155403
694  2021-12-15   1164935
695  2021-12-16   1174078
696  2021-12-17   1182814
697  2021-12-18   1188690
698  2021-12-19   1193023
699  2021-12-20   1204492
700  2021-12-21   1215268
701  2021-12-22   1226276
702  2021-12-23   1237542
703  2021-12-24   1247174
704  2021-12-25   1253630
705  2021-12-26   1261001
706  2021-12-27   1278426
707  2021-12-28   1296094
708  2021-12-29   1315086
709  2021-12-30   1334832
710  2021-12-31   1350965
711  2022-01-01   1360363
712  2022-01-02   1373173
713  2022-01-03   1405443
714  2022-01-04   1438808
715  2022-01-05   1469458
716  2022-01-06   1495587
717  2022-01-07   1522852
718  2022-01-08   1540630
719  2022-01-09   1554015
720  2022-01-10   1588544
721  2022-01-11   1619464
722  2022-01-12   1649075
723  2022-01-13   1677694
724  2022-01-14   1705318
725  2022-01-15   1724607
726  2022-01-16   1739364
727  2022-01-17   1781159
728  2022-01-18   1820470
729  2022-01-19   1859564
730  2022-01-20   1897222
731  2022-01-21   1933859
732  2022-01-22   1958490
733  2022-01-23   1976825
734  2022-01-24   2025157
735  2022-01-25   2069280
736  2022-01-26   2110672
737  2022-01-27   2148381
738  2022-01-28   2184000
739  2022-01-29   2207279
740  2022-01-30   2225298
741  2022-01-31   2268530
742  2022-02-01   2305075
743  2022-02-02   2339133
744  2022-02-03   2368045
745  2022-02-04   2394373
746  2022-02-05   2412551
747  2022-02-06   2426968
748  2022-02-07   2459136
749  2022-02-08   2486108
750  2022-02-09   2511579
751  2022-02-10   2533812
752  2022-02-11   2553499
753  2022-02-12   2567488
754  2022-02-13   2578229
755  2022-02-14   2602892
756  2022-02-15   2623416
757  2022-02-16   2642512
758  2022-02-17   2656779
759  2022-02-18   2670221
760  2022-02-19   2679997
761  2022-02-20   2688043
762  2022-02-21   2707646
763  2022-02-22   2724639
764  2022-02-23   2742220
765  2022-02-24   2757990
766  2022-02-25   2772844
767  2022-02-26   2783921
768  2022-02-27   2793212
769  2022-02-28   2817029
770  2022-03-01   2839462
771  2022-03-02   2864293
772  2022-03-03   2888904
773  2022-03-04   2912505
774  2022-03-05   2930160
775  2022-03-06   2944539
776  2022-03-07   2979430
777  2022-03-08   3010203
778  2022-03-09   3041142
779  2022-03-10   3070532
780  2022-03-11   3098243
781  2022-03-12   3118522
782  2022-03-13   3134606
783  2022-03-14   3174978
784  2022-03-15   3208189
785  2022-03-16   3238832
786  2022-03-17   3265409
787  2022-03-18   3289152
788  2022-03-19   3305268
789  2022-03-20   3317464
790  2022-03-21   3348220
791  2022-03-22   3371243
792  2022-03-23   3390965
793  2022-03-24   3407762
794  2022-03-25   3422322
795  2022-03-26   3432136
796  2022-03-27   3439367
797  2022-03-28   3458051
798  2022-03-29   3471855
799  2022-03-30   3484001
800  2022-03-31   3492327
801  2022-04-01   3498709
802  2022-04-02   3502903
803  2022-04-03   3505953
804  2022-04-04   3514556
805  2022-04-05   3521571
806  2022-04-06   3528781
807  2022-04-07   3534923
808  2022-04-08   3540639
809  2022-04-09   3544305
810  2022-04-10   3546878
811  2022-04-11   3554099
812  2022-04-12   3559697
813  2022-04-13   3565229
814  2022-04-14   3569945
815  2022-04-15   3572716
816  2022-04-16   3576026
817  2022-04-17   3577901
818  2022-04-18   3580396
819  2022-04-19   3585851
820  2022-04-20   3589901
821  2022-04-21   3593527
822  2022-04-22   3596687
823  2022-04-23   3598709
824  2022-04-24   3600241
825  2022-04-25   3604123
826  2022-04-26   3606918
827  2022-04-27   3609651
828  2022-04-28   3612185
829  2022-04-29   3614482
830  2022-04-30   3615986
831  2022-05-01   3617009
832  2022-05-02   3620003
833  2022-05-03   3622151
834  2022-05-04   3624301
835  2022-05-05   3626219
836  2022-05-06   3628128
837  2022-05-07   3629310
838  2022-05-08   3630144
839  2022-05-09   3632642
840  2022-05-10   3634609
841  2022-05-11   3636455
842  2022-05-12   3638163
843  2022-05-13   3639708
844  2022-05-14   3640645
845  2022-05-15   3641341
846  2022-05-16   3643280
847  2022-05-17   3644756
848  2022-05-18   3646182
849  2022-05-19   3647451
850  2022-05-20   3648531
851  2022-05-21   3649304
852  2022-05-22   3649896
853  2022-05-23   3651355
854  2022-05-24   3652468
855  2022-05-25   3653552
856  2022-05-26   3654055
857  2022-05-27   3655218
858  2022-05-28   3655924
859  2022-05-29   3656584
860  2022-05-30   3658364
861  2022-05-31   3659945
862  2022-06-01   3661543
863  2022-06-02   3663128
864  2022-06-03   3664721
865  2022-06-04   3665867
866  2022-06-05   3666781
867  2022-06-06   3668071
868  2022-06-07   3671131
869  2022-06-08   3673779
870  2022-06-09   3676262
871  2022-06-10   3678716
872  2022-06-11   3680412
873  2022-06-12   3681706
874  2022-06-13   3685819
875  2022-06-14   3689473
876  2022-06-15   3693369
877  2022-06-16   3696907
878  2022-06-17   3700841
879  2022-06-18   3703418
880  2022-06-19   3705160
881  2022-06-20   3711120
882  2022-06-21   3716198
883  2022-06-22   3721451
884  2022-06-23   3726474
885  2022-06-24   3731097
886  2022-06-25   3734386
887  2022-06-26   3736723
888  2022-06-27   3745008
889  2022-06-28   3752253
890  2022-06-29   3759489
891  2022-06-30   3766645
892  2022-07-01   3773495
893  2022-07-02   3777997
894  2022-07-03   3780918
895  2022-07-04   3792146
896  2022-07-05   3801755
897  2022-07-06   3811553
898  2022-07-07   3820199
899  2022-07-08   3828116
900  2022-07-09   3833102
901  2022-07-10   3836408
902  2022-07-11   3847384
903  2022-07-12   3856855
904  2022-07-13   3865504
905  2022-07-14   3873216
906  2022-07-15   3880279
907  2022-07-16   3884586
908  2022-07-17   3887444
909  2022-07-18   3897073
910  2022-07-19   3904153
911  2022-07-20   3910489
912  2022-07-21   3916205
913  2022-07-22   3921499
914  2022-07-23   3924699
915  2022-07-24   3926683
916  2022-07-25   3933314
917  2022-07-26   3938153
918  2022-07-27   3942382
919  2022-07-28   3946037
920  2022-07-29   3949569
921  2022-07-30   3951890
922  2022-07-31   3953429
923  2022-08-01   3955237
924  2022-08-02   3960093
925  2022-08-03   3963751
926  2022-08-04   3966862
927  2022-08-05   3969837
928  2022-08-06   3971923
929  2022-08-07   3973315
930  2022-08-08   3977650
931  2022-08-09   3981009
932  2022-08-10   3984117
933  2022-08-11   3986849
934  2022-08-12   3989521
935  2022-08-13   3991064
936  2022-08-14   3992216
937  2022-08-15   3995528
938  2022-08-16   3998688
939  2022-08-17   4001556
940  2022-08-18   4004090
941  2022-08-19   4006425
942  2022-08-20   4007864
943  2022-08-21   4008871
944  2022-08-22   4012279
945  2022-08-23   4014875
946  2022-08-24   4017376
947  2022-08-25   4019654
948  2022-08-26   4021821
949  2022-08-27   4023186
950  2022-08-28   4024089
951  2022-08-29   4027438
952  2022-08-30   4030156
953  2022-08-31   4032739
954  2022-09-01   4034839
955  2022-09-02   4036876
956  2022-09-03   4038069
957  2022-09-04   4038850
958  2022-09-05   4041858
959  2022-09-06   4044247
960  2022-09-07   4046623
961  2022-09-08   4048623
962  2022-09-09   4050601
963  2022-09-10   4051766
964  2022-09-11   4052603
965  2022-09-12   4055596
966  2022-09-13   4057970
967  2022-09-14   4060341
968  2022-09-15   4062497
969  2022-09-16   4064498
970  2022-09-17   4065685
971  2022-09-18   4066442
972  2022-09-19   4069123
973  2022-09-20   4071620
974  2022-09-21   4074190
975  2022-09-22   4076669
976  2022-09-23   4079219
977  2022-09-24   4080851
978  2022-09-25   4082039
979  2022-09-26   4086466
980  2022-09-27   4090309
981  2022-09-28   4094100
982  2022-09-29   4097849
983  2022-09-30   4101567
984  2022-10-01   4103894
985  2022-10-02   4105520
986  2022-10-03   4111760
987  2022-10-04   4117277
988  2022-10-05   4123130
989  2022-10-06   4128739
990  2022-10-07   4134421
991  2022-10-08   4137824
992  2022-10-09   4139883
993  2022-10-10   4148095
994  2022-10-11   4154626
995  2022-10-12   4160989
996  2022-10-13   4166549
997  2022-10-14   4171903
998  2022-10-15   4175057
999  2022-10-16   4176964
1000 2022-10-17   4184504
1001 2022-10-18   4190229
1002 2022-10-19   4195507
1003 2022-10-20   4200028
1004 2022-10-21   4204270
1005 2022-10-22   4206711
1006 2022-10-23   4208264
1007 2022-10-24   4213940
1008 2022-10-25   4218441
1009 2022-10-26   4222493
1010 2022-10-27   4226075
1011 2022-10-28   4229456
1012 2022-10-29   4231441
1013 2022-10-30   4232752
1014 2022-10-31   4237494
1015 2022-11-01   4240661
1016 2022-11-02   4244691
1017 2022-11-03   4247955
1018 2022-11-04   4251057
1019 2022-11-05   4252807
1020 2022-11-06   4253908
1021 2022-11-07   4258323
1022 2022-11-08   4261937
1023 2022-11-09   4265364
1024 2022-11-10   4268482
1025 2022-11-11   4271599
1026 2022-11-12   4273360
1027 2022-11-13   4274542
1028 2022-11-14   4279164
1029 2022-11-15   4283007
1030 2022-11-16   4286636
1031 2022-11-17   4289973
1032 2022-11-18   4292879
1033 2022-11-19   4294703
1034 2022-11-20   4295852
1035 2022-11-21   4300266
1036 2022-11-22   4303705
1037 2022-11-23   4307108
1038 2022-11-24   4309975
1039 2022-11-25   4312678
1040 2022-11-26   4314234
1041 2022-11-27   4315217
1042 2022-11-28   4319216
1043 2022-11-29   4322481
1044 2022-11-30   4325759
1045 2022-12-01   4328551
1046 2022-12-02   4331311
1047 2022-12-03   4332868
1048 2022-12-04   4333803
1049 2022-12-05   4337605
1050 2022-12-06   4340817
1051 2022-12-07   4343655
1052 2022-12-08   4346021
1053 2022-12-09   4348319
1054 2022-12-10   4349583
1055 2022-12-11   4350394
1056 2022-12-12   4353564
1057 2022-12-13   4356006
1058 2022-12-14   4358271
1059 2022-12-15   4360415
1060 2022-12-16   4362328
1061 2022-12-17   4363360
1062 2022-12-18   4364049
1063 2022-12-19   4366599
1064 2022-12-20   4368798
1065 2022-12-21   4370799
1066 2022-12-22   4372595
1067 2022-12-23   4374210
1068 2022-12-24   4375085
1069 2022-12-25   4375567
1070 2022-12-26   4376250
1071 2022-12-27   4377532
1072 2022-12-28   4378584
1073 2022-12-29   4379585
1074 2022-12-30   4380463
1075 2022-12-31   4381033
1076 2023-01-01   4381252
1077 2023-01-02        NA
1078 2023-01-03        NA
1079 2023-01-04        NA
1080 2023-01-05        NA
1081 2023-01-06        NA
1082 2023-01-07        NA
1083 2023-01-08   4383454
1084 2023-01-09        NA
1085 2023-01-10        NA
1086 2023-01-11        NA
1087 2023-01-12        NA
1088 2023-01-13        NA
1089 2023-01-14        NA
1090 2023-01-15   4384791
1091 2023-01-16        NA
1092 2023-01-17        NA
1093 2023-01-18        NA
1094 2023-01-19        NA
1095 2023-01-20        NA
1096 2023-01-21        NA
1097 2023-01-22   4385717
1098 2023-01-23        NA
1099 2023-01-24        NA
1100 2023-01-25        NA
1101 2023-01-26        NA
1102 2023-01-27        NA
1103 2023-01-28        NA
1104 2023-01-29   4386564
1105 2023-01-30        NA
1106 2023-01-31        NA
1107 2023-02-01        NA
1108 2023-02-02        NA
1109 2023-02-03        NA
1110 2023-02-04        NA
1111 2023-02-05   4387476
1112 2023-02-06        NA
1113 2023-02-07        NA
1114 2023-02-08        NA
1115 2023-02-09        NA
1116 2023-02-10        NA
1117 2023-02-11        NA
1118 2023-02-12   4388447
1119 2023-02-13        NA
1120 2023-02-14        NA
1121 2023-02-15        NA
1122 2023-02-16        NA
1123 2023-02-17        NA
1124 2023-02-18        NA
1125 2023-02-19   4389665
1126 2023-02-20        NA
1127 2023-02-21        NA
1128 2023-02-22        NA
1129 2023-02-23        NA
1130 2023-02-24        NA
1131 2023-02-25        NA
1132 2023-02-26   4391025
1133 2023-02-27        NA
1134 2023-02-28        NA
1135 2023-03-01        NA
1136 2023-03-02        NA
1137 2023-03-03        NA
1138 2023-03-04        NA
1139 2023-03-05   4392515
1140 2023-03-06        NA
1141 2023-03-07        NA
1142 2023-03-08        NA
1143 2023-03-09        NA
1144 2023-03-10        NA
1145 2023-03-11        NA
1146 2023-03-12   4394313
1147 2023-03-13        NA
1148 2023-03-14        NA
1149 2023-03-15        NA
1150 2023-03-16        NA
1151 2023-03-17        NA
1152 2023-03-18        NA
1153 2023-03-19   4396099
1154 2023-03-20        NA
1155 2023-03-21        NA
1156 2023-03-22        NA
1157 2023-03-23        NA
1158 2023-03-24        NA
1159 2023-03-25        NA
1160 2023-03-26   4397683
1161 2023-03-27        NA
1162 2023-03-28        NA
1163 2023-03-29        NA
1164 2023-03-30        NA
1165 2023-03-31        NA
1166 2023-04-01        NA
1167 2023-04-02   4399119
1168 2023-04-03        NA
1169 2023-04-04        NA
1170 2023-04-05        NA
1171 2023-04-06        NA
1172 2023-04-07        NA
1173 2023-04-08        NA
1174 2023-04-09   4400299
1175 2023-04-10        NA
1176 2023-04-11        NA
1177 2023-04-12        NA
1178 2023-04-13        NA
1179 2023-04-14        NA
1180 2023-04-15        NA
1181 2023-04-16   4401413
1182 2023-04-17        NA
1183 2023-04-18        NA
1184 2023-04-19        NA
1185 2023-04-20        NA
1186 2023-04-21        NA
1187 2023-04-22        NA
1188 2023-04-23   4402475
1189 2023-04-24        NA
1190 2023-04-25        NA
1191 2023-04-26        NA
1192 2023-04-27        NA
1193 2023-04-28        NA
1194 2023-04-29        NA
1195 2023-04-30   4403491
1196 2023-05-01        NA
1197 2023-05-07   4404327

Classwork: Plot No. 1

Just to check that we have the data we need, let’s plot the number of confirmed cases over time.

This looks pretty good! However, we have to deal with a little bit of missing data.

First, we can see that towards the end of the pandemic, the number of confirmed cases was only reported about once a week. This means that we have to fill in the missing values with the last known value.

R
ch_covid_data |> tail(20)
date confirmed
1178 2023-04-13        NA
1179 2023-04-14        NA
1180 2023-04-15        NA
1181 2023-04-16   4401413
1182 2023-04-17        NA
1183 2023-04-18        NA
1184 2023-04-19        NA
1185 2023-04-20        NA
1186 2023-04-21        NA
1187 2023-04-22        NA
1188 2023-04-23   4402475
1189 2023-04-24        NA
1190 2023-04-25        NA
1191 2023-04-26        NA
1192 2023-04-27        NA
1193 2023-04-28        NA
1194 2023-04-29        NA
1195 2023-04-30   4403491
1196 2023-05-01        NA
1197 2023-05-07   4404327

Replacing missing values with `fill()`

We can use the fill() function to fill in the missing values. This takes the last known value above and fills it in below, until the next known value is found.

Here, we’re looking at the total confirmed cases, so filling in the missing values with the last known value is a good idea. If these were new cases, it would be a different story: we’d probably want to fill in the missing values with 0.
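
As a minimal sketch of that difference, here is a tiny made-up data set rather than the Swiss data. The `toy` data frame and its column names are illustrative assumptions: `total` stands in for a cumulative count, and `new` for a daily count.

```r
library(dplyr)
library(tidyr)

# Made-up data: `total` is cumulative, `new` is a daily count
toy <- tibble(
  day   = 1:5,
  total = c(10, NA, NA, 25, NA),
  new   = c(10, NA, NA, 15, NA)
)

toy_filled <- toy |>
  # Cumulative totals: carry the last known value forward
  fill(total, .direction = "down") |>
  # Daily counts: treat an unreported day as zero new cases
  mutate(new = replace_na(new, 0))

toy_filled
```

Carrying a daily count forward would count the same cases again on every unreported day, which is why the two kinds of column are treated differently.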

R
ch_covid_data <- ch_covid_data |>
  fill(confirmed, .direction = "down")

ch_covid_data |> tail(20)
date confirmed
1178 2023-04-13   4400299
1179 2023-04-14   4400299
1180 2023-04-15   4400299
1181 2023-04-16   4401413
1182 2023-04-17   4401413
1183 2023-04-18   4401413
1184 2023-04-19   4401413
1185 2023-04-20   4401413
1186 2023-04-21   4401413
1187 2023-04-22   4401413
1188 2023-04-23   4402475
1189 2023-04-24   4402475
1190 2023-04-25   4402475
1191 2023-04-26   4402475
1192 2023-04-27   4402475
1193 2023-04-28   4402475
1194 2023-04-29   4402475
1195 2023-04-30   4403491
1196 2023-05-01   4403491
1197 2023-05-07   4404327

However, there are still some missing values at the beginning of the data set, before COVID was seen in Switzerland.

R
ch_covid_data |> head(20)
date confirmed
1  2020-01-22        NA
2  2020-01-23        NA
3  2020-01-24        NA
4  2020-01-25        NA
5  2020-01-26        NA
6  2020-01-27        NA
7  2020-01-28        NA
8  2020-01-29        NA
9  2020-01-30        NA
10 2020-01-31        NA
11 2020-02-01        NA
12 2020-02-02        NA
13 2020-02-03        NA
14 2020-02-04        NA
15 2020-02-05        NA
16 2020-02-06        NA
17 2020-02-07        NA
18 2020-02-08        NA
19 2020-02-09        NA
20 2020-02-10        NA

We can simply replace these missing values with 0, using replace_na().

R
ch_covid_data <- ch_covid_data |>
  mutate(confirmed = replace_na(confirmed, 0))

ch_covid_data |> head(20)
date confirmed
1  2020-01-22         0
2  2020-01-23         0
3  2020-01-24         0
4  2020-01-25         0
5  2020-01-26         0
6  2020-01-27         0
7  2020-01-28         0
8  2020-01-29         0
9  2020-01-30         0
10 2020-01-31         0
11 2020-02-01         0
12 2020-02-02         0
13 2020-02-03         0
14 2020-02-04         0
15 2020-02-05         0
16 2020-02-06         0
17 2020-02-07         0
18 2020-02-08         0
19 2020-02-09         0
20 2020-02-10         0

Let’s plot the data again to confirm that we didn’t mess anything up.

Classwork: Plot No. 2

Yep! This looks about the same as the last one, but now we have no missing values.

Calculating new daily cases

However, this data set has a slight problem: it represents the total number of confirmed cases. We want to plot the number of new cases each day. We can calculate this by taking the difference between the number of confirmed cases today and the number of confirmed cases yesterday.

To get yesterday’s number of confirmed cases, we can use the lag() function. This function shifts the column down by one row, so that the value in the first row is now in the second row, the value in the second row is now in the third row, and so on.

We’ll be left with 1 NA value in the first row, as there is no value before the first row. We can replace this with 0 using the default argument.

(The opposite of lag() is lead(), which shifts the column up by one row.)
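
A quick way to get a feel for these two functions is to try them on a short made-up vector. This is only a sketch; the vector `x` and the names `previous`, `following` and `new_cases` are assumptions for illustration.

```r
library(dplyr)

# Made-up running totals for four days
x <- c(5, 8, 12, 20)

# Each position sees the value from the position before it
previous <- lag(x, 1, default = 0)

# Each position sees the value from the position after it
following <- lead(x, 1, default = 0)

# New cases per day = today's total minus yesterday's total
new_cases <- x - previous

previous
following
new_cases
```

Here `previous` is 0, 5, 8, 12 and `following` is 8, 12, 20, 0, so subtracting `previous` from `x` gives the daily increases 5, 3, 4, 8.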

To check the data, I’ve used tail() and head() to look a little bit further down the data set.

R
ch_covid_data <- ch_covid_data |>
  mutate(yesterday_cases = lag(confirmed, 1, default = 0))

ch_covid_data |> tail(-30) |> head(20)
date confirmed yesterday_cases
31 2020-02-21         0               0
32 2020-02-22         0               0
33 2020-02-23         0               0
34 2020-02-24         1               0
35 2020-02-25         2               1
36 2020-02-26        12               2
37 2020-02-27        22              12
38 2020-02-28        32              22
39 2020-02-29        45              32
40 2020-03-01        57              45
41 2020-03-02        87              57
42 2020-03-03       120              87
43 2020-03-04       181             120
44 2020-03-05       243             181
45 2020-03-06       316             243
46 2020-03-07       365             316
47 2020-03-08       434             365
48 2020-03-09       625             434
49 2020-03-10       835             625
50 2020-03-11      1168             835

Now, we can calculate the number of new cases each day by taking the difference between the number of confirmed cases today and the number of confirmed cases yesterday.

R
ch_covid_data <- ch_covid_data |>
  mutate(daily_cases = confirmed - yesterday_cases)

ch_covid_data |> tail(-30) |> head(20)
date confirmed yesterday_cases daily_cases
31 2020-02-21         0               0           0
32 2020-02-22         0               0           0
33 2020-02-23         0               0           0
34 2020-02-24         1               0           1
35 2020-02-25         2               1           1
36 2020-02-26        12               2          10
37 2020-02-27        22              12          10
38 2020-02-28        32              22          10
39 2020-02-29        45              32          13
40 2020-03-01        57              45          12
41 2020-03-02        87              57          30
42 2020-03-03       120              87          33
43 2020-03-04       181             120          61
44 2020-03-05       243             181          62
45 2020-03-06       316             243          73
46 2020-03-07       365             316          49
47 2020-03-08       434             365          69
48 2020-03-09       625             434         191
49 2020-03-10       835             625         210
50 2020-03-11      1168             835         333

Now, let’s make a line chart to visualize the number of new cases each day.

Classwork: Plot No. 3

Smoothing out the data

We can see another problem from this: the number of new cases each day is very noisy. Sometimes, new cases are reported in batches, often after a weekend. This is going to make it hard to see any trends in the data.

We can fix this by taking a rolling average of the number of new cases each day. This will smooth out the data and make it easier to see any trends.

The easiest way to do this is to use the rollmean() function from the zoo package. You might need to install the package first.

R
install.packages("zoo")

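In the rollmean() function, k is the number of days to average over. Here, I’ve chosen 14 days. Because 14 is an even number, the window can’t be perfectly centered: each new value is the average of the analyzed row, the 6 days before it, and the 7 days after it. The fill = 0 argument pads the first and last few rows, where there is no full window, with 0. This should smooth out the data enough to make it easier to see any trends.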
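
To see how the window is positioned when k is even, a tiny made-up vector is easier to follow than the real data. This is a minimal sketch: k = 4 and fill = NA are chosen purely for illustration (the lesson itself uses k = 14 and fill = 0), and `x` and `smoothed` are made-up names.

```r
library(zoo)

x <- c(1, 2, 3, 4, 5, 6)

# Centered rolling mean over k = 4 values; positions without a full
# window are padded with NA here so they are easy to spot
smoothed <- rollmean(x, k = 4, fill = NA)

smoothed
```

The second value is the mean of positions 1 to 4: one value before, the position itself, and two values after. The same rule with k = 14 gives 6 days before and 7 days after.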

R
library(zoo)
R
ch_covid_data <- ch_covid_data |>
  mutate(year_day = yday(date)) |>
  mutate(daily_cases_smoothed = rollmean(daily_cases, k = 14, fill = 0))

ch_covid_data |>
  tail(-30) |>
  head(20) |>
  select(date, daily_cases, daily_cases_smoothed)
date daily_cases daily_cases_smoothed
31 2020-02-21           0             2.285714
32 2020-02-22           0             3.214286
33 2020-02-23           0             4.071429
34 2020-02-24           1             6.214286
35 2020-02-25           1             8.571429
36 2020-02-26          10            12.928571
37 2020-02-27          10            17.357143
38 2020-02-28          10            22.571429
39 2020-02-29          13            26.071429
40 2020-03-01          12            31.000000
41 2020-03-02          30            44.571429
42 2020-03-03          33            59.500000
43 2020-03-04          61            82.571429
44 2020-03-05          62           107.357143
45 2020-03-06          73           137.428571
46 2020-03-07          49           166.285714
47 2020-03-08          69           188.714286
48 2020-03-09         191           262.285714
49 2020-03-10         210           337.214286
50 2020-03-11         333           419.000000

To make sure this works, let’s plot the data again.

Classwork: Plot No. 4

This looks much better! The smoothed line is much easier to read than the noisy line.

Creating a ribbon plot

Our next step is to convert this from a line chart to a spiral chart. We’ll start by creating a ribbon plot, where the area between two lines is filled in. Using geom_ribbon(), we can fill in the area between two lines.

In addition to x, geom_ribbon() needs two aesthetics, ymin and ymax, which are the lower and upper bounds of the ribbon.

Here, we’ll set ymin to -daily_cases_smoothed and ymax to daily_cases_smoothed. This will create a ribbon plot where the area between -daily_cases_smoothed and daily_cases_smoothed is filled in.

R
ch_covid_data |>
  ggplot() +
  geom_ribbon(
    aes(
      x = date,
      ymin = -daily_cases_smoothed,
      ymax = daily_cases_smoothed
    ),
    fill = "#eeaaaa",
    color = "#662222"
  )

Using polar coordinates

This is the basic structure of the spiral chart. Our next step is to wrap it around a circle.

This can be done with coord_polar(), which uses polar coordinates instead of Cartesian coordinates. This will wrap the chart around a circle, creating a spiral chart.

R
ch_covid_data |>
  ggplot() +
  geom_ribbon(
    aes(
      x = date,
      ymin = -daily_cases_smoothed,
      ymax = daily_cases_smoothed
    ),
    fill = "#eeaaaa",
    color = "#662222"
  ) +
  coord_polar()

What’s going on here? The chart still has X and Y coordinates, but now they’re arranged so that the lowest Y value is at the center of the circle, and the X axis wraps around the circle. This is similar to a map of the globe centered on the South Pole, where latitude is the distance from the center and longitude wraps around it.

Transforming a circle to a spiral

Let’s think about polar coordinates for a moment. In polar coordinates, the X axis is the angle around the circle, and the Y axis is the distance from the center of the circle. This means that the further out you go, the higher the Y value.

So if we plotted something with, for example, the X axis as the day of the year and the Y axis as the year, we’d get concentric circles, with each year being a circle. We can see this in the two examples, with Cartesian and polar coordinates, below.
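
Figures like these can be approximated with a few lines of code. This is a minimal sketch with made-up data; the `rings` data frame, its column names, and the chosen Y limits are assumptions for illustration.

```r
library(ggplot2)

# Made-up data: one point per day for three years (leap days ignored)
rings <- expand.grid(day_of_year = 1:365, year = 2020:2022)

# Cartesian version: each year is a flat horizontal line
p_cartesian <- ggplot(rings) +
  geom_line(aes(x = day_of_year, y = year, group = year)) +
  # Start the Y axis below 2020 so the first year is not squashed
  # into the center once the plot is wrapped around a circle
  lims(y = c(2019, 2022))

# Polar version: x becomes the angle, y the distance from the center
p_polar <- p_cartesian + coord_polar()

p_cartesian
p_polar
```

In the polar version, each horizontal line closes into a ring, with later years further from the center.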

Now, this might take some work to wrap your mind around: if we keep the X axis as the day of the year, but change the Y axis to be the number of days since 2020, we get a spiral. This is because the Y axis is the distance from the center of the circle, and that distance now grows a little every day. When we get to the end of the year, the X axis wraps back around to the start, and the spiral continues one ring further out.

So to convert our circle plot to a spiral plot, we need to change the Y axis from the year to the number of days since 2020.

We’ll first make a new column, year_num, that is the number of years since 2020. We’ll then make a new column, days_since_2020, that is the number of days since the start of 2020. This is the number of years since 2020 times 365, plus the day of the year.

R
ch_covid_data <- ch_covid_data |>
  mutate(year_num = year(date) - 2020) |>
  mutate(days_since_2020 = year_num * 365 + year_day)

Just to make sure everything is working, let’s just plot the center line of the spiral.

If we look closely, we can see a small issue! There’s a gap between the end of one year and the start of the next.

This is because 2020 was a leap year: it has 366 days. When we wrap back around, we’re missing a day.

We could potentially fix this with a whole bunch of math, but it’s much easier to just drop the extra day. Filtering on year_day <= 365 removes day 366 of 2020 (31 December) rather than 29 February itself. While this isn’t a data-perfect solution, it’s good enough for this sort of visualization.

We have to do this before we calculate the days_since_2020 column, as we’re using the year_day column in the calculation.
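
The leap-year effect can also be checked with a little arithmetic on three hand-picked dates around the 2020/2021 boundary. This is only a sketch; the `boundary` data frame and the names `before` and `after` are assumptions for illustration.

```r
library(dplyr)
library(lubridate)

boundary <- tibble(
  date = as_date(c("2020-12-30", "2020-12-31", "2021-01-01"))
) |>
  mutate(
    year_day = yday(date),
    year_num = year(date) - 2020,
    days_since_2020 = year_num * 365 + year_day
  )

# 31 December 2020 is day 366 of a leap year, so it lands on the same
# value as 1 January 2021 (1 * 365 + 1 = 366)
before <- boundary$days_since_2020

# Dropping day 366 leaves exactly one row per value
after <- boundary |>
  filter(year_day <= 365) |>
  pull(days_since_2020)

before
after
```

Before filtering, the three dates map to 365, 366 and 366; after filtering, only 365 and 366 remain, so every year occupies exactly 365 steps of the spiral.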

R
ch_covid_data <- ch_covid_data |>
  mutate(year_num = year(date) - 2020) |>
  filter(year_day <= 365) |>
  mutate(days_since_2020 = year_num * 365 + year_day)

We’ll also convert the year_day column to a date, so that ggplot knows how to label it: seeing the month of the year is much more informative than just a random number. Note that we have to specify a year for the origin, but this is arbitrary; we could have chosen any year.

R
ch_covid_data <- ch_covid_data |>
  mutate(year_day = as_date(year_day, origin = "1970-01-01"))

Now when we plot the center line of the spiral, we can see that the gap is gone.

This is why it’s good to visualize your data as you go along: it’s much easier to spot mistakes when you can see them.

Making the spiral chart the easy way

A somewhat cheesy way to complete this spiral would then simply be to alter the size of the line based on the number of new cases each day. This will make the spiral look like it’s “breathing”, with the line getting thicker when there are more cases.

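We can alter the relative size of the line using scale_size_continuous(). This will make the line thicker or thinner based on the value of daily_cases_smoothed. (In ggplot2 3.4.0 and later, the thickness of lines is controlled by the linewidth aesthetic rather than size, so the matching scale is scale_linewidth_continuous().)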
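
A minimal sketch of this approach, using made-up data so that it runs on its own. It assumes ggplot2 3.4.0 or later, where line thickness is mapped with the `linewidth` aesthetic; the `toy` data frame, the sine-wave "cases", and the width range are all assumptions for illustration.

```r
library(ggplot2)

# Made-up data: three 365-day years with an arbitrary wave of "cases"
toy <- data.frame(days_since_2020 = 1:(3 * 365))
toy$year_num <- (toy$days_since_2020 - 1) %/% 365
toy$year_day <- toy$days_since_2020 - toy$year_num * 365
toy$daily_cases_smoothed <- 1000 * (1 + sin(toy$days_since_2020 / 60))

p_thick <- ggplot(toy) +
  geom_line(
    aes(
      x = year_day,
      y = days_since_2020,
      group = year_num,
      linewidth = daily_cases_smoothed  # thicker line = more cases
    ),
    color = "#662222"
  ) +
  # Thinnest and thickest line widths; adjust to taste
  scale_linewidth_continuous(range = c(0.2, 4)) +
  coord_polar() +
  theme(legend.position = "none")

p_thick
```

With the lesson’s data, the same layers could be applied to ch_covid_data instead of `toy`.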

This would be a valid way to do it! However, it doesn’t look quite as nice as the original spiral chart. The original chart has a ribbon plot, where the area between two lines is filled in. This makes it look like a spiral, rather than just a weird leech sitting in the water.

Making the spiral chart the hard way

So, we need to go back to our ribbon plot we did earlier in class, and think about ways to combine this with the spiral chart. In the ribbon plot, the Y axis is the total amount of daily cases, whereas in the spiral chart, the Y axis is the number of days since 2020.

This method was adapted from this page on GitHub, which was a much cleaner solution than the one I’d implemented on my own.

The number of daily cases is huge compared to the number of days since 2020. We need to rescale the number of daily cases to make it fit on the same scale as the number of days since 2020. We’re therefore going to re-scale the number of daily cases to be a smaller, arbitrary number.

First, we’ll re-scale the number of daily cases to be between 0 and 1, by dividing by the maximum number of daily cases. This will make the number of daily cases relative to the maximum number of daily cases.

We can use summary() to check that the maximum value of rescaled_daily_cases is 1.

R
ch_covid_data <- ch_covid_data |>
  mutate(rescaled_daily_cases = daily_cases_smoothed / max(daily_cases_smoothed))

ch_covid_data |> select(rescaled_daily_cases, daily_cases_smoothed) |> summary()
rescaled_daily_cases daily_cases_smoothed
Min.   :0.000000     Min.   :    0.0     
1st Qu.:0.007261     1st Qu.:  252.8     
Median :0.044859     Median : 1561.6     
Mean   :0.105660     Mean   : 3678.3     
3rd Qu.:0.102114     3rd Qu.: 3554.8     
Max.   :1.000000     Max.   :34812.2     

Now, we can actually just add our two Y axes together, and we’ll get a spiral chart with a ribbon plot. It’s perfectly valid to do a little light calculation inside ggplot itself.

R
ch_covid_data |>
  ggplot() +
  geom_ribbon(
    aes(
      x = year_day,
      ymin = days_since_2020 - rescaled_daily_cases,
      ymax = days_since_2020 + rescaled_daily_cases,
      group = year_num
    ), fill = "#eeaaaa", color = "#662222") +
  geom_line(aes(x = year_day, y = days_since_2020, group = year_num))

However, we still don’t see anything; the ribbon only goes between 0 and 1, which is too small to see. We need to re-scale the number of daily cases to be a larger number.

This requires a little experimentation, so it’s worth saving our magic number as a variable that you can easily change later. You can then play around with different numbers to see what looks good, and each change will be automatically applied everywhere the variable is used in the chart.

R
ribbon_multiplier = 300

ch_covid_data |>
  ggplot() +
  geom_ribbon(
    aes(
      x = year_day,
      ymin = days_since_2020 - rescaled_daily_cases * ribbon_multiplier,
      ymax = days_since_2020 + rescaled_daily_cases * ribbon_multiplier,
      group = year_num
    ), fill = "#eeaaaa", color = "#662222") +
  geom_line(aes(x = year_day, y = days_since_2020, group = year_num))

R
ch_covid_data |>
  ggplot() +
  geom_ribbon(
    aes(
      x = year_day,
      ymin = days_since_2020 - rescaled_daily_cases * ribbon_multiplier,
      ymax = days_since_2020 + rescaled_daily_cases * ribbon_multiplier,
      group = year_num
    ), fill = "#eeaaaa", color = "#662222") +
  geom_line(aes(x = year_day, y = days_since_2020, group = year_num)) +
  coord_polar()

Cleaning up the chart

That’s basically it! We’ve rotated the data in our minds enough to form it into the shape of a spiral, having manipulated the data very far from the two columns we started with. Best of all, because we did this all through code, we could easily change the country, the smoothing, or the styling, and the chart would automatically update. Feel free to delete your code, go back to the beginning, and try it all again with a different country. You should get a different spiral chart, but with the same basic structure.
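
One way to make that kind of re-run easier is to wrap the data steps in a function. The sketch below is a hypothetical helper, not part of the original lesson: the name `prepare_spiral_data()`, its `k` argument, and the made-up input are assumptions, and it leaves out the conversion of year_day to a date.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(zoo)

# Hypothetical helper: bundles the data steps from this page.
# `data` needs a `date` column and a cumulative `confirmed` column.
prepare_spiral_data <- function(data, k = 14) {
  data |>
    arrange(date) |>
    fill(confirmed, .direction = "down") |>          # carry totals forward
    mutate(confirmed = replace_na(confirmed, 0)) |>  # leading NAs become 0
    mutate(
      daily_cases = confirmed - lag(confirmed, 1, default = 0),
      daily_cases_smoothed = rollmean(daily_cases, k = k, fill = 0),
      year_day = yday(date),
      year_num = year(date) - 2020
    ) |>
    filter(year_day <= 365) |>                       # drop day 366 of leap years
    mutate(
      days_since_2020 = year_num * 365 + year_day,
      rescaled_daily_cases = daily_cases_smoothed / max(daily_cases_smoothed)
    )
}

# Made-up input: 10 new cases every day, with a few unreported days
dates <- seq(as_date("2020-01-01"), as_date("2021-12-31"), by = "day")
toy <- tibble(date = dates, confirmed = cumsum(rep(10, length(dates))))
toy$confirmed[c(1:3, 10:12)] <- NA

spiral_data <- prepare_spiral_data(toy)
spiral_data |> head()
```

The result has the columns used by the ribbon code on this page, so filtering the original data to another country and passing it through such a helper would be enough to redraw the spiral.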

All that’s left to do is make it look a little cleaner and more presentable.

First, we notice that the line starts at the very center of the spiral, which gives us a bit of a strange appearance. We can fix this by extending the lower limit of the Y axis below 0; this leaves some empty space in the middle, so the spiral starts a little way out from the center.

R
ch_covid_data |>
  ggplot() +
  geom_ribbon(
    aes(
      x = year_day,
      ymin = days_since_2020 - rescaled_daily_cases * ribbon_multiplier,
      ymax = days_since_2020 + rescaled_daily_cases * ribbon_multiplier,
      group = year_num
    ), fill = "#eeaaaa", color = "#662222") +
  geom_line(aes(x = year_day, y = days_since_2020, group = year_num)) +
  coord_polar() +
  lims(y = c(-300, 1300))

Next, I’ve changed our base theme to theme_minimal() and lightened the center line of the spiral a little bit using alpha = 0.2, which controls the transparency of the line.

R
ch_covid_data |>
  ggplot() +
  geom_ribbon(
    aes(
      x = year_day,
      ymin = days_since_2020 - rescaled_daily_cases * ribbon_multiplier,
      ymax = days_since_2020 + rescaled_daily_cases * ribbon_multiplier,
      group = year_num
    ), fill = "#eeaaaa", color = "#662222") +
  geom_line(aes(x = year_day, y = days_since_2020, group = year_num), alpha = 0.2) +
  coord_polar() +
  lims(y = c(-300, 1300)) +
  theme_minimal()

Third, dividing a year into eighths is a little cursed. Because we changed the X axis to a date, we can use scale_x_date() to change the breaks to be every two months.

R
ch_covid_data |>
  ggplot() +
  geom_ribbon(
    aes(
      x = year_day,
      ymin = days_since_2020 - rescaled_daily_cases * ribbon_multiplier,
      ymax = days_since_2020 + rescaled_daily_cases * ribbon_multiplier,
      group = year_num
    ), fill = "#eeaaaa", color = "#662222") +
  geom_line(aes(x = year_day, y = days_since_2020, group = year_num), alpha = 0.2) +
  coord_polar() +
  lims(y = c(-300, 1300)) +
  theme_minimal() +
  scale_x_date(date_breaks = "2 months", date_labels = "%b")

Finally, we can remove the Y axis labels and ticks, the axis titles, and the major grid lines. This will clean up the chart a little bit, as we don’t need the Y axis for a spiral chart. This leaves us with a similar appearance to the original spiral chart.

R
ch_covid_data |>
  ggplot() +
  geom_ribbon(
    aes(
      x = year_day,
      ymin = days_since_2020 - rescaled_daily_cases * ribbon_multiplier,
      ymax = days_since_2020 + rescaled_daily_cases * ribbon_multiplier,
      group = year_num
    ), fill = "#eeaaaa", color = "#662222") +
  geom_line(aes(x = year_day, y = days_since_2020, group = year_num), alpha = 0.2) +
  coord_polar() +
  lims(y = c(-300, 1300)) +
  theme_minimal() +
  scale_x_date(date_breaks = "2 months", date_labels = "%b") +
  theme(legend.position = "none",
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title = element_blank(),
        panel.grid.major = element_blank()
  )

From here, we’d be just about ready to export the chart and share it with the world.

Just about as clean as the New York Times!


MJW