In recent weeks, COVID-19 has spread through the Pennsylvania prison system at an alarming rate. Large-scale testing has revealed the true extent of this outbreak: in the last week alone, the Pennsylvania Department of Corrections (DOC) reported multiple spikes in new cases among incarcerated persons. Alongside the rapid spread of COVID-19 in PA prisons is another critical issue that we, as data scientists, find deeply troubling: the Department of Corrections has shown gross negligence in its handling of sensitive coronavirus testing data.
Since the beginning of the pandemic, the DOC has provided our research team with a downloadable spreadsheet containing daily COVID-19 test totals. The spreadsheet, which appears to be updated by hand and is not clearly archived, has served as our primary source of information on the spread of the coronavirus in PA prisons. This small act of cooperation from the DOC was a positive step for our tracking of the virus; however, our analysis of the shared data reveals significant data mismanagement. We find two things particularly disturbing: the time lag between data appearing on the DOC COVID-19 Dashboard and the spreadsheet becoming available, and the number of occasions on which there have been significant discrepancies between these two data sources.
As part of our work to track and understand the spread of, and response to, COVID-19 in PA state correctional institutions, we created a data repository to archive and version the data obtained from the DOC. This allows other concerned parties to work with the data. In addition, we set up a simple tracker to illustrate the number of COVID-19 cases contracted by incarcerated persons since March.
Our analysis of the data in our repository and tracker indicates that the DOC likely maintains a rolling tally of tests and test results for all state prisons. That is to say, the case numbers are cumulative. Keeping track of testing results in this way means that these numbers should only ever go up or stay the same. If new positive tests are reported, they are added to the tally. If none are found, there is no change.
By December 20, we had identified eight instances in which there was an unexplained decrease in the number of tests and test results reported over a one-day period. Most recently, 607 cases reported as positive on December 17 became unaccounted for in the DOC data the next day. SCI Dallas, perhaps the most heavily impacted SCI in the state, saw its positive cases drop from 1,682 on the 17th to 1,076 on the 18th, with no explanation. These decreases are not explained by the number of recovered cases the DOC reports on its dashboard, nor by the one COVID-related death reported by the DOC on December 21. On two prior occasions, more than 300 test results also went unaccounted for.
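The consistency check described above is simple to automate. The following is a minimal sketch, not the code used in our repository: the function name `find_decreases` and the shape of the input are assumptions made for illustration. It takes a list of (date, cumulative count) pairs and reports every day-over-day decrease, which should never occur in a cumulative tally. The December 17 and 18 values are the SCI Dallas figures quoted above; the December 16 value is a hypothetical placeholder.

```python
from datetime import date


def find_decreases(series):
    """Return (previous_date, current_date, drop) for each decrease.

    `series` is a list of (date, cumulative_count) pairs. A cumulative
    tally should never decrease, so every result is a reporting anomaly.
    """
    ordered = sorted(series)  # sort by date so neighbours are consecutive reports
    drops = []
    for (d0, n0), (d1, n1) in zip(ordered, ordered[1:]):
        if n1 < n0:
            drops.append((d0, d1, n0 - n1))
    return drops


# Positive-test totals for SCI Dallas. The 17th and 18th are the figures
# quoted in the text; the 16th is a HYPOTHETICAL value for illustration.
sci_dallas = [
    (date(2020, 12, 16), 1650),  # hypothetical
    (date(2020, 12, 17), 1682),
    (date(2020, 12, 18), 1076),
]

drops = find_decreases(sci_dallas)
for prev_day, curr_day, drop in drops:
    print(f"{prev_day} -> {curr_day}: cumulative total fell by {drop}")
```

Running a check like this against every facility and every column each time a new spreadsheet arrives would flag an impossible decrease on the day it is published.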
On the evening of December 21, the DOC also updated its coronavirus dashboard with a shocking statistic: 23,279 new tests had been administered, with over 10,000 coming back positive. These frightening numbers implied that one in every four incarcerated persons in PA prisons had the virus. This data came as a surprise to our team, but more unsettling was how much it differed from the data shared with us the following day. In the shared data, which should match the DOC dashboard, the number of tests and positive cases continued to go down. Again, we point out that this is an impossible trend for a cumulative total. As of the evening of December 22, the numbers reported on page 5 of the DOC dashboard are closer to those made available to us (55,975 total tests administered). However, the test total for December 21 remains at 80,735. What accounts for these lost tests? Was there truly an alarming spike of thousands of new positive cases over the weekend, and if so, why has the DOC changed its data once again?
The Pennsylvania DOC has provided no explanation for these errors, despite our multiple requests for greater transparency. Our continued monitoring of the DOC’s reporting methods suggests that the DOC may be deliberately covering up evidence of its poor management of this crisis. The DOC’s own coronavirus dashboard is updated every evening at roughly 10:00 PM. Despite multiple requests for the most up-to-date data, the DOC has continued to provide our team with only the hand-coded spreadsheet, which is updated ten to twelve hours after the dashboard and often lags by several days. When easy solutions for data transparency have been proposed, the DOC has either ignored or directly declined our requests.
With sensitive information, like testing data from state correctional institutions, it is imperative that the methods used for data collection, recording and storage be made clear and understandable. When the DOC reports “recovered” cases, what does this mean? When the DOC removes information from their records, why is this done? These questions require documented explanations.
Why is this a problem? The numbers and tallies reported by the DOC are not simply abstractions. Every number reported is a human life under the care of the state. When critical information is mismanaged or, worse, intentionally whitewashed, the value of these lives is diminished. For families hoping to understand what their loved ones in prison are experiencing during a rapidly worsening pandemic, errors in reporting can mean confusion and anxiety. For journalists and activists, these errors represent negligence on the part of a government institution. In a press conference with Senator Sharif Street and Senator-elect Nikil Saval last week, Lorraine Haw, better known as Mrs. Dee Dee, described the sense of isolation and suffering felt by families cut off from their loved ones during the pandemic. Any amount of information is precious for these families, and any amount of misreporting is an injustice.
Whether intentional or not, the errors in this data have real impacts and require real solutions. Our data archive is a simple example of how such solutions might be implemented. For months we have requested greater transparency from the DOC in its data management practices, but have received none. The unwillingness of the DOC to provide accessible data and transparent documentation of its methods forces us to question the honesty with which it shares any information.