It’s been over a week since this happened, but I keep getting asked about it, so I thought I’d share my thoughts on the matter, as someone who has been scouring the GA DPH site on a daily basis.
Yes, Georgia DPH had a graph on the COVID-19 Daily Status Report that had the dates not in chronological order. Here’s what it looked like:
This is of course insane. Nobody makes graphs with dates in a non-chronological fashion. It’s just wrong. I was very glad to see they fixed it promptly. I’m not giving them a pass on it at all. However, I think the media is making a much bigger deal out of it than it actually is.
Is it a big deal?
- The graph still declines overall when the dates are in the correct order, so it isn’t much more compelling to sort it incorrectly.
- It’s not one of the main graphs used on the site (Confirmed Cases over Time and Deaths Over Time), which have always been in chronological order. Those key graphs were completely unaffected by this issue.
- The graph is hidden under a tab – it’s not like it was the lead graph on the web site. And it only pertains to the top 5 counties, not the whole state.
- I am pretty certain this graph was online for less than 24 hours. People like me are digging through the site on a daily basis, and multiple people, myself included, noticed it the same night (May 10th). It was reported to the state and fixed the next day (May 11th).
- It was sorted in the wrong order, but the data was correct – so anyone who was really trying to use the graph to make decisions could clearly see the dates were not in order. They were in a large font, not fine print.
How did it happen?
There’s a lot of speculation as to how this happened, but personally I think it was just a dumb mistake by a web developer. The data below this graph is a list of the hardest hit counties sorted from most cases to least, so I think they just sorted the graph the same way, from most cases to least. After 20+ years in the computer industry, I have seen a lot of people make dumb mistakes like this. It’s easier and more common than you might realize.
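To show how small a slip this could be, here is a rough Python sketch of my guess. The case counts are made up for illustration and the function names are my own; I have no idea what the actual site code looks like. The only difference between the wrong chart and the right one is which field the sort uses.

```python
from datetime import date

# Hypothetical daily case counts, invented for illustration only.
daily_cases = {
    date(2020, 5, 1): 120,
    date(2020, 5, 2): 95,
    date(2020, 5, 3): 130,
    date(2020, 5, 4): 80,
    date(2020, 5, 5): 100,
}

def sort_by_cases(cases):
    """Order the bars from most cases to least (the suspected mistake)."""
    return sorted(cases.items(), key=lambda item: item[1], reverse=True)

def sort_by_date(cases):
    """Order the bars chronologically (what a time axis needs)."""
    return sorted(cases.items(), key=lambda item: item[0])

wrong = [d.isoformat() for d, _ in sort_by_cases(daily_cases)]
right = [d.isoformat() for d, _ in sort_by_date(daily_cases)]
print(wrong)  # dates jump around, bars always slope downward
print(right)  # dates in calendar order
```

Sorting by count produces a chart that always slopes downward no matter what the real trend is, which is exactly why it looked so suspicious.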
The state government knows there are a lot of sharp people in Georgia who look at this site multiple times a day and are quick to point out every flaw on Twitter. If it was done on purpose, they had to know someone would call them out for it. Issues like this only hurt their reputation – it did not help their cause at all.
Have there been other issues?
There have been a few other minor mistakes with the display that have been fixed almost immediately.
- Twice, a graph’s X-axis has started from 1970 instead of 2020. This mistake is easy to understand once you know that many systems store dates as the number of seconds since January 1, 1970, so a missing or zero date value shows up as 1970.
- Once, the number of confirmed cases by private labs was displayed in the total tests field. It was an obvious mistake, as it was a smaller number than total cases. Someone clearly used the wrong data field when they did a report update.
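For the curious, the 1970 glitch is easy to reproduce. This is a small Python sketch of the general "seconds since 1970" convention, not anything taken from the DPH site; the helper name to_date is my own.

```python
from datetime import datetime, timezone

def to_date(seconds_since_epoch):
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to a date."""
    return datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc).date()

# A missing date that gets stored as 0 lands on the epoch itself.
empty_value = to_date(0)

# A real timestamp for May 10, 2020, built here rather than hard-coded.
may_10 = int(datetime(2020, 5, 10, tzinfo=timezone.utc).timestamp())
real_value = to_date(may_10)

print(empty_value)  # 1970-01-01
print(real_value)   # 2020-05-10
```

If even one record comes through with an empty date, a chart that auto-scales its axis will stretch all the way back to 1970 to include it.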
While it is not incorrect, some people (including AJC writers) take issue with the fact that Georgia tracks cases by when symptoms began or when the test was taken (if symptom onset is not available). This is actually a common and valuable way to track viral outbreaks. More on that here.
So is it better now?
Now it’s in the right order, but there’s still an issue with the graph. It only displays case data for the last 15 days. As I mentioned above, Georgia tracks cases by when symptoms began or when the test was taken (if symptom onset is not available), so the last 14 days are always in a period of incomplete data as results come in from people who got sick a week or more ago. Elsewhere on the site, they make this clear. More on that here.
As a result, the data in this graph is of limited value. The further to the right you go, the more incomplete the case count is. The most recent date will always have next to no cases, because people who just noticed symptoms today or just got tested today won’t have a positive test reported to DPH yet. So it will always look like it’s declining, even now that it’s sorted correctly. Personally, I think they should do away with this chart completely, or they need to have a very clear disclaimer right above or below the graph that the current data is incomplete.
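Here is a toy Python model of why the right-hand side of the chart always droops. Every number in it is an assumption I made up for illustration: a perfectly flat 100 new cases per day, with results arriving between 3 and 12 days after symptom onset, spread evenly. Real reporting delays are messier, and none of this comes from DPH.

```python
# Assumptions for illustration only; neither number comes from DPH.
TRUE_CASES_PER_DAY = 100         # true new cases per onset day (flat)
MIN_DELAY, MAX_DELAY = 3, 12     # days from onset until a result is reported

def fraction_reported(days_ago):
    """Share of a day's cases whose results have arrived after days_ago days."""
    if days_ago < MIN_DELAY:
        return 0.0
    if days_ago >= MAX_DELAY:
        return 1.0
    return (days_ago - MIN_DELAY + 1) / (MAX_DELAY - MIN_DELAY + 1)

def reported_so_far(window_days=15):
    """Reported counts by onset date, oldest first, as the chart shows them today."""
    return [
        round(TRUE_CASES_PER_DAY * fraction_reported(days_ago))
        for days_ago in range(window_days - 1, -1, -1)
    ]

counts = reported_so_far()
print(counts)
```

Even though the true number of cases in this model never changes, the 15-day list starts at 100 and slides down to 0 for the most recent days. A chart built only from that window will look like a decline every single day you check it.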