Finding the source of an epidemic: Faster with open data?

German public health officials are working around the clock to find the source of the E. coli (EHEC) epidemic. Today as many as 365 new cases were confirmed, bringing the total to more than 1,000 cases. So far at least 14 deaths have been registered.

Germany is probably one of the countries in the world best equipped to deal with such a serious epidemic. But judging by the media coverage and the way public health agencies are informing citizens, I think a different approach could speed up the crucial process of finding the source.

A similar, though not so quickly developing epidemic occurred in Norway in 2006 (link in Norwegian). 18 people, 16 of them children, were hospitalized with E. coli. One of the children died. Research by the public health agencies, including of course interviews with patients and their families, initially pointed to ground meat as the culprit. But this turned out to be a false lead (as many of us following the news had suspected). Several weeks went by before the bacterium was found in “morrpølse” (a Norwegian cured sausage) and traced back to a specific production facility.

The story and the “ground meat hypothesis” dominated the media, and this was a hot topic for discussion around breakfast and lunch tables all over the country. At the time, I wondered why the public health experts didn’t disclose more of their findings. If they had published all the data and information from interviews and research (anonymized for privacy, of course), then other experts, and smart people in general, could have contributed their own analysis. Perhaps they would have pointed to leads the officials had overlooked, or patterns they hadn’t noticed. Can we, for example, rule out that some IT experts have better tools at their disposal, or at least different tools than the officials in charge? The data should of course be published in English so that foreigners could also weigh in.
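To make the anonymization step concrete: a minimal sketch of what stripping identifying details from interview records before publication might look like. The record fields and age-banding scheme are invented for illustration, not taken from any agency’s actual format.

```python
# Hypothetical sketch: reduce an interview record to non-identifying,
# epidemiologically relevant fields before publishing it as open data.
from dataclasses import dataclass


@dataclass
class InterviewRecord:
    # Field names are illustrative assumptions, not a real agency schema.
    name: str
    street_address: str
    age: int
    region: str
    foods_eaten: list


def anonymize(record: InterviewRecord) -> dict:
    """Drop direct identifiers; coarsen age into a ten-year band."""
    decade = (record.age // 10) * 10
    return {
        "age_band": f"{decade}-{decade + 9}",
        "region": record.region,
        "foods_eaten": record.foods_eaten,
    }


case = InterviewRecord("Jane Doe", "Musterstr. 1", 34, "Hamburg",
                       ["cucumber", "sprouts", "ground meat"])
print(anonymize(case))
```

Real anonymization is harder than this (rare combinations of region, age and symptoms can still identify someone), but even a coarsened dataset like this would let outsiders look for patterns across cases.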

One obvious counter-argument is that asking the public for ideas and analysis would open the floodgates and confuse rather than help researchers. But this is again a question of having the right tools for filtering and analysing contributions. After all, crowdsourcing research processes has been tried before.
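As a toy illustration of such filtering: even a crude keyword tally over free-text submissions would surface which suspect foods keep recurring. The submissions and keyword list below are entirely made up.

```python
# Sketch, assuming crowdsourced contributions arrive as free-text
# suspect-food suggestions; tally keyword mentions to rank leads.
from collections import Counter

submissions = [
    "I suspect raw sprouts from a local farm",
    "sprouts again - several of the cases ate salad",
    "could be cucumbers from imported produce",
    "ground meat, like in the Norwegian outbreak",
    "sprouts served at a buffet",
]

KEYWORDS = ["sprouts", "cucumber", "ground meat", "lettuce"]


def tally(texts, keywords):
    """Count how many submissions mention each keyword."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                counts[kw] += 1
    return counts


print(tally(submissions, KEYWORDS).most_common())
```

Real moderation tools would need deduplication, spam filtering and better text matching, but the point stands: aggregating contributions is a tooling problem, not an argument against asking.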

In the ongoing German epidemic, online media could play a constructive part by starting such a process of asking the audience for advice and ideas. At the very least, they should start by offering more in-depth interactive presentations of how the epidemic is spreading. Detailed maps would be interesting and helpful in themselves. Some government agencies are already providing quite specific information about where cases originated (Schleswig-Holstein, pdf). Media outlets could in general use this opportunity to file requests for data and demonstrate the potential of data journalism.
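The raw material for such a map is just per-region case counts. A minimal sketch, assuming anonymized case records with a region field (the record format and sample data are invented for illustration):

```python
# Sketch: aggregate anonymized case records by region. These counts
# are what an interactive map or choropleth would be drawn from.
from collections import Counter

cases = [
    {"region": "Schleswig-Holstein", "onset": "2011-05-20"},
    {"region": "Hamburg", "onset": "2011-05-22"},
    {"region": "Schleswig-Holstein", "onset": "2011-05-23"},
    {"region": "Niedersachsen", "onset": "2011-05-24"},
    {"region": "Hamburg", "onset": "2011-05-25"},
    {"region": "Schleswig-Holstein", "onset": "2011-05-26"},
]

per_region = Counter(c["region"] for c in cases)
for region, n in per_region.most_common():
    print(f"{region}: {n}")
```

From here, a newsroom could feed the counts into any mapping library, or plot onset dates as an epidemic curve; the hard part is getting the underlying data released, which is exactly what data requests are for.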

UPDATE June 5: The suggestion above more or less takes for granted that the responsible government agencies, hospitals etc. at least have an efficient way of collecting and sharing information among themselves. But this is doubtful, as criticism in the German media over the last week of several aspects of the handling of the epidemic shows. Hospitals complain about the late arrival of questionnaires to be used in interviews with patients. The Robert Koch Institut discloses little about how it is working to find the source of the epidemic, one hospital director says. By tomorrow, June 6, a new government Internet platform for sharing information between agencies will be launched, another indication that the information infrastructure part of dealing with the epidemic has had flaws so far.

UPDATE June 15: On Zeit Online’s Data Blog, some of the same questions are raised and debated, with comments from the Robert Koch Institut.