Digital Disease Detection: Using Social Media To Predict Flu Trends

It’s the season for coughing, sneezing, fever, sore throat and body aches. If you haven’t had the flu or another respiratory illness in the last five months, there’s a good chance you know someone who has. According to the Centers for Disease Control and Prevention (CDC), about 5 to 20 percent of people in the United States get the flu every year. That’s a lot!

No two flu seasons are quite alike. The number of people the virus infects, how sick it makes us, and when the season peaks vary from one year to the next. The CDC believes that early prediction of these measures, namely the severity of the flu and the timing of the season’s peak, “could be very useful to public health officials for vaccination campaigns, communicating to the public, allocating resources, and implementing strategies to combat the spread of flu disease.” This makes a lot of sense: if we can predict what the flu season is going to look like, we can prepare for it and lessen its negative impact. So on November 25, 2013, the CDC launched an influenza prediction challenge encouraging researchers to use social media and other data sources to predict the flu.

Researchers have been evaluating the use of social media for monitoring flu outbreaks for several years. Monitoring means collecting, analyzing and interpreting data on the incidence and spread of disease. Essentially, it is looking at who is getting sick, and where, when and how they are getting sick. When it comes to using social media for monitoring the flu, the aim is typically to estimate current flu trends before agencies like the CDC release their official reports. For instance, to estimate flu activity for week j, researchers need social media data for week j, which is available as soon as that week ends.

Agencies like the CDC, however, monitor the flu by more traditional measures, such as data from outpatient healthcare providers, positive flu tests from laboratories around the nation, and hospitalization data. All of this data takes time to collect and send to the CDC, which can delay making it publicly available. That delay is the price of accuracy: traditional public health surveillance methods identify cases more reliably than social media can. Data from social media, on the other hand, is usually available in real time, so it can serve as a timely complement to data obtained by traditional methods. Used together, the two kinds of data can support better public health decisions.

Monitoring and reporting on current events, or predicting near-future events, is usually referred to as nowcasting. Examples include studies that used Twitter to monitor and nowcast flu activity during the 2009 H1N1 pandemic and the 2012-2013 influenza epidemic. In these studies, researchers usually develop a process for collecting tweets containing flu-related terms (such as “flu”, “influenza”, and “Tamiflu”), along with the time at which each tweet was published and the geographical location from which it was sent. Researchers then estimate daily and/or weekly levels of flu-like illness from the data. These estimates sometimes correlate with the flu-like illness data collected by the CDC.
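To make the nowcasting idea a little more concrete, here is a minimal Python sketch of that kind of pipeline: pick out tweets that mention a flu-related term, count them per week, and measure how well weekly counts track an official series. The function names and the sample tweets are made up for illustration; real studies use far more careful filtering (and real data) than this.

```python
import math
import re
from collections import Counter
from datetime import date

# The flu-related terms mentioned above, matched as whole words only.
FLU_PATTERN = re.compile(r"\b(flu|influenza|tamiflu)\b", re.IGNORECASE)


def mentions_flu(text):
    """Return True if the text contains one of the flu-related terms."""
    return FLU_PATTERN.search(text) is not None


def weekly_flu_counts(tweets):
    """Count flu-related tweets per ISO week.

    `tweets` is an iterable of (date, text) pairs (a hypothetical format).
    Returns a Counter keyed by (ISO year, ISO week number).
    """
    counts = Counter()
    for posted_on, text in tweets:
        if mentions_flu(text):
            iso = posted_on.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up tweets, purely for illustration.
tweets = [
    (date(2013, 1, 7), "Home sick with the flu"),
    (date(2013, 1, 8), "Doctor gave me Tamiflu today"),
    (date(2013, 1, 9), "I am fluent in Spanish"),  # not flu-related
    (date(2013, 1, 15), "Influenza is going around the office"),
]
counts = weekly_flu_counts(tweets)
```

With many weeks of counts in hand, you could pass them to `pearson` alongside the matching weekly figures from an official source to see how closely the two move together.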

Compared to studies on monitoring flu trends, there have been fewer studies focused specifically on using social media for long-term prediction (future projections of the incidence and spread of disease) of flu activity. In recent years, several news stories, such as “Researchers Use Twitter to Predict Flu Outbreaks”, “Twitter can Predict Where Flu Outbreaks will occur” and “Flu forecasts use weather prediction model”, have covered studies that used Google Flu Trends (a system that uses online search activity on flu-related terms to estimate flu trends) and Twitter, sometimes in combination with other data sources, to predict flu activity. More recently, a group of researchers presented an approach for combining multiple data sources, including Twitter, Google Flu Trends, and online news reports, to predict flu activity. A synthesis of the methods used in influenza forecasting can be found in “A systematic review of studies on forecasting the dynamics of influenza outbreaks”. The methods range from simple time series models (which assume that we can forecast future flu activity from past observations) to more complex methods such as agent-based models (also called individual-based models, which seek to represent individuals, or agents, and the interactions between individuals and their environment). If you are interested, the same systematic review discusses the advantages and limitations of the various forecasting approaches.
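As an illustration of the simplest end of that range, here is a sketch of a first-order autoregressive, or AR(1), model in Python. It assumes that next week’s flu activity is roughly a straight-line function of this week’s, fits that line by least squares, and then rolls it forward. This is just one basic time series model chosen for the example, not the method of any particular study, and the function names are my own.

```python
def fit_ar1(series):
    """Fit y[t+1] = intercept + slope * y[t] by ordinary least squares."""
    previous = series[:-1]
    following = series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_next = sum(following) / n
    cov = sum((p - mean_prev) * (f - mean_next)
              for p, f in zip(previous, following))
    var = sum((p - mean_prev) ** 2 for p in previous)
    slope = cov / var
    intercept = mean_next - slope * mean_prev
    return intercept, slope


def forecast_ar1(series, steps):
    """Forecast `steps` future values by iterating the fitted AR(1) model."""
    intercept, slope = fit_ar1(series)
    forecasts = []
    last = series[-1]
    for _ in range(steps):
        last = intercept + slope * last
        forecasts.append(last)
    return forecasts
```

For example, `forecast_ar1(weekly_counts, steps=2)` would give a two-week-ahead projection from a list of past weekly values (`weekly_counts` is a placeholder name). A model this simple has no notion of seasons, vaccination or how the virus spreads, which is part of why researchers turn to the more complex approaches.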

Researchers have also used more traditional disease modeling approaches such as Susceptible-Exposed-Infectious-Removed (SEIR) models in combination with Bayesian approaches such as particle filtering for flu forecasting. If you don’t have a background in epidemiology or statistics, this might be completely new to you, so I’ll try to explain. In SEIR models, people start out susceptible to the disease. They become exposed to the virus when they come in contact with an infectious person. Exposed individuals then become infectious for a period of time, and eventually they are removed from the chain of transmission, either because they recover or because they die. Particle filtering is a statistical method for estimating the state of a dynamical system from noisy observations. By combining it with traditional disease modeling methods, researchers can make forecasts of flu activity.
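If it helps to see this in code, here is a rough Python sketch of both ideas. The first part steps a basic SEIR model forward one day at a time, tracking the fraction of the population in each group. The second part is a very simple (bootstrap) particle filter that tries to recover the transmission rate from weekly observations of the infectious fraction. Everything here is an assumption made for illustration: the parameter names (`beta` for transmission, `sigma` for the rate of becoming infectious, `gamma` for removal), their values, the noise settings, and the synthetic “observations”, which are generated from the model itself rather than taken from real flu data. Published forecasting systems are considerably more elaborate.

```python
import math
import random


def seir_step(state, beta, sigma, gamma):
    """Advance the (S, E, I, R) population fractions by one day."""
    s, e, i, r = state
    new_exposed = beta * s * i      # susceptible people who meet the virus
    new_infectious = sigma * e      # exposed people who become infectious
    new_removed = gamma * i         # infectious people who recover or die
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_removed,
            r + new_removed)


def simulate_seir(beta, sigma, gamma, i0=0.001, days=56):
    """Return the daily SEIR states, starting with a fraction i0 infectious."""
    state = (1.0 - i0, 0.0, i0, 0.0)
    trajectory = [state]
    for _ in range(days):
        state = seir_step(state, beta, sigma, gamma)
        trajectory.append(state)
    return trajectory


def particle_filter_beta(weekly_infectious, sigma, gamma, n_particles=500,
                         rel_noise=0.2, jitter=0.01, i0=0.001, seed=0):
    """Estimate beta from weekly infectious fractions (illustrative only)."""
    rng = random.Random(seed)
    # Each particle is a guess at beta plus the epidemic state it implies.
    particles = [(rng.uniform(0.2, 0.8), (1.0 - i0, 0.0, i0, 0.0))
                 for _ in range(n_particles)]
    for observed in weekly_infectious:
        advanced, weights = [], []
        for beta, state in particles:
            for _ in range(7):  # move every particle forward one week
                state = seir_step(state, beta, sigma, gamma)
            # Weight by how closely the particle matches the observation,
            # assuming Gaussian noise proportional to the observed value.
            sd = rel_noise * observed + 1e-6
            weights.append(math.exp(-0.5 * ((state[2] - observed) / sd) ** 2))
            advanced.append((beta, state))
        if sum(weights) == 0.0:  # fall back to equal weights if all vanish
            weights = [1.0] * n_particles
        # Resample in proportion to weight, then nudge beta slightly so the
        # surviving guesses do not collapse onto a single value.
        resampled = rng.choices(advanced, weights=weights, k=n_particles)
        particles = [(min(max(b + rng.gauss(0.0, jitter), 0.05), 1.0), st)
                     for b, st in resampled]
    return sum(b for b, _ in particles) / n_particles


# Synthetic "observations" generated from the model itself (not real data).
truth = simulate_seir(beta=0.5, sigma=0.25, gamma=0.2, days=56)
weekly = [truth[day][2] for day in range(7, 57, 7)]
estimate = particle_filter_beta(weekly, sigma=0.25, gamma=0.2)
```

Once the filter has narrowed down plausible values of the transmission rate, running those particles further forward in time is what turns the estimate into a forecast.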

The CDC challenge is a great opportunity for researchers to use social media to predict measures such as the severity and peak of the flu season in real time. These studies would add to the literature on real-time flu prediction, which includes studies performed during the 2009 pandemic and, more recently, studies using traditional flu data sources and techniques borrowed from weather forecasting [1]. Judging of the CDC flu prediction challenge would take place between March 28 and May 30, 2014.




