Our recent paper, ‘Social media image analysis for public health’, will appear as a short paper at CHI 2016. The question we ask in this paper is whether images uploaded to social media can be used to predict public health variables and lifestyle diseases such as obesity, diabetes, and depression.
Lifestyle diseases are a major concern in the developed world. NYTimes estimates that, in addition to costing almost a trillion dollars, lifestyle diseases kill more people than contagious diseases. With the ubiquitous use of social-media platforms in recent years, it has never been easier to collect and analyze the lifestyle choices of large populations. For this reason, social-media data has already been used to study and monitor public health.
A major trend in previous studies was the use of textual data (e.g., tweets). Meanwhile, the recent increase in smartphone usage has led to an exponential growth in the number of images posted on social media. It is estimated (2014 figures) that around 1.8 billion images are posted to social networks every day, roughly 3.6 times the 500 million tweets posted daily.
With recent advances in deep learning for image understanding, it has become possible to use machines to understand image content. Various public APIs (e.g., Google, Imagga, MetaMind, etc.) are now available for generating a set of tags that describe the contents of an image. In this paper, we use the Imagga auto-tagging API to generate tags from images. An example of an image and the tags generated for it by the Imagga API is shown below.
In our paper, we use such machine-generated tags to predict US county-level health variables such as obesity, drinking, and diabetes. Our main finding is that, even though machine-generated tags generally do not help much, for certain special categories like ‘excessive drinking’ (the percentage of adults in the county who report excessive drinking), machine-generated tags outperform other baselines. We believe that this finding is of great value, as machine-generated tags might prove useful in the case of stigmatized behaviors, such as substance abuse, where explicit hints are unlikely to be added by the image owner. For example, the image below of a table full of beer glasses is not explicitly labeled ‘alcohol’ by the user, but with machine-generated tags we can identify the presence of alcohol.
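To make the setup more concrete, here is a minimal Python sketch of how machine-generated tags could be turned into county-level features and compared against a health variable. It is only an illustration, not the pipeline from the paper: the function names (`county_tag_features`, `pearson`), the tag vocabulary, the county names, and all numbers are hypothetical toy values, and a simple Pearson correlation stands in for whatever prediction model one would actually use.

```python
from collections import Counter
from math import sqrt


def county_tag_features(image_tags_by_county, vocabulary):
    """For each county, compute the fraction of its images carrying each tag."""
    features = {}
    for county, images in image_tags_by_county.items():
        # Count each tag at most once per image.
        counts = Counter(tag for tags in images for tag in set(tags))
        features[county] = [counts[tag] / len(images) for tag in vocabulary]
    return features


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


# Toy data (assumption): tags per image, grouped by a hypothetical county.
image_tags_by_county = {
    "county_a": [["beer", "glass", "table"], ["sky", "beach"]],
    "county_b": [["beer", "bar"], ["beer", "glass"]],
    "county_c": [["sky", "beach"], ["dog"]],
}
# Toy values (assumption): share of adults reporting excessive drinking.
excessive_drinking = {"county_a": 0.15, "county_b": 0.20, "county_c": 0.10}

vocabulary = ["beer", "sky"]
features = county_tag_features(image_tags_by_county, vocabulary)

counties = sorted(features)
beer_share = [features[c][vocabulary.index("beer")] for c in counties]
drinking = [excessive_drinking[c] for c in counties]
r_beer = pearson(beer_share, drinking)
print(f"correlation between 'beer' tag share and drinking rate: {r_beer:.2f}")
```

The toy numbers are constructed so that the ‘beer’ tag share tracks the drinking rate exactly; real county data would be far noisier, which is why a held-out evaluation against baselines matters.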