Our recent paper on second-order “Co-Following” on Twitter was accepted at ACM Hypertext 2014 (short paper). This is work that I did mostly while I was at QCRI, with Ingmar Weber. The idea is that two Twitter users whose followers have similar friends are themselves similar, even if they do not share any common followers. The intuition is that a user's friends (the accounts that user follows) typically reflect that user's interests, so if two accounts have followers with similar interests, the accounts are likely to be similar.
As an example of our approach, consider Figure 1 below, which shows the Twitter accounts of two unrelated, not particularly popular football clubs: @Lierse from Belgium and @ACF_Fiorentina from Italy. Directed edges from the users in the middle (foll1-foll5) indicate which accounts those users follow. In this example, the two accounts do not share a single common follower. However, many of their followers are “co-following” some common accounts, such as @FIFA or @FCBarcelona. This can be used to deduce the similarity, or rather closeness, of @Lierse and @ACF_Fiorentina. Most existing approaches measure the similarity between two accounts using the number of followers they have in common. That fails in the example above, since the two accounts share no followers, even though both are football clubs and are close in that sense. Our approach complements existing approaches by extending to the 2nd order network, which lets us measure the similarity of pairs of users who (i) do not share many common followers or (ii) do not have many followers.
At first sight, this idea is similar to using common links (co-citations) for clustering web pages. However, typical co-citation or co-linkage approaches focus on the “1-hop backward” links only and then look at overlaps. In our analysis, we make crucial use of the added “forward” links. In a sense, we are using 2nd order co-citation or co-following rather than ordinary 1st order co-citation.
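To make the second-order idea concrete, here is a minimal Python sketch. It is not the exact method from the paper: the toy follow graph, the helper names (`followers_of`, `co_following_profile`, `cosine`) and the choice of cosine similarity on raw counts are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Toy follow graph: follower -> set of accounts they follow.
# The edges are made up for illustration; they are not data from the paper.
follows = {
    "foll1": {"@Lierse", "@FIFA"},
    "foll2": {"@Lierse", "@FCBarcelona", "@FIFA"},
    "foll3": {"@ACF_Fiorentina", "@FIFA"},
    "foll4": {"@ACF_Fiorentina", "@FCBarcelona"},
    "foll5": {"@BBCWorld", "@nytimes"},
}

def followers_of(account, follows):
    """1-hop backward step: everyone who follows `account`."""
    return {user for user, friends in follows.items() if account in friends}

def co_following_profile(account, follows):
    """Backward then forward step: for each other account, count how many
    followers of `account` also follow it."""
    profile = Counter()
    for user in followers_of(account, follows):
        for friend in follows[user]:
            if friend != account:
                profile[friend] += 1
    return profile

def cosine(p, q):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

lierse = co_following_profile("@Lierse", follows)
fiorentina = co_following_profile("@ACF_Fiorentina", follows)
bbc = co_following_profile("@BBCWorld", follows)

# First-order overlap is empty, yet the second-order profiles overlap.
shared_followers = followers_of("@Lierse", follows) & followers_of("@ACF_Fiorentina", follows)
print(shared_followers)            # no common followers in this toy graph
print(cosine(lierse, fiorentina))  # high: both profiles contain @FIFA and @FCBarcelona
print(cosine(lierse, bbc))         # zero: no co-followed accounts in common
```

In this toy graph a first-order measure based on shared followers would rate the two clubs as unrelated, while the co-following profiles still pick up the common interest in football.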
This idea has a lot of cool applications in (i) language-agnostic user classification, (ii) user recommendation, (iii) cross-selling and marketing opportunities, etc.
The idea of co-following can be used for language-agnostic user classification on Twitter. We tested whether co-following can predict which of two arguably interchangeable rival companies a user will follow, such as @CocaCola vs. @Pepsi or @Puma vs. @Nike. We observed that, even after removing obvious co-following features, the prediction AUC is as high as 80%. Figure 2 below shows the results.
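As a rough illustration of this kind of experiment, the sketch below trains a very simple classifier on the accounts each user follows and evaluates it with AUC. The post does not say which classifier or which features were used, so everything here is an assumption: the smoothed log-odds scoring, the function names, and the toy users (including accounts such as @McDonalds and @NFL, which are made up for the example).

```python
from collections import Counter
from math import log

def train_log_odds(users, labels, excluded):
    """Per-account log-odds of class 1 vs. class 0 with add-one smoothing.
    Accounts in `excluded` (the obvious features) are ignored."""
    pos, neg = Counter(), Counter()
    for friends, y in zip(users, labels):
        for account in friends - excluded:
            (pos if y == 1 else neg)[account] += 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return {
        a: log((pos[a] + 1) / (n_pos + 2)) - log((neg[a] + 1) / (n_neg + 2))
        for a in set(pos) | set(neg)
    }

def score(friends, weights, excluded):
    """Sum the weights of the non-excluded accounts a user follows."""
    return sum(weights.get(a, 0.0) for a in friends - excluded)

def auc(labels, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Label 1 = follows @CocaCola, label 0 = follows @Pepsi (toy data).
obvious = {"@CocaCola", "@Pepsi"}  # removed so the label cannot leak into the features
train_users = [
    {"@CocaCola", "@McDonalds", "@Disney"},
    {"@CocaCola", "@McDonalds"},
    {"@Pepsi", "@NFL", "@Doritos"},
    {"@Pepsi", "@NFL"},
]
train_labels = [1, 1, 0, 0]
test_users = [
    {"@CocaCola", "@McDonalds"},
    {"@Pepsi", "@NFL"},
    {"@CocaCola", "@Disney"},
    {"@Pepsi", "@Doritos"},
]
test_labels = [1, 0, 1, 0]

weights = train_log_odds(train_users, train_labels, obvious)
test_scores = [score(u, weights, obvious) for u in test_users]
print(auc(test_labels, test_scores))
```

The toy data is perfectly separable, so the number it prints says nothing about the real experiment; the point is only the shape of the pipeline: drop the obvious features, score users from what remains, and summarize ranking quality with AUC.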
Does the fact that you follow @CocaCola tell us something about your music preferences? Our preliminary results indicate a signal in that direction. We looked at the top features belonging to different categories of Twitter users and found interesting results. Figure 3 below compares the top features for @GOP and @TheDemocrats in the categories Music, News and Sports. The lifestyle correlations for the political rivalry @GOP vs. @TheDemocrats make intuitive sense. For example, @nytimes is more popular among @TheDemocrats followers (The New York Times is generally perceived to have a liberal bias).
We performed multidimensional scaling (MDS) using pairwise similarity scores obtained from the co-following features and observed some interesting results. The figures below show some of the MDS plots obtained. Most of the observed structure corresponds to musical genres. For example, Lil Wayne (@liltunechi), Chris Brown (@chrisbrown) and Drake (@drake) are rappers and are mapped close together, marked in red. The same holds for Snoop Dogg (@snoopdogg) and Kanye West (@kanyewest), marked in green, both of whom are hip hop artists. However, some surprising things also emerge, such as the relative closeness of “Weird Al” Yankovic (@alyankovic), famous for musical parody, and Yoko Ono (@yokoono), both marked in orange. Though they represent very different musical genres, both arguably appeal to an older, more educated audience. Similarly, in the case of German political parties, we see parties with similar ideologies grouped close to each other.
Note that MDS is a lossy embedding: even though two points appear close in the 2-dimensional plane, they might be far apart in the original high-dimensional space. Therefore, all conclusions and observations we derived from such mappings have also been validated using the high-dimensional similarity information.
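One simple way to run this kind of sanity check is to compare nearest neighbours in the 2D plot with nearest neighbours under the original distances. The sketch below is an assumed procedure, not necessarily the validation we used for the paper; the function name `unsupported_pairs`, the parameter `k` and the toy coordinates and distances are hypothetical.

```python
from math import dist  # Euclidean distance, available since Python 3.8

def unsupported_pairs(points_2d, high_dim_dist, k=1):
    """Return (a, b) pairs where b is among the k nearest neighbours of a in
    the 2D embedding but not among its k nearest under the original distances."""
    names = list(points_2d)

    def knn(name, distance):
        others = sorted((o for o in names if o != name), key=lambda o: distance(name, o))
        return set(others[:k])

    def d_2d(a, b):
        return dist(points_2d[a], points_2d[b])

    def d_high(a, b):
        return high_dim_dist[frozenset((a, b))]

    flagged = []
    for a in names:
        for b in knn(a, d_2d) - knn(a, d_high):
            flagged.append((a, b))
    return flagged

# Toy embedding: w and x sit together, y and z sit together.
points_2d = {"w": (0.0, 0.0), "x": (1.0, 0.0), "y": (10.0, 0.0), "z": (11.0, 0.0)}

# Toy distances in the original space, keyed by unordered pair.
high_dim_dist = {
    frozenset(("w", "x")): 0.1,
    frozenset(("y", "z")): 0.8,
    frozenset(("w", "y")): 0.9,
    frozenset(("w", "z")): 0.9,
    frozenset(("x", "y")): 0.9,
    frozenset(("x", "z")): 0.5,
}

flagged = unsupported_pairs(points_2d, high_dim_dist)
print(flagged)  # pairs whose closeness in the plot is not backed by the original distances
```

Here z looks closest to y in the plot, but in the original space z is closer to x, so that pair is flagged and should not be interpreted from the plot alone.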