Crowdsourcing social circle discovery on Twitter

In a short paper accepted to the 9th International Conference on Web and Social Media (ICWSM 2015), we explored how the lists already created on Twitter can be used to organize content for new users. A list is a way of organizing contacts and content on Twitter; for example, a user might create a list “Data Science” to include the accounts of prominent data scientists like Andrew Ng and Hilary Mason. The functionality is available to all users, who can create a list by coming up with a title and selecting list members from the pool of all Twitter users. Once a list is created, it can be used to selectively view the tweets of its members. Rather than making every user start from scratch when organizing their friends, we wanted to find a way to use the lists already created by other users to recommend groupings automatically.

We collected a dataset of mini-networks centered around 24 seed users, each consisting of the friends of the seed user, the members of the seed user’s lists, and the connections between them [1]. We used the seed user’s lists as “ground truth” lists to evaluate our automatic list predictions. We observed that users in the ground truth lists generally follow each other, and occur together in lists, more often than users in the seed user’s network as a whole. This shows up when we compare the density of the ground truth lists with the density of the seed user’s entire network, where density is |E| / (|V| (|V| - 1)), |E| is the number of connections between users, and |V| is the number of users.

Figure: Follower density (left) and co-listed density (right) of the seed user’s network plotted against the density of the user’s ground truth lists. Density in the ground truth lists is generally higher.
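
To make the density measure concrete, here is a minimal Python sketch of the formula |E| / (|V| (|V| - 1)) applied to a group of users. It assumes connections are stored as directed (follower, followed) pairs, which is our reading of the follower graph rather than a detail spelled out above; the function name and the toy data are made up for illustration and are not taken from the released code.

```python
def density(members, edges):
    """Density |E| / (|V| (|V| - 1)) of the subgraph induced by `members`.

    Assumption: `edges` is an iterable of directed (source, target) pairs,
    e.g. (follower, followed). Only edges with both ends in `members` count.
    """
    members = set(members)
    n = len(members)
    if n < 2:
        # Density is undefined for fewer than two users; report 0.0.
        return 0.0
    internal = {
        (u, v) for u, v in edges
        if u in members and v in members and u != v
    }
    return len(internal) / (n * (n - 1))


# Toy follower graph (hypothetical data).
follows = [
    ("a", "b"), ("b", "a"), ("b", "c"), ("c", "a"),  # inside the list
    ("d", "a"), ("e", "d"),                          # rest of the network
]
network = {"a", "b", "c", "d", "e"}
ground_truth_list = {"a", "b", "c"}

list_density = density(ground_truth_list, follows)  # 4 edges / (3 * 2)
network_density = density(network, follows)         # 6 edges / (5 * 4)
```

In this toy example the list is denser than the network around it, which is the pattern the figure shows for the real data. The same function can be applied to a co-listing graph by passing pairs of users who appear together in a list.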

Using this observation, we designed a simple density-based criterion to select the “best” lists in each user’s network. As candidate lists, we used the pre-existing lists created by other Twitter users that contain members of the seed user’s network. In many cases, our method selects lists whose titles and sizes are quite similar to those of the ground truth lists, as shown in the table below.

Table: Top 25 list matches, excluding lists with Asian-language titles. We show the list titles, the number of users in each list, and the F1-score of the best match between each ground truth list and the list selected by our method.
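
For readers unfamiliar with the F1-score in this setting, the sketch below shows the standard set-based definition (the harmonic mean of precision and recall over list members) and how a best match could be picked for one ground truth list. This is an illustration under our own assumptions: the helper names and example lists are hypothetical, and the exact matching procedure in the paper may differ.

```python
def f1_score(truth, candidate):
    """Set-based F1 between a ground truth list and a candidate list."""
    truth, candidate = set(truth), set(candidate)
    overlap = len(truth & candidate)
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)  # share of candidate members that are correct
    recall = overlap / len(truth)         # share of ground truth members recovered
    return 2 * precision * recall / (precision + recall)


def best_match(truth, candidates):
    """Return (title, members) of the candidate with the highest F1.

    Assumption: `candidates` maps a list title to its set of members.
    """
    return max(candidates.items(), key=lambda item: f1_score(truth, item[1]))


# Hypothetical ground truth list and candidate lists.
truth = {"a", "b", "c", "d"}
candidates = {
    "machine learning": {"a", "b", "c", "e"},
    "news": {"x", "y", "a"},
}

title, members = best_match(truth, candidates)
score = f1_score(truth, members)
```

Here the "machine learning" candidate shares three of four members with the ground truth list, so precision and recall are both 0.75 and the F1-score is 0.75.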

For more details on our methods, here are the links to the paper and code.

1. This is akin to the egonet described in Leskovec and McAuley’s recent work.
