I was in Shanghai attending WSDM 2015. Here is a brief report on the conference, including summaries of a few papers that I found interesting and relevant to my own research.
There were 3 keynote speeches. The first one was by Michael Franklin from UC Berkeley, presenting the Berkeley Data Analytics Stack, well known for their contribution to Apache Spark. Michael argued that due to the use of intelligent caching, Spark is much faster than traditional MapReduce. He explained the need for a unified framework (like Spark), which contains a unified back-end, with verticals catering to different domains built on top. For example, the Berkeley Data Analytics stack has a unified back-end Spark, with verticals for SQL (Spark SQL), graphs (GraphX), streaming (Spark Streaming), on top.
The second keynote was by Lada Adamic from Facebook. She talked about different types of information propagation on Facebook. They studied various types of cascades like, rumors, memes, social movements, etc., and designed methods to predict these cascades. The third keynote was by Throsten Joachims from Cornell on learning from user-interaction data.
I was mostly interested in papers on social-network analysis and graph mining. Here is brief summary of some of the papers I found most relevant.
1. The power of random neighbors in social networks (Silvio Lattanzi and Yaron Singer)
The paper talks about the friendship paradox, a classic social phenomenon that states that your friends have more friends than you, on average. In this paper they try to give an explanation of this phenomenon. They show that given a random node, there is an asymptotic gap between its degree and the degree of one of its random neighbors. They show that this result can be proven with just a polylogarimithmic number of samples from the network. The friendship paradox is useful in many applications like marketing, for finding influential users, or immunization.
2. Negative link prediction in social media (Jilian Tang, et al.)
The study of signed networks has seen a decent rise in interest in recent times, and this paper follows suite. The paper begins with introducing the importance of predicting negative links in social media, which has applications in recommendation, for example. The main idea of the paper is based on using two datasets, Epinions and Slashdot (available in SNAP). These datasets contain negative links, indicating enemies/dislike. The authors use these networks to learn how negative links work in social media and apply these ideas to predict negative links. The main features that are used are based on user interactions (like/dislike), and two social theories — balance theory and status theory. Using these features, the authors try to build a classifier that classifies a link as missing or negative.
3. Sarcasm detection on twitter : a behavioral modeling approach (Ashwin Rajadesingan, et al.)
This paper presents an approach to detect sarcastic tweets. The main difference of their approach compared to existing work on sarcasm detection is their use of user behavior related features, in addition to lexical and linguistic cues. Using tweets containing #sarcasm and #not as ground truth, they evaluate their classifier and show a nice performance.
4. On integrating network and community discovery (Jialu Liu, et al.)
One main drawback of most existing community detection algorithms is that they assume that the entire network is available. But this assumption may not be feasible for many social networks like Facebook or Twitter (for example, due to API restrictions). This paper presents a method to overcome this problem by selectively expanding the network in such a way that good communities are found. The algorithm iterates between a “Network Discovery” phase and a “Community Detection” phase and iteratively tries to find the best communities with in a budget.
5. Finding subgraphs with maximum total density and limited overlap (Oana Denisa Balalau, et al.)
Most existing work on finding dense subgraphs is limited to find one big dense subgraph. This paper presents an intuitive, yet interesting addition to this, by proposing to find k dense subgraphs from a graph, which have a limited overlap. They define a “minimal densest subgraph”, which is the densest subgraph that doesn’t contain any other densest subgraph. They propose an efficient primal-dual algorithm to find the minimal densest subgraph. Their main algorithm to find k dense subgraphs is then very simple. First find a minimal subgraph S, then remove some of these nodes from the graph, only those nodes that are not well connected to the nodes outside S. Iterate until k subgraphs are found or until the graph is empty.
6. Delayed-dynamic-selective (DDS) prediction for reducing extreme tail latency in web search (Saehoon Kim, et al.) (Best Paper Runner-up)
An important problem in web search is to reduce the tail latency for long running queries. Typically, once a query is issued, a search engine tries to predict long running queries and parallelize its execution. This paper presents a framework that tries to improve extreme tail latency by delaying the prediction and doing it dynamically, based on the current execution status and data. They show significant improvement in the tail latency, with up to 950% improvement in the precision.
7. Inverting a steady-state (Ravi Kumar, et al.) (Best Paper)
In this paper, the authors show how to answer the following question: Given the steady state of a Markov chain, can we recover some properties of the underlying process?
I attended the tutorial on distributed graph algorithms, by Silvio Lattanzi and Vahab Mirrokni. They presented a couple of algorithms that they have developed, one about finding connected components on a large scale caught my attention, mainly because of its application to study the web structure.
They also presented a framework for distributed graph processing they created at Google, ASYMP, which uses asynchronous message passing, unlike the usual synchronous models like Pregel or Giraph. They show how it helps in scaling up a certain type of graph algorithms, like personalized PageRank.
I was also impressed about the scale of their experiments, with graphs ranging from few billions to trillions of edges.
It was a great learning experience for me and a fun trip, even though I did not have a paper. WSDM 2016 will be in San Francisco. I hope I will be there with a paper 🙂