I recently attended the CIKM conference in Melbourne to present our paper on facility location in map-reduce and Giraph. In this post, I will give a brief summary of some of the talks I attended. As CIKM is a very large conference, with 166 papers accepted this year, this list is merely a random sample of the complete list of papers.
The first keynote was by Jaime Teevan (Microsoft Research), who talked about “slow search,” a novel concept where search engines may use additional time in order to provide a higher-quality search experience. Slow search can help with complex queries, which are hard to answer, or with making the search experience immersive and interactive. Jaime presented their work on slow search as a way of bringing human knowledge into the search process. Involving humans can be done in three ways: (i) crowd sourcing, (ii) friend sourcing, and (iii) self sourcing. The slides of the talk are available here.
The second keynote I attended was by Xiaofang Zhou (University of Queensland) on making sense of spatial trajectories. Xiaofang talked about the growing importance of spatial trajectory data (GPS traces, internet traffic, etc.), computational problems on trajectory mining, as well as the similarities and differences with existing methods (e.g., time-series analysis).
1. BiasWatch: A Lightweight System for Discovering and Tracking Topic-Sensitive Opinion Bias in Social Media
This paper presents methods to find hashtags related to a controversy on Twitter, and to identify the bias scores of users with respect to a controversial topic. It is assumed that a small seed set of hashtags is provided as input. I was very interested in this paper as it is related to our upcoming work on Quantifying Controversy in Social Media, which will appear in WSDM 2016.
2. (i) HDRF: Stream-Based Partitioning for Power-Law Graphs, (ii) Towards Scale-out Capability on Social Graphs
Both of these papers deal with the problem of graph partitioning for processing graphs in distributed frameworks, such as Giraph or GraphX. The objective of graph partitioning is to minimize the communication between the machines that hold different parts of the graph. This problem is relevant in managing social networks: because of the skew of the degree distribution, if we just partition the nodes randomly, a large amount of time will be spent in communication. In fact, we encountered this problem during our experiments on Giraph for large social graphs for our facility location algorithm.
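To make the communication cost concrete, here is a toy sketch (not the method of either paper) that counts how many edges cross partitions when nodes are placed by a simple hash. The function names `hash_partition` and `cut_fraction`, and the star-shaped example graph, are my own illustrative assumptions.

```python
def hash_partition(vertices, k):
    """Assign each integer vertex id to one of k partitions (hypothetical helper)."""
    return {v: v % k for v in vertices}


def cut_fraction(edges, assignment):
    """Fraction of edges whose endpoints live in different partitions."""
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return cut / len(edges)


# A star graph: one high-degree hub (node 0) connected to 8 leaves.
# This mimics, in miniature, the degree skew of a social graph.
star = [(0, leaf) for leaf in range(1, 9)]
vertices = {v for edge in star for v in edge}

assignment = hash_partition(vertices, k=2)
print(cut_fraction(star, assignment))
```

Every cut edge corresponds to a message that has to travel between machines, so a hub whose neighbors are scattered across partitions generates traffic proportional to its degree.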
3. (i) Identifying Top-k Structural Hole Spanners in Large-Scale Social Networks, (ii) Mining Brokers in Dynamic Social Networks
Structural hole spanners are defined to be the network nodes that act as bridges between communities. Both of the above papers address the problem of finding the network nodes whose removal changes the distances between other pairs of nodes. The main challenge here is to design algorithms that can compute these distances efficiently.
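As a rough illustration of the problem (a brute-force baseline, not the algorithm of either paper), the sketch below scores each node by the number of remaining node pairs whose shortest-path distance grows, or becomes infinite, when that node is removed. The names `bfs_distances` and `spanner_scores` and the example graph are assumptions made for this post.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source, removed=None):
    """Hop distances from source, optionally pretending `removed` is deleted."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def spanner_scores(adj):
    """For each node, count the pairs whose distance increases after its removal."""
    inf = float("inf")
    base = {s: bfs_distances(adj, s) for s in adj}
    scores = {}
    for x in adj:
        others = [v for v in adj if v != x]
        after = {s: bfs_distances(adj, s, removed=x) for s in others}
        scores[x] = sum(
            1
            for s, t in combinations(others, 2)
            if after[s].get(t, inf) > base[s].get(t, inf)
        )
    return scores


# Two triangles {0, 1, 2} and {4, 5, 6} bridged by node 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = spanner_scores(adj)
print(max(scores, key=scores.get), scores)
```

This baseline runs one BFS per (removed node, source) pair, which is exactly the cost that becomes prohibitive on large networks.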
4. Learning Entity Types from Query Logs via Graph-Based Modeling
This paper presents a new algorithm, based on label propagation on the query click graph, in order to associate entities with types. For instance, the algorithm is able to infer that ‘New York’ is a ‘place.’
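For readers unfamiliar with label propagation, here is a minimal majority-vote sketch on a tiny click graph. It is a generic textbook variant rather than the paper's algorithm, and the queries, URLs, seed labels, and the function name `propagate_labels` are all made up for illustration.

```python
from collections import Counter


def propagate_labels(adj, seeds, max_rounds=10):
    """Spread seed labels over the graph by repeated neighbor majority vote."""
    labels = dict(seeds)
    for _ in range(max_rounds):
        updated = dict(labels)
        for node in adj:
            if node in seeds:
                continue  # seed labels stay fixed
            votes = Counter(labels[n] for n in adj[node] if n in labels)
            if votes:
                # Most frequent label wins; ties are broken alphabetically.
                updated[node] = min(votes, key=lambda lab: (-votes[lab], lab))
        if updated == labels:
            break
        labels = updated
    return labels


# Hypothetical (query, clicked URL) pairs.
clicks = [
    ("paris", "tripadvisor.com"),
    ("new york", "tripadvisor.com"),
    ("tom hanks", "imdb.com"),
    ("meryl streep", "imdb.com"),
]
adj = {}
for query, url in clicks:
    adj.setdefault(query, set()).add(url)
    adj.setdefault(url, set()).add(query)

seeds = {"paris": "place", "tom hanks": "person"}
labels = propagate_labels(adj, seeds)
print(labels["new york"], labels["meryl streep"])
```

The unlabeled query inherits its type from labeled queries that lead to the same clicked pages, which is the intuition behind using the click graph.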
5. Enterprise Social Link Recommendation
Enterprise social networks (ESNs) are the new big thing these days. Over 85% of the Fortune 500 companies use ESNs. One peculiar aspect of ESNs is the availability of company information, like the organization chart (organization hierarchy). This paper proposes a method to fuse social information with the organization chart for improving social link recommendations.
6. Where you Instagram? Associating Your Instagram Photos with Points of Interest
This paper presents a classifier for annotating images with points of interest (POIs). The algorithm uses images that have already been annotated with points of interest, as well as image and user features.
7. Efficient Sparse Matrix Multiplication on GPU for Large Social Network Analysis
A lot of methods exist for sparse matrix multiplication on CUDA, but this paper presents a version specific to social networks. The proposed method makes use of the skew of the degree distribution.
8. The Role of Citation Context in Predicting Long-Term Citation Profiles: An Experimental Study Based on A Massive Bibliographic Text Dataset
The goal of this work is to increase the quality of predicting the future success of a research paper. The main idea is to use citation context — the number of times a paper is cited within another paper and the number of words used to describe the paper — to extract additional features.
9. Weighted Similarity Estimation in Data Streams
This paper presents algorithms that make use of the Alon-Matias-Szegedy sketch to estimate weighted Jaccard, cosine, and Pearson correlation similarity in a streaming setting. According to the authors, these are the first algorithms to handle the weighted case.
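To give a flavor of the underlying tool, the sketch below is a basic AMS-style ("tug-of-war") sketch used to estimate the cosine similarity of two weighted streams. It is not the paper's algorithm: the class `AMSSketch`, the function `estimate_cosine`, and the use of a BLAKE2 hash as a stand-in for the 4-wise independent sign hashes assumed by the analysis are all my own simplifications.

```python
import hashlib
import math


class AMSSketch:
    """Keeps `width` counters, each a random +/-1 projection of the stream."""

    def __init__(self, width=256):
        self.counters = [0.0] * width

    @staticmethod
    def _sign(j, item):
        # Deterministic pseudo-random sign for (counter j, item).
        # Assumption: a cryptographic hash stands in for a 4-wise independent family.
        digest = hashlib.blake2b(f"{j}:{item}".encode(), digest_size=1).digest()
        return 1 if digest[0] & 1 else -1

    def update(self, item, weight=1.0):
        """Process one (item, weight) arrival from the stream."""
        for j in range(len(self.counters)):
            self.counters[j] += self._sign(j, item) * weight


def estimate_cosine(a, b):
    """Cosine of the sketch vectors, an estimate of the cosine of the streams."""
    dot = sum(x * y for x, y in zip(a.counters, b.counters))
    norm_a = math.sqrt(sum(x * x for x in a.counters))
    norm_b = math.sqrt(sum(y * y for y in b.counters))
    return dot / (norm_a * norm_b)


stream_x = [("apple", 2.0), ("banana", 1.0)]
stream_y = [("banana", 2.0), ("apple", 4.0)]  # same direction, twice the weight

sketch_x, sketch_y = AMSSketch(), AMSSketch()
for item, weight in stream_x:
    sketch_x.update(item, weight)
for item, weight in stream_y:
    sketch_y.update(item, weight)

print(estimate_cosine(sketch_x, sketch_y))
```

Because each counter is a linear function of the item weights, the sketch can be updated one arrival at a time and does not depend on the order of the stream, which is what makes it suitable for the streaming setting.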
10. Top-k Reliable Edge Colors in Uncertain Graphs (Short paper)
This paper addresses a problem in the setting of edge-colored uncertain graphs. Given a source and a destination node in the graph, the goal is to find an edge-color set of size k that maximizes the reliability from the source to the destination.
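The problem statement can be sketched with a brute-force solver for very small graphs. This is only meant to pin down the definitions, not to reflect the paper's algorithm: I assume undirected edges that exist independently with the given probabilities, and the function names and the example graph are hypothetical.

```python
from itertools import combinations, product


def connected(edges, s, t):
    """True if t is reachable from s using the given undirected edges."""
    reach, frontier = {s}, [s]
    while frontier:
        u = frontier.pop()
        for a, b in edges:
            for x, y in ((a, b), (b, a)):
                if x == u and y not in reach:
                    reach.add(y)
                    frontier.append(y)
    return t in reach


def reliability(edges, s, t):
    """Exact probability that s and t are connected; edges are (u, v, prob)."""
    total = 0.0
    # Enumerate every possible world (exponential, so tiny graphs only).
    for world in product([False, True], repeat=len(edges)):
        p = 1.0
        present = []
        for keep, (u, v, prob) in zip(world, edges):
            p *= prob if keep else 1.0 - prob
            if keep:
                present.append((u, v))
        if connected(present, s, t):
            total += p
    return total


def best_color_set(colored_edges, s, t, k):
    """Try every set of k colors and keep the one with the highest reliability."""
    colors = sorted({c for *_, c in colored_edges})

    def score(subset):
        usable = [(u, v, p) for u, v, p, c in colored_edges if c in subset]
        return reliability(usable, s, t)

    return max(combinations(colors, k), key=score)


colored_edges = [
    ("s", "a", 0.9, "red"),
    ("a", "t", 0.9, "red"),
    ("s", "t", 0.5, "blue"),
    ("s", "b", 0.8, "green"),
    ("b", "t", 0.9, "blue"),
]
print(best_color_set(colored_edges, "s", "t", k=1))
print(best_color_set(colored_edges, "s", "t", k=2))
```

Both loops are exponential (in the number of edges and in the number of colors), which hints at why the problem needs dedicated algorithms on graphs of realistic size.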
11. What Is a Network Community? A Novel Quality Function and Detection Algorithms
This paper presents a new measure for quantifying the quality of a community, called communitude. It is defined as “the Z-score of a subset of vertices S with respect to the fraction of the number of edges within the subgraph induced by S”. The authors propose a community-discovery algorithm that uses this measure.
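One way to read the quoted definition is sketched below. I assume a null model in which each edge falls inside S with probability p = (D_S / 2m)^2, where D_S is the total degree of S and m is the number of edges, and I standardize the observed fraction of internal edges by the standard deviation of a single edge indicator. The paper's exact normalization may differ, so treat `z_score_quality` as a hypothetical stand-in rather than the authors' formula.

```python
import math


def z_score_quality(edges, subset):
    """Z-score-style quality of `subset` under the null model described above."""
    m = len(edges)
    subset = set(subset)
    internal = sum(1 for u, v in edges if u in subset and v in subset)
    degree_sum = sum((u in subset) + (v in subset) for u, v in edges)
    p = (degree_sum / (2 * m)) ** 2  # expected fraction of internal edges
    # Undefined when p is 0 or 1 (empty subset or the whole graph).
    return (internal / m - p) / math.sqrt(p * (1 - p))


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

triangle = z_score_quality(edges, {0, 1, 2})
bridge = z_score_quality(edges, {2, 3})
print(triangle, bridge)
```

A dense triangle has more internal edges than its degree would suggest by chance and gets a positive score, while the two bridge endpoints have fewer and get a negative one.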
12. DifRec: A Social-Diffusion-aware Recommender System
The paper presents a new recommender system that takes into account the diffusion patterns in a social network to avoid duplicate recommendations. For example, if an item is recommended to me and I am likely to share it with my friends, it does not make sense to recommend the same item to my friends too.
One interesting observation: I noticed at least four papers that propose or make use of word embeddings (word2vec).
Best student paper: Struggling and Success in Web Search
Best paper: Assessing the Impact of Syntactic and Semantic Structures for Answer Passages Reranking