Scalable facility location for massive graphs on Pregel-like systems

Our paper on a distributed algorithm for the facility-location problem was accepted at the CIKM 2015 conference and will be presented in Melbourne next week.

What is the facility-location problem? Facility location is a classic problem, first studied in the field of operations research. We are given a set of ‘facilities’ and a set of ‘locations’, and the goal is to find a mapping of the locations to the facilities that minimizes a given objective function. The objective function models the operating cost of serving the locations with a set of selected (open) facilities, and it has two terms: a cost for opening each facility and a cost for serving each location from an open facility.
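As a rough illustration of the two cost terms, here is a minimal Python sketch of the standard uncapacitated objective, assuming each location is served by its cheapest open facility. The function names and toy costs are hypothetical, and the exhaustive search is there only to make the trade-off visible: it is exponential in the number of facilities and is not the distributed, Pregel-style algorithm from the paper.

```python
from itertools import combinations


def total_cost(open_facilities, opening_cost, service_cost, locations):
    """Facility-location objective for a given set of open facilities.

    opening_cost[f]      -- cost of opening facility f
    service_cost[f][loc] -- cost of serving location loc from facility f
    Each location is assigned to its cheapest open facility (assumption of
    the standard, uncapacitated formulation).
    """
    opening = sum(opening_cost[f] for f in open_facilities)
    serving = sum(
        min(service_cost[f][loc] for f in open_facilities) for loc in locations
    )
    return opening + serving


def brute_force(opening_cost, service_cost, locations):
    """Hypothetical helper: try every non-empty subset of facilities.

    Only feasible for toy instances; returns (best_cost, best_subset).
    """
    facilities = sorted(opening_cost)
    best_cost, best_subset = float("inf"), ()
    for k in range(1, len(facilities) + 1):
        for subset in combinations(facilities, k):
            cost = total_cost(subset, opening_cost, service_cost, locations)
            if cost < best_cost:
                best_cost, best_subset = cost, subset
    return best_cost, best_subset


# Toy instance: two candidate facilities, three locations.
opening_cost = {"A": 3, "B": 4}
service_cost = {"A": {"x": 1, "y": 2, "z": 6}, "B": {"x": 5, "y": 2, "z": 1}}
print(brute_force(opening_cost, service_cost, ["x", "y", "z"]))  # (11, ('A', 'B'))
```

In this toy instance, opening either facility alone costs 12, while opening both costs 11: the lower serving cost outweighs the extra opening cost.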

Who Let the DAGs Out?

Mining for Meaning

We did, in our paper titled “Beyond rankings: comparing directed acyclic graphs”, which I’ll be presenting at the ECML PKDD conference in Portugal next month. Besides being the first project of my PhD, it differs fundamentally from the other research projects I’ve been involved with in another way.

Typically, when I undertake a research project, I start with a concrete question, such as predicting the next location a person will visit, and then look for different ways to answer it. In other words, I begin with a nail and look for a suitable hammer. This time, however, we started by developing a cool new hammer with some neat theoretical properties before we had any idea whether a suitable nail even existed.

Apartment prices in Helsinki relate to accessibility by public transport

This post is shared with the I.Ž. research blog.

Traditionally, apartment prices are considered to depend on the apartment’s characteristics and its location. We hypothesised that the accessibility of a neighbourhood may be even more important than its location. To test this, we ran a pilot study in the Helsinki region.

First, we define static and dynamic points of interest in the city. Static points of interest are meant to capture community centers; we find them by locating H&M stores in the Helsinki region. Dynamic points of interest are meant to capture where people go at different times of day; we find these centers by clustering Foursquare check-ins.
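The post does not say which clustering method was used. As a minimal sketch only, assuming plain k-means on raw (latitude, longitude) pairs and three hypothetical time-of-day buckets, dynamic points of interest could be computed as below. The function names, bucket boundaries, and coordinates are illustrative assumptions, not details of the study.

```python
import math
from collections import defaultdict


def kmeans(points, k, iterations=20):
    """Plain k-means on (lat, lon) pairs.

    Euclidean distance on raw coordinates is a rough approximation that is
    only reasonable at city scale (assumption).
    """
    centers = list(points[:k])  # deterministic initialisation: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers


def dynamic_pois(checkins, k):
    """Hypothetical helper: cluster (hour, lat, lon) check-ins per time bucket."""
    buckets = defaultdict(list)
    for hour, lat, lon in checkins:
        if 6 <= hour < 12:
            bucket = "morning"
        elif 12 <= hour < 18:
            bucket = "afternoon"
        else:
            bucket = "evening"  # 18:00 to 05:59
        buckets[bucket].append((lat, lon))
    return {b: kmeans(pts, min(k, len(pts))) for b, pts in buckets.items()}


# Illustrative check-ins: two morning hot spots and one evening check-in.
checkins = [
    (9, 60.170, 24.940), (10, 60.171, 24.941),
    (9, 60.200, 24.660), (11, 60.201, 24.661),
    (20, 60.18, 24.80),
]
print(dynamic_pois(checkins, 2))
```

Clustering each time bucket separately is what makes these points of interest "dynamic": the same city can have different centers in the morning and in the evening.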


Crowdsourcing social circle discovery on Twitter

In a short paper accepted to the 9th International Conference on Web and Social Media (ICWSM 2015), we explored how the lists already created on Twitter can be used to organize content for new users. A list is a way of organizing contacts and content on Twitter; for example, a user might create a list “Data Science” containing the accounts of prominent data scientists such as Andrew Ng and Hilary Mason. The functionality is available to all users: anyone can create a list by choosing a title and selecting members from the pool of all Twitter users. Once created, a list can be used to view only the tweets of its members. Rather than making every user start from scratch when organizing their friends, we wanted to find a way to use the lists other users have already created to recommend groupings automatically.
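The post does not describe the method from the paper. Purely to illustrate the idea of reusing existing lists, here is a minimal Python sketch that assumes lists can be matched to a user’s followees by set overlap (Jaccard similarity). The function names, thresholds, and account handles are hypothetical, and this simple heuristic is not the approach from the ICWSM paper.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of account names."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_groupings(followees, existing_lists, top_n=3, min_overlap=2):
    """Hypothetical helper: rank other users' lists against a user's followees.

    followees      -- set of accounts the user follows
    existing_lists -- dict mapping a list title to its set of member accounts
    Returns (title, followed members) pairs, best match first; ties are
    broken alphabetically by title.
    """
    scored = []
    for title, members in existing_lists.items():
        shared = followees & members
        if len(shared) >= min_overlap:  # ignore lists with too little overlap
            scored.append((jaccard(followees, members), title, shared))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(title, shared) for _, title, shared in scored[:top_n]]


# Illustrative data with made-up handles.
followees = {"andrew_ng", "hilary_mason", "nasa", "esa", "friend1"}
existing_lists = {
    "Data Science": {"andrew_ng", "hilary_mason", "kdnuggets"},
    "Space": {"nasa", "esa", "spacex"},
    "Cooking": {"chef_a"},
}
for title, members in suggest_groupings(followees, existing_lists):
    print(title, sorted(members))
```

Each suggested list becomes a candidate group for the user, pre-filled with the accounts they already follow.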

Frequently asked questions about malware

In the post-Snowden era, computer security and privacy are a growing concern for Internet users. At the same time, the Internet of Things (IoT) is emerging, with more and more devices becoming interconnected. Still, most users know little about how to protect themselves online.

Before returning to grad school, I had the privilege of working for a few years in the labs of F-Secure, one of the top three data-security companies fighting malware (malicious software). Collaborating with some of the world’s top experts in the field was certainly very exciting. In this post, I attempt to answer some common, basic questions about computer malware. The list of questions is by no means exhaustive; it only aims to get a few basic and necessary facts across.

How does Shakespeare compare against modern rap artists?

A couple of months ago, Eric Malmi wrote about his work on Raplyzer, a method for analyzing Finnish rap lyrics. Using a speech synthesizer, Eric has now extended the method to English rap lyrics. With the new version of the analyzer, he ranked 94 rap artists by their rhyme factor and even threw Shakespeare into the mix. He describes the results in a new blog post.


Additionally, if you are looking for more action, you may want to battle rap against BattleBot.