Our position paper called “Poisson Factorization Models for Spatiotemporal Retrieval”, joint work with Dirk Ahlers, got accepted at the 11th Workshop on Geographic Information Retrieval (GIR’17). In this work, we discuss some modelling ideas and possibilities for advancing spatiotemporal retrieval using Poisson factorization models, especially in scenarios where we have multiple sources of count or implicit spatiotemporal user data. Unfortunately, I will not be able to attend the workshop (but Dirk will be there), because I am now in Melbourne, Australia, and will stay here for 3 months, participating as visiting graduate student in a project with the IR group at RMIT. In particular, I will be working with Dr Yongli Ren and Prof Mark Sanderson, developing joint probabilistic models for spatiotemporal user data for indoor spaces recommendations (they have a very interesting dataset that I am curious to explore). Hopefully, in the next couple of months, I will continue working on nice probabilistic models for recommender system, but incorporating many new and interesting ideas related to location and time.
This semester I will be advising some master students on their final project. At this point, they don’t select a specific topic but should look into a given area to find specific research question and some of them will definitely work on Deep Learning and Recommender Systems. Especially because we (the NTNU-AILab group) had a very nice experience last year where one master student doing work on RNN for session-based recommendation managed to have a work accepted at DLRS 2017. So, I decided to make a small selection of the papers related to this topic, focusing on WSDM, WWW, KDD, CIKM, RecSys, ICLR, DLRS and some other specific conferences in the last three years (2015,2016 and 2017). The result is a list of 45 papers, with many distinct ideas, but also some common threads (Matrix Factorization to CNN or LSTM, Session-based methods using RNN, etc). We will not discuss the different ideas, but I will just post the link here because some people might be interested in that.
These last weeks has been full of work. I am in the critical final phase of my Master degree studies, trying to finish and hoping to submit my dissertation to the committee as soon as possible. Besides, I started in June a job as a software analyst at Brazilian Institute of Geography and Statistics (IBGE), as a public servant. Keeping up the good work in both jobs has been challenging; but there has been also really good times. It is the World Cup (\ironic\ yay)!
Yesterday, for example, I found out that our work on the parallelization of our LSH aproach to generic metric similarity search was accepted at the 7th International Conference on Similarity Search and Applications – SISAP 2014! What a great news. In this work, a colaboration with specialists in distributed and parallel system Dr. George Teodoro and Thiago Teixeira, we insisted in the direct “simplistic” approach (with rather good results) of VoronoiLSH (basically partitioning the space using a set points, random or not, as centroids of the partitions and attributing codes for the partitions) to design a parallel metric nearest neighbor search method using Dataflow programming (breaking the indexing and searching algorithm in five computing stages). It is nice that the approach exploits task, pipeline, replicated and intra-stage parallelism. We evaluated the proposed method in small metric datasets and in a big Euclidean dataset for ANN (1 Billion, 128 dimensional points). More details should be posted as soon. So, here is the abstract:
Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data
Eliezer Silva, Thiago Teixeira, George Teodoro and Eduardo Valle
Under the assumption of uniform access cost to the data, and for the handful of dissimilarities for which locality-sensitive families are available, Locality-Sensitive Hashing (LSH) is known to be one of the most competitive techniques available for similarity search. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributed-memory architectures, and it works for very general dissimilarities (in particular, it works for all metric dissimilarities). Each hash table of Voronoi LSH works by selecting a sample of the dataset to be used as seeds of a Voronoi diagram. The Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is very challenging because it lacks referential locality in its access to the data: if care is not taken, excessive message-passing ruins the index performance. Therefore, another important contribution of this work is the parallel design needed to allow the scalability of the index, which we evaluate in a dataset of a thousand million multimedia features.
Hot news folks! In my qualification exam I proposed two ideas for extending Locality-Sensitive Hashing to general metric spaces. Readily afterwards I wrote a short paper presenting one of the ideas and early results… and guess what? It has been accepted at the SBBD2013 (Brazilian Symposium on Databases).
K-medoids LSH: a new locality sensitive hashing in general metric space
Eliezer S. Silva, Eduardo Valle
Abstract. The increasing availability of multimedia content poses a challenge for information retrieval researchers. Users want not only have access to multimedia documents, but also make sense of them – the ability of finding specific content in extremely large collections of textual and non-textual documents is paramount. At such large scales, Multimedia Information Retrieval systems must rely on the ability to perform search by similarity efficiently. However, Multimedia Documents are often represented by high-dimensional feature vectors, or by other complex representations in metric spaces. Providing efficient similarity search for that kind of data is extremely challenging. In this article, we explore one of the most cited family of solutions for similarity search, the Locality-Sensitive Hashing (LSH), which is based upon the creation of hashing functions which assign, with higher probability, the same key for data that are similar. LSH is available only for a handful distance functions, but, where available, it has been found to be extremely efficient for architectures with uniform access cost to the data. Most of existing LSH functions are restricted to vector spaces. We propose a novel LSH method for generic metric space based on K-medoids clustering. We present comparison with well established LSH methods in vector spaces and with recent competing new methods for metric spaces. Our early results show promise, but also demonstrate how challenging is to work around those difficulties.