Our position paper called “Poisson Factorization Models for Spatiotemporal Retrieval”, joint work with Dirk Ahlers, got accepted at the 11th Workshop on Geographic Information Retrieval (GIR’17). In this work, we discuss some modelling ideas and possibilities for advancing spatiotemporal retrieval using Poisson factorization models, especially in scenarios where we have multiple sources of count or implicit spatiotemporal user data. Unfortunately, I will not be able to attend the workshop (but Dirk will be there), because I am now in Melbourne, Australia, and will stay here for 3 months, participating as visiting graduate student in a project with the IR group at RMIT. In particular, I will be working with Dr Yongli Ren and Prof Mark Sanderson, developing joint probabilistic models for spatiotemporal user data for indoor spaces recommendations (they have a very interesting dataset that I am curious to explore). Hopefully, in the next couple of months, I will continue working on nice probabilistic models for recommender system, but incorporating many new and interesting ideas related to location and time.
We have a paper accepted at ECML-PKDD 2017: “Content-Based Social Recommendation with Poisson Matrix Factorization” (Eliezer de Souza da Silva, Helge Langseth and Heri Ramampiaro). This is our first full paper resulting from our research on Poisson factorization and integration of multiple sources of information in a single recommendation model. If you have interest on the paper please email me and I will be happy to discuss.
Also, I am uploading the supplement of the paper here (you can find it also on my publications page)
These last weeks has been full of work. I am in the critical final phase of my Master degree studies, trying to finish and hoping to submit my dissertation to the committee as soon as possible. Besides, I started in June a job as a software analyst at Brazilian Institute of Geography and Statistics (IBGE), as a public servant. Keeping up the good work in both jobs has been challenging; but there has been also really good times. It is the World Cup (\ironic\ yay)!
Yesterday, for example, I found out that our work on the parallelization of our LSH aproach to generic metric similarity search was accepted at the 7th International Conference on Similarity Search and Applications – SISAP 2014! What a great news. In this work, a colaboration with specialists in distributed and parallel system Dr. George Teodoro and Thiago Teixeira, we insisted in the direct “simplistic” approach (with rather good results) of VoronoiLSH (basically partitioning the space using a set points, random or not, as centroids of the partitions and attributing codes for the partitions) to design a parallel metric nearest neighbor search method using Dataflow programming (breaking the indexing and searching algorithm in five computing stages). It is nice that the approach exploits task, pipeline, replicated and intra-stage parallelism. We evaluated the proposed method in small metric datasets and in a big Euclidean dataset for ANN (1 Billion, 128 dimensional points). More details should be posted as soon. So, here is the abstract:
Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data
Eliezer Silva, Thiago Teixeira, George Teodoro and Eduardo Valle
Under the assumption of uniform access cost to the data, and for the handful of dissimilarities for which locality-sensitive families are available, Locality-Sensitive Hashing (LSH) is known to be one of the most competitive techniques available for similarity search. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributed-memory architectures, and it works for very general dissimilarities (in particular, it works for all metric dissimilarities). Each hash table of Voronoi LSH works by selecting a sample of the dataset to be used as seeds of a Voronoi diagram. The Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is very challenging because it lacks referential locality in its access to the data: if care is not taken, excessive message-passing ruins the index performance. Therefore, another important contribution of this work is the parallel design needed to allow the scalability of the index, which we evaluate in a dataset of a thousand million multimedia features.