## GIR’17 and visiting RMIT

Our position paper called “Poisson Factorization Models for Spatiotemporal Retrieval”, joint work with Dirk Ahlers, got accepted at the 11th Workshop on Geographic Information Retrieval (GIR’17). In this work, we discuss some modelling ideas and possibilities for advancing spatiotemporal retrieval using Poisson factorization models, especially in scenarios where we have multiple sources of count or implicit spatiotemporal user data. Unfortunately, I will not be able to attend the workshop (but Dirk will be there), because I am now in Melbourne, Australia, and will stay here for 3 months, participating as visiting graduate student in a project with the IR group at RMIT. In particular, I will be working with Dr Yongli Ren and Prof Mark Sanderson, developing joint probabilistic models for spatiotemporal user data for indoor spaces recommendations (they have a very interesting dataset that I am curious to explore). Hopefully, in the next couple of months, I will continue working on nice probabilistic models for recommender system, but incorporating many new and interesting ideas related to location and time.

## Post-conference: ECML-PKDD 2017

ECML-PKDD 2017 was very pleasant and nice. Skopje was an unexpected surprise. I am happy with each new conference that I attend, always meeting new people doing very good research. The community there was very nice in general!

I presented my paper at Matrix and Tensor Factorization session, and I was particularly happy with that, because even though the application we are working is recommender systems, we are focusing on the methods and proposing new factorization methods and models. Later in the night, we had the poster (poster-ecml2017) session at the Macedonian Opera & Ballet and afterward headed to the wine festival, just outside.

For those interested, my presentation slides here:

## Recommender Systems and Deep Learning: paper links

This semester I will be advising some master students on their final project. At this point, they don’t select a specific topic but should look into a given area to find specific research question and some of them will definitely work on Deep Learning and Recommender Systems. Especially because we (the NTNU-AILab group) had a very nice experience last year where one master student doing work on RNN for session-based recommendation managed to have a work accepted at DLRS 2017. So, I decided to make a small selection of the papers related to this topic, focusing on WSDM, WWW, KDD, CIKM, RecSys, ICLR, DLRS and some other specific conferences in the last three years (2015,2016 and 2017). The result is a list of 45 papers, with many distinct ideas, but also some common threads (Matrix Factorization to CNN or LSTM, Session-based methods using RNN, etc). We will not discuss the different ideas, but I will just post the link here because some people might be interested in that.

https://github.com/zehsilva/recsys-deeplearning-info

## Paper accepted at European Conference on Machine Learning (ECML-PKDD) 2017

We have a paper accepted at ECML-PKDD 2017: “Content-Based Social Recommendation with Poisson Matrix Factorization” (Eliezer de Souza da Silva, Helge Langseth and Heri Ramampiaro). This is our first full paper resulting from our research on Poisson factorization and integration of multiple sources of information in a single recommendation model. If you have interest on the paper please email me and I will be happy to discuss.

Also, I am uploading the supplement of the paper here (you can find it also on my publications page)

Supplementary material for: “Content-Based Social
Recommendation with Poisson Matrix Factorization”

## Hidden Markov Models (part II): forward-backward algorithm for marginal conditional probability of the states

(in the same series HMM (part I): recurrence equations for filtering and prediction)

Consider a Hidden Markov Model (HMM) with hidden states $x_t$ (for $t \in {1, 2, \cdots, T}$), initial probability $p(x_1)$, observed states $y_t$, transition probability $p(x_t|x_{t-1})$ and observation model $p(y_t|x_t)$. This model can be factorized as
$p(x_{1:T},y_{1:T}) = p(y_1|x_1)p(x_1)\prod_{t=2}^{t=T}p(y_t|x_t)p(x_t|x_{t-1})$. We will use the notation $X=x_{1:T}$ to represent the set $X=\{x_1,x_2,\cdots,x_T\}$.
In this post we will present the details of the method to find the smoothing distribution $p(x_t|y_{1:T})$ of a HMM, given a set of observations $y_{1:T}$:
Our starting point is the marginal probability $p(x_t|y_{1:T})$ of $x_t$ given all the observations $y_{1:T}$.

\begin{aligned} p(x_t|y_{1:T}) &= \frac{p(x_t,y_{1:T})}{p(y_{1:T})} \\ &= \frac{p(x_t,y_{1:t},y_{(t+1):T})}{p(y_{1:T})}\\ &= \underbrace{p(y_{(t+1):T}|x_t)}_{\beta_t(x_t)}\underbrace{p(x_t,y_{1:t})}_{\alpha_t(x_t)}\frac{1}{p(y_{1:T})} \\ &= \frac{\alpha_t(x_t) \beta_t(x_t)}{p(y_{1:T})} \end{aligned}

## Sampling from Dirichlet Distribution using Gamma distributed samples

There is an algorithm to generate Dirichlet samples using a sampler for Gamma distribution for any $\alpha > 0$ and $\beta > 0$. We will generate Gamma distributed variables $z_k \sim \text{gamma}(\alpha_k,1)$, for $k \in {1,\cdots,d}$, and do the following variable transformation to get Dirichlet samples $x_k = \frac{z_k}{\sum_k z_k}$. First we should demonstrate that this transformation results in Dirichlet distributed samples.

Consider the following tranformation $(z_1,\cdots,z_d) \leftarrow (x_1,\cdots,x_d,v)$, where $x_k = \frac{z_k}{\sum_k z_k}$ and $v = {\sum_k z_k}$. We can rewrite this transformation as $(x_1,\cdots,x_d,v)=h(z_1,\cdots,z_d)$, where $x_k = \frac{z_k}{v}$ and $v = {\sum_k z_k}$. Also we can imediatly calculate the inverse transformation $(z_1,\cdots,z_d)=h^{-1}(x_1,\cdots,x_d,v)$, with $z_k=v x_k$. From the transformation definition we know that ${\sum_{k=1}^d x_k=1}$, implying that $x_d = 1-\sum_{k=1}^{d-1} x_k$ and $z_d=v(1-\sum_{k=1}^{d-1}x_k)$.

## Probabilistic models for Recommender systems (part I): Probabilistic Matrix Factorization

In Recommender Systems design we are faced with the following problem: given incomplete information about users preference, content information, user-items rating and  contextual information, learn the user preference and suggest new items for users based on features as:

• previous pattern of items preference of the user;
• preference of users with similar rating pattern;
• contextual information: location, time (of the day, week, month, year), social network.

This is usually formulated as a matrix completion problem using Matrix Factorization techniques to offer a optimal solution. Is this case latent features for users and item are inferred from the observed/rated items for each user, and from this latent features the missing entries are estimated. One major modelling tool for this problem is probabilistic modelling, and there are many proposals in the literature of different Probabillistic Matrix Factorization approaches. We will briefly discuss some of this models, starting with the seminal paper: Probabilistic Matrix Factorization (PMF) – [Salakhutdinov and Mnih, 2008, NIPS]. Continue reading “Probabilistic models for Recommender systems (part I): Probabilistic Matrix Factorization”

## Hidden Markov Models (part I): recurrence equations for filtering and prediction

This semester I will be attending the doctoral course MA8702 – Advanced Modern Statistical Methods  with the excellent Prof. Håvard Rue. It will be course about statistical models defined over sparse structures (chains and graphs). We will start with Hidden Markov Chains and after go to Gaussian Markov Random Fields, Latent Gaussian Models and approximate inference with Integrated Nested Laplace Approximation (INLA). All this models are interesting for my research objective of developing sound latent models for recommender systems and I am really happy of taking this course with this great teacher and researcher. So, I will try to cover some of the material of the course, starting from what we saw in the first lecture: exact recurrence for Hidden Markov Chains and dynamic programming. In other words, general equations for predictions, filtering, smoothing, sampling, mode and marginal likelihood calculation of state-space model with latent variables. We will start by introduction the general model and specifying how to obtain the prediction and filtering equation.

• Markovian property: $\pi(x_1,x_2,\cdots,x_T)=\pi(x_{1:T})=\prod_{t=1}^{t=T}\pi(x_t|x_{t-1})$, with $\pi(x_1|x_0)=\pi(x_1)$
• $y_t$ are observed and $x_t$ are latent, so $\pi(y_t|x_t)$ is always known.
• If we know $x_{t-1}$ than no other variable will add any information to the conditional distribution of $x_t$.

## Airplanes and Neural Nets

The November 15th is the Proclamation of the Republic Day, a national holiday, and I was really in need of a day of rest. As I couldn’t be resting the whole day I spent some time reading this great book by Dr. Vladimir Vapnik, The Nature of Statistical Learning Theory. I’m not going to write a full review of the book, but I’d like to make some observation regarding some of his opinion that really got me.

I really enjoyed the historical background in the research of the Learning Problem; as a newcomer and apprentice on the field I find it useful and wise to be well-informed on what has been going on. As Cosma Shalizi points out in this review, Vapnik’s view on the history of research in the field is very idiosyncratic, but nevertheless interesting. He expresses a great level of criticism towards the motto: “Complex theories do not work; simple algorithms do” and places himself in the side of the Theory guys.

At some point he starts describing the “second birth” of research on Neural Nets and there lies a footnote that helped me remember why I never was too excited about it. Let’s start with the quote, which is in the context of joint research program declared between Artificial Intelligence scientist and physiologists:

Of course it is very interesting to know how humans can learn. However, this is not necessarily the best for creating an artificial learning machine. It has been noted that the study of birds flying was not very useful for constructing the airplane.

I claim that this affirmation does not intend to undermine this whole research program on Bio-inspired Learning Algorithms, but indeed to elucidate the actual role of this approach in the general setting of Learning Problem. That is to say, airplanes are not the same as birds, the material is not the same, the dynamics is not the same, the way they use energy is not the same, probably we wouldn’t be able to create a flying machine just by copycatting flying living beings. Nevertheless could be that some good ideas that later was incorporated into the flying machine was inspired by birds, as long as we understood the nature of a flying machine essentially as different from birds.

The reason I was never too excited about neural nets is basically because of my second undergraduate programming course. Prof. Ayrton Cristo made a little negative comment on the fact that when neural nets function correctly, generalize the training set, and so on, usually we don’t understand why it works (or how). I don’t endorse his opinion, but I should say that subconsciously this always has made me scrupulous on Neural Nets and some other heuristics. (prof. Ayrton Cristo is what you could call AI guy with a strong foot in NLP, Logic and Functional Programming)

My personal philosophy is more tendentious to the “theoretical proven” side than to “as long as it works”. I know that a lot of good ideas can emerge from heuristics and only a posteriori be well explained by theorists, nevertheless, I’m much more comfortable when I have ground theories to back me up.