## Preprint of the Temporal Hierarchical Recurrent Neural Network (THRNN) paper (WSDM 2019)

The arXiv preprint of our paper introducing a joint Point Process and Hierarchical RNN model for item and time prediction is now available.

Time is of the Essence: a Joint Hierarchical RNN and Point Process Model for Time and Item Predictions

In recent years session-based recommendation has emerged as an increasingly applicable type of recommendation. As sessions consist of sequences of events, this type of recommendation is a natural fit for Recurrent Neural Networks (RNNs). Several additions have been proposed for extending such models in order to handle specific problems or data. Two such extensions are 1.) modeling of inter-session relations for catching long term dependencies over user sessions, and 2.) modeling temporal aspects of user-item interactions. The former allows the session-based recommendation to utilize extended session history and inter-session information when providing new recommendations. The latter has been used to both provide state-of-the-art predictions for when the user will return to the service and also for improving recommendations. In this work we combine these two extensions in a joint model for the tasks of recommendation and return-time prediction. The model consists of a Hierarchical RNN for the inter-session and intra-session items recommendation extended with a Point Process model for the time-gaps between the sessions. The experimental results indicate that the proposed model improves recommendations significantly on two datasets over a strong baseline, while simultaneously improving return-time predictions over a baseline return-time prediction model.

https://arxiv.org/pdf/1812.01276.pdf

## Paper accepted at WSDM 2019

Our paper «Time is of the essence: A joint Hierarchical RNN and Point Process model for time and item predictions» has been accepted at the 12th ACM International Conference on Web Search and Data Mining (WSDM). It is collaborative work with Bjørnar Vassøy, Massimiliano Ruocco and Erlend Aune. WSDM is one of the top conferences in the domain of data mining, information retrieval and machine learning on the Web. This year WSDM had 511 submissions with an acceptance rate of 16%. We will soon provide a link to the preprint and source code.

In this paper, we propose a joint model with a shared latent representation for a Point Process model (for time prediction) and a Hierarchical Recurrent Neural Network (HRNN). This lets us model a multi-session recommendation problem together with return-time prediction.

This work was developed as part of the Norwegian Open AI Lab in cooperation with Telenor Research.

Looking forward to visiting Melbourne again in the summer!

## GIR’17 and visiting RMIT

Our position paper “Poisson Factorization Models for Spatiotemporal Retrieval”, joint work with Dirk Ahlers, was accepted at the 11th Workshop on Geographic Information Retrieval (GIR’17). In this work, we discuss some modelling ideas and possibilities for advancing spatiotemporal retrieval using Poisson factorization models, especially in scenarios where we have multiple sources of count or implicit spatiotemporal user data. Unfortunately, I will not be able to attend the workshop (but Dirk will be there), because I am now in Melbourne, Australia, and will stay here for 3 months, participating as a visiting graduate student in a project with the IR group at RMIT. In particular, I will be working with Dr Yongli Ren and Prof Mark Sanderson, developing joint probabilistic models of spatiotemporal user data for indoor-space recommendations (they have a very interesting dataset that I am curious to explore). Hopefully, in the next couple of months, I will continue working on nice probabilistic models for recommender systems, while incorporating many new and interesting ideas related to location and time.

## Post-conference: ECML-PKDD 2017

ECML-PKDD 2017 was very pleasant, and Skopje was a nice surprise. I am happy with each new conference that I attend, always meeting new people doing very good research. The community there was very nice in general!

I presented my paper at the Matrix and Tensor Factorization session, and I was particularly happy with that because, even though the application we are working on is recommender systems, we focus on the methods, proposing new factorization methods and models. Later that night we had the poster session at the Macedonian Opera & Ballet and afterward headed to the wine festival, just outside.

For those interested, my presentation slides here:

## Recommender Systems and Deep Learning: paper links

This semester I will be advising some master’s students on their final projects. At this point they don’t select a specific topic but should look into a given area to find a specific research question, and some of them will definitely work on Deep Learning and Recommender Systems. This is especially because we (the NTNU-AILab group) had a very nice experience last year, when one master’s student working on RNNs for session-based recommendation had a paper accepted at DLRS 2017. So, I decided to make a small selection of papers related to this topic, focusing on WSDM, WWW, KDD, CIKM, RecSys, ICLR, DLRS and some other specific conferences in the last three years (2015, 2016 and 2017). The result is a list of 45 papers, with many distinct ideas, but also some common threads (Matrix Factorization to CNN or LSTM, session-based methods using RNNs, etc.). I will not discuss the different ideas here, but I am posting the link because some people might be interested.

https://github.com/zehsilva/recsys-deeplearning-info

## Lisbon Machine Learning Summer School (LxMLS) 2017

Last year I had the opportunity to attend this great summer school in the beautiful and lovely city of Lisbon. It was a great week together with a lot of interesting and intelligent people, all of them interested in the amazing and exciting area of machine learning and NLP. I liked it so much last year that I decided to come back this year to volunteer as an assistant in the summer school. Today was day -1, when we organized some of the registration stuff, welcomed some students and had some beers. Looks like it will be, again, a great time here in Lisbon.

http://lxmls.it.pt/2017/

## WSDM Doctoral Consortium 2017

In February I visited Cambridge to attend the WSDM Doctoral Consortium. It took place during the first day of the conference, in parallel with some tutorials. It was a great time: we had excellent discussions about our projects with senior researchers and fellow Ph.D. candidates. Here is a photo for posterity.

And the program: http://www.wsdm-conference.org/2017/doctoral-consortium/

## A lower bound for expected value of log-sum

Lately, I have been working with Poisson Matrix Factorization models and at some point I needed to work out a lower bound for $\text{E}_q[\log \sum_k X_k]$. After seeing some people use this lower bound without a good explanation, I decided to write this blog post. It is also included as an appendix to my ECML-PKDD 2017 paper about a Poisson factorization model for recommendation.
The function $\log(.)$ is a concave function, which means that: $\log(p_1 x_1+p_2 x_2) \geq p_1\log x_1+p_2 \log x_2, \forall p_1,p_2:p_1+p_2=1, p_1,p_2 \geq 0$
By induction this property can be generalized to any convex combination of $x_k$ ($\sum_k p_k x_k$ with $\sum_k p_k=1$ and $p_k \geq 0$ ):

$\log \sum_k p_k x_k \geq \sum_k p_k\log x_k$

Now, for random variables $X_k$, we can create a similar convex combination by multiplying and dividing each $X_k$ by $p_k$, and then apply the inequality above inside the expectation, together with linearity of expectation:
$\text{E}_q[\log \sum_k X_k] = \text{E}_q[\log \sum_k \frac{p_k X_k}{p_k}]$
$\log \sum_k p_k\frac{X_k}{p_k} \geq \sum_k p_k\log \frac{X_k}{p_k}$
$\Rightarrow\text{E}_q [\log \sum_k p_k\frac{X_k}{p_k}] \geq \sum_k p_k \text{E}_q[\log \frac{X_k}{p_k}]$
$\Rightarrow \text{E}_q [\log \sum_k X_k ] \geq \sum_k \left(p_k \text{E}_q[\log X_k]- p_k\log p_k\right)$
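
As a quick sanity check of the inequality above, here is a minimal sketch that evaluates both sides exactly for a small discrete distribution. The toy distribution `q`, the helper names and the candidate weights `p` are all made up for illustration; none of them come from the paper.

```python
import math

# Hypothetical toy distribution q over joint outcomes (X_1, X_2, X_3).
# Each entry is (probability, values); all values must be positive.
q = [
    (0.2, (1.0, 2.0, 0.5)),
    (0.5, (3.0, 0.1, 1.5)),
    (0.3, (0.7, 4.0, 2.0)),
]

def expected_log_sum(q):
    """Exact E_q[log sum_k X_k] for a finite distribution."""
    return sum(w * math.log(sum(xs)) for w, xs in q)

def expected_logs(q):
    """Exact E_q[log X_k] for each component k."""
    K = len(q[0][1])
    return [sum(w * math.log(xs[k]) for w, xs in q) for k in range(K)]

def jensen_bound(q, p):
    """sum_k p_k * (E_q[log X_k] - log p_k) for weights p on the simplex."""
    return sum(pk * (ek - math.log(pk)) for pk, ek in zip(p, expected_logs(q)))

lhs = expected_log_sum(q)
for p in [(1 / 3, 1 / 3, 1 / 3), (0.6, 0.3, 0.1), (0.05, 0.9, 0.05)]:
    # Every choice of p should give a value no larger than lhs.
    print(p, jensen_bound(q, p), "<=", lhs)
```

Any strictly positive `p` that sums to one gives a valid bound; how close it gets to the left-hand side depends on the choice of `p`.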

If we want a tight lower bound we should use Lagrange multipliers to choose the set of $p_k$ that maximize the lower-bound given that they should sum to 1.

$L(p_1,\ldots,p_K) = \left(\sum_k p_k \text{E}_q[\log X_k]- p_k\log p_k\right)+\lambda \left(1-\sum_k p_k\right)$
$\frac{\partial L}{\partial p_k} =\text{E}_q[\log X_k]-\log p_k-1-\lambda = 0$
$\frac{\partial L}{\partial \lambda} =1-\sum_k p_k = 0$
$\Rightarrow \sum_k p_k = 1$
$\Rightarrow\text{E}_q[\log X_k]=\log p_k+1+\lambda$
$\Rightarrow \exp\text{E}_q[\log X_k]=p_k \exp(1+\lambda)$
$\Rightarrow \sum_k \exp\text{E}_q[\log X_k]=\exp(1+\lambda)\underbrace{\sum_k p_k}_{=1}$
$\Rightarrow p_k=\frac{\exp \{\text{E}_q[\log X_k]\}}{\sum_j \exp \{\text{E}_q[\log X_j]\}}$

The final formula for $p_k$ is exactly the one we find for the parameters of the Multinomial distribution of the auxiliary variables in a Poisson model whose rate parameter is a sum of Gamma-distributed latent variables. Using this optimal $p_k$, we can also write the tightest bound of this family without the auxiliary variables.

$\text{E}_q [\log \sum_k X_k ] \geq \sum_k \frac{\exp \{\text{E}_q[\log X_k]\}}{\sum_j \exp \{\text{E}_q[\log X_j]\}}\text{E}_q[\log X_k]- \frac{\exp \{\text{E}_q[\log X_k]\}}{\sum_j \exp \{\text{E}_q[\log X_j]\}}\log \frac{\exp \{\text{E}_q[\log X_k]\}}{\sum_j \exp \{\text{E}_q[\log X_j]\}}$
$= \sum_k \frac{\exp \{\text{E}_q[\log X_k]\}}{\sum_j \exp \{\text{E}_q[\log X_j]\}} \log \sum_j \exp \{\text{E}_q[\log X_j]\}$
$= \log \sum_j \exp \{\text{E}_q[\log X_j]\} \underbrace{ \sum_k \frac{\exp \{\text{E}_q[\log X_k]\}}{\sum_j \exp \{\text{E}_q[\log X_j]\}} }_{=1}$
This results in:
$\text{E}_q [\log \sum_k X_k ] \geq \log \sum_k \exp \{\text{E}_q[\log X_k]\}$
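
The closed form can be checked numerically. The sketch below assumes we already have the values $\text{E}_q[\log X_k]$ (the numbers in `elog` are made up), computes the optimal weights as a softmax, and compares the resulting bound with the log-sum-exp expression. The function names are illustrative only.

```python
import math

def optimal_weights(elog):
    """p_k proportional to exp(E_q[log X_k]), computed stably."""
    m = max(elog)
    w = [math.exp(e - m) for e in elog]
    z = sum(w)
    return [x / z for x in w]

def bound(elog, p):
    """sum_k p_k * (E_q[log X_k] - log p_k)."""
    return sum(pk * (ek - math.log(pk)) for pk, ek in zip(p, elog))

def log_sum_exp(elog):
    """log sum_k exp(E_q[log X_k]), computed stably."""
    m = max(elog)
    return m + math.log(sum(math.exp(e - m) for e in elog))

elog = [-0.3, 0.8, 0.1]  # hypothetical values of E_q[log X_k]
p_star = optimal_weights(elog)

print("optimal p:", p_star)
print("bound at optimal p:", bound(elog, p_star))
print("log-sum-exp:", log_sum_exp(elog))
print("bound at uniform p:", bound(elog, [1 / 3] * 3))
```

The bound at `p_star` should match the log-sum-exp value up to floating-point error, and any other weights on the simplex should give a smaller or equal value.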

## Paper accepted at European Conference on Machine Learning (ECML-PKDD) 2017

We have a paper accepted at ECML-PKDD 2017: “Content-Based Social Recommendation with Poisson Matrix Factorization” (Eliezer de Souza da Silva, Helge Langseth and Heri Ramampiaro). This is our first full paper resulting from our research on Poisson factorization and the integration of multiple sources of information in a single recommendation model. If you are interested in the paper, please email me and I will be happy to discuss it.

Also, I am uploading the supplement of the paper here (you can find it also on my publications page)

Supplementary material for: “Content-Based Social Recommendation with Poisson Matrix Factorization”

## Hidden Markov Models (part II): forward-backward algorithm for marginal conditional probability of the states

(in the same series HMM (part I): recurrence equations for filtering and prediction)

Consider a Hidden Markov Model (HMM) with hidden states $x_t$ (for $t \in \{1, 2, \cdots, T\}$), initial probability $p(x_1)$, observed states $y_t$, transition probability $p(x_t|x_{t-1})$ and observation model $p(y_t|x_t)$. This model can be factorized as
$p(x_{1:T},y_{1:T}) = p(y_1|x_1)p(x_1)\prod_{t=2}^{t=T}p(y_t|x_t)p(x_t|x_{t-1})$. We will use the notation $X=x_{1:T}$ to represent the set $X=\{x_1,x_2,\cdots,x_T\}$.
In this post we will present the details of the method to find the smoothing distribution $p(x_t|y_{1:T})$ of an HMM, that is, the marginal probability of $x_t$ given all the observations $y_{1:T}$.
Our starting point is the definition of this conditional probability:

$\begin{aligned} p(x_t|y_{1:T}) &= \frac{p(x_t,y_{1:T})}{p(y_{1:T})} \\ &= \frac{p(x_t,y_{1:t},y_{(t+1):T})}{p(y_{1:T})}\\ &= \frac{p(y_{(t+1):T}|x_t,y_{1:t})\,p(x_t,y_{1:t})}{p(y_{1:T})} \\ &= \underbrace{p(y_{(t+1):T}|x_t)}_{\beta_t(x_t)}\underbrace{p(x_t,y_{1:t})}_{\alpha_t(x_t)}\frac{1}{p(y_{1:T})} \\ &= \frac{\alpha_t(x_t) \beta_t(x_t)}{p(y_{1:T})} \end{aligned}$
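
To make the decomposition concrete, here is a minimal sketch for a discrete HMM. It uses the standard forward and backward recursions for $\alpha_t$ and $\beta_t$, which are not derived in the passage above. It is unnormalized, so it is only suitable for short sequences, and all parameter values and function names are made up for illustration. A brute-force enumeration over all state sequences is included as a cross-check.

```python
from itertools import product

def forward_backward(pi, A, B, ys):
    """Smoothing marginals p(x_t | y_{1:T}) for a discrete HMM.

    pi[i]   = p(x_1 = i)
    A[i][j] = p(x_t = j | x_{t-1} = i)
    B[i][y] = p(y_t = y | x_t = i)
    """
    S, T = len(pi), len(ys)
    # alpha[t][i] = p(x_t = i, y_{1:t})
    alpha = [[0.0] * S for _ in range(T)]
    for i in range(S):
        alpha[0][i] = pi[i] * B[i][ys[0]]
    for t in range(1, T):
        for j in range(S):
            alpha[t][j] = B[j][ys[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(S))
    # beta[t][i] = p(y_{t+1:T} | x_t = i); at the last step it is 1
    beta = [[1.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(S):
            beta[t][i] = sum(A[i][j] * B[j][ys[t + 1]] * beta[t + 1][j] for j in range(S))
    evidence = sum(alpha[T - 1])  # p(y_{1:T})
    return [[alpha[t][i] * beta[t][i] / evidence for i in range(S)] for t in range(T)]

def brute_force_marginals(pi, A, B, ys):
    """Same marginals by summing the joint over all state sequences."""
    S, T = len(pi), len(ys)
    marg = [[0.0] * S for _ in range(T)]
    for xs in product(range(S), repeat=T):
        joint = pi[xs[0]] * B[xs[0]][ys[0]]
        for t in range(1, T):
            joint *= A[xs[t - 1]][xs[t]] * B[xs[t]][ys[t]]
        for t in range(T):
            marg[t][xs[t]] += joint
    total = sum(marg[0])  # p(y_{1:T})
    return [[m / total for m in row] for row in marg]

# Hypothetical two-state, two-symbol HMM and observation sequence.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.9, 0.1], [0.3, 0.7]]
ys = [0, 1, 1, 0]

gamma = forward_backward(pi, A, B, ys)
for t, row in enumerate(gamma, start=1):
    print(f"p(x_{t} | y_1:T) =", row)
```

For long sequences the unnormalized $\alpha_t$ and $\beta_t$ underflow, so a practical implementation would rescale them at each step or work in log space.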