In this work, we develop a theoretical and algorithmic methodology based on the prior predictive distribution that helps in specifying and choosing prior parameters **before** training the model or doing posterior inference, with a focus on the matrix factorization case. This prior predictive analysis of matrix factorization yields closed-form equations for the dimension of the latent space, as well as relations involving the mean and variance of the prior, and it works for different choices of priors and likelihoods. Obtaining a closed-form expression for the number of latent dimensions in this setting had been an open problem, and this analysis provides a solution.
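To make the kind of closed-form relation concrete, here is a minimal worked illustration (a simplified example of my own, not the exact parameterization used in the chapter) for a Poisson matrix factorization with the same Gamma prior on both factor matrices:

```latex
% Simplified setting (an assumption for this sketch): K latent factors,
% y_ij | u_i, v_j ~ Poisson(u_i^T v_j), with every u_ik and v_jk ~ Gamma(a, b)
% in the shape-rate parameterization. The prior predictive moments of one entry are
\begin{align}
  \mathbb{E}[y_{ij}]   &= K \frac{a^2}{b^2}, &
  \mathrm{Var}[y_{ij}] &= K \frac{a^2}{b^2} + K \frac{a^2 (2a + 1)}{b^4}.
\end{align}
% Matching these to expert-provided values m_1 and m_2 (with a fixed, or pinned
% down by an additional statistic) gives closed-form hyperparameters, e.g.
% b^2 = m_1 (2a + 1) / (m_2 - m_1) and K = m_1 b^2 / a^2.
```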
Furthermore, we developed a gradient-based optimization procedure that searches for prior parameters whose "virtual statistics" (moments induced by sampling and marginalizing the latent variables from the generative model, before fitting any data) match fixed target values.
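As a rough sketch of what such a procedure can look like in code (my own illustration, not the exact algorithm from the chapter), the snippet below uses PyTorch to match Monte Carlo estimates of the prior-predictive mean and variance of a toy Poisson matrix factorization to fixed target values; the hyperparameter layout, targets, and optimizer settings are all assumptions:

```python
# Illustrative sketch: gradient-based matching of prior-predictive moments to target
# values for a toy Poisson matrix factorization with Gamma priors. All names, targets,
# and settings are assumptions for the example.
import torch

K = 10                                    # assumed number of latent factors
target_mean, target_var = 3.0, 12.0       # "virtual statistics" fixed by the modeller

# Unconstrained hyperparameters; softplus keeps Gamma shapes/rates positive.
raw = torch.zeros(4, requires_grad=True)  # [shape_u, rate_u, shape_v, rate_v]
opt = torch.optim.Adam([raw], lr=0.05)

for step in range(2000):
    a_u, b_u, a_v, b_v = torch.nn.functional.softplus(raw)
    # Sample latent factors from the prior (reparameterized Gamma samples).
    U = torch.distributions.Gamma(a_u, b_u).rsample((512, K))
    V = torch.distributions.Gamma(a_v, b_v).rsample((512, K))
    rate = (U * V).sum(-1)                # prior-predictive Poisson rate of one entry
    # With the Poisson marginalized out: E[y] = E[rate], Var[y] = E[rate] + Var[rate].
    pred_mean = rate.mean()
    pred_var = rate.mean() + rate.var()
    loss = (pred_mean - target_mean) ** 2 + (pred_var - target_var) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()

print("matched hyperparameters:", torch.nn.functional.softplus(raw).detach())
```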
This is the last chapter of my thesis, developed during the research visit to the University of Helsinki, hosted by Arto Klami’s Multi-Source Probabilistic Inference group. I am very thankful to my supervisors Helge Langseth and Heri Ramampiaro for their support in developing my research, to António Góis for the helpful discussions, and to Alina Karakanta for the detailed feedback on the paper.
Multiple interaction loops arise when we consider ML systems, individuals, and societies together.
[This post accompanies a talk I gave to a general audience at NTNU on 5 February 2019. I added some extra references and pointers. It mostly points to general trends in the field without going into much detail.]
We are living in an era of growing interest in Artificial Intelligence and Machine Learning. This surge has been driven mostly by emerging computational power, cleverly designed and scalable algorithms, data availability, and new models capable of delivering systems with human-level capabilities* on a set of complex tasks in natural language processing, computer vision, automatic control, and automatic decision making. As researchers, students, and industry practitioners, we are attracted to these ideas because of the big questions around intelligence (including our own, as humans) and the potential of unlocking new capabilities and positive societal, economic, and individual opportunities. As we navigate the multiplex of ideas and trends around the question of intelligence, we hope to unfold and connect some components that can be crucial for human flourishing and scientific progress.
As our body of knowledge has moved from conceptual and speculative arenas towards empirical and testable theories, systems, and models, we see the emergence of a new engineering discipline — a human-centric engineering discipline**. As an emerging discipline, the state of affairs today is a mix of connected components without a central, all-encompassing theory, drawing on many well-established disciplines including, but not limited to, control theory, information theory, statistics, optimization, computer science, neuroscience, economics, mathematics, logic, and philosophy. In this talk we look very briefly at a single thread in this multiplex of ideas: machine understanding and prediction of human behavior (both group and individual) and the interactions between humans, machines, and the larger environment.
Traditional recommender systems seek to estimate a rating function for each user–item pair, based on items previously rated by the user (or ratings from similar users, the content of the item, or contextual information) — it is generally stated as a matrix factorization and completion problem. The classical matrix factorization approach to collaborative filtering consists of assuming latent factors for the rows and columns of the matrix (users and items), using past entries as the training data, and inferring which factors best predict the entries of the matrix — a prediction problem. The rating function learned from the data then predicts a user's interest in any given unseen item.
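For readers less familiar with this setup, here is a minimal, self-contained sketch of matrix factorization fitted by stochastic gradient descent on observed entries; the dimensions, learning rate, regularization, and synthetic ratings are purely illustrative:

```python
# Illustrative sketch: matrix factorization for collaborative filtering trained by
# SGD on observed (user, item, rating) triples. Sizes, hyperparameters, and the
# synthetic data are assumptions for the example.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 100, 50, 8
observed = [(rng.integers(n_users), rng.integers(n_items), rng.uniform(1, 5))
            for _ in range(1000)]             # stand-in for past rating entries

U = 0.1 * rng.standard_normal((n_users, k))   # user factors
V = 0.1 * rng.standard_normal((n_items, k))   # item factors
lr, reg = 0.02, 0.05

for epoch in range(20):
    for u, i, r in observed:
        err = r - U[u] @ V[i]                 # prediction error on a training entry
        u_old = U[u].copy()
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * u_old - reg * V[i])

# The learned rating function for an unseen entry is the dot product of its factors.
print("predicted rating for user 3, item 7:", U[3] @ V[7])
```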
Nevertheless, by deploying predictive recommender systems we increasingly also influence how the user will behave, dynamically shaping the very behavior we initially just wanted to predict — a growing concern of this type is addressed by emergent work on algorithmic bias, strategic behavior, and performative prediction***. The feedback loop is introduced by deploying predictions in the world: the world changes with those predictions, which turns predictions into productions. The predictive model now influences how the data shift will happen, producing new datasets that deviate from its initial training data. Questions of stability, convergence, and sensitivity become fundamental. Even the concept of uncertainty takes on a twist: predictions are supposed to model (and in some sense reduce) uncertainty, but once we account for the performativity of such systems we may conclude that uncertainty actually increases (if the loop is unstable). This poses the problem of finding predictors that anticipate such effects, leading to algorithms for learning and inference that take them into account and compensate accordingly.
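A toy simulation makes the loop concrete. The snippet below (an illustration in the spirit of performative prediction, not a reproduction of any cited experiment) repeatedly refits a mean predictor on data whose distribution shifts in response to the deployed prediction; the linear shift model and all constants are assumptions:

```python
# Toy sketch of a performative feedback loop: deploying a prediction shifts the
# data-generating distribution, and we iterate "fit, deploy, observe". The linear
# shift model and constants are illustrative, not taken from the cited papers.
import numpy as np

rng = np.random.default_rng(1)
base_mean, shift_strength = 2.0, 0.5
theta = 0.0                              # the deployed prediction (a single scalar)

for step in range(10):
    # The population reacts to the deployed prediction: its mean drifts with theta.
    current_mean = base_mean + shift_strength * theta
    data = rng.normal(current_mean, 1.0, size=500)
    theta = data.mean()                  # refit the predictor on the shifted data
    print(f"step {step}: deployed prediction = {theta:.3f}")

# With |shift_strength| < 1 this repeated refitting contracts to a stable point
# (here around 4.0), which differs from the mean we would see without deployment.
```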
The observation of this double direction of influence is significant and echoes a similar observation made by Wiener and colleagues while involved in the transdisciplinary research program that culminated in the new field of Cybernetics — the study of control and communication in the animal and the machine. At the same time as the locus of human agency in human-machine systems is being pushed further out, it has become clearer that this is not purely a matter of substituting human cognition: we should be aiming at augmenting human capabilities and at cooperative cognition between different modalities of cognition. New human-in-the-loop systems have been developed recently in which the cooperation of human and machine cognition is taken as the basic feedback loop, alongside machine capabilities for learning from few examples, from semi-supervised data, and from simulations.
As an open-ended and more speculative turn, the concept of purpose in the decision-making process is still obscured and hidden behind the designer's choices of loss function, regularization, dataset, and the meta-information available for each data point. Many theoretical and computational challenges are less obvious when we look at task-specific systems; nevertheless, they must be addressed properly when thinking in broader terms about the interplay of decision-making and inference, and as we deploy more general-purpose systems to the public.
Footnotes
* The idea of human-level capabilities is conditioned on the metrics we use; nevertheless, independent of any particular metric, we have observed that current ML systems tend, after a certain time, to achieve above human-level performance on any metric that is proposed.
** Michael I. Jordan, in the article "Artificial Intelligence—The Revolution Hasn't Happened Yet", discusses various aspects of modern ML/AI as part of an emerging human-centric engineering discipline, combining data, inference, and decision-making in large-scale societal systems.
*** Algorithmic confounding [1] in recommender systems happens when the system models users' preferences without taking into account how the recommendations themselves will drift/change those preferences; performative prediction, as introduced by Perdomo et al. (2020) [2], is defined in the context of supervised learning and acknowledges that deploying a predictive system will change the outcome it is trying to predict.
I will slowly restart blogging a bit about my past year's experiences, which included visiting Prof. Arto Klami's group at the University of Helsinki and a research internship at Curious AI, working under the guidance of Mathias Berglund and Harri Valpola. The first part of my stay resulted in an interesting research direction exploring how to use the prior predictive distribution to obtain direct relationships between moments of the data (if generated by the model being specified) and the hyperparameters of the model. I will discuss this further, but for now, I leave the abstract and a link to the preprint.
Abstract: Hyperparameter optimization for machine learning models is typically carried out by some sort of cross-validation procedure or global optimization, both of which require running the learning algorithm numerous times. We show that for Bayesian hierarchical models there is an appealing alternative that allows selecting good hyperparameters without learning the model parameters during the process at all, facilitated by the prior predictive distribution that marginalizes out the model parameters. We propose an approach that matches suitable statistics of the prior predictive distribution with ones provided by an expert and apply the general concept for matrix factorization models. For some Poisson matrix factorization models we can analytically obtain exact hyperparameters, including the number of factors, and for more complex models we propose a model-independent optimization procedure.
You are welcome to apply for the Nordic Probabilistic AI School (ProbAI) 2019, taking place on June 3-7 in Trondheim (Norway).
About the ProbAI 2019
The Nordic Probabilistic AI School (ProbAI) is a new annual event serving state-of-the-art expertise in machine learning and artificial intelligence to the public, students, academia, and industry.
Our objective is to offer an intermediate-to-advanced-level summer school with a particular focus on probabilistic models and deep generative models, covering latent variable models, inference with sampling and variational approximations, and probabilistic programming and tools.
The intentionally small team of invited lecturers will cover a carefully designed curriculum. Through tight cooperation between our lecturers, and through theoretical lectures and hands-on tutorials, we hope to provide high-quality, continuous, and consistent knowledge transfer.
In the next couple of months, I will be visiting Arto Klami's Multi-Source Probabilistic Inference group. I am also happy to be working again close to my former NTNU colleague and friend Tomasz. The next couple of months look promising, and I am looking forward to finalizing all the open threads that will lead to my PhD thesis!
The arXiv preprint of our paper introducing a joint Point Process and Hierarchical RNN model for item and time prediction is now available.
Time is of the Essence: a Joint Hierarchical RNN and Point Process Model for Time and Item Predictions
In recent years session-based recommendation has emerged as an increasingly applicable type of recommendation. As sessions consist of sequences of events, this type of recommendation is a natural fit for Recurrent Neural Networks (RNNs). Several additions have been proposed for extending such models in order to handle specific problems or data. Two such extensions are 1.) modeling of inter-session relations for catching long term dependencies over user sessions, and 2.) modeling temporal aspects of user-item interactions. The former allows the session-based recommendation to utilize extended session history and inter-session information when providing new recommendations. The latter has been used to both provide state-of-the-art predictions for when the user will return to the service and also for improving recommendations. In this work we combine these two extensions in a joint model for the tasks of recommendation and return-time prediction. The model consists of a Hierarchical RNN for the inter-session and intra-session items recommendation extended with a Point Process model for the time-gaps between the sessions. The experimental results indicate that the proposed model improves recommendations significantly on two datasets over a strong baseline, while simultaneously improving return-time predictions over a baseline return-time prediction model.
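To give a feel for the shared-representation idea (a simplified sketch of my own, not the architecture from the paper), the snippet below uses a single GRU state both to score the next item and to parameterize an exponential density over the time gap to the next session; layer sizes, the exponential intensity, and the synthetic batch are assumptions:

```python
# Illustrative sketch of a shared latent representation used for both next-item
# recommendation and return-time prediction. The exponential time density and all
# sizes are simplifying assumptions, not the paper's architecture.
import torch
import torch.nn as nn

n_items, emb_dim, hidden_dim = 1000, 32, 64

class JointSessionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.session_rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.item_head = nn.Linear(hidden_dim, n_items)   # next-item scores
        self.time_head = nn.Linear(hidden_dim, 1)         # log-intensity of the gap

    def forward(self, item_seq, gap):
        # item_seq: (batch, seq_len) item ids; gap: (batch,) time to the next session
        _, h = self.session_rnn(self.item_emb(item_seq))
        h = h.squeeze(0)                                  # shared latent representation
        item_logits = self.item_head(h)
        log_rate = self.time_head(h).squeeze(-1)
        # Exponential density of the observed gap: log f(t) = log(rate) - rate * t.
        time_nll = -(log_rate - torch.exp(log_rate) * gap)
        return item_logits, time_nll

model = JointSessionModel()
items = torch.randint(0, n_items, (4, 10))
gaps = torch.rand(4) * 5.0
next_item = torch.randint(0, n_items, (4,))
logits, time_nll = model(items, gaps)
loss = nn.functional.cross_entropy(logits, next_item) + time_nll.mean()
loss.backward()
```

The paper's model instead combines a Hierarchical RNN (inter- and intra-session levels) with a dedicated point-process model for the time gaps; the exponential density here only illustrates how the time likelihood can share the same latent state as the item scores.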
Our position paper, "Poisson Factorization Models for Spatiotemporal Retrieval", joint work with Dirk Ahlers, was accepted at the 11th Workshop on Geographic Information Retrieval (GIR'17). In this work, we discuss some modelling ideas and possibilities for advancing spatiotemporal retrieval using Poisson factorization models, especially in scenarios with multiple sources of count or implicit spatiotemporal user data. Unfortunately, I will not be able to attend the workshop (but Dirk will be there), because I am now in Melbourne, Australia, where I will stay for 3 months as a visiting graduate student in a project with the IR group at RMIT. In particular, I will be working with Dr Yongli Ren and Prof Mark Sanderson, developing joint probabilistic models of spatiotemporal user data for indoor-space recommendations (they have a very interesting dataset that I am curious to explore). Hopefully, over the next couple of months, I will continue working on probabilistic models for recommender systems, while incorporating many new and interesting ideas related to location and time.
ECML-PKDD 2017 was a very pleasant experience, and Skopje was an unexpected surprise. I am happy with each new conference that I attend, always meeting new people doing very good research. The community there was very welcoming in general!
I presented my paper in the Matrix and Tensor Factorization session, and I was particularly happy about that because, even though the application we work on is recommender systems, we focus on the methods and propose new factorization methods and models. Later in the evening, we had the poster session (poster-ecml2017) at the Macedonian Opera & Ballet and afterwards headed to the wine festival just outside.
For those interested, my presentation slides are here:
This semester I will be advising some master students on their final projects. At this point, they do not select a specific topic but should look into a given area to find a specific research question, and some of them will definitely work on Deep Learning and Recommender Systems. This is especially motivated by a very nice experience we (the NTNU-AILab group) had last year, where a master student working on RNNs for session-based recommendation had their work accepted at DLRS 2017. So, I decided to make a small selection of papers related to this topic, focusing on WSDM, WWW, KDD, CIKM, RecSys, ICLR, DLRS and some other specific conferences over the last three years (2015, 2016 and 2017). The result is a list of 45 papers with many distinct ideas, but also some common threads (from Matrix Factorization to CNNs or LSTMs, session-based methods using RNNs, etc.). I will not discuss the different ideas here, but I am posting the link because some people might be interested.
Our paper «Time is of the essence: A joint Hierarchical RNN and Point Process model for time and item predictions» has been accepted at the 12th ACM International Conference on Web Search and Data Mining (WSDM). It is collaborative work with Bjørnar Vassøy, Massimiliano Ruocco and Erlend Aune. WSDM is one of the top conferences in the domains of data mining, information retrieval, and machine learning on the Web. This year WSDM had 511 submissions and an acceptance rate of 16%. We will soon provide a link to the preprint and the source code.
In this paper, we propose a joint model with a shared latent representation for a Point Process model (for time prediction) and a Hierarchical Recurrent Neural Network (HRNN). By doing so, we are able to model a multi-session recommendation problem together with return-time prediction.
This work was developed as part of the Norwegian Open AI Lab in cooperation with Telenor Research.
Looking forward to visiting Melbourne again in the summer!