I will slowly start blogging again, beginning with my experience over the past year, which included a visit to Prof. Arto Klami's group at the University of Helsinki and a research internship at Curious AI under the guidance of Mathias Berglund and Harri Valpola. The first part of my stay led to an interesting research direction: using the prior predictive distribution to obtain direct relationships between moments of the data (assuming the data are generated by the model being specified) and the hyperparameters of the model. I will discuss this further in later posts; for now, I leave the abstract and a link to the preprint.
Abstract: Hyperparameter optimization for machine learning models is typically carried out by some sort of cross-validation procedure or global optimization, both of which require running the learning algorithm numerous times. We show that for Bayesian hierarchical models there is an appealing alternative that allows selecting good hyperparameters without learning the model parameters during the process at all, facilitated by the prior predictive distribution that marginalizes out the model parameters. We propose an approach that matches suitable statistics of the prior predictive distribution with ones provided by an expert and apply the general concept for matrix factorization models. For some Poisson matrix factorization models we can analytically obtain exact hyperparameters, including the number of factors, and for more complex models we propose a model-independent optimization procedure.
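To give a flavour of the idea, here is a minimal sketch (not the code from the paper). It assumes a Poisson matrix factorization with Gamma priors in shape/rate form, `theta_k ~ Gamma(a, b)` and `beta_k ~ Gamma(c, d)`, and `y ~ Poisson(sum_k theta_k * beta_k)`. The function names, the parametrization, and the closed-form moments below are my own illustration derived under these assumptions; the paper's exact expressions may differ. The sketch compares the analytic prior predictive mean and variance of a single entry with a Monte Carlo estimate, and inverts the mean to get a number of factors for a target mean.

```python
import numpy as np


def prior_predictive_moments(a, b, c, d, K):
    """Analytic mean and variance of one entry y under the assumed model.

    E[y]   = K * (a/b) * (c/d)
    Var[y] = E[y] * (1 + (a + c + 1) / (b * d))
    """
    mean = K * (a / b) * (c / d)
    var = mean * (1.0 + (a + c + 1.0) / (b * d))
    return mean, var


def sample_prior_predictive(a, b, c, d, K, n_samples, rng):
    """Draw independent entries y from the prior predictive distribution."""
    # NumPy's gamma uses shape/scale, so scale = 1 / rate.
    theta = rng.gamma(shape=a, scale=1.0 / b, size=(n_samples, K))
    beta = rng.gamma(shape=c, scale=1.0 / d, size=(n_samples, K))
    rate = np.sum(theta * beta, axis=1)
    return rng.poisson(rate)


def factors_for_target_mean(target_mean, a, b, c, d):
    """Number of factors K that matches a target mean, with priors held fixed."""
    return target_mean / ((a / b) * (c / d))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b, c, d, K = 2.0, 4.0, 3.0, 2.0, 10
    mean, var = prior_predictive_moments(a, b, c, d, K)
    y = sample_prior_predictive(a, b, c, d, K, 200_000, rng)
    print("analytic   :", mean, var)
    print("monte carlo:", y.mean(), y.var())
    print("K for the same target mean:", factors_for_target_mean(mean, a, b, c, d))
```

Note that no model parameters are learned anywhere: only the hyperparameters and the prior predictive moments are involved, which is the point made in the abstract.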
Our paper «Time is of the essence: A joint Hierarchical RNN and Point Process model for time and item predictions» has been accepted at the 12th ACM International Conference on Web Search and Data Mining (WSDM). This is collaborative work with Bjørnar Vassøy, Massimiliano Ruocco and Erlend Aune. WSDM is one of the top conferences in the domain of data mining, information retrieval and machine learning on the Web. This year WSDM had 511 submissions with an acceptance rate of 16%. We will soon provide links to the preprint and the source code.
In this paper, we propose a joint model in which a Point Process model (for time prediction) and a Hierarchical Recurrent Neural Network (HRNN) share a latent representation. This lets us model the multi-session recommendation problem together with return-time prediction.
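As a rough illustration of what sharing a latent representation can look like (a sketch under simplifying assumptions, not the model or code from the paper), the snippet below computes a joint loss from a single hidden vector `h`: a softmax cross-entropy for the next item plus the negative log-likelihood of the observed time gap `dt`. For the time part I assume a plain exponential distribution whose rate is a softplus of a linear function of `h`; the names `joint_loss`, `W_item`, `w_time` and the weight `alpha` are hypothetical.

```python
import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x)), which keeps the rate positive."""
    return np.logaddexp(0.0, x)


def joint_loss(h, W_item, w_time, item, dt, alpha=1.0):
    """Item cross-entropy plus exponential time NLL from a shared hidden state.

    h      : shared latent vector, shape (H,)
    W_item : item output weights, shape (n_items, H)
    w_time : time output weights, shape (H,)
    item   : index of the observed next item
    dt     : observed time gap until the next event (dt > 0)
    alpha  : relative weight of the time loss (assumed hyperparameter)
    """
    # Item head: log-softmax over item logits.
    logits = W_item @ h
    log_probs = logits - np.logaddexp.reduce(logits)
    item_nll = -log_probs[item]

    # Time head: exponential density rate * exp(-rate * dt).
    rate = softplus(w_time @ h)
    time_nll = rate * dt - np.log(rate)

    return item_nll + alpha * time_nll


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=8)
    W_item = rng.normal(size=(5, 8))
    w_time = rng.normal(size=8)
    print(joint_loss(h, W_item, w_time, item=2, dt=1.5))
```

Because both heads read the same `h`, gradients from the item loss and the time loss would both shape the shared representation during training.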
This work was developed as part of the Norwegian Open AI Lab in cooperation with Telenor Research.
Looking forward to visiting Melbourne again in the summer!