I am slowly getting back to blogging, starting with my experience over the past year, which included a visit to Prof. Arto Klami's group at the University of Helsinki and a research internship at Curious AI under the guidance of Mathias Berglund and Harri Valpola. The first part of my stay led to an interesting research direction: using the prior predictive distribution to obtain direct relationships between moments of the data (assuming the data are generated by the specified model) and the hyperparameters of the model. I will discuss this further, but for now I leave the abstract and the link to the preprint.
Abstract: Hyperparameter optimization for machine learning models is typically carried out by some sort of cross-validation procedure or global optimization, both of which require running the learning algorithm numerous times. We show that for Bayesian hierarchical models there is an appealing alternative that allows selecting good hyperparameters without learning the model parameters during the process at all, facilitated by the prior predictive distribution that marginalizes out the model parameters. We propose an approach that matches suitable statistics of the prior predictive distribution with ones provided by an expert and apply the general concept for matrix factorization models. For some Poisson matrix factorization models we can analytically obtain exact hyperparameters, including the number of factors, and for more complex models we propose a model-independent optimization procedure.
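To give a flavour of the idea, here is a minimal sketch of my own, not the derivation from the preprint. It assumes a toy Poisson matrix factorization in which Y_ij ~ Poisson(sum_k theta_ik * beta_jk), with theta_ik ~ Gamma(shape a, rate b) and beta_jk ~ Gamma(shape c, rate d). Under those assumptions the prior predictive mean is K * (a / b) * (c / d), so an expert-provided mean can be matched by solving for a hyperparameter directly, without fitting anything. The function names, the shared-rate simplification (b = d), and the numbers are all hypothetical choices for illustration.

```python
import math
import random


def prior_predictive_mean(K, a, b, c, d):
    """E[Y_ij] under the assumed Gamma-Poisson factorization (rates b and d)."""
    return K * (a / b) * (c / d)


def rate_for_target_mean(target_mean, K, a, c):
    """Toy matching step: fix the shapes, share one rate (b = d),
    and solve K * a * c / b**2 = target_mean for b."""
    return math.sqrt(K * a * c / target_mean)


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_prior_predictive(K, a, b, c, d, rng):
    """Draw one Y_ij from the prior predictive; no data or model fitting involved."""
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    lam = sum(
        rng.gammavariate(a, 1.0 / b) * rng.gammavariate(c, 1.0 / d)
        for _ in range(K)
    )
    return sample_poisson(lam, rng)


rng = random.Random(0)
K, a, c, target = 5, 1.0, 1.0, 2.0  # hypothetical expert target: mean count of 2
b = rate_for_target_mean(target, K, a, c)

# Monte Carlo check that the matched hyperparameter reproduces the target mean.
draws = [sample_prior_predictive(K, a, b, c, b, rng) for _ in range(20000)]
mc_mean = sum(draws) / len(draws)
print(f"matched rate b = d = {b:.3f}, simulated prior predictive mean = {mc_mean:.3f}")
```

The simulated mean should land close to the target, which is the whole point: the hyperparameter was chosen from a statistic of the prior predictive distribution alone.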