ABS16 - 2016 Applied Bayesian Statistics School

BAYES, BIG DATA, AND THE INTERNET

Villa del Grumello, Como, Italy

29th August - 2nd September, 2016












ABS SCHOOLS

The Applied Bayesian Statistics summer school has been running since 2004. Since 2012 it has been organized by

  • IMATI CNR
    Istituto di Matematica Applicata e Tecnologie Informatiche, Consiglio Nazionale delle Ricerche
  • Dipartimento di Scienze Statistiche
    Università Cattolica, Milano

    Since 2014 the school has been organized in cooperation with the Centro di Cultura Scientifica "Alessandro Volta".

    The school aims to present state-of-the-art Bayesian applications, inviting leading experts in their field. Each year a different topic is chosen. Past editions were devoted to: Gene Expression Genomics; Decision Modelling in Health Care; Spatial Data in Environmental and Health Sciences; Bayesian Methods and Econometrics; Bayesian Decision Problems in Biostatistics and Clinical Trials; Bayesian Methodology for Clustering, Classification and Categorical Data Analysis; Bayesian Machine Learning with Biomedical Applications; Hierarchical Modeling for Environmental Processes; Stochastic Modelling for Systems Biology; Bayesian Methods for Variable Selection with Applications to High-Dimensional Data; Applied Bayesian Nonparametrics; and Modern Bayesian Methods and Computing for the Social Sciences.


    TOPIC AND LECTURER

    The topic chosen for the 2016 school is  Bayes, Big Data, and the Internet.  The lecturer is:

    Dr Steve Scott,
    Director of Statistics Research, Google, USA


    He will be assisted by Ilaria Bianchini (Politecnico di Milano, Italy).


    COURSE OUTLINE


    Day 1: One of the main applications of statistics at internet companies is A/B testing (e.g. to determine which version of a web site is most effective).  Standard statistical methods for A/B testing that use a static experimental design are typically dominated (in terms of cost) by sequential methods based on Thompson sampling.  The first day of the course uses Thompson sampling to motivate standard low-dimensional statistical models such as the beta-binomial, Poisson-gamma, and normal-normal models.  We will review Monte Carlo methods for computing with Bayesian models, including Gibbs sampling, Metropolis-Hastings, and slice sampling.
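
    As a rough illustration of the Day 1 idea, the sketch below runs Thompson sampling on a simulated A/B test with a beta-binomial model. It is not course material: the function name thompson_ab_test, the uniform Beta(1, 1) priors, and the conversion rates are all assumptions chosen for the example.

```python
import random


def thompson_ab_test(true_rates, n_visitors, seed=0):
    """Simulate an A/B test allocated by Thompson sampling.

    true_rates holds hypothetical conversion rates, one per page version.
    Each version starts from an assumed uniform Beta(1, 1) prior.
    Returns per-version lists of observed successes and failures.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_visitors):
        # Draw one plausible conversion rate per version from its
        # Beta(1 + successes, 1 + failures) posterior.
        draws = [
            rng.betavariate(1 + successes[i], 1 + failures[i])
            for i in range(k)
        ]
        # Show the visitor the version whose draw is largest.
        arm = max(range(k), key=lambda i: draws[i])
        # Simulate whether this visitor converts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


successes, failures = thompson_ab_test([0.04, 0.06], n_visitors=5000)
visits = [s + f for s, f in zip(successes, failures)]
print("visits per version:", visits)
```

    Because each visitor is assigned according to a posterior draw, traffic tends to shift toward the better-performing version as evidence accumulates, which is the source of the cost advantage over a fixed, equal-allocation design.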

    Day 2: Very quickly after implementing an experiment, one discovers that more than one factor needs to be tested (e.g. you need to test whether the button should be red or blue, AND whether it should be located on the left or the right of the page).  You also need to determine whether the optimal design varies according to contextual factors (e.g. whether the page is shown on the weekend or on a weekday).  These factors can be handled within the Thompson sampling framework by extending the reward distribution to various generalized linear models.  The second day of the course will discuss Bayesian regression in linear and generalized linear models using data augmentation.  Probit, logit, Poisson, and Student's t models will be covered.

    Day 3: If the number of factors to be tested (or number of contexts to be controlled for) is very large, then sparse models must be considered.  Sparsity can be introduced into a Bayesian model using a "spike-and-slab" prior that places some prior mass at zero for each coefficient in a linear model.  The data can then move posterior mass between the "spike" (at zero) and "slab" (nonzero) portions of the posterior distribution.
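
    The way data move mass between the spike and the slab can be seen in a deliberately simplified case: a single coefficient observed once with Gaussian noise. The sketch below is only an illustration under those assumptions; the function names, the prior inclusion probability, and the variances are hypothetical, and this is not the algorithm used by the course software.

```python
import math


def normal_pdf(x, var):
    """Density at x of a normal distribution with mean 0 and variance var."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def inclusion_probability(y, prior_inclusion=0.5, noise_var=1.0, slab_var=10.0):
    """Posterior probability that the coefficient is nonzero.

    Assumed toy model: y = beta + noise, with noise ~ N(0, noise_var).
    Prior: beta = 0 with probability 1 - prior_inclusion (the spike),
    otherwise beta ~ N(0, slab_var) (the slab).
    """
    # Marginal likelihood of y under the slab: N(0, noise_var + slab_var).
    slab = prior_inclusion * normal_pdf(y, noise_var + slab_var)
    # Marginal likelihood of y under the spike: N(0, noise_var).
    spike = (1.0 - prior_inclusion) * normal_pdf(y, noise_var)
    return slab / (slab + spike)


for y in (0.0, 1.0, 3.0, 5.0):
    print(y, round(inclusion_probability(y), 3))
```

    An observation near zero pulls the inclusion probability below its prior value, while an observation far from zero pushes it toward one.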

    Day 4: One valuable aspect of internet data is that it arrives in nearly real time, while many official statistics are released as monthly or quarterly time series.  One way to model these data is with dynamic linear models.  These models can combine a time series model (capturing the target series' past behavior) with a sparse regression component (capturing the impact of contemporaneous internet data).  We will work on examples using economic data from FRED (Federal Reserve Economic Data, maintained by the Federal Reserve Bank of St. Louis) and data from Google Trends.

    Day 5: We will discuss methods that can be applied when facing "big" data that must be distributed across several machines.



    INTENDED PARTICIPANTS AND PREREQUISITES

    This course is intended for students with little or no background in Bayesian statistics, who would like to use applied Bayesian methods. Students should have a basic familiarity with R, and some elementary knowledge of probability (so we can talk about "binomial" models and "gamma" priors).  Some background in linear models will be helpful but is not strictly necessary.


    READING

    The course will not follow a specific text.  Good textbooks on Bayesian inference include:
    * Bayesian Data Analysis (Gelman, Carlin, Stern, Dunson, Vehtari, Rubin)
    * Bayesian Statistics and Marketing (Rossi, Allenby, McCulloch)
    * Bayesian Computation with R (Albert)
    Useful papers:
    * George and McCulloch (1997, Statistica Sinica) "Approaches for Bayesian Variable Selection"
    * Scott and Varian (2013) "Predicting the Present with Bayesian Structural Time Series"
    * Scott (2010) "A Modern Bayesian Look at the Multi-Armed Bandit"
    * "Bayes and Big Data: The Consensus Monte Carlo Algorithm"


    SOFTWARE

    We will use the following R packages (available from CRAN):
    * BoomSpikeSlab
    * bsts


    LOCATION AND SCHEDULE

    The 2016 school will be held at Villa del Grumello, a magnificent villa located in the city of Como, along the Lake Como shoreline.

    Please note that the number of available places is limited.

    The school will start on Monday, 29th August, and end on Friday, 2nd September.




    INFORMATION