BISP6
Sixth Workshop on
BAYESIAN INFERENCE IN STOCHASTIC PROCESSES
Accademia Cusano, Bressanone/Brixen (BZ), Italy
June 18-20, 2009
Joint parameter estimation in bottlenecked populations using Approximate
Bayesian Computation
I will present an approach to jointly estimating demographic and population
genetic parameters in a structured population with a well-known history. The
approach is based on neutral genetic variation, Approximate Bayesian
Computation (ABC) and a new simulation program called SimBoP (Simulate
Bottlenecked Populations). I focus on the case where several demes were
established in multiple founder events from one original gene pool and
post-bottleneck population sizes are known. The reintroduction of Alpine
ibex (Capra ibex ibex) into the Swiss Alps will serve as an example for the
application. I will present estimates of migration rates, theta (4 * effective
size of the founder pool * mutation rate), and the proportion of males
gaining access to females during reproduction. Inference is made difficult by
a potentially large number of summary statistics. This raises issues related
to redundancy, sufficiency and a 'good' choice of summary statistics. My
current work focusses on these aspects and I hope to include results in my
presentation. Specifically, I will report on the application of boosting
for choosing and weighting summary statistics.
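The accept/reject logic underlying ABC can be sketched as a plain rejection sampler. The following toy example (a hypothetical normal-mean problem, not the SimBoP pipeline, with illustrative tolerance and prior) shows the core loop:

```python
import random

def abc_rejection(observed_stat, simulate, prior_draw, tol, n_draws):
    """Basic ABC rejection: keep prior draws whose simulated summary
    statistic falls within `tol` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw()
        if abs(simulate(theta) - observed_stat) <= tol:
            accepted.append(theta)
    return accepted

random.seed(1)
# Toy example: infer the mean of a normal sample from its sample mean.
obs = 2.0
sim = lambda th: sum(random.gauss(th, 1.0) for _ in range(50)) / 50
prior = lambda: random.uniform(-5, 5)
post = abc_rejection(obs, sim, prior, tol=0.2, n_draws=2000)
```

The accepted draws approximate the posterior; shrinking `tol` trades acceptance rate for accuracy, which is exactly where the choice of summary statistics becomes critical.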
Tevfik Aktekin and Refik Soyer
Bayesian Queuing Models for Call Centers
Queuing models have been extensively used in call center analysis for obtaining
performance measures and for developing staffing policies. However, almost all
of this work has been from a purely probabilistic point of view and has not
addressed issues of statistical inference. In this paper, we develop a Bayesian
analysis of call center queuing models. We consider models with both patient
and impatient customers and discuss their further extensions. We discuss details
of Bayesian inference for queues with abandonment such as M/M/s+M models (also
referred to as the Erlang-A) and present relevant operating characteristics.
We illustrate implementation of the Bayesian models using actual arrival,
service, and abandonment data from call centers.
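For exponential durations such as the abandonment or service times in an M/M/s+M (Erlang-A) model, a Gamma prior on the rate is conjugate. The sketch below shows the standard update with made-up durations; it is a generic illustration, not the paper's model:

```python
# Gamma(a, b) prior on an exponential rate (e.g. the service or abandonment
# rate in an M/M/s+M queue) is conjugate: with n observed durations summing
# to s, the posterior is Gamma(a + n, b + s).
def gamma_posterior(a, b, durations):
    return a + len(durations), b + sum(durations)

# Hypothetical service times (minutes) from a call-center log.
times = [2.1, 0.7, 3.4, 1.2, 0.9, 2.8]
a_post, b_post = gamma_posterior(1.0, 1.0, times)
posterior_mean_rate = a_post / b_post  # E[rate | data]
```

Operating characteristics of the queue can then be computed by plugging posterior draws of the rates into the Erlang-A formulas, propagating parameter uncertainty into the performance measures.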
A Multivariate Prediction of Spatial Process with Non-Stationary Covariance
and its Application in Mapping Non-Methane Hydrocarbon Exposure
Multivariate prediction of a spatial process with non-stationary covariance is
a fundamental problem in environmetrics. The study of levels of air pollutants
is important for understanding and improving air quality in major urban areas.
This research aims to handle the prediction, in a Bayesian framework, of the
non-methane hydrocarbon pollutant for the State of Kuwait, where records from
six monitoring stations located at different sites are observed at successive
time points. We will implement a hierarchical Bayesian approach based on a
Gaussian random field that allows us to pool the data from different sites in
predicting the exposure to non-methane hydrocarbons in different regions of
Kuwait.
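The pooling of information across sites works through the spatial covariance. A deliberately tiny kriging sketch (two one-dimensional sites, a stationary squared-exponential covariance, all values hypothetical) illustrates the mechanics:

```python
import math

def sqexp(d, sigma2=1.0, ell=1.0):
    """Squared-exponential covariance in distance d (illustrative choice)."""
    return sigma2 * math.exp(-0.5 * (d / ell) ** 2)

# Simple kriging (zero-mean GP prior) at a new site from two monitoring
# sites, pooling their observations through the spatial covariance.
sites = [0.0, 2.0]   # 1-d site coordinates
y = [3.0, 1.0]       # observed exposure levels
new = 1.0            # prediction location

K = [[sqexp(abs(a - b)) for b in sites] for a in sites]
k = [sqexp(abs(new - s)) for s in sites]
# Solve K w = k for the 2x2 system by hand (Cramer's rule).
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
w = [(K[1][1] * k[0] - K[0][1] * k[1]) / det,
     (K[0][0] * k[1] - K[0][1] * k[0]) / det]
pred = w[0] * y[0] + w[1] * y[1]
```

The hierarchical model in the abstract generalizes this by placing priors on the covariance parameters and letting the covariance vary over space.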
Lucie Buresova, Ondrej Majek, Jan Danes, Helena Bartonkova, Miroslava
Skovajsova and Ladislav Dusek
Bayesian Estimation of Mean Sojourn Time and Sensitivity in the Czech
Organized Mammography Screening Programme
The organized breast cancer screening programme in the Czech Republic was
initiated in September 2002. Free biennial preventive mammography examinations
are offered to women aged 45-69. By the end of 2007, 1,067,836 women (more than
50% of the target population) had been screened in 1,611,582 examinations. A
total of 7,835 cases of breast cancer were detected.
Important parameters in assessing the quality of the programme and natural
history of breast cancer are: mean duration of the preclinical
screen-detectable phase (the carcinoma is without clinical signs but it could
be found by the screening test - mammography), which is called mean sojourn
time, and sensitivity of mammography (capability of the test to detect cancer).
These parameters are not directly observable; however, they can be estimated
using mathematical models.
Both parameters were estimated for different age groups of women in the target
population. A simple three-state Markov model with the states disease free -
preclinical screen-detectable - clinical disease was developed and utilized for
the estimation. The analysis was performed using the WinBUGS program.
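A minimal forward simulation of such a three-state model (disease free -> preclinical screen-detectable -> clinical) clarifies how mean sojourn time and test sensitivity interact; all rates and values below are purely illustrative, not the Czech programme's estimates:

```python
import random

def simulate_case(mst, sens, screen_interval, rng):
    """One woman's history under the three-state natural-history model:
    preclinical sojourn is exponential with mean `mst`; each screen during
    that phase detects the cancer with probability `sens`."""
    onset = rng.expovariate(0.1)          # time of preclinical onset
    sojourn = rng.expovariate(1.0 / mst)  # preclinical sojourn time
    clinical = onset + sojourn            # clinical surfacing time
    # First screen at or after onset, then every `screen_interval` years.
    t = screen_interval * (int(onset / screen_interval) + 1)
    while t < clinical:
        if rng.random() < sens:
            return "screen-detected"
        t += screen_interval
    return "interval/clinical"

rng = random.Random(0)
results = [simulate_case(mst=2.5, sens=0.85, screen_interval=2.0, rng=rng)
           for _ in range(5000)]
frac_screen = results.count("screen-detected") / len(results)
```

Longer sojourn times and higher sensitivity both raise the fraction of screen-detected cases, which is why the two parameters can only be estimated jointly from programme data via a model such as the one above.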
A Variational Bayesian Approach to Linear Gaussian State-Space Models
We present a Bayesian treatment of the Mixtures of Linear Gaussian
State-Space Models (LGSSMs) and the switching LGSSM for time-series
clustering and segmentation. In our approach, model structure selection (i.e.
the choice of the number of mixture components) can be achieved within the
model by defining prior distributions that enforce a sparse parametrization.
This avoids the need to train and compare several separate models, as required
when criteria such as the BIC or AIC are used.
To deal with model intractability we introduce a variational approximation,
where the difficult issue of performing inference on the hidden variables is
addressed by reformulating the problem such that any method developed for the
(non-Bayesian) LGSSM and switching LGSSM can be used.
We present an application of the Mixtures of LGSSMs to robot imitation,
where we investigate the identification and dynamics learning of the different
strategies underlying a set of human executions of the ball-in-a-cup game of
dexterity. We show that our approach yields a generative model of each strategy
which works well in the execution of this complex task on a simulated
anthropomorphic SARCOS arm.
Nicolas Chopin, Judith Rousseau and Brunero Liseo
Bayesian nonparametric estimation of a long-memory Gaussian process:
computational aspects
In Rousseau et al. (2008), we proposed a novel method for the Bayesian
nonparametric estimation of the spectral density of a Gaussian long-memory
process, based on the true likelihood rather than on Whittle's
approximation, which is not valid in long memory settings. We also
established the convergence properties (consistency, rates of convergence)
of the corresponding estimates.
In this paper, we discuss the computational implementation of this
procedure.
The two main computational challenges are (a) the likelihood function
involves the inverse of a possibly big Toeplitz matrix, and (b) the
posterior distribution is trans-dimensional. With respect to (a), we
consider a simple approximation of the likelihood, which may be used either
directly or as a tool for building an importance sampling proposal. With
respect to (b), we compare different methods, focussing on
population/sequential Monte Carlo methods. In particular, we explain how the
likelihood function may be computed recursively, which makes the use of
sequential Monte Carlo particularly interesting, especially if the dataset
is large.
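One standard way to compute a stationary Gaussian likelihood recursively, without forming or inverting the Toeplitz covariance matrix, is the Levinson-Durbin recursion on one-step prediction errors. The sketch below illustrates the idea; it is a generic implementation, not the paper's:

```python
import math

def gaussian_loglik_toeplitz(x, acov):
    """Exact log-likelihood of a zero-mean stationary Gaussian series.

    Uses the Levinson-Durbin recursion on one-step prediction errors, so
    the n x n Toeplitz covariance matrix is never formed or inverted."""
    n = len(x)
    phi = []     # prediction coefficients at the current order
    v = acov[0]  # innovation (prediction error) variance
    ll = -0.5 * (math.log(2 * math.pi * v) + x[0] ** 2 / v)
    for k in range(1, n):
        # Reflection coefficient for order k.
        num = acov[k] - sum(phi[j] * acov[k - 1 - j] for j in range(k - 1))
        kappa = num / v
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        v *= 1.0 - kappa ** 2
        e = x[k] - sum(phi[j] * x[k - 1 - j] for j in range(k))
        ll += -0.5 * (math.log(2 * math.pi * v) + e ** 2 / v)
    return ll

# White-noise check: acov = [1, 0, 0] gives the i.i.d. N(0, 1) likelihood.
ll = gaussian_loglik_toeplitz([1.0, 2.0, -1.0], [1.0, 0.0, 0.0])
```

Because each observation contributes one term as it arrives, this recursive form pairs naturally with sequential Monte Carlo, as the abstract notes.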
Amelie Crepet and Jessica Tressou
Nonparametric Bayesian model to cluster co-exposure to
pesticides found in the French diet
This work introduces a specific application of the Bayesian nonparametric
methodology in the food risk analysis framework. Namely, the joint
distribution of the exposures to a large number of pesticides is assessed from
the available consumption data and contamination analyses. We propose to
model the exposures by a mixture of Dirichlet processes so as to determine
clusters of pesticides jointly present in the diet at high doses. The goal of
this analysis is to give directions for future toxicological experiments for
studying possible combined effects of multiple pesticide residues simultaneously
present in the diet. Two approaches are compared: the exposures to each
pesticide are either linked together in a hierarchical Dirichlet process mixture
based on a univariate Gaussian kernel, or they are assumed to arise from a
multivariate Gaussian kernel in a classical Dirichlet process mixture. In both
cases, posterior distributions are computed through a Gibbs sampler based on
stick-breaking priors. Finally, the clustering among individuals also obtained
as an auxiliary output of these analyses is discussed in a risk management
perspective.
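The stick-breaking priors on which such Gibbs samplers are built can be sketched as follows (a generic truncated construction with a standard normal base measure, not the pesticide model itself):

```python
import random

def stick_breaking(alpha, n_atoms, base_draw, rng):
    """Truncated stick-breaking construction of a Dirichlet process draw:
    weights w_k = v_k * prod_{j<k} (1 - v_j) with v_k ~ Beta(1, alpha),
    atoms drawn i.i.d. from the base measure."""
    weights, remaining = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    atoms = [base_draw(rng) for _ in range(n_atoms)]
    return weights, atoms

rng = random.Random(42)
w, atoms = stick_breaking(alpha=2.0, n_atoms=50,
                          base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
```

The random weights decay quickly, so a moderate truncation level captures essentially all the mass; observations sharing an atom form the clusters the analysis reports.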
Andrea Duggento, Dmitry G. Luchinsky, Vadim N. Smelyanskiy
and Peter V.E. McClintock
Bayesian framework for fast dynamical inference of
multidimensional nonlinear nonstationary time series data
We consider the long-standing problem of how to reconstruct
models and their parameters from the signals emanating from
multi-dimensional dynamical systems. Usually, one wishes to
minimise the number of parameters to reduce computational
costs. On the other hand, realistic model reconstruction tends
to require increased numbers of adjustable parameters.
To tackle this problem, we introduce a Bayesian technique that
allows real-time evaluation of parameters; where there is a
large number of parameters, we use fast algebraical methods to
compute the posterior density; for those few parameters that
cannot be included in a suitable factorisation we obtain the
corresponding marginal distribution by application of Markov
Chain Monte Carlo (MCMC) methods.
Our Bayesian algorithm enables us to infer parameters under very
general conditions and in the presence of an arbitrary, highly
nonlinear, velocity field that drives the dynamics. We find
that, in some cases, the hypothesis of a stationary signal can
be relaxed, and time-varying parameters can then be inferred.
As an example, we consider a multi-dimensional system
consisting of N coupled FitzHugh-Nagumo oscillators. The
dynamics is globally mixed by an unknown "measurement matrix".
In our example, the noise matrix, the evolution of the
parameters, hidden dynamics, and the measurement matrix can all
be inferred. We show that our procedure is fast, despite the
high dimensionality of the problem.
References:
Phys. Rev. E, vol. 77, 061105 (2008)
Phys. Rev. E, vol. 77, 061106 (2008)
Colin S. Gillespie and Andrew Golightly
Bayesian inference for generalized stochastic population growth
models with application to aphids
A field study was conducted on the population numbers of cotton aphids
(Aphis gossypii Glover). The study consisted of three irrigation levels
(Low, Medium and High), three nitrogen fertility treatments (blanket
nitrogen, variable nitrogen and no nitrogen) and three field blocks. At
five weekly intervals the numbers of aphids were counted at each
treatment combination. This gives a total of twenty-seven data sets.
This paper explores parameter inference for a stochastic population
growth model of aphids. It is believed that the death rate of the aphid
population depends on the unobserved cumulative population size, whilst
the birth rate depends on the current population size. The aim of this study
is to investigate how the treatment effects manifest themselves within
the birth and death rates. Once interaction effects are considered,
this involves fitting thirty-six parameters and estimating the
unobserved cumulative aphid population. Markov chain Monte Carlo
methods, coupled with a moment closure approximation are used to
integrate over uncertainty associated with the unobserved component and
estimate parameters. We highlight that blocking effects and interaction
terms play a crucial role in understanding aphid dynamics.
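The birth-death structure described above (birth rate proportional to the current population N, death rate depending on the cumulative population C) can be forward-simulated with a standard Gillespie-style algorithm; the parameter values here are purely illustrative, not the fitted ones:

```python
import random

def simulate_aphids(lam, mu, n0, t_end, rng):
    """Gillespie-style simulation of a stochastic aphid growth model:
    births at rate lam * N, deaths at rate mu * N * C, where N is the
    current and C the cumulative population size."""
    t, n, c = 0.0, n0, float(n0)
    while t < t_end and n > 0:
        birth_rate = lam * n
        death_rate = mu * n * c
        total = birth_rate + death_rate
        t += rng.expovariate(total)   # time to the next event
        if t >= t_end:
            break
        if rng.random() < birth_rate / total:
            n += 1
            c += 1                    # births add to the cumulative count
        else:
            n -= 1
    return n, c

rng = random.Random(3)
n_final, cumulative = simulate_aphids(lam=1.75, mu=0.001, n0=10,
                                      t_end=5.0, rng=rng)
```

Exact simulation like this is expensive inside an MCMC loop, which motivates the moment closure approximation used in the paper.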
Flavio B. Goncalves and Gareth O. Roberts
Bayesian Inference for Jump-Diffusion Processes
This work proposes a Bayesian method for inference in discretely observed
jump-diffusion processes. The method is based on an MCMC algorithm where
a Markov chain that has the full posterior distribution of the parameters
of the process as its equilibrium distribution is constructed. The most
challenging step of the MCMC algorithm is to sample from the jump-diffusion
conditional on the observations. This step is performed using the Conditional
Jump Exact Algorithm which is also proposed in this work. The algorithm
draws exact samples from the conditional jump-diffusion via retrospective
rejection sampling.
Brett Houlding, Arnab Bhattacharya and Simon P. Wilson
Bayesian Spatial-Temporal Modelling of Indoor Pico-Cell Radio
Environments with Implications for Wireless Network Management
Recent advances in cognitive radio technology allow radio devices to access
spectrum in a dynamic manner, permitting such devices to respond to changes
in the radio environment and switch to under-used frequencies. This motivates
research into the statistical spatial-temporal-frequency modelling of radio
wave propagation so as to establish and develop fair and effective frequency
access etiquettes that ensure a pre-specified performance guarantee. Such
statistical inference on the prevailing radio environment must also be
efficiently performed in real or near-real time in order to provide a high
level approximation of current frequency usage and ambient noise level.
This work presents the initial steps in developing a full and fast Bayesian
spatial-temporal-frequency model of radio wave propagation. Indoor Pico-Cell
data collected by the Center for Telecommunications Value Chain Research is
used to develop a Bayesian Model of the radio environment and this in turn is
used for policy development of the network. The ultimate aim is to use such
a statistical model and frequency access decision rule to provide a proof of
concept for the liberalization of regulations concerning radio frequency
usage.
Maria Kalli and Stephen G. Walker
A Bayesian semiparametric model for density estimation in Financial
Econometrics
The aim of this paper is density estimation of asset returns series;
specifically daily stock returns and daily equity index returns. A flexible
mixture model is developed to capture the empirical features of financial
asset returns: heavy tails, slight skewness and volatility clustering. These
features have proven difficult to capture accurately using parametric
models.
A Bayesian nonparametric method is used to generate random distributions
that are unimodal and asymmetric; volatility is parametrically embedded in
this set up. This allows the density of asset returns to be estimated with time
varying volatility. Posterior inference is necessarily carried out via the
application of Markov chain Monte Carlo methods and the use of a slice
sampling algorithm.
Our proposed model is applied to the daily returns of the S&P500 index
and the Cyprus stock exchange main equity index. Our results are compared
to stochastic volatility models and other Bayesian nonparametric models.
Theodore Kypraios, Philip D. O'Neill and Ben Cooper
Bayesian Inference and Model Choice for Nonlinear Stochastic Processes
Applied to Hospital Infections
High-profile hospital "superbugs" such as methicillin-resistant Staphylococcus
aureus (MRSA) have a major impact on healthcare within the UK and
elsewhere. Despite enormous research attention, many basic questions concerning
the spread of such pathogens remain unanswered. For instance, what value do
specific control measures such as isolation have? How is the spread in the
ward related to "colonisation pressure"? What role do antibiotics play?
How useful is it to have new rapid molecular tests instead of conventional
culture-based swab tests?
A wide range of biologically-meaningful stochastic transmission models that
overcome unrealistic assumptions of methods which have been previously used
in the literature are constructed, in order to address specific scientific
hypotheses of interest using detailed data from hospital studies. Efficient
Markov Chain Monte Carlo (MCMC) algorithms are developed to draw Bayesian
inference for the parameters which govern transmission. The extent to which
the data support specific scientific hypotheses is investigated by considering
and comparing different models under a Bayesian framework by employing a
trans-dimensional MCMC algorithm; a method of matching the within-model
prior distributions to avoid miscalculation of the Bayes factors is also
discussed. Finally, the methodology is illustrated by analysing real data which
were obtained from a hospital in Boston.
Krzysztof Latuszynski, Witold Bednorz and Wojciech Niemiro
Nonasymptotic Confidence Intervals for Regenerative MCMC Algorithms
Abstract available online.
Calibration of stochastic models of bio-chemical reactions with Bayesian
inference methods: advantages and open questions
The estimation of parameter values is a bottleneck of the computational
analysis of biological systems. Modeling approaches are central in systems
biology, as they provide a rational framework to guide systematic strategies
for key issues in medicine as well as the pharmaceutical and biotechnological
industries. Inter- and intra-cellular processes require dynamic models
decorated with the rate constants of the biochemical reactions. These kinetic
parameters are often not accessible directly through experiments. Therefore
methods that estimate rate constants with the maximum precision and accuracy
are needed. In particular, new methods are needed for parameter estimation
in stochastic biochemical reaction systems. Chemical reaction rates are
determined by the probability of collision and effective reaction of
individual molecules. In many cases, when the importance of noise in the
chemical dynamics cannot be overlooked and small numbers of molecules react,
the time evolution of the number of molecules of the reactant species exhibits
strongly stochastic behaviour. In these cases, the dynamics can be suitably
described by a Langevin rate equation whose noise term is a Wiener process.
We present two new inference methods we recently developed to estimate the
parameters of stochastic models of chemical reactions in the form of
stochastic Langevin differential equations: one based on a maximum likelihood
approach and the other based on Bayesian methods specifically tailored to
the estimation of chemical kinetic parameters. We compare and discuss the
two methods to highlight the benefits and drawbacks of each with respect
to the other in computational biochemistry.
Chiara Mazzetta, Byron Morgan and Tim Coulson
Bayesian estimation of demographic trends and spatial dispersion of
closely monitored populations
We consider populations of wild animals that are closely monitored over time,
by being recaptured or simply resighted on multiple occasions within a year.
Recapture data are integrated with recovery data to estimate the age and time
structure of key demographic parameters. A seasonal time scale is used to
account for environmental effects on survival and for seasonal patterns in the
recaptures. Resighting data of a different group of individuals, within
the same population, are used to estimate the geographical dispersal over
a discrete set of locations. These two independent data sets are integrated
into a non-linear and non-normal state-space model so as to estimate
simultaneously spatial and temporal dynamics of the species of interest.
Parameters are estimated with MCMC methods within a fully Bayesian approach.
As an example we consider the Soay sheep population on the uninhabited
island of Hirta, Scotland, that has been closely monitored over the last 20
years. This species is particularly interesting as it represents the most
primitive form of domestic sheep and has remained virtually unchanged for
thousands of years.
Christoph Pamminger, Sylvia Frühwirth-Schnatter, Rudolf Winter-Ebmer
and Andrea Weber
Model-Based Clustering of Categorical Time Series Using Finite
Mixtures of Markov Chain Models Extended with a Logit Model for
Inclusion of Covariates
Two approaches for model-based clustering of categorical time series based
on time-homogeneous first-order Markov chains are presented. Furthermore, we
discuss an extension based on a multinomial logit model for the group
membership that also includes explanatory variables. Within the mixture model the
multinomial logit model is included as a logit-type prior model for group
membership. In the first clustering approach called Markov chain clustering the
individual transition probabilities are fixed to a group-specific transition
matrix. In the more general approach called Dirichlet multinomial clustering the
individual transition matrices deviate from the group means and follow
Dirichlet distributions with unknown group-specific hyperparameters.
Estimation is carried out by Markov chain Monte Carlo including an auxiliary
mixture sampler for the parameters of the multinomial logit model. An
application to a panel of Austrian wage mobility data leads to an interesting
segmentation of the Austrian labour market.
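The building block of both clustering approaches is the time-homogeneous first-order transition matrix of a categorical time series. A toy estimator (transition counts with row normalisation, not the MCMC sampler itself, on a hypothetical series) looks like this:

```python
def transition_counts(seq, n_states):
    """Count one-step transitions in a categorical time series."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts

def row_normalise(counts):
    """Turn transition counts into a stochastic transition matrix."""
    out = []
    for row in counts:
        s = sum(row)
        out.append([c / s if s else 0.0 for c in row])
    return out

# Hypothetical categorical series with states 0, 1, 2 (e.g. wage bands).
seq = [0, 0, 1, 2, 1, 1, 0, 2, 2, 1]
P = row_normalise(transition_counts(seq, 3))
```

Markov chain clustering fixes each individual's matrix to a group-specific `P`; Dirichlet multinomial clustering instead lets individual matrices scatter around the group mean via row-wise Dirichlet distributions.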
Annalisa Pascarella, Alberto Sorrentino, Cristina Campi and Michele
Piana
Random Finite Sets in particle filtering for the reconstruction of neural
currents in magnetoencephalography
Magnetoencephalography (MEG) [1] is a non-invasive brain imaging technique
measuring the weak magnetic field due to neural activity with excellent
temporal resolution (1 ms). The analysis of the temporal evolution of the
magnetic field, however, does not provide accurate spatial information about
the neural activations in the cerebral cortex. Such information can be
restored only by solving the inverse problem of reconstructing neural sources
from the dynamical measurements of the magnetic field.
In the dipolar approximation, the neural current is assumed to be the
superposition of a small number of point-wise currents ("dipoles") to be
detected and tracked from the external measurements. The forward model is
strongly non-linear in such a framework; furthermore, the number of sources
is unknown and may vary across time; finally, measurements are affected by
non-white noise coming from several noise sources.
We successfully applied a particle filter based on Random Finite Sets [2] for
tracking the time-varying number of sources from MEG data [3]. First the
number of sources is dynamically determined using the maximum a posteriori of
the marginal probability distribution of the cardinality; then the peaks of
the Probability Hypothesis Density in the brain volume are used as point
estimates of the neural source locations, and the source strengths are
determined through linear least-squares; finally, a clustering procedure is
applied to keep track of the source identities in time.
[1] Hari R, Hämäläinen M, Ilmoniemi R J, Knuutila J, Lounasmaa O V :
Magnetoencephalography - theory, instrumentation and applications to
noninvasive studies of the working human brain, Reviews of Modern Physics,
Vol.65, No.2, 1993
[2] Vihola, M., 2004. Random Sets for Multitarget Tracking and Data Fusion.
Licentiate Thesis, Tampere University of Technology.
[3] Sorrentino, A., Parkkonen, L., Pascarella, A., Campi, C., Piana, M.
Dynamical MEG source modeling with multi-target Bayesian filtering. Human
Brain Mapping (in press).
Alicia Quiros Carretero, Raquel Montes Diez and Simon
Wilson
Brain Activity Detection - Bayesian spatiotemporal model of
fMRI using Transfer Functions
We analyse functional Magnetic Resonance Imaging (fMRI) data to find
areas of brain activity. fMRI is a non-invasive technique for obtaining a
series of images over time under a certain stimulation paradigm and
regions of brain activity are detected by observing differences in blood
magnetism due to hemodynamic response to this stimulus.
In this work we propose a Bayesian spatiotemporal model to analyse fMRI
studies. In the temporal dimension, we parameterise the hemodynamic
response function's shape with a transfer function model. In the spatial
dimension, we use Gaussian Markov random fields priors that embody our
prior knowledge that evoked responses are spatially contiguous and
locally homogeneous. These powerful tools provide a framework for
detecting active regions much as a neurologist might, as they allow us to
consider the level of the voxel magnitudes along with the size of the
activated area.
Due to the model complexity, we use MCMC methods to make inference
over the unknown parameters. Simulations from the model are performed
in order to ascertain the performance of the sampling scheme and the
ability of the posterior to estimate model parameters. Results are shown
on synthetic data and on real data from a block-design fMRI experiment.
Alexandra Schmidt, Nancy L. Garcia and Ronaldo Dias
Bayesian inference for aggregated functional data
In this work we address the problem of estimating mean curves when
the available sample consists of aggregated functional data.
Consider a population divided into sub-populations for which one
wants to estimate the mean (typology) and covariance curves for each
sub-population c=1, ..., C. However, it is not possible (or too
expensive) to obtain sample curves for single individuals. The
available data are collective curves, that is, sums of the curves of
different subsets of individuals belonging to the sub-populations.
More specifically, replicates of these curves are available across
days of a week and observations are made for times t=1,...,T within
a day. Our model specifies that the observed data are decomposed as the
sum, over the C sub-populations, of latent structures which are
independent across sub-populations but temporally correlated. These latent
structures are assumed to follow a Gaussian process whose mean varies with
the sub-population and evolves with time t as a dynamic linear model. The
inference procedure follows the Bayesian paradigm. We apply our model to a
real dataset composed of the electric load of transformers that distribute
energy to different types of consumers.
Exact inference for discretely observed diffusions
The aim of this work is to make Bayesian inference for discretely observed
diffusions, the challenge being that the transition density of the process
is typically unavailable. Competing methods rely on augmenting the data with
the missing paths since there exists an analytical expression for the
complete-data likelihood. Such implementations require a rather fine
discretization of the imputed path leading to convergence issues and
computationally expensive algorithms.
Our method is based on exact simulation of diffusions (Beskos et al., 2006)
and has the advantage that there is no discretization error. We present a
Gibbs sampler for sampling from the posterior distribution of the parameters
and discuss how to increase its efficiency using reparametrizations of the
augmentation scheme (Papaspiliopoulos et al., 2007).
Anna Simoni and Jean-Pierre Florens
On the Regularization Power of the Prior Distribution in
Linear ill-Posed Inverse Problems
We consider a functional equation of type Y = Kx + U in a Hilbert space, where
both U and Y are Gaussian processes. We wish to recover the functional
parameter of interest x after observing Y. This problem is ill-posed because
the operator K is assumed to be compact. We consider a class of models where
the prior distribution on x is able to correct the ill-posedness even for an
infinite-dimensional problem. The prior distribution is a Gaussian process. It
must be of the g-prior type, and it depends on the regularization parameter
and on the degree of penalization. We prove that, under some conditions, the
posterior distribution is consistent in the sampling sense. In particular, the
prior-to-posterior transformation can be interpreted as a Tikhonov
regularization in the Hilbert scale induced by the prior covariance operator.
Finally, the regularization parameter may be treated as a hyperparameter and
may be estimated using its posterior distribution or integrated out.
David Suda and Paul Fearnhead
Importance Sampling On Discretely-Observed Diffusions
This study focuses on importance sampling methods for sampling from
conditioned diffusions. The fact that most diffusion processes have a
transition density which is unknown or intractable makes it desirable to
find an adequate alternative density to simulate from, hence making the
importance sampling approach worth considering. In this study, the first
aim will be to use Bayes' formula to derive the general stochastic
differential equation for a diffusion bridge. This can be used to design
efficient importance sampling proposals. Secondly, the performance of both
existing and newly derived importance samplers shall be assessed on various
types of diffusions by means of Monte Carlo simulation, and a theoretical
explanation of the output shall be sought for each diffusion/sampler
combination.
Enrique ter Horst and Abel Rodriguez
Measuring expectations in options markets: an application to the S&P500
Extracting market expectations has always been an important issue when making
national policies and investment decisions in financial markets. In option
markets, the most popular way has been to extract implied volatilities to
assess the future variability of the underlying with the use of the Black &
Scholes formula. In this manuscript, we propose a novel way to extract the
whole time varying distribution of the market implied asset prices from option
prices. We use a Bayesian nonparametric method that makes use of the
Sethuraman representation for Dirichlet processes to take into account the
evolution of distributions in time. As an illustration, we present the
analysis of options on the S&P500 index.
Svetlana Tishkovskaya, Paul Blackwell and Keith J. Harris
Bayesian Inference for Animal Movement Data in Heterogeneous Environment
The radio-tracking of animals and similar technologies are dominant and
increasingly popular tools used in the developing sciences of wildlife
management and ecology. They have become an important source of data on
movement, behaviour and habitat use, and require the development of
statistical methods to extract meaningful inferences from such data.
We describe stochastic models which aim to capture key features of
realistic patterns of animal movements. To represent the movement
process, a flexible class of models is used in which a diffusion process
can be combined with a discrete behavioural state. Given a discrete state
we consider the multidimensional Ornstein--Uhlenbeck diffusion
process taking into account non-independence of radio-tracking
measurements. Complex patterns of behaviour can be driven not only by
biological mechanisms but also by the heterogeneous environment of the animal.
Modelling spatial heterogeneity, we consider the animal habitat as a partition
containing a finite set of heterogeneous regions and assume that boundaries
between the regions are unobserved and need to be estimated from the data
on movements. The approach to inference is Bayesian, using MCMC techniques,
allowing us to estimate parameters of the movement process and reconstruct
the heterogeneous partition of the animal's environment.
Approximate Bayesian computation (ABC) gives exact results under
the assumption of model error
Approximate Bayesian computation (ABC) or likelihood-free inference algorithms
are used to find approximations to posterior distributions without making
explicit use of the likelihood function, depending instead on simulation of
sample data sets from the model. In this talk I will show that under the
assumption of the existence of a uniform additive model error term, ABC
algorithms give exact results when sufficient summaries are used. This
interpretation allows the approximation made in many previous application
papers to be understood, and should guide the choice of metric, tolerance
and summary statistics in future work. ABC algorithms can be generalized by
replacing the 0-1 cut-off with an acceptance probability that varies
with the distance of the simulated data from the observed data. The acceptance
density gives the distribution of the error term, enabling the uniform error
usually used to be replaced by a general distribution. This generalization can
also be applied to approximate Markov chain Monte Carlo algorithms. In light
of this work, ABC algorithms can be seen as calibration techniques for
implicit stochastic models, inferring parameter values in light of the
computer model, data, prior beliefs about the parameter values, and any
measurement or model errors.
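The generalization described above, replacing the 0-1 cut-off with an acceptance probability that decays with the distance between simulated and observed summaries, can be sketched as follows (a toy normal-mean example with a Gaussian error kernel; all names and values are illustrative):

```python
import math
import random

def abc_kernel(observed, simulate, prior_draw, scale, n_draws, rng):
    """ABC sampler where the usual 0-1 tolerance cut-off is replaced by a
    Gaussian acceptance probability in the distance between simulated and
    observed summaries, i.e. the model-error density."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        d = simulate(theta, rng) - observed
        # Accept with probability given by the error kernel at distance d.
        if rng.random() < math.exp(-0.5 * (d / scale) ** 2):
            accepted.append(theta)
    return accepted

rng = random.Random(7)
obs = 1.0
sim = lambda th, r: sum(r.gauss(th, 1.0) for _ in range(30)) / 30
post = abc_kernel(obs, sim, lambda r: r.uniform(-4, 4), scale=0.3,
                  n_draws=3000, rng=rng)
```

Setting the kernel to an indicator of distance below a tolerance recovers standard rejection ABC, so the uniform-error interpretation in the talk contains the usual algorithm as a special case.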
Ole Winther and Ricardo Henao
Learning Graphical Model Structure with
Sparse Bayesian Factor Models and Process Priors
Abstract available online.