Conference on BAYESIAN NONPARAMETRICS

BELGIRATE (ITALY) June, 17-19th, 1997

ABSTRACTS




Elja Arjas and Juha Heikkinen

Nonparametric Bayesian intensity estimation from point pattern data

Until recently, nonparametric Bayesian estimation of intensities, or of distributions, was in practice confined to models based on the Dirichlet process or its extensions. The tractability of such inferential methods depended crucially on the analytic special properties of these models. With the advent of Markov chain Monte Carlo techniques in Bayesian estimation, and particularly after the introduction of variable dimensional models, it has become attractive and numerically feasible to employ nonparametric model approximations based on classes of simple functions, such as piecewise constant, piecewise linear, or splines. In particular, averaging with respect to the posterior produces a flexible class of smooth nonparametric function estimates. Extensions to higher dimensions, for example, by employing a Voronoi tessellation to support the approximating simple functions, can be considered with relative ease. In the talk, these ideas are discussed, and some concrete examples are considered.
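The posterior-averaging idea can be illustrated with a minimal sketch (purely illustrative, not the authors' reversible-jump sampler): each "posterior draw" below is a piecewise-constant intensity on [0, 1] with a random number of change points, and pointwise averaging over many draws yields a smooth estimate.

```python
import random

random.seed(0)

# Toy illustration: each "posterior draw" is a piecewise-constant intensity
# on [0, 1] with random change points and levels; averaging many such draws
# over a fine grid produces a smooth intensity estimate.

def random_step_intensity(max_pieces=5):
    k = random.randint(1, max_pieces)                      # number of pieces
    cuts = sorted(random.random() for _ in range(k - 1))   # change points
    levels = [random.uniform(0.5, 2.0) for _ in range(k)]  # piece heights
    def intensity(t):
        idx = sum(c <= t for c in cuts)                    # locate the piece
        return levels[idx]
    return intensity

draws = [random_step_intensity() for _ in range(500)]
grid = [i / 100 for i in range(101)]
posterior_mean = [sum(f(t) for f in draws) / len(draws) for t in grid]

# The pointwise average is far smoother than any single step-function draw.
print(min(posterior_mean), max(posterior_mean))
```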



Jose M. Bernardo

Bayesian Kernel Density Estimation

Given a set of exchangeable observations from an unknown distribution q(x), we define a family of kernel density functions built from a parametric density with a location parameter. We then derive the corresponding reference posterior distribution, and use this to obtain a Bayesian kernel estimate of the unknown distribution as the corresponding reference predictive density. We discuss some simulated examples and gauge the corresponding results using the logarithmic divergence between the true and estimated densities as an appropriate loss function.
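A minimal sketch of the two ingredients (not the paper's reference-posterior construction): a plain Gaussian kernel density estimate standing in for the Bayesian kernel estimate, gauged against the true density with the logarithmic (Kullback-Leibler type) divergence as a loss function. Sample size, bandwidth, and grid are illustrative choices.

```python
import math, random

random.seed(1)

def gauss(x, mu, sd):
    # Gaussian density with mean mu and standard deviation sd
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def kde(x, data, h=0.3):
    # kernel density estimate: average of Gaussian kernels centred at the data
    return sum(gauss(x, xi, h) for xi in data) / len(data)

def log_divergence(p, q, grid, dx):
    # discretized integral of p(x) * log(p(x) / q(x))
    return sum(p(x) * math.log(p(x) / q(x)) * dx for x in grid)

data = [random.gauss(0.0, 1.0) for _ in range(200)]
true_density = lambda x: gauss(x, 0.0, 1.0)
estimate = lambda x: kde(x, data)

grid = [-4.0 + 0.05 * i for i in range(161)]
d = log_divergence(true_density, estimate, grid, 0.05)
print("logarithmic divergence:", d)
```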




Cinzia Carota

Further Results with Dirichlet process priors

We consider a three-stage hierarchical model where the data, conditional on a random distribution function F, are independent and identically distributed according to F; F is a Dirichlet process with mean G; and G itself, in the more general case, is unknown and generated by a Dirichlet process. Under these hypotheses we derive the posterior distribution of G and the predictive distribution of a future observation. The case when G is a known (up to real parameters) discrete distribution function is also considered in order to stress some inferential implications of the discreteness of the mean of a Dirichlet process. We provide two applications, to the evaluation of Bayes factors for model selection and to the estimation of a mean. Some examples based on real data are presented.
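In the simplest single-level case, the predictive distribution of a Dirichlet process can be sketched with the standard Polya-urn scheme: a new observation equals a fresh draw from the base measure with probability M / (M + n) and repeats a past value otherwise, which makes the ties behind the discreteness issue appear naturally. The total mass M and the Uniform(0, 1) base are illustrative choices.

```python
import random

random.seed(2)

# Hedged sketch (single-level case only): sequential sampling from the
# Polya-urn predictive of a Dirichlet process with total mass M and a
# Uniform(0, 1) base measure.

def polya_urn_sample(n, M=1.0, base=random.random):
    draws = []
    for i in range(n):
        if random.random() < M / (M + i):
            draws.append(base())                # fresh draw from the base
        else:
            draws.append(random.choice(draws))  # repeat a past value (a tie)
    return draws

draws = polya_urn_sample(200, M=1.0)
n_distinct = len(set(draws))
print("distinct values among 200 draws:", n_distinct)
```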



Persi Diaconis

Markov moment problems and de Finetti's theorem

We discuss Markov's elegant solution of the moment problem for bounded densities through work of Cifarelli and Regazzini, and the problem of giving finite conditions for the mixing measure in de Finetti's theorem to have a density.



Michael Escobar

Applications with Dirichlet Processes: Revisiting the Beckett-Diaconis Tack Data

In the last Bayesian Valencia meeting, Prof. Diaconis challenged the nonparametric Bayesian community to fit a model to a data set containing the results of 320 tosses of a tack. Previous investigations of this data have appeared unsatisfactory. In addition to explaining the unusual results and describing how to get the ``usual'' results, I hope to answer the question, ``When is nonparametric Bayesian analysis (with Dirichlet processes) not nonparametric?''



Sandra Fortini, Lucia Ladelli and Eugenio Regazzini

A Characterization of Exchangeable Laws

De Finetti characterized sequences of exchangeable, real-valued random variables in terms of two conditions:
1. the joint law of any finite subset is symmetric in its arguments;
2. the predictive distributions satisfy a suitable consistency condition.

In this communication we extend the previous characterization to sequences of exchangeable random elements taking values in an arbitrary space.

Moreover we use this extension to revisit well known characterizations of the laws of a Dirichlet process and of a Pólya tree process.




Alessandra Guglielmi

Numerical analysis for distribution functions of means of a Dirichlet process

In this paper, an approximation of the distribution function of the mean of a Dirichlet process is derived, with respect to the sup metric. The approximating function, simple to compute numerically, is the distribution function of the mean of a Dirichlet process whose parameter measure is a ``discretized'' version of the original one.
Furthermore, a direct and simple procedure to derive the exact distribution function (found by Cifarelli and Regazzini in 1990) will be illustrated, starting from a characterization of a Dirichlet process given by Feigin and Tweedie in 1989.
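As an illustration (not the discretization procedure of the paper), the distribution function of the mean of a Dirichlet process can be approximated by Monte Carlo using truncated stick-breaking; the total mass, base measure, and truncation level below are assumed choices.

```python
import random

random.seed(3)

# Monte Carlo sketch of the distribution function of the mean of a
# Dirichlet process with total mass M and a Uniform(0, 1) base measure,
# via truncated stick-breaking.

def dp_mean_draw(M=2.0, trunc=200):
    remaining, mean = 1.0, 0.0
    for _ in range(trunc):
        v = random.betavariate(1, M)   # stick-breaking proportion
        w = remaining * v
        mean += w * random.random()    # weight times an atom from the base
        remaining -= w
    return mean + remaining * random.random()  # fold in the leftover mass

means = sorted(dp_mean_draw() for _ in range(2000))

def cdf(x):
    # empirical distribution function of the Dirichlet-process mean
    return sum(m <= x for m in means) / len(means)

print(cdf(0.25), cdf(0.5), cdf(0.75))
```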



Keisuke Hirano

Semiparametric Bayesian Models for Longitudinal Earnings Data

Longitudinal data on earnings are useful to study economic behavior in an intertemporal setting, since earnings risk plays an important role in theories of consumption, savings, and asset allocation. This paper studies data from the Michigan Panel Study of Income Dynamics, using simple models in which some components are allowed to be nonparametric. These components are modelled as having a mixture of normals representation, where the mixture distribution is given a Dirichlet process prior. Predictive distributions based on these models are compared to predictive distributions based on commonly used parametric models, such as autoregressive models with normally distributed random effects.



Nils Lid Hjort

Local Bayesian Regression

A wide class of Bayesian non- and semiparametric methods is developed for estimating regression curves and surfaces. The main idea is to model the regression as locally linear and then place suitable local priors on the local parameters. The method requires the posterior distribution of the local parameters given local data, and this is found via a suitably constructed local likelihood function. When the width of the local data window is large the methods reduce to familiar fully parametric Bayesian methods, and when the width is small the estimators are essentially nonparametric. When noninformative reference priors are used the resulting estimators coincide with recently developed well-performing local weighted least squares methods for nonparametric regression.
Each local prior distribution needs in general a centre parameter and a variance parameter. Of particular interest are versions of the scheme that are more or less automatic and objective in the sense that they do not require subjective specifications of prior parameters. We therefore develop empirical Bayes methods to obtain the variance parameter and a hierarchical Bayes method to account for uncertainty in the choice of centre parameter. There are several possible versions of the general programme, and some of its specialisations are discussed. It is hoped such special versions will be capable of outperforming standard nonparametric regression methods, particularly in situations with several covariates.
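A minimal sketch of the local idea, under assumptions of my own (Gaussian kernel weights, a zero-centred Gaussian prior acting as a ridge penalty on the local intercept and slope, and illustrative bandwidth and precision constants): the posterior mode is a penalized locally weighted least squares fit, and letting the prior precision go to zero recovers the local weighted least squares estimators mentioned above.

```python
import math, random

random.seed(4)

def local_bayes_fit(x0, xs, ys, bandwidth=0.1, prior_prec=0.1):
    # Gaussian kernel weights centred at x0
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    # weighted design moments for the local model y = a + b * (x - x0)
    S0 = sum(w)
    S1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    S2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    T0 = sum(wi * y for wi, y in zip(w, ys))
    T1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    # posterior mode = ridge-penalized weighted least squares (2x2 solve)
    A = [[S0 + prior_prec, S1], [S1, S2 + prior_prec]]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    a = (T0 * A[1][1] - T1 * A[0][1]) / det
    return a  # local intercept = fitted value at x0

xs = [i / 50 for i in range(51)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0, 0.1) for x in xs]
fit = [local_bayes_fit(x0, xs, ys) for x0 in xs]
print(fit[12], fit[25])
```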



Pier Luigi Conti and Giovanna Jona Lasinio

Bayesian and Linear Bayesian Inference on Regression Functions using Orthonormal Expansions

In the usual setting of fixed-design nonparametric regression, we find a Bayesian estimator of the regression function using its orthonormal expansion in a separable Banach space. We start by considering Gaussian errors with known variance and choosing a Gaussian prior distribution for the coefficients in the series expansion, and then we find the prior and posterior process on the space F where the regression function f lives.
We study the trajectories of the posterior process and we give conditions on the basis functions to ensure the continuity of such trajectories. As an estimator of f we take the posterior process mean and we give an approximation procedure based on the Bayes factor.
Then we consider a second order process as prior measure on the function space F and errors not necessarily normally distributed. In this setting we find the ``best quadratic approximation'' to the posterior process mean function, i.e. we find a linear Bayesian estimator of f.



Prakash Laud

Bayesian Analysis of Models for Survival Data

Implementation of full Bayesian inference for Cox's proportional hazards regression model is considered using (i) Hjort's beta process prior on the baseline cumulative hazard and (ii) the extended gamma process and its variants on the baseline hazard rate. Model selection and related issues arising in the analysis of survival data are discussed.



Michael Lavine

Polya trees: a review

Polya trees form a class of distributions for a random probability measure intermediate between Dirichlet processes (Ferguson 1973) and tailfree processes (Freedman 1963 and Fabius 1964). Their advantage over Dirichlet processes is that they can be constructed to give probability one to the set of continuous or absolutely continuous probability measures, while their advantage over more general tailfree processes is their much greater tractability. We will review the definition and construction of Polya trees and, if time permits, some of their recent applications.
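The construction can be sketched as follows (a toy illustration, with the canonical depth-m parameters alpha = c * m^2, the standard choice yielding absolutely continuous random measures; c and the truncation depth are illustrative). The Beta branch probabilities are drawn once per node and cached, so all observations come from the same realized random distribution.

```python
import random

random.seed(5)

branch_prob = {}   # node label -> realized probability of descending left

def polya_tree_draw(depth=10, c=1.0):
    # one observation from a realized Polya tree on [0, 1)
    lo, hi, node = 0.0, 1.0, ""
    for m in range(1, depth + 1):
        if node not in branch_prob:
            alpha = c * m * m                          # depth-dependent parameter
            branch_prob[node] = random.betavariate(alpha, alpha)
        if random.random() < branch_prob[node]:
            hi, node = (lo + hi) / 2, node + "0"       # go left
        else:
            lo, node = (lo + hi) / 2, node + "1"       # go right
    return random.uniform(lo, hi)  # spread uniformly within the final bin

sample = [polya_tree_draw() for _ in range(2000)]
mean_val = sum(sample) / len(sample)
print("mean of the sampled distribution:", mean_val)
```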



Donald Malec and Peter Müller

Small Area Estimation of U.S. Health Outcomes Using Mixtures of Dirichlet Processes

For public health management, there is a need to produce subnational estimates of health outcomes. Often, however, funds are not available to collect samples large enough to produce traditional survey sample estimates for each subnational area. Although parametric hierarchical models have been successfully used to make estimates from small samples, there is a concern that the geographic diversity of the U.S. population may be oversimplified in these models. Here, a Dirichlet Process is used to describe the geographic variability component of the model. Results are compared to a parametric model which uses the base measure of the Dirichlet process as a fully parametric prior. Binary health outcomes are modeled. Differences and improvements are noted.



Maura Mezzetti and Paolo Giudici

Nonparametric estimation of survival function by means of partial exchangeability structure

In the analysis of survival data most of the statistical models relate the response to a set of explanatory variables. However, selection and proper design of the latter may become a difficult task, particularly in the preliminary stage, when the information is limited. We propose an alternative methodology which evaluates the relative importance of each potential prognostic factor.

To achieve this aim, each explanatory variable is associated to the partition of the observations induced by its levels. We then consider, conditionally on each partition, a hierarchical Bayesian model on the hazard rates. We then derive as a measure of the importance of each prognostic factor, the posterior probability of each partition. Such probabilities are finally employed to estimate the hazard functions by averaging the estimated conditional hazard over the set of all partitions.

More specifically, for a collection of n individuals, let T_i be a random variable representing the failure time of subject i, possibly censored. In order to better illustrate and interpret our proposed methodology, we shall consider a collection of discrete failure times t_1, t_2, ..., as in Hjort (1990), and let the individual hazard at time t_k be h_i(t_k) = P(T_i = t_k | T_i >= t_k).

Given a collection of observed discrete times, the likelihood of the hazards is

L(h) = prod_k prod_{i in R_k} h_i(t_k)^{d_ik} (1 - h_i(t_k))^{1 - d_ik},

where d_ik indicates whether subject i fails at time t_k, and R_k is the set of subjects at risk at t_k.

Now let g denote a partition of the index set {1, ..., n} into subsets I_1, ..., I_J. In other words, g defines a partition of the subjects, under the assumption that the hazards of the individuals in the same subset are equal.

The consideration of a partition of the individuals leads to a new likelihood for the hazards, which is conditional upon g:

L(h | g) = prod_j prod_k h_j(t_k)^{d_jk} (1 - h_j(t_k))^{r_jk - d_jk},

where d_jk is the number of failures in the k-th interval for group j, and r_jk is the number of subjects at risk in the k-th interval for group j.

In our exploratory approach, we shall entertain several partition structures. Each partition corresponds to a specific stratification of a potential prognostic factor. This amounts to consider a collection of alternative partial exchangeability structures for the survival times, as suggested in the context of binomial studies by Consonni and Veronese (1995).

Our first aim is to evaluate the importance of each prognostic factor. This can be achieved by calculating, given the observed evidence D, the posterior probability of each partition, p(g|D). Our second aim is to estimate the hazard function, in order to make predictions on survival times. This task can be performed in two steps: first we work conditionally on a partition g, and determine a Bayesian estimate of each individual hazard by calculating the posterior means E(h_i | g, D). The second step of the estimation procedure involves using p(g|D) to calculate the marginal posterior expectation of each individual hazard via the law of total probability:

E(h_i | D) = sum_g E(h_i | g, D) p(g|D).

In order to perform the above computation, two alternative prior specifications on the hazards will be considered. First we take, conditionally on a partition g, each cumulative hazard as a stochastic process with independent summands, each distributed according to a common parametric family indexed by positive constants.

We then consider a Markovian dependency structure among the hazards in different time intervals, similar to that suggested in Arjas and Gasbarra (1994). This prior specification has the advantage of requiring fewer hyperparameters and of allowing borrowing of strength between time intervals, thus making the cumulative hazards marginally dependent. The proposed class of prior distributions corresponds to treating the hazard increments as exchangeable.

The two resulting methodologies are applied and compared, using appropriate MCMC methods, on both simulated and observed survival data.
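A toy sketch of the two estimation steps in the simplest conjugate case (my own illustration, not the paper's prior processes): discrete-time hazards with independent Beta(a, b) priors, two candidate partitions of eight subjects (all pooled versus split into two groups of four), posterior partition probabilities from the beta-binomial marginal likelihood, and the model-averaged hazard. All counts and hyperparameters are invented.

```python
import math

a, b = 1.0, 1.0                                   # Beta prior hyperparameters

def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def log_marginal(d, r):
    # evidence of one (group, interval) cell: d failures out of r at risk,
    # with the hazard integrated out against its Beta(a, b) prior
    return log_beta(a + d, b + r - d) - log_beta(a, b)

def posterior_mean_hazard(d, r):
    return (a + d) / (a + b + r)

# (failures, at risk) per interval, for each group of a partition;
# the split counts add up to the pooled ones
pooled = [[(3, 8), (2, 5)]]                       # one group, two intervals
split = [[(3, 4), (1, 1)], [(0, 4), (1, 4)]]      # two groups, two intervals

def partition_log_evidence(groups):
    return sum(log_marginal(d, r) for cells in groups for d, r in cells)

log_ev = [partition_log_evidence(pooled), partition_log_evidence(split)]
mx = max(log_ev)
weights = [math.exp(le - mx) for le in log_ev]
p_g = [w / sum(weights) for w in weights]         # posterior over partitions

# model-averaged hazard in the first interval for a subject in group 1
h_avg = (p_g[0] * posterior_mean_hazard(3, 8)
         + p_g[1] * posterior_mean_hazard(3, 4))
print("p(partition | D):", p_g, " averaged hazard:", h_avg)
```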

Essential References

Arjas, E. and Gasbarra, D. (1994). Nonparametric Bayesian inference from right censored survival data, using the Gibbs Sampler. Statistica Sinica 4, 505-524.

Consonni, G. and Veronese, P. (1995). A Bayesian method for combining results from several binomial experiments. Journal of the American Statistical Association 90, 935-944.

Hjort, N. L. (1990). Non parametric Bayes estimators based on beta processes in models for life history data. Annals of Statistics 18, 1259-1294.




Pietro Muliere and Stephen Walker

Prior processes in the presence of censoring

This paper presents a Bayesian nonparametric approach to survival analysis based on arbitrarily right censored data. The first aim will be to show how the presence of censoring suggests the form of the prediction and implies that the neutral to the right process is the natural prior to use in this context. Secondly, the properties of a particular neutral to the right process called the ``beta-Stacy process'' are examined. Finally, the connections between some Bayesian bootstraps and the beta-Stacy process are investigated.



Peter Müller and Gary Rosner

A longitudinal data model with non-parametric random effects prior

Repeated measurements on individuals often change as a nonlinear function of time, making analysis difficult. We propose a class of nonlinear population models with nonparametric second-stage priors for the distribution of the subject-specific profile parameters. We describe a full posterior analysis in a Bayesian framework, including prediction of profiles (and derived functionals) for future subjects, estimation of the mean response function for observed individuals, and flexible nonlinear regression on covariates. The second-stage prior takes the form of a mixture of normals; the random mixture measure is, in turn, generated by a Dirichlet process. We illustrate the proposed model with pharmacodynamic data concerning cancer patients treated with high-dose chemotherapy.
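The second-stage prior can be sketched by generating one random density from a mixture of normals whose mixing measure comes from a truncated Dirichlet process, and then simulating subject-specific parameters from it. The mass, base measure, kernel scale, and truncation level are illustrative assumptions, not the paper's settings.

```python
import random

random.seed(7)

def dp_mixture_of_normals(M=2.0, trunc=30, kernel_sd=0.5):
    # build weights and atoms of a truncated stick-breaking representation
    weights, atoms, remaining = [], [], 1.0
    for _ in range(trunc):
        v = random.betavariate(1, M)           # stick-breaking proportion
        weights.append(remaining * v)
        atoms.append(random.gauss(0.0, 2.0))   # atom from the base measure
        remaining -= weights[-1]
    weights[-1] += remaining                   # close off the truncation
    def draw():
        # sample an atom by its weight, then add the normal kernel
        u, acc = random.random(), 0.0
        for w, mu in zip(weights, atoms):
            acc += w
            if u <= acc:
                return random.gauss(mu, kernel_sd)
        return random.gauss(atoms[-1], kernel_sd)
    return draw

draw_effect = dp_mixture_of_normals()
effects = [draw_effect() for _ in range(500)]
print("first few random effects:", effects[:3])
```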



Andrea Ongaro

A general class of nonparametric prior distributions which includes the Dirichlet process

A new class C of nonparametric prior distributions, which generalizes the Dirichlet process, is proposed. A number of properties of this class as well as a relatively tractable expression of its posterior distribution are derived. Because of the method used in its construction, the class C retains to a considerable extent the tractability and interpretability of the Dirichlet process. On the other hand, the class C, unlike the Dirichlet process, allows for the possibility of giving special weights to repeated observations.


Giovanni Petris

Bayesian Analysis of Long Memory Time Series

In recent years there has been a growing interest for the statistical analysis of long memory processes, i.e., stationary processes with a spectral density presenting a pole at the origin. I will illustrate a Bayesian nonparametric approach to the problem. I propose a prior distribution on a class of spectral densities which properly includes a set of densities having a pole at the zero frequency. This allows to test for the presence of a long memory behavior and to compare the prior and posterior probability of the data coming from a long memory process, in order to see how strongly the data support this hypothesis. The basic model is then extended to include a linear regression term for the mean of the process.
Using Markov Chain Monte Carlo techniques, samples of the spectral density and the regression parameters from the posterior distribution can be obtained. With some modifications, the simulation scheme can be used to obtain also a sample from the predictive distribution of the future observations.
The methodology described is applied to the analysis of the average temperature of the Southern emisphere over the last century.



Sonia Petrone

A prior for Bayesian nonparametric inference based on Bernstein polynomials

I define random Bernstein polynomials and show how they can be used for obtaining a prior on the class D of distribution functions on [0, 1]. The prior has full weak support and selects an absolutely continuous distribution function, with continuous and smooth derivative. As a particular case, I discuss a smoothing of the Dirichlet process. The continuous nature of the Bernstein prior provides a continuous and smooth predictive density, which is a desirable property in problems such as density estimation. Some applications will be presented. For computation, I use a Markov chain Monte Carlo algorithm with some novel aspects, since the problem at hand has a ``changing dimension'' parameter space.
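One draw from a random Bernstein density can be sketched as f(x) = sum_j w_j Beta(x; j, k - j + 1) with Dirichlet weights (here obtained by normalizing Gamma variables); the order k and the Dirichlet parameter are illustrative choices of mine.

```python
import math, random

random.seed(8)

k = 8
gammas = [random.gammavariate(1.0, 1.0) for _ in range(k)]
w = [g / sum(gammas) for g in gammas]           # Dirichlet(1, ..., 1) weights

def beta_pdf(x, a, b):
    # Beta(a, b) density via log-gamma for numerical stability
    logB = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(x) + (b - 1) * math.log(1 - x) - logB)

def bernstein_density(x):
    # mixture of Beta(j + 1, k - j) densities with Dirichlet weights
    return sum(w[j] * beta_pdf(x, j + 1, k - j) for j in range(k))

# the draw is a genuine smooth density: it integrates to one on [0, 1]
grid = [0.005 + 0.01 * i for i in range(100)]
integral = sum(bernstein_density(x) * 0.01 for x in grid)
print("integral over [0, 1]:", integral)
```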


Wolfgang Polasek and H. Kozumi

A Bayesian semiparametric analysis of ARCH models

This paper provides a Bayesian analysis of a semiparametric autoregressive conditional heteroskedasticity (ARCH) model. We propose a semiparametric ARCH model based on Dirichlet process mixing (DPM), called the DPM-ARCH model. We apply a Markov Chain Monte Carlo (MCMC) method for the posterior inference. The method is demonstrated for the monthly time series of exchange rates for the Japanese Yen to the US Dollar.
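The parametric skeleton of the model can be sketched as y_t = sigma_t * eps_t with sigma_t^2 = omega + alpha * y_{t-1}^2; in the DPM-ARCH model the normal distribution of eps_t would be replaced by a Dirichlet-process mixture, while the sketch below keeps eps_t Gaussian and uses invented parameter values.

```python
import math, random

random.seed(9)

omega, alpha = 0.2, 0.5    # illustrative ARCH(1) parameters (alpha < 1)
y, series = 0.0, []
for _ in range(2000):
    sigma2 = omega + alpha * y * y            # conditional variance
    y = math.sqrt(sigma2) * random.gauss(0.0, 1.0)
    series.append(y)

# the stationary unconditional variance is omega / (1 - alpha) = 0.4
var = sum(v * v for v in series) / len(series)
print("empirical variance:", var)
```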


Eugenio Regazzini and Viatcheslav V. Sazonov

On approximation of random probability measures by mixtures of Dirichlet processes with application to Bayesian inference

In the general setting of nonparametric Bayesian inference when observations take values in a Polish space X, priors are approximated (in the Prokhorov metric) to any precision by explicitly constructed mixtures of the distributions of Dirichlet processes. It is shown that if these mixtures converge weakly to a given prior, then the corresponding posteriors (when observations are exchangeable) converge weakly to the posterior of that prior. In the case when X is finite, approximations of the prior distribution function in the Kolmogorov metric are also given. Finally, the previous results are applied to the analysis of functionals of a random probability measure having the given prior distribution.


Emma Sarno

Dependence Structures of Polya Tree Autoregressive Models

Polya trees form a class of distributions for a random probability measure. They are defined through a set of independent Beta distributions over a dyadic partition of a separable measurable space Omega. Following the recent developments of models for dependent data and autoregressive processes based on Polya trees (by M. West, 1997, reported at this conference), the dyadic expansion of an observation x_t, defined on Omega, is used here to investigate the dependence structure of an autoregressive process of order one under non-normal marginal distributions of x_t. As in West (1997), we assume that the dyadic representations of two consecutive observations are identical up to level k (i.e. the last matching point), with k distributed as a Geometric random variable with parameter P, so that P models the autoregressive structure of the time series: a large k gives strong dependence. It is shown that small values of P produce an almost linear relationship between x_t and x_{t+1}, whereas P equal to one leads to independence, but, in general, values of P between zero and one allow for non-linear dependence. Moreover, the conditional distributions p(x_t|x_{t+1}) are derived for a few marginal distributions of x_t and some numerical results are discussed.
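The dyadic mechanism can be simulated directly (a sketch on [0, 1) with uniform marginals and a truncation at 30 bits, both my own simplifications): x_{t+1} keeps the first k binary digits of x_t and redraws the rest, with k geometric with parameter P, so that small P gives strong lag-one dependence and P near one gives near-independence.

```python
import random

random.seed(10)

def next_value(x, P, max_bits=30):
    # k ~ Geometric(P), counting from 0, truncated at max_bits
    k = 0
    while random.random() > P and k < max_bits:
        k += 1
    kept = int(x * 2 ** k) / 2 ** k               # first k dyadic digits of x
    return kept + random.random() / 2 ** k        # redraw the remaining digits

def simulate(P, n=3000):
    xs = [random.random()]
    for _ in range(n - 1):
        xs.append(next_value(xs[-1], P))
    return xs

def lag1_corr(xs):
    # sample lag-one autocorrelation
    n, mx = len(xs) - 1, sum(xs) / len(xs)
    num = sum((xs[i] - mx) * (xs[i + 1] - mx) for i in range(n))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

c_low = lag1_corr(simulate(0.1))    # deep matching: strong dependence
c_high = lag1_corr(simulate(0.9))   # shallow matching: weak dependence
print("P=0.1:", c_low, " P=0.9:", c_high)
```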


Luca Tardella

Some Robustness Issues in Modelling Bayesian Nonparametrics

This paper deals with the problem of evaluating the range of conditional inference for an exchangeable sequence of observations when only some partial predictive information about observables is available. Here, we restrict attention to a particular class of processes arising from directing measures that are Tail-Free (Freedman, 1963; Fabius, 1964), namely the class of Exchangeable Trees (Monticino, 1996). This class contains other classes of processes widely used in Bayesian Nonparametrics such as Polya Trees and Dirichlet Processes.


Angelika van der Linde

Smoothing in Generalized Regression Models

In generalized regression models one may combine the distributional assumption of an exponential family for the observations with the modelling assumption of ``smoothness'' for a nonparametric predictor. Typically the asymptotics of maximum likelihood then no longer hold, and if one requires more than an exploratory analysis one can proceed only by referring to the Bayesian interpretation of smoothing.
Within this interpretation the smoothing parameter occurs as a hyperparameter. Most often the analysis is (effectively) done as an approximate empirical Bayes analysis, using a Taylor expansion of the likelihood (``working observations'') and a cross-validatory technique to determine the smoothing parameter. An ``exact'' fully Bayesian approach computationally requires Gibbs sampling.
In this talk these alternatives will be discussed, and some experiences reported and illustrated by examples.



Stephen Walker

A Note on the Scale Parameter of the Dirichlet Process

This paper gives an interpretation for the scale parameter of a Dirichlet process when the aim is to estimate a linear functional of an unknown probability distribution. We provide exact first and second posterior moments for such functionals under both informative and noninformative prior specifications. The noninformative case provides a normal approximation to the Bayesian bootstrap.
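For the mean functional mu = integral x dF(x), the exact posterior moments under a Dirichlet process prior with total mass M and base distribution H are standard: the base is updated to H_n = (M*H + sum of point masses at the data) / (M + n), the posterior mean of mu is the mean of H_n, and the posterior variance is the variance of H_n divided by (M + n + 1); M -> 0 gives the Bayesian bootstrap limit mentioned above. A sketch with invented data and hyperparameters:

```python
data = [1.2, 0.7, 2.1, 1.5, 0.9, 1.8]     # illustrative observations
n = len(data)
M = 2.0                                    # scale (total mass) parameter
prior_mean, prior_var = 1.0, 0.25          # moments of the base measure H

xbar = sum(data) / n
second_moment_data = sum(x * x for x in data) / n

# moments of the updated base H_n (mixture of H and the empirical measure)
post_base_mean = (M * prior_mean + n * xbar) / (M + n)
post_base_m2 = (M * (prior_var + prior_mean ** 2)
                + n * second_moment_data) / (M + n)
post_base_var = post_base_m2 - post_base_mean ** 2

post_mean = post_base_mean                 # exact posterior mean of mu
post_var = post_base_var / (M + n + 1)     # exact posterior variance of mu
print("posterior mean:", post_mean, " posterior variance:", post_var)
```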


Mike West and Fabrizio Ruggeri

Time series on Polya trees: Nonparametric Bayesian autoregressions

This talk presents theoretical and modelling developments in nonparametric Bayesian analysis of dependent data, with one specific focus on the development of Polya tree models for autoregressive processes. Novel constructions of stationary AR processes with arbitrary marginal distributions lead into a framework for inference on dependencies and marginal structure using Polya tree priors. Characteristics of the models, theoretical and simulation-based, are discussed, as are issues of model fitting and posterior and predictive inference.