ABS11 - 2011 Applied Bayesian Statistics School

HIERARCHICAL MODELING FOR ENVIRONMENTAL PROCESSES

EURAC, Bolzano/Bozen, Italy

June 20-24, 2011

PARTICIPANTS' TALKS

 


Marek Brabec

Uncertainty in PM10 Spatial Modelling

We will talk about a collaborative project between the Institute of Computer Science in Prague and the Czech Hydrometeorological Institute in which annual PM10 concentrations are modelled spatially, based on measurements from the Czech air quality network. While most of the project activities are based on classical geostatistics, we will concentrate on uncertainty assessments for various model-derived quantities. One of them is the boundary of the "lower air quality" areas, which currently attract a lot of practical interest in the Czech Republic and elsewhere and which are not easy to delineate accurately.


Tuomas Kukko, Harri Högmander and Jyrki Pusenius

Estimating the Size and Structure of Local Moose Populations

RESEARCH BACKGROUND AND AIMS

Moose (Alces alces) is the most significant game animal in Finland. Measured by weight, moose accounts for approximately 80 percent of the annual game bag. On the other hand, every year the moose population causes a considerable number of traffic accidents and damage worth millions of euros to young forests. Hence, strict control of local moose densities by hunting is necessary.

In order to advance the collection of data for monitoring moose populations, the Finnish Game and Fisheries Research Institute launched a specific moose observation card for hunting groups in the early 1970s. Since then, monitoring methods have been based on observation card data, various ground and aerial counts and different population indices. However, the methods applied to estimate the totals have been rather ad hoc in nature, and there has been a lot of dispute over the results between different interest groups. Therefore, a more scientifically based and transparent estimation procedure would be highly welcome.

Our main goal is to develop a statistical computation method or model that makes optimal use of information from several direct or indirect sources of data on the status of and changes in moose populations. Other significant improvements are also possible, such as accuracy estimates for the population size point estimates and learning more not only about the size but also about the age and sex structure of local moose populations. In the future, the model could be developed to give even more comprehensive estimates related to forecasting forest damage and traffic accidents under different decision-making policies for annual and areal hunting allocation.

 

METHODS

Since the development of a moose population can be considered a dynamic process, an obvious choice of assessment tool is a population dynamics model. The population dynamics model is divided into smaller sub-models for, e.g., the breeding, sex assignment, hunting and survival processes. All the sub-processes are modeled in a Bayesian framework for four sub-populations (states): adult males, adult females, male calves and female calves.

The key data sources are linked to the Bayesian population model through separate likelihood functions. The most influential likelihood function connects the daily numbers of adult moose observations on the observation cards to the corresponding standing stock. These numbers are assumed to be binomially distributed, with the unknown population size as parameter N and the unknown observation probability as parameter p. Other likelihoods are set for hunters’ estimates of the population sizes, sex ratio estimates based on hunters’ observations, annual catch (or observations) per effort indices and harvest structure indices, which are known to predict changes in the population size. The dynamic model with all its auxiliary components is implemented in the R environment and is run with the WinBUGS program using the R2WinBUGS software package.
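
The binomial observation likelihood described above can be illustrated with a small sketch. This is not the authors' WinBUGS model: it is written in Python for illustration only, treats the observation probability p as known, puts a flat prior on N over a grid, and uses made-up daily counts. All names and numbers are assumptions.

```python
import math

def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability of observing k animals."""
    if k < 0 or k > n:
        return float("-inf")
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def posterior_over_n(counts, p, n_grid):
    """Posterior of the population size N on a grid.

    Assumptions (not from the abstract): flat prior on the grid,
    p treated as known, daily counts independent given N and p.
    """
    log_post = [sum(binom_logpmf(k, n, p) for k in counts) for n in n_grid]
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

counts = [12, 9, 15, 11, 10]   # hypothetical daily numbers of observed adults
p = 0.05                       # hypothetical observation probability
n_grid = list(range(50, 601))  # candidate population sizes

post = posterior_over_n(counts, p, n_grid)
n_map = n_grid[post.index(max(post))]  # most probable population size
```

In the full model both N and p are unknown and N evolves through the population dynamics, so a sampler such as the one in WinBUGS replaces this grid evaluation.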

 

RESULTS AND CONCLUSIONS

For now, our population dynamics model works well and produces plausible results at least for the southern half of Finland, where hunting and data collection are well organized. However, for some areas additional information is required to achieve satisfactory results. This is particularly the case in northern Finland, where, compared to the southern populations, the moose density is regarded as geographically more heterogeneous and the tendency to migrate as higher. Possible improvements are to strengthen the likelihood part of the Bayesian model and to enrich the population dynamics model.

The results from winter track counts and aerial counts are examples of as yet unexploited sources of data. Modeling these procedures would yield new, hopefully informative likelihoods. As for the population model, the movement of moose could be taken into account in a more realistic manner, and adding a third age class of “yearlings” might also bring the model closer to the truth.


Claire McDonald, Rognvald Smith, Jan Dick and Christopher Andrews

Plant Species Variability in the Allt a’Mharcaidh Catchment, Scotland

Identifying spatial variability in plant species diversity is important for assessing changes in species distribution and identifying the drivers of these changes. The survey technique used to measure species abundance may not be appropriate for detecting this variation, or the sampling effort may not be cost-effective if the vegetation diversity does not vary. Plant species in the Allt a’Mharcaidh catchment in the Cairngorms National Park, Scotland, are monitored as part of the UK’s long-term environmental monitoring programme, the Environmental Change Network (ECN). This provides an opportunity to examine how species associations change across the catchment and whether such spatial variation can be explained by external factors such as soil type and altitude. The vegetation is surveyed at two scales: one covers the whole catchment and the other a subsample of the total area. Marked point process models and spatial hierarchical models are used to map plant species intensity across the catchment and to compare the spatial variation captured by each survey method. Species associations are compared with the previous national vegetation classification at the site. The results of the models are used to aid future monitoring of plant species diversity across the catchment and at other long-term monitoring sites.


Consuelo Rubina Nava

Multinomial Mixed Logit Model: Bayesian Methods and Algorithms

This paper investigates the advantages of the Multinomial Mixed Logit Model (MMLM) over other discrete choice models applied in panel data econometric studies. The MMLM allows greater flexibility and a better representation of individual heterogeneity than the Logit, Probit and Nested Logit models. This follows from a random utility maximization assumption, subject to the identification of an unknown mixing distribution for the random parameters. However, the MMLM involves considerable estimation complexity under classical estimation.

The need to manage this increase in computational complexity motivates an analysis of Bayesian methods, which offer a good alternative to the classical approach for the MMLM. Several studies and empirical applications of Bayesian algorithms for the MMLM have been proposed. The first, developed by Train, is a parametric Bayesian approach. Assuming a Normal distribution for the random parameters, this algorithm estimates the unknown mean and variance-covariance matrix, drawing them from Normal and Inverse Wishart distributions, respectively, in a Gibbs sampling procedure. The second is a Bayesian nonparametric algorithm for panel data (De Blasi, James, Lau 2010). It places a nonparametric prior on the unknown mixing distribution to estimate choice probabilities, and uses a blocked Gibbs sampling procedure. Implementing this algorithm requires an approximation of the random probability measure, for which finite stick-breaking processes are used.
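
The finite stick-breaking approximation mentioned above can be sketched as follows. This is a generic illustration in Python, not the algorithm of De Blasi, James and Lau: it assumes a Dirichlet process prior with a hypothetical concentration parameter alpha and a standard Normal base measure, neither of which is specified in the abstract.

```python
import random

def truncated_stick_breaking(alpha, n_atoms, rng):
    """Weights of a stick-breaking construction truncated at n_atoms atoms.

    Each break takes a Beta(1, alpha) fraction of the remaining stick;
    the last atom receives whatever is left, so the weights sum to one.
    """
    weights = []
    remaining = 1.0
    for _ in range(n_atoms - 1):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    weights.append(remaining)
    return weights

rng = random.Random(0)
weights = truncated_stick_breaking(alpha=2.0, n_atoms=25, rng=rng)

# Atom locations drawn from the base measure (standard Normal: an assumption).
atoms = [rng.gauss(0.0, 1.0) for _ in weights]

# The pair (weights, atoms) is a discrete random probability measure that
# stands in for the unknown mixing distribution of the random parameters.
mixing_mean = sum(w * a for w, a in zip(weights, atoms))
```

A larger truncation level gives a closer approximation at a higher computational cost, which is the trade-off that retrospective and slice samplers are designed to avoid.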

Extensions of the algorithm proposed by De Blasi, James, Lau (2010) allow for different distributional assumptions but create implementation difficulties. In this respect, the use of a retrospective sampler (cf. Roberts, Papaspiliopoulos (2006)) or a slice sampler (cf. Walker (2007)) should overcome the algorithm's limitations.


Claudia Notarnicola

Bayesian Methodology for Estimation of Bio-geophysical Parameters from Remotely Sensed Data

This work focuses on the use of Bayesian methodologies for inversion purposes, namely the estimation of bio-geophysical parameters from remotely sensed data. The retrieval of such parameters falls within the category of inverse problems where, from a vector of measured values, m, one wishes to infer the set of ground parameters, x, that gave rise to them. The inverse problem is typically ill-posed. It presents many difficulties due to the non-linear relationship between remote sensing measurements and ground parameters and, more generally, because more than one value of x could produce the same measured vector m. Multi-source information, such as different polarizations, frequencies and sensors, is fundamental to the development of operationally useful inversion systems in which the interference among different parameters in the sensor response can be disentangled (Satalino et al., 1999). In this context, Bayesian methodologies offer a convenient tool for combining two or more disparate sources of information, models and data (Dubois et al., 1995). The work describes the development of a general model, starting from a theoretical model and including sensor noise and model errors through a Bayesian approach (Notarnicola et al., 2007). Some case studies will be presented:

- Different sensor combinations for soil moisture estimation from SAR images;

- Different polarization combinations for estimation of plant water content from SAR images;

- Comparison with other inversion approaches such as Neural Networks.
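
The ill-posedness discussed in this abstract, and the way a second information source resolves it, can be illustrated with a toy sketch. Everything here is hypothetical: the two-channel forward model, the noise levels and the grid are invented for the example and do not represent any real sensor or the author's model.

```python
import math

def forward_model(x):
    """Hypothetical two-channel sensor response to a ground parameter x in [0, 1]."""
    channel_a = (x - 0.5) ** 2  # non-monotonic: x and 1 - x give the same value
    channel_b = x               # monotonic, but assumed to be noisier
    return (channel_a, channel_b)

def grid_posterior(m, sigma, x_grid, channels):
    """Posterior of x on a grid: flat prior, independent Gaussian errors."""
    log_post = []
    for x in x_grid:
        pred = forward_model(x)
        log_post.append(sum(-0.5 * ((m[c] - pred[c]) / sigma[c]) ** 2
                            for c in channels))
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

x_grid = [i / 200 for i in range(201)]
m = forward_model(0.3)   # noise-free synthetic measurement generated at x = 0.3
sigma = (0.01, 0.05)     # assumed error standard deviation of each channel

post_a = grid_posterior(m, sigma, x_grid, channels=(0,))     # channel A alone
post_ab = grid_posterior(m, sigma, x_grid, channels=(0, 1))  # both channels

# Posterior mass on the wrong branch (x > 0.5) with one and with two channels.
mass_high_a = sum(p for x, p in zip(x_grid, post_a) if x > 0.5)
mass_high_ab = sum(p for x, p in zip(x_grid, post_ab) if x > 0.5)
x_map = x_grid[post_ab.index(max(post_ab))]
```

With channel A alone the posterior is split between two values of x that produce the same measurement; adding channel B concentrates it around the value that generated the data.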

References

1. P. C. Dubois, J. van Zyl, and T. Engman, “Measuring soil moisture with imaging radars,” IEEE Trans. Geosci. Remote Sensing, vol. 33, July 1995.

2. C. Notarnicola, M. Angiulli, and F. Posa, “Use of Radar and Optical Remotely Sensed Data for Soil Moisture Retrieval on Vegetated Areas,” IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 4, April 2006, pp. 925-935.

3. Satalino et al., “The potential of multi-angle C-band SAR data for soil moisture retrieval,” in Proceedings of the International Geoscience and Remote Sensing Symposium, IGARSS 1999 (GE-18), pp. 288-295.


Lucia Paci

A Comparison between Hierarchical Spatio-Temporal Models in Presence of Spatial Homogeneous Groups: The Case of Ozone in Emilia-Romagna Region

Hierarchical spatio-temporal models make it possible to account for and estimate many sources of variability. In many environmental problems, spatial locations are characterized by different features. For example, the monitoring sites in a region can be classified as urban or rural. Sites in different classes may differ either in mean levels or in the spatio-temporal dependence structure. When these differences are not included in the model structure, model performance and spatial predictions may be poor.

 

The aim of this work is to compare several hierarchical spatio-temporal models, namely: a set of group-specific models, a model that does not include groups, a model that includes differences in the mean levels between groups and, finally, a model that includes different spatial correlation structures between groups. The comparisons make it possible to detect and capture actual differences between groups, if they exist.

 

The application presented concerns ozone data in the Emilia-Romagna region, in which 31 monitoring sites can be classified according to their position relative to traffic emissions.


Paula Pereira

Characterizing Spatial-Temporal Forest Fires Patterns in Portugal

Forest fires can be regarded as spatio-temporal point patterns, and thus space-time statistical tools can help in analysing the behaviour of fires. The aim of this work is to model the locations and sizes of forest fires in Portugal by a suitable marked spatio-temporal point process.


Jacopo Soriano, Alessandra Guglielmi, Francesca Ieva, Anna Maria Paganoni and Fabrizio Ruggeri

Semiparametric Bayesian Approaches to Mixed-Effects Models for Outcome Measures in the Treatment of Acute Myocardial Infarction

Studies of variations in health care utilization and outcomes involve the analysis of multilevel data, in particular the prediction of a specific response and the estimation of covariate effects and variance components. Such studies quantify the role of contributing factors, including patient and provider characteristics, and may assess the relationship between health-care processes and outcomes.

 

We consider Bayesian generalized linear mixed models to analyze data on patients admitted with a diagnosis of ST-elevation myocardial infarction (STEMI) to hospitals in Regione Lombardia. Clinical registries and administrative databanks were used to predict both in-hospital survival and ST resolution probability. We fit logit models for in-hospital survival and ST resolution probability with a grouping effect (the hospital) under a semiparametric prior. In particular, the random effects are assigned a dependent Dirichlet process prior, which makes it possible to include hospital-specific covariates and thus enriches the dependence structure among the related random measures.


Rasoul Yousefpour

A Bayesian Modelling and Simulation Concept for Knowledge Update in Adaptive Management of Forest Resources under Climate Change

Fully adaptive decision-making takes into account that more information about the climate and forest state is forthcoming and may change the optimal decision. Therefore, we develop a modelling concept that applies Bayesian updating of beliefs about future climate change, but otherwise builds on a set of forest management outputs under different climate change scenarios. We apply the concept to a hypothetical decision-making problem in which four species are candidates for forest plantation in the European temperate zone. We consider three models of slow, moderate and high change in climate state, such that the realized climate state at any time is defined as the prediction of the true climate model plus a shock that is normally distributed with mean zero and a model-specific variance. The results may illustrate the value of knowledge updates and the need to switch decisions for optimal adaptive forest management. Moreover, we show that, using Bayes’ theorem, revealing the true climate model is just a matter of time, and that the more divergent the climate models are, the faster the true underlying one is recognized. However, the outcome of the entire concept is highly sensitive to the initial beliefs about the different climate models, although these beliefs may change after new observations of climate states via Bayesian belief updating. The economic value of such an adaptive and updating approach would be positive, and higher if a reasonable change in climate state occurs that consequently calls for a change in the optimal decision.
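
The belief-updating step described in this abstract can be sketched in a few lines. This is an illustrative Python sketch only: the linear trends, the model-specific standard deviations, the uniform initial beliefs and the 60-year horizon are all hypothetical values, and the forest management outputs and the decision problem are left out.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def update_beliefs(beliefs, observation, year, models):
    """One application of Bayes' theorem over a finite set of climate models."""
    unnormalized = {
        name: beliefs[name] * normal_pdf(observation, trend * year, sd)
        for name, (trend, sd) in models.items()
    }
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}

# Hypothetical models: (change in climate state per year, shock standard deviation).
models = {"slow": (0.01, 0.15), "moderate": (0.03, 0.20), "high": (0.05, 0.25)}
beliefs = {name: 1.0 / len(models) for name in models}  # uniform initial beliefs

true_trend, true_sd = models["high"]  # the "true" model used for the simulation
rng = random.Random(1)
for year in range(1, 61):
    # Realized climate state: prediction of the true model plus a normal shock.
    observation = true_trend * year + rng.gauss(0.0, true_sd)
    beliefs = update_beliefs(beliefs, observation, year, models)
```

Because the predictions of the three models drift apart over time, the belief in the model that generated the data approaches one; moving the trends closer together slows this down, while different initial beliefs change the early years most.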


Paolo Zanini

Models and Data for the Analysis of Traffic Flows

 

The idea behind this presentation is to analyze vehicular traffic flow in the Milan area through techniques that concern traffic analysis directly or indirectly. First, we focus on a traffic model based on the idea that the diffusion of vehicles in an urban network resembles the flow of a fluid through a porous medium (Della Rossa, D’Angelo and Quarteroni, 2010), where traffic flow spans two-dimensional regions whose size (macroscale) is greater than the characteristic size of the network arcs (microscale). Starting from a stochastic lattice gas model with simple constitutive laws, a distributed two-dimensional model of traffic flow is derived in the form of a non-linear diffusion-advection equation for the particle density. The equation is formally equivalent to a non-linear Darcy’s filtration law. In particular, it contains two parameters describing the morphology of the urban area, which can be seen as the porosity and the permeability tensor of the network, while a further parameter identifies the principal direction of traffic flow. Once the model parameters have been identified, the model makes it possible to determine the traffic density in the network considered.

 

The innovative scenario we want to develop goes in the opposite direction: from a given traffic density in a real situation, we want to estimate the model parameters that reproduce this density. In this way we would be able, for example, to investigate how real traffic would change if these parameters were altered. We do not have direct access to vehicular traffic data, but we do have information on telephone traffic; indeed, the Telecom Italia dataset was analyzed in (Secchi, Vantini and Vitelli, 2010). This dataset describes the intensity of telephone traffic (as a function of time) at a large number of spatially distributed points in the Milan area and its hinterland. The analysis of these functional data by means of spatial statistics methods shows that we can recover information connected with vehicular traffic. First, the spatial dimension of the problem was reduced by dividing the area of interest into zones larger than a single cell and identifying a representative function for each zone. Then a treelet basis (Lee, Nadler and Wasserman, 2008) was computed for the available functional data. The use of such a basis is justified by the fact that treelets form a data-driven wavelet basis, so that relevant functional features in the curves can be detected. For example, some relevant basis elements capture telephone traffic patterns associated with rush hours (morning and evening commuters). The analysis of the scores associated with these relevant treelets clearly shows the correlation between telephone and vehicular traffic.

 

We aim to use the information provided by this analysis of telephone data to validate the model of traffic flow in complex networks described above and to identify its parameters. At that point we will be in a position to simulate different real scenarios to investigate how traffic flow in the Milan area may change, for instance, as a consequence of the introduction of new car-sharing services.


Monika Zovko

Integrated Environmental Risks Assessment in Croatian Coastal River Basins