ABS12 - 2012 Applied Bayesian Statistics School

STOCHASTIC MODELLING FOR SYSTEMS BIOLOGY

Pavia, Italy

September 3-7, 2012

PARTICIPANTS' TALKS

 


Simona Arcuti

Spatio-temporal models for zero-inflated data: an application to abundance data for two crustacean species in the Ionian Sea

In ecology, abundance data are often characterized by zero inflation of the population distributions. In this work we consider two commercial species belonging to the faunistic category of crustaceans, abundant in the north-western Ionian Sea, namely Parapenaeus longirostris (Lucas, 1846) and Aristaeomorpha foliacea (Risso, 1816). Biological data on the two shrimp species were collected during trawl surveys carried out from 1995 to 2006 as part of the international programme MEDITS (International bottom trawl survey in the Mediterranean) and the national programme GRU.N.D (GRUppo Nazionale Demersali). Two modelling approaches are used to investigate changes in the densities and biomasses of the two populations over spatial and temporal scales and in response to anthropogenic and environmental factors (fishing effort, sea surface temperature, precipitation, and the NAO and MO baric climatic indices). First, generalized additive models (GAMs) are considered, assuming that the distribution of the response variable belongs to the Tweedie family. Such distributions are characterized by an index parameter and, when the index lies between 1 and 2, are continuous on the positive reals with an added mass exactly at zero. A second general approach to zero-inflated data modelling treats the response distribution as a probabilistic mixture of a zero-generating and a non-zero-generating process. Constrained zero-inflated GAMs (COZIGAMs) are obtained by specifying the probability of non-zero inflation and the mean of the non-zero-inflated population abundance as GAMs with linearly related parameters.
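For an index between 1 and 2, a Tweedie variable has a compound Poisson-gamma representation, which is exactly what produces the mass at zero. A minimal simulation sketch of this mechanism (all parameter values are illustrative, not estimates from the survey data):

import numpy as np

rng = np.random.default_rng(1)

def tweedie_sample(mu, phi, p, size):
    # Compound Poisson-gamma representation for index 1 < p < 2:
    # N ~ Poisson(lam) gamma summands, so P(Y = 0) = exp(-lam).
    lam = mu**(2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    scale = phi * (p - 1) * mu**(p - 1)
    n = rng.poisson(lam, size)
    return np.array([rng.gamma(alpha * k, scale) if k > 0 else 0.0 for k in n])

y = tweedie_sample(mu=2.0, phi=1.5, p=1.4, size=10000)
print("share of exact zeros:", np.mean(y == 0))
print("sample mean (close to mu):", y.mean())

The probability of an exact zero, exp(-lam), is tied to the mean through the index and dispersion parameters, which is what lets a single Tweedie GAM accommodate both the zeros and the positive abundances.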


Maja Czoków

Bayesian networks for detecting key structural features of a spring system for exploring protein conformations

 

In our work we developed a mathematical model for exploring the conformational movements of proteins. The model can be used to identify key residues (amino acids, groups of atoms) that have the greatest impact on the transition from one conformation to another [1], or to detect intermediate conformations along a reaction path between two input conformations. We employ a Bayesian network to infer which real-valued parameters of the model have the greatest impact on its efficiency; the network thus represents the influence of the attributes on the quality of the solutions returned by the method. To find the conditional probabilities for the network, we take advantage of numerical results obtained with a software implementation of our model. Numerous tests over different combinations of the attribute values guarantee a good estimation of the Bayesian network parameters. The model of protein conformations is implemented by means of spring systems, represented by a graph G = (V, E) embedded in the Euclidean space R^3. The principal task for a spring system is to assume a required mechanical behaviour (physical locations of network nodes, regarded as output) in response to suitable physical stimuli (displacements of control nodes, regarded as input); in [2] we show how such systems can be implemented. Each atom or group of atoms of a given protein is represented by a node in V, and the virtual bonds between them constitute the set of springs E. The expected mechanical behaviour of the spring system is defined by the conformations of the given protein.
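As a toy illustration of the spring-system idea, the sketch below relaxes a three-node system in R^3 by gradient descent on the elastic energy while a control node is held at a prescribed position; the graph, rest lengths, stiffnesses and step size are all illustrative and do not reproduce our implementation:

import numpy as np

rng = np.random.default_rng(0)

# Toy spring system: a graph G = (V, E) embedded in R^3.
# Each edge is (i, j, rest_length, stiffness).
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (0, 2, 1.5, 0.5)]
pos = rng.normal(size=(3, 3))        # node coordinates in R^3
control = {0: np.zeros(3)}           # control node pinned at the origin (input)

def relax(pos, edges, control, steps=2000, eta=0.01):
    # Gradient descent on the energy: sum over edges of 0.5*s*(|xi - xj| - r)^2.
    for _ in range(steps):
        grad = np.zeros_like(pos)
        for i, j, r, s in edges:
            d = pos[i] - pos[j]
            length = np.linalg.norm(d)
            g = s * (length - r) * d / (length + 1e-12)
            grad[i] += g
            grad[j] -= g
        pos = pos - eta * grad
        for k, x in control.items():  # re-impose the control displacements
            pos[k] = x
    return pos

print(relax(pos, edges, control))    # relaxed node locations (output)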

 

References

 

[1] Chen, W. Z., Li, C. H., Su, J. G., Wang, C. X., Xu, X. J.: Identification of key residues for protein conformational transition using elastic network model.

[2] Czoków, M., Schreiber, T.: Adaptive Spring Systems for Shape Programming. Proc. of the 10th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, Lecture Notes in Artificial Intelligence 6114, pages 420–427, Springer (2010).


Martina Feilke

Bayesian spatial analysis of FRAP images

Fluorescence Recovery After Photobleaching (FRAP; see, for example, Sprague and McNally, 2005) is a method in biology for investigating the binding behaviour of molecules in a cell nucleus in vivo. For this purpose, the molecules are fluorescently tagged, part of the nucleus of the cell of interest is bleached, and the recovery of the bleached part of the nucleus is observed by imaging the nucleus at predefined time intervals. The aim is to obtain information about the speed of movement of the unbleached molecules and thereby about their binding behaviour; more specifically, about the existence of one or more binding sites for the molecule as well as the duration of residence at a specific binding site. To date, analysis of FRAP data has been performed either for only the bleached part of the cell nucleus (for example, in Sprague et al., 2004), for the bleached and unbleached parts of the nucleus separately (for example, in Phair et al., 2004), or for a finer subdivision of the nucleus into a small number of disjoint parts (for example, in Beaudouin et al., 2006).

Our goal is to perform a spatial analysis of FRAP data at the pixel level. To account for diffusion, we plan to incorporate the concentration of unbleached molecules in neighbouring pixels into the fit of the per-pixel concentration curve obtained from the imaging of the cell nucleus. Moreover, we plan to model one binding reaction per pixel. By solving the differential equations of the compartment model describing the change in the concentration of unbleached molecules in each pixel of the cell nucleus, we aim to obtain a nonlinear regression equation per pixel with which this concentration can be modelled at any time during the recovery.

For each pixel, we intend to obtain estimates of the on- and off-rates of the binding reaction, which carry the information about the binding behaviour of the molecules, as well as estimates of the volumes of the compartments of interest, by applying an MCMC algorithm with Gibbs and/or Metropolis-Hastings update steps.
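As a single-pixel illustration of this kind of inference, the sketch below fits the reaction-dominant recovery curve of Sprague et al. (2004), frap(t) = 1 - C_eq*exp(-k_off*t) with C_eq = k_on/(k_on + k_off), to synthetic data with a random-walk Metropolis sampler; the noise level, priors and proposal scale are illustrative, and the diffusion coupling between neighbouring pixels is omitted:

import numpy as np

rng = np.random.default_rng(2)

def recovery(t, k_on, k_off):
    # Reaction-dominant FRAP recovery: the bound fraction
    # C_eq = k_on / (k_on + k_off) recovers at rate k_off.
    c_eq = k_on / (k_on + k_off)
    return 1.0 - c_eq * np.exp(-k_off * t)

t = np.linspace(0.0, 30.0, 60)
data = recovery(t, 0.4, 0.2) + rng.normal(0, 0.02, t.size)  # synthetic pixel

def log_post(theta):
    k_on, k_off = theta
    if k_on <= 0 or k_off <= 0:      # flat prior on the positive quadrant
        return -np.inf
    resid = data - recovery(t, k_on, k_off)
    return -0.5 * np.sum(resid**2) / 0.02**2

theta = np.array([1.0, 1.0])
lp = log_post(theta)
samples = []
for _ in range(5000):                # random-walk Metropolis updates
    prop = theta + rng.normal(0, 0.05, 2)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)
print("posterior mean (k_on, k_off):", np.mean(samples[1000:], axis=0))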

 

References

 

Joel Beaudouin et al. Dissecting the Contribution of Diffusion and Interactions to the Mobility of Nuclear Proteins. Biophysical Journal, 90:1878-1894, 2006.

 

Robert D. Phair et al. Global Nature of Dynamic Protein-Chromatin Interactions In Vivo: Three-Dimensional Genome Scanning and Dynamic Interaction Networks of Chromatin Proteins. Molecular and Cellular Biology, 24(14):6393-6402, 2004.

 

Volker J. Schmid et al. A Bayesian Hierarchical Model for the Analysis of a Longitudinal Dynamic Contrast-Enhanced MRI Oncology Study. Magnetic Resonance in Medicine, 61:163-174, 2009.

 

Brian L. Sprague and James G. McNally. FRAP analysis of binding: proper and fitting. Trends in Cell Biology, 15(2):84-91, 2005.

 

Brian L. Sprague, Robert L. Pego, Diana A. Stavreva, and James G. McNally. Analysis of Binding Reactions by Fluorescence Recovery after Photobleaching. Biophysical Journal, 86:3473-3495, 2004.


Luca Ferreri and Mario Giacobini

A discrete stochastic model of the transmission cycle of the tick-borne encephalitis virus

Tick-borne encephalitis (TBE) is an emergent zoonosis transmitted by ticks in several woodland areas of Eurasia. The aetiological agent is a flavivirus that causes severe disease in humans, such as meningoencephalitis, in some cases leading to death.

TBE is naturally maintained by a cycle involving hard ticks of the genus Ixodes as vectors and mice as host animals. In fact, hard ticks need only one complete blood meal to moult. Furthermore, immature ticks (larvae and nymphs) usually feed on small vertebrates, while adult ticks prefer large mammals. However, the main route of transmission of the TBE virus is from infected nymphs to larvae cofeeding on the same mouse.

In this work we formulate a discrete stochastic model that describes the aforementioned transmission cycle. In particular, we consider a stochastic contact network structure in order to describe the potential number of transmissions from nymphs to larvae over the months of the year. From this mathematical model we have obtained some interesting analytical results, which we intend to validate in the future by stochastic simulations.
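A minimal sketch of how the nymph-to-larva cofeeding step could be simulated in one month: larvae and nymphs attach to mice chosen at random, and each larva that cofeeds with at least one infected nymph is infected with some probability. All counts and probabilities are illustrative placeholders, not the parameterisation of our model:

import numpy as np

rng = np.random.default_rng(3)

def cofeeding_transmissions(n_larvae, n_nymphs, prevalence, n_mice, p_trans):
    # Attach each larva and nymph to a uniformly random mouse; an infected
    # nymph can infect larvae cofeeding on the same mouse.
    mouse_of_larva = rng.integers(0, n_mice, n_larvae)
    mouse_of_nymph = rng.integers(0, n_mice, n_nymphs)
    infected_nymph = rng.random(n_nymphs) < prevalence
    infectious_mice = np.unique(mouse_of_nymph[infected_nymph])
    exposed = np.isin(mouse_of_larva, infectious_mice)
    return np.sum(exposed & (rng.random(n_larvae) < p_trans))

monthly = [cofeeding_transmissions(500, 100, 0.05, 200, 0.3) for _ in range(12)]
print("newly infected larvae per month:", monthly)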


Thijs Janzen

Diversification in a dynamic landscape

Allopatric speciation is often viewed as a slow and gradual process: over time, reproductive isolation is achieved through a lack of gene flow following the formation of a geographical barrier. Because geographical changes are relatively slow, allopatric speciation is usually not associated with the generation of, or dynamic changes in, biodiversity. In contrast, environmental factors such as water level or temperature can rapidly change the distribution of viable habitat. Here we study the effect of such changing environmental factors on diversification in a model combining allopatric speciation, driven by dynamic environmental factors, with sympatric speciation. We use the cichlids of Lake Tanganyika as a model system. Over time, the cichlid populations of Lake Tanganyika have been subject to dramatic water level changes that repeatedly subdivided the lake into multiple satellite lakes; these changes in water level are thought to have acted as a “species pump”. We fit our simple model to a phylogeny of the Lamprologini in order to estimate the relative importance of these processes in the generation of diversity in this species-rich tribe.


Eugenia Koblents

A novel population Monte Carlo method for Bayesian inference and its application to stochastic kinetic models

Many problems of current interest in science and engineering rely on the ability to perform inference in high-dimensional spaces. A very common strategy, which has been successfully applied in a broad variety of complex problems, is the Monte Carlo methodology. In particular, we have considered a recently proposed technique known as population Monte Carlo (PMC), which is based on an iterative importance sampling approach. The aim of this method is the approximation of static probability distributions by way of random measures consisting of samples and associated weights.

 

An important drawback of the importance sampling approach, and of PMC in particular, is that its performance depends heavily on the choice of the proposal distribution (or importance function) used to generate the samples and compute the weights. When the variable of interest is high-dimensional, or the proposal is very wide with respect to the target, the importance weights degenerate, leading to an extremely low number of representative samples.

 

We propose a novel PMC scheme based on a simple proposal update rule, together with a technique that avoids degeneracy of the importance weights and increases the efficiency of PMC when drawing from a poorly informative proposal.
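A minimal sketch of the flavour of such a scheme: iterative importance sampling in which the largest weights are clipped to a common value before normalisation, one simple transformation that curbs degeneracy. The target, the Gaussian proposal family and the clipping rule below are illustrative and need not coincide with the details of our algorithm:

import numpy as np

rng = np.random.default_rng(4)

def log_target(x):
    # Toy one-dimensional target: an equal mixture of two Gaussians.
    return np.logaddexp(-0.5 * (x - 2.0)**2, -0.5 * (x + 2.0)**2)

mu, sigma = 0.0, 10.0                # deliberately wide initial proposal
for it in range(10):
    x = rng.normal(mu, sigma, 1000)
    log_prop = -0.5 * ((x - mu) / sigma)**2 - np.log(sigma)
    logw = log_target(x) - log_prop  # log importance weights
    cap = np.sort(logw)[-100]        # clip the 100 largest log weights
    w = np.exp(np.minimum(logw, cap) - cap)
    w /= w.sum()
    mu = np.sum(w * x)               # moment-matched proposal update
    sigma = np.sqrt(np.sum(w * (x - mu)**2)) + 1e-6
print("final proposal mean and std:", mu, sigma)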

 

As a practical application of interest, we have applied the proposed algorithm to the challenging problem of estimating the rate parameters of stochastic kinetic models (SKMs). Such models describe the time evolution of the populations of a set of species that evolve according to a set of chemical reactions and present an autoregulatory behaviour. We describe a particularization of the algorithm to SKMs and present numerical results for a simple SKM, the predator-prey model.


Laura Martin Fernandez, Ettore Lanzarone, Joaquin Miguez, Sara Pasquali and Fabrizio Ruggeri

Particle filter estimation in a stochastic predator-prey model

 

Parameter estimation and population tracking in predator-prey systems are critical problems in ecology. In this paper we consider a stochastic predator-prey system with a Lotka-Volterra functional response and propose a particle filtering method for jointly estimating the behavioural parameter representing the carrying capacity and the population biomasses from field data. In particular, the proposed technique combines a sequential Monte Carlo sampling scheme for tracking the time-varying biomasses with the analytical integration of the unknown behavioural parameter. To assess the performance of the method, we show results for both synthetic and field data; the latter correspond to an acarine predator-prey system, namely the pest mite Tetranychus urticae and the predatory mite Phytoseiulus persimilis. Finally, we compare the proposed particle filtering method with a Markov chain Monte Carlo method previously used for this parameter estimation problem, obtaining a significant improvement.
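For illustration, a minimal bootstrap particle filter for a coarsely discretized stochastic predator-prey system is sketched below; the dynamics, noise levels and observation model are invented for the example, and the analytical integration of the behavioural parameter used in our method is not shown:

import numpy as np

rng = np.random.default_rng(5)

def step(x, dt=0.1, a=1.0, b=0.5, c=0.5, d=0.3, s=0.02):
    # Euler-Maruyama step of a stochastic Lotka-Volterra system.
    prey, pred = x[:, 0], x[:, 1]
    drift = np.stack([a * prey - b * prey * pred,
                      c * prey * pred - d * pred], axis=1) * dt
    return np.maximum(x + drift + s * np.sqrt(dt) * rng.normal(size=x.shape), 1e-6)

truth = np.array([[1.0, 0.5]])       # synthetic trajectory, noisy biomasses
obs = []
for _ in range(100):
    truth = step(truth)
    obs.append(truth[0] + rng.normal(0, 0.05, 2))

N = 500                              # bootstrap particle filter
particles = np.abs(rng.normal([1.0, 0.5], 0.2, (N, 2)))
for y in obs:
    particles = step(particles)      # propagate
    logw = -0.5 * np.sum((y - particles)**2, axis=1) / 0.05**2
    w = np.exp(logw - logw.max())
    w /= w.sum()                     # weight by the observation likelihood
    particles = particles[rng.choice(N, N, p=w)]  # multinomial resampling
print("final filtered biomasses:", particles.mean(axis=0))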

 


Michal Matuszak

Application of the Bayesian influence diagram framework to the ramified optimal transport problem

 

A tree leaf transports resources such as water and minerals from its root to its tissues, and tends to maximize internal efficiency by developing an optimal transport system. This observation can be applied to the well-known, NP-hard ramified optimal transport problem, where the goal is to find an optimal transport path between two given probability measures: one measure can be identified with the root (source), the other with the tissues (target). We will present an algorithm for solving the ramified optimal transport problem within the framework of Bayesian networks. It is based on a decision strategy optimisation technique that utilises the self-annealing ideas of Chen-style stochastic optimisation, and it uses Xia's formulation of the cost functional. The resulting transport paths are represented as tree-shaped structures.
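Xia's functional charges each edge of a transport path a cost of (mass carried)^alpha times its length, with 0 < alpha < 1, so that moving mass along shared branches is cheaper than moving it separately. A sketch evaluating this cost on a small hypothetical tree (coordinates, masses and alpha are arbitrary):

import numpy as np

def xia_cost(nodes, edges, alpha=0.5):
    # Xia's functional: sum over edges of (mass)^alpha * edge length.
    return sum(w**alpha * np.linalg.norm(nodes[i] - nodes[j])
               for i, j, w in edges)

# Move mass 0.5 from each of two sources to a single sink.
nodes = {0: np.array([0.0, 1.0]),    # source 1
         1: np.array([0.0, -1.0]),   # source 2
         2: np.array([1.0, 0.0]),    # branching point
         3: np.array([3.0, 0.0])}    # sink
branched = [(0, 2, 0.5), (1, 2, 0.5), (2, 3, 1.0)]
direct = [(0, 3, 0.5), (1, 3, 0.5)]
print("branched tree cost:   ", xia_cost(nodes, branched))
print("two direct edges cost:", xia_cost(nodes, direct))

Because alpha < 1 makes the cost subadditive in the mass, the Y-shaped tree beats the two straight edges, which is why optimal paths are tree-shaped.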


Preetam Nandy and Michael Unger

Optimal perturbations for the identification of stochastic reaction dynamics

 

Identification of stochastic reaction dynamics inside the cell is hampered by the low-dimensional readouts available with today's measurement technologies. Moreover, such processes are poorly excited by standard experimental protocols, making identification even more ill-posed. Recent technological advances provide means to design and apply complex extracellular stimuli. Within an information-theoretic setting, we present novel Monte Carlo sampling techniques to determine optimal temporal excitation profiles for such stochastic processes. We give a new result for the controlled birth-death process and provide a proof of principle by considering a simple model of regulated gene expression.
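For concreteness, a Gillespie simulation of a birth-death process whose birth propensity is modulated by a time-varying extracellular input u(t) is sketched below; the square-wave profile stands in for an optimized excitation, and all rates are illustrative:

import numpy as np

rng = np.random.default_rng(6)

def u(t):
    # Placeholder square-wave excitation (not an optimized profile).
    return 2.0 if (t % 10.0) < 5.0 else 0.5

def gillespie_birth_death(x0=10, t_end=50.0, k_birth=1.0, k_death=0.1):
    # Exact simulation, with the input approximated as constant
    # between consecutive reaction events.
    t, x, path = 0.0, x0, [(0.0, x0)]
    while t < t_end:
        rates = (k_birth * u(t), k_death * x)    # birth, death propensities
        total = rates[0] + rates[1]
        t += rng.exponential(1.0 / total)
        x += 1 if rng.random() < rates[0] / total else -1
        path.append((t, x))
    return path

print("final (time, copy number):", gillespie_birth_death()[-1])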


Robert Ness

Causal network modeling for drug target discovery

 

The development of algorithmic approaches to the interpretation of large-scale genetic, transcriptomic, proteomic, and metabolic datasets is a key focus of computational biology.  In pharmaceutical research and development, these methods are used to gain a mechanistic understanding of the biological system under study.

 

One such method is causal network modelling, a systematic computational analysis that identifies upstream changes in gene regulation that can serve as explanations for observed changes in experimental data.  These upstream gene regulation events are identified using a directed interaction network.  Competing hypotheses about upstream causal events are compared by using the network model to make predictions for the observed data and then evaluating the accuracy of those predictions.  The common method for making such predictions is the shortest-paths algorithm, which predicts the regulatory effect from the net effect along the edges of the shortest path in the network between the upstream regulation event and the observed regulation event in the data.
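A toy version of this prediction on a hypothetical signed network makes the procedure concrete: the predicted regulatory effect is the product of the edge signs along the shortest directed path from the perturbed node to the measured node.

import collections

# Hypothetical signed, directed interaction network: (source, target) -> sign.
edges = {("A", "B"): +1, ("B", "C"): -1, ("A", "D"): -1, ("D", "C"): -1}
graph = collections.defaultdict(list)
for (src, dst), sign in edges.items():
    graph[src].append((dst, sign))

def predicted_effect(source, target):
    # Breadth-first search finds a shortest path; the prediction is the
    # product of edge signs along it, ignoring all longer paths and loops.
    queue = collections.deque([(source, +1)])
    seen = {source}
    while queue:
        node, sign = queue.popleft()
        if node == target:
            return sign
        for nxt, s in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, sign * s))
    return None                      # no directed path, no prediction

print("predicted effect of up-regulating A on C:", predicted_effect("A", "C"))

Note that the two length-two paths from A to C disagree in sign, so the prediction depends on an arbitrary tie-break; this is exactly the kind of fragility discussed next.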

 

While the causal network modelling approach is promising, the use of shortest-paths-based predictions is flawed.  It ignores the topological complexity that is characteristic of biological networks, such as feedback loops, which have an essential impact on the net effect.  It also considers only upstream hypotheses concerning individual genes, despite the fact that diseases are now known to be too complex to address single targets in isolation.

 

To address this, we are developing a statistical approach to causal network modelling.  It incorporates probabilistic modelling of the network, which can capture topological complexity and quantify uncertainty, in contrast to the shortest-paths algorithm.  Further, we can evaluate upstream regulation events that involve multiple genes, instead of evaluating single-gene hypotheses separately.


Hossein Farid Ghassem Nia

Bayesian decision making in computer vision with applications to industrial automation

 

Computer vision is becoming mainstream in the automation and quality control industry. In some applications, it is critical to make correct decisions based on uncertain data from vision systems and to draw conclusions based on the analysis of data and predefined models. In this presentation, we introduce the application of Bayesian theory in a novel computer vision system for the automation industry. In our research, we used Bayesian theory in image processing and signal analysis to find regions of interest. We also show how we extended this approach to analyse mass spectrometry data from melanoma patients. In addition, we aim to demonstrate some ongoing challenges in this project regarding the minimization of error in decision making.
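As a minimal sketch of the kind of decision rule involved, consider labelling a pixel as region-of-interest versus background from a noisy intensity, under hypothetical Gaussian class models and an asymmetric loss; every number below is invented for illustration:

import numpy as np

rng = np.random.default_rng(7)

# Hypothetical class models for background and region-of-interest intensity.
mu0, mu1, sigma = 50.0, 120.0, 25.0
prior_roi = 0.2
loss_miss, loss_false_alarm = 5.0, 1.0   # missing a region costs more

def decide_roi(intensity):
    # Bayes rule: flag as ROI when posterior odds times the loss ratio
    # exceed one, i.e. when the expected loss of "background" is larger.
    like0 = np.exp(-0.5 * ((intensity - mu0) / sigma)**2)
    like1 = np.exp(-0.5 * ((intensity - mu1) / sigma)**2)
    post_odds = (prior_roi * like1) / ((1 - prior_roi) * like0)
    return post_odds * loss_miss / loss_false_alarm > 1.0

pixels = rng.normal(mu0, sigma, 10)      # simulated background pixels
print([bool(decide_roi(p)) for p in pixels])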


Gian Marco Palamara

Statistical inference for temperature-dependent logistic time series

 

Methods of parameter estimation are fundamental tools for assessing the predictive power of theoretical models of population dynamics. The use of simple models such as logistic growth, and the ability to infer their parameters from time series data, is emerging as a key problem in population ecology. We simulate stochastic logistic time series from different birth and death processes using the classical Gillespie algorithm. Logistic growth is the building block of more complex population models and can be used to test different methods of parameter estimation. We are able to simulate temperature dependence of the parameters of logistic growth (namely the growth rate and the carrying capacity) for different birth and death processes. To the simulated time series we apply different observation processes, based on the discrete-time sampling of a typical experiment and on the spatial homogeneity of the population. We then construct different likelihood functions in order to fit the simulated data to different models.
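A minimal version of such a simulation, for one common parameterisation in which the deterministic limit is logistic with growth rate r = b - d and carrying capacity K, together with a discrete-time observation process; all rates are illustrative, and temperature dependence would enter through b and d:

import numpy as np

rng = np.random.default_rng(8)

def gillespie_logistic(n0=5, b=2.0, d=1.0, K=100.0, t_end=20.0):
    # Birth rate b*n and death rate d*n + (b - d)*n^2/K give a logistic
    # deterministic limit with r = b - d and carrying capacity K.
    t, n, ts, ns = 0.0, n0, [0.0], [n0]
    while t < t_end and n > 0:
        birth = b * n
        death = d * n + (b - d) * n * n / K
        total = birth + death
        t += rng.exponential(1.0 / total)
        n += 1 if rng.random() < birth / total else -1
        ts.append(t)
        ns.append(n)
    return np.array(ts), np.array(ns)

ts, ns = gillespie_logistic()
# Observation process: read the population only at regular sampling times.
obs_times = np.arange(0.0, 20.0, 1.0)
obs = ns[np.searchsorted(ts, obs_times, side="right") - 1]
print(dict(zip(obs_times, obs)))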

 

We find that there is a constant bias when fitting deterministic models to stochastic data. This bias is related to the choice of the correct parameterisation of the variance of the observed data and does not depend on the observation process used. By taking the observation process into account, we are able to disentangle the intrinsic stochasticity of the biological process from the noise induced by the observation itself. Bayesian approaches are particularly convenient when dealing with incomplete data.


Jaroslaw Piersa

Statistical description of functional neural networks

 

The aim of this presentation is to briefly discuss statistical tools for the description of large-scale activity-flow graphs in artificial neural networks.

 

Suppose we are given a recurrent artificial neural network, i.e. a set of neurons connected by synapses, with a stochastic, energy-driven dynamics. During the dynamics, action potentials (or spikes) transmit signals between pairs of neurons by travelling along the synapses. These spike journeys yield spike-flow or activity-flow graphs, consisting of the synapses that took part in transmitting the information.

 

Due to the scale of the graphs, one must resort to a mixed statistical and random-graph-theoretical approach in order to describe the properties of the activity-flow network. Among the discussed properties are the average connectivity, the empirical degree distribution, the characteristic and maximum path lengths (the diameter), and the clustering coefficient. Additional features include the spectral density, a 'small-world-ness' indicator, resilience to random damage, graph degeneracy, degree assortativity, etc.
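For instance, the empirical degree distribution, average connectivity and global clustering coefficient can be computed directly from the adjacency matrix; in the sketch below a sparse Erdos-Renyi graph merely stands in for an activity-flow graph produced by the dynamics:

import collections
import numpy as np

rng = np.random.default_rng(9)

n, p = 200, 0.05                      # stand-in random graph
adj = np.triu(rng.random((n, n)) < p, k=1)
adj = adj | adj.T                     # symmetric, no self-loops

degrees = adj.sum(axis=1)
print("average connectivity:", degrees.mean())
print("empirical degree distribution:", collections.Counter(degrees.tolist()))

def clustering(adj, degrees):
    # Global clustering coefficient: 3 * triangles / connected triples.
    a = adj.astype(int)
    triangles = np.trace(a @ a @ a) / 6
    triples = np.sum(degrees * (degrees - 1)) / 2
    return 3 * triangles / triples if triples else 0.0

print("clustering coefficient:", clustering(adj, degrees))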

 

The aim of such a description is twofold. First, the properties of the model are interesting in themselves. Second, the statistical description can be compared with values reported in fMRI brain studies and can thereby shed some light on at least some principles of brain function, at least at the macroscopic level.


Ihor Smal

Sequential Monte Carlo methods for multiple object tracking in molecular bioimaging

 

Time-lapse fluorescence microscopy imaging has rapidly evolved in the past decade and has opened new avenues for studying intracellular processes in vivo. Such studies generate vast amounts of noisy image data that cannot be analyzed efficiently and reliably by means of manual processing. Many popular tracking techniques exist but often fail to yield satisfactory results in the case of high object densities, high noise levels, and complex motion patterns. Probabilistic tracking algorithms, based on Bayesian estimation, have recently been shown to offer several improvements over classical approaches, by better integration of spatial and temporal information, and the possibility to more effectively incorporate prior knowledge about object dynamics and image formation. We propose an improved, fully automated particle filtering algorithm for the tracking of many subresolution objects in fluorescence microscopy image sequences. It involves a new track management procedure and allows the use of multiple dynamics models. The accuracy and reliability of the algorithm are further improved by applying marginalization concepts. Experiments on synthetic as well as real image data from three different biological applications clearly demonstrate the superiority of the algorithm compared to previous particle filtering solutions.
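One ingredient of any such particle filter is the resampling step; a sketch of systematic resampling, a standard low-variance scheme, is given below (the abstract does not specify which resampling variant the algorithm uses):

import numpy as np

rng = np.random.default_rng(10)

def systematic_resample(weights):
    # A single uniform draw stratified over N equally spaced positions
    # gives lower resampling variance than multinomial resampling.
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(weights), positions)

w = np.array([0.1, 0.2, 0.05, 0.5, 0.15])   # normalized particle weights
print("resampled particle indices:", systematic_resample(w))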


Sofia Tsepletidou

Computational Bayesian tools for modeling the aging process

 

Whereas the aging process is obvious in macroscopic organisms, it is not in single-celled ones. However, when monitoring the growth of rod-shaped bacterial colonies, for instance of the model organism E. coli, it is possible to recognize an aging mechanism. This is due to the division process, which splits the cell transversally, producing a new end per progeny cell; this new end is called the new pole, whereas the pre-existing end is the old pole. The replicative age is thus defined as the number of generations elapsed since the old pole arose. The older this pole, the slower its growth; thus, more damage is expected to have accumulated (an increased physiological age). However, the replicative age accounts for a significant, yet limited, fraction of the variability observed in the physiological characteristics. Understanding the impact of the replicative age on the physiological measurements, as well as the mechanism by which cells are rejuvenated, symmetrically or not, is possible by reconstructing a hidden quantity that would govern the physiology of the cell while fulfilling basic conservation laws. Estimation takes the form of an exploration of the approximate posterior distribution of the parameters of the constructed mathematical model. Approximate Bayesian computation methods (an ABC rejection sampler and an ABC MCMC sampler) are considered in order to avoid the combinatorial cost, as well as the difficulty of computing the distribution of the statistics on which this study relies. Results show that the method recognizes well the presence and the absence of asymmetry, except when the asymmetry is at a low level.
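A minimal ABC rejection sampler on a toy version of the problem illustrates the idea; the simulator, the summary statistic, the prior and the tolerance below are all illustrative stand-ins for the actual cell-lineage model:

import numpy as np

rng = np.random.default_rng(11)

def simulate(q, n=200):
    # Toy simulator: mean damage retained by old-pole cells when damage
    # is split with asymmetry q at division (purely illustrative).
    damage = rng.gamma(2.0, 1.0, n)
    return np.mean(np.maximum(q, 1 - q) * damage)   # summary statistic

observed = simulate(0.7)             # pretend these are the real data
accepted = []
while len(accepted) < 500:           # ABC rejection sampler
    q = rng.uniform(0.5, 1.0)        # prior on the asymmetry parameter
    if abs(simulate(q) - observed) < 0.05:          # tolerance epsilon
        accepted.append(q)
print("approximate posterior mean of q:", np.mean(accepted))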


Giorgio Vacchiano

Modeling dynamics of forest ecosystems

 

Forests provide people with important goods and services: not only timber and firewood, but also non-wood forest products, protection from hydrogeological hazards, carbon sequestration, recreation and tourism, and habitat for animal and plant biodiversity. However, forest ecosystems face pressures that may exceed their resistance or adaptation potential (resilience). On the one hand, human-induced climate change is forecast to impact photosynthesis, wood production, tree mortality, regeneration, tree species distribution, soil properties, and the frequency and severity of disturbances such as fire or pest and pathogen outbreaks. On the other hand, changes in land use brought about by socio-economic processes are affecting forest distribution, composition and structure even faster than climatic forcing, e.g. through deforestation (in developing countries) or the formation of secondary woodlands and/or the spread of invasive alien species (mostly in developed countries). To ensure the continuity of forest services and the sustainability of forest resources in the face of ongoing changes, scenarios of response to external drivers and to alternative pathways of resource exploitation are needed. Forest ecosystem models can produce such scenarios at variable scales, from the tree to the stand and the landscape. This talk will outline my research on the calibration and programming of forest ecosystem simulators, to be applied at multiple scales, in order to compare sustainable forest management alternatives in terms of their short- and long-term outcomes for forest resources. I will show how I integrate field sampling, databases, and geographical information systems, and which statistical tools can be used to produce reliable simulations of tree growth, ecosystem processes and forest response to external forcing, with examples from my current study areas and research work.


Felix Weidemann

A Bayesian approach to infectious disease modelling using ordinary differential equations: rotavirus in Germany

 

Understanding infectious disease dynamics with epidemic models requires the quantification of several parameters describing the transmission process. In the context of predictive transmission modelling, this quantification is commonly based on disconnected epidemiological studies used to fix as many parameters as possible in advance. Owing to the dependency structures inherent in any given model, this approach often leads to biased inference for the parameters left to be estimated, without sufficiently assessing the uncertainty of those detached assumptions. We developed a Bayesian inference framework that lessens the reliance on external parameter quantifications through a data-driven estimation approach. We extended this idea with model averaging techniques, with a focus on residual autocorrelation, to weaken the estimates' dependence on the underlying model structure. We applied our methods to the modelling of age-stratified weekly rotavirus incidence data in Germany from 2001 to 2008, using a complex susceptible-infected-recovered-type model taking maternal antibodies, waning immunity, seasonality and underreporting into account. Our results not only give valuable insight into the transmission process, but also show the severe consequences of fixing parameters beforehand for the model-predicted dynamics.
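For reference, a minimal seasonally forced susceptible-infected-recovered ODE of the kind underlying such models is sketched below; age structure, maternal antibodies, waning immunity and underreporting are omitted, and all parameter values are invented for illustration:

import numpy as np
from scipy.integrate import odeint

def sir(y, t, beta0, amp, gamma, mu):
    # SIR with births/deaths (rate mu) and sinusoidal seasonal forcing
    # of the transmission rate; time t is measured in weeks.
    s, i, r = y
    beta = beta0 * (1.0 + amp * np.cos(2.0 * np.pi * t / 52.0))
    new_inf = beta * s * i
    return [mu - new_inf - mu * s, new_inf - gamma * i - mu * i,
            gamma * i - mu * r]

t = np.arange(0.0, 8 * 52.0)                      # eight years, weekly grid
y = odeint(sir, [0.99, 0.01, 0.0], t, args=(0.6, 0.15, 0.3, 1.0 / (75 * 52)))
beta_t = 0.6 * (1.0 + 0.15 * np.cos(2.0 * np.pi * t / 52.0))
weekly_incidence = beta_t * y[:, 0] * y[:, 1]     # fraction infected per week
print("peak weekly incidence:", weekly_incidence.max())

In the Bayesian framework, the ODE solution enters the likelihood of the weekly counts, and transmission parameters such as beta0 and the seasonal amplitude are estimated rather than fixed.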