The list below is certainly not exhaustive, and may not even be representative.
Our goal is to give readers a sense of the current use of Bayesian
analysis in genetics.
J.S. SINSHEIMER, J.A. LAKE, AND R.J.A. LITTLE (1996).
Bayesian Hypothesis Testing of Four-Taxon Topologies Using Molecular
Sequence Data.
Biometrics 52, pp193-210.
Contact: Roderick J. A. Little, University of
Michigan, <rlittle@umich.edu>.
Bayesian analysis is used to test three hypotheses concerning the correct
topology from the available DNA sequences. Classical hypothesis testing is
reported to be difficult because of the multiple alternative hypotheses
involved, whereas Bayesian analysis is conceptually straightforward.
Multinomial and Multivariate normal sampling models are used with uniform
priors on the parameters to obtain the posterior probabilities of the
hypotheses. Using a large simulation study to assess the frequentist
properties of the Bayesian tests, the authors conclude that Bayesian tests
are well calibrated and have reasonable discriminating power
for a wide range of realistic conditions.
K. SJÖLANDER, K. KARPLUS, M. BROWN, R. HUGHEY, A. KROGH, I. S. MIAN
AND D. HAUSSLER (1996).
Dirichlet mixtures: a method for improved detection of weak but
significant protein sequence homology.
Computer Applications in the Biosciences 12, pp327-345.
Contact: Kimmen Sjölander, University of California, Santa
Cruz, <kimmen@cse.ucsc.edu>.
Bayesian methods are used to find remote homologs with low primary
sequence identity. The authors use a Dirichlet mixture prior for amino acid
distributions. This prior is estimated by maximum likelihood from observed
counts of amino acids in clusters, obtained from multiple alignment
databases. The observed amino acid frequencies are then
used to obtain posterior estimates of amino acid probabilities at each
position in a profile.
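
The posterior estimate described above can be sketched in a few lines. This is a minimal illustration, assuming the standard posterior mean under a mixture of Dirichlet densities: each component is weighted by its posterior probability given the observed counts, and the per-component estimates (count plus pseudocount, normalized) are averaged. The function names, the three-letter alphabet, and all numeric values are hypothetical and are not taken from the paper.

```python
from math import exp, lgamma, log

def log_beta(alpha):
    """Log of the multivariate Beta function, the Dirichlet normalizer."""
    return sum(lgamma(a) for a in alpha) - lgamma(sum(alpha))

def dirichlet_mixture_posterior(counts, weights, alphas):
    """Posterior mean probabilities under a Dirichlet mixture prior.

    counts  : observed counts per letter at one profile position
    weights : mixture coefficients (sum to 1)
    alphas  : one Dirichlet parameter vector per mixture component
    """
    # log P(component j | counts), up to an additive constant:
    # log q_j + log B(alpha_j + counts) - log B(alpha_j)
    log_post = [
        log(q)
        + log_beta([a + n for a, n in zip(alpha, counts)])
        - log_beta(alpha)
        for q, alpha in zip(weights, alphas)
    ]
    top = max(log_post)                      # subtract max for stability
    resp = [exp(lp - top) for lp in log_post]
    norm = sum(resp)
    resp = [r / norm for r in resp]

    total = sum(counts)
    probs = [0.0] * len(counts)
    for r, alpha in zip(resp, alphas):
        denom = total + sum(alpha)
        for i, (a, n) in enumerate(zip(alpha, counts)):
            probs[i] += r * (n + a) / denom  # component-wise posterior mean
    return probs

# Hypothetical toy alphabet of three letters and two mixture components.
estimate = dirichlet_mixture_posterior(
    counts=[3, 0, 1],
    weights=[0.5, 0.5],
    alphas=[[1.0, 1.0, 1.0], [10.0, 0.1, 0.1]],
)
```

Letters with zero observed counts still receive positive probability, which is the property that helps with weak homologs and sparse alignments.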
I. HOESCHELE, P. UIMARI, F. E. GRIGNOLA, Q. ZHANG, K.M. GAGE (1997).
Advances in Statistical Methods to Map Quantitative Trait Loci in Outbred
Populations.
Genetics 147, pp1445-1457.
Contact : Ina Hoeschele, Virginia Polytechnic Institute
and State University, <ina@vt.edu>.
Six different statistical methods used for gene mapping, including maximum
likelihood and exact and approximate Bayesian methods, are reviewed. The
authors comment that Bayesian analysis takes full account of the uncertainty
associated with all unknowns in the problem, and allows fitting of different
models of quantitative trait loci variation. References to many other related
works using Bayesian methods are also given.
P. UIMARI AND I. HOESCHELE (1997).
Mapping-Linked Quantitative Trait Loci Using Bayesian Analysis and
Markov Chain Monte Carlo Algorithms.
Genetics 146, pp735-743.
Contact: Ina Hoeschele, Virginia Polytechnic Institute
and State University, <ina@vt.edu>.
A Bayesian analysis for mapping linked quantitative trait loci (QTL) using
multiple linked genetic markers is given. This approach was motivated by
evidence that least-squares analysis can detect a single "ghost QTL" when
in fact two linked QTL are segregating. Here, the authors extend existing
Bayesian linkage analysis to fit models that allow multiple linked markers;
specifically, models with zero, one, or two QTL linked to the markers. Model
selection among these linkage models is evaluated using data simulated under
four different designs with map positions and effects. Three different MCMC
algorithms are used to fit a mixed-effects model to each data set. These
algorithms use different methods of fitting: an indicator variable in the
model, a variable selection approach, and reversible jump MCMC. All three
MCMC methods are found to do well, and detailed comparisons of the methods
are given. The authors conclude that it is feasible to fit linked QTL
simultaneously using Bayesian analysis, and that it provides estimates of
all genetic parameters and can fit alternative QTL models.
R. L. DUNBRACK, JR. AND F. E. COHEN (1997).
Bayesian statistical analysis of protein side-chain rotamer preferences.
Protein Science, 6, pp1661-1681.
Contact: Roland L. Dunbrack, Jr., Institute for Cancer Research,
Philadelphia, <rl_dunbrack@fccc.edu>.
A Bayesian analysis is used to account for the varying amounts of information
in the Protein Data Bank for backbone-dependent rotamer distributions, and to
obtain more complete estimates of these distributions. In addition, Bayesian
analysis is used to provide better estimates of the probability of occurrence
of other rare rotamers. Multinomial models and Dirichlet priors are used.
Parameters of the prior distribution are derived from previous data or from
pooling some of the present data. Model checking is done using a Bayesian
version of the p-value, calculated by simulating both parameters and data.
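
As a rough illustration of the two ingredients named above, the sketch below computes the posterior mean of a multinomial model with a Dirichlet prior, and a posterior predictive p-value obtained by simulating both parameters and replicated data. The chi-square-style discrepancy, the function names, and the three-rotamer counts are assumptions chosen for illustration; the paper's actual discrepancy measure and data are not reproduced here.

```python
import random

def posterior_mean(counts, prior_alpha):
    """Posterior mean of multinomial probabilities under a Dirichlet prior."""
    total = sum(counts) + sum(prior_alpha)
    return [(c + a) / total for c, a in zip(counts, prior_alpha)]

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    norm = sum(draws)
    return [d / norm for d in draws]

def discrepancy(counts, theta):
    """Chi-square-style distance between counts and their expectation."""
    n = sum(counts)
    return sum((c - n * t) ** 2 / (n * t) for c, t in zip(counts, theta))

def posterior_predictive_pvalue(counts, prior_alpha, n_sims=2000, seed=0):
    """Fraction of simulations where replicated data fit worse than observed."""
    rng = random.Random(seed)
    post_alpha = [a + c for a, c in zip(prior_alpha, counts)]
    n, k = sum(counts), len(counts)
    exceed = 0
    for _ in range(n_sims):
        theta = sample_dirichlet(post_alpha, rng)       # simulate parameters
        rep = [0] * k
        for idx in rng.choices(range(k), weights=theta, k=n):
            rep[idx] += 1                               # simulate data
        if discrepancy(rep, theta) >= discrepancy(counts, theta):
            exceed += 1
    return exceed / n_sims

# Hypothetical counts for three rotamers, one of them unobserved.
counts = [0, 2, 8]
prior = [1.0, 1.0, 1.0]
means = posterior_mean(counts, prior)
p_value = posterior_predictive_pvalue(counts, prior)
```

The unobserved rotamer receives a small positive posterior probability instead of zero, which is how the prior supplies information where the data are sparse.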
G. PARMIGIANI, D. A. BERRY AND O. AGUILAR (1998).
Determining Carrier Probabilities for Breast Cancer Susceptibility
Genes BRCA1 and BRCA2.
American Journal of Human Genetics, 62, pp145-158.
Contact: Giovanni Parmigiani, Duke University,
<gp@stat.duke.edu>.
Breast cancer susceptibility genes BRCA1 and BRCA2 have
recently been identified on the human genome. Women who carry a mutation of
one of these genes have a greatly increased chance of developing breast and
ovarian cancer, and they usually develop the disease at a much younger age,
compared with normal individuals. Women can be tested to see whether they
are carriers. A woman who undergoes genetic counseling before testing can be
told the probabilities that she is a carrier, given her family history. In
this paper we develop a model for evaluating the probabilities that a woman
is a carrier of a mutation of BRCA1 and BRCA2, on the basis of her family
history of breast and ovarian cancer in first- and second-degree relatives.
Of special importance are the relationships of the family members with
cancer, the ages at onset of the diseases, and the ages of family members
who do not have the diseases. This information can be elicited during
genetic counseling and prior to genetic testing. The carrier probabilities
are obtained from Bayes's rule, by use of family history as the evidence and
by use of the mutation prevalences as the prior distribution. In addressing
an individual's carrier probabilities, we incorporate uncertainty about some
of the key inputs of the model, such as the age-specific incidence of
diseases and the overall prevalence of mutations. There is some evidence
that other, undiscovered genes may be important in explaining familial
breast cancer. Users of the current version of the model should be aware of
this limitation. The methodology that we describe can be extended to more
than two genes, should data become available about other genes.
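
The Bayes's rule step in this abstract can be illustrated with a deliberately simplified sketch. It assumes three mutually exclusive hypotheses (noncarrier, BRCA1 carrier, BRCA2 carrier) and treats the likelihood of the observed family history under each hypothesis as a given number. All hypothesis labels and numeric values below are hypothetical placeholders, not estimates from the paper, whose model works with full pedigrees and age-specific incidence.

```python
def posterior_probabilities(priors, likelihoods):
    """Apply Bayes's rule over a finite set of hypotheses.

    priors      : dict mapping hypothesis -> prior probability
    likelihoods : dict mapping hypothesis -> P(evidence | hypothesis)
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())          # marginal probability of evidence
    return {h: value / total for h, value in joint.items()}

# Hypothetical mutation prevalences (prior) and family-history likelihoods.
priors = {"noncarrier": 0.996, "BRCA1": 0.002, "BRCA2": 0.002}
likelihoods = {"noncarrier": 0.01, "BRCA1": 0.30, "BRCA2": 0.20}

carrier_probs = posterior_probabilities(priors, likelihoods)
```

Even with a small prior prevalence, a family history that is much more likely under a carrier hypothesis shifts noticeable posterior mass toward it.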
G. M. PETERSEN, G. PARMIGIANI AND D. THOMAS (1998).
Missense Mutations in Disease Genes: A Bayesian Approach to
Evaluate Causality.
American Journal of Human Genetics, 62, pp1516-1524.
Contact: Gloria M. Petersen, Johns Hopkins University,
<gpeterse@jhsph.edu>.
The problem of interpreting missense mutations of disease-causing genes
is an increasingly important one. Because these point mutations result in
alteration of only a single amino acid of the protein product, it is often
unclear whether this change alone is sufficient to cause disease. We propose
a Bayesian approach that utilizes genetic information on affected relatives
in families ascertained through known missense-mutation carriers. This
method is useful in evaluating known disease genes for common disease
phenotypes, such as breast cancer or colorectal cancer. The posterior
probability that a missense mutation is disease causing is conditioned on
the relationship of the relatives to the proband, the population frequency
of the mutation, and the phenocopy rate of the disease. The approach is
demonstrated in two cancer data sets: BRCA1 R841W and APC I1307K. In both
examples, this method helps establish that these mutations are likely to be
disease causing, with Bayes factors in favor of causality of 5.09 and 66.97,
respectively, and posterior probabilities of .836 and .985. We also develop
a simple approximation for rare alleles and consider the case of unknown
penetrance and allele frequency.
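
The relation between the Bayes factors and posterior probabilities quoted above can be checked with a short calculation. The sketch assumes a prior probability of causality of 0.5 (prior odds of 1); that prior is an assumption not stated in the abstract, chosen because it reproduces the reported posterior probabilities. The function name is hypothetical.

```python
def posterior_from_bayes_factor(bayes_factor, prior_prob):
    """Convert a Bayes factor and a prior probability to a posterior probability."""
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = bayes_factor * prior_odds   # odds form of Bayes's rule
    return posterior_odds / (1.0 + posterior_odds)

# Assumed prior probability of 0.5 for both mutations.
print(round(posterior_from_bayes_factor(5.09, 0.5), 3))   # 0.836 (BRCA1 R841W)
print(round(posterior_from_bayes_factor(66.97, 0.5), 3))  # 0.985 (APC I1307K)
```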
J. S. LIU AND C. E. LAWRENCE (1999).
Bayesian Inference on biopolymer models.
Bioinformatics, 15, pp38-52.
Contact: Jun S. Liu, Stanford University,
<jliu@stat.stanford.edu>.
This article introduces Bayesian methods and their use to researchers in
bioinformatics. The article gives a tutorial introduction to Bayesian methods
using an example involving data from tossing two different coins. This example
is then extended to illustrate applications in bioinformatics using
two specific examples: sequence segmentation and global sequence alignment.
The authors state that the need for setting parameter values has been the
subject of much discussion, and that a distinct advantage of the
Bayesian method is the added modeling flexibility in the specification of
parameters. The authors comment that computational techniques with a rich
history in bioinformatics, such as dynamic programming recursions, can be
modified to complete the high-dimensional computations required by Bayesian
methods, and that through the use of these recursions, the full power of the
Bayesian methodology can be brought to bear on a wide range of problems
previously addressed by dynamic programming.