Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
R. Durbin | Manufacturer: Cambridge University Press | ProductGroup: Book | Binding: Paperback | ASIN: 0521629713
Book Description
Probabilistic models are becoming increasingly important in analyzing the huge amount of data being produced by large-scale DNA-sequencing efforts such as the Human Genome Project. For example, hidden Markov models are used for analyzing biological sequences, linguistic-grammar-based probabilistic models for identifying RNA secondary structure, and probabilistic evolutionary models for inferring phylogenies of sequences from different organisms. This book gives a unified, up-to-date and self-contained account, with a Bayesian slant, of such methods and, more generally, of probabilistic methods of sequence analysis. Written by an interdisciplinary team of authors, it is accessible to molecular biologists, computer scientists, and mathematicians with no formal knowledge of the other fields, and at the same time presents the state of the art in this new and important field.
Customer Reviews:
Great reference (2007-09-06)
One of the best available (2007-08-17)
Biological Sequence Analysis (2006-03-07)
Truly an Excellent Book (2006-02-18)
Excellent book ... a little boring to read (2005-09-30)
Mathematics of Genome Analysis
Jerome K. Percus | Manufacturer: Cambridge University Press | ProductGroup: Book | Binding: Paperback | ASIN: 0521585260
Book Description
The massive research effort known as the Human Genome Project is an attempt to record the sequence of the three billion nucleotides that make up the human genome and to identify individual genes within this sequence. The description and classification of sequences is heavily dependent on mathematical and statistical models. This short textbook presents a brief description of several ways in which mathematics and statistics are being used in genome analysis and sequencing.
Customer Reviews:
Narrow and shallow (2003-10-10)
This brief book does not deliver on the title's promise. It provides a cursory introduction to the assembly problem. That intro is so brief, however, that I don't think a reader will come away understanding what genome assembly is really about.
It continues with a disappointing analysis of nucleotide frequencies. The probability analysis is competent enough, within its limits, but I don't see any mention of why the analysis is interesting, or how to extend the same techniques to proteins. The author proposes spectral analysis as a tool, and argues for Walsh vectors as basis functions. Spectral analysis is offbeat, to say the least, but the author does not explain what (if any) biological insight the technique generates. More mainstream tools, including Markov models, get little or no mention.
The chapter on sequence comparison is so short, and skips so much critical material, that I'm tempted to call it negligent.
Perhaps you have specific reason for wanting the narrow and idiosyncratic view that Percus brings. ...
Short but helpful (2002-01-02)
The first section is a brief overview of the structure of DNA, mRNA, and tRNA. Recognizing that DNA is too large for direct analysis, the author discusses restriction fragments in the second section, with emphasis on the restriction-enzyme fingerprint. The author's goal is to find the probability of occurrences of a 6-letter word in a strand and the mean distance between occurrences of this word (assuming no overlap between the words or the occurrences, and equal probabilities for the bases). The effect of successive pair correlation (Markov chain effect) is considered briefly. This is followed by a calculation of the probability that a base pair is contained in a given clone. The author omits any discussion of algorithms for optical mapping, but does give a brief discussion of restriction maps.
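Under the simplifying assumptions the review mentions (independent bases, each with probability 1/4, and overlaps ignored), the calculation reduces to simple arithmetic: a given 6-letter word, such as a six-base restriction site, starts at any particular position with probability (1/4)^6 = 1/4096, so occurrences are on average about 4096 bases apart. The Python sketch below is our own illustration, not code from the book; the function names are hypothetical.

```python
def word_probability(k: int, p_base: float = 0.25) -> float:
    """Chance that a fixed k-letter word starts at a given position (independent, equiprobable bases)."""
    return p_base ** k


def expected_occurrences(n: int, k: int, p_base: float = 0.25) -> float:
    """Expected number of occurrences in a strand of length n; there are n - k + 1 start positions."""
    return (n - k + 1) * word_probability(k, p_base)


def mean_spacing(k: int, p_base: float = 0.25) -> float:
    """Mean distance between successive occurrences when overlaps are ignored: 1 / P(word)."""
    return 1.0 / word_probability(k, p_base)


if __name__ == "__main__":
    print(word_probability(6))                  # 1/4096
    print(mean_spacing(6))                      # 4096.0
    print(expected_occurrences(1_000_000, 6))   # roughly 244 sites per megabase
```

Correlations between neighbouring bases (the Markov chain effect the review mentions) would change these numbers; the sketch deliberately keeps the equal-probability, independent-base model.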
The mathematics becomes more rigorous in chapter two, wherein the author analyzes a chain that exists as a set of cloned subchains with unknown overlap. This is the 'fingerprint assembly' problem, the object of which is to produce a physical map of the full sequence. The fingerprint of a clone is a collection of lengths of particular restriction fragments. The analysis works with 'islands', which are sequences of overlapping clones, and 'contigs', which are islands of two or more clones. The average number and size of islands are calculated assuming that the clones have equal length and an identical overlap threshold. Anchoring is also discussed as a second method for obtaining the physical map of the genome. The author then considers the problem of covering the whole sequence by first placing n markers on a genome and covering by intervals centered at these markers. This is the restriction-fragment-length polymorphism analysis, the combinatorics of which the author solves by using Laplace and Fourier transforms. He also considers adaptive and non-adaptive pooling in order to find a particular set of proteins on a large fragment.
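The review does not reproduce the formulas for the number and size of islands. As an assumption, the sketch below uses the classical Lander-Waterman expectations, which fit the setup described (N clones of equal length L on a genome of length G, with an overlap detected only when it exceeds a fraction θ of L): with coverage c = NL/G and σ = 1 - θ, the expected number of islands is N·exp(-cσ) and the expected number of clones per island is exp(cσ). Whether Percus presents exactly these expressions is not stated in the review; the function name is hypothetical.

```python
import math


def lander_waterman(n_clones: int, clone_len: float, genome_len: float, theta: float = 0.0):
    """Expected island statistics under the Lander-Waterman model (assumed here).

    n_clones   -- number of fingerprinted clones (N)
    clone_len  -- length of every clone (L), same units as genome_len
    genome_len -- genome length (G)
    theta      -- fraction of L two clones must share for the overlap to be detected
    Returns (coverage c, expected number of islands, expected clones per island).
    """
    c = n_clones * clone_len / genome_len   # redundancy: how many times the genome is covered
    sigma = 1.0 - theta                      # a clone starting within sigma*L of the previous one is linked to it
    islands = n_clones * math.exp(-c * sigma)
    clones_per_island = math.exp(c * sigma)
    return c, islands, clones_per_island


if __name__ == "__main__":
    # Hypothetical project: 5000 clones of 40 kb on a 100 Mb genome, 25% overlap needed.
    c, islands, per_island = lander_waterman(5000, 40_000, 100_000_000, theta=0.25)
    print(f"coverage {c:.1f}x, ~{islands:.0f} islands, ~{per_island:.2f} clones each")
```

Raising the overlap threshold θ shrinks σ and therefore increases the number of islands, which is why fingerprinting projects need more coverage than the idealized θ = 0 case suggests.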
The third chapter addresses sequence statistics, in particular the nonhomogeneity of sequences and the correlations between bases. The chi-square test is discussed in some detail, and the author examines the accuracy of the Markov chain assumption. Noting that very long chains would be needed to determine the parameters in the expressions for the conditional correlations, he uses the maximum likelihood method to find the intrinsic correlation length, and then estimates the parameters by modeling the parameter set.
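A minimal sketch of the kind of chi-square check the review mentions, assuming the simplest version: Pearson's statistic on the 4×4 table of adjacent base pairs, testing whether each base is independent of its predecessor (a zero-order model) against a first-order Markov chain. This is our illustration, not the book's procedure; the function name is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_chi_square(seq: str) -> tuple[float, int]:
    """Pearson chi-square for independence of adjacent bases.

    Returns (statistic, degrees of freedom). Cells whose expected count is zero
    (a base that never occurs) are skipped, and the degrees of freedom shrink to match.
    """
    pairs = Counter(zip(seq, seq[1:]))      # observed counts of (base, next base)
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("need at least two bases")
    first = Counter()                        # row totals: base at position i
    second = Counter()                       # column totals: base at position i + 1
    for (a, b), n in pairs.items():
        first[a] += n
        second[b] += n
    stat = 0.0
    for a, b in product(BASES, BASES):
        expected = first[a] * second[b] / total   # count expected if neighbours were independent
        if expected > 0:
            stat += (pairs[(a, b)] - expected) ** 2 / expected
    rows = sum(1 for a in BASES if first[a])
    cols = sum(1 for b in BASES if second[b])
    return stat, (rows - 1) * (cols - 1)


if __name__ == "__main__":
    print(dinucleotide_chi_square("AC" * 50))          # strongly dependent neighbours
    print(dinucleotide_chi_square("ACGTTGCAAGCTTCGA"))  # a short mixed sequence
```

A statistic that is large relative to its degrees of freedom argues against independence; if SciPy is available, `scipy.stats.chi2.sf(stat, df)` converts it to a p-value.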
The author then studies the isochore regions and discusses their detection via the Jensen-Shannon entropy. Asking whether there are correlations between and within these long regions motivates him to consider the long-range properties of DNA. This leads to the examination of a long fragment of a single strand of DNA: assuming that strand symmetry holds, he studies the correlation coefficients and discusses the decay properties of the auto- and cross-correlations. Then, distinguishing only dual pairs, the author considers the probability that a pair is separated by an integer after an integral number of steps, a calculation that reduces to finding the largest eigenvalue of a 'transfer matrix', a procedure well known in statistical physics.
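The review names the Jensen-Shannon entropy as the tool for detecting isochores without giving details. A common recipe, assumed here, scans every split point of a sequence and keeps the one that maximizes the length-weighted Jensen-Shannon divergence between the base compositions on either side; applied recursively, this segments a chromosome into compositionally homogeneous domains. The sketch below is illustrative only, and the function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def entropy(counts) -> float:
    """Shannon entropy (bits) of a composition given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def jensen_shannon(left: str, right: str) -> float:
    """Length-weighted Jensen-Shannon divergence (bits) between two segments' base compositions."""
    cl, cr = Counter(left), Counter(right)
    n = len(left) + len(right)
    h_all = entropy([cl[a] + cr[a] for a in ALPHABET])
    h_left = entropy([cl[a] for a in ALPHABET])
    h_right = entropy([cr[a] for a in ALPHABET])
    # entropy of the merged composition minus the weighted entropies of the parts
    return h_all - len(left) / n * h_left - len(right) / n * h_right


def best_split(seq: str, min_len: int = 1) -> tuple[int, float]:
    """Split point (and divergence) maximizing the Jensen-Shannon divergence."""
    best_i, best_d = min_len, -1.0
    for i in range(min_len, len(seq) - min_len + 1):
        d = jensen_shannon(seq[:i], seq[i:])
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


if __name__ == "__main__":
    # A toy "AT-rich then GC-rich" sequence; the best split should fall at the boundary.
    print(best_split("ATTATAATTA" + "GCGGCCGCGC"))
```

This naive version recounts both halves at every split, so it is quadratic in sequence length, and a real segmentation would also test the maximum for statistical significance before accepting a split.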
Next, a consideration of simple sequence repeats leads to a difference equation that is solved by the method of moments. Windows of bases are then discussed in order to improve the statistics, and correlations within and between windows are calculated. Interestingly, the consideration of long-range correlations gives a power-law dependence for the correlations, which is related to the Hurst index for self-similar patterns. In this chapter readers also get their first taste of hidden Markov models, which are currently very popular in sequence analysis. Even more interesting is the discussion of walking Markov models, wherein a first-order base-to-base Markov chain is chosen to depend on a hidden parameter, and the time evolution is shown to satisfy a Fokker-Planck (diffusion) equation. Spectral analysis and information-theoretic criteria are also discussed.
In the next chapter, the author considers the most important part of sequence analysis, namely the comparison between sequences according to their linear ordering. The problem is to find the probability that two linear chains share a common subsequence of a given length. The first calculation assumes that the matches are mutually exclusive, and the result is an upper bound on the probability. The author then considers the matches to be independent events, and again bounds are given for the probability (the so-called Chen-Stein estimate). He also gives an estimate of the probability in terms of an asymptotic series. Extreme value methods are then used to calculate the expectation value and the variance of the length of the longest match. An interesting exercise asks the reader to find the effect on the Fourier and Walsh power spectra of assuming that the base correlations are fractal in form. The alignment problem is then generalized to include replication errors, mutations, etc. The chapter ends, appropriately, with a discussion of multisequence comparison. The author poses the problem as one of finding the best match of a word to an n-tuple of words, which he tackles first using 'information content'. The category analysis of separating subsequence configurations into clusters is briefly discussed via simulated annealing, discriminant analysis, Bayesian analysis, and neural networks.
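The "longest match" in the review is the longest exact run shared by two sequences. The sketch below computes it by dynamic programming and compares it with the leading-order extreme-value estimate log_{1/p}(mn), where m and n are the sequence lengths and p is the probability that two random letters agree (p = 1/4 for uniform DNA). That estimate is a standard result we assume here; the book's own expressions, including its variance calculation and lower-order corrections, are not reproduced, and the function names are hypothetical.

```python
import math
import random


def longest_common_substring(x: str, y: str) -> int:
    """Length of the longest exact (contiguous) match between x and y; O(len(x) * len(y)) time."""
    best = 0
    prev = [0] * (len(y) + 1)        # prev[j]: length of the common run ending at the previous x letter and y[j-1]
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the diagonal run
                best = max(best, cur[j])
        prev = cur
    return best


def expected_longest_match(m: int, n: int, probs=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Leading-order estimate log_{1/p}(m*n), with p the chance that two random letters agree."""
    p = sum(q * q for q in probs)
    return math.log(m * n) / math.log(1.0 / p)


if __name__ == "__main__":
    random.seed(0)
    a = "".join(random.choice("ACGT") for _ in range(500))
    b = "".join(random.choice("ACGT") for _ in range(500))
    print(longest_common_substring(a, b), round(expected_longest_match(500, 500), 2))
```

Because the estimate keeps only the leading term, individual random pairs will scatter around it; the extreme-value analysis the review describes is what quantifies that spread.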
The last chapter is a short introduction to the biophysics of DNA. The Hamiltonian for the dynamics of DNA is given, thermal equilibrium is assumed, and the partition function is calculated. This is followed by a discussion of the dynamics at low temperature, when the energy is supplied by RNA polymerase instead of the heat bath, and the equations of motion are solved via the Lagrangian using Bessel functions.
Mathematical and Statistical Methods for Genetic Analysis
Kenneth Lange | Manufacturer: Springer | ProductGroup: Book | Binding: Hardcover | ASIN: 0387953892
Book Description
During the past decade, geneticists have cloned scores of Mendelian disease genes and constructed a rough draft of the entire human genome. The unprecedented insights into human disease and evolution offered by mapping, cloning, and sequencing will transform medicine and agriculture. This revolution depends vitally on the contributions of applied mathematicians, statisticians, and computer scientists. Mathematical and Statistical Methods for Genetic Analysis is written to equip students in the mathematical sciences to understand and model the epidemiological and experimental data encountered in genetics research. Mathematical, statistical, and computational principles relevant to this task are developed hand in hand with applications to population genetics, gene mapping, risk prediction, testing of epidemiological hypotheses, molecular evolution, and DNA sequence analysis. Many specialized topics are covered that are currently accessible only in journal articles. This second edition expands the original edition by over 100 pages and includes new material on DNA sequence analysis, diffusion processes, binding domain identification, Bayesian estimation of haplotype frequencies, case-control association studies, the gamete competition model, QTL mapping and factor analysis, the Lander-Green-Kruglyak algorithm of pedigree analysis, and codon and rate variation models in molecular phylogeny. Sprinkled throughout the chapters are many new problems. Kenneth Lange is Professor of Biomathematics and Human Genetics at the UCLA School of Medicine. At various times during his career, he has held appointments at the University of New Hampshire, MIT, Harvard, and the University of Michigan. While at the University of Michigan, he was the Pharmacia & Upjohn Foundation Professor of Biostatistics. His research interests include human genetics, population modeling, biomedical imaging, computational statistics, and applied stochastic processes. 
Springer-Verlag published his book Numerical Analysis for Statisticians in 1999.
Customer Reviews:
Read this book and you might learn something (2004-05-21)
OK, but not for me (2003-09-24)
It seems to be a pretty good presentation of population genetics, the kind of genetics taught in high schools in the 70s. I can't comment on this book's merits, but I can warn the biochem types to spend their money elsewhere.
nice introduction to math. and stoch. models in genetics (2002-07-09)
It is a little disappointing that it does not go into the microarray technology that has become so important for experimentation in the last few years. Other recent books that cover statistical aspects of genetic research are Weir (1996) "Genetic Data Analysis II" Sinauer Associates (publisher) and Yang (2000) "Introduction to Statistical Methods in Modern Genetics" Gordon and Breach Science Publishers.
Mathematical Details of Genetics (2001-03-01)
Mathematical details about genetics (2001-02-27)
Probability Models for DNA Sequence Evolution
Rick Durrett | Manufacturer: Springer | ProductGroup: Book | Binding: Hardcover | ASIN: 038795435X
Book Description
Our basic question is: Given a collection of DNA sequences, what underlying forces are responsible for the observed patterns of variability? To approach this question we introduce and analyze a number of probability models: the Wright-Fisher model, the coalescent, the infinite alleles model, and the infinite sites model. We study the complications that come from nonconstant population size, recombination, population subdivision, and three forms of natural selection: directional selection, balancing selection, and background selection. These theoretical results set the stage for the investigation of various statistical tests to detect departures from "neutral evolution." The final chapter studies the evolution of whole genomes by chromosomal inversions, reciprocal translocations, and genome duplication. Throughout the book, the theory is developed in close connection with data from more than 60 experimental studies from the biology literature that illustrate the use of these results. This book is written for mathematicians and for biologists alike. We assume no previous knowledge of concepts from biology and only a basic knowledge of probability: a one semester undergraduate course and some familiarity with Markov chains and Poisson processes.
Rick Durrett received his Ph.D. in operations research from Stanford University in 1976. He taught in the UCLA mathematics department before coming to Cornell in 1985. He is the author of six books and 125 research papers, and is the academic father of more than 30 Ph.D. students. His current interests are the use of probability models in genetics and ecology, and decreasing the mean and variance of his golf score.
Customer Reviews:
complicated genetics explained through probability (2002-08-30)
Rick has been an academic his entire career and has made major contributions in applied probability. He also explains things in very intuitive ways and has thus been able to publish a number of successful books on various aspects of probability theory and stochastic processes. In this book Rick presents a variety of probability models to explain the evolution of DNA sequences that are found in humans. This represents an interesting and important area of research that has tremendous impact on medical treatments, pharmaceuticals and genetic engineering. Most of all it has the rigorous touch that Rick always gives to his work.
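For readers who want a concrete taste of the coalescent and infinite sites models listed in the description above, here are two standard results (our summary, not code from the book): with n sampled sequences, the expected time back to their most recent common ancestor is 2(1 - 1/n) in units of 2N generations for a diploid population of size N, and the expected number of segregating sites is θ(1 + 1/2 + ... + 1/(n-1)), where θ = 4Nμ. Inverting the second gives Watterson's estimator of θ. The function names are hypothetical.

```python
from fractions import Fraction


def harmonic(k: int) -> Fraction:
    """1 + 1/2 + ... + 1/k as an exact fraction."""
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))


def expected_tmrca(n: int) -> Fraction:
    """E[time to most recent common ancestor] of n lineages, in units of 2N generations.

    While k lineages remain, the waiting time is exponential with rate k(k-1)/2,
    so the total expectation is sum_{k=2}^{n} 2/(k(k-1)) = 2(1 - 1/n).
    """
    return 2 * (1 - Fraction(1, n))


def expected_segregating_sites(n: int, theta: float) -> float:
    """Infinite sites model: E[S] = theta * (1 + 1/2 + ... + 1/(n-1)), with theta = 4N*mu."""
    return theta * float(harmonic(n - 1))


def watterson_theta(segregating_sites: int, n: int) -> float:
    """Watterson's estimator: observed S divided by 1 + 1/2 + ... + 1/(n-1)."""
    return segregating_sites / float(harmonic(n - 1))


if __name__ == "__main__":
    print(expected_tmrca(10))                    # 9/5
    print(expected_segregating_sites(10, 5.0))
    print(watterson_theta(20, 10))
```

Note how slowly the harmonic sum grows: doubling the sample size adds only about log 2 to the expected number of segregating sites, a point that matters when designing sequencing surveys.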
Probability Models and Statistical Methods in Genetics (Wiley Series in Probability & Mathematical Statistics)
Regina C. Elandt-Johnson | Manufacturer: John Wiley & Sons Inc | ProductGroup: Book | Binding: Hardcover | ASIN: 0471234907
Modern Developments in Theoretical Population Genetics
Montgomery Slatkin (Ed.) | Manufacturer: Oxford University Press | ProductGroup: Book | Binding: Paperback | ASIN: 0198599633
Book Description
Slatkin and Veuille invite leading population geneticists to summarise many of the recent developments in population genetics theory and its application to genetic data. The book has been assembled in honour of the late Gustave Malecot, one of the pioneers of theoretical population genetics. Whilst early chapters summarise Malecot's life and scientific contributions, the rest of the book is devoted to topics that trace their origin to Malecot's work. Several of the contributions describe recent developments in coalescent theory, which can be viewed as a generalisation of Malecot's method for analysing identity by descent. Other chapters discuss recent developments in the study of geographic variation, genetic linkage, and allele age. The diversity of topics and the effectiveness with which various theoretical methods can be applied to DNA sequence data illustrate both the increasing relevance of theoretical population genetics and the depth of Malecot's insight into fundamental genetic processes. This exciting work will be of interest to population and statistical geneticists as well as a wider audience of evolutionary biologists.
Computational and Statistical Approaches to Genomics
Manufacturer: Springer | ProductGroup: Book | Binding: Hardcover | ASIN: 1402070233
Book Description
At the beginning of the post-sequencing era, biology must now work with the enormous amounts of quantitative data being amassed and must render complex problems in mathematical terms, with all of the computational effort that entails. This phenomenon is perhaps best exemplified by the interdisciplinary scientific activity caused by the advent of high-throughput cDNA microarray technology, which facilitates large-scale surveys of gene expression. Biologists must now work together with engineers, statisticians, computer scientists, and other specialists, in order to attain a holistic understanding of the complex relationship between genes within the genome and uncover genetic function and regulation.
Computational and Statistical Genomics aims to help researchers deal with current genomic challenges. Topics covered include:
This book is for any researcher, in academia and industry, in biology, computer science, statistics, or engineering, involved in genomic problems. It could also be used as an advanced level textbook in a course focusing on genomic signals, information processing, or genome biology.
Customer Reviews:
Review from the journal Cancer Biology and Therapy (2002-07-12)
Reviewed by Michael N. Liebman
Experimental Design, Statistical Models, and Genetic Statistics (Statistics, a Series of Textbooks and Monographs)
Hinkelmann | Manufacturer: CRC | ProductGroup: Book | Binding: Hardcover | ASIN: 0824771516
Hidden Markov Models for Bioinformatics (Computational Biology)
T. Koski | Manufacturer: Springer | ProductGroup: Book | Binding: Hardcover | ASIN: 1402001355
Book Description
The purpose of this book is to give a thorough and systematic introduction to probabilistic modeling in bioinformatics. The book contains a mathematically strict and extensive presentation of the kind of probabilistic models that have turned out to be useful in genome analysis. Questions of parametric inference, selection between model families, and various architectures are treated. Several examples are given of known architectures (e.g., profile HMM) used in genome analysis.
Customer Reviews:
Written by a mathematician for mathematicians (2004-03-11)
I wanted a book with a mathematical sophistication similar to Durbin's book, but this book is way more than that. On the other hand, I showed this book to a mathematics graduate student and she said it is perfect for her. So I guess this book is written by a mathematician only for mathematicians.
Good material, but you really have to want it (2003-10-10)
This additional depth of coverage may go beyond many readers' needs. It is very helpful, though, for people who need more than the usual algorithms. By giving the background in such detail, a persistent reader can follow to a certain point, then create modifications with a clear idea of where the new algorithm actually comes from.
Regarding the current practice of HMM usage, I found it a bit thin. Widely-known tools based on HMMs are mentioned only occasionally and in passing, and HMM-based alignment is discussed only briefly. Well, this book isn't for the tool user. Perhaps more important, I found scant mention of scoring with respect to some background probability model ("null" model, as it's called here).
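As a concrete picture of the scoring against a background ("null") model that the reviewer found underplayed, here is a minimal log-odds sketch: each position of a weight matrix (profile) gives the probability of each base, and a sequence is scored by summing log2(profile probability / background probability). Positive scores mean the profile explains the sequence better than the null model. The three-column profile and uniform background are invented for illustration, and this is not code from the book.

```python
import math


def column(consensus: str, p_match: float = 0.7) -> dict[str, float]:
    """One hypothetical profile column: p_match on the consensus base, the rest split evenly."""
    other = (1.0 - p_match) / 3
    return {b: (p_match if b == consensus else other) for b in "ACGT"}


PROFILE = [column("T"), column("A"), column("T")]   # made-up consensus T-A-T
BACKGROUND = {b: 0.25 for b in "ACGT"}              # i.i.d. uniform null model


def log_odds_score(seq: str, profile=PROFILE, background=BACKGROUND) -> float:
    """Sum over positions of log2(P_profile(base) / P_null(base)), in bits."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(math.log2(col[base] / background[base]) for col, base in zip(profile, seq))


if __name__ == "__main__":
    for s in ("TAT", "TAG", "GCG"):
        print(s, round(log_odds_score(s), 2))
```

The same log-odds idea carries over to HMMs, where the profile probability is replaced by the model's sequence probability and the null model is typically a simple composition or Markov chain.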
My one real complaint, and this is truly minor, is the quality of illustration. The line-drawings look like Word pictures - not necessarily a bad thing, if done well. These aren't particularly professional-looking, though, and oddly stretched or squashed in many cases. Still, they're readable enough and make all the needed points.
A lesser point, and not the author's fault, is the editorial implication that this book introduces probabilistic models in general. It does not. This is strictly about HMMs, not Bayesian nets, bootstrap techniques, or any of the dozens of other probabilistic models used in bioinformatics. That is not a flaw of the book, just a flaw in how it's represented.
If you are dedicated to becoming an expert in HMM construction and application, you must have this book. It's a bit much, though, for people who just want the results that HMMs give.
Primarily for bio-mathematicians (2003-07-01)
Some of the highlights of the book include:
1. An overview of the probability theory to be used in the book. The material is fairly standard, including a review of continuous and discrete random variables from the measure-theoretic point of view, i.e. the author introduces them via a probability space, that is, a set with its sigma field and a probability measure on this field. The weight matrix, or "profile" as it is sometimes called, is defined; it has many applications in bioinformatics. Bayesian learning is also discussed, and the author introduces what he calls the "missing information principle", which is fundamental to the probabilistic modeling of biological sequences. Applications of probability theory to DNA analysis are discussed, including shotgun assembly and the distribution of fragment lengths from restriction digests. A collection of interesting exercises is included at the end of the chapter, particularly the one on the null model for pairwise alignments.
2. An introduction to information theory and the relative entropy or "Kullback distance", the latter of which is used to learn sequence models from data. The author defines the mutual information between two random variables and the entropy, and calculates the latter for random DNA. He also proves some of the Shannon source coding theorems, one being the convergence to the entropy for independent, identically distributed random variables. The Kullback distance is then defined as a distance between probability distributions, with the caution that it is not a metric because it is not symmetric.
3. The overview of probabilistic learning theory, where 'learning from data' is defined as the process of inferring a general principle from observations of instances.
4. The very detailed treatment of the EM algorithm, including the discussion of a model for fragments with motifs.
5. The discussion of alignment and scoring, especially that of global similarity. Local alignment is treated in the exercises.
6. The discussion of the learning of Markov chains via Bayesian modeling applied to a training sequence via a family of Markov models. Frame-dependent Markov chains are discussed in the context of Markovian models for DNA sequences.
7. The discussion of influence diagrams and nonstandard hidden Markov models, in particular the excellent diagrams drawn to illustrate their main properties, and the excellent discussion of an "HMM with duration" in the context of the functional units of a eukaryotic gene. This model is important in the available GeneMark.hmm software.
8. The treatment of motif-based HMMs, in particular the discussion of the approximate common substring problem.
9. The discussion of the "quasi-stationary" property of some chains and the connection with the "Yaglom limit".
10. The treatment of Derin's formula for the smoothing posterior probability of a standard HMM. The author shows in detail that the probability of a finite-length emitted sequence conditioned on a state sequence of the HMM depends only on a subsequence of the state sequence.
11. The treatment of the lumping of Markov chains, i.e. the question of whether a function of a Markov chain is another Markov chain.
12. The very detailed treatment of the Forward-Backward algorithm and the Viterbi algorithm.
13. The discussion of the learning problem via the quasi-log-likelihood function for HMMs.
14. The discussion of the limit points of the Baum-Welch algorithm. Since the Baum-Welch algorithm deals with iterations of a map, its convergence can be proved by finding the fixed points of this map. These fixed points are in fact the stationary points of the likelihood function and can be related to the convergence of the algorithm via the Zangwill theory of algorithms. Unfortunately the author does not give the details of the Zangwill theory, but instead delegates them to the references (via an exercise).
The Zangwill theory can be discussed in the context of nonlinear programming, with generalizations of it occurring in the field of nonlinear functional analysis. It might be interesting to investigate whether the properties of hidden Markov models, especially their rigorous statistical properties, can all be discussed in the context of nonlinear functional analysis.
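To make the Forward-Backward and Viterbi highlight concrete, here is a small sketch of the forward algorithm (the probability of a sequence, summed over all hidden state paths) and the Viterbi algorithm (the single most probable path) for a hypothetical two-state model with an AT-rich and a GC-rich state. All parameters are invented for illustration, the backward pass and Baum-Welch re-estimation are omitted, and the forward pass uses raw probabilities, which underflow on long sequences; practical code works in log space or rescales at each step. This is not code from the book.

```python
import math
from itertools import product

# Hypothetical two-state model: an AT-rich state and a GC-rich state.
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}


def forward(obs: str) -> float:
    """P(obs): sum over all hidden state paths (forward algorithm, raw probabilities)."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for x in obs[1:]:
        alpha = {t: sum(alpha[s] * TRANS[s][t] for s in STATES) * EMIT[t][x] for t in STATES}
    return sum(alpha.values())


def viterbi(obs: str) -> tuple[list[str], float]:
    """Most probable state path and its log probability (Viterbi algorithm, log space)."""
    score = {s: math.log(START[s] * EMIT[s][obs[0]]) for s in STATES}
    pointers = []                                  # best predecessor of each state at each step
    for x in obs[1:]:
        step, new_score = {}, {}
        for t in STATES:
            prev = max(STATES, key=lambda s: score[s] + math.log(TRANS[s][t]))
            step[t] = prev
            new_score[t] = score[prev] + math.log(TRANS[prev][t] * EMIT[t][x])
        pointers.append(step)
        score = new_score
    last = max(STATES, key=lambda s: score[s])
    path = [last]
    for step in reversed(pointers):                # trace back from the end
        path.append(step[path[-1]])
    return path[::-1], score[last]


if __name__ == "__main__":
    # Sanity check: probabilities of all length-3 sequences sum to 1.
    print(sum(forward("".join(p)) for p in product("ACGT", repeat=3)))
    print(viterbi("AATTGGCC")[0])
```

The two functions share the same recursion shape; forward sums over predecessor states while Viterbi takes the maximum and remembers which predecessor won, which is why Viterbi needs the extra traceback.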