Usenet.com

Sci Thread Archive from Usenet.com

Re: About the Bootstrap...

Gilles Escarguel wrote:

iii. A parametric bootstrap generates its pseudoreplicate data sets
using an explicit model of evolution abstracted from the data, rather
than resampling the actual data as in a non-parametric bootstrap. If,
for example, I wanted to get bootstrapped data sets for a set of DNA
sequences, I might first estimate a tree using some phylogenetic
analysis program, then estimate evolutionary parameters over that tree
(transition/transversion ratio, proportion of invariable sites, base
composition, etc.), and then plug those parameters and that tree into an
evolution simulation program to get a new data set. Repeating this many
times, I would obtain a set of pseudoreplicate data sets. Those pseudoreplicates
can then be used for all sorts of purposes. See this for example:

Huelsenbeck, J. P., D. M. Hillis, and R. Jones. 1996. Parametric
bootstrapping in molecular phylogenetics: Applications and performance.
Pages 19-45 in Molecular zoology: Advances, strategies, and protocols
(J. D. Ferraris and S. R. Palumbi, eds.). Wiley.
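To make the recipe above concrete, here is a minimal Python sketch of a parametric bootstrap for DNA data. It assumes the tree and its branch lengths have already been estimated by some phylogenetic program (`EXAMPLE_TREE` below is hypothetical), and it swaps in the simplest substitution model, Jukes-Cantor (JC69) plus a proportion of invariable sites `pinv`, instead of fitting a transition/transversion ratio or base composition. All function names are illustrative only.

```python
import math
import random

BASES = "ACGT"

# Hypothetical rooted tree standing in for the tree estimated from the real
# data: each internal node maps to a list of (child, branch_length) pairs,
# with branch lengths in expected substitutions per site.
EXAMPLE_TREE = {
    "root": [("n1", 0.05), ("n2", 0.05)],
    "n1": [("A", 0.1), ("B", 0.1)],
    "n2": [("C", 0.1), ("D", 0.2)],
}


def evolve_base(base, t, rng):
    """Evolve one base along a branch of length t under Jukes-Cantor (JC69)."""
    # P(no net change) after time t; otherwise the three other bases are equally likely.
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    if rng.random() < p_same:
        return base
    return rng.choice([b for b in BASES if b != base])


def simulate_alignment(tree, root, n_sites, pinv, rng):
    """Simulate one data set (leaf name -> sequence) over a fixed tree."""
    # Sites flagged as invariable never change along any branch.
    invariable = [rng.random() < pinv for _ in range(n_sites)]
    # JC69 has equal base frequencies, so root bases are drawn uniformly.
    root_seq = [rng.choice(BASES) for _ in range(n_sites)]
    leaves = {}
    stack = [(root, root_seq)]
    while stack:
        node, seq = stack.pop()
        children = tree.get(node, [])
        if not children:
            leaves[node] = "".join(seq)
            continue
        for child, t in children:
            child_seq = [b if inv else evolve_base(b, t, rng)
                         for b, inv in zip(seq, invariable)]
            stack.append((child, child_seq))
    return leaves


def parametric_bootstrap(tree, root, n_sites, pinv, n_reps, seed=None):
    """Return n_reps pseudoreplicate data sets simulated under the fitted model."""
    rng = random.Random(seed)
    return [simulate_alignment(tree, root, n_sites, pinv, rng)
            for _ in range(n_reps)]
```

For example, `parametric_bootstrap(EXAMPLE_TREE, "root", 500, 0.2, 100, seed=1)` yields 100 data sets, each of which would then be analyzed exactly as the original data were. A richer model would mainly change `evolve_base` and the root base frequencies.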


Many thanks for this. The procedure you describe is what I called solution (i); sorry, I didn't explain it very well.

That's fine with me, but then what should one call (if a specific name
exists) the Monte Carlo resampling procedure described in (ii), which is
parametric in the sense that it is based on parametric estimates of the
uncertainties of the descriptors?


I have no suggestion; I only have a vague notion of what you mean. But is it anything like this? Marshall, C. R. 1991. Statistical tests and bootstrapping: assessing the reliability of phylogenies based on distance data. Mol. Biol. Evol. 8:386-391. Or this? Krajewski, C., and A. W. Dickerman. 1990. Bootstrap analysis of phylogenetic trees derived from DNA hybridization distances. Syst. Zool. 39:383-390. One of them (I forget which) fills pseudoreplicate cells by drawing from a normal distribution with its mean at the real value and a standard deviation determined by experimental measurements. (The other picks at random from among the experimental measurements for that cell.)
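The two pseudoreplicate schemes just described for distance data can be sketched in Python as follows. This is not a reconstruction of either paper's code: the data layout (a dict from taxon pair to repeated measurements), the use of the sample mean as the "real value", and the example numbers are all assumptions made for illustration.

```python
import random
import statistics

# Hypothetical repeated measurements for each cell of a distance matrix,
# keyed by taxon pair; values are illustrative only.
MEASUREMENTS = {
    ("A", "B"): [1.9, 2.1, 2.0],
    ("A", "C"): [5.2, 4.8, 5.1],
    ("B", "C"): [4.9, 5.3, 5.0],
}


def normal_pseudoreplicate(measurements, rng):
    """Fill each cell from a normal distribution centred on the mean of that
    cell's measurements, with their sample standard deviation."""
    rep = {}
    for pair, values in measurements.items():
        mu = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        rep[pair] = rng.gauss(mu, sd)
    return rep


def empirical_pseudoreplicate(measurements, rng):
    """Fill each cell with one of that cell's measurements, chosen at random."""
    return {pair: rng.choice(values) for pair, values in measurements.items()}


def pseudoreplicates(measurements, n_reps, scheme, seed=None):
    """Generate n_reps pseudoreplicate matrices with the chosen scheme."""
    rng = random.Random(seed)
    return [scheme(measurements, rng) for _ in range(n_reps)]
```

Each pseudoreplicate matrix would then go through the same tree-building method as the observed matrix.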

Actually, in the case of biometrical and/or morphometrical data, the
procedure you describe strikes me as hard to apply, even if an explicit model
of biometrical and/or morphometrical evolution can be formulated from the data
(e.g., by taking into account the variances and covariances between
descriptors through the Mahalanobis generalized distance).
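For reference, the Mahalanobis generalized distance between two descriptor vectors x and y with covariance matrix Σ is D = sqrt((x − y)ᵀ Σ⁻¹ (x − y)). The dependency-free Python sketch below computes it through a Cholesky factorization Σ = L Lᵀ, so that D² = zᵀz where L z = x − y. The function names are hypothetical; in practice a linear-algebra library would do the same job.

```python
import math


def cholesky(cov):
    """Lower-triangular L with L @ L.T == cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = cov[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def mahalanobis(x, y, cov):
    """Mahalanobis generalized distance sqrt((x-y)^T cov^-1 (x-y))."""
    L = cholesky(cov)
    diff = [a - b for a, b in zip(x, y)]
    # Solve L z = diff by forward substitution; then D^2 = z . z.
    z = []
    for i, d in enumerate(diff):
        z.append((d - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return math.sqrt(sum(v * v for v in z))
```

With an identity covariance matrix this reduces to the ordinary Euclidean distance; correlated descriptors are down-weighted relative to it.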


If you want to use between-character covariances, you will have to write your own program. Otherwise, a number of programs can evolve characters on a tree under Brownian motion models, but all of them assume independence among traits.
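As a rough idea of what writing your own program would involve, here is a minimal Python sketch of bivariate Brownian motion on a tree in which the two traits can be correlated (correlation `rho`); with `rho = 0` it reduces to the independent-traits case. The tree format, `sd`, `rho` and the example tree are assumptions. More than two traits would need the full Cholesky factor of the trait covariance (rate) matrix instead of the 2×2 formula used here.

```python
import math
import random


def bm_step(traits, t, sd, rho, rng):
    """Change two traits along a branch of length t under bivariate Brownian
    motion with per-unit-time standard deviations sd and correlation rho."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    s = math.sqrt(t)
    # 2x2 Cholesky factor of the rate matrix turns independent normals
    # into correlated increments; rho = 0 gives independent traits.
    d1 = s * sd[0] * z1
    d2 = s * sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return (traits[0] + d1, traits[1] + d2)


def simulate_traits(tree, root, root_traits, sd, rho, seed=None):
    """Simulate tip values; tree maps node -> [(child, branch_length), ...]."""
    rng = random.Random(seed)
    tips = {}
    stack = [(root, tuple(root_traits))]
    while stack:
        node, x = stack.pop()
        children = tree.get(node, [])
        if not children:
            tips[node] = x
            continue
        for child, t in children:
            stack.append((child, bm_step(x, t, sd, rho, rng)))
    return tips


# Hypothetical tree (branch lengths in arbitrary time units).
EXAMPLE_TREE = {
    "root": [("n1", 1.0), ("C", 2.0)],
    "n1": [("A", 1.0), ("B", 1.0)],
}
```

For example, `simulate_traits(EXAMPLE_TREE, "root", (0.0, 0.0), (1.0, 2.0), 0.6, seed=1)` returns one simulated pair of trait values per tip.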

The procedure proposed in (ii) is much
easier and more straightforward, but I don't currently know of any paper
treating it on a theoretical basis. For instance, when I apply this procedure
to real data sets (I have written the programs for this) and compare the
estimated CL with CL estimates obtained by the classical nonparametric
bootstrap, the (ii) procedure appears to return significantly higher CL
estimates, and thus to be significantly less conservative than the
nonparametric bootstrap. But how should such differences be interpreted?
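As I read procedure (ii), each descriptor value is redrawn from a normal distribution centred on its estimate, with its estimated standard deviation, whereas the classical nonparametric bootstrap resamples descriptors with replacement. The Python sketch below sets the two side by side on hypothetical data. To keep it short, finding the closest pair of taxa stands in for a real tree-building step, and the CL for a grouping is taken to be the fraction of pseudoreplicates that recover it; both are simplifications, not the author's programs.

```python
import math
import random

# Hypothetical taxa x descriptors matrix (e.g. mean measurements) and the
# estimated uncertainty (standard deviation) of each descriptor value.
MEANS = {"A": [1.0, 2.0, 3.0, 4.0], "B": [1.1, 2.2, 2.9, 4.1],
         "C": [2.0, 1.0, 4.0, 3.0], "D": [2.2, 0.9, 4.3, 2.8]}
SDS = {taxon: [0.2] * 4 for taxon in MEANS}


def closest_pair(matrix):
    """Stand-in for tree building: the pair of taxa with the smallest
    Euclidean distance between their descriptor vectors."""
    taxa = sorted(matrix)
    pairs = [(a, b) for i, a in enumerate(taxa) for b in taxa[i + 1:]]
    return min(pairs, key=lambda p: math.dist(matrix[p[0]], matrix[p[1]]))


def nonparametric_replicate(matrix, rng):
    """Classical bootstrap: resample descriptors (columns) with replacement."""
    n = len(next(iter(matrix.values())))
    cols = [rng.randrange(n) for _ in range(n)]
    return {t: [vals[c] for c in cols] for t, vals in matrix.items()}


def parametric_replicate(matrix, sds, rng):
    """Procedure (ii) as read here: redraw every value from a normal
    distribution centred on it, with its estimated standard deviation."""
    return {t: [rng.gauss(v, s) for v, s in zip(vals, sds[t])]
            for t, vals in matrix.items()}


def support(pair, make_replicate, n_reps):
    """Fraction of pseudoreplicates in which `pair` comes out as closest."""
    hits = sum(closest_pair(make_replicate()) == pair for _ in range(n_reps))
    return hits / n_reps


rng = random.Random(42)
np_cl = support(("A", "B"), lambda: nonparametric_replicate(MEANS, rng), 1000)
p_cl = support(("A", "B"), lambda: parametric_replicate(MEANS, SDS, rng), 1000)
```

Comparing `np_cl` with `p_cl` on the same data is the kind of comparison described above. Note that the support from the parametric scheme depends directly on the standard deviations assumed in `SDS`: shrinking them drives it toward 1, while the nonparametric value does not depend on them at all.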


The standard method is to use simulations to assess the behavior of data sets for which the true tree is known. That's a lot of work, but I don't see another way. So once more it looks as if you need a model of evolution and a program to implement it over a tree. How well that model matches the behavior of real data is always the problem.
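A minimal version of such a simulation study, sketched in Python under stated assumptions: data are simulated under Jukes-Cantor on a known four-taxon tree ((A,B),(C,D)); the tree is re-estimated from Jukes-Cantor distances by choosing the split with the smallest sum of within-pair distances (for four taxa this is the split neighbor joining picks); and the proportion of data sets that recover the true split is reported. Branch lengths and sequence length are hypothetical parameters to vary.

```python
import math
import random

BASES = "ACGT"
TRUE_SPLIT = "AB|CD"  # the known (simulated) tree ((A,B),(C,D))


def evolve(seq, t, rng):
    """Evolve a sequence along a branch of length t under Jukes-Cantor."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    return [b if rng.random() < p_same
            else rng.choice([x for x in BASES if x != b]) for b in seq]


def simulate_quartet(n_sites, ext, internal, rng):
    """Sequences on ((A,B),(C,D)): four external branches of length `ext`
    and one internal branch of length `internal`."""
    left = [rng.choice(BASES) for _ in range(n_sites)]
    right = evolve(left, internal, rng)
    return {"A": evolve(left, ext, rng), "B": evolve(left, ext, rng),
            "C": evolve(right, ext, rng), "D": evolve(right, ext, rng)}


def jc_distance(s1, s2):
    """Jukes-Cantor corrected distance (infinite when saturated)."""
    p = sum(a != b for a, b in zip(s1, s2)) / len(s1)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def infer_split(seqs):
    """Choose the quartet split with the smallest sum of within-pair
    distances (the split neighbor joining picks for four taxa)."""
    d = lambda x, y: jc_distance(seqs[x], seqs[y])
    sums = {"AB|CD": d("A", "B") + d("C", "D"),
            "AC|BD": d("A", "C") + d("B", "D"),
            "AD|BC": d("A", "D") + d("B", "C")}
    return min(sums, key=sums.get)


def recovery_rate(n_reps, n_sites, ext, internal, seed=None):
    """Fraction of simulated data sets from which the true split is recovered."""
    rng = random.Random(seed)
    hits = sum(infer_split(simulate_quartet(n_sites, ext, internal, rng)) == TRUE_SPLIT
               for _ in range(n_reps))
    return hits / n_reps
```

Running `recovery_rate` over a grid of branch lengths, and setting it against the support values a given method assigns to the true split, is one way to judge whether that support measure is too liberal or too conservative.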



