I'm finding that with large taxon samples the node probabilities
continue to change long after the tree likelihoods stabilize (i.e., after
the burn-in period). Dave Swofford reported this same phenomenon at the
Evolution meeting in Chico. The change does not, however, appear to be
monotonic. In my experience at least, the probabilities just wander
around. I suspect that this is due to autocorrelation of the chain over
short periods; unfortunately, "short" with a big enough taxon sample can
be a million generations or more. That is, over such a period the chain
isn't exploring a wide enough region of tree space to yield a valid sample.
How much is enough would seem to depend on the number of taxa since, as
I understand it, MrBayes tweaks only one or two parameters per
generation, and a branch swap is one of those tweaks. This may be
exacerbated by the fact that the version I have been using, only now in
the process of being replaced, proposed only NNI (nearest-neighbor
interchange) moves. (I understand that the newest version uses SPR,
subtree pruning and regrafting.)
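One way to put a number on that autocorrelation is to score each sampled tree for whether it contains a given clade (1 or 0) and estimate the effective sample size (ESS) of that series. The sketch below assumes such a 0/1 series has already been extracted from the sampled trees (the tree-parsing step is not shown), and the function names are hypothetical. It truncates the autocorrelation sum at the first non-positive lag, which is one simple heuristic among several, not anything MrBayes itself reports.

```python
def autocorrelation(xs, lag):
    """Sample autocorrelation of the series xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(xs):
    """Estimate the ESS of a 0/1 clade-presence series (hypothetical helper).

    Sums autocorrelations until the first non-positive lag, then divides
    the number of samples by the integrated autocorrelation time tau.
    """
    n = len(xs)
    mean = sum(xs) / n
    if mean in (0.0, 1.0):
        # Clade always absent or always present: nothing to measure.
        return float("nan")
    tau = 1.0  # integrated autocorrelation time; 1.0 means independent draws
    for lag in range(1, n // 2):
        rho = autocorrelation(xs, lag)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau


# A clade that sits in long runs of consecutive samples has a far smaller
# ESS than one that flips in and out of the sampled trees.
sticky = [0] * 50 + [1] * 50 + [0] * 50 + [1] * 50
flippy = [0, 1] * 100
print(effective_sample_size(sticky), effective_sample_size(flippy))
```

Both series contain the clade in exactly half of 200 samples, but the "sticky" one carries only a handful of effectively independent observations, which is the situation where a clade's frequency can drift for a long time without any trend in the likelihoods.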
So for any number of taxa, there should be a number of generations that
is enough to get a representative sample, swamping any autocorrelation
problems, and that number should increase as the number of taxa
increases. The problem would presumably be more serious if the likelihood
peak were broad or, worse, if there were multiple peaks.
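Under that reasoning, a rough run length can be worked out from the autocorrelation. With an integrated autocorrelation time tau (n/ESS from a calculation like the one above), the Monte Carlo standard error of an estimated clade probability p from n correlated samples is approximately sqrt(p(1 - p) tau / n). Inverting that gives the number of samples, and hence generations, needed for a target precision. This is a back-of-the-envelope sketch under stated assumptions, not a convergence diagnostic: it presumes the chain is already sampling around the right peak, which is exactly what can fail when there are several. The function and parameter names, including the sampling interval, are illustrative.

```python
import math


def clade_probability_se(p, n_samples, tau):
    """Approximate Monte Carlo standard error of an estimated clade
    probability p from n_samples draws with integrated autocorrelation
    time tau (tau = 1 would mean independent draws)."""
    return math.sqrt(p * (1.0 - p) * tau / n_samples)


def generations_needed(p, target_se, tau, sample_interval):
    """Rough number of generations for the SE of p to reach target_se,
    assuming one tree is saved every sample_interval generations
    (hypothetical parameter names)."""
    samples = math.ceil(tau * p * (1.0 - p) / target_se ** 2)
    return samples * sample_interval


# A clade near p = 0.5, aiming for an SE of about 0.01, with trees saved
# every 100 generations: independent draws vs. strongly autocorrelated ones.
print(generations_needed(0.5, 0.01, 1.0, 100))
print(generations_needed(0.5, 0.01, 500.0, 100))
```

Because the error shrinks only with the square root of the number of samples, a tenfold increase in tau calls for a tenfold longer run to keep the same precision, so if tau grows with the number of taxa, the required run length grows with it.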
Thoughts? Experiences?
(I've been accumulating this impression with runs of 10-20 million
generations and data sets of about 150 taxa.)