So in the world of molecular evolution, one of the coup-de-grâce analysis programs would have to be BEAST: a power-packed Bayesian analysis package that builds phylogenetic trees, estimates the time to the most recent common ancestor (tMRCA) and substitution rates, handles geographic partitioning, can chew through copious amounts of data, and pretty much squeezes blood from a turnip…walks on water…heals your mother. In short, it’s cool.
Sound awesome? It is. For a more technical, in-depth introduction to BEAST, I suggest reading the wiki and attacking the tutorials with full force, as well as reading some awesome books on phylogenetic inference such as The Phylogenetic Handbook. If you are super impatient and channeling your inner terrible twos about phylogenetic analysis, then read Phylogenetic Trees Made Easy. It has a nice introduction and literal button-by-button how-tos for different software packages, including Bayesian ones. When you’ve finished your tantrum, enter the adult world and read Felsenstein or The Phylogenetic Handbook mentioned above. Now that you’ve been introduced to phylogenetic inference and genetic analysis, with forays into evolution over time…jump into BEAST. The BEAST wiki and manual are still navigable without that background, but you’ll be scratching your head a bit and heading to Google for answers.
Now the caveat with BEAST is that if you have a lot of sequence data and a cheap, slow computer…perhaps you’ll get output from a run of 100 million generations/iterations in, oh, say…when Christ comes again. The default setting in BEAST is 10 million generations, which on my lovely military-issued computer from the dark ages takes a good 2-4 hours depending on the dataset: 2 hours for 50 sequences of 1,701 nucleotides each, with a strict clock and little variation. Unfortunately the ESS (effective sample size) values were such an angry red, it was as if they were cursing me from the Bayesian matrix beyond. OK…up to 20 million iterations…10 hours later, still many angry red ESS values and one or two mellow yellow warnings…
I checked my parameters with two wonderful individuals with BEAST experience; parameters were tweaked, and prior constraints were either applied or fine as they were. Further suggestions? Run longer…OK! 40 million iterations and 30 hours later: still correlated values, more yellow ESS values and a couple of red ones. I gave up for the time being and attended a conference, where it was suggested that 100 million should be the standard for a BEAST run…I noted it and was ready to rock 100 million iterations.
So, like any good scientist, I crave data to make my analysis better and more robust…enter the GenBank Influenza Resource Database! Yes, currently I am working on flu. With some choosing, hemming and hawing, screening and random picking over the course of a year or so, I was able to boost my dataset to an informative 472 sequences, representing isolates from around the globe and covering just about every month of the 2009 influenza pandemic, which will put my own sequences in better perspective. Now traditionally, I can make a tree of up to 500 sequences without too much trouble using maximum likelihood approaches/software, so I figured BEAST would be fine. What I didn’t bet on was my slow-ass PC.
So I figured I’d be smart about this and run the analysis on my Linux box. It may have the latest Ubuntu on it (11.04), but the computer itself is from B.C.; I could’ve sworn the Babylonian map of the world was etched into it…
Estimated time to completion? 4.63 hours/million states x 100 million states = 463 hours = 19 days!!! DOH! Computing power FAIL!!!
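For anyone who wants to repeat this back-of-the-envelope misery on their own machine, here is a minimal Python sketch of the same arithmetic. It assumes you have already read an hours-per-million-states figure off your own screen log; the function name `estimated_hours` is made up for this post and is not part of BEAST.

```python
def estimated_hours(hours_per_million_states, total_states):
    """Return the estimated wall-clock hours for a run of total_states."""
    millions_of_states = total_states / 1_000_000
    return hours_per_million_states * millions_of_states


# Numbers from this post: 4.63 hours per million states, 100 million states.
hours = estimated_hours(4.63, 100_000_000)
days = hours / 24

print(f"{hours:.0f} hours = {days:.1f} days")
```

Plug in your own rate and chain length before you hit run; it is a lot cheaper to find out about the 19 days up front.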
Old hardware aside, a BEAST analysis takes a long time to run, and until I obtain favor from the grant gods, these computers are what I have to work with, at least until I find a free cluster to run the analysis on…see #22 below.
So, let’s run down a list of how to keep oneself occupied (when not working) while BEAST runs:
1. Blog about BEAST.
2. Maximize your terminal screen and watch BEAST run…iteration 4238000…iteration 4239000, 4240000, 4241000…it’s like a matrix lullaby lulling you to sleep…but not really.
3. Go blind watching BEAST run.
4. Go get coffee…at 7:30am, 9am, 9:30am, 10:30am, 11am.
5. Go get lunch…at 11am.
6. Actually work…
7. Go get more coffee.
8. Hunt through the BEAST user group for why BEAST is so damn slow on your computer.
9. Realize it’s not BEAST, it’s your damn computer.
10. Tap your IV of coffee, make sure it’s still flowing.
11. Pull one’s hair out.
12. Then put it back together.
13. Finish work, go
14. Go for a run.
15. Remember you live in hot, humid Thailand…don’t go for a run.
16. Do pushups every time BEAST logs to screen, something, anything…
17. See how many pushups you can do between logs to screen.
18. Drink a glass of wine…watch BEAST run.
19. Switch to vodka…watch BEAST run.
20. Switch to whisky and take a shot every time BEAST logs to screen.
21. Pass out with status lines running through your head.
22. Get to work, decide to hunt down free clusters to run BEAST on remotely…
I’m currently on #22.
Awesome software…but you need the power to run it…currently searching for ‘more power’! And yes, for those of you who watched Home Improvement as kids, that was a Tim “The Tool Man” Taylor reference.
And for those of you gravely concerned that I have no life and never work, given the above list, I assure you this blog is tongue in cheek. I have an amazing life, quite productive in lab work, manuscript writing, data analysis, Asia exploration, wine drinking, amusing my fiancé with my antics…and BEAST analysis.