Development and application of open-source research tools for computational biology

Efficient Analysis of Microbiome Data

I recently described a more efficient approach to the statistical analysis of microbiome count data, using methods adapted from RNA-Seq analysis. In simulations, these methods far outperformed the common approaches based on simple proportions or on a subsampling technique called rarefying.

Published in PLoS Computational Biology:

McMurdie and Holmes. Waste Not, Want Not: Why Rarefying Microbiome Data is Statistically Inadmissible (2014). PLoS Computational Biology, in press.

Pre-print version:

McMurdie and Holmes. Waste Not, Want Not: Why Rarefying Microbiome Data is Inadmissible (2013). q-bio arXiv. PDF version 2 (12 Dec 2013), PDF version 1 (1 Oct 2013).
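
To make the two common approaches named above concrete, here is a minimal Python sketch of rarefying (random subsampling of every sample down to a common read depth) and of simple proportions. The sample names, taxon names, counts, and the helper functions `rarefy` and `proportions` are all hypothetical and chosen only for illustration. The sketch does not implement the RNA-Seq-derived methods described in the article.

```python
import random

def rarefy(counts, depth, rng):
    """Subsample one sample's counts, without replacement, down to `depth` reads."""
    # Expand counts into one label per read, e.g. {"A": 2} -> ["A", "A"].
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(reads) < depth:
        raise ValueError("sample has fewer reads than the requested depth")
    kept = rng.sample(reads, depth)
    return {taxon: kept.count(taxon) for taxon in counts}

def proportions(counts):
    """Divide each count by the sample's total number of reads."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

# Hypothetical toy data: two samples sequenced to very different depths.
samples = {
    "sample_1": {"taxon_A": 60, "taxon_B": 30, "taxon_C": 10},
    "sample_2": {"taxon_A": 6000, "taxon_B": 3000, "taxon_C": 1000},
}

rng = random.Random(0)  # fixed seed so the subsample is repeatable

# Rarefy every sample to the smallest library size in the data set.
depth = min(sum(c.values()) for c in samples.values())
rarefied = {name: rarefy(c, depth, rng) for name, c in samples.items()}

# Reads thrown away by rarefying.
total_reads = sum(sum(c.values()) for c in samples.values())
discarded = total_reads - depth * len(samples)

print("rarefied counts:", rarefied)
print("reads discarded:", discarded, "of", total_reads)
print("proportions:", {name: proportions(c) for name, c in samples.items()})
```

With this toy data, rarefying discards 9,900 of the 10,100 reads, while the proportions for the two samples come out identical even though one of them rests on 100 times as many reads. Both effects are simplified illustrations of why these approaches use the available data inefficiently.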



phyloseq: Reproducible Analysis for Microbiome Census Data

I am interested in data management, multiple testing, exploratory analysis, and other statistical interpretations of microbiome census data in studies such as the human gut microbiome, microbial mats, and biofuel feedstock degradation communities. To meet the need for easier, more reproducible statistical analysis of this highly multivariate, multicomponent data, I have created a new open-source R package, phyloseq, that provides a set of tools for importing, organizing, filtering, analyzing, and graphically summarizing phylogenetic sequencing data. The phyloseq package leverages many of the tools available in R for ecological/phylogenetic analysis, graphics, statistics, and parallel/cloud computing, with an emphasis on flexible, publication-quality graphics built with ggplot2, a powerful implementation of the Grammar of Graphics.

Latest Peer-Reviewed Article about phyloseq:

phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data


The phyloseq package is a completely open-source software tool, licensed under AGPL-3. The release version of phyloseq is available through the Bioconductor repository. The development version is available for use and collaborative development on GitHub, where we also host tutorial and demo materials, as well as the phyloseq feature request and issue tracker.

Previous Research

My doctoral research focused on a group of bacteria, called Dehalococcoides, that can destroy chlorinated organic pollutants, including chloroethenes, the most commonly detected pollutants in contaminated groundwater. The prevalence of chloroethenes in our nation's aquifers is primarily due to the use of tetrachloroethene (PCE) in dry-cleaning, as well as the extensive use of trichloroethene (TCE) in industrial degreasing. In particular, I am interested in the evolution of the enzymes responsible for the ability of Dehalococcoides to respire (think: "breathe") chlorinated compounds, and the surprising extent to which the Dehalococcoides genome and ecological niche are specialized toward respiration of organochlorine compounds.

Caltrain Updates

I'm an avid bicycle commuter and Caltrain user.

