The Exelixis Lab


Enabling Research in Evolutionary Biology

PTP - a Poisson Tree Processes (PTP) model to infer putative species boundaries on a given phylogenetic input tree.


Introduction

PTP is a model for delimiting species on a rooted phylogenetic tree. PTP models speciation (branching) events in terms of the number of substitutions, so it requires only a phylogenetic input tree, for example the output of RAxML. To be clear, the branch lengths of the input tree must represent the number of substitutions.

In general, if you have single-locus molecular data (such as 16S, 18S, or ITS) and want to delimit species based on these sequences, you can try running PTP on your data.

Relationship to GMYC

A close relative of PTP is the GMYC model. The GMYC model requires an ultrametric tree as input; in other words, you must time-calibrate your phylogenetic tree before using GMYC, which is known to be a difficult task. The most commonly used programs for obtaining an ultrametric tree are BEAST, DPPDIV and r8s. Note that after calibration the branch lengths represent time. PTP avoids this error-prone procedure entirely: it requires only a simple phylogenetic tree. Our tests show that PTP outperforms GMYC on simulated data and that PTP results are comparable to GMYC results on real data sets.

You can find a Python implementation of the single-threshold GMYC model in my GitHub repository. The original R implementation can be downloaded here. Be aware that the results from the Python and R implementations might differ slightly because they optimize the parameters in different ways. Also note that the input tree for GMYC must be strictly ultrametric and bifurcating, with no zero-length branches.
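The three requirements on a GMYC input tree can be checked before running the model. The sketch below is only an illustration and is not taken from the PTP or GMYC code: the `Node` class, the function name `check_gmyc_input`, and the tolerance `tol` are all hypothetical, and a real tree would normally be read with a Newick parser.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical minimal tree node, used only for this illustration.
    length: float = 0.0  # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def check_gmyc_input(root: Node, tol: float = 1e-6) -> List[str]:
    """Return a sorted list of problems; an empty list means the tree passes."""
    problems = set()
    depths = []  # root-to-tip distance of every leaf

    def walk(node: Node, dist: float, is_root: bool) -> None:
        # The root has no incoming branch, so its length is ignored.
        if not is_root and node.length <= 0:
            problems.add("zero or negative branch length")
        if node.children:
            if len(node.children) != 2:
                problems.add("non-bifurcating node")
            for child in node.children:
                walk(child, dist + child.length, False)
        else:
            depths.append(dist)

    walk(root, 0.0, True)
    # Ultrametric: all leaves are equally far from the root (within tol).
    if depths and max(depths) - min(depths) > tol:
        problems.add("not ultrametric")
    return sorted(problems)


# A small tree that passes: every leaf is at distance 2.0 from the root.
good_tree = Node(children=[Node(1.0, [Node(1.0), Node(1.0)]), Node(2.0)])
print(check_gmyc_input(good_tree))
```

The tolerance is there because branch lengths written to a tree file are rounded, so root-to-tip distances rarely match exactly; the value chosen here is an assumption and may need adjusting.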

Relationship to OTU-picking

PTP delimits species based on the Phylogenetic Species Concept, so the entities output by PTP are, in theory, species. OTU-picking, by definition, delimits Operational Taxonomic Units (OTUs). In essence, OTUs are sequence clusters, and OTU-picking methods are clustering algorithms applied to sequences. In some cases species and OTUs coincide; this happens when the population size is small and the birth rate is low. In such cases, species are well separated and natural sequence clusters correspond to species. When such sequence clusters do not exist, OTU-picking methods will inevitably fail. We show that PTP can still give reasonable results when OTU-picking methods fail.
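To make "clustering algorithms applied to sequences" concrete, here is a minimal sketch of one generic greedy clustering scheme. It is not the algorithm of any particular OTU-picking tool and it is not part of PTP. It assumes the sequences are already aligned to equal length, and the function names and the default 97% identity threshold are assumptions chosen for illustration.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold: float = 0.97):
    """Assign each (name, sequence) pair to the first cluster whose centroid
    is at least `threshold` identical; otherwise start a new cluster."""
    centroids = []  # one representative sequence per cluster
    clusters = []   # names of the members of each cluster
    for name, seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(name)
                break
        else:
            # No existing centroid is similar enough: this sequence founds a new OTU.
            centroids.append(seq)
            clusters.append([name])
    return clusters


# Toy example with a deliberately loose threshold.
toy = [("a", "AAAAAAAAAA"), ("b", "AAAAAAAAAT"), ("c", "CCCCCCCCCC")]
print(greedy_otus(toy, threshold=0.8))
```

The outcome depends only on pairwise similarity and a fixed cutoff. If the sequences do not fall into well-separated groups, no single cutoff recovers the species, which is the failure case described above.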

I also implemented an experimental pipeline that can delimit species on NGS data (e.g. 454 sequencing of 16S). It is similar to so-called open-reference OTU-picking. I first run EPA to place the query sequences onto the tree; each placement is then evaluated independently to count the number of species. Please read our paper for details.

Python code

Please find the up-to-date code and user manual at my GitHub repository.

Web server

A simple web server for PTP is here: http://species.h-its.org/ptp/
The server will accept a phylogenetic tree as input and output the species delimitation results.

User support and contact

For general questions, please post on the PTP Google group.
If you find a bug or want to discuss something with me, my e-mail is bestzhangjiajie[at]gmail[dot]com. I am here to help!

About me

My name is Jiajie Zhang, and I am currently a PhD student of Prof. Alexandros Stamatakis.