effective large-scale coalescent simulations
We are involved in the development of the open-source software msprime, maintained by Jerome Kelleher. msprime uses succinct tree sequences to speed up the simulation of ancestral relationships.
We have implemented gene conversion in msprime and will continue to add features that are particularly relevant for microbial evolution.
Have a look at the msprime documentation here.
supervised feature selection for ancestry informative markers
With Peter Pfaffelhuber, Franziska Grundner-Culemann and Veronika Lipphardt we created AIMsetfinder.
AIMsetfinder is a supervised feature selection approach that identifies sets of Ancestry Informative Markers (AIMs) minimizing the log-loss error of a naive Bayes classifier.
Have a look at what we have done on GitHub.
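The idea of choosing markers by the log-loss of a naive Bayes classifier can be sketched in a few lines. The following is a generic greedy forward selection on toy data, not the AIMsetfinder implementation: the function names, the add-one smoothing, the 0/1/2 allele-count encoding, and the evaluation of the loss on the training data itself (a real analysis would use held-out data) are all assumptions made for illustration.

```python
import math


def allele_freqs(genotypes, labels, marker):
    """Smoothed frequency of the counted allele per population at one marker."""
    counts = {}
    for g, pop in zip(genotypes, labels):
        alt, total = counts.get(pop, (1.0, 2.0))  # add-one smoothing
        counts[pop] = (alt + g[marker], total + 2)
    return {pop: alt / total for pop, (alt, total) in counts.items()}


def log_loss(genotypes, labels, markers):
    """Mean log-loss of a naive Bayes classifier restricted to `markers`."""
    pops = sorted(set(labels))
    freqs = {m: allele_freqs(genotypes, labels, m) for m in markers}
    prior = {p: labels.count(p) / len(labels) for p in pops}
    loss = 0.0
    for g, true_pop in zip(genotypes, labels):
        log_post = {}
        for p in pops:
            lp = math.log(prior[p])
            for m in markers:
                f = freqs[m][p]
                # Binomial log-likelihood; the binomial coefficient cancels.
                lp += g[m] * math.log(f) + (2 - g[m]) * math.log(1 - f)
            log_post[p] = lp
        top = max(log_post.values())
        log_norm = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
        loss -= log_post[true_pop] - log_norm
    return loss / len(genotypes)


def greedy_select(genotypes, labels, k):
    """Add, one at a time, the marker that lowers the log-loss the most."""
    selected = []
    remaining = set(range(len(genotypes[0])))
    while remaining and len(selected) < k:
        best = min(sorted(remaining),
                   key=lambda m: log_loss(genotypes, labels, selected + [m]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: rows are individuals, columns are allele counts (0, 1 or 2).
# Marker 1 separates the two populations; marker 0 carries no information.
genotypes = [[1, 2, 0], [1, 2, 1], [1, 2, 0], [1, 0, 1], [1, 0, 0], [1, 0, 1]]
labels = ["A", "A", "A", "B", "B", "B"]
selected = greedy_select(genotypes, labels, k=2)
print(selected)
```

On this toy data the fully separating marker is picked first, because it alone already yields a much lower log-loss than either of the other two.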
standardizing population genetic simulations
We are members of the PopSim Consortium, which aims to standardize population genetic simulations for frequently studied model organisms and species, including humans.
As a result, the software stdpopsim allows users to a) easily re-simulate published population models for many species and b) compare the results of new inference methods against a standardized benchmark.
Have a look at the stdpopsim documentation here.
Pan-genome Analysis and Exploration
Richard Neher, Wei Ding, and I created a pipeline to automatically analyze pan-genomes. The highlight of this project is the visualization of the pan-genome in a browser, which makes it easy to explore the pan-genome and to search for specific genes or features within it.
Have a look at what we have done at this demo webpage.
panicmage (formerly known as IMaGe)
Panicmage is short for "pangenome analyzer for infinitely - which means considerably - many genes".
The name changed from IMaGe to panicmage because "image" is possibly one of the most stupid words to search for on Google. In addition, the new name emphasizes that in our model, while there are infinitely many genes that could possibly exist, only a finite number of genes exists in the population at any time.
Panicmage takes as input
- a genealogy
- the gene frequency spectrum
- the number of generations to the most recent common ancestor (optional),
and estimates
- the parameters of a neutral Infinitely Many Genes Model (gene gain and gene loss rates)
- the number of core genes of the whole population
- the expected number of new genes found in the next sequenced strain
- the size of the persistent pangenome
- the size of the total pangenome.
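To give a feeling for how the gene gain and loss rates determine quantities like those listed above, here is a small sketch of the textbook expectations of the neutral Infinitely Many Genes Model. It assumes a standard coalescent genealogy rather than a fixed, user-supplied genealogy, so it is not what panicmage itself computes. The function names are hypothetical, and `theta` and `rho` denote the scaled gene gain and gene loss rates (in coalescent time units, genes are gained at rate theta/2 and each gene is lost at rate rho/2).

```python
import math


def expected_gene_frequency_spectrum(n, theta, rho):
    """Expected number of genes present in exactly k of n genomes, k = 1..n.

    E[G_k] = theta/k * n(n-1)...(n-k+1) / ((n-1+rho)(n-2+rho)...(n-k+rho))
    """
    spectrum = []
    ratio = 1.0  # running product of (n-j+1) / (n-j+rho) for j = 1..k
    for k in range(1, n + 1):
        ratio *= (n - k + 1) / (n - k + rho)
        spectrum.append(theta / k * ratio)
    return spectrum


def expected_pangenome_size(n, theta, rho):
    """Expected number of distinct genes found in a sample of n genomes."""
    return theta * sum(1.0 / (rho + i) for i in range(n))


def expected_new_genes(n, theta, rho):
    """Expected number of new genes in one further genome after n genomes."""
    return theta / (rho + n)


# Arbitrary example values, chosen only for illustration.
n, theta, rho = 10, 2000.0, 2.0
spectrum = expected_gene_frequency_spectrum(n, theta, rho)
print("genes per genome:", theta / rho)
print("pangenome size:  ", expected_pangenome_size(n, theta, rho))
print("new genes next:  ", expected_new_genes(n, theta, rho))
```

Two consistency checks follow directly from the definitions: the spectrum sums to the expected pangenome size, and the expected number of new genes equals the increase in pangenome size from n to n + 1 genomes.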
In addition, panicmage computes the p-value of a gene frequency spectrum for a given genealogy under neutral evolution and can simulate distributed genomes. So far, p-values can be computed for neutral evolution and for existing sampling bias.
To install panicmage, please visit the panicmage GitHub repository.