Software

ERA: Efficient Serial and Parallel Suffix Tree Construction for Very Long Strings

I implemented the complete, efficient system for ERA, which constructs suffix trees for very long strings. ERA indexes the entire human genome in 19 minutes on an ordinary desktop computer; for comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer. I implemented three variants of the system:
a) Serial: for single-core processors.
b) Parallel shared-memory: for multicore processors.
c) Parallel shared-nothing: for Linux clusters.

Karect: Accurate Correction of Substitution, Insertion and Deletion Errors for Next-generation Sequencing Data

I implemented the complete, efficient system for Karect, a novel error-correction technique for next-generation sequencing data. Karect is based on multiple alignment and corrects substitution, insertion and deletion errors. It handles non-uniform coverage as well as moderately covered areas of the sequenced genome.