I implemented the complete, efficient system for "ERA", which constructs suffix trees for very long strings. ERA indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer. I implemented the following efficient variants of the system: a) Serial: for a single-core processor. b) Parallel shared-memory: for a multicore processor. c) Parallel shared-nothing: for a Linux cluster.
I implemented the complete, efficient system for "Karect", a novel error correction technique for next-generation sequencing data. Karect is based on multiple alignment and supports substitution, insertion, and deletion errors. It can handle non-uniform coverage as well as moderately covered areas of the sequenced genome.