SPAdes genome assembler

Name
Andrey
Surname
Prjibelski
Scientific organization
St. Petersburg State University
Academic degree
None, PhD student
Position
Researcher
Scientific discipline
Life Sciences & Medicine
Topic
SPAdes genome assembler
Abstract
We originally developed the genome assembler SPAdes to overcome the complications associated with single-cell microbial data. SPAdes was later recognized by the scientific community as one of the best assemblers for bacterial data sets. This recognition inspired us to extend SPAdes to additional sequencing platforms (e.g., PacBio and Oxford Nanopore) and to develop a set of novel software tools for various purposes: assembly of highly polymorphic genomes, metagenome assembly, plasmid assembly, transcriptome assembly, and others.
Keywords
sequencing, genomics, genome assembly, single-cell sequencing, metagenomics, transcriptomics
Summary

Despite all efforts, de novo genome assembly is a complex task that remains unsolved. The SPAdes assembler [1, 2] was originally developed by researchers at the Center for Algorithmic Biotechnology (St. Petersburg State University) to overcome the complications associated with single-cell microbial data obtained via Multiple Displacement Amplification [3]. In contrast to conventional genome sequencing data, this type of data is characterized by uneven read coverage, an elevated error rate, and chimeric reads. SPAdes successfully resolved these issues for Illumina reads and was recognized by the scientific community as one of the best assemblers for both isolate and single-cell data [4]. Although the assembler was designed specifically for microbial genomes, scientists have tested it on many other types of data (e.g., metagenomic data and larger genomes). Their efforts and feedback inspired us to extend SPAdes to additional platforms (Ion Torrent, Pacific Biosciences, Oxford Nanopore, Illumina TruSeq) and combinations of platforms, and to develop a set of novel software tools for various purposes: assembly of highly polymorphic genomes [5], metagenome assembly [6], plasmid assembly [7], and de novo transcriptome assembly from RNA-Seq data.

In this work, we discuss the origin of single-cell bacterial sequencing and the main challenges in assembling such data. We briefly describe the SPAdes pipeline and the core algorithmic ideas that allowed us to address the problem of de novo assembly from single-cell sequencing data [2] and, later, to develop a set of novel SPAdes-based tools for assembling various types of sequencing data [5, 6, 7, 8, 9, 10, 11]. In conclusion, we discuss the current progress and future plans for assembly-related projects in our lab.

 

References

  1. Bankevich, A., et al. "SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing." Journal of Computational Biology 19.5 (2012): 455-477.

  2. Nurk, S., et al. "Assembling single-cell genomes and mini-metagenomes from chimeric MDA products." Journal of Computational Biology 20.10 (2013): 714-737.

  3. Lasken, R.S. "Single-cell genomic sequencing using multiple displacement amplification." Current Opinion in Microbiology 10.5 (2007): 510-516.

  4. Magoc, T., et al. "GAGE-B: an evaluation of genome assemblers for bacterial organisms." Bioinformatics 29.14 (2013): 1718-1725.

  5. Safonova, Y., et al. "dipSPAdes: assembler for highly polymorphic diploid genomes." Journal of Computational Biology 22.6 (2015): 528-545.

  6. Nurk, S., et al. "metaSPAdes: a new versatile de novo metagenomics assembler." arXiv preprint arXiv:1604.03071 (2016).

  7. Antipov, D., et al. "plasmidSPAdes: Assembling Plasmids from Whole Genome Sequencing Data." bioRxiv (2016): 048942.

  8. Prjibelski, A.D., et al. "ExSPAnder: a universal repeat resolver for DNA fragment assembly." Bioinformatics 30.12 (2014): i293-i301.

  9. Vasilinetc, I., et al. "Assembling short reads from jumping libraries with large insert sizes." Bioinformatics 31.20 (2015): 3262-3268.

  10. Antipov, D., et al. "hybridSPAdes: an algorithm for hybrid assembly of short and long reads." Bioinformatics (2015): btv688.

  11. Bankevich, A. and Pevzner, P.A. "TruSPAdes: barcode assembly of TruSeq synthetic long reads." Nature Methods (2016).