Pinus sibirica and Larix sibirica whole genome de novo sequencing
The nuclear and organelle genomes of Siberian larch (Larix sibirica Ledeb.) and Siberian pine (Pinus sibirica Du Tour) are being sequenced de novo in the Laboratory of Forest Genomics at the Genome Research and Education Center of the Siberian Federal University using Illumina HiSeq 2000 and MiSeq, and the first draft genome assemblies have been generated (http://genome.sfu-kras.ru/en/main). The estimated genome size was 12.03 Gbp for Siberian larch and 28.90 Gbp for Siberian pine. DNA isolated from needles, single megagametophytes and a haploid tissue culture of a reference larch tree, and from needles and single megagametophytes of a reference pine tree, was used to generate multiple paired-end (PE) libraries with 250, 400 and 500 bp inserts and mate-pair (MPE) libraries representing 3 and 5 Kbp fragments. We tested the CLC Assembly Cell, ABySS and MaSuRCA assemblers, which have been used in similar conifer genome sequencing projects. Assembly was performed on an IBM x3950 X6 server with 96 cores and 3 TB of RAM. ABySS was the most stable, but the best assemblies were generated by CLC Assembly Cell. The best Siberian larch genome assembly was ~5.5 Gbp long (46% of the expected complete genome length) with a contig N50 of 1947 bp. Almost all Siberian pine short reads were successfully mapped to the draft genome assembly v1.0 of the closely related sugar pine (Pinus lambertiana Dougl.) generated in the PineRefSeq project (http://pinegenome.org/pinerefseq), covering more than 80% of that assembly (~21.26 Gbp). Thus, combining the reference-based and de novo assembly approaches yielded a draft genome assembly of Siberian pine with a total length of ~22.9 Gbp (79% of the expected complete genome length) and a contig N50 of 2352 bp. About 80% of the Siberian larch and pine nuclear genomes consisted of highly repetitive DNA. The chloroplast genome of Siberian larch has been assembled and annotated for the first time.
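For readers unfamiliar with the assembly statistics quoted here, the following is a minimal sketch of how a contig N50 and an assembled fraction of the genome are commonly computed. It is not the pipeline used in this project: the function names `n50` and `assembled_fraction` and the toy contig lengths are hypothetical, and N50 is taken in its usual sense as the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one length")


def assembled_fraction(assembly_gbp, genome_gbp):
    """Fraction of the estimated genome size covered by the assembly."""
    return assembly_gbp / genome_gbp


# Hypothetical toy contig lengths, for illustration only.
toy_contigs = [8000, 5000, 3000, 2000, 1000, 1000]
print(n50(toy_contigs))

# Assembly lengths and genome size estimates as reported in the abstract.
print(round(100 * assembled_fraction(5.5, 12.03)))   # Siberian larch
print(round(100 * assembled_fraction(22.9, 28.90)))  # Siberian pine
```

With the lengths reported above, the two fractions round to the 46% and 79% given for Siberian larch and Siberian pine, respectively.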
For Siberian pine, we completed the partial chloroplast genome assembly available in GenBank (FJ899558.1) by closing all gaps. Draft assemblies of the mitochondrial genomes of both species have also been generated. The larch transcriptome assembly consisted of 43717 unigenes with a total length of ~26 Mbp. The longest unigene was 8512 bp, the N50 was 1330 bp, and 6919 unigenes were longer than 1 Kbp. The transcriptome assembly was similar to other published conifer transcriptomes. This study was supported by Research Grant No. 14.Y26.31.0004 from the Government of the Russian Federation.