Using next generation sequencing data for improvement of eukaryotic gene prediction

Name
Evgeny
Surname
Gerasimov
Scientific organization
Lomonosov Moscow State University
Academic degree
PhD
Position
Scientific Researcher
Scientific discipline
Life Sciences & Medicine
Topic
Using next generation sequencing data for improvement of eukaryotic gene prediction
Abstract
Genome annotation is vital for most genomic analyses. Still, the quality of many eukaryotic genome annotations remains poor. The two major approaches to annotation rely either on HMM-based ab-initio prediction or on transferring an existing annotation based on homology. Both have drawbacks. Approaches that combine ab-initio prediction with hints from experimental data (mostly next generation sequencing data) can greatly improve annotation. I will discuss this method and demonstrate the results obtained with our annotation pipeline on the genomes of several plant species.
Keywords
next generation sequencing, RNA-seq, genome annotation, gene prediction, gene model
Summary

Although annotation is a basic step in most NGS projects, it is still very inaccurate. Most de-novo annotations rely on ab-initio prediction. This method is based on HMMs or machine learning algorithms and attempts to output the most probable gene annotation with respect to a given gene model. Still, gene models are often far from biological reality and rarely make use of signal sequences. The reason is that the nature of such signals is often poorly understood (TSS, TTS), and some signals are very weakly defined (like the Kozak sequence or an enhancer), so modeling them is not possible. Some hints, however, can be obtained from experiments such as RNA-seq. Mapping RNA-seq reads can precisely locate intron boundaries, detect transcribed regions, and determine the proper DNA strand for a gene model. Using such hints, it is possible to improve annotation quality significantly. Here we discuss our pipeline, which combines the hinted ab-initio approach with a homology-based scoring system for competing gene models. The results of this pipeline are usually of better quality than those of the most widely used methods. We will also discuss the importance of annotation in whole-genome studies and the connection between genome assembly and annotation quality.
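
To illustrate how mapped RNA-seq reads can locate intron boundaries, here is a minimal Python sketch. It is not the pipeline discussed in the talk. It assumes spliced alignments are available as (chromosome, 1-based start, CIGAR string) tuples, as found in SAM records, where the `N` operation marks a skipped reference region, that is, a putative intron. The function names and the `min_support` threshold are hypothetical and chosen only for this example.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference sequence.
REF_CONSUMING = set("MDN=X")


def introns_from_alignment(ref_start, cigar):
    """Return 1-based inclusive (start, end) intervals of all N operations."""
    introns = []
    pos = ref_start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # The skipped region spans pos .. pos + length - 1.
            introns.append((pos, pos + length - 1))
        if op in REF_CONSUMING:
            pos += length
    return introns


def intron_hints(alignments, min_support=2):
    """Count reads supporting each intron; keep those with enough support.

    min_support is an illustrative threshold, not a recommended value.
    """
    counts = Counter()
    for chrom, start, cigar in alignments:
        for intron_start, intron_end in introns_from_alignment(start, cigar):
            counts[(chrom, intron_start, intron_end)] += 1
    return {key: n for key, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Toy alignments, invented for this example.
    reads = [
        ("chr1", 100, "50M200N50M"),
        ("chr1", 120, "30M200N70M"),
        ("chr1", 500, "100M"),
        ("chr1", 600, "20M80N80M"),
    ]
    for (chrom, start, end), support in intron_hints(reads).items():
        print(chrom, start, end, support)
```

In this toy input, two reads support the same intron and one intron is seen only once, so only the doubly supported interval passes the threshold. Intervals of this kind could then be passed as hints to an ab-initio predictor. Strand assignment, for example from splice-site dinucleotides, is left out of the sketch.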