Oxford Nanopore data analysis


Remember those stories about sequencing Ebola on the fly? Or the daredevils who assembled the entire centromere of the Y chromosome? Or the telomere-to-telomere Arabidopsis assembly done in 4 days for $1,000?

All of this was done with an Oxford Nanopore sequencer. Buying one is easy, even in Russia, for example from our partner SkyGen.

We analyze data from PromethION, GridION, MinION and SmidgION sequencers and accept long reads in fast5, FASTQ or FASTA format.
  • We thank Alexander Predeus for his help with describing the tools and the example applications of nanopore reads

Basecalling

A nanopore run produces raw signal traces, known as "squiggles", stored in fast5 files.
Application
– Converting fast5 files to the commonly used FASTQ format
– Always keep your fast5 files: basecalling algorithms are constantly improving, so re-basecalling an old run later can yield more accurate reads

Tools
– Guppy leads in quality: it delivers very good read accuracy and runs reasonably quickly on a GPU cluster
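
To make the FASTQ output of basecalling more concrete, here is a minimal Python sketch that reads 4-line FASTQ records and reports read length and mean Phred quality. It assumes unwrapped records (one line each for sequence and quality) and the standard Phred+33 encoding; the function names and the two example reads are made up for illustration.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or trailing blank line)
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq, qual


def mean_phred(quality, offset=33):
    """Arithmetic mean of per-base Phred scores (assumes Phred+33 encoding)."""
    return mean(ord(char) - offset for char in quality)


# Two made-up reads: one high quality ('I' = Q40), one very low ('!' = Q0).
example = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGT\n+\n!!!!\n"
for name, seq, qual in read_fastq(io.StringIO(example)):
    print(name, len(seq), mean_phred(qual))
```

The arithmetic mean of Phred scores is only a rough indicator of read quality, since Phred scores are logarithmic.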

Metagenomic profiling

Metagenomic profiling is broadly used to characterize microbial communities in animals, soil, water, etc. The gene of choice, 16S rRNA, has nine variable regions (V1-V9) separated by conserved regions. The resulting sequences are used to identify the species present and to estimate community diversity.
The 16S gene is about 1,500 bp long. Unlike Illumina, whose short reads cover only a few variable regions, Nanopore reads span the whole gene, greatly improving taxonomic resolution.
Application
Characterize microbial communities in animals, soil, water, etc.
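
Community diversity is often summarized with a single number. The sketch below computes the Shannon index from per-taxon read counts, as one common choice; the text above does not say which diversity measure is used, and the taxon counts are invented for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one classified read")
    proportions = (n / total for n in counts.values() if n > 0)
    return -sum(p * math.log(p) for p in proportions)


# Made-up read counts per genus for two hypothetical samples.
even_sample = {"Bacteroides": 250, "Escherichia": 250,
               "Lactobacillus": 250, "Clostridium": 250}
skewed_sample = {"Bacteroides": 970, "Escherichia": 10,
                 "Lactobacillus": 10, "Clostridium": 10}

print(round(shannon_index(even_sample), 3))
print(round(shannon_index(skewed_sample), 3))
```

A sample dominated by one taxon scores lower than one where reads are spread evenly across taxa.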

Tools
– Kraken is a hash-based classifier. It is fast but memory-hungry
– Centrifuge is an FM-index-based classifier. It is slower but needs much less memory
– Pavian is a modern tool for visualizing metagenomic results, with per-sample and per-group statistics
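
The speed versus memory trade-off of a hash-based classifier can be illustrated with a toy example. The sketch below is not Kraken's actual algorithm: it stores every reference k-mer in an in-memory dictionary (fast lookups, large footprint) and assigns a read to the taxon with the most unambiguous k-mer hits, whereas real classifiers resolve shared k-mers through the taxonomy. The reference sequences, taxon names and k = 5 are assumptions chosen to keep the example small.

```python
from collections import Counter


def kmers(seq, k):
    """Yield all overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify(read, index, k):
    """Return the taxon with the most unique k-mer hits, or None if no hits."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, ())
        if len(taxa) == 1:  # skip k-mers shared between taxa
            votes[next(iter(taxa))] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical miniature "reference database".
references = {"taxon_A": "ACGTACGTTGCA", "taxon_B": "GGGCCCTTTAAA"}
index = build_index(references, k=5)
print(classify("CGTACGTT", index, k=5))
print(classify("TTTTTTTT", index, k=5))
```

Every distinct k-mer becomes a dictionary key, which is why memory grows quickly with database size.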

Genome assembly

Genome assembly was revolutionized by long reads, which allow a dramatic increase in assembly contiguity. Bacterial genomes can usually be assembled completely (circular chromosome and plasmids), yielding gold-standard references from as little as 30x long-read coverage.
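
How contiguous an assembly is gets reported most often as N50: the contig length at which contigs of that length or longer contain at least half of all assembled bases. A minimal sketch, with made-up contig lengths:

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical assemblies of the same genome.
fragmented = [100, 80, 50, 30, 20]   # many short contigs
complete = [4_600_000, 100_000]      # one chromosome plus one plasmid

print(n50(fragmented))
print(n50(complete))
```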

Hybrid assemblers are a very good option when plenty of Illumina data (100x+) is available alongside modest long-read coverage (10-30x). The short reads are used to correct the long reads, and the subsequent assembly is much easier because the corrected reads are both long and accurate.
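
Coverage figures such as 30x or 100x are simply total sequenced bases divided by genome size. The sketch below does this arithmetic for a hypothetical 5 Mb bacterial genome; the genome size and read lengths are assumptions for illustration.

```python
def coverage(read_lengths, genome_size):
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def bases_needed(target_coverage, genome_size):
    """Total bases required to reach a target mean coverage."""
    return target_coverage * genome_size


genome_size = 5_000_000  # hypothetical 5 Mb bacterial genome

print(bases_needed(30, genome_size))    # long-read bases for 30x
print(bases_needed(100, genome_size))   # Illumina bases for 100x
print(coverage([10_000] * 15_000, genome_size))  # 15,000 reads of 10 kb
```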
Long-read assembly tools
– miniasm + minimap2 for a quick draft assembly in several minutes
– Canu for thorough and careful assembly with good read correction and the best overall accuracy
– Flye is a de novo assembler for single-molecule sequencing reads such as those produced by Oxford Nanopore
– Other options: FALCON, ABruijn

Hybrid assembly tools
– MaSuRCA condenses short Illumina reads into super-reads and combines them with long reads to build long, accurate mega-reads

Polishing

Application
Fixing assembly errors and computing an improved consensus sequence for a draft long-read assembly

Tools
– Nanopolish uses the raw squiggle (fast5) signal and a methylation-aware model for bacteria. It can also call variants and identify methylated sites
– Racon is useful for quick consensus polishing after a miniasm/minimap assembly
– Pilon uses Illumina data to polish out problematic regions
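
The core idea of consensus polishing can be shown with a toy majority vote. This is not how Nanopolish, Racon or Pilon work internally: the sketch assumes reads are already aligned to the draft without insertions or deletions, padded to the draft's length, with "." marking positions a read does not cover. All sequences are invented.

```python
from collections import Counter


def polish(draft, aligned_reads):
    """Replace each draft base with the majority base among covering reads."""
    if any(len(read) != len(draft) for read in aligned_reads):
        raise ValueError("aligned reads must be padded to the draft length")
    polished = []
    for i, draft_base in enumerate(draft):
        votes = Counter(read[i] for read in aligned_reads if read[i] != ".")
        top = votes.most_common(2)
        if not top or (len(top) == 2 and top[0][1] == top[1][1]):
            polished.append(draft_base)  # no coverage or a tie: keep the draft
        else:
            polished.append(top[0][0])
    return "".join(polished)


draft = "ACGTTA"
reads = ["ACGTAA",
         "ACGTAA",
         ".CGTTA"]
print(polish(draft, reads))
```

Positions with no coverage or a tied vote keep the draft base, so the draft is only changed where the reads clearly disagree with it.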

Contact us
Phone: +7 916 088 13 07
E-mail: hello@ksivalue.com
Ksivalue LLC
INN 7702424959, OGRN 5177746030831
Postal and physical address: 30A Leninsky Prospekt, Moscow, 119049