Oxford Nanopore long-read analysis

Nanopore sequencing provides cheap access to long reads, which readily solve problems that looked impossible five years ago. As a service, Nanopore is still considered inferior to PacBio, but it has important advantages of its own (direct RNA sequencing, extra-long reads, etc.).

Tools for Nanopore reads change every month, so the analysis requires substantial bioinformatics expertise.

We analyze reads produced on PromethION, GridION, MinION and SmidgION devices. Data can be provided in fast5, FASTQ or FASTA format.

Special thanks to Alex Predeus for help with the case and tool descriptions.

Basecalling

A Nanopore run generates raw signal data ("squiggles"), which are stored in fast5 files.

Application
– Converting fast5 files to the commonly used FASTQ format
– Always keep your fast5 files: basecalling algorithms are constantly improving, so the same run can be basecalled again later with better accuracy

Tools
– Albacore is considered the best free basecaller
– Basecalling in the cloud (EPI2ME) is now poorly supported and expensive
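
Once basecalling has produced FASTQ, a quick sanity check of the reads is usually the first step. The following is a minimal sketch, assuming plain four-line FASTQ records with Phred+33 quality encoding; the function names and the two example reads are made up for illustration and are not part of any basecaller.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():  # end of file (or a trailing blank line)
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def mean_phred(qual):
    """Arithmetic mean of per-base Phred scores (a rough indicator only)."""
    return mean(ord(c) - 33 for c in qual)


def summarize(handle):
    """Collect read count, total bases, longest read and mean quality."""
    lengths, quals = [], []
    for _, seq, qual in read_fastq(handle):
        lengths.append(len(seq))
        quals.append(mean_phred(qual))
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "longest": max(lengths),
        "mean_q": mean(quals),
    }


# Two hypothetical reads: 'I' encodes Q40 and '+' encodes Q10 in Phred+33.
example = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGT\n+\n++++\n")
stats = summarize(example)
print(stats)
```

For real data the same functions can be given an open file handle instead of the in-memory example.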

Metagenomic profiling

Metagenomic profiling is widely used to characterize microbial communities in animals, soil, water, etc. The marker gene of choice, 16S rRNA, has nine variable regions (V1-V9) separated by conserved regions. The resulting sequences are used to identify the species present and to estimate diversity.
The 16S gene is about 1,500 bp long. Unlike Illumina, which covers only part of the gene, Nanopore reads span the whole gene, greatly improving taxonomic resolution.
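
Diversity can be expressed in several ways; one common choice is the Shannon index computed from per-species read counts. The sketch below assumes the counts have already been produced by a classifier. The taxon labels and numbers are hypothetical.

```python
import math


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts per taxon for two samples.
even_sample = {"taxon_A": 50, "taxon_B": 50}
skewed_sample = {"taxon_A": 99, "taxon_B": 1}

print(richness(even_sample), shannon_index(even_sample))
print(richness(skewed_sample), shannon_index(skewed_sample))
```

Both samples contain two taxa, but the evenly split sample scores higher, which is the behaviour the index is meant to capture.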

Application
Characterize microbial communities in animals, soil, water, etc.

Tools
– Kraken is a hash-based classifier. It is fast but has high memory demands
– Centrifuge is an FM-index-based classifier. It is slower but needs much less memory
– Pavian is a modern tool for visualizing metagenomic results, with per-sample and per-group statistics
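
To give a feel for what hash-based classification means, here is a toy sketch: every k-mer of each reference is stored in a dictionary, and a read is assigned to the taxon whose unique k-mers it hits most often. This is a simplified illustration, not Kraken's actual algorithm or database format; the reference sequences, taxon labels and k=5 are assumptions chosen to keep the example small.

```python
from collections import Counter


def kmers(seq, k):
    """Generate all overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify(read, index, k):
    """Return the taxon with the most unique k-mer hits, or None if no hits."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer)
        if taxa and len(taxa) == 1:  # only k-mers specific to one taxon vote
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical mini "database" of two reference sequences.
references = {"taxon_A": "ACGTACGTTAGC", "taxon_B": "TTGACCATGGCA"}
index = build_index(references, k=5)

print(classify("CGTACGTTA", index, k=5))
print(classify("GACCATGG", index, k=5))
print(classify("GGGGGGGG", index, k=5))
```

The whole k-mer table has to sit in memory for lookups to be fast, which is why this family of classifiers trades memory for speed.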

Genome assembly

Long reads have revolutionized genome assembly, dramatically increasing assembly contiguity. Bacterial genomes can usually be assembled completely (circular chromosome and plasmids), yielding gold-standard references with as little as 30x long-read coverage.

Hybrid assemblers are a very good option when plenty of Illumina data (100x+) is available alongside modest long-read coverage (10-30x). The short reads are used to correct the long reads, and the subsequent assembly is much easier because the corrected reads are both long and accurate.
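
The coverage figures above translate into a simple calculation: coverage is the total number of sequenced bases divided by the genome size. The sketch below turns the thresholds from the text (30x long reads, or 100x+ Illumina with 10-30x long reads) into a rough rule of thumb. The function names, return labels and the 5 Mb genome size are assumptions for illustration only.

```python
def coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases per genome base."""
    return total_bases / genome_size


def suggest_strategy(long_cov, short_cov=0.0):
    """Rough rule of thumb based on the coverage thresholds quoted in the text."""
    if long_cov >= 30:
        return "long-read assembly"
    if short_cov >= 100 and long_cov >= 10:
        return "hybrid assembly"
    return "more data needed"


# Hypothetical bacterial genome of 5 Mb.
genome_size = 5_000_000

long_cov = coverage(150_000_000, genome_size)   # 150 Mb of long reads
print(long_cov, suggest_strategy(long_cov))

long_cov = coverage(75_000_000, genome_size)    # 75 Mb of long reads
short_cov = coverage(600_000_000, genome_size)  # 600 Mb of Illumina reads
print(long_cov, short_cov, suggest_strategy(long_cov, short_cov))
```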

Long-read assembly tools
– miniasm + minimap2 for a quick, rough assembly in several minutes
– Canu for thorough, careful assembly with good read correction and the best overall accuracy
– Other options: FALCON, ABruijn
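
Whichever assembler is used, contiguity is commonly summarized with N50: the contig length such that contigs of that length or longer contain at least half of the assembled bases. A minimal sketch follows; the contig lengths are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Hypothetical contig lengths for a fragmented and a near-complete assembly.
fragmented = [100, 200, 300, 400]
near_complete = [4_600_000, 50_000]

print(n50(fragmented))
print(n50(near_complete))
```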

Hybrid assembly tools
– MaSuRCA builds super-reads from short Illumina reads and uses them to correct long reads
– Flye

Polishing

Application
Fixing assembly errors and computing an improved consensus sequence for a draft long-read assembly

Tools
– Nanopolish uses raw squiggle (fast5) information and a methylation model for bacteria. It can also call variants and identify methylated sites
– Racon is useful for crude consensus polishing after a miniasm/minimap2 assembly
– Pilon uses Illumina data to polish out problematic regions
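
The idea behind consensus polishing can be illustrated with a toy majority vote: reads are aligned back to the draft, and each draft base is replaced when most of the reads covering it disagree. This sketch is far simpler than what Nanopolish, Racon or Pilon actually do. It assumes the reads are already aligned column by column, handles substitutions only (no insertions or deletions), and uses '-' to mean "this read does not cover this position". All sequences are made up.

```python
from collections import Counter


def polish(draft, aligned_reads):
    """Replace each draft base with the strict-majority base of covering reads."""
    polished = []
    for i, draft_base in enumerate(draft):
        # Count the bases that the reads place at this draft position.
        column = Counter(read[i] for read in aligned_reads if read[i] != "-")
        if column:
            base, count = column.most_common(1)[0]
            # Change the draft only when more than half of the reads agree.
            if count * 2 > sum(column.values()):
                polished.append(base)
                continue
        polished.append(draft_base)  # no coverage or no clear majority
    return "".join(polished)


# Hypothetical draft with one wrong base (position 5) and three aligned reads.
draft = "ACGTTCGA"
reads = [
    "ACGTACGA",
    "ACGTACGA",
    "ACGTTCG-",
]

print(polish(draft, reads))
```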

Our contacts

Skype: dasha_dail
E-mail: daria.iakovishina@ksivalue.com
US Representative: NGS Pipeline Inc., 900 Wilshire Dr., STE 202-45, Troy, Michigan, 48084. Registration number: 07034F

Head Office: Ksivalue LLC, Russia, Moscow, Tsvetnoy Bulvar 30, str. 1, pom. 7, room 16А, office 2И, 127051. Registration number: 7702424959