Oxford Nanopore long reads analysis

Nanopore provides cheap access to long reads, which readily solve problems that looked impossible 5 years ago. As a service, Nanopore is still considered inferior to PacBio, but it has important advantages of its own (direct RNA sequencing, extra-long reads, etc.).

Tools for Nanopore reads change every month, so the analysis requires substantial bioinformatics expertise.

We analyze reads produced by PromethION, GridION, MinION and SmidgION. Data can be provided in fast5, FASTQ or FASTA format.

Special thanks to Alex Predeus for help with the cases and tool descriptions.


A Nanopore run generates raw signal ("squiggle") files in fast5 format.
– Basecalling converts fast5 files to the commonly used FASTQ format
– Always keep your fast5 files, since basecalling algorithms are constantly being improved
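
As a quick sanity check after basecalling, it can help to summarize the resulting FASTQ file. The following is a minimal sketch, assuming plain four-line FASTQ records (no wrapped sequences); the function names `read_fastq` and `summarize` and the two example reads are hypothetical and only serve as an illustration.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].strip(), seq, qual


def summarize(handle):
    """Return read count, total bases and longest read length."""
    lengths = [len(seq) for _, seq, _ in read_fastq(handle)]
    return {
        "reads": len(lengths),
        "bases": sum(lengths),
        "longest": max(lengths, default=0),
    }


# Two made-up reads, for illustration only.
example = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACG\n+\n###\n")
print(summarize(example))
```

For a real run, the `io.StringIO` object would be replaced by an opened FASTQ file (or `gzip.open(path, "rt")` for compressed data).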

Albacore is considered the best free basecalling software
– Basecalling in the cloud (EPI2ME) is now poorly supported and expensive

Metagenomic profiling

Metagenomic profiling is broadly used to characterize microbial communities in animals, soil, water, etc. The gene of choice, 16S rRNA, has 9 variable regions (V1-V9) separated by conserved regions. The resulting sequences are used to identify the species present and to estimate diversity.
The 16S gene is 1500-1800 bp long. Compared to Illumina, Nanopore reads cover the whole gene, greatly improving the resolution.

Kraken is a hash-based classifier. It is fast, but has high memory demands
Centrifuge is an FM-index-based classifier. It is slower, but much more memory-efficient
Pavian is a modern tool for metagenomic visualization with per-sample and per-group statistics
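
To give a feel for what "hash-based" classification means, here is a toy sketch, not Kraken's actual implementation. It stores every k-mer of each reference in a Python dict (a hash table) and lets the k-mers of a read vote for a taxon. The k-mer size, the function names and the two reference sequences are made-up assumptions; as a simplification, k-mers shared between taxa are ignored here, whereas Kraken assigns them to the lowest common ancestor.

```python
from collections import Counter

K = 5  # toy k-mer size; real classifiers use a much larger k


def kmers(seq, k=K):
    """Generate all overlapping k-mers of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k=K):
    """Map each k-mer to the set of taxa whose reference contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(taxon)
    return index


def classify(read, index, k=K):
    """Return the taxon with the most unique k-mer hits, or None if no hits."""
    votes = Counter()
    for kmer in kmers(read, k):
        taxa = index.get(kmer, ())
        if len(taxa) == 1:  # simplification: skip k-mers shared between taxa
            votes.update(taxa)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical reference sequences, for illustration only.
references = {
    "taxon_A": "ACGTACGTTAGCCGATAGCT",
    "taxon_B": "TTGACCGGTATCCAGGTTAA",
}
index = build_index(references)
print(classify("CGTTAGCCGATA", index))  # taxon_A
print(classify("GGGGGGGG", index))      # None
```

The whole index lives in memory, which illustrates why hash-based lookups are fast but memory-hungry for large reference databases.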

Genome assembly

Long reads revolutionized genome assembly, allowing a dramatic increase in assembly contiguity. Bacterial genomes can usually be assembled completely (circular chromosome and plasmids), which makes it possible to generate gold-standard references with as little as 30x long-read coverage.
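
Two numbers come up constantly in this context: sequencing depth (the "30x") and N50 as a measure of how unbroken an assembly is. The sketch below shows how both are commonly computed; the 5 Mb genome size, the read yield and the contig lengths are hypothetical values chosen for illustration.

```python
def coverage(total_read_bases, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Hypothetical bacterial genome of 5 Mb sequenced to 150 Mb of long reads.
print(coverage(150_000_000, 5_000_000))  # 30.0

# Hypothetical assembly: one chromosome-sized contig plus two plasmids.
print(n50([4_800_000, 150_000, 50_000]))  # 4800000
```

An N50 close to the chromosome size is what a complete bacterial assembly looks like; a fragmented short-read assembly of the same genome would typically show a far smaller value.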

Hybrid assemblers are a very good option when plenty of Illumina data (100x+) is available alongside modest long-read coverage (10-30x). The subsequent assembly is much easier because the corrected reads are both long and accurate.
Long-read assembly tools
miniasm + minimap2 for a quick-and-dirty assembly in several minutes
canu for a thorough and careful assembly with good read correction and the best overall accuracy
– Other options: FALCON, ABruijn
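
One practical detail: miniasm writes its assembly as a GFA graph rather than FASTA, so the contig sequences usually have to be extracted before downstream steps. A minimal sketch of that conversion follows, assuming the sequences are stored inline on the `S` (segment) lines; the function name `gfa_to_fasta` and the example record are hypothetical.

```python
import io


def gfa_to_fasta(gfa_handle, fasta_handle, width=60):
    """Write every GFA segment ('S' line) as a FASTA record; return the record count."""
    count = 0
    for line in gfa_handle:
        fields = line.rstrip("\n").split("\t")
        # Segment lines look like: S <name> <sequence> [optional tags]
        if fields[0] != "S" or len(fields) < 3 or fields[2] == "*":
            continue  # skip headers, links and segments without a stored sequence
        name, seq = fields[1], fields[2]
        fasta_handle.write(f">{name}\n")
        for i in range(0, len(seq), width):
            fasta_handle.write(seq[i:i + width] + "\n")
        count += 1
    return count


# Made-up GFA content, for illustration only.
gfa = io.StringIO("H\tVN:Z:1.0\nS\tutg1\tACGTACGT\tLN:i:8\nL\tutg1\t+\tutg2\t-\t0M\n")
fasta = io.StringIO()
gfa_to_fasta(gfa, fasta, width=4)
print(fasta.getvalue())
```

With real data, both handles would be ordinary files opened for reading and writing.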

Hybrid assembly tools
MaSuRCA extends short Illumina reads into super-reads and uses them to correct the long reads


Polishing fixes assembly errors and computes an improved consensus sequence for a draft long-read assembly

Nanopolish uses the raw squiggle (fast5) signal and a methylation model for bacteria. It can also call variants and identify methylated sites
Racon is useful after a miniasm/minimap2 assembly for crude consensus polishing
Pilon uses Illumina data to polish out problematic regions
