June 1, 2017

Example gallery of NanoPlot

Wouter De Coster



I am developing NanoPlot, a Python package for plotting various aspects of Nanopore sequencing data (fastq) and alignments (bam). It relies heavily on the seaborn package for creating plots. The package is available on GitHub and I welcome all feedback and suggestions!

In this post, I will show some examples of plots. The data is from the Cliveome flowcell FAB48453, after repeating the basecalling with albacore v1.1.0 followed by filtering and trimming using the NanoPlot companion script NanoFilt.

[Figure: scaled_OutliersRemoved_Downsampled_HistogramReadlength] This plot shows a simple histogram of read lengths, annotated with the read N50.
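For readers unfamiliar with the metric, here is a minimal sketch of how a read N50 can be computed from a list of read lengths. The function name `read_n50` and the toy lengths are illustrative assumptions, not code taken from NanoPlot.

```python
def read_n50(lengths):
    """Return the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no read lengths given")


# Toy example (hypothetical lengths, not the Cliveome data)
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50 = read_n50(toy_lengths)
print(n50)
```

In the toy example the total is 54 bases, and the three longest reads (10, 9 and 8) already hold 27 of them, so the N50 is 8.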

[Figure: scaled_Log_Downsampled_HistogramReadlength] Similar to the plot above, but here with log10-transformed read lengths.

[Figure: scaled_OutliersRemoved_Downsampled_LengthvsQualityScatterPlot_kde] This bivariate plot uses a kernel density estimate to compare the read length with the average basecall Phred quality of each read.

[Figure: scaled_Log_Downsampled_LengthvsQualityScatterPlot_kde] This plot shows the same data as above, but again with a log10 transformation of the read lengths.

[Figure: scaled_Log_Downsampled_LengthvsQualityScatterPlot_hex] This plot is the same as the previous one, but uses hexagonal bins instead of a kernel density estimate to show the distribution of the data.

[Figure: scaled_OutliersRemoved_Downsampled_MappingQualityvsReadLength_kde] Here the read length is compared with the mapping quality of the reads after alignment using bwa mem -x ont2d. There is clearly a subgroup of short reads with very low mapping quality.

[Figure: scaled_Log_Downsampled_MappingQualityvsReadLength_kde] This is the same plot as above, but with a log scale on the read length.

[Figure: scaled_MappingQualityvsAverageBaseQuality_kde] This plot compares the average basecall quality of reads with their mapping quality, clearly showing a subgroup of low-quality reads that are essentially useless. Keep in mind that the worst-quality reads were removed from this dataset prior to alignment.

[Figure: scaled_PercentIdentityvsAverageBaseQuality_kde] This plot compares the percent identity (derived from the edit distance to the reference genome, scaled by the read length) with the read quality. The majority of the reads have a percent identity of about 85-90%, with a long tail down to identities of ~60%.
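To make the percent identity concrete, here is a minimal sketch of one way to derive it from an edit distance and a length. The exact formula NanoPlot uses is an assumption here (100 times the fraction of the aligned length that is not an edit), and the function name `percent_identity` is hypothetical.

```python
def percent_identity(edit_distance, aligned_length):
    """Percent identity, assuming identity = 100 * (1 - edits / length).

    edit_distance: mismatches plus inserted and deleted bases
                   (for example the NM tag in a bam record)
    aligned_length: length the edit distance is scaled by
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100 * (aligned_length - edit_distance) / aligned_length


# A 1000-base alignment with 150 edits gives 85% identity
identity = percent_identity(150, 1000)
print(identity)
```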

[Figure: scaled_PercentIdentityvsAlignedReadLength_kde] This plot compares the aligned read length (log10 transformed) with the percent identity.

[Figure: scaled_AlignedReadlengthvsSequencedReadLength_scatter] In this graph the sequenced read length is compared with the aligned read length. Most reads fall on the diagonal, as expected, but some reads are not fully aligned due to soft clipping.
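The gap between sequenced and aligned read length can be read off the CIGAR string of an alignment. The sketch below is a simplified illustration under the assumption that the aligned length counts the query bases in M, I, = and X operations while soft-clipped (S) bases count only towards the sequenced length. The function name and example CIGAR strings are hypothetical, and this is not NanoPlot's own implementation.

```python
import re

# One CIGAR operation: a count followed by an operation letter
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_lengths_from_cigar(cigar):
    """Return (sequenced_length, aligned_length) for a CIGAR string."""
    sequenced = 0
    aligned = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in "MI=X":
            # Query bases that are part of the alignment
            sequenced += n
            aligned += n
        elif op == "S":
            # Soft-clipped bases are in the read but not aligned
            sequenced += n
        # D, N, H and P consume no bases of the stored read sequence
    return sequenced, aligned


clipped = read_lengths_from_cigar("100S800M20I80M50S")
full = read_lengths_from_cigar("500M")
print(clipped, full)
```

A fully aligned read such as `500M` has equal lengths and lands on the diagonal, while the soft-clipped example has a sequenced length of 1050 but an aligned length of only 900, which places it below the diagonal.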
