How to Map Small RNA Sequencing Data Using miRDeep2

miRDeep2 is a widely used bioinformatics software package designed to identify known and novel microRNAs (miRNAs) from deep sequencing data. It handles data preprocessing, genome mapping, quantification, and novel miRNA prediction based on a probabilistic model of miRNA biogenesis.

Core Prerequisites

Before starting, download and install the software and its essential dependencies:

Software Options: You can install miRDeep2 via the Bioconda package recipe or use it through a graphical interface on Galaxy Europe. If you install manually from the miRDeep2 GitHub repository, add the directory containing its scripts to your PATH in ~/.bashrc so that mapper.pl and miRDeep2.pl can be called from any working directory.

Required Packages: Ensure you have bowtie (the original Bowtie, not Bowtie 2) for alignment, ViennaRNA (specifically RNAfold) for structure prediction, and randfold for assessing the significance of predicted hairpin folds.

Input Data: You will need raw microRNA-seq reads (typically single-end FASTQ files), the reference genome of your target organism, and known mature and hairpin miRNA sequences extracted from databases like miRBase.

Step-by-Step miRDeep2 Workflow
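Before running either script, it can help to confirm that the tools listed under the prerequisites are actually on your PATH. The following is a minimal sketch of such a check; `missing_tools` is a hypothetical helper name chosen for illustration and is not part of miRDeep2.

```shell
# Print every command name from the argument list that cannot be found on PATH.
# missing_tools is a hypothetical helper, not something miRDeep2 ships.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in the prerequisites, plus the two miRDeep2 entry scripts.
missing=$(missing_tools bowtie bowtie-build RNAfold randfold mapper.pl miRDeep2.pl)

if [ -n "$missing" ]; then
  printf 'Not found on PATH:\n%s\n' "$missing" >&2
else
  echo "All required tools were found."
fi
```

The check only reports what is missing and does not abort, so it can be pasted at the top of a longer script and followed by your own error handling.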

The pipeline processes raw data and yields results through two core scripts:

[Raw FASTQ Reads] + [Genome]
        │
        ▼
    mapper.pl     ◄── Trims adapters, collapses identical reads, maps to genome
        │
        ▼
   miRDeep2.pl    ◄── Scores hairpins, calculates signal-to-noise, outputs final HTML

1. Reference Preparation

Before mapping, format your reference FASTA files so that the miRDeep2 scripts can parse them without errors.

Convert Alphabet: Change Uracils (U) to Thymines (T) in your miRBase RNA sequences since they are mapped against a DNA genome index.

Clean Identifiers: Remove whitespace from the sequence headers using the bundled remove_white_space_in_id.pl utility.

Build Index: Generate a Bowtie index for your reference genome using the bowtie-build command.

2. Preprocessing & Mapping (mapper.pl)
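The reference preparation from step 1 has to be finished before mapper.pl runs. Its two text clean-ups can be sketched with standard sed filters, as below. The function names are hypothetical, and the header clean-up assumes that keeping only the first whitespace-delimited token of each header is the desired result; it illustrates the idea rather than reproducing the bundled utility. The example record is the miRBase entry for hsa-let-7a-5p.

```shell
# Convert the RNA alphabet to DNA on sequence lines only. Header lines
# (starting with ">") are skipped so that IDs containing "U" stay intact.
rna_to_dna() {
  sed '/^>/!y/uU/tT/'
}

# Keep only the first whitespace-delimited token of each FASTA header.
# Assumption: this is the clean-up you want for miRBase-style headers.
strip_header_whitespace() {
  sed '/^>/s/[[:space:]].*$//'
}

# Demo on a single miRBase-style record; both functions read standard input.
printf '>hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p\nUGAGGUAGUAGGUUGUAUAGUU\n' \
  | rna_to_dna | strip_header_whitespace

# Index building needs Bowtie installed, so it is shown as a comment only:
#   bowtie-build genome.fa genome_index
```

In practice you would redirect a whole file through the filters, for example `rna_to_dna < mature.fa | strip_header_whitespace > mature_ref.fa`, where mature.fa stands for whatever file you downloaded.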

The mapper.pl script preprocesses your sequencing reads and maps them to the genome. A standard command follows this structure according to the Max Delbrück Center miRDeep2 documentation and the miRDeep2 Tutorial on GitHub:

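mapper.pl reads.fastq -e -h -i -j -k TGGAATTCTCGG -l 18 -m -p genome_index -s reads_collapsed.fa -t reads_vs_genome.arf -v

-e: Declares that the input file is in FASTQ format.

-h: Parses the reads into FASTA format.

-i: Converts the RNA alphabet to DNA so the reads can be mapped against the genome.

-j: Filters out reads containing non-canonical nucleotides.

-k: Clips the specified 3’ adapter sequence (replace TGGAATTCTCGG with the adapter used in your library preparation).

-l 18: Discards reads shorter than 18 nucleotides.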

-m: Collapses identical reads into a single entry while tracking their abundance counts.

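-p: Aligns the processed reads against the Bowtie index with the given prefix (here genome_index).

-s: Writes the processed, collapsed reads to the named FASTA file.

-t: Writes the read mappings to the named ARF alignment file.

-v: Prints a progress report while the script runs.

3. miRNA Prediction & Quantification (miRDeep2.pl)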
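Before moving on to prediction, it is worth sanity-checking the collapsed reads that mapper.pl wrote with -s. The sketch below assumes the collapsed FASTA headers end in `_x<count>` (for example `>seq_0_x120`), where the count is the number of identical reads merged into that entry. `summarise_collapsed` is a hypothetical helper name, and the two records in the demo are toy data.

```shell
# Report how many unique sequences a collapsed FASTA holds and how many raw
# reads they represent, using the "_x<count>" suffix of each header.
# Assumption: headers look like ">seq_0_x120".
summarise_collapsed() {
  awk '/^>/ {
         n = split($1, parts, "_x")   # the count is the text after the last "_x"
         unique++
         total += parts[n]
       }
       END { printf "unique=%d total=%d\n", unique, total }'
}

# Toy input with two collapsed entries; prints: unique=2 total=123
printf '>seq_0_x120\nTGAGGTAGTAGGTTGTATAGTT\n>seq_1_x3\nTTCAAGTAATCCAGGATAGGCT\n' \
  | summarise_collapsed
```

On real data you would run `summarise_collapsed < reads_collapsed.fa`. A total far below the number of reads in the raw FASTQ usually means that adapter clipping or the length filter removed more than expected.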

Once you have your collapsed reads and alignment file, run the core prediction module:

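miRDeep2.pl reads_collapsed.fa genome.fa reads_vs_genome.arf mature_ref.fa mature_related.fa hairpin_ref.fa -t Organism 2> report.log

The positional arguments are, in order: the collapsed reads, the genome FASTA, the ARF mappings from mapper.pl, the known mature miRNAs of your species, the mature miRNAs of related species, and the known hairpin precursors. The -t option names the species, and 2> report.log redirects the progress messages to a log file.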
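Each of the three reference FASTA arguments can be replaced by the literal word none when no such file is available, for instance for a species with no miRBase entries. The following sketch shows one way to make that substitution explicit. `ref_or_none` is a hypothetical helper name, and the command is only printed for review, not executed.

```shell
# Echo the path if the file exists and is non-empty; otherwise echo the
# literal word "none", which miRDeep2.pl accepts in place of a reference file.
# ref_or_none is a hypothetical helper, not part of miRDeep2.
ref_or_none() {
  if [ -s "$1" ]; then
    printf '%s\n' "$1"
  else
    printf 'warning: %s not found, using "none"\n' "$1" >&2
    printf 'none\n'
  fi
}

mature=$(ref_or_none mature_ref.fa)
related=$(ref_or_none mature_related.fa)
hairpin=$(ref_or_none hairpin_ref.fa)

# Print the assembled command so it can be checked before running it.
echo "miRDeep2.pl reads_collapsed.fa genome.fa reads_vs_genome.arf $mature $related $hairpin -t Organism 2> report.log"
```

Printing a warning rather than substituting silently is deliberate: running without known miRNAs changes what miRDeep2 can report, so a missing file should be a conscious choice and not a typo in a path.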
