TT-seq analysis - Metagene profiles

In this post, I will use ngs.plot to visualize the results of TT-seq experiment around functional genomic regions, i.e. transcription start site (TSS) and transcription end site (TES).

NGS plot download instructions are available at: https://github.com/shenlab-sinai/ngsplot

I would like to be able to look at sense and antisense profiles. As data I am working with is paired, I will first use SAMtools to get MATE 1 reads. The script is called as follows: sbatch scripts.sh bam samplename

where bam is the input bam file, and samplename is what will be use to name new files.

#!/usr/bin/bash

##modules
module load samtools/1.3.1
module load python/3.5.1

#### Set working, temporary and results directories.
export WORKDIR=/pathtoworkdir/
TMPDIR="${WORKDIR}tmp/"
PROFDIR="${WORKDIR}metaprofiles/"
mkdir -p $TMPDIR
mkdir -p $PROFDIR

#### Sample information: sample name and BAM file location.
SAMPLE=$2
BAM=${WORKDIR}${1}
MATE1="${WORKDIR}${SAMPLE}.mate1.bam"

#### Restict to first mate reads and index 
# If the BAM file contains paired reads, create a new version containing only the first mate reads.<br>
# This step may be skipped if your reads are not paired.<br>
samtools view --threads $THREADS -h -b -f 64 $BAM -o $MATE1
samtools index $MATE1

#### BAM header compliance.
# It might be necessary to re-header the BAM file so that the chromosome names match those in the ngs.plot database, e.g. standard chromosomes preceeded with "chr" for mm10
MATE1REHEADER="${WORKDIR}${SAMPLE}.mate1.reheader.bam"
samtools view --threads $THREADS -H ${MATE1} | sed -e 's/SN:\([0-9XY]*\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | samtools reheader - ${MATE1} > ${MATE1REHEADER}
samtools index ${MATE1REHEADER}

Mate 1 can now be used to create sense and anti-sense profiles.

#!/bin/bash
#call script as: script.sh samplename

export WORKDIR=/pathtoworkdir/
PROFDIR="${WORKDIR}metaprofiles/"
mkdir -p $PROFDIR

THREADS=1
SAMPLE=$1
BAM="${WORKDIR}${SAMPLE}.bam"

REGION="tss"
for STRAND in both same opposite
do
    OUTPUT="${PROFDIR}${SAMPLE}.${REGION}.${STRAND}"
    ngs.plot.r -G mm10 -R $REGION -C ${BAM} -O $OUTPUT -P $THREADS -SS $STRAND -SE 1 -L 5000 -F chipseq -D ensembl
done

REGION="genebody"
for STRAND in both same opposite
do
    OUTPUT="${PROFDIR}${SAMPLE}.${REGION}.${STRAND}"
    ngs.plot.r -G mm10 -R $REGION -C ${BAM} -O $OUTPUT -P $THREADS -SS $STRAND -SE 1 -L 5000 -F chipseq -D ensembl
done

REGION="tes"
for STRAND in both same opposite
do
    OUTPUT="${PROFDIR}${SAMPLE}.${REGION}.${STRAND}"
    ngs.plot.r -G mm10 -R $REGION -C ${BAM} -O $OUTPUT -P $THREADS -SS $STRAND -SE 1 -L 5000 -F chipseq -D ensembl
done

As a result, you will get pdf files with profiles for sense, antisense, and both strands on seperate graphs, e.g.:

as well as zipped files with the txt files with data used for drawing the profiles. These files can be used to re-draw the profiles, e.g. drawing sense and antisense coverage on the same plot (see the upcoming blog post).