Processing rMATS output

rMATS outputs several files, including txt files for all considered splicing events (i.e. SE (skipped exon), MXE (mutually exclusive exons), A3SS (alternative 3’ splice site), A5SS (alternative 5’ splice site), RI (retained intron)). There are two versions of these files, with exentsions JC or JCEC, which include only reads that span junctions or both reads that span junctions and reads that do not cross an exon boundary; respectively.

For downstream analysis, I would like to combine these files into one file. I will create two files: one filtered on FDR, one not. I will also select columns that I find useful. I will use python script to do so:

import glob
import pandas as pd
import numpy as np
import os
from pathlib import Path

AS_events = 'path/to/txtfiles/with/JCevents'

#create lists for all events of FDR filtered events
events_list_all = []
events_list_sig = []

all_AS_events=sorted(glob.glob(AS_events + '/*.txt'))

for a in all_AS_events:
    df = pd.read_table(a, sep='\t', engine='python')
    #get event name from the filename, by extracting substring before last period (.txt) and splitting it further, at it as a new column to a dataframe  
    #subset only required columns
    df=df[['Group','GeneID','chr','strand','FDR','IJC_SAMPLE_1', 'SJC_SAMPLE_1', 'IJC_SAMPLE_2', 'SJC_SAMPLE_2','IncLevel1','IncLevel2','IncLevelDifference']]
    #subset only significant events
    df_sigw=df[df['FDR']<= 0.05]
    #append to appropriate lists

#combine the lists
WT_all = pd.concat(events_list_all)
WT_sig = pd.concat(events_list_sig)

#replace NA with 0
WT_all = WT_all.fillna(0)
WT_sig = WT_sig.fillna(0)

#specify file names for these new files
sig = 'events_sig.csv'
all = 'events_all.csv'
output_sig = os.path.join(AS_events, sig)
output_all = os.path.join(AS_events, all)

WT_sig.to_csv(output_sig, sep=',', header=True)
WT_all.to_csv(output_all, sep=',', header=True)