Most, if not all, published NGS data is deposited in the GEO repository and is available to download and process. Today's objective is to get the data of interest from the GEO repository onto our HPC cluster accounts.
- Log in to the HPC as usual using the `ssh` command in the Terminal.
- Data should be downloaded in interactive mode, which you start with `srun --pty /bin/bash`. Once in interactive mode, create an appropriate directory for your data.
- Locate the data of interest in the GEO repository. In this example, I want to download the following data: `GSE168378`. To find the link to transfer the data, we should access the FTP site, which you can locate under the "Tools" header on the GEO homepage. In the `series/` folder of the FTP site, find and enter the appropriate directory (here `GSE168nnn/`) and then `GSE168378/`. The data is available in the `suppl/` directory. Right-click (or double-tap, if you are a Mac user) to get the link for this directory.
- Use the `wget` command with the following options: `--recursive` (as we are copying a directory), `--no-parent` (so that `wget` does not ascend into parent folders), and `-nd` (so that the remote directory structure is not recreated locally):
wget --recursive --no-parent -nd ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE168nnn/GSE168378/suppl/
DONE!
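
If you download GEO series regularly, the steps above can be wrapped in a small script. The following is a minimal sketch rather than part of the original workflow: the function names `geo_suppl_url` and `fetch_geo_suppl` are hypothetical, and it assumes that the range folder is formed by replacing the last three digits of the accession with `nnn`, as in the `GSE168nnn/` example. Run it from the interactive session.

```shell
#!/usr/bin/env bash
# Sketch only: geo_suppl_url and fetch_geo_suppl are hypothetical helper names.

# Build the suppl/ FTP URL for a GEO series accession such as GSE168378.
geo_suppl_url() {
    local acc="$1"
    local digits="${acc#GSE}"        # numeric part, e.g. "168378"
    local stub="GSEnnn"              # assumed range folder for accessions with up to three digits
    if [ "${#digits}" -gt 3 ]; then
        stub="GSE${digits%???}nnn"   # replace the last three digits with "nnn"
    fi
    printf 'ftp://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/suppl/\n' "$stub" "$acc"
}

# Download the supplementary files into a directory (default: named after the accession).
fetch_geo_suppl() {
    local acc="$1"
    local dest="${2:-$acc}"
    mkdir -p "$dest"
    # Same options as in the steps above; -P sets the target directory.
    wget --recursive --no-parent -nd -P "$dest" "$(geo_suppl_url "$acc")"
}

# Show the URL that would be used for the example series.
geo_suppl_url GSE168378
```

Calling `fetch_geo_suppl GSE168378` would then create a `GSE168378/` directory and download the contents of `suppl/` into it. It is worth checking the printed URL against the FTP site before starting a large transfer.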