SRA: Sequence Read Archive
The Sequence Read Archive (SRA) makes biological sequence data available to the research community to enhance reproducibility and allow for new discoveries by comparing data sets. The SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms, […]
However, it is rather difficult to find the actual download links, and even once found they do not always work…
The “SRA-explorer” web site makes things easier and provides ready-made download commands.
Even so, its results sometimes still include links to SRA folders that no longer exist…
Here’s how I ended up there…
- It all started with trying a “Dockerised” RNA-Seq pipeline: An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study. (Wang Z, Ma’ayan A. PubMed: 27583132)
- Docker Hub entry maayanlab/zika
- They point to a link to a “Project”: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP070/SRP070895/
- The script would just need this Project ID to download all 8 SRA files.
- BUT this is no longer on the SRA ftp site…
- After a lot of searching I found the SRA-explorer web site, which provides exact links for each experiment. For example:
- OK… BUT the folder SRR3191542 no longer exists on the ftp site… neither are
- Thanks to the alternate links that SRA-explorer provides, it is still possible to download these, BUT from the UK archive (EBI) rather than the US archive (NCBI). For example:
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/005/SRR3191545/SRR3191545_1.fastq.gz -o SRR3191545_GSM2073124_ZIKV2-1_Homo_sapiens_RNA-Seq_1.fastq.gz
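The EBI link above seems to follow a regular pattern (first six characters of the run accession, then a zero-padded sub-folder, then the accession itself), so the same command can be generated for other runs. Here is a minimal sketch of that idea. The function name ena_fastq_url is my own, the padding rule for accessions of other lengths is an assumption generalised from this one example, and the _2 file is assumed to exist only for paired-end runs. The loop only prints the curl commands, it does not download anything.

```shell
# Build an EBI FASTQ URL for a run accession and mate number (1 or 2).
# Layout inferred from the example above:
#   vol1/fastq/<first 6 chars>/<padded sub-folder>/<accession>/<accession>_<mate>.fastq.gz
ena_fastq_url() {
    acc="$1"
    mate="$2"
    base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    # Characters after the 9th one (empty for a 9-character accession).
    extra=$(printf '%s' "$acc" | cut -c10-)
    # Assumption: the sub-folder is those extra digits, left-padded with
    # zeros to three characters; no sub-folder when there are none.
    case "${#extra}" in
        0) sub="" ;;
        1) sub="00${extra}/" ;;
        2) sub="0${extra}/" ;;
        *) sub="${extra}/" ;;
    esac
    printf '%s/%s/%s%s/%s_%s.fastq.gz\n' "$base" "$prefix" "$sub" "$acc" "$acc" "$mate"
}

# Dry run: print one curl command per mate. Remove "echo" to really download.
for acc in SRR3191545; do
    for mate in 1 2; do
        echo curl -L "$(ena_fastq_url "$acc" "$mate")" -o "${acc}_${mate}.fastq.gz"
    done
done
```

More accessions can be added to the for loop, separated by spaces. It is worth checking one generated URL by hand before trusting the pattern for a whole project.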
I could not find any explanation anywhere for the disappearance of folders from the FTP site…
P.S. I found SRA-explorer thanks to an entry in this discussion: https://www.researchgate.net/post/What_is_fastest_way_to_download_read_data_from_NCBI_SRA