Hunting for SRA sequence archives

December 13, 2019

SRA: Sequence Read Archive

The Sequence Read Archive (SRA) makes biological sequence data available to the research community to enhance reproducibility and allow for new discoveries by comparing data sets. The SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms, […]

However, finding the download links is rather difficult, and even once found they do not always work…

The “SRA-explorer” web site makes things easier and provides ready-made download commands.

Even so, its results sometimes still include links to SRA folders that no longer exist…

SRA-explorer search page

Here’s how I ended up there:

  • It all started with trying a “Dockerised” RNA-Seq pipeline: An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study. (Wang Z, Ma’ayan A. PubMed: 27583132)
  • Docker Hub entry maayanlab/zika
  • They propose a link to a “Project”: ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP070/SRP070895/
  • The script would only need this Project ID to download all 8 SRA files.
  • BUT this project folder is no longer on the SRA FTP site…
  • After a lot of searching I found the SRA-explorer web site, which provides exact links for each experiment, for example:
    ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR319/SRR3191542/SRR3191542.sra
  • OK… BUT the folder SRR3191542 no longer exists on the FTP site either, and neither do SRR3191543, SRR3191544, or SRR3191545.
  • Thanks to the alternate links provided by SRA-explorer, it is still possible to download these, BUT from the UK archive rather than the US archive, for example:
    curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR319/005/SRR3191545/SRR3191545_1.fastq.gz -o SRR3191545_GSM2073124_ZIKV2-1_Homo_sapiens_RNA-Seq_1.fastq.gz
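
The UK link above appears to follow a predictable pattern: the first six characters of the run ID, then a short zero-padded sub-folder taken from the last digit(s), then the run ID itself. Here is a minimal sketch that rebuilds such links for the four runs named above. It assumes that this layout holds for every run, and that each run has a `_1.fastq.gz` file as in the example (single-end runs may be named differently). The helper name `ena_fastq_url` and the output file names are hypothetical. The loop only prints the curl commands so they can be checked before running them.

```shell
# Build a UK-archive (ftp.sra.ebi.ac.uk) FASTQ link from a run ID.
# ASSUMPTION: layout is vol1/fastq/<first 6 chars>/<sub-folder>/<run>/,
# inferred from the SRR3191545 example; ena_fastq_url is a made-up name.
ena_fastq_url() (
    acc=$1    # run ID, e.g. SRR3191545
    mate=$2   # read file number, e.g. 1
    base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
    prefix=$(printf '%s' "$acc" | cut -c1-6)
    last1=${acc#"${acc%?}"}     # last character of the run ID
    last2=${acc#"${acc%??}"}    # last two characters of the run ID
    case ${#acc} in
        9)  sub="" ;;               # 6-digit runs: no sub-folder
        10) sub="00${last1}/" ;;    # 7-digit runs: 00 + last digit
        11) sub="0${last2}/" ;;     # 8-digit runs: 0 + last two digits
        *)  echo "unexpected run ID length: $acc" >&2; exit 1 ;;
    esac
    printf '%s/%s/%s%s/%s_%s.fastq.gz\n' \
        "$base" "$prefix" "$sub" "$acc" "$acc" "$mate"
)

# Print (not run) one curl command per run mentioned above.
for run in SRR3191542 SRR3191543 SRR3191544 SRR3191545; do
    url=$(ena_fastq_url "$run" 1)
    echo "curl -L $url -o ${run}_1.fastq.gz"
done
```

If the printed links look right, the commands can be pasted into a terminal, or the `echo` can be dropped to download directly.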
    

I could not find any explanation anywhere for the disappearance of folders from the FTP site…

P.S. I found SRA-explorer thanks to an entry in this discussion: https://www.researchgate.net/post/What_is_fastest_way_to_download_read_data_from_NCBI_SRA

specifically:

7th Jul, 2019
Eric JC Gálvez
Helmholtz Centre for Infection Research
It’s a very old question, but this can save a lot of time and also it works pretty well
