STAR index for human genome – overcoming the hardware barriers

By | December 30, 2019

Recently I was testing a Docker image that runs a next-generation sequencing analysis, as a way to try out an existing “pipeline” from the first published study of the effect of the Zika virus (https://hub.docker.com/r/maayanlab/zika/).

Running a Docker container may make reproducibility easier, but sometimes there are also hardware barriers that need to be overcome. In my case, even though I gave Docker half or more of my Mac's hardware resources (2/3 of the CPU, i.e. 4 cores, and 1/2 of the RAM, i.e. 8 GB), that was not enough to run the STAR program to build an index from the human genome.

Therefore I ran STAR on the Mac itself (Macmini8,1; 6-core Intel i5, 3 GHz; 16 GB RAM; SSD drive) after installing it. I downloaded the same version that was in the Docker image (STAR_2.4.1c for Mac). However, with 16 GB of RAM my Mac is still below the 32 GB limit.
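Before launching a long indexing run, it can help to check how much memory the machine (or the Docker VM) actually exposes. The following is a minimal sketch and not part of the original pipeline: the function names total_ram_gb and below_star_limit are made up for illustration, and the 32 GB threshold is simply the figure mentioned above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of STAR or of the Docker image).

# Print total RAM in whole GiB (rounded down).
total_ram_gb() {
  if [ -r /proc/meminfo ]; then
    # Linux, including inside a Docker container: MemTotal is reported in kB.
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
  else
    # macOS: hw.memsize is reported in bytes.
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
  fi
}

# Succeed (exit status 0) when the given RAM in GiB is below the limit.
below_star_limit() {
  ram_gb="$1"
  limit_gb="${2:-32}"   # assumption: 32 GB, the figure quoted in this post
  [ "$ram_gb" -lt "$limit_gb" ]
}

ram="$(total_ram_gb)"
if below_star_limit "$ram"; then
  echo "Only ${ram} GiB of RAM: a sparser index will probably be needed."
else
  echo "${ram} GiB of RAM: the default index settings may fit."
fi
```

Run inside a container on a Mac, the number should reflect the memory allotted to Docker rather than the whole machine.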

Searching online I found at least 4 useful links (see the bottom of the page). Perhaps the most useful was the one mentioning --genomeSAsparseD 2, which helped me put together a command that worked. So here it is:

./STAR --runThreadN 3 --runMode genomeGenerate --genomeSAsparseD 12 --genomeSAindexNbases 12 --genomeChrBinNbits 14 --genomeDir ./STAR_2.4.1c --genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --sjdbOverhang 75
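The same command can be wrapped in a small script so that the paths and the thread count are easy to change, and so that the command can be reviewed before a long run. This is only a sketch: the function name build_star_index and the variable names are illustrative, while the flags and values are those of the command above. In STAR, a larger --genomeSAsparseD value lowers the RAM needed at the cost of slower mapping.

```shell
#!/bin/sh
# Illustrative wrapper; variable and function names are assumptions.
STAR_BIN="${STAR_BIN:-./STAR}"
THREADS="${THREADS:-3}"
GENOME_DIR="${GENOME_DIR:-./STAR_2.4.1c}"
GENOME_FASTA="${GENOME_FASTA:-genome.fa}"
GTF="${GTF:-genes.gtf}"

build_star_index() {
  # First argument: "run" executes STAR; anything else only prints the command.
  mode="${1:-print}"
  # Reuse the positional parameters to hold the command, one word per argument.
  set -- "$STAR_BIN" \
    --runThreadN "$THREADS" \
    --runMode genomeGenerate \
    --genomeSAsparseD 12 \
    --genomeSAindexNbases 12 \
    --genomeChrBinNbits 14 \
    --genomeDir "$GENOME_DIR" \
    --genomeFastaFiles "$GENOME_FASTA" \
    --sjdbGTFfile "$GTF" \
    --sjdbOverhang 75
  if [ "$mode" = "run" ]; then
    mkdir -p "$GENOME_DIR"   # make sure the output directory exists
    "$@"
  else
    echo "$*"
  fi
}

# Dry run: show the command that would be executed.
build_star_index print
```

Calling build_star_index run instead would create the output directory and start the indexing.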

After that I could go back into the Docker container and continue with the analysis… which in itself is yet another long story!

Links

Here are the links as I wrote them in my notes:

# SEE https://www.biostars.org/p/251736/#251742

# SEE https://www.biostars.org/p/221781/   : Question: Pre made STAR Index?

# SEE https://github.com/alexdobin/STAR/issues/292  : created the index with --genomeSAsparseD 2 to overcome the limit of my 32 GB RAM MacBook Pro.

# SEE https://github.com/alexdobin/STAR/issues/569  : genomeParameters.txt file not generated during genome indices generation #569
