Recently I was testing a Docker image that runs a container for next-generation sequencing, as a way to try out an existing “pipeline” on the first published study of the effect of the Zika virus (https://hub.docker.com/r/maayanlab/zika/).
Running a Docker container may make reproducibility easier, but sometimes there are also hardware barriers that need to be overcome. In my case, even though I gave Docker half or more of my Mac's hardware resources (2/3 of the CPU, i.e. 4 cores, and 1/2 of the RAM, i.e. 8 GB), that was not enough to run the STAR program to build an index from the human genome.
Therefore I installed STAR and ran it on the Mac itself (Macmini8,1; 6-core Intel i5 at 3 GHz; 16 GB RAM; SSD drive). I downloaded the same version that was in the Docker image (STAR_2.4.1c for Mac). However, with 16 GB my Mac is still well below the roughly 32 GB of RAM that STAR is usually said to need for indexing the human genome.
Checking online I found at least 4 useful links (see bottom of page). Perhaps the most useful was the one mentioning --genomeSAsparseD 2, an option that makes the suffix array sparser and so lowers the RAM needed, at the cost of slower mapping. That let me put together a command that worked. Here it is:
./STAR --runThreadN 3 --runMode genomeGenerate --genomeSAsparseD 12 --genomeSAindexNbases 12 --genomeChrBinNbits 14 --genomeDir ./STAR_2.4.1c --genomeFastaFiles genome.fa --sjdbGTFfile genes.gtf --sjdbOverhang 75
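For anyone who wants to tweak the memory-related values without retyping the whole line, the command above can be wrapped in a small shell function. This is only a sketch: the variable names (STAR_BIN, GENOME_DIR, THREADS, SPARSE_D) and the DRY_RUN switch are my own inventions, not part of STAR, and the file names are the ones I used above. As far as I know this STAR version expects the --genomeDir directory to exist already, so the sketch creates it before running.

```shell
# Hypothetical wrapper around the STAR indexing command shown above.
# All variable names and DRY_RUN are illustrative, not STAR options.
STAR_BIN="${STAR_BIN:-./STAR}"
GENOME_DIR="${GENOME_DIR:-./STAR_2.4.1c}"
THREADS="${THREADS:-3}"
SPARSE_D="${SPARSE_D:-12}"   # larger value = sparser suffix array = less RAM
DRY_RUN="${DRY_RUN:-1}"      # 1 = only print the command, 0 = actually run it

run_star_index() {
  # Build the argument list as positional parameters (POSIX sh has no arrays).
  set -- "$STAR_BIN" \
    --runThreadN "$THREADS" \
    --runMode genomeGenerate \
    --genomeSAsparseD "$SPARSE_D" \
    --genomeSAindexNbases 12 \
    --genomeChrBinNbits 14 \
    --genomeDir "$GENOME_DIR" \
    --genomeFastaFiles genome.fa \
    --sjdbGTFfile genes.gtf \
    --sjdbOverhang 75

  if [ "$DRY_RUN" = "1" ]; then
    # Print the assembled command so it can be checked before a long run.
    echo "$*"
  else
    # Create the output directory first, then run STAR with the arguments.
    mkdir -p "$GENOME_DIR"
    "$@"
  fi
}

run_star_index
```

With the defaults this only prints the command. Setting DRY_RUN=0 runs it for real, and THREADS or SPARSE_D can be overridden in the same way.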
After that I could go back into the Docker container and continue with the analysis... which is yet another long story!
Links
Here are the links as I wrote them in my notes:
# SEE https://www.biostars.org/p/251736/#251742
# SEE https://www.biostars.org/p/221781/ : Question: Pre made STAR Index?
# SEE https://github.com/alexdobin/STAR/issues/292 : created the index with --genomeSAsparseD 2 to overcome the limit of my 32 GB RAM MacBook Pro.
# SEE https://github.com/alexdobin/STAR/issues/569 : genomeParameters.txt file not generated during genome indices generation #569