STAR index for human genome – overcoming the hardware barriers

Recently I was testing a Docker image that runs a container for next-generation sequencing, as a way to test an existing “pipeline” on the first published study of the effect of the Zika virus (https://hub.docker.com/r/maayanlab/zika/). Running a Docker container may make reproducibility easier, but sometimes there are also hardware barriers that need to… Read More »

Down-sampling FASTQ.gz paired ends

Downsampling: I searched for a way to create a down-sampled data set from an actual large dataset, and while there are many creative suggestions on BioStar and other forums, I find that the most versatile and easy-to-use tool is one recommended on the forums: seqtk, which is available on GitHub: github.com/lh3/seqtk Quoting… Read More »

Hunting for SRA sequence archives

SRA: Sequence Read Archive. The Sequence Read Archive (SRA) makes biological sequence data available to the research community to enhance reproducibility and allow for new discoveries by comparing data sets. The SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms, […] However, it is rather difficult to even find the download links… and even… Read More »

Docker tutorials for Biologists

I have started a series of tutorials written from the perspective of a biologist who wants to use a Docker container for a specific application. A simple example is using EMBOSS, the European Molecular Biology Open Software Suite. The tutorials are online at the Biochemistry department here: Docker tutorials (general page) Docker… Read More »

Enterotypes-2018

The recent update on enterotypes [1] was an important read. Yes, we desperately need to reduce the dimensionality of the gut microbiome data and discover the stable “archetypes” of the microbiome functional states. The concept of genera-based enterotypes is a step in this direction. However, one may feel a “sense of fragility” while reading the 2017… Read More »

Review on Deep Learning in Biology and Medicine

Deep neural networks are everywhere. They are revolutionizing our day-to-day lives, and this phenomenon no longer needs any introduction or description. Deep learning is especially well suited to finding structure in overwhelming amounts of data. Recently, biological data have become exactly that: overwhelming, and applying deep learning toolsets to them looks very natural. You… Read More »

asciinema: record commands in terminal

Re: asciinema.org (Linux/macOS) It can be nice to share or show commands being typed in a text terminal and embed this simple “movie” within a blog or HTML page. It seems that the recording gets uploaded to their web site… Since it’s all text-based, the file should be rather small and the clarity of replay very good compared… Read More »

Omics Pipe: An Automated Framework for Next Generation Sequencing Analysis

Re: pythonhosted.org/omics_pipe Next-generation sequencing data analysis requires many steps that can be learned one by one, for example running an aligner such as Bowtie, TopHat or STAR, then handling the SAM/BAM file for subsequent analysis. Once the various steps are understood, and if there are new analyses to be performed routinely, it would be “nice”… Read More »

alternativeto.net: finding alternate software

Re: alternativeto.net What software one uses may be the result of colleague recommendations, using “what others use in the lab,” or options found haphazardly online. Or perhaps it is time to “upgrade” a commercial package and the steep price makes you think twice… The crowdsourced software recommendation web site alternativeto.net does just that… just enter the name… Read More »

omictools.com: Search engine for biological data analysis

Re: omictools.com The search for software relevant to a biological analysis may start from reading a paper or from searching on Google. The web site https://omictools.com/ contains 16,971 software (omic) tools that are organized in categories shown on the home page. Checking a bit further, one can find interesting entries such as: High-throughput sequencing data analysis… Read More »