Default preference reversal in R 4.0.x

By | November 4, 2020

I just wrote an entry on my Biochem blog which I think would fit on this site:

Default preferences

I enjoy using R and RStudio, but I am always wary of upgrading R because that usually leads to some issue(s). The most recent one took me a while to diagnose, even though in retrospect it is a simple change in a default preference: the default value of the stringsAsFactors argument (given in R 3.x by default.stringsAsFactors()) changed from TRUE in R versions 3.x to FALSE in the newest versions, R 4.0.x. This argument is part of the functions used to read tabular text data into R, for example read.table() for whitespace- or tab-delimited text, or read.csv() for comma-delimited text files.

One would hope that a script with such basic commands would keep working after an upgrade, since reading data should not change the nature of things without warning! However, this small change broke a simple command in one of the R tutorial demos, a yeast mutant analysis.

The commands were simple: read a tab-delimited text file (yeast_example.txt) and then make a plot, which R should automatically draw as a boxplot because genotype is a categorical (factor) column.

yeast_eg = read.table('yeast_example.txt', header = TRUE)

with(yeast_eg, plot(genotype, OD_change))

With a default R 4.0.x installation, however, this command fails: genotype is now read as a character vector instead of a factor, so plot() no longer draws a boxplot.
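To see the difference without the original file, here is a minimal sketch that reads a small made-up table from a string via the text argument of read.table(). Only the column names come from the post; the values are hypothetical. It assumes R 4.0.0 or later.

```r
# Made-up stand-in for yeast_example.txt: the column names follow the post,
# the values are hypothetical
yeast_txt <- "genotype\tOD_change
wt\t0.52
wt\t0.48
mutant\t0.21
mutant\t0.25"

# R >= 4.0.0 default (stringsAsFactors = FALSE): genotype stays character
eg_new <- read.table(text = yeast_txt, header = TRUE, sep = "\t")
class(eg_new$genotype)   # "character"

# Requesting the R 3.x behaviour explicitly turns it into a factor
eg_old <- read.table(text = yeast_txt, header = TRUE, sep = "\t",
                     stringsAsFactors = TRUE)
class(eg_old$genotype)   # "factor"
levels(eg_old$genotype)  # "mutant" "wt"
```

With the factor version, with(eg_old, plot(genotype, OD_change)) draws the boxplot again, because plot() dispatches on the class of its first argument.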

Solution 1

When things did not work I was puzzled, and since I had not used R for a while, I searched online and found a suitable command that requires loading a package. The purpose of the command is to convert the character (string) columns of a data frame into factors.

library(dplyr)
yeast_eg <- mutate_if(yeast_eg, is.character, as.factor)

The dplyr package is part of the Tidyverse, which offers great new ways of using R, but loading it adds a level of complexity that was not needed before the upgrade to R 4.0.x.
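The same conversion can also be done in base R, without loading any package. This is a minimal sketch, not the tutorial's code: it builds a small hypothetical data frame in place of the file and converts only the character columns to factors, which is what the mutate_if() call above does.

```r
# Hypothetical data frame standing in for yeast_eg as read by R 4.0.x
# (data.frame() also keeps strings as character in R >= 4.0.0)
yeast_eg <- data.frame(genotype  = c("wt", "wt", "mutant", "mutant"),
                       OD_change = c(0.52, 0.48, 0.21, 0.25))

# Flag the character columns ...
is_chr <- vapply(yeast_eg, is.character, logical(1))

# ... and convert only those to factors; numeric columns are left untouched
yeast_eg[is_chr] <- lapply(yeast_eg[is_chr], factor)

str(yeast_eg)
```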

Solution 2

I happened to notice default.stringsAsFactors() in the help file for read.table(), and that led me to discover the change from R 3.x (default TRUE) to R 4.0.x (default FALSE). The relevant argument in the function's usage reads stringsAsFactors = default.stringsAsFactors().

One way around the new default is to set the argument explicitly to TRUE in the read.table() call, for example:

yeast_eg = read.table('yeast_example.txt', header = T, stringsAsFactors = T)

However, stringsAsFactors = T would then have to be repeated in every call. Alternatively, the default can be changed globally with the command options(stringsAsFactors = TRUE), whose pros and cons are discussed in the Stack Overflow question “Change stringsAsFactors settings for data.frame”. Note that in R 4.0.x this global option is deprecated, so it should not be relied on in the long run.
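A middle ground between repeating the argument and changing a global option is a small wrapper function. The name read_table_factors() below is hypothetical and not part of base R; this is a sketch of the idea, and the inline table is made up.

```r
# Hypothetical convenience wrapper (not part of base R): behaves like
# read.table() but always converts character columns to factors,
# as R 3.x did by default
read_table_factors <- function(...) {
  read.table(..., stringsAsFactors = TRUE)
}

# Usage with a made-up inline table; with the real file this would be
# read_table_factors('yeast_example.txt', header = TRUE)
eg <- read_table_factors(text = "genotype OD_change
wt 0.52
mutant 0.21", header = TRUE)

class(eg$genotype)  # "factor"
```

Unlike options(), the wrapper keeps the choice visible in the script itself, so the script behaves the same on someone else's machine. Passing stringsAsFactors again through ... would raise an error, which is arguably a useful guard.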

Prior announcement

Now that I know about this change, I found the earlier announcement: “stringsAsFactors” (2020/02/16).

The article gives a historical retrospective of why automatic string-to-factor conversion was at first not present in R, was then added, and has now been removed again, for good reasons, stating that “Automatic string to factor conversion introduces non-reproducibility. […] Hence, the results of subsequent statistical analyses can differ with automatic string-to-factor conversion in place.”
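One likely mechanism behind this non-reproducibility (my reading, not a summary of the article) is that factor() sorts the levels by default, and the help page for factor() notes that this sort order may depend on the locale. The sketch below uses made-up labels that differ only in case, whose default level order can vary between machines, and shows that supplying the levels explicitly fixes the coding everywhere.

```r
# Made-up labels that differ only in upper/lower case
x <- c("wt", "Mutant", "mutant", "WT")

# Default levels come from sorting the unique values, and that sort order
# follows the collation rules of the current locale
Sys.getlocale("LC_COLLATE")
levels(factor(x))

# Explicit levels give the same level order and integer codes on every machine
f <- factor(x, levels = c("wt", "WT", "mutant", "Mutant"))
levels(f)
as.integer(f)  # 1 4 3 2
```

The level order matters downstream: for example, the first level is used as the reference category in model fits such as lm().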

Defaults

Software offers great flexibility by providing (convenient, or sometimes annoying) defaults, which users can change as they wish. However, changed settings can make one’s own setup very different from that of others, and therefore cause a loss of consistency. This experienced software engineer explains it better in his post “The pros and cons of ‘defaults’”.

Variable types

What are factors in R?

The Berkeley statistics department provides this answer as part of a longer article: “Conceptually, factors are variables in R which take on a limited number of different values; such variables are often referred to as categorical variables.”
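A quick way to see this in practice is to build a small factor by hand. This is a minimal sketch with made-up genotype labels: a factor stores a fixed set of levels plus an integer code for each observation.

```r
# Made-up genotype labels for five observations
genotype <- factor(c("wt", "mutant", "wt", "mutant", "wt"))

levels(genotype)      # "mutant" "wt"  (the limited set of values)
nlevels(genotype)     # 2
as.integer(genotype)  # 2 1 2 1 2  (each entry is a code into levels)
table(genotype)       # counts per category
```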

The definitions of, and differences between, numerical and categorical variables are well summarized on this short page about variable types from S.O.S (Statistics Online Support, University of Texas).
