The following examines the difference between mean(asinh(x/5)) and asinh(mean(x/5)). For the scope of this markdown, when I say “mean” I mean “arithmetic mean.”

First, we make some toy data. I take 10,000 instances of 100 normally distributed data points each (think of them as 10,000 clusters). To make the data CyTOF-like (more log-normal-like), I apply the inverse of the standard asinh(x/5) CyTOF transformation, i.e. 5*sinh(x).

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.1
## ✔ readr   2.1.2     ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
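# 10,000 "clusters", each with 100 draws from N(mean = 5, sd = 2)
rand_list <- lapply(1:10000, function(i) rnorm(100, mean = 5, sd = 2))
# Inverse of asinh(x/5): map each draw back to a CyTOF-like raw intensity
rand_list <- lapply(rand_list, function(i) sinh(i)*5)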
rand_list[1:3]
## [[1]]
##   [1]  1746.0757536   420.2970402   281.1666214   524.9709878    56.5013780
##   [6]   403.5724889    15.9439362 58095.4671170 26432.2422277    72.4213880
##  [11]   959.8826441   357.0029326   525.9660021   528.3548157    34.4592987
##  [16]  3152.5255679   252.3495527   553.5552746  1730.5754956   324.1488566
##  [21]   440.0371066     5.5486396   165.6549788    64.8685427   215.5503969
##  [26]   113.9300958  1978.7456439  6031.5067267    54.7363505    38.9923806
##  [31]  5874.7021037  1838.2045899   467.4419724  1557.5076455  1119.8460847
##  [36]   662.5377975   608.8868651  1658.1743909  8608.0498159  2896.1920858
##  [41]  2496.8366735   434.2369688 11988.7139791    69.0725255   891.7047822
##  [46]   243.6850210    66.4701989 89906.7764213  3912.7853885    92.9448296
##  [51]    34.6368336    24.3939101    71.4034468    22.1461634   608.6268410
##  [56]  1792.0376446 10495.7521424     3.5288467  4323.1397068   142.2619409
##  [61]     5.1177573   369.9936541    17.2541253   299.6243381   884.3247160
##  [66]    69.0902181     6.9665513   141.8996084     7.0812145   272.3181236
##  [71]  1026.4574609    -0.2955320 11756.1558043  1829.9141896   841.1556645
##  [76]  2665.1922702   537.1055199   700.6955314   126.5913209  4012.8312896
##  [81]  3151.6674662  2540.8256633  1009.3098028   210.6880843  5967.7375487
##  [86]  1085.0677963  1168.5115907   196.9926250  1246.7262322   659.7143056
##  [91]    -0.6613423   688.8745440   285.1679390    20.3297343  4539.8770921
##  [96]    52.5400801   228.6629801  1118.0288285   260.2348837     7.8060900
## 
## [[2]]
##   [1]  4.034610e+03  1.026494e+03  1.267857e+03  2.815649e+02  4.685218e+02
##   [6]  9.661845e+01  1.409353e+05  1.223276e+02  4.132786e+01  1.824882e+01
##  [11]  7.443437e+02  5.528934e+01  1.208500e+03 -1.080373e+00  7.446284e+03
##  [16]  1.925238e+02  1.009399e+04  2.212393e+02  3.956889e+04  1.682333e+03
##  [21]  1.342092e+04  1.965190e+01  2.763727e+02  1.647481e+03  7.979056e+03
##  [26]  1.378196e+02  5.540994e+02  5.403612e+02  1.738488e+03  3.142428e+02
##  [31]  9.816026e+02  1.030701e+03  1.503352e+04  6.664515e-01  2.163169e+04
##  [36]  2.428208e+01  2.385866e+02  7.884476e+02  2.363507e+02  6.922780e+02
##  [41]  4.410700e+03  7.260006e+02  1.523496e+02  2.245870e+01  1.514329e+02
##  [46]  7.839947e+02  9.747094e+02  2.014186e+03  1.479641e+04  4.222060e+03
##  [51]  1.196505e+03  1.166676e+01  1.327936e+03  3.125151e+02  3.347027e+00
##  [56]  3.978845e+03  3.257598e+02  1.178646e+03  3.502583e+02  3.768050e+02
##  [61]  2.731557e+03  1.100080e+04  6.915140e+02  6.432045e+00  4.614079e+01
##  [66]  1.477315e+04  9.864926e+01  4.957144e+01  2.060374e+02  1.040545e+03
##  [71]  2.794849e+02  1.356731e+03  4.557137e+02  6.632460e+01  8.262151e+02
##  [76]  7.935307e+02  1.327505e+02  4.377748e+02  1.675814e+02  1.168413e+04
##  [81]  3.630991e+03  1.728686e+02  2.614279e+02  4.852752e+03  2.066405e+02
##  [86]  4.104074e+02  1.872247e+02  3.762643e+00  6.830458e+01  1.427025e+03
##  [91]  1.281973e+04  7.094996e+02  1.588157e+02  8.791537e+01  9.768902e+02
##  [96]  5.802986e+01  1.284005e+02  1.406308e+02  4.150838e+01  1.079575e+03
## 
## [[3]]
##   [1]  6698.184248  1682.786190    46.651683  3424.912227  1253.914548
##   [6]  1234.313410    22.495870  3729.535549   541.428455   138.097208
##  [11]  3477.768851    18.431418    47.858522   407.849859  2005.460822
##  [16]   948.941279   965.034427    11.233440     8.342860   669.297166
##  [21]   487.946981  2497.213690   462.505537   106.331585   228.718348
##  [26]    25.381312   269.986701   655.667984   359.847718   540.279603
##  [31]    15.872637   769.943474   390.807837    85.168689  1259.111581
##  [36]   862.100613  5109.344980     1.280160  2989.122642    18.930802
##  [41]   164.279317    38.611415  1826.346633  1110.679437   418.879626
##  [46]  1401.438280   725.106501   246.381576     3.108472  3597.138827
##  [51]   168.287476    58.795861  2481.573731    17.649912  1712.431904
##  [56]   199.268567 13020.332453   973.659362   191.519682   253.199132
##  [61]    80.963353  1197.101318    53.943140    72.194229   162.216039
##  [66]   182.608605  1918.659805  3785.434006 14767.769142    34.887089
##  [71]  2783.749274   121.351197  2233.075484     1.710268    14.208843
##  [76]   244.671776   265.615435    92.862496   133.398605   766.161737
##  [81]    43.536302   948.388762   102.816503  1719.262350   895.646418
##  [86]  7052.741434   713.759463   815.036367    31.815229    30.129238
##  [91]   154.929203    34.265568    33.465824    56.357590    20.677419
##  [96]   216.916177   765.120463   123.054501     3.968968    17.384131
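The simulation above does not set a random seed, so rerunning it gives different numbers. A minimal sketch of a reproducible version, using base R only, also checks that asinh(x/5) recovers the original normal draws up to floating-point error. The seed value and the smaller cluster count are arbitrary choices for the sketch, not part of the original analysis.

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility

n_clusters <- 1000  # smaller than the 10,000 above, to keep the sketch fast
n_points   <- 100
cofactor   <- 5     # the standard CyTOF asinh cofactor used in this document

# Normal draws on the transformed (asinh) scale
z_list <- lapply(seq_len(n_clusters), function(i) rnorm(n_points, mean = 5, sd = 2))

# Inverse transform to get CyTOF-like raw intensities
raw_list <- lapply(z_list, function(z) sinh(z) * cofactor)

# asinh(x / cofactor) should undo the inverse transform up to rounding error
max_err <- max(mapply(function(z, raw) max(abs(asinh(raw / cofactor) - z)),
                      z_list, raw_list))
print(max_err)
```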

For each cluster, we then compute asinh(mean(x)/5) and mean(asinh(x/5)), storing the results as two separate vectors: x (asinh of the mean) and y (mean of the asinh).

# asinh of the mean: average the raw values first, then transform
x <- lapply(rand_list, function(i) asinh(mean(i)/5)) %>% unlist()
x[1:10]
##  [1] 7.118008 7.348267 6.137893 6.836156 6.772216 6.562271 6.530395 6.623748
##  [9] 7.323337 6.884612
# mean of the asinh: transform each raw value first, then average
y <- lapply(rand_list, function(i) mean(asinh(i/5))) %>% unlist()
y[1:10]
##  [1] 5.054472 5.230885 4.667892 5.251546 5.073105 4.910224 5.020283 4.914269
##  [9] 5.131838 5.092842
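Note that asinh(mean(i)/5) is the same as asinh(mean(i/5)), because the mean is linear; the entire gap between x and y comes from asinh being nonlinear. Since asinh is concave for positive inputs, Jensen's inequality implies that averaging first and transforming second gives a value at least as large as transforming first and averaging second, whenever all values are positive. A minimal sketch on a single made-up cluster (the three intensities are invented for illustration) shows both points:

```r
# Toy raw intensities for one hypothetical cluster (values made up for illustration)
toy <- c(5, 50, 5000)

# Transform-then-average vs. average-then-transform
mean_of_asinh <- mean(asinh(toy / 5))
asinh_of_mean <- asinh(mean(toy) / 5)

# Dividing by 5 before or after averaging makes no difference (the mean is linear)
stopifnot(isTRUE(all.equal(asinh(mean(toy / 5)), asinh_of_mean)))

# ...but asinh is nonlinear, so the order of asinh and mean matters
c(mean_of_asinh = mean_of_asinh, asinh_of_mean = asinh_of_mean)
```

The single large value (5000) dominates the raw-scale average, which pulls asinh_of_mean far above mean_of_asinh.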

Next, we check the Spearman (rank) correlation between the per-cluster values produced by the two operations, and make a simple biaxial plot.

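library(ggplot2)  # already attached by tidyverse above
scor <- cor(x, y, method = "spearman")
qplot(x, y, xlab = "asinh(mean(x/5))", ylab = "mean(asinh(x/5))") +
  ggtitle(paste0("Spearman cor = ", round(scor, 3)))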
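qplot() works in the ggplot2 3.3.6 used here, but it was deprecated in ggplot2 3.4.0. A minimal sketch of the same biaxial plot with ggplot() and geom_point() follows. To keep it self-contained it re-simulates a smaller data set (1,000 clusters, arbitrary seed) in the same way as above, so its numbers will not match the original figure.

```r
library(ggplot2)

# Re-simulate CyTOF-like clusters as above (smaller, seeded sketch)
set.seed(1)  # arbitrary
raw_list <- lapply(1:1000, function(i) sinh(rnorm(100, mean = 5, sd = 2)) * 5)
x <- vapply(raw_list, function(v) asinh(mean(v) / 5), numeric(1))  # asinh of the mean
y <- vapply(raw_list, function(v) mean(asinh(v / 5)), numeric(1))  # mean of the asinh

scor <- cor(x, y, method = "spearman")

# Non-deprecated equivalent of qplot(x, y)
p <- ggplot(data.frame(x = x, y = y), aes(x = x, y = y)) +
  geom_point(alpha = 0.3) +
  labs(x = "asinh(mean(x/5))", y = "mean(asinh(x/5))",
       title = paste0("Spearman cor = ", round(scor, 3)))
print(p)
```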

Next, we view each vector separately. Note that the mean of the asinh-transformed data forms a bell curve, whereas the asinh of the mean of the raw data forms a distribution with a tail to the right. The asinh of the mean is also shifted upward overall (median 6.830 vs. 4.997 in the summaries below).

hist(x, breaks = 100) # asinh of the mean

summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.599   6.578   6.830   6.879   7.130   9.608
hist(y, breaks = 100) # mean of the asinh (this is what OMIQ exports by default)

summary(y)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.245   4.862   4.997   4.999   5.138   5.741
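The histograms and summaries show the shape difference by eye. One way to put a number on "bell curve vs. right tail" is the sample skewness. The sketch below is an assumption-laden illustration: sample_skewness is a hypothetical helper (the simple moment-based estimator, not a function from any package), and the data are re-simulated with an arbitrary seed and 2,000 clusters.

```r
set.seed(7)  # arbitrary
raw_list <- lapply(1:2000, function(i) sinh(rnorm(100, mean = 5, sd = 2)) * 5)
x <- vapply(raw_list, function(v) asinh(mean(v) / 5), numeric(1))  # asinh of the mean
y <- vapply(raw_list, function(v) mean(asinh(v / 5)), numeric(1))  # mean of the asinh

# Hypothetical helper: moment-based sample skewness
# (third central moment divided by the second central moment to the power 1.5)
sample_skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

c(skew_asinh_of_mean = sample_skewness(x),
  skew_mean_of_asinh = sample_skewness(y))
```

A skewness near 0 is consistent with a symmetric bell curve, while a clearly positive value indicates a right tail.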

Finally, what is the verbal explanation for why you should take the mean of the asinh-transformed data, and not the asinh of the mean of the raw data? The raw data are right-skewed, and the asinh transformation makes them closer to normally distributed. The arithmetic mean is an intuitive summary of the center of a roughly symmetric, bell-shaped distribution, but on a skewed distribution a few large values in the right tail pull the mean upward. For skewed distributions and for outliers in general, other summaries, such as the geometric mean, may be more appropriate. But that's outside the scope of this markdown.
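As a brief aside on the geometric mean: for large positive values asinh(u) is close to log(2u), so back-transforming the mean of the asinh values, 5*sinh(mean(asinh(x/5))), lands close to the geometric mean of x. A minimal sketch with made-up, strictly positive intensities illustrates this. The approximation breaks down near zero, and the geometric mean itself is undefined for zero or negative values, which CyTOF-like data can contain (for example, -0.2955320 in the first simulated cluster above).

```r
# Made-up, strictly positive intensities for illustration only
v <- c(1000, 5000, 20000, 80000)

# Geometric mean on the raw scale
geometric_mean <- exp(mean(log(v)))

# Mean of the asinh-transformed values, back-transformed to the raw scale
backtransformed <- 5 * sinh(mean(asinh(v / 5)))

c(geometric_mean = geometric_mean, backtransformed = backtransformed)
```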