This post explores the properties of a Bayes factor for the comparison of two dependent variances via simulation. The major benefit of simulation studies is that we can compare the output of a method to a known hypothetical state of nature. To determine the usefulness of the Bayes factor, I can investigate whether it agrees with known “ground truths”.

Because I investigate a Bayes factor comparing two dependent variances, I first need a function that creates correlated data (i.e., bivariate normal data):

# Generate bivariate normal data with specified correlation
#
# param n: how many data points
# param m1: the mean of the first variable
# param m2: the mean of the second variable
# param sd1: the standard deviation of the first variable
# param sd2: the standard deviation of the second variable
# param r: the "true" correlation between the two measures
# 
# return: the data set (two columns with random normal data generated 
#     via MASS::mvrnorm)
paired_data <- function(n, m1, m2, sd1, sd2, r) {
  cor_matrix <- matrix(c(1, r, r, 1), ncol = 2)
  sds <- c(sd1, sd2)
  vars <- sds %*% t(sds)
  cov_matrix <- vars * cor_matrix
  MASS::mvrnorm(n, mu = c(m1, m2), Sigma = cov_matrix)
}

This function creates two columns of data, and I can specify the true means, standard deviations, and the true correlation between the measures. Note that this does not imply that each data set generated by this function has these exact parameters: due to sampling error, the descriptive statistics will vary across simulation runs.
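As a quick check (assuming the MASS package is installed), we can generate one data set and compare its sample statistics to the specified parameters:

# Sample statistics fluctuate around the true parameters:
set.seed(1)
d <- paired_data(n = 50, m1 = 100, m2 = 100, sd1 = 15, sd2 = 10, r = 0.3)
apply(d, 2, sd)     # close to, but not exactly, 15 and 10
cor(d[, 1], d[, 2]) # close to, but not exactly, 0.3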

Next, I specify a function that generates data using paired_data() and then computes the Bayes factor comparing the variances of the two measures.

compare_correlated_variances <- function(X, n, m1, m2, sd1, sd2, r) {
  pairs <- paired_data(n, m1, m2, sd1, sd2, r)
  varBF::depvarBF(pairs[, 1], pairs[, 2])
}

The first function parameter X is somewhat of a nuisance, but stay with me for a moment. I defined the function so that I can repeat the process of generating data and computing a Bayes factor as often as I want; the parameter X is used for the purpose of repetition, as shown below. The simulation principle is actually rather simple: using the function compare_correlated_variances(), I systematically vary relevant input parameters between simulation runs, such as the sample size, the true standard deviations (and thus the effect size), and the true correlation between the measures.

For each combination of parameters, I repeat the simulation process many times, e.g., 1,000 times. In general, simulations are computationally demanding because crossing all possible combinations of input parameters yields a large combinatorial space. Usually, we have to restrict ourselves. To repeat the process for just one combination of parameters, we can use the lapply() function to repeat the computations 100 times as follows:

# Generate 100 bivariate data sets and compute a Bayes factor each time:
BFs <- lapply(
  X = 1:100, 
  FUN = compare_correlated_variances,
  n = 100,
  m1 = 100,
  m2 = 100,
  sd1 = 15,
  sd2 = 10,
  r = .3
)

To print the output of the lapply() function in a comprehensible way, I define the following function returning the quantiles of the Bayes factor across simulation runs:

# Input `BFs` is a `list` of Bayes factors
BF_quantiles <- function(BFs) {
  BFs <- simplify2array(BFs, higher = FALSE)
  quants <- c(quantile(BFs, probs = seq(0.1, 0.9, 0.1)))
  quants <- format(
    quants, 
    scientific = FALSE, 
    digits = 2, 
    nsmall = 2,
    big.mark = ","
  )
  ret <- data.frame(quantile = names(quants), BayesFactor = quants)
  rownames(ret) <- NULL
  ret
}

Let’s check out the quantiles:

BF_quantiles(BFs)
 quantile BayesFactor
      10%       12.78
      20%       32.36
      30%      103.82
      40%      375.68
      50%    1,195.41
      60%    2,321.91
      70%    7,289.37
      80%   48,085.05
      90%  155,457.70

What does this tell us? For example, only 20% of all observed Bayes factors were smaller than 32.36. Remember, the Bayes factor quantifies the degree to which the alternative hypothesis is favored over the null hypothesis. Therefore, more than 80% of all simulation runs (each representing a hypothetical study) indicate very strong evidence in favor of the hypothesis that the variances are different. This is good because it is consistent with the “ground truth” that the “true” variances differ.
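We can verify this reading directly from the simulated Bayes factors; a small sanity check on the BFs list from above:

# Proportion of Bayes factors larger than the 20% quantile:
BF_vec <- simplify2array(BFs, higher = FALSE)
mean(BF_vec > 32.36) # roughly 0.80, by construction of the quantile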

Therefore, given the parameters used above (\(N = 100\), true standard deviations of 15 and 10, and a true correlation of .3), the Bayes factor will generally yield evidence that is consistent with the true state of nature. This is good, but we wish to find out whether this property holds across all sorts of different conditions.

The simulation

Across all simulation runs, the true mean of both measures was always set to 100; the standard deviation of one measure was always set to 15. I varied the effect size by adjusting the other standard deviation (between 11, 13, and 15). One condition realized a true null hypothesis, i.e., when both standard deviations were 15. In total, the effect size, the sample size \(N\), and the true correlation between the measures varied across simulation runs.

This resulted in 150 parameter combinations, for each of which I simulated 1,000 data sets. In total, I ended up with 150,000 Bayes factors. The computations did not take as long as one might expect, only ~5 min on my personal computer. The complete simulation code is available for reproducibility from here.
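For illustration, a crossed grid of the following shape yields 150 combinations; the specific values of n and r below are placeholders, not necessarily the values used in the actual simulation:

# Sketch of a crossed parameter grid (n and r values are placeholders):
conditions <- expand.grid(
  n   = c(20, 50, 100, 150, 200), # placeholder sample sizes
  sd2 = c(11, 13, 15),            # second SD, i.e., the effect size
  r   = seq(0, 0.9, by = 0.1)     # placeholder correlations
)
nrow(conditions) # 150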

The results of the simulation are displayed in the following plot:

The three facet panels realize the three effect sizes under investigation. In the right panel, we have a null effect because both standard deviations (and hence variances) had the same value. In the left panel, we have the largest effect size. Data points illustrate Bayes factor quantiles. The \(y\) axis represents the Bayes factor and is given on a log scale because the Bayes factor quickly becomes huge for larger effects.

The dots connected by the lowest line indicate the 20% quantile. This means that 20% of all observed Bayes factors were smaller than the illustrated Bayes factor, i.e., 80% were larger. Note that these Bayes factors were averaged over the different correlation coefficients, which are ignored in this plot. The black line illustrates the threshold of inconclusive evidence: a Bayes factor of 1 favors neither the null nor the alternative.

Overall, the distribution of Bayes factors looks as it should. For our largest effect, the Bayes factor quickly increases as the sample size increases. Here, a sample size of 100 was enough to ensure that more than 80% of all observed Bayes factors were larger than 5, which is considerable evidence favoring the alternative hypothesis. When the null effect is true, the evidence favoring the null hypothesis increases with increasing \(N\), which is good. However, finding evidence that favors the null often requires a large \(N\). In the current simulation, for \(N = 100\), we barely miss the mark that 80% of all Bayes factors \(BF_{01}\) favor the null hypothesis by a factor of at least 3 (in this case, the Bayes factor \(BF_{10}\) favoring the alternative hypothesis is 0.33). This is no reason to worry, as it is a general property of Bayes factors.
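To make the relation between the two directions of the Bayes factor explicit:

# BF01 (evidence for the null) is the reciprocal of BF10:
BF10 <- 1 / 3
BF01 <- 1 / BF10 # = 3, i.e., the null is favored by a factor of 3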

Out of interest, we may also plot the distribution of Bayes factors by the correlation between the measures:

This plot contains an interesting pattern: the size of the Bayes factors increases with increasing correlation (although, apparently, this is not the case when the null hypothesis is true!). This is why using within-subject designs usually pays off: the correlated nature of the data leads to more informative inferences, at least when there is an effect.
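This pattern can be probed with the functions defined above; a small illustrative run (assuming the varBF package is available):

# Compare the Bayes factor distribution for a low vs. a high correlation:
BFs_low <- lapply(
  X = 1:100,
  FUN = compare_correlated_variances,
  n = 100, m1 = 100, m2 = 100, sd1 = 15, sd2 = 10, r = .1
)
BFs_high <- lapply(
  X = 1:100,
  FUN = compare_correlated_variances,
  n = 100, m1 = 100, m2 = 100, sd1 = 15, sd2 = 10, r = .8
)
BF_quantiles(BFs_low)  # quantiles tend to be smaller ...
BF_quantiles(BFs_high) # ... than for the highly correlated data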

Limitations

In this post, I restricted the investigation of the Bayes factor to only a subset of all conceivable conditions. \(N\) might be much larger than 200. Maybe adjusting the means also has an effect on the detectability of differences in variance. Moreover, the investigation was limited to normal input data. Real data in psychological research are never truly normal, and in many cases not even approximately normal. There is some evidence that the Morgan-Pitman test, which is the basis for the Bayes factor comparing dependent variances, is particularly susceptible to deviations from normality, so non-normal data should be explored in the future. However, simulating correlated non-normal data is hard. Still, these are plans for the future.
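One conceivable route is a Gaussian copula approach: generate correlated normal data and then transform the marginals to a non-normal shape. A minimal sketch, not part of the actual simulation (note that the Pearson correlation of the result only approximates r):

# Correlated non-normal (here: exponential) data via a Gaussian copula.
# The resulting Pearson correlation only approximates r.
correlated_nonnormal <- function(n, r) {
  z <- MASS::mvrnorm(n, mu = c(0, 0),
                     Sigma = matrix(c(1, r, r, 1), ncol = 2))
  u <- pnorm(z)                     # transform to uniform marginals
  cbind(qexp(u[, 1]), qexp(u[, 2])) # apply skewed (exponential) quantiles
}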

Conclusions

Based on the current simulations, I would conclude that the Bayes factor for the comparison of dependent variances works as one would expect. Of course, this simulation is not an exhaustive evaluation, but it gives some reason for optimism.


Last updated: 2019-11-25