With the advent of so-called “default” Bayesian hypothesis tests, scientists in applied fields have gained access to a powerful and principled method for testing hypotheses. However, such default tests usually come with a compromise, requiring the analyst to accept a one-size-fits-all approach to hypothesis testing. Further, such tests may not have the flexibility to test the problems the scientist really cares about. In this tutorial, I demonstrate a flexible approach to generalizing one specific default test (the JZS t-test).
Consider the following scenario that is often used when assessing the effect of some treatment. Let
and then defining two competing hypotheses: a null hypothesis ℋ_{0} : δ = 0 (no treatment effect) and an alternative hypothesis ℋ_{1} : δ ≠ 0,
where
where
This idea is well known to practicing social and behavioral scientists. However, there are some properties of this procedure that are suboptimal for robust inference. For one, the procedure is asymmetric (Rouder
Another criticism of the traditional hypothesis testing procedure is that researchers often misinterpret the results of such tests. Hoekstra
In light of these criticisms, the social and behavioral sciences have seen an increase in recommendations to find alternatives to the orthodox use of NHST (Wagenmakers
One way to think of
While this computation is fundamentally quite basic, one immediate consequence is how it can be used for comparing two hypotheses. Suppose as above that we have two competing hypotheses: a null hypothesis ℋ_{0} and an alternative hypothesis ℋ_{1}. We can directly compare our posterior beliefs in these two hypotheses by computing their ratio
As with Bayes’ theorem,
There are two immediate advantages to using the Bayes factor for inference. First, the Bayes factor is a ratio and thus has a natural interpretation. Simply put, larger is better: the bigger the Bayes factor, the greater the weight of evidence provided by the observed data. Second, since there was no specific assumption about the order in which we addressed ℋ_{0} and ℋ_{1}, we could have just as easily measured the weight of evidence in favor of ℋ_{1} over ℋ_{0}. In fact, once we have a Bayes factor in favor of one hypothesis, a simple reciprocal gives us the Bayes factor in favor of the other hypothesis. In our example above, the Bayes factor for ℋ_{1} over ℋ_{0} would have been 1/5, implying that the data would actually decrease our relative belief in ℋ_{1} over ℋ_{0}. Because we can compute Bayes factors from either direction, we must define our notation carefully. In this paper, I will adopt the common convention to define
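To make the arithmetic concrete, here is a minimal sketch (in Python, purely for illustration) of the reciprocal relationship and of how a Bayes factor converts prior odds into posterior odds. The value 5 is the illustrative Bayes factor from the text; the equal prior odds are an assumption.

```python
# The illustrative Bayes factor from the text: evidence for H0 over H1 is 5
bf_01 = 5.0
bf_10 = 1.0 / bf_01          # the reciprocal measures evidence for H1 over H0

# A Bayes factor converts prior odds into posterior odds
prior_odds = 1.0             # equal prior belief in H0 and H1 (an assumption)
posterior_odds = bf_01 * prior_odds

print(bf_10, posterior_odds)
```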
Though the previous discussion certainly speaks positively about the benefits of the Bayes factor, there are some important considerations that the researcher must address before implementing it as a tool for inference. First, as we will see in the discussion below, the Bayes factor requires the analyst to specify prior distributions on all parameters in the underlying model. Thus, a given Bayes factor reflects a specific choice of prior. Second, the computation of these Bayes factors is usually nontrivial. For example, computing either the numerator or denominator of
where
In this paper, I will provide a tutorial with particular focus on extending one type of Bayesian model comparison known as the Jeffreys-Zellner-Siow
The JZS
and
These computations require placing priors on
where
Though computationally convenient, this JZS Bayes factor formula has a few disadvantages. First, it reflects a very specific choice of prior specification. Though using a Cauchy prior on effect size may be a reasonable choice for many researchers, especially in the behavioral sciences (Rouder
Another disadvantage of the JZS Bayes factor is that it forces the analyst into a very specific hypothesis test; that is, ℋ_{0} : δ = 0 versus ℋ_{1} : δ ≠ 0.
In this section, I describe a method for extending the default JZS
The core method relies on a result known as the Savage-Dickey density ratio (Dickey and Lientz, 1970), which states that the Bayes factor for a pair of models in which one of the models is a one-point restriction of the other (i.e., ℋ_{0} : δ = 0) can be computed as the ratio of the posterior density to the prior density of δ, both evaluated at the restricted value δ = 0 under the unrestricted model ℋ_{1}.
The Savage-Dickey density ratio can be stated rigorously as the following proposition:
By definition, the Bayes factor is the ratio of marginal likelihoods over ℋ_{0} and ℋ_{1}, respectively. That is,
The key idea in the proof is that we can use a “change of variables” technique to express
By Bayes’ Theorem, we can rewrite this last line as
Thus we have
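Writing x for the observed data and δ for the parameter of interest (notation assumed here, matching the effect-size parameter used throughout), the chain of identities in the proof can be summarized as follows. The first equality uses the fact that ℋ_{0} fixes δ = 0 while leaving the remaining parameters with the same conditional prior as under ℋ_{1} (the "change of variables" step), and the second applies Bayes' theorem:

```latex
\begin{aligned}
BF_{01} &= \frac{m(x \mid \mathcal{H}_0)}{m(x \mid \mathcal{H}_1)}
         = \frac{p(x \mid \delta = 0, \mathcal{H}_1)}{p(x \mid \mathcal{H}_1)} \\[4pt]
        &= \frac{1}{p(x \mid \mathcal{H}_1)} \cdot
           \frac{p(\delta = 0 \mid x, \mathcal{H}_1)\, p(x \mid \mathcal{H}_1)}{p(\delta = 0 \mid \mathcal{H}_1)} \\[4pt]
        &= \frac{p(\delta = 0 \mid x, \mathcal{H}_1)}{p(\delta = 0 \mid \mathcal{H}_1)}.
\end{aligned}
```

That is, the Bayes factor reduces to the ratio of posterior to prior density of δ at 0, both computed under ℋ_{1}.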
The beauty of Proposition 1 is that it allows one to calculate the Bayes factor for a point null hypothesis (i.e., ℋ_{0} : δ = 0) directly from the prior and posterior densities of δ under ℋ_{1}, with no need to compute marginal likelihoods.
The Savage-Dickey density ratio is an elegant solution to the problem of computing Bayes factors in situations involving a point null hypothesis ℋ_{0}. All that is required is that one can draw samples from the posterior of an effect size parameter under a specified alternative model ℋ_{1}. I think casting the problem in this sampling context is preferable, not only for its flexibility, but especially given the broad class of computational methods now available for sampling posteriors in Bayesian models, including BUGS (Lunn
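It is worth verifying the Savage-Dickey identity numerically before relying on it. The following self-contained Python sketch (the paper's own listings use R; the conjugate normal model here is an assumption chosen because its exact Bayes factor is available in closed form) compares the exact Bayes factor with the Savage-Dickey estimate computed from posterior samples:

```python
# Toy model (an assumption, for illustration):
#   x_i ~ Normal(mu, 1),  H0: mu = 0,  H1: mu ~ Normal(0, tau^2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, tau = 20, 1.0
x = rng.normal(0.3, 1.0, n)          # simulated data with a small true effect
xbar = x.mean()

# Conjugate posterior for mu under H1
post_var = tau**2 / (1 + n * tau**2)
post_mean = n * tau**2 * xbar / (1 + n * tau**2)

# Exact BF_01 from the closed-form marginal likelihoods
bf01_exact = np.sqrt(1 + n * tau**2) * \
    np.exp(-n**2 * tau**2 * xbar**2 / (2 * (1 + n * tau**2)))

# Savage-Dickey estimate: posterior density of mu at 0 (kernel density
# estimate on posterior samples) divided by the prior density at 0
samples = rng.normal(post_mean, np.sqrt(post_var), 100_000)
post_at_0 = stats.gaussian_kde(samples)(0.0)[0]
prior_at_0 = stats.norm.pdf(0.0, 0.0, tau)
bf01_sd = post_at_0 / prior_at_0

print(bf01_exact, bf01_sd)  # the two quantities should closely agree
```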
Greta is an R package designed for sampling from Bayesian models. It provides a reasonably simple language for modeling that is implemented directly within R, eliminating the need for writing models in another language (e.g., JAGS, Stan) and then having to call these external files from within R. Further, Greta uses the computational power of Google TensorFlow (Abadi
To illustrate how Greta works, we will first look at a model inspired by that which was initially described by Wetzels
In this model, our data
As ℋ_{0} is a sharp hypothesis nested within ℋ_{1}, we can apply Proposition 1 (the Savage-Dickey density ratio) and compute
To do this, we’ll need to draw samples from the posterior distribution of
To begin, let us consider a simple example, the type of which can be found in most elementary statistics textbooks. This example comes from Hoel (1984). Suppose that 10 patients take part in an experiment on a new drug that is supposed to increase sleep in the patients. Table 1 shows the hours of sleep gained by each patient (negative values indicate lost sleep). Assuming that the sample data are normally distributed with mean
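The paper's Listing 1 implements this model in Greta; as a language-agnostic companion, the following Python sketch fits a model of the same kind to the Table 1 data with a simple random-walk Metropolis sampler and then forms the Savage-Dickey ratio. The Cauchy(0, 1) prior on δ follows the JZS setup; the half-Cauchy(0, 1) prior on σ is an assumption made here so that all priors are proper.

```python
import numpy as np
from scipy import stats

# Hours of sleep gained by the 10 patients (Table 1)
x = np.array([0.7, -1.1, -0.2, 1.2, 0.1, 3.4, 3.7, 0.8, 1.8, 2.0])

def log_post(delta, log_sigma):
    """Unnormalized log posterior for (delta, log sigma)."""
    sigma = np.exp(log_sigma)
    # Likelihood: x_i ~ Normal(sigma * delta, sigma^2)
    ll = -x.size * log_sigma - np.sum((x - delta * sigma) ** 2) / (2 * sigma**2)
    lp = -np.log1p(delta**2)        # Cauchy(0, 1) prior on delta (up to a constant)
    lp += -np.log1p(sigma**2)       # half-Cauchy(0, 1) prior on sigma (an assumption)
    return ll + lp + log_sigma      # Jacobian for the log-sigma parameterization

rng = np.random.default_rng(2)
theta = np.array([0.0, 0.0])        # start at delta = 0, sigma = 1
lp_cur = log_post(*theta)
draws = []
for i in range(50_000):
    prop = theta + rng.normal(0.0, 0.25, size=2)   # random-walk proposal
    lp_prop = log_post(*prop)
    if np.log(rng.uniform()) < lp_prop - lp_cur:   # Metropolis accept step
        theta, lp_cur = prop, lp_prop
    if i >= 10_000:                                # discard burn-in
        draws.append(theta[0])
delta_post = np.array(draws)

# Savage-Dickey: BF_01 = posterior density of delta at 0 / prior density at 0
bf01 = stats.gaussian_kde(delta_post)(0.0)[0] / stats.cauchy.pdf(0.0)
print(round(delta_post.mean(), 2), round(bf01, 2))
```

With these data the posterior for δ sits well above zero, so the Savage-Dickey ratio comes out below 1, i.e., the data shift belief away from the null.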
Once the sampling is complete, there are two ways to inspect the samples before proceeding to our inference. The first is to type
We will now look at how to compute Bayes factors necessary to compare the models ℋ_{0} and ℋ_{1}. We will do this by computing the Savage-Dickey density ratio. Recall from Proposition 1 that in order to compute
Listing 2 shows the R code necessary to both (1) plot the ordinates from the prior and posterior densities for
From here, there are two paths worth exploring. First, I typically will produce a plot showing the components of the Savage-Dickey density ratio. Such a plot will consist of (1) a plot of the prior density for
Next, we can compute the Savage-Dickey density ratio using the code on lines 24–30. Lines 24–25 compute the specific ordinates required.
Now, suppose the analyst had a different prior belief about the effect sizes he or she would expect in this context. For illustration, let us suppose that
The methods described above will readily scale up to problems involving two independent samples. All that is required is that the underlying model is adjusted accordingly. I will illustrate this with another example from Hoel (1984).
Consider a sample of 20 rats, each of which receives their main source of protein from either raw peanuts or roasted peanuts. To compare weight gains as a function of protein source, a researcher randomly assigns 10 rats to receive only raw peanuts and 10 rats to receive only roasted peanuts. The resulting weight gains (in grams) are displayed in Table 2.
First, we must consider the underlying model. As with the single-sample example, we can represent this model as a directed acyclic graph, which is shown in Figure 3. In this model (inspired by Wetzels
For concreteness, let us denote the sample of weight gains from the raw peanut diet as
After completing the code in Listing 5, the Savage-Dickey density ratio can be plotted and computed as we did earlier in Listing 2. Figure 4 shows this ratio graphically; indeed, note that the posterior density of
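For readers following along outside R, here is a Python sketch of the two-sample analysis on the Table 2 data. The group-mean parameterization μ ± σδ/2, the effectively flat prior on μ, and the half-Cauchy(0, 10) prior on σ are assumptions made here for illustration; the Cauchy(0, 1) prior on δ again follows the JZS setup.

```python
import numpy as np
from scipy import stats

raw = np.array([62, 60, 56, 63, 56, 63, 59, 56, 44, 61], dtype=float)
roasted = np.array([57, 56, 49, 61, 55, 61, 57, 54, 62, 58], dtype=float)

def log_post(delta, mu, log_sigma):
    """Unnormalized log posterior; mu gets an (improper) flat prior here."""
    sigma = np.exp(log_sigma)
    # Group means are mu +/- sigma * delta / 2
    ll = (-(raw.size + roasted.size) * log_sigma
          - np.sum((raw - (mu + sigma * delta / 2)) ** 2) / (2 * sigma**2)
          - np.sum((roasted - (mu - sigma * delta / 2)) ** 2) / (2 * sigma**2))
    lp = -np.log1p(delta**2)              # Cauchy(0, 1) prior on delta
    lp += -np.log1p((sigma / 10) ** 2)    # half-Cauchy(0, 10) prior on sigma (assumption)
    return ll + lp + log_sigma            # Jacobian for log-sigma

rng = np.random.default_rng(5)
theta = np.array([0.0, np.mean(np.concatenate([raw, roasted])), np.log(5.0)])
lp_cur = log_post(*theta)
delta_draws = []
step = np.array([0.3, 1.5, 0.2])          # proposal scales for (delta, mu, log sigma)
for i in range(50_000):
    prop = theta + rng.normal(0.0, step)
    lp_prop = log_post(*prop)
    if np.log(rng.uniform()) < lp_prop - lp_cur:
        theta, lp_cur = prop, lp_prop
    if i >= 10_000:
        delta_draws.append(theta[0])
delta_post = np.array(delta_draws)

# Savage-Dickey ratio for H0: delta = 0
bf01 = stats.gaussian_kde(delta_post)(0.0)[0] / stats.cauchy.pdf(0.0)
```

The observed group means differ by only about one gram against a pooled standard deviation near five, so the posterior for δ concentrates near zero and the ratio favors the null.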
In the previous section, I described an extension of the JZS
Consider again the sleep example above. What if instead the researcher wanted to test whether the new drug
and
Once these two Bayes factors are computed, one can use transitivity of Bayes factors to compute
The mechanics of the encompassing approach can be summarized in the following proposition:
Consider first that for any model ℋ
Thus, we can write the marginal likelihood for
Taking the ratio of the marginal likelihoods for ℋ_{1} and the encompassing model ℋ
Now, both the constrained model ℋ_{1} and the encompassing model ℋ
Because ℋ_{1} is nested within ℋ
where
By similar reasoning, we can write the posterior as
This gives us
Note that by definition, 1/
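Writing Ω₁ for the region of the parameter space allowed by the constrained model (notation assumed here), the conclusion of the proposition can be stated compactly: the Bayes factor for ℋ_{1} against ℋ_{e} is the ratio of the posterior to prior mass that the encompassing model assigns to the constrained region, which can be estimated by simple counting of samples,

```latex
BF_{1e} \;=\; \frac{\Pr(\delta \in \Omega_1 \mid x, \mathcal{H}_e)}
                   {\Pr(\delta \in \Omega_1 \mid \mathcal{H}_e)}
\;\approx\; \frac{\frac{1}{M}\sum_{m=1}^{M} \mathbb{I}\bigl[\delta^{(m)}_{\text{post}} \in \Omega_1\bigr]}
                 {\frac{1}{M}\sum_{m=1}^{M} \mathbb{I}\bigl[\delta^{(m)}_{\text{prior}} \in \Omega_1\bigr]},
```

where the numerator counts posterior samples and the denominator counts prior samples satisfying the constraint.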
It might seem a bit odd to represent the fraction
To illustrate the computation of Bayes factors with the encompassing approach, let us consider the problem mentioned immediately above – suppose we wanted to test whether the drug that we administered to sleep patients actually increased sleep.
The R code necessary to perform these computations is in Listing 4. As the encompassing model is defined identically to that from Figure 1, the code assumes that we have already drawn samples from that model, as we did in Listing 1. Note that just like our previous computations with the Savage-Dickey density ratio, using the encompassing approach requires that we sample from the posterior of
The key steps in Listing 4 are as follows. First, we will compare ℋ_{0} to the encompassing model ℋ
Next, we do a similar computation with ℋ_{1} versus the encompassing model ℋ
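The mechanics of these two comparisons can be sketched numerically as follows (a Python stand-in for Listing 4, which is in R). The Normal(0.8, 0.3²) draws are stand-in posterior samples, used here only so the example is self-contained; in practice they would come from the sampler for the encompassing model ℋ_{e}. The Cauchy(0, 1) prior on δ is assumed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
M = 200_000
prior = stats.cauchy.rvs(size=M, random_state=rng)  # delta ~ Cauchy(0, 1) prior
post = rng.normal(0.8, 0.3, M)   # stand-in posterior samples (for illustration)

# H1: delta > 0 against the encompassing model H_e:
# ratio of posterior to prior proportions satisfying the constraint
bf_1e = np.mean(post > 0) / np.mean(prior > 0)

# H0: delta = 0 against H_e, via the Savage-Dickey density ratio
bf_0e = stats.gaussian_kde(post)(0.0)[0] / stats.cauchy.pdf(0.0)

# Transitivity of Bayes factors: BF_01 = BF_0e / BF_1e
bf_01 = bf_0e / bf_1e
```

Because the prior is symmetric about zero, the prior proportion in the denominator of `bf_1e` is essentially 0.5, so a posterior concentrated on positive values can at most double the support for the directional hypothesis relative to ℋ_{e}.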
This approach can be extended to test a wide variety of hypotheses involving inequality constraints. One particular advantage of the encompassing approach is that it gives us the ability to test interval null hypotheses – that is, hypotheses of the form ℋ_{0} : −ε < δ < ε for some small ε > 0.
An example of such computation is displayed in Listing 5. Like in the example above, we define three hypotheses: two competing hypotheses ℋ_{0} : |δ| < ε and ℋ_{1} : |δ| > ε, along with an encompassing model ℋ_{e}.
As in the example before, we use the posterior samples for
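An interval-null version of the counting computation can be sketched as follows (again a Python stand-in for the R listing; the Normal(0.8, 0.3²) posterior samples and the half-width ε = 0.1 are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
M = 500_000
prior = stats.cauchy.rvs(size=M, random_state=rng)  # delta ~ Cauchy(0, 1) prior
post = rng.normal(0.8, 0.3, M)   # stand-in posterior samples (for illustration)

eps = 0.1   # half-width of the interval null (an illustrative choice)

# Each constrained hypothesis versus H_e: ratio of posterior to prior
# proportions of samples satisfying the constraint
bf_0e = np.mean(np.abs(post) < eps) / np.mean(np.abs(prior) < eps)  # H0: |delta| < eps
bf_1e = np.mean(np.abs(post) > eps) / np.mean(np.abs(prior) > eps)  # H1: |delta| > eps

# Transitivity: BF_01 = BF_0e / BF_1e
bf_01 = bf_0e / bf_1e
```

Here the posterior places far less mass than the prior inside the interval, so `bf_0e` is well below 1 and the interval null is disfavored.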
In this tutorial, I have demonstrated a flexible approach to extending the default JZS
I am grateful to the handling editor and two anonymous referees for their comments on an earlier version of this manuscript.
Example data for a single-sample t test.
Patient | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Hours gained | 0.7 | −1.1 | −0.2 | 1.2 | 0.1 | 3.4 | 3.7 | 0.8 | 1.8 | 2.0 |
Example data for a two-sample t test (weight gains in grams).

Diet | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Raw | 62 | 60 | 56 | 63 | 56 | 63 | 59 | 56 | 44 | 61 |
Roasted | 57 | 56 | 49 | 61 | 55 | 61 | 57 | 54 | 62 | 58 |
Five-number summary of log(BF) values.

δ | N | BF type | Min | Q1 | Median | Q3 | Max | Consistency |
---|---|---|---|---|---|---|---|---|
0 | 20 | JZS | −1.77 | −1.71 | −1.51 | −1.12 | 1.76 | |
| | | sampling | −1.95 | −1.70 | −1.51 | −1.08 | 1.88 | 0.985 |
| | 50 | JZS | −2.20 | −2.16 | −2.01 | −1.56 | 1.51 | |
| | | sampling | −2.47 | −2.16 | −1.99 | −1.53 | 2.49 | 0.985 |
| | 80 | JZS | −2.43 | −2.38 | −2.20 | −1.69 | 4.21 | |
| | | sampling | −2.55 | −2.36 | −2.18 | −1.61 | 2.90 | 1.000 |
0.05 | 20 | JZS | −4.43 | 0.23 | 1.11 | 1.61 | 1.77 | |
| | | sampling | −4.95 | 0.27 | 1.12 | 1.62 | 2.04 | 0.995 |
| | 50 | JZS | −10.86 | 0.08 | 1.48 | 1.97 | 2.20 | |
| | | sampling | −4.95 | 0.27 | 1.12 | 1.62 | 2.04 | 0.975 |
| | 80 | JZS | −8.09 | −0.79 | 1.14 | 2.06 | 2.43 | |
| | | sampling | −6.96 | −0.79 | 1.16 | 2.08 | 2.51 | 0.945 |
0.2 | 20 | JZS | −10.98 | −1.76 | 0.39 | 1.44 | 1.77 | |
| | | sampling | −19.90 | −1.81 | 0.25 | 1.42 | 1.86 | 0.975 |
| | 50 | JZS | −23.85 | −4.25 | 0.05 | 1.51 | 2.20 | |
| | | sampling | −12.39 | −3.19 | 0.01 | 1.49 | 2.36 | 0.965 |
| | 80 | JZS | −37.79 | −6.78 | −1.13 | 1.67 | 2.43 | |
| | | sampling | −25.60 | −4.77 | −1.11 | 1.85 | 2.65 | 1.000 |