In the least surprising development imaginable, I’m currently working on a manuscript that involves Bayesian model selection via Bayes factors. More details in the months to come, but one aspect of the project involves the extent to which Bayes factors are sensitive to prior specification. Within the context of Bayesian model selection, the researcher must specify two layers of prior belief: first, a prior on the space of candidate models; second, priors on the parameters of those models. In a world simpler than ours, the prior on the model space would fully encapsulate known information and preferences among the candidate models. For example, setting the model prior to some flavor of a uniform distribution (uniform over the models directly, or perhaps uniform over the classes of models of each size) would reflect an uninformed prior preference among the models.
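For concreteness (the notation here is mine, not lifted from the manuscript), the Bayes factor comparing two candidate models $M_1$ and $M_2$ is the ratio of their marginal likelihoods, and the parameter priors sit inside those marginal likelihoods:

$$
BF_{12} \;=\; \frac{p(y \mid M_1)}{p(y \mid M_2)} \;=\; \frac{\int p(y \mid \theta_1, M_1)\,\pi(\theta_1 \mid M_1)\,d\theta_1}{\int p(y \mid \theta_2, M_2)\,\pi(\theta_2 \mid M_2)\,d\theta_2}.
$$

The posterior odds of $M_1$ over $M_2$ are the prior model odds multiplied by $BF_{12}$, which is why both layers of prior belief show up in the final comparison, not just the prior on the model space.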
As it turns out, the world is not so simple. Bayes factors are also sensitive (sometimes wildly so) to the priors on the parameters. Generally, more diffuse priors on parameters lead to a preference for simpler models. Further, the tail behavior of the priors relative to the likelihood can have a surprising impact on the resulting Bayes factors. When fitting Bayesian models, an increase in sample size usually lets the data (through the behavior of the likelihood function) increasingly dominate inference, so the choice of prior becomes less important. Not so for Bayesian model selection, where increasing the sample size will often exacerbate problems caused by an unfortunate choice of prior. Thus, two researchers who analyze the same data, choose the same likelihood function, and select the same prior on the model space, but use different priors on the parameters of the candidate models, might get wildly different Bayes factors and hence conclusions, even when the differences in those priors seem innocuous (e.g., do you prefer gamma(.1,.1)? gamma(.01,.01)? gamma(.0001,.0001)?). This instability is present even in cases where Bayes factors can be calculated analytically. As a further difficulty, Bayes factors approximated via Markov chain Monte Carlo can be highly unstable, adding yet another layer of uncertainty to the very metric we are ostensibly using to gain certainty about which models are best supported by the observed data.
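To make the diffuse-prior effect concrete, here is a minimal sketch (a toy example of my own, not taken from the manuscript): a one-sample normal test of H0: mu = 0 versus H1: mu ≠ 0 with known sigma, where H1 places a Normal(0, tau^2) prior on mu. Because the sample mean is sufficient, the Bayes factor reduces to a ratio of two normal densities and can be computed exactly:

```python
import numpy as np
from scipy.stats import norm

# Toy illustration (my own, hypothetical setup): test H0: mu = 0 vs H1: mu != 0
# for normal data with known sigma, where H1 puts a N(0, tau^2) prior on mu.
# The sample mean is sufficient, so the Bayes factor is a ratio of two normal
# densities evaluated at ybar.

rng = np.random.default_rng(1)
n, sigma, true_mu = 50, 1.0, 0.3            # assumed sample size and "true" effect
y = rng.normal(true_mu, sigma, size=n)
ybar = y.mean()

def bf01(ybar, n, sigma, tau):
    """Bayes factor in favor of H0 when H1 uses a N(0, tau^2) prior on mu."""
    m0 = norm.pdf(ybar, loc=0.0, scale=sigma / np.sqrt(n))                # marginal under H0
    m1 = norm.pdf(ybar, loc=0.0, scale=np.sqrt(tau**2 + sigma**2 / n))    # marginal under H1
    return m0 / m1

for tau in [0.5, 1, 10, 100, 1000]:
    print(f"prior sd tau = {tau:>6}:  BF01 = {bf01(ybar, n, sigma, tau):.3f}")
```

Nothing about the data changes across that loop, yet as tau grows the Bayes factor in favor of the null grows without bound (Bartlett’s paradox). That is exactly the sense in which a “diffuse” parameter prior quietly stacks the deck toward the simpler model, and why collecting more data does not rescue you from a poorly chosen prior in this setting.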
I’ll confess that my initial interest in Bayes factors was partly due to the fact that many arguments against p-values have merit. Bayes factors seem like a natural alternative to p-values for hypothesis testing and model selection. As a younger statistician, I even fantasized a bit about the possibility that Bayes factors might offer a panacea for certain drawbacks of p-values. Now I am not so sure. To those who rightly worry that hypothesis testing based on comparing p-values to fixed thresholds is a process that is often “hacked,” I would point out that Bayes factors seem even easier to hack.
So why continue my interest in Bayes factors? To me, there is something fascinating (beyond the obvious philosophical appeal) about Bayesian model selection based on Bayes factors. To use Bayes factors responsibly, you must explore their sensitivity to prior beliefs. This sensitivity analysis provides a formal way to understand how someone with different prior beliefs about the parameters might draw conclusions opposite to yours, even when the observed data are agreed upon. I want science to move in a direction where researchers are rewarded less for increasingly sensational claims paired with falsely overstated assertions of confidence. Come to think of it, the idea that individual studies based on finite samples (often drawn for convenience) provide some inferential lens to the true state of the universe borders on hubris. So, for now, I hope Bayes factors will gain more traction precisely because their analysis reveals such sensitivities. More to come.