Modelling with multiple explanatory variables
Pamela Warner
Centre for Population Health Sciences, University of Edinburgh, Edinburgh, UK
Correspondence to Dr Pamela Warner, Centre for Population Health Sciences, University of Edinburgh Medical School, Teviot Place, Edinburgh EH8 9AG, UK; p.warner{at}ed.ac.uk

This statistical technique has been used in two papers in this issue of the Journal, namely those by Kotb et al.1 and Schembri et al.2 These notes are intended to provide a supplementary explanation of this method. [See Box 1 for a glossary of terms used in this article.]

Box 1 Glossary of statistical terms used in this article

What is it?

Multivariable modelling is the use of statistical modelling techniques, applied to a dataset for a group of individuals – that dataset comprising some known outcome or group membership (usually binary), plus a set of variables that potentially ‘explain’ that outcome/group membership.3

When/why is it useful?

Research often seeks to understand better the nature of the association between some condition/group of interest and a set of potential explanatory variables, which might be demographic characteristics, behaviours, exposures, and so on.4–7 This understanding can be difficult to achieve because the outcome is associated with numerous potential explanatory variables and because, often, these potential explanatory variables are associated among themselves. For example, lack of knowledge about contraception, and difficulty in affording it, might both be associated with low socioeconomic status. Multivariable modelling can reveal the association of a potential explanatory variable with the outcome, adjusted for (or ‘independent’ of) all the other explanatory variables in the model.
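To make the idea of an adjusted association concrete, the sketch below fits a logistic regression to invented counts for two binary explanatory variables, A and B. Logistic regression is a common choice for a binary outcome, although this article does not name a specific model. The data, the variable names and the hand-written Newton-Raphson fitting routine are all assumptions made for illustration; in practice a statistical package would be used.

```python
import math

# Hypothetical grouped data: (A, B, cases, total) for each combination of two
# binary explanatory variables. These counts are invented for illustration.
CELLS = [
    (0, 0, 20, 100),
    (1, 0, 10, 30),
    (0, 1, 30, 70),
    (1, 1, 60, 100),
]

def crude_odds_ratio(cells, index):
    """Odds ratio for one variable, ignoring the other (univariate analysis)."""
    cases = [0, 0]
    noncases = [0, 0]
    for cell in cells:
        level = cell[index]
        cases[level] += cell[2]
        noncases[level] += cell[3] - cell[2]
    return (cases[1] / noncases[1]) / (cases[0] / noncases[0])

def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def fit_logistic(cells, iterations=25):
    """Fit logit P(outcome) = b0 + b1*A + b2*B by Newton-Raphson."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for a, b, cases, total in cells:
            x = [1.0, float(a), float(b)]
            p = 1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(beta, x))))
            for i in range(3):
                score[i] += x[i] * (cases - total * p)
                for j in range(3):
                    info[i][j] += x[i] * x[j] * total * p * (1.0 - p)
        step = solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    return beta

beta = fit_logistic(CELLS)
crude_a, crude_b = crude_odds_ratio(CELLS, 0), crude_odds_ratio(CELLS, 1)
adjusted_a, adjusted_b = math.exp(beta[1]), math.exp(beta[2])
print(f"A: crude OR {crude_a:.2f}, adjusted OR {adjusted_a:.2f}")
print(f"B: crude OR {crude_b:.2f}, adjusted OR {adjusted_b:.2f}")
```

With these invented counts the crude ORs are 2.80 for A and 3.75 for B, whereas the adjusted ORs are 2.00 and 3.00. Because A and B tend to occur together in this dataset, each univariate OR partly reflects the other variable.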

In health care, understanding the precise associations/causes is sometimes not of prime importance; what would be useful instead is to be able to categorise patients into groups on the basis of known or easily ascertained information about them. In such circumstances, it is often a further requirement that the multivariable model developed is ‘parsimonious’, including only as many explanatory variables as are needed to give good prediction. A prognostic model/algorithm is thereby derived for the population from which the data have been drawn, subject to further validation in independent data and, ideally, a randomised trial to evaluate the benefit of implementing such an algorithm.8

What precautions are needed?

The number of explanatory variables, and the extent to which they are interrelated, can create difficulties for the mathematics of the modelling. It is therefore a common precaution to screen the potential explanatory variables beforehand, by means of separate (univariate) analyses of the association of the outcome with each potential explanatory variable, and then to include in, or offer to, the multivariable model only those variables that show some degree of association with the outcome variable. This avoids overloading the initial model with too many variables for the study size, a particular concern when the explanatory variables are highly related to one another.
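The screening step can be sketched as follows, assuming each candidate variable is binary and is tested against the outcome with a Pearson chi-square test on a 2x2 table. The choice of test, the variable names and the counts are hypothetical; only the threshold of p<0.10 is taken from the example discussed in this article.

```python
import math

def chi_square_p_value(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for a 2x2 table.

    a, b = outcome present / absent among those with the characteristic;
    c, d = outcome present / absent among those without it.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical candidate variables, each summarised as a 2x2 table of counts.
CANDIDATES = {
    "variable_1": (40, 60, 20, 80),
    "variable_2": (30, 70, 28, 72),
    "variable_3": (35, 65, 24, 76),
}

THRESHOLD = 0.10  # screening criterion for entry to the multivariable model

p_values = {name: chi_square_p_value(*table) for name, table in CANDIDATES.items()}
selected = [name for name, p in p_values.items() if p < THRESHOLD]

for name, p in p_values.items():
    status = "offered to model" if name in selected else "screened out"
    print(f"{name}: p = {p:.3f} ({status})")
```

In this invented example variable_3 has a p value between 0.05 and 0.10, so it passes the screen even though it would not meet the conventional 0.05 level. A deliberately lenient threshold of this kind reduces the risk of discarding a variable whose association only becomes clear after adjustment.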

Example

In the associated research paper by Kotb et al., univariate analyses were reported for 27 potential explanatory variables, and this number was reduced to the 18 that satisfied the authors' criterion (p<0.10) for consideration in the multivariable model.1 (Age was also included in the multivariable model.) The authors report the odds ratios (ORs) of association from the multivariable model for the five variables most strongly associated with unmet contraceptive need. Table 1 presents these ORs, together with the ORs corresponding to the univariate associations reported in Kotb et al.'s paper. It can be seen that after adjustment for age and the other explanatory variables in the multivariable model, there was some change in the size of the ORs: one OR has increased, whereas the other ORs are smaller after adjustment.

Some of the decreases are likely to be because of shared information across the variables (about unmet contraceptive need). For example, if variables A and B are both associated with ‘unmet need’, but some of that association is common to them both, then in a multivariable model with both A and B included, and all else being equal, neither will show as strong an adjusted association (or, equivalently, as extreme an OR) as was found in the univariate analyses.

Alternatively, the univariate association of variable C with unmet need might be confounded by other variables. If the confounding variable happens to be one of the other explanatory variables included in the multivariable model, D say, then in the multivariable model the adjusted association of C with unmet need will be free of confounding by D, which could result in a change in the OR, in either direction. Such might be the case for ‘previous side effects’, where the univariate OR is 5.0 but the multivariable OR is 5.7.
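The way in which removing confounding can move an OR upwards can be illustrated with a simpler adjustment method than a multivariable model: Mantel-Haenszel stratification on a single confounder. The sketch below uses invented counts for a variable C and a confounder D; it is not a reanalysis of Kotb et al.'s data, and all names and numbers are hypothetical.

```python
# Hypothetical counts, one 2x2 table per level of the confounder D.
# Each table is (cases with C, non-cases with C, cases without C, non-cases without C).
STRATA = {
    "D absent": (30, 50, 2, 20),
    "D present": (15, 5, 40, 80),
}

def odds_ratio(a, b, c, d):
    """Odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)

def crude_odds_ratio(strata):
    """Pool the strata, ignoring D, as a univariate analysis would."""
    a, b, c, d = (sum(table[i] for table in strata.values()) for i in range(4))
    return odds_ratio(a, b, c, d)

def mantel_haenszel_odds_ratio(strata):
    """Summary odds ratio for C, adjusted for D by stratification."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
    return numerator / denominator

crude = crude_odds_ratio(STRATA)
adjusted = mantel_haenszel_odds_ratio(STRATA)
print(f"Crude OR for C: {crude:.2f}")
print(f"OR for C adjusted for D: {adjusted:.2f}")
```

Here the OR for C is 6.0 within each level of D, and the adjusted OR is likewise 6.00, but the crude OR is only about 1.95. In these invented data D raises the odds of the outcome and is more common among those without C, so ignoring D masks much of the association of C with the outcome.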

Table 1 Comparison of odds ratios summarising association, for univariate and multivariable analyses

Overview

Given the different approaches to multivariable modelling that can be taken, the strategy used should reflect the research aims. Care is needed in interpreting the results of analyses of association, in particular the ORs from a multivariable model, each of which is adjusted for all the other variables in that model.

References

Footnotes

  • Competing interests None.

  • Provenance and peer review Commissioned; internally peer reviewed.