Yesterday, I enjoyed the company of Robin Evans, an Oxford statistician and causal inference expert. A few friends from the Bay Area Probabilistic Programming meetup were in attendance.

We discussed how the Bay Area machine learning community thinks about causal inference and causal models.

The goal of causal machine learning is to evaluate and improve counterfactual predictions. A counterfactual prediction is a prediction of the data we would observe if we were to intervene in (i.e. change) the conditions that generated the training data. In other words, it answers “what would happen if we were to do something different?”, like serving a new kind of ad or changing a gene. Using regular non-causal models like deep learning to predict counterfactuals is just extrapolation, which is often unreliable. Causal models model the mechanism that generates the data, and so can represent the changes an intervention makes to that mechanism. A key task in causal machine learning is therefore addressing the nuances of the randomization strategy that produced the data.
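To make the distinction concrete, here is a minimal simulation (an assumed toy model, not one from the discussion) contrasting conditioning on a variable with intervening on it in a simple structural causal model:

```python
# Toy structural causal model with a confounder: Z -> X -> Y and Z -> Y.
# Conditioning on X and intervening on X give different answers for Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observational data: Z confounds X and Y.
Z = rng.normal(size=n)
X = Z + rng.normal(size=n)
Y = 2 * X + 3 * Z + rng.normal(size=n)

# Non-causal "prediction" by conditioning: E[Y | X ~= 1] picks up the confounder.
mask = np.abs(X - 1.0) < 0.05
print("E[Y | X = 1]   ~", Y[mask].mean())   # ~3.5, since E[Z | X=1] = 0.5

# Counterfactual prediction: simulate do(X = 1) from the mechanism itself.
X_do = np.ones(n)
Y_do = 2 * X_do + 3 * Z + rng.normal(size=n)
print("E[Y | do(X=1)] ~", Y_do.mean())      # ~2.0, the causal effect alone
```

Conditioning mixes in the confounder’s influence; simulating the intervention from the mechanism recovers the causal effect on its own.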

Some references

We discussed the role of causal models in addressing algorithmic bias and discrimination.

From the latter article: “As statistical and machine learning models become an increasingly ubiquitous part of our lives, policymakers, regulators, and advocates have expressed fears about the harmful impact of deployment of such models that encode harmful and discriminatory biases of their creators.”

Dat Nguyen asked Robin for his general recommendation on how to handle a causal problem where there are potentially hidden latent variables, the system is high-dimensional, and experimentation is possible but costly and/or time-consuming (i.e. you can’t just do ad hoc A/B testing).

Robin suggested using the PC algorithm or the graphical lasso to try to get a sparse representation of the system (or at least to find the strongest apparent effects). He also suggested applying methods that try to infer the direction of causal effects by making assumptions about additivity, such as Jonas Peters’ work on causal discovery, or the LiNGAM method.
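As a rough sketch of the sparse-representation step, scikit-learn’s GraphicalLasso can estimate a sparse precision matrix whose nonzero off-diagonal entries flag candidate direct dependencies. The simulated data and regularization strength below are illustrative assumptions, not settings Robin recommended:

```python
# Sparse structure recovery with the graphical lasso on simulated data.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
n, p = 2_000, 5

# Simulated chain X0 -> X1 -> X2, with X3 and X4 independent of everything.
X = np.empty((n, p))
X[:, 0] = rng.normal(size=n)
X[:, 1] = X[:, 0] + rng.normal(size=n)
X[:, 2] = X[:, 1] + rng.normal(size=n)
X[:, 3] = rng.normal(size=n)
X[:, 4] = rng.normal(size=n)

model = GraphicalLasso(alpha=0.1).fit(X)

# Nonzero off-diagonal precision entries suggest direct dependencies worth
# investigating (edges 0-1 and 1-2 here). Note the graphical lasso alone
# says nothing about which direction those edges point.
print(np.round(model.precision_, 2))
```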

Having found some good candidates for important effects, you would next run an experiment with interventions that distinguish between the models of interest. Finding the whole causal model would be very hard, but answering a narrower question is plausible. Such an experiment should make it much clearer which directions some effects run in, and how strong they are. Then you update and repeat.
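Here is a toy illustration of that idea (an assumed two-variable setup, not Robin’s example): a single intervention on X distinguishes the candidate models X → Y and Y → X, because Y responds to do(X) only under the first.

```python
# Distinguishing X -> Y from Y -> X with one intervention on X.
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

def experiment(true_direction: str, x_levels=(0.0, 2.0)):
    """Return E[Y] at each forced level of X under the given true model."""
    means = []
    for x in x_levels:
        if true_direction == "X->Y":
            Y = 1.5 * x + rng.normal(size=n)  # Y listens to X
        else:  # "Y->X": forcing X severs the arrow into X, so Y is unaffected
            Y = rng.normal(size=n)
        means.append(Y.mean())
    return means

print("X->Y:", experiment("X->Y"))   # E[Y] shifts with the intervention
print("Y->X:", experiment("Y->X"))   # E[Y] stays flat: models distinguished
```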

Theoretically, a small number of experiments is sufficient to identify the whole causal model, though this tends to assume that interventions are very clean in a way that often isn’t very realistic (see this (pdf)).

We discussed the use of the vanishing tetrad test in constructing latent variable models. “Tetrad” refers to the difference between the products of two pairs of covariances among four random variables; for variables X1, X2, X3, X4, one tetrad is cov(X1, X2)·cov(X3, X4) − cov(X1, X3)·cov(X2, X4). A small numeric illustration follows the reference below.

  • Bollen, Kenneth A., and Kwok-fai Ting. “A Tetrad Test for Causal Indicators.” Psychological Methods 5.1 (2000): 3. pdf
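As a numeric sketch (a hedged illustration, not the test procedure from the paper above): under a single-common-factor model, the covariances factor through the loadings, so the sample tetrads should be approximately zero:

```python
# Vanishing tetrad under a one-factor model: cov(X1,X2)*cov(X3,X4)
# - cov(X1,X3)*cov(X2,X4) should be near zero.
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# One latent factor F drives four indicators with different loadings.
F = rng.normal(size=n)
loadings = np.array([1.0, 0.8, 1.2, 0.6])
X = F[:, None] * loadings + rng.normal(size=(n, 4))

S = np.cov(X, rowvar=False)
tetrad = S[0, 1] * S[2, 3] - S[0, 2] * S[1, 3]
print("tetrad difference ~", tetrad)   # near 0, consistent with one factor
```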

We also talked about Robin’s own work on causal graphical models with latent variables.