Causal Inference for Aging
Understanding Calico's recent paper
Causal inference has many researchers excited about the prospect of automated causal reasoning. This kind of reasoning is the backbone of scientific research and legal processes.
Calico, the aging research arm of Alphabet, recently released a new paper, "Time-resolved genome-scale profiling reveals a causal expression network." In it, the authors describe a framework, CANDID (Causal Attribution Networks Driven by Induction Dynamics), for predicting transcriptional regulators that can then be validated experimentally.
To learn the causal relationships, the authors use the following model:

$$\frac{d x_{ijt}}{dt} = \sum_{k} \alpha_{jk}\, x_{ikt} + \beta_j\, x_{ijt}$$

with $k$ being the index over all genes. Let's unpack this.
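$x_{ijt}$ is the expression of gene $j$ in timecourse $i$ at time $t$, measured relative to the control strain and relative to time zero.
In other words, for a given treatment $T$ and control $C$ (on a log2 scale),

$$x_{ijt} = \left(\log_2 T_{ijt} - \log_2 C_{ijt}\right) - \left(\log_2 T_{ij0} - \log_2 C_{ij0}\right)$$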
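To make this double normalization (against the control strain, then against time zero) concrete, here is a minimal sketch for one gene in one timecourse. It assumes treatment and control are lists of positive expression values sampled at the same time points, with index 0 as time zero. The function name `relative_expression` is hypothetical and not taken from the paper's code.

```python
import math
from typing import List, Sequence


def relative_expression(treatment: Sequence[float], control: Sequence[float]) -> List[float]:
    """Return x_t = log2(T_t / C_t) - log2(T_0 / C_0) for one gene (hypothetical helper).

    treatment, control: positive expression values at the same time points;
    index 0 is assumed to be time zero (the moment of induction).
    """
    if len(treatment) != len(control) or not treatment:
        raise ValueError("treatment and control must be non-empty and equally long")
    # Treatment-vs-control log ratio at time zero, subtracted from every time point.
    baseline = math.log2(treatment[0] / control[0])
    return [math.log2(t / c) - baseline for t, c in zip(treatment, control)]


# The gene doubles in the treated strain at each step while the control stays flat.
print(relative_expression([2.0, 4.0, 8.0], [2.0, 2.0, 2.0]))  # [0.0, 1.0, 2.0]
```

By construction, the value at time zero is always zero, whatever the raw expression levels are.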
Here, $\alpha_{jk}$ represents the linear effect of transcript $k$ on transcript $j$, and $\beta_j$ represents the effect proportional to the target transcript $j$ itself. In this model any transcript is allowed to affect any other transcript, and thus we sum over all genes (with index $k$).
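The right-hand side of the model can be evaluated directly: each gene's predicted rate of change is a weighted sum over the expression of every gene (the linear effect of one transcript on another), plus a term proportional to the target transcript itself. This is a minimal sketch for a single time point in a single timecourse; the names `predicted_rates`, `alpha` and `beta` are hypothetical, and the coefficient values are made up for illustration.

```python
from typing import List, Sequence


def predicted_rates(x: Sequence[float], alpha: Sequence[Sequence[float]],
                    beta: Sequence[float]) -> List[float]:
    """Evaluate dx_j/dt = sum_k alpha[j][k] * x[k] + beta[j] * x[j] for every gene j.

    x     : relative expression of each gene at one time point in one timecourse
    alpha : alpha[j][k] is the linear effect of transcript k on transcript j
    beta  : beta[j] scales gene j's own expression
    """
    n = len(x)
    rates = []
    for j in range(n):
        # Every gene k may contribute, so the sum runs over all genes.
        cross = sum(alpha[j][k] * x[k] for k in range(n))
        rates.append(cross + beta[j] * x[j])
    return rates


# Two genes: transcript 0 activates transcript 1; both have an illustrative negative self term.
alpha = [[0.0, 0.0],
         [0.5, 0.0]]
beta = [-0.1, -0.2]
print(predicted_rates([1.0, 0.0], alpha, beta))  # [-0.1, 0.5]
print(predicted_rates([0.0, 0.0], alpha, beta))  # [0.0, 0.0]
```

Because the model has no constant term, a timecourse that starts at zero relative expression has a predicted rate of change of exactly zero at time zero.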
Since most genes are not regulators, the authors use L1 regularization (i.e., LASSO) to shrink uninformative predictive coefficients to zero. They also enforce a predicted rate of change of zero at time zero, reflecting the assumption that cells are at steady state before induction. To arrive at this formulation, they considered a suite of alternative data-cleaning and modeling approaches (see the paper's Supplement for details) and chose this formalism and its hyperparameters based on their ability to predict held-out induction datasets (50 million regressions in total).
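The fitting step can be sketched as one sparse regression per target gene. This is a minimal illustration under assumptions that are not taken from the paper: derivatives are approximated by finite differences between consecutive, evenly spaced time points and paired with the midpoint of the neighboring expression values; the LASSO is solved by plain coordinate descent with no intercept, so a profile that is zero at time zero gets a predicted rate of zero there; and the penalty `lam` is fixed by hand instead of being tuned on held-out inductions. Because the sum runs over all genes, the coefficient on the target gene itself absorbs both its effect on itself and the term proportional to the target transcript. The helpers `build_regression`, `soft_threshold` and `lasso_no_intercept` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def build_regression(series: Sequence[Sequence[float]], j: int,
                     dt: float) -> Tuple[List[List[float]], List[float]]:
    """Turn one timecourse into (features, targets) for target gene j.

    series[t][k] is the relative expression of gene k at time point t; time
    points are assumed to be evenly spaced, dt apart (an assumption of this sketch).
    """
    X, y = [], []
    for t in range(len(series) - 1):
        now, nxt = series[t], series[t + 1]
        # Finite-difference estimate of dx_j/dt over the interval...
        y.append((nxt[j] - now[j]) / dt)
        # ...paired with every gene's expression at the interval midpoint.
        X.append([(a + b) / 2 for a, b in zip(now, nxt)])
    return X, y


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_no_intercept(X: List[List[float]], y: List[float], lam: float,
                       n_iter: int = 200) -> List[float]:
    """Coordinate descent for min_w (1/2n)||y - Xw||^2 + lam * ||w||_1, with no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, starting from w = 0
    col_sq = [sum(X[r][c] ** 2 for r in range(n)) / n for c in range(p)]
    for _ in range(n_iter):
        for c in range(p):
            if col_sq[c] == 0.0:
                continue  # a gene that never moves cannot explain anything
            # Correlation of column c with the residual that excludes w[c]'s contribution.
            rho = sum(X[r][c] * (resid[r] + X[r][c] * w[c]) for r in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[c]
            delta = new - w[c]
            if delta != 0.0:
                for r in range(n):
                    resid[r] -= X[r][c] * delta
                w[c] = new
    return w


# Toy timecourse (dt = 1): gene 0 is induced and ramps up, gene 1 responds
# with dx1/dt = 0.5 * x0, and gene 2 wiggles independently of gene 1.
series = [
    [0.0, 0.0, 0.0],
    [1.0, 0.25, 1.0],
    [2.0, 1.0, 0.0],
    [3.0, 2.25, -1.0],
    [4.0, 4.0, 0.0],
]
X, y = build_regression(series, j=1, dt=1.0)
w = lasso_no_intercept(X, y, lam=0.01)
print([round(v, 3) for v in w])  # [0.498, 0.0, 0.0]
```

The penalty recovers gene 0 as the only regulator of gene 1 (slightly shrunk below the true 0.5) and sets the other coefficients to exactly zero. To mimic the paper's model selection, one could sweep `lam` and the preprocessing choices and score each setting on induction timecourses left out of the fit.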
This paper seems pretty in line with Calico's approach of throwing massive amounts of data and processing power at problems.