Causal Inference for Aging

Understanding Calico's recent paper

Causal inference has many researchers excited about the prospect of automated causal reasoning. This kind of reasoning is the backbone of both scientific research and legal processes.

Calico, the aging research arm of Alphabet, recently released a new paper, "Time-resolved genome-scale profiling reveals a causal expression network." In it, they describe a framework, CANDID (Causal Attribution Networks Driven by Induction Dynamics), for predicting transcriptional regulators in a way that can be validated experimentally.

To learn the causal relationships, the authors describe the following model:

$$\frac{\Delta \ln(y_{ijt})}{\Delta t} = \sum_{k} \frac{\alpha_{ik}(y_{kjt}-1)+\beta_{ik}(y_{ijt}y_{kjt}-1)}{y_{ijt}}$$

with $k$ being the index for all given genes. Let’s unpack this.

$y_{ijt}$ is the expression of a gene $i$ in a timecourse $j$ at a time $t$, relative to the control strain and relative to time zero.

In other words, for a given treatment $r$ and control $g$,

$$y_{ijt} = \frac{r_{ijt} / g_{ijt}}{r_{ij0} / g_{ij0}} \qquad \therefore \quad y_{ij0} = 1 \quad \forall\, i \in I,\ j \in J$$
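The normalization above can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function name `relative_expression`, the `(genes, timecourses, timepoints)` array layout, and the toy numbers are all made up here and do not come from the paper.

```python
import numpy as np

def relative_expression(r, g):
    """Expression relative to control, then relative to time zero.

    r, g: treatment and control abundances with assumed shape
    (genes, timecourses, timepoints); index 0 on the last axis is t = 0.
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    ratio = r / g                    # treatment relative to control
    return ratio / ratio[..., :1]    # divide by the t = 0 ratio

# Toy data (hypothetical): 2 genes, 1 timecourse, 3 timepoints.
r = np.array([[[2.0, 4.0, 8.0]], [[3.0, 3.0, 1.5]]])
g = np.array([[[1.0, 1.0, 2.0]], [[3.0, 1.5, 1.5]]])
y = relative_expression(r, g)
print(y[..., 0])  # every gene and timecourse starts at 1
```

Because every series is divided by its own time-zero ratio, the first timepoint is 1 by construction, which is the identity stated in the equation.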

Here, $\alpha$ represents the linear effect of one transcript on another and $\beta$ represents the effect proportional to the target transcript. In this model any transcript is allowed to affect any other transcript, and thus we sum over all genes (with index $k$).
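To make the roles of the two coefficient sets concrete, here is a minimal sketch that evaluates the right-hand side of the model for a single timecourse at a single timepoint. The function name `predicted_rate` and the convention that entry `[i, k]` of each matrix holds the effect of gene `k` on gene `i` are assumptions for illustration, as are the toy coefficients.

```python
import numpy as np

def predicted_rate(y, alpha, beta):
    """Predicted change in ln(y_i) per unit time for every gene i.

    y: relative expression of each gene at one timepoint (length n).
    alpha, beta: (n, n) arrays; [i, k] is the effect of gene k on gene i.
    """
    y = np.asarray(y, dtype=float)
    linear = alpha @ (y - 1.0)                    # sum_k alpha_ik (y_k - 1)
    product = np.outer(y, y) - 1.0                # [i, k] = y_i * y_k - 1
    proportional = (beta * product).sum(axis=1)   # sum_k beta_ik (y_i y_k - 1)
    return (linear + proportional) / y

# Hypothetical two-gene system in which gene 1 acts on gene 0.
alpha = np.array([[0.0, 0.5], [0.0, 0.0]])
beta = np.array([[0.0, 0.1], [0.0, 0.0]])

rate_induced = predicted_rate([1.0, 2.0], alpha, beta)
rate_baseline = predicted_rate([1.0, 1.0], alpha, beta)
print(rate_induced, rate_baseline)
```

When every gene sits at its time-zero value of 1, both `y_k - 1` and `y_i * y_k - 1` vanish, so the predicted rate is zero for any choice of coefficients.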

Since most genes will not be regulatory, the authors use L1 regularization (i.e., LASSO) to shrink uninformative predictive coefficients to zero. They also enforce a predicted rate of change of zero at time zero, reflecting the pre-induction steady-state assumption. To arrive at this formula, they considered a suite of alternative data cleaning and modeling approaches (see the paper's Supplement for details) and chose this formalism and its hyperparameters based on how well it predicted held-out induction datasets (in total, 50 million regressions were performed).
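As a rough illustration of how L1 regularization zeroes out uninformative coefficients, the sketch below fits a LASSO regression with plain proximal gradient descent (ISTA) on synthetic data. This is not the paper's pipeline: the solver, the penalty `lam = 0.1`, the helper names, and the simulated data are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry toward zero by t, clipping small entries to exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1 with ISTA."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n          # gradient of the squared-error term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic stand-in: 10 candidate regulators, only the first two matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[:2] = [2.0, -1.5]
y = X @ w_true + 0.01 * rng.standard_normal(200)

w_hat = lasso_ista(X, y, lam=0.1)
print(np.round(w_hat, 2))
```

In the setting described above, the columns of `X` would presumably be the per-regulator terms of the model and `y` the observed rate of change for one target gene, with the penalty strength chosen by held-out prediction as the paragraph describes.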

Figures

Figure 1 from the paper
Figure 2 from the paper
Figure 3 from the paper
Figure 4 from the paper
Figure 5 from the paper
Figure 6 from the paper

Further reflections

This paper seems to be in line with Calico’s approach of throwing massive amounts of data and processing power at problems.
