Causal Inference for Aging

Understanding Calico's recent paper

Causal inference has many researchers excited about the prospect of automated causal reasoning, the kind of reasoning that forms the backbone of scientific research and legal processes.

Calico, the aging research arm of Alphabet, recently released a new paper, "Time-resolved genome-scale profiling reveals a causal expression network." In it, they describe a framework, CANDID (Causal Attribution Networks Driven by Induction Dynamics), for predicting transcriptional regulators that can be validated experimentally.

To learn the causal relationships, the authors describe the following model:

$$\frac{\Delta \ln(y_{ijt})}{\Delta t} = \sum_{k} \frac{\alpha_{ik}(y_{kjt}-1) + \beta_{ik}(y_{ijt}\,y_{kjt}-1)}{y_{ijt}}$$

with $k$ indexing all genes. Let’s unpack this.

$y_{ijt}$ is the expression of gene $i$ in timecourse $j$ at time $t$, relative both to the control strain and to time zero.

In other words, for a given treatment $r$ and control $g$:

$$y_{ijt} = \frac{r_{ijt}/g_{ijt}}{r_{ij0}/g_{ij0}} \qquad \therefore \quad y_{ij0} = 1 \quad \forall\, i, j$$
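A minimal NumPy sketch of this double normalization for a single timecourse $j$ might look like the following. It assumes treatment and control measurements are already matched by gene and timepoint, stored as genes × timepoints arrays with the first column at time zero; the array layout and the name `relative_expression` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def relative_expression(r, g):
    """Compute y_{ijt} = (r_{ijt} / g_{ijt}) / (r_{ij0} / g_{ij0}) for one timecourse j.

    Hypothetical helper (assumed layout):
    r : shape (G, T), expression in the induced (treatment) strain
    g : shape (G, T), matched control-strain expression at the same timepoints
    Column 0 is assumed to be time zero (pre-induction).
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    ratio = r / g                    # treatment relative to control
    return ratio / ratio[:, [0]]     # relative to time zero, so y[:, 0] == 1
```

Because every row is divided by its own time-zero value, the first column is identically 1, which is exactly the $y_{ij0} = 1$ condition.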

In the model, $\alpha_{ik}$ represents the linear effect of transcript $k$ on transcript $i$, and $\beta_{ik}$ represents an effect that also scales with the level of the target transcript $i$ (through the product $y_{ijt}\,y_{kjt}$). Any transcript is allowed to affect any other transcript, so the sum runs over all genes (index $k$).
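To make the notation concrete, here is a minimal NumPy sketch that evaluates the model's right-hand side for every target gene $i$ at one timecourse $j$ and time $t$. The function name `predicted_log_rate` and the dense $G \times G$ matrices `alpha` and `beta` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def predicted_log_rate(y, alpha, beta):
    """Right-hand side of the model for all target genes i at one (j, t).

    Hypothetical helper (assumed shapes):
    y     : shape (G,), relative expression y_{ijt}; assumed > 0 (ratios)
    alpha : shape (G, G), alpha[i, k] = linear effect of gene k on gene i
    beta  : shape (G, G), beta[i, k] = effect that also scales with y_i
    Returns shape (G,): predicted Delta ln(y_i) / Delta t.
    """
    y = np.asarray(y, dtype=float)
    linear = alpha @ (y - 1.0)                    # sum_k alpha_ik (y_k - 1)
    # sum_k beta_ik (y_i y_k - 1) = y_i * sum_k beta_ik y_k - sum_k beta_ik
    interaction = y * (beta @ y) - beta.sum(axis=1)
    return (linear + interaction) / y             # divide by y_i
```

Since $\Delta \ln(y)/\Delta t$ approximates $\frac{1}{y}\frac{dy}{dt}$, multiplying through by $y_{ijt}$ shows the model is equivalent to $\frac{dy_{ijt}}{dt} \approx \sum_k \alpha_{ik}(y_{kjt}-1) + \beta_{ik}(y_{ijt}\,y_{kjt}-1)$. When every $y$ equals 1, as at time zero, both terms vanish and the predicted rate is zero.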

Since most genes will not be regulatory, the authors use L1 regularization (i.e., LASSO) to shrink uninformative predictive coefficients to zero. They also enforce a predicted rate of change of zero at time zero, reflecting the pre-induction steady-state assumption. To arrive at this formula, they considered a suite of alternative data-cleaning and modeling approaches (see the paper's Supplement for details) and chose this formalism and its hyperparameters based on their ability to predict held-out induction datasets (50 million regressions in total).
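The fitting step can be sketched as one sparse regression per target gene. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helpers `design_for_target` and `lasso_cd` are hypothetical, $\Delta \ln(y)/\Delta t$ is approximated by a backward finite difference with predictors evaluated at the later timepoint, and LASSO is solved by plain coordinate descent on $\frac{1}{2n}\lVert b - Xw\rVert^2 + \lambda\lVert w\rVert_1$. The model has no intercept, so every predictor is zero when all $y = 1$ and the predicted rate at time zero is zero by construction.

```python
import numpy as np

def design_for_target(Y, times, i):
    """Build the regression problem for one target gene i (hypothetical helper).

    Y     : list of arrays, one per timecourse j, each (G, T) holding y_{ijt}
    times : shape (T,), sampling times shared by all timecourses (assumption)
    Returns X with 2G columns ([alpha features | beta features]) and target b.
    """
    rows, targets = [], []
    for Yj in Y:
        Yj = np.asarray(Yj, dtype=float)
        logY = np.log(Yj)
        for n in range(1, len(times)):
            dt = times[n] - times[n - 1]
            # Backward finite difference of ln(y_i) as the response (assumption)
            targets.append((logY[i, n] - logY[i, n - 1]) / dt)
            y_t = Yj[:, n]      # predictors at the later timepoint (assumption)
            y_i = y_t[i]
            alpha_feats = (y_t - 1.0) / y_i          # multiplies alpha_ik
            beta_feats = (y_i * y_t - 1.0) / y_i     # multiplies beta_ik
            rows.append(np.concatenate([alpha_feats, beta_feats]))
    return np.array(rows), np.array(targets)


def lasso_cd(X, b, lam, n_iter=500):
    """Coordinate descent for min_w (1/2n)||b - Xw||^2 + lam * ||w||_1, no intercept."""
    X = np.asarray(X, dtype=float)
    resid = np.asarray(b, dtype=float).copy()
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue                     # all-zero feature: weight stays 0
            resid += X[:, k] * w[k]          # residual without feature k
            rho = X[:, k] @ resid / n
            # Soft-thresholding shrinks small coefficients exactly to zero
            w[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]
            resid -= X[:, k] * w[k]
    return w
```

The first $G$ entries of the fitted weights estimate $\alpha_{i\cdot}$ and the last $G$ estimate $\beta_{i\cdot}$. In the spirit of the paper, `lam` would be chosen by how well the fitted models predict held-out induction timecourses rather than fixed in advance.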

Figures

Figure 1 from the paper
Figure 2 from the paper
Figure 3 from the paper
Figure 4 from the paper
Figure 5 from the paper
Figure 6 from the paper

Further reflections

This paper seems very much in line with Calico’s approach of throwing massive amounts of data and processing power at problems, as the 50 million regressions behind the model selection illustrate.


Cited as:

@article{mcateer2019causalaging,
  title   = "Causal Inference for Aging",
  author  = "McAteer, Matthew",
  journal = "matthewmcateer.me",
  year    = "2019",
  url     = "https://matthewmcateer.me/blog/causal-inference-for-aging/"
}

If you notice mistakes and errors in this post, don’t hesitate to contact me at [contact at matthewmcateer dot me] and I would be very happy to correct them right away!

See you in the next post 😄

I write about AI, Biotech, and a bunch of other topics. Subscribe to get new posts by email!