Protein sectors: evolutionary units of three-dimensional structure. Halabi N, Rivoire O, Leibler S, Ranganathan R. Cell. 2009 Aug 21.
This paper proposes a new method, called ‘sectors’, for extracting functional units of proteins from the covariation of residues in a family of homologous sequences. One has to consult the Supplement to discover that the authors are doing standard principal component analysis of the residue frequency correlation matrix, with some ad hoc residue weightings that are not justified. The main text confers an air of mystery on the whole procedure by citing econometrics from the early 2000s (hence the name) and random matrix theory (none of this meant much to me, but it sounds sophisticated, and maybe biologists will understand). How do they decide how many ‘sectors’ to keep? They take the top modes ordered by eigenvalue! The first (all-positive) mode is assumed to account for all phylogenetic effects (p. 776) and is dropped, with no evidence (or rather, we are given an economic analogy). One may worry, since they generate residue frequencies by simply counting all sequences in their set (p. S6), so if one of the sequences is replicated the correlation matrix changes. Are the eigenmodes beyond the first invariant? We are not told.
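The replication worry can be checked on a toy example. The sketch below is not the authors' procedure: it assumes a random four-letter alignment, a simple binary "matches the column consensus" encoding, and no residue weighting at all. The names `covariance_of_alignment` and `top_modes` are hypothetical helpers introduced only for illustration. It computes the covariance matrix from plain counts, replicates one sequence ten times, and compares the matrices and their leading eigenvectors.

```python
import numpy as np

def covariance_of_alignment(aln):
    """Covariance of per-position binary indicators
    (1 = residue matches the column's most common residue).
    Frequencies are plain counts over all rows, with no reweighting."""
    aln = np.asarray(aln)
    x = np.empty(aln.shape, dtype=float)
    for j in range(aln.shape[1]):
        vals, counts = np.unique(aln[:, j], return_counts=True)
        x[:, j] = aln[:, j] == vals[np.argmax(counts)]
    f = x.mean(axis=0)                      # per-position frequencies
    return x.T @ x / x.shape[0] - np.outer(f, f)

def top_modes(c, k=3):
    """Top-k eigenvalues and eigenvectors of a symmetric matrix."""
    w, v = np.linalg.eigh(c)                # ascending eigenvalues
    order = np.argsort(w)[::-1][:k]
    return w[order], v[:, order]

# Toy data (an assumption, not real sequences): 30 sequences, 8 positions.
rng = np.random.default_rng(0)
alphabet = np.array(list("ACDE"))
aln = alphabet[rng.integers(0, 4, size=(30, 8))]

# Replicate the first sequence ten times, as if it were oversampled.
aln_dup = np.vstack([aln] + [aln[:1]] * 10)

c0 = covariance_of_alignment(aln)
c1 = covariance_of_alignment(aln_dup)
w0, v0 = top_modes(c0)
w1, v1 = top_modes(c1)

max_change = np.abs(c1 - c0).max()
# |cosine| between corresponding modes; 1.0 would mean the mode is unchanged.
overlap = np.abs(np.sum(v0 * v1, axis=0))

print("largest change in a matrix entry:", max_change)
print("overlap of top modes before/after:", overlap)
```

If the modes beyond the first were insensitive to sampling, the overlaps would stay near 1 under such replication. Whether that holds for the authors' weighted matrix on real alignments is exactly what the paper does not tell us.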
Proteins are important. If the method is black magic but works, would most people care? Figure 6 shows a 3x3 matrix with the sectors on one axis and three functional properties on the other: substrate specificity, phylogeny (yes, ‘vertebrate vs non’), and enzymatic activity. We see that the matrix is roughly diagonal, and recognize what we already saw from the crystal structure: the respective sectors are grouped around the substrate recognition and catalytic residues, and those that are neither reflect phylogeny. Experiments are also reported; at least some are not surprising, e.g., mutations in the phylogenetic sector do not affect catalysis. There is a large expert community working on protein structure-function that can evaluate this work, and previous papers from Ranganathan, better than I can. The parts that I can follow are not credible.
Several papers that, in my view, deal better with the problems of deducing function from residue correlation are Thattai 2007, Burger 2008, and Weigt 2009 (plus other papers from T. Hwa); none of them is cited.