Photobleach Recovery

Purpose:

    This program was written to simulate diffusion in a two-dimensional, isotropic but inhomogeneous medium. It has been productively used to follow photobleach recovery in whole cells for both soluble and membrane-confined fluorescent markers. Soluble markers can be inhomogeneous because they are excluded from the nucleus, or because the cell has variable thickness and is being viewed in projection. For endoplasmic reticulum markers the inhomogeneity is obvious in the image. The code assumes a continuous medium, so it is only appropriate when bleaching on a scale much larger than the spacing of the ER tubules.
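
    To make the idea concrete, here is a minimal Python sketch (not the C program itself). It assumes, as one plausible reading of "isotropic but inhomogeneous", that the pre-bleach image w gives the equilibrium intensity and that the fluorescence c relaxes until c/w is uniform, i.e. dc/dt = D div(w grad(c/w)). The equation, the explicit finite-difference scheme, and the names diffuse_step and recover are all assumptions made for illustration.

```python
import numpy as np

def diffuse_step(c, w, D=1.0, dt=0.1):
    """One explicit step of dc/dt = D * div(w * grad(c / w)) on a unit grid.

    c : 2D array, current fluorescence per pixel
    w : 2D array, assumed equilibrium (pre-bleach) intensity, strictly > 0
    Nothing crosses the array edge, so the total of c is conserved.
    """
    u = c / w  # relative concentration; uniform u means equilibrium
    # Flux across each face between neighbouring pixels, weighted by the
    # mean of w on the two sides (positive = flow into the lower index).
    fy = D * 0.5 * (w[1:, :] + w[:-1, :]) * (u[1:, :] - u[:-1, :])
    fx = D * 0.5 * (w[:, 1:] + w[:, :-1]) * (u[:, 1:] - u[:, :-1])
    dc = np.zeros_like(u)
    dc[:-1, :] += fy
    dc[1:, :] -= fy
    dc[:, :-1] += fx
    dc[:, 1:] -= fx
    return c + dt * dc

def recover(c, w, steps, D=1.0, dt=0.1):
    """Apply diffuse_step repeatedly and return the final image."""
    for _ in range(steps):
        c = diffuse_step(c, w, D, dt)
    return c

# Toy example: a smooth inhomogeneous "cell" with a bleached 4x4 square.
y, x = np.mgrid[0:20, 0:20]
w = 1.0 + 0.5 * np.sin(x / 3.0) * np.cos(y / 4.0)
c0 = w.copy()
c0[8:12, 8:12] = 0.0
c1 = recover(c0, w, steps=2000)
print(c0.sum(), c1.sum())              # totals agree: fluorescence is conserved
print((c0 / w).std(), (c1 / w).std())  # c/w flattens as the bleach recovers
```

    The explicit step is only stable for small dt (roughly dt * D * 4 < 1 when w varies slowly), and regions where w is zero, such as an excluded nucleus, would need to be masked or floored before dividing.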

Coding:

    Done in C, rather unprofessionally, with a klutzy interface to read the TIFF images from the microscope.

References:

    method

Availability:

    Rockefeller offers a free academic license (a material transfer agreement, in legal parlance); email EDS. Source code is provided.

Dictionaries for Genomes

Purpose:

    The program 'MobyDick' was developed to find over-represented words (strings with a few degenerate base symbols) in samples from tens of kb to several Mb in length. It is a maximum likelihood method that assumes the data are generated by picking words at random from a dictionary, with the word frequencies as the fit parameters. Words are added with increasing length when the current model underpredicts their frequency in the data. Shorter words may disappear as a result. Empirically, the final solution is unique or nearly so. Convergence is improved when long exact repeats are removed with the REPuter program. A nice illustration of how words are built up from syllables is provided by the recovery of an English dictionary from the novel Moby Dick reduced to a long string of [a-z].
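
    The likelihood that such a model implies can be sketched in a few lines of Python (this is an illustration, not the MobyDick code). If the text is a concatenation of words drawn independently from the dictionary, the probability of a string is the sum, over every way of cutting it into dictionary words, of the product of the word probabilities, and a left-to-right recursion computes that sum. The function name and the toy dictionaries below are made up for the example.

```python
def segmentation_likelihood(seq, dictionary):
    """Sum over all segmentations of seq into dictionary words of the
    product of word probabilities (dictionary maps word -> probability)."""
    z = [0.0] * (len(seq) + 1)
    z[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for i in range(1, len(seq) + 1):
        for word, p in dictionary.items():
            n = len(word)
            # word can end at position i if it matches the last n symbols
            if n <= i and seq[i - n:i] == word:
                z[i] += p * z[i - n]
    return z[len(seq)]

letters_only = {"a": 0.5, "b": 0.5}
with_word = {"a": 0.5, "b": 0.3, "ab": 0.2}
print(segmentation_likelihood("abab", letters_only))  # 0.5 ** 4
print(segmentation_likelihood("abab", with_word))     # larger: 'ab' helps explain the data
```

    For real sequence lengths the running sums would have to be rescaled or kept in log space to avoid underflow, and degenerate base symbols would need a match test in place of the exact string comparison.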

    The authors have experimented with converging entire weight matrices with this algorithm, and there multiple minima are a fatal problem. The algorithm is completely hopeless for modeling sequence in which the information is the absence of a few words from otherwise random sequence. A plausible extension would be to segment

Coding:

    Intelligently done in generic C with various Perl scripts to tie things together. The script for generating RegEx words is greedy and a bit klutzy.

References:

    method, application.

Availability:

    Email EDS to get a tar ball with source, or try Hao Li's web site here.

Ahab and Stubb

Purpose:

    These are two descendants of MobyDick (we skipped Starbuck on the crew list so as not to confuse the coffee drinkers). Both programs fit a set of predefined weight matrices to data. We no longer use Ahab because of some coding infelicities and the expanded functionality in Stubb, which handles multiple species with an evolution model that we have also used in our PhyloGibbs code, and which also fits and scores positional correlations between motifs.

Coding:

    Decent C and Perl. Requires prealignment of the species; we use mlagan.

References:

    Basic algorithm for Ahab and Stubb, and applications to fly, one or two species.

Availability:

    Email EDS for a tar ball with Ahab; for the Stubb source code, contact S. Sinha. There is also a Stubb web site (which, however, does not implement the motif correlation feature). There is also a useful graphical depiction of binding sites within a module called 'windowfit' at this web site. For genome-wide runs, you have to run the program locally.

Clustering motifs

Purpose:

    Assume you have found a few thousand 'mini-weight matrices' by running some motif finder on orthologous sequences from a collection of genomes. How many different motifs are represented, and how does one find the composite weight matrices? Clustering is accomplished by sampling the distribution of all possible ways of generating the mini-weight matrices from an unknown number of unknown weight matrices. There is no need for ad hoc similarity scores; we implement a Bayesian model for the hypothesis that the data derive from sampling independent weight matrices. In the authors' humble opinion there are no competitors for this task. A weakness is that motifs of substantially different widths are not treated.
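
    The kind of Bayesian comparison that replaces a similarity score can be sketched as follows. This is a hedged Python illustration, not the authors' code: it assumes each mini-weight matrix is a table of base counts per column, puts a uniform Dirichlet prior (alpha = 1, an assumption) on each unknown weight-matrix column, and compares "both sets of counts came from one weight matrix" against "each came from its own". The full method samples over all partitions of the data; this shows only the pairwise building block, and the function names are invented for the example.

```python
from math import lgamma

def log_marginal(counts, alpha=1.0):
    """Log probability of the observed bases in each column, integrated over
    an unknown weight-matrix column with a Dirichlet(alpha) prior.
    counts: list of columns, each a list of per-base counts (e.g. A, C, G, T)."""
    total = 0.0
    for col in counts:
        n = sum(col)
        k = len(col)
        total += lgamma(k * alpha) - lgamma(n + k * alpha)
        total += sum(lgamma(c + alpha) - lgamma(alpha) for c in col)
    return total

def log_merge_ratio(a, b, alpha=1.0):
    """log P(a, b | one shared matrix) - log P(a, b | two independent matrices).
    Assumes a and b have the same width."""
    merged = [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]
    return (log_marginal(merged, alpha)
            - log_marginal(a, alpha) - log_marginal(b, alpha))

m1 = [[5, 0, 0, 0], [0, 5, 0, 0]]  # five sites, all "AC"
m2 = [[4, 1, 0, 0], [0, 5, 0, 0]]  # five sites, mostly "AC"
m3 = [[0, 0, 5, 0], [0, 0, 0, 5]]  # five sites, all "GT"
print(log_merge_ratio(m1, m2))  # positive: evidence for a common motif
print(log_merge_ratio(m1, m3))  # negative: better kept apart
```

    Because the ratio is a sum over aligned columns, the sketch shares the limitation noted above: the two matrices must have the same width.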

References:

    The original application to E. coli.

Availability:

    Contact van Nimwegen for the PROCSE code. It was professionally coded in C and C++.

Motif finding in related species

Purpose:

    We have implemented a Gibbs sampler that uses the Stubb model for motif evolution. It will correctly weight data when several species are close and others more removed from the common ancestor. It will also realistically assess the significance of the motifs thus found. It has been described by some early users as a 'tank': it crushes most applications, but with some startup costs, and it's not fast (but a lot quicker than an experiment!). The same evolution model, but searching with maximum likelihood, is available in the PhyME code. The code allows the use of prior information about the motif, such as would be obtained from a structural model.

References:

    With application to the yeast ChIP-chip data.

Availability:

    Contact the primary authors, Erik van Nimwegen or Rahul Siddharthan. There should be a web interface available.

Protein structure and DNA binding specificity

Purpose:

    A web site to automate the task of finding the closest known co-crystal structure for an arbitrary transcription factor and determining its binding specificity according to our contact model. This prior on the weight matrix can then be used to bias a Gibbs sampler search for the motif in a cluster of sequences that may contain it. This site should eventually move to Alex Morozov’s group at Rutgers.

Reference:

    The original paper and a second with applications to yeast ChIP-chip data.

Items are displayed in rough chronological order, with a succinct indication of purpose, coding, the original reference, and availability.

Kinetics of RNA folding

Purpose:

    RNA folding is an attractive problem for a ‘multi-scale’ simulation in which the molecule is described by a list of stems whose energy is computed from the elaborate stacking/pairing energy tables fit from thermodynamics. It's not a trivial matter to find the lowest saddle point between two configurations (which can involve partially made stems). We also solved the problem of computing energies of the simpler pseudo-knotted configurations, using polymer approximations on the loops and the same thermodynamic energies for the stems. So there is an application just to ground states, since the Zuker dynamic programming code does not do pseudo-knots. RNA biochemists and structure people say this is naive since various stably positioned Mg’s are ignored, etc. Consult the original paper and later ones from Isambert to see how well naivety did back in 2000.

References:

    method

Availability:

    Consult Herve Isambert’s web site at Inst. Curie.

Image processing yeast time lapse

Purpose:

    In collaboration with Fred Cross’ lab at Rockefeller, I wrote a segmentation code in MATLAB for a phase contrast image of a 2D yeast colony, plus various subprograms to integrate various fluorescent markers. A programmer created a GUI to step through the movie and hand annotate when cells divide based on markers (to my knowledge no one has automated this, since the colonies are dense), with an option to correct the segmentation. It's a bit klutzy and other labs have similar tools: one from Duke and another from Skotheim at Stanford.

References:

    None really thorough; see various papers with the Cross lab beginning with Bean et al.

Availability:

    Contact EDS for a tar ball.