Running CG with other prediction programs
- by Mirela Andronescu, last modified Mar 6, 2008.

CG (Constraint Generation) is a computational approach that estimates free energy parameters used by RNA secondary structure prediction software. CG was described in detail in:
M. Andronescu, A. Condon, H.H. Hoos, D.H. Mathews, and K.P. Murphy, "Efficient parameter estimation for RNA secondary structure prediction", Bioinformatics 2007 23(13): i19-i28.

CG can work with essentially any RNA secondary structure prediction software, as long as the energy function is linear or quadratic in the parameter vector. You need a prediction function and a few other functions for your model (see details below). One option is to send me the necessary files, and I will run CG with them for you. Here's what you need:
  1. Configuration file
  2. Initial parameter file
  3. Training data set
  4. Testing data set
  5. Code to create the structural constraints
  6. Code to predict and analyse results of new parameters
  7. [optional] Thermodynamic file
  8. [optional] File that specifies which parameters are fixed and which are variable
  9. [optional] File with additional constraints
When you have all of these, please send me a directory containing them. Here's a sample directory, where I used Simfold as the prediction program: Simfold.tar.gz
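
To make the linearity requirement concrete, here is a minimal Python sketch of an energy function that is linear in the parameter vector: the free energy of a structure is the dot product of per-parameter usage counts and the parameter values. The function name `linear_energy`, the three-parameter model, and the numbers are hypothetical and chosen only for illustration; they are not taken from Simfold or from the CG code.

```python
def linear_energy(feature_counts, params):
    """Free energy as a dot product of feature counts and parameter values.

    feature_counts[i] is how many times parameter i is used by a structure.
    (Hypothetical helper for illustration; not part of CG or Simfold.)
    """
    if len(feature_counts) != len(params):
        raise ValueError("counts and parameters must have the same length")
    return sum(count * value for count, value in zip(feature_counts, params))


# Hypothetical 3-parameter model: the structure uses parameter 0 twice
# and parameter 2 once.
counts = [2, 0, 1]
params = [-1.5, 0.3, 2.0]
energy = linear_energy(counts, params)
print(energy)
```

With a function of this form, changing the parameters changes the energy of a fixed structure in a predictable, linear way, which is the property the paragraph above asks for.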



1. The configuration file is where you specify the names (and paths) of all the other files described on this web page. Read the rest of the document first.
This file also contains some input options for CG. I will test several settings of these options to get the best performance. If you have a strong opinion about what these options should be, please state it clearly in the configuration file.
Here's a configuration file example: config_sample.txt



2. Initial parameter file. This is a text file, with the values of the initial parameters, one per line. Here's an example: turner_parameters_fm363_constrdangles.txt
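
A minimal Python sketch of reading such a file, assuming only what is stated above (one numeric value per line). The function name `parse_parameters` is hypothetical, and skipping blank lines is an assumption of this sketch, not a documented property of the format.

```python
def parse_parameters(lines):
    """Parse one floating-point parameter value per line.

    `lines` is any iterable of strings, for example an open file.
    Blank lines are skipped (an assumption of this sketch).
    """
    values = []
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        try:
            values.append(float(text))
        except ValueError:
            raise ValueError(f"line {line_number}: not a number: {text!r}")
    return values


# Example with in-memory lines; for a real file, pass an open file handle.
sample_lines = ["-1.5\n", "0.3\n", "\n", "2.0\n"]
initial_params = parse_parameters(sample_lines)
print(initial_params)
```

For a file on disk, the same function can be used as `parse_parameters(open(path))`, where `path` is whatever name you give in the configuration file.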



3. Training data set. This is one text file, to be used as "structural training set" (see paper). There are two options:
The training set should be comprehensive enough for good training. The better it is, the better the quality of the estimated parameters.



4. Testing data set. This has exactly the same format as the training data set; you can use either of the two options above. The molecules in this set should be different from the ones in the training data set.



5. Code to create the structural constraints. You need to create an executable that takes as input a data set, and writes two output files (see details below). The minimum you need for this is:


6. Code to predict and analyse results of new parameters. You need to create an executable that takes as input a set of parameters compatible with your model and a data set file. The program predicts structures with the new parameters and computes the accuracy obtained. The functions you need are:
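
As one illustration of the accuracy step only (not the required interface), the following Python sketch compares a predicted structure with a known one. It assumes structures are given in dot-bracket notation without pseudoknots and measures accuracy as sensitivity, positive predictive value, and F-measure over base pairs. Both the notation and the choice of measures are assumptions of this sketch, and the function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a dot-bracket string."""
    stack = []
    pairs = set()
    for position, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), position))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def accuracy(known, predicted):
    """Return (sensitivity, ppv, f_measure) of predicted vs. known pairs."""
    known_pairs = base_pairs(known)
    predicted_pairs = base_pairs(predicted)
    correct = len(known_pairs & predicted_pairs)
    sensitivity = correct / len(known_pairs) if known_pairs else 0.0
    ppv = correct / len(predicted_pairs) if predicted_pairs else 0.0
    total = sensitivity + ppv
    f_measure = 2 * sensitivity * ppv / total if total else 0.0
    return sensitivity, ppv, f_measure


# The known structure has two base pairs; the prediction recovers one.
sens, ppv, f_measure = accuracy("((..))", "(....)")
print(sens, ppv, f_measure)
```

In a full program, these values would typically be averaged over all molecules in the data set after predicting each structure with the new parameters.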


7. [optional] The thermodynamic file is one text file, to be used as constraints corresponding to the thermodynamic set (see paper). I have a file with all optical melting experiments that I could find, in XML format. I need a piece of code that creates the linear constraints corresponding to this file, for your model. The code will be similar to the code to create the structural constraints.
TODO. I will provide a model soon; talk to me if you need this.



8. [optional] File that specifies which parameters are fixed and which are variable.
Sometimes you might want to keep some parameters fixed at given values. If so, start from a file like the initial parameter file, and replace every value that you do NOT wish to keep fixed with the word "variable". Here's an example in which the parameters with indices 205 and 259 have fixed values, and all the others are variable: params_fix_205_259.txt
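
A minimal Python sketch of how such a file could be read and applied, assuming one entry per line that is either a number or the word "variable". The function names are hypothetical, and the sketch numbers parameters by 0-based position among non-blank lines, which is an assumption; the indexing used by CG itself may differ.

```python
def parse_fixed_file(lines):
    """Return {position: value} for the parameters that are fixed.

    Lines containing the word "variable" mark free parameters.
    Positions are 0-based (an assumption of this sketch).
    """
    fixed = {}
    position = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank lines are ignored (assumption)
        if text != "variable":
            fixed[position] = float(text)
        position += 1
    return fixed


def apply_fixed(params, fixed):
    """Return a copy of params with the fixed values written in."""
    result = list(params)
    for position, value in fixed.items():
        result[position] = value
    return result


# Hypothetical 3-parameter example: only the middle parameter is fixed.
fixed = parse_fixed_file(["variable\n", "0.5\n", "variable\n"])
constrained = apply_fixed([1.0, 2.0, 3.0], fixed)
print(fixed, constrained)
```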



9. [optional] File with additional constraints. Sometimes you need to impose constraints on some variables. For example, in the following file we want all dangling end parameters to be negative or zero, and we want the 3' dangling ends to be less than or equal to the 5' dangling ends: constraints_dangling_ends_fm363.txt
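
The format of the constraints file is not described here, so the following Python sketch only illustrates what the two example constraints mean for a parameter vector. The function name, the index lists, and the assumption that each 3' dangling end parameter is compared with one corresponding 5' parameter are all hypothetical.

```python
def check_dangle_constraints(params, dangle3, dangle5):
    """Return a list of messages describing violated constraints.

    dangle3 and dangle5 are lists of parameter positions for the 3' and 5'
    dangling end parameters; entry k of one list is compared with entry k
    of the other (a pairing assumed only for this sketch).
    """
    violations = []
    # Every dangling end parameter must be negative or zero.
    for position in list(dangle3) + list(dangle5):
        if params[position] > 0:
            violations.append(f"parameter {position} is positive")
    # Each 3' dangling end must be less than or equal to its 5' counterpart.
    for pos3, pos5 in zip(dangle3, dangle5):
        if params[pos3] > params[pos5]:
            violations.append(f"parameter {pos3} exceeds parameter {pos5}")
    return violations


# Hypothetical 4-parameter example: positions 0 and 2 are 3' dangling ends,
# positions 1 and 3 are the corresponding 5' dangling ends.
example_params = [-0.8, -0.3, -0.5, 0.1]
problems = check_dangle_constraints(example_params, [0, 2], [1, 3])
print(problems)
```

Both kinds of constraint are linear inequalities in the parameters, so they fit the same framework as the structural constraints.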