Compilation and setup
- The package MultiRNAFold has to be downloaded and compiled. You may
copy it into the same directory as CG, or modify the CG Makefile
accordingly.
- CG currently uses LAM-MPI for the prediction step at each
iteration. You need a file called hostfile in the CG directory that
lists the machines you want to use. Run lamboot -v hostfile to start
LAM on those machines.
- Type make to
compile the C++ programs that are called by the main file.
- CGlearn.pl is the
main file. It calls various other files, outlined below.
- options_*.txt (e.g. options_best_ISMB.txt) are the options files
that are given as input to the main file.
- A Perl script creates the files with the upper and lower bounds on
the initial parameters.
- Directory data contains training and test data sets, parameter
files, and other files.
- The CPLEX constraints coming from the thermodynamic set T-Full are
provided with the package.
- A separate script picks the best parameter set according to the
F-measure on the training set, and then tests this set on the provided
test set. It is called by CGlearn.pl, but it can also be called
separately.
- The training files are called TRA_*.txt, and the test files
are called TES_*.txt. Each entry has the following format:
  - documentation line;
  - RNA sequence, all on one line;
  - RNA secondary structure in dot-parentheses format, all on one line;
  - optionally, a "restricted string", which was derived by
    processing the original structures;
  - the secondary structure predicted with the initial
    parameter set (the usual Turner99 parameters);
  - an empty line.
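The entry layout described above can be read with a few lines of Python. This is only a sketch, not part of CG: the names Record and parse_records are hypothetical, and it assumes that entries are separated by empty lines and that an entry with five lines carries the optional restricted string as its fourth line, while an entry with four lines has none.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    """One entry of a TRA_*.txt or TES_*.txt file (hypothetical helper type)."""
    doc: str                   # documentation line
    sequence: str              # RNA sequence
    structure: str             # known structure, dot-parentheses format
    restricted: Optional[str]  # optional restricted string, None if absent
    predicted: str             # structure predicted with the initial parameters


def parse_records(text: str) -> List[Record]:
    """Split the text into blank-line-separated entries of 4 or 5 lines."""
    records: List[Record] = []
    block: List[str] = []
    # The trailing "" flushes the last entry even without a final empty line.
    for raw in text.splitlines() + [""]:
        line = raw.rstrip()
        if line:
            block.append(line)
            continue
        if not block:
            continue  # tolerate consecutive empty lines
        if len(block) == 4:
            # Assumption: no restricted string in this entry.
            doc, seq, struct, pred = block
            restricted = None
        elif len(block) == 5:
            # Assumption: the restricted string is the fourth line.
            doc, seq, struct, restricted, pred = block
        else:
            raise ValueError(f"expected 4 or 5 lines per entry, got {len(block)}")
        records.append(Record(doc, seq, struct, restricted, pred))
        block = []
    return records
```

A typical use would be parse_records(open(path).read()), where path points to one of the TRA_*.txt or TES_*.txt files. If the actual files deviate from the assumptions above, the line-count check raises an error instead of silently misreading an entry.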
- Type "perl CGlearn.pl" at the command prompt to get a usage message.
The easiest way to run CG is to modify the provided options file
options_151Rfam.txt and run "perl CGlearn.pl options_151Rfam.txt".
Alternatively, you can pass the options directly at the command
prompt; see the usage message.
- If, for some reason, CGlearn.pl stops partway through, you can run
the same command again, and it will continue from the last completed
iteration, so you don't have to run it all over again.
- CG creates a directory whose name depends on the input options
provided. This directory contains several files:
  - a file with the accuracy of the prediction on the training set at
    each CG iteration. After all iterations are done, CG finds the best
    parameter set according to the F-measure on the training set, and
    tests it on the test set specified as an input option. This
    information is written at the end of the results_final.txt file;
  - the parameters estimated at each iteration;
  - a file with the input options and a log of the run;
  - a copy of the input options file.
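
The F-measure used to select the best parameter set is not defined in this text. The sketch below assumes the definition commonly used for RNA secondary structure prediction: the harmonic mean of sensitivity (correctly predicted base pairs over base pairs in the reference structure) and positive predictive value (correctly predicted base pairs over all predicted base pairs). The function names are hypothetical, only plain dot-parentheses structures without pseudoknots are handled, and CG itself may treat edge cases differently.

```python
from typing import Set, Tuple


def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) index pairs in a dot-parentheses string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def f_measure(reference: str, predicted: str) -> float:
    """Harmonic mean of sensitivity and PPV over base pairs (assumed definition)."""
    ref = base_pairs(reference)
    pred = base_pairs(predicted)
    correct = len(ref & pred)
    if correct == 0:
        # Assumption: score 0 when no base pair is predicted correctly.
        return 0.0
    sensitivity = correct / len(ref)
    ppv = correct / len(pred)
    return 2 * sensitivity * ppv / (sensitivity + ppv)
```

For example, with the reference "(((...)))" and the prediction ".((...)).", two of the three reference pairs are recovered and both predicted pairs are correct, so sensitivity is 2/3, PPV is 1, and the F-measure is 0.8.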