Compilation and setup
Source files
Data files
Running CG
Output files
Running CG with other prediction programs
Compilation and setup
- Download and compile the MultiRNAFold package. You may copy it under
the same directory as CG, or you may modify the Makefile to point to its
location.
- CG currently uses LAM-MPI for the prediction step at each iteration.
You need a file called hostfile in the CG directory that lists the
machines you want to use. Run lamboot -v hostfile to start the lamd
daemons.
- Type make to compile the C++ programs that are called by the main file.
Source files
- CGlearn.pl is the main file. It calls various other files, outlined below.
- options_*.txt (e.g. options_151Rfam.txt or options_best_ISMB.txt) are
files that are given as input to the main file.
- get_percentage_lb_ub.pl
is a Perl script that creates files with the upper and lower bounds on
the initial parameters.
- Directory data contains training and test data sets, parameter files,
and other files.
- data/all_thermodynamic_constraints_fm363_l2norm.lp
contains the CPLEX constraints coming from the thermodynamic set T-Full.
- pick_best_training.pl is a script that picks the best parameter set
according to the f-measure on the training set, and then tests this set
on the provided test set. It is called by CGlearn.pl, but it can also be
called separately.
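To make the selection criterion concrete, here is a minimal Python sketch of an f-measure between a known and a predicted structure given as dot-parentheses strings. It assumes the common definition (harmonic mean of sensitivity and positive predictive value over base pairs); the exact definition used by pick_best_training.pl is not stated here, and the function names base_pairs and f_measure are hypothetical, not part of CG.

```python
from typing import Set, Tuple


def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Return the set of (i, j) index pairs encoded by matching parentheses."""
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
        # any other character (e.g. '.') is treated as unpaired
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def f_measure(known: str, predicted: str) -> float:
    """Assumed definition: harmonic mean of sensitivity and PPV over base pairs."""
    ref = base_pairs(known)
    pred = base_pairs(predicted)
    correct = len(ref & pred)
    if correct == 0:
        return 0.0  # also covers empty structures, avoiding division by zero
    sensitivity = correct / len(ref)   # fraction of known pairs that were predicted
    ppv = correct / len(pred)          # fraction of predicted pairs that are correct
    return 2 * sensitivity * ppv / (sensitivity + ppv)


# Two of the three known pairs are predicted, and no wrong pairs are added.
print(round(f_measure("(((...)))", "((.....))"), 3))  # 0.8
```

In this sketch a parameter set would be scored by averaging f_measure over all training structures; whether CG averages per structure or pools base pairs is not specified here.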
Data files
- The training files are called TRA_*.txt, and the test files are called
TES_*.txt.
The format is:
- documentation line;
- RNA sequence, all on one line;
- RNA secondary structure in dot-parentheses format, all on one line;
- optionally, a "restricted string", which was derived by processing the
original structures;
- the secondary structure predicted with the initial parameter set
(usually Turner99);
- an empty line.
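The record layout above can be read with a short Python sketch like the following. It is an illustration only, under two assumptions that are not confirmed by this README: records are separated by empty lines, and a record with five non-empty lines is one that includes the optional restricted string. The function name parse_records, the example records, and the restricted-string placeholder are all hypothetical.

```python
from typing import Dict, List, Optional


def parse_records(text: str) -> List[Dict[str, Optional[str]]]:
    """Split TRA_*/TES_* style text into records separated by empty lines."""
    records = []
    block: List[str] = []
    # The extra "" flushes the last record even if the final empty line is missing.
    for line in text.splitlines() + [""]:
        line = line.strip()
        if line:
            block.append(line)
            continue
        if not block:
            continue  # tolerate several empty lines in a row
        if len(block) == 4:
            doc, seq, known, predicted = block
            restricted = None
        elif len(block) == 5:
            doc, seq, known, restricted, predicted = block
        else:
            raise ValueError("expected 4 or 5 lines per record, got %d" % len(block))
        # Sanity check (assumption): structures are as long as the sequence.
        if not (len(seq) == len(known) == len(predicted)):
            raise ValueError("length mismatch in record: " + doc)
        records.append({
            "doc": doc,
            "sequence": seq,
            "structure": known,
            "restricted": restricted,
            "predicted": predicted,
        })
        block = []
    return records


# Made-up records; "RESTRICTED_PLACEHOLDER" stands in for a restricted string,
# whose real format is not described here.
EXAMPLE = """hypothetical record 1
GGGAAACCC
(((...)))
(((...)))

hypothetical record 2
GGGAAACCC
(((...)))
RESTRICTED_PLACEHOLDER
((.....))
"""

records = parse_records(EXAMPLE)
for rec in records:
    print(rec["doc"], rec["sequence"], rec["predicted"], rec["restricted"])
```

Counting lines is the simplest way to detect the optional field, but it cannot tell a missing restricted string from a missing prediction; check a few records by hand before relying on it.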
Running CG
- Typing "perl CGlearn.pl" at the command prompt prints a usage message.
The easiest way to run CG is to modify the provided options file
options_151Rfam.txt and run "perl CGlearn.pl options_151Rfam.txt".
Alternatively, you can give the options at the command prompt; see the
usage message.
- If CGlearn.pl stops during an iteration for any reason, run the same
command again. It will continue from the last completed iteration, so
you do not have to start over.
Output files
- CG creates a directory with a long name that depends on the input
options provided. This directory contains several files of interest:
- results_final.txt contains the accuracy of the prediction on the
training set at each CG iteration. After all iterations are done, CG
finds the best parameter set according to the f-measure on the training
set, and tests it on the test set specified as an input option. This
information is written at the end of results_final.txt.
- params_sub-*.txt are the parameters estimated at each iteration.
- output_verbose.txt contains the input options and a log of the run.
- input_options.txt is a copy of the input options file.
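As a quick way to check a run directory, the following Python sketch reports which of the files listed above are present and collects the params_sub-*.txt files. It relies only on the file names given in this section; the function name summarize_run_dir is hypothetical, and nothing is assumed about the contents of the files or about what replaces the * in the parameter file names.

```python
import glob
import os
from typing import Dict


def summarize_run_dir(run_dir: str) -> Dict[str, object]:
    """Report which expected CG output files exist in run_dir."""
    fixed_names = ["results_final.txt", "output_verbose.txt", "input_options.txt"]
    present = {
        name: os.path.isfile(os.path.join(run_dir, name)) for name in fixed_names
    }
    # glob.escape protects against special characters in the directory name.
    pattern = os.path.join(glob.escape(run_dir), "params_sub-*.txt")
    # Plain lexicographic order; it need not match iteration order.
    params = sorted(os.path.basename(path) for path in glob.glob(pattern))
    return {"present": present, "params": params}
```

For example, summarize_run_dir("path/to/output_directory") returns a dictionary whose "present" entry maps each fixed file name to True or False, and whose "params" entry lists the parameter files found.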