Detailed description of the RNA Secondary Structure Analyser Output

 

The RNA Secondary Structure Analyser provides extensive information about a given RNA molecule and its secondary structure (represented in bpseq format). The sections below list the information that the analyser produces.
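
For readers unfamiliar with the input, a bpseq file has one line per base with three fields: the 1-based position, the base, and the position of its pairing partner (0 if unpaired). The following is a minimal sketch of reading such a file, not the analyser's own parser; the function names `parse_bpseq` and `base_pairs` are hypothetical, and skipping non-data lines is an assumption about how headers should be handled.

```python
def parse_bpseq(text):
    """Return (sequence, partner) read from bpseq-formatted text.

    partner[i] is the 1-based position paired with base i, or 0 if base i
    is unpaired; index 0 is unused so positions match the file.
    """
    bases = []
    partner = [0]
    for line in text.splitlines():
        fields = line.split()
        # Assumption: any line that is not "<int> <base> <int>" is a header
        # or comment and can be skipped.
        if len(fields) != 3 or not (fields[0].isdigit() and fields[2].isdigit()):
            continue
        bases.append(fields[1].upper())
        partner.append(int(fields[2]))
    return "".join(bases), partner


def base_pairs(partner):
    """List each base pair once as (i, j) with i < j."""
    return [(i, j) for i, j in enumerate(partner) if 0 < i < j]


example = """\
1 G 8
2 C 7
3 A 0
4 A 0
5 A 0
6 A 0
7 G 2
8 C 1
"""
sequence, partner = parse_bpseq(example)
pairs = base_pairs(partner)
```

Here `pairs` holds the two base pairs of a single hairpin stem, (1, 8) and (2, 7).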

 

General information:

--minimum number of base pairs whose removal leaves the structure pseudoknot free

--positions of bases in a minimum-size set of base pairs whose removal leaves the structure pseudoknot free

--statement if there is more than one minimum-size set of base pairs whose removal leaves the structure pseudoknot free

--total number of bands (roughly, a band is a stem, or stems interspersed with bulge loops, internal loops, or multiloops, that overlaps with another stem, thereby forming a partial or whole pseudoknot) formed by a minimum-size set of base pairs whose removal leaves the structure pseudoknot free

--positions of bases that close the above bands

--number of pseudoknots

--coordinates of the pseudoknots (if there are no pseudoknots, then no coordinates are reported)

--total number of bands

--total number of base pairs in bands

--total number of stems 

--total number of base pairs not in bands

--total number of hairpin loops

--total number of bulge loops

--total number of internal loops

--total number of multiloops

--total number of external loops (this is always 1, and is reported mostly for error checking)

--total number of stacked base pairs found in non-pseudoknotted stems

--total number of domains (i.e. number of closing base pairs in external loop)
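
As an illustration of the first item in this list, the minimum number of base pairs whose removal leaves a structure pseudoknot free equals the total number of pairs minus the size of a largest set of mutually non-crossing pairs. The sketch below computes that with a standard interval dynamic program; it is not necessarily how the analyser does it, and the function name and the `partner` array layout (1-based, 0 for unpaired, index 0 unused) are assumptions made for the example.

```python
def min_pairs_to_remove(partner):
    """Minimum number of base pairs to delete so that no two pairs cross.

    partner[i] is the 1-based partner of position i (0 if unpaired);
    partner[0] is unused.
    """
    n = len(partner) - 1
    # best[i][j]: size of a largest non-crossing subset of the pairs that
    # have both ends inside positions i..j (0 for empty intervals).
    best = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            value = best[i + 1][j]          # leave position i out
            k = partner[i]
            if i < k <= j:                  # keep pair (i, k)
                value = max(value, 1 + best[i + 1][k - 1] + best[k + 1][j])
            best[i][j] = value
    total = sum(1 for i in range(1, n + 1) if partner[i] > i)
    return total - best[1][n]


# Two crossing stems of two pairs each: (1,7), (2,6) and (4,10), (5,9).
h_type = [0, 7, 6, 0, 10, 9, 2, 1, 0, 5, 4]
removed = min_pairs_to_remove(h_type)   # one whole stem must go, so 2
```

The table uses quadratic memory, which is fine for a sketch but may need tightening for very long sequences.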

Note: the following are output only if they are non-zero:

--max/min/avg of hairpin lengths

--max/min/avg of internal loop lengths

--max/min/avg of bulge lengths

--max/min/avg of multiloop lengths

--max/min/avg of stem lengths

--max/min/avg of band lengths (where the length of a band is the number of its base pairs)

--total number (#) of bases and base pairs, and percentage (%) of all bases and base pairs in:

   -- all of the stems

   -- all of the stems in bands

--total number (#) of free bases, and percentage (%) of all free bases in:

   -- all of the hairpins

   -- all of the bulges

   -- all of the internal loops

   -- all of the multiloops

   -- the external loop

   -- all of the bands

--total number (#) of closing base pairs, and percentage (%) of all closing base pairs in:

   -- all of the hairpins

   -- all of the bulges

   -- all of the internal loops

   -- all of the multiloops

   -- the external loop

   -- all of the bands

For each structural feature (individually), the following is listed:

 

External Loop Information:

-- name and unique number

--positions of bases in closing base pairs

--lengths (in unpaired bases), of external unpaired regions

--number of domains

--asymmetry measures:

       if there are two unpaired regions, the absolute difference between their lengths and the ratio of their lengths

       if there are more than two unpaired regions, the max/min/avg of the absolute difference between the lengths of any two regions and of the ratio of the lengths of any two regions

-- number of loops connected with this external loop

--total number of free bases and percentages of all free bases found in the external loop

--total number of base pairs and percentages of all closing base pairs found in the external loop
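
The asymmetry measures listed above can be sketched as follows. This is an illustration only: the function name `asymmetry` is hypothetical, and two details are assumptions because the description does not state them, namely that each ratio is taken as the longer region over the shorter one, and that pairs involving a zero-length region are left out of the ratio statistics.

```python
from itertools import combinations


def asymmetry(lengths):
    """Summarise pairwise asymmetry of a loop's unpaired-region lengths.

    Returns None when there are fewer than two regions. With exactly two
    regions, max, min and avg all equal the single difference or ratio.
    """
    if len(lengths) < 2:
        return None
    pairs = list(combinations(lengths, 2))
    diffs = [abs(a - b) for a, b in pairs]
    # Assumption: ratio = longer / shorter, skipping zero-length regions.
    ratios = [max(a, b) / min(a, b) for a, b in pairs if min(a, b) > 0]

    def summary(values):
        if not values:
            return None
        return {"max": max(values), "min": min(values),
                "avg": sum(values) / len(values)}

    return {"difference": summary(diffs), "ratio": summary(ratios)}


result = asymmetry([2, 4, 8])
```

For the three regions of lengths 2, 4 and 8, the pairwise differences are 2, 6 and 4, and the pairwise ratios are 2, 4 and 2.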

 

Hairpin loop, multiloop, internal loop and bulge information:

The summaries are very similar to the summary of the external loop. In the case of internal and bulge loops, one difference is that the number of branches is not specified, since it is always 2. In the case of a hairpin loop, additional items in the summary are:

--the sequence of unpaired bases (from 5' to 3')

--motifs that recognize this hairpin (or a statement that no such motif has been found)

--a statement if this hairpin is a tri- or tetra-hairpin (i.e. its length is 3 or 4).

 

Stem information:

We report information both on stems and on closing base pairs of loops, as follows:

--positions (closing base pairs, always four numbers)

--length (in paired bases)

--connections (to other loops)

--free stem energy

--all the non-canonical (NC) base pairs and their positions (this is not always reported, as sometimes there are no NCs in a stem). We consider the following to be canonical base pairs: AU (or UA), CG (or GC) and UG (or GU). The other seven base pairs are considered to be NCs (i.e. "AA","UU","CC","GG","AC","AG","UC"). Note: base pairs involving “other” bases are classified neither as canonical nor as NCs.

--total #'s & %'s of bases involved (ie those that make up the stem)

--total #'s & %'s of basepairs involved in the stem formation.
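
The canonical / non-canonical rule described in this list can be expressed compactly. The sketch below is an illustration rather than the analyser's code; the function name and the label strings are hypothetical, and treating any symbol outside A, C, G, U as an "other" base is an assumption.

```python
from itertools import combinations_with_replacement

# Unordered canonical pairs: AU/UA, CG/GC and UG/GU.
CANONICAL = {frozenset("AU"), frozenset("CG"), frozenset("GU")}
STANDARD_BASES = set("ACGU")


def classify_pair(base1, base2):
    """Label a base pair 'canonical', 'non-canonical' or 'unclassified'."""
    b1, b2 = base1.upper(), base2.upper()
    # Assumption: anything other than A, C, G, U counts as an "other" base.
    if b1 not in STANDARD_BASES or b2 not in STANDARD_BASES:
        return "unclassified"
    if frozenset((b1, b2)) in CANONICAL:
        return "canonical"
    return "non-canonical"


# Enumerate every unordered pair of standard bases and keep the NCs.
nc_pairs = [a + b for a, b in combinations_with_replacement("ACGU", 2)
            if classify_pair(a, b) == "non-canonical"]
```

Enumerating the ten unordered pairs of standard bases this way leaves exactly seven non-canonical ones, matching the list above.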

 

Pseudoknot information:

--name and unique number

--position of start and end

--total number of children

--number of un-band children

--number of in-band children

--number of unpaired bases (those inside the children are not counted towards this number)

--number of bands that make up this pseudoknot

--number of bands that need to be removed to make this region pseudoknot free

--coordinates of the bands to be removed (i.e. their closing basepairs)

--number of arcs to be removed in this pseudoknot

--list of bands and their statistics. (Each band is separated by “___________” lines from its neighbours). For each band we list its stems, and their statistics. (Each stem is separated by a blank line from its neighbours).

--the type of the pseudoknot

 

Pseudoknots are classified into types as follows:

- If the pseudo region has two bands and its parent is an external loop, then we look at the maximum number of un-band children under an arc. If the maximum is 0, the type is E-H; if it is 1, E-I; if it is 2 or more, E-M. In addition, if both bands have the same length in terms of paired bases, then we also report it as ambiguous.

- If the pseudo region has three bands, where band1 and band3 do not interact (i.e. do not overlap) and band2 connects band1 with band3 (i.e. one of its ends is inside band1 and the other is inside band3), then we look only at the un-band children of band1 and band3. If both have zero children, the type is H-H; if they have 0 and 1, H-I; if 0 and 2 or more, H-M; if 1 and 1, I-I; if 1 and 2 or more, I-M; if both have 2 or more children, M-M. In addition, if band2 has length greater than or equal to the sum of the lengths of band1 and band3, then we also report this pseudo region as special. All other pseudo regions are reported as unclassified.
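
The two classification rules above can be sketched as a small decision function. This is a reading of the rules under stated assumptions, not the analyser's implementation: the function and parameter names are hypothetical, the caller is assumed to have already checked the three-band topology (passed in as `is_chain`), and the child counts of band1 and band3 are treated as unordered, so that 0 and 1 gives H-I in either order.

```python
def loop_letter(children):
    """Map a count of un-band children to H (0), I (1) or M (2 or more)."""
    return "H" if children == 0 else "I" if children == 1 else "M"


def classify_pseudoknot(band_lengths, parent_is_external=False,
                        max_unband_children=0, is_chain=False,
                        outer_children=(0, 0)):
    """Return (type, flags) for a pseudo region.

    band_lengths        lengths in paired bases, ordered band1, band2, ...
    max_unband_children two-band case: maximum un-band children under an arc
    is_chain            three-band case: band1 and band3 do not overlap and
                        band2 connects them (assumed checked by the caller)
    outer_children      three-band case: un-band children of band1 and band3
    """
    flags = []
    if len(band_lengths) == 2 and parent_is_external:
        if band_lengths[0] == band_lengths[1]:
            flags.append("ambiguous")
        return "E-" + loop_letter(max_unband_children), flags
    if len(band_lengths) == 3 and is_chain:
        # Assumption: the pair of counts is unordered, so sort H < I < M.
        letters = sorted((loop_letter(c) for c in outer_children),
                         key="HIM".index)
        if band_lengths[1] >= band_lengths[0] + band_lengths[2]:
            flags.append("special")
        return "-".join(letters), flags
    return "unclassified", flags


two_band = classify_pseudoknot([4, 4], parent_is_external=True,
                               max_unband_children=2)
three_band = classify_pseudoknot([3, 7, 4], is_chain=True,
                                 outer_children=(2, 0))
```

In this usage, `two_band` is reported as E-M and ambiguous, and `three_band` as H-M and special, since 7 is at least 3 + 4.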

 

For each band, the following information is reported:

--name and unique number

--coordinates (in terms of closing basepairs)

--list of stems

--length in paired bases

--number of stems in a band (referred to as band stems from here on)

--number of free bases inside the band

--total #'s & %'s of free bases inside the band (reported only if there is at least one)

Note: all the headers for the bands are in bold, for ease of reading.

 

For each stem, the following information is reported:

--name and unique number

--positions

--length in paired bases

--free stem energy

--NCs and their neighbourhoods (optional, as sometimes there are no NCs)

--total #'s & %'s of bases involved in the band stem

--total #'s & %'s of basepairs involved in this band stem