Free Energy and Kinetic Traps


II: free energy and kinetic barriers

by alan.robot, last updated on 3/14 (fixed dead links with PDFs of results; thanks, Chaendryn!)

In this tutorial, I will show you how to use the barriers and subopt (RNAsubopt) tools that go along with Vienna RNAfold to detect kinetically trapped intermediates. To do that, I need to briefly explain the concepts of equilibrium free energy, sub-optimal folds, and kinetic traps. These are all advanced topics, but by the end you will understand why Christmas-tree designs must fail, even if they could be synthesized properly.

*disclaimer* I’m not affiliated with eteRNA, and although I am a computational biophysicist, I’m not a specialist in RNA bioinformatics, so any inaccuracies are my fault alone and not due to eteRNA or its staff.

First things first: if you have not read part 1, do so now. It contains links to the Vienna RNAfold servers and source code, and I won’t repeat them here.

I strongly recommend downloading the source if you want to play with subopt: each calculation takes ~10 minutes, and the server can only do one computation at a time. For now, you can just look at my precomputed results (until the links die).

Let me motivate the problem a little. Here are the RNAfold predictions for a Christmas-tree entry that failed to synthesize in round 2. If you want to try it yourself (or if my link has expired), here is the sequence:

Sequence: GGAACGGGCCCGGCGGGCCCGGGAAACCGGGCCCGGGCGGCCCGAAAGGGCGGCCCGGGCGGGCGAAAGCCCGCCCGCCGGGCCCAAAGAAACAACAACAACAAC

RNAfold OUTPUT:

Results for minimum free energy prediction

The optimal secondary structure in dot-bracket notation with a minimum free energy of -87.80 kcal/mol is given below.

Results for thermodynamic ensemble prediction

The free energy of the thermodynamic ensemble is -95.82 kcal/mol.

The frequency of the MFE structure in the ensemble is 97.29 %.

The ensemble diversity is 0.57 .

From everything we discussed in tutorial 1, this should be an awesome design, right? Low positional entropy, a high frequency of the MFE structure in the ensemble, and less than 1 bp broken on average. So why isn’t lower free energy better, if that’s what the tutorials reward us for?

First, let’s define free energy.

Free energy is a measure of how likely something is to happen at thermodynamic equilibrium. What’s equilibrium? Equilibrium is when we’ve waited so long that things settle down and stop changing on their own. How long is that? Who knows! It could be a nanosecond. For an RNA (either in a cell or in a test tube) it’s probably a few seconds to minutes, tops. But in principle, we might have to wait until the cows come home to reach equilibrium; if we are unlucky, it could take an infinite amount of time.

So let’s try that again. Free energy tells you (on a logarithmic scale) how likely something is to occur at infinite time. By convention, a process with a negative free energy change will happen “for sure” on its own (it is spontaneous), whereas one with a positive free energy change requires work to be done before it can happen.

There are no little folding-robots in the ETERNA labs that will perform work on the RNAs for you. They are synthesized in a test tube and left to their own devices, so only designs with negative free energy will spontaneously form.

The minimum free energy that ETERNA reports is defined mathematically as -RT times the logarithm of the ratio of folded to unfolded RNAs at equilibrium:

ΔG = -RT ln( [folded] / [unfolded] )

(R is the gas constant, T is the temperature in kelvins, and ln is the natural logarithm. You can ignore R and T for now because they are just constants you can look up; note that 37 °C (310.15 K) is used for RNA computations, which makes RT ≈ 0.62 kcal/mol.)

So if 50% of the ensemble folds and 50% does not, the free energy is -RT ln(0.5/0.5) = -RT ln(1) = 0
If 90% folds and 10% does not, the free energy is -RT ln(0.9/0.1) ≈ -1.36 kcal/mol
If 99.9% folds and 0.1% does not, the free energy is -RT ln(0.999/0.001) ≈ -4.28 kcal/mol
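These three examples can be checked with a few lines of Python. This is a minimal sketch of the two-state formula, assuming R = 1.987 × 10^-3 kcal/(mol·K) and T = 310.15 K (37 °C); free_energy is just a name I made up. With RT ≈ 0.616 kcal/mol it gives about -1.35 and -4.26 kcal/mol; the slightly larger values above come from rounding RT up to 0.62 kcal/mol.

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 310.15     # 37 degrees Celsius in kelvin
RT = R * T     # about 0.616 kcal/mol

def free_energy(fraction_folded):
    """Two-state folding free energy (kcal/mol) from the equilibrium fraction folded."""
    folded = fraction_folded
    unfolded = 1.0 - fraction_folded
    # -RT ln(folded/unfolded), written as RT ln(unfolded/folded) so 50/50 prints as +0.00, not -0.00
    return RT * math.log(unfolded / folded)

for f in (0.5, 0.9, 0.999):
    print(f"{f:.1%} folded -> {free_energy(f):+.2f} kcal/mol")
```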

So -87 kcal/mol means that we expect the ratio of folded to unfolded RNAs to be about 10^61; yes, that’s a 1 followed by 61 zeroes. Talk about overkill. You don’t need 99.999(and dozens more nines)% of the molecules in the test tube properly folded to win a lab design. You DO, however, need more properly folded than improperly folded molecules.
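Going the other way, the folded:unfolded ratio is exp(-ΔG/RT). That number is too big to write out comfortably, so this sketch reports its base-10 exponent instead. It assumes R = 1.987 × 10^-3 kcal/(mol·K) and T = 310.15 K; log10_folded_ratio is a hypothetical helper name.

```python
import math

RT = 1.987e-3 * 310.15  # kcal/mol at 37 degrees Celsius

def log10_folded_ratio(dG):
    """log10 of the equilibrium folded:unfolded ratio for a folding free energy dG (kcal/mol)."""
    # ratio = exp(-dG / RT), so log10(ratio) = -dG / (RT * ln 10)
    return -dG / (RT * math.log(10))

for dG in (-87.0, -87.80):
    print(f"dG = {dG:.2f} kcal/mol -> ratio ~ 10^{log10_folded_ratio(dG):.1f}")
```

For -87.0 kcal/mol the exponent comes out around 61.3, and for the reported -87.80 around 61.9, so the ratio is on the order of 10^61 to 10^62.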

So what gives? Well, remember that the ratio is folded to unfolded, and ETERNA defines UNFOLDED to mean NO BASE PAIRS AT ALL, i.e., the straight line you get in natural mode before you plunk any nucleotides down. In the actual test tube, the RNA forms one base pair at a time, and many sub-optimal folds with favorable free energies will form along the way. THE MFE DOES NOT TAKE SUB-OPTIMAL FOLDS INTO ACCOUNT.

You should imagine a pyramid of possible structures, with one lone MFE at the top, a few next-best folds beneath it, and more and more sub-optimal folds (kabillions of them) the farther down you go, i.e. toward higher free energy.
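At equilibrium, each fold is weighted by its Boltzmann factor exp(-G/RT), so a crowd of sub-optimal folds can add up even when each one is individually worse than the MFE. Here is a toy pyramid with made-up energies (one MFE at -10, three next-best folds at -8, and a growing number of folds at -5 kcal/mol); mfe_probability is a hypothetical helper. This is the equilibrium bookkeeping behind RNAfold’s “frequency of the MFE structure in the ensemble”. It says nothing about how long it takes to get there, which is the kinetic question the rest of this tutorial is about.

```python
import math

RT = 1.987e-3 * 310.15  # kcal/mol at 37 degrees Celsius

def mfe_probability(energies):
    """Equilibrium probability of the lowest-energy structure, given every structure's free energy (kcal/mol)."""
    g_min = min(energies)
    # Boltzmann weights relative to the MFE (the MFE itself gets weight 1; avoids overflow)
    weights = [math.exp(-(g - g_min) / RT) for g in energies]
    return 1.0 / sum(weights)

for n_bottom in (10, 1_000, 100_000):
    pyramid = [-10.0] + [-8.0] * 3 + [-5.0] * n_bottom  # hypothetical fold energies
    print(f"{n_bottom:>7} folds at the bottom -> MFE share {mfe_probability(pyramid):.3f}")
```

With 10, 1,000, and 100,000 folds at the bottom of this made-up pyramid, the MFE’s share drops from roughly 89% to 71% to 3%.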

AND IT TURNS OUT THERE’S AN APP FOR THAT: http://rna.tbi.univie.ac.at/cgi-bin/barriers.cgi

Under advanced options, select:

unpaired bases can participate in at most one dangling end (MFE folding only)
RNA parameters (Turner model)

You’ll have to trim 6 or so nucleotides from the unstructured 3′ end of the sequence to fit it within the 100-nucleotide limit. For longer sequences, you’ll have to download the source and run the program yourself.
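If you do download the source, a rough Python sketch of a local run might look like this. It assumes ViennaRNA’s RNAsubopt and the standalone barriers program are installed and on your PATH; landscape_commands and run_landscape are my own hypothetical helper names, so check the flags against the man pages of your installed versions. It pipes suboptimal structures (RNAsubopt -e sets the energy range in kcal/mol above the MFE, -s sorts them by energy) into barriers on standard input.

```python
import shutil
import subprocess

def landscape_commands(energy_range_kcal=5.0):
    """Hypothetical helper: argv lists for an RNAsubopt -> barriers pipeline."""
    # -e: keep structures within this many kcal/mol of the MFE; -s: sort by energy,
    # since barriers expects an energy-sorted list of structures.
    subopt = ["RNAsubopt", "-e", str(energy_range_kcal), "-s"]
    barriers = ["barriers"]  # reads the sorted RNAsubopt output from standard input
    return subopt, barriers

def run_landscape(sequence, energy_range_kcal=5.0):
    """Hypothetical helper: run the pipeline on one sequence and return barriers' text output."""
    subopt_cmd, barriers_cmd = landscape_commands(energy_range_kcal)
    subopt = subprocess.run(subopt_cmd, input=sequence + "\n",
                            capture_output=True, text=True, check=True)
    tree = subprocess.run(barriers_cmd, input=subopt.stdout,
                          capture_output=True, text=True, check=True)
    return tree.stdout

if __name__ == "__main__":
    if shutil.which("RNAsubopt") and shutil.which("barriers"):
        # Small made-up test sequence and a small energy range to keep the run short.
        print(run_landscape("GGGGAAACCCCAUCGGGGAAACCCC", energy_range_kcal=3.0))
    else:
        print("RNAsubopt and/or barriers not found on PATH")
```

Start with a small -e value: the number of sub-optimal structures, and with it the run time, grows very quickly as the energy range widens.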

Here are the results for the winning entry (left) and a failed Christmas tree (right):

The vertical axis is free energy, and the number at the bottom is the rank of each fold “cluster” (local minimum): the MFE is 1, the next best is 2, and so on. Note that the server will only calculate 10,000,000 sub-optimal folds, so the graph doesn’t extend all the way up to 0 kcal/mol, but you get the idea.

So how feasible is it for a misfold to reach the MFE fold? In the design at left, an RNA that lands in the next-best fold (#2) has a mostly downhill hop to fold #1. At right, fold #2 has to backtrack uphill before it can get on the right track toward fold #1. Uphill means a positive free energy change, which makes the hop unlikely: the rate drops off exponentially with the height of the climb.
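To put a number on “unlikely”: in a simple Arrhenius-style picture (my assumption here, with the same attempt frequency for every hop), each kcal/mol of uphill climb slows a hop by a factor of exp(1/RT), roughly 5x at 37 °C. relative_hop_rate is a hypothetical helper name.

```python
import math

RT = 1.987e-3 * 310.15  # kcal/mol at 37 degrees Celsius (about 0.616)

def relative_hop_rate(barrier_kcal):
    """Arrhenius-style slowdown factor exp(-barrier/RT) for a hop that climbs barrier_kcal uphill.

    Assumes every hop has the same attempt frequency, so only the barrier height matters.
    """
    return math.exp(-barrier_kcal / RT)

for barrier in (0.0, 1.0, 5.0, 10.0, 20.0):
    print(f"{barrier:4.1f} kcal/mol uphill -> rate x {relative_hop_rate(barrier):.1e}")
```

Under these assumptions, a 10 kcal/mol climb makes the hop roughly ten million times slower than a barrier-free one.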

If you scroll down to folding kinetics, set the initial population to 100% in structure 2, and hit OK, you get a plot of population versus time for the conversion of structure 2 into structure 1.

At left, it takes only about 1/10th of a second for misfold #2 to convert completely into the MFE fold of the winning design. At right, after 1000 seconds fold #2 has still not completely converted to the Christmas-tree MFE: some of it is still stuck in fold #2, and some has gotten trapped in fold #3 instead. And this was a misfold that was already 99% of the way there; imagine how much slower it would be starting from the completely unfolded state (0 kcal/mol). It could take days or weeks. Therefore, MOST OF THE MOLECULES WILL BE MISFOLDED when the structure tests are done by the eterna staff. They might get there eventually, but only if you wait an impractically long time. This is called a rugged free energy landscape.
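You can mimic the shape of these population-versus-time curves with a deliberately simplified two-state model: one misfold and the MFE, connected by a single saddle, with Arrhenius rates. This is not what the barriers server computes (it handles many minima at once); the energies, the attempt rate k0, and the time unit below are all made-up illustrative values, and fraction_still_misfolded is a hypothetical helper. The only difference between the “funnel” and “rugged” cases is how far uphill the saddle sits above the misfold.

```python
import math

RT = 1.987e-3 * 310.15  # kcal/mol at 37 degrees Celsius

def fraction_still_misfolded(t, g_mis, g_mfe, g_saddle, k0=1.0):
    """Two-state sketch: misfold population at time t, starting 100% in the misfold.

    Rates use an Arrhenius form, k = k0 * exp(-(G_saddle - G_start) / RT).
    k0 (attempt rate) and t share the same arbitrary time unit.
    """
    k_fwd = k0 * math.exp(-(g_saddle - g_mis) / RT)   # misfold -> MFE
    k_back = k0 * math.exp(-(g_saddle - g_mfe) / RT)  # MFE -> misfold
    p_eq = k_back / (k_fwd + k_back)                  # misfold share at equilibrium
    # Solution of dP/dt = -k_fwd * P + k_back * (1 - P) with P(0) = 1
    return p_eq + (1.0 - p_eq) * math.exp(-(k_fwd + k_back) * t)

# Made-up energies (kcal/mol): misfold at -85.0, MFE at -87.8.
for label, saddle in (("funnel", -84.5), ("rugged", -75.0)):
    left = fraction_still_misfolded(1000.0, g_mis=-85.0, g_mfe=-87.8, g_saddle=saddle)
    print(f"{label}: {left:.1%} still misfolded at t = 1000")
```

Both cases share the same equilibrium (about 1% misfolded, set only by the 2.8 kcal/mol gap between the two folds); only the timescale differs, and in the rugged case nearly everything is still misfolded at t = 1000.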

AND THAT IS WHY YOU DON’T MAKE CHRISTMAS TREES.

Note that this is just a more technical version of ccccc’s paper-and-glue explanation of why we wouldn’t want all GC pairs: “getting stuck” is just another way of saying “kinetic trap”.

The idea of an “energy landscape” is that the RNA starts from a completely unfolded structure and “rolls downhill” through progressively more favorable sub-optimal folds to reach its final fold. On the plots above, such a path is equivalent to always taking the rightmost branch at every fork of the tree; if you take a left branch, you get stuck for a while and have to wait for a rare chance to get back on the big downhill slide. If you get stuck in a “local minimum” that is too deep, however, you may have to wait an unreasonable amount of time to get unstuck, and a well-designed structure avoids that.
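Here’s a cartoon of “rolling downhill” on a one-dimensional landscape, where each list entry is the free energy of a hypothetical fold and a molecule can only step to a neighboring entry. Greedy descent stops in the first local minimum it meets, and the escape barrier is the highest point between that minimum and the global one. Real RNA landscapes have many possible paths, and barriers looks for the lowest saddle among them, so treat this strictly as a toy; roll_downhill, escape_barrier, and the energies are all made up for illustration.

```python
def roll_downhill(energies, start):
    """Greedy descent on a 1-D toy landscape: keep stepping to the lower neighbor."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(energies)]
        if not neighbors:
            return i
        best = min(neighbors, key=lambda j: energies[j])
        if energies[best] >= energies[i]:
            return i  # local minimum: every neighbor is uphill (or flat)
        i = best

def escape_barrier(energies, i, target):
    """Climb needed to get from position i to target: highest point between them minus energies[i]."""
    lo, hi = sorted((i, target))
    return max(energies[lo:hi + 1]) - energies[i]

funnel = [0.0, -3.0, -6.0, -9.0, -12.0]        # hypothetical fold energies, unfolded at index 0
rugged = [0.0, -3.0, -6.0, -2.0, -9.0, -12.0]

print(roll_downhill(funnel, 0))                # index where descent stops
stuck = roll_downhill(rugged, 0)
print(stuck, escape_barrier(rugged, stuck, len(rugged) - 1))
```

On the funnel, greedy descent reaches the global minimum at -12.0; on the rugged list, it stops at -6.0 and would have to climb 4 kcal/mol before it could reach -12.0.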

If you’ve read this far, you get a cookie: view a predicted folding path animation from barriers, which is really, really cool.