RNA 3D Motif Atlas release notes
Note: Representative sets are updated every week, and the Motif Atlas is updated every four weeks. Accordingly, the representative set version increments by 4 with each Motif Atlas release.
Note: Annotations with common names have been added for some individual loops. Often, some loops in a motif group have a different conformation and so are not annotated. Also, as new releases occur, additional loops enter motif groups, so some loops that could be annotated have not yet been annotated. For each motif group, the number of times each annotation appears is tallied and shown, which gives a good idea of what the group consists of. Some groups are entirely homogeneous, while others have some instances that differ from the most common geometry yet are geometrically similar enough to be included in the group by the automated clustering procedure.
4.0
- Release 4.0 is the first release to have loops from RNA 3D structures solved by electron microscopy. Requirements for EM structures are stringent: resolution 2.0A or better, composite quality score (CQS2) value 9 or better. The main EM structures that meet those requirements are 8B0X (E. coli ribosome), 8GLP (human ribosome), 9AXU (Schizosaccharomyces pombe ribosome), 9E6Q (Pyrobaculum calidifontis ribosome). In addition, EM loops are excluded if all nucleotides in the loop have a Q-score under 0.4; see the individual structure pages under "problematic loops".
- For structures solved by x-ray diffraction, the resolution cutoff is still 3.5A, but we now require that the CQS2 score be 15 or lower. That removes some structures with questionable quality indicators.
- Release 4.0 also benefits from having modified nucleotides included in loops systematically, 3-way and higher junctions being extracted, loops with embedded isolated Watson-Crick basepairs being extracted, and the most recent basepair annotation cutoffs.
- Motif group identifiers have changed in many cases; in particular, many of the five-digit handles are new.
- Loop-level manual annotations will be added over the next few weeks. Most motif groups still have enough instances annotated that it is clear what they are. Note that annotations are attached to specific loop instances, not motif groups. Loop annotations are often mapped to newer structures by homology, indicated by (H) after the annotation, for RNA chains that map to an Rfam family.
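The structure and loop filters described for release 4.0 can be summarized in a minimal Python sketch. The class, field names, method labels, and function names are hypothetical and are not taken from the Motif Atlas code. The sketch also assumes that a lower CQS2 value indicates better quality, which is consistent with the x-ray requirement of 15 or lower.

```python
from dataclasses import dataclass

# Cutoffs quoted in the release 4.0 notes; everything else is illustrative.
EM_MAX_RESOLUTION = 2.0     # Angstroms
EM_MAX_CQS2 = 9
XRAY_MAX_RESOLUTION = 3.5   # Angstroms
XRAY_MAX_CQS2 = 15
EM_MIN_Q_SCORE = 0.4


@dataclass
class Structure:
    pdb_id: str
    method: str        # hypothetical labels: "EM" or "XRAY"
    resolution: float  # in Angstroms
    cqs2: float        # composite quality score; assumed lower is better


def structure_passes(s):
    """Return True when a structure meets the cutoffs for its method."""
    if s.method == "EM":
        return s.resolution <= EM_MAX_RESOLUTION and s.cqs2 <= EM_MAX_CQS2
    if s.method == "XRAY":
        return s.resolution <= XRAY_MAX_RESOLUTION and s.cqs2 <= XRAY_MAX_CQS2
    return False  # other experimental methods are not covered by this sketch


def em_loop_excluded(q_scores):
    """An EM loop is excluded when every nucleotide has a Q-score under 0.4."""
    return len(q_scores) > 0 and all(q < EM_MIN_Q_SCORE for q in q_scores)
```

A loop with at least one nucleotide at or above the Q-score cutoff is kept under this reading, since the note excludes a loop only when all of its nucleotides fall below 0.4.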
3.94
- We filled in Motif Atlas releases for four-way and higher junctions, which are designated J4, J5, ..., J9.
- These junctions are covered in releases 3.2 onward. Not enough data was available to re-create the releases before that.
- Clustering follows the same protocol as for HL, IL, J3, except that a mutual discrepancy of up to 3.0 is allowed for two instances to be in the same group. Discrepancies that large have not yet been observed, since most groups are made of homologous instances that are very similar.
- With J6 and higher, multiple different queries are attempted to try to find a good match in a short time; the 3.0 discrepancy is only used for instances that have the six strands presented in the same order when they are extracted from the 3D structure files.
3.93
- A small change in the hierarchical clustering methodology avoids joining two groups when the minimum discrepancy between them is greater than 0.6, when the maximum discrepancy within the two groups is 0.3, and when the two groups have at least 3 members. That avoids some groups that are made of two very distinct subgroups.
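One possible reading of this rule, written as a minimal Python sketch. The function and parameter names are hypothetical. The sketch assumes that "the maximum discrepancy within the two groups is 0.3" means each group has a maximum internal discrepancy of at most 0.3, and that each of the two groups must have at least 3 members; the actual clustering code may apply the conditions differently.

```python
def should_block_merge(min_between, max_within_a, max_within_b, size_a, size_b):
    """Return True when two candidate groups should NOT be joined.

    min_between   : smallest discrepancy between an instance of group A
                    and an instance of group B
    max_within_*  : largest discrepancy between two instances of one group
    size_*        : number of instances in each group
    """
    return (
        min_between > 0.6          # the groups are clearly separated
        and max_within_a <= 0.3    # assumption: "is 0.3" read as "at most 0.3"
        and max_within_b <= 0.3
        and size_a >= 3            # assumption: applies to each group
        and size_b >= 3
    )
```

Under this reading, two tight, well-populated groups that are far apart stay separate, while small or loose groups can still be joined by the usual hierarchical procedure.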
3.90
- Releases from this point forward seem to avoid the problems from releases 3.77-3.84, 3.88, 3.89.
3.88
- Note: releases 3.77-3.84, 3.88, 3.89 appear to be missing some loops
- This release was due to come out on September 11, 2024.
- It was finally released on December 20, 2024.
- New Python code was written to extract HL, IL, and J3.
- Some J3 had been missed before, so this is a more complete collection than in any previous release.
- Watson-Crick basepairs made by non-standard nucleotides are now annotated, so loops can be flanked by basepairs like that. Previous versions of the Motif Atlas did not allow for modified nucleotides.
- Some loops with embedded cWW basepairs used to be split into two smaller loops. Now the loops with embedded cWW basepairs are extracted and included in the Motif Atlas, in addition to the smaller loops.
- This release will be examined, and the methodology may be refined before release 3.89 is produced.
3.81
- Note: releases 3.77-3.84, 3.88, 3.89 appear to be missing some loops
- In April 2024, we released J3 motif groups, filling in all the way back to 2018-02-09.
- Unfortunately, approximately half of the J3 were missed by the old Matlab code that extracted them.
- See release 3.90 and later for all J3.
3.80
- Note: releases 3.77-3.84, 3.88, 3.89 appear to be missing some loops
- Some consensus basepairs are not being shown; they should be added in release 3.81.
- Note that the basepair diagrams now have the position number indicated.
- The "Detailed Motif Atlas Release History" page has been updated and now shows more of the recent data, but it could still use some improvement.
3.79
- Note: releases 3.77-3.84, 3.88, 3.89 appear to be missing some loops
- This release duplicates 3.78. That has the advantage of making the motif group release number equal to 1/4 of the representative set release number. It also helped us fix the .png files in releases 3.77 and 3.78.
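The 1/4 relationship between the two release numbers can be illustrated with a small Python sketch. The function name is hypothetical, and the sketch assumes that both version strings share the same major number and that the relationship holds for the releases in question, as it does for the "Based on representative set release" entries further down in these notes.

```python
def motif_atlas_release(rep_set_release):
    """Map a representative set release such as "3.112" to the Motif Atlas
    release built from it, or return None when the minor number is not a
    multiple of 4 (no Motif Atlas release is expected for that week)."""
    major, minor = (int(part) for part in rep_set_release.split("."))
    if minor % 4 != 0:
        return None
    return f"{major}.{minor // 4}"


print(motif_atlas_release("3.112"))  # 3.28
print(motif_atlas_release("3.48"))   # 3.12
```

The version parts are handled as integers rather than floats, because "3.12" and "3.120" would otherwise be indistinguishable.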
3.77
- Note: releases 3.77-3.84, 3.88, 3.89 appear to be missing some loops
- We implemented a new clustering algorithm and all calculations are carried out with Python code, removing the Matlab code that had been used.
- Motif groups should be more coherent now, with smaller values for the maximum discrepancy between instances.
- But we have not systematically checked everything, so some changes to the methodology may occur over the next several releases.
- When calculating the discrepancy between instances, we do not penalize when one base is flipped 180 degrees around the glycosidic bond. Many motif groups have at least one instance that has one base that is flipped relative to the consensus. That usually results in the base being in syn, whereas the consensus is anti. Otherwise, these instances have the same geometry as the others, so they are best located in the motif group with the other instances. To prevent individual instances with flipped bases being relegated to their own singleton motif group, we do not penalize for these flips. But this has the downside that the heat map does not allow one to spot the instances with a flipped base, and it is possible that the centroid instance will be an instance with a base flipped relative to the consensus.
3.76
- This is the last release generated by old Matlab code.
3.62
- Loops whose units are all obtained from the same non-trivial symmetry operation will be excluded, as they simply duplicate the loop with the default 1_555 symmetry operation. Examples of loops that will be excluded are IL_1KOG_009 and IL_5G4T_001.
- 61 IL are excluded on this basis
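The exclusion test can be sketched in Python as follows. This is an illustration only: it assumes unit identifiers are "|"-separated strings whose ninth field holds the symmetry operation and that an omitted field means the default 1_555. The function names and the example identifiers are hypothetical.

```python
DEFAULT_SYMMETRY = "1_555"


def symmetry_operation(unit_id):
    """Return the symmetry operation of a unit id (assumed to be field 9)."""
    fields = unit_id.split("|")
    if len(fields) >= 9 and fields[8]:
        return fields[8]
    return DEFAULT_SYMMETRY  # assumption: omitted field means 1_555


def is_symmetry_duplicate(unit_ids):
    """True when all units come from one and the same non-default operation."""
    operations = {symmetry_operation(u) for u in unit_ids}
    return len(operations) == 1 and DEFAULT_SYMMETRY not in operations


# Hypothetical loop whose units all carry the operation 6_555.
loop = ["XXXX|1|A|G|10||||6_555", "XXXX|1|A|C|11||||6_555"]
print(is_symmetry_duplicate(loop))  # True
```

A loop that mixes units from the default operation and a non-trivial one is kept under this test, since it is not a plain copy of a loop in the default asymmetric unit.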
3.28
- Based on representative set release 3.112
- Internal loops with single base bulges between flanking cWW basepairs are now separated from other small motifs and are grouped according to the bulged base. Thus, there is a motif group whose instances are all 5-nucleotide internal loop motifs with one bulged A, another with one bulged C, one for G, and one for U. This way, when a new RNA has a single bulge in the middle of a helix, you can see the types of geometries that base makes. Note that there are also 5-nucleotide internal loops where the non-flanking base stacks on one of the flanking bases or makes a base triple with a flanking base; these are also possible geometries for a single bulge in the secondary structure of a helix.
3.12
- Based on representative set release 3.48
- Computed on July 31, 2021
- We lowered the resolution cutoff from 4.0 to 3.5 Angstroms for this and later releases. That leaves out:
  - 4V5O|1|BA, Tetrahymena thermophila SSU from Equivalence Class 80911
  - 4R0D|1|A, Pylaiella littoralis Group IIB intron from EC 18903
  - 3Q1Q|1|B, RNA subunit of RNase P from EC 39625
  - 1Y0Q|1|A, Staphylococcus phage Twort Group I ribozyme from EC 88088
  - 3DHS|1|A, RNase P from EC 82016
  - 4GMA|1|Z, Adenosylcobalamin riboswitch from EC 26802
  - 4P8Z|1|A, Didymium iridis partial IGS, 18S rRNA from EC 86266
  - 3P49|1|A, Fusobacterium nucleatum, Glycine riboswitch from EC 36867
  - 2XXA|1|F, Deinococcus radiodurans 4.5S RNA from EC 57933
  - 1E8S|1|C, Homo sapiens 7SL RNA, 88-MER from EC 19932
  - and other structures smaller than 86 nucleotides from http://rna.bgsu.edu/rna3dhub/nrlist/release/3.48/4.0A at 4.0 Angstroms with filter “x-ray”
- Reasoning: Better to use higher resolution to build a database of known 3D structures
- Note that as of Representative Set Release 3.185, none of these molecules has a new structure at resolution 3.5 Angstroms or better, so they have not re-entered the set used to construct the RNA 3D Motif Atlas.
- The number of internal loops dropped from 1984 in release 3.11 to 1708 in release 3.12.
- Loops from 4V5O accounted for 6 singleton motif groups:
  - IL_09193.1
  - IL_10140.1
  - IL_12874.1
  - IL_64775.1
  - IL_75001.1
  - IL_88135.1, C-loop
3.3
- Based on representative set release 3.12
- Computed in July 2021
- Tries to split apart 5-nucleotide internal loops having different bulged bases, with limited success
- Within groups, similarity order is computed using the Matlab function optimalleaforder
3.2
- Based on representative set release 3.8
- Computed on September 26, 2018
- Includes x-ray structures up to 4.0 Angstrom resolution
- Excludes loops from x-ray structures where all nucleotides have RSRZ above 1, or where one basepair has both nucleotides with RSRZ above 1.
- Excludes cryo-EM structures
- IL have some annotations with common names; more will be added in later releases
- Note that annotations are attached to loop instances, so new annotations may appear on old Motif Atlas releases
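The RSRZ exclusion rule listed above can be written as a short Python sketch. The function name and the input layout (a dictionary of per-nucleotide RSRZ values plus a list of basepairs) are hypothetical, and the sketch assumes an RSRZ value is available for every nucleotide in the loop.

```python
def loop_excluded_by_rsrz(rsrz, basepairs, cutoff=1.0):
    """Apply the two exclusion conditions described for release 3.2.

    rsrz      : dict mapping a nucleotide identifier to its RSRZ value
    basepairs : list of (nucleotide, nucleotide) pairs within the loop
    """
    # Condition 1: every nucleotide in the loop has RSRZ above the cutoff.
    if rsrz and all(value > cutoff for value in rsrz.values()):
        return True
    # Condition 2: some basepair has both of its nucleotides above the cutoff.
    return any(rsrz[a] > cutoff and rsrz[b] > cutoff for a, b in basepairs)


# Hypothetical four-nucleotide loop with one poorly fit basepair.
values = {"n1": 0.2, "n2": 1.5, "n3": 1.8, "n4": 0.4}
print(loop_excluded_by_rsrz(values, [("n1", "n4"), ("n2", "n3")]))  # True
```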
3.0 and 3.1
- Not all data was written to the database after the clustering step, so the web pages for these releases are incomplete. Use a later release.