Improving Large Molecular Language Models via Relation-aware Multimodal Collaboration
Jinyoung Park, Minseong Bae, Jeehye Na, Hyunwoo J. Kim
Korea Advanced Institute of Science and Technology
MLV Lab
Introduction
> Large Language Models
Molecule description generation
Q: Could you give me a brief overview of this molecule?
A: The molecule is the hydrogenmaleate salt of O-(cyclohexanecarbonyl)lysergol…

IUPAC name prediction
Q: What is the IUPAC name of the molecule?
A: The molecule's IUPAC name is 2-amino-1-phenylethanol.

Property prediction
Q: Please provide the energy separation between the highest occupied and lowest unoccupied molecular orbitals (HOMO-LUMO gap) of this molecule.
A: 0.1913

Chemical reaction prediction
Q: Please suggest a potential product based on the given reactants and reagents.
A: O=[N+1]([O-1])C1=CC(CO)=C(F)C=C1F
LMLM (Large Molecular Language Model)
> Assistant for molecular reasoning
Prior models typically process molecular data (1D strings, 2D graphs, 3D conformations) in isolation or fuse them shallowly, which prevents them from leveraging the complementary properties of different modalities.
Existing evaluation frameworks for molecule-language models generally rely on generic text metrics such as BLEU and ROUGE, which are “molecule-agnostic” and therefore fail to assess chemical correctness.
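The mismatch can be seen with a toy overlap metric (the function and the SMILES pair below are illustrative, not the paper's evaluation): two SMILES strings can denote the same molecule yet share almost no surface n-grams, so a string-overlap score penalizes a chemically correct answer.

```python
from collections import Counter

def bigram_precision(candidate: str, reference: str) -> float:
    """Clipped character-bigram precision, a crude stand-in for BLEU-2."""
    grams = lambda s: Counter(s[i:i + 2] for i in range(len(s) - 1))
    cand, ref = grams(candidate), grams(reference)
    matched = sum(min(n, ref[g]) for g, n in cand.items())
    return matched / max(sum(cand.values()), 1)

# "CCO" and "C(C)O" are both valid SMILES for ethanol, yet the
# surface-level metric scores the chemically correct answer zero.
print(bigram_precision("CCO", "CCO"))    # identical strings: 1.0
print(bigram_precision("C(C)O", "CCO"))  # same molecule, different string: 0.0
```

A molecule-aware metric would instead canonicalize or match structures before scoring, which is the gap the paper's evaluation targets.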
Method
CoLLaMo integrates 1D sequences, 2D graphs, and 3D conformations into a shared token space using relation-aware attention to capture structural and spatial details of molecules.
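The slide does not spell out the projector's internals; one common way to realize relation-aware attention (as in Graphormer-style models) is to add a learned scalar bias, indexed by the pairwise relation between tokens (bond type, shortest-path distance, or a bucketed 3D distance), to the attention logits. A minimal single-head sketch under that assumption (all names illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def relation_aware_attention(q, k, v, rel, rel_bias):
    """Single-head scaled dot-product attention where each token pair
    (i, j) receives an extra scalar bias looked up from its relation id
    (e.g. bond type or a bucketed 3D distance).
    q, k, v: lists of d-dimensional vectors; rel[i][j]: relation id;
    rel_bias: mapping from relation id to a (learned) scalar."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        logits = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + rel_bias[rel[i][j]]
            for j, kj in enumerate(k)
        ]
        w = softmax(logits)
        out.append([sum(wj * vj[t] for wj, vj in zip(w, v)) for t in range(len(v[0]))])
    return out

# A strong bias on relation 1 makes each token attend almost entirely
# to the token it is related to, overriding the content similarity.
q = k = v = [[1.0, 0.0], [0.0, 1.0]]
rel = [[0, 1], [1, 0]]
out = relation_aware_attention(q, k, v, rel, {0: 0.0, 1: 8.0})
```

In a full model the bias table would be learned per head and the relations derived from the 2D graph and 3D geometry, so structurally close atoms can attend to each other regardless of sequence distance.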
CHARM/RCHARM quantifies the proportion of mentioned molecular entities that are factually incorrect or not grounded in the input molecule.
GPT-based evaluation verifies the generated output based on two criteria: factual informativeness and alignment with the ground truth.
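The slide gives only the definition of the CHARM-style score; as a hedged sketch (the real metric's entity extraction and matching are more involved, and the names below are illustrative), the proportion reduces to:

```python
def hallucination_rate(mentioned, grounded):
    """Proportion of molecular entities mentioned in a generated
    description that are not grounded in the input molecule, per the
    CHARM-style definition on the slide (sketch only)."""
    mentioned, grounded = set(mentioned), set(grounded)
    if not mentioned:
        return 0.0  # nothing mentioned, so nothing can be hallucinated
    return len(mentioned - grounded) / len(mentioned)

# Illustrative entities, not from the paper: one of three mentioned
# functional groups ("nitro") is absent from the input molecule.
rate = hallucination_rate({"hydroxyl", "phenyl", "nitro"}, {"hydroxyl", "phenyl"})
```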
Experiment
Quantitative Results
CoLLaMo outperforms all baselines, including GPT-based models such as GPT-4, GPT-4o, and o1-mini.
Furthermore, CoLLaMo is effective on multiple property-QA benchmark datasets, thanks to its ability to capture the 1D, 2D, and 3D molecular modalities.
CoLLaMo achieves the highest score under both automatic and LLM-based evaluations, demonstrating superior molecule understanding and generation quality.
The table shows that integrating the 1D, 2D, and 3D modalities through the modality-collaborative projector consistently improves performance across all tasks except motif counting.
The ablation study shows that every component contributes to CoLLaMo's performance gains.
Even when only a single modality is available at inference, CoLLaMo maintains strong performance, demonstrating its robustness, whereas the variant without the Co-Projector degrades significantly.
Qualitative Results
CoLLaMo accurately identifies the molecule as 3-hydroxy fatty acyl-CoA(4−) by integrating complementary cues from 1D, 2D, and 3D modalities, whereas single-modality models produce incorrect descriptions.
Summary