Leveraging NLP and LLMs for AI-Based Phytosanitary Early Warning Systems
AKIE 2025, China
November 2025
Claire Nédellec – MaIAGE, INRAE
Conference on Agricultural Knowledge Engineering
Huazhong Agricultural University, Wuhan
Icons from https://www.flaticon.com/fr/
p. 1
AI for Plant Health AKIE2025
Claire Nédellec
A world leader in agriculture, food and the environment
13 000 members
With a dozen International Associated Laboratories launched since 2013 in partnership with major research institutes,
China is now INRAE’s largest partner in Asia
8 October 2025: opening of an INRAE and CIRAD office at the French Embassy in Beijing.
This year, an INRAE delegation signed seven scientific agreements with China’s leading scientific institutions on carbon neutrality, agroecology, biodiversity, plant genetics, nutrition and sustainable livestock farming
International cooperation infrastructures
Gene
Cell
Individual
Ecosystem, Population
Network, Environment
Continuum of disciplines ranging from experimental data processing to predictive modeling.
Interdisciplinary research across multiple scales
4 research groups – 1 bioinformatics platform
Genomics, microbiology (food, synthetic biology, pathogens)
Physiology, holobionts
(animal, plant, nutrition)
Epidemiology, Agroecology
(animal, plant)
statistics
bioinformatics
statistical modeling
automatic control, systems biology
natural language processing
STATINFOMICS
DYNENVIE
BIOSYS
BIBLIOME
MIGALE
Laboratory
Jouy-en-Josas
Paris area
Bibliome research group at MaIAGE, INRAE
Language models for Information extraction.
Zero, few-shot learning. Knowledge injection.
Autoencoder & autoregressive models.
Formalization of the extracted information and cross-source combination
Normalization and entity-linking. Knowledge model alignment
Evaluation. Metrics for measuring the quality and consistency of complex information. Datasets and challenges (BioNLP-ST, CLEF).
Focused and high quality knowledge. Complex structure representation for fine-grained information that fits application requirements
Natural Language Processing (NLP) for domain-specific goals
Building epidemiological surveillance and prophylaxis with observations near and distant
The BEYOND Project
https://beyond.paca.hub.inrae.fr/
National project led by INRAE
Healthy plants for safe and sustainable food – BEYOND
Risk assessment for plant pests in the BEYOND project
Living with pests while using very limited curative treatments
requires enhanced epidemiological surveillance to adapt prophylaxis and control methods and to anticipate risks
Better anticipation for transition to agroecological systems
There is a range of approaches to be combined: mathematical, biological, economic, and organizational
Focus on emerging and regulated pests
Extension to distant observations in time and space
08/07/2025
Levers for pesticide-free agriculture
BEYOND Project - Building epidemiological surveillance and prophylaxis with observations near and distant
Today
Tomorrow
Growing
Protecting Differently
Develop new surveillance indicators that provide sufficient advance warning in time and space for early prophylaxis
Pest bio-sensor data
Abiotic conditions
Adjacent wild and cultivated plants
Plant Health Bulletins
Interconnectedness of regions via water, wind, transportation
Data from abiotic sensor networks
Regional land use & landscape data
Remote-sensing data
Historical meteorological trends and future predictions
Information extracted by text mining
The international health monitoring system of the French plant health epidemiological surveillance platform (ESV)
Keeping up to date with the current situation and knowledge, and centralising them
Horizon scanning for plant health through monitoring of
the media and scientific articles
indicating emerging threats from new and regulated pests.
From data to knowledge for monitoring real-time events: document processing is a key step
Stacked documents
100 → 1000 regulated pest species and diseases
URLs
Collect
Report
Monthly and weekly bulletins
Focused expertise
>1000 docs processed per week
Today: manual reading, knowledge acquisition, summarisation and editing, with limited traceability
10% relevant docs
1% referenced
keyword list
Goal: Scale-up enabled by Artificial Intelligence–based processing
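For reference, the keyword-list step of today's workflow can be pictured with a minimal sketch. The keyword list, the document fields (`url`, `text`) and the function names are illustrative assumptions, not the platform's actual configuration.

```python
import re

# Hypothetical keyword list; the real surveillance list is not shown in the slides.
KEYWORDS = ["xylella fastidiosa", "huanglongbing", "asian citrus psyllid"]


def matched_keywords(text, keywords=KEYWORDS):
    """Return the keywords found in text (case-insensitive, whole-word match)."""
    found = []
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(keyword)
    return found


def filter_documents(docs, keywords=KEYWORDS):
    """Keep documents that mention at least one keyword, with the matches."""
    kept = []
    for doc in docs:
        hits = matched_keywords(doc["text"], keywords)
        if hits:
            kept.append({"url": doc["url"], "keywords": hits})
    return kept
```

A filter of this kind is cheap but blind to context, which is one plausible reason why only a fraction of the collected documents turn out to be relevant.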
Text content processing to support comprehensive plant health surveillance
Increasing complexity and diversity of document sources
© M de Sainte Marie 2020
Official / Regulatory Documents
authoritative and often legally binding or policy-oriented.
→ Delimiting surveys or specific surveillance programs based on risk
Scientific & Technical Literature
detailed, validated knowledge.
→ current knowledge about pest biology and drivers of dynamics
→ identify gaps and research priorities for risk modeling
Monitoring & Field Reports
Operational documents from field surveillance
→ presence, abundance, or pressure
→ feed forecasting models
Knowledge-Based & Technical Reference Documents and Data
→ pest biology
→ pest distribution map
Climatic, landscape, ecological, phenological data
→ to predict pest dynamics
→ risk windows and optimal intervention periods
Historical Records
→ Useful for longitudinal studies or modeling
Pest Spread and Impact models for Risk-Based Decision Making
Official incidents and early warning Reporting documents
include contextual data: pest species, hosts affected, location, and actions taken
→ pest biology
→ pest distribution map
Data sources for long-, mid- and short-term risk anticipation
Extracting key pathological ecosystem information from text to anticipate plant health threats
Contribute to pest occurrence histories
Documents
Enable rapid prediction and response
Natural Language Processing (NLP), key technology for information extraction from textual documents
To extract observation descriptions and scientific knowledge relevant to plant health monitoring
Extraction of observations/occurrences
Observation description of a pathogen on a host plant, causing a disease, in a given place and time
Named entity recognition and relationship extraction structure raw text,
transforming the textual description into machine-readable biological event records.
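To make the idea of a machine-readable record concrete, here is a minimal sketch that groups extracted entities and binary relations into one observation per pest. The `Observation` class, the relation-to-field mapping and the example entities are assumptions made for illustration. The relation names follow the EPOP relation types, but their argument types are assumed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    pest: str
    host: Optional[str] = None
    disease: Optional[str] = None
    location: Optional[str] = None


# Assumed mapping: relation type -> record field filled by the relation target.
FIELD_FOR_RELATION = {"Found_On": "host", "Causes": "disease", "Located_In": "location"}


def build_observations(entities, relations):
    """entities: id -> (type, text); relations: (type, source_id, target_id)."""
    records = {}
    for rel_type, source_id, target_id in relations:
        field_name = FIELD_FOR_RELATION.get(rel_type)
        # Only relations whose source is a Pest entity are used in this sketch.
        if field_name is None or entities[source_id][0] != "Pest":
            continue
        record = records.setdefault(source_id, Observation(pest=entities[source_id][1]))
        setattr(record, field_name, entities[target_id][1])
    return list(records.values())


# Illustrative input, as a NER and relation extraction step might produce it.
entities = {
    "T1": ("Pest", "Candidatus Liberibacter asiaticus"),
    "T2": ("Disease", "huanglongbing"),
    "T3": ("Plant", "citrus"),
    "T4": ("Location", "California"),
}
relations = [("Causes", "T1", "T2"), ("Found_On", "T1", "T3"), ("Located_In", "T1", "T4")]
observations = build_observations(entities, relations)
```

Dates would be attached in the same way once a relation links them to the observation.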
Knowledge extraction
Identifies knowledge in the text, including rare, emerging, or contested findings, e.g. the report of a pest in a new region.
By processing a comprehensive corpus of documents, including reports and multilingual literature,
it formalizes data into an actionable knowledge repository representing scientific consensus.
Towards an integrated information system
Interoperability between document-extracted information and other data sources
facilitates data enrichment,
enables advanced analysis, and
supports automatic hypothesis inference
From news to structured data
https://www.cdfa.ca.gov/exec/Public_Affairs/Press_Releases/Archive/pr.html?id=15-031
https://californiacitrusthreat.org/pest-disease/huanglongbing-quarantine/
https://acwm.lacounty.gov/asian-citrus-psyllid-and-hlb/
Definition of the Epidemiomonitoring of Plants (EPOP) ontology
Methods for the annotation of text information
Information aggregation into a knowledge graph
Interoperability ensured by the alignment of the extracted information with references
Stacked documents
Collect – Extract – Translate web-scraped documents
Filter – Summarize – Classify the documents
Transform raw text into structured data = extract and standardise the information with respect to a reference
Integrate literature-extracted data into the existing knowledge base
Targeted information
Interoperability ensured by the alignment of the extracted information with references
Corpus-based evaluation of the methods, quality and robustness
Using language models
EPOP, epidemiomonitoring of plants
Text-bound annotation.
To support the development of monitoring systems that can highlight mentions in context
Normalization of species and geographical locations.
For integration with knowledge graphs and predictive models
Modality detection
for assessing evidential reliability
Negation and hypothesis
N-ary relations (events) extraction
To represent epidemiological events, observations, or complex trophic interactions
Formalization as a new NLP Shared Task: Structured Extraction from Phytosanitary Reports
The new annotated corpus EPOP, for the epidemiomonitoring of plants
To train and evaluate NLP methods to extract semantically grounded and verifiable information.
540 news items annotated by 30 French experts in plant health and NLP
in a double-blind way
Submitted to LREC 2026
Token | Entity | Binary relation | N-ary relation | Coreference |
115,000 | 7,537 | 4,717 | 2,929 | 373 |
Entity type | Training | Dev |
Date | 419 | 217 |
Disease | 234 | 148 |
Dissemination_pathway | 138 | 48 |
Location | 1042 | 485 |
Pest | 908 | 338 |
Plant | 663 | 347 |
Vector | 78 | 32 |
Total | 2925 | 1350 |
Relation type | Training | Dev |
Causes | 66 | 35 |
Detected_on | 287 | 134 |
Dispersed_by | 36 | 18 |
Affects | 141 | 74 |
Found_On | 441 | 181 |
Located_In | 1210 | 567 |
Transmits | 36 | 13 |
Total | 1894 | 870 |
The test set remains hidden to avoid data leakage and LLM evaluation bias
EPOP corpus specificities
Equivalent argument entities in relations or events.
Identity coreferences link mentions of the same entity in the same role
Discontinuous and overlapping
entity annotations
Annotation of overlapping long-distance n-ary relations (events)
Ambiguous semantic roles
|North-Western provinces| and |Southern provinces| of Tuscany
Two overlapping and discontinuous entities denote distinct geographical locations
Are pome or stone fruits, … crops or fruits?
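One way to picture discontinuous and overlapping annotations is to store each entity as a list of character-offset fragments. The sketch below assumes the plain sentence behind the Tuscany example and uses hypothetical helper names. It is not the EPOP annotation format itself.

```python
# Assumed plain text behind the annotated example.
text = "North-Western provinces and Southern provinces of Tuscany"


def find_fragment(text, fragment, start=0):
    """Return the (begin, end) character offsets of fragment in text."""
    begin = text.index(fragment, start)
    return (begin, begin + len(fragment))


def entity_text(text, fragments):
    """Rebuild the entity surface form from its fragments."""
    return " ".join(text[begin:end] for begin, end in fragments)


def overlaps(entity_a, entity_b):
    """True if any fragment of entity_a shares characters with one of entity_b."""
    return any(b1 < e2 and b2 < e1 for b1, e1 in entity_a for b2, e2 in entity_b)


shared = find_fragment(text, "of Tuscany")
north_west = [find_fragment(text, "North-Western provinces"), shared]
south = [find_fragment(text, "Southern provinces"), shared]

print(entity_text(text, north_west))  # North-Western provinces of Tuscany
print(entity_text(text, south))       # Southern provinces of Tuscany
print(overlaps(north_west, south))    # True: both entities include "of Tuscany"
```

A flat token-tagging scheme cannot express these two entities directly, which is part of what makes the corpus challenging.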
Nomenclatures and ontologies for standardisation
Reference per entity type
Locations: GeoNames nomenclature
Plant, vector and pest species: NCBI Taxonomy
Relations: EPOP ontology
Normalisation maps the text entity mentions to the relevant class
Information graph with the entity classes
The identifiers in the graph designate the entity classes
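As a rough illustration of normalisation, the sketch below maps a mention to a reference identifier with a plain dictionary lookup. This is a stand-in for the actual entity-linking method, which the slides do not detail, and the identifiers are deliberately fake placeholders rather than real GeoNames or NCBI Taxonomy identifiers.

```python
# Placeholder reference tables: the identifiers are invented for illustration
# and do NOT correspond to real GeoNames or NCBI Taxonomy entries.
REFERENCES = {
    "Location": {"tuscany": "geonames:EXAMPLE-1"},
    "Pest": {"xylella fastidiosa": "ncbitaxon:EXAMPLE-2"},
    "Plant": {"olive tree": "ncbitaxon:EXAMPLE-3"},
}


def normalize(entity_type, mention):
    """Return the reference identifier for a mention, or None if unknown."""
    table = REFERENCES.get(entity_type, {})
    key = " ".join(mention.lower().split())  # lowercase and collapse whitespace
    return table.get(key)


print(normalize("Pest", "Xylella  fastidiosa"))  # ncbitaxon:EXAMPLE-2
print(normalize("Location", "Sicily"))           # None
```

Once every mention carries an identifier, mentions from different documents that denote the same class can be merged into a single node of the information graph.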
Performance measured on the EPOP corpus for the Named Entity Recognition task
Entity type | F₁ | Recall | Precision |
Any | 0.81±0.01 (0.07) | 0.84±0.01 (0.08) | 0.78±0.01 (0.07) |
Date | 0.79±0.01 (0.10) | 0.82±0.01 (0.10) | 0.75±0.01 (0.09) |
Disease | 0.87±0.02 (0.05) | 0.90±0.02 (0.05) | 0.84±0.02 (0.05) |
Dissemination_pathway | 0.49±0.04 (0.17) | 0.52±0.04 (0.18) | 0.49±0.04 (0.16) |
Location | 0.84±0.01 (0.07) | 0.85±0.01 (0.08) | 0.83±0.02 (0.07) |
Pest | 0.85±0.01 (0.04) | 0.91±0.01 (0.05) | 0.79±0.01 (0.04) |
Plant | 0.79±0.01 (0.09) | 0.83±0.01 (0.09) | 0.76±0.02 (0.09) |
Vector | 0.36±0.05 (0.13) | 0.32±0.04 (0.11) | 0.45±0.07 (0.15) |
BioBERT plus a softmax layer to classify tokens and an entity span reconstruction step
BioBERT trained on the EPOP corpus
Performances measured by
The difference between strict and relaxed matching is in parentheses
A single model predicts all entity types
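The span reconstruction step can be pictured with a minimal sketch that turns per-token labels into entity spans. It assumes a BIO tagging scheme, which the slides do not specify, and it only handles contiguous spans.

```python
def bio_to_spans(tokens, tags):
    """Group tokens labelled with BIO tags into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close the open span, then start a new one of the tagged type.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O": outside any entity
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = ["Xylella", "fastidiosa", "was", "detected", "on", "olive", "trees", "in", "Tuscany"]
tags = ["B-Pest", "I-Pest", "O", "O", "O", "B-Plant", "I-Plant", "O", "B-Location"]
print(bio_to_spans(tokens, tags))
# [('Pest', 'Xylella fastidiosa'), ('Plant', 'olive trees'), ('Location', 'Tuscany')]
```

An orphan `I-` tag is treated as the start of a new span here, a common leniency choice and also an assumption.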
Performance of ReBERT measured on the EPOP corpus for the Relation Extraction task
Relation | Recall | Precision | F₁ |
Causes | 0.71 | 0.81 | 0.76 |
Detected_on | 0.30 | 0.59 | 0.40 |
Dispersed_by | 0.65 | 0.32 | 0.43 |
Affects | 0.83 | 0.69 | 0.76 |
Found_on | 0.72 | 0.68 | 0.70 |
Located_in | 0.70 | 0.61 | 0.65 |
Transmits | 0.75 | 0.43 | 0.55 |
ALL (micro) | 0.70 | 0.59 | 0.64 |
ALL (macro) | 0.66 | 0.59 | 0.61 |
Relation extraction scores obtained with gold-standard entities.
Detected_on is the most difficult relation to predict
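The difference between the micro and macro rows can be checked with a short sketch. The macro average below uses the per-relation F₁ values from the ReBERT table. The counts passed to `micro_prf` are hypothetical, since the slides do not report true-positive and false-positive counts.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(scores):
    """Unweighted mean over relation types: every type counts equally."""
    return sum(scores) / len(scores)


def micro_prf(counts):
    """counts: list of (true_pos, false_pos, false_neg), one per relation type.

    Counts are pooled first, so frequent relation types dominate the score.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1(precision, recall)


# Per-relation F1 scores copied from the ReBERT table.
rebert_f1 = {
    "Causes": 0.76, "Detected_on": 0.40, "Dispersed_by": 0.43, "Affects": 0.76,
    "Found_on": 0.70, "Located_in": 0.65, "Transmits": 0.55,
}
macro_f1 = macro_average(list(rebert_f1.values()))
print(round(macro_f1, 2))  # 0.61, consistent with the ALL (macro) row

# Hypothetical counts for two relation types, for illustration only.
print(micro_prf([(8, 2, 2), (1, 1, 3)]))
```

Because Located_in is by far the most frequent relation in the corpus, it weighs heavily on the micro scores, while the macro scores are pulled down by the rarer, harder types.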
LLM hard-prompting
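Hard prompting means giving the model a fixed, hand-written textual instruction rather than learned prompt vectors. The sketch below shows what such a prompt and the parsing of its answer could look like. The prompt wording, the JSON answer format and the function names are assumptions, and no specific LLM API is called. Only the entity and relation type names come from the EPOP corpus tables.

```python
import json

ENTITY_TYPES = ["Date", "Disease", "Dissemination_pathway", "Location",
                "Pest", "Plant", "Vector"]
RELATION_TYPES = ["Causes", "Detected_on", "Dispersed_by", "Affects",
                  "Found_On", "Located_In", "Transmits"]


def build_prompt(document):
    """Assemble a fixed (hard) prompt around the document text."""
    return (
        "You are an information extraction system for plant health news.\n"
        f"Entity types: {', '.join(ENTITY_TYPES)}.\n"
        f"Relation types: {', '.join(RELATION_TYPES)}.\n"
        'Answer with JSON only: {"relations": '
        '[{"type": "...", "source": "...", "target": "..."}]}\n'
        f"Text: {document}"
    )


def parse_answer(answer):
    """Parse the model answer; return [] if it is not valid JSON."""
    try:
        data = json.loads(answer)
    except json.JSONDecodeError:
        return []
    relations = data.get("relations", []) if isinstance(data, dict) else []
    # Drop anything that is not a relation of a known type.
    return [r for r in relations
            if isinstance(r, dict) and r.get("type") in RELATION_TYPES]
```

Validating the answer against the closed list of relation types is a cheap guard against labels the model invents.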
Performance of LLM methods measured on the EPOP corpus for the Information Extraction task (joint task)
Relation | GPT-4o-mini Precision | GPT-4o-mini Recall | GPT-4o-mini F₁ | Kimi Precision | Kimi Recall | Kimi F₁ | DeepSeek-V3 Precision | DeepSeek-V3 Recall | DeepSeek-V3 F₁ | Qwen3.0 Precision | Qwen3.0 Recall | Qwen3.0 F₁ |
Affects | 0.75±0.41 | 0.50±0.37 | 0.57±0.36 | 0.70±0.39 | 0.50±0.37 | 0.56±0.36 | 0.82±0.34 | 0.61±0.35 | 0.68±0.33 | 0.84±0.27 | 0.64±0.29 | 0.70±0.26 |
Causes | 0.71±0.43 | 0.63±0.42 | 0.65±0.42 | 0.78±0.37 | 0.68±0.38 | 0.71±0.37 | 0.80±0.39 | 0.69±0.40 | 0.72±0.39 | 0.76±0.36 | 0.68±0.37 | 0.70±0.36 |
Found on | 0.74±0.32 | 0.75±0.39 | 0.60±0.39 | 0.64±0.38 | 0.75±0.37 | 0.59±0.38 | 0.84±0.31 | 0.69±0.34 | 0.74±0.32 | 0.83±0.29 | 0.70±0.34 | 0.74±0.32 |
Located in | 0.81±0.36 | 0.44±0.31 | 0.53±0.31 | 0.88±0.30 | 0.51±0.32 | 0.61±0.31 | 0.93±0.22 | 0.53±0.29 | 0.64±0.26 | 0.91±0.21 | 0.55±0.28 | 0.64±0.25 |
Transmits | 0.86±0.32 | 0.70±0.35 | 0.74±0.33 | 0.82±0.27 | 0.66±0.35 | 0.70±0.31 | 0.98±0.09 | 0.79±0.31 | 0.84±0.23 | 0.87±0.29 | 0.70±0.36 | 0.75±0.32 |
All (Micro) | 0.55±0.30 | 0.62±0.30 | 0.54±0.27 | 0.60±0.30 | 0.61±0.31 | 0.56±0.27 | 0.64±0.28 | 0.67±0.27 | 0.61±0.24 | 0.65±0.26 | 0.52±0.31 | 0.53±0.25 |
All (Macro) | 0.78±0.05 | 0.57±0.09 | 0.63±0.07 | 0.79±0.06 | 0.59±0.07 | 0.64±0.06 | 0.87±0.07 | 0.66±0.09 | 0.72±0.07 | 0.84±0.05 | 0.65±0.06 | 0.71±0.04 |
EPOP data available on-line
HAL open science
Training and development datasets available
PestCLEF Task 2026: NLP for plant surveillance
Website and evaluation service opening soon
on the Bibliome Challenge website
EPOP training and development datasets available
https://nlptasks.mathnum.inrae.fr/bibliome-challenges
Schedule
Why participate in PestCLEF?
📄 Call for participation coming soon!
Nicolas Sauvion
Guillaume David
Cica Urbino
Emmanuel Wicker
Cindy Morris
Samuel Soubeyrand
Marie Grosdidier
Sandy Dupérier
Isabelle Pieretti
Jean-Baptiste Louvet
Simon Nicoux
Davide Martinetti
Claire Nédellec
Robert Bossy
Louise Deléger
Mouhamadou Ba
Marine Courtin
Clara Sauvion
Xinzhi Yao
Cindy Morris
Eric Verdin
Sylvie Dallot
Catherine Abadie
Jean-Claude Streito
Sara Tramontini
Alexia Antoniou
Xavier Foissac
Sylvie Malembic-Maher
Pascal Frey
Frédérique Hilliou
Jean-Michel Hily
Christophe Le May
Frédéric Suffert
Jean-Pierre Thermoz
Philippe Reynaud
Anne Quillevere
Jaime Aguayo
Delphine Massé
Laboratoire de la santé des végétaux
Jingbo Xia
Xinzhi Yao