1 of 30

Leveraging NLP and LLMs for AI-Based Phytosanitary Early Warning Systems

AKIE 2025, China

November 2025

Claire Nédellec – MaIAGE, INRAE

Conference on Agricultural Knowledge Engineering

Huazhong Agricultural University, Wuhan

Icons from https://www.flaticon.com/fr/

p. 1

AI for Plant Health AKIE2025

Claire Nédellec

2 of 30

A world leader in agriculture, food and the environment

13 000 members


3 of 30

With a dozen International Associated Laboratories launched since 2013 in partnership with major research institutes, China is now INRAE’s largest partner in Asia.

8 October 2025: opening of an INRAE and CIRAD office at the French Embassy in Beijing.

This year, an INRAE delegation signed seven scientific agreements with China’s leading scientific institutions on carbon neutrality, agroecology, biodiversity, plant genetics, nutrition and sustainable livestock farming.

International cooperation infrastructures


4 of 30

  • Addressing challenges in life sciences, agronomy and food
  • Developing research in applied mathematics and computer science
  • Providing the scientific research community with expertise, methods, and tools.

Gene

Cell

Individual

Ecosystem, Population

Network, Environment

Continuum of disciplines ranging from experimental data processing to predictive modeling.

Interdisciplinary research across multiple scales

4 research groups – 1 bioinformatics platform

Genomics, microbiology (food, synthetic biology, pathogens)

Physiology, holobionts (animal, plant, nutrition)

Epidemiology, Agroecology (animal, plant)

statistics

bioinformatics

statistical modeling

automatic control, systems biology

natural language processing

STATINFOMICS

DYNENVIE

BIOSYS

BIBLIOME

MIGALE

Laboratory

Jouy-en-Josas

Paris area


5 of 30

Bibliome research group at MaIAGE, INRAE

Language models for Information extraction.

Zero, few-shot learning. Knowledge injection.

Autoencoder & autoregressive models.

Formalization of the extracted information and cross-source combination

Normalization and entity-linking. Knowledge model alignment

Evaluation. Metrics for measuring the quality and consistency of complex information. Datasets and challenges (BioNLP-ST, CLEF).

Focused and high quality knowledge. Complex structure representation for fine-grained information that fits application requirements

Natural Language Processing (NLP) for domain-specific goals


6 of 30

Building epidemiological surveillance and prophylaxis with observations near and distant


The BEYOND Project

https://beyond.paca.hub.inrae.fr/

National project led by INRAE


7 of 30

Healthy plants for safe and sustainable food – BEYOND

Risk assessment for plant pests in the BEYOND project

Living with pests while using very limited curative treatments requires enhanced epidemiological surveillance to adapt prophylaxis and control methods and to anticipate risks

Better anticipation for transition to agroecological systems

There is a range of approaches to be combined: mathematical, biological, economic, and organizational

Focus on emerging and regulated pests

Extension to distant observations in time and space

08/07/2025


8 of 30

Levers for pesticide-free agriculture

BEYOND Project - Building epidemiological surveillance and prophylaxis with observations near and distant

Today

Tomorrow

Growing

Protecting Differently


9 of 30

Elaborate new indicators for surveillance that provide sufficient time/space advance for early prophylaxis

Pest bio-sensor data

Abiotic conditions

Adjacent wild and cultivated plants

Plant Health Bulletins

Interconnectedness of regions via water, wind, transportation

Data from abiotic sensor networks

Regional land use & landscape data

Remote-sensing data

Historical meteorological trends and future predictions

Information extracted by text mining


10 of 30

The international health monitoring system of the French plant health epidemiological surveillance platform ESV

  • Answers scientific questions in plant epidemiology
  • Produces pest risk analyses for national and European decision making
  • Coordinated with the European Food Safety Authority (EFSA) platform
  • EFSA supports the European Union decision-making.

Keeping up to date with the situation and knowledge, and centralising them

Horizon scanning for plant health through monitoring of the media and scientific articles indicating emerging threats from new and regulated pests.


11 of 30

From data to knowledge for monitoring real-time events: document processing is a key step

Stacked documents

100 to 1,000 regulated pest species and diseases

URLs

Collect

Report

Monthly and weekly bulletins

Focused expertise

>1000 docs processed per week

Today: manual reading, knowledge acquisition, summarizing and editing, with limited traceability

10% relevant docs

1% referenced

keyword list

Goal: Scale-up enabled by Artificial Intelligence–based processing
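The keyword-list step of the funnel above can be sketched as follows. This is a minimal illustration only: the keyword list, the document structure and the example URLs are hypothetical, and the slides do not describe how the platform actually implements its filter.

```python
import re

# Hypothetical keyword list; the real list used by the platform is not given in the slides.
KEYWORDS = ["xylella", "huanglongbing", "quarantine", "outbreak"]


def build_keyword_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def filter_relevant(docs, pattern):
    """Keep the documents whose text mentions at least one keyword."""
    return [doc for doc in docs if pattern.search(doc["text"])]


# Toy documents (invented for the example).
docs = [
    {"url": "https://example.org/a", "text": "New Xylella outbreak reported in olive groves."},
    {"url": "https://example.org/b", "text": "Local football results for the weekend."},
    {"url": "https://example.org/c", "text": "Citrus quarantine area extended."},
]

pattern = build_keyword_pattern(KEYWORDS)
relevant = filter_relevant(docs, pattern)
print(f"{len(relevant)} of {len(docs)} documents kept")
```

A filter of this kind is cheap but only as good as its keyword list, which is one reason the goal stated on the slide is to scale up with AI-based processing.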


12 of 30

Text content processing to support comprehensive plant health surveillance

Increasing complexity and diversity of document sources

  • The massive increase of textual sources including reports, articles, and news (thousands of new publications each week) has created new opportunities.
  • Need for speed and precision in information selection, as well as synthesis. Recent advances in AI provide powerful tools to target relevant content and scale up processing.

© M de Sainte Marie 2020


13 of 30

Data sources for risk anticipation in the long, mid and short term

Official / Regulatory Documents
  • Authoritative and often legally binding or policy-oriented
  • → Delimiting surveys or specific surveillance programs based on risk

Scientific & Technical Literature
  • Detailed, validated knowledge
  • → Current knowledge about pest biology and drivers of dynamics
  • → Identify gaps and research priorities for risk modeling

Monitoring & Field Reports
  • Operational documents from field surveillance
  • Presence, abundance, or pressure
  • Feed forecasting models

Knowledge-Based & Technical Reference Documents and Data
  • Pest biology
  • Pest distribution map

Climatic, landscape, ecological, phenological data
  • To predict pest dynamics
  • Risk windows and optimal intervention periods

Historical Records
  • Useful for longitudinal studies or modeling

Official incidents and early warning reporting documents
  • Include contextual data: pest species, hosts affected, location, and actions taken

Pest Spread and Impact models for Risk-Based Decision Making



15 of 30

Extracting key pathological ecosystem information from text to anticipate plant health threats

Contribute to pest occurrence histories

  • By linking information on pathogens, host plants, dates, and locations, it helps identify vulnerability trends and anticipate future outbreaks.

Documents

    • New associations between previously non-vulnerable species,
    • Newly affected geographic regions,
    • Emerging pathogen strains and introduction pathways via vectors.

Enable rapid prediction and response

  • Through real-time detection of unusual entity relationships, it supports earlier interventions.
  • Leveraging NLP techniques, it efficiently identifies weak signals of emergence across vast datasets
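The idea of spotting unusual entity relationships against an occurrence history can be sketched as below. This is a simplified illustration under stated assumptions: the `Observation` record, the `find_novel_signals` helper and the placeholder names (`pest_A`, `host_X`, `region_1`) are all hypothetical and are not taken from the project.

```python
from collections import namedtuple

# Hypothetical record of one extracted observation.
Observation = namedtuple("Observation", ["pest", "host", "location", "year"])


def find_novel_signals(history, new_observations):
    """Flag observations whose pest/host or pest/location pair is absent from the history."""
    known_hosts = {(o.pest, o.host) for o in history}
    known_places = {(o.pest, o.location) for o in history}
    signals = []
    for obs in new_observations:
        reasons = []
        if (obs.pest, obs.host) not in known_hosts:
            reasons.append("new host")
        if (obs.pest, obs.location) not in known_places:
            reasons.append("new location")
        if reasons:
            signals.append((obs, reasons))
    return signals


# Placeholder data, invented for the example.
history = [Observation("pest_A", "host_X", "region_1", 2020)]
new_observations = [
    Observation("pest_A", "host_X", "region_1", 2025),  # already known
    Observation("pest_A", "host_Y", "region_1", 2025),  # unseen host
    Observation("pest_A", "host_X", "region_2", 2025),  # unseen region
]

for obs, reasons in find_novel_signals(history, new_observations):
    print(obs.pest, obs.host, obs.location, reasons)
```

In practice the comparison would rely on normalized identifiers rather than raw strings, so that synonyms of the same species or place are not reported as new.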


16 of 30

Natural Language Processing (NLP), key technology for information extraction from textual documents

To extract observation descriptions and scientific knowledge relevant to plant health monitoring

Extraction of observations/occurrences

Observation description of a pathogen on a host plant, causing a disease, in a given place and time

Named entity recognition and relationship extraction structure raw text, transforming the textual description into machine-readable biological event records.
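A machine-readable event record of this kind might look like the sketch below. The `EventRecord` fields and the mapping from relation types to record slots are assumptions made for illustration; the relation names come from the EPOP corpus, but the argument types they take are not specified in these slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventRecord:
    """One observation: a pest on a host, causing a disease, in a place."""
    pest: str
    host: Optional[str] = None
    disease: Optional[str] = None
    location: Optional[str] = None


# Assumed mapping from relation type to record slot (hypothetical).
SLOT_FOR_RELATION = {"Found_On": "host", "Causes": "disease", "Located_In": "location"}


def build_event(pest, relations):
    """Fill a record from (head, relation_type, tail) triples whose head is the pest."""
    record = EventRecord(pest=pest)
    for head, relation_type, tail in relations:
        if head == pest and relation_type in SLOT_FOR_RELATION:
            setattr(record, SLOT_FOR_RELATION[relation_type], tail)
    return record


# Placeholder triples, as they could come out of entity and relation extraction.
relations = [
    ("pest_A", "Found_On", "host_X"),
    ("pest_A", "Causes", "disease_D"),
    ("pest_A", "Located_In", "region_1"),
    ("pest_B", "Located_In", "region_2"),  # belongs to another event
]
print(build_event("pest_A", relations))
```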

Knowledge extraction

Identifies knowledge in the text, including rare, emerging, or contested findings, e.g. the report of a pest in a new region.

By processing a comprehensive corpus of documents, including reports and multilingual literature, it formalizes data into an actionable knowledge repository representing scientific consensus.

Towards an integrated information system

The interoperability between document-extracted information and other data sources facilitates data enrichment, enables advanced analysis, and supports automatic hypothesis inference.


17 of 30

From news to structured data

https://www.cdfa.ca.gov/exec/Public_Affairs/Press_Releases/Archive/pr.html?id=15-031

https://californiacitrusthreat.org/pest-disease/huanglongbing-quarantine/

https://acwm.lacounty.gov/asian-citrus-psyllid-and-hlb/

Definition of the Epidemiomonitoring of Plants (EPOP) ontology

Methods for the annotation of text information

Information aggregation into a knowledge graph

Interoperability ensured by the alignment of the extracted information with references

  • GeoNames nomenclature
  • OntoBiotope habitats
  • EPPO Global Database
  • NCBI taxonomy
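Aggregating extracted information into a graph can be illustrated with a small sketch. It assumes that each document yields (subject, relation, object) triples whose arguments are already aligned with reference identifiers; the identifiers below are placeholders, not real NCBI or GeoNames entries, and the `aggregate` helper is hypothetical.

```python
from collections import defaultdict


def aggregate(extractions):
    """Merge per-document triples into one graph, keeping the supporting documents per edge."""
    graph = defaultdict(set)
    for doc_id, triples in extractions.items():
        for subject, relation, obj in triples:
            graph[(subject, relation, obj)].add(doc_id)
    return graph


# Placeholder identifiers standing in for NCBI taxonomy and GeoNames entries.
extractions = {
    "doc1": [("taxon:PEST_1", "Found_On", "taxon:PLANT_1"),
             ("taxon:PEST_1", "Located_In", "geo:PLACE_1")],
    "doc2": [("taxon:PEST_1", "Located_In", "geo:PLACE_1")],
}

graph = aggregate(extractions)
for edge, sources in sorted(graph.items()):
    print(edge, sorted(sources))
```

Because the edges are keyed on shared identifiers, the same fact reported by two news items collapses into one edge with two sources, which keeps the provenance of each statement.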


18 of 30

Stacked documents

Collect – Extract – Translate web-scraped documents

Filter – Summarize – Classify the documents

Transform raw text into structured data = extract and standardise the information with respect to a reference

Integration of literature-extracted data into the existing knowledge base

Targeted information

  • Location
  • Species
  • Biotic interactions
  • Spatio-temporal relations

Interoperability ensured by the alignment of the extracted information with references

  • GeoNames nomenclature
  • OntoBiotope habitats
  • EPPO Global Database
  • NCBI taxonomy

Corpus-based evaluation of the methods, quality and robustness

Using language models


19 of 30

EPOP, epidemiomonitoring of plants

Text-bound annotation.

To support the development of monitoring systems that can highlight mentions in context

Normalization of species and geographical locations.

For integration with knowledge graphs and predictive models

Modality detection

for assessing evidential reliability

Negation and hypothesis

N-ary relations (events) extraction

To represent epidemiological events, observations, or complex trophic interactions

Formalization as a new NLP Shared Task: Structured Extraction from Phytosanitary Reports

⚠️


20 of 30

The new annotated corpus EPOP, for the epidemiomonitoring of plants

To train and evaluate NLP methods to extract semantically grounded and verifiable information.

540 news items annotated by 30 French experts in plant health and NLP, in a double-blind way

Submitted to LREC 2026

Corpus statistics

Token             115,000
Entity            7,537
Binary relation   4,717
N-ary relation    2,929
Coreference       373

Entity type             Training   Dev
Date                    419        217
Disease                 234        148
Dissemination_pathway   138        48
Location                1042       485
Pest                    908        338
Plant                   663        347
Vector                  78         32
Total                   2925       1350

Relation type   Training   Dev
Causes          66         35
Detected_on     287        134
Dispersed_by    36         18
Affects         141        74
Found_On        441        181
Located_In      1210       567
Transmits       36         13
Total           1894       870

The test set remains hidden to avoid data leakage and LLM evaluation bias


21 of 30

EPOP corpus specificities

Equivalent argument entities in relations or events.

Identity coreferences link mentions of the same entity in the same role

Discontinuous and overlapping entity annotations

Annotation of overlapping long-distance n-ary relations (events)

Ambiguous semantic roles

North-Western provinces| and |Southern provinces|of Tuscany  

2 overlapping and discontinuous entities to denote distinct geographical locations

Are pome or stone fruits, … crops or fruits?


22 of 30

Nomenclatures and ontologies for standardisation

Normalisation maps the text entity mentions to the relevant class

The identifiers in the graph designate the entity classes

Information graph with the entity classes

Reference per entity type
  • Locations: GeoNames nomenclature
  • Plant, vector and pest species: NCBI taxonomy
  • Relations: EPOP ontology
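Normalisation can be pictured as a lookup from a surface mention to a class identifier in the reference for its entity type. The sketch below uses a tiny hand-made lexicon with a placeholder identifier instead of a real NCBI taxonomy entry; the `normalize` function is hypothetical, and real systems typically go beyond exact matching.

```python
def normalize(mention, lexicon):
    """Map a mention to a reference identifier, ignoring case and extra whitespace."""
    key = " ".join(mention.lower().split())
    return lexicon.get(key)  # None when the mention is unknown


# Hypothetical lexicon: two names for the same species share one placeholder identifier.
SPECIES_LEXICON = {
    "diaphorina citri": "taxon:PLACEHOLDER_1",
    "asian citrus psyllid": "taxon:PLACEHOLDER_1",
}

print(normalize("Asian  citrus psyllid", SPECIES_LEXICON))
print(normalize("Diaphorina citri", SPECIES_LEXICON))
print(normalize("unknown insect", SPECIES_LEXICON))
```

Mapping the common name and the scientific name to the same identifier is what lets observations from different documents be merged in the information graph.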


23 of 30

Performance measured on the EPOP corpus for the Named Entity Recognition task

Entity type             F₁                  Recall              Precision
Any                     0.81±0.01 (0.07)    0.84±0.01 (0.08)    0.78±0.01 (0.07)
Date                    0.79±0.01 (0.10)    0.82±0.01 (0.10)    0.75±0.01 (0.09)
Disease                 0.87±0.02 (0.05)    0.90±0.02 (0.05)    0.84±0.02 (0.05)
Dissemination_pathway   0.49±0.04 (0.17)    0.52±0.04 (0.18)    0.49±0.04 (0.16)
Location                0.84±0.01 (0.07)    0.85±0.01 (0.08)    0.83±0.02 (0.07)
Pest                    0.85±0.01 (0.04)    0.91±0.01 (0.05)    0.79±0.01 (0.04)
Plant                   0.79±0.01 (0.09)    0.83±0.01 (0.09)    0.76±0.02 (0.09)
Vector                  0.36±0.05 (0.13)    0.32±0.04 (0.11)    0.45±0.07 (0.15)

BioBERT plus a softmax layer to classify tokens and an entity span reconstruction step

BioBERT trained on the EPOP corpus

Performances measured by

  • Recall
  • Precision
  • F1 score

The difference between strict and relaxed matching is in parentheses

A single model predicts all entity types

  • The Disease and Pest entities are well-identified
  • The Vector and Dissemination pathway entities are more difficult to recognize due to their ambiguous roles
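The difference between strict and relaxed matching can be made concrete with a short sketch. It assumes that strict matching requires identical boundaries and type, and that relaxed matching accepts any overlap between spans of the same type; the exact relaxed criterion used for EPOP is not stated on the slide, so the `prf` function below is an assumption.

```python
def overlaps(a, b):
    """True when two (start, end, type) spans share at least one character (end exclusive)."""
    return a[0] < b[1] and b[0] < a[1]


def prf(gold, predicted, relaxed=False):
    """Precision, recall and F1 over entity spans given as (start, end, type)."""
    def match(p, g):
        if p[2] != g[2]:
            return False
        return overlaps(p, g) if relaxed else (p[0], p[1]) == (g[0], g[1])

    matched_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(match(p, g) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy spans: the predicted Location starts three characters too early.
gold = [(0, 17, "Pest"), (30, 37, "Location")]
predicted = [(0, 17, "Pest"), (27, 37, "Location")]

print("strict :", prf(gold, predicted))
print("relaxed:", prf(gold, predicted, relaxed=True))
```

On this toy example the boundary error costs half of the strict score but nothing under relaxed matching, which is the kind of gap the values in parentheses report.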


24 of 30

Performance of ReBERT measured on the EPOP corpus for the Relation Extraction task

Relation        Recall   Precision   F₁
Causes          0.71     0.81        0.76
Detected_on     0.30     0.59        0.40
Dispersed_by    0.65     0.32        0.43
Affects         0.83     0.69        0.76
Found_on        0.72     0.68        0.70
Located_in      0.70     0.61        0.65
Transmits       0.75     0.43        0.55
ALL (micro)     0.70     0.59        0.64
ALL (macro)     0.66     0.59        0.61

Relation extraction scores obtained with gold-standard entities.
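As a worked check of how the per-relation scores combine, the sketch below recomputes one F₁ value from its precision and recall and takes the unweighted mean of the per-relation F₁ values. Treating the macro score as an unweighted mean is an assumption, although it is consistent with the 0.61 reported in the table; the micro score cannot be recomputed here because it needs the underlying counts.

```python
from statistics import mean

# Per-relation scores copied from the table above: (recall, precision, F1).
scores = {
    "Causes": (0.71, 0.81, 0.76),
    "Detected_on": (0.30, 0.59, 0.40),
    "Dispersed_by": (0.65, 0.32, 0.43),
    "Affects": (0.83, 0.69, 0.76),
    "Found_on": (0.72, 0.68, 0.70),
    "Located_in": (0.70, 0.61, 0.65),
    "Transmits": (0.75, 0.43, 0.55),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


recall, precision, _ = scores["Detected_on"]
detected_on_f1 = f1(precision, recall)

# Macro average: every relation type counts equally, whatever its frequency.
macro_f1 = mean(f for _, _, f in scores.values())

print(round(detected_on_f1, 2), round(macro_f1, 2))
```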

Detected_on is the most difficult relation to predict

  • ReBERT classifies each candidate entity pair as one of the relation types or as having no relation.

  • ReBERT is implemented as BERT sequence classification using the embedding of the [CLS] token.

  • Each relation candidate is represented as a text where the boundaries of the candidate arguments are marked with special tokens (“@@” and “$$” respectively for the head and tail).
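The marking of a relation candidate can be sketched as below. The slide states that “@@” and “$$” delimit the head and tail arguments; the exact layout (a marker on each side of the mention, separated by spaces) and the `mark_candidate` helper are assumptions made for illustration.

```python
def mark_candidate(text, head, tail):
    """Wrap the head span in '@@' and the tail span in '$$'.

    head and tail are (start, end) character offsets, end exclusive,
    and are assumed not to overlap.
    """
    spans = sorted([(head, "@@"), (tail, "$$")], key=lambda s: s[0][0], reverse=True)
    # Insert from the rightmost span first so earlier offsets stay valid.
    for (start, end), marker in spans:
        text = text[:start] + f"{marker} {text[start:end]} {marker}" + text[end:]
    return text


sentence = "Xylella fastidiosa was found on olive trees."
head_start = sentence.index("Xylella fastidiosa")
tail_start = sentence.index("olive trees")
head = (head_start, head_start + len("Xylella fastidiosa"))
tail = (tail_start, tail_start + len("olive trees"))

marked = mark_candidate(sentence, head, tail)
print(marked)
```

The marked string is what a sequence classifier would receive, one such string per candidate pair, with the relation type (or the absence of relation) as the label.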

ReBERT


25 of 30

LLM hard-prompting
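Hard prompting means giving the model a fixed, hand-written textual instruction rather than learned prompt parameters. The sketch below builds such a prompt and parses a JSON answer. The prompt wording, the output schema and the helper names are hypothetical and are not the prompts used in the experiments; only the entity and relation type names come from the EPOP corpus, and no model is called.

```python
import json

ENTITY_TYPES = ["Date", "Disease", "Dissemination_pathway", "Location", "Pest", "Plant", "Vector"]
RELATION_TYPES = ["Causes", "Detected_on", "Dispersed_by", "Affects",
                  "Found_On", "Located_In", "Transmits"]


def build_prompt(document):
    """Assemble a fixed instruction followed by the document text (hypothetical wording)."""
    return (
        "Extract plant health information from the text below.\n"
        f"Entity types: {', '.join(ENTITY_TYPES)}.\n"
        f"Relation types: {', '.join(RELATION_TYPES)}.\n"
        'Answer with a JSON list of objects with the keys "head", "relation" and "tail".\n\n'
        f"Text: {document}"
    )


def parse_response(raw):
    """Keep well-formed triples whose relation type is known; return [] on invalid JSON."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [
        (item["head"], item["relation"], item["tail"])
        for item in items
        if isinstance(item, dict)
        and item.get("relation") in RELATION_TYPES
        and "head" in item
        and "tail" in item
    ]


prompt = build_prompt("Pest A was detected on host X in region 1.")
# Simulated model answer, invented for the example.
answer = '[{"head": "Pest A", "relation": "Found_On", "tail": "host X"}]'
print(parse_response(answer))
```

Validating the answer against the list of known relation types is a simple guard against output that does not follow the requested schema.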


26 of 30

Performance of LLM methods measured on the EPOP corpus for the joint Information Extraction task

 

             GPT-4o-mini                         Kimi                                DeepSeek-V3                         Qwen3.0
Relation     Precision   Recall      F₁          Precision   Recall      F₁          Precision   Recall      F₁          Precision   Recall      F₁
Affects      0.75±0.41   0.50±0.37   0.57±0.36   0.70±0.39   0.50±0.37   0.56±0.36   0.82±0.34   0.61±0.35   0.68±0.33   0.84±0.27   0.64±0.29   0.70±0.26
Causes       0.71±0.43   0.63±0.42   0.65±0.42   0.78±0.37   0.68±0.38   0.71±0.37   0.80±0.39   0.69±0.40   0.72±0.39   0.76±0.36   0.68±0.37   0.70±0.36
Found on     0.74±0.32   0.75±0.39   0.60±0.39   0.64±0.38   0.75±0.37   0.59±0.38   0.84±0.31   0.69±0.34   0.74±0.32   0.83±0.29   0.70±0.34   0.74±0.32
Located in   0.81±0.36   0.44±0.31   0.53±0.31   0.88±0.30   0.51±0.32   0.61±0.31   0.93±0.22   0.53±0.29   0.64±0.26   0.91±0.21   0.55±0.28   0.64±0.25
Transmits    0.86±0.32   0.70±0.35   0.74±0.33   0.82±0.27   0.66±0.35   0.70±0.31   0.98±0.09   0.79±0.31   0.84±0.23   0.87±0.29   0.70±0.36   0.75±0.32
All (Micro)  0.55±0.30   0.62±0.30   0.54±0.27   0.60±0.30   0.61±0.31   0.56±0.27   0.64±0.28   0.67±0.27   0.61±0.24   0.65±0.26   0.52±0.31   0.53±0.25
All (Macro)  0.78±0.05   0.57±0.09   0.63±0.07   0.79±0.06   0.59±0.07   0.64±0.06   0.87±0.07   0.66±0.09   0.72±0.07   0.84±0.05   0.65±0.06   0.71±0.04

  • Encouraging results
  • DeepSeek outperforms GPT, Kimi and Qwen in the joint information extraction task
  • LLMs outperform BERT-based models


27 of 30

EPOP data available on-line


HAL open science

Training and development datasets available


28 of 30

PestCLEF Task 2026: NLP for plant surveillance

Web site and evaluation service opening soon on the Bibliome Challenge web site

EPOP training and development datasets available

https://nlptasks.mathnum.inrae.fr/bibliome-challenges

Schedule

Why participate in PestCLEF?

  • Contribute to early-warning systems for pest outbreaks
  • Advance domain-specific NLP in a critical, under-resourced area
  • Challenge your language models on long-distance relations, ambiguous semantic roles, and knowledge graph generation

📄 Call for participation coming soon!


29 of 30

Nicolas Sauvion

Guillaume David

Cica Urbino

Emmanuel Wicker

Cindy Morris

Samuel Soubeyrand

Marie Grosdidier

Sandy Dupérier

Isabelle Pieretti

Jean-Baptiste Louvet

Simon Nicoux

Davide Martinetti

Claire Nédellec

Robert Bossy

Louise Deléger

Mouhamadou Ba

Marine Courtin

Clara Sauvion

Xinzhi Yao

Cindy Morris

Eric Verdin

Sylvie Dallot

Catherine Abadie

Jean-Claude Streito

Sara Tramontini

Alexia Antoniou

Xavier Foissac

Sylvie Malembic-Maher

Pascal Frey

Frédérique Hilliou

Jean-Michel Hily

Christophe Le May

Frédéric Suffert

Jean-Pierre Thermoz

Philippe Reynaud

Anne Quillevere

Jaime Aguayo

Delphine Massé

Laboratoire de la santé des végétaux

Jingbo Xia

Xinzhi Yao


30 of 30
