1 of 29

Advanced Planning and Scheduling with Semantic Knowledge Graphs and LLMs

Nenad Petrovic, Milorad Tosic

University of Niš, Faculty of Electronic Engineering, Niš, Serbia

2 of 29

Objectives

  • Provide experimental evidence about possible directions toward more reliable, reasonable and responsible AI

  • Improve usability of the existing solution for enterprise planning & scheduling

  • Explore the value added by SKGs & LLMs in the enterprise

3 of 29

Background: APS in Manufacturing

  • Manufacturing operations in modern digital enterprise are becoming increasingly complex
  • Advanced Planning and Scheduling (APS)
    • Any computer program that uses advanced mathematical algorithms or logic to perform optimization or simulation on finite capacity scheduling
    • Increasingly important in the challenging Industry 4.0 setting
  • Recent advances in Artificial Intelligence (AI) offer a promising approach to improve business value generated by APS.

4 of 29

Background: AI application layers challenge



6 of 29

Approach

  • Leverage the ontology-driven APS solution
  • Adopt Large Language Models (LLMs) to reduce APS costs by starting from textual descriptions
  • End-users do not necessarily need specific expert knowledge
  • In this way, the high potential of adopting ontologies for the implementation of cognitive intelligence is demonstrated.

Figure: intersection of LLM, ontologies and APS.

7 of 29

Proposed solution

  • Different variations of context/prompt engineering techniques considered
    • Focus on the so-called In-Context Learning (ICL) prompting strategy, where
      • ontologies and
      • KG segments as examples are embedded within a prompt.
    • The main advantage of the approach is that pre-trained general LLMs can perform new tasks without requiring additional fine-tuning, which is a time- and resource-intensive procedure.

8 of 29

Proposed solution

  • Direct embedding of the textual serialization of planning ontologies in the context is not practical due to the large volume of data they contain:
    • huge costs resulting from token consumption
    • hallucinations when the context is arbitrarily truncated.
  • An additional strategy is needed for preprocessing the textual content of the target domain ontologies.
  • The Retrieval Augmented Generation (RAG) method is adopted, particularly the Retrieve and Re-Rank (RRR) process, when the content of the set of ontologies is larger than the context window of commonly used LLM solutions.

9 of 29

Implementation: Architecture

  • The user describes what is ordered, together with related resources (such as machines and employees), as freeform text.
    • The user-provided text {story} is leveraged as input to RRR

1,5 - User input {story}; 2a - RDF schema; 2b - RDF knowledge graph; 3a - Ontology chunks; 3b - Knowledge graph excerpt chunks; 4a - Context: ontology excerpt {context1}; 4b - Context: knowledge graph template {context2}; 6 - Intermediary result; 7 - Triplets; 8 - Order definition; 9 - Generated work plan.

10 of 29

Implementation: Architecture

  • As a result of the RRR, the most relevant excerpts of the ontology collection, based on the user description, are extracted (4a - context1)


11 of 29

Implementation: Architecture

  • A representative excerpt of the knowledge graph is taken into account in order to ensure that the LLM will hit the right terms without hallucinations (4b - context2)


12 of 29

Retrieve and Re-Rank for In-Context Learning (RRR4ICL) – Workflow overview

  • Preprocessing
    • transform input textual files into a format suitable for the next phases (such as splitting a document into chunks).
  • Retrieve
    • Initial retrieval of a broad set of potentially relevant parts of text (chunks).
  • Re-Rank
    • Re-ranking to isolate the most relevant subset of chunks

13 of 29

RRR4ICL components - RecursiveSplitter

  • The first phase is document chunking
  • An existing library is adopted for the creation of document chunks:

https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html

  • “Syntax driven” chunking
    • Splits by separators hierarchy;
    • Keeps natural units together;
    • Merges smaller chunks;
    • Recursive splitting for large units;
    • Final chunks are of varying length
  • Parameters (based on empirical estimation)
    • chunk_size - 600 characters: maximum number of characters or tokens allowed in a single chunk
    • chunk_overlap - up to 200 characters: number of characters or tokens shared between consecutive chunks.
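The "syntax driven" strategy described above can be illustrated with a minimal, self-contained sketch (this illustrates the idea only, not the actual RecursiveCharacterTextSplitter implementation; chunk overlap is omitted for brevity):

```python
# Minimal sketch of "syntax driven" recursive chunking: try separators in
# hierarchy order (paragraph > line > word > character), keep natural units
# together, merge smaller units, and split oversized units recursively.

SEPARATORS = ["\n\n", "\n", " ", ""]

def recursive_split(text, chunk_size=600, separators=SEPARATORS):
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut at chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # merge small units into one chunk
        else:
            if current:
                chunks.append(current)
            if len(part) > chunk_size:
                # Unit still too large: recurse with the next separator.
                chunks.extend(recursive_split(part, chunk_size, rest))
                current = ""
            else:
                current = part
    if current:
        chunks.append(current)
    return chunks
```

Final chunks are of varying length, but never exceed chunk_size.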

14 of 29

RRR4ICL components – SentenceTransformer

  • The second phase begins with semantic search using the Bi-Encoder model
  • Given a search query, a large set of potentially relevant text segments (chunks) is retrieved using dense retrieval
  • A SentenceTransformer-based Bi-Encoder is used.
  • However, this method can sometimes return results that are only loosely related to the query.

15 of 29

RRR4ICL components – CrossEncoder

  • The third phase: the CrossEncoder evaluates and scores the relevance of each candidate segment more precisely.
  • The final output is a ranked list of results optimized for relevance.
  • The dense retrieval approach leverages semantic search, mapping both the query and documents into a shared vector space to retrieve the closest matches.
  • This method surpasses traditional lexical search by recognizing synonyms, acronyms, and semantically similar terms.
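The shared vector space idea can be illustrated with cosine similarity over toy vectors (the 3-dimensional "embeddings" and their labels below are invented for illustration; a real bi-encoder produces much higher-dimensional vectors):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors in the shared embedding space.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (invented): semantically similar texts land close together
# even when they share no words, which is why dense retrieval recognizes
# synonyms and acronyms that lexical search would miss.
query     = [0.9, 0.1, 0.0]  # "CNC machine"
synonym   = [0.8, 0.2, 0.1]  # "milling equipment" - no shared words with query
unrelated = [0.0, 0.1, 0.9]  # "employee department"

assert cosine(query, synonym) > cosine(query, unrelated)
```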

16 of 29

RRR4ICL – Steps

  • First, sentences and paragraphs are converted into 384-dimensional dense vectors (multi-qa-MiniLM-L6-cos-v1 model).
  • Then, a cross-encoder is applied for re-ranking (ms-marco-MiniLM-L-6-v2 model).

Input: query, document

Steps:

  1. Split the document into chunks
  2. Bi-Encoder rapidly computes similarity scores (ranging from 0 to 1) between the input query and all document chunks.
  3. Select the top k chunks (where k = 32 in our case) based on the similarity scores.
  4. Apply Cross-Encoder:
    1. Each of the top k chunks, along with the original query, is passed through the Cross-Encoder to refine the ranking.
    2. Top m results (where m = 3 in our case) are selected as the final context to be used for prompt execution

Output 1: context - top m chunks combined

  1. prompt:=parametrizePrompt(query, context)
  2. answer:=executePrompt(prompt)

Output 2: answer – response generated by LLM

17 of 29

RRR4ICL – Prompt templates used

  • Prompt 1 - From text to planner inputs
    • "Create set of RDF triplets for semantic knowledge graph in XML with respect to given ontologies: {context1 – ontology excerpt} and example graph {context2 – knowledge graph template} based on user story: {story}"
  • Prompt 2 - Update of input knowledge graph
    • "Update the given semantic knowledge graph {graph} based on the given user story: {story}"
  • Prompt 3 - Planning assistant
    • "Answer the question about ontology: {story}, based on given excerpt: {context1 – ontology excerpt}"
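The prompt parametrization step (parametrizePrompt in the workflow) amounts to template filling; a minimal sketch, where the placeholder values are invented for illustration:

```python
# Template text follows Prompt 1 above; {context1} and {context2} receive the
# RRR-extracted ontology excerpt and the knowledge graph template, {story}
# receives the user input.
PROMPT1 = (
    "Create set of RDF triplets for semantic knowledge graph in XML "
    "with respect to given ontologies: {context1} and example graph "
    "{context2} based on user story: {story}"
)

def parametrize_prompt(template, **params):
    # Fill every placeholder in the template with the provided values.
    return template.format(**params)

prompt = parametrize_prompt(
    PROMPT1,
    context1="<ontology excerpt>",             # invented placeholder value
    context2="<knowledge graph template>",     # invented placeholder value
    story="Add new CNC machine with inventory id: CNC_1.",
)
```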

18 of 29

Deployment overview

  • Planning-related components and semantic triple store
    • Part of the Tasor semantic planner infrastructure, which is part of a commercial solution.
  • RRR approach models
    • deployable on a local server
    • the two language models used as bi-encoder and cross-encoder are not as large as text generation solutions.
  • LLM service unifying RRR with prompting against GPT-4o
    • local server, making use of the LangChain library and Ollama for deployment of the bi-encoder and cross-encoder models.
  • Result generation and prompting
    • GPT-4o, a commercial solution deployed on OpenAI's cloud infrastructure.

1-HTTP request containing user story; 2-Prompt to GPT-4o; 3-Response of GPT-4o; 4-Triplet insertion; 5-Planning output; 6-HTTP response (created triplets/plan).

19 of 29

Development environment

  • Python-based Flask API web service implementation
  • Execution environment
    • Laptop with Intel i5-10300H 2.50GHz CPU, 24GB of RAM and NVIDIA GTX1650 with 4GB of VRAM
  • LangChain
    • open-source framework that helps developers build applications relying on LLMs in a more effective and efficient manner, by providing a set of tools and abstractions.
  • Ollama
    • open-source platform that enables running and managing LLMs directly on a local machine
    • Simplifies the deployment of open-source LLMs, allowing terminal and API interaction without relying on cloud services.


20 of 29

API overview

  • load_ontology
    • Arguments: ontologyPath – path where the textual file of the ontology is stored.
    • Output: -
    • Description: Appends the content of the given file containing an RDF-format ontology to the overall text that will be processed by the Retrieve and Re-Rank method.
  • search
    • Arguments: query – prompt used as input for the Retrieve and Re-Rank process; constructed as a combination of a pre-defined template and direct input provided by the user.
    • Output: Context that will be further used as input to the LLM service.
    • Description: Constructs the context by combining the most relevant results (text chunks) returned by the Retrieve and Re-Rank process against the set of planning-relevant ontologies.
  • handle_question
    • Arguments: question – user-defined input.
    • Output: Textual response.
    • Description: Relies on the search method to get the context leveraged by the prompt that produces the final LLM-generated answer.
  • __init__
    • Arguments: model – desired LLM used for response generation (recommended: gemma2:9b and GPT-4o).
    • Output: -
    • Description: Constructor of the underlying class encapsulating the RRR process.
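An illustrative skeleton of the API above: the method names and signatures follow the table, while the bodies are simplified stand-ins (the real service runs the RRR process over chunked ontologies and prompts an LLM via LangChain/Ollama or GPT-4o):

```python
# Skeleton of the class behind the API table. The search body below is a
# keyword-match stand-in for the actual Retrieve and Re-Rank process, and
# handle_question returns the assembled prompt instead of calling an LLM.

class RRRService:
    def __init__(self, model="gemma2:9b"):
        self.model = model  # desired LLM for response generation
        self.text = ""      # combined textual content of loaded ontologies

    def load_ontology(self, ontology_path):
        # Append the RDF-format ontology file to the text processed by RRR.
        with open(ontology_path, encoding="utf-8") as f:
            self.text += f.read() + "\n"

    def search(self, query):
        # Stand-in for RRR: keep lines sharing a word with the query and
        # combine the top results into a context string.
        lines = [l for l in self.text.splitlines() if l.strip()]
        words = query.lower().split()
        relevant = [l for l in lines if any(w in l.lower() for w in words)]
        return "\n".join(relevant[:3])

    def handle_question(self, question):
        # Build the prompt from template + context (Prompt 3 shape); the
        # real implementation sends it to the configured LLM.
        context = self.search(question)
        return (f"Answer the question about ontology: {question}, "
                f"based on given excerpt: {context}")
```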

21 of 29

Experiments and evaluation

  • Two sets of user interaction experiments
    • Text-based Planner input: user description → knowledge base
    • Question answering about Planner ontologies
  • Three approach variants
    • A1 - simpler solution leveraging only relevant parts of ontology as context
    • A2 - incorporating example graph excerpts within the context
    • A1’ - no RAG-extracted context

22 of 29

Experiment1: Text-based planner input

Legend: Result - achieved accuracy; A1 - only relevant parts of ontology as context; A2 - with sample graph excerpts; A1' - no RAG-extracted context; Y - yes; N - no; WS - wrong syntax; RAG - retrieval of context.

  • Employee creation
    • Text: "Name of employee is Dusan Kostic. He has id 612. He is member of department project managers. He is member of production team and his position is mechanics designer."
    • Result: Classes - A1: 1/1, A2: 1/1; Properties - A1: 1/4, A2: 4/4; Hallucinated - A1: Y, A2: N
    • Execution time [s]: A1: 6.77, A2: 8.44; RAG - A1: 100, A2: 146
  • Order creation
    • Text: "We have a new order for C1 company and the product we want to produce is P1. The activity starts from 2025-06-01 and ends 2025-07-31."
    • Result: Classes - A1: WS, A2: 1/1; Properties - A2: 5/5; Hallucinated - A2: N
    • Execution time [s]: A1: 8.31, A2: 8.57; RAG - A1: 110, A2: 130
  • Machine creation
    • Text: "Add new CNC machine with inventory id: CNC_1."
    • Result: Classes - A1: 1/1, A2: -; Properties - A1: 1/1, A2: -; Hallucinated - A1: N, A2: -
    • Execution time [s]: A1: 6.61, A2: -; RAG - A1: 101, A2: 134
  • Activity flow definition
    • Text: "The production flow contains the following activities: cutting, assembling and packing."
    • Result: Classes - A1: WS, A2: 4/4; Properties - A2: 3/3; Hallucinated - A2: N
    • Execution time [s]: A1: 7.72, A2: 7.84; RAG - A1: 108, A2: 131
  • Employee update
    • Text: "Change the id of employee Dusan Kostic."
    • Result: Correct update - A1': Y, A2: Y; Hallucinated - A1': N, A2: N
    • Execution time [s]: A1': 3.1, A2: 6.13; RAG - A2: 101
  • Flow extension
    • Text: "Add new activity: preparation before cutting to given flow."
    • Result: Correct update - A1': Y, A2: Y; Hallucinated - A1': N, A2: N
    • Execution time [s]: A1': 5.61, A2: 7.88; RAG - A2: 115

23 of 29

Experiment1: Results’ accuracy estimation

  • The ratio of correctly identified RDF resources (classes, properties and individuals) is taken into account
  • Results are based on an average of 10 executions
  • Correctness of generated triplets was evaluated by human inspection of LLM outputs and their comparison to the triplets from the Tasor Planner.
  • Surplus RDF resources within the output are also considered
    • They do not exist in any of the production-related ontologies, being an outcome of the LLM's hallucination effect
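The accuracy measure described above can be sketched as follows; the resource names in the example are invented for illustration:

```python
# Accuracy = ratio of correctly identified RDF resources against the
# reference triplets; surplus resources absent from every production-related
# ontology are flagged as hallucinations.

def evaluate(generated, reference, ontology_vocabulary):
    correct = len(set(generated) & set(reference))
    accuracy = correct / len(reference)                  # e.g. "4/4" in the table
    surplus = set(generated) - set(ontology_vocabulary)  # hallucinated resources
    return accuracy, bool(surplus)

# Invented example: "hasColor" exists in no ontology, so it counts as surplus.
acc, hallucinated = evaluate(
    generated=["hasId", "hasName", "memberOf", "hasColor"],
    reference=["hasId", "hasName", "memberOf", "hasPosition"],
    ontology_vocabulary=["hasId", "hasName", "memberOf", "hasPosition"],
)
```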

24 of 29

Experiment1: Results discussion

  • The approach with example graph excerpts within the context (A2) clearly outperforms the solution leveraging only relevant parts of the ontology (A1).
  • Hallucinations are usually avoided, considering that the correct classes, relations and properties are present within the example graph.
  • A1 is ineffective and prone to hallucinations in the activity flow definition, as well as in other more complex scenarios.
  • A2 requires more processing time, since the RRR process is executed twice and the length of the prompt is also increased.
  • In the graph update scenario, even the simpler approach A1', which does not leverage RAG-extracted context, provides correct results.
  • A1' has the advantage of much shorter execution time (no RRR process invocation).

25 of 29

Experiment2: Questions about ontologies

  • Attributes list retrieval - Result: 4/4 in 90% of cases
    • "Which attributes are relevant for employee definition?" (prompt execution: 1.8 s)
    • "Which elements are relevant for composite activity?" (prompt execution: 1.9 s)
  • Related concepts retrieval - Result: 3/3 in 80% of cases
    • "Which concepts are related to order?" (prompt execution: 2.7 s)
  • Relations retrieval - Result: correct answer in 70% of cases
    • "How employee is related to order?" (prompt execution: 1.8 s)

  • Based on prompt 3: "Answer the question about ontology: {story}, based on given excerpt: {context1 – ontology excerpt}"

26 of 29

Experiment2: Results discussion

  • The best performance is achieved for attribute retrieval, while answering questions about relations is more challenging.
  • This can be explained by the fact that attribute retrieval and direct relation identification are simpler cases, considering the locality of the information extracted from the document and injected into the context.
  • However, relations in an ontology, especially indirect ones, rely on information that is usually not stored close together inside the document.
  • Conclusion: the proposed approach exhibits better results when the contextual information is stored within a continuous region of the document.

27 of 29

Conclusions and discussion

  • Experimental evidence of potential of Retrieval Augmented Generation (RAG) for handling larger textual inputs containing semantic data.
  • Symbolic and statistical paradigms of cognition should be considered complementary.
  • One possible practical implementation of synergy between symbolic and statistical paradigms of cognition is illustrated.

28 of 29

Future work

  • More detailed comparison of semantic-driven planning solutions against the more traditional ones.
  • Explore other strategies and additional steps required for adoption of smaller, locally deployable models
  • Focused LLM-based extraction of raw triplets with respect to simplified ontology representation.
  • More comprehensive evaluation of the proposed approach for other domains
  • Consider additional aspects, such as relationships and value constraints.
  • Improvements of individual components of the proposed RRR4ICL workflow:
    • “Semantics driven” recursive chunking instead of currently used “syntax driven”

29 of 29

Thank you!