1 of 29

Advanced Planning and Scheduling with Semantic Knowledge Graphs and LLMs

Nenad Petrovic, Milorad Tosic

University of Niš, Faculty of Electronic Engineering, Niš, Serbia

2 of 29

Objectives

  • Provide experimental evidence about possible directions toward more reliable, reasonable and responsible AI

  • Improve usability of the existing solution for enterprise planning & scheduling

  • Explore the value added by SKGs & LLMs in the enterprise

3 of 29

Background: APS in Manufacturing

  • Manufacturing operations in modern digital enterprise are becoming increasingly complex
  • Advanced Planning and Scheduling (APS)
    • Any computer program that uses advanced mathematical algorithms or logic to perform optimization or simulation on finite capacity scheduling
    • Increasingly important in the challenging Industry 4.0 setting
  • Recent advances in Artificial Intelligence (AI) offer a promising approach to improve business value generated by APS.

4 of 29

Background: AI application layers challenge



6 of 29

Approach

  • Leverage the ontology-driven APS solution
  • Adopt Large Language Models (LLMs) to reduce APS costs by starting from textual descriptions
  • End-users do not necessarily need specific expert knowledge
  • In this way, the high potential of adopting ontologies for the implementation of cognitive intelligence is demonstrated.

Figure: intersection of LLM, ontologies and APS.

7 of 29

Proposed solution

  • Different variations of context/prompt engineering techniques considered
    • Focus on the so-called In-Context Learning (ICL) prompting strategy, where
      • ontologies and
      • KG segments as examples are embedded within a prompt.
    • The main advantage of the approach is that pre-trained general LLMs can perform new tasks without requiring additional fine-tuning, which is a time- and resource-intensive procedure.

8 of 29

Proposed solution

  • Direct embedding of the textual serialization of planning ontologies in the context is not practical due to the large volume of data they contain:
    • huge costs resulting from token consumption
    • hallucinations when the context is arbitrarily truncated.
  • An additional strategy is needed for preprocessing the textual content of the target domain ontologies.
  • The Retrieval Augmented Generation (RAG) method is adopted, particularly the Retrieve and Re-Rank (RRR) process, when the content of the set of ontologies is larger than the context window of commonly used LLM solutions.

9 of 29

Implementation: Architecture

  • The user describes what is ordered, together with related resources (such as machines and employees), as freeform text.
    • The user-provided text {story} is leveraged as input to RRR

1,5 - User input {story}; 2a - RDF schema; 2b - RDF knowledge graph; 3a - Ontology chunks; 3b - Knowledge graph excerpt chunks; 4a - Context: ontology excerpt {context1}; 4b - Context: knowledge graph template {context2}; 6 - Intermediary result; 7 - Triplets; 8 - Order definition; 9 - Generated work plan.

10 of 29

Implementation: Architecture

  • As a result of the RRR, the most relevant excerpts of the ontology collection, based on the user description, are extracted (4a - context1)


11 of 29

Implementation: Architecture

  • A representative excerpt of the knowledge graph is taken into account in order to ensure that the LLM will hit the right terms without hallucinations (4b - context2)


12 of 29

Retrieve and Re-Rank for In-Context Learning (RRR4ICL) – Workflow overview

  • Preprocessing
    • transform input textual files into a format suitable for the next phases (such as splitting a document into chunks).
  • Retrieve
    • Initial retrieval of a broad set of potentially relevant parts of text (chunks).
  • Re-Rank
    • Re-ranking to isolate the most relevant subset of chunks

13 of 29

RRR4ICL components - RecursiveSplitter

  • The first phase is document chunking
  • An existing library is adopted for the creation of document chunks:

https://python.langchain.com/api_reference/text_splitters/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html

  • “Syntax driven” chunking
    • Splits by separators hierarchy;
    • Keeps natural units together;
    • Merges smaller chunks;
    • Recursive splitting for large units;
    • Final chunks are of varying length
  • Parameters (based on empirical estimation)
    • chunk_size - 600 characters: maximum number of characters or tokens allowed in a single chunk
    • chunk_overlap - up to 200 characters: number of characters or tokens shared between consecutive chunks.
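The "syntax driven" strategy described above can be illustrated with a minimal, self-contained sketch (this illustrates the idea only, not the actual RecursiveCharacterTextSplitter implementation; chunk overlap is omitted for brevity):

```python
# Minimal sketch of "syntax driven" recursive chunking: try separators in
# hierarchy order (paragraph > line > word > character), keep natural units
# together, merge smaller units, and split oversized units recursively.

SEPARATORS = ["\n\n", "\n", " ", ""]

def recursive_split(text, chunk_size=600, separators=SEPARATORS):
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut at chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # merge small units into one chunk
        else:
            if current:
                chunks.append(current)
            if len(part) > chunk_size:
                # Unit still too large: recurse with the next separator.
                chunks.extend(recursive_split(part, chunk_size, rest))
                current = ""
            else:
                current = part
    if current:
        chunks.append(current)
    return chunks
```

Final chunks are of varying length, but never exceed chunk_size.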

14 of 29

RRR4ICL components – SentenceTransformer

  • The second phase begins with semantic search using the Bi-Encoder model
  • Given a search query, a large set of potentially relevant text segments (chunks) is retrieved using dense retrieval
  • A SentenceTransformer-based Bi-Encoder is used.
  • However, this method can sometimes return results that are only loosely related to the query.

15 of 29

RRR4ICL components – CrossEncoder

  • The third phase: the CrossEncoder evaluates and scores the relevance of each candidate segment more precisely.
  • The final output is a ranked list of results optimized for relevance.
  • The dense retrieval approach leverages semantic search, mapping both the query and documents into a shared vector space to retrieve the closest matches.
  • This method surpasses traditional lexical search by recognizing synonyms, acronyms, and semantically similar terms.
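The shared vector space idea can be illustrated with cosine similarity over toy vectors (the 3-dimensional "embeddings" and their labels below are invented for illustration; a real bi-encoder produces much higher-dimensional vectors):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors in the shared embedding space.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings (invented): semantically similar texts land close together
# even when they share no words, which is why dense retrieval recognizes
# synonyms and acronyms that lexical search would miss.
query     = [0.9, 0.1, 0.0]  # "CNC machine"
synonym   = [0.8, 0.2, 0.1]  # "milling equipment" - no shared words with query
unrelated = [0.0, 0.1, 0.9]  # "employee department"

assert cosine(query, synonym) > cosine(query, unrelated)
```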

16 of 29

RRR4ICL – Steps

  • First, sentences and paragraphs are converted into 384-dimensional dense vectors (multi-qa-MiniLM-L6-cos-v1 model).
  • Then, a cross-encoder is applied for re-ranking (ms-marco-MiniLM-L-6-v2 model).

Input: query, document

Steps:

  1. Split the document into chunks
  2. Bi-Encoder rapidly computes similarity scores (ranging from 0 to 1) between the input query and all document chunks.
  3. Select the top k chunks (where k = 32 in our case) based on the similarity scores.
  4. Apply Cross-Encoder:
    1. Each of the top k chunks, along with the original query, is passed through the Cross-Encoder to refine the ranking.
    2. Top m results (where m = 3 in our case) are selected as the final context to be used for prompt execution

Output 1: context - top m chunks combined

  1. prompt:=parametrizePrompt(query, context)
  2. answer:=executePrompt(prompt)

Output 2: answer – response generated by LLM

17 of 29

RRR4ICL – Prompt templates used

  • Prompt 1 - From text to planner inputs
    • "Create set of RDF triplets for semantic knowledge graph in XML with respect to given ontologies: {context1 – ontology excerpt} and example graph {context2 – knowledge graph template} based on user story: {story}"
  • Prompt 2 - Update of input knowledge graph
    • "Update the given semantic knowledge graph {graph} based on the given user story: {story}"
  • Prompt 3 - Planning assistant
    • "Answer the question about ontology: {story}, based on given excerpt: {context1 – ontology excerpt}"
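The prompt parametrization step (parametrizePrompt in the workflow) amounts to template filling; a minimal sketch, where the placeholder values are invented for illustration:

```python
# Template text follows Prompt 1 above; {context1} and {context2} receive the
# RRR-extracted ontology excerpt and the knowledge graph template, {story}
# receives the user input.
PROMPT1 = (
    "Create set of RDF triplets for semantic knowledge graph in XML "
    "with respect to given ontologies: {context1} and example graph "
    "{context2} based on user story: {story}"
)

def parametrize_prompt(template, **params):
    # Fill every placeholder in the template with the provided values.
    return template.format(**params)

prompt = parametrize_prompt(
    PROMPT1,
    context1="<ontology excerpt>",             # invented placeholder value
    context2="<knowledge graph template>",     # invented placeholder value
    story="Add new CNC machine with inventory id: CNC_1.",
)
```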

18 of 29

Deployment overview

  • Planning-related components and semantic triple store
    • Part of the Tasor semantic planner infrastructure, which is part of a commercial solution.
  • RRR approach models
    • deployable on a local server
    • the two language models used as bi-encoder and cross-encoder are not as large as text generation solutions.
  • LLM service unifying RRR with prompting against GPT-4o
    • local server, making use of the LangChain library and Ollama for deployment of the bi-encoder and cross-encoder models.
  • Result generation and prompting
    • GPT-4o, a commercial solution deployed on OpenAI's cloud infrastructure.

1-HTTP request containing user story; 2-Prompt to GPT-4o; 3-Response of GPT-4o; 4-Triplet insertion; 5-Planning output; 6-HTTP response (created triplets/plan).

19 of 29

Development environment

  • Python-based Flask API web service implementation
  • Execution environment
    • Laptop with Intel i5-10300H 2.50GHz CPU, 24GB of RAM and NVIDIA GTX1650 with 4GB of VRAM
  • LangChain
    • open-source framework that helps developers build applications relying on LLMs in a more effective and efficient manner, by providing a set of tools and abstractions.
  • Ollama
    • open-source platform that enables running and managing LLMs directly on a local machine
    • Simplifies the deployment of open-source LLMs, allowing terminal and API interaction without relying on cloud services.


20 of 29

API overview

  • load_ontology
    • Arguments: ontologyPath – path where the textual file of the ontology is stored.
    • Output: -
    • Description: Appends the content of the given file containing an RDF-format ontology to the overall text that will be processed by the Retrieve and Re-Rank method.
  • search
    • Arguments: query – prompt used as input for the Retrieve and Re-Rank process; constructed as a combination of a pre-defined template and direct input provided by the user.
    • Output: Context that will be further used as input to the LLM service.
    • Description: Constructs the context by combining the most relevant results (text chunks) returned by the Retrieve and Re-Rank process against the set of planning-relevant ontologies.
  • handle_question
    • Arguments: question – user-defined input.
    • Output: Textual response.
    • Description: Relies on the search method to get the context leveraged by the prompt that produces the final LLM-generated answer.
  • __init__
    • Arguments: model – desired LLM used for response generation (recommended: gemma2:9b and GPT-4o).
    • Output: -
    • Description: Constructor of the underlying class encapsulating the RRR process.
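An illustrative skeleton of the API above: the method names and signatures follow the table, while the bodies are simplified stand-ins (the real service runs the RRR process over chunked ontologies and prompts an LLM via LangChain/Ollama or GPT-4o):

```python
# Skeleton of the class behind the API table. The search body below is a
# keyword-match stand-in for the actual Retrieve and Re-Rank process, and
# handle_question returns the assembled prompt instead of calling an LLM.

class RRRService:
    def __init__(self, model="gemma2:9b"):
        self.model = model  # desired LLM for response generation
        self.text = ""      # combined textual content of loaded ontologies

    def load_ontology(self, ontology_path):
        # Append the RDF-format ontology file to the text processed by RRR.
        with open(ontology_path, encoding="utf-8") as f:
            self.text += f.read() + "\n"

    def search(self, query):
        # Stand-in for RRR: keep lines sharing a word with the query and
        # combine the top results into a context string.
        lines = [l for l in self.text.splitlines() if l.strip()]
        words = query.lower().split()
        relevant = [l for l in lines if any(w in l.lower() for w in words)]
        return "\n".join(relevant[:3])

    def handle_question(self, question):
        # Build the prompt from template + context (Prompt 3 shape); the
        # real implementation sends it to the configured LLM.
        context = self.search(question)
        return (f"Answer the question about ontology: {question}, "
                f"based on given excerpt: {context}")
```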

21 of 29

Experiments and evaluation

  • Two sets of user interaction experiments
    • Text-based Planner input: user description → knowledge base
    • Question answering about Planner ontologies
  • Three approach variants
    • A1 - simpler solution leveraging only relevant parts of ontology as context
    • A2 - incorporating example graph excerpts within the context
    • A1’ - no RAG-extracted context

22 of 29

Experiment1: Text-based planner input

Legend: Result - achieved accuracy; A1 - only relevant parts of ontology as context; A2 - with sample graph excerpts; A1' - no RAG-extracted context; Y - yes; N - no; WS - wrong syntax; RAG - retrieval of context.

  • Employee creation
    • Text: "Name of employee is Dusan Kostic. He has id 612. He is member of department project managers. He is member of production team and his position is mechanics designer."
    • Result: Classes - A1: 1/1, A2: 1/1; Properties - A1: 1/4, A2: 4/4; Hallucinated - A1: Y, A2: N
    • Execution time [s]: A1: 6.77, A2: 8.44; RAG - A1: 100, A2: 146
  • Order creation
    • Text: "We have a new order for C1 company and the product we want to produce is P1. The activity starts from 2025-06-01 and ends 2025-07-31."
    • Result: Classes - A1: WS, A2: 1/1; Properties - A2: 5/5; Hallucinated - A2: N
    • Execution time [s]: A1: 8.31, A2: 8.57; RAG - A1: 110, A2: 130
  • Machine creation
    • Text: "Add new CNC machine with inventory id: CNC_1."
    • Result: Classes - A1: 1/1, A2: -; Properties - A1: 1/1, A2: -; Hallucinated - A1: N, A2: -
    • Execution time [s]: A1: 6.61, A2: -; RAG - A1: 101, A2: 134
  • Activity flow definition
    • Text: "The production flow contains the following activities: cutting, assembling and packing."
    • Result: Classes - A1: WS, A2: 4/4; Properties - A2: 3/3; Hallucinated - A2: N
    • Execution time [s]: A1: 7.72, A2: 7.84; RAG - A1: 108, A2: 131
  • Employee update
    • Text: "Change the id of employee Dusan Kostic."
    • Result: Correct update - A1': Y, A2: Y; Hallucinated - A1': N, A2: N
    • Execution time [s]: A1': 3.1, A2: 6.13; RAG - A2: 101
  • Flow extension
    • Text: "Add new activity: preparation before cutting to given flow."
    • Result: Correct update - A1': Y, A2: Y; Hallucinated - A1': N, A2: N
    • Execution time [s]: A1': 5.61, A2: 7.88; RAG - A2: 115

23 of 29

Experiment1: Results’ accuracy estimation

  • The ratio of correctly identified RDF resources (classes, properties and individuals) is taken into account
  • Results are based on an average of 10 executions
  • Correctness of generated triplets was evaluated by human inspection of LLM outputs and their comparison to the triplets from the Tasor Planner.
  • Surplus RDF resources within the output are also considered
    • They do not exist in any of the production-related ontologies, being an outcome of the LLM's hallucination effect
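The accuracy measure described above can be sketched as follows; the resource names in the example are invented for illustration:

```python
# Accuracy = ratio of correctly identified RDF resources against the
# reference triplets; surplus resources absent from every production-related
# ontology are flagged as hallucinations.

def evaluate(generated, reference, ontology_vocabulary):
    correct = len(set(generated) & set(reference))
    accuracy = correct / len(reference)                  # e.g. "4/4" in the table
    surplus = set(generated) - set(ontology_vocabulary)  # hallucinated resources
    return accuracy, bool(surplus)

# Invented example: "hasColor" exists in no ontology, so it counts as surplus.
acc, hallucinated = evaluate(
    generated=["hasId", "hasName", "memberOf", "hasColor"],
    reference=["hasId", "hasName", "memberOf", "hasPosition"],
    ontology_vocabulary=["hasId", "hasName", "memberOf", "hasPosition"],
)
```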

24 of 29

Experiment1: Results discussion

  • The approach with example graph excerpts within the context (A2) clearly outperforms the solution leveraging only relevant parts of the ontology (A1).
  • Hallucinations are usually avoided, considering that the correct classes, relations and properties are present within the example graph.
  • A1 is ineffective and prone to hallucinations in the activity flow definition, as well as in other more complex scenarios.
  • A2 requires more processing time, since the RRR process is executed twice and the length of the prompt is also increased.
  • In the graph update scenario, even the simpler approach A1', which does not leverage RAG-extracted context, provides correct results.
  • A1' has the advantage of much shorter execution time (no RRR process invocation).

25 of 29

Experiment2: Questions about ontologies

  • Attributes list retrieval - Result: 4/4 in 90% of cases
    • "Which attributes are relevant for employee definition?" (prompt execution: 1.8 s)
    • "Which elements are relevant for composite activity?" (prompt execution: 1.9 s)
  • Related concepts retrieval - Result: 3/3 in 80% of cases
    • "Which concepts are related to order?" (prompt execution: 2.7 s)
  • Relations retrieval - Result: correct answer in 70% of cases
    • "How employee is related to order?" (prompt execution: 1.8 s)

  • Based on prompt 3: "Answer the question about ontology: {story}, based on given excerpt: {context1 – ontology excerpt}"

26 of 29

Experiment2: Results discussion

  • The best performance is achieved for attribute retrieval, while answering questions about relations is more challenging.
  • This can be explained by the fact that attribute retrieval and direct relation identification are simpler cases, considering the locality of the information extracted from the document and injected into the context.
  • However, relations in an ontology, especially indirect ones, rely on information that is usually not stored close together inside the document.
  • Conclusion: the proposed approach exhibits better results when the contextual information is stored within a continuous region of the document.

27 of 29

Conclusions and discussion

  • Experimental evidence of potential of Retrieval Augmented Generation (RAG) for handling larger textual inputs containing semantic data.
  • Symbolic and statistical paradigms of cognition should be considered complementary.
  • One possible practical implementation of synergy between symbolic and statistical paradigms of cognition is illustrated.

28 of 29

Future work

  • More detailed comparison of semantic-driven planning solutions against the more traditional ones.
  • Explore other strategies and additional steps required for adoption of smaller, locally deployable models
  • Focused LLM-based extraction of raw triplets with respect to simplified ontology representation.
  • More comprehensive evaluation of the proposed approach for other domains
  • Consider additional aspects, such as relationships and value constraints.
  • Improvements of individual components of the proposed RRR4ICL workflow:
    • “Semantics driven” recursive chunking instead of currently used “syntax driven”

29 of 29

Thank you!