1 of 32

Statnett-LLM: Talk2PowerSystem

Democratizing power system analytics

2 of 32

Presentation Agenda

CIM Background
Statnett and Graphwise
Talk2PowerSystem Project
Dissemination, Exploitation, Sustainability

3 of 32

CIM Background

Electricity was one of the first domains to adopt semantic technologies, and an early adopter of RDF

1990s: started at EPRI
2000-now: IEC standards
2010-now: ENTSO-E, CIM UG

IEC electrical CIM and ENTSO-E CGMES

Comprehensive UML models of the electrical enterprise
From which RDFS/OWL ontologies and SHACL shapes are derived.

4 of 32

CIM Background

Smart energy Grid Architecture Model (SGAM)
CIM is the foundation of managing the electrical enterprise

The electrical grid (Transmission System Operators): planning, building, maintaining, balancing
Distribution (Distribution System Operators): CRM, sales, work orders, maintenance, metering…
Markets: information, transparency, operations
Network Codes (EU Regulations): data model/exchange

Organizations developing CIM/CGMES

IEC TC57 CIM (WG13, WG14 and WG16)
UCA CIMug
ENTSO-E CIM WG

5 of 32

CIM Examples (1)

UML model

Abstract classes, eg IdentifiedObject, PowerSystemResource (PSR), ConductingEquipment
Concrete classes, eg Breaker, ACLineSegment
Terminals
ConnectivityNodes

Volumetrics (CIM+NC)

Over 900 classes
Over 5300 properties (3700 attributes, 750 pairs of inverse relations)
Constantly growing and evolving

6 of 32

CIM Examples (2)

Electrical diagram vs RDF instance diagram (EDF). Gray blobs: connectivity nodes; blue dots: terminals
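
The terminal and connectivity-node pattern in the diagram can be sketched in a few lines of Python. This is only an illustration, not CIM tooling: the equipment, terminal and node identifiers are made up, and the two dictionaries stand in for the cim:Terminal.ConductingEquipment and cim:Terminal.ConnectivityNode links.

```python
from collections import defaultdict

# Hypothetical instance data: each terminal belongs to one piece of equipment
# and is attached to exactly one connectivity node.
TERMINAL_EQUIPMENT = {"t1": "breaker1", "t2": "breaker1", "t3": "line1", "t4": "line1"}
TERMINAL_NODE = {"t1": "cn1", "t2": "cn2", "t3": "cn2", "t4": "cn3"}


def neighbours(equipment: str) -> set[str]:
    """Return the equipment that shares a connectivity node with `equipment`."""
    node_to_equipment = defaultdict(set)
    for terminal, node in TERMINAL_NODE.items():
        node_to_equipment[node].add(TERMINAL_EQUIPMENT[terminal])
    result = set()
    for terminal, owner in TERMINAL_EQUIPMENT.items():
        if owner == equipment:
            result |= node_to_equipment[TERMINAL_NODE[terminal]]
    result.discard(equipment)  # equipment is not its own neighbour
    return result


print(neighbours("breaker1"))  # {'line1'}
```

Equipment is never linked directly: every hop goes equipment, terminal, node, terminal, equipment, which is why path queries over CIM tend to be long.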

7 of 32

CIM Examples (3)

Electrical diagram vs conceptual RDF instance diagram (CIM Primer by EPRI)

8 of 32

Statnett

Statnett

Norwegian TSO
Leader in applying and extending CIM/CGMES
Have been using GraphDB for 6-7 years (first demo of GraphDB and CIM at CIM Users Group Spring 2013 Meeting, Ljubljana, Slovenia)
Tens of GraphDB repositories, hundreds of named graphs (CIM models), largest ones are over 2B triples

Statnett-LLM tender

Aug 2023: RFI started
Apr 2024: RFP submitted
Oct 2024: Negotiation and Best & Final Offer
Dec 2024: signed

9 of 32

Relevant Graphwise Capabilities

KG building: analysis, cleaning and harmonization, ontology reuse and engineering, semantic data models, semantic data integration through ETL and NoETL, semantic transformations and generation from declarative models, data loading and update scenarios, querying and analytics.
SHACL shapes and validation
On-premise and cloud deployment architectures (including Azure)
Automation (Docker, Kubernetes, Helm), monitoring (Kibana, Grafana).
Electrical ontologies (CIM, CGMES, SAREF4ENER, ENTSO-E market data)

Application in commercial projects (Statnett, Svenska Kraftnat, EDF, ENTSO-E, etc)

BIM models and ontologies (IFC, LBD ontologies, Brick Schema, RealEstateCore, ASHRAE 223p)

Application in commercial products (Schneider Electric and Johnson Controls BMS, semantic asset management through partnerships)
Research (ACCORD project on automated compliance checking)

ML for semantic NLP tasks, entity matching in commercial domains (eg companies, investments) and life sciences, graph embeddings and KG completion
LLM applications and demonstrators for semantic NLP tasks, ontology and KG building, querying, etc.

AI in Action series for blogs, experiments and technical contributions on these topics.
3 LLM Innovation Groups established: Build your KG, Link to your KG, Talk to your KG

Data Spaces: semantization and interoperation of data spaces, IDSA task forces, manufacturing data space, FAIR data.
Research proposal writing: over 60 EU research projects, participated in writing over 500 proposals

10 of 32

Graphwise Demonstrators

Transparency Energy Knowledge Graph (TEKG)

ENTSO-E Transparency data
Semantic Integration: power plant databases
Advanced validation with SHACL

Semantic bSDD

GraphQL querying
Schema diagrams
LLM over GraphQL

CIM Demo

CGMES data
CIM LLM NLQ over GraphQL
CIM Map with LLM

11 of 32

TEKG Advanced Validation

Even Statnett's EIC record has a mistake: it holds a GS1 GLN instead of a VAT number

12 of 32

TEKG Maps and Charts

Thanks to OpenStreetMap integration we can show detailed plant outlines and, in some cases, even the individual generation units

13 of 32

CIM Demo

Internal links: SPARQL, Platform, GraphQL, github: queries, schema, simplified schema
CGMES: turtle 1.56M (Equipment*+Geography*+model), SOML 260k (16x shorter); simplified 37k (42x shorter)

objects:
  AccumulatorReset:
    descr: This command reset the counter value to zero
    inherits: ControlInterface
    label: AccumulatorReset
    props:
      accumulatorReset.AccumulatorValue: {}
    type: cim:AccumulatorReset
  ControlInterface:
    descr: Abstract superclass of Control
    inherits: IdentifiedObjectInterface
    kind: abstract
    search: {nested: true}
    props:
      control.PowerSystemResource: {}
properties:
  accumulatorReset.AccumulatorValue:
    descr: The accumulator value that is reset by the command
    inverseOf: accumulatorValue.AccumulatorReset
    kind: object
    label: AccumulatorValue
    max: 1
    min: 1
    range: AccumulatorValue
    rdfProp: cim:AccumulatorReset.AccumulatorValue
  control.PowerSystemResource:
    descr: 'The controller outputs used to...'
    inverseOf: powerSystemResource.Controls
    kind: object
    label: PowerSystemResource
    max: inf
    min: 0
    range: PowerSystemResourceInterface
    rdfProp: cim:Control.PowerSystemResource

AccumulatorReset:
  ISA: ControlInterface
  accumulatorReset_AccumulatorValue: AccumulatorValue
ControlInterface:
  ISA: IdentifiedObjectInterface
  control_PowerSystemResource: [PowerSystemResourceInterface]
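
One way the simplified schema could be derived from the SOML form is sketched below. This is an assumption about the transformation, inferred only from the two listings on this slide: `inherits` becomes `ISA`, dots in property names become underscores, and a multi-valued range is written as a one-element list. The `soml` dictionary is a hand-copied subset, and `simplify` is a hypothetical helper.

```python
# Hypothetical, hand-copied subset of the SOML schema shown on this slide.
soml = {
    "objects": {
        "AccumulatorReset": {
            "inherits": "ControlInterface",
            "props": {"accumulatorReset.AccumulatorValue": {}},
        },
        "ControlInterface": {
            "inherits": "IdentifiedObjectInterface",
            "props": {"control.PowerSystemResource": {}},
        },
    },
    "properties": {
        "accumulatorReset.AccumulatorValue": {"max": 1, "range": "AccumulatorValue"},
        "control.PowerSystemResource": {"max": "inf", "range": "PowerSystemResourceInterface"},
    },
}


def simplify(schema: dict) -> dict:
    """Keep only the superclass and the property ranges of each object."""
    simple = {}
    for name, obj in schema["objects"].items():
        entry = {}
        if "inherits" in obj:
            entry["ISA"] = obj["inherits"]
        for prop in obj.get("props", {}):
            spec = schema["properties"][prop]
            # Anything other than max: 1 is treated as multi-valued (a list).
            target = spec["range"] if spec["max"] == 1 else [spec["range"]]
            entry[prop.replace(".", "_")] = target
        simple[name] = entry
    return simple


print(simplify(soml))
```

Dropping descriptions, labels and inverse declarations is what makes the result so much shorter, at the cost of giving the LLM less documentation per property.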

14 of 32

CIM GraphQL Querying

Much simpler than SPARQL; with shortened property names, queries become more natural and easier for an LLM to generate

query psrWithLocationPointsAndVoltage {
  aCLineSegment(where: { powerSystemResource_Location: {} }) {
    identifiedObject_name
    identifiedObject_description
    powerSystemResource_Location {
      location_PositionPoints(orderBy: { positionPoint_sequenceNumber: ASC }) {
        positionPoint_xPosition
        positionPoint_yPosition
        positionPoint_sequenceNumber
      }
    }
    conductingEquipment_BaseVoltage {
      baseVoltage_nominalVoltage
    }
  }
  substation(where: { powerSystemResource_Location: {} }) {
    identifiedObject_name
    powerSystemResource_Location {
      location_PositionPoints {
        positionPoint_xPosition
        positionPoint_yPosition
      }
    }
  }
}

query psrWithLocationPointsAndVoltage {
  aCLineSegment(where: { location: {} }) {
    name
    description
    location {
      positionPoints(orderBy: { sequenceNumber: ASC }) {
        xPosition
        yPosition
        sequenceNumber
      }
    }
    baseVoltage {
      nominalVoltage
    }
  }
  substation(where: { location: {} }) {
    name
    location {
      positionPoints {
        xPosition
        yPosition
      }
    }
  }
}
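
The renaming between the two query variants can be approximated with a small text rewrite. The sketch below is an assumption about the shortening rule (drop the class prefix before the underscore and lower-case the first letter of what remains); `shorten` is a hypothetical helper and is not part of any GraphQL tooling.

```python
import re

# Matches "<classPrefix>_<LocalName>" field names such as identifiedObject_name.
PREFIXED = re.compile(r"\b[a-z][A-Za-z0-9]*_([A-Za-z][A-Za-z0-9]*)\b")


def shorten(query: str) -> str:
    """Drop class prefixes from field names and lower-case the local name."""

    def repl(match: re.Match) -> str:
        local = match.group(1)
        return local[0].lower() + local[1:]

    return PREFIXED.sub(repl, query)


long_form = "powerSystemResource_Location { location_PositionPoints { positionPoint_xPosition } }"
print(shorten(long_form))  # location { positionPoints { xPosition } }
```

A real schema would need a collision check, since two properties from different classes can share a local name once the prefix is removed.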

15 of 32

CIM LLM Querying and Code Gen

LLM generated GraphQL query "PSR with location and voltage"
Generated Python script (using Folium library) to make interactive map
Needed plenty of guidance and dialog.

import json

import folium

# open JSON file
with open('locations-result.json', 'r') as file:
    locations_data = json.load(file)

# Initialize a map
map = folium.Map(location=[63, 13], zoom_start=5)

# Process and plot AC line segments
for line in locations_data["data"]["aCLineSegment"]:
    points = line["powerSystemResource_Location"]["location_PositionPoints"]
    voltage = line["conductingEquipment_BaseVoltage"]["baseVoltage_nominalVoltage"]
    tooltip = f'{line["identifiedObject_name"]}: {line["identifiedObject_description"]}'
    color = 'blue' if voltage <= 300 else 'red'
    folium.PolyLine(
        [(float(p["positionPoint_yPosition"]), float(p["positionPoint_xPosition"]))
         for p in points],
        color=color,
        tooltip=tooltip).add_to(map)

# Process and plot substations
for sub in locations_data["data"]["substation"]:
    point = sub["powerSystemResource_Location"]["location_PositionPoints"][0]
    tooltip = sub["identifiedObject_name"]
    folium.Marker(
        location=[float(point["positionPoint_yPosition"]), float(point["positionPoint_xPosition"])],
        icon=folium.Icon(color='green', icon='bolt', prefix='fa'),
        # icon_size=(10, 10), icon_color='green': causes no bubble, so the marker appears off-point
        tooltip=tooltip).add_to(map)

# Save the map to an HTML file
map.save('locations-show.html')

16 of 32

Talk2PowerSystem Project Overview

Aims to "democratise power system analytics" by:

Enabling Natural Language Querying (NLQ) over electrical KGs through SPARQL (and later GraphQL) generation
Related use cases such as:

integration with Statnett enterprise systems,
time-series processing,
code generation for power system analytics libraries.

Creating a comprehensive Q&A dataset for electricity

Original questions, multiplied by "variations" and "paraphrases", parameterized, instantiated from KG data
Will likely store it in RDF (e.g. by extending the QADO ontology).

LLM models:

Currently working with OpenAI models
But also plan to work with local LLMs, and perhaps train an Electricity-specific model.

17 of 32

Project Stats and Work Breakdown

Timeline: kickoff Feb 2025, finishes Apr 2026 (14 months).
Effort: 42 person-months Graphwise, 18 person-months Statnett
Monthly sprints, quarterly milestones

18 of 32

Project Requirements

Open LLM on Premise
Reducing Hallucinations
Open Software and IP
CIM Standards
BIM-CIM Integration
System Architecture
SPARQL and GraphQL
TRL7 Deployment
LLM Usage
Understandability of Results
CIM Navigation
Semantic Integration of non-CIM data (BIM, Time Series)
Integration in Enterprise IT Infrastructure
CIM and CGMES Versioning
Data Spaces, Writing Research Proposals
Joint R&D Publications

19 of 32

Project Tasks

Semantic database (GraphDB) and its LLM integrations
SPARQL generation
Extra indexes (eg Autocomplete, GeoSPARQL)
LLM Tools (eg Object Identification, Property Disambiguation, Value Disambiguation)
GraphQL access to KGs (Semantic Objects, Semantic Search)
CIM adjustments (eg attaching datatypes to literals)
KG building, including CIM RDFization, Conversion of BIM to CIM
Competency questions, i.e. creation of Q&A train/test/validate datasets (KGQA approaches)
Open (locally deployable) LLM fine tuning
LLM-based NLQ: Graph RAG, SPARQL generation, GraphQL generation
LLM-based analytics through source code generation (if needed)
NLQ provenance (links to source info) and validation
Machine Learning and ML ops
Software development (consumption APIs)
IT architecture and infrastructure (Docker, Kubernetes, Helm, Azure, software deployment)
Dissemination, Communication, Exploitation, Sustainability
Data Spaces strategies
Writing research proposals
CIM model visualization
Project management and client communication

20 of 32

Key Project Approaches

Reasoning
CIM Data Adjustments and Corrections
NLQ Approaches
KGQA Datasets
LLM-Based Analytics
LLM Agentic Frameworks / Autonomy

21 of 32

Reasoning Helps LLM

List all substations that are connected via an AC-line or a DC-line to substation named XYZ

Query is very complex, hard to generate

Add reasoning (OWL-RL-optimized)

Query becomes much simpler

PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
  {select distinct * {
    values ?sub1Name {"ARENDAL"}
    ?sub1 a cim:Substation;
      cim:IdentifiedObject.name ?sub1Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|
       cim:VoltageLevel.Bays)+ / # equipment in ?sub1
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
      cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment /
      cim:Equipment.EquipmentContainer ?line. # part of ?line
    ?line a cim:Line; cim:IdentifiedObject.name ?lineName}}
  {select distinct * {
    ?sub2 a cim:Substation;
      cim:IdentifiedObject.name ?sub2Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|
       cim:VoltageLevel.Bays)+ / # equipment in ?sub2
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
      cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment /
      cim:Equipment.EquipmentContainer ?line}}
  filter(?sub1 != ?sub2)
}

PREFIX cimex: <https://rawgit2.com/statnett/Talk2PowerSystem/main/demo1/cimex/>
PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
  values ?sub1Name {"ARENDAL"}
  ?sub1 a cim:Substation; cim:IdentifiedObject.name ?sub1Name;
    cimex:connectedThroughPart ?line.
  ?line a cim:Line; cim:IdentifiedObject.name ?lineName.
  ?sub2 a cim:Substation; cim:IdentifiedObject.name ?sub2Name;
    cimex:connectedThroughPart ?line.
  filter(?sub1 != ?sub2)
}

cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays → cimex:hasPart; inverse cimex:isPart
cim:Terminal.ConductingEquipment|cim:Terminal.AuxiliaryEquipment → cimex:Terminal.Equipment; inverse cimex:Equipment.Terminals
cimex:hasPart+ → cimex:hasPartTransitive; inverse cimex:isPartTransitive
cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment → cimex:connectedTo (symmetric)
cimex:hasPartTransitive / cimex:connectedTo / cimex:isPartTransitive → cimex:connectedThroughPart (symmetric)
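
To make the derivation rules above concrete, here is a minimal in-memory sketch that materializes them over a toy grid. It is not how GraphDB evaluates its rulesets: relations are plain sets of pairs, the helper names (`compose`, `inverse`, `transitive_closure`) are made up, and the substations, bays, breakers and line are hypothetical data.

```python
def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) is in r and (b, c) is in s."""
    by_source = {}
    for b, c in s:
        by_source.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r for c in by_source.get(b, ())}


def inverse(r):
    return {(b, a) for a, b in r}


def transitive_closure(r):
    closure = set(r)
    while True:
        extended = closure | compose(closure, r)
        if extended == closure:
            return closure
        closure = extended


# Hypothetical toy grid: two substations joined by one line.
has_part = {("subA", "vlA"), ("vlA", "bayA"), ("bayA", "brkA"),
            ("subB", "vlB"), ("vlB", "brkB"), ("line1", "seg1")}
equipment_terminal = {("brkA", "t1"), ("seg1", "t2"), ("seg1", "t3"), ("brkB", "t4")}
terminal_node = {("t1", "n1"), ("t2", "n1"), ("t3", "n2"), ("t4", "n2")}
substations, lines = {"subA", "subB"}, {"line1"}

# hasPart+ -> hasPartTransitive
has_part_transitive = transitive_closure(has_part)
# Terminals / ConnectivityNode / Terminals / ConductingEquipment -> connectedTo
connected_to = compose(compose(equipment_terminal, terminal_node),
                       compose(inverse(terminal_node), inverse(equipment_terminal)))
# hasPartTransitive / connectedTo / isPartTransitive -> connectedThroughPart (symmetric)
through = compose(compose(has_part_transitive, connected_to), inverse(has_part_transitive))
connected_through_part = through | inverse(through)


def connected_substations(sub1):
    """Mirror of the simplified query: (line, other substation) pairs."""
    return {(line, sub2)
            for s, line in connected_through_part if s == sub1 and line in lines
            for sub2, l2 in connected_through_part
            if l2 == line and sub2 in substations and sub2 != sub1}


print(connected_substations("subA"))  # {('line1', 'subB')}
```

Once the derived relation is precomputed, the lookup needs only two joins and a type check, which is the shape an LLM has to generate in the simplified query.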

22 of 32

NLQ Approaches

RAG and Graph RAG to "ground" the LLM with local information and reduce hallucinations
Named Entity Recognition (NER) and Linking (NEL) over the question to find "anchor" KG entities.

Pertains to:

CGMES/CIM nomenclatures (eg Nuclear maps to a particular code in the assetType code list),
Some Entity types (e.g. locations, transmission lines, substations).

Modern NEL approaches (eg our own CEEL) use ML and LLM for:

Entity disambiguation
Matching synonyms that are not in the KG but can be resolved due to LLM "world knowledge"

Fine-tuning to optimize the quality of LLM on particular types of questions.

Depends on having a comprehensive KGQA dataset.

Evaluation: can't improve what you can't measure
Semantic parsing: understanding the intent and structure of a question

Intermediate forms (e.g. S-Expressions, Controlled Natural Language such as Sparklis or Squall) reduce the gap between natural and query language

Different target query languages: Elastic, GraphQL, SPARQL, with their pros and cons.
How to present the KG schema to LLM.

Successful experiments in this direction: ontologies, SOML schemas, Elastic object models.

Specific query hints in the prompt (eg structure of a WHERE clause)
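
The entity linking step listed above (finding "anchor" KG entities in the question) can be illustrated with plain fuzzy string matching. This is a deliberately naive stand-in, not CEEL or any ML-based linker: the label index and the `urn:example:` IRIs are hypothetical, and it uses only `difflib` from the Python standard library.

```python
import difflib
import re

# Hypothetical label index; in practice this would be built from KG names.
LABELS = {
    "arendal": "urn:example:substation/ARENDAL",
    "kristiansand": "urn:example:substation/KRISTIANSAND",
}


def link_entities(question: str, cutoff: float = 0.8) -> dict[str, str]:
    """Map question tokens to KG IRIs using fuzzy string similarity."""
    anchors = {}
    for token in re.findall(r"\w+", question.lower()):
        match = difflib.get_close_matches(token, list(LABELS), n=1, cutoff=cutoff)
        if match:
            anchors[token] = LABELS[match[0]]
    return anchors


print(link_entities("Which lines connect to Arendall substation?"))
```

String similarity tolerates misspellings but cannot resolve synonyms or ambiguous names, which is where ML and LLM-based linkers come in.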

23 of 32

NLQ Research

Zotero bibliography "Ontotext LLM":

Comprehensive research on these topics, especially KGQA datasets.

Ongoing project task to keep abreast of research.

24 of 32

KGQA Datasets

Key question: how to gather a diverse and large enough Competency Question dataset that can be used for both training (fine tuning) and evaluation.

Eg CrunchQA: 250 templates, 30k question instances, SPARQL
Eg InsuranceBot: 63 templates, SPARQL

Approach:

Seed with competency questions to be provided by Statnett:

query logs,
expert contributions.

Scale it up, by making multiple similar question templates, or adding completely new question types.

Use the "Overnight approach" or "QueryBridge" approach
Extract graph patterns of limited complexity from KG schema
Apply operations on certain nodes (e.g. fix it to a selected entity, take a count or other aggregation, etc),
Instantiate question templates by populating parameters (selected KG entities).
Extract answers from database

Correct response consists of:

Question translated to Intermediate Representation (eg S-Expressions, Controlled Natural Language)
Further translated to query in GraphQL or SPARQL
Factual answer from executing the query

Use Paraphrasing to multiply the number of question surface forms (asking the same thing with different words)
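
The instantiation and paraphrasing steps above can be sketched as a cross product of paraphrases and parameter values. The template, the parameter name `sub` and the substation names are hypothetical; a real pipeline would draw the values from the KG and run each query to record its answer.

```python
from itertools import product

# Hypothetical template: several paraphrases share one parameterized SPARQL query.
TEMPLATE = {
    "paraphrases": [
        "Which lines connect to substation {sub}?",
        "List the lines attached to the {sub} substation.",
    ],
    "sparql": 'select ?line {{ ?s cim:IdentifiedObject.name "{sub}"; '
              "cimex:connectedThroughPart ?line }}",
}


def instantiate(template: dict, values: list[str]) -> list[dict]:
    """Cross paraphrases with parameter values to get question/query pairs."""
    return [
        {"question": text.format(sub=v), "sparql": template["sparql"].format(sub=v)}
        for text, v in product(template["paraphrases"], values)
    ]


dataset = instantiate(TEMPLATE, ["ARENDAL", "KRISTIANSAND"])
for item in dataset:
    print(item["question"])
```

Two paraphrases and two parameter values give four question/query pairs, so a modest number of templates can grow into a much larger dataset.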

25 of 32

CrunchQA Example

question: NLQ with parameter markers in brackets

Several paraphrases

sparql: query translated to specific KG schema

Notice addition of RDF types
Inverse paths
Property path to access characteristic of a nomenclature

intermediate representation

Used as input to query generation
Recorded as comments in the query
1 main chain
2 constraint chains (one fixes the country code, another filters by "employee count" attribute)

params:

Parameters to populate the query
With optional types to select by rank, etc

output:

Output vars
Those plus params are in the select clause
So the same query can be used to find valid combinations of params
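
The point about putting both output variables and parameters in the select clause can be illustrated as follows. This is a sketch of the idea only, not CrunchQA's actual tooling: `build_query`, the `ex:` prefix and the variable names are all hypothetical.

```python
def build_query(pattern, output_vars, params, bindings=None):
    """Put outputs and params in SELECT; optionally pin params with VALUES."""
    variables = " ".join(f"?{v}" for v in [*output_vars, *params])
    values = ""
    if bindings:
        values = " ".join(f'values ?{p} {{"{val}"}}' for p, val in bindings.items()) + " "
    return f"select {variables} {{ {values}{pattern} }}"


pattern = "?company ex:country ?country"

# Without bindings: enumerates valid parameter combinations present in the KG.
print(build_query(pattern, ["company"], ["country"]))

# With bindings: answers one concrete question instance.
print(build_query(pattern, ["company"], ["country"], {"country": "NO"}))
```

Running the unbound form first yields parameter values that are guaranteed to have answers, which avoids generating questions with empty results.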

26 of 32

LLM-Based Analytics

Using LLM not just to generate queries, but also to do analytics

Generate Python code for charts and maps (see CIM LLM Maps)
Generate Vega-Lite charts and maps (e.g. Visualization Generation with Large Language Models: An Evaluation, arXiv 2401.11255, Jan 2024)

Teach LLM about specialized power-system analytics libraries

Ingest library and API documentation to learn methods and signatures of specialized libraries
Prompt LLM to use them, rather than trying to recreate from scratch.
Combine with CIM SPARQL to provide input
Mesh CIM SPARQL results into method arguments as CIM-based objects
Organize local execution environment for LLM code interpretation (can't use GPT on Statnett data)
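
A local execution environment for generated code can start as simply as running it in a separate Python process on the local machine. The sketch below assumes that approach; `run_generated_code` is a hypothetical helper, and a subprocess with a timeout is not a security sandbox, so real deployments would add isolation (containers, restricted users, no network).

```python
import subprocess
import sys


def run_generated_code(code: str, timeout_s: float = 10.0) -> str:
    """Run LLM-generated Python in a separate local process and return its stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,
        timeout=timeout_s,    # raises subprocess.TimeoutExpired on runaway code
        check=True,           # raises subprocess.CalledProcessError on failure
    )
    return completed.stdout


print(run_generated_code("print(1 + 1)"))
```

Because the code runs locally, query results and instance data never have to be sent to an external code interpreter.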

27 of 32

Open LLM on Premise

We fully understand the limitation: production data cannot leave Statnett (Azure tenant or physical premises)

Prefer to work with a realistic architecture from the very beginning, to reach TRL 7
Will use Open (hosted on-premise) LLMs for all operations/queries on Statnett instance data

Can use ChatGPT for operations that don't involve Statnett data

Ontologies and other kinds of schemas
Code lists that are part of CGMES (open)
Paraphrasing questions to build up the Q&A dataset
Translating questions to an intermediate representation, GraphQL or SPARQL query
Generating Q&A datasets that can be used for fine-tuning of less capable LLMs

Open LLMs are constantly improving

Meta: Llama 3
Mistral AI: Mistral and Mixtral
Google: Gemma
China: GLM, DeepSeek, Qwen
UAE: Falcon

28 of 32

Reducing Hallucinations

KGs are key for avoiding hallucinations

Use fine-tuning to optimize LLM response. The KGQA dataset that we plan to build (grounded in the CIM schema and CIM data) will play a key role here.
Provide local "grounding" information to the LLM that is relevant to the question at hand
Generate queries to the CIM/CGMES KG (SPARQL or GraphQL), ensuring retrieved information is true
Provide URLs of grounding information, ask the LLM to cite its sources in the final answer.

Grounding information can be provided in different ways:
Retrieval Augmented Generation (RAG): Based on text similarity

GraphDB has a ChatGPT Retrieval Connector to index KG entities and a Talk to Your Graph feature (Q&A box).
We have experience with a variety of vector stores, including Weaviate, Elasticsearch etc.

Graph RAG: based on graph structure. Retrieves potentially relevant KG entities

using NER/NEL,
RAG similarity,
selection by type,
selecting the graph neighborhood of already selected entities.
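
The last Graph RAG step, selecting the graph neighborhood of already selected entities, can be sketched as a bounded breadth-first search over triples. The triples and the `neighbourhood` helper are hypothetical, and edges are followed in both directions as a simplifying assumption.

```python
from collections import deque


def neighbourhood(triples, seeds, hops=1):
    """Return the triples within `hops` steps of the seed entities (undirected)."""
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((s, p, o))
        adjacency.setdefault(o, []).append((s, p, o))
    seen, selected = set(seeds), set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for s, p, o in adjacency.get(node, []):
            selected.add((s, p, o))
            for nxt in (s, o):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return selected


# Hypothetical toy triples.
triples = [
    ("subA", "hasPart", "vlA"),
    ("vlA", "hasPart", "brkA"),
    ("brkA", "connectedTo", "seg1"),
]
print(neighbourhood(triples, ["subA"], hops=1))  # {('subA', 'hasPart', 'vlA')}
```

The hop limit is the main tuning knob: a larger neighborhood gives the LLM more grounding facts but also uses more of the prompt.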

29 of 32

SPARQL and GraphQL

Our solution is firmly based on SPARQL and/or GraphQL

Procedural graph traversal languages are harder to generate and are not needed for this project.
If specialized navigation/analytics is needed, we can use GraphDB's Graph Path Search

SPARQL or GraphQL? Each has benefits and shortcomings, so we'll research LLM use of both

Pro: SPARQL is powerful and can fetch and calculate any data retrievable from the repository.
Pro: SPARQL supports flexible constructs such as wildcard properties and property paths.
Con: SPARQL is complex, and fetching deeply nested sub-objects is not easy (use UNION clauses, avoid Cartesian products)
Con: ontologies and SHACL shapes are voluminous and not easy for an LLM to "understand" and remember. SOML and LinkML are simpler
Pro: GraphQL is easier to write and has a regular query structure
Pro: GraphQL excels at fetching nested sub-objects. Not possible to fetch N child objects per each master with one SPARQL query (sparql-dev #100 JOIN LATERAL or Correlated Subquery)
Pro: ONTO Semantic Objects takes care of sub-objects and other optimizations (during "transpiling" GraphQL to SPARQL)
Neutral: GraphQL supports a WHERE language for basic and nested object filtering. Any specialized construct requires a SPARQL Template wrapped into a GraphQL field name.
Con: Semantic Objects does not support aggregation queries. Semantic Search supports aggregation, but only by indexing the RDF data in Elastic

We may use both for different use cases

30 of 32

Data Spaces, Writing Research Proposals

HORIZON-CL5-2026-02-D3-19: Innovation solutions for a generative AI-powered digital spine of the EU energy system

Availability of generative Artificial Intelligence (AI) tools for electricity system operators, energy service providers, and households and energy communities to enhance digital and green transformation in energy, mobility, and buildings;
Implementation of decentralised IT solutions based on generative AI to support local grid optimisation, thereby increasing the uptake of renewable energy sources, electric vehicles, and electrification of household and industrial demand at the distribution level;
Increased reliability, resilience, security, and energy efficiency of the energy system through advanced AI and digital tools;
Enhanced knowledge for modernising and operating energy networks, integrating digital services, renewables, and electrification through the use of cutting-edge AI technologies;
Development of smarter demand-side tools for industries and consumers, leveraging AI to optimise energy production and consumption.

Data Spaces will be a major driver for AI and data science in EU. Huge EU investments in:

Sectoral data spaces, including 5+1 Energy Data Spaces and Digital Twin of the EU Grid
Associations (e.g. IDSA, BDVA, Gaia-X, DSSC, DSBA, Catena-X)
Software (e.g. SIMPL: 165M EUR)

Graphwise has experience with:

Dataspace calls, proposals, projects and initiatives
Project Underpin: manufacturing data space (refinery, wind farms). We are tech coordinator
Dataspaces semantic interoperability
Dataspace software and standards

Writing research proposals (e.g. see right)
We are very glad that Statnett is taking an interest in these initiatives

31 of 32

Communication, Dissemination, Exploitation

Important activities to ensure the impact and sustainability of the development efforts:

Communication

Write blog posts, attend trade events

Dissemination

Write joint research papers, attend scientific conferences

Exploitation

Work with Statnett application owners and end-users to make the project results more usable and reusable in day-to-day work

Sustainability and Proliferation

Work with other electrical enterprises to attract them for joint exploitation and development

Research Proposals

Research appropriate funding calls, build appropriate consortia, write research proposals.
Also work towards securing GPU funding for joint training/fine-tuning of an electrical-CIM specific LLM.

32 of 32

THANK YOU FOR YOUR TIME!