1 of 32

Statnett-LLM: Talk2PowerSystem

Democratizing power system analytics

2 of 32

Presentation Agenda

  • CIM Background
  • Statnett and Graphwise
  • Talk2PowerSystem Project
  • Dissemination, Exploitation, Sustainability

3 of 32

CIM Background

  • Electricity was one of the first domains to adopt semantic technologies, and an early adopter of RDF
    • 1990s: started at EPRI
    • 2000-now: IEC standards
    • 2010-now: ENTSO-E, CIM UG
  • IEC electrical CIM and ENTSO-E CGMES
    • Comprehensive UML models of the electrical enterprise
    • RDFS/OWL ontologies and SHACL shapes are derived from these models

4 of 32

CIM Background

  • Smart Grid Architecture Model (SGAM)
  • CIM is the foundation for managing the electrical enterprise
    • The electrical grid (Transmission System Operators): planning, building, maintaining, balancing
    • Distribution (Distribution System Operators): CRM, sales, work orders, maintenance, metering…
    • Markets: information, transparency, operations
    • Network Codes (EU Regulations): data model/exchange
  • Organizations developing CIM/CGMES
    • IEC TC57 CIM (WG13, WG14 and WG16)
    • UCA CIMug
    • ENTSO-E CIM WG

5 of 32

CIM Examples (1)

  • UML model
    • Abstract classes, e.g. IdentifiedObject, PowerSystemResource (PSR), ConductingEquipment
    • Concrete classes, e.g. Breaker, ACLineSegment
    • Terminals
    • ConnectivityNodes
  • Volumetrics (CIM+NC)
    • Over 900 classes
    • Over 5300 properties (3700 attributes, 750 pairs of inverse relations)
    • Constantly growing and evolving
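To make the Terminal/ConnectivityNode pattern concrete, here is a minimal sketch in plain Python. It stores a tiny, hypothetical CIM fragment as (subject, predicate, object) triples and finds the equipment electrically adjacent to a breaker. The identifiers (br1, line1, t1, cn1, ...) are made up for illustration; only the predicate names follow CIM.

```python
# Hypothetical toy topology: a breaker and an AC line segment share ConnectivityNode cn1.
TRIPLES = {
    ("br1", "rdf:type", "cim:Breaker"),
    ("line1", "rdf:type", "cim:ACLineSegment"),
    ("t1", "cim:Terminal.ConductingEquipment", "br1"),
    ("t2", "cim:Terminal.ConductingEquipment", "br1"),
    ("t3", "cim:Terminal.ConductingEquipment", "line1"),
    ("t1", "cim:Terminal.ConnectivityNode", "cn1"),
    ("t2", "cim:Terminal.ConnectivityNode", "cn2"),
    ("t3", "cim:Terminal.ConnectivityNode", "cn1"),
}


def objects(s, p):
    """All objects o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in TRIPLES if s2 == s and p2 == p}


def subjects(p, o):
    """All subjects s such that (s, p, o) is in the graph."""
    return {s for (s, p2, o2) in TRIPLES if p2 == p and o2 == o}


def connected_equipment(eq):
    """Equipment sharing a ConnectivityNode with eq: Equipment <- Terminal -> Node <- Terminal -> Equipment."""
    result = set()
    for term in subjects("cim:Terminal.ConductingEquipment", eq):
        for node in objects(term, "cim:Terminal.ConnectivityNode"):
            for other_term in subjects("cim:Terminal.ConnectivityNode", node):
                result |= objects(other_term, "cim:Terminal.ConductingEquipment")
    result.discard(eq)  # an element is not "connected" to itself
    return result


print(connected_equipment("br1"))  # {'line1'}
```

Equipment never links to other equipment directly; every connection goes through Terminals and a shared ConnectivityNode, which is why topology queries over CIM need long property paths.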

6 of 32

CIM Examples (2)

Electrical diagram vs RDF instance diagram (EDF). Gray blobs: connectivity nodes; blue dots: terminals

7 of 32

CIM Examples (3)

Electrical diagram vs conceptual RDF instance diagram (CIM Primer by EPRI)

8 of 32

Statnett

  • Statnett
    • Norwegian TSO
    • Leader in applying and extending CIM/CGMES
    • Has been using GraphDB for 6-7 years (first demo of GraphDB and CIM at CIM Users Group Spring 2013 Meeting, Ljubljana, Slovenia)
    • Tens of GraphDB repositories, hundreds of named graphs (CIM models), largest ones are over 2B triples
  • Statnett-LLM tender
    • Aug 2023: RFI started
    • Apr 2024: RFP submitted
    • Oct 2024: Negotiation and Best & Final Offer
    • Dec 2024: signed

9 of 32

Relevant Graphwise Capabilities

  • KG building: analysis, cleaning and harmonization, ontology reuse and engineering, semantic data models, semantic data integration through ETL and NoETL, semantic transformations and generation from declarative models, data loading and update scenarios, querying and analytics.
  • SHACL shapes and validation
  • On-premise and cloud deployment architectures (including Azure)
  • Automation (Docker, Kubernetes, Helm), monitoring (Kibana, Grafana).
  • Electrical ontologies (CIM, CGMES, SAREF4ENER, ENTSO-E market data)
    • Application in commercial projects (Statnett, Svenska Kraftnat, EDF, ENTSO-E, etc)
  • BIM models and ontologies (IFC, LBD ontologies, Brick Schema, RealEstateCore, ASHRAE 223p)
    • Application in commercial products (Schneider Electric and Johnson Controls BMS, semantic asset management through partnerships)
    • Research (ACCORD project on automated compliance checking)
  • ML for semantic NLP tasks, entity matching in commercial domains (eg companies, investments) and life sciences, graph embeddings and KG completion
  • LLM applications and demonstrators for semantic NLP tasks, ontology and KG building, querying, etc.
    • AI in Action series for blogs, experiments and technical contributions on these topics.
    • 3 LLM Innovation Groups established: Build your KG, Link to your KG, Talk to your KG
  • Data Spaces: semantization and interoperation of data spaces, IDSA task forces, manufacturing data space, FAIR data.
  • Research proposal writing: over 60 EU research projects, participated in writing over 500 proposals

10 of 32

Graphwise Demonstrators

  • Transparency Energy Knowledge Graph (TEKG)
    • ENTSO-E Transparency data
    • Semantic integration: power plant databases
    • Advanced validation with SHACL
  • Semantic bSDD
    • GraphQL querying
    • Schema diagrams
    • LLM over GraphQL
  • CIM Demo
    • CGMES data
    • CIM LLM NLQ over GraphQL
    • CIM Map with LLM

11 of 32

TEKG Advanced Validation

Even Statnett's EIC record has a mistake: a GS1 GLN in place of the VAT code
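A minimal sketch of the kind of check behind this finding, written in Python rather than SHACL for brevity. It assumes a GS1 GLN is 13 digits ending in a GS1 mod-10 check digit, and it uses the Norwegian VAT format ("NO" + 9 digits + "MVA") as an illustrative assumption; the actual TEKG shapes may test different constraints. The function names are hypothetical.

```python
import re

NO_VAT = re.compile(r"^NO\d{9}MVA$")  # assumed Norwegian VAT format, for illustration only
GLN = re.compile(r"^\d{13}$")         # GS1 Global Location Number: 13 digits


def gs1_check_digit_ok(code: str) -> bool:
    """GS1 mod-10 check: weights 3, 1, 3, ... starting from the rightmost payload digit."""
    digits = [int(c) for c in code]
    payload, check = digits[:-1], digits[-1]
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(payload)))
    return (10 - total % 10) % 10 == check


def classify_vat_field(value: str) -> str:
    """Flag a VAT field whose content is actually a GLN (hypothetical helper)."""
    v = value.replace(" ", "")
    if NO_VAT.match(v):
        return "ok"
    if GLN.match(v) and gs1_check_digit_ok(v):
        return "violation: looks like a GS1 GLN, not a VAT code"
    return "violation: unrecognized format"


# A sample 13-digit number with a valid GS1 check digit
print(classify_vat_field("4006381333931"))  # violation: looks like a GS1 GLN, not a VAT code
```

Recognizing *which* wrong identifier was entered, not just that the value is invalid, is what makes such a validation message actionable.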

12 of 32

TEKG Maps and Charts

Thanks to OpenStreetMap integration, we can show detailed plant outlines and, in some cases, even the individual generation units

13 of 32

CIM Demo

  • Internal links: SPARQL, Platform, GraphQL, github: queries, schema, simplified schema
  • CGMES: Turtle 1.56M (Equipment*+Geography*+model), SOML 260k (6x shorter); simplified 37k (42x shorter)

SOML schema (excerpt):

objects:
  AccumulatorReset:
    descr: This command reset the counter value to zero
    inherits: ControlInterface
    label: AccumulatorReset
    props:
      accumulatorReset.AccumulatorValue: {}
    type: cim:AccumulatorReset
  ControlInterface:
    descr: Abstract superclass of Control
    inherits: IdentifiedObjectInterface
    kind: abstract
    search: {nested: true}
    props:
      control.PowerSystemResource: {}
properties:
  accumulatorReset.AccumulatorValue:
    descr: The accumulator value that is reset by the command
    inverseOf: accumulatorValue.AccumulatorReset
    kind: object
    label: AccumulatorValue
    max: 1
    min: 1
    range: AccumulatorValue
    rdfProp: cim:AccumulatorReset.AccumulatorValue
  control.PowerSystemResource:
    descr: 'The controller outputs used to...'
    inverseOf: powerSystemResource.Controls
    kind: object
    label: PowerSystemResource
    max: inf
    min: 0
    range: PowerSystemResourceInterface
    rdfProp: cim:Control.PowerSystemResource

Simplified schema (excerpt):

AccumulatorReset:
  ISA: ControlInterface
  accumulatorReset_AccumulatorValue: AccumulatorValue
ControlInterface:
  ISA: IdentifiedObjectInterface
  control_PowerSystemResource: [PowerSystemResourceInterface]
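The simplified schema can be derived mechanically from SOML. A minimal sketch, assuming the SOML has already been parsed into Python dicts (e.g. by a YAML parser, which reads `max: inf` as the string "inf"). The helper name `simplify` is hypothetical, and it only covers the fields used in the excerpt (inherits, props, range, max).

```python
def simplify(soml: dict) -> dict:
    """Turn SOML 'objects' + 'properties' into the compact 'ISA + prop: range' form."""
    props = soml["properties"]
    simplified = {}
    for obj_name, obj in soml["objects"].items():
        entry = {}
        if "inherits" in obj:
            entry["ISA"] = obj["inherits"]
        for prop_name in obj.get("props", {}):
            prop = props[prop_name]
            rng = prop["range"]
            # Assumption: a missing 'max' means single-valued; otherwise show the range as a list
            if prop.get("max", 1) != 1:
                rng = [rng]
            # 'accumulatorReset.AccumulatorValue' -> 'accumulatorReset_AccumulatorValue'
            entry[prop_name.replace(".", "_")] = rng
        simplified[obj_name] = entry
    return simplified


soml = {
    "objects": {
        "AccumulatorReset": {"inherits": "ControlInterface",
                             "props": {"accumulatorReset.AccumulatorValue": {}}},
        "ControlInterface": {"inherits": "IdentifiedObjectInterface", "kind": "abstract",
                             "props": {"control.PowerSystemResource": {}}},
    },
    "properties": {
        "accumulatorReset.AccumulatorValue": {"range": "AccumulatorValue", "min": 1, "max": 1},
        "control.PowerSystemResource": {"range": "PowerSystemResourceInterface",
                                        "min": 0, "max": "inf"},
    },
}
print(simplify(soml))
```

Dropping descriptions, labels, inverses and RDF mappings is what shrinks the schema so much; the LLM keeps only class hierarchy, property names and ranges.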

14 of 32

CIM GraphQL Querying

Much simpler than SPARQL; shortening the property names makes queries more natural and easier for an LLM to generate

# Full (class-prefixed) property names
query psrWithLocationPointsAndVoltage {
  aCLineSegment(where: { powerSystemResource_Location: {} }) {
    identifiedObject_name
    identifiedObject_description
    powerSystemResource_Location {
      location_PositionPoints(orderBy: { positionPoint_sequenceNumber: ASC }) {
        positionPoint_xPosition
        positionPoint_yPosition
        positionPoint_sequenceNumber
      }
    }
    conductingEquipment_BaseVoltage {
      baseVoltage_nominalVoltage
    }
  }
  substation(where: { powerSystemResource_Location: {} }) {
    identifiedObject_name
    powerSystemResource_Location {
      location_PositionPoints {
        positionPoint_xPosition
        positionPoint_yPosition
      }
    }
  }
}

# Shortened property names
query psrWithLocationPointsAndVoltage {
  aCLineSegment(where: { location: {} }) {
    name
    description
    location {
      positionPoints(orderBy: { sequenceNumber: ASC }) {
        xPosition
        yPosition
        sequenceNumber
      }
    }
    baseVoltage {
      nominalVoltage
    }
  }
  substation(where: { location: {} }) {
    name
    location {
      positionPoints {
        xPosition
        yPosition
      }
    }
  }
}
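The renaming between the two queries follows a simple pattern: drop the class prefix before the first underscore and lower-case the first letter of the rest. A minimal sketch of that rule. The function names are hypothetical, and a real schema would also have to detect and resolve the name clashes that shortening can create.

```python
def short_name(field: str) -> str:
    """'identifiedObject_name' -> 'name', 'location_PositionPoints' -> 'positionPoints'."""
    _, sep, rest = field.partition("_")
    if not sep:
        return field  # no class prefix, e.g. 'aCLineSegment'
    return rest[:1].lower() + rest[1:]


def clashes(fields):
    """Short names produced by more than one long name (these need manual disambiguation)."""
    seen = {}
    for f in fields:
        seen.setdefault(short_name(f), set()).add(f)
    return {s: longs for s, longs in seen.items() if len(longs) > 1}


for f in ["identifiedObject_name", "powerSystemResource_Location",
          "location_PositionPoints", "positionPoint_sequenceNumber",
          "conductingEquipment_BaseVoltage", "baseVoltage_nominalVoltage"]:
    print(f, "->", short_name(f))
```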

15 of 32

CIM LLM Querying and Code Gen

  • The LLM generated the GraphQL query "PSR with location and voltage"
  • It also generated a Python script (using the Folium library) to make an interactive map
  • Both needed plenty of guidance and dialog.

import json

import folium

# Load the GraphQL result (response of the "PSR with location and voltage" query)
with open('locations-result.json', 'r') as file:
    locations_data = json.load(file)

# Initialize a map centred on Scandinavia (named m to avoid shadowing the built-in map)
m = folium.Map(location=[63, 13], zoom_start=5)

# Process and plot AC line segments; xPosition is longitude, yPosition is latitude
for line in locations_data["data"]["aCLineSegment"]:
    points = line["powerSystemResource_Location"]["location_PositionPoints"]
    voltage = float(line["conductingEquipment_BaseVoltage"]["baseVoltage_nominalVoltage"])
    tooltip = f'{line["identifiedObject_name"]}: {line["identifiedObject_description"]}'
    color = 'blue' if voltage <= 300 else 'red'
    folium.PolyLine(
        [(float(p["positionPoint_yPosition"]), float(p["positionPoint_xPosition"]))
         for p in points],
        color=color,
        tooltip=tooltip).add_to(m)

# Process and plot substations (first position point only)
for sub in locations_data["data"]["substation"]:
    point = sub["powerSystemResource_Location"]["location_PositionPoints"][0]
    tooltip = sub["identifiedObject_name"]
    folium.Marker(
        location=[float(point["positionPoint_yPosition"]), float(point["positionPoint_xPosition"])],
        icon=folium.Icon(color='green', icon='bolt', prefix='fa'),
        # icon_size=(10, 10), icon_color='green': causes no bubble, so the marker appears off-point
        tooltip=tooltip).add_to(m)

# Save the map to an HTML file
m.save('locations-show.html')

16 of 32

Talk2PowerSystem Project Overview

  • Aims to "democratise power system analytics" by:
    • Enabling Natural Language Querying (NLQ) over electrical KGs through SPARQL (and later GraphQL) generation
    • Related use cases such as:
      • integration with Statnett enterprise systems,
      • time-series processing,
      • code generation for power system analytics libraries.
  • Creating a comprehensive Q&A dataset for electricity
    • Original questions, multiplied by "variations" and "paraphrases", parameterized and instantiated from KG data
    • Will likely store it in RDF (e.g. by extending the QADO ontology).
  • LLM models:
    • Currently working with OpenAI models
    • Also plan to work with local LLMs, and perhaps train an electricity-specific model.

17 of 32

Project Stats and Work Breakdown

  • Timeline: kickoff Feb 2025, finishes Apr 2026 (14 months).
  • Effort: 42 person-months Graphwise, 18 person-months Statnett
  • Monthly sprints, quarterly milestones

18 of 32

Project Requirements

  • Open LLM on Premise
  • Reducing Hallucinations
  • Open Software and IP
  • CIM Standards
  • BIM-CIM Integration
  • System Architecture
  • SPARQL and GraphQL
  • TRL7 Deployment
  • LLM Usage
  • Understandability of Results
  • CIM Navigation
  • Semantic Integration of non-CIM data (BIM, Time Series)
  • Integration in Enterprise IT Infrastructure
  • CIM and CGMES Versioning
  • Data Spaces, Writing Research Proposals
  • Joint R&D Publications

19 of 32

Project Tasks

  • Semantic database (GraphDB) and its LLM integrations
  • SPARQL generation
  • Extra indexes (e.g. Autocomplete, GeoSPARQL)
  • LLM Tools (e.g. Object Identification, Property Disambiguation, Value Disambiguation)
  • GraphQL access to KGs (Semantic Objects, Semantic Search)
  • CIM adjustments (e.g. attaching datatypes to literals)
  • KG building, including CIM RDFization, Conversion of BIM to CIM
  • Competency questions, i.e. creation of Q&A train/test/validate datasets (KGQA approaches)
  • Open (locally deployable) LLM fine-tuning
  • LLM-based NLQ: Graph RAG, SPARQL generation, GraphQL generation
  • LLM-based analytics through source code generation (if needed)
  • NLQ provenance (links to source info) and validation
  • Machine Learning and ML ops
  • Software development (consumption APIs)
  • IT architecture and infrastructure (Docker, Kubernetes, Helm, Azure, software deployment)
  • Dissemination, Communication, Exploitation, Sustainability
  • Data Spaces strategies
  • Writing research proposals
  • CIM model visualization
  • Project management and client communication

20 of 32

Key Project Approaches

  • Reasoning
  • CIM Data Adjustments and Corrections
  • NLQ Approaches
  • KGQA Datasets
  • LLM-Based Analytics
  • LLM Agentic Frameworks / Autonomy

21 of 32

Reasoning Helps LLM

  • List all substations that are connected via an AC line or a DC line to the substation named XYZ

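Without reasoning, the query is very complex and hard to generate.

Adding reasoning (OWL-RL-optimized) makes the query much simpler.

Both versions follow: first without reasoning, then with reasoning.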

PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
  {select distinct * {
    values ?sub1Name {"ARENDAL"}
    ?sub1 a cim:Substation;
      cim:IdentifiedObject.name ?sub1Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|
       cim:VoltageLevel.Bays)+ / # equipment in ?sub1
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
      cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment /
      cim:Equipment.EquipmentContainer ?line. # part of ?line
    ?line a cim:Line; cim:IdentifiedObject.name ?lineName}}
  {select distinct * {
    ?sub2 a cim:Substation;
      cim:IdentifiedObject.name ?sub2Name;
      (cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|
       cim:VoltageLevel.Bays)+ / # equipment in ?sub2
      cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode /
      cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment /
      cim:Equipment.EquipmentContainer ?line}}
  filter(?sub1 != ?sub2)
}

PREFIX cimex: <https://rawgit2.com/statnett/Talk2PowerSystem/main/demo1/cimex/>
PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
select ?sub1Name ?lineName ?sub2Name {
  values ?sub1Name {"ARENDAL"}
  ?sub1 a cim:Substation; cim:IdentifiedObject.name ?sub1Name;
    cimex:connectedThroughPart ?line.
  ?line a cim:Line; cim:IdentifiedObject.name ?lineName.
  ?sub2 a cim:Substation; cim:IdentifiedObject.name ?sub2Name;
    cimex:connectedThroughPart ?line.
  filter(?sub1 != ?sub2)
}

  • cim:EquipmentContainer.Equipments|cim:Substation.VoltageLevels|cim:VoltageLevel.Bays → cimex:hasPart; inverse cimex:isPart
  • cim:Terminal.ConductingEquipment|cim:Terminal.AuxiliaryEquipment → cimex:Terminal.Equipment; inverse cimex:Equipment.Terminals
  • cimex:hasPart+ → cimex:hasPartTransitive; inverse cimex:isPartTransitive
  • cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment → cimex:connectedTo (symmetric)
  • cimex:hasPartTransitive / cimex:connectedTo / cimex:isPartTransitive → cimex:connectedThroughPart (symmetric)
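The effect of these rules can be illustrated by materializing them in plain Python over a tiny, hypothetical topology: one line with a segment connecting two substations. In the project the inference is done by the GraphDB reasoner, not by code like this; all entity names are made up, and only the containment and connectivity rules are covered.

```python
from itertools import product

# Hypothetical toy data: ARENDAL -br1-(cn1)- seg1 -(cn2)-br2- OTHER; seg1 is part of line1
TRIPLES = {
    ("ARENDAL", "rdf:type", "cim:Substation"),
    ("OTHER", "rdf:type", "cim:Substation"),
    ("line1", "rdf:type", "cim:Line"),
    ("ARENDAL", "cim:Substation.VoltageLevels", "vl1"),
    ("vl1", "cim:VoltageLevel.Bays", "bay1"),
    ("bay1", "cim:EquipmentContainer.Equipments", "br1"),
    ("OTHER", "cim:EquipmentContainer.Equipments", "br2"),
    ("line1", "cim:EquipmentContainer.Equipments", "seg1"),
    ("br1", "cim:ConductingEquipment.Terminals", "t1"),
    ("seg1", "cim:ConductingEquipment.Terminals", "t2"),
    ("seg1", "cim:ConductingEquipment.Terminals", "t3"),
    ("br2", "cim:ConductingEquipment.Terminals", "t4"),
    ("t1", "cim:Terminal.ConnectivityNode", "cn1"),
    ("t2", "cim:Terminal.ConnectivityNode", "cn1"),
    ("t3", "cim:Terminal.ConnectivityNode", "cn2"),
    ("t4", "cim:Terminal.ConnectivityNode", "cn2"),
}
PART_PROPS = {"cim:EquipmentContainer.Equipments", "cim:Substation.VoltageLevels",
              "cim:VoltageLevel.Bays"}


def of_type(t):
    return {s for s, p, o in TRIPLES if p == "rdf:type" and o == t}


# cimex:hasPart = union of the three containment properties
has_part = {(s, o) for s, p, o in TRIPLES if p in PART_PROPS}

# cimex:hasPartTransitive = hasPart+ (naive fixpoint iteration)
has_part_t = set(has_part)
while True:
    new = {(a, d) for a, b in has_part_t for c, d in has_part if b == c} - has_part_t
    if not new:
        break
    has_part_t |= new

# cimex:connectedTo: Equipment -> Terminal -> ConnectivityNode <- Terminal <- Equipment (symmetric)
terminals = {(s, o) for s, p, o in TRIPLES if p == "cim:ConductingEquipment.Terminals"}
node_of = {s: o for s, p, o in TRIPLES if p == "cim:Terminal.ConnectivityNode"}
connected_to = {(e1, e2) for (e1, t1), (e2, t2) in product(terminals, repeat=2)
                if node_of[t1] == node_of[t2] and e1 != e2}

# cimex:connectedThroughPart = hasPartTransitive / connectedTo / isPartTransitive
connected_through_part = {(x, y)
                          for x, e1 in has_part_t
                          for a, e2 in connected_to if a == e1
                          for y, e in has_part_t if e == e2}

# The simplified query: substations connected to ARENDAL through a common Line
lines, subs = of_type("cim:Line"), of_type("cim:Substation")
answers = {("ARENDAL", line, sub2)
           for x, line in connected_through_part if x == "ARENDAL" and line in lines
           for sub2, l2 in connected_through_part if l2 == line and sub2 in subs and sub2 != "ARENDAL"}
print(answers)  # {('ARENDAL', 'line1', 'OTHER')}
```

Once the derived properties are materialized, the LLM only has to produce two triple patterns per substation instead of the long property paths.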

22 of 32

NLQ Approaches

  • RAG and Graph RAG to "ground" the LLM with local information and reduce hallucinations
  • Named Entity Recognition (NER) and Linking (NEL) over the question to find "anchor" KG entities.
    • Pertains to:
      • CGMES/CIM nomenclatures (eg Nuclear maps to a particular code in the assetType code list),
      • Some Entity types (e.g. locations, transmission lines, substations).
    • Modern NEL approaches (eg our own CEEL) use ML and LLM for:
      • Entity disambiguation
      • Matching synonyms that are not in the KG but can be resolved due to LLM "world knowledge"
  • Fine-tuning to optimize LLM answer quality on particular question types.
    • Depends on having a comprehensive KGQA dataset.
  • Evaluation: can't improve what you can't measure
  • Semantic parsing: understanding the intent and structure of a question
    • Intermediate forms (e.g. S-Expressions, Controlled Natural Language such as Sparklis or Squall) reduce the gap between natural and query language
  • Different target query languages: Elastic, GraphQL, SPARQL, with their pros and cons.
  • How to present the KG schema to the LLM.
    • Successful experiments in this direction: ontologies, SOML schemas, Elastic object models.
  • Specific query hints in the prompt (e.g. the structure of a WHERE clause)
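As a baseline for the NER/NEL step, anchor entities can be found by fuzzy-matching question words against KG labels plus a synonym table. This is a minimal sketch using Python's difflib, not our CEEL linker; the labels, synonyms and IRIs are hypothetical, and an LLM-based linker would replace the hand-written synonym table with its own "world knowledge".

```python
import difflib
import re

# Hypothetical label index: lower-cased surface form -> KG IRI
LABELS = {
    "arendal": "urn:example:substation/ARENDAL",
    "kristiansand": "urn:example:substation/KRISTIANSAND",
    "nuclear": "urn:example:assetType/nuclear",
}
# Hypothetical synonym table (the part an LLM could resolve from world knowledge)
SYNONYMS = {"atomic": "nuclear"}


def link_entities(question: str, cutoff: float = 0.8) -> dict:
    """Return {question token: IRI} for tokens that fuzzily match a known label."""
    links = {}
    for token in re.findall(r"\w+", question.lower()):
        candidate = SYNONYMS.get(token, token)
        match = difflib.get_close_matches(candidate, LABELS, n=1, cutoff=cutoff)
        if match:
            links[token] = LABELS[match[0]]
    return links


# The misspelling "Arendall" still resolves, thanks to the similarity cutoff
print(link_entities("Which lines connect Arendall to Kristiansand?"))
```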

23 of 32

NLQ Research

Zotero bibliography "Ontotext LLM":

Comprehensive research on these topics, especially KGQA datasets.

Ongoing project task to keep abreast of research.

24 of 32

KGQA Datasets

Key question: how to gather a diverse and large enough Competency Question dataset that can be used for both training (fine tuning) and evaluation.

  • E.g. CrunchQA: 250 templates, 30k question instances, SPARQL
  • E.g. InsuranceBot: 63 templates, SPARQL

Approach:

  • Seed with competency questions to be provided by Statnett:
    • query logs,
    • expert contributions.
  • Scale it up by creating multiple similar question templates, or adding completely new question types.
    • Use the "Overnight approach" or "QueryBridge" approach
    • Extract graph patterns of limited complexity from KG schema
    • Apply operations on certain nodes (e.g. fix a node to a selected entity, take a count or other aggregation, etc.)
    • Instantiate question templates by populating parameters (selected KG entities).
    • Extract answers from the database
  • Correct response consists of:
    • Question translated to Intermediate Representation (eg S-Expressions, Controlled Natural Language)
    • Further translated to query in GraphQL or SPARQL
    • Factual answer from executing the query
  • Use Paraphrasing to multiply the number of question surface forms (asking the same thing with different words)
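The template-instantiation step can be sketched as follows. The question template, its paraphrases, the SPARQL template and the sample substation names are all hypothetical; in the real pipeline the parameter values would be selected from the KG and the answers obtained by executing each instantiated query. The [param] markers follow the bracket convention used in CrunchQA.

```python
from string import Template

# Hypothetical template: NL question with [param] markers, paraphrases, and a SPARQL template
QUESTION = "Which lines are connected to substation [substation]?"
PARAPHRASES = [
    "List the lines connected to [substation].",
    "What lines connect to the [substation] substation?",
]
SPARQL = Template("""PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
PREFIX cimex: <https://rawgit2.com/statnett/Talk2PowerSystem/main/demo1/cimex/>
select ?lineName {
  ?sub a cim:Substation; cim:IdentifiedObject.name "$substation";
       cimex:connectedThroughPart ?line.
  ?line a cim:Line; cim:IdentifiedObject.name ?lineName.
}""")


def instantiate(values):
    """Yield (question, sparql) pairs: every surface form x every parameter value."""
    for value in values:
        query = SPARQL.substitute(substation=value)  # no escaping: a sketch, not injection-safe
        for form in [QUESTION, *PARAPHRASES]:
            yield form.replace("[substation]", value), query


# In practice these values would be selected from the KG (hypothetical sample here)
pairs = list(instantiate(["ARENDAL", "KRISTIANSAND"]))
print(len(pairs))   # 6
print(pairs[0][0])  # Which lines are connected to substation ARENDAL?
```

The dataset size grows multiplicatively (templates x paraphrases x parameter values), which is how a modest seed of competency questions can be scaled up.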

25 of 32

CrunchQA Example

  • question: NLQ with parameter markers in brackets
    • Several paraphrases
  • sparql: query translated to specific KG schema
    • Notice addition of RDF types
    • Inverse paths
    • Property path to access characteristic of a nomenclature
  • intermediate representation
    • Used as input to query generation
    • Recorded as comments in the query
    • 1 main chain
    • 2 constraint chains (one fixes the country code, another filters by "employee count" attribute)
  • params:
    • Parameters to populate the query
    • With optional types to select by rank, etc
  • output:
    • Output vars
    • Those plus params are in the select clause
    • So the same query can be used to find valid combinations of params
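The last point can be sketched as follows. Because the parameters are selected together with the output variables, one run of the query with unbound parameters returns every parameter combination that has an answer, and grouping the rows yields ready-made (params → answers) test cases. The variable names and rows below are hypothetical stand-ins for SPARQL results.

```python
from collections import defaultdict

PARAMS = ("countryCode", "category")  # hypothetical parameter variables
OUTPUTS = ("company",)                 # hypothetical output variable

# Hypothetical rows from running the query with the params unbound (params + outputs selected)
rows = [
    {"countryCode": "NO", "category": "Energy", "company": "A"},
    {"countryCode": "NO", "category": "Energy", "company": "B"},
    {"countryCode": "SE", "category": "Software", "company": "C"},
]


def valid_combinations(rows):
    """Map each parameter combination that has answers to its set of output tuples."""
    cases = defaultdict(set)
    for row in rows:
        cases[tuple(row[p] for p in PARAMS)].add(tuple(row[o] for o in OUTPUTS))
    return dict(cases)


for params, answers in valid_combinations(rows).items():
    print(dict(zip(PARAMS, params)), "->", sorted(answers))
```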

26 of 32

LLM-Based Analytics

  • Using LLM not just to generate queries, but also to do analytics
    • Generate Python code for charts and maps (see CIM LLM Maps)
    • Generate Vega Lite charts and maps (e.g. Visualization Generation with Large Language Models: An Evaluation, arXiv 2401.11255, Jan 2024)
  • Teach LLM about specialized power-system analytics libraries
    • Ingest library and API documentation to learn methods and signatures of specialized libraries
    • Prompt LLM to use them, rather than trying to recreate from scratch.
    • Combine with CIM SPARQL to provide input
    • Pass CIM-based objects retrieved via CIM SPARQL as method arguments
    • Organize local execution environment for LLM code interpretation (can't use GPT on Statnett data)
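A minimal sketch of a local execution step for LLM-generated analytics code: the code runs in a separate Python process with a timeout, in a throwaway working directory, and its output is captured. This isolates crashes and runaway loops but is not a security sandbox; a production setup would add container isolation (e.g. Docker), resource limits and no network access. The function name `run_generated_code` is hypothetical.

```python
import subprocess
import sys
import tempfile


def run_generated_code(code: str, timeout_s: float = 30.0) -> tuple[int, str, str]:
    """Run LLM-generated Python in a child process; return (exit code, stdout, stderr)."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores env vars, user site-packages)
                cwd=workdir,                         # generated files land in a throwaway directory
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return -1, "", f"timed out after {timeout_s} s"
    return proc.returncode, proc.stdout, proc.stderr


rc, out, err = run_generated_code("print(sum(range(10)))")
print(rc, out.strip())  # 0 45
```

Returning stderr alongside the exit code lets the agent feed errors back to the LLM for another attempt.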

27 of 32

Open LLM on Premise

  • We fully understand the limitation: production data cannot leave Statnett (its Azure tenant or physical premises)
    • Prefer to work with a realistic architecture from the very beginning, to reach TRL 7
    • Will use Open (hosted on-premise) LLMs for all operations/queries on Statnett instance data
  • Can use ChatGPT for operations that don't involve Statnett data
    • Ontologies and other kinds of schemas
    • Code lists that are part of CGMES (open)
    • Paraphrasing questions to build up the Q&A dataset
    • Translating questions to an intermediate representation, GraphQL or SPARQL query
    • Generating Q&A datasets that can be used for fine-tuning of less capable LLMs
  • Open LLMs are constantly improving
    • Meta: Llama 3
    • Mistral AI: Mistral and Mixtral
    • Google: Gemma
    • China: GLM, DeepSeek, Qwen
    • UAE: Falcon

28 of 32

Reducing Hallucinations

  • KGs are key for avoiding hallucinations
    • Use fine-tuning to optimize LLM response. The KGQA dataset that we plan to build (grounded in the CIM schema and CIM data) will play a key role here.
    • Provide local "grounding" information to the LLM that is relevant to the question at hand
    • Generate queries to the CIM/CGMES KG (SPARQL or GraphQL), so that retrieved facts come from the KG rather than from the LLM's memory
    • Provide URLs of grounding information, ask the LLM to cite its sources in the final answer.
  • Grounding information can be provided in different ways:
  • Retrieval Augmented Generation (RAG): Based on text similarity
    • GraphDB has a ChatGPT Retrieval Connector to index KG entities and a Talk to Your Graph feature (Q&A box).
    • We have experience with a variety of vector stores, including Weaviate, Elasticsearch etc.
  • Graph RAG: based on graph structure. Retrieves potentially relevant KG entities
    • using NER/NEL,
    • RAG similarity,
    • selection by type,
    • selecting the graph neighborhood of already selected entities.
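The last selection strategy (expanding from already selected entities) can be sketched as a bounded breadth-first search over the KG. The triples and the seed are hypothetical; in practice the seeds would come from NER/NEL or RAG similarity, and the selected triples would be serialized into the prompt as grounding context.

```python
from collections import deque

# Hypothetical KG fragment
TRIPLES = [
    ("sub:ARENDAL", "cim:Substation.VoltageLevels", "vl:ARENDAL_300"),
    ("vl:ARENDAL_300", "cim:VoltageLevel.BaseVoltage", "bv:300"),
    ("bv:300", "cim:BaseVoltage.nominalVoltage", "300"),
    ("line:L1", "cimex:connectedThroughPart", "sub:ARENDAL"),
    ("line:L1", "cim:IdentifiedObject.name", "L1"),
]


def neighborhood(seeds, hops=1):
    """Triples within `hops` edges of the seed entities (edges followed in both directions)."""
    frontier = deque((s, 0) for s in seeds)
    seen = set(seeds)
    selected = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for s, p, o in TRIPLES:
            if node in (s, o):
                selected.add((s, p, o))
                other = o if s == node else s
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, depth + 1))
    return selected


for triple in sorted(neighborhood(["sub:ARENDAL"], hops=2)):
    print(*triple)
```

The hop limit is the main knob: each extra hop adds context but also prompt tokens and potential noise.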

29 of 32

SPARQL and GraphQL

  • Our solution is firmly based on SPARQL and/or GraphQL
    • Procedural graph traversal languages are harder to generate and are not needed for this project.
    • If specialized navigation/analytics is needed, we can use GraphDB's Graph Path Search
  • SPARQL or GraphQL? Each has benefits and shortcomings, so we'll research LLM use of both
    • Pro: SPARQL is powerful and can fetch and calculate any data retrievable from the repository.
    • Pro: SPARQL supports flexible constructs such as wildcard properties and property paths.
    • Con: SPARQL is complex, and fetching deeply nested sub-objects is not easy (use UNION clauses, avoid Cartesian products)
    • Con: ontologies and SHACL shapes are voluminous and not easy for an LLM to "understand" and remember. SOML and LinkML are simpler
    • Pro: GraphQL is easier to write and has a regular query structure
    • Pro: GraphQL excels at fetching nested sub-objects. It is not possible to fetch N child objects per master with one SPARQL query (sparql-dev #100: JOIN LATERAL or Correlated Subquery)
    • Pro: ONTO Semantic Objects takes care of sub-objects and other optimizations (during "transpiling" GraphQL to SPARQL)
    • Neutral: GraphQL supports a WHERE language for basic and nested object filtering. Any specialized construct requires a SPARQL Template wrapped into a GraphQL field name.
    • Con: Semantic Objects do not support aggregation queries. Search supports aggregation, but through indexing of RDF data in Elastic
  • We may use both for different use cases
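To illustrate the nesting point: a flat SPARQL SELECT returns one row per combination of joined values, so a client (or a layer such as Semantic Objects) has to regroup the rows into the nested shape that a GraphQL response already has. A minimal sketch with hypothetical rows for two lines and their position points:

```python
# Hypothetical flat SPARQL result: one row per (line, position point)
rows = [
    {"line": "L1", "name": "Line 1", "seq": 2, "x": 8.77, "y": 58.46},
    {"line": "L1", "name": "Line 1", "seq": 1, "x": 8.76, "y": 58.45},
    {"line": "L2", "name": "Line 2", "seq": 1, "x": 7.99, "y": 58.15},
]


def nest(rows):
    """Group flat rows into {line: {name, positionPoints: [...]}}, points ordered by sequence number."""
    lines = {}
    for r in rows:
        obj = lines.setdefault(r["line"], {"name": r["name"], "positionPoints": []})
        obj["positionPoints"].append(
            {"sequenceNumber": r["seq"], "xPosition": r["x"], "yPosition": r["y"]})
    for obj in lines.values():
        obj["positionPoints"].sort(key=lambda p: p["sequenceNumber"])
    return lines


print(nest(rows)["L1"]["positionPoints"][0])
```

The master fields (here the line name) are repeated in every row; with a second multi-valued child in the same query, the row count would become the product of both child counts, which is the Cartesian-product problem mentioned above.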

30 of 32

Data Spaces, Writing Research Proposals

  • HORIZON-CL5-2026-02-D3-19: Innovation solutions for a generative AI-powered digital spine of the EU energy system
    • Availability of generative Artificial Intelligence (AI) tools for electricity system operators, energy service providers, and households and energy communities to enhance digital and green transformation in energy, mobility, and buildings;
    • Implementation of decentralised IT solutions based on generative AI to support local grid optimisation, thereby increasing the uptake of renewable energy sources, electric vehicles, and electrification of household and industrial demand at the distribution level;
    • Increased reliability, resilience, security, and energy efficiency of the energy system through advanced AI and digital tools;
    • Enhanced knowledge for modernising and operating energy networks, integrating digital services, renewables, and electrification through the use of cutting-edge AI technologies;
    • Development of smarter demand-side tools for industries and consumers, leveraging AI to optimise energy production and consumption.
  • Data Spaces will be a major driver for AI and data science in EU. Huge EU investments in:
    • Sectoral data spaces, including 5+1 Energy Data Spaces and Digital Twin of the EU Grid
    • Associations (e.g. IDSA, BDVA, Gaia-X, DSSC, DSBA, Catena-X)
    • Software (e.g. SIMPL: 165M EUR)
  • Graphwise has experience with:
    • Dataspace calls, proposals, projects and initiatives
    • Project Underpin: manufacturing data space (refinery, wind farms). We are tech coordinator
    • Dataspaces semantic interoperability
    • Dataspace software and standards
  • Writing research proposals (e.g. see right)
  • We are very glad that Statnett is taking interest in these initiatives

31 of 32

Communication, Dissemination, Exploitation

Important activities to ensure the impact and sustainability of the development efforts:

  • Communication
    • Write blog posts, attend trade events
  • Dissemination
    • Write joint research papers, attend scientific conferences
  • Exploitation
    • Work with Statnett application owners and end-users to make the project results more usable and reusable in day-to-day work
  • Sustainability and Proliferation
    • Work with other electrical enterprises to attract them to joint exploitation and development
  • Research Proposals
    • Research appropriate funding calls, build appropriate consortia, write research proposals.
    • Also work towards securing GPU funding for joint training/fine-tuning of an electrical-CIM specific LLM.

32 of 32

THANK YOU FOR YOUR TIME!