1 of 34

From Data Models to Knowledge Systems

Reasoning and Versioning in CIM

Svein Harald Olsen (Statnett), Vladimir Alexiev (Graphwise)

CIMug Meeting, Cologne, 30 Apr 2026

Åpen informasjon / Public information


3 of 34

  • Graphwise (Ontotext + Semantic Web Company): largest Knowledge Graph and GenAI company worldwide
  • 16.5 years of semantic experience, 5.5 years with CIM
  • Chief Data Architect; KG Engineer (building industrial KGs)
  • Lead of Industrial Data group (part of Innovation team)
  • Experience with semantic industrial standards:
    • CIM, CGMES, NCP
    • GS1 EPCIS, GPC, WebVoc
    • AAS, ECLASS, IEC CDD, DPP
    • DCAT, ODRL, Data Spaces
  • And making connections between them!


4 of 34

Why This Matters

  • CIM has been very successful as a data exchange model, but we now have a stronger need for a knowledge system.

  • Increasing system complexity
    • Advanced control, FACTS, HVDC and power electronics
  • Shorter time to market
    • Faster changes, less time for manual development processes
  • More data exchanges
    • TSO–DSO, cross-domain, closer link between market and operation

  • Need for explainability, traceability, and automation


5 of 34

Current Challenges in the CIM Community


6 of 34

Format vs. Semantics Confusion

  • CIM is too tied to CIMXML and XSD-based XML validation
  • Same model → different formats: CIMXML, TriG, JSON-LD, CSV
  • Semantics is lost in transformations
  • Tooling becomes format-driven instead of model-driven
  • CIM must get a stronger format-independent meaning
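
The format-independence point can be illustrated with a small sketch in plain Python. It assumes two deliberately simplified, made-up serializations (a line-based one and a CSV one, not real RDF syntaxes) and shows that once both are read into the same set of statements, the format no longer matters. All `ex:` identifiers and function names are hypothetical.

```python
import csv
import io

# The same two statements in a simplified line format ...
line_text = """\
ex:line1 cim:IdentifiedObject.name "Line 1"
ex:line1 rdf:type cim:Line
"""

# ... and in CSV, with rows in a different order.
csv_text = """\
subject,predicate,object
ex:line1,rdf:type,cim:Line
ex:line1,cim:IdentifiedObject.name,Line 1
"""

def from_lines(text):
    """Read 'subject predicate object' lines into a set of triples."""
    triples = set()
    for line in text.splitlines():
        s, p, o = line.split(" ", 2)
        triples.add((s, p, o.strip('"')))
    return triples

def from_csv(text):
    """Read CSV rows into a set of triples."""
    reader = csv.DictReader(io.StringIO(text))
    return {(r["subject"], r["predicate"], r["object"]) for r in reader}

# Tooling that works on the set of statements is model-driven, not format-driven.
same_model = from_lines(line_text) == from_csv(csv_text)
```

Comparing sets of statements, rather than files, is what makes the check independent of ordering and syntax.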


7 of 34

The "Perfect Data or Nothing" Problem

  • Traditional systems reject invalid data
  • Reality: data is incomplete, inconsistent, evolving
  • Must ingest it, validate it, track data issues to resolution
  • No way to represent partial truth, track corrections, or capture provenance
  • RDF allows "imperfect but usable" data due to late and multiple schema binding + provenance + validation layered on top

  • RDF has included Named Graphs for over 10 years: containers of statements
  • RDF 1.2 includes RDF-star, enabling concise representation of statements about statements
  • Key for provenance, validation, and description of quality.
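
A minimal sketch of the "imperfect but usable" idea, in plain Python rather than a real RDF store: statements are kept as quads (subject, predicate, object, named graph), and provenance is attached to the graph rather than to each statement. The `ex:` identifiers, the provenance fields and the function name are assumptions made for illustration.

```python
# Each statement is a quad: (subject, predicate, object, named graph).
quads = [
    ("ex:line1", "cim:IdentifiedObject.name", "Line 1", "ex:graph/tso-export"),
    ("ex:line1", "cim:Conductor.length", "12.5", "ex:graph/tso-export"),
    # A conflicting value from another source is kept, not rejected.
    ("ex:line1", "cim:Conductor.length", "12.9", "ex:graph/field-survey"),
]

# Provenance is stated once per graph, not per statement.
provenance = {
    "ex:graph/tso-export": {"source": "TSO export", "status": "unverified"},
    "ex:graph/field-survey": {"source": "field survey", "status": "verified"},
}

def values_with_provenance(subject, predicate):
    """Return every asserted value together with the provenance of its graph."""
    return [
        (obj, provenance[graph]["source"], provenance[graph]["status"])
        for s, p, obj, graph in quads
        if s == subject and p == predicate
    ]

lengths = values_with_provenance("ex:line1", "cim:Conductor.length")
verified = [value for value, _, status in lengths if status == "verified"]
```

Both length values remain queryable; a consumer decides which one to trust based on the provenance of its graph.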


8 of 34

Lack of Explainability

  • CIM models are hard to read, hard to query, not LLM-ready
  • Lots of duplication between CIM Profiles (same terms defined multiple times)
  • Little to no reuse of established ontologies
  • Engineers cannot easily understand data
  • Machines cannot explain results
  • Need self-describing, queryable, explainable models


9 of 34

Hard to Migrate Between CIM Versions

  • Very hard to move from one version of CIM to the next.
  • No clear understanding of change impact:
    • Which ontology changes are backward compatible vs incompatible
    • Which terms are deprecated (not deleted!)
    • Are the changes breaking for “me” (my data)
  • All integrated systems must migrate to the next CIM version at the same time


10 of 34

Supporting the Data Lifecycle

  • Models evolve over time
    • Early phases: incomplete, high-level representations
    • Later phases: detailed, complex, and fully specified
  • Not all information is available at all times
  • Different stakeholders require different levels of detail
  • Complex models become difficult to use directly
  • Need for simplified and derived representations

  • “We need to model once — and support multiple levels of understanding across the lifecycle.”


11 of 34

SchemaOps


12 of 34

SchemaOps: support multiple Ontology versions

[Figure: data flows between applications across three ontology versions (Previous, Current, Next). Labels: authoritative source application; additional data from temp authoritative source; transformation with potential loss; transformation; destination application; R&D project and cutting-edge application; source/master and analytic application; limited-scope application.]


13 of 34

Enabling Technologies and Technical Approach


14 of 34

Clean Semantics: Fixing the Foundation

  • Instance data suggestions: Inst4CIM-KG/rdf-improved (most of this is well-known)
    • CIMXML is non-standard, requires special parsing
    • RDF/XML has no Named Graphs: necessary for CIM Models
    • Literals lack XSD datatypes: important for sorting and range searches
    • URIs are underdefined for lack of xml:base
  • Ontology suggestions: Inst4CIM-KG/rdfs-improved
    • Eliminate duplicated terms between Profiles (only Modules should define terms)
    • Improve class definitions. Example: AC vs. DC confusion hot-fixed with cims:pragmatics
    • cim:Datatypes are annotations on numeric properties, not actual ranges
  • Reuse established ontologies:
    • cim:PositionPoints → GeoSPARQL
    • cim:Datatypes (e.g. ActivePower), Units, Multipliers → QUDT QuantityKinds, Units, Prefixes
    • md:Model, dm:DifferenceModel → DCTerms, DCAT (basis of EU Dataspaces)
    • Enumerated classes → SKOS (?)
    • Avoid redefining external terms (namespace hijacking)
  • LLMs (and humans!) get confused by imprecise definitions
  • Better definitions = better interoperability + AI usability
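
Why datatyped literals matter for sorting and range searches can be shown with a short Python sketch. The voltage values are illustrative, and `Decimal` stands in here for an XSD numeric datatype such as xsd:decimal.

```python
from decimal import Decimal

# Untyped literals are plain strings, so ordering is lexical.
untyped = ["110", "22", "400", "132"]
lexical_order = sorted(untyped)

# With a numeric datatype the values compare numerically.
typed = [Decimal(v) for v in untyped]
numeric_order = sorted(typed)

# A range search (here: 100 <= value <= 200) only makes sense on typed values.
in_range = [v for v in typed if Decimal("100") <= v <= Decimal("200")]
```

Lexical ordering puts "22" after "132", which is the kind of silent error that typed literals avoid.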


15 of 34

Inst4CIM-KG and ENTSO-E Collaboration

Evolution of CIM semantic representation (posted 160 issues):

  • rdfs-improved
    • improved representation of CIM ontologies (Turtle and JSON-LD)
    • basis for gradual improvement of CIM RDF standards
    • 15 SPARQL updates, namespace fixes
    • renditions of the latest CGMES and CGMES-NC ontologies
    • discuss appropriate reasoning
  • rdf-improved
    • improved representation of CIM instance data (TriG and JSON-LD)
    • conversion from custom CIM XML format to standard TriG, and from there to JSON-LD
    • JSON-LD context; Update to add datatypes to literals
    • TriG and JSON-LD examples in "test"; scaled data in "instances" (9.8B triples)
  • shacl-improved
    • improvement proposals for CIM SHACL validation shapes
    • Including incremental validation scenarios


16 of 34

CIM Ontology Profiles

Inst4CIM-KG

  • CGMES 3.0.0 IEC 61970-600-2
    • DiagramLayout: 1345
    • Dynamics: 34922
    • Equipment: 7265
    • EquipmentBoundary: 1451
    • GeographicalLocation: 451
    • Header: 179
    • Operation: 2024
    • ShortCircuit: 2489
    • StateVariables: 1527
    • SteadyStateHypothesis: 2325
    • Topology: 313
  • CGMES-NC 2.3
    • DiagramLayout: 1345
    • Dynamics: 34922
    • Equipment: 7265
    • EquipmentBoundary: 1451
    • GeographicalLocation: 451
    • Header: 179
    • Operation: 2024
    • ShortCircuit: 2489
    • StateVariables: 1527
    • SteadyStateHypothesis: 2325
    • Topology: 313

CIM4Enterprise

  • AssetCatalogue.ttl: 1721
  • Assets.ttl: 1209

  • Counts are the number of lines in each Turtle file
  • Some terms are repeated 20 times.
    • Reason: CIM Modules (e.g. Core, Wires) are reused multiple times in different profiles (e.g. Equipment)
    • Remedy: profiles should only import modules, not define terms of their own
    • Improve ontology modularity
  • Total terms: 7020 (without Asset profiles)
    • Classes: 929, of which 111 enumerations
    • Object props: 1521
    • Datatype props: 3717
    • Enumeration values: 853
  • So: how to feed this to an LLM?
    • Limited memory (token limit)
    • Limited "attention span", info lost "in the middle"
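
The duplication problem can be measured with a few lines of Python. The sketch below uses hypothetical, heavily abbreviated profile contents (not the real CGMES profiles) to count in how many profiles each term is defined, and also checks that the term totals quoted above add up.

```python
from collections import Counter

# Hypothetical, heavily abbreviated profile contents (illustrative only).
profiles = {
    "Equipment": {"cim:Terminal", "cim:ACLineSegment", "cim:Substation"},
    "Topology": {"cim:Terminal", "cim:TopologicalNode"},
    "StateVariables": {"cim:Terminal", "cim:SvVoltage", "cim:TopologicalNode"},
}

# Count in how many profiles each term is defined.
definitions = Counter(term for terms in profiles.values() for term in terms)
duplicated = {term: n for term, n in definitions.items() if n > 1}

# Classes + object props + datatype props + enumeration values.
total_terms = 929 + 1521 + 3717 + 853
```

If profiles only imported modules, every count in `definitions` would be 1 and the text handed to an LLM would shrink accordingly.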


17 of 34

Major Inst4CIM-KG fix: Data props, Units, Quantity Kinds

Old representation: data props don't reflect actual instance data

New representation: real Datatype props, unit/quantity attached as annotations, alignment to QUDT


18 of 34

RDF as the Canonical Representation

  • RDF as data model defines semantics
    • It has various serializations (CIMXML, RDF/XML, TriG, JSON-LD)
    • Format independence
  • Use named graphs to:
    • Package model statements
    • Express dependencies between models
    • Express derived (differential) models
  • Self-describing data (datatyped literals)
  • Full W3C compliance means you can reuse:
    • databases
    • validators
    • parsers, serializers
    • reasoners


19 of 34

SHACL: Validation Without Rejection

  • Validate without discarding data
  • Quality reporting and structured error handling (sh:ValidationReport)
  • "Invalid data" becomes actionable insight, not rejection
  • SHACL: W3C standard for constraint checking
    • SHACL 1.0: W3C Recommendation since 2017, many implementations
    • SHACL 1.2 in active development, will be a lot more powerful
    • Thousands of constraints are generated from CIM UML models, and more are added by hand
    • Inst4CIM-KG/shacl-improved suggests how to improve them
    • Focus on the standard, not on particular SHACL implementations
  • RDF Graph Databases enable incremental validation:
    • Assume data at rest is valid
    • Focus on changed data in transaction (which can refer to data at rest)
    • Especially important for large CIM KGs, and Difference models
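
The incremental validation idea can be sketched in plain Python. This is not SHACL and not any particular database's API: the store, the two constraints and all names are assumptions chosen to show the pattern (ingest first, validate only what changed, return a report instead of rejecting).

```python
from dataclasses import dataclass

@dataclass
class Result:
    focus_node: str
    message: str

# Data at rest, assumed valid: node -> properties.
store = {
    "ex:sub1": {"type": "cim:Substation", "name": "ARENDAL"},
    "ex:vl1": {"type": "cim:VoltageLevel", "name": "132 kV", "substation": "ex:sub1"},
}

def validate(nodes, data):
    """Check only the given nodes; return a report instead of raising."""
    results = []
    for node in nodes:
        props = data[node]
        if not props.get("name"):
            results.append(Result(node, "missing name"))
        if props.get("type") == "cim:VoltageLevel":
            # The constraint may refer to data at rest outside the transaction.
            if props.get("substation") not in data:
                results.append(Result(node, "unknown substation"))
    return results

def commit(transaction):
    """Ingest the transaction, then validate only the nodes it changed."""
    store.update(transaction)
    return validate(transaction.keys(), store)

report = commit({
    "ex:vl2": {"type": "cim:VoltageLevel", "name": "", "substation": "ex:sub1"},
    "ex:vl3": {"type": "cim:VoltageLevel", "name": "22 kV", "substation": "ex:sub9"},
})
```

Both new nodes stay in the store, and the report lists one issue for each, which can then be tracked to resolution.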


20 of 34

Transparency Energy KG Advanced Validation


21 of 34

Transparency Energy KG Per-Rule Validation Results


22 of 34

Reasoning: From Data to Knowledge

  • Add inferred knowledge automatically
    • CIM doesn’t define reasoning; Inst4CIM Reasoning discusses what is appropriate
    • SHACL relies on subClassOf reasoning: very significant simplification of constraints
    • CIM has full inverseOf relations: such reasoning simplifies navigation
    • CIM has no subPropertyOf but should: if A<B and C<D (subclass) then A.C<B.D (subprop)
    • CIM has no TransitiveProperty but could: e.g. nested containers/parthood
  • The standard OWL 2 RL profile is sufficient
    • "Expansion ratio": (explicit+implicit)/explicit
    • The best databases have incremental inference and ensure truth maintenance
  • Inferring implicit relationships simplifies querying (next)
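
A minimal sketch of rule-based materialization and the expansion ratio, in plain Python. It applies only two rules (inverseOf and subClassOf) to two illustrative triples; a real OWL 2 RL reasoner covers many more rules, and the `ex:` data is made up.

```python
# Explicit triples (illustrative data; property names follow CIM conventions).
explicit = {
    ("ex:t1", "cim:Terminal.ConductingEquipment", "ex:line1"),
    ("ex:line1", "rdf:type", "cim:ACLineSegment"),
}

inverse_of = {"cim:Terminal.ConductingEquipment": "cim:ConductingEquipment.Terminals"}
sub_class_of = {
    "cim:ACLineSegment": "cim:Conductor",
    "cim:Conductor": "cim:ConductingEquipment",
}

def materialize(triples):
    """Apply the two rules repeatedly until no new triple is produced."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p == "rdf:type" and o in sub_class_of:
                new.add((s, "rdf:type", sub_class_of[o]))
        if new <= closure:
            return closure
        closure |= new

closure = materialize(explicit)

# Expansion ratio: (explicit + implicit) / explicit.
expansion_ratio = len(closure) / len(explicit)
```

Here two explicit triples yield three inferred ones, so a constraint or query written against cim:ConductingEquipment also applies to the ACLineSegment without naming the subclass.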


23 of 34

A Difficult CIM Query

List all substations connected by a Line to "ARENDAL"

Query is complex, hard for LLM to generate:

  • 3 kinds of Container-Equipment relations
  • Some containers are optional (e.g. Bay in Substation)
  • Line is also considered a container (of ACLineSegments)
  • From Substation to Line: Container-Equipment-Terminal-ConnectivityNode-Terminal-Equipment-Container
  • Then up the other side from Line to Substation


24 of 34

Add Reasoning

Namespace cimr: for inferred relations (CIM Rules)

  • Union: cim:EquipmentContainer.Equipments | cim:Substation.VoltageLevels | cim:VoltageLevel.Bays → cimr:hasPart; inverse cimr:isPart
  • Union: cim:Terminal.ConductingEquipment | cim:Terminal.AuxiliaryEquipment → cimr:Terminal.Equipment; inverse cimr:Equipment.Terminals
  • Transitive closure: cimr:hasPart+ → cimr:hasPartTransitive; inverse cimr:isPartTransitive
  • Property path: cim:ConductingEquipment.Terminals / cim:Terminal.ConnectivityNode / cim:ConnectivityNode.Terminals / cim:Terminal.ConductingEquipment → cimr:connectedTo (symmetric)
  • Property path: cimr:hasPartTransitive / cimr:connectedTo / cimr:isPartTransitive → cimr:connectedThroughPart (symmetric)

Dependency Graph of Relations
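
The derived relations can be sketched in plain Python on a toy grid, with each relation stored as a set of (subject, object) pairs. This is an illustration of the rule structure only, not the project's implementation: the `ex:` identifiers (including the second substation) are made up, and self-pairs are dropped from the connectedTo relation for readability.

```python
from itertools import product

def compose(r1, r2):
    """Property path r1 / r2 over relations stored as sets of pairs."""
    return {(a, d) for (a, b), (c, d) in product(r1, r2) if b == c}

def inverse(r):
    return {(b, a) for a, b in r}

def transitive_closure(r):
    closure = set(r)
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger

# Toy grid: two substations joined by one line.
substation_voltage_levels = {("ex:ARENDAL", "ex:vl1"), ("ex:KRISTIANSAND", "ex:vl2")}
container_equipments = {("ex:vl1", "ex:bus1"), ("ex:vl2", "ex:bus2"),
                        ("ex:line1", "ex:seg1")}
voltage_level_bays = set()  # bays are optional
equipment_terminals = {("ex:bus1", "ex:t1"), ("ex:seg1", "ex:t2"),
                       ("ex:seg1", "ex:t3"), ("ex:bus2", "ex:t4")}
terminal_node = {("ex:t1", "ex:cn1"), ("ex:t2", "ex:cn1"),
                 ("ex:t3", "ex:cn2"), ("ex:t4", "ex:cn2")}

# Union -> cimr:hasPart; transitive closure -> cimr:hasPartTransitive.
has_part = container_equipments | substation_voltage_levels | voltage_level_bays
has_part_transitive = transitive_closure(has_part)

# Path equipment / terminal / node / terminal / equipment -> cimr:connectedTo.
equipment_node = compose(equipment_terminals, terminal_node)
connected_to = {(a, b)
                for a, b in compose(equipment_node, inverse(equipment_node))
                if a != b}

# cimr:hasPartTransitive / cimr:connectedTo / cimr:isPartTransitive.
connected_through_part = compose(
    compose(has_part_transitive, connected_to), inverse(has_part_transitive))

# "Substations connected by a Line to ARENDAL" becomes two hops.
lines = {"ex:line1"}
substations = {"ex:ARENDAL", "ex:KRISTIANSAND"}
neighbours = {
    other
    for (start, line), (line2, other) in product(connected_through_part, repeat=2)
    if start == "ex:ARENDAL" and line == line2 and line in lines
    and other in substations and other != "ex:ARENDAL"
}
```

Once the derived relations are materialized, the query no longer needs to know about voltage levels, bays, terminals or connectivity nodes.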


25 of 34

Query is Simplified a Lot


26 of 34

Versioning Through Semantics

  • Don’t change namespaces between ontology versions
    • The latest should be available at a permanent namespace
    • But each should have its own owl:versionIRI
    • Avoid the need for massive instance data migration
  • W3C DX-PROF (Profiles Vocabulary) to describe coherent technical artefacts (Ontologies, SHACL)
  • Instance data (model graphs) should know what version(s) they use
    • dcterms:conformsTo: point to versioned namespaces
    • rsx:DataAndShapesGraphLink: which data graph(s) should be validated against which shape graph(s)
  • Fine-grained capture of what changed between versions
    • owl:deprecated: don’t use term anymore (class, prop, enum value)
    • dcterms:isReplacedBy: replace with new term
  • Use ontologies, mappings, and reasoning
    • Enables backward and forward compatibility
    • Impact analysis of version changes
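
How deprecation metadata supports impact analysis ("are the changes breaking for my data?") can be sketched in plain Python. The ontology entries below are invented for illustration (`cim:OldProperty`, `cim:RemovedIdea` and `cim:NewProperty` are not real CIM terms), and the function names are hypothetical.

```python
# Hypothetical ontology metadata (the first three term names are invented).
ontology = {
    "cim:OldProperty": {"owl:deprecated": True,
                        "dcterms:isReplacedBy": "cim:NewProperty"},
    "cim:RemovedIdea": {"owl:deprecated": True},
    "cim:NewProperty": {},
    "cim:IdentifiedObject.name": {},
}

data = [
    ("ex:a", "cim:OldProperty", "1"),
    ("ex:a", "cim:IdentifiedObject.name", "A"),
    ("ex:b", "cim:RemovedIdea", "x"),
]

def impact(triples):
    """Report deprecated predicates that this particular dataset actually uses."""
    used = {p for _, p, _ in triples}
    return {
        term: ontology[term].get("dcterms:isReplacedBy")
        for term in used
        if ontology.get(term, {}).get("owl:deprecated")
    }

def migrate(triples):
    """Rewrite predicates that have a replacement; leave everything else untouched."""
    replacements = {t: r for t, r in impact(triples).items() if r}
    return [(s, replacements.get(p, p), o) for s, p, o in triples]

report = impact(data)
migrated = migrate(data)
```

Terms with a replacement can be migrated mechanically, while deprecated terms without one are flagged for a human decision; data that uses neither is unaffected by the new version.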


27 of 34

SchemaOps

  • DevOps for data people:
    • MLOps: model training/tuning, evaluation
    • SchemaOps: ontology+SHACL definition and evolution, impact analysis
    • ModelOps: instance data validation, ingestion, reasoning, incremental ops
  • Treat models like code
    • Versioned
    • Validated
    • Testable
  • Pipeline: define → validate → reason → publish


28 of 34

Talk2PowerSystem: Bringing It to Life

  • Natural language question-answering over power system data
  • Query generation (text2sparql)
    • CIM ontologies
    • Grid, connectivity, equipment
    • Operational limits, simulation results
    • Geospatial info
    • Diagrams: electrical, VizGraph, geospatial (!)
    • Timeseries queries (!!)
  • Knowledge Graph-backed CIM
  • Explainable answers
  • CIM + KG + LLM → real value
  • From data exchange → to asking questions and getting answers



30 of 34

Key Message


31 of 34

From data exchange to knowledge system

  • Increasing complexity & integration → need for understanding, not just exchange
  • Full data lifecycle → need for flexible, multi-level representations
  • Versioning & imperfect data → need for semantic traceability
  • Explainability → required for humans and machines (AI/LLMs)
  • CIM + Semantic Web (RDF, SHACL, reasoning) → enables a true knowledge system

“Semantic technologies do not replace object-oriented development (OOD) tooling; they make the models machine-understandable and provide better lifecycle and quality support.”


32 of 34

We are moving from exchanging files…

to sharing understanding.


33 of 34

Join Us: Final Project Webinar

  • Talk2PowerSystem: Democratizing Power System Analytics via Generative AI
  • 21 May 2026: 3 hours, suitable for both EU and US timezones
  • Presenters
    • Project team members: Statnett, Graphwise
    • External speakers: PNNL, Siemens Energy
  • Panel discussion
  • 400+60 people expected
  • See promotional video to whet your appetite
  • See project Dissemination page for full details on presentations, blogs, etc.

