1 of 14

Knowledge Architecture for Organisations

2 of 14

Knowledge Architecture for Organisations - Overview

  • Earlier: Fundamentals of ontologies, schemas, RDF, etc.
  • Now: How can we build an architecture for using these tools in real life?
    • We will explore this by looking at an Abstract Reference Architecture (ARA) for knowledge architecture
      • Why? It is impossible to propose a single architecture that fits all use-cases of knowledge graphs
    • Focus here is on the architecture; details follow in later chapters

3 of 14

Architectural overview

  • The book identifies the architecture as having three main layers:
    • Knowledge Acquisition and Integration Layer
      • Creating the graph
    • Knowledge Storage layer
      • Storing the graph
    • Knowledge Consumption layer
      • Using the graph

4 of 14

Architectural overview

5 of 14

Acquisition and Integration Layer

  • Ontology Development
  • Choosing a vocabulary
    • Lightweight vs Heavyweight
      • Lightweight: Without formal definitions = Easy to build
      • Heavyweight: Includes formal definitions = Harder to build
    • If you choose a heavyweight vocabulary, the book suggests using a methodology
      • Defines steps to carry out when developing the ontology
      • Suggests methods/tools to carry out these steps
      • Examples include METHONTOLOGY, DILIGENT, HCOME

6 of 14

Acquisition and Integration Layer

  • Text Integration
  • Text is usually integrated into the knowledge graph by two methods:
    • Named Entity Resolution
      • Involves extracting mentions of an entity in the knowledge graph from the text
      • Example: “Tim Cook took over as CEO of Apple”
    • Thematic Scope Resolution
      • Figure out what the text is actually talking about
      • Different from NER: Extracting the mentions is not enough
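
A minimal sketch of the mention-extraction step, assuming a simple label lookup rather than any method from the book. The entity IDs (`ex:Tim_Cook`, `ex:Apple_Inc`) and the helper `find_mentions` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical mini knowledge graph: entity ID -> known surface forms.
ENTITY_LABELS = {
    "ex:Tim_Cook": ["Tim Cook"],
    "ex:Apple_Inc": ["Apple", "Apple Inc."],
}


def find_mentions(text, entity_labels):
    """Return (entity_id, label, start_offset) for each label found in text."""
    mentions = []
    for entity_id, labels in entity_labels.items():
        for label in labels:
            # Lookarounds keep matches on whole words, even for labels ending in ".".
            pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
            for match in re.finditer(pattern, text):
                mentions.append((entity_id, label, match.start()))
    # Order mentions by their position in the text.
    return sorted(mentions, key=lambda m: m[2])


mentions = find_mentions("Tim Cook took over as CEO of Apple", ENTITY_LABELS)
for entity_id, label, start in mentions:
    print(entity_id, repr(label), start)
```

A lookup like this only finds mentions. It says nothing about what the text is mainly about, which is the job of thematic scope resolution.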

7 of 14

Knowledge Storing and Accessing Layer

  • Overview
  • Choosing a format for storing your ontological data
    • Organizations often store their data in “silos”
      • I.e., many different systems that are not interconnected
    • Integrating an organization’s silos is known as the data integration problem
    • The book introduces three ways of storing ontological information:
      • Ontology-Based Data Access (OBDA)
      • RDF Stores
      • Property Graph-Based Stores

8 of 14

Knowledge Storing and Accessing Layer

  • Ontology-Based Data Access (OBDA)
  • In OBDA we store the data in their original databases
  • Implementation:
    • Separate data (ABox) and semantics (TBox)
    • TBox keeps track of the conceptual model
    • ABox extends the conceptual model into the data sources
    • ABox can be virtual or materialized
      • Virtual retrieves instances directly from data sources
      • Materialized retrieves instances from a triple store that is updated with data from data sources
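
The difference between a virtual and a materialized ABox can be sketched as follows. This is an illustration under stated assumptions, not how a real OBDA system is implemented: an in-memory SQLite table stands in for the original database, and the vocabulary (`ex:name`, `ex:worksFor`), the `MAPPINGS` dictionary, and both helper functions are hypothetical.

```python
import sqlite3

# Source database that stays where it is (an in-memory SQLite stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, employer TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Tim Cook', 'Apple')")

# TBox: the conceptual model (hypothetical vocabulary).
TBOX = {"classes": {"ex:Person"}, "properties": {"ex:name", "ex:worksFor"}}

# Mappings tie each property of the conceptual model to a query on the source.
MAPPINGS = {
    "ex:name": "SELECT id, name FROM employee",
    "ex:worksFor": "SELECT id, employer FROM employee",
}
assert set(MAPPINGS) <= TBOX["properties"]


def virtual_abox(conn, prop):
    """Virtual ABox: compute the triples for prop from the source at query time."""
    for row_id, value in conn.execute(MAPPINGS[prop]):
        yield (f"ex:employee/{row_id}", prop, value)


def materialize_abox(conn):
    """Materialized ABox: copy all mapped triples into a separate store."""
    return {triple for prop in MAPPINGS for triple in virtual_abox(conn, prop)}


store = materialize_abox(conn)

# A later change in the source is visible virtually, but not in the copy
# until the copy is refreshed.
conn.execute("INSERT INTO employee VALUES (2, 'Jane Doe', 'Example Corp')")
virtual_names = sorted(v for _, _, v in virtual_abox(conn, "ex:name"))
materialized_names = sorted(v for _, p, v in store if p == "ex:name")
print(virtual_names)
print(materialized_names)
```

The trade-off shown here is freshness against query cost: the virtual variant always reflects the source, while the materialized variant answers from its own copy and has to be updated.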

9 of 14

Knowledge Storing and Accessing Layer

  • RDF Stores
  • RDF Stores are database systems specifically created to store triples (Subject, Predicate, Object)
  • Built specifically for “data volume, bulk loading speed and query answering efficiencies”
  • There are many database techniques that can be used to implement an RDF Store:
    • Relational databases
    • Native triple stores
    • Graph Databases
    • NoSQL
    • Etc.
  • What is the difference between RDF Stores and NoSQL databases?
    • NoSQL databases are popular for storing RDF data; however, they lack some benefits of RDF Stores
    • NoSQL databases are built for very lightweight schemas; RDF Stores can handle both lightweight and heavier schemas
    • RDF Stores are based on W3C standards and built for using the web as a platform
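
To make the triple model concrete, here is a toy in-memory store with pattern matching. It is a sketch only: the class `TripleStore` and the `ex:` identifiers are made up for this example, and a real RDF Store would use indexes and a query language such as SPARQL instead of scanning every triple.

```python
class TripleStore:
    """Toy store holding (subject, predicate, object) triples in a set."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple
            for triple in self.triples
            if all(want is None or want == have for want, have in zip(pattern, triple))
        )


store = TripleStore()
store.add("ex:Tim_Cook", "rdf:type", "ex:Person")
store.add("ex:Apple_Inc", "rdf:type", "ex:Company")
store.add("ex:Tim_Cook", "ex:ceoOf", "ex:Apple_Inc")

# All typing statements in the graph.
typed = store.match(predicate="rdf:type")
# Everything known about one subject.
about_tim = store.match(subject="ex:Tim_Cook")
print(typed)
print(about_tim)
```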

10 of 14

Knowledge Storing and Accessing Layer

  • Property Graph-Based Stores
  • A property graph is simply a graph where nodes and edges can have multiple properties
  • Can be used to store RDF graphs
  • Similarly, RDF graphs can be made to store property graphs
  • Representing (schemaless) RDF graphs in property graphs:
    • Properties of edges must be labels
    • Properties of nodes expressed as triples with {subject, key, value}
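
One possible reading of the two rules above, sketched in code: each node property becomes a {subject, key, value} triple, and each edge carries only a label. The node IDs, property names, and the function `property_graph_to_triples` are hypothetical and not taken from the book.

```python
# Hypothetical property graph: node ID -> properties, plus labelled edges.
nodes = {
    "n1": {"name": "Tim Cook", "role": "CEO"},
    "n2": {"name": "Apple"},
}
edges = [("n1", "worksFor", "n2")]  # (source, label, target)


def property_graph_to_triples(nodes, edges):
    """Flatten a property graph whose edges carry only a label into triples."""
    triples = []
    # Node properties become {subject, key, value} triples.
    for node_id, properties in nodes.items():
        for key, value in properties.items():
            triples.append((node_id, key, value))
    # An edge label plays the role of the predicate between two nodes.
    for source, label, target in edges:
        triples.append((source, label, target))
    return triples


triples = property_graph_to_triples(nodes, edges)
for triple in triples:
    print(triple)
```

Edges with properties of their own do not fit this scheme directly, which is why the sketch restricts edges to a single label.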

11 of 14

Knowledge Storing and Accessing Layer

  • Comparison
  • OBDA
    • Lightweight: does not require creating new data structures
    • May not be suited for data-intensive tasks
  • Relational Databases
    • Schema requirement means system always knows the structure of the graph
    • Hard to change or add to schema after it has been defined
    • Expensive joins when exploring large parts of graph
  • RDF Stores / Property Graph
    • Schemaless means easy to add new properties / change the schema
    • Cheap exploration of graph

12 of 14

Knowledge Consumption Layer

  • Semantic Search
  • Traditional search
    • Indexing documents
    • Query documents
  • Semantic search
    • Extends traditional search and provides additional benefits
    • During indexing it can disambiguate entities
    • During search it can help disambiguate user queries
      • Did you mean “Apple (Company)” or “Apple (Fruit)”?
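
A small sketch of query-time disambiguation, assuming exact label matching against a hand-made entity table. The entity IDs and the functions `candidates` and `did_you_mean` are illustrative names, not part of any search library.

```python
# Hypothetical entities that share or differ in label.
ENTITIES = {
    "ex:Apple_Inc": {"label": "Apple", "type": "Company"},
    "ex:Apple_Fruit": {"label": "Apple", "type": "Fruit"},
    "ex:Tim_Cook": {"label": "Tim Cook", "type": "Person"},
}


def candidates(query, entities):
    """Return IDs of all entities whose label equals the query (case-insensitive)."""
    wanted = query.strip().lower()
    return sorted(eid for eid, info in entities.items() if info["label"].lower() == wanted)


def did_you_mean(query, entities):
    """Build a clarification prompt when the query matches several entities."""
    ids = candidates(query, entities)
    if len(ids) <= 1:
        return None  # nothing to disambiguate
    options = [f'{entities[eid]["label"]} ({entities[eid]["type"]})' for eid in ids]
    return "Did you mean " + " or ".join(options) + "?"


prompt = did_you_mean("apple", ENTITIES)
print(prompt)
```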

13 of 14

Knowledge Consumption Layer

  • Summarization
  • Knowledge graphs can be summarized to give an overview of the information they contain
  • Entity summary
    • Summary of single entity in graph
    • Entity card
  • Graph summary
    • Summary of whole or part of graph
  • Goal-driven Graph Profiling
    • Custom summary of nodes relevant to some task
  • Graph Analytics
    • Finding interesting patterns in the graph

14 of 14

Knowledge Consumption Layer

  • Query Generation and Answering
  • Query generation
    • Users aren’t always familiar with the graph and/or the tools they are using
    • Help users create the queries they want by showing them the content/structure of the graph
  • Query answering
    • Traditional IR systems only return lists of documents as answers to a query
    • Using a knowledge graph we can more directly answer a user query
    • “How old is Tim Cook?”
      • Traditional IR: Here is a list of documents containing the text “How old is Tim Cook”
      • With knowledge graph: “Tim Cook is 42 years old”
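
The contrast above can be sketched as a direct lookup over triples. Everything here is an assumption made for illustration: the `ex:` vocabulary, the single question pattern, and the function names. The reference date is passed in explicitly and fixed so that the result is reproducible; the value used was chosen so the sketch reproduces the answer shown on the slide.

```python
import re
from datetime import date

# Hypothetical graph fragment; the birth date is stored as a typed value.
TRIPLES = {
    ("ex:Tim_Cook", "ex:label", "Tim Cook"),
    ("ex:Tim_Cook", "ex:birthDate", date(1960, 11, 1)),
}


def age_on(birth, today):
    """Full years between birth and today."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


def answer(question, today):
    """Answer 'How old is <name>?' from the graph, or return None."""
    match = re.fullmatch(r"How old is (.+)\?", question.strip())
    if match is None:
        return None
    name = match.group(1)
    # Resolve the name to an entity, then read its birth date.
    for subject, predicate, obj in TRIPLES:
        if predicate == "ex:label" and obj == name:
            for s2, p2, birth in TRIPLES:
                if s2 == subject and p2 == "ex:birthDate":
                    return f"{name} is {age_on(birth, today)} years old"
    return None


reply = answer("How old is Tim Cook?", today=date(2003, 6, 1))
print(reply)
```

A document-retrieval system would stop at matching the question text. Here the answer is computed from a fact in the graph, so it changes with the reference date rather than with the documents indexed.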