Linkmaster 3000

if it's not linked, does it even exist?

dbpedia.link/lm3k

Sebastian Hellmann, Jan Forberg, Marvin Hofer, Johannes Frey

dbpedia@infai.org

Acknowledgements

https://tinyurl.com/lm3-semantics22

DBpedia Members (contributed to LM3K)

Small Data

Nobody needs all the data all the time; small data is about the right data at the right time

Big data, in comparison, has a significant collection delay (big delay): by the time you have collected it, it is no longer up to date. Small data can be up to date (small delay)

Infinite number of small data use cases: every human needs tiny bits of contextual information on a couple of occasions each day (phone numbers, food origin, COVID, more Stranger Things episodes?, free apartments close to bilingual kindergartens)

Plenty of small data is already available!

Linked Data Principles

Tim Berners-Lee (provider guidelines):

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information [...]
  4. Include links to other URIs, so that they can discover more things.

These principles result in the largest, continuously updated knowledge graph on earth:

  • Distributed database (key-value store, keys are resolvable URIs)
  • Anyone can extend it
  • Extremely fast, if cached properly (server- and client-side)
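The key-value view above can be illustrated with a small client-side cache in which resolvable URIs are the keys. This is a minimal sketch, not part of LM3K: `UriCache`, the caller-supplied `fetch` function and the one-hour default `ttl_seconds` are hypothetical choices for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class UriCache:
    """Hypothetical client-side cache: resolvable URIs are the keys."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # dereferences a URI (e.g. an HTTP GET)
        self._ttl = ttl_seconds      # how long a cached answer counts as fresh
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.misses = 0              # how often fetch was actually called

    def get(self, uri: str) -> str:
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh hit: no round trip needed
        self.misses += 1
        body = self._fetch(uri)      # miss or stale entry: look the key up again
        self._entries[uri] = (now, body)
        return body


if __name__ == "__main__":
    fake_time = [0.0]
    cache = UriCache(lambda uri: f"data for {uri}", ttl_seconds=60,
                     clock=lambda: fake_time[0])
    leipzig = "http://dbpedia.org/resource/Leipzig"
    cache.get(leipzig)               # miss: fetched
    cache.get(leipzig)               # hit: served from the cache
    fake_time[0] = 120.0
    cache.get(leipzig)               # stale after 60 s: fetched again
    print(cache.misses)              # 2
```

Injecting `fetch` and `clock` keeps the cache independent of any particular HTTP client and makes expiry easy to test; in practice `fetch` would dereference the URI over HTTP.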

Links have extremely high value for data consumers (high need), but their creation and maintenance rely on providers (low incentives): sub-sub-optimal

4b. Use the Linkmaster 3000 to create, maintain and retrieve links to other URIs

Use Cases and APIs

InfAI - Small Data Use Case

Create a publication list for all 100 researchers of infai.org

Proposed solution by the InfAI website team:

  • We set up a Git repository or a Wikibase, and all researchers copy their publications into it. Whenever somebody publishes, they just have to update it.

Linkmaster 3000 approach:

  • Every researcher gets an InfAI URI, which we load into LM3K
  • For each researcher URI, we find exactly ONE LINK (called the Masterlink) to ONE major publication service (DBLP, DNB, ORCID, Bibsonomy.org, etc.)
  • Use the link inference engine to find all equivalent URIs: LD Discovery API (owl:sameAs)
  • Use all those links to retrieve publication data on the fly and cache it
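The four LM3K steps can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the actual LM3K implementation: `discover` stands in for the LD Discovery API (returning owl:sameAs URIs for a masterlink), `retrieve` stands in for the retrieval and caching proxy (returning publication titles), and `publication_list`, the researcher URI, the ORCID placeholder and the paper titles in the example are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def publication_list(
    masterlinks: Dict[str, str],
    discover: Callable[[str], Iterable[str]],
    retrieve: Callable[[str], List[str]],
    cache: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Build one publication list per researcher from a single masterlink each.

    masterlinks: institute URI -> the ONE curated link to a publication service
    discover:    expands a URI to its owl:sameAs equivalents (stand-in)
    retrieve:    fetches publication titles for one URI (stand-in)
    cache:       URI -> titles, shared across researchers and runs
    """
    result: Dict[str, List[str]] = {}
    for researcher, masterlink in masterlinks.items():
        # Step 3: infer all equivalent URIs from the single masterlink.
        uris = dict.fromkeys([masterlink, *discover(masterlink)])  # ordered, de-duplicated
        titles: List[str] = []
        for uri in uris:
            # Step 4: retrieve on the fly, but only once per URI.
            if uri not in cache:
                cache[uri] = retrieve(uri)
            titles.extend(cache[uri])
        # The same paper can be listed by several services.
        result[researcher] = sorted(set(titles))
    return result


if __name__ == "__main__":
    links = {"https://infai.org/person/example": "https://dblp.org/pid/47/2811"}
    same_as = {"https://dblp.org/pid/47/2811": ["https://orcid.org/0000-0000-0000-0000"]}
    pubs = {
        "https://dblp.org/pid/47/2811": ["Paper A", "Paper B"],
        "https://orcid.org/0000-0000-0000-0000": ["Paper B", "Paper C"],
    }
    cache: Dict[str, List[str]] = {}
    print(publication_list(links, lambda u: same_as.get(u, []),
                           lambda u: pubs.get(u, []), cache))
```

Only the masterlinks need manual curation; everything else is derived, which is the difference from a hand-maintained Git or Wikibase list.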

Links are pointers to more information

Properties:

  • decentralized database
  • very heterogeneous nodes

Useful links:

  • links behave like address pointers
  • enable very precise retrieval

Our APIs:

  • Linked Data Discovery API
  • Linked Data Retrieval and Caching API

Linked Data Discovery API

  • pre-computed owl:sameAs clusters for core namespaces
  • ability to look up links:

https://d-nb.info/gnd/4035206-7 (GND Leipzig) contains 3 owl:sameAs links:

<http://viaf.org/viaf/155929994>, <http://www.wikidata.org/entity/Q2079>, <http://id.loc.gov/rwo/agents/n79125883>;

https://global.dbpedia.org/same-thing/lookup/?uri=http://d-nb.info/gnd/4035206-7

returns 118 links, including DBpedia, GeoNames and MusicBrainz URIs

http://139.18.2.173/browse/ (Berlin)

http://139.18.2.173/browse2/
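Because owl:sameAs is symmetric and transitive, a sameAs cluster is a connected component of the link graph. The slides do not say how DBpedia pre-computes its clusters; the following is a minimal sketch using a standard union-find (disjoint-set) structure, which is one common way to do it. `same_as_clusters` and `lookup_url` are hypothetical helpers; `lookup_url` only assembles a URL of the form shown above (with the `uri` value percent-encoded) and assumes nothing about the response format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple
from urllib.parse import urlencode


def same_as_clusters(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs into clusters from owl:sameAs pairs using union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b          # merge the two clusters

    clusters: Dict[str, Set[str]] = defaultdict(set)
    for uri in parent:
        clusters[find(uri)].add(uri)
    return list(clusters.values())


def lookup_url(uri: str) -> str:
    """Assemble a same-thing lookup URL for one URI."""
    return "https://global.dbpedia.org/same-thing/lookup/?" + urlencode({"uri": uri})


if __name__ == "__main__":
    gnd = "http://d-nb.info/gnd/4035206-7"
    pairs = [
        (gnd, "http://viaf.org/viaf/155929994"),
        (gnd, "http://www.wikidata.org/entity/Q2079"),
        (gnd, "http://id.loc.gov/rwo/agents/n79125883"),
    ]
    for cluster in same_as_clusters(pairs):
        print(sorted(cluster))
    print(lookup_url(gnd))
```

Pre-computing the clusters once means a lookup is a single key access, instead of following owl:sameAs chains at query time.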

Linked Data Retrieval and Caching API

  • Proxy that tackles heterogeneity, uptime and speed
  • Takes any URI (LD or Schema.org) and delivers RDF in all formats
  • Uniform, stable interface to work with Linked Data + Schema.org

https://d-nb.info/gnd/4035206-7 (GND Leipzig)

does not have CORS enabled, so it cannot be accessed from JavaScript in a browser

http://api.dbpedia.link/retrieve/proxy?iri=https://d-nb.info/gnd/4035206-7&format=jsonld
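From a script, the proxy can be called like any other HTTP endpoint. A minimal sketch using only the Python standard library, assuming the `iri` and `format` query parameters shown in the URL above and assuming the proxy decodes percent-encoded query parameters, as HTTP servers normally do; `proxy_url` and `fetch_via_proxy` are hypothetical helper names, and no `format` values other than `jsonld` are documented on this slide.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROXY = "http://api.dbpedia.link/retrieve/proxy"


def proxy_url(iri: str, fmt: str = "jsonld") -> str:
    """Build a retrieval-proxy URL; urlencode percent-escapes the target IRI."""
    return PROXY + "?" + urlencode({"iri": iri, "format": fmt})


def fetch_via_proxy(iri: str, timeout: float = 10.0):
    """Fetch the RDF for `iri` through the proxy and parse it as JSON-LD (network call)."""
    with urlopen(proxy_url(iri), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_via_proxy("https://d-nb.info/gnd/4035206-7")` requests the GND record above. Percent-encoding the `iri` value matters when the target IRI itself contains `?` or `&`, which would otherwise be read as part of the proxy's own query string. CORS only restricts browsers, so from Python the benefit of the proxy is the uniform interface and caching.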

Open Online Link Curation Platform

Demo

https://dbpedia.link

curl -H "Accept: application/ld+json" https://dbpedia.link/lm3k/e/dblp.org/pid/47/2811 | json_pp

curl "api.dbpedia.link/discover?uri=https://dblp.org/pid/47/2811"

http://api.dbpedia.link/retrieve/proxy?iri=http://dbpedia.org/resource/Leipzig
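The first `curl` call relies on HTTP content negotiation: the `Accept` header asks the LM3K resource URL for JSON-LD. The same request with the Python standard library, as a sketch; `build_request` and `show` are hypothetical helpers, and since the response structure is not described here, the code simply pretty-prints whatever JSON comes back, like `json_pp`.

```python
import json
from urllib.request import Request, urlopen

LM3K_URI = "https://dbpedia.link/lm3k/e/dblp.org/pid/47/2811"


def build_request(uri: str, media_type: str = "application/ld+json") -> Request:
    """Prepare a GET request that negotiates the given RDF media type."""
    return Request(uri, headers={"Accept": media_type})


def show(uri: str) -> None:
    """Dereference `uri` and pretty-print the JSON body (network call)."""
    with urlopen(build_request(uri), timeout=10) as response:
        print(json.dumps(json.load(response), indent=2))
```

Calling `show(LM3K_URI)` performs the request; passing a different `media_type` to `build_request` asks for another serialization, if the server offers it.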

LM3K Target User Groups

Please contribute your links to LM3K

Consumers (80%): Data Scientists, AI, Databases, Integrators

Providers (10%): Web publishers (IMDB, libraries, …)

Prosumers (10%): Enrich own data and re-publish

Links are generated locally during everyday work

-> share upstream and help us improve data interlinking

Collaborative Tool (Verification, Statistics, Clean, Download)

Spread URIs and generate in-links (SEO)

Better integrated into the Linked Data Cloud

Find data more easily

Keep in sync

DBpedia Members

Linkmaster 3000 Consumer vs Provider

Linkmaster 3000 tools

Launch date September 12th, 2022 @SEMANTiCS

  1. Global LD API: link inference, look up more links
  2. Global LD Browser: resolve all links for a URI, look at all the data
  3. LM3K Open Online Link Curation Platform
    • Utility for LD Providers
    • Utility for LD Consumers

Linkmaster 3000 for LD Providers

Global LD Browser

  • Resolve all links for a URI, look at all the data

Use external URIs (Keys):

https://global.dbpedia.org/?s=http://d-nb.info/gnd/4035206-7

http://tools.dbpedia.org:9005/?s=http://d-nb.info/gnd/4035206-7

Also mints its own URIs (Global IDs):

https://global.dbpedia.org/id/1yu7p

Linkmaster 3000 for LD Providers

  • Useful for all sizes of LD providers
  • Increases in-links! We spread your URIs and will link back from dbpedia.org

Linkmaster 3000 for LD Providers

Provider Todos:
1. Guarantee technical functionality of your Linked Data (#, 303 or schema.org)
   Debug tools: CORS, syntax, data quality, FAIRness, dbpedia.link/lm3k
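Parts of the first todo can be scripted. A minimal sketch, assuming you already have the status code and response headers of a dereferenced URI from any HTTP client with redirects disabled; `check_ld_response` and the set of accepted media types are hypothetical choices for illustration, not the rules used by dbpedia.link/lm3k. It checks for a 303 redirect with a `Location` header (the slash-URI pattern), an RDF or HTML content type on a 200 response (HTML to allow embedded schema.org markup), and a CORS header.

```python
from typing import List, Mapping

# Hypothetical selection of media types accepted as Linked Data here;
# text/html is allowed because pages may embed schema.org JSON-LD.
LD_MEDIA_TYPES = {
    "text/turtle",
    "application/ld+json",
    "application/rdf+xml",
    "application/n-triples",
    "text/html",
}


def check_ld_response(status: int, headers: Mapping[str, str]) -> List[str]:
    """Return the problems found for one dereferenced URI (empty list = looks fine)."""
    h = {name.lower(): value for name, value in headers.items()}  # names are case-insensitive
    problems: List[str] = []
    if status == 303:
        # Slash URIs for real-world things should redirect to a document.
        if "location" not in h:
            problems.append("303 See Other without a Location header")
    elif status == 200:
        media_type = h.get("content-type", "").split(";")[0].strip().lower()
        if media_type not in LD_MEDIA_TYPES:
            problems.append(f"unexpected content type: {media_type or 'missing'}")
    else:
        problems.append(f"unexpected status {status} (expected 200 or 303)")
    if "access-control-allow-origin" not in h:
        problems.append("no Access-Control-Allow-Origin header (CORS disabled)")
    return problems
```

Syntax, data quality and FAIRness checks need the response body and are left out of this sketch.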

Linkmaster 3000 for LD Providers

Provider Todos:
1. Guarantee technical functionality of Linked Data
2. Regularly upload all your URIs into your LM3K namespace; we store them and provide downloads for everybody
3. Relax! (the most important task)

Let users and data consumers do the links!

Pick them up periodically.

Linkmaster 3000 for Data Consumers

Data consumer challenges

  1. Where is the data?
  2. After download: local linking & deduplication for integration
     One-time effort, 60-80% F-measure (weighted harmonic mean of precision & recall)
  3. Next month link again?
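As a reminder of the standard definitions behind the F-measure figure: precision P is the share of generated links that are correct, recall R is the share of correct links that were found, and F_β = (1 + β²)·P·R / (β²·P + R) is their weighted harmonic mean (β = 1 gives the usual F1). The sketch below, with hypothetical helpers `_unordered` and `link_scores`, scores a generated link set against a gold standard and treats links as unordered pairs because owl:sameAs is symmetric.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]


def _unordered(links: Iterable[Link]) -> Set[Tuple[str, ...]]:
    """Normalize links so (a, b) and (b, a) count as the same sameAs link."""
    return {tuple(sorted(pair)) for pair in links}


def link_scores(predicted: Iterable[Link], gold: Iterable[Link],
                beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (precision, recall, F_beta) of predicted links against a gold standard."""
    pred, ref = _unordered(predicted), _unordered(gold)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


if __name__ == "__main__":
    predicted = {("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")}
    gold = {("b", "a"), ("c", "d"), ("x", "y")}
    print(link_scores(predicted, gold))  # P = 2/4, R = 2/3, F1 = 4/7
```

Choosing β > 1 weights recall more heavily, which suits workflows where missing links cost more than reviewing a few wrong ones.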

Linkmaster 3000 for Data Consumers

  • agile continuous integration process
    • (not one-time linking, but iterative)
  • re-use and share linking effort with other consumers
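Iterative linking means each run only has to deal with what changed, and shared link sets let consumers build on each other's work. A minimal sketch under hypothetical assumptions: links are kept as sets of (source, target) URI pairs, and `diff_links`, `LinkDiff` and `merge_shared` are illustrative helpers, not LM3K functions.

```python
from collections import Counter
from typing import NamedTuple, Set, Tuple

Link = Tuple[str, str]


class LinkDiff(NamedTuple):
    added: Set[Link]     # new in this run: review and share upstream
    removed: Set[Link]   # gone since the last run: broken or retracted
    kept: Set[Link]      # unchanged: no further work needed


def diff_links(previous: Set[Link], current: Set[Link]) -> LinkDiff:
    """Compare the link sets of two consecutive linking runs."""
    return LinkDiff(
        added=current - previous,
        removed=previous - current,
        kept=current & previous,
    )


def merge_shared(*link_sets: Set[Link]) -> Counter:
    """Count how many consumers contributed each link, a simple agreement signal."""
    votes: Counter = Counter()
    for links in link_sets:
        votes.update(links)   # each link counted once per consumer
    return votes
```

Counting agreement across consumers is only one possible way to combine shared links; the slides do not say how LM3K weighs contributions.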

Open platform with services on top:

  • Export plain links (retain your work)
  • Pay for inferred links & convenience (Global LD API)
  • On-premise solutions

Linkmaster 3000 Future Work

Full stack of Link Curation Tools:

  • Recommendation engine (where to link against)
  • Online link services (DBpedia Lookup/Spotlight, OpenRefine APIs)
  • Pairwise Link Validators (automatic and manual)
  • Tracking of link state
  • Compare data in the Global LD Browser
