1 of 16

Integrating Folksonomies with the Semantic Web

Lucia Specia and Enrico Motta

2 of 16

Abstract

  • Tags:
    • serve primarily for indexing purposes.
    • facilitate search and navigation of resources.
    • yield a collective classification schema when the same tags are used by more than one individual.

  • This paper presents an approach for making explicit the semantics behind the tag space in social tagging systems, so that we can define concepts and ontologies.
  • It focuses on the relationships among tags and their mapping to formal concepts in ontologies.
  • How is it achieved? By combining shallow pre-processing strategies and statistical techniques with ontologies available on the Semantic Web.

3 of 16

Introduction

  • Social tagging/bookmarking systems (Flickr, del.icio.us) have resulted in a crowdsourced, evolving ontology: the folksonomy.
  • Key elements of such a system: Users, Resources, Tags.
  • The paper deals with the collective purpose of tags and how they relate to resources, users, and one another.
  • Key applications that result from the ontology: tag disambiguation, visualization (clustering), and tag suggestion (similarity).

4 of 16

Problems

  • Ambiguity (one tag, several meanings)
  • Lack of synonym control (several tags, one meaning)
  • Discrepancies in granularity

5 of 16

Introduction..

  • Making tag semantics explicit can also support ontology evolution and population.
  • Ontologies can be used to structure folksonomies semantically.
  • In turn, dynamic knowledge from folksonomies can be used for ontology evolution.
  • How can this be done? (steps)

6 of 16

Related Work

  • Search using similar images, relying on context (Aurnhammer et al.)
  • Subsumption-based model (Schmitz, 2006)
  • Probabilistic models (Wu et al.)
  • Tripartite model (Mika)
  • Association rules (Schmitz et al. as seen earlier)
  • Previous approaches lack good tag pre-processing.
  • Better strategies for cleaning up tags are needed.

7 of 16

Integrating Folksonomies with the Semantic Web - what does it mean?

In short, the paper takes the structure created by people who use social bookmarking and turns it into a clear system that matches content to topics on the basis of the underlying semantics.

8 of 16

Datasets

  • Like the paper that introduced the concept of folksonomy, this paper uses data from Flickr and del.icio.us.

  • Reason: abundance of users, availability of semantically tagged content.

9 of 16

Methodology

  • Unsupervised - does not assume any previously defined mapping/tagging.
  • Three key steps:
    • Pre-processing,
    • Clustering
    • Concept/Relation identification.

10 of 16

Pre-processing

  • Filter out unusual tags (tags that do not fit a certain definition -- noise removal)
  • Use Levenshtein distance to group morphologically similar tags (plurals, misspellings, special characters)
  • Filter out infrequent tags - those that have frequency less than a threshold - outlier removal.
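The three pre-processing steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the "unusual tag" rule (a tag must start with a letter and be at least two characters long), the edit-distance threshold `max_distance`, the frequency threshold `min_count`, and the function names are all assumptions chosen for the example.

```python
import re
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (classic dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def preprocess(tags, min_count=2, max_distance=1):
    """Return {representative_tag: frequency} after the three filtering steps."""
    # Step 1: drop "unusual" tags. The pattern below is an assumed definition.
    usable = [t.lower() for t in tags
              if re.fullmatch(r"[a-z][a-z0-9_\-]+", t.lower())]
    counts = Counter(usable)

    # Step 2: fold each tag into an already-seen, more frequent tag that is
    # within max_distance edits (catches plurals and small misspellings).
    merged = Counter()
    for tag, n in counts.most_common():
        representative = next(
            (seen for seen in merged if levenshtein(tag, seen) <= max_distance),
            tag,
        )
        merged[representative] += n

    # Step 3: drop tags whose total frequency is below the threshold.
    return {tag: n for tag, n in merged.items() if n >= min_count}
```

A fixed edit-distance threshold also merges unrelated short tags (for example "cat" and "car"), so a real pipeline would likely need a length-aware threshold or a dictionary check.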

11 of 16

Clustering

  • Performed to identify groups of similar tags.
  • Similarity is defined based on co-occurrence (similar to collaborative filtering)
  • Uses angular separation (the cosine of the angle between the tags' co-occurrence vectors) as the similarity measure.
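Angular separation is the cosine of the angle between two vectors: their dot product divided by the product of their lengths. A minimal sketch, assuming each tag is represented by a sparse vector of how often it appears on the same resource as every other tag (the function names and the input layout are illustrative, not taken from the paper):

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_vectors(resources):
    """Map each tag to {other_tag: number of resources carrying both tags}.

    `resources` is an iterable of tag lists, one list per tagged resource.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for tags in resources:
        for a, b in combinations(sorted(set(tags)), 2):
            vectors[a][b] += 1
            vectors[b][a] += 1
    return vectors


def angular_separation(x, y):
    """Cosine of the angle between two sparse vectors (dicts of counts)."""
    dot = sum(value * y.get(key, 0) for key, value in x.items())
    norm = math.sqrt(sum(v * v for v in x.values()) *
                     sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0
```

Values close to 1 mean two tags co-occur with the same neighbours in similar proportions; 0 means they share no neighbours at all.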

12 of 16

Interpreting clustering results

  • Similar tags get ranked close to each other and they show some latent semantic similarity (audio,music,mp3 all got clustered together).
  • Clusters are improved by removing those that are subsets of other clusters and by merging clusters that differ only minimally.
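The two refinement steps can be sketched like this. The slides do not say what counts as a "minimal difference", so the rule used here (two clusters share at least one tag and differ in at most `max_diff` tags overall) is an assumption, as are the function names.

```python
from itertools import combinations


def remove_subsets(clusters):
    """Drop duplicate clusters and clusters fully contained in another one."""
    unique = list(dict.fromkeys(frozenset(c) for c in clusters))
    return [c for c in unique if not any(c < other for other in unique)]


def merge_similar(clusters, max_diff=2):
    """Repeatedly merge overlapping clusters that differ in few tags."""
    merged = [set(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(merged)), 2):
            a, b = merged[i], merged[j]
            # a ^ b is the set of tags present in exactly one of the clusters.
            if a & b and len(a ^ b) <= max_diff:
                merged[i] = a | b
                del merged[j]
                changed = True
                break  # indices shifted, so restart the scan
    return merged


def refine(clusters):
    """Apply subset removal first, then merging."""
    return merge_similar(remove_subsets(clusters))
```

For example, `refine` applied to the clusters `{audio, music, mp3, radio}`, `{audio, music}` and `{audio, music, mp3, podcast}` drops the second cluster as a subset and merges the other two into one.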

13 of 16

Concept and Relation Identification

  • Submit each possible pair of tags from a cluster to external sources such as Wikipedia or search engines.
  • Tags that are not found together (those that don’t return any search results) are eliminated.
  • Finding a pair of tags together serves as evidence that the tags are related. Related tags are then used to build the ontology structure (hierarchy, ancestry).
  • Personal comment: This use of external resources seems computationally expensive and recursive.
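The pairwise check can be sketched as below. The external lookup is deliberately left as a parameter: `found_together` is a hypothetical callable standing in for a query to Wikipedia or a search engine, and no real API is assumed. The sketch also makes the cost visible, since a cluster of n tags needs n(n-1)/2 lookups.

```python
from itertools import combinations


def related_pairs(cluster, found_together):
    """Return the tag pairs for which the external source reports a hit.

    `found_together(tag_a, tag_b) -> bool` is a placeholder for the
    external lookup; it is called once per unordered pair.
    """
    return [(a, b)
            for a, b in combinations(sorted(cluster), 2)
            if found_together(a, b)]


def surviving_tags(cluster, found_together):
    """Tags that appear in at least one related pair; the rest are dropped."""
    return {tag for pair in related_pairs(cluster, found_together) for tag in pair}
```

In practice the callable would wrap a cached, rate-limited query, because the same pair can be requested again when clusters overlap.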

14 of 16

Experiments and Discussion

15 of 16

Experiments and Discussion

  • Relationships between tags in a cluster
  • Arrows representing subclass relationships

16 of 16

Conclusions and Future Work

  • Integrated semantic web and folksonomies
  • Experiments performed on data from Flickr and del.icio.us
  • Meaningful clusters of tags created.
  • Future work: better clustering techniques could be used.
  • Future work: use Semantic Web search engines to build better models.