Chapter Five

Multimedia Database System

Objectives of today’s class

  • Multimedia Database & Indexing
  • How do search engines work?
  • What is Multimedia Information Retrieval?
  • The retrieval process and its structure
  • Issues that arise in MIR
  • Subsystems of an IR system
  • Implementation issues

Multimedia Database & Indexing

  • A multimedia database has to deal with large media files.

  • In applications such as digital libraries, automatic data analysis is needed to extract semantic meaning from audio, images, and video.

  • Multimedia data needs better-suited data structures, indexing, and search methods.

Multimedia Data Management

  • Multimedia data must be stored and managed according to the specific characteristics of the available storage media.
  • Queries over multimedia data should be descriptive and content-oriented.
    • e.g. “Picture of a woman with a red scarf”
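
Content-oriented queries are usually answered by comparing features extracted from the media rather than text. Below is a minimal sketch, assuming each image is a plain list of (r, g, b) pixels and using color-histogram intersection as an illustrative similarity measure (the slides do not prescribe one). Mapping a phrase like “red scarf” to example colors is outside this sketch; the image names and pixel data are hypothetical.

```python
from collections import Counter

Pixel = tuple[int, int, int]  # (r, g, b), each channel 0..255

def color_histogram(pixels: list[Pixel], bins: int = 4) -> dict[tuple[int, int, int], float]:
    """Quantize each channel into `bins` levels and return a normalized histogram."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {color: n / total for color, n in counts.items()}

def histogram_intersection(h1: dict, h2: dict) -> float:
    """Similarity in [0, 1]: sum of per-bin minima of two normalized histograms."""
    return sum(min(v, h2.get(c, 0.0)) for c, v in h1.items())

def rank_images(query_pixels: list[Pixel], images: dict[str, list[Pixel]]) -> list[str]:
    """Return image ids ordered from most to least similar to the query example."""
    q = color_histogram(query_pixels)
    scores = {img_id: histogram_intersection(q, color_histogram(px)) for img_id, px in images.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical pixel data standing in for decoded images.
images = {
    "red_scarf.jpg": [(220, 20, 30)] * 60 + [(240, 200, 180)] * 40,
    "blue_sky.jpg": [(30, 80, 220)] * 100,
}
query = [(200, 10, 20)] * 100  # a "mostly red" example query
print(rank_images(query, images))  # ['red_scarf.jpg', 'blue_sky.jpg']
```

Real systems use richer features (texture, shape, learned embeddings), but the pattern is the same: extract a feature representation, then rank by similarity.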

Google Searching

Content-based multimedia retrieval is still an active research topic.

Web Search Engines

[Figure: Top 10 search engines in the world]

  • There are more than 2,000 general web search engines. The big four are Google, Yahoo, Live Search, and Ask.
  • Multimedia search engine: blinkx
  • Visual search engines: Web Brain, Redzee, Kartoo
  • Audio/sound search engine: Findsounds
  • Video search engines: YouTube, Trooker
  • Medical search engines: Search Medica, Omnimedicalsearch
  • There are also digital libraries: the WWW Virtual Library, Digital Librarian, Librarians' Internet Index

How Does a Search Engine Work?

SEO – Search Engine Optimization

Real-world example

What is a search engine?

A software program designed to identify and respond to specific queries, expressed as keywords, by populating a results page, called the SERP (Search Engine Results Page), with relevant information available on the web.

SEO – Spider Software

What is Multimedia Information Retrieval?

  • Multimedia Information Retrieval deals with representing, storing, organizing, and accessing information items so that users can easily find the relevant information that satisfies their information need.
  • Features of a good information retrieval system:
    • Representation
    • Storage
    • Organization
    • Access
    • Evaluation
    • Information need

Structure of an MIR System

  • An information retrieval (IR) system serves as a bridge between the world of authors and the world of readers/users.
  • That is, authors express a set of ideas in a document using a set of concepts; users then query the IR system for relevant documents that satisfy their information need.

  • What is in the black box?
    • The black box is the processing part of the information retrieval system, i.e. indexing and searching.

Overview of the Retrieval Process

…retrieval process

  • It is necessary to define the multimedia database before any of the retrieval processes are initiated.
  • This is usually done by the manager of the database and includes specifying the following:
    • The documents to be used
    • The operations to be performed on the data
    • The data model to be used (the data structure and what elements can be retrieved)
  • The document representation step transforms the original documents and the information needs into a logical view of them.
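
For text, or text attached to multimedia (captions, transcripts, tags), a common logical view is a bag of index terms. A minimal sketch, assuming a simple regex tokenizer and a tiny hypothetical stop-word list; real systems add stemming and, for non-text media, feature extraction. The same function is applied to documents and queries so that the two can later be compared.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "with", "and", "in", "on", "is"}

def logical_view(text: str) -> Counter:
    """Map a document (or a query) to a bag of index terms with their frequencies."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

doc = "Picture of a woman with a red scarf, and the scarf is red."
print(logical_view(doc))  # Counter({'red': 2, 'scarf': 2, 'picture': 1, 'woman': 1})
```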

…retrieval process

  • Once the logical view of the documents is defined, an index is built.
    • An index is a critical data structure.
    • It allows fast searching over large volumes of data.
  • Different index structures may be used; popular ones include the inverted file, the signature file, and the suffix tree.
  • Once the collection is indexed, the retrieval process can be initiated.

What is the difference between the above index structures?

(Read about it.)
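
One way to see a key difference: an inverted file stores, for each term, the exact list of documents containing it, whereas a signature file stores one fixed-width bit pattern per document and can only answer “possibly contains”. Below is a minimal sketch of a signature file using superimposed coding, assuming 64-bit signatures, 3 bits per word, and SHA-256 as the hash (all arbitrary choices for illustration).

```python
import hashlib

SIG_BITS = 64      # assumed signature width in bits
BITS_PER_WORD = 3  # assumed number of bits each word sets

def word_signature(word: str) -> int:
    """Hash a word to an integer with (up to) BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig

def doc_signature(words: list[str]) -> int:
    """Superimpose (OR together) the signatures of all words in a document."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig

def candidates(query_word: str, signatures: dict[str, int]) -> list[str]:
    """Documents whose signature has every bit of the query word's signature set."""
    q = word_signature(query_word)
    return [doc_id for doc_id, s in signatures.items() if (s & q) == q]

docs = {
    "d1": ["woman", "red", "scarf"],
    "d2": ["blue", "sky", "clouds"],
}
signatures = {doc_id: doc_signature(words) for doc_id, words in docs.items()}
# Candidates must be verified against the documents, since false drops are possible.
hits = [d for d in candidates("scarf", signatures) if "scarf" in docs[d]]
print(hits)  # ['d1']
```

Because different words can set overlapping bits, a signature match never misses a document that contains the word but may include some that do not; the verification step filters those out, which an inverted file does not need.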

…retrieval process

  • The user first expresses an information need as a query, which is then transformed into a representation suitable for searching.
  • A matching module then compares the query representation with the document representations to find documents related to the user’s query.
  • Before being returned to the user, the retrieved documents are ranked according to their likelihood of relevance.
  • The user then examines the set of ranked documents in the search for useful information.
  • The user then has two choices: (i) reformulate the query and run it on the entire collection, or (ii) reformulate the query and run it on the current result set.
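
The matching and ranking steps can be illustrated with the vector space model, one common choice (the slides do not prescribe a model). A minimal sketch, assuming documents are already reduced to term lists and using TF-IDF weights with cosine similarity as the score standing in for “likelihood of relevance”; the document ids and terms are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs: dict[str, list[str]]):
    """Return per-document TF-IDF vectors and the IDF table."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    idf = {term: math.log(n / df_t) for term, df_t in df.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(terms).items()}
        for doc_id, terms in docs.items()
    }
    return vectors, idf

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def rank(query_terms: list[str], docs: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Match the query against every document and sort by descending score."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_terms).items()}
    scored = [(doc_id, cosine(q, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = {
    "d1": ["woman", "red", "scarf"],
    "d2": ["red", "car"],
    "d3": ["blue", "sky"],
}
for doc_id, score in rank(["red", "scarf"], docs):
    print(doc_id, round(score, 3))
```

Here d1 ranks first because it shares both query terms, d2 second because it shares only the common term “red”, and d3 scores 0.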

…continued

  • At this point, the user might pinpoint a subset of the documents seen as definitely of interest and initiate a user feedback cycle.
  • In such a cycle, the system uses the documents selected by the user to change the query formulation.
  • Hopefully, this modified query is a better representation of the real user need.
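
A classic way to use the documents selected by the user to change the query is Rocchio relevance feedback; the slides do not name a specific method, so treat this as one illustrative option. The sketch assumes the query and documents are sparse term-weight vectors (dicts) and uses the often-quoted weights alpha=1.0, beta=0.75, gamma=0.15, which are tunable assumptions.

```python
def centroid(vectors: list[dict[str, float]]) -> dict[str, float]:
    """Average a list of sparse term-weight vectors (empty dict for an empty list)."""
    if not vectors:
        return {}
    total: dict[str, float] = {}
    for vec in vectors:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Modified query = alpha*q + beta*centroid(relevant) - gamma*centroid(non-relevant)."""
    rel_c, non_c = centroid(relevant), centroid(non_relevant)
    new_q = {}
    for t in set(query) | set(rel_c) | set(non_c):
        w = alpha * query.get(t, 0.0) + beta * rel_c.get(t, 0.0) - gamma * non_c.get(t, 0.0)
        if w > 0:  # negative weights are commonly dropped
            new_q[t] = w
    return new_q

query = {"red": 1.0, "scarf": 1.0}
relevant = [{"red": 1.0, "scarf": 1.0, "silk": 1.0}]  # documents the user marked as useful
non_relevant = [{"red": 1.0, "car": 1.0}]              # documents the user rejected
new_q = rocchio(query, relevant, non_relevant)
print(sorted(new_q))  # ['red', 'scarf', 'silk']
```

The new term “silk” enters the query from the relevant document, while “car” would get a negative weight and is dropped.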

Issues that arise in MIR

  • Multimedia data representation
    • What makes a “good” representation?
    • How is a representation generated from multimedia data?
    • What are retrievable objects and how are they organized?
  • Representing information needs
    • What is an appropriate query language?
    • How can interactive query formulation be supported?
  • Comparing representations
    • What is a “good” matching technique?
  • Evaluating effectiveness of retrieval
    • What are “good” metrics for evaluation?

Designing MIR System

  • Our focus during MIR system design is twofold:
  • Improving the effectiveness of the system.
    • The concern here is which documents are retrieved to meet the user’s information need.
    • Techniques for enhancing effectiveness: index term selection, weighting schemes, matching algorithms, query languages & reformulations.
    • Effectiveness of the system is measured in terms of precision and recall.
  • Improving the efficiency of the system.
    • The concern here is storage space usage and access time.
    • Techniques for enhancing efficiency: compression, data/file structures.
    • Efficiency of the system is measured in terms of time & space complexity.
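
Precision and recall, the two effectiveness measures named above, are simple set ratios: precision = |relevant ∩ retrieved| / |retrieved| and recall = |relevant ∩ retrieved| / |relevant|. A minimal sketch, assuming relevance judgements are available as a set of document ids (the ids below are made up):

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query; 0.0 when a denominator is empty."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["d1", "d4", "d7", "d9"]  # documents the system returned
relevant = {"d1", "d2", "d7"}         # hypothetical relevance judgements
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```

Half of what was returned is relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).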

Subsystems of IR system

The two subsystems of an IR system:

    • Indexing:
      • The process of organizing the index terms identified in the multimedia document corpus.
      • Indexing is used to speed up access to the desired information in the document collection in response to a user’s query.
    • Searching:
      • Searching scans the document corpus to find relevant documents that match the user’s query.
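
In practice the searching subsystem usually consults the index rather than reading every document: for a conjunctive (AND) keyword query it can intersect the sorted posting lists of the query terms. A minimal sketch, assuming a hypothetical in-memory index that maps each term to a sorted list of integer document ids:

```python
def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Merge-style intersection of two sorted posting lists in O(len(p1) + len(p2))."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def and_query(terms: list[str], index: dict[str, list[int]]) -> list[int]:
    """Return ids of documents that contain every query term."""
    postings = sorted((index.get(t, []) for t in terms), key=len)  # start with the rarest term
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result

index = {"red": [1, 2, 5, 8], "scarf": [2, 5, 9], "woman": [2, 3, 5]}
print(and_query(["red", "scarf", "woman"], index))  # [2, 5]
```

Processing the shortest posting list first keeps the intermediate results small.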

Indexing Subsystem

Searching Subsystem

Basic assertion

Indexing and searching are inexorably connected.

  • You cannot search for what was not first indexed in some manner.
  • Documents or objects are indexed in order to make them searchable.
    • There are many ways to do indexing.
    • There are many indexing languages.
    • Knowing searching is knowing indexing.

Implementation Issues

  • Storage of multimedia documents
    • Multimedia documents need to be compressed to reduce storage space.
  • Indexing multimedia documents
    • Organizing indexes:
      • What techniques to use? How to select them?
    • Storage of indexes:
      • Is compression required? Do we store indexes in memory or on disk?
    • Updating the index file:
      • How to update the index?
  • Searching multimedia documents
    • Accessing indexes:
      • How do we access the indexes? What data/file structure should be used?
    • Processing indexes:
      • How to search a given query in the index?
    • Accessing documents:
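
On the question “Is compression required?” for indexes: one standard technique for inverted-file postings (an illustrative option, not necessarily the one used in this course) is to store the gaps between sorted document ids and encode each gap with variable-byte coding, so that small gaps take a single byte. A minimal sketch:

```python
def vbyte_encode(numbers: list[int]) -> bytes:
    """Variable-byte encode non-negative ints: 7 data bits per byte, high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.insert(0, n % 128)
            if n < 128:
                break
            n //= 128
        chunk[-1] += 128  # set the stop bit on the final byte of this number
        out.extend(chunk)
    return bytes(out)

def vbyte_decode(data: bytes) -> list[int]:
    """Inverse of vbyte_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = n * 128 + byte
        else:
            numbers.append(n * 128 + (byte - 128))
            n = 0
    return numbers

def compress_postings(doc_ids: list[int]) -> bytes:
    """Store gaps between consecutive sorted doc ids, then variable-byte encode them."""
    gaps = ([doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]) if doc_ids else []
    return vbyte_encode(gaps)

def decompress_postings(data: bytes) -> list[int]:
    """Decode the gaps and turn them back into absolute doc ids with a running sum."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids

postings = [824, 829, 215406]
encoded = compress_postings(postings)
print(len(encoded), decompress_postings(encoded) == postings)  # 6 True
```

The gaps here are 824, 5, and 214577, which take 2, 1, and 3 bytes, instead of the 12 bytes needed for three fixed 32-bit integers.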

Indexing: Basic Concepts

  • Indexing is used to speed up access to the desired information in the document collection in response to a user’s query, such that:
    • It enhances efficiency in terms of retrieval time.
    • Relevant documents are searched and retrieved quickly.
  • An index file consists of records, called index entries.
  • An index file is a file consisting of a list of index terms, each with links to the one or more documents that contain that term.
  • Index files are much smaller than the original files.
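
To make “index entries” concrete, here is a minimal sketch, assuming an index file held in memory as a sorted list of (term, document ids) records, with binary search from Python's standard `bisect` module standing in for the fast look-up such a file allows. The terms and ids are made up.

```python
import bisect

# Each index entry: (index term, ids of documents containing the term), sorted by term.
index_entries = [
    ("car", [2]),
    ("red", [1, 2]),
    ("scarf", [1]),
    ("sky", [3]),
    ("woman", [1]),
]
terms = [term for term, _ in index_entries]  # parallel key list for bisect

def lookup(term: str) -> list[int]:
    """Binary-search the sorted entries: O(log n) probes instead of scanning every document."""
    pos = bisect.bisect_left(terms, term)
    if pos < len(terms) and terms[pos] == term:
        return index_entries[pos][1]
    return []

print(lookup("red"), lookup("blue"))  # [1, 2] []
```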

Building Index file

  • A good index file maps each keyword k_i to the set of documents d_i that contain that keyword.
  • An index file usually keeps its index terms in sorted order.
  • The sort order of the terms in the index file provides an order on a physical file.
  • An index file is a list of search terms organized for associative look-up, i.e. to answer a user’s query such as:
      • In which documents does a specified search term appear?
      • Where within each document does each term appear?
  • To organize the index file for a collection of documents, there are various options available:
    • Decide what data structure and/or file structure to use: a sequential file, inverted file, suffix array, signature file, etc.
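
Both look-up questions above can be answered by a positional inverted file. A minimal sketch, assuming whitespace/punctuation tokenization and word positions counted from 0 (both illustrative choices); the documents are made up:

```python
import re
from collections import defaultdict

def build_positional_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each term to {doc_id: [word positions]}, with terms in sorted order."""
    index: defaultdict[str, dict[str, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[term].setdefault(doc_id, []).append(pos)
    return {term: index[term] for term in sorted(index)}

docs = {
    "d1": "A woman with a red scarf",
    "d2": "Red car, red door",
}
index = build_positional_index(docs)
print(sorted(index["red"]))  # which documents? ['d1', 'd2']
print(index["red"]["d2"])    # where in d2? [0, 2]
```

Storing positions makes the index larger than a document-only inverted file, but it also supports phrase and proximity queries.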

End of today’s class!