
Norwegian University of Science and Technology

Specialization project

Engineering a search engine using Lucene and a cloud storage

Author:

Charly Molter

Supervisors:

Svein Erik Bratsberg & Øystein Torbjørnsen

Department of Computer Science and Information Science

December 2012


NTNU

Abstract

Department of Computer Science and Information Science

Engineering a search engine using Lucene and a cloud storage

by Charly Molter

Cloud computing has been one of the fastest-growing computing architectures in the last few years. It offers low-cost storage and computing power, making it easy to create and maintain distributed systems. Companies nowadays have a growing quantity of data, often unstructured, that needs to be easily searchable. In this paper we present clucene, a distributed search engine running on a cloud computing infrastructure. It is deployed on Microsoft Windows Azure and uses a blob store to hold its index. The core of the search engine is built on Lucene, an open-source search library. Because the access time of the blob store is high, we had to develop caching strategies to speed up search and indexing. We then analysed the performance of clucene by deploying it on a small cluster of two servers.


Acknowledgements

I would like to thank Svein Erik Bratsberg and Øystein Torbjørnsen for their time and their very useful advice during the project, and the Lucene community for their excellent piece of software and the help they provided me on IRC.
