Norwegian University of Science and
Technology
Specialization project
Engineering a search engine using
Lucene and a cloud storage
Author:
Charly Molter
Supervisors:
Svein Erik Bratsberg &
Øystein Torbjørnsen
Department of Computer Science and Information Science
December 2012
NTNU
Abstract
Department of Computer Science and Information Science
Engineering a search engine using Lucene and a cloud storage
by Charly Molter
Cloud computing has been one of the fastest-growing computing architectures in the last few
years. It offers low-cost storage and computing power with which to easily create and maintain
distributed systems. Companies nowadays have a growing quantity of data, often unstructured, that needs to be easily searchable. In this paper we present clucene, a distributed search engine running on a cloud computing infrastructure. It is deployed on Microsoft
Windows Azure and uses a blob store to store its index. The core of the search engine
uses Lucene, an open-source search library. Because the access time of the blob store is high,
we had to develop caching strategies to speed up search and indexing. We then analysed
the performance of clucene by deploying it on a small cluster of two servers.
Acknowledgements
I would like to thank Svein Erik Bratsberg and Øystein Torbjørnsen for their time and
their really useful advice during the project, and the Lucene community for their excellent
piece of software and the help they provided me on IRC.