XWiki Solr component

Solr Search on XWiki in Action.

The Solr component indexes all the pages inside the Sandbox. To keep it simple, I have indexed pages in English, French and Spanish.

I have used the setup below to explain how the Solr component works.

Pages:

  1. Test page 1 (Default with XE in Sandbox)
     1. A couple of test strings added to the body.
  2. Test page 2 (Default with XE in Sandbox)
  3. Test page 3 (Default with XE in Sandbox)
  4. Test page 4
     1. Test page in English
     2. Test page in French
     3. Test page in Spanish

A few more pages have “test” in the body but not in the page title. Other random pages contain l'arbre (the tree) to show that French text is parsed correctly (http://jira.xwiki.org/browse/XWIKI-6226).

Fields:

  1. Title: [ title_en, title_fr, title_es ] - using Solr dynamic fields *_en, *_fr, *_es
  2. Full text: [ ft_en, ft_fr, ft_es ]
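
As a rough illustration of the per-language field layout above, the sketch below maps a page's language to its title_* and ft_* field names. The function names, the "id" field and the page identifier are assumptions made for the example, not part of the XWiki Solr component.

```python
# Languages indexed in this setup (from the field list above).
SUPPORTED_LANGUAGES = ("en", "fr", "es")


def localized_fields(language):
    """Return the title and full-text field names for one language."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return {"title": f"title_{language}", "fulltext": f"ft_{language}"}


def to_solr_document(page_id, language, title, body):
    """Build a dict shaped like a Solr document (hypothetical helper)."""
    fields = localized_fields(language)
    return {
        "id": page_id,  # assumed unique key field name
        fields["title"]: title,
        fields["fulltext"]: body,
    }


# Hypothetical page identifier and content, for illustration only.
doc = to_solr_document("Sandbox.TestPage4", "fr", "Test page 4", "l'arbre est grand")
print(doc)
```

Because each language lands in its own field, each field can be given its own analyzer and its own boost at query time.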

1. The apostrophe ( ' ) is now considered a separator (http://jira.xwiki.org/browse/XWIKI-6226).

Searching for arbre and l’arbre returns the same set of results because l’ is treated as a stop word.
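
The effect can be mimicked outside Solr with a few lines of Python. This is only a sketch of the idea (split on the apostrophe, then drop stop words); it is not the analyzer chain Solr actually runs, and the stop-word list is an assumption made for the example.

```python
import re

# Assumed stop-word list for illustration; not the real Solr stop-word file.
FRENCH_STOP_WORDS = {"l", "le", "la", "les", "d", "de", "un", "une"}


def analyze(text):
    """Lowercase, split on non-word characters, and drop stop words.

    Both the straight (') and the typographic (’) apostrophe are
    non-word characters, so they act as separators here.
    """
    tokens = re.split(r"\W+", text.lower())
    return [t for t in tokens if t and t not in FRENCH_STOP_WORDS]


# Both queries reduce to the same token list, so they match the same documents.
print(analyze("arbre") == analyze("l'arbre") == analyze("l’arbre"))
```

Once the apostrophe separates l from arbre, the leftover l is discarded as a stop word and only arbre reaches the index and the query.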

Screenshot: searching with l’arbre.

2. Customizing the relevancy score using query-time field boosts (qf).

  1. qf=title_en^1.0 ft_en^1.0 - Equal weights.

   Test Page1 - 1.36 -> Has ‘Test’ in title_en and ft_en.

   Test Page4 - 1.33 -> Has ‘Test’ in title_en only.

   Test Page2 - 1.33 -> Has ‘Test’ in title_en only.

   Test Page3 - 1.33 -> Has ‘Test’ in title_en only.

   Tree of gods - 1.09 -> Has ‘Test’ in ft_en, very small document.

   WebHome - 0.32 -> Has ‘Test’ in ft_en, but this is a large document. Under tf*idf the term frequency is normalized by document length, so it comes out small and the overall score is lower.
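
The length effect can be seen with a simplified tf*idf calculation. The sketch below is loosely modeled on classic TF-IDF scoring (square-root term frequency, logarithmic idf, inverse square-root length norm); the exact formula Solr applies depends on its version and configured similarity, and every number here is invented for illustration rather than taken from the results above.

```python
import math


def simplified_score(term_count, doc_length, num_docs, docs_with_term):
    """Simplified tf * idf * length-norm score for a single term.

    Omits query normalization, coordination factors and boosts.
    """
    tf = math.sqrt(term_count)
    idf = 1.0 + math.log(num_docs / (docs_with_term + 1))
    length_norm = 1.0 / math.sqrt(doc_length)
    return tf * idf * length_norm


# Hypothetical documents: a 10-term page with one hit versus
# a 2000-term page with three hits for the same term.
small = simplified_score(term_count=1, doc_length=10, num_docs=20, docs_with_term=6)
large = simplified_score(term_count=3, doc_length=2000, num_docs=20, docs_with_term=6)
print(small > large)
```

Even with more occurrences of the term, the long document is penalized by the length norm, which matches the ordering of a very small page above a large one.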

   

Scores from the EDISMAX handler are shown in brackets.

  2. qf=title_en^1.0 ft_en^4.0 - making the full text field more relevant.

   Tree of gods - 1.09 -> Has ‘Test’ in ft_en, very small document. The score for title_en matches is reduced, as that field's relative weight is lower.

  3. qf=title_en^3.0 title_fr^1.2 ft_en^1.0 ft_fr^1.2

The English title field (title_en) has the most weight.
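
For reference, the qf settings used in this section can be passed to the edismax parser as ordinary request parameters. The sketch below only builds the request URLs; the host, port and path are assumptions for a local Solr instance, and the variant names are labels invented for the example.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to the actual installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# The three boost configurations discussed in this section.
QF_VARIANTS = {
    "equal": "title_en^1.0 ft_en^1.0",
    "fulltext_heavy": "title_en^1.0 ft_en^4.0",
    "bilingual": "title_en^3.0 title_fr^1.2 ft_en^1.0 ft_fr^1.2",
}


def build_search_url(query, qf):
    """Return a select URL that runs `query` through edismax with the given qf."""
    params = {
        "q": query,
        "defType": "edismax",  # use the extended dismax query parser
        "qf": qf,              # fields to search, with per-field boosts
        "fl": "*,score",       # return all stored fields plus the score
        "wt": "json",
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


for name, qf in QF_VARIANTS.items():
    print(name, build_search_url("test", qf))
```

Changing only the qf value between requests makes it easy to compare how the same query ranks under different boosts.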

Note: In the 3rd and 4th screenshots, the title is in English for both the French and English pages, but the content is in the respective languages.