1 of 39

Search API

Drupal Search in a nutshell

Artem Sylchuk, F5

2 of 39

Imagine

You want to add search to your site

3 of 39

Putting a Search Engine on Your Website

  • Installing Your Own Search Engine Script
  • Using a Free or Commercial Third Party
  • Hosted Search Engine Service
  • Using the Major Search Engines

4 of 39

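// Naive approach: substring match on two columns. $letter is concatenated
// as-is, so it must be escaped beforehand.
$sql = "SELECT `ID`, `FirstName`, `LastName`
        FROM `Contacts`
        WHERE `FirstName` LIKE '%" . $letter . "%'
           OR `LastName` LIKE '%" . $letter . "%'";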

5 of 39

Why not?

  1. Zipf’s law
  2. Tags
  3. Search excerpt
  4. Spellcheck
  5. Autocomplete
  6. ….

6 of 39

Indexing

Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process in the context of search engines designed to find web pages on the Internet is web indexing.
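
To make the idea concrete, here is a minimal sketch of an inverted index, the structure most search engines build while indexing. The function names `build_inverted_index()` and `lookup()` are made up for this example, and the ASCII-only tokenizer is a deliberate simplification.

```php
<?php
// Hypothetical helper: map each word to the IDs of the documents containing it.
function build_inverted_index(array $documents): array {
  $index = [];
  foreach ($documents as $id => $text) {
    // Lowercase, then split on anything that is not an ASCII letter or digit.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach (array_unique($words) as $word) {
      $index[$word][] = $id;
    }
  }
  return $index;
}

// Hypothetical helper: look up one word; a miss returns an empty list.
function lookup(array $index, string $word): array {
  return $index[strtolower($word)] ?? [];
}

$index = build_inverted_index([
  1 => 'Drupal search in a nutshell',
  2 => 'Search API for Drupal',
  3 => 'Intro to Apache Solr',
]);
print_r(lookup($index, 'search'));
```

The lookup is a single array access instead of a scan over every document, which is the main reason to pay the indexing cost up front.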

7 of 39

Search in Drupal core

  1. Builds index from strings
  2. Parses tags
  3. Trims and cleans up text
  4. Calculates word scores
  5. Handles links between nodes and users
  6. Easily extendable

https://www.acquia.com/blog/drupal-search-how-indexing-works

8 of 39

How does it work?

  • Set tag score (h1 => 25, h2 => 18, a => 10, em => 3, b => 3)
  • Find opening tag, increment score
  • Calculate focus (how far text is from beginning): $focus = min(1, .01 + 3.5 / (2 + count($results[0]) * .015));
  • Find closing tag, decrement score
  • Find link to other node, index text for linked node with smaller focus
  • Update search_total table: $total = log10(1 + 1/(max(1, $total)));
  • Extend SelectQuery
  • Add magic "CAST(:multiply_$i AS DECIMAL) * COALESCE(( " . $score . "), 0) / CAST(:total_$i AS DECIMAL)";
  • Add relevancy and other metadata
  • Show the result
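
The scoring steps above can be sketched in a few lines. This is a simplified illustration, not the core implementation: `score_words()` and `word_weight()` are hypothetical names, the tokenizer is ASCII-only, and focus is recalculated from a running word count. Only the tag weights and the two formulas come from the slide.

```php
<?php
// Simplified sketch of tag-weighted scoring; not the actual Drupal core code.
function score_words(string $html): array {
  // Tag weights as listed on the slide.
  $tags = ['h1' => 25, 'h2' => 18, 'a' => 10, 'em' => 3, 'b' => 3];
  $scores = [];
  $stack = [];  // Weights of the currently open weighted tags.
  $score = 1;   // Base score of plain text.
  $focus = 1;   // Shrinks as the text gets further from the beginning.
  $count = 0;   // Words processed so far.

  // Split into tags and text, keeping the tags as separate pieces.
  $pieces = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
  foreach ($pieces as $piece) {
    if (preg_match('/^<\s*(\/?)\s*([a-z0-9]+)[^>]*>$/i', $piece, $m)) {
      $name = strtolower($m[2]);
      if (!isset($tags[$name])) {
        continue;  // Unweighted tag: ignore it.
      }
      if ($m[1] === '/') {
        // Closing tag: decrement the score, never below the base score.
        if ($stack) {
          $score = max(1, $score - array_pop($stack));
        }
      }
      else {
        // Opening tag: increment the score.
        $stack[] = $tags[$name];
        $score += $tags[$name];
      }
      continue;
    }
    foreach (preg_split('/[^a-z0-9]+/', strtolower($piece), -1, PREG_SPLIT_NO_EMPTY) as $word) {
      $scores[$word] = ($scores[$word] ?? 0) + $score * $focus;
      $count++;
      $focus = min(1, .01 + 3.5 / (2 + $count * .015));
    }
  }
  return $scores;
}

// Weight of a word given how often it occurs overall: rare words count more.
function word_weight(float $total): float {
  return log10(1 + 1 / max(1, $total));
}

print_r(score_words('<h1>Drupal</h1> search'));
```

With this formula the focus stays at 1 for roughly the first hundred words and only then starts to fall, so short texts are scored almost entirely by their tags.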

9 of 39

Wow, amazing! No?

No. We demand:

  1. Grouping of results by content type
  2. Autocomplete or recommendations
  3. Spellcheck
  4. Biasing or the ability to order the results outside the natural result set
  5. Search analytics
  6. Facets
  7. How easily can I wrap up all the configuration into an installable module or profile?
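
Two of these demands, grouping and facets, come down to bucketing and counting results by a field. A minimal sketch on hypothetical result rows (a real backend computes this on its side, usually far more efficiently):

```php
<?php
// Hypothetical result rows standing in for what a search backend returns.
$results = [
  ['title' => 'Search API overview', 'type' => 'article'],
  ['title' => 'Solr setup notes',    'type' => 'page'],
  ['title' => 'Facets in practice',  'type' => 'article'],
];

// Facet counts: how many results fall into each value of $field.
function facet_counts(array $results, string $field): array {
  $counts = array_count_values(array_column($results, $field));
  arsort($counts);  // Most frequent value first.
  return $counts;
}

// Grouping: the same results bucketed by the value of $field.
function group_by(array $results, string $field): array {
  $groups = [];
  foreach ($results as $row) {
    $groups[$row[$field]][] = $row['title'];
  }
  return $groups;
}

print_r(facet_counts($results, 'type'));
print_r(group_by($results, 'type'));
```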

10 of 39

I want to build my own Google*

*with hookers

11 of 39

Search API

12 of 39

Search API

API Overview:

  • Framework for easily creating searches
  • Abstracts from data sources and backend implementations
  • Large ecosystem with extensions, e.g. backends
  • Facet API integration
  • Heavily based on Entity API
    • Provides metadata
    • Used for index and server configurations

Extension features:

  • Search API Autocomplete
  • Attachments
  • Saved Searches
  • Location
  • Pretty Facets Paths
  • Slider (Search API Ranges)
  • and many more.

14 of 39

Search API Views:

  • Full Views support
  • Display any property of an entity
  • Use any indexed field as filter, argument or sort
  • Most code based on Entity API's views integration
  • By default: data retrieved via entity load
    • Can be bypassed ("Retrieve data from Solr" setting in server)
  • Alternative: Search API pages

Search API Recipes:

  • CRUD hooks for indexes and servers
  • Hooks for adding
    • data sources
    • backends
    • data alterations
    • processors
  • Hook fired when indexing items
  • Hook fired when executing a search

15 of 39

No Silver Bullet

16 of 39

Some UI Examples

24 of 39

Search

  • Fulltext search
  • Fuzzy search
  • Stemmer
  • Transliteration
  • Tokenizer
  • Stopwords
  • Highlighting
  • Spellcheck
  • Suggestions
  • Excerpt
  • Facets
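
As a small taste of what fuzzy search, spellcheck and suggestions involve, here is a sketch that picks the closest known term by edit distance, using PHP's built-in `levenshtein()`. The `suggest()` helper and the term list are hypothetical; real backends typically rely on n-grams or dedicated spellcheck components rather than a linear scan.

```php
<?php
// Return the known term closest to $input, or NULL if nothing is within
// $max_distance edits. $terms stands in for the words known to the index.
function suggest(string $input, array $terms, int $max_distance = 2): ?string {
  $best = NULL;
  $best_distance = $max_distance + 1;
  foreach ($terms as $term) {
    $distance = levenshtein(strtolower($input), strtolower($term));
    if ($distance < $best_distance) {
      $best = $term;
      $best_distance = $distance;
    }
  }
  return $best;
}

var_dump(suggest('drupla', ['search', 'drupal', 'solr']));
```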

25 of 39

Drupal 8

26 of 39

Gimme some codez

27 of 39

Database & Indexing

drush sapi-i

  1. Selects changed items from search_api_item
  2. Doesn’t mark any items as being processed
  3. Indexes based only on changed property
  4. Can’t be parallelized
  5. Can be configured to run immediately on content update
  6. Normally runs on cron
  7. Any change in fields config requires re-indexing
  8. Re-indexing doesn’t purge index data
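
The points above can be illustrated with an in-memory simulation. Everything here is an assumption made for the sketch: the array stands in for the search_api_item table, a value of 0 is taken to mean "indexed", and the function names are hypothetical.

```php
<?php
// Stand-in for the tracking table: item ID => "changed" value.
// Assumed convention: 0 means indexed, a timestamp means "needs indexing".
$tracker = [10 => 0, 11 => 1700000100, 12 => 1700000050, 13 => 0, 14 => 1700000200];

// Pick up to $limit changed items, oldest change first.
function get_changed_items(array $tracker, int $limit): array {
  $changed = array_filter($tracker, function ($value) {
    return $value > 0;
  });
  asort($changed);
  return array_slice(array_keys($changed), 0, $limit);
}

// Index one batch, then mark its items as indexed.
function index_batch(array &$tracker, int $limit, callable $index_item): array {
  $ids = get_changed_items($tracker, $limit);
  foreach ($ids as $id) {
    $index_item($id);
    $tracker[$id] = 0;
  }
  return $ids;
}

$done = index_batch($tracker, 2, function ($id) {
  echo "Indexing item $id\n";
});
```

Nothing marks the selected items as in progress between the select and the update, so two runs started at the same time would pick the same IDs. That is the reason the slide lists "can't be parallelized".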

28 of 39

SearchAPI + EntityAPI =

Entity Metadata Wrapper

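// Wrap an entity, either through the helper function...
$wrapper = entity_metadata_wrapper('node', $node);
// ...or through the method on entity objects that provide it.
$wrapper = $entity->wrapper();

// Read and write a property of a referenced entity (the node author).
$wrapper->author->mail->value();
$wrapper->author->mail->set('sepp@example.com');
$wrapper->author->mail = 'sepp@example.com';

// Read the body text with the 'decode' option set.
$wrapper->body->value->value(array('decode' => TRUE));

// Iterate over a multi-value field.
$wrapper = entity_metadata_wrapper('node', $node);
foreach ($wrapper->field_taxonomy_terms->getIterator() as $delta => $term_wrapper) {
  // $term_wrapper may now be accessed as a taxonomy term wrapper.
  $label = $term_wrapper->name->value();
}

// Access a single delta directly.
$first_name = $wrapper->field_tags[0]->name->value();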

29 of 39

Death to Field Arrays!

http://www.mediacurrent.com/blog/entity-metadata-wrapper

30 of 39

Add Property To Entity
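
A minimal sketch of how a computed property might be declared through Entity API's hook_entity_property_info_alter(). The module name "mymodule" and the property "title_length" are hypothetical, and the `t()` stub exists only so the snippet also runs outside Drupal.

```php
<?php
// Stub so the sketch runs outside Drupal; Drupal provides the real t().
if (!function_exists('t')) {
  function t($string) {
    return $string;
  }
}

/**
 * Implements hook_entity_property_info_alter().
 *
 * "mymodule" and "title_length" are hypothetical names.
 */
function mymodule_entity_property_info_alter(&$info) {
  $info['node']['properties']['title_length'] = array(
    'type' => 'integer',
    'label' => t('Title length'),
    'description' => t('Length of the node title in bytes.'),
    'getter callback' => 'mymodule_get_title_length',
  );
}

/**
 * Getter callback: computes the property value for one node.
 */
function mymodule_get_title_length($node, array $options, $name, $type, $info) {
  return strlen($node->title);
}
```

Once declared, such a property should be readable through the wrapper like any other, and it should become selectable as a field when configuring an index.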

31 of 39

Views & entity_load()

32 of 39

Processors vs Alters

/**
 * Implements hook_search_api_alter_callback_info().
 */
function search_api_search_api_alter_callback_info() {
  $callbacks['search_api_alter_bundle_filter'] = array(
    'name' => t('Bundle filter'),
    'description' => t('Exclude items from indexing based on their bundle (content type, vocabulary, …).'),
    'class' => 'SearchApiAlterBundleFilter',
    // Filters should be executed first.
    'weight' => -10,
  );
  return $callbacks;
}

/**
 * Implements hook_search_api_processor_info().
 */
function search_api_search_api_processor_info() {
  $processors['search_api_case_ignore'] = array(
    'name' => t('Ignore case'),
    'description' => t('This processor will make searches case-insensitive for fulltext or string fields.'),
    'class' => 'SearchApiIgnoreCase',
  );
  return $processors;
}

33 of 39

Alter callbacks:

  • Run early
  • Used to add properties or unset values from index
  • Require the alterItems() method to be implemented in order to work
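
A sketch of what an alter callback does, written as a standalone class so it runs without Drupal. `ExampleBundleFilter` is a hypothetical stand-in and does not extend the real Search API base class; only the method name alterItems() is taken from the slide.

```php
<?php
// Stand-in for a data alteration; not a real Search API class.
class ExampleBundleFilter {
  private $excluded;

  public function __construct(array $excluded_bundles) {
    $this->excluded = $excluded_bundles;
  }

  // Remove items whose bundle is excluded, so they never reach the index.
  public function alterItems(array &$items) {
    foreach ($items as $id => $item) {
      if (in_array($item->type, $this->excluded, TRUE)) {
        unset($items[$id]);
      }
    }
  }
}

$items = [
  1 => (object) ['type' => 'article', 'title' => 'Indexed'],
  2 => (object) ['type' => 'page', 'title' => 'Skipped'],
];
$filter = new ExampleBundleFilter(['page']);
$filter->alterItems($items);
print_r(array_keys($items));
```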

Processors:

  • Can be invoked as pre- and post-processors
  • Can change the query, results, fields and keys
  • More complex: simple processors can just override process(), while others might want to override the other process*() methods and test*() (for restricting processing to something other than all fulltext data).
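
The key property of a processor is that the same transformation runs at index time and at query time, so both sides agree. A standalone sketch follows, assuming items are plain arrays of field values and the search keys are a plain string. `ExampleIgnoreCase` is hypothetical and does not extend the real Search API base class; the method names only mirror the naming used by the module.

```php
<?php
// Stand-in processor, not a Search API class.
class ExampleIgnoreCase {

  // Runs before items are handed to the backend for indexing.
  public function preprocessIndexItems(array &$items) {
    foreach ($items as &$item) {
      foreach ($item as &$value) {
        if (is_string($value)) {
          $this->process($value);
        }
      }
      unset($value);
    }
    unset($item);
  }

  // Runs before the search is executed; here the query is just its keys.
  public function preprocessSearchQuery(string &$keys) {
    $this->process($keys);
  }

  // The single method a simple processor would override.
  protected function process(&$value) {
    $value = strtolower($value);
  }

}

$processor = new ExampleIgnoreCase();
$items = [1 => ['title' => 'Drupal SEARCH', 'nid' => 1]];
$processor->preprocessIndexItems($items);
$keys = 'SoLr';
$processor->preprocessSearchQuery($keys);
```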

34 of 39

Available filters and processors

Data alterations

  • Bundle filter
  • Language control
  • Node access
  • URL field
  • Aggregated fields
  • Complete entity view
  • Index hierarchy

Processors

  • Ignore case
  • HTML filter
  • Tokenizer
  • Stopwords
  • Highlighting

35 of 39

Custom item types / datasource controllers

36 of 39

Intro to Apache Solr

37 of 39

Search API vs Apache Solr module

38 of 39

Setting up the Search

  1. Download
  2. Untar
  3. Run ./bin/solr start
  4. Enjoy Searching
  5. Go back and fix xml configs
  6. Define fields schema
  7. Have a lot of pain and goto 6
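
Once Solr is running, a search is an HTTP request to the select handler. The sketch below only builds the URL. The host, the default port 8983, the core name "drupal" and the field name "type" are assumptions for illustration, and `solr_select_url()` is a hypothetical helper.

```php
<?php
// Build a select URL for a local Solr instance. Host, port and the core
// name "drupal" are assumptions for this sketch.
function solr_select_url(string $keywords, array $filters = [], ?string $facet_field = NULL): string {
  $params = ['q' => $keywords, 'wt' => 'json', 'rows' => 10];
  if ($facet_field !== NULL) {
    $params['facet'] = 'true';
    $params['facet.field'] = $facet_field;
  }
  $query = http_build_query($params, '', '&');
  // fq may be repeated, so it is appended by hand.
  foreach ($filters as $filter) {
    $query .= '&fq=' . urlencode($filter);
  }
  return 'http://localhost:8983/solr/drupal/select?' . $query;
}

echo solr_select_url('search api', ['type:article'], 'type'), "\n";
```

The fields used in q, fq and facet.field have to exist in the schema, which is where steps 5 to 7 above tend to come in.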

39 of 39