Tool Search:
"Past, Present, and Future"
Ahmed Awan, Tyler Collins, Michelle Savage
Johns Hopkins University
September 15, 2022
Flaws With the Old Tool Search
Tool Search GCC
Cameron Hyde
Tyler Collins
Michelle Savage
Ahmed Awan
The Backend Search: What is Whoosh?
Whoosh is a Python library of classes and functions for indexing text and then searching the index.
Essentially, Whoosh allows us to develop a custom search engine for searching the tools loaded on a Galaxy instance.
Previous Search Index Schema
The first step in building the backend search is defining an index schema that lists every field that should be searchable, such as title or content.
Populating the Search Index
We then populate our index by looping through all of the available tools on a Galaxy instance.
How Whoosh Searches the Index
Parsing the Query
Galaxy converted the query to lowercase and then used a custom n-gram implementation to help parse it.
Galaxy was also configured to default to an ‘or’ search instead of an ‘and’ search.
Galaxy also added a wildcard search to the query.
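The three parsing steps above (lowercasing, n-grams, an 'or' default with wildcards) can be sketched in plain Python. This is an illustration of the idea only, not Galaxy's code: the function names, the n-gram sizes, and the query-string syntax are all assumptions.

```python
def ngrams(token, minsize=3, maxsize=4):
    """Return every substring of token whose length is in [minsize, maxsize]."""
    grams = []
    for size in range(minsize, maxsize + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def build_query(raw, minsize=3, maxsize=4):
    """Lowercase the query, wildcard each term, and OR everything together."""
    clauses = []
    for term in raw.lower().split():
        clauses.append(f"*{term}*")                     # wildcard match on the whole term
        clauses.extend(ngrams(term, minsize, maxsize))  # partial-match fragments
    return " OR ".join(clauses)


print(build_query("Trim"))
```

Because every clause is joined with OR, a tool that matches any single fragment becomes a hit, which is one way such a query can return many loosely related results.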
Old Search Implementation
Galaxy would then sum the scores of all the fields. Each field's score was calculated with the BM25F scoring algorithm and multiplied by a boost value that marks the importance of that field.
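A minimal sketch of that summation is shown below. The per-field scores and boost values are invented for the example, and the per-field scores are assumed to have been computed already.

```python
def total_score(field_scores, boosts):
    """Sum each field's score multiplied by that field's boost (default boost 1.0)."""
    return sum(score * boosts.get(field, 1.0) for field, score in field_scores.items())


# Hypothetical boosts and per-field scores for a single tool.
boosts = {"name": 9.0, "description": 2.0, "help": 0.5}
field_scores = {"name": 1.5, "description": 0.5, "help": 2.0}

print(total_score(field_scores, boosts))  # 1.5*9.0 + 0.5*2.0 + 2.0*0.5 = 15.5
```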
The BM25 Algorithm
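In its standard form, BM25 scores a document $D$ against a query $Q = q_1, \dots, q_n$ as:

$$\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$

where $f(q_i, D)$ is the frequency of term $q_i$ in $D$, $|D|$ is the length of $D$, $\mathrm{avgdl}$ is the average document length in the index, and $k_1$ and $b$ are free parameters.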
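A small numeric sketch can make the behavior of BM25 concrete. It uses one common IDF variant and the conventional defaults k1 = 1.2 and b = 0.75. Both are assumptions for the example and may differ from what Whoosh computes internally.

```python
import math


def idf(total_docs, docs_with_term):
    """One common (always non-negative) inverse document frequency variant."""
    return math.log((total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1.0)


def bm25_term(tf, doc_len, avg_doc_len, total_docs, docs_with_term, k1=1.2, b=0.75):
    """BM25 contribution of a single query term to one document's score."""
    length_norm = 1.0 - b + b * doc_len / avg_doc_len  # penalizes long documents
    return idf(total_docs, docs_with_term) * tf * (k1 + 1.0) / (tf + k1 * length_norm)


# A rare term is worth more than a common one, all else being equal.
rare = bm25_term(tf=1, doc_len=10, avg_doc_len=10, total_docs=100, docs_with_term=2)
common = bm25_term(tf=1, doc_len=10, avg_doc_len=10, total_docs=100, docs_with_term=50)
print(rare > common)
```

Repeating a term raises the score with diminishing returns, and a document longer than average scores lower than a shorter one with the same term frequency.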
What we did to Improve the Search
Changed Scoring Algorithm
Fixed Boosting
Implemented Improved N-gram Search
Implemented Whoosh Analyzers
Fixed ID Search
Additionally, we fixed the ID field so that it is actually searchable.
Changed how Config Params are Accessed
We also exposed the boost parameters, n-gram sizes, and other changes as customizable options in config_schema.yml.
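One way to picture configurable search parameters is a set of defaults that an administrator can override. The option names and default values below are hypothetical placeholders, not the actual entries in config_schema.yml.

```python
# Hypothetical option names and defaults, for illustration only.
DEFAULTS = {
    "name_boost": 9.0,
    "description_boost": 2.0,
    "ngram_minsize": 3,
    "ngram_maxsize": 4,
}


def load_search_config(overrides):
    """Merge admin-supplied overrides into the defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"Unknown search options: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


config = load_search_config({"name_boost": 12.0})
print(config["name_boost"], config["ngram_minsize"])
```

Rejecting unknown keys means a misspelled option fails loudly instead of silently leaving the default in place.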
Results After Changes
Why refactor (or clean) code?
Why (any of below)
credit: https://www.flickr.com/photos/mercer52/16141913875
API-to-Front End
Code Conventions
There are only two hard things in Computer Science: cache invalidation and naming things.
-- Phil Karlton
principal developer at Xerox PARC, Digital, Silicon Graphics, and Netscape (Karlton.org)
Code Conventions for Clarity*
Nice to Haves:
____
* Unless a developer is working entirely on their own, the code must be easy for other developers to understand, maintain, and extend.
** Sometimes the language or technology limits name length, so names must be kept brief.
Code Conventions for Clarity*
Before
After
SOLID coding principles
Key Takeaways (from code file top-to-bottom):
https://en.wikipedia.org/wiki/SOLID
Front End Test Coverage *
* Integration coverage is also extremely helpful for more complex features.
Unit (or Integration) | Filename | Existing or New? |
Unit | ToolSection.test.js | Existing |
Unit | ToolSearch.test.js | New |
Advanced Tool Search
Implemented and merged into dev as a dropdown advanced menu, similar to the one in the history.
Advanced Tool Search
Clicking on a tool name may show tool help text/information.
Special Thanks