Democratizing data at Kiwi.com
Challenge
Solution
A Slack chatbot that provides all the necessary information through human-like interaction.
Main technology stack
Workflow
Dialogflow
Dialogflow - classifier
Dialogflow - intents
Dialogflow - small talk
Dialogflow - intents
Dialogflow - problems
Dialogflow - Excel small talk
Dialogflow - problems - training
Dialogflow - other problems
Databases
Why do we even need graphs?
Our case
Document model in ES
from datetime import datetime

from elasticsearch_dsl import Date, DocType, Keyword, Nested, Text

# Parameter, ResultType and default_fields are defined elsewhere in the project.
class DocumentElastic(DocType):
    uuid = Keyword()
    title = Text(fields=default_fields)
    ...
    description = Text(fields=default_fields)
    updated_at = Date()
    ...
    parameters = Nested(Parameter)
    ...
    graph_statistics = Nested(ResultType)

    class Index:
        name = 'documents'

    def is_up_to_date(self, last_updated: datetime):
        # True when the indexed copy is at least as recent as the source.
        return self.updated_at >= last_updated
User model in Neo4j
from neomodel import DateTimeProperty, RelationshipTo, StringProperty, StructuredNode

# CreatedRelation, ConsumedRelation and ModifiedRelation are defined elsewhere in the project.
class UserNeo(StructuredNode):
    uuid = StringProperty()
    email = StringProperty(unique_index=True)
    time_created = DateTimeProperty()

    created = RelationshipTo('DocumentNeo', 'CREATED', model=CreatedRelation)
    consumed = RelationshipTo('DocumentNeo', 'CONSUMED', model=ConsumedRelation)
    modified = RelationshipTo('DocumentNeo', 'MODIFIED', model=ModifiedRelation)
Document model in Neo4j
from neomodel import (FloatProperty, IntegerProperty, RelationshipTo,
                      StringProperty, StructuredNode)

class DocumentNeo(StructuredNode):
    uuid = StringProperty()
    source = StringProperty(required=True, index=True)
    source_id = StringProperty(required=True, index=True)
    views = IntegerProperty(default=0)
    people_viewed = IntegerProperty(default=0)
    page_rank = FloatProperty(default=0)

    created_by = RelationshipTo('UserNeo', 'CREATED_BY', model=CreatedRelation)
    consumed_by = RelationshipTo('UserNeo', 'CONSUMED_BY', model=ConsumedRelation)
    modified_by = RelationshipTo('UserNeo', 'MODIFIED_BY', model=ModifiedRelation)
ES + Neo4j - how to use both dbs?
ES + Neo4j - interface to unite them
class Document:
    """Unites ElasticSearch and Neo4j, representing an entity in both databases.
    Entities are available by `uuid` or tuple `source, source_id`
    """

    def __init__(self):
        self._elastic_doc: DocumentElastic
        self._neo4j_doc: DocumentNeo

    def __getattr__(self, name):
        if name not in ('_elastic_doc', '_neo4j_doc'):
            try:
                return getattr(self._elastic_doc, name)
            except AttributeError:
                pass
            return getattr(self._neo4j_doc, name)
        return None
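The delegation idea behind the interface can be tried without either database. The following is a minimal sketch, not the project's code: it assumes the two backing objects are passed to the constructor, and it uses `SimpleNamespace` objects as hypothetical stand-ins for `DocumentElastic` and `DocumentNeo` instances.

```python
from types import SimpleNamespace


class Document:
    """Looks attributes up in the Elasticsearch object first, then in Neo4j."""

    def __init__(self, elastic_doc, neo4j_doc):
        self._elastic_doc = elastic_doc
        self._neo4j_doc = neo4j_doc

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails. Refusing private
        # names here prevents infinite recursion if the backing objects
        # have not been set yet.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return getattr(self._elastic_doc, name)
        except AttributeError:
            return getattr(self._neo4j_doc, name)


# Hypothetical stand-ins for the two database models.
elastic_doc = SimpleNamespace(uuid='abc', title='Bookings dashboard')
neo4j_doc = SimpleNamespace(uuid='abc', views=42, page_rank=0.3)

doc = Document(elastic_doc, neo4j_doc)
print(doc.title)  # found on the Elasticsearch side
print(doc.views)  # falls through to the Neo4j side
```

When both sides define the same attribute, such as `uuid`, the Elasticsearch value wins because it is tried first.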
ES + Neo4j - some methods
@staticmethod
def get_by_source_id(source, source_id):
    doc = Document()
    doc._elastic_doc = ElasticQuery.get_doc_by_source_id(source, source_id)
    doc._neo4j_doc = NeoQuery.get_doc_by_source_id(source, source_id)
    return doc

@staticmethod
def get_by_uuid(uuid):
    doc = Document()
    doc._elastic_doc = ElasticQuery.get_doc_by_uuid(uuid)
    doc._neo4j_doc = NeoQuery.get_doc_by_uuid(uuid)
    return doc

def is_up_to_date(self, last_updated: datetime):
    return self._elastic_doc.is_up_to_date(last_updated)
Elasticsearch-dsl - query examples
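from elasticsearch_dsl import Q

# List source_id values of documents from one source.
DocumentElastic\
    .search(index='documents', using=elastic.client)\
    .query('bool', filter=[Q('term', source=source)])\
    .fields(['source_id'])[:limit]\
    .execute()

# Fetch a single document by its uuid.
DocumentElastic.get(id=uuid, using=elastic.client, index='documents')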
Elasticsearch - word order
Elasticsearch - analyzers
from elasticsearch_dsl import Text, analyzer

# The token filters (english_possessive_stemmer, synonyms_case_sensitive,
# synonyms_lowercase, english_stop, english_stemmer, shingle_filter)
# are defined elsewhere in the project.
root = analyzer(
    'root',
    type='custom',
    tokenizer='standard',
    char_filter=['html_strip'],
    filter=[english_possessive_stemmer, synonyms_case_sensitive, 'lowercase',
            synonyms_lowercase, english_stop, english_stemmer])

shingles = analyzer(
    'shingles',
    type='custom',
    tokenizer='standard',
    char_filter=['html_strip'],
    filter=[english_possessive_stemmer, synonyms_case_sensitive, 'lowercase',
            synonyms_lowercase, english_stop, english_stemmer, shingle_filter])

default_fields = {
    'default': Text(analyzer=root),
    'shingles': Text(analyzer=shingles)
}
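Why a separate shingles field helps with word order can be shown in plain Python. This is a rough sketch of what a shingle token filter emits, assuming two-word shingles with the single words kept (Elasticsearch's defaults); the actual settings of `shingle_filter` are not shown here, and the function name `shingles_of` is made up for the illustration.

```python
def shingles_of(tokens, size=2):
    """Return the tokens followed by every run of `size` adjacent tokens."""
    grams = [' '.join(tokens[i:i + size])
             for i in range(len(tokens) - size + 1)]
    return tokens + grams


forward = shingles_of(['flight', 'search'])
reverse = shingles_of(['search', 'flight'])

print(forward)  # ['flight', 'search', 'flight search']
print(reverse)  # ['search', 'flight', 'search flight']
```

Both inputs share the same single words, so a field analyzed only with `root` cannot tell them apart. The two-word shingles differ, so a match on the `shingles` subfield rewards documents that keep the query's word order.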
Neo4j
Neo4j - Graph statistics
from neomodel import db

db.cypher_query('''
    MATCH (doc:DocumentNeo) - [rel:CONSUMED_BY] - (user:UserNeo) // filtering nodes
    WITH doc, sum(rel.times_viewed) AS views // aggregating
    SET doc.views = views // updating
''')
PageRank
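For readers new to the algorithm, here is a generic power-iteration sketch of PageRank in plain Python. It is an illustration only, not how the scores are computed in Neo4j; the node names and the link structure are invented for the example.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Compute PageRank for a graph given as {node: [nodes it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share.
        new_ranks = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            # A node without outgoing links spreads its rank over all nodes.
            receivers = targets if targets else nodes
            share = damping * ranks[node] / len(receivers)
            for target in receivers:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical graph: doc_b is linked from two nodes, doc_c from none.
ranks = page_rank({'doc_a': ['doc_b'], 'doc_c': ['doc_b'], 'doc_b': ['doc_a']})
for node, rank in sorted(ranks.items()):
    print(node, round(rank, 3))
```

The ranks always sum to 1, and the node with the most incoming links ends up with the highest score.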
Neo4j - trick to project bipartite graph
Elasticsearch - Function score
Elasticsearch-dsl - Function score
import datetime

from elasticsearch_dsl.query import FunctionScore

# `query` (the base query) and `score_script` are defined elsewhere in the project.
query = FunctionScore(
    query=query,
    functions=[
        dict(  # Gauss multiplier
            gauss={
                'updated_at': {
                    'origin': datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S'),
                    'offset': '365d',
                    'scale': '700d'
                }
            }
        ),
        dict(  # Multipliers from graph features
            script_score=dict(script=dict(
                source=score_script,
                params=dict(
                    pg_offset=1,
                    pg_multiplier=1,
                    vw_offset=1,
                    vw_multiplier=0.2
                ),
            )))])
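To get a feel for the Gauss multiplier, the decay can be reproduced in a few lines. This sketch follows the formula from the Elasticsearch function score documentation and assumes the default `decay` of 0.5, since the query above does not set it; the function name `gauss_decay` is made up for the illustration.

```python
import math


def gauss_decay(distance_days, offset_days=365.0, scale_days=700.0, decay=0.5):
    """Multiplier for a document updated `distance_days` away from the origin."""
    # sigma^2 is chosen so the multiplier equals `decay` at offset + scale.
    sigma_sq = -scale_days ** 2 / (2.0 * math.log(decay))
    # Inside the offset there is no penalty at all.
    effective = max(0.0, abs(distance_days) - offset_days)
    return math.exp(-effective ** 2 / (2.0 * sigma_sq))


for days in (0, 365, 1065, 2000):
    print(days, round(gauss_decay(days), 3))
```

With the values used in the query, documents updated within the last 365 days keep their full score, and a document last updated 1065 days ago (offset plus scale) has its score halved.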
Are the results good?
Future plans
Thank you!
Questions?