Can AI Preserve Our Science Legacy?
Creating software to solve your problems
NASA INTERNATIONAL SPACE APPS CHALLENGE
2022
PROBLEM STATEMENT
Search Through Documents
Users may need to search for a specific word across the available documents.
Semantic Analysis
Users may need document summaries, keywords, etc.
Access Documents
Users may need to access documents from terabytes of data.
SOLUTIONS PROPOSED
Relevancy Based Search
The main functionality is to search for words across multiple documents and rank the documents by relevancy.
Summarizer and Keywords Extractor
The main functionality is to provide a summary of the provided documents and a list of keywords.
Relevancy Based Search
1. Build VSM Dictionary: clean the text extracted from the files, then iterate over each word and build the dictionary.
   i. Calculate tf-idf: term frequency and inverse document frequency.
   ii. Normalize Values: normalize to compensate for the effect of document length.
2. Calculate Similarity: calculate the similarity for the input word.
3. List Relevant Documents: sort the documents by calculated similarity.
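The relevancy based search steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's implementation: the function names (`tokenize`, `build_vsm`, `search`), the `log(N / df)` form of inverse document frequency, and the use of a plain dictionary as the VSM are all choices made here for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Clean raw text: lowercase it and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vsm(documents):
    """Return {document name: {term: length-normalized tf-idf weight}}."""
    term_counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(documents)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)

    vsm = {}
    for name, counts in term_counts.items():
        # tf-idf, assuming raw counts for tf and log(N / df) for idf.
        weights = {
            term: tf * math.log(n_docs / doc_freq[term])
            for term, tf in counts.items()
        }
        # Divide by the vector length so long documents are not favored.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vsm[name] = {t: (w / norm if norm else 0.0) for t, w in weights.items()}
    return vsm


def search(vsm, word):
    """Rank documents for a single query word, most relevant first.

    With unit-length document vectors and a one-term query, the cosine
    similarity reduces to the document's weight for that term.
    """
    term = word.lower()
    scores = {name: vector.get(term, 0.0) for name, vector in vsm.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0]


if __name__ == "__main__":
    # Hypothetical sample documents, used only to exercise the sketch.
    docs = {
        "a.txt": "apollo apollo moon",
        "b.txt": "apollo mars rover water",
        "c.txt": "hubble telescope galaxy",
    }
    for name, score in search(build_vsm(docs), "Apollo"):
        print(f"{name}: {score:.3f}")
```

Because the dictionary is built once and queried many times, the expensive pass over the documents happens up front and each search is a lookup plus a sort. Note that with this idf variant a word that appears in every document gets a weight of zero.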
Summarizer
1. Data Pre-processing
2. Extract Sentences
3. Score Sentences
4. Select Top Sentences
5. Generate Summary
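The summarizer steps can be illustrated with a small extractive sketch in Python. The slides do not say how sentences are scored, so this example assumes a simple word-frequency score; the stop-word list, the function names, and the `n_sentences` and `n_keywords` parameters are likewise hypothetical choices made for illustration.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; a real system would use a fuller one.
STOP_WORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "for",
    "it", "that", "this", "with", "as", "by", "was", "were",
}


def split_sentences(text):
    """Extract sentences by splitting after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]


def content_words(text):
    """Pre-process text: lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(text, n_sentences=2, n_keywords=5):
    """Return (summary, keywords) for the given text."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        # Average document-wide frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Select the top-scoring sentences, then restore their original order.
    by_score = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    chosen = sorted(by_score[:n_sentences])
    summary = " ".join(sentences[i] for i in chosen)

    # Keywords are the most frequent content words.
    keywords = [word for word, _ in freq.most_common(n_keywords)]
    return summary, keywords


if __name__ == "__main__":
    # Hypothetical sample text, used only to exercise the sketch.
    sample = (
        "Mars rovers study Mars rocks. The weather was nice. "
        "Rovers send Mars data home."
    )
    summary, keywords = summarize(sample, n_sentences=2, n_keywords=2)
    print(summary)
    print(keywords)
```

Averaging the word scores, rather than summing them, keeps long sentences from winning on length alone. Keeping the selected sentences in their original order makes the summary read more naturally.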
FUTURE EXTENSIONS
Automated VSM Creation
Introduce Databases
Enable Multi-phrase Search
Document Section Based Summaries