Can AI Preserve Our Science Legacy?
Rajiv Iyer
Poornima Iyer
Abstract
In this project we propose using Amazon Comprehend (AC), a natural language processing (NLP) service, to extract insights from the content of public NTRS documents. AC develops these insights by recognizing the entities, key phrases, language, and other common elements in a document. Using Amazon Comprehend, we can create a web and mobile app that understands the structure of documents and returns better search results, improving the accessibility and discoverability of public NTRS records. AC supports asynchronous analysis jobs for large document sets. With AC topic modeling we can find the documents about a particular subject, and we can specify the number of topics that AC returns for the document set.
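As a rough illustration of the Comprehend calls described above, the sketch below collects the dominant language, entities, and key phrases for one block of text and builds the request for a topic modeling job with a chosen number of topics. It assumes a client created with `boto3.client("comprehend")` is passed in. The helper names `analyze_text` and `topics_job_request`, the `min_score` threshold, and the default of 10 topics are illustrative choices, not part of the proposal.

```python
def analyze_text(comprehend, text, min_score=0.8):
    """Collect language, entities, and key phrases for one block of text.

    `comprehend` is assumed to be a boto3 Comprehend client. Synchronous
    calls limit the size of each request, so a long report would need to
    be split into chunks or sent through an asynchronous job instead.
    """
    languages = comprehend.detect_dominant_language(Text=text)["Languages"]
    # Use the highest-scoring language; fall back to English if none is returned.
    # Entity and key phrase detection support only some languages, so a real
    # pipeline would need to handle an unsupported language code here.
    if languages:
        language = max(languages, key=lambda item: item["Score"])["LanguageCode"]
    else:
        language = "en"

    entities = comprehend.detect_entities(Text=text, LanguageCode=language)["Entities"]
    phrases = comprehend.detect_key_phrases(Text=text, LanguageCode=language)["KeyPhrases"]

    # Keep only confident results, de-duplicated and sorted for stable output.
    return {
        "language": language,
        "entities": sorted(
            {(e["Type"], e["Text"]) for e in entities if e["Score"] >= min_score}
        ),
        "key_phrases": sorted(
            {p["Text"] for p in phrases if p["Score"] >= min_score}
        ),
    }


def topics_job_request(input_s3_uri, output_s3_uri, role_arn, number_of_topics=10):
    """Build keyword arguments for comprehend.start_topics_detection_job.

    The S3 locations and the IAM role are placeholders supplied by the caller.
    """
    return {
        "InputDataConfig": {"S3Uri": input_s3_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        "OutputDataConfig": {"S3Uri": output_s3_uri},
        "DataAccessRoleArn": role_arn,
        "NumberOfTopics": number_of_topics,
    }
```

A topic job would then be started with `comprehend.start_topics_detection_job(**topics_job_request(...))`. Passing the client in as a parameter keeps the helpers easy to exercise with a stub before any AWS resources exist.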
Problem Definition
The NASA Technical Reports Server (NTRS) includes hundreds of thousands of items containing scientific and technical information (STI) created or funded by NASA, a large number of which are PDF documents. It can be difficult to extract text from a scanned document when it contains formats such as tables, forms, paragraphs, and check boxes. Organizations have been addressing these problems with Optical Character Recognition (OCR) technology, but it requires templates for form extraction and custom workflows.
Extracting and analyzing text from images or PDFs is a classic machine learning (ML) and natural language processing (NLP) problem. When extracting the content from a document, you want to maintain the overall context and store the information in a readable and searchable format. Creating a sophisticated algorithm requires a large amount of training data and compute resources. Building and training a perfect machine learning model could be expensive and time-consuming.
Proposed Solution
We propose creating a web and mobile app backed by an NLP-powered search index, with Amazon Textract and Amazon Comprehend forming an automated content-processing pipeline for storing and analyzing scanned image documents. This solution uses serverless technologies and managed services to be scalable and cost-effective.
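One way to picture the text-extraction and indexing steps of such a pipeline is the minimal sketch below. It assumes the input is a response dictionary of the shape returned by Textract text detection (a list of `Blocks`, some with `BlockType` equal to `LINE`), and it uses a plain in-memory inverted index as a stand-in for whatever search service the app would actually use. The function names and the word-splitting rule are illustrative assumptions.

```python
import re
from collections import defaultdict


def lines_from_textract(response):
    """Join the LINE blocks of a Textract response into plain text, in response order."""
    return "\n".join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    )


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it.

    `documents` is a dict of document id -> extracted text.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # Intersect the posting sets; an unknown word yields an empty result.
    return set.intersection(*(index.get(word, set()) for word in words))
```

In a fuller version, the entities and key phrases found by Comprehend could be stored next to each document id so that the app can filter or rank results by them.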
Proposed solution model
Proposed Solution