AR CMD+F
For this design challenge I decided to tackle a problem I've run into a lot: there is no "command+F" for physical documents. Many times when I'm reading a physical book or document, I want to be able to "command+F" through it to find a certain word. My design solves exactly this! It uses a camera to photograph a handwritten word, and the Google Cloud Vision API to recognize the letters. Next, a speaker reads back the word it is going to search for. Then the camera takes another picture of a printed document you show it, and Google's "Detect text in image" feature pulls all of the words off the page. Finally, the program searches the document's text for the handwritten word.
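Here's a minimal sketch of the Vision call at the heart of both steps, assuming the pre-2.0 google-cloud-vision Python client (which still exposes vision.types, matching the full listing further down) and a photo already saved locally; read_text is just an illustrative helper name:

import io
from google.cloud import vision

# assumes GOOGLE_APPLICATION_CREDENTIALS already points at your key file
client = vision.ImageAnnotatorClient()

def read_text(path):
    # load the photo and hand it to Cloud Vision's text detection
    with io.open(path, 'rb') as f:
        image = vision.types.Image(content=f.read())
    response = client.text_detection(image=image)
    # the first annotation holds all of the text found in the image
    return response.text_annotations[0].description

print(read_text('image.jpg'))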
A servo motor then waves a green flag if the word was found, or a red flag if it was not. The device also says out loud "I found it" or "Sorry, I didn't find it."
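In code, that last step boils down to something like this (a condensed sketch assuming the Adafruit Crickit library and the espeak command-line tool used in the full listing; signal is just an illustrative name, and the angles depend on how the flags are mounted):

import os
from adafruit_crickit import crickit

def signal(found):
    # 180 degrees points the servo at the green flag, 0 at the red one
    crickit.servo_1.angle = 180 if found else 0
    phrase = "I found it" if found else "Sorry, I didn't find it"
    # espeak speaks the phrase out loud through the speaker
    os.system('espeak -ven+f4 "{0}" 2>/dev/null'.format(phrase))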
Watch the video of it in action here: https://youtu.be/Qhu8wC34G-g.
[Photos: laser cutting the paper stand; the paper stand put together; the top of the stand showing the slit the paper goes through; the back of the paper stand; a close-up of the camera contraption; the entire gadget in view; the full setup; a close-up of the flags.]
import picamera                  # camera library
import pygame as pg              # audio library
import os                        # communicate with os/command line
import io
from google.cloud import vision  # GCP Vision library
from time import sleep
from adafruit_crickit import crickit

# set up your GCP credentials - replace the string in the following line
# with the path to your own .json key file
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "DET-2019-aad44b497877.json"

# this line connects to Google Cloud Vision!
client = vision.ImageAnnotatorClient()

# global variable for our image file - to be captured soon!
image = 'image.jpg'


def takephoto(camera):
    # this triggers an on-screen preview, so you know what you're photographing!
    camera.start_preview()
    # sleep(5)  # give it a pause so you can adjust if needed
    sayStuff("5")
    sayStuff("4")
    sayStuff("3")
    sayStuff("2")
    sayStuff("1")
    sayStuff("SNAP")
    camera.capture('image.jpg')  # save the image
    camera.stop_preview()        # stop the preview


def ocr_handwriting(path):
    """Recognizes a handwritten word in the file."""
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    response = client.text_detection(image=image)
    text = response.full_text_annotation
    word_text = ""
    for page in text.pages:
        for block in page.blocks:
            for paragraph in block.paragraphs:
                for word in paragraph.words:
                    word_text += " "
                    word_text += ''.join([symbol.text for symbol in word.symbols])
    return word_text


def detect_text(path):
    """Detects text in the file."""
    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.types.Image(content=content)
    response = client.text_detection(image=image)
    texts = response.text_annotations
    return texts[0].description


def turnOnCamera(camera):
    camera.start_preview()


def imageRec():
    camera = picamera.PiCamera()
    pathToImg = 'image.jpg'
    pg.init()
    pg.mixer.init()
    # turnOnCamera(camera)
    # while True:

    # first recognize the written word,
    # then use it to search through the document
    takephoto(camera)
    writtenText = ocr_handwriting(pathToImg).strip()
    print(writtenText)
    sayStuff('Searching for the word: "{0}"'.format(writtenText))

    # recognize the words in the document
    takephoto(camera)
    docText = detect_text(pathToImg)
    # print(docText)

    # search through the document's words for the written word
    if writtenText.lower() in docText.lower():
        crickit.servo_1.angle = 180  # wave the green flag
        sayStuff("Found it")
    else:
        crickit.servo_1.angle = 0    # wave the red flag
        sayStuff("Sorry, I didn't find it.")


def sayStuff(stuff):
    cmd_string = 'espeak -ven+f4 "{0}" 2>/dev/null'.format(stuff)
    os.system(cmd_string)


def resetServoPos():
    crickit.servo_1.angle = 90


def main():
    pg.init()
    pg.mixer.init()
    resetServoPos()
    imageRec()


if __name__ == '__main__':
    main()
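If you want to run this yourself, you'll need the dependencies on the Pi; the exact package names here are my assumptions about a typical setup rather than part of the original build notes. Install espeak with apt (sudo apt-get install espeak), then pip install picamera, pygame, adafruit-circuitpython-crickit, and google-cloud-vision (pinned below version 2.0, since the code uses the older vision.types interface), and swap in the path to your own Google Cloud service-account .json key.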