AudioVision: Object Detection for the Visually Impaired

About the Project

This project is a program designed to provide a seamless and inclusive experience for individuals with visual impairments. Using a laptop's webcam, the program captures and interprets real-time visual information with the YOLOv8 model, then uses the pyttsx3 text-to-speech library to audibly describe the objects detected within the bounding boxes. Users receive comprehensive, context-aware information about their environment, helping them navigate the world independently.

What is YOLOv8?

The "You Only Look Once" (or YOLO) machine learning model is a state-of-the-art deep learning architecture for object detection. Developed by Computer Scientist Joseph Redmon and his team, YOLO revolutionized object detection by introducing a single-shot detection approach, allowing it to process images in real-time and with high accuracy. The latest version of YOLO is currently YOLO version 8 or YOLOv8.


Unlike more traditional object detection methods that require multiple passes through the network for each image, YOLO processes the entire image in a single pass. This makes it extremely fast, since it only needs to make predictions once per image, and well suited to applications where real-time processing is crucial, such as autonomous driving, video surveillance, and augmented reality.


The YOLOv8 architecture consists of a deep convolutional neural network (CNN) backbone that extracts features from the input image, followed by a detection head that predicts bounding box coordinates, confidence scores, and class probabilities for the objects in the image. The model is trained on large annotated datasets in which each image is labeled with bounding boxes and class names for the objects present.
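
To make this concrete, here is a minimal sketch of a single detection pass, assuming the Ultralytics Python package ('pip install ultralytics'); the image path is just a placeholder:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # load the pretrained nano model
results = model("classroom.jpg")    # one forward pass over the whole image

# Each result carries bounding boxes, confidence scores, and class IDs.
for box in results[0].boxes:
    class_name = model.names[int(box.cls)]
    confidence = float(box.conf)
    print(f"{class_name}: {confidence:.2f} at {box.xyxy.tolist()}")
```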

Realtime Demo


Picture Upload Demo

Project Documentation

Files

This project uses several files working in tandem. The main files are 'Yolov8Images.py', 'Yolov8ModelIT.py', 'outputFile.txt', and 'yolov8n.pt'.

ImagesScript
The 'Yolov8Images.py' script first imports several libraries for image processing, object detection, speech synthesis, and data manipulation. It then initializes the YOLOv8 model and performs object detection on a loaded image file named 'classroom.jpg'. Next, it uses the 'results.show()' method to display bounding boxes around the detected objects, and finally it saves the annotated image as 'results2.jpg' using the 'results.save()' method.
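
A condensed sketch of what this script does, assuming a recent Ultralytics API (the actual script may differ in detail):

```python
from ultralytics import YOLO

# Load the pretrained YOLOv8 nano weights shipped with the project.
model = YOLO("yolov8n.pt")

# Run detection on the sample image in a single pass.
results = model("classroom.jpg")

# Display the image with bounding boxes drawn around the detections,
# then save the annotated copy.
results[0].show()
results[0].save(filename="results2.jpg")
```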

Realtime Script
The 'Yolov8ModelIT.py' script also imports several libraries; however, instead of having the user load a photo, it uses the laptop's built-in webcam. Within a continuous loop (while cap.isOpened()), it reads frames from the video capture object using cap.read(). Each frame is passed through the YOLOv8 model for object detection, producing a list of detection results. The loop continues until the user presses the 'q' key, at which point the script breaks out of the loop, exits, and reads the detected objects out loud.
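
The core of that loop might look like the following sketch, assuming OpenCV ('cv2') and the Ultralytics API; the window name and exact structure are illustrative:

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture(0)  # default laptop webcam

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # Detect objects in the current frame and draw the results.
    results = model(frame)
    annotated = results[0].plot()
    cv2.imshow("AudioVision", annotated)

    # Quit when the user presses 'q'.
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```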

Send to Output File
For both programs, any objects that are detected are written to a separate text file called 'outputFile.txt'. The script records the count of each detected object and checks whether the count is greater than 1 to determine whether to add a plural suffix ('s') to the class name in the output.
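
A small illustration of this output step, using a hypothetical helper name ('write_detections') and Python's collections.Counter; the actual scripts may structure the counting differently:

```python
from collections import Counter

# Hypothetical helper: count each detected class and write lines like
# "3 persons", adding an 's' only when the count is greater than 1.
def write_detections(results, model, path="outputFile.txt"):
    counts = Counter(model.names[int(box.cls)] for box in results[0].boxes)
    with open(path, "w") as f:
        for name, count in counts.items():
            suffix = "s" if count > 1 else ""
            f.write(f"{count} {name}{suffix}\n")
```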

textToSpeech
Finally, the Python scripts use the pyttsx3 library to convert the text in 'outputFile.txt' to speech and read it aloud.
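
A minimal sketch of that text-to-speech step with pyttsx3:

```python
import pyttsx3

# Read the detection summary aloud with the offline pyttsx3 engine.
with open("outputFile.txt") as f:
    text = f.read()

engine = pyttsx3.init()
engine.say(text)
engine.runAndWait()  # blocks until the speech finishes
```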

PowerPoint Presentation

You can find the PowerPoint I created for UCD here.

Contact Information

For inquiries or feedback, please reach out via email: anaguilera@txwes.edu