About the Project
This project is a program designed to provide a seamless and inclusive experience for individuals with visual impairments. Using a laptop's webcam, it captures and interprets real-time visual information with the YOLOv8 model, then uses the pyttsx3 text-to-speech library to audibly describe the objects detected within the bounding boxes. Users receive comprehensive, context-aware information about their environment, helping them navigate the world independently.
What is YOLOv8?
The "You Only Look Once" (or YOLO) machine learning model is a state-of-the-art deep learning architecture for object detection. Developed by Computer Scientist Joseph Redmon and his team, YOLO revolutionized object detection by introducing a single-shot detection approach, allowing it to process images in real-time and with high accuracy. The latest version of YOLO is currently YOLO version 8 or YOLOv8.
Unlike more traditional object detection methods that require multiple passes through the network for each image, YOLO processes the entire image in a single pass, making predictions only once per image. This efficiency makes YOLO suitable for applications where real-time processing is crucial, such as autonomous driving, video surveillance, and augmented reality.
The YOLOv8 architecture consists of a deep convolutional neural network (CNN) backbone, which extracts features from the input image, followed by a detection head that predicts bounding-box locations, confidence scores, and class probabilities for the objects within the image. The model is trained on large annotated datasets, where each image is labeled with bounding boxes and class labels for the objects present.
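As a quick illustration of those three outputs, the snippet below runs a pretrained YOLOv8 model through the Ultralytics Python API and prints the predicted boxes, confidence scores, and class indices. The image name 'example.jpg' is a placeholder, not a file from this project.

```python
from ultralytics import YOLO

# Load the small pretrained YOLOv8 checkpoint (downloaded automatically if absent)
model = YOLO("yolov8n.pt")

# Run inference on any image; 'example.jpg' is a hypothetical placeholder
results = model("example.jpg")

boxes = results[0].boxes
print(boxes.xyxy)  # bounding-box corners (x1, y1, x2, y2) per detection
print(boxes.conf)  # confidence score per detection
print(boxes.cls)   # class index per detection (map to names via model.names)
```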
Realtime Demo
Picture Upload Demo
Project Documentation
This project uses several files working in tandem. The main files are 'Yolov8Images.py', 'Yolov8ModelIT.py', 'outputFile.txt', and 'yolov8n.pt'.
The 'Yolov8Images.py' file first imports several libraries for image processing, object detection, speech synthesis, and data manipulation. It then initializes the YOLOv8 model and performs object detection on a loaded image file named 'classroom.jpg'. Next, it uses the 'results.show()' method to display bounding boxes around the detected objects, and finally it saves the image with the detected objects marked as 'results2.jpg' using the 'results.save()' method.
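A minimal sketch of that flow, assuming the current Ultralytics API (the original script may differ in details such as the exact 'show'/'save' calls):

```python
from ultralytics import YOLO

# Initialize the model from the bundled checkpoint
model = YOLO("yolov8n.pt")

# Detect objects in the loaded image
results = model("classroom.jpg")

# Display the image with bounding boxes drawn around the detected objects
results[0].show()

# Save the annotated image under a new name
results[0].save(filename="results2.jpg")
```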
The 'Yolov8ModelIT.py' file also imports several libraries; however, instead of having the user load a photo, it uses the laptop's built-in webcam. Within a continuous loop (while cap.isOpened()), it reads frames from the video capture object using cap.read(). Each frame is passed through the YOLOv8 model for object detection, producing a list of detection results. The loop continues until the user presses the 'q' key, at which point the script breaks out of the loop, exits the program, and reads the detected objects aloud.
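A sketch of that capture loop, assuming OpenCV for the webcam and the Ultralytics API for inference (variable names here are illustrative, not taken from the original script):

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture(0)  # index 0 = the default built-in webcam

detected = []  # class names seen across frames
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    # Run YOLOv8 on the current frame; returns a list of Results objects
    results = model(frame)
    for box in results[0].boxes:
        detected.append(model.names[int(box.cls)])

    # Show the annotated frame; quit when the user presses 'q'
    cv2.imshow("YOLOv8 detection", results[0].plot())
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```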
For both programs, the detected objects are written to a separate text file called 'outputFile.txt'. The script writes the count of each detected object and checks whether the count is greater than 1 to decide whether to add a plural suffix ('s') to the class name in the output.
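The counting and pluralization step could look like the following sketch, which assumes the detected class names were collected into a list (as in the loop above); 'Counter' is a standard-library convenience, not necessarily what the original scripts use:

```python
from collections import Counter

counts = Counter(detected)  # e.g. Counter({'person': 3, 'chair': 1})
with open("outputFile.txt", "w") as f:
    for name, count in counts.items():
        suffix = "s" if count > 1 else ""  # pluralize when more than one was seen
        f.write(f"{count} {name}{suffix}\n")
```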
Finally, the Python scripts use the pyttsx3 library to convert the text from 'outputFile.txt' into speech and read it aloud.
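The text-to-speech step maps directly onto pyttsx3's standard calls; a minimal sketch:

```python
import pyttsx3

engine = pyttsx3.init()  # start a local text-to-speech engine
with open("outputFile.txt") as f:
    text = f.read()
engine.say(text)     # queue the detected-object summary
engine.runAndWait()  # speak it and block until finished
```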
PowerPoint Presentation
You can find the PowerPoint I created for UCD here.
Contact Information
For inquiries or feedback, please reach out via email: anaguilera@txwes.edu