Hi There !
I'M VENKAT SAI
Machine Learning Engineer
In the world of computer vision, hand gesture recognition has gained significant attention due to its potential applications in human-computer interaction, virtual reality, gaming, and more. In this blog post, I will explain one of my Python projects, which uses the MediaPipe library along with OpenCV to perform real-time hand gesture recognition. By analyzing hand landmarks (finger locations), we'll control keyboard inputs to navigate through an application using hand gestures.
Hand gesture recognition involves interpreting and analyzing the position of a person's hand to infer different gestures. This project uses MediaPipe, a popular open-source library developed by Google, and OpenCV, a well-known computer vision library, to achieve this task. By capturing the webcam feed, the code identifies hand landmarks and calculates the distances between the wrist and each fingertip to recognize gestures like a closed fist or an open palm.
MediaPipe : This library provides pre-trained machine learning models to detect and track landmarks on hands and other body parts. See the MediaPipe documentation for details.
pip install mediapipe
OpenCV : Open Source Computer Vision Library, which is widely used for computer vision tasks such as capturing video streams, image processing, and graphical display. See the OpenCV documentation for details.
pip install opencv-python
Keyboard : This library provides functionality for controlling and simulating keyboard input. It allows you to automate keyboard actions, such as typing characters and pressing keys. See the keyboard library documentation for details.
pip install keyboard
import mediapipe as mp
import cv2
import keyboard

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)
hands = mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5)


def distance(point1, point2):
    return ((point1.x - point2.x) ** 2 + (point1.y - point2.y) ** 2) ** 0.5


def fingers_landmarks(hand):
    thumbpoint1 = hand.landmark[mp_hands.HandLandmark.THUMB_TIP]
    thumbpoint2 = hand.landmark[mp_hands.HandLandmark.THUMB_IP]
    thumbpoint3 = hand.landmark[mp_hands.HandLandmark.THUMB_MCP]
    thumbpoint4 = hand.landmark[mp_hands.HandLandmark.THUMB_CMC]
    indexpoint1 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    indexpoint2 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_DIP]
    indexpoint3 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP]
    indexpoint4 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_MCP]
    middlepoint1 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP]
    middlepoint2 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_DIP]
    middlepoint3 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_PIP]
    middlepoint4 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_MCP]
    ringpoint1 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_TIP]
    ringpoint2 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_DIP]
    ringpoint3 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_PIP]
    ringpoint4 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_MCP]
    littlepoint1 = hand.landmark[mp_hands.HandLandmark.PINKY_TIP]
    littlepoint2 = hand.landmark[mp_hands.HandLandmark.PINKY_DIP]
    littlepoint3 = hand.landmark[mp_hands.HandLandmark.PINKY_PIP]
    littlepoint4 = hand.landmark[mp_hands.HandLandmark.PINKY_MCP]
    wrist = hand.landmark[mp_hands.HandLandmark.WRIST]
    return [
        thumbpoint1, thumbpoint2, thumbpoint3, thumbpoint4,
        indexpoint1, indexpoint2, indexpoint3, indexpoint4,
        middlepoint1, middlepoint2, middlepoint3, middlepoint4,
        ringpoint1, ringpoint2, ringpoint3, ringpoint4,
        littlepoint1, littlepoint2, littlepoint3, littlepoint4,
        wrist,
    ]


def check(list_name, symbol):
    return (
        all([True if dist <= list_name[i] else False for i, dist in enumerate(distances)])
        if symbol == "<="
        else all([True if dist >= list_name[i] else False for i, dist in enumerate(distances)])
    )


fist_close = [0.38, 0.25, 0.2, 0.19, 0.2]
fist_open = [0.24, 0.39, 0.43, 0.41, 0.35]

while True:
    ret, frame = cap.read()
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    image = cv2.flip(image, 1)
    image.flags.writeable = False
    results = hands.process(image)
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
        for num, hand in enumerate(results.multi_hand_landmarks):
            mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS)
            fingers = fingers_landmarks(hand)
            distances = [distance(fingers[20], fingers[i]) for i in range(0, 20, 4)]
            if check(fist_close, "<="):
                keyboard.press("left")
            elif check(fist_open, ">="):
                keyboard.press("right")
            else:
                keyboard.release("right")
                keyboard.release("left")
    cv2.imshow("Hand Tracking", image)
    if cv2.waitKey(1) == ord("q"):
        print(image.shape)
        break
cap.release()
cv2.destroyAllWindows()
The project consists of the following steps:
Import Libraries : Import the required libraries, including MediaPipe, OpenCV, and the keyboard library for simulating keypresses.
# This contains the model that can recognize hand landmarks
import mediapipe as mp
# This is used for video capture and processing
import cv2
# This is used for the keyboard integration
import keyboard
Initialize Components : Initialize the necessary components like the webcam feed, the MediaPipe hands detection model, and set the detection and tracking confidence levels.
# Used to draw connecting lines between the hand landmarks (finger locations)
mp_drawing = mp.solutions.drawing_utils
# Used to identify the hand landmarks
mp_hands = mp.solutions.hands
# The object required to start video capture from the camera
cap = cv2.VideoCapture(0)
# Initializes an instance of the Hands class from the MediaPipe library
hands = mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5)
Define Functions : Define functions to calculate distances between hand landmarks and to check for specific hand gestures based on calculated distances.
# Calculates the distance between two hand landmarks
def distance(point1, point2):
    return ((point1.x - point2.x) ** 2 + (point1.y - point2.y) ** 2) ** 0.5

# These are the hand landmark names; they are listed in the MediaPipe documentation
def fingers_landmarks(hand):
    thumbpoint1 = hand.landmark[mp_hands.HandLandmark.THUMB_TIP]
    thumbpoint2 = hand.landmark[mp_hands.HandLandmark.THUMB_IP]
    thumbpoint3 = hand.landmark[mp_hands.HandLandmark.THUMB_MCP]
    thumbpoint4 = hand.landmark[mp_hands.HandLandmark.THUMB_CMC]
    indexpoint1 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    indexpoint2 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_DIP]
    indexpoint3 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP]
    indexpoint4 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_MCP]
    middlepoint1 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP]
    middlepoint2 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_DIP]
    middlepoint3 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_PIP]
    middlepoint4 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_MCP]
    ringpoint1 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_TIP]
    ringpoint2 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_DIP]
    ringpoint3 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_PIP]
    ringpoint4 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_MCP]
    littlepoint1 = hand.landmark[mp_hands.HandLandmark.PINKY_TIP]
    littlepoint2 = hand.landmark[mp_hands.HandLandmark.PINKY_DIP]
    littlepoint3 = hand.landmark[mp_hands.HandLandmark.PINKY_PIP]
    littlepoint4 = hand.landmark[mp_hands.HandLandmark.PINKY_MCP]
    wrist = hand.landmark[mp_hands.HandLandmark.WRIST]
    return [
        thumbpoint1, thumbpoint2, thumbpoint3, thumbpoint4,
        indexpoint1, indexpoint2, indexpoint3, indexpoint4,
        middlepoint1, middlepoint2, middlepoint3, middlepoint4,
        ringpoint1, ringpoint2, ringpoint3, ringpoint4,
        littlepoint1, littlepoint2, littlepoint3, littlepoint4,
        wrist,
    ]
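To make the distance calculation concrete, here is a small worked example. It is only a sketch: `Point` is a hypothetical stand-in for a MediaPipe landmark (only `x` and `y` are used), and the coordinates are made-up values in the normalized 0 to 1 range that MediaPipe reports.

```python
from collections import namedtuple

# Hypothetical stand-in for a MediaPipe landmark; only x and y are used here.
Point = namedtuple("Point", ["x", "y"])


def distance(point1, point2):
    """Euclidean distance between two landmarks in normalized image coordinates."""
    return ((point1.x - point2.x) ** 2 + (point1.y - point2.y) ** 2) ** 0.5


# Made-up positions: wrist near the bottom of the frame, fingertips higher up.
wrist = Point(0.5, 0.9)
index_tip = Point(0.5, 0.5)
thumb_tip = Point(0.2, 0.5)

wrist_to_index = distance(wrist, index_tip)  # straight up: 0.4
wrist_to_thumb = distance(wrist, thumb_tip)  # 3-4-5 triangle scaled by 0.1: 0.5
print(round(wrist_to_index, 3), round(wrist_to_thumb, 3))
```

Because the coordinates are normalized by image width and height, these distances change when the hand moves closer to or farther from the camera, which is worth keeping in mind when choosing thresholds.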
Gesture Thresholds : Define threshold values for distances between finger landmarks that correspond to certain gestures. These threshold values will be used to determine whether a gesture is being performed.
# We can customize our own gesture values.
# I found the values of fist_open and fist_close by exploring various values.
# Closed fist
fist_close = [0.38, 0.25, 0.2, 0.19, 0.2]
# Opened fist
fist_open = [0.24, 0.39, 0.43, 0.41, 0.35]
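The way these thresholds are applied can be sketched as a small pure function. This is an illustrative variant, not the project's code: `classify_gesture` is a hypothetical helper that takes the five wrist-to-fingertip distances (thumb, index, middle, ring, little finger, in that order) as an argument instead of reading a global variable.

```python
# Threshold values copied from the post; one entry per fingertip.
FIST_CLOSE = [0.38, 0.25, 0.2, 0.19, 0.2]
FIST_OPEN = [0.24, 0.39, 0.43, 0.41, 0.35]


def classify_gesture(distances, fist_close=FIST_CLOSE, fist_open=FIST_OPEN):
    """Return "closed", "open", or None for five wrist-to-fingertip distances."""
    # Closed fist: every fingertip is at most its threshold away from the wrist.
    if all(d <= limit for d, limit in zip(distances, fist_close)):
        return "closed"
    # Open palm: every fingertip is at least its threshold away from the wrist.
    if all(d >= limit for d, limit in zip(distances, fist_open)):
        return "open"
    # Anything in between is treated as no gesture.
    return None


print(classify_gesture([0.30, 0.20, 0.15, 0.15, 0.15]))
print(classify_gesture([0.30, 0.45, 0.50, 0.45, 0.40]))
print(classify_gesture([0.30, 0.30, 0.30, 0.30, 0.30]))
```

Passing the distances in explicitly makes the rule easy to try out with recorded values when tuning the thresholds.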
Main Loop : Enter the main loop that captures video frames from the webcam feed, processes them using MediaPipe's hand detection model, and calculates distances between finger landmarks.
while True:
    # Read one frame from the webcam
    ret, frame = cap.read()
    # OpenCV delivers frames in BGR (Blue, Green, Red) order; MediaPipe expects RGB (Red, Green, Blue), so we convert
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Flip the frame horizontally so the preview behaves like a mirror
    image = cv2.flip(image, 1)
    # Mark the image as read-only while MediaPipe processes it
    image.flags.writeable = False
    # Process the captured frame
    results = hands.process(image)
    # Make the image writeable again so the landmarks can be drawn on it
    image.flags.writeable = True
    # Convert back to BGR for display with OpenCV
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    # Obtain the hand landmarks and draw the connections between them
    if results.multi_hand_landmarks:
        for num, hand in enumerate(results.multi_hand_landmarks):
            mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS)
            fingers = fingers_landmarks(hand)
            # Distances from the wrist (index 20) to each fingertip (indices 0, 4, 8, 12, 16)
            distances = [distance(fingers[20], fingers[i]) for i in range(0, 20, 4)]
Recognize Gestures : Based on the calculated distances and defined thresholds, recognize gestures such as a closed fist or an open palm. Simulate keyboard inputs (left and right arrow keys) using the keyboard library based on recognized gestures.
# Defined before the main loop; reads the global `distances` list computed for the current hand
def check(list_name, symbol):
    return (
        all([True if dist <= list_name[i] else False for i, dist in enumerate(distances)])
        if symbol == "<="
        else all([True if dist >= list_name[i] else False for i, dist in enumerate(distances)])
    )

            # Inside the loop over detected hands: check whether the fist is
            # closed or open and simulate the matching keyboard input
            if check(fist_close, "<="):
                keyboard.press("left")
            elif check(fist_open, ">="):
                keyboard.press("right")
            else:
                keyboard.release("right")
                keyboard.release("left")
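One detail of this step is that the press call runs on every frame while a gesture is held, and a key is only released when a hand is visible and matches neither gesture. A possible refinement, sketched below under my own assumptions, is to remember which key is currently held and only send events when that changes. `KeyState` is a hypothetical helper, and the lambdas stand in for `keyboard.press` and `keyboard.release` so the example runs without the keyboard library.

```python
class KeyState:
    """Tracks which key is held and calls press/release only when it changes."""

    def __init__(self, press, release):
        self._press = press
        self._release = release
        self.held = None  # name of the key currently held, or None

    def update(self, key):
        """Switch to `key` (or None for no gesture), releasing the old key first."""
        if key == self.held:
            return
        if self.held is not None:
            self._release(self.held)
        if key is not None:
            self._press(key)
        self.held = key


# Record events in a list instead of sending real key presses.
events = []
state = KeyState(
    press=lambda k: events.append(("press", k)),
    release=lambda k: events.append(("release", k)),
)

# Simulated per-frame results: fist, fist, nothing, open palm, fist, nothing.
for key in ["left", "left", None, "right", "left", None]:
    state.update(key)

for event in events:
    print(event)
```

In the real loop one could construct it as `KeyState(keyboard.press, keyboard.release)` and call `update("left")`, `update("right")`, or `update(None)` once per frame, including frames where no hand is detected.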
Display Output : Display the processed video frame with drawn hand landmarks and connections.
Exit the Application : Allow the user to exit the application by pressing the 'q' key.
    # Shows the camera capture
    cv2.imshow('Hand Tracking', image)
    # If Q is pressed, the video capture stops
    if cv2.waitKey(1) == ord('q'):
        break

# Releases the camera capture
cap.release()
# Destroys all windows opened by OpenCV
cv2.destroyAllWindows()
Face recognition technology has become increasingly relevant across various domains, from security systems to personal identification. In this blog post, we'll explore a Python project that leverages the power of OpenCV and the Face Recognition library to build a real-time face recognition system. By analyzing webcam feed data, the system can detect and label known individuals while identifying unknown faces.
The goal of this project is to create a real-time face recognition system using the OpenCV library and the Face Recognition library. This system captures video from a webcam, processes the frames to detect faces, and then compares the detected face encodings with known face encodings to determine if the face belongs to a known person. The system annotates the frames with rectangles and labels, indicating the identity of known individuals or marking faces as "Unknown" if no match is found.
OpenCV : Open Source Computer Vision Library, a powerful tool for various computer vision tasks including image and video processing, object detection, and more. See the OpenCV documentation for details.
pip install opencv-python
Face Recognition : A Python library specifically designed for face recognition tasks. It provides tools for face detection, face encoding, and face comparison. See the Face Recognition library documentation for details.
pip install face_recognition
import cv2
import face_recognition
import numpy as np

# List of image filenames containing known faces
static = ['name.png']

# Encode known faces using face_recognition library
known_face_encodings = [face_recognition.face_encodings(cv2.imread(image))[0] for image in static]
known_faces = [name.split('.')[0] for name in static]

# Initialize video capture from webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    resize_frame = cv2.resize(frame, (0, 0), fx=1, fy=1)
    resize_frame = cv2.cvtColor(resize_frame, cv2.COLOR_BGR2RGB)

    # Detect new face locations and encodings
    new_face_locations = face_recognition.face_locations(resize_frame)
    new_face_encodings = face_recognition.face_encodings(resize_frame, new_face_locations)

    for face_location, face_encoding in zip(new_face_locations, new_face_encodings):
        top, right, bottom, left = face_location

        # Compare face encodings with known face encodings
        matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
        face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
        least_distance_index = np.argmin(face_distances)

        # Annotate the frame based on matching results
        if matches[least_distance_index]:
            cv2.rectangle(resize_frame, (left, top), (right, bottom), (0, 255, 0), 2)
            cv2.putText(resize_frame, known_faces[least_distance_index], (left, top - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 1, cv2.LINE_AA)
        else:
            cv2.rectangle(resize_frame, (left, top), (right, bottom), (0, 0, 255), 2)
            cv2.putText(resize_frame, 'Unknown', (left, top - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 1, cv2.LINE_AA)

    # Display annotated frame
    cv2.imshow('Face Detection', resize_frame)

    # Exit loop if 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release video capture and close windows
cap.release()
cv2.destroyAllWindows()
Import Libraries : Import the necessary libraries, including OpenCV and the Face Recognition library, to prepare for the face recognition project.
import cv2
import face_recognition
import numpy as np
Prepare Known Faces : Load static images of known individuals, encode their faces using the Face Recognition library, and store the encodings along with the corresponding names.
# List of image filenames containing known faces
static = ['name.png']

# Encode known faces using face_recognition library
known_face_encodings = [face_recognition.face_encodings(cv2.imread(image))[0] for image in static]
known_faces = [name.split('.')[0] for name in static]
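The label for each known face is taken from the filename by splitting on the first dot, which works for `name.png` but gives surprising results once a path contains a folder or an extra dot. A small sketch of a more forgiving alternative, using the standard library's `pathlib`, is shown below. `label_from_path` and the second example path are hypothetical.

```python
from pathlib import Path


def label_from_path(image_path):
    """Use the filename without its directory and final extension as the label."""
    return Path(image_path).stem


paths = ["name.png", "faces/jane.doe.jpg"]  # the second path is a made-up example
labels = [label_from_path(p) for p in paths]
print(labels)

# For comparison, the split-based approach keeps the folder and drops part of the name.
split_labels = [p.split(".")[0] for p in paths]
print(split_labels)
```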
Initialize Webcam : Start capturing video frames from the webcam and begin the main loop for real-time face recognition.
# Initialize video capture from webcam
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
Process Frames : Resize the captured frame, detect new face locations, and calculate face encodings for these new faces.
    # Resize the frame and convert to RGB format
    resize_frame = cv2.resize(frame, (0, 0), fx=1, fy=1)
    resize_frame = cv2.cvtColor(resize_frame, cv2.COLOR_BGR2RGB)

    # Detect new face locations and encodings
    new_face_locations = face_recognition.face_locations(resize_frame)
    new_face_encodings = face_recognition.face_encodings(resize_frame, new_face_locations)
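With `fx=1` and `fy=1` the resize leaves the frame at its original size. If the scale factor were lowered to speed up detection (for example 0.25, a value I am assuming purely for illustration), the face boxes found in the small frame would need to be mapped back before drawing on the full-size frame. A minimal sketch of that mapping, with `scale_location` as a hypothetical helper:

```python
def scale_location(face_location, scale):
    """Map a (top, right, bottom, left) box from a resized frame back to the original frame."""
    top, right, bottom, left = face_location
    inverse = 1.0 / scale
    return tuple(int(round(value * inverse)) for value in (top, right, bottom, left))


# A made-up box detected in a frame that was shrunk to a quarter of its size.
small_box = (30, 120, 90, 60)
full_box = scale_location(small_box, scale=0.25)
print(full_box)
```

The tuple order follows the `top, right, bottom, left` unpacking used in the recognition step.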
Recognize Faces : Compare the new face encodings with the known face encodings to identify known individuals. Annotate the frame with rectangles and labels based on the recognition results.
    for face_location, face_encoding in zip(new_face_locations, new_face_encodings):
        top, right, bottom, left = face_location

        # Compare face encodings with known face encodings
        matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
        face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
        least_distance_index = np.argmin(face_distances)

        # Annotate the frame based on matching results
        if matches[least_distance_index]:
            cv2.rectangle(resize_frame, (left, top), (right, bottom), (0, 255, 0), 2)
            cv2.putText(resize_frame, known_faces[least_distance_index], (left, top - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 1, cv2.LINE_AA)
        else:
            cv2.rectangle(resize_frame, (left, top), (right, bottom), (0, 0, 255), 2)
            cv2.putText(resize_frame, 'Unknown', (left, top - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 1, cv2.LINE_AA)
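The matching rule in this step (pick the closest known face, accept it only if it is close enough) can be sketched without the library. The version below is an illustration in plain Python, not the library's implementation: `best_match` is a hypothetical helper, the 0.6 tolerance reflects my understanding of the library's default, and the 3-element vectors are toy stand-ins for the much longer real encodings.

```python
import math


def best_match(known_encodings, known_names, encoding, tolerance=0.6):
    """Return the name of the closest known encoding, or "Unknown" if none is within tolerance."""
    if not known_encodings:
        # With no known faces there is nothing to compare against.
        return "Unknown"
    distances = [math.dist(known, encoding) for known in known_encodings]
    best = min(range(len(distances)), key=distances.__getitem__)
    return known_names[best] if distances[best] <= tolerance else "Unknown"


# Toy data: two "known" encodings and their labels.
known = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
names = ["alice", "bob"]

print(best_match(known, names, [0.1, 0.0, 0.0]))  # very close to the first encoding
print(best_match(known, names, [0.5, 0.5, 0.5]))  # too far from both
print(best_match([], [], [0.5, 0.5, 0.5]))        # no known faces at all
```

The explicit empty-list check is a design choice for the sketch: it keeps the function well defined when no known faces have been loaded.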
Display and Exit : Display the annotated frame with recognized faces and labels in real-time. Exit the loop when the 'q' key is pressed, release the video capture, and close all windows.
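    # Display annotated frame
    # Note: resize_frame is in RGB order at this point while cv2.imshow expects BGR,
    # so the red and blue channels of the camera image appear swapped in the preview
    cv2.imshow('Face Detection', resize_frame)

    # Exit loop if 'q' key is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release video capture and close windows
cap.release()
cv2.destroyAllWindows()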
In this project, we've built a real-time face recognition system using the OpenCV and Face Recognition libraries. The system processes video frames from a webcam, detects faces, and matches them against known faces, enabling the identification of individuals. The annotations on the frames provide visual feedback on the recognition results. This project serves as a foundational implementation for applications like access control, attendance systems, and personalized user experiences.
Through this project, we've demonstrated the potential of combining computer vision libraries to create powerful applications. The ability to process and recognize faces in real-time opens doors for various innovative solutions across diverse domains.
In today's fast-paced world, managing personal finances efficiently is more important than ever. Whether it's tracking daily expenses, sticking to a budget, or planning for future financial goals, having the right tool can make all the difference. That's why we designed an app that helps you take control of your finances with ease. In this project we used Java for the backend and XML for the front end.
Let us present what we have made in this app. We tried to implement an easy-to-use, user-friendly interface.
Since this is our first app, adding every feature was a challenge for us. Our main challenge was to come up with an interactive and user-friendly interface. We searched many websites, explored several kinds of app designs, and finalized this user interface.
RecyclerView manages memory efficiently by creating views only for the items that fit on the screen and reusing them as the user scrolls. We preferred it over ListView, which does not enforce this view-recycling pattern. We faced some issues while implementing it: adding a custom adapter that contains the tag image, transaction name, tag, and amount, and displaying that adapter's items on the dashboard, was our very first challenge.
Initially we thought of using a shared file system to store the transaction data, but it became overwhelming in terms of memory and code maintenance. Then we thought of using MongoDB Atlas to store and retrieve the data, but that attempt failed after hours spent reading the MongoDB documentation and Stack Overflow answers. Finally we settled on SQLite, Android's built-in database, which is fast and stores the data locally on the device. Keeping the data local also addressed our security concerns.
We created a new Activity for adding a transaction. It asks for the Name, Amount, Tag, Type, Date, and Note so that the transaction can be remembered effectively. The problem was connecting the addTransaction button to the database and updating the RecyclerView.
We had coded the RecyclerView in a single activity without using fragments, but our intention was to show all transactions in the Dashboard fragment, all expenses in the Expenses fragment, and all savings in the Savings fragment. Since we had not used a single fragment, we realized that rewriting the code in the form of fragments and updating the RecyclerView accordingly was a big challenge.
We wanted to let the user update and delete a transaction. Both operations need a unique id, which we had not added, so we rewrote the entire database code to give every transaction an auto-increment id (Sno). Then we created methods to update and delete individual transactions.
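The idea of an auto-increment id driving update and delete can be sketched with SQLite directly. The app itself is written in Java; the example below uses Python's built-in `sqlite3` module only to illustrate the schema and queries. The table name, column names, and helper functions are assumptions based on the fields mentioned above (Name, Amount, Tag, Type, Date, Note), not the project's actual code.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique id assigned by SQLite
        name TEXT NOT NULL,
        amount REAL NOT NULL,
        tag TEXT,
        type TEXT,
        date TEXT,
        note TEXT
    )
    """
)


def add_transaction(name, amount, tag, type_, date, note=""):
    """Insert a row and return the id SQLite generated for it."""
    cur = conn.execute(
        "INSERT INTO transactions (name, amount, tag, type, date, note) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (name, amount, tag, type_, date, note),
    )
    return cur.lastrowid


def update_amount(transaction_id, amount):
    """Change the amount of exactly one transaction, found by its id."""
    conn.execute("UPDATE transactions SET amount = ? WHERE id = ?", (amount, transaction_id))


def delete_transaction(transaction_id):
    """Delete exactly one transaction, found by its id."""
    conn.execute("DELETE FROM transactions WHERE id = ?", (transaction_id,))


def by_type(type_):
    """Rows of one type, e.g. for an expenses-only or savings-only list."""
    return conn.execute(
        "SELECT id, name, amount FROM transactions WHERE type = ? ORDER BY id", (type_,)
    ).fetchall()


coffee = add_transaction("Coffee", 3.5, "Food", "expense", "2024-01-05")
deposit = add_transaction("Deposit", 100.0, "Bank", "saving", "2024-01-06")
update_amount(coffee, 4.0)
expenses = by_type("expense")
delete_transaction(coffee)
remaining = by_type("expense")
print(expenses, remaining, by_type("saving"))
```

Because every row has its own id, two transactions with the same name and amount can still be updated or deleted independently.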
We created the Savings and Expenses fragments separately to manage and view the transactions effectively. At first, transactions could not be added from these fragments, so the user had to go back to the dashboard for every CRUD operation. We fixed that: now the user can perform CRUD operations from any fragment, and the RecyclerView updates accordingly.
These are the challenges that we faced while making our first app. Feel free to contribute to or explore the ExpenseTracker project. If you have a feature that you want to contribute, don't hesitate to contact us so we can work on it together.
Here are some important things we learned from this project: