Hi There !

I'M VENKAT SAI

Machine Learning Engineer

MY RECENT WORKS

MY PROJECTS

My Coding Blogs

Hand Gesture Recognition using MediaPipe and OpenCV

In the world of computer vision, hand gesture recognition has gained significant attention because of its potential applications in human-computer interaction, virtual reality, gaming, and more. In this blog post, I will explain one of my Python projects, which uses the MediaPipe library along with OpenCV to perform real-time hand gesture recognition. By analyzing hand landmarks (finger locations), we'll simulate keyboard inputs so that an application can be navigated with hand gestures.

Introduction :

Hand gesture recognition involves interpreting and analyzing the position of a person's hand to infer different gestures. This project uses MediaPipe, a popular open-source library developed by Google, and OpenCV, a well-known computer vision library, to achieve this task. Using the webcam feed, the code identifies hand landmarks, calculates the distance from the wrist to each fingertip, and compares those distances with preset thresholds to recognize gestures like a closed fist or an open palm.

Libraries Used :

MediaPipe : This library provides pre-trained machine learning models that detect and track landmarks on the hands, face, and body. See the MediaPipe documentation for details.

pip install mediapipe

OpenCV : The Open Source Computer Vision Library, widely used for computer vision tasks such as capturing video streams, image processing, and graphical display. See the OpenCV documentation for details.

pip install opencv-python

Keyboard : This library lets you simulate and control keyboard input, such as typing characters and pressing or releasing keys, which makes it possible to automate keyboard actions. On Linux it requires root privileges. See the keyboard documentation for details.

pip install keyboard

The Complete Project Code

  import mediapipe as mp
  import cv2
  import keyboard
  
  mp_drawing = mp.solutions.drawing_utils
  mp_hands = mp.solutions.hands
  
  cap = cv2.VideoCapture(0)
  
  hands = mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5)
  
  
  def distance(point1, point2):
      return ((point1.x - point2.x) ** 2 + (point1.y - point2.y) ** 2) ** 0.5
  
  
  def fingers_landmarks(hand):
      thumbpoint1 = hand.landmark[mp_hands.HandLandmark.THUMB_TIP]
      thumbpoint2 = hand.landmark[mp_hands.HandLandmark.THUMB_IP]
      thumbpoint3 = hand.landmark[mp_hands.HandLandmark.THUMB_MCP]
      thumbpoint4 = hand.landmark[mp_hands.HandLandmark.THUMB_CMC]
      indexpoint1 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
      indexpoint2 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_DIP]
      indexpoint3 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP]
      indexpoint4 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_MCP]
      middlepoint1 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP]
      middlepoint2 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_DIP]
      middlepoint3 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_PIP]
      middlepoint4 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_MCP]
      ringpoint1 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_TIP]
      ringpoint2 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_DIP]
      ringpoint3 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_PIP]
      ringpoint4 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_MCP]
      littlepoint1 = hand.landmark[mp_hands.HandLandmark.PINKY_TIP]
      littlepoint2 = hand.landmark[mp_hands.HandLandmark.PINKY_DIP]
      littlepoint3 = hand.landmark[mp_hands.HandLandmark.PINKY_PIP]
      littlepoint4 = hand.landmark[mp_hands.HandLandmark.PINKY_MCP]
      wrist = hand.landmark[mp_hands.HandLandmark.WRIST]
      return [
          thumbpoint1,
          thumbpoint2,
          thumbpoint3,
          thumbpoint4,
          indexpoint1,
          indexpoint2,
          indexpoint3,
          indexpoint4,
          middlepoint1,
          middlepoint2,
          middlepoint3,
          middlepoint4,
          ringpoint1,
          ringpoint2,
          ringpoint3,
          ringpoint4,
          littlepoint1,
          littlepoint2,
          littlepoint3,
          littlepoint4,
          wrist,
      ]
  
  
  def check(list_name, symbol):
      # Compare each wrist-to-fingertip distance with its threshold.
      # Uses the global `distances` computed in the main loop.
      if symbol == "<=":
          return all(dist <= limit for dist, limit in zip(distances, list_name))
      return all(dist >= limit for dist, limit in zip(distances, list_name))
  
  
  fist_close = [0.38, 0.25, 0.2, 0.19, 0.2]
  fist_open = [0.24, 0.39, 0.43, 0.41, 0.35]
  while True:
      ret, frame = cap.read()
      if not ret:  # stop if the webcam did not return a frame
          break
      image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
      image = cv2.flip(image, 1)
      image.flags.writeable = False
      results = hands.process(image)
      image.flags.writeable = True
      image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
  
      if results.multi_hand_landmarks:
          for num, hand in enumerate(results.multi_hand_landmarks):
              mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS)
              fingers = fingers_landmarks(hand)
              distances = [distance(fingers[20], fingers[i]) for i in range(0, 20, 4)]
              if check(fist_close, "<="):
                  keyboard.release("right")
                  keyboard.press("left")
              elif check(fist_open, ">="):
                  keyboard.release("left")
                  keyboard.press("right")
              else:
                  keyboard.release("right")
                  keyboard.release("left")
  
      cv2.imshow("Hand Tracking", image)
  
      if cv2.waitKey(1) & 0xFF == ord("q"):
          break
  
  cap.release()
  cv2.destroyAllWindows()

Project Overview

The project consists of the following steps :

Import Libraries : Import the required libraries, including MediaPipe, OpenCV, and the keyboard library for simulating keypresses.

  #  This contains the model that can recognize hand landmarks
  import mediapipe as mp 
  
  #  This is used for video capture and processing
  import cv2             
  
  #  This is used for the keyboard integration
  import keyboard

Initialize Components : Initialize the necessary components like the webcam feed, the MediaPipe hands detection model, and set the detection and tracking confidence levels.

  #  Draws the hand landmarks and the lines connecting them (finger locations)
  mp_drawing = mp.solutions.drawing_utils
  #  MediaPipe's hand-landmark solution, used to identify the hand landmarks
  mp_hands = mp.solutions.hands
  #  Opens the default webcam (device index 0)
  cap = cv2.VideoCapture(0)
  #  Creates the hand detector with its detection and tracking confidence thresholds
  hands = mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5)

Define Functions : Define functions to calculate distances between hand landmarks and to check for specific hand gestures based on calculated distances.

  # Euclidean distance between two landmarks, using their normalized x, y coordinates
  def distance(point1, point2):
      return ((point1.x - point2.x) ** 2 + (point1.y - point2.y) ** 2) ** 0.5

  # Collects the 21 hand landmarks (names from the MediaPipe documentation):
  # four points per finger, fingertip first, then the wrist at index 20
  def fingers_landmarks(hand):
      thumbpoint1 = hand.landmark[mp_hands.HandLandmark.THUMB_TIP]
      thumbpoint2 = hand.landmark[mp_hands.HandLandmark.THUMB_IP]
      thumbpoint3 = hand.landmark[mp_hands.HandLandmark.THUMB_MCP]
      thumbpoint4 = hand.landmark[mp_hands.HandLandmark.THUMB_CMC]
      indexpoint1 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
      indexpoint2 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_DIP]
      indexpoint3 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP]
      indexpoint4 = hand.landmark[mp_hands.HandLandmark.INDEX_FINGER_MCP]
      middlepoint1 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP]
      middlepoint2 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_DIP]
      middlepoint3 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_PIP]
      middlepoint4 = hand.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_MCP]
      ringpoint1 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_TIP]
      ringpoint2 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_DIP]
      ringpoint3 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_PIP]
      ringpoint4 = hand.landmark[mp_hands.HandLandmark.RING_FINGER_MCP]
      littlepoint1 = hand.landmark[mp_hands.HandLandmark.PINKY_TIP]
      littlepoint2 = hand.landmark[mp_hands.HandLandmark.PINKY_DIP]
      littlepoint3 = hand.landmark[mp_hands.HandLandmark.PINKY_PIP]
      littlepoint4 = hand.landmark[mp_hands.HandLandmark.PINKY_MCP]
      wrist = hand.landmark[mp_hands.HandLandmark.WRIST]
      return [
          thumbpoint1,
          thumbpoint2,
          thumbpoint3,
          thumbpoint4,
          indexpoint1,
          indexpoint2,
          indexpoint3,
          indexpoint4,
          middlepoint1,
          middlepoint2,
          middlepoint3,
          middlepoint4,
          ringpoint1,
          ringpoint2,
          ringpoint3,
          ringpoint4,
          littlepoint1,
          littlepoint2,
          littlepoint3,
          littlepoint4,
          wrist,
      ]
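Because `fingers_landmarks` lists four points per finger with the fingertip first and puts the wrist last, indices 0, 4, 8, 12, 16 are the fingertips and index 20 is the wrist. A minimal sketch of the wrist-to-fingertip distances the main loop computes, assuming `types.SimpleNamespace` objects as stand-ins for MediaPipe landmarks (only `.x` and `.y` are read); `fingertip_distances` is a hypothetical helper name.

```python
from types import SimpleNamespace


def distance(point1, point2):
    # Euclidean distance in the image plane (MediaPipe x, y are normalized to [0, 1])
    return ((point1.x - point2.x) ** 2 + (point1.y - point2.y) ** 2) ** 0.5


def fingertip_distances(fingers):
    # `fingers` follows the fingers_landmarks() order:
    # 4 points per finger (tip first), then the wrist at index 20
    wrist = fingers[20]
    return [distance(wrist, fingers[i]) for i in range(0, 20, 4)]


# Hypothetical hand: every finger point at (0.5, 0.5), wrist at (0.5, 0.9)
points = [SimpleNamespace(x=0.5, y=0.5) for _ in range(20)]
points.append(SimpleNamespace(x=0.5, y=0.9))
print([round(d, 2) for d in fingertip_distances(points)])  # [0.4, 0.4, 0.4, 0.4, 0.4]
```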

Gesture Thresholds : Define threshold values for distances between finger landmarks that correspond to certain gestures. These threshold values will be used to determine whether a gesture is being performed.

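  # You can customize these values for your own gestures.
  # I found the fist_close and fist_open values by experimenting with different values.
  # Each list holds one threshold per fingertip (thumb, index, middle, ring, little),
  # measured as the wrist-to-fingertip distance in normalized image coordinates.

  # Closed fist: every distance must be at or below its threshold
  fist_close = [0.38, 0.25, 0.2, 0.19, 0.2]

  # Open palm: every distance must be at or above its threshold
  fist_open = [0.24, 0.39, 0.43, 0.41, 0.35]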
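Put together, the thresholds and the distance checks amount to a three-way decision for each detected hand. A self-contained sketch of that decision using the threshold values from this post; `classify` is a hypothetical name, and returning "left" or "right" simply mirrors the arrow keys the project presses.

```python
# Thresholds copied from the post: one value per fingertip
# (thumb, index, middle, ring, little), measured from the wrist.
fist_close = [0.38, 0.25, 0.2, 0.19, 0.2]
fist_open = [0.24, 0.39, 0.43, 0.41, 0.35]


def classify(distances):
    """Return 'left' for a closed fist, 'right' for an open palm, else None."""
    if all(d <= t for d, t in zip(distances, fist_close)):
        return "left"
    if all(d >= t for d, t in zip(distances, fist_open)):
        return "right"
    return None


print(classify([0.1, 0.1, 0.1, 0.1, 0.1]))  # left
print(classify([0.5, 0.5, 0.5, 0.5, 0.5]))  # right
print(classify([0.3, 0.3, 0.3, 0.3, 0.3]))  # None
```

A hand shape that satisfies neither list, like the last example, falls through to None, which is the case where the project releases both arrow keys.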

Main Loop : Enter the main loop that captures video frames from the webcam feed, processes them using MediaPipe's hand detection model, and calculates distances between finger landmarks.

  while True:
      # Read the next frame from the webcam
      ret, frame = cap.read()
      if not ret:
          break
      # OpenCV delivers frames in BGR (Blue, Green, Red) order, but MediaPipe expects RGB
      image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
      # Flip horizontally so the preview behaves like a mirror
      image = cv2.flip(image, 1)
      # Mark the image read-only so MediaPipe can process it without copying
      image.flags.writeable = False
      # Run hand-landmark detection on the frame
      results = hands.process(image)
      # Make the image writeable again so we can draw on it
      image.flags.writeable = True
      # Convert back to BGR for OpenCV drawing and display
      image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

      # For each detected hand, draw the landmarks and their connections,
      # then measure the wrist-to-fingertip distances
      if results.multi_hand_landmarks:
          for hand in results.multi_hand_landmarks:
              mp_drawing.draw_landmarks(image, hand, mp_hands.HAND_CONNECTIONS)
              fingers = fingers_landmarks(hand)
              distances = [distance(fingers[20], fingers[i]) for i in range(0, 20, 4)]
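To make the two preprocessing steps concrete, here is a pure-Python illustration on a tiny nested-list "image". The helper names are hypothetical; in the project, `cv2.cvtColor(..., cv2.COLOR_BGR2RGB)` and `cv2.flip(..., 1)` do the same work on NumPy arrays.

```python
# A 1x2 "image" in OpenCV's BGR order: a blue pixel, then a red pixel
bgr = [[(255, 0, 0), (0, 0, 255)]]


def bgr_to_rgb(image):
    # Same idea as cv2.cvtColor(image, cv2.COLOR_BGR2RGB): reverse each pixel's channel order
    return [[pixel[::-1] for pixel in row] for row in image]


def flip_horizontal(image):
    # Same idea as cv2.flip(image, 1): mirror each row left-to-right
    return [row[::-1] for row in image]


rgb = bgr_to_rgb(bgr)
print(rgb)                   # [[(0, 0, 255), (255, 0, 0)]]
print(flip_horizontal(rgb))  # [[(255, 0, 0), (0, 0, 255)]]
```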

Recognize Gestures : Based on the calculated distances and defined thresholds, recognize gestures such as a closed fist or an open palm. Simulate keyboard inputs (left and right arrow keys) using the keyboard library based on recognized gestures.

  def check(list_name, symbol):
      # Compare each wrist-to-fingertip distance with its threshold.
      # Uses the global `distances` computed in the main loop.
      if symbol == "<=":
          return all(dist <= limit for dist, limit in zip(distances, list_name))
      return all(dist >= limit for dist, limit in zip(distances, list_name))

  # Runs inside the for-loop, right after `distances` is computed:
  if check(fist_close, "<="):      # closed fist -> hold the left arrow
      keyboard.release("right")
      keyboard.press("left")
  elif check(fist_open, ">="):     # open palm -> hold the right arrow
      keyboard.release("left")
      keyboard.press("right")
  else:                            # any other hand shape -> release both keys
      keyboard.release("right")
      keyboard.release("left")
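Calling `keyboard.press` on every frame sends a new key-down event each time the same gesture is seen. One way to send events only when the recognized gesture changes is a small state tracker. This is a sketch under assumptions: `KeyController` is a hypothetical class, and the press/release functions are injected so it can be tried without the keyboard library.

```python
class KeyController:
    """Hypothetical helper: holds at most one key and only sends
    press/release events when the recognized gesture changes."""

    def __init__(self, press, release):
        self.press = press      # e.g. keyboard.press
        self.release = release  # e.g. keyboard.release
        self.held = None        # key currently held down, if any

    def update(self, key):
        # key is "left", "right", or None (no recognized gesture)
        if key == self.held:
            return              # same gesture as last frame: send nothing
        if self.held is not None:
            self.release(self.held)
        if key is not None:
            self.press(key)
        self.held = key


events = []
ctrl = KeyController(lambda k: events.append(("press", k)),
                     lambda k: events.append(("release", k)))
for gesture in ["left", "left", "right", None]:
    ctrl.update(gesture)
print(events)
# [('press', 'left'), ('release', 'left'), ('press', 'right'), ('release', 'right')]
```

In the project you would construct it with `keyboard.press` and `keyboard.release`, then call `update` once per frame with "left", "right", or None depending on which `check` succeeded.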

Display Output : Display the processed video frame with drawn hand landmarks and connections.

Exit the Application : Allow the user to exit the application by pressing the 'q' key.

      # Show the annotated frame
      cv2.imshow('Hand Tracking', image)
      # Stop the loop when 'q' is pressed
      if cv2.waitKey(1) & 0xFF == ord('q'):
          break

  # After the loop: release the webcam
  cap.release()
  # Close all windows opened by OpenCV
  cv2.destroyAllWindows()

Real-time Face Recognition using OpenCV and Face Recognition Library

Face recognition technology has become increasingly relevant across various domains, from security systems to personal identification. In this blog post, we'll explore a Python project that leverages the power of OpenCV and the Face Recognition library to build a real-time face recognition system. By analyzing webcam feed data, the system can detect and label known individuals while identifying unknown faces.

Introduction :

The goal of this project is to create a real-time face recognition system using the OpenCV library and the Face Recognition library. This system captures video from a webcam, processes the frames to detect faces, and then compares the detected face encodings with known face encodings to determine if the face belongs to a known person. The system annotates the frames with rectangles and labels, indicating the identity of known individuals or marking faces as "Unknown" if no match is found.

Libraries Used :

OpenCV : The Open Source Computer Vision Library, a powerful tool for computer vision tasks including image and video processing and object detection. See the OpenCV documentation for details.

pip install opencv-python

Face Recognition : A Python library designed specifically for face recognition and built on dlib. It provides tools for face detection, face encoding, and face comparison. See the face_recognition documentation for details.

pip install face_recognition

The Complete Project Code

        import cv2
        import face_recognition
        import numpy as np

        # List of image filenames containing known faces (one face per image)
        static = ['name.png']

        # Encode known faces; load_image_file returns the RGB image face_recognition expects
        known_face_encodings = [face_recognition.face_encodings(face_recognition.load_image_file(image))[0]
                                for image in static]
        known_faces = [name.split('.')[0] for name in static]

        # Initialize video capture from webcam
        cap = cv2.VideoCapture(0)

        while True:
            ret, frame = cap.read()
            if not ret:
                break
            # fx=fy=1 keeps the original size; this BGR frame is used for drawing and display
            resize_frame = cv2.resize(frame, (0, 0), fx=1, fy=1)
            # face_recognition expects RGB, so detect faces on an RGB copy
            rgb_frame = cv2.cvtColor(resize_frame, cv2.COLOR_BGR2RGB)

            # Detect new face locations and encodings
            new_face_locations = face_recognition.face_locations(rgb_frame)
            new_face_encodings = face_recognition.face_encodings(rgb_frame, new_face_locations)

            for face_location, face_encoding in zip(new_face_locations, new_face_encodings):
                top, right, bottom, left = face_location

                # Compare face encodings with known face encodings
                matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
                face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
                least_distance_index = np.argmin(face_distances)

                # Annotate the frame: green box and name for a match, red box for "Unknown"
                if matches[least_distance_index]:
                    cv2.rectangle(resize_frame, (left, top), (right, bottom), (0, 255, 0), 2)
                    cv2.putText(resize_frame, known_faces[least_distance_index], (left, top - 10),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 1, cv2.LINE_AA)
                else:
                    cv2.rectangle(resize_frame, (left, top), (right, bottom), (0, 0, 255), 2)
                    cv2.putText(resize_frame, 'Unknown', (left, top - 10),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 1, cv2.LINE_AA)

            # Display the annotated frame once per loop iteration, even when no face is found
            cv2.imshow('Face Detection', resize_frame)

            # Exit the while loop if the 'q' key is pressed
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # Release video capture and close windows
        cap.release()
        cv2.destroyAllWindows()

Project Overview

Import Libraries : Import the necessary libraries, including OpenCV and the Face Recognition library, to prepare for the face recognition project.

        import cv2
        import face_recognition
        import numpy as np

Prepare Known Faces : Load the images of known individuals (listed in `static`), encode their faces using the Face Recognition library, and store the encodings along with the corresponding names, which are taken from the file names.

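        # List of image filenames containing known faces (one face per image;
        # the [0] below raises IndexError if no face is found in an image)
        static = ['name.png']

        # Encode known faces; load_image_file returns the RGB image face_recognition expects
        known_face_encodings = [face_recognition.face_encodings(face_recognition.load_image_file(image))[0]
                                for image in static]
        known_faces = [name.split('.')[0] for name in static]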
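The names come from the file names via `name.split('.')[0]`, which cuts a name like `john.doe.png` down to `john` and keeps any folder prefix such as `photos/`. A small alternative using only the standard library's `os.path`; `label_from_filename` is a hypothetical helper, not part of face_recognition.

```python
import os


def label_from_filename(path):
    # "photos/Alice.png" -> "Alice"; dots inside the name are kept ("john.doe.png" -> "john.doe")
    return os.path.splitext(os.path.basename(path))[0]


static = ["name.png", "photos/john.doe.jpg"]
print([label_from_filename(p) for p in static])  # ['name', 'john.doe']
```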

Initialize Webcam : Start capturing video frames from the webcam and begin the main loop for real-time face recognition.

        # Initialize video capture from webcam
        cap = cv2.VideoCapture(0)

        while True:
            ret, frame = cap.read()
            if not ret:  # stop if the webcam did not return a frame
                break

Process Frames : Resize the captured frame, detect new face locations, and calculate face encodings for these new faces.

        # Resize the frame (fx=fy=1 keeps the original size); this BGR frame is used for drawing and display
        resize_frame = cv2.resize(frame, (0, 0), fx=1, fy=1)
        # face_recognition expects RGB, so detect faces on an RGB copy
        rgb_frame = cv2.cvtColor(resize_frame, cv2.COLOR_BGR2RGB)

        # Detect new face locations and encodings
        new_face_locations = face_recognition.face_locations(rgb_frame)
        new_face_encodings = face_recognition.face_encodings(rgb_frame, new_face_locations)
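With `fx=1, fy=1` the resize leaves the frame at its original size. A common speed-up is to run detection on a smaller copy (for example `fx=0.25, fy=0.25`) and scale each `(top, right, bottom, left)` box back up before drawing on the full-size frame. A minimal sketch of that scaling step, assuming the same scale on both axes; `scale_location` is a hypothetical helper and 0.25 is only an illustrative value.

```python
def scale_location(location, scale):
    """Map a (top, right, bottom, left) box found on a frame resized by `scale`
    back to pixel coordinates on the original frame."""
    return tuple(round(v / scale) for v in location)


# A face found on a frame shrunk to a quarter of its size
print(scale_location((10, 20, 30, 5), 0.25))  # (40, 80, 120, 20)
```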

Recognize Faces : Compare the new face encodings with the known face encodings to identify known individuals. Annotate the frame with rectangles and labels based on the recognition results.

        for face_location, face_encoding in zip(new_face_locations, new_face_encodings):
            top, right, bottom, left = face_location
            
            # Compare face encodings with known face encodings
            matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
            face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
            least_distance_index = np.argmin(face_distances)
            
            # Annotate the frame based on matching results
            if matches[least_distance_index]:
                cv2.rectangle(resize_frame, (left, top), (right, bottom), (0, 255, 0), 2)
                cv2.putText(resize_frame, known_faces[least_distance_index], (left, top - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 1, cv2.LINE_AA)
            else:
                cv2.rectangle(resize_frame, (left, top), (right, bottom), (0, 0, 255), 2)
                cv2.putText(resize_frame, 'Unknown', (left, top - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 1, cv2.LINE_AA)
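`face_recognition.compare_faces` counts a known face as a match when its Euclidean distance to the new encoding is at or below a tolerance (0.6 by default), and `face_distance` returns those distances. The loop above combines both: it takes the closest known face and labels it only if that face is also within tolerance. Here is a dependency-free sketch of that decision, with toy 3-value encodings standing in for the library's 128-value ones; `best_match` is a hypothetical name. Like `np.argmin`, it raises on an empty list of known faces.

```python
import math


def face_distance(known_encodings, encoding):
    # Euclidean distance from `encoding` to each known encoding
    return [math.dist(known, encoding) for known in known_encodings]


def best_match(known_encodings, known_names, encoding, tolerance=0.6):
    """Return the closest known name, or 'Unknown' if nothing is within tolerance."""
    distances = face_distance(known_encodings, encoding)
    best = min(range(len(distances)), key=distances.__getitem__)
    return known_names[best] if distances[best] <= tolerance else "Unknown"


# Toy 3-value "encodings" (real face_recognition encodings have 128 values)
known = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
names = ["alice", "bob"]
print(best_match(known, names, [0.9, 1.0, 1.1]))  # bob
print(best_match(known, names, [5.0, 5.0, 5.0]))  # Unknown
```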

Display and Exit : Display the annotated frame with recognized faces and labels in real-time. Exit the loop when the 'q' key is pressed, release the video capture, and close all windows.

            # Display the annotated frame once per loop iteration, after all faces are drawn
            cv2.imshow('Face Detection', resize_frame)

            # Exit the while loop if the 'q' key is pressed
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # After the loop: release video capture and close windows
        cap.release()
        cv2.destroyAllWindows()

Conclusion

In this project, we've built a real-time face recognition system using the OpenCV and Face Recognition libraries. The system processes video frames from a webcam, detects faces, and matches them against known faces, enabling the identification of individuals. The annotations on the frames provide visual feedback on the recognition results. This project serves as a foundational implementation for applications like access control, attendance systems, and personalized user experiences.

Through this project, we've demonstrated the potential of combining computer vision libraries to create powerful applications. The ability to process and recognize faces in real-time opens doors for various innovative solutions across diverse domains.

Expenses Tracker: The Android App

Introduction

In today's fast-paced world, managing personal finances efficiently is more important than ever. Whether it's tracking daily expenses, sticking to a budget, or planning for future financial goals, having the right tool can make all the difference. That's why we designed an app that helps you take control of your finances with ease. We built it with Java for the app logic (backend) and XML for the layouts (frontend).

Features Designed In This App

Here is what we built into this app. We aimed for an easy-to-use, user-friendly interface.

Challenges Faced In This App

Since this is our first app, adding every feature was a challenge for us. Our main challenge was to come up with an interactive and user-friendly interface. We looked at many websites and explored several kinds of app designs before finalizing this user interface.

RecyclerView

RecyclerView helps the app use memory efficiently: it creates only enough item views to fill the screen and reuses (recycles) them as the user scrolls. ListView can reuse rows too, but only if the adapter implements that reuse manually, whereas RecyclerView builds the ViewHolder pattern in. Our very first challenge was writing a custom adapter for the RecyclerView, where each row shows the tag image, transaction name, tag, and amount, and displaying it on the dashboard.

Setting Up the Database

Initially we thought of using a shared file system to store the transaction data, but it became overwhelming in terms of memory and code maintenance. Then we tried MongoDB Atlas to store and retrieve the data, but after hours spent reading the MongoDB documentation and Stack Overflow answers, it still didn't work for us. Finally we chose SQLite, the database built into Android. It is fast and stores the data locally in the app's private storage, which also addressed our security concern.

Adding Transactions and Updating the RecyclerView

We created a new Activity for adding a transaction. It asks for the Name, Amount, Tag, Type, Date, and a Note to help the user remember the transaction. The problem was connecting the addTransaction button to the database and then updating the RecyclerView.

Dashboard Functionality

We had coded the RecyclerView in a single activity without using fragments, but our intention was to show all transactions in a Dashboard fragment, all expenses in an Expenses fragment, and all savings in a Savings fragment. Since we hadn't used a single fragment, we realized that rewriting the code as fragments and updating the RecyclerView accordingly would be a big challenge.

Updating and Deleting Transactions

We wanted to let users update and delete transactions. For that we needed a unique id for each transaction, which we hadn't added, so we rewrote the entire database code to give every transaction an auto-increment id (Sno). Then we created methods to update and delete any transaction by that id.
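The app itself is written in Java against Android's SQLite support; to keep every example on this page in one language, here is a Python `sqlite3` sketch of the same idea. The table and function names are hypothetical, and the columns follow the transaction fields listed earlier (Name, Amount, Tag, Type, Date, Note) plus an auto-increment `sno` that makes each row addressable for update and delete.

```python
import sqlite3

# In-memory database for the sketch; the Android app keeps a file-backed database
conn = sqlite3.connect(":memory:")
# sno is the auto-increment id used to update or delete a single transaction
conn.execute(
    """CREATE TABLE transactions (
           sno    INTEGER PRIMARY KEY AUTOINCREMENT,
           name   TEXT NOT NULL,
           amount REAL NOT NULL,
           tag    TEXT,
           type   TEXT,
           date   TEXT,
           note   TEXT
       )"""
)


def add_transaction(name, amount, tag, type_, date, note=""):
    cur = conn.execute(
        "INSERT INTO transactions (name, amount, tag, type, date, note) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (name, amount, tag, type_, date, note),
    )
    conn.commit()
    return cur.lastrowid  # the auto-generated sno


def update_amount(sno, amount):
    conn.execute("UPDATE transactions SET amount = ? WHERE sno = ?", (amount, sno))
    conn.commit()


def delete_transaction(sno):
    conn.execute("DELETE FROM transactions WHERE sno = ?", (sno,))
    conn.commit()


def list_by_type(type_):
    # The kind of query a Savings or Expenses screen would run
    return conn.execute(
        "SELECT sno, name, amount FROM transactions WHERE type = ? ORDER BY sno",
        (type_,),
    ).fetchall()


a = add_transaction("Lunch", 150.0, "Food", "Expense", "2024-01-05")
b = add_transaction("Coffee", 80.0, "Food", "Expense", "2024-01-05")
update_amount(a, 120.0)
delete_transaction(b)
print(list_by_type("Expense"))  # [(1, 'Lunch', 120.0)]
```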

Savings and Expenses Fragments

We created separate Savings and Expenses fragments to view and manage transactions more effectively. At first, adding transactions from these fragments wasn't possible: the user had to go back to the dashboard to perform any CRUD operation. We fixed that, and now the user can perform CRUD operations from any fragment, with the RecyclerView updating accordingly.

These are the challenges we faced while making our first app. Feel free to explore or contribute to the ExpenseTracker project. If there's a feature you'd like to contribute, don't hesitate to contact us so we can work on it together.

Conclusion

Here are some important things we learned from this project:

Design the structure, or blueprint, of the project before getting into coding.

Fragments are used when you have reusable UI components or when you need flexible layouts that adapt to different screen sizes or orientations.

Activities are used when you have distinct screens or workflows in your app, such as navigating from a list screen to a detail screen.


Thanks a lot for your contribution

Connect with Me