Articles

Vol. 6 No. 3 (2019)

End-to-End Multiview Gesture Recognition for Autonomous Car Parking System

  • KARRAY Fakhri
  • AMARA Hassene Ben
Submitted: February 5, 2024
Published: February 5, 2024

Abstract

Hand gestures can be among the most intuitive media for human-machine interaction. Early approaches to hand gesture recognition were device-based: they relied on mechanical or optical sensors attached to a glove, or on markers, which hinder natural human-machine communication. Vision-based methods, by contrast, are less restrictive and allow more spontaneous communication without an intermediary device between human and machine. Vision-based gesture recognition has therefore been a popular area of research for the past thirty years. Hand gesture recognition has applications in many areas, particularly the automotive industry, where designers of advanced human-machine interfaces (HMI) use it to improve driver and vehicle safety. However, technological advances now go beyond active and passive safety into convenience and comfort. In this context, one of America’s Big Three automakers has partnered with the Centre for Pattern Analysis and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding its product segment through machine learning, with the aim of increasing driver convenience and comfort through hand gesture recognition for autonomous car parking. This paper leverages state-of-the-art deep learning and optimization techniques to develop a vision-based, multiview, dynamic hand gesture recognizer for a self-parking system. We propose a 3D-CNN gesture model architecture and train it on a publicly available hand gesture database. We then apply transfer learning to fine-tune the pre-trained gesture model on custom-made data, which significantly improves the system's performance in a real-world environment. We adapt the architecture of the end-to-end solution to extend the state-of-the-art video classifier from a single-view input (fed by a monocular camera) to a multiview 360-degree feed provided by a six-camera module. Finally, we optimize the proposed solution to run on a resource-limited embedded platform (Nvidia Jetson TX2) that automakers use for vehicle-based features, without sacrificing the accuracy, robustness, and real-time functionality of the system.
