• The Process of AI Image Recognition Systems.
  • AI Image Recognition with Machine Learning
  • Deep Learning Image Recognition Models.
  • Image Recognition vs. Image Detection




    Date: 25.05.2024. Size: 7.26 MB. Type: Research.


    Image Recognition vs. Image Detection.
    The terms image recognition and image detection are often used interchangeably, but there are important technical differences. Image detection is the task of taking an image as input and finding the objects within it. An example is face detection, where algorithms look for face patterns in images. When we deal strictly with detection, we do not care whether the detected objects are significant in any way. The goal is only to distinguish one object from another and to determine how many distinct entities are present in the picture, so a bounding box is drawn around each separate object. Image recognition, on the other hand, is the task of identifying the objects of interest within an image and recognizing which category or class they belong to.
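
The distinction drawn above can be sketched in a few lines of Python. This is only an illustration: the types `Box` and `RecognizedObject` and the toy rule `classify_by_width` are hypothetical names invented for this example, not part of any library. Detection yields boxes and a count; recognition attaches a class to each box.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class RecognizedObject:
    """A detected box plus the class assigned by recognition."""
    box: Box
    label: str


def count_objects(boxes: List[Box]) -> int:
    # Detection alone only tells us how many distinct entities were found.
    return len(boxes)


def recognize(boxes: List[Box], classify: Callable[[Box], str]) -> List[RecognizedObject]:
    # Recognition adds a category to every detected box.
    return [RecognizedObject(box, classify(box)) for box in boxes]


def classify_by_width(box: Box) -> str:
    # Toy stand-in for a real classifier: wide boxes are "truck", others "car".
    return "truck" if box.width > 100 else "car"


detections = [Box(10, 20, 80, 40), Box(200, 30, 150, 60)]
results = recognize(detections, classify_by_width)
labels = [r.label for r in results]
print(count_objects(detections), labels)
```

In a real system the classifier would be a trained model rather than a width rule; the point here is only the difference in output shape between the two tasks.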



    Pic 1.1.2 Real-time video recognition with the different classes “car” and “truck” – built with Viso Suite
    The conventional computer vision approach to image recognition is a sequence (computer vision pipeline) of image filtering, image segmentation, feature extraction, and rule-based classification. However, engineering such pipelines requires deep expertise in image processing and computer vision, a lot of development time and testing, with manual parameter tweaking. In general, traditional computer vision and pixel-based image recognition systems are very limited when it comes to scalability or the ability to re-use them in varying scenarios/locations.
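
A minimal sketch of such a conventional pipeline, in plain Python on a tiny grayscale grid. All function names, the threshold of 128, and the aspect-ratio rule are assumptions chosen for illustration; they show the kind of hand-tuned parameters the paragraph above refers to.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale values 0..255


def box_filter(img: Image) -> Image:
    """Step 1, filtering: 3x3 mean blur; border pixels use the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) // len(vals)
    return out


def segment(img: Image, threshold: int = 128) -> List[Tuple[int, int]]:
    """Step 2, segmentation: (x, y) coordinates of pixels brighter than the threshold."""
    return [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v > threshold]


def extract_features(pixels: List[Tuple[int, int]]) -> Dict[str, float]:
    """Step 3, feature extraction: area and aspect ratio of the foreground region."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    width = max(xs) - min(xs) + 1
    height = max(ys) - min(ys) + 1
    return {"area": len(pixels), "aspect_ratio": width / height}


def classify(features: Dict[str, float]) -> str:
    """Step 4, rule-based classification with a manually chosen threshold."""
    return "wide object" if features["aspect_ratio"] > 1.5 else "compact object"


# 6x8 test image: a bright 2x6 bar plus one isolated noise pixel.
image = [[0] * 8 for _ in range(6)]
for y in (2, 3):
    for x in range(1, 7):
        image[y][x] = 255
image[5][0] = 255  # noise that the blur suppresses before thresholding

features = extract_features(segment(box_filter(image)))
label = classify(features)
print(features, label)
```

Every constant in this sketch would have to be re-tuned for a new camera, lighting condition, or object type, which is the scalability problem described above.
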
    Image recognition with machine learning, on the other hand, uses algorithms that learn hidden knowledge from a dataset of good and bad samples. The most popular machine learning method is deep learning, in which a model uses a neural network with multiple hidden layers.

    Pic 1.1.3 Video analysis with machine learning using YOLOv7 – Built with Viso Suite
    The introduction of deep learning, in combination with powerful AI hardware and GPUs, enabled great breakthroughs in the field of image recognition. With deep learning, image classification and face recognition algorithms based on deep neural networks achieve above-human-level performance, and object detection runs in real time. Still, it is a challenge to balance performance and computing efficiency. Hardware and software have to be well matched to the deep learning model in order to keep the cost of computer vision under control. The ability to use the most recent algorithm therefore has direct cost implications: compared to legacy algorithms, a more powerful and efficient algorithm can run on hardware that is several times cheaper, or achieve several times better performance on equivalent hardware.

    Computer Vision Algorithm Progress. Over the years, we have seen significant jumps in computer vision algorithm performance:

    • In 2017, the Mask R-CNN algorithm was the fastest real-time object detector on the MS COCO benchmark, with an inference time of 330 ms per frame.

    • In comparison, the YOLOR algorithm released in 2021 achieves inference times of 12 ms on the same benchmark, surpassing the popular YOLOv4 and YOLOv3 deep learning algorithms.

    • And in July 2022, the YOLOv7 algorithm even surpassed YOLOR significantly in terms of both speed and accuracy.

    • In 2023, a newly released YOLOv8 model achieved state-of-the-art performance for real-time object detection. The powerful Segment Anything model marks the current SOTA for image segmentation.

    • At the beginning of 2024, YOLOv9 was released, a new architecture for training object detection AI models.
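
To put the two latency figures quoted in the list above into perspective, the short calculation below converts them into frames per second and a speedup factor. It uses only the 330 ms and 12 ms values from the text and assumes, for the sake of the arithmetic, that both were measured under comparable conditions; the helper names are made up for this example.

```python
def frames_per_second(latency_ms: float) -> float:
    """Throughput when frames are processed one after another."""
    return 1000.0 / latency_ms


def speedup(old_latency_ms: float, new_latency_ms: float) -> float:
    """How many times faster the newer algorithm processes a frame."""
    return old_latency_ms / new_latency_ms


mask_rcnn_ms = 330.0  # figure quoted in the text for 2017
yolor_ms = 12.0       # figure quoted in the text for 2021

mask_rcnn_fps = frames_per_second(mask_rcnn_ms)
yolor_fps = frames_per_second(yolor_ms)
factor = speedup(mask_rcnn_ms, yolor_ms)

print(f"{mask_rcnn_fps:.1f} fps vs {yolor_fps:.1f} fps, speedup {factor:.1f}x")
```

On these figures the per-frame time drops from roughly a third of a second to well under the 33 ms budget of a 30 fps video stream.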

    Compared to the traditional computer vision approach of early image processing 20 years ago, deep learning requires only engineering knowledge of a machine learning tool, not expertise in specific machine vision areas to create handcrafted features. While early methods required enormous amounts of training data, newer deep learning methods need only tens of learning samples. However, deep learning requires manual labeling of data to annotate good and bad samples, a process called image annotation. Learning from data that has been labeled by humans is called supervised learning. Creating such labeled data to train AI models requires time-consuming human work, for example, labeling images and annotating standard traffic situations for autonomous vehicles.
    The Process of AI Image Recognition Systems. A few steps form the backbone of how image recognition systems work.

    1. Dataset with training data.

    Image recognition models require labeled images (video frames, pictures, photos, etc.) as training data. Neural networks need those training images from an acquired dataset to learn how certain classes look. For example, an image recognition model that detects different poses (a pose estimation model) would need multiple instances of different human poses to learn what distinguishes one pose from another.

    2. Training of Neural Networks for AI Image Recognition

    The images from the created dataset are fed into a neural network algorithm. This is the deep learning or machine learning part of creating an image recognition model. Training is what enables a convolutional neural network to identify specific classes. Multiple well-tested frameworks are widely used for this purpose today.

    3. AI Model Testing

    The trained model needs to be tested with images that are not part of the training dataset. This determines the usability, performance, and accuracy of the model. For this reason, about 80-90% of the complete image dataset is used for model training, while the remaining data is reserved for model testing. Model performance is measured with a set of metrics, such as the confidence score per test image and the number of incorrect identifications.
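
The three steps above (labeled dataset, training, testing on held-out data) can be sketched end to end as follows. To keep the example short and runnable without any framework, a nearest-centroid classifier stands in for the neural network; that substitution, the two-number feature vectors, and all function names are assumptions made for illustration only.

```python
import random
from typing import Dict, List, Tuple

Sample = Tuple[List[float], str]  # (feature vector, human-assigned label)


def split_dataset(samples: List[Sample], train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle, then reserve the tail of the data for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_centroids(train: List[Sample]) -> Dict[str, List[float]]:
    """'Training' here is just averaging the feature vectors of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], features: List[float]) -> str:
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def distance(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=distance)


def accuracy(centroids: Dict[str, List[float]], test: List[Sample]) -> float:
    """Fraction of held-out samples that are classified correctly."""
    correct = sum(predict(centroids, f) == label for f, label in test)
    return correct / len(test)


# Step 1: a toy labeled dataset of 20 samples in two classes.
samples: List[Sample] = [([0.10 + 0.01 * i, 0.2], "dark") for i in range(10)]
samples += [([0.90 - 0.01 * i, 0.8], "bright") for i in range(10)]

# Step 2: train on 80% of the data. Step 3: test on the remaining 20%.
train_set, test_set = split_dataset(samples, train_fraction=0.8)
model = train_centroids(train_set)
test_accuracy = accuracy(model, test_set)
print(len(train_set), len(test_set), test_accuracy)
```

The essential point is that `test_set` is never seen by `train_centroids`, so the reported accuracy reflects performance on new images rather than memorized ones.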

    Pic 1.1.3 Application of computer vision in aviation – Viso Suite
    AI Image Recognition with Machine Learning. Before GPUs (Graphics Processing Units) became powerful enough to support the massively parallel computation that neural networks require, traditional machine learning algorithms were the gold standard for image recognition.
    Let’s look at the three most popular image recognition machine learning models.

    • Support Vector Machines

    SVMs work by making histograms of images containing the target objects and also of images that don’t. The algorithm then takes the test picture and compares the trained histogram values with the ones of various parts of the picture to check for matches.

    • Bag of Features Models

    Bag of Features models like Scale-Invariant Feature Transform (SIFT) and Maximally Stable Extremal Regions (MSER) work by taking the image to be scanned and a sample photo of the object to be found as a reference. The model then tries to pixel-match the features from the sample photo to various parts of the target image to see if matches are found.

    • Viola-Jones Algorithm

    A widely used face detection algorithm from pre-CNN (Convolutional Neural Network) times, Viola-Jones works by scanning faces and extracting features that are then passed through a boosting classifier. This, in turn, generates a number of boosted classifiers that are used to check test images. For a successful match to be found, a test image must generate a positive result from each of these classifiers.
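
The "every classifier must say yes" rule described for Viola-Jones can be sketched as a cascade. The example below also uses an integral image and a two-rectangle Haar-like feature, which are standard parts of the algorithm but are not described in this text. The stage thresholds are hand-picked for the toy data, not learned by boosting, and all function names are hypothetical.

```python
from typing import Callable, List

Image = List[List[int]]


def integral_image(img: Image) -> List[List[int]]:
    """ii[y][x] holds the sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def top_minus_bottom(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: top half minus bottom half."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


Stage = Callable[[List[List[int]]], bool]


def passes_cascade(ii: List[List[int]], stages: List[Stage]) -> bool:
    # A window is accepted only if every stage is positive;
    # all() stops at the first stage that rejects it.
    return all(stage(ii) for stage in stages)


# Hand-picked stages for a 4x4 window (illustrative thresholds only).
stages: List[Stage] = [
    lambda ii: top_minus_bottom(ii, 0, 0, 4, 4) > 500,  # top clearly brighter
    lambda ii: rect_sum(ii, 0, 0, 4, 4) > 1000,         # window bright enough overall
]

bright_top = [[200] * 4, [200] * 4, [50] * 4, [50] * 4]
flat = [[100] * 4 for _ in range(4)]

accepted = passes_cascade(integral_image(bright_top), stages)
rejected = passes_cascade(integral_image(flat), stages)
print(accepted, rejected)
```

Because most image windows fail an early stage, this structure lets a detector discard them cheaply and spend computation only on promising regions.
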
    Deep Learning Image Recognition Models. In image recognition, the use of Convolutional Neural Networks (CNNs) is also called Deep Image Recognition. CNNs are unmatched by traditional machine learning methods. They are not only faster and deliver the best detection results in machine learning image recognition, they can also detect multiple instances of an object within an image, even if the image is slightly warped, stretched, or altered in some other form.
    In Deep Image Recognition, Convolutional Neural Networks even outperform humans in tasks such as classifying objects into fine-grained categories, for example the particular breed of a dog or species of a bird. The most popular deep learning models, such as YOLO, SSD, and R-CNN, use convolution layers to parse a digital image or photo. During training, each convolution layer acts like a filter that learns to recognize some aspect of the image before passing its output on to the next layer. One layer processes colors, another layer shapes, and so on. In the end, the composite result of all these layers is taken into account when determining whether a match has been found.
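
The idea of a convolution layer acting as a filter can be illustrated with a single hand-written kernel. This is a sketch in plain Python: in a real CNN the kernel values are learned during training, whereas the vertical-edge kernel below is fixed by hand, and the function names are chosen for this example.

```python
from typing import List

Matrix = List[List[int]]


def convolve2d(img: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image without padding ('valid' mode).

    As in most deep learning frameworks, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [[sum(img[y + j][x + i] * kernel[j][i]
                 for j in range(kh) for i in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(feature_map: Matrix) -> Matrix:
    """Non-linearity applied after the convolution: negative responses become 0."""
    return [[max(0, v) for v in row] for row in feature_map]


# Responds where brightness increases from left to right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# 4x6 image: dark left half, bright right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

feature_map = relu(convolve2d(image, vertical_edge))
print(feature_map)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The output is non-zero only around the column where the image changes from dark to bright, which is the sense in which one filter "recognizes some aspect of the image". A network stacks many such filters, and later layers operate on the feature maps produced by earlier ones.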

    Pic 1.1.4 AI image recognition with object detection and classification using Deep Learning
