Contents
- Introduction
- Some classic algorithms
- AI-boosted computer vision
Introduction
What is computer vision?
This term refers to all the techniques and algorithms used to analyze visual data (photos, videos, GIFs, thermal sensor output, etc.).
To draw a parallel with a human being, computer vision would be similar to the ability that our brain possesses to name what our eyes see, classify it, record it, anticipate it or understand it.
The concept has existed since the 1960s, when early applications of mathematical models made it possible to perform rudimentary detection and extract some visual information from images, for example to label them.
In 1985, more advanced algorithms began to emerge, such as the Tesseract OCR (Optical Character Recognition) engine, whose development started at HP. This program made it possible to transform an image into text that a computer can process. (HP released it as open source in 2005 and Google then sponsored its development; the project is still maintained today.)
Link to the research paper
In the 1990s, the first human face detection systems appeared. This was made possible by techniques such as statistical learning.
Today, thanks to the rise of deep learning, computer vision has never been so precise and fast.
Some classic algorithms
This field, in addition to being exciting, is increasingly accessible thanks to the many tools that exist today:
- Keras
- BoofCV
- OpenCV
- YOLO
- SimpleCV
- TensorFlow
- MATLAB
- CUDA (Nvidia)
To begin with, I experimented with a few algorithms that do not rely on artificial intelligence.
Simple filter detection
The most basic way to detect an object/shape in an image is to define a filter for our search. A filter can have several representations, but the easiest way is to use an image.
Once the filter has been defined, all you have to do is take an input, such as a photo, and slide the filter over every pixel position of this image.
When the resemblance between the filter and a location on the target photo reaches a certain level, we can report a detection.
This is what it looks like in practice!
I made this example with a Python script using OpenCV.
Several times per second, I capture my computer screen and pass the filter over each capture to check whether the image I am looking for is present.
If I find a match, I retrieve the coordinates and display a cross at its location along with the name of what I'm looking for.
This type of detection is very limited, because it takes a close resemblance between the element displayed on the screen and the filter to obtain a match.
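To make the sliding-filter idea more concrete, here is a minimal sketch in plain Python. It is not the script used in the demo: the function names, the tiny grayscale image and the `max_score` threshold are all hypothetical, and the resemblance measure (sum of squared differences) is just one possible choice. In a real project, OpenCV's `cv2.matchTemplate` performs the same kind of search much faster.

```python
def match_score(image, template, top, left):
    """Sum of squared differences between the template and one image window (0 = identical)."""
    score = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = image[top + r][left + c] - value
            score += diff * diff
    return score


def find_template(image, template, max_score):
    """Slide the template over every position; return (row, col) of the best window, or None."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best_pos, best_score = None, None
    for top in range(img_h - tpl_h + 1):
        for left in range(img_w - tpl_w + 1):
            score = match_score(image, template, top, left)
            if best_score is None or score < best_score:
                best_pos, best_score = (top, left), score
    # Report a detection only if the best window is close enough to the template.
    if best_score is not None and best_score <= max_score:
        return best_pos
    return None


# Tiny grayscale "screenshot" (values 0-255) containing a bright 2x2 square.
screen = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 255, 255, 0],
    [0, 0, 255, 255, 0],
]
square = [[255, 255], [255, 255]]

print(find_template(screen, square, max_score=0))  # prints (2, 2)
```

Raising `max_score` makes the search more tolerant, at the cost of more false positives. That trade-off is exactly why this approach needs a close resemblance to work well.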
Canny Edge Detection
This algorithm comes to us from John F. Canny who developed it in 1986.
It builds on filter-based detection, but takes the idea further in order to increase the precision of the detection.
How does it work?
This method is based on a simplification of the target image in order to keep only the important details: the edges.
From a technical point of view, this consists of applying a Gaussian filter (to reduce the noise in the image) and then using convolution masks to detect and bring out the contours of the image from the color changes between neighboring pixels.
An image is made up of thousands or even millions of pixels of different colors, with many possible variations.
John Canny's algorithm removes colors and keeps only the most rudimentary shapes of the processed images to facilitate detection and reduce the computational load.
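The convolution-mask step can be sketched in a few lines of plain Python. This is only a simplified illustration, not the full Canny algorithm: it skips the Gaussian smoothing and the later thinning and thresholding stages, and it assumes the classic 3x3 Sobel masks. The image, the `threshold` value and the function names are hypothetical. With OpenCV, `cv2.Canny` runs the complete pipeline.

```python
import math

# 3x3 Sobel convolution masks: they respond to horizontal and vertical intensity changes.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_mask(image, mask, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    return sum(
        mask[dr + 1][dc + 1] * image[row + dr][col + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    )


def edge_map(image, threshold):
    """Return a binary image: 1 where the gradient magnitude exceeds the threshold."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    # Border pixels are skipped because their 3x3 neighbourhood is incomplete.
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            gx = apply_mask(image, SOBEL_X, row, col)
            gy = apply_mask(image, SOBEL_Y, row, col)
            if math.hypot(gx, gy) > threshold:
                edges[row][col] = 1
    return edges


# Grayscale image: dark left half, bright right half, so a single vertical edge.
picture = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
for line in edge_map(picture, threshold=200):
    print(line)
```

Only the pixels on either side of the dark-to-bright transition are marked, which is the "keep only the edges" simplification described above.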
Example of Filter Edge
This demonstration is also based on a script in Python and OpenCV.
The goal is to detect the orange dots on the screen and click on them as fast as possible.
For this test, I implemented an algorithm that also modifies the image in order to reduce the computational load and make detection easier.
This is therefore not a Canny Edge, but a Filter Edge. By playing with the contrasts and intensities of the image, I can remove from the visual field the elements that do not interest me.
As a result, I can greatly increase the leniency of the algorithm when calculating the correspondence between the filter and the target image.
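As an illustration of this kind of filtering, the sketch below keeps only the pixels that look orange and throws everything else away. It is an assumption about how such a filter could work, not the code from the demo: the RGB thresholds in `is_orange`, the helper names and the tiny test frame are all hypothetical.

```python
def is_orange(pixel):
    """Hypothetical colour rule: strong red, medium green, little blue."""
    red, green, blue = pixel
    return red >= 200 and 80 <= green <= 180 and blue <= 80


def orange_mask(image):
    """Binary mask: 1 for pixels that pass the colour rule, 0 for everything else."""
    return [[1 if is_orange(pixel) else 0 for pixel in row] for row in image]


def mask_centre(mask):
    """Average (row, col) of the kept pixels, or None if nothing was kept."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not points:
        return None
    rows = sum(r for r, _ in points) / len(points)
    cols = sum(c for _, c in points) / len(points)
    return rows, cols


GREY, ORANGE = (120, 120, 120), (255, 140, 0)
frame = [
    [GREY, GREY, GREY, GREY],
    [GREY, ORANGE, ORANGE, GREY],
    [GREY, GREY, GREY, GREY],
]

mask = orange_mask(frame)
print(mask)
print(mask_centre(mask))
```

Because the mask contains almost nothing except the target, a much looser matching threshold can be used afterwards without triggering on the rest of the screen.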
Due to the performance of my computer and the way macOS handles screenshots, the results are not impressive, but on another machine it would work fine.
ORB (Oriented FAST and Rotated BRIEF)
Behind this intimidating name hides a relatively effective computer vision technique that came out of OpenCV Labs.
The objective of this algorithm is once again to define a filter. But instead of keeping the whole filter, we keep only a few anchor points.
We then look for a correspondence between each anchor point of our filter and our target image. If the algorithm finds more than X corresponding points at a certain place in the image, it reports it and marks the detection.
This algorithm is particularly effective at detecting the presence of a filter in an image, but it is not precise when it comes to determining an exact position. The reason is that the filter is not applied in full: only certain anchor points are compared.
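The "more than X corresponding points" rule can be sketched as follows. ORB describes each anchor point with a binary descriptor, and two descriptors are compared by counting the bits that differ (the Hamming distance). The example below is a simplified illustration in plain Python: the 8-bit descriptors, the `max_distance` and `min_matches` values and the function names are hypothetical (real ORB descriptors are 256 bits long). With OpenCV, `cv2.ORB_create` and `cv2.BFMatcher` with `cv2.NORM_HAMMING` cover the real thing.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def count_matches(filter_descriptors, image_descriptors, max_distance):
    """Count filter anchor points whose nearest image descriptor is close enough."""
    if not image_descriptors:
        return 0
    matches = 0
    for wanted in filter_descriptors:
        nearest = min(hamming(wanted, seen) for seen in image_descriptors)
        if nearest <= max_distance:
            matches += 1
    return matches


def is_detected(filter_descriptors, image_descriptors, max_distance, min_matches):
    """Report a detection when at least min_matches anchor points are found."""
    found = count_matches(filter_descriptors, image_descriptors, max_distance)
    return found >= min_matches


# Toy 8-bit descriptors for the filter and for the target image.
filter_points = [0b10110010, 0b01001101, 0b11110000]
image_points = [0b10110011, 0b01001101, 0b00001111, 0b00110011]

print(count_matches(filter_points, image_points, max_distance=1))  # prints 2
print(is_detected(filter_points, image_points, max_distance=1, min_matches=2))  # prints True
```

Note that the result only says whether enough anchor points were found. It does not say where the object is, which matches the limitation described above.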
ORB in action
For this example, I applied a combination of Canny Edge and ORB.
This allows me to keep a minimum of elements on the screen to limit false positives and reduce computational loads.
Despite the detections, we can see on the left screen that the mouse does not move. My program cannot determine the position of the circles on the screen, because the algorithm acts as a detection tool rather than a localization tool.
AI-boosted computer vision
Artificial intelligence has disrupted the world of computer vision. It offers unprecedented speed, efficiency and simplicity in image detection.
Many frameworks offer pre-trained AI models that cover a wide range of use cases.
With OpenCV and YOLO, for example, it is possible to connect your webcam and perform object recognition in a few minutes.
In order to explore what could be done with these tools, I made a “sports assistant” based on Google's MediaPipe framework.
The objective of this assistant is to check the range of motion when performing a bodybuilding exercise: the curl.
Setup
The first step of the project was to set up a Python working environment with OpenCV and MediaPipe.
To make it easier to set up, we'll only work with a photo to start with.
Here is the raw result given by MediaPipe with its human body detection model.
We can visualize different anchor points at the level of the head and the body.
By following the documentation of the framework we can isolate only the points that interest us on the body.
In order to simplify the visualization, we will only use OpenCV for the graphical rendering of the application.
By taking the coordinates of 3 consecutive joints (shoulder, elbow and wrist), it is possible to calculate the angle of the elbow with a formula and thus detect when our user bends their arm.
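One possible formula for this angle is sketched below. It is an assumption, not necessarily the one used in the app: it takes three 2D points and uses `math.atan2` to measure the angle formed at the middle point. The function name `joint_angle` and the sample coordinates are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    angle_ba = math.atan2(a[1] - b[1], a[0] - b[0])
    angle_bc = math.atan2(c[1] - b[1], c[0] - b[0])
    degrees = abs(math.degrees(angle_bc - angle_ba))
    # Keep the result in the 0-180 range, whichever way the arm is oriented.
    return 360 - degrees if degrees > 180 else degrees


# (x, y) coordinates of shoulder, elbow and wrist, in image space.
shoulder, elbow = (0.0, 0.0), (0.0, 1.0)

print(joint_angle(shoulder, elbow, (1.0, 1.0)))  # forearm at a right angle
print(joint_angle(shoulder, elbow, (0.0, 2.0)))  # arm fully extended
```

A fully extended arm gives an angle close to 180 degrees, and the value shrinks as the forearm comes up towards the shoulder.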
By adding some logic to our program, we can define a starting angle and an ending angle to guide users through their exercise.
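This logic can be sketched as a small state machine fed with the elbow angle of each frame. The class name `CurlCounter` and the default angles (160 degrees for an extended arm, 40 degrees for a fully bent arm) are hypothetical values chosen for illustration, not the ones used in the app.

```python
class CurlCounter:
    """Counts curl repetitions from a stream of elbow angles (in degrees)."""

    def __init__(self, start_angle=160.0, end_angle=40.0):
        self.start_angle = start_angle  # arm extended: the movement starts here
        self.end_angle = end_angle      # arm bent: the movement ends here
        self.reps = 0
        self.stage = "down"

    def progress(self, angle):
        """Fraction of the current movement completed, clamped between 0.0 and 1.0."""
        span = self.start_angle - self.end_angle
        ratio = (self.start_angle - angle) / span
        return max(0.0, min(1.0, ratio))

    def update(self, angle):
        """Feed one new angle; return the number of repetitions so far."""
        if angle <= self.end_angle and self.stage == "down":
            # The arm reached the ending angle: one repetition is complete.
            self.stage = "up"
            self.reps += 1
        elif angle >= self.start_angle and self.stage == "up":
            # The arm is extended again: ready for the next repetition.
            self.stage = "down"
        return self.reps


counter = CurlCounter()
for elbow_angle in [170, 120, 35, 90, 165, 100, 30, 170]:
    counter.update(elbow_angle)

print(counter.reps)  # prints 2
print(counter.progress(100))  # prints 0.5
```

Requiring the arm to return to the starting angle before counting again avoids counting the same repetition twice when the angle hovers around the ending threshold.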
Final version of the App
Real-time feedback shows the number of repetitions as well as the progress of the current movement.
Conclusion
Computer vision is an exciting field that has evolved very rapidly over the past 10 years and is increasingly in demand in our society.
More generally, it addresses major needs in current challenges such as autonomous driving.
I strongly encourage you to try it out, whether for fun or to build your skills in this area.