
Has deep learning already replaced traditional computer vision?

2026-04-06

The author argues that deep learning is merely a computer vision tool, not a panacea, and shouldn't be used indiscriminately simply because it's popular. Traditional computer vision techniques still offer significant advantages, and understanding them can save you considerable time and effort; moreover, mastering traditional computer vision can indeed improve your performance in deep learning. This is because you can better understand the inner workings of deep learning and perform preprocessing steps to enhance the results.

This article was also inspired by a common question on the forum:

Has deep learning replaced traditional computer vision?

Or to put it another way:

Since deep learning seems so effective, is it still necessary to learn traditional computer vision techniques?

That's an excellent question. Deep learning has brought revolutionary breakthroughs to computer vision and artificial intelligence. Many problems that once seemed intractable can now be solved by machines better than by humans, and image classification is the best example of this. Indeed, deep learning deserves much of the credit for bringing computer vision into mainstream industrial use.

However, deep learning remains just one tool in computer vision and is clearly not a panacea for all problems. Therefore, this article will elaborate on this point. In other words, I will explain why traditional computer vision techniques are still very useful and worth learning and teaching.

This article is divided into the following parts/arguments:

Deep learning requires big data

Deep learning is sometimes overkill

Traditional computer vision will enhance your deep learning skills

Before getting to the main text, I think it's worth explaining in detail what "traditional computer vision" is, what deep learning is, and why deep learning is revolutionary.

Background Information

Before the advent of deep learning, if you had a task such as image classification, you would perform a process called "feature extraction." A "feature" is a small, interesting, descriptive, or informative part of an image. You would apply a combination of what I call "traditional computer vision techniques" in this article to find these features, including edge detection, corner detection, object detection, and so on.

When using these techniques related to feature extraction and image classification, as many features as possible are extracted from an image of a class of objects (e.g., chairs, horses, etc.), and this is treated as a "definition" (called a "bag of words") of that class of objects. Next, you search for these "definitions" in other images. If a significant portion of the features in the bag of words are present in another image, then that image is classified as containing that specific object (e.g., chairs, horses, etc.).
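The matching step described above can be sketched in a few lines. This is a toy illustration with hypothetical, hand-named features; a real pipeline would extract descriptors (e.g., SIFT or ORB keypoints) rather than string labels, and the 50% threshold is an arbitrary choice for the sketch.

```python
# Toy sketch of bag-of-words matching. Feature names and the 0.5
# threshold are made up for illustration; real systems match numeric
# descriptors extracted from the image.
def classify(image_features, bag_of_words, threshold=0.5):
    """Label an image as containing the object if a significant
    portion of the class "definition" features appear in it."""
    matched = bag_of_words & image_features
    return len(matched) / len(bag_of_words) >= threshold

# A hypothetical "chair" definition and a photo's extracted features.
chair_bag = {"corner_a", "corner_b", "edge_seat", "edge_leg", "texture_wood"}
photo = {"corner_a", "edge_seat", "edge_leg", "sky", "grass"}
print(classify(photo, chair_bag))  # 3 of 5 definition features found -> True
```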

The challenge of this image classification feature extraction method lies in the fact that you must select which features to look for in each image. This becomes extremely cumbersome and even difficult to achieve as the number of categories you're trying to distinguish grows, say, beyond 10 or 20. Are you looking for corners? Edges? Or texture information? Different categories of objects are best described using different types of features. If you choose to use many features, you'll have to deal with a massive amount of parameters and fine-tune them yourself.

Deep learning introduced the concept of "end-to-end learning," which (simply put) allows machines to learn to find features in each specific category of objects—that is, the most descriptive and prominent features. In other words, it allows neural networks to discover potential patterns in various types of images.

Therefore, with end-to-end learning, you no longer need to manually decide which traditional computer vision technique to use to describe the features. The machine does it all for you. As Wired magazine puts it:

For example, if you want to teach a deep neural network to recognize a cat, you don't need to tell it to look for whiskers, ears, fur, or eyes. You just need to show it thousands of images of cats, and it will naturally solve the problem. If it keeps mistaking a fox for a cat, you don't need to rewrite the code. You just need to keep training it.

The image below illustrates this difference between feature extraction (using traditional computer vision) and end-to-end learning:

That concludes the background information. Now, let's discuss why traditional computer vision remains essential and why learning it is still highly beneficial.

Deep learning requires a large amount of data

First, deep learning requires data, lots and lots of data. Well-known image classification models are all trained on massive datasets. Three of the most widely used training datasets are:

ImageNet – roughly 1.4 million labeled images, 1,000 object categories (in the ILSVRC classification subset);

COCO – about 330,000 images with 2.5 million labeled object instances, 91 object categories;

PASCAL VOC – roughly 11,500 images (VOC2012), 20 object categories.

However, a model trained on too little data is likely to perform poorly outside its training set, because the machine has no real insight into the problem and cannot generalize to data it has never seen. It is also very difficult to inspect a trained model's internals and tweak it by hand, since a deep learning model contains millions of parameters, each of which is adjusted during training. In that sense, a deep learning model is a black box.

Traditional computer vision is completely transparent, allowing you to better evaluate whether your solution remains effective outside the training environment. Your deep insights into the problem can be incorporated into your algorithm. And if anything goes wrong, you can more easily figure out what needs adjusting and where to adjust it.

Deep learning is sometimes overkill

This is probably my favorite reason for supporting research into traditional computer vision techniques.

Training a deep neural network takes a very long time. You need specialized hardware (such as a high-performance GPU) to train the latest, most advanced image classification models. Want to train it on your decent laptop? Go on a week-long vacation; when you come back, the training will likely still be incomplete.

Furthermore, what if your trained model performs poorly? You'll have to go back to square one and redo everything with different training parameters. This process could be repeated hundreds of times.

But sometimes all of this is completely unnecessary. Traditional computer vision techniques can often solve a problem more efficiently than deep learning, and with far less code. For example, one project I worked on involved checking whether each can passing along a conveyor belt contained a red spoon. You could detect the spoon by training a deep neural network through the time-consuming process described above, or you could write a simple color-thresholding algorithm: mark every pixel whose color falls within a certain range of red as white and every other pixel as black, then count the white pixels. Simple, and done in an hour!
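A minimal sketch of that thresholding check follows. The specific threshold values and the minimum pixel count are invented for illustration; a real deployment would tune them for the camera, lighting, and spoon color.

```python
import numpy as np

# Minimal sketch of the red-threshold check. The thresholds (150, 60)
# and min_red_pixels are hypothetical; tune them for real conditions.
def contains_red_spoon(image, min_red_pixels=50):
    """image: H x W x 3 uint8 RGB array. Mark sufficiently red pixels
    white (True) and everything else black (False), then count."""
    r = image[..., 0].astype(int)
    g = image[..., 1].astype(int)
    b = image[..., 2].astype(int)
    mask = (r > 150) & (r - g > 60) & (r - b > 60)  # "red enough" pixels
    return int(mask.sum()) >= min_red_pixels

# A 100x100 grey "can" frame with a 10x10 red patch (100 red pixels).
frame = np.full((100, 100, 3), 120, dtype=np.uint8)
frame[40:50, 40:50] = [200, 30, 30]
print(contains_red_spoon(frame))  # True
```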

Mastering traditional computer vision techniques can save you a lot of time and reduce unnecessary hassle.

Traditional computer vision will enhance your deep learning skills

Understanding traditional computer vision can actually help you do better in deep learning.

For example, the most widely used neural network in computer vision is the convolutional neural network. But what is convolution? Convolution is a long-established image-processing operation, used, for example, in Sobel edge detection. Understanding it helps you understand what actually happens inside a neural network, so you can design and fine-tune the network to better fit your problem.
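To make the connection concrete, here is Sobel edge detection written as a plain 2-D convolution, the same sliding-window operation a CNN layer performs (CNN layers typically skip the kernel flip, but the mechanics are identical). The tiny step-edge test image is made up for illustration.

```python
import numpy as np

# The horizontal Sobel kernel: responds to vertical intensity edges.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

def convolve2d(image, kernel):
    """Valid-mode 2-D convolution (kernel flipped, per the textbook
    definition). Slides the kernel over every position and sums the
    elementwise products, exactly like a CNN layer's inner loop."""
    k = np.flipud(np.fliplr(kernel))
    kh, kw = k.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * k).sum()
    return out

# A vertical step edge: dark left half, bright right half.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
edges = convolve2d(img, SOBEL_X)
print(edges)  # strong responses only where the intensity jumps
```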

Another example is preprocessing. The data you feed into a model usually goes through preprocessing to get it ready for training, and these steps are largely performed with traditional computer vision techniques. For example, if you don't have enough training data, you can apply data augmentation: randomly flipping, shifting, and cropping the images in your training set to create "new" images. These computer vision operations can significantly increase the amount of training data you have.
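A minimal augmentation sketch, assuming single-channel images; the transform choices (flip probability, shift range, crop size) are arbitrary values for illustration, and real pipelines usually add rotations, color jitter, and more.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal data-augmentation sketch: flips, shifts, and crops turn one
# training image into many "new" ones. Parameters are illustrative.
def augment(image):
    """Return a randomly transformed copy of an H x W grayscale image."""
    out = image
    if rng.random() < 0.5:                # random horizontal flip
        out = out[:, ::-1]
    shift = rng.integers(-2, 3)           # random shift of -2..2 pixels
    out = np.roll(out, shift, axis=1)
    top = rng.integers(0, 3)              # random 24x24 crop position
    left = rng.integers(0, 3)
    return out[top:top + 24, left:left + 24]

original = rng.random((28, 28))           # one hypothetical 28x28 image
batch = [augment(original) for _ in range(8)]
print(len(batch), batch[0].shape)  # 8 (24, 24)
```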

Conclusion

This article explains why deep learning hasn't replaced traditional computer vision techniques, and why the latter are still worth learning and teaching. First, deep learning usually requires large amounts of data to perform well; when that data is unavailable, traditional computer vision can serve as an alternative. Second, deep learning is sometimes overkill for a given task, and standard computer vision can solve it more efficiently and with less code. Third, mastering traditional computer vision genuinely improves your deep learning work, because you better understand what happens inside the network and can apply preprocessing steps that improve the results.

In short, deep learning is just one tool in computer vision, not a panacea. Don't use it blindly just because it's popular. Traditional computer vision techniques still offer significant advantages, and understanding them can save you a great deal of time and effort.
