Exploring the infinite possibilities of machine vision evolution

Qualcomm AI researchers receive ICLR honors:

"Spherical CNNs" (spherical convolutional neural networks) won the Best Paper Award at ICLR 2018.

Advances in deep learning are enabling machines to "see" the world much as humans do, one of the most fascinating areas of artificial intelligence research. A new technique called spherical convolutional neural networks (spherical CNNs) will allow machines to see and recognize objects in three-dimensional space. From recognizing tiny molecules to analyzing the largest structures in outer space, the possibilities this work can unlock are only beginning to emerge. They also include many use cases in between, such as guiding robots safely through crowds.

As this recent award demonstrates, this research area is at the forefront of artificial intelligence (AI) development. Taco Cohen and Max Welling, researchers at Qualcomm Technologies in the Netherlands, together with their co-authors at the University of Amsterdam, won the Best Paper Award at the 2018 International Conference on Learning Representations (ICLR) for their paper "Spherical CNNs." ICLR, now in its sixth year, primarily publishes the latest research in AI and machine learning. Yoshua Bengio of the University of Montreal and Yann LeCun of NYU/Facebook co-chaired ICLR 2018. The Best Paper Award recognizes the most innovative and impactful research among approximately 1,000 submissions from leading AI labs worldwide.

This paper on spherical CNNs introduces a novel mathematical framework for building CNNs that can robustly analyze spherical images without being misled by distortion. This is possible because spherical CNNs are "equivariant" to rotation, meaning that the internal representations learned by the network rotate together with the input. Experimental results show that spherical CNNs achieve excellent prediction accuracy on two very different tasks: recognizing 3D models from spherical images and predicting atomization energies (an important problem in chemistry).
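The rotation property described above, usually called equivariance, can be written compactly. The notation below is an illustrative sketch rather than a quotation from the paper: $f$ is a signal on the sphere $S^2$, $\psi$ is a learned filter, $R$ and $Q$ are 3D rotations, $L_Q$ rotates a signal by $Q$, and $\Phi$ stands for a network layer.

```latex
% Rotating a signal on the sphere:
[L_Q f](x) = f(Q^{-1} x), \qquad x \in S^2,\; Q \in SO(3)

% Equivariance: rotating the input transforms the output in a matching way
\Phi(L_Q f) = T_Q\, \Phi(f)

% One standard way to write a rotational correlation on the sphere;
% the output is indexed by the rotation R applied to the filter
[\psi \star f](R) = \int_{S^2} \psi(R^{-1} x)\, f(x)\, dx

% Substituting y = Q^{-1} x shows the correlation is equivariant:
[\psi \star (L_Q f)](R) = \int_{S^2} \psi(R^{-1} Q y)\, f(y)\, dy = [\psi \star f](Q^{-1} R)
```

The last line says that rotating the input by $Q$ only re-indexes the output, so no information is lost or distorted by the rotation.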

Why are spherical CNNs important?

To understand the importance of spherical CNNs, some background helps. Deep learning, and CNNs in particular, has revolutionized AI in recent years, achieving breakthroughs in speech recognition, visual object recognition, natural language processing, and other fields. CNNs excel at analyzing signals laid out on flat, regular grids, such as audio or text, images, or video, because of their built-in ability to detect patterns regardless of where they occur in space or time. This allows a CNN to learn to recognize a visual object wherever it appears in an image, without having to see many shifted copies of the same object during training. However, in several applications that have recently become popular, the signals to be analyzed live on a sphere. Examples include the omnidirectional cameras used by cars, drones, and other robots to capture spherical images of their entire surroundings. Spherical signals are also common in scientific applications, with examples ranging from earth science to astrophysics.
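The position-independence described above can be checked numerically. The following is a minimal sketch in NumPy, not code from the paper: `circular_correlate` is a hypothetical helper that slides a small filter over a 1D signal with wrap-around, and the signal and kernel values are made up for illustration. It shows that shifting the input and then filtering gives the same result as filtering and then shifting.

```python
import numpy as np

def circular_correlate(signal, kernel):
    """Cross-correlate a 1D signal with a kernel, wrapping around at the ends."""
    n = len(signal)
    out = np.zeros(n)
    for i in range(n):
        for j, w in enumerate(kernel):
            out[i] += w * signal[(i + j) % n]
    return out

# Illustrative values only: a small bump in an otherwise flat signal.
signal = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0])
kernel = np.array([1.0, 2.0, 1.0])
shift = 3

# Path A: move the pattern first, then apply the filter.
shifted_then_filtered = circular_correlate(np.roll(signal, shift), kernel)
# Path B: apply the filter first, then move the response.
filtered_then_shifted = np.roll(circular_correlate(signal, kernel), shift)

# The two paths agree, which is what translation equivariance means.
equivariant = np.allclose(shifted_then_filtered, filtered_then_shifted)
print("shift-equivariant:", equivariant)
```

Because the same filter weights are reused at every position, the network never needs to relearn a pattern for each location. Spherical CNNs aim for the analogous guarantee with rotations of the sphere in place of shifts.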

One approach to analyzing such spherical signals is to project the signal onto a plane and then analyze the result with an ordinary CNN. However, as cartography tells us, any such "map projection" introduces distortion, making some areas appear larger or smaller than they really are. This undermines the CNN, because as an object moves across the sphere, it does not simply move on the map but also appears to shrink and stretch, so a filter learned at one location no longer matches the same object elsewhere.
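To make the distortion concrete, here is a small sketch that assumes one specific projection, the equirectangular (latitude/longitude) map; the article does not name a projection, and the function names and image size are hypothetical. On such a map every row of pixels spans the full 360 degrees of longitude, so an object of fixed physical size is stretched horizontally by a factor of 1 / cos(latitude).

```python
import numpy as np

def horizontal_stretch(lat_deg):
    """Factor by which an equirectangular map widens an object at this
    latitude, relative to the same object at the equator."""
    return 1.0 / np.cos(np.radians(lat_deg))

def pixel_solid_angle(lat_deg, height=180, width=360):
    """Approximate patch of sphere (in steradians) covered by one pixel whose
    centre is at this latitude, for a height x width equirectangular image."""
    dlat = np.pi / height       # angular height of one pixel row
    dlon = 2.0 * np.pi / width  # angular width of one pixel column
    return np.cos(np.radians(lat_deg)) * dlat * dlon

for lat in (0.0, 30.0, 60.0, 80.0):
    print(f"lat {lat:4.0f}: stretch x{horizontal_stretch(lat):.2f}, "
          f"pixel covers {pixel_solid_angle(lat):.2e} sr")
```

At 60 degrees latitude the stretch factor is already about 2, so the same object occupies roughly twice as many pixels across as it does at the equator, and it keeps growing toward the poles.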

How can spherical CNNs be used?

Spherical CNNs have numerous applications in the Internet of Things (IoT), robotics, autonomous vehicles, augmented reality (AR), and virtual reality (VR). Autonomous drones are already available to consumers today, and one day they may deliver packages to your doorstep in minutes. Drones are a natural fit for spherical CNNs, which can improve object detection and recognition as well as visual motion analysis. In AR, a 360-degree panoramic view of a room captured by a set of cameras can be merged into a single spherical image, which spherical CNNs can analyze efficiently so that virtual objects are overlaid accurately.

Qualcomm is very excited about the applications mentioned above and about other transformative applications that this work may enable, and we are actively pursuing this and other research on data-efficient learning.
