1. Machine vision hardware can collect information about the surrounding environment.
Currently, the most commonly used visual sensors are cameras, ToF cameras, and LiDAR.
Machine vision cameras. The purpose of a machine vision camera is to transmit the image projected onto its sensor through the lens to a device that can store, analyze, and/or display it, for example a computer system.
LiDAR technology. LiDAR is a scanning sensor based on non-contact laser ranging. Its working principle is similar to that of a conventional radar system: it emits laser beams to detect targets and collects the reflected beams to form point clouds. After photoelectric processing, this data can generate a precise three-dimensional image, capturing detailed information about the physical environment with ranging accuracy down to the centimeter level.
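As a rough illustration of the ranging principle described above, the sketch below (plain Python with NumPy; the scan values are made up) converts laser round-trip times into ranges via d = c·t/2 and turns one planar scan into the Cartesian points of a point cloud.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def tof_to_range(round_trip_time_s):
    """Convert a laser round-trip time into a one-way range: d = c * t / 2."""
    return C * round_trip_time_s / 2.0

def scan_to_points(angles_rad, ranges_m):
    """Convert a planar LiDAR scan (angle, range) into 2D Cartesian points."""
    x = ranges_m * np.cos(angles_rad)
    y = ranges_m * np.sin(angles_rad)
    return np.stack([x, y], axis=1)

# Illustrative scan: 360 beams over a full revolution, ~33 ns round trip (~5 m).
angles = np.linspace(0, 2 * np.pi, 360, endpoint=False)
round_trip_times = np.full(360, 33.3e-9)
points = scan_to_points(angles, tof_to_range(round_trip_times))
print(points.shape)  # (360, 2) -- one (x, y) point per beam
```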
ToF camera technology. ToF stands for Time of Flight. The sensor emits modulated near-infrared light, which is reflected when it hits an object; the distance to the object is computed from the time difference or phase difference between the emitted and reflected light, producing depth information. In addition, combined with a conventional camera image, the three-dimensional outline of the object can be presented as a depth map in which different colors represent different distances.
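A minimal sketch of the phase-difference calculation mentioned above, assuming a continuous-wave ToF sensor with a known modulation frequency (the 20 MHz value and the tiny phase image are illustrative): depth follows d = c·Δφ / (4π·f), and the resulting depth map can then be color-coded for display.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def phase_to_depth(phase_diff_rad, mod_freq_hz):
    """Continuous-wave ToF: depth d = c * delta_phi / (4 * pi * f_mod)."""
    return C * phase_diff_rad / (4.0 * np.pi * mod_freq_hz)

# Illustrative 4x4 phase image from a 20 MHz modulated emitter.
phase_image = np.full((4, 4), np.pi / 2)       # 90 degrees of phase shift per pixel
depth_map = phase_to_depth(phase_image, 20e6)  # ~1.87 m at every pixel
print(depth_map[0, 0])
```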
2. AI vision technology algorithms help robots recognize their surroundings.
Visual technologies include: face recognition, object detection, visual question answering, image captioning, and embedded vision technologies.
Face recognition technology: face detection quickly locates faces, returns their bounding boxes, and accurately identifies various facial attributes; face comparison extracts facial features, calculates the similarity between two faces, and returns a similarity percentage; face search compares a given photo against the N faces in a specified face database to find the most similar face (or several similar faces), returning the matched users' information along with the degree of matching, i.e., 1:N face retrieval.
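The 1:N retrieval described above can be sketched as a nearest-neighbour search over face feature vectors (embeddings). In the sketch below the embeddings are random stand-ins; in practice they would come from a trained face-recognition model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity between two feature vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_face(query_embedding, database, top_k=3):
    """1:N retrieval: rank all enrolled faces by similarity to the query."""
    scored = [(name, cosine_similarity(query_embedding, emb))
              for name, emb in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

rng = np.random.default_rng(0)
face_db = {f"user_{i}": rng.normal(size=128) for i in range(100)}  # N enrolled faces
query = face_db["user_42"] + 0.05 * rng.normal(size=128)           # noisy probe image
print(search_face(query, face_db))  # "user_42" should rank first
```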
Object detection: object detection technology based on deep learning and large-scale image training can accurately identify the category, location, and confidence level of each object in an image.
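A detector of this kind typically returns, for each object, a class label, a bounding box, and a confidence score. The sketch below (with made-up detections) shows the usual post-processing step of keeping only detections above a confidence threshold.

```python
# Each detection: (class_label, [x1, y1, x2, y2], confidence). Values are illustrative.
detections = [
    ("shoe",  [120, 340, 180, 400], 0.92),
    ("sock",  [200, 350, 240, 390], 0.47),
    ("cable", [310, 360, 420, 380], 0.81),
]

def filter_by_confidence(dets, threshold=0.5):
    """Keep only detections the model is sufficiently confident about."""
    return [d for d in dets if d[2] >= threshold]

for label, box, score in filter_by_confidence(detections):
    print(f"{label}: box={box}, confidence={score:.2f}")
```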
Visual Question Answering: A visual question answering (VQA) system can take an image and a question as input and produce a human-language output.
Image captioning: an image-captioning system must capture the semantic information of an image and generate human-readable sentences describing it.
Embedded vision technologies include human detection and tracking, scene recognition, and so on.
3. SLAM technology gives robots better ability to plan movements.
SLAM, short for Simultaneous Localization and Mapping, involves three main steps: localization, mapping, and path planning. Using machine vision and sophisticated algorithms, a robot can locate itself and map its environment at the same time. SLAM thereby addresses problems such as path planning that fails to cover all areas and the mediocre cleaning results that follow.
▲SLAM technology
Without SLAM, a robot vacuum simply turns back whenever it encounters an obstacle; lacking a map and path planning, it cannot cover every area. With SLAM, it can plan a path that covers the whole space. Furthermore, a robot vacuum equipped with a camera can identify items such as shoes, socks, and pet feces and intelligently avoid them.
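To make the contrast concrete, the sketch below assumes the robot already has an occupancy grid from SLAM (a small hand-written grid here) and generates a simple boustrophedon (back-and-forth) coverage path over the free cells instead of turning randomly.

```python
# Occupancy grid from SLAM: 0 = free cell, 1 = obstacle. Hand-written for illustration.
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
]

def coverage_path(grid):
    """Boustrophedon sweep: visit every free cell row by row, alternating direction."""
    path = []
    for r, row in enumerate(grid):
        cols = range(len(row)) if r % 2 == 0 else reversed(range(len(row)))
        for c in cols:
            if grid[r][c] == 0:          # skip obstacles recorded in the map
                path.append((r, c))
    return path

path = coverage_path(grid)
free_cells = sum(row.count(0) for row in grid)
print(len(path) == free_cells)  # True: every free cell is visited exactly once
```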
4. Ultra-wideband positioning technology based on ToF machine vision
In robotics, Time-of-Flight (ToF) technology is mainly used for high-precision ranging and positioning, with ultra-wideband positioning technology being the most commonly used.
Ultra-wideband (UWB) is a wireless communication technology used for high-precision ranging and positioning. Put simply, UWB devices fall into two types: tags and base stations. The basic operating principle is to use time of flight (ToF) for wireless ranging and then quickly and accurately compute the tag's position from the measured distances.
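A rough sketch of that last step: given ToF-measured distances from a tag to three fixed base stations with known coordinates, the tag's position can be solved by least-squares trilateration. The anchor positions and distances below are illustrative.

```python
import numpy as np

def trilaterate(anchors, distances):
    """Least-squares 2D position from distances to three or more known anchors."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    x0, y0 = anchors[0]
    # Linearise by subtracting the first anchor's range equation from the others.
    A = 2 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + anchors[1:, 0] ** 2 - x0 ** 2
         + anchors[1:, 1] ** 2 - y0 ** 2)
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position

# Three base stations at known positions (metres); distances measured to a tag at (2, 3).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_tag = np.array([2.0, 3.0])
distances = [np.linalg.norm(true_tag - np.array(a)) for a in anchors]
print(trilaterate(anchors, distances))  # approximately [2. 3.]
```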
5. AI Natural Language Processing is an important technology for human-computer interaction.
Humans acquire 90% of their information through vision but express 90% of it through language, and language is the most natural way for humans to interact with machines. However, Natural Language Processing (NLP) is very difficult: grammar, semantics, and culture all differ, and non-standard language such as dialects adds further complexity. As NLP matures, human-machine voice interaction will become increasingly convenient, which will also drive robots toward greater "intelligence."
Microphone-array and speaker technology for robots is relatively mature. With the rapid development of smart speakers and voice assistants in recent years, microphone arrays and miniature speakers have become widely used. In the Iron Man companion robot, voice interaction with the user relies on microphone arrays and speakers; this type of companion robot is like a mobile "smart speaker," extending the boundaries of that form.
Currently, chatbots can be divided into general-purpose chatbots and domain-specific chatbots. Advances in natural language processing technology will enhance the interaction experience between robots and humans, making robots appear more "intelligent."
6. AI deep learning algorithms help robots evolve towards self-awareness.
Hardware: advances in AI chip technology have given robots greater computing power. As Moore's Law advances, the number of transistors per unit area on a chip keeps increasing, driving chip miniaturization and improving AI computing power. In addition, the emergence of heterogeneous chips, such as those based on the RISC-V architecture, has provided further hardware support for increasing AI computing power.
Algorithms: AI deep learning algorithms are the future of robotics; they give robots the ability to learn from input data. Whether future robots can develop autonomous consciousness depends on the continued development of AI, and deep learning offers one possible path toward it. By training neural network models, some algorithms can already surpass humans in narrow, single-task domains. The success of AlphaGo shows that AI systems can achieve self-learning within individual domains and, in fields such as Go, Texas Hold'em, and quiz competitions, can rival or even defeat humans.
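As a toy illustration of learning from input data, the sketch below trains a single linear neuron by gradient descent on made-up data. Real perception models are far larger, but the loop of predict, measure error, and adjust weights is the same idea.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up training data: the target rule is y = 3*x1 - 2*x2 plus noise.
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

w = np.zeros(2)   # weights to be learned
lr = 0.1          # learning rate

for epoch in range(100):
    pred = X @ w                     # forward pass
    error = pred - y                 # how wrong the current weights are
    grad = X.T @ error / len(y)      # gradient of the mean squared error
    w -= lr * grad                   # gradient-descent update

print(w)  # approaches [3, -2] as the model learns the rule from the data
```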
AI deep learning algorithms have endowed robots with intelligent decision-making capabilities, freeing them from the previous programming logic of single input corresponding to single output, and making robots more "intelligent." However, robots still cannot rival humans in the "multimodal" domain. In particular, signals that are difficult to quantify, such as smell, taste, touch, and psychological signals, still lack reasonable quantification methods.
7. AI+5G expands the activity boundaries of robots, provides greater computing power and more storage space, and enables knowledge sharing.
Four major pain points for mobile robots in the 4G era:
1) Limited scope of work: tasks can only be performed within a fixed area, the constructed maps are hard to share, and working in large-scale environments is difficult.
2) Limited business coverage: limited computing power means recognition performance still needs improvement, and with limited capabilities the robot can only discover problems, making rapid large-scale deployment difficult.
3) Limited service provision: weak support for complex business operations, interaction capabilities that need improvement, and low efficiency in deploying specialized business applications.
4) High operation and maintenance costs: deployment efficiency is low, since each scenario requires its own map building, route planning, inspection tasks, and so on.
These four pain points have held back the penetration of mobile robots in the 4G era. In general, robots still need more storage space and stronger computing power; 5G's low latency, high speed, and broad connectivity can address these pain points.
5G's empowerment of mobile robots:
1) Expanding the working scope of robots. The greatest benefit of 5G for robots lies in expanding their physical boundaries. 5G's support for Time-Sensitive Networking (TSN) allows robots to extend their activities from the home into all corners of society, and it is easy to imagine a future in which humans and robots coexist. In logistics, retail, inspection, security, firefighting, traffic control, and healthcare, 5G and AI can empower robots and help build smart cities.
2) Providing robots with greater computing power and more storage space, and enabling knowledge sharing. Advances in 5G cloud robotics give robots greater computing power and more storage: flexible allocation of computing resources meets the demands of simultaneous localization and mapping in complex environments; access to large databases supports object recognition and grasping, as well as long-term localization based on maps offloaded to the cloud; and knowledge can be shared among multiple robots.
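A minimal sketch of the knowledge-sharing idea, with the shared map store simulated here by an in-memory dictionary rather than a real cloud service: one robot uploads the grid cells it has explored, and another merges them into its own map instead of re-mapping the same area.

```python
# Simulated cloud map store; in practice this would be a networked service.
cloud_maps = {}

def upload_map(robot_id, explored_cells):
    """A robot pushes the set of grid cells it has already explored."""
    cloud_maps[robot_id] = set(explored_cells)

def download_shared_map(exclude_robot=None):
    """Merge every other robot's contribution into one shared map."""
    merged = set()
    for robot_id, cells in cloud_maps.items():
        if robot_id != exclude_robot:
            merged |= cells
    return merged

# Robot A has covered the left half of a floor, robot B the right half.
upload_map("robot_a", {(0, 0), (0, 1), (1, 0), (1, 1)})
upload_map("robot_b", {(0, 2), (0, 3), (1, 2), (1, 3)})

# Robot A merges robot B's knowledge without re-exploring those cells itself.
robot_a_map = {(0, 0), (0, 1), (1, 0), (1, 1)} | download_shared_map(exclude_robot="robot_a")
print(sorted(robot_a_map))
```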