Refer to caption

4D Millimeter-Wave Radar in Autonomous Driving: A Survey

Electrical Engineering and Systems Science > Signal Processing

[Submitted on 7 Jun 2023 (v1), last revised 26 Apr 2024 (this version, v4)]

Zeyu Han, Jiahao Wang, Zikun Xu, Shuocheng Yang, Lei He, Shaobing Xu, Jianqiang Wang, Keqiang Li

Abstract: The 4D millimeter-wave (mmWave) radar, proficient in measuring the range, azimuth, elevation, and velocity of targets, has attracted considerable interest within the autonomous driving community. This is attributed to its robustness in extreme environments and the velocity and elevation measurement capabilities. However, despite the rapid advancement in research related to its sensing theory and application, there is a conspicuous absence of comprehensive surveys on the subject of 4D mmWave radar.

In an effort to bridge this gap and stimulate future research, this paper presents an exhaustive survey on the utilization of 4D mmWave radar in autonomous driving. Initially, the paper provides reviews on the theoretical background and progress of 4D mmWave radars, encompassing aspects such as the signal processing workflow, resolution improvement approaches, and extrinsic calibration process. Learning-based radar data quality improvement methods are present following. Then, this paper introduces relevant datasets and application algorithms in autonomous driving perception, localization and mapping tasks. Finally, this paper concludes by forecasting future trends in the realm of 4D mmWave radar in autonomous driving. To the best of our knowledge, this is the first survey specifically dedicated to the 4D mmWave radar in autonomous driving.

Subjects:	Signal Processing (eess.SP); Robotics (cs.RO)
Cite as:	arXiv:2306.04242 [eess.SP]
	(or arXiv:2306.04242v4 [eess.SP] for this version)
	https://doi.org/10.48550/arXiv.2306.04242

Submission history

From: Zeyu Han [view email]
[v1] Wed, 7 Jun 2023 08:33:00 UTC (1,781 KB)
[v2] Wed, 14 Jun 2023 09:10:02 UTC (1,781 KB)
[v3] Mon, 19 Feb 2024 03:09:23 UTC (13,904 KB)
[v4] Fri, 26 Apr 2024 10:56:42 UTC (14,203 KB)

Summary

The article "4D Millimeter-Wave Radar in Autonomous Driving: A Survey" by Zeyu Han et al. provides a comprehensive survey of 4D millimeter-wave (mmWave) radars and their applications in autonomous driving. Here are the key points:

Introduction: 4D mmWave radars measure range, azimuth, elevation, and velocity of targets, offering advantages over traditional sensors in autonomous driving. The survey focuses on the theoretical background, learning-based data generation methods, perception and SLAM algorithms, and related datasets.
Theoretical background: The signal processing workflow of 4D mmWave radars is explained, along with methods to improve resolution and extrinsic calibration.
Learning-based radar data generation: The article discusses "Reconstructor" and "Detector" methods to improve radar data quality, addressing noise and sparsity issues.
Perception applications: Feature extraction methods for radar point clouds and pre-CFAR data are reviewed, as well as multi-modal fusion techniques with cameras and LiDAR.
SLAM applications: The use of Doppler information, traditional methods like point cloud registration, and learning-based approaches for odometry estimation are discussed.
Datasets: Available 4D mmWave radar datasets for perception and SLAM tasks are listed and compared.
Future trends: The authors discuss potential research directions, including noise and sparsity handling, specialized information utilization, dataset enrichment, and exploring new tasks like scene reconstruction and 4D occupancy prediction.

In conclusion, this survey provides a foundation for researchers working on 4D mmWave radars in autonomous driving, highlighting the current state-of-the-art and future research opportunities.

Radars Evaluated

The survey mentions several specific 4D mmWave radars and datasets that include data from various radar systems:

1. Oculii Eagle 4D mmWave radar (used for comparison with Ouster 128-channel LiDAR)
2. Continental ARS548RDI 4D mmWave radar (mentioned in a figure caption)
3. AWR1843 from Texas Instruments (mentioned as an example for maximum unambiguous range)

Datasets with specific radar information:

1. Astyx dataset
2. RADIal dataset
3. VoD dataset
4. TJ4DRadSet dataset
5. K-Radar dataset
6. Dual Radar dataset (includes Arbe Phoenix and ARS548 RDI radar)
7. SCORP dataset
8. Radatron dataset
9. Coloradar dataset
10. MSC-RAD4R dataset
11. NTU4DRadLM dataset

The survey also mentions radar systems from companies such as Bosch, Continental, ZF, Arbe, Huawei, and Oculii, but does not provide specific model details for all of them.

Datasets

The survey mentions several datasets containing 4D mmWave radar data, which were collected using various radar systems and setups. Here is a summary of the datasets and their contents:

Datasets for Perception:

1. Astyx: 500 synchronized frames (radar, LiDAR, camera) with 3,000 annotated 3D objects.
2. RADIal: Raw radar data after Analog-to-Digital Converter (ADC), with 2D bounding box annotations in the image plane.
3. VoD: 8,693 frames (LiDAR, camera, 4D mmWave radar) with 123,106 3D bounding box annotations and tracking IDs.
4. TJ4DRadSet: 7,757 synchronized frames with 3D bounding boxes, trajectory IDs, occlusion, and truncation indicators.
5. K-Radar: 35,000 frames with 4D mmWave radar data, high-resolution LiDAR point clouds, surround RGB imagery, RTK-GPS, and IMU data, collected under various weather conditions.
6. Dual Radar: Data from two different mmWave radars (Arbe Phoenix and ARS548 RDI) for comparison.

Datasets for SLAM:

1. Coloradar: 2 hours of data (radar, LiDAR, 6-DOF pose ground truth) with raw ADC data, 3D range-azimuth-elevation tensors, and radar point clouds from indoor and outdoor environments.
2. MSC-RAD4R: Data collected under clear and snowy weather conditions, as well as artificially generated smoke environments.
3. NTU4DRadLM: Extensive localization-related sensor data (4D mmWave radar, LiDAR, camera, IMU, GPS, thermal camera) from structured, semi-structured, and unstructured roads in small-scale and large-scale urban settings, captured using robotic and vehicular platforms.

The datasets were obtained through various data collection campaigns using different radar systems, often in conjunction with other sensors like cameras, LiDARs, and IMUs. The data was collected in a range of environments, including urban roads, highways, industrial parks, and indoor settings, under various weather conditions. Some datasets also include ground truth annotations for object detection, tracking, and localization tasks.

Evaluation Criteria

The survey does not provide a direct comparison or ranking of the top 10 performing radars. Instead, it focuses on reviewing the theoretical background, data generation methods, perception and SLAM algorithms, and datasets related to 4D mmWave radars in autonomous driving.

However, the survey does mention some evaluation metrics and results for specific perception tasks using 4D mmWave radar data:

1. 3D Object Detection:

- Astyx 3D mAP (mean Average Precision) - Easy, Moderate, and Hard difficulty levels
- VoD mAP - Entire and Driving scenarios
- TJ4D mAP - 3D and BEV (Bird's Eye View) detection

2. Freespace Segmentation and 2D Object Detection (using RADIal dataset):

- Segmentation mIoU (mean Intersection over Union)
- Detection AP (Average Precision)
- Detection AR (Average Recall)

The survey presents a table summarizing the performance of various perception methods on these datasets, but it does not directly compare the performance of different radar systems. The focus is on the algorithms and their performance using data from specific datasets, rather than evaluating the radars themselves.

It's important to note that the performance of perception and SLAM algorithms using 4D mmWave radar data depends on various factors, including the radar system, data processing techniques, and the specific algorithms employed. The survey aims to provide a comprehensive overview of the current state-of-the-art in this field, rather than ranking individual radar systems.

Zeyu Han¹, Jiahao Wang¹, Zikun Xu¹, Shuocheng Yang², Zhouwei Kong³, Lei He¹, Shaobing Xu¹,
Jianqiang Wang^1,∗, Keqiang Li^1,∗ Zeyu Han and Jiahao Wang contribute equally to this work.This work was supported by the National Natural Science Foundation of China (NSFC) under grant number 52221005 and Tsinghua University - Chongqing Changan Automobile Co., Ltd. Joint Research Project.¹School of Vehicle and Mobility, Tsinghua University, Beijing, China²Xingjian College, Tsinghua University, Beijing, China³Changan Automobile Co., Ltd., Chongqing, China^∗Correspondence: wjqlws@tsinghua.edu.cn (J.W.), likq@tsinghua.edu.cn (K.L.)

Abstract

The 4D millimeter-wave (mmWave) radar, proficient in measuring the range, azimuth, elevation, and velocity of targets, has attracted considerable interest within the autonomous driving community. This is attributed to its robustness in extreme environments and the velocity and elevation measurement capabilities. However, despite the rapid advancement in research related to its sensing theory and application, there is a conspicuous absence of comprehensive surveys on the subject of 4D mmWave radars. In an effort to bridge this gap and stimulate future research, this paper presents an exhaustive survey on the utilization of 4D mmWave radars in autonomous driving. Initially, the paper provides reviews on the theoretical background and progress of 4D mmWave radars, encompassing aspects such as the signal processing workflow, resolution improvement approaches, and extrinsic calibration process. Learning-based radar data quality improvement methods are present following. Then, this paper introduces relevant datasets and application algorithms in autonomous driving perception and localization tasks. Finally, this paper concludes by forecasting future trends in the realm of the 4D mmWave radar in autonomous driving. To the best of our knowledge, this is the first survey specifically dedicated to the 4D mmWave radar in autonomous driving.

Index Terms:

4D millimeter-wave radar, Autonomous driving, Perception, SLAM, Dataset

I Introduction

Autonomous driving technology, which aspires to provide safe, convenient, and comfortable transportation experiences, is advancing at a remarkable pace. To realize high-level autonomous driving, the capabilities of environment perception and localization are indispensable. Consequently, the sensors deployed on autonomous vehicles, such as cameras, LiDARs, and radars, along with their application algorithms, are garnering increasing research interest.

Among the various sensors, mmWave radars, with their acknowledged advantages of compact size, cost-effectiveness, all-weather adaptation, velocity-measuring capability, and long detection range, etc.[1], have always been extensively employed in autonomous driving. However, conventional mmWave radars, often referred to as 3D mmWave radars, demonstrate limited efficacy in measuring the elevation of targets, and their data typically encompassing only range, azimuth, and Doppler velocity information. Additionally, 3D mmWave radars suffer from clutter, noise, and low resolution, particularly in the angular dimension. These limitations further constrain their suitability for intricate perception tasks.

Refer to caption — Figure 1: The main pipeline of this survey.

The recent advancement of multiple-input multiple-output (MIMO) antenna technology has catalyzed a significant enhancement in elevational resolution, leading to the emergence of 4D mmWave radars. As the name suggests, 4D mmWave radars are capable of measuring four distinct types of target information: range, azimuth, elevation, and velocity. In addition to the augmented elevational resolution, 4D mmWave radars still preserve the salient advantages of their 3D predecessors. The comparison among autonomous driving sensors, including 4D mmWave radar, 3D mmWave radar, LiDAR, RGB camera and thermal camera is shown in Table. I. The 4D mmWave radar distinctly holds advantages in velocity measurement, detection range, all-environment robustness and low cost. Enterprises that involve in the 4D mmWave radar industry ranging from conventional suppliers like Bosch, Continental, and ZF, to a host of burgeoning tech companies such as Arbe, Huawei, and Oculii. An illustrative example of this technology is the Oculii Eagle 4D mmWave radar, which, when compared with the Ouster 128-channel LiDAR, demonstrates its capabilities such as long detection range through the point cloud representation as depicted in Fig. 2.

TABLE I: Comparison among autonomous driving sensors and data formats

Features	4D mmWave Radar	3D mmWave Radar	LiDAR	RGB Camera	Thermal Camera
Range Resolution	High	High	Very High	Low	Low
Azimuth Resolution	High	Moderate	Very High	Moderate	Moderate
Elevation Resolution	High	Unmeasurable	Very High	Moderate	Moderate
Velocity Resolution	High	High	Unmeasurable	Unmeasurable	Unmeasurable
Detection Range	High	High	Moderate	Low	Moderate
Surface Measurement	Texture	Texture	No	Color	Thermal Signature
Lighting Robustness	High	High	High	Low	High
Weather Robustness	High	High	Low	Low	High
Cost	Moderate	Low	High	Moderate	High

However, as a newly developed sensor, the 4D mmWave radar also presents some challenges due to its inherent characteristics. On the one hand, the raw data volume generated by the 4D mmWave radar substantially exceeds that of its traditional counterpart, thereby presenting formidable problems in signal processing and data generation. on the other hand, the sparsity and noise natural in 4D mmWave radar point clouds, generated in the existing signal processing workflow are notably more severe than those in LiDAR point clouds. Aiming at solving these issues, as well as utilizing 4D mmWave radar features such as Doppler and elevation measurement, a great number of researchers have engaged in studies within the fields of 4D mmWave radar-based data generation[3, 4], perception[5, 6] and SLAM(Simultaneous Localization and Mapping)[7, 8].

In recent years, numerous surveys have been conducted on the theory and application of mmWave radars[9, 10, 11, 12, 13, 14], but most of them are centered on 3D mmWave radars. Bilik et al. [9] have reviewed the challenges faced by mmWave radars in autonomous driving and its future trends. Venon et al. [10] have provided a comprehensive summary of the theory and existing perception algorithms of mmWave radar in autonomous driving, while Harlow et al. [11] have concentrated on mmWave radar applications in robotics for their survey.

Despite the transformative emergence of 4D mmWave radars and associated algorithms, there have been few specialized surveys. Liu et al. [15] compare different pipelines of 4D mmWave radar-based object tracking algorithms. Fan et al. [14] summarize perception and SLAM applications for 4D mmWave radar, but ignore the quite important radar data generation studies, and the logical framework within the applications are not systematically outlined. To bridge this gap, this paper presents a thorough review of 4D mmWave radars in autonomous driving. The principal contributions of this work can be summarized as follows:

To the best of our knowledge, this is the first publicly available survey focuses on 4D mmWave radars within the context of autonomous driving.
Acknowledging the distinctiveness of 4D mmWave radars, this survey outlines its theoretical background, draws a detailed signal processing workflow figure, as the foundation of its application.
Given the sparsity and noise of existing 4D mmWave radar point cloud, this paper discusses newly emerged learning-based radar data generation methods that can enhance data quality.
This paper delivers an extensive survey of 4D mmWave radar application algorithms in autonomous driving. It systematically presents research on perception and SLAM algorithms of 4D mmWave radars, as well as related datasets, and categorizes them on a timeline in Fig. 3.
By thoroughly tracing relevant research, this paper presents classification framework diagrams for 4D mmWave radar data generation, perception, and SLAM applications. Existing challenges and in-depth future outlook are also illustrated.

The remainder of this paper is organized as shown in Fig. 1: Section II introduces the foundational theory of 4D mmWave radars, including the signal processing workflow, methods for improving resolution and extrinsic calibration. Section III summarizes some learning-based methods for radar data generation. Section IV reviews 4D mmWave radar perception applications, categorized into different input formats. 4D mmWave radar applications in SLAM are presented in Section V. Moreover, Section VI lists available 4D mmWave radar datasets for researchers’ convenience. Section VII discusses future trends of 4D mmWave radar in autonomous driving, and Section VIII draws the conclusion.

II Background of 4D mmWave Radars

For researchers dedicated to the field of autonomous driving, fundamental knowledge about 4D mmWave radars may often be undervalued. This section briefly revisits the basic theory of 4D mmWave radars, laying the groundwork for the subsequent discussions.

II-A Signal Processing Workflow

The traditional signal processing workflow and corresponding data formats of 4D mmWave radars are shown in Fig.4. In step 1, millimeter waves are transmitted from the transmitting (TX) antennas. These waves, upon encountering surrounding targets, are reflected back to receiving (RX) antennas. The waveform employed by the majority of extant 4D mmWave radars is the Frequency Modulated Continuous Wave (FMCW), which is renowned for its superior resolution capabilities in comparison to alternative waveforms. During each operational cycle (commonly referred to as a ’chirp’) of the FMCW radar’s TX antennas, the frequency of the emitted signal increases linearly, characterized by an initial frequency $f_{c}$ , a bandwidth $B$ , a frequency slope $S$ , and a time duration $T_{c}$ . By measuring the frequency of the signal received, the range $r$ of the target can be calculated as follows:

r = \frac{c t}{2}, t = \frac{Δ f}{S},

(1)

where $t$ denotes the temporal interval between transmission and reception, $c$ represents the light speed, and $Δ f$ is the discrepancy in frequency between the transmitted and received signals. Concurrently, a single frame of an FMCW radar comprises $N_{c}$ chirps and spans a temporal duration $T_{f}$ . To avoid interference amongst successive chirps, the transmitted and received signals are considered within an individual chirp. Consequently, the maximum unambiguous range detectable by 4D mmWave radars is restricted by the chirp duration $T_{c}$ . By way of illustration, the AWR1843 from Texas Instruments features a chirp duration of $T_{c} = 0.33 μ s$ , accordingly its maximum unambiguous range is 50 meters. Presuming the target’s range remains invariant within a single frame, the frequency shift between two successive chirps is employed to deduce the radial relative velocity $v$ of the target, utilizing the Doppler effect, as delineated below:

v = \frac{c Δ f}{2 f_{c}}, Δ f = \frac{Δ φ}{2 π T_{c}},

(2)

where the first equation is the Doppler effect formula, $Δ f$ and $Δ φ$ correspond to the frequency and phase shifts, respectively, between the received signals of two successive chirps. It is manifest that the range and Doppler resolutions depend on parameters such as $f_{c}, T_{c}, N_{c}$ . For an in-depth exposition of these dependencies, readers are directed to consult the work of Venon et al. [10].

The signals of each TX-RX pair are mixed by a mixer at step 2 and subsequently transduced into digital form by an Analog-to-Digital Converter (ADC) at step 3, yielding raw ADC data. It should be noted that within the matrices of raw ADC data depicted in Fig. 4, the coordinate axes represent the sampling timestamps within a chirp and a frame, respectively, while the value of each matrix element corresponds to the intensity of the reflected signal. Sampling within a chirp aims to calculate range information, and is also referred to as fast time sampling. Conversely, sampling within a frame is intended to deduce Doppler information, and is thus termed slow time sampling. Subsequently, at step 4, a two-dimensional Fast Fourier Transformation (FFT) is applied along the range and Doppler dimensions to construct the Range-Doppler (RD) map, the axes of which are range and Doppler velocity.

However, despite the RD map providing the signal intensities of different ranges and velocities, it does not specify azimuth and elevation angles, rendering the data challenging for humans to understand due to its complex structure. To address this, two prevalent signal processing methodologies are employed to distinguish real objects with high intensity and obtain point clouds. The former is to first conduct a FFT along different TX-RX pairs to deduce the direction-of-arrival (DOA) of the target (step 5a), acquiring a 4D range-azimuth-elevation-Doppler tensor, while for 3D mmWave radars, the result is a 3D range-azimuth-Doppler tensor. Each cell within the 4D tensor corresponds to the intensity of the reflected signal. For DOA estimation, a MIMO antenna design is typically applied in mmWave FMCW radars. As illustrated in Fig. 5, the $n$ TX antennas and $m$ RX antennas form $n \times m$ virtual TX-RX pairs. To ensure signal separation, different TX antennas should transmit orthogonal signals. By analyzing the phase shift between different TX-RX pairs, distance differences between different pairs to the same target can be calculated. Furthermore, by considering the positional arrangement of the TX and RX antennas, the DOA of the target can be ascertained. At step 6a, the Constant False Alarm Rate (CFAR) algorithm [18] is typically implemented in the four dimensions to filter the tensor based on the intensity of each cell, thereby obtaining real targets in the format of point cloud for subsequent applications [19]. The CFAR algorithm sets dynamic intensity thresholds by comparing the intensity of each cell with its neighboring cells to realize a constant false alarm rate effect.

In contrast, the alternative signal processing workflow initially filters RD maps to generate target cells using also a CFAR-type algorithm (step 5b), then digital beamforming (DBF) is employed in step 6b to recover angular information and generate point clouds [17].

II-B Methods to Lift from 3D to 4D

As previously discussed, the most crucial ability of 4D mmWave radars lies in their ability to measure the elevation dimension, which enriches the data from three-dimensional to four-dimensional space. The methodologies to achieve this enhancement can be categorized into hardware-based and software-based approaches, as detailed below:

II-B1 Hardware

At the hardware level, there are two principal strategies to improve elevation resolution. The first is to increase the number of TX-RX pairs by simply cascading multiple standard mmWave radar chips [20] or integrating more antennas onto a single chip [21]. The second strategy aims to the effective aperture of the antennas by techniques such as meta-material [22].

II-B2 Software

By virtually realizing hardware improvement or optimizing signal processing algorithms along the processing workflow, radar resolution can also be improved at the software level. Inspired by the synthetic aperture radar (SAR) technology, angular resolution can be increased by virtually expanding the aperture of antennas through software design [23]. Furthermore, innovative learning-based algorithms have the potential to replace traditional signal processing algorithms, such as FFT and CFAR [24] [3] thus facilitating a super-resolution effect.

II-C Extrinsic Calibration

Given the relative sparsity and noise of radar point clouds, and the non-intuitive nature of spectrum data, it is a significant challenge to calibrate radars with other sensors. While the enhanced resolution of 4D mmWave radars somewhat mitigates this issue, there remains a dearth of robust online calibration methods.

Following traditional calibration methods of 3D mmWave radars, corner reflectors are commonly employed to improve calibration accuracy. By carefully placing several corner reflectors and analyzing the sensing results of the 4D mmWave radar in conjunction with LiDAR and camera data, the extrinsic parameters can be calibrated [25]. In a departure from the conventional approach of calibrating each sensor pair sequentially, Domhof et al. calibrate all sensors simultaneously with respect to the body of mobile robot, achieving a median rotation error of a mere 0.02 ^∘ [26]. By leveraging the Random Sample and Consensus (RANSAC) and Levenberg-Marquardt non-linear optimization, [27] accomplishes radar-camera calibration with only one single corner reflector, obviating the requirement of a specially designed calibration environment.

However, the practicability of corner reflectors in real-world scenarios is limited. Recent research has proposed calibration methods for 4D mmWave radars that avoid the need for specially placed corner reflectors, instead utilizing radar motion measurement to conduct online calibration for radars [28] or radar-camera pairs [29]. While these methods offer convenience, their efficacy under extreme weather conditions remains to be validated.

In light of the similar data structures of 4D mmWave radars and LiDARs, modifying conventional LiDAR-to-camera calibration methods [30] [31] is a promising avenue. Nevertheless, to address online joint calibration in extreme weather conditions, especially in the situation where LiDAR and camera have lousy performance, the potential of learning-based methods [32] warrants further exploration.

III Learning-based Radar Data Generation

As discussed in Section II, the initial data from 4D mmWave radars comprise spectral signals heavily masked by noise, necessitating filtering algorithms like CFAR to generate usable point clouds. However, such traditional handcrafted methods have inherent limitations. They often struggle to adapt to the complexity of real-world targets, which can vary significantly in shape and extend across multiple resolution cells. This mismatch can induce masking effects within CFAR-type algorithms, consequently reducing the resolution of point clouds and resulting in significant information loss.

To overcome those limitations, this section introduces learning-based techniques for radar data generation. By leveraging the capabilities of deep learning, it is possible to develop more adaptive and robust algorithms that can improve the fidelity of radar imaging. As illustrated in Fig. 6, radar point clouds generated using a learning-based detector are denser and contain less noise compared to those produced by the traditional OS-CFAR detector.

In the current landscape of mmWave radar technology, two primary learning-based pipelines have been developed: "Reconstructor" and "Detector," as shown in Fig. 7. Reconstructor techniques focus on refining radar point clouds by enhancing their density and resolution. In contrast, Detector methods bypass the preliminary CFAR filtering stage and process radar frequency data directly, thus preventing the information loss typically associated with traditional filtering methods. Subsequent sections will provide a detailed comparison of these approaches and discuss the persistent challenges in this area.

III-A Reconstructor

Reconstructor methods focus on improving the resolution and detail of previously acquired radar point clouds. This approach is dedicated to the post-processing enhancement of data fidelity, thereby increasing the usefulness of the radar imagery.

Much of the inspiration for these methods comes from the reconstruction of LiDAR point clouds. Notably, the PointNet structure [34] used in [35] has influenced subsequent studies by Sun et al. [36] [33] [37]. Although these studies require data from multiple viewpoints, which may constrain their immediate integration into autonomous driving systems, the underlying principles of their model architectures offer valuable insights. For example, they use a conditional Generative Adversarial Network (GAN) to train generator and discriminator networks concurrently, as detailed in [36]. Moreover, the innovative two-stage point cloud generation process, which incorporates a loss function that synergistically combines Chamfer Distance (CD) and Earth Mover’s Distance (EMD) metrics, is described in [33]. Sun et al.’s methods have shown significant improvements over existing techniques such as PointNet [34], PointNet++ [38], and PCN [35], particularly with coarse and sparse input point clouds. The robustness of these methods underscores their potential to enhance the accuracy and reliability of point cloud reconstruction, even with suboptimal radar data.

III-B Detector

Meanwhile, Detector approaches leverage neural networks to engage directly with raw radar data such as RD maps or 4D tensors, which circumvents conventional techniques such as CFAR or DBF, potentially leading to more efficient and robust detection capabilities in real-time applications.

Brodeski et al. [3] pioneer such frameworks and apply CNN-based image segmentation networks to RD maps for the detection and localization of multiple objects. Confronted with the scarcity of well-annotated RD map datasets, they devise a strategy to extract labeled radar data from the calibration process conducted within an anechoic chamber. Experiments of the DRD network demonstrate its capability to function in real-time, with inference times recorded at approximately 20ms. Notably, the DRD network has been shown to surpass classic methods in terms of detection accuracy and robustness. Though this work does not include real-world radar data with all the impairments that come along, the findings from this study unequivocally illustrate the considerable promise of neural network applications to radar complex data.

However, accurately labeling radar frequency data remains a formidable challenge. This is primarily due to disparities between data collected in the controlled environment of anechoic chambers and that obtained under real-world driving scenarios. The latter presents greater complexity with factors such as multi-path reflections, interference, attenuation, etc.

To address this challenge, Cheng et al. [17, 4] use LiDAR point clouds as the supervision and successively design network architectures inspired by U-Net[39] and Generative Adversarial Networks (GAN)[40]. In complex roadway scenes, the generated 4D mmWave radar point clouds by [4] not only demonstrate a reduction in clutter but also provide denser point clouds of real targets compared to the classical CFAR detectors. Additional comparisons on the performance of perception and localization tasks between the generated point cloud and traditional point cloud further prove the improvement of data quality.

III-C Challenge

The development of Learning-based Radar Data Generation methods, particularly for 4D mmWave radar, is hindered by the scarcity of large, public datasets and benchmarks. As highlighted in recent studies [41, 42], the diversity of mmWave radar hardwares models complicates the standardization of pre-CFAR data andchallenges the creation of a comprehensive public dataset and benchmark. Furthermore, pre-CFAR data is far less intuitive than point cloud data, rendering manual annotation both laborious and prone to errors, thus complicating the production of high-quality supervised data for neural networks.

Another avenue is the adoption of learning-based approaches for generating synthetic automotive radar scenes and data[43, 44]. However, as such methods are based on simulation, it is limited by potential inaccuracies in the sensor and world model.

Moreover, the management of pre-CFAR data is associated with considerable computational and memory demands. While current methods employ lower-resolution radars to achieve real-time performance, accommodating higher resolution radars—which are essential for large outdoor scenes—requires further optimization of algorithmic efficiency.

IV Perception Applications

Currently, the point cloud density of 4D mmWave radars has already attained a level comparable to that of low-beam LiDAR, with the added advantages of exhibiting superior robustness under low visibility and adverse weather conditions. Therefore, researchers have been attempting to transfer LiDAR point cloud processing models to 4D mmWave radars in various tasks, including target detection[48, 6, 49], trajectory tracking[50, 51], and scene flow prediction[52, 53], among others. Furthermore, as described in Section III, pre-CFAR radar data encompasses a wealth of information, promoting some researchers to engage directly with RD maps or 4D tensors, bypassing point cloud generation tasks. Existing 4D mmWave radar point cloud and pre-CFAR data feature extraction methods in autonomous driving perception are summarized in Fig. 10, which will be detailed in this section.

In Section IV-A, we review and analyze perception models for radar point clouds (RPC), which are primarily enhancements of LiDAR-based methodologies. The 4D mmWave radar branch of some fusion methods are also included. Section IV-B investigates the methods utilizing pre-CFAR radar data, including the range-frequency map, range-azimuth map, range-azimuth-Doppler cube, and 4D tensors. The integration of 4D mmWave radars into multi-modal fusion systems, as well as the effectiveness of such methods are present in Section IV-C. Finally in Section IV-D, we discuss current challenges of this field.

TABLE II: SUMMARY OF 4D MMWAVE RADAR PERCEPTION METHODS

A. WITH RADAR POINTCLOUD
Task	Methods	Year	Modalities	Astyx 3D mAP(%)			VoD[54] mAP(%)		TJ4D mAP(%)
Task	Methods	Year	Modalities	Easy	Moderate	Hard	Entire	Driving	3D	BEV
3D Object Detection	PointPillars[55]‡	2019	RPC	30.14	24.06	21.91	38.09	62.58	28.31	36.23
3D Object Detection	CenterPoint[56]‡	2021	RPC	-	-	-	45.42†	65.06†	29.07	36.18
3D Object Detection	PillarNeXt[57]‡	2023	RPC	-	-	-	42.23†	63.61†	29.20	35.71
3D Object Detection	RPFA-Net[47]	2021	RPC	38.85	32.19	30.57	38.75	62.44	29.91	38.94
3D Object Detection	MVFAN[5]	2023	RPC	45.60	39.52	38.53	39.42	64.38	-	-
3D Object Detection	RadarPillarNet[45]	2023	RPC	-	-	-	46.01†	65.86†	30.37	39.24
3D Object Detection	LXL-R[58]	2023	RPC	-	-	-	46.84†	68.51†	30.79	38.42
3D Object Detection	SMIFormer[59]	2023	RPC	-	-	-	48.77†	71.13†	-	-
3D Object Detection	SMURF[6]	2023	RPC	-	-	-	50.97†	69.72†	32.99	40.98
3D Object Detection	RadarMFNet[46]	2023	RPC	-	-	-	-	-	42.61†	49.07†
3D Object Detection	FUTR3D[60]‡	2023	C&RPC	-	-	-	49.03†	69.32†	32.42	37.51
3D Object Detection	BEVFusion[61]‡	2023	C&RPC	-	-	-	49.25†	68.52†	32.71	41.12
3D Object Detection	3DRC[62]	2019	C&RPC	61.00	48.00	45.00	-	-	-	-
3D Object Detection	Cui et al.[48]	2021	C&RPC	69.50	50.05	49.13	-	-	-	-
3D Object Detection	RCFusion[45]	2023	C&RPC	-	-	-	49.65	69.23	33.85	39.76
3D Object Detection	LXL[58]	2023	C&RPC	-	-	-	56.31	72.93	36.32	41.20
3D Object Detection	InterFusion[63]	2022	L&RPC	57.07	47.76	45.05	-	-	-	-
3D Object Detection	$M^{2}$ -Fusion[49]	2023	L&RPC	61.33	49.85	49.12	-	-	-	-
B. WITH PRE-CFAR DATA
Task	Methods	Year	Modalities	Radar Input	RADIal[64](%)				K-Radar[65] mAP(%)
Task	Methods	Year	Modalities	Radar Input	Seg. mIoU	Det. AP	Det. AR	3D Det. mAP	3D	BEV
Freespace Segmentation; 2D Object Detection	T-FFTRadNet[66]	2023	R	ADC	79.60	88.20	86.70	-	-	-
Freespace Segmentation; 2D Object Detection	T-FFTRadNet[66]	2023	R	RD	80.20	89.60	89.50	-	-	-
Freespace Segmentation; 2D Object Detection	ADCNet[67]	2023	R	ADC	78.59	95.00	89.00	-	-	-
Freespace Segmentation; 2D Object Detection	FFTDASHNet[68]	2023	R	RD	85.58	96.53	98.51	-	-	-
Freespace Segmentation; 2D Object Detection	FFTRadNet[64]	2022	R	RD	73.98	96.84	82.18	-	-	-
Freespace Segmentation; 2D Object Detection	TransRadar[69]	2024	R	RAD	81.10	97.30	98.40	-	-	-
Freespace Segmentation; 2D Object Detection	CMS[70]	2023	C&R	RD	80.40	96.90	83.49	-	-	-
2D & 3D Object Detection	EchoFusion[71]	2023	C&R	RT	-	96.95	93.43	39.81	68.35*	69.95*
3D Object Detection	RTN[65]	2022	R	4DRT	-	-	-	-	40.12	50.67
3D Object Detection	RTNH[65]	2022	R	4DRT	-	-	-	-	47.44	58.39
3D Object Detection	E-RTNH[72]	2023	R	4DRT	-	-	-	-	47.90	59.40

Abbreviation about Modalities and Radar Input: R(Radar), C(Camera), L(Lidar), RPC(Radar Point Cloud), ADC(raw radar data after Analog-to-Digital Converter), RD(Range-Doppler map), RAD(Range-Azimuth-Doppler cube), RT(Range-Time map), 4DRT(Range-Azimuth-Elevation-Doppler Tensor)
† indicates data derived through multi-frame accumulation. Specifically, the methodologies referenced in relation to the VoD dataset employ detection points from 5 scans of radar data, whereas the RadarMFNet[46] approach, as applied to the TJ4DRadSet, utilizes data from 4 consecutive frames.
‡ denotes strategies originally conceptualized for LiDAR point clouds, which have been subsequently adapted and retrained utilizing radar datasets to serve as baseline comparisons. The comparative outcomes are directly inherated from [47, 5, 45, 6, 58].
* signifies data extracted from a subset comprising 20 sequences, which is part of the K-Radar dataset encompassing a total of 58 sequences.

IV-A Point Cloud Feature Extraction

Given the analogous nature of their data formats, it is clear that a significant number of RPC methodologies originate from LiDAR-based techniques. Despite this similarity, this transposition requires careful consideration of the inherent constraints associated with radar systems. These limitations include low-resolution representations, data sparsity, and inherent uncertainty within the data. Conversely, radar systems outperform in areas such as superior range resolution, velocity measurement capabilities, and early target detection. Consequently, these unique characteristics necessitate the development of specifically tailored network designs.

A comprehensive and succinct overview of recent advances in this field is presented in Table II. These investigations have masterfully exploited the distinctive attributes of 4D mmWave radar point clouds, encompassing elements like elevation, Doppler data, and Radar Cross Section (RCS) intensity. Moreover, they have ingeniously formulated strategies to address the inherent sparsity and irregular distribution of these data points, thereby advancing the field significantly.

IV-A1 Distinctive Information

4D mmWave radars provide a full three-dimensional view by measuring range, azimuth, and elevation of targets. Additionally, mmWave radars can measure the velocity of objects directly through the Doppler effect, a feature that distinguishes them from LiDAR systems. This combination of enhanced spatial data and velocity information makes 4D mmWave radars particularly valuable for Autonomous Driving tasks and presents unique opportunities for further studies.

Most studies[62, 47, 46, 54, 63] have opted to reference the implicit structures like SECOND[73] and PointPillars[55]. These studies encode extra radar attributes directly, comparable to the conventional spatial coordinates $x, y, z$ in point clouds. Palffy et al.[54] demonstrate that the addition of elevation data, Doppler information, and RCS information respectively increase the 3D mean Average Precision (mAP) with 6.1%, 8.9% and 1.4%. However, the result of the proposed method (47.0% mAP) is still far inferior to the LiDAR detector on 64-beam LiDAR (62.1% mAP), indicating that there is still room for improvement in the optimization of 4D mmWave radar-based detection methods.Nevertheless, Zheng et al.[45] introduce a subtle yet impactful modification, by proposing the Radar PillarNet backbone, colloquially termed RPNet. This structure employs three separate linear layers, each with unshared weights, to extract spatial position, velocity, and intensity features, respectively. Subsequently, a BEV pseudo image is generated. Ablation studies have demonstrated that RPNet enhances the 3D mAP by 4.26%.

Furthermore, to explicitly utilize the elevation information, Cui et al.[48], Yan et al.[5] and Shi et al.[59] have each explored extracting point cloud features from multiple viewpoints. In [48], radar point clouds are processed into Front View (FV) and BEV perspectives. Features extracted from each view are subsequently fused with features derived from the camera branch. MVFAN[5] employs both BEV pillar and cylinder pillar methods to extract point cloud features. Conversely, SMIFormer[59] transforms point clouds into voxel features, which are then projected onto three distinct planes: FV, Side View (SV), and BEV. Following this, features are aggregated using intra-view self-attention and inter-view cross-attention mechanisms. Notably, this methodology is further refined by employing a sparse dimension compression technique, significantly reducing the memory and computational demands involved in converting 3D voxel features into 2D features.

Addressing the explicit utilization of Doppler information, Tan et al. [46] delineate a widely-adopted yet efficacious technique. Recognizing that mmWave radars intrinsically measure the radial relative velocity of detected objects, and the motion of the vehicle itself results in different coordinate systems for multi-frame point clouds, they first calculate the vehicle’s velocity, followed by compensating and obtaining each point’s true velocity relative to the ground, which is often referred to as ’absolute velocity’. Moreover, the integration of Doppler information facilitates a more convenient and accurate accumulation of historical frame point clouds, thereby enhancing point cloud density. Yan et al. [5] further propose the Radar Feature Assisted Backbone. In this design, each point’s absolute and relative velocities, along with its reflectivity, are integrated into position embeddings. These embeddings are then multiplied with the self-attention reweighting map of point-wise features, thereby enhancing the exchange of information at the feature vector level in a trainable fashion. On the other hand, Pan et al. [50] introduce a ’detection by tracking’ strategy. This approach leverages velocity characteristics to achieve point-level motion segmentation and scene flow estimation. Subsequently, employing the classical DBSCAN clustering method suffices to surpass the tracking accuracy of established techniques like centerpoint[56] and AB3DMOT[74].

IV-A2 sparsity and Irregularity

Another significant challenge in processing 4D mmWave radar point clouds is the inherent sparsity and irregular distribution. Considering the physical size constraints on the aperture of vehicular radars and the omnipresence of electromagnetic interference and multipath reflections in traffic environments, the resolution of 4D mmWave radar point clouds is considerably inferior to that of LiDAR, often resulting in a higher prevalence of clutter and noise. For instance, studies have noted that point clouds in the Astyx dataset struggle to articulate detailed features, which complicates the assessment of the orientation of detected objects [47]. Moreover, a considerable number of points are found to be distributed below the ground plane [49], adversely affecting detection accuracy.

To mitigate these challenges, several improvement strategies have been proposed and employed. Common methods include the accumulation of multiple frame point clouds [46, 50], preprocessing and filtering of point clouds [47, 49, 6], employing spatial attention mechanisms to extract contextual information for feature enhancement [47, 59, 45, 5], and integrating information from different sensor modalities [63, 45, 49, 58].

To accumulate point clouds across multiple consecutive frames, as previously mentioned, the Doppler information plays a pivotal role. This can be achieved through ego-velocity estimation and motion compensation [46], or by motion segmentation and scene flow estimation [50], resulting in precision that surpasses mere simple stacking of point clouds.

Targeting at the preprocessing phase, InterFusion[63] and M2Fusion[49] utilize a Gaussian normal distribution to assess whether the vertical angle of point falls within a normal range, based on the Shapiro-Wilk (S-W) test[75]. This approach effectively filters out a substantial number of noise points that are below the ground plane in the Astyx dataset[76]. Additionally, SMURF[6] incorporates a point-wise kernel density estimation (KDE) branch, which calculates the density of point clouds within several predefined distance ranges, offering a detailed understanding of point distribution. The derived density information is then concatenated with pillarized features, resulting in enhanced BEV features. By refining the initial data, these methods lay a strong foundation for more accurate and reliable downstream processing.

In the domain of model backbone architecture, the incorporation of spatial attention mechanisms has been acknowledged as an effective strategy to address the sparsity and irregular distribution of point clouds. Xu et al.[47] implemented a self-attention mechanism to extract global information from the pillarized radar point cloud. Further advancing this concept, Shi et al.[59] augment different view features using a combination of self-attention and cross-attention mechanisms. While self-attention focuses on understanding the relationships within a single view, cross-attention extends this understanding across different views, thus enabling a more comprehensive and integrated feature representation. Yan et al.[5] take a different approach by utilizing the attention matrix inherent in the self-attention mechanism to differentiate and reweight foreground and background points and their respective features. They also introduce a binary classification auxiliary loss to aid the learning process. Additionally, several studies have used spatial attention to fuse multimodal sensor data, not only addressing the noise and sparsity in radar point clouds, but also leveraging the strengths of different sensor modalities. These methods will be further elaborated in Section IV-C.

IV-B Pre-CFAR Feature Extraction

In millimeter-wave (mmWave) radar signal processing, side lobe suppression and CFAR algorithms play a crucial role in reducing noise and minimizing false alarms. These techniques help extract signal peaks, thereby reducing data volume and computational cost. However, a consequential drawback of this approach is the sparsity of radar point clouds, characterized by diminished resolution. Given the profound advancements in deep learning, particularly in the processing of dense image data, a pivot in research focuses towards Pre-CFAR data is observed, aiming to utilize more underlying hidden information.

To our best knowledge, there are currently several datasets containing 4D mmWave radar Pre-CFAR data [65, 64, 41, 77, 78]. Notably, a subset of these datasets [41, 77, 78] have relatively lower elevation resolution, exceeding 15 degrees. Consequently, our survey will focus on the methodologies employed within the high resolution datasets provided by [65, 64]. This section intends to explain the comprehensive pipeline and the enhancements tailored for optimizing 4D mmWave radar characteristics.

As discussed in Section II-A, the 4D mmWave radar signal processing workflow applies FFT methodologies on raw ADC data to discretely process four dimensions: range, Doppler, azimuth, and elevation. This processing generate diverse data representations, including the Range-Doppler (RD) map, Range-Azimuth (RA) map, Range-Azimuth-Doppler (RAD) cube, and ultimately, a 4D tensor. From 2D maps to 4D tensors, the complexity of extracted features varies. Higher dimensional data requires more memory and computation for feature extraction. The extracted features are typically aligned to the RA axis in BEV under polar coordinates or the XY axis in Cartesian coordinates, connecting to detection or segmentation heads, serving as the foundational elements for subsequent detection or segmentation operations.

IV-B1 4D Tensor

For the 4D tensor, Paek et al. [72, 65] opt to further extract a sparse tensor aligned to the Cartesian coordinate system, subsequently utilizing 3D sparse convolution to extract multi-scale spatial features. Experiments have demonstrated that retaining only the top-5% elements with the highest power measurements can maintain detection accuracy while significantly enhancing processing speed. Furthermore, the elevation information included in 4D tensors is essential facing 3D but not BEV 2D object detection.

IV-B2 RAD Cube

In the context of RAD cube processing, TransRadar [69] projects the data onto the AD, RD, and RA planes, respectively, and innovatively designs an adaptive directional attention block to encode features separately.

IV-B3 RD Map

Works related to the RD map [64, 66, 68] generally encode features along the RD dimensions using CNN or Swin Transformer structures. Subsequently, a noteworthy technique involves the transposition of the Doppler dimension with the channel dimension, thereby redefining the conventional channel axis as the azimuth axis, followed by a series of deconvolution and upsampling steps to extrapolate features defined along the range-azimuth axes.

IV-B4 RA Map

RA map data, inherently aligned with the polar coordinate system, is amenable to direct processing through dense feature extraction networks. The generated BEV features are then either converted into the Cartesian coordinate system via bi-linear interpolation [41] or utilized within polar-based detection frameworks [71].

IV-B5 Raw ADC Data

Recently, some studies have also shifted towards addressing the elevated computational demands by performing FFT on raw ADC data. Consequently, strategies have emerged wherein raw ADC data is directly processed via complex-valued linear layers[66], utilizing the prior knowledge of the Fourier transform. Alternatively, Liu et al.[71] have leveraged data derived from a single-range FFT operation to generate Range-Time (RT) representations. Comparative analyses show the performance gap between the RT map and RA map with camera modality is within the error bar. These findings suggest that the resolution of azimuth angles and the permutation of Doppler-angle dimensions, as previously posited by Rebut et al.[64] and Giroux et al.[66], may not be requisite for achieving satisfactory performance outcomes.

IV-C Multi-Modal Fusion Methods

Considering the capability of 4D mmWave radars to furnish point cloud data, several scholars have embarked on integrating this information with inputs from cameras or LiDAR systems to enhance the accuracy and robustness of the perception model. Generally, there are three fusion levels for different modalities: data level, feature level, and decision level. Existing research primarily focuses on the feature-level fusion.

IV-C1 4D Radar with Vision

As for 4DRV (4D mmWave Radar and Vision) fusion, 4D mmWave radars offer the ability to deliver high-precision depth and velocity information in a cost-effective manner, thereby mitigating the limitations inherent in camera systems and enhancing the accuracy of 3D detection. In recent studies, 4D mmWave radar signals are typically transformed into 2D image-like features, facilitating their practical deployment in conjunction with camera images. This fusion strategy leverages the strengths of both modalities, enabling a more comprehensive and accurate representation of the environment for advanced perception tasks.

Exemplifying this approach, Meyer et al.[62] adapt a CNN architecture initially designed for camera-LiDAR fusion [79] to process RGB images with height and density maps generated from 4D mmWave radar point clouds. Remarkably, their fusion network demonstrates enhanced precision employing radar instead of LiDAR point clouds, achieving an average precision (AP) of 61% on the Astyx dataset [76]. A subsequent study is performed by Cui et al.[48] with a novel self-supervised model adaptation block[80], which dynamically adapts the fusion of different modalities in accordance with the object properties. Besides, a FV map is generated from the 4D mmWave radar point clouds together with the BEV image. The presented method outperforms the former study[62] by up to 9.5% in 3D AP. The FV map effectively leverages the elevation information provided by 4D mmWave radars and achieves easier fusion with the monocular camera feature, balancing detection accuracy and computational efficiency.

Additionally, recent works such as RCFusion[45] and LXL[58] have advanced the integration of attention mechanisms for the fusion of image and 4D mmWave radar features. They begin by separately extracting BEV features from the camera and radar branches, then employ convolutional networks to create scale-consistent attention maps, effectively delineating the occupancy grid for target objects. The distinction lies in the fact that RCFusion[45] generates 2D attention maps in both the camera and radar branches, which are then multiplied with the BEV feature from the other modality. Conversely, LXL[58] solely utilizes the radar BEV feature to infer the 3D occupancy grid, which is then multiplied with the 3D image voxel features to achieve attention sampling of the image features.

IV-C2 4D Radar with LiDAR

Despite the notable advantages of 4DRV fusion, the vision-based branch may still struggle when facing aggressive lighting changes or adverse weather conditions, which in turn affects the overall performance of the model. Addressing this challenge, Wang et al. [63] first explore the advantages of 4DRL(4D mmWave Radar and LiDAR) fusion with an interaction-based fusion framework. They design an InterRAL (Interaction of Radar and LiDAR) module and update pillars from both modalities to enhance feature expression. The efficacy of this approach is substantiated through a series of ablation experiments, demonstrating the potential of this fusion strategy in improving the robustness and performance of perception models under challenging conditions.

In a subsequent investigation, Wang et al. [49] propose the $M^{2}$ -Fusion network that integrates an interaction-based multi-modal fusion(IMMF) block and a center-based multi-scale fusion(CMSF) block. Evaluated in the Astyx dataset[76], this novel approach outperforms mainstream LiDAR-based object detection methods significantly. As LiDARs can accurately detect objects at close range, 4D mmWave radars have a greater detection range owing to its penetrability, the fusion of 4DRL presents a promising technical solution that combines cost-effectiveness with high-quality performance.

IV-D Challenge

Current methodologies for 4D mmWave radar point cloud perception predominantly adapt established techniques from LiDAR applications, whereas approaches for pre-CFAR data often draw from the vision domain. Although the data formats exhibit similarities, the unique characteristics of mmWave radar data, specifically Doppler velocity and intensity information, warrant more focused attention for effective feature extraction. Moreover, pre-CFAR radar data characteristically contains a significantly higher ratio of background to actual objects (foreground)[69], a factor that complicates data interpretation and model training. The inherent noise within radar data further presents a substantial challenge for learning algorithms. The resilience of 4D mmWave radar models under out-of-distribution (OoD) conditions also remains inadequately explored and understood[65], which underscores the necessity for refined methodologies that can more accurately account for the distinct properties of mmWave radar data, thereby enhancing model robustness and performance in real-world scenarios.

V SLAM Applications

In challenging environments where satellite-based positioning is unreliable or high-definition maps are absent, localization and mapping by perception sensors becomes indispensable. Recently, a collection of SLAM research has been conducted utilizing the emerging 4D mmWave radars.

As Fig. 15 demonstrates, the unique Doppler information contained within radar point clouds presents a notable opportunity for exploitation, which will be discussed in Section V-A. Subsequently, both traditional and learning-based SLAM approaches will be introduced in Section V-B and Section V-C, respectively. And Section V-D briefly introduces current challenges in 4D mmWave radar-based SLAM.

V-A Doppler Information Utilizing

The Doppler information constitutes a significant advantage of 4D mmWave radars, especially in the context of SLAM applications. Generally speaking, the utilization of Doppler information can be categorized into several categories:

V-A1 Ego-Velocity Estimation

To estimate the ego-velocity of radar using Doppler information, a straightforward approach is the linear least squares (LSQ). Doppler velocity reflects the radial relative velocity between the object and the ego vehicle. Consequently, it is only the Doppler velocity from stationary objects that are viable for deducing the ego vehicle’s velocity, while Doppler information from dynamic objects is considered as outliers. Under the assumption that the majority of surroundings points are from stationary objects, LSQ offers a suitable and computationally efficient method for ego-velocity estimation [88].

Assuming the 3D spatial coordinates of a point in the 4D mmWave radar system are denoted as $𝒑_{i}$ . Its directional vector is determined as follows:

𝒓_{i} = \frac{𝒑_{i}}{‖ 𝒑_{i} ‖} .

(3)

Let us consider a scenario where point $𝒑_{i}$ is detected from a stationary object, the ideally measured Doppler velocity $v_{d, i}$ represents the projection of the ego-velocity $𝒗_{e}$ onto the line-of-sight vector connecting the 4D mmWave radar with point $𝒑_{i}$ . This velocity can be mathematically expressed as follows:

v_{d, i} = 𝒗_{e} \cdot 𝒓_{i} = v_{e, x} r_{i, x} + v_{e, y} r_{i, y} + v_{e, z} r_{i, z} .

(4)

For a set of $N$ points constituting a single frame of the point cloud, the above equation can be generalized into a matrix formulation:

[\begin{matrix} v_{d, 1} \\ v_{d, 2} \\ \dots \\ v_{d, N} \end{matrix}] = [\begin{matrix} r_{1, x} & r_{1, y} & r_{1, z} \\ r_{2, x} & r_{2, y} & r_{2, z} \\ \dots & \dots & \dots \\ r_{N, x} & r_{N, y} & r_{N, z} \end{matrix}] [\begin{matrix} v_{e, x} \\ v_{e, y} \\ v_{e, z} \end{matrix}] .

(5)

Leveraging LSQ, an optimal solution can be obtained as follows:

𝒗_{e} = [\begin{matrix} v_{e, x} \\ v_{e, y} \\ v_{e, z} \end{matrix}] = {(𝑹^{T} 𝑹)}^{- 1} 𝑹^{T} [\begin{matrix} v_{d, 1} \\ v_{d, 2} \\ \dots \\ v_{d, N} \end{matrix}],

(6)

𝑹 = [\begin{matrix} r_{1, x} & r_{1, y} & r_{1, z} \\ r_{2, x} & r_{2, y} & r_{2, z} \\ \dots & \dots & \dots \\ r_{N, x} & r_{N, y} & r_{N, z} \end{matrix}] .

(7)

Indeed, relying solely on the LSQ method may result in substantial inaccuracies, primarily due to the influence of dynamic objects and noise. To mitigate these effects, specific research endeavors have implemented techniques such as RANSAC to effectively remove dynamic points before the application of LSQ [89].

An innovative enhancement to the conventional LSQ involves the incorporation of weighting mechanisms. Galeote-Luque et al. [90] weight each point by its signal power to diminish the influence of noise. Addressing the challenge posed by dynamic objects, Zhuang et al. [7] propose a reweighted least squares method for the estimation of ego-velocity. The objective function is constructed as follows:

\min_{𝒗_{e}} \sum_{i = 1}^{n} λ_{i} ‖ v_{d, i} - 𝒓_{i} \cdot 𝒗_{e} ‖,

(8)

where $λ_{i}$ denotes the weight of the $i$ -th radar point. In the first iteration, $λ_{i} = 1$ , and $λ_{i} = 1 / (‖ v_{d, i} - 𝒓_{i} \cdot 𝒗_{e} ‖ + ϵ), ϵ = 0.00001$ . $λ_{i}$ in the subsequent iterations, which quantifies the difference between the actual Doppler velocity $v_{d, i}$ and the ideal Doppler velocity assuming the $i$ -th point originates from a stationary object,denoted as $𝒓_{i} \cdot 𝒗_{e}$ . Through iterative refinement, the weights of points from dynamic objects are progressively decrease, culminating in a more accurate computation of the ego-velocity $𝒗_{e}$ . The RCS values are then employed to weight point cloud registration residuals to reduce the impact of matches with large RCS differences [91].

V-A2 Dynamic Points Removal

Beyond ego-velocity estimation, another coherent exploration of Doppler information is to remove dynamic points, particularly leveraging the result derived from ego-velocity estimation. Zhang et al. [86] [92] apply a RANSAC-like method, while Zhuang et al. [7] utilize the weights to distinguish dynamic points in accordance with ego-velocity estimation.

V-A3 Angular Resolution Improving

The angular resolution of 4D mmWave radars is determined by virtual TX-RX pairs mentioned in Section II, yet the range and Doppler resolution are dictated by the frequency disparity between transmitted and received signals. Therefore, 4D mmWave radars typically exhibit superior performance in terms of range and Doppler resolution, as opposed to angular resolution. Cheng et al. [4] demonstrate that for two points from stationary objects, provided they share the same range and azimuth, the differences in elevation $Δ ϕ$ and Doppler $Δ v$ between the two points are interconnected. Similarly, the Doppler velocity resolution can also be converted into azimuth resolution. Hence, leveraging Doppler information can improve the angular resolution in particular azimuth and elevation ranges.

Drawing a parallel, Chen et al. [93] harness Doppler information to refine the point cloud and implement radar-inertial odometry using ground points, which exhibit stability in dynamic environments. Given that the resolution of the $z$ coordinate in point clouds is generally inferior than Doppler resolution, once a point is identified as a ground point, its $z$ coordinate can be recalculated using Doppler information and $x, y$ coordinates. Adopting a strategy inspired by RANSAC, the authors hypothesize and refine ground points, then estimate ego-velocity iteratively. The refined point clouds can subsequently be applied in other tasks such as object detection.

V-A4 Network Enhancement

In the context of learning-based SLAM, Doppler information also holds significant value.As previously discussed, Doppler velocity can serve as an indicator of whether a point originates from a stationary or dynamic object, which enlightens authors of [82] and [8] to establish a velocity-aware attention module. This module leverages Doppler information to learn attention weights, thereby distinguishing between stationary and dynamic points.

V-B Traditional Methods

Traditional SLAM refers to methods with no neural networks, and can be composed into four modules: odometry estimation, loop closure detection, global optimization and mapping. Considering the unique characteristic of 4D mmWave radars, related research mostly feature in the former two modules, and will be discussed below.

V-B1 Odometry Estimation

Odometry estimation is the core of localization and serves as a crucial component of SLAM. A substantial body of traditional research on odometry estimation has been conducted in the context of 4D mmWave radars.

Considering the inherent noise and sparsity of 4D mmWave radar point clouds, original odometry estimation research have primarily concentrates on the estimation of ego-velocity derived from the Doppler information instead of point cloud registration. Doer and Trommer make plenty of contributions to this field using Unmanned Aerial Vehicles (UAV). They fuse ego-velocity estimated by LSQ with the IMU data to perform UAV odometry estimation[88, 94], and further extend their work to multiple radars [95, 96] and radar-camera fusion systems [83]. As early investigations, these research efforts exhibit certain limitations. They rely on Manhattan world assumptions and consider the surroundings to be stationary, which may restrict the applicability in challenging outdoor scenarios.

Additionally, the ego-velocity estimated by 4D mmWave radar Doppler information has been explored by various researchers, in conjunction with additional assumptions, to achieve radar-based odometry. Ng et al. [81] present a continuous-time framework that fuse the ego-velocity from multiple radars with the measurement of an IMU. The continuity of this framework facilitates closed-form expressions for optimization and makes it well-suited for asynchronous sensor fusion. Given the relatively low elevation resolution of 4D mmWave radars in contrast to the more precise Doppler information, Chen et al.[93] propose a method to detect ground points and estimate ego velocity iteratively. Furthermore, Galeote-Luque et al. [90] combine the linear ego-velocity estimated from radar point clouds with the kinematic model of the vehicle to reconstruct the 3D motion of the vehicle.

Recent research has shifted focus from direct odometry estimation through Doppler-based ego-velocity, as seen in above studies, to point cloud registration akin to traditional LiDAR odometry. In 4D mmWave radar point cloud registration, Doppler information utilization and noise and sparsity handling are two main concerns. Michalczyk et al. [84] pioneer the realization of 3D point registration across sparse and noisy radar scans based on the classic Hungary algorithm [97]. Additionally, emerging research in 4D mmWave radar SLAM is also delving into specialized designs of point cloud registration techniques. Zhuang et al. [7] develop a 4D mmWave radar inertial odometry and mapping system named 4D iRIOM employing iterative Extended Kalman Filter (EKF). To mitigate the effects of sparsity, they introduce an innovative point cloud registration method between each scan and submap. This method accounts for the local geometry of points in the current scan and the corresponding $N$ nearest points in the submap, weighting the distances between them by their covariance to achieve a distribution-to-multi-distribution effect. The results of this approach are shown in Fig. 16. Moreover, they incorporate 4D iRIOM with GNSS and propose G-iRIOM [91], which further utilizes RCS value to weight the point cloud registration. Besides, the pose graph optimization is applied by Zhang et al.[86, 92] to construct a 4D mmWave radar SLAM system adapted from a well-known LiDAR SLAM method named hdl_graph_slam [98]. Building upon the traditional point cloud registration algorithm Generalized Iterative Closest Point (GICP)[99], they propose an adaptive probability distribution-GICP, assigning different covariance to each point according to its uncertainty inferred from the coordinate of each point, given that points at greater distances may exhibit increased uncertainty. This design considers not only the geometric distribution of neighboring points but also the spatial variance of each point. Also considering point uncertainty, Li et al [89] propose 4DRaSLAM to incorporate the probability density function of each point to develop a probability-aware Normal Distributions Transform (NDT) [100] routine for scan-to-submap point cloud registration. Notably, the ego-velocity estimated from 4D mmWave radar Doppler information is utilized as a pre-integration factor in this system to replace the role of IMU.

V-B2 Loop Closure Detection

With respect to loop closure detection, relevant inventive research remains scarce. Existing 4D mmWave radar SLAM research that involve loop closure detection [7, 89, 86] typically reference the well-known Scan Context algorithm [85]. The original Scan Context algorithm partitions a LiDAR point cloud into several bins based on the azimuth, utilizing the maximum height of the points in each bin to encode the entire point cloud into an image. However, considering the relatively low resolution of height information provided by 4D mmWave radars, maximum intensity instead of height is adapted as context for loop closure detection in these systems.

V-C Learning-based Methods

Research on learning-based 4D mmWave radar SLAM has predominantly focused on odometry estimation, replacing traditional point cloud registration and pose regression by deep networks.

As the originator, Lu et al. [87] design CNNs and Recurrent Neural Networks (RNNs) to extract the features from radar point clouds and IMU data, respectively. Subsequently, they propose a two-stage cross-modal attention mechanism to achieve feature integration. An RNN is utilized additionally to capture the long-term temporal dynamics of the system.

To make full use of Doppler information, 4DRO-Net [82] establishes a velocity-aware attention cost volume network within a coarse-to-fine hierarchical optimization framework to iteratively estimate and refine pose estimation. Global-level and point-level features are extracted to generate initial pose estimations and subsequent corrections. 4DRVO-Net [8] further extracts and fuses image features with 4D mmWave radar point cloud features. Fig. 17 displays the pipeline of 4DRVO-Net. The adaptive fusion module employs a deformable attention-based spatial cross-attention mechanism to align each 4D mmWave radar feature with corresponding image feature to achieve optimal fusion.

All these advanced learning-based radar odometry methods perform odometry estimation in an end-to-end fashion, i.e., the network ingests 4D mmWave radar point clouds (supplemented with images for fusion methods), and directly outputs odometry estimation. This benefits research on comprehensive end-to-end autonomous driving systems.

V-D Challenge

The Doppler velocity inherent in 4D mmWave radar point clouds has been recognized and utilized in SLAM to realize ego-velocity estimation and dynamic object removal, etc. However, considering adequate semantic information provided by 4D mmWave radars, such as intensity, traditional 3D registration methods like ICP may be less effective. Therefore, the exploration of learning-based methods or feature-based registration could yield more effective results. The optimal exploitation of these semantic information within the autonomous driving sphere remains a relatively open question.

Given that radar point clouds are significantly less data-heavy than their tensor counterparts, and methodologies developed for LiDARs can be adapted to 4D mmWave radar point clouds with minimal modifications, the majority of SLAM studies preferentially employ radar point clouds as their input data instead of radar tensors. But regarding mapping, the sparsity of 4D mmWave radar point clouds presents a significant challenge. A potential solution could lie in mapping and rendering the environment using 4D tensor-level data.

VI Datasets

Public datasets play an indispensable role in the advancement of 4D mmWave radar-based algorithms, as they furnish essential platforms for the development, benchmarking, and comparative analysis of diverse algorithms, thereby stimulating research in the field. This section categorizes and introduces current available datasets containing 4D mmWave radars, which are summarized in Table III.

TABLE III: 4D mmWave radar datasets

Dataset	Resolution				Total Frames	Labeled Frames	Data Formats¹	Modality²	Bounding box	Tracking ID	Odometry
Dataset	Azi.	Ele.	Range(m)	Velo.(m/s)	Total Frames	Labeled Frames	Data Formats¹	Modality²	Bounding box	Tracking ID	Odometry
Datasets for Perception
Astyx [76]	N/M³	N/M	N/M	N/M	0.5K	0.5K	RPC	RCL	3D	✓	$\times$
RADIal [64]	0.1^∘	1^∘	0.2	0.1	25K	8.3K	ADC, RAD, RPC	RCL	2D	$\times$	✓
VoD [54]	1.5^∘	1.5^∘	0.2	0.1	8.7K	8.7K	RPC	RCI	3D	✓	✓
TJ4DRadSet [25]	1^∘	1^∘	0.86	N/M	40K	7.8K	RPC	RCL	3D	✓	✓
K-Radar [65]	1^∘	1^∘	0.46	0.06	35K	35K	4DRT, RPC	RCLI	3D	✓	✓
Dual Radar [101]	1.2^∘	2^∘	0.22	N/M	50K	10K	RPC	RCL	3D	✓	$\times$
SCORP [77]	15^∘	30^∘	12	0.33	3.9K	3.9K	ADC, RAD, RPC	RC	$\times$	$\times$	✓
Radatron [41]	1.2^∘	18^∘	0.05	N/M	152K	16K	RA	RC	2D	$\times$	$\times$
Datasets for SLAM⁴
Coloradar [19]	1^∘	22.5^∘	0.12	0.25	108K	-	ADC, RAE, RPC	RLI	$\times$	$\times$	✓
MSC-RAD4R [2]	1^∘	0.5^∘	0.86	0.27	90K	-	RPC	RCLI	$\times$	$\times$	✓
NTU4DRadLM [102]	0.5^∘	0.1^∘	0.86	N/M	61K	-	RPC	RCLIT	$\times$	$\times$	✓

•

¹ADC: raw radar data after Analog-to-Digital Converter; RA: Range-Azimuth map; RD: Range-Doppler map; RAD: Range-Azimuth-Doppler cube;
RAE: Range-Azimuth-Elevation cube; 4DRT: Range-Azimuth-Elevation-Doppler Tensor; RPC: Radar Point Cloud;
•

²R: Radar, C:Camera, L:LiDAR, I:IMU(Inertial Measurement Unit), T:Thermal Camera
•
•

⁴ Datasets designed only for SLAM contain no labels like bounding box and tracking ID.

VI-A Datasets for Perception

Datasets for 4D mmWave radar perception typically include 3D (or 2D) bounding boxes for object detection tasks, and tracking ID for object tracking tasks. Astyx [76] represents the first 4D mmWave radar dataset. It consists of 500 synchronized frames (radar, LiDAR, camera) encompassing approximately 3,000 annotated 3D object annotations. As a pioneer dataset in the realm of 4D mmWave radars, the volume of data in Astyx is relatively limited.

In order to facilitate researchers in handling radar data in a more fundamental manner, the RADIal dataset [64] records raw radar data after Analog-to-Digital Converter, which serves as the foundation for generating various conventional radar representations, such as radar tensors and point clouds. Given that the raw ADC data is not interpretable by human eyes, the annotations in the RADIal dataset are presented as 2D bounding boxes in the image plane.

To advance 4D mmWave radar-based multi-class 3D road user detection, the VoD dataset [54] is collected comprising LiDAR, camera, and 4D mmWave radar data. It contains 8693 frames of data captured in complex urban traffic scenarios, and includes 123106 3D bounding box annotations of both stationary and dynamic objects and tracking IDs for each annotated object.

In a similar vein, the Tj4DRadSet dataset [25] comprises 44 consecutive sequences, totaling 7757 synchronized frames, well-labeled with 3D bounding boxes and trajectory IDs. Notably, the TJ4DRadSet dataset offers occlusion and truncation indicators in each object to distinguish between different levels of detection difficulty. Unlike the VoD dataset, Tj4DRadSet is characterized by its inclusion of a broader and more challenging array of driving scenario clips, such as urban roads, highways, and industrial parks.

To the best of our knowledge, K-Radar dataset[65] currently contains most diverse scenarios in 4D mmWave radar datasets as it collects 35k frame under a variety of weather conditions, including sunny, foggy, rainy, and snowy. K-Radar not only provides 4D mmWave radar data but also includes high-resolution LiDAR point clouds, surround RGB imagery from four stereo cameras, RTK-GPS and IMU data from the ego-vehicle. Fig. 18 illustrates the comparison of different modality sensors across different weather conditions. It is worth mentioning that K-radar is currently the only dataset that provides range-azimuth-elevation-Doppler tensors.

However, each of the datasets mentioned above contains only one type of radar, making it challenging for researchers to analyze and compare the performance of different 4D mmWave radars. The recently unveiled Dual Radar dataset[101], as illustrated in Figure 19, encompasses two distinct types of mmWave radars, the Arbe Phoenix and the ARS548 RDI radar. Dual Radar enables an investigation into the impact of different sparsity levels in radar point clouds on object detection performance, providing assistance in the selection of radar products.

In addition, several datasets enumerated in Table III provide radar data with relatively low angular resolution [77, 41]. These datasets, which are not detailed within the scope of this paper, provide alternative radar data characteristics that may be leveraged in different research contexts.

VI-B Datasets for SLAM

As its ability of perception in severe environments, the 4D mmWave radar enhances robust localization in difficult conditions, hence leading to the release of several 4D mmWave radar datasets specifically designed for localization, mapping and SLAM. Besides, any above dataset containing odometry information, not just those designed for SLAM, can also be employed for SLAM.

ColoRadar [19] comprising approximately 2 hours of data from radar, LiDAR, and the 6-DOF pose ground truth. It provides radar data of three processing levels: raw ADC data, 3D range-azimuth-elevation tensors derived by compressing the Doppler dimension of 4D radar tensors, and radar point clouds. This dataset collects data from a variety of unique indoor and outdoor environments, thus providing a diverse spectrum of sensor data.

Considering SLAM in severe environments, the MSC-RAD4R dataset [2] records data under a wide range of environmental conditions, with the same route yielding data from both clear and snowy weather for comparison. Additionally, MSC-RAD4R introduces artificially generated smoke environment data generated by a smoke machine, further emphasizing the robust capabilities of 4D mmWave radars.

The NTU4DRadLM dataset [102] is a recent contribution to this field captured using both robotic and vehicular platforms. Distinguished from its predecessors, NTU4DRadLM delivers an extensive array of localization-related sensor data, including the 4D mmWave radar, LiDAR, camera, IMU, GPS, and even a thermal camera. Furthermore, it encompasses a wide range of road conditions, encompassing structured, semi-structured, and unstructured roads, spanning both small-scale environments (e.g., gardens) and large-scale urban settings.

VI-C Challenge

Taking into account the datasets mentioned above, it becomes apparent that there is a lack of labeled radar ADC data required for deep radar detection. To address this deficiency, using synthetic data generation techniques based on 4D mmWave radar sensor models can be considered, though radar modeling is challenging due to multiple effects of radar signal processing like multi-path reflections and signal interference.

Moreover, the scales of current 4D mmWave radar datasets far below other famous autonomous driving datasets such as nuScenes[103] and ONCE[104]. For the evaluation of algorithmic generalizability and for facilitating comparative analysis with other sensor modalities, large scale 4D mmWave datasets are indispensable in the future.

VII Future Trends

4D mmWave radars have the potential to bring about transformative advancements in the field of autonomous vehicles. Nonetheless, it is far from mature at the moment. The prospective evolution of 4D mmWave radar technology in autonomous driving is likely to be contingent upon advancements in several key domains.

VII-A Noise and Sparsity Handling

Despite the superior resolution of 4D mmWave radars compared to traditional 3D radars, factors such as antenna design, power, and multi-path effects still lead to significant noise and sparsity issues, impacting the safety of autonomous driving applications.

VII-A1 Radar Data Generation

In recent years, the computer vision field has seen numerous studies on image super-resolution, and the related theories and models can be transferred to the mmWave radar domain for application. Learning-based methods show promise for generating higher-resolution radar data, such as replacing traditional signal processing steps like CFAR and DBF, is a promising direction. However, considering the large spectral data volume of mmWave radar (a single frame 4D radar tensor in the K-RADAR dataset [65] is around 200MB), related researches still necessitates improvements in real-time processing and addressing high bandwidth requirements for pre-CFAR data transmission. Compared to Transformer-based methods with quadratic time complexity, we believe that models using space state models [105, 106] with linear complexity will have better prospects in this field.

VII-A2 Application Algorithms Redesign

Current mmWave radar perception algorithms often construct dense BEV features using pillars or cylinders, which may not be optimal for noisy, lower-resolution mmWave radar data. We believe a more efficient form is to organize object features in a sparse query manner and aggregate sensor data features using attention mechanisms, a paradigm that has been verified in computer vision[107, 108, 109].

Additionally, the ’detection by tracking’ strategy[50] is also noteworthy. By leveraging Doppler information, a key feature distinguishing mmWave radar from LiDAR, dynamic information like scene flow can be estimated first. Temporal information can then be used for feature denoising and enhancement, followed by object detection head. We believe this paradigm has the potential to improve multi-object detection and tracking accuracy in complex dynamic autonomous driving scenarios.

VII-B Specialized Information Utilizing

Compared with LiDAR, the 4D mmWave radar is characterized by its specialized information, such as pre-CFAR data and Doppler information. Their comprehensive utilization holds great importance in the competition between 4D mmWave radar and LiDAR technologies.

VII-B1 Pre-CFAR Data

Regarding the distinctive data formats throughout the 4D mmWave radar signal processing workflow before CFAR, such as raw ADC data, RD maps, and 4D tensors, their utilization for perception and SLAM tasks represents an interesting yet largely unexplored area of research. The development of learning-based models that can effectively leverage the information contained within these data formats, while maintaining satisfactory real-time performance, could potentially emerge as a focal point in future research endeavors.

VII-B2 Doppler Information

The velocity measurement ability based on Doppler effect makes the 4D mmWave radar a unique sensor. Separating point clouds from static and dynamic empowers applications such as object detection, semantic segmentation, localization, and so on. Existing research has already made certain explorations into the utilization of Doppler information, but there are still great potential of it. For example, Doppler-based multiple object tracking may achieve higher accuracy than traditional methods. Besides, Doppler information is also an indispensable feature in network designing. Related architectures such as Doppler-based attention can promote better feature extraction of the scene.

VII-C Dataset Enriching

As with all other data-driven research domains, datasets pertaining to 4D mmWave radars play a significant role in facilitating related studies. Though as Section VI illustrates, there are already several datasets dedicated in 4D mmWave radar, the scale of each dataset and the standardization of 4D mmWave radars are still two main concerns about dataset.

VII-C1 Scale Expansion

The largest 4D mmWave radar dataset contains only 152K labeled frames [41]. As a comparison, there are 1.4M and 7M frames of image in nuscenes[103] and ONCE [104] dataset, respectively. To ensure the generalizability of 4D mmWave radar algorithms, dataset scale expansion is non-negligible.

VII-C2 Standardization

The 4D mmWave radar is industry just emerging recently, with many manufacturers being newly established startups. This brings about the problem of the type standardization of 4D mmWave radars. Existing datasets contains varying types of 4D mmWave radars with different parameters such as angular resolution and the largest detection range, which hinders the cross testing in different datasets. Therefore, the type standardization of 4D mmWave radars is also of great necessity.

VII-D Tasks Exploring

Although the integration of 4D mmWave radar into autonomous driving systems has shown promising advancements, we still find certain critical applications have not yet been extensively explored.

VII-D1 Scene Reconstruction and Generation

Scene reconstruction and generation are pivotal in synthesizing realistic scenarios from actual vehicular data, allowing for object manipulation within these scenarios to generate new data sets. These techniques are instrumental in producing a substantial amount of realistic data, accelerating testing cycles, and greatly alleviate the long-tail data challenges faced by autonomous driving. In recent years, numerous algorithms for scene reconstruction and generation tasks based on Neural Radiance Field(NeRF)[110] or 3D Gaussian Splatting[111] have emerged within the visual domain, but research integrating these methods with 4D mmWave radar remains sparse. The primary challenges in adapting these visual-based methods to mmWave radar technology include their lack of sensitivity to the electromagnetic wave reflection properties of different materials, resulting in discrepancies between the synthetically rendered point clouds and those captured by actual mmWave radar sensors.

VII-D2 4D Occupancy Prediction

Occupancy Prediction is an emerging task aimed at detecting irregularly-shaped and out-of-vocabulary objects, providing detailed occupancy states and semantic information for each spatial grid. However, existing methods based on vision or LiDAR [112, 113, 114] lack consideration of target velocity, making them difficult to apply to downstream decision-making and planning processes in autonomous driving. The incorporation of the Doppler effect, a feature inherent to mmWave radar, presents a novel opportunity. By harnessing the Doppler dimension, 4D mmWave radar could simultaneously estimate occupancy, semantic attributes, and velocity of objects within spatial grids, offering a comprehensive scene discription that could significantly enhance the performance and reliability of autonomous driving systems.

VIII Conclusion

This paper offers a comprehensive overview of the role and potential of 4D mmWave radars in autonomous driving. It sequentially delves into the background theory, learning-based data generation methods, application algorithms in perception and SLAM, and related datasets. Furthermore, it casts a forward-looking gaze towards future trends and potential avenues for innovation in this rapidly evolving field. The exploration of 4D mmWave radars within the scope of autonomous driving is an ongoing endeavor. This comprehensive review serves as both a primer for those new to the field and a resource for experienced researchers, offering insights into the current state of the art and highlighting the potential for future developments.

References

[1] M. Jiang, G. Xu, H. Pei, Z. Feng, S. Ma, H. Zhang, and W. Hong, “4D High-Resolution Imagery of Point Clouds for Automotive mmWave Radar,” IEEE Transactions on Intelligent Transportation Systems, pp. 1–15, 2023.
[2] M. Choi, S. Yang, S. Han, Y. Lee, M. Lee, K. H. Choi, and K.-S. Kim, “MSC-RAD4R: ROS-Based Automotive Dataset With 4D Radar,” IEEE Robotics and Automation Letters, pp. 1–8, 2023.
[3] D. Brodeski, I. Bilik, and R. Giryes, “Deep Radar Detector,” in 2019 IEEE Radar Conference (RadarConf), Apr. 2019, pp. 1–6.
[4] Y. Cheng, J. Su, M. Jiang, and Y. Liu, “A Novel Radar Point Cloud Generation Method for Robot Environment Perception,” IEEE Transactions on Robotics, vol. 38, no. 6, pp. 3754–3773, Dec. 2022.
[5] Q. Yan and Y. Wang, “MVFAN: Multi-View Feature Assisted Network for 4D Radar Object Detection,” Oct. 2023.
[6] J. Liu, Q. Zhao, W. Xiong, T. Huang, Q.-L. Han, and B. Zhu, “SMURF: Spatial Multi-Representation Fusion for 3D Object Detection with 4D Imaging Radar,” IEEE Transactions on Intelligent Vehicles, pp. 1–14, 2023.
[7] Y. Zhuang, B. Wang, J. Huai, and M. Li, “4D iRIOM: 4D Imaging Radar Inertial Odometry and Mapping,” IEEE Robotics and Automation Letters, vol. 8, no. 6, pp. 3246–3253, Jun. 2023.
[8] G. Zhuo, S. Lu, H. Zhou, L. Zheng, and L. Xiong, “4DRVO-Net: Deep 4D Radar-Visual Odometry Using Multi-Modal and Multi-Scale Adaptive Fusion,” Aug. 2023.
[9] I. Bilik, O. Longman, S. Villeval, and J. Tabrikian, “The Rise of Radar for Autonomous Vehicles: Signal Processing Solutions and Future Research Directions,” IEEE Signal Processing Magazine, vol. 36, no. 5, pp. 20–31, Sep. 2019.
[10] A. Venon, Y. Dupuis, P. Vasseur, and P. Merriaux, “Millimeter Wave FMCW RADARs for Perception, Recognition and Localization in Automotive Applications: A Survey,” IEEE Transactions on Intelligent Vehicles, vol. 7, no. 3, pp. 533–555, Sep. 2022.
[11] K. Harlow, H. Jang, T. D. Barfoot, A. Kim, and C. Heckman, “A New Wave in Robotics: Survey on Recent mmWave Radar Applications in Robotics,” May 2023.
[12] T. Zhou, M. Yang, K. Jiang, H. Wong, and D. Yang, “MMW Radar-Based Technologies in Autonomous Driving: A Review,” Sensors, vol. 20, no. 24, p. 7283, Dec. 2020.
[13] Z. Wei, F. Zhang, S. Chang, Y. Liu, H. Wu, and Z. Feng, “MmWave Radar and Vision Fusion for Object Detection in Autonomous Driving: A Review,” Sensors, vol. 22, no. 7, p. 2542, Jan. 2022.
[14] L. Fan, J. Wang, Y. Chang, Y. Li, Y. Wang, and D. Cao, “4D mmWave Radar for Autonomous Driving Perception: A Comprehensive Survey,” IEEE Transactions on Intelligent Vehicles, pp. 1–15, 2024.
[15] J. Liu, G. Ding, Y. Xia, J. Sun, T. Huang, L. Xie, and B. Zhu, “Which Framework is Suitable for Online 3D Multi-Object Tracking for Autonomous Driving with Automotive 4D Imaging Radar?” in IV 2024, Apr. 2024.
[16] S. Abdulatif, Q. Wei, F. Aziz, B. Kleiner, and U. Schneider, “Micro-doppler based human-robot classification using ensemble and deep learning approaches,” in 2018 IEEE Radar Conference (RadarConf18), Apr. 2018, pp. 1043–1048.
[17] Y. Cheng, J. Su, H. Chen, and Y. Liu, “A New Automotive Radar 4D Point Clouds Detector by Using Deep Learning,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun. 2021, pp. 8398–8402.
[18] H. Rohling, “Radar CFAR Thresholding in Clutter and Multiple Target Situations,” IEEE Transactions on Aerospace and Electronic Systems, vol. AES-19, no. 4, pp. 608–621, Jul. 1983.
[19] A. Kramer, K. Harlow, C. Williams, and C. Heckman, “ColoRadar: The direct 3D millimeter wave radar dataset,” The International Journal of Robotics Research, vol. 41, no. 4, pp. 351–360, Apr. 2022.
[20] A. Och, C. Pfeffer, J. Schrattenecker, S. Schuster, and R. Weigel, “A Scalable 77 GHz Massive MIMO FMCW Radar by Cascading Fully-Integrated Transceivers,” in 2018 Asia-Pacific Microwave Conference (APMC), Nov. 2018, pp. 1235–1237.
[21] R. Q. Charles, H. Su, M. Kaichun, L. J. Guibas, P. Ritter, M. Geyer, T. Gloekler, X. Gai, T. Schwarzenberger, G. Tretter, Y. Yu, and G. Vogel, “A Fully Integrated 78 GHz Automotive Radar System-an-Chip in 22nm FD-SOI CMOS,” in 2020 17th European Radar Conference (EuRAD), Jan. 2021, pp. 57–60.
[22] J. Jiang, Y. Li, L. Zhao, and X. Liu, “Wideband MIMO Directional Antenna Array with a Simple Meta-material Decoupling Structure for X-Band Applications,” The Applied Computational Electromagnetics Society Journal (ACES), pp. 556–566, May 2020.
[23] Z. Wu, L. Zhang, and H. Liu, “Generalized Three-Dimensional Imaging Algorithms for Synthetic Aperture Radar With Metamaterial Apertures-Based Antenna,” IEEE Access, vol. 7, pp. 59 716–59 727, 2019.
[24] H.-W. Cho, W. Kim, S. Choi, M. Eo, S. Khang, and J. Kim, “Guided Generative Adversarial Network for Super Resolution of Imaging Radar,” in 2020 17th European Radar Conference (EuRAD). New York: Ieee, Jan. 2021, pp. 144–147.
[25] L. Zheng, Z. Ma, X. Zhu, B. Tan, S. Li, K. Long, W. Sun, S. Chen, L. Zhang, M. Wan, L. Huang, and J. Bai, “TJ4DRadSet: A 4D Radar Dataset for Autonomous Driving,” in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Oct. 2022, pp. 493–498.
[26] J. Domhof, J. F. P. Kooij, and D. M. Gavrila, “A Joint Extrinsic Calibration Tool for Radar, Camera and Lidar,” IEEE Transactions on Intelligent Vehicles, vol. 6, no. 3, pp. 571–582, Sep. 2021.
[27] L. Cheng, A. Sengupta, and S. Cao, “3D Radar and Camera Co-Calibration: A flexible and Accurate Method for Target-based Extrinsic Calibration,” in 2023 IEEE Radar Conference (RadarConf23), May 2023, pp. 1–6.
[28] Y. Bao, T. Mahler, A. Pieper, A. Schreiber, and M. Schulze, “Motion Based Online Calibration for 4D Imaging Radar in Autonomous Driving Applications,” in 2020 German Microwave Conference (GeMiC), Mar. 2020, pp. 108–111.
[29] E. Wise, J. Persic, C. Grebe, I. Petrovic, and J. Kelly, “A Continuous-Time Approach for 3D Radar-to-Camera Extrinsic Calibration,” in 2021 IEEE International Conference on Robotics and Automation (ICRA), May 2021, pp. 13 164–13 170.
[30] A. Dhall, K. Chelani, V. Radhakrishnan, and K. M. Krishna, “LiDAR-camera calibration using 3D-3D point correspondences,” arXiv preprint arXiv:1705.09785, 2017.
[31] Z. Pusztai and L. Hajder, “Accurate calibration of LiDAR-camera systems using ordinary boxes,” in Proceedings of the IEEE International Conference on Computer Vision Workshops, 2017, pp. 394–402.
[32] C. Schöller, M. Schnettler, A. Krämmer, G. Hinz, M. Bakovic, M. Güzet, and A. Knoll, “Targetless Rotational Auto-Calibration of Radar and Camera for Intelligent Transportation Systems,” Jul. 2019.
[33] Y. Sun, H. Zhang, Z. Huang, and B. Liu, “R2P: A Deep Learning Model from mmWave Radar to Point Cloud,” in Lecture Notes in Computer Science, ser. Lecture Notes in Computer Science, E. Pimenidis, P. Angelov, C. Jayne, A. Papaleonidas, and M. Aydin, Eds. Cham: Springer International Publishing, 2022, pp. 329–341.
[34] R. Q. Charles, H. Su, M. Kaichun, L. J. Guibas, P. Ritter, M. Geyer, T. Gloekler, X. Gai, T. Schwarzenberger, G. Tretter, Y. Yu, and G. Vogel, “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation,” in 2020 17th European Radar Conference (EuRAD). Honolulu, HI: IEEE, Jan. 2021, pp. 77–85.
[35] W. Yuan, T. Khot, D. Held, C. Mertz, and M. Hebert, “PCN: Point Completion Network,” in 2018 International Conference on 3D Vision (3DV), Sep. 2018, pp. 728–737.
[36] Y. Sun, Z. Huang, H. Zhang, Z. Cao, and D. Xu, “3DRIMR: 3D Reconstruction and Imaging via mmWave Radar based on Deep Learning,” in 2021 IEEE International Performance, Computing, and Communications Conference (IPCCC), Oct. 2021, pp. 1–8.
[37] Y. Sun, Z. Huang, H. Zhang, and X. Liang, “3D Reconstruction of Multiple Objects by mmWave Radar on UAV,” in 2022 IEEE 19th International Conference on Mobile Ad Hoc and Smart Systems (MASS), Oct. 2022, pp. 491–495.
[38] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space,” in Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc., 2017.
[39] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Lecture Notes in Computer Science, ser. Lecture Notes in Computer Science, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds. Cham: Springer International Publishing, 2015, pp. 234–241.
[40] P. Luc, C. Couprie, S. Chintala, and J. Verbeek, “Semantic Segmentation using Adversarial Networks,” in NIPS Workshop on Adversarial Training, 2016.
[41] S. Madani, J. Guan, W. Ahmed, S. Gupta, and H. Hassanieh, “Radatron: Accurate Detection Using Multi-resolution Cascaded MIMO Radar,” in Lecture Notes in Computer Science, ser. Lecture Notes in Computer Science, S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, and T. Hassner, Eds. Cham: Springer Nature Switzerland, 2022, pp. 160–178.
[42] I. Orr, M. Cohen, and Z. Zalevsky, “High-resolution radar road segmentation using weakly supervised learning,” Nature Machine Intelligence, vol. 3, no. 3, pp. 239–246, Mar. 2021.
[43] U. Chipengo, “High Fidelity Physics-Based Simulation of a 512-Channel 4D-Radar Sensor for Automotive Applications,” IEEE Access, vol. 11, pp. 15 242–15 251, 2023.
[44] B. Tan, L. Zheng, Z. Ma, J. Bai, X. Zhu, and L. Huang, “Learning-based 4D Millimeter Wave Automotive Radar Sensor Model Simulation for Autonomous Driving Scenarios,” in 2023 7th International Conference on Machine Vision and Information Technology (CMVIT), Mar. 2023, pp. 123–128.
[45] L. Zheng, S. Li, B. Tan, L. Yang, S. Chen, L. Huang, J. Bai, X. Zhu, and Z. Ma, “RCFusion: Fusing 4-D Radar and Camera With Bird’s-Eye View Features for 3-D Object Detection,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–14, 2023.
[46] B. Tan, Z. Ma, X. Zhu, S. Li, L. Zheng, S. Chen, L. Huang, and J. Bai, “3D Object Detection for Multiframe 4D Automotive Millimeter-Wave Radar Point Cloud,” IEEE Sensors Journal, vol. 23, no. 11, pp. 11 125–11 138, Jun. 2023.
[47] B. Xu, X. Zhang, L. Wang, X. Hu, Z. Li, S. Pan, J. Li, and Y. Deng, “RPFA-Net: A 4D RaDAR Pillar Feature Attention Network for 3D Object Detection,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Sep. 2021, pp. 3061–3066.
[48] H. Cui, J. Wu, J. Zhang, G. Chowdhary, and W. R. Norris, “3D Detection and Tracking for On-road Vehicles with a Monovision Camera and Dual Low-cost 4D mmWave Radars,” in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Sep. 2021, pp. 2931–2937.
[49] L. Wang, X. Zhang, J. Li, B. Xv, R. Fu, H. Chen, L. Yang, D. Jin, and L. Zhao, “Multi-Modal and Multi-Scale Fusion 3D Object Detection of 4D Radar and LiDAR for Autonomous Driving,” IEEE Transactions on Vehicular Technology, vol. 72, no. 5, pp. 5628–5641, May 2023.
[50] Z. Pan, F. Ding, H. Zhong, and C. X. Lu, “Moving Object Detection and Tracking with 4D Radar Point Cloud,” Sep. 2023.
[51] B. Tan, Z. Ma, X. Zhu, S. Li, L. Zheng, L. Huang, and J. Bai, “Tracking of Multiple Static and Dynamic Targets for 4D Automotive Millimeter-Wave Radar Point Cloud in Urban Environments,” Remote Sensing, vol. 15, no. 11, p. 2923, Jan. 2023.
[52] F. Ding, Z. Pan, Y. Deng, J. Deng, and C. X. Lu, “Self-Supervised Scene Flow Estimation With 4-D Automotive Radar,” IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 8233–8240, Jul. 2022.
[53] F. Ding, A. Palffy, D. M. Gavrila, and C. X. Lu, “Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 9340–9349.
[54] A. Palffy, E. Pool, S. Baratam, J. F. P. Kooij, and D. M. Gavrila, “Multi-Class Road User Detection With 3+1D Radar in the View-of-Delft Dataset,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 4961–4968, Apr. 2022.
[55] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “PointPillars: Fast Encoders for Object Detection From Point Clouds,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2019, pp. 12 689–12 697.
[56] T. Yin, X. Zhou, and P. Krahenbuhl, “Center-based 3D Object Detection and Tracking,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2021, pp. 11 779–11 788.
[57] J. Li, C. Luo, and X. Yang, “PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2023, pp. 17 567–17 576.
[58] W. Xiong, J. Liu, T. Huang, Q.-L. Han, Y. Xia, and B. Zhu, “LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion,” Aug. 2023.
[59] W. Shi, Z. Zhu, K. Zhang, H. Chen, Z. Yu, and Y. Zhu, “SMIFormer: Learning Spatial Feature Representation for 3D Object Detection from 4D Imaging Radar via Multi-View Interactive Transformers,” Sensors, vol. 23, no. 23, p. 9429, Jan. 2023.
[60] X. Chen, T. Zhang, Y. Wang, Y. Wang, and H. Zhao, “FUTR3D: A Unified Sensor Fusion Framework for 3D Detection,” Apr. 2023.
[61] Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. L. Rus, and S. Han, “BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird’s-Eye View Representation,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), May 2023, pp. 2774–2781.
[62] M. Meyer and G. Kuschk, “Deep Learning Based 3D Object Detection for Automotive Radar and Camera,” in 2019 16th European Radar Conference (EuRAD), Oct. 2019, pp. 133–136.
[63] L. Wang, X. Zhang, B. Xv, J. Zhang, R. Fu, X. Wang, L. Zhu, H. Ren, P. Lu, J. Li, and H. Liu, “InterFusion: Interaction-based 4D Radar and LiDAR Fusion for 3D Object Detection,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2022, pp. 12 247–12 253.
[64] J. Rebut, A. Ouaknine, W. Malik, and P. Perez, “Raw High-Definition Radar for Multi-Task Learning,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2022, pp. 17 000–17 009.
[65] D.-H. Paek, S.-H. Kong, and K. T. Wijaya, “K-Radar: 4D Radar Object Detection for Autonomous Driving in Various Weather Conditions,” in Thirty-Sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022.
[66] J. Giroux, M. Bouchard, and R. Laganiere, “T-FFTRadNet: Object Detection with Swin Vision Transformers from Raw ADC Radar Signals,” Mar. 2023.
[67] B. Yang, I. Khatri, M. Happold, and C. Chen, “ADCNet: Learning from Raw Radar Data via Distillation,” Dec. 2023.
[68] T. Boot, N. Cazin, W. Sanberg, and J. Vanschoren, “Efficient-DASH: Automated Radar Neural Network Design Across Tasks and Datasets,” in 2023 IEEE Intelligent Vehicles Symposium (IV), Jun. 2023, pp. 1–7.
[69] Y. Dalbah, J. Lahoud, and H. Cholakkal, “TransRadar: Adaptive-Directional Transformer for Real-Time Multi-View Radar Semantic Segmentation,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 353–362.
[70] Y. Jin, A. Deligiannis, J.-C. Fuentes-Michel, and M. Vossiek, “Cross-Modal Supervision-Based Multitask Learning With Automotive Radar Raw Data,” IEEE Transactions on Intelligent Vehicles, vol. 8, no. 4, pp. 3012–3025, Apr. 2023.
[71] Y. Liu, F. Wang, N. Wang, and Z. Zhang, “Echoes Beyond Points: Unleashing the Power of Raw Radar Data in Multi-modality Fusion,” in Thirty-Seventh Conference on Neural Information Processing Systems, Nov. 2023.
[72] D.-H. Paek, S.-H. Kong, and K. T. Wijaya, “Enhanced K-Radar: Optimal Density Reduction to Improve Detection Performance and Accessibility of 4D Radar Tensor-based Object Detection,” in 2023 IEEE Intelligent Vehicles Symposium (IV), Jun. 2023, pp. 1–6.
[73] Y. Yan, Y. Mao, and B. Li, “SECOND: Sparsely Embedded Convolutional Detection,” Sensors, vol. 18, no. 10, p. 3337, Oct. 2018.
[74] X. Weng, J. Wang, D. Held, and K. Kitani, “3D Multi-Object Tracking: A Baseline and New Evaluation Metrics,” in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2020, pp. 10 359–10 366.
[75] M. L. Puri and C. R. Rao, “Augmenting Shapiro-Wilk Test for Normality,” in Contribution to Applied Statistics, W. J. Ziegler, Ed. Basel: Birkhäuser Basel, 1976, vol. 22, pp. 129–139.
[76] M. Meyer and G. Kuschk, “Automotive Radar Dataset for Deep Learning Based 3D Object Detection,” in 2019 16th European Radar Conference (EuRAD), Oct. 2019, pp. 129–132.
[77] F. E. Nowruzi, D. Kolhatkar, P. Kapoor, F. Al Hassanat, E. J. Heravi, R. Laganiere, J. Rebut, and W. Malik, “Deep Open Space Segmentation using Automotive Radar,” in 2020 IEEE MTT-S International Conference on Microwaves for Intelligent Mobility (ICMIM), Nov. 2020, pp. 1–4.
[78] A. Zhang, F. E. Nowruzi, and R. Laganiere, “RADDet: Range-Azimuth-Doppler based Radar Object Detection for Dynamic Road Users,” in 2021 18th Conference on Robots and Vision (CRV), vol. 19, May 2021, pp. 95–102.
[79] J. Ku, M. Mozifian, J. Lee, A. Harakeh, and S. L. Waslander, “Joint 3D Proposal Generation and Object Detection from View Aggregation,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2018, pp. 1–8.
[80] A. Valada, R. Mohan, and W. Burgard, “Self-Supervised Model Adaptation for Multimodal Semantic Segmentation,” International Journal of Computer Vision, vol. 128, no. 5, pp. 1239–1285, May 2020.
[81] Y. Z. Ng, B. Choi, R. Tan, and L. Heng, “Continuous-time Radar-inertial Odometry for Automotive Radars,” Jan. 2022.
[82] S. Lu, G. Zhuo, L. Xiong, X. Zhu, L. Zheng, Z. He, M. Zhou, X. Lu, and J. Bai, “Efficient Deep-Learning 4D Automotive Radar Odometry Method,” IEEE Transactions on Intelligent Vehicles, pp. 1–15, 2023.
[83] C. Doer and G. F. Trommer, “Radar Visual Inertial Odometry and Radar Thermal Inertial Odometry: Robust Navigation even in Challenging Visual Conditions,” in Gyroscopy and Navigation, Sep. 2021, pp. 331–338.
[84] J. Michalczyk, R. Jung, and S. Weiss, “Tightly-Coupled EKF-Based Radar-Inertial Odometry,” in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Kyoto, Japan: IEEE, Oct. 2022, pp. 12 336–12 343.
[85] G. Kim and A. Kim, “Scan Context: Egocentric Spatial Descriptor for Place Recognition Within 3D Point Cloud Map,” in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid: IEEE, Oct. 2018, pp. 4802–4809.
[86] J. Zhang, H. Zhuge, Z. Wu, G. Peng, M. Wen, Y. Liu, and D. Wang, “4DRadarSLAM: A 4D Imaging Radar SLAM System for Large-scale Environments based on Pose Graph Optimization,” in 2023 IEEE International Conference on Robotics and Automation (ICRA), May 2023, pp. 8333–8340.
[87] C. X. Lu, M. R. U. Saputra, P. Zhao, Y. Almalioglu, P. P. B. de Gusmao, C. Chen, K. Sun, N. Trigoni, and A. Markham, “milliEgo: Single-chip mmWave Radar Aided Egomotion Estimation via Deep Sensor Fusion,” Oct. 2020.
[88] C. Doer and G. F. Trommer, “An EKF Based Approach to Radar Inertial Odometry,” in 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Sep. 2020, pp. 152–159.
[89] X. Li, H. Zhang, and W. Chen, “4D Radar-Based Pose Graph SLAM With Ego-Velocity Pre-Integration Factor,” IEEE Robotics and Automation Letters, vol. 8, no. 8, pp. 5124–5131, Aug. 2023.
[90] A. Galeote-Luque, V. Kubelka, M. Magnusson, J.-R. Ruiz-Sarmiento, and J. Gonzalez-Jimenez, “Doppler-only Single-scan 3D Vehicle Odometry,” Oct. 2023.
[91] B. Wang, Y. Zhuang, and N. El-Bendary, “4D RADAR/IMU/GNSS INTEGRATED POSITIONING AND MAPPING FOR LARGE-SCALE ENVIRONMENTS,” The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLVIII-1/W2-2023, pp. 1223–1228, Dec. 2023.
[92] J. Zhang, R. Xiao, H. Li, Y. Liu, X. Suo, C. Hong, Z. Lin, and D. Wang, “4DRT-SLAM: Robust SLAM in Smoke Environments using 4D Radar and Thermal Camera based on Dense Deep Learnt Features,” in 10th IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and the 10th IEEE International Conference on Robotics, Automation and Mechatronics (RAM), Jun. 2023.
[93] H. Chen, Y. Liu, and Y. Cheng, “DRIO: Robust Radar-Inertial Odometry in Dynamic Environments,” IEEE Robotics and Automation Letters, vol. 8, no. 9, pp. 5918–5925, Sep. 2023.
[94] C. Doer and G. F. Trommer, “Yaw aided Radar Inertial Odometry using Manhattan World Assumptions,” in 2021 28th Saint Petersburg International Conference on Integrated Navigation Systems (ICINS), May 2021, pp. 1–9.
[95] ——, “X-RIO: Radar Inertial Odometry with Multiple Radar Sensors and Yaw Aiding,” Gyroscopy and Navigation, vol. 12, no. 4, pp. 329–339, Dec. 2021.
[96] C. Doer, J. Atman, and G. F. Trnmmer, “GNSS aided Radar Inertial Odometry for UAS Flights in Challenging Conditions,” in 2022 IEEE Aerospace Conference (AERO), Mar. 2022, pp. 1–10.
[97] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, Mar. 1955.
[98] K. Koide, J. Miura, E. Menegatti, H.-W. Cho, W. Kim, S. Choi, M. Eo, S. Khang, and J. Kim, “A portable three-dimensional LIDAR-based system for long-term and wide-area people behavior measurement,” International Journal of Advanced Robotic Systems, vol. 16, no. 2, p. 172988141984153, Mar. 2019.
[99] A. Segal, D. Haehnel, and S. Thrun, “Generalized-ICP,” in Robotics: Science and Systems V, vol. 2. Seattle, WA, 2009, p. 435.
[100] P. Biber and W. Strasser, “The normal distributions transform: A new approach to laser scan matching,” in Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453), vol. 3. Las Vegas, Nevada, USA: IEEE, 2003, pp. 2743–2748.
[101] X. Zhang, L. Wang, J. Chen, C. Fang, L. Yang, Z. Song, G. Yang, Y. Wang, X. Zhang, J. Li, Z. Li, Q. Yang, Z. Zhang, and S. S. Ge, “Dual Radar: A Multi-modal Dataset with Dual 4D Radar for Autonomous Driving,” Nov. 2023.
[102] J. Zhang, H. Zhuge, Y. Liu, G. Peng, Z. Wu, H. Zhang, Q. Lyu, H. Li, C. Zhao, D. Kircali, S. Mharolkar, X. Yang, S. Yi, Y. Wang, and D. Wang, “NTU4DRadLM: 4D Radar-centric Multi-Modal Dataset for Localization and Mapping,” Sep. 2023.
[103] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuScenes: A Multimodal Dataset for Autonomous Driving,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA, USA: IEEE, Jun. 2020, pp. 11 618–11 628.
[104] J. Mao, M. Niu, C. Jiang, H. Liang, J. Chen, X. Liang, Y. Li, C. Ye, W. Zhang, Z. Li, J. Yu, H. Xu, and C. Xu, “One Million Scenes for Autonomous Driving: ONCE Dataset,” Oct. 2021.
[105] A. Gu and T. Dao, “Mamba: Linear-Time Sequence Modeling with Selective State Spaces,” Dec. 2023.
[106] Y. Liu, Y. Tian, Y. Zhao, H. Yu, L. Xie, Y. Wang, Q. Ye, and Y. Liu, “VMamba: Visual State Space Model,” Jan. 2024.
[107] Y. Wang, V. C. Guizilini, T. Zhang, Y. Wang, H. Zhao, and J. Solomon, “DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries,” in Proceedings of the 5th Conference on Robot Learning. PMLR, Jan. 2022, pp. 180–191.
[108] X. Lin, T. Lin, Z. Pei, L. Huang, and Z. Su, “Sparse4D v2: Recurrent Temporal Fusion with Sparse Model,” May 2023.
[109] X. Jiang, S. Li, Y. Liu, S. Wang, F. Jia, T. Wang, L. Han, and X. Zhang, “Far3D: Expanding the Horizon for Surround-view 3D Object Detection,” Aug. 2023.
[110] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021.
[111] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Transactions on Graphics, vol. 42, no. 4, pp. 1–14, 2023.
[112] X. Tian, T. Jiang, L. Yun, Y. Mao, H. Yang, Y. Wang, Y. Wang, and H. Zhao, “Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving,” Apr. 2023.
[113] X. Wang, Z. Zhu, W. Xu, Y. Zhang, Y. Wei, X. Chi, Y. Ye, D. Du, J. Lu, and X. Wang, “OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception,” in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2023, pp. 17 804–17 813.
[114] Y. Wei, L. Zhao, W. Zheng, Z. Zhu, J. Zhou, and J. Lu, “SurroundOcc: Multi-camera 3D Occupancy Prediction for Autonomous Driving,” in ICCV 2023, 2023, pp. 21 729–21 740.

Thursday, May 2, 2024