UAV Classification Based on Deep Learning Fusion of Multidimensional UAV Micro-Doppler Image Features
Keywords: Autonomous aerial vehicles; Spectrogram; Radar; Deep learning; Rotors; Time-frequency analysis; Feature extraction; Deep learning fusion; micro-Doppler images; time–frequency analysis; unmanned aerial vehicle (UAV) classification
As unmanned aerial vehicle (UAV) applications and types continue to expand, accurate UAV target classification is of paramount importance, and deep learning has become central to this task.
We propose a new approach based on deep learning fusion that integrates frequency-modulated continuous-wave (FMCW) radar micro-Doppler signals, cadence-velocity diagram (CVD) signals, and cepstrum (CEP) signals. This synthesis yields UAV classification with accuracy surpassing 97%.
In this letter, two deep learning fusion approaches built on the ResNet34 network are employed: data-level fusion and feature-level fusion. Empirical results highlight the effectiveness of deep learning information fusion, most notably the fusion of the three spectrograms, which exceeds 97% accuracy. This underscores the pivotal role that deep learning fusion techniques play in improving the precision of UAV target classification.
SECTION I. Introduction
In recent years, the rapid advancement of unmanned aerial vehicles (UAVs) has revolutionized various fields, ranging from military and surveillance operations to civilian applications such as aerial photography, disaster management, and environmental monitoring [1]. Concurrently, UAV classification plays a pivotal role in tasks such as security and defense, efficient airspace management, infrastructure inspection, and environmental monitoring. Therefore, there is a pressing need for precise and effective UAV classification approaches, particularly in scenarios where conventional visual-based methodologies are limited by factors such as inclement weather or diminished visibility [2].
The standalone application of frequency-modulated continuous-wave (FMCW) radar presents significant challenges in capturing precise information about UAV targets. However, the integration of micro-Doppler analysis enables accurate extraction of the targets' distinctive motion-related features from the radar returns [3]. Oh et al. [4] propose an effective and efficient UAV classification system using FMCW radar echo signals, which consistently outperforms a commercial off-the-shelf UAV classification system in terms of classification accuracy. Nevertheless, in practical applications, this FMCW radar echo processing system is relatively complex and difficult to implement.
Most of the traditional UAV classification methods, such as the one mentioned above, rely on the differences in target radar echoes without incorporating the micro-Doppler spectrogram images of the targets. Deep learning, specifically convolutional neural networks (CNNs), on the other hand, is more capable of fully utilizing the image information to classify the target. It has emerged as a powerful tool for image classification tasks across various domains [5]. The synergy between FMCW radar and CNNs presents a novel avenue for UAV target classification, where micro-Doppler signatures can be treated as image data, enabling the development of accurate and robust classification models.
Currently, various CNN methods are widely applied in the field of UAV classification. Passafiume et al. [6] proposed a compact model of micro-Doppler signals that can be effectively applied to the construction of deep learning datasets. The UAV target classification task was then accomplished with a limited measurement dataset by training the CNN using a synthetic dataset [7]. Another approach is to use micro-Doppler spectrograms as the input for deep learning classification [8]. Alternatively, the micro-Doppler signal can be combined with other spectrograms, such as cadence-velocity diagram (CVD) spectrograms, by cropping and splicing the two types of diagrams into a single image [9]. Wang et al. [10] propose a CNN with two heads: one for classifying the input range-Doppler map patch as target present or target absent and the other for regressing the offset between the target and the patch center. To mitigate the problem of target feature instability, the work in [11] proposes a transfer learning-based parallel network with the spectrogram and the CVD as inputs.
Despite numerous attempts at CNN-based UAV classification, the input features used are currently limited to information from a single spectrogram. As a result, existing methods have not achieved very high classification accuracy, primarily because the various types of available valid data are underutilized.
The aim of this letter is to explore the feasibility and effectiveness of using micro-Doppler images together with other images for UAV target classification through deep learning fusion techniques. We propose a deep learning fusion method that leverages the information relationships among different images and data, as well as the inherent advantages of deep learning, to address the challenges of UAV target classification in complex environments. In addition to micro-Doppler time–frequency images, CVD images and cepstrum (CEP) images are combined as inputs for deep learning, and the information they carry is fully exploited through appropriate fusion strategies. Furthermore, the raw data are employed as an informative complement to the spectrogram data in the fusion. ResNet34 is used as the network model, and a classification accuracy of over 97% is achieved through data fusion.
SECTION II. Radar Dataset and Spectrogram Analysis
A. Dataset of UAV
The dataset employed in this study originates from [12]. This dataset contains radar measurements of birds, humans, and six different UAVs, with a total of 75 868 samples. The sensor was an FMCW radar operating at 77 GHz with a mechanically scanning antenna. The range of the targets in the dataset is between 5 and 200 m, which yields different signal-to-noise ratio (SNR) values while preserving the complexity of the dataset. This letter focuses only on UAV classification, so four typical UAV types are selected from this dataset, as shown in Fig. 1: T1 (quadcopter with longer propellers), T2 (quadcopter), T3 (hexacopter), and T4 (quadcopter with shorter propellers), with 1000 samples of each type taken at random.
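As a rough illustration, this per-class balancing (1000 randomly drawn samples for each of the four selected UAV types) could be implemented as in the following sketch; the array names `data` and `labels` and the class identifiers are assumptions made for illustration, not part of the published dataset interface.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_per_class(data, labels, classes=("T1", "T2", "T3", "T4"), n_per_class=1000):
    """Randomly draw a fixed number of samples for each selected UAV class."""
    keep = []
    for cls in classes:
        cls_idx = np.flatnonzero(labels == cls)                       # indices belonging to this class
        keep.append(rng.choice(cls_idx, size=n_per_class, replace=False))
    keep = np.concatenate(keep)
    return data[keep], labels[keep]
```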
B. Micro-Doppler Features and Spectrogram Analysis
Based on the helicopter rotor model proposed in [13], the echo of a multirotor UAV can be represented as the superposition of the returns from each of its rotating rotor blades.
The Doppler frequency reaches its maximum amplitude at the blade tip, which sets the maximum spread of the micro-Doppler frequency of the blade; from this maximum spread, the blade length of the rotor can be deduced.
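The letter's exact expressions are not reproduced here; as a sketch, the standard rotor micro-Doppler relations take the following form, where $\Omega$ denotes the rotor rotation rate, $L$ the blade length, and $\lambda$ the radar wavelength (this notation is an assumption):

$$f_{\mathrm{mD,max}} = \frac{2\,\Omega L}{\lambda}, \qquad L = \frac{\lambda\, f_{\mathrm{mD,max}}}{2\,\Omega}$$

Here $f_{\mathrm{mD,max}}$ is the maximum micro-Doppler shift produced by the blade tip, and the total micro-Doppler spread of a blade spans from $-f_{\mathrm{mD,max}}$ to $+f_{\mathrm{mD,max}}$.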
The maximum micro-Doppler frequency of the UAV blades can be obtained from the time–frequency spectrum. As illustrated in Fig. 2, the data were processed using the short-time Fourier transform (STFT) to obtain the time–frequency spectrogram, and the CVD and CEP were derived from the STFT output by transforming it along different dimensions. The STFT of a signal $x(\tau)$ with window function $w(\cdot)$ is defined as
$$\mathrm{STFT}(t,f)=\int_{-\infty}^{+\infty} x(\tau)\,w(\tau-t)\,e^{-j2\pi f\tau}\,\mathrm{d}\tau$$
where the window function determines the trade-off between time resolution and frequency resolution of the resulting spectrogram.
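As a rough illustration of this processing chain, the sketch below derives the three feature maps from a 1-D radar return; the window length, the overlap, and the exact CVD and CEP conventions (FFT of the spectrogram along slow time for the CVD, inverse FFT of the log-magnitude spectrum along frequency for the CEP) are assumptions, since the letter does not spell them out.

```python
import numpy as np
from scipy.signal import stft

def micro_doppler_features(x, fs, nperseg=256, noverlap=192):
    """Compute the three classifier inputs from a 1-D radar return:
    micro-Doppler spectrogram (STFT), cadence-velocity diagram (CVD), and cepstrum (CEP)."""
    _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap,
                   return_onesided=False)                     # complex STFT, shape (freq, time)
    spec = np.abs(Z)                                          # time-frequency spectrogram
    cvd = np.abs(np.fft.fft(spec, axis=1))                    # FFT of each Doppler bin along slow time
    cep = np.abs(np.fft.ifft(np.log(spec + 1e-12), axis=0))   # cepstrum along the frequency axis
    return spec, cvd, cep
```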
SECTION III. Methodology
Through the analysis in Section II, it can be observed that there is some data correlation between the time–frequency, CVD, and CEP plots. For the training of a deep learning network, richer input data covering more target features lead to better training and higher classification accuracy. To avoid the homogeneous input features that may result from using only a single data type, we propose a UAV target classification method based on deep learning fusion. Deep learning fusion can be performed at three levels: data-level fusion, feature-level fusion, and decision-level fusion. In this letter, two different fusion strategies are employed: data-level and feature-level fusion.
For data-level fusion, as depicted in Fig. 3, the three spectral images are fed into the network simultaneously; since the network expects a three-channel input, the grayscale versions of the three spectrograms are stacked along the channel dimension to form a single input image.
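A minimal sketch of this channel stacking is given below, assuming a 224 × 224 ResNet34 input and simple min-max normalization; these preprocessing choices are assumptions rather than settings stated in the letter.

```python
import numpy as np
import torch
import torch.nn.functional as F
import torchvision.models as models

def to_channel(img, size=224):
    """Resize one grayscale feature map to the assumed network input size and scale it to [0, 1]."""
    t = torch.from_numpy(img).float().unsqueeze(0).unsqueeze(0)               # (1, 1, H, W)
    t = F.interpolate(t, size=(size, size), mode="bilinear", align_corners=False)
    return (t - t.min()) / (t.max() - t.min() + 1e-12)

def data_level_fusion(spec, cvd, cep):
    """Stack spectrogram, CVD, and CEP maps as the three channels of a single input image."""
    return torch.cat([to_channel(spec), to_channel(cvd), to_channel(cep)], dim=1)  # (1, 3, 224, 224)

# Four UAV classes; torchvision >= 0.13 API for an untrained ResNet34.
model = models.resnet34(weights=None, num_classes=4)
logits = model(data_level_fusion(np.random.rand(256, 256),
                                 np.random.rand(256, 256),
                                 np.random.rand(256, 256)))
```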
For feature-level fusion, it is observed that while the STFT effectively extracts UAV features, it alone is not sufficient, suggesting that the information it carries can benefit from external supplementation. Feature fusion is therefore performed between the raw radar echo data and the time–frequency spectrograms, as demonstrated in Fig. 4, with the 1-D data used as an information supplement to the spectrograms.
The spectrogram data and the 1-D data are passed through their corresponding network branches for feature extraction, and the resulting features are fused by concatenation:
$$\mathbf{F}_{\text{fused}} = \operatorname{Concat}\left(\mathbf{F}_{\text{spec}}, \mathbf{F}_{\text{1D}}\right)$$
where $\mathbf{F}_{\text{spec}}$ and $\mathbf{F}_{\text{1D}}$ denote the features extracted from the spectrogram and the 1-D data, respectively. Because the two data dimensions capture different characteristics of the target, the extracted features are complementary; they are therefore concatenated after the flatten layer and passed through a final linear layer, which improves the final results.
The raw data are sample data received by the radar. Each sample is a scan segment of five samples in range (after range compression) and 256 samples in azimuth centered on the target. The segments are stored as a 1280-element complex-valued vector where the first 256 samples correspond to the first range cell and so forth. The 1-D data used in this letter are obtained by taking the modulus of the complex data mentioned above.
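A sketch of this two-branch concatenation is given below, assuming a standard ResNet34 image branch and a small 1-D CNN over the 1280-sample modulus vector; the 1-D branch's layer sizes and the three-channel 224 × 224 spectrogram input are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class FeatureLevelFusion(nn.Module):
    """Two-branch feature-level fusion: ResNet34 features from the spectrogram are
    concatenated after the flatten stage with features from a small 1-D CNN over
    the 1280-sample |raw echo| vector, then classified by a final linear layer."""

    def __init__(self, num_classes=4):
        super().__init__()
        backbone = models.resnet34(weights=None)
        self.img_branch = nn.Sequential(*list(backbone.children())[:-1])  # drop the final FC, keep 512-dim features
        self.sig_branch = nn.Sequential(                                   # illustrative 1-D branch
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(512 + 32, num_classes)

    def forward(self, img, sig):
        f_img = self.img_branch(img).flatten(1)      # (B, 512) spectrogram features
        f_sig = self.sig_branch(sig).flatten(1)      # (B, 32) raw-echo features
        fused = torch.cat([f_img, f_sig], dim=1)     # concatenation after the flatten layer
        return self.classifier(fused)

# Example shapes: spectrogram batch (B, 3, 224, 224) and |raw echo| batch (B, 1, 1280).
model = FeatureLevelFusion()
out = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 1280))
```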
SECTION IV. Simulation Results and Analysis
A. CNN Model Training
In this research, all models are implemented on a PC equipped with 32 GB of memory, a 3.7-GHz Intel Core i5-12600K CPU, and an NVIDIA GeForce RTX 3070 Ti graphics card with 8 GB of video memory. Each model uses Python 3.10.6, PyCharm 2022.2.1, CUDA 12.0, and cuDNN 11.7, and NVIDIA DIGITS is used as the graphical interface for image processing.
In the experiment, the optimizer is the Adam algorithm, the activation function is the rectified linear unit (ReLU), and the cost function is the cross-entropy loss. The initial learning rate is set to 0.0001. According to the convergence of the loss value during training, the number of epochs is set to 100. In addition, dropout is applied to prevent overfitting and improve the generalization ability of the model. The extracted spectrograms are of a fixed size matching the network input.
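A minimal training loop matching the stated hyperparameters (Adam, learning rate 0.0001, cross-entropy loss, 100 epochs) might look as follows; `model` and `train_loader` are assumed to be defined elsewhere (for example, one of the fusion networks sketched above and a standard PyTorch DataLoader), and dropout is assumed to live inside the model itself.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                              # cross-entropy cost function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)      # Adam with the stated initial learning rate

for epoch in range(100):                                       # 100 epochs, per the loss convergence
    model.train()                                              # enables dropout during training
    for images, labels in train_loader:                        # labels are integer class indices 0..3
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```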
As demonstrated in Fig. 3, ResNet34 is employed as the primary model for classifying the data, and the ResNet34 model parameters are shown in [15].
B. Classification Effects Without Fusion Methods
To ensure the rigor of the work, the raw data are first fed directly into the deep learning network without any processing, where the input 1-D data have the shape (1, 1, 1280). The resulting accuracy curve is depicted in Fig. 5. It can be seen that, although it is feasible to use the 1-D raw data directly for classification, the unprocessed raw data carry too much interfering information, such as noise, which undermines the final classification result.
The time–frequency spectrograms, CEP maps, and CVD maps obtained after the STFT were then fed into the deep learning network for training, and their classification accuracy curves are shown in Fig. 5. Because the STFT effectively extracts UAV target information, especially the micromotion features of the UAV rotors, using the STFT-derived spectrograms as inputs for classification training performs significantly better than the original 1-D data. Among the three, the time–frequency spectrogram performs best, while the CVD and CEP perform worse.
C. Classification Effects With Fusion Methods
With the fusion method proposed in Section III, the 1-D raw data and spectrogram were fused, and the three types of spectrograms were fused at the data level to obtain the accuracy curves (Fig. 5). From the accuracy results, it can be seen that both the data-level fusion and the fusion of 1-D raw data and spectrogram give better classification results than raw data or single image data. To analyze the classification effects of each fusion method, their respective confusion matrices were plotted.
Moreover, to show the advantages of the proposed methods more clearly, the classification accuracy histogram and the training time curve for each class of methods were plotted, as indicated in Fig. 6.
In addition, metrics such as precision, recall, and specificity were employed to assess the effectiveness of each classification method for the different types of UAVs. For the general binary classification problem, with TP, FP, TN, and FN denoting the numbers of true positives, false positives, true negatives, and false negatives, these metrics are defined as
$$\mathrm{Precision}=\frac{TP}{TP+FP}, \qquad \mathrm{Recall}=\frac{TP}{TP+FN}, \qquad \mathrm{Specificity}=\frac{TN}{TN+FP}.$$
For multiclass classification problems such as the one studied in this letter, the correctly classified samples (actual label = predicted label) lie on the main diagonal of the confusion matrix (from top left to bottom right), and accuracy is defined as the ratio of the number of correctly classified (diagonal) samples to the total number of samples. Accuracy thus measures overall prediction performance across all samples, whereas precision and recall must be computed separately for each class.
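These per-class quantities can be read directly off the confusion matrix; a small sketch is given below, assuming the convention used above (rows are actual labels, columns are predicted labels).

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall, and specificity from a KxK confusion matrix
    (rows = actual class, columns = predicted class), plus overall accuracy."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly classified samples lie on the diagonal
    fp = cm.sum(axis=0) - tp         # predicted as this class but actually another class
    fn = cm.sum(axis=1) - tp         # actually this class but predicted as another class
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = tp.sum() / cm.sum()   # diagonal samples over all samples
    return precision, recall, specificity, accuracy
```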
As can be seen in Fig. 6, in terms of accuracy, the two fusion methods proposed in this letter perform best: 97.125% for the data-level fusion method and 96.875% for the feature-level fusion method. Moreover, the data-level fusion method not only has the highest accuracy but also a training time of 1.188 h, only 0.178 h longer than the shortest training time of 1.01 h. This method therefore achieves the highest possible accuracy without a significant increase in training time.
Fig. 7 displays the confusion matrices for the fusion methods, and Table I lists their precision, recall, and specificity. For the data-level fusion method, the precision, recall, and specificity for all four UAVs are at a high level, whereas the specificity of the feature-level fusion method for the T2 UAV (quadcopter) is only 0.8, indicating that this method is less effective at classifying and recognizing this type of UAV.
SECTION V. Conclusion
This study presents a deep learning fusion-based UAV target classification method that can classify four different types of UAV targets. First, a micro-Doppler spectrogram is obtained through the STFT of the captured FMCW radar data, from which the CVD and CEP are subsequently derived. Then, two effective deep learning fusion methods, data-level fusion and feature-level fusion, are applied. In data-level fusion, the grayscale images of the three feature maps are combined along the channel dimension, and in feature-level fusion, features from the micro-Doppler spectrogram and the raw data are fused. The experimental results demonstrate that the proposed deep learning fusion approach is more effective for UAV target classification than single-input deep learning approaches and raises classification accuracy above 97%. Moreover, all experimental results were obtained under typical hardware conditions, indicating that the proposed method does not depend heavily on specialized hardware configurations. In addition, the underlying network model in the proposed architecture can be tailored to suit diverse application scenarios, thereby adapting to various practical contexts.