Tuesday, March 12, 2024

UAV Classification Based on Deep Learning Fusion of Multidimensional UAV Micro-Doppler Image Features | IEEE Journals & Magazine | IEEE Xplore

Authors

The authors and their associated institutions mentioned in the paper are:

  1. Xu Chen, Chunguang Ma, Chaofan Zhao - School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
  2. Yong Luo - School of Electronic Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China - State Key Laboratory of Electromagnetic Space Cognition and Intelligent Control Technology, Beijing, China

The paper cites and discusses several prior related works:

  1. Oh et al. [4] proposed an effective UAV classification system using FMCW radar echo signals that outperformed commercial systems.
  2. Previous works [6]-[8] have used micro-Doppler signatures as image data for training CNN models for UAV classification.
  3. Kim et al. [9] combined micro-Doppler and cadence-velocity diagrams by cropping and splicing them.
  4. Wang et al. [10] used a two-headed CNN for UAV detection on range-Doppler maps.
  5. Jung et al. [11] proposed a transfer learning parallel network using spectrogram and CVD inputs.

However, the authors note that these previous methods were limited to using single types of spectrograms as input features, whereas their proposed approach fuses multiple spectrograms and raw data through data-level and feature-level fusion for improved accuracy.

Summary

This letter proposes a new approach for unmanned aerial vehicle (UAV) classification based on deep learning fusion of multidimensional UAV micro-Doppler image features. The key points are:

  1. It combines frequency modulated continuous wave (FMCW) radar micro-Doppler signals, cadence-velocity diagram (CVD) signals, and cepstrum (CEP) signals to extract UAV features.
  2. Two deep learning fusion approaches are employed: data-level fusion and feature-level fusion, using the ResNet34 network model.
  3. In data-level fusion, grayscale images of the three feature maps are combined on the channel as input to the network.
  4. In feature-level fusion, features from micro-Doppler spectrogram and raw radar data are fused through concatenation.
  5. Experimental results show the proposed fusion approaches can achieve over 97% accuracy in classifying four types of UAVs, outperforming single-input deep learning methods.
  6. The deep learning fusion of multidimensional micro-Doppler image features significantly enhances the precision of UAV target classification.

In summary, this work presents an effective deep learning fusion technique that integrates multiple micro-Doppler image representations for highly accurate UAV classification.

According to the letter, the following data and artifacts were used:

Data:

  1. A radar dataset containing measurements on birds, humans, and six different UAVs from an FMCW radar operating at 77 GHz.
  2. Four typical UAV types were selected from the dataset for classification: T1 (quadcopter with longer propeller), T2 (quadcopter), T3 (hexacopter), and T4 (quadcopter with shorter propeller), with 1000 samples of each type.
  3. The raw radar echo data, which are 1280-element complex-valued vectors.

Artifacts/Features:

  1. Micro-Doppler time-frequency spectrograms obtained from the raw radar data using short-time Fourier transform (STFT).
  2. Cadence-velocity diagram (CVD) obtained by Fourier transforming the STFT data along the time dimension.
  3. Cepstrum (CEP) obtained from the STFT data using short-time cepstral analysis.

The three different types of spectrograms (micro-Doppler, CVD, CEP) extracted from the raw radar data were used as input feature maps for the deep learning fusion approaches proposed in this work for UAV classification.

Neural Network Model Architecture

The paper uses the ResNet34 network model for the deep learning UAV classification task. However, it does not provide details about the ResNet34 architecture itself. ResNet34 belongs to the family of Residual Networks (ResNets) introduced by He et al. in their 2015 paper "Deep Residual Learning for Image Recognition".

Here are some key points about the ResNet34 model:

  1. It is a 34-layer deep convolutional neural network architecture.
  2. It follows the residual learning framework, which introduces skip connections or shortcut connections that bypass layers and allow gradients to flow more easily during backpropagation.
  3. The core building block is the residual block, which in ResNet34 contains two stacked 3x3 convolutional layers along with batch normalization and ReLU activations.
  4. The network is built from several stacked residual blocks, with shortcut connections implementing identity mappings that skip over the convolutional layers within each block.
  5. Downsampling is performed directly by convolutional layers that have a stride of 2.
  6. The final layers are an average pooling and a fully-connected layer.

ResNet34 overcomes the degradation problem of very deep networks and allows training of much deeper networks than was possible before residual connections. It achieved state-of-the-art results on image classification benchmarks like ImageNet when introduced. The authors leverage this powerful architecture for the multi-input fusion of micro-Doppler images for UAV classification.
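The letter does not give implementation details for the network, so the following is only a minimal sketch of how a ResNet34 classifier could be set up for the four UAV classes, assuming the torchvision implementation (an assumption on our part; the paper simply refers to [15] for the model parameters):

```python
# Minimal sketch (assumed torchvision ResNet34, not the authors' exact code).
import torch
import torch.nn as nn
from torchvision.models import resnet34

model = resnet34(weights=None)                      # 34-layer residual network
model.fc = nn.Linear(model.fc.in_features, 4)       # replace the 1000-class head with 4 UAV classes

x = torch.randn(8, 3, 224, 224)                     # batch of 224x224, 3-channel inputs
logits = model(x)                                   # shape: (8, 4)
```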

 

Keywords: Autonomous aerial vehicles; Spectrogram; Radar; Deep learning; Rotors; Time-frequency analysis; Feature extraction; Deep learning fusion; micro-Doppler images; time–frequency analysis; unmanned aerial vehicle (UAV) classification

Abstract:

In the realm of expanding unmanned aerial vehicle (UAV) applications and types, the precision of UAV target classification is of paramount importance. Deep learning has emerged as the linchpin of such endeavors. 

Our team proposes a new approach based on a deep learning fusion technique, which integrates frequency modulated continuous wave (FMCW) radar micro-Doppler signals, cadence-velocity diagram (CVD) signals, and cepstrum (CEP) signals. This synthesis culminates in UAV classification with exceptional accuracy, surpassing 97%.

In this letter, two deep learning fusion approaches leveraging the ResNet34 network were employed: data-level fusion and feature-level fusion. Empirical results unequivocally highlight the potency of deep learning information fusion—most notably, the fusion of the three spectrograms—exceeding 97% accuracy. This firmly underscores the pivotal role that deep learning fusion techniques play in amplifying precision in UAV target classification.

Published in: IEEE Geoscience and Remote Sensing Letters ( Volume: 21)
Article Sequence Number: 3503205
Date of Publication: 28 February 2024
Publisher: IEEE

SECTION I. Introduction

In recent years, the rapid advancement of unmanned aerial vehicle (UAV) technology has revolutionized various fields, ranging from military and surveillance operations to civilian applications such as aerial photography, disaster management, and environmental monitoring [1]. Concurrently, UAV classification plays a pivotal role in tasks such as security and defense, efficient airspace management, infrastructure inspection, and environmental monitoring. Therefore, an imperative has arisen for precise and effective approaches to UAV classification, particularly in scenarios where conventional visual-based methodologies confront limitations imposed by factors such as inclement weather conditions or diminished visibility [2].

The standalone application of frequency-modulated continuous-wave (FMCW) radar presents significant challenges in capturing precise information from UAV targets. However, the integration of micro-Doppler analysis enables accurate extraction of the distinctive motion-related features of the targets from radar returns [3]. Oh et al. [4] proposed an effective and efficient UAV classification system using FMCW radar echo signals, which consistently outperforms a commercial-off-the-shelf UAV classification system in terms of classification accuracy. Nevertheless, in practical applications, this FMCW radar echo processing system is relatively complex and difficult to implement.

Most of the traditional UAV classification methods, such as the one mentioned above, rely on the differences in target radar echoes without incorporating the micro-Doppler spectrogram images of the targets. Deep learning, specifically convolutional neural networks (CNNs), on the other hand, is more capable of fully utilizing the image information to classify the target. It has emerged as a powerful tool for image classification tasks across various domains [5]. The synergy between FMCW radar and CNNs presents a novel avenue for UAV target classification, where micro-Doppler signatures can be treated as image data, enabling the development of accurate and robust classification models.

Currently, various CNN methods are widely applied in the field of UAV classification. Passafiume et al. [6] proposed a compact model of micro-Doppler signals that can be effectively applied for the construction of deep learning datasets. Then, the UAV target classification task was accomplished with a limited measurement dataset by training the CNN network using a synthetic dataset [7]. Another approach is to utilize micro-Doppler spectrograms as input for deep learning classification [8]. Alternatively, it is possible to combine the micro-Doppler signal with other spectrograms, such as cadence-velocity diagram (CVD) spectrograms, by cropping and splicing the two types of diagrams into a single one [9]. Wang et al. [10] propose a CNN with two heads: one for classifying the input range-Doppler map patch into target present or target absent and the other for regressing the offset between the target and the patch center. To mitigate the problem of target feature stability, the work in [11] proposes a transfer learning-based parallel network with the spectrogram and the CVD as the inputs.

Despite numerous attempts at UAV classification based on CNN, the input features used are currently limited to information from a single spectrogram. The current methods have not achieved very high classification accuracy, primarily due to the underutilization of various types of valid data.

The aim of this letter is to explore the feasibility and effectiveness of using micro-Doppler images and other images for UAV target classification through deep learning fusion techniques. We propose a deep learning fusion method that leverages the information relationships among different images and data, as well as the inherent advantages of deep learning, to address the challenges associated with UAV target classification in complex environments. In addition to micro-Doppler time–frequency images, CVD images and cepstrum (CEP) images are combined as inputs for deep learning, and the information they carry is fully utilized through certain fusion strategies. Furthermore, the raw data are employed as an informative complement to the spectrogram data for fusion. ResNet34 is used as the network model, and a classification accuracy of over 97% is achieved through data fusion.

SECTION II. Radar Dataset and Spectrogram Analysis

A. Dataset of UAV

The dataset employed in this study originates from [12]. This dataset contains radar measurements on birds, humans, and six different UAVs with a total of 75 868 samples. The sensor was an FMCW radar operating at 77 GHz with a mechanically scanning antenna. Moreover, the range of targets in the dataset is between 5 and 200 m, which brings about different signal-to-noise ratio (SNR) values while ensuring the complexity of the dataset. This letter focuses only on UAV classification, so four typical UAV types are selected from the above dataset as the dataset for this letter, as shown in Fig. 1. They include T1 (quadcopter with longer propeller), T2 (quadcopter), T3 (hexacopter), and T4 (quadcopter with shorter propeller), with 1000 samples of each type taken at random.

Fig. 1. Four types of UAVs.

B. Micro-Doppler Features and Spectrogram Analysis

Based on the helicopter rotor model proposed in [13], the echo model of a multirotor UAV can be represented as follows:

$$ s_\Sigma(t) = \sum_{r=1}^{R} L \exp\left\{ j\frac{4\pi}{\lambda}\left[ R_{0r} + z_{0r}\sin\beta_r \right] \right\} \sum_{k=0}^{N-1} \operatorname{sinc}\left\{ \frac{4\pi}{\lambda}\frac{L}{2}\cos\beta_r \cos\left( \Omega_r t + \varphi_{0r} + k\frac{2\pi}{N} \right) \right\} \exp\left\{ j\Phi_{r,k}(t) \right\} \tag{1} $$

in which

$$ \Phi_{r,k}(t) = \frac{4\pi}{\lambda}\frac{L}{2}\cos\beta_r \cos\left( \Omega_r t + \varphi_{0r} + k\frac{2\pi}{N} \right), \qquad k = 0, 1, \ldots, N-1;\; r = 1, 2, \ldots, R \tag{2} $$

where $R$ is the total number of rotors, $N$ is the total number of blades of a single rotor, $L$ denotes the rotor blade length, $R_{0r}$ is the distance from the radar to the center of the $r$th rotor, $z_{0r}$ denotes the height of the $r$th rotor blade, $\beta_r$ is the pitch angle of the radar to the center of the $r$th rotor (which is approximately equal to the pitch angle from the radar to the center of the UAV axis, i.e., $\beta_1 = \beta_2 = \cdots = \beta_R = \beta$), $\Omega_r$ is the rotation angular frequency of the $r$th rotor, and $\varphi_{0r}$ is the initial rotation angle of the $r$th rotor. The instantaneous Doppler frequency of the echo signal can be obtained by taking the time derivative of the phase function of the signal, and the equivalent instantaneous micro-Doppler frequency of a scattering point $P$ on the $k$th blade of the $r$th rotor is obtained by taking the time derivative of (2) as

$$ f_{r,k,P}(t) = \frac{2 l_P \Omega_r}{\lambda} \cos\beta \, \sin\left( \Omega_r t + \varphi_{0r} + k\frac{2\pi}{N} \right) \tag{3} $$

where $l_P$ is the distance from the scattering point $P$ to the rotation center of the rotor blade and $0 \le l_P \le L$.

The Doppler frequency amplitude is maximum at the blade tip, and thus the maximum spread of the micro-Doppler frequency of the blade is

$$ f_{\max} = 2 \times \frac{2 L \Omega_r}{\lambda} \cos\beta. \tag{4} $$

From this, the blade length of the rotor is deduced to be

$$ L = \frac{\lambda f_{\max}}{4 \Omega_r \cos\beta}. \tag{5} $$
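As a quick illustration of how (5) could be used, the snippet below plugs purely hypothetical values (a 77-GHz carrier, an assumed total micro-Doppler spread, rotor rate, and pitch angle, none of which are taken from the paper) into the blade-length formula:

```python
# Illustrative use of Eq. (5) with hypothetical, assumed values.
import numpy as np

c = 3e8
fc = 77e9
lam = c / fc                     # wavelength, ~3.9 mm at 77 GHz
f_max = 20e3                     # Hz, assumed total micro-Doppler spread from the spectrogram
Omega_r = 2 * np.pi * 100        # rad/s, assumed rotor rate (100 rev/s)
beta = np.deg2rad(10)            # assumed pitch angle

L = lam * f_max / (4 * Omega_r * np.cos(beta))    # Eq. (5)
print(f"Estimated blade length: {L * 100:.1f} cm")  # ~3.1 cm for these assumed values
```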

The maximum micro-Doppler frequency of the UAV blades can be obtained from the time–frequency spectrum. As illustrated in Fig. 2, the data were processed using the short-time Fourier transform (STFT) to obtain the time–frequency spectrogram, and the CVD and CEP were derived from the data after STFT. The CVD and CEP are obtained by transforming STFT data in different dimensions. The STFT is defined as

$$ \mathrm{STFT}(t, f) = \int_{-\infty}^{\infty} x(\tau)\, h(\tau - t)\, e^{-j 2\pi f \tau}\, \mathrm{d}\tau. \tag{6} $$

Fig. 2. Relationship between STFT, spectrogram, CVD, and CEP.

The window function h(t) in the above equation is chosen to be Gaussian in this letter. Analogously to the STFT and spectrogram, a short-time CEP [14] and a "cepstrogram" have been proposed as

$$ W\{x[n]\}(p, q) = \left| \mathcal{F}^{-1}\left\{ 10 \log\left( \left| \mathrm{STFT}\{x[n]\}(p, q) \right|^{2} \right) \right\} \right|^{2} \tag{7} $$

in which $\mathcal{F}^{-1}$ is the inverse Fourier transform. Also here, the calculation of $W$ for a single $p$ is referred to as a "trace" and corresponds to a single integration interval of $N$ samples. The running variable of the CEP has the dimension of seconds and has been coined "quefrency." The micro-Doppler periodicity expressed in hertz can be obtained by taking the inverse of the quefrency. A CVD is computed by Fourier transforming along the time dimension.
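The letter does not list the exact processing parameters, so the following is only a sketch of one way the spectrogram, CVD, and CEP could be derived from a complex slow-time radar signal; the PRF, segment length, and window (a Hann window standing in for the Gaussian window) are assumptions for illustration:

```python
# Sketch of deriving spectrogram, CVD, and CEP from one complex radar segment (assumed parameters).
import numpy as np
from scipy.signal import stft

def spectrogram_cvd_cep(x, prf=2000.0, nperseg=64):
    # STFT of the slow-time signal; Z has shape (frequency bins, time frames).
    f, t, Z = stft(x, fs=prf, window="hann", nperseg=nperseg,
                   noverlap=nperseg // 2, return_onesided=False)
    spec = np.abs(Z) ** 2                                   # micro-Doppler spectrogram
    cvd = np.abs(np.fft.fft(np.abs(Z), axis=1))             # Fourier transform along the time axis -> CVD
    cep = np.abs(np.fft.ifft(10 * np.log10(spec + 1e-12), axis=0)) ** 2  # Eq. (7), per time frame
    return spec, cvd, cep

# Example on synthetic data: a 1280-sample complex vector standing in for one radar segment.
x = np.exp(1j * 2 * np.pi * 300 * np.arange(1280) / 2000.0)
spec, cvd, cep = spectrogram_cvd_cep(x)
```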

SECTION III. Methodology

Through the analysis in Section II, it can be observed that there is some data correlation between time–frequency plots, CVD plots, and CEP plots. For the training of deep learning network, richer input data covering target features lead to better training and higher classification accuracy. In order to avoid the situation of using only a single data that may lead to more homogeneous input features, we propose a UAV target classification method based on deep learning fusion. Deep learning fusion can be performed at three levels, including data-level fusion, feature-level fusion, and decision-level fusion. In this letter, two different fusion strategies will be employed: data-level and feature-level fusion.

For data-level fusion, as depicted in Fig. 3, the three spectral images are taken as the input of the network simultaneously. Since the network input size is (224×224×3) and each class of images is also (224×224×3), the images cannot simply be stacked as-is. Therefore, the three kinds of image data are converted to (224×224×1) through grayscale transformation and then superimposed along the channel dimension to obtain input data of the final size (224×224×3), as sketched below. The obtained input data are then fed into the CNN for training to obtain the final classification.
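A minimal sketch of this grayscale-and-stack operation follows; the use of PIL images and torchvision transforms is an assumption, since the letter does not specify the image-processing toolchain:

```python
# Sketch of the described data-level fusion: three spectrogram images are
# converted to grayscale and stacked on the channel dimension (assumed tooling).
import torch
from PIL import Image
from torchvision import transforms

to_gray = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),   # RGB image -> single channel
    transforms.Resize((224, 224)),
    transforms.ToTensor(),                         # -> tensor of shape (1, 224, 224)
])

def fuse_data_level(spec_img: Image.Image, cvd_img: Image.Image, cep_img: Image.Image) -> torch.Tensor:
    # Stack the three grayscale maps as the three channels of one network input.
    channels = [to_gray(img) for img in (spec_img, cvd_img, cep_img)]
    return torch.cat(channels, dim=0)              # (3, 224, 224), ready for ResNet34
```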

Fig. 3. Schematic of the proposed methodology of data-level fusion.

For feature-level fusion, it is observed that while STFT effectively extracts UAV features, it alone is not sufficient. This suggests that the information it carries may benefit from external supplementation. Feature fusion is performed through raw radar echo data and time–frequency spectrograms, as demonstrated in Fig. 4, with the 1-D data utilized as the information supplement for the spectrograms.

Fig. 4. Schematic of the proposed methodology of feature-level fusion.

The spectrogram data and 1-D data are subjected to feature extraction through the corresponding network structure, and then, the obtained features are fused by means of concatenation, as shown in the following equation:

$$ F = \mathrm{concat}(F_{\mathrm{input1}}, F_{\mathrm{input2}}). \tag{8} $$

After the network feature extraction, the target features obtained from the two data types have certain complementary properties owing to their different dimensional characteristics, so the features are concatenated after the flatten layer and finally output through the linear layer, which improves the final results.

The raw data are sample data received by the radar. Each sample is a scan segment of five samples in range (after range compression) and 256 samples in azimuth centered on the target. The segments are stored as a 1280-element complex-valued vector where the first 256 samples correspond to the first range cell and so forth. The 1-D data used in this letter are obtained by taking the modulus of the complex data mentioned above.
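The following is one possible interpretation of the described feature-level fusion, not the authors' exact architecture: a ResNet34 backbone handles the spectrogram, a small assumed 1-D convolutional branch handles the 1280-sample magnitude vector, and the flattened features are concatenated per (8) before the final linear layer:

```python
# Two-branch sketch of feature-level fusion (an interpretation under assumptions).
import torch
import torch.nn as nn
from torchvision.models import resnet34

class FeatureLevelFusion(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        backbone = resnet34(weights=None)
        backbone.fc = nn.Identity()                 # keep the 512-dim pooled image feature
        self.img_branch = backbone
        self.sig_branch = nn.Sequential(            # assumed 1-D branch for the raw-data modulus
            nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),  # -> 32-dim feature
        )
        self.classifier = nn.Linear(512 + 32, num_classes)

    def forward(self, spectrogram, raw_1d):
        f_img = self.img_branch(spectrogram)        # (B, 512)
        f_sig = self.sig_branch(raw_1d)             # (B, 32)
        return self.classifier(torch.cat([f_img, f_sig], dim=1))   # Eq. (8): concat, then linear

model = FeatureLevelFusion()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 1280))
```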

SECTION IV. Simulation Results and Analysis

A. CNN Model Training

In this research, all models are implemented on a PC equipped with 32-GB memory, a 3.7-GHz Intel Core i5-12600K CPU, and an NVIDIA GeForce GTX 3070 Ti graphics card with 8 GB of video memory. Each model uses Python 3.10.6, PyCharm 2022.2.1, CUDA 12.0, and cuDNN 11.7, and uses NVIDIA DIGITS for image graphical interface processing.

In the experiment, the optimizer utilized is the Adam algorithm, the activation function employed is the linear rectification function (ReLU), and the cost function applied is the cross entropy. The initial learning rate is set to 0.0001.

According to the convergence of the loss value during the training process, the number of training epochs is set to 100. Besides, to prevent overfitting, dropout is applied to improve the model's generalization ability. The size of the extracted spectrograms was 875×656, but they were resized to 224×224 for training in order to increase the computational speed. To further improve the generalization ability of the model, the data were divided into training and validation sets in an 8:2 ratio. The accuracy curves shown in this letter are obtained on the validation set.
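A condensed, self-contained sketch of this training setup is shown below. The optimizer, learning rate, loss, epoch count, and 8:2 split follow the stated settings, while the batch size, dummy tensors, and single-input ResNet34 head are placeholders that the letter does not specify (dropout is also omitted here for brevity):

```python
# Condensed training sketch (placeholder data and model; stated hyperparameters).
import torch
from torch import nn, optim
from torch.utils.data import TensorDataset, random_split, DataLoader
from torchvision.models import resnet34

inputs = torch.randn(200, 3, 224, 224)               # placeholder fused 224x224x3 inputs
labels = torch.randint(0, 4, (200,))                  # placeholder labels for the 4 UAV types
dataset = TensorDataset(inputs, labels)

n_train = int(0.8 * len(dataset))                     # 8:2 train/validation split
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)

model = resnet34(weights=None)
model.fc = nn.Linear(model.fc.in_features, 4)
criterion = nn.CrossEntropyLoss()                     # cross-entropy cost function
optimizer = optim.Adam(model.parameters(), lr=1e-4)   # Adam, initial learning rate 0.0001

for epoch in range(100):                              # 100 epochs
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    print(f"epoch {epoch + 1}: validation accuracy = {correct / total:.4f}")
```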

As demonstrated in Fig. 3, ResNet34 is employed as the primary model for classifying the data, and the ResNet34 model parameters are shown in [15].

B. Classification Effects Without Fusion Methods

For the rigor of the work, first, the raw data are fed directly into the deep learning network without any processing, where the input 1-D data have the format of (1, 1, 1280). The resulting accuracy curve is depicted in Fig. 5. It can be seen that although it is feasible to use the 1-D raw data directly for classification, the raw data are not processed in any way and carry too much interfering information, such as noise, which undermines the final classification result.

Fig. 5. Accuracy of different methods.

The time–frequency spectrograms, CEP, and CVD maps obtained after the STFT were then fed into the deep learning network for training, and their classification accuracy curves are shown in Fig. 5. Since the STFT can effectively extract the UAV target information, especially the micro-motion features of the UAV rotor, using the spectrograms obtained after the STFT as classification inputs performs significantly better than using the original 1-D data. Among them, the micro-Doppler spectrogram gives the best results, while the CVD and CEP perform worse.

C. Classification Effects With Fusion Methods

With the fusion method proposed in Section III, the 1-D raw data and spectrogram were fused, and the three types of spectrograms were fused at the data level to obtain the accuracy curves (Fig. 5). From the accuracy results, it can be seen that both the data-level fusion and the fusion of 1-D raw data and spectrogram give better classification results than raw data or single image data. To analyze the classification effects of each fusion method, their respective confusion matrices were plotted.

Moreover, in order to reflect the advantages of the methods in this letter more obviously, the histogram of classification accuracy and the training time curve for each class of methods were plotted, as indicated in Fig. 6.

Fig. 6. Comparison of accuracy and training time for different methods.

In addition, metrics, such as precision, recall, and specificity, were employed to assess the effectiveness of each classification method for different types of UAVs. For the general binary classification problem, the above metrics are defined as

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{9} $$
$$ \mathrm{Precision} = \frac{TP}{TP + FP} \tag{10} $$
$$ \mathrm{Recall} = \frac{TP}{TP + FN} \tag{11} $$
$$ \mathrm{Specificity} = \frac{TN}{TN + FP} \tag{12} $$

For multiclass classification problems like the one studied in this letter, the correctly classified samples (actual label = predicted label) lie on the main diagonal (top left to bottom right) of the confusion matrix, and accuracy is defined as the ratio of the number of correctly classified (on-diagonal) samples to the total number of samples. Accuracy therefore measures overall prediction performance across all samples, whereas precision and recall must be computed separately for each class.
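For concreteness, the snippet below shows how these per-class metrics can be read off a multiclass confusion matrix; the 4×4 matrix values are illustrative only and are not the paper's results:

```python
# Per-class metrics from a multiclass confusion matrix (illustrative values, not the paper's).
import numpy as np

cm = np.array([[240,   5,   3,   2],   # rows: actual class, columns: predicted class
               [  4, 238,   6,   2],
               [  2,   7, 239,   2],
               [  3,   2,   4, 241]])

accuracy = np.trace(cm) / cm.sum()      # correctly classified samples lie on the diagonal
for k in range(cm.shape[0]):
    TP = cm[k, k]
    FP = cm[:, k].sum() - TP
    FN = cm[k, :].sum() - TP
    TN = cm.sum() - TP - FP - FN
    print(f"class T{k + 1}: precision={TP / (TP + FP):.3f}, "
          f"recall={TP / (TP + FN):.3f}, specificity={TN / (TN + FP):.3f}")
print(f"overall accuracy = {accuracy:.3f}")
```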

As discernible in Fig. 6, in terms of accuracy, the two fusion methods proposed in this letter deliver the best performance: 97.125% for the data-level fusion method and 96.875% for the feature-level fusion method. Moreover, the data-level fusion method not only has the highest accuracy but also has a training time of 1.188 h, only 0.178 h more than the shortest training time of 1.01 h. This method thus keeps the accuracy as high as possible without increasing the training time cost too much.

Fig. 7 displays the confusion matrices for the fusion methods. Table I shows the precision, recall, and specificity parameters for the fusion methods. For the data-level fusion method, its precision, recall, and specificity for the four UAVs are all at a high level, while the specificity of the feature-level fusion method for the T2 UAV (quadcopter) is only 0.8, which shows that this method is less effective in classifying and recognizing this type of UAV.

TABLE I. Precision, Recall, and Specificity of Fig. 7(a) and (b)

Fig. 7. Confusion matrices. (a) Data-level fusion. (b) Feature-level fusion.

SECTION V. Conclusion

This study presents a deep learning fusion-based UAV target classification method that can classify four different types of UAV targets. First, a micro-Doppler spectral image is obtained through the STFT of the captured FMCW radar data, from which the CVD and CEP are subsequently derived. Then, two effective deep learning fusion methods, data-level fusion and feature-level fusion, are applied in our method. In data-level fusion, the grayscale images of the three feature maps are combined on the channel, and in feature-level fusion, features from the micro-Doppler spectrum and the raw data are fused. The experimental results demonstrate that the proposed deep learning fusion approach is more effective for UAV target classification than the single-input deep learning approach and increases the accuracy of UAV target classification to more than 97%. Moreover, all experimental results were obtained under typical hardware conditions, indicating that the proposed method is not highly dependent on specialized hardware configurations. In addition, based on the architecture proposed in our approach, the underlying network model can be tailored to suit diverse application scenarios, thereby adapting to various practical contexts.



 
