Detection of SAR Image Multiscale Ship Targets in Complex Inshore Scenes Based on Improved YOLOv5
Synthetic aperture radar (SAR) can operate around the clock and in all weather, and therefore high-resolution SAR images have been widely applied to ship detection. However, current ship target detection and identification methods have limited detection accuracy and miss small targets due to speckle noise, caused by the imaging principle of SAR, and complex nearshore interference.
Therefore, this article proposes an improved YOLOv5 method to address the problem of low accuracy in multiship target detection tasks in complex scenes. The developed scheme enhances the ship target detection performance while reducing the number of parameters. Specifically,
- First, we increase the resolution of the input SAR images and optimize the anchor frames of the ship targets in the training dataset to locate small target ships more accurately.
- Then, the asymmetric pyramidal nonlocal block and the SimAM attention mechanism are introduced to reduce nearshore background interference.
- Additionally, to make the C3 module output richer in feature information, channel shuffling is performed after the C3 output to enhance the information exchange between channels.
- Finally, to reduce the number of parameters and computational cost during model training, the normal convolution in the neck part is replaced with Ghost convolution.
The F1 index of the proposed method reaches the highest values of 91.3% and 95.8% on the high-resolution SAR images dataset and the SAR ship detection dataset, respectively. The MAP (0.5:0.95) scores on both datasets are also the highest, at least 2% higher than those of the suboptimal method.
In selected inshore scenes, the ship detection performance of the proposed method outperforms current advanced methods for multiscale ships. The results show that the proposed method can extract ship features effectively in complex scenes, and its effectiveness is further validated on the large-scene AIR-SARShip-1 dataset.
Introduction
With the development of synthetic aperture radar (SAR) technology, the volume of SAR image data has exploded, and each SAR image contains more information. Therefore, SAR image target detection requires further in-depth and detailed research. In recent years, the development of deep-learning theory has led to a worldwide research boom in SAR image detection technology. Deep-learning methods can integrate the detection steps by automatically extracting important features from different targets, reducing the time required to select features and classifiers. Deep learning based on convolutional neural networks (CNNs) is considered one of the most comprehensive approaches to SAR image classification and detection. Deep CNNs extract SAR image target features with powerful feature representation capability, avoiding the limitations of handcrafted target features and significantly reducing the workload. Thus, deep CNN-based methods can effectively complete target recognition tasks in various scenes, and deep-learning methods are therefore applied to SAR image target detection. They offer the advantages of low workload, high flexibility, strong generalization ability, and high accuracy. These features reduce the difficulty of SAR image target detection and hold important research significance and practical value for agricultural and forestry management, military target monitoring, and disaster prevention.
CNN-based target detection models can be classified into two types as follows.
Two-stage detectors, such as faster R-CNN, which first generate candidate regions and then perform classification and bounding box regression on them.
One-stage detectors: YOLOv3 [1], YOLOX [2], YOLOv4 [3], YOLOv5 [4], fully convolutional one-stage object detection (FCOS) [5], and detection transformer (DETR) [6]. Among them, anchor-based detectors include YOLOv4 and YOLOv5, and anchor-free detectors include FCOS and DETR.
The above CNN target detection models comprise four main parts: the input; the backbone, which is used for image feature extraction; the neck, which enhances the utilization of the backbone-extracted features; and the head, which detects the object's class and bounding box. In the field of SAR ship detection, many algorithms have been proposed to improve the detection performance of the model. Many deep-learning-based SAR ship detection models involve attention mechanisms, which are often used for multiscale feature extraction to detect ship targets of different sizes. Yang et al. [9] proposed a robust one-stage detector (ROSD) to mitigate the interference of complex backgrounds. ROSD introduced coordinate attention and receptive field enlargement modules to locate and distinguish ship targets accurately. Moreover, Sun et al. [10] and Yang et al. [11] improved target localization performance in complex scenarios by developing category-position modules and multilayer feature attention mechanisms based on FCOS. Based on YOLOv5, Zhu et al. [12] added a prediction head to acquire more low-level features in the last part. In addition, the original prediction head was replaced by a transformer prediction head (TPH) with a self-attention mechanism to improve multiscale target detection and further strengthen the model's detection capability. A novel ship target detector called squeeze and excitation rank (SER) faster R-CNN was proposed in [13]. A multiscale feature map concatenation strategy was used to improve the quality of the shared feature maps extracted by the CNN, with detection performance further enhanced by a squeeze and excitation mechanism. In ship detection, contextual information can help the model better understand the location, size, and shape of objects. Most of the above methods capture global context information by adding attention mechanisms, thereby obtaining richer feature representations to improve detection accuracy.
Most of the deep-learning-based SAR ship detection models incorporate a feature pyramid network (FPN), which can effectively improve the ability to handle multiscale targets. A full-level context-squeezed excitation region of interest extractor was proposed in [14] to extract feature subsets at each level of the FPN to retain multiscale features. A dense attention pyramid network for SAR images was proposed in [15], where the convolutional block attention module is introduced into each branch of the pyramidal network to obtain more semantic information for multiscale ship detection. A feature aggregation enhancement pyramid network and a new method called attention perception pyramid network were proposed in [16] and [17] to improve multiscale ship detection performance in SAR images. In [18], a feature fusion network based on a taskwise attention FPN was designed to enhance multiscale feature representations. Through feature fusion, the model can simultaneously capture high-level semantic information and low-level detailed information to improve the detection of multiscale targets. Besides FPN and its improvements, there are other multifeature fusion methods. Xiao et al. [19] proposed a power transform and feature alignment bootstrap network to strengthen feature fusion, thereby improving multiscale detection capabilities. In [20], a CNN ship detection algorithm based on multiscale rotation-invariant Haar-like feature integration was proposed; this method was used for ship detection in multitarget environments in SAR images to improve detection accuracy and performance. In [21], a novel YOLO-based arbitrary-oriented SAR ship detector using bidirectional feature fusion and angular classification was proposed. The model improved information interaction in the feature maps, which is helpful for detecting multiscale ships. A saliency-guided single shot multibox detector for target detection in SAR images was designed in [22]; its dense connection structure integrates lower-level and higher-level features, introducing more context information. Multiscale fusion feature maps were used in the fully convolutional detection subnetwork in [23] to realize the fusion and comprehensive utilization of features, thereby enhancing the expression of multiscale features.
In SAR image ship detection, the complexity of the scene causes an imbalance between positive and negative samples. During training, the model more easily learns to detect large targets while ignoring small targets, increasing the missed detection rate and degrading detection performance. An efficient YOLOX-based ship detection model was developed in [24] to solve the problem of the high missed detection rate of small targets. Moreover, Zhang et al. [25] suggested a novel quad-FPN for SAR ship detection to enhance the feature extraction of small-sized ship targets. In [26], a deep ship detection method that learns from scratch was proposed, with better performance in detecting small dense ships. A large-scale SAR image ship detection method with an SSE attention module was proposed in [27] to avoid missing small targets and extract stronger semantic features. In [28], a CFAR-guided CNN was proposed to reduce the missed detection of small ship targets. In [29], a contextual-region-based CNN with multilayer fusion was proposed to improve the detection performance of small ships. In [30], a local and global context fusion module was designed to retain more shallow features to improve the detection performance of small targets.
Most deep CNN-based ship target detectors focus on detection performance and ignore computational complexity. Thus, Zhou et al. [31] proposed a lightweight anchor-free ship detection network for SAR images. A novel target detection method was proposed in [32] to reduce the time and space complexity of SAR ship detection. The 3S-YOLO network was proposed in [33] to improve the real-time performance of model detection; 3S-YOLO is a lightweight feature extraction and fusion network that preserves the model's detection accuracy. In [34], the authors augmented the YOLOv4 backbone with a lightweight module and a coordinate attention mechanism to improve detection performance. A SAR ship detection model named mask efficient adaptive network was proposed in [35]; its lightweight network structure effectively reduced the number of parameters and improved detection accuracy. Improved YOLO models were proposed in [36], [37], and [38] to achieve model compression while improving accuracy. A lightweight backbone network based on a deep dense simple attention module (SimAM) was introduced in [39]. Results demonstrate that this algorithm performs well in terms of speed and accuracy and has better robustness and real-time performance than similar detection algorithms.
Due to SAR imaging characteristics, complex ship motion may defocus the image in azimuth, which makes it difficult to detect ship targets accurately. The inshore background presents numerous interference factors, such as small and scattered buildings, containers, vehicles, short plank roads, cranes, and hatch covers, which lower the accuracy of ship target detection. Furthermore, in such complex inshore areas, ship targets of multiple sizes, namely multiscale targets, further increase the difficulty of detecting ships effectively. Meanwhile, some dense small targets are easily ignored and missed. These factors make ship detection more complex and challenging. To adapt to multiscale ships, algorithms must have rich feature representation capabilities to handle multiscale target detection. At the same time, for scenes with small ship targets, detection algorithms need to be more sensitive to small targets. In addition, the large number of parameters in the network structure consumes considerable computing resources, making such networks unsuitable for practical scenarios. Although current deep-learning-based SAR ship detection methods have achieved impressive results, problems and challenges remain in achieving high detection accuracy in such complex scenes. Multiscale feature extraction methods may not integrate feature information at different levels effectively; during the actual fusion process, features at different scales cannot be strictly aligned. Moreover, they cannot alleviate the concealment of the details of small targets by high-level semantic information. The small ship target detection methods above may not be valid for multiscale targets in the same or different scenes. In addition, the balance between model computational complexity and performance remains to be investigated.
Aiming at the above problems, this article develops an improved YOLOv5s method to address the problem of low accuracy of ship target detection in complex inshore scenes. The main contributions of this article can be summarized as follows.
- In order to locate small ship targets more accurately and reduce missed detections, we modify the sizes of the input SAR images and optimize the anchor frames of the ship targets in the training dataset.
- The asymmetric pyramidal nonlocal block and the SimAM attention mechanism are introduced to reduce nearshore background interference. Channel shuffling is performed after the C3 output to enhance the information exchange between channels, which helps to obtain more feature information, strengthens the model's multiscale target detection capability, and further improves detection accuracy.
- Considering the number of parameters and computational cost during model training, the normal convolution in the neck part is replaced with Ghost convolution to make the model structure lighter.
- Experimental results on the high-resolution SAR images dataset (HRSID) and the SAR ship detection dataset (SSDD) show that ship targets can be effectively detected in different scenarios; in particular, the detection performance for multiple targets in complex scenes is superior to other advanced methods. Moreover, the generalization of the proposed method is verified on the large-scene AIR-SARShip-1 dataset.
The rest of this article is organized as follows. Section II presents the YOLOv5 network, and Section III introduces the improved YOLOv5 framework. Section IV evaluates the proposed method on the HRSID, SSDD, and AIR-SARShip-1 datasets, demonstrating its effectiveness for ship target recognition in SAR images. Section V presents discussions. Finally, Section VI concludes this article.
Related Work
A. YOLOv5
YOLOv5 is a real-time target detection algorithm that inherits the advantages of the YOLO series and optimizes them in many ways. Its main advantages include real-time performance, high accuracy, scalability, ease of use, automatic data augmentation, and multiscale prediction. Therefore, we employ YOLOv5s as an example for a detailed introduction. The structure of YOLOv5s is illustrated in Fig. 1 and comprises four parts. The first part is the input, where the image is input and preprocessed. The second part is the backbone, used for image feature extraction. The third part is the neck, which enhances the utilization of the backbone-extracted features. The last part is the head, which predicts the class and bounding box of the object.
The input of YOLOv5 is optimized with Mosaic data augmentation, adaptive anchor frame calculation, and adaptive image scaling. Mosaic data augmentation effectively enriches the dataset and reduces GPU usage by stitching four images with random scaling, random cropping, and random arrangement. YOLOv5 reduces training time and speeds up inference by adaptively computing the optimal anchor frames for the SAR image dataset and adaptively adding minimal black borders to unify the input image size.
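To make the stitching step concrete, the following is a minimal Python sketch of Mosaic composition; the function name mosaic4, the gray fill value 114, and the per-quadrant resize are illustrative assumptions, and the label/box remapping that a real training pipeline also performs is omitted.

```python
import random
import cv2
import numpy as np

def mosaic4(images, out_size=640):
    """Stitch four HxWx3 uint8 images around a random center on one canvas."""
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # gray fill
    cx = random.randint(out_size // 4, 3 * out_size // 4)           # random center x
    cy = random.randint(out_size // 4, 3 * out_size // 4)           # random center y
    # Target regions for the four quadrants: (x1, y1, x2, y2).
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        # On-the-fly scaling of each source image into its quadrant.
        canvas[y1:y2, x1:x2] = cv2.resize(img, (x2 - x1, y2 - y1))
    return canvas
```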
The backbone comprises focus, C3, and spatial pyramid pooling (SPP) modules. The input image is sliced by the focus module, with the slicing operation presented in Fig. 2. The focus structure slices the input image, performs double downsampling and quadruple channel expansion, and produces the final feature map by convolution.
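A minimal PyTorch sketch of this slicing, modeled on the public YOLOv5 focus design (the kernel size and SiLU activation are assumed defaults, not taken from the paper):

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice the input into four pixel-interleaved sub-images (2x spatial
    downsampling, 4x channel expansion), concatenate, and fuse by convolution."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4 * c_in, c_out, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):  # x: (B, C, H, W) -> (B, c_out, H/2, W/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1))
```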
The C3 structure is widely used in the backbone and neck, with its structure illustrated in Fig. 3. Compared with the traditional residual module, the C3 structure has significant advantages: it is divided into two parts, a stack of base modules and a feature mapping branch, which reduces the computational effort during training and enables richer gradient combinations.
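The following sketch mirrors this two-branch design, loosely following the public YOLOv5 implementation; channel widths, activation, and bottleneck count are assumed defaults rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    # Base module of C3: 1x1 conv then 3x3 conv, with an optional shortcut.
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = nn.Sequential(nn.Conv2d(c, c, 1, bias=False),
                                 nn.BatchNorm2d(c), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1, bias=False),
                                 nn.BatchNorm2d(c), nn.SiLU())
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C3(nn.Module):
    # Two parallel branches -- a bottleneck stack and a plain 1x1 mapping --
    # concatenated and fused: the richer gradient combination described above.
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_half = c_out // 2
        self.cv1 = nn.Sequential(nn.Conv2d(c_in, c_half, 1, bias=False),
                                 nn.BatchNorm2d(c_half), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(c_in, c_half, 1, bias=False),
                                 nn.BatchNorm2d(c_half), nn.SiLU())
        self.m = nn.Sequential(*(Bottleneck(c_half) for _ in range(n)))
        self.cv3 = nn.Sequential(nn.Conv2d(2 * c_half, c_out, 1, bias=False),
                                 nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
```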
Fig. 4 depicts the SPP module in backbone. First, the backbone outputs the feature maps through four maximum pooling layers at different scales to obtain rich contextual features and multiscale information. Then, these pooled feature maps are stitched through concatenation to form a more expressive and comprehensive feature map. This structure assists in capturing the target's information at different scales and improves the model's performance in target detection tasks.
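In the public YOLOv5 SPP, the parallel paths are typically three stride-1 max-pooling branches (kernels 5, 9, and 13) plus the identity branch; a minimal sketch under that assumption:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Parallel max pooling at several kernel sizes (stride 1, 'same' padding
    keeps the spatial size), concatenated with the input branch and fused."""
    def __init__(self, c_in, c_out, kernels=(5, 9, 13)):
        super().__init__()
        c_half = c_in // 2
        self.cv1 = nn.Sequential(nn.Conv2d(c_in, c_half, 1, bias=False),
                                 nn.BatchNorm2d(c_half), nn.SiLU())
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels)
        self.cv2 = nn.Sequential(
            nn.Conv2d(c_half * (len(kernels) + 1), c_out, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        x = self.cv1(x)
        # Identity branch plus the pooled branches, stitched by concatenation.
        return self.cv2(torch.cat([x] + [p(x) for p in self.pools], dim=1))
```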
The neck structure of YOLOv5 comprises a path aggregation network (PAN) and an FPN. The neck aims to fully use the features extracted by the backbone network to improve the target detection performance. The neck's structure is depicted in Fig. 5, highlighting that the FPN structure in the neck is combined with the PAN structure to enhance feature fusion capability. FPN upsamples and fuses feature images at different scales by a top-down approach to deliver rich semantic features, which helps the detection network to handle targets of different sizes. At the same time, PAN obtains more accurate target location information in the high-level feature map by passing the localization information of the image from the bottom upwards, allowing the network to obtain more accurate target location information, which helps improve detection accuracy, especially for small and dense targets. Thus, YOLOv5 attains higher accuracy and robustness when dealing with targets of different sizes and densities.
The head section in YOLOv5 is responsible for the final target detection and localization tasks, using a mechanism similar to anchor boxes to associate each prediction box with a predefined size and scale. The prediction box outputs the category probability of the target, the bounding box coordinates, and the confidence level of a target's presence. These predictions are threshold-filtered and nonmaximum suppressed to determine the detected targets, locations, and classes. The loss function of YOLOv5 consists of two parts: the categorical loss function and the bounding box regression loss function. The binary cross-entropy loss function (BCELoss) is used as the YOLOv5 categorical loss, and the CIoU loss is used for bounding box regression. The BCELoss is defined as
$$L_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$
where $N$ is the number of samples, $y_i$ is the ground-truth label, and $\hat{y}_i$ is the predicted probability.
The CIoU loss function is written as
$$L_{\mathrm{CIoU}} = 1 - \mathrm{IoU} + \frac{\rho^{2}\left(b, b^{gt}\right)}{c^{2}} + \alpha v$$
where $\mathrm{IoU}$ is the intersection over union of the predicted box $b$ and the ground-truth box $b^{gt}$, $\rho(\cdot)$ is the Euclidean distance between their center points, $c$ is the diagonal length of the smallest box enclosing both, $v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$ measures the consistency of aspect ratios, and $\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}$ is a trade-off coefficient.
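A PyTorch sketch of this loss for corner-format (x1, y1, x2, y2) boxes follows; the function name ciou_loss and the eps stabilizer are illustrative, not the paper's code.

```python
import math
import torch

def ciou_loss(box1, box2, eps=1e-7):
    """CIoU loss per the formula above. box1, box2: tensors of shape (N, 4)."""
    # Intersection and union.
    ix1, iy1 = torch.max(box1[:, 0], box2[:, 0]), torch.max(box1[:, 1], box2[:, 1])
    ix2, iy2 = torch.min(box1[:, 2], box2[:, 2]), torch.min(box1[:, 3], box2[:, 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = box1[:, 2] - box1[:, 0], box1[:, 3] - box1[:, 1]
    w2, h2 = box2[:, 2] - box2[:, 0], box2[:, 3] - box2[:, 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # Squared center distance rho^2 and enclosing-box diagonal c^2.
    rho2 = ((box1[:, 0] + box1[:, 2] - box2[:, 0] - box2[:, 2]) ** 2 +
            (box1[:, 1] + box1[:, 3] - box2[:, 1] - box2[:, 3]) ** 2) / 4
    cw = torch.max(box1[:, 2], box2[:, 2]) - torch.min(box1[:, 0], box2[:, 0])
    ch = torch.max(box1[:, 3], box2[:, 3]) - torch.min(box1[:, 1], box2[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v and trade-off coefficient alpha.
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) -
                              torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```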
B. Selection of the YOLOv5 Model
In YOLOv5, the number of parameters is scaled up or down via the depth-multiple and width-multiple controls, leading to four different YOLOv5 versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. The depth-multiple and width-multiple values are reported in Table I.
In order to obtain the best detection performance on SAR image ship targets, these four YOLOv5 variants were experimentally compared and analyzed on HRSID. The corresponding experimental results are reported in Table II, where P represents precision, R is recall, AP0.5 denotes the average precision at an IOU threshold of 0.5, and MAP (0.5:0.95) represents the average AP over IOU thresholds from 0.5 to 0.95 in steps of 0.05.
Table II suggests that as the width and depth of the YOLOv5 model increase, the number of parameters and network layers increases rapidly while the detection of SAR image ship targets is affected much less. The differences in P and MAP among YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x are insignificant, but YOLOv5s attains the highest R and AP0.5 of the four models. Moreover, YOLOv5s has the lowest number of parameters, reducing the network's training time. Therefore, this article performs network optimization based on the YOLOv5s model.
Improved YOLOv5
A. Proposed Method
Current ship target detection and identification methods suffer from reduced detection accuracy and missed detection of small target ships due to speckle noise and coastline interference caused by the imaging principle of SAR images. Therefore, this article proposes an improved YOLOv5s method to address the problem of low accuracy of multiship target detection in complex scenes. Specifically, first, by increasing the resolution of the input SAR images and optimizing the anchor frames of the ship targets in the training dataset, the detection performance of the model for small targets is improved, locating small target ships more accurately and reducing the missed detection rate. Then, the asymmetric pyramidal nonlocal block (APNB) and SimAM attention mechanisms are introduced to reduce shoreline background interference and locate ship targets more accurately; these two attention mechanisms help the model better focus on the target area in complex backgrounds, further improving detection accuracy. In addition, to make the output of the C3 module richer in feature information, the C3 output is channel-shuffled to enhance the exchange of information between channels; channel shuffling allows the model to use features from multiple channels, improving feature representation and thus helping the model detect multiscale and diverse targets. Finally, to reduce the number of parameters and computational cost during model training, the normal convolution in the neck part is replaced with Ghost convolution, which effectively reduces the model's number of parameters and computational complexity while maintaining high performance; hence, the model can achieve good detection results even with limited computational resources. The improved YOLOv5s model proposed in this article is depicted in Fig. 6. By improving the resolution, optimizing the anchor frames, introducing the attention mechanisms, adding channel shuffling, and adopting Ghost convolution, the model is comprehensively improved in terms of detection accuracy, missed detection rate, feature expression capability, and computational cost.
B. Image Preprocessing Before Input
In target detection tasks, large feature maps are usually used to detect small targets and vice versa. Therefore, on a large feature map the anchor values are small to detect small targets, and on a small feature map they are large to detect large targets. The initial anchors in the YOLOv5 model are those designed for the COCO dataset. However, the COCO dataset contains targets with large size differences, so the anchors generated from it are unsuitable for detecting SAR ship targets, whose size differences are small. Hence, to better detect and identify SAR ship targets, the ship target anchor sizes in HRSID must be optimized. The detection performance of the YOLOv5 algorithm is improved by applying k-means clustering to all labeled ship targets in HRSID to obtain the most suitable anchors for ship targets. The anchors before and after optimization are illustrated in Fig. 7.
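A minimal sketch of this anchor fitting is shown below, assuming the 1 − IoU distance commonly used for YOLO anchor clustering; the function name kmeans_anchors, k = 9 clusters, and the co-centered IoU are illustrative assumptions.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """k-means over (width, height) of all labeled boxes, assigning each box
    to the center with the highest IoU (boxes treated as co-centered).
    wh: array of shape (N, 2) in pixels; returns k anchors sorted by area."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # Pairwise IoU between every box and every center.
        inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) * \
                np.minimum(wh[:, None, 1], centers[None, :, 1])
        union = wh[:, 0:1] * wh[:, 1:2] + centers[:, 0] * centers[:, 1] - inter
        assign = np.argmax(inter / union, axis=1)          # nearest center by IoU
        new = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers[np.argsort(centers.prod(axis=1))]       # small -> large
```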
The backbone and neck parts of the network are responsible for feature extraction in YOLOv5. The backbone extracts features through 3 × 3 convolutions, and its output features pass through the SPP structure for feature extraction with different receptive fields. Since the backbone uses 3 × 3 convolutions with a fixed receptive field, problems such as small target loss and difficult position localization arise easily when the input image resolution is too small. When a SAR image containing a small target is resized to 640 × 640 and downsampled 32 times by convolution operations, the position information of the small target in the original image may be lost, and only its abstract information is retained. If the input SAR image is enlarged beyond 640 × 640, the position information of the small target can still be perceived by YOLOv5 after multiple convolutional downsampling steps, better locating the small target. Therefore, increasing the input image resolution enlarges small targets, alleviates the loss of small target location information, and improves the detection accuracy of small targets.
C. Attentional Mechanisms
The background of high-resolution SAR images is very complex, and for ship targets near the shore there remains a high number of missed detections and false alarms during ship detection. Therefore, to improve the detection capability of YOLOv5 near the shore, we introduce the APNB and SimAM attention mechanisms [40]. A convolution kernel extracts features only in a local region, while APNB obtains the weight relationship between the target to be recognized and other regions globally.
Therefore, APNB is added to the last layer of the backbone part to improve the model's feature extraction capability for targets in complex backgrounds. In the complex nearshore context, APNB is more conducive to the model to identify ship targets and nearshore false targets, reducing the number of missed detections and improving the accuracy of detection and identification. The detailed block diagram of APNB is depicted in Fig. 8.
Therefore, the general formula for nonlocal operations is written as
$$y_i = \frac{1}{\mathcal{C}(x)} \sum_{\forall j} f\left(x_i, x_j\right) g\left(x_j\right)$$
where $x$ is the input feature map, $i$ is the index of the output position, $j$ enumerates all positions, $f(\cdot, \cdot)$ computes the similarity between positions $i$ and $j$, $g(\cdot)$ computes a representation of the feature at position $j$, and $\mathcal{C}(x)$ is a normalization factor.
To convert the nonlocal operation to APNB, the key and value positions are subsampled by pyramid pooling so that only a small set $S$ of anchor points participates:
$$y_i = \frac{1}{\mathcal{C}(x)} \sum_{\forall j \in S} f\left(x_i, \hat{x}_j\right) g\left(\hat{x}_j\right)$$
where $\hat{x}$ denotes the features sampled from $x$ by pyramid pooling, which reduces the computational complexity from $O(N^2)$ to $O(NS)$ with $S \ll N$.
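A hedged PyTorch sketch of such an asymmetric nonlocal block follows; the pool sizes (1, 3, 6, 8), the channel reduction, and the residual fusion are common choices from the asymmetric nonlocal literature, not necessarily the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class APNB(nn.Module):
    """Query keeps full resolution; key/value are subsampled to a few anchor
    points by pyramid pooling, so attention costs O(N*S) instead of O(N^2)."""
    def __init__(self, c, pool_sizes=(1, 3, 6, 8)):
        super().__init__()
        c_half = c // 2
        self.q = nn.Conv2d(c, c_half, 1)
        self.k = nn.Conv2d(c, c_half, 1)
        self.v = nn.Conv2d(c, c_half, 1)
        self.out = nn.Conv2d(c_half, c, 1)
        self.pool_sizes = pool_sizes

    def _pyramid_sample(self, x):
        # Sample S = sum(s*s) anchor points with adaptive average pooling.
        b, c = x.shape[:2]
        feats = [F.adaptive_avg_pool2d(x, s).reshape(b, c, -1)
                 for s in self.pool_sizes]
        return torch.cat(feats, dim=2)                       # (B, C', S)

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.q(x).reshape(b, -1, h * w).transpose(1, 2)  # (B, N, C')
        k = self._pyramid_sample(self.k(x))                  # (B, C', S)
        v = self._pyramid_sample(self.v(x)).transpose(1, 2)  # (B, S, C')
        attn = torch.softmax(q @ k, dim=-1)                  # (B, N, S)
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)  # (B, C', H, W)
        return x + self.out(y)                               # residual fusion
```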
Nevertheless, the currently used modules, namely channel attention, spatial attention, and hybrid attention, suffer from two problems. First, channel and spatial attention can obtain better features only along one dimension, channel or space, and the spatial variation of the two dimensions in hybrid attention lacks flexibility. Second, these modules add a certain amount of computation when incorporated into the model for training. Therefore, we introduce the SimAM attention mechanism into the C3 module of YOLOv5. SimAM infers three-dimensional (3-D) attention weights for the feature maps without increasing the network parameters, obtaining better features. The detailed block diagram of SimAM is presented in Fig. 9.
SimAM defines an energy function for each neuron as
$$e_t\left(w_t, b_t, \mathbf{y}, x_i\right) = \left(y_t - \hat{t}\right)^2 + \frac{1}{M-1} \sum_{i=1}^{M-1}\left(y_o - \hat{x}_i\right)^2$$
where $t$ is the target neuron, $x_i$ are the other neurons in the same channel, $\hat{t} = w_t t + b_t$ and $\hat{x}_i = w_t x_i + b_t$ are their linear transforms, $M$ is the number of neurons in the channel, and $y_t$ and $y_o$ are two different labels (taken as 1 and −1). The equations for $w_t$ and $b_t$ admit a closed-form solution in terms of the channel mean $\hat{\mu}$ and variance $\hat{\sigma}^2$. After substituting this solution back, the minimal energy becomes
$$e_t^{*} = \frac{4\left(\hat{\sigma}^2 + \lambda\right)}{\left(t - \hat{\mu}\right)^2 + 2\hat{\sigma}^2 + 2\lambda}$$
where $\lambda$ is a regularization coefficient. The lower the value of $e_t^{*}$, the more the neuron $t$ is distinguished from the surrounding neurons and the more important it is; the feature map is therefore refined as $\tilde{X} = \operatorname{sigmoid}\left(1/E\right) \odot X$, where $E$ groups all $e_t^{*}$ over the channel and spatial dimensions.
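This refinement is compact enough to state in code; the following parameter-free sketch follows the published SimAM formulation, with λ = 1e-4 as an assumed default.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Compute 1/e_t* per neuron from the channel-wise mean and variance
    (formula above) and rescale the feature map by its sigmoid."""
    def __init__(self, lam=1e-4):
        super().__init__()
        self.lam = lam

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        n = h * w - 1                           # number of "other" neurons, M - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        var = d.sum(dim=(2, 3), keepdim=True) / n
        # Inverse energy up to constants that cancel inside the sigmoid scaling.
        inv_energy = d / (4 * (var + self.lam)) + 0.5
        return x * torch.sigmoid(inv_energy)
```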
SimAM is added to C3, the most used module in YOLOv5. Also, to make the feature information in the C3 output richer, we shuffle the C3 output channels to enhance the information flow between channels. The flow chart of channel shuffling is illustrated in Fig. 10. First, the channels of the feature map are divided into N groups; the groups are then transposed after a reshape operation transforms the dimensions, and finally the transposed channels are flattened. The improved C3 module is depicted in Fig. 11.
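A minimal sketch of the shuffle itself (reshape, transpose, flatten), as used in ShuffleNet-style designs:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Split channels into groups, transpose the group axis with the per-group
    channel axis, then flatten, exchanging information across channel groups."""
    b, c, h, w = x.shape
    assert c % groups == 0
    x = x.reshape(b, groups, c // groups, h, w)   # (B, G, C/G, H, W)
    x = x.transpose(1, 2).contiguous()            # (B, C/G, G, H, W)
    return x.reshape(b, c, h, w)                  # flatten back to (B, C, H, W)
```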
D. Neck Lightweight
Although the improved feature extraction module improves the detection performance for small targets in SAR images, the number of parameters and the computational cost of model training also increase. Therefore, we replace the normal convolution in the neck part with Ghost convolution to extract better deep features while reducing the number of parameters and computational cost. The Ghost convolution module [42] was proposed by Huawei Noah's Ark Lab and published at CVPR 2020; its principle is presented in Fig. 12. Compared with a general convolution operation, the Ghost module first uses a 1 × 1 convolution to compress the channels of the incoming feature map and then uses cheap linear operations to generate more feature maps. Finally, the results of the 1 × 1 convolution and the linear operations are stacked to obtain a new feature map.
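A hedged sketch of such a Ghost module follows, with the depthwise kernel size and the 50/50 split between intrinsic and ghost maps as assumed defaults; the depthwise convolution plays the role of the cheap linear operation.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """1x1 conv produces a compressed set of intrinsic feature maps, a cheap
    depthwise conv generates the 'ghost' maps, and the two are concatenated."""
    def __init__(self, c_in, c_out, dw_k=5):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(nn.Conv2d(c_in, c_half, 1, bias=False),
                                     nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, dw_k, padding=dw_k // 2,
                      groups=c_half, bias=False),           # depthwise = cheap op
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)         # (B, c_out, H, W)
```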
Ship Target Detection Results
A. Datasets and Evaluation Metrics
HRSID was released in January 2020 by the University of Electronic Science and Technology of China [43]. HRSID is a dataset for ship detection and instance segmentation in high-resolution SAR images. It contains 5604 images and 16 951 ships from Sentinel-1 and TerraSAR-X, cropped to 800 × 800 pixels. The image resolutions are 0.5, 1, and 3 m, and the polarizations cover HH, HV, and VV. In the experiments on HRSID, we set the batch size to 16, the number of epochs to 300, and the ratio of training set to test set to 13:7.
SSDD has 1160 images and 2456 ship targets, with an image size of approximately 600 × 600 pixels [46] and an average of 2.12 ships per image. During training, images with labels ending in 1 and 9 are used as the validation set and the rest as the training set.
The high-resolution SAR ship dataset AIR-SARShip-1 has 31 images with an image size of about 3000 × 3000 pixels. Image resolutions include 1 and 3 m, and the scene types include ports and islands, with different sea state levels on the sea surface. The targets cover more than 10 types of ships, such as transport ships, oil tankers, and fishing vessels, nearly 1000 ships in total [45].
In this article, the evaluation indexes include accuracy, precision rate (P), recall rate (R), F1, and MAP, which comprehensively evaluate the performance of the proposed method in SAR image ship target detection. The average precision AP is the area under the PR curve and can be used to evaluate the overall performance of the detector [44]. Since ship is the only target class, the average precision AP equals the class-averaged MAP; the higher the MAP value, the better the performance of the detector. AP0.5 and AP0.75 denote the AP at IOU thresholds of 0.5 and 0.75, respectively. MAP (0.5:0.95) represents the average over IOU thresholds from 0.5 to 0.95 with an interval of 0.05. Precision rate (P), recall rate (R), F1, and AP are expressed as
$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = \frac{2 \times P \times R}{P + R}, \quad \mathrm{AP} = \int_{0}^{1} P(R)\, dR$$
where $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives, respectively.
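As a small worked example of these formulas (AP itself requires the full PR curve and is omitted here), the following sketch reproduces the P, R, and F1 values reported later in Table XI; the function name detection_metrics is illustrative.

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Compute P, R, and F1 from true positives, false positives, and
    false negatives, following the formulas above."""
    p = tp / (tp + fp) if tp + fp else 0.0      # precision
    r = tp / (tp + fn) if tp + fn else 0.0      # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0  # harmonic mean of P and R
    return p, r, f1

# Example: 45 correct detections, 2 false alarms, 5 missed ships (Table XI).
print(detection_metrics(45, 2, 5))  # ~ (0.957, 0.900, 0.928)
```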
The experimental setup is presented in Table III.
B. Detection Results for Different Input Image Sizes
The experimental results based on YOLOv5s with different input image sizes are shown in Table IV. In the training results on HRSID, R and AP0.5 reach their highest values of 90.1% and 94.3%, respectively, when the input image size is 800 × 800, which are 1.9% and 0.5% higher than with the original input of 640 × 640. In the training results on SSDD, P and AP0.5 reach their highest values of 96.7% and 97.4%, respectively, at 800 × 800, which are 1.1% and 0.2% higher than with the original 640 × 640 input. When processing low-resolution SAR inputs of 480 × 480, AP0.5 and AP0.75 are clearly low for both datasets. MAP for both datasets shows an increasing trend as the input SAR image size increases from 480 × 480 to 800 × 800 but decreases evidently as the size grows from 800 × 800 to 960 × 960. The experiments show that small input image sizes easily lead to missed small targets and are not conducive to detecting more ship targets. Therefore, in the subsequent experiments, we set the input image size of the improved method to 800 × 800 to improve the detection performance for small targets.
C. Ablation Experiments
To verify the effect of each module of the proposed method on ship detection, ablation experiments were conducted on the HRSID dataset with an input size of 800 × 800; the results are shown in Table V. Compared to YOLOv5s, the recall rate of improvement 1 is reduced by 1.7%, while the other metrics improve. This can be attributed to the adaptive anchor box optimization, which generates anchor boxes more suitable for ship targets, thereby enhancing the detection capability for small-sized ships. Compared to improvement 1, improvement 2 increases the recall rate by 1.8%, a significant enhancement over the original method. By capturing global information, the introduction of APNB enables improvement 2 to effectively improve the accuracy of ship detection in complex nearshore backgrounds and reduce the number of missed ship detections. To obtain the 3-D features of the image without increasing the parameter quantity, SimAM attention is introduced on top of improvement 2 in improvement 3. Compared with improvement 2, the recall rate of improvement 3 decreases by 2.6%, but the precision rate increases by 2.3%, significantly reducing the number of falsely identified ship targets. Inspired by ShuffleNet, improvement 4 adds channel shuffling on the basis of improvement 3 to enhance the flow of channel information between the C3 output features. Compared with improvement 3, improvement 4 maintains high precision while improving the recall rate and AP. The proposed method introduces Ghost convolution into the neck part of improvement 4 and achieves the highest F1, AP0.5, AP0.75, and MAP. The reason is that the neck part of YOLOv5 fuses features at different scales, and the feature reuse mechanism of Ghost convolution may help fuse features from different levels more effectively, improving the detection of objects at different scales. At the same time, the parameter quantity decreases relative to improvement 4. In summary, the proposed method is effective in improving ship detection performance, and the interaction between the various modules further enhances the network's performance.
D. Analysis of Detection Results Based on CNN Methods
To verify the performance of the proposed method, we conducted experimental comparative analyses on two different datasets with six CNN-based methods: faster R-CNN, YOLOv4, YOLOv5s, TPH-YOLOv5, DETR, and FCOS. First, we compare the proposed method with faster R-CNN, a two-stage detector. Combined with an RPN, it performs well in small object detection tasks with high accuracy and recall, effectively detecting and locating small ship targets. Then, YOLOv4, YOLOv5s, and TPH-YOLOv5, which use a single-stage anchor mechanism to achieve a good balance between real-time performance and detection accuracy, are compared with the proposed method. TPH-YOLOv5 improves detection accuracy by adding a TPH and integrating CBAM into YOLOv5, which is especially suitable for multitarget detection tasks. DETR and FCOS are typical single-stage anchor-free models. DETR provides end-to-end detection for SAR ship detection; combined with the transformer model, it can effectively capture the global contextual information of the entire image and handles a small number of multiscale targets well. FCOS is an anchor-free target detection algorithm based on a fully convolutional network that predicts targets in a per-pixel fashion, enabling efficient and accurate target detection in SAR ship detection. We compare the proposed method with these classical and advanced algorithms to verify the performance of the improved model.
Based on HRSID, the precision, recall, AP0.5, AP0.75, MAP (0.5:0.95), and parameter size of the proposed method and the other CNN methods are shown in Table VI. Compared with the YOLOv5s network, the proposed method achieves improvements of different degrees in P, R, F1, AP0.5, AP0.75, and MAP while only slightly increasing the model parameters, improving ship target detection performance. Compared with the other CNN methods, the proposed method achieves the highest P, F1, AP0.5, AP0.75, and MAP (0.5:0.95) with the fewest parameters. The recall rate of the proposed method is 89.1%, which is 1.1% lower than that of FCOS but significantly better than the recall rates of the other CNN methods. The proposed method improves the recall value and reduces missed detections while maintaining high precision, demonstrating excellent performance in ship detection. The TPH-YOLOv5 method enhances detection performance by fusing low-level features with deep-level features and introducing CBAM to strengthen features; however, this increases the parameter volume and computational cost, making real-time implementation infeasible. The DETR and FCOS methods have lower MAP (0.5:0.95) on HRSID, corresponding to their low precision. The DETR method performs poorly in detecting ships when extracting deep-level features and inputting them into the transformer due to the single-target nature of the dataset. In addition, the lack of anchor points in FCOS results in lower precision given the complex nearshore backgrounds present in HRSID.
To verify the robustness and generalization ability of the proposed method, we conducted ship detection experiments on SSDD; the results are shown in Table VII. The precision, recall rate, and MAP of the improved method reach 95.2%, 96.4%, and 66.9%, respectively, which are significantly better than those of the other CNN methods. This indicates that our method can fully extract ship target features and reduce missed alarms. However, the proposed method's AP0.5 and AP0.75 values are slightly lower than those of TPH-YOLOv5, mainly because the feature extraction capability of the proposed method is slightly weaker than that of TPH-YOLOv5, which has more network parameters, on the smaller number of images in SSDD. DETR and FCOS have significantly lower recall rates than the other methods, resulting in lower MAP. This is because both DETR and FCOS adopt an anchor-free design; although this can improve positioning accuracy in some cases, it may struggle to capture all targets in complex scenes.
To further demonstrate the advantages of the proposed method, we compared the detection results of different methods on the HRSID and SSDD datasets. We selected various types of targets from HRSID and SSDD, including dense targets in complex nearshore backgrounds, single targets in complex nearshore backgrounds, and multiple targets in simple backgrounds, as shown in Fig. 13. Images 1–4 are from HRSID, with detection results shown in Figs. 14–17, respectively. Images 5–7 are from SSDD, with detection results shown in Figs. 18–20, respectively. The blue boxes represent the actual label positions, while the red boxes represent the positions detected by the CNN-based methods.
As shown in Figs. 14–17, several competitor methods produce missed detections or false alarms when detecting multiple ship targets in complex scenes. In contrast, the proposed method can accurately detect targets in some complex scenarios. In simple scenes, multiple methods can effectively detect ship targets. The specific detection results of ship targets under the different methods are reported in Table VIII.
Note that the analysis of the results is based on HRSID for the multitargets in the complex shore background of image 1, the multitargets in the complex shore background of image 2, and the single target in the complex shore background of image 3. For images 1–3, faster R-CNN, YOLOv4, and DETR have poor multitarget detection capabilities in complex scenes. In image 1, all ship targets are missed by YOLOv4, while faster R-CNN misses 11. The reason is that they use the anchor settings of the COCO dataset and lack data augmentation strategies for small targets; furthermore, complex backgrounds and dense target distributions affect the target detection performance of these two methods. As the basic structure of DETR, the transformer has superior performance in some scenarios, but it may not be effective for ship target detection in the complex backgrounds of high-resolution SAR images. The main reason is that the transformer may not fully capture the close relationships between targets when dealing with complex backgrounds, resulting in lost relationships between targets. Compared to the other methods, YOLOv5 and FCOS perform better, but some false positives remain. Both the proposed method and TPH-YOLOv5 perform well in image 1.
However, our method has better overall performance. In image 2, all ship targets are accurately detected by the proposed method, while TPH-YOLOv5 produces false alarms and missed detections of ship targets. TPH-YOLOv5 has large width and depth, allowing it to extract more ship target information in complex backgrounds. The SimAM and APNB attention mechanisms introduced in the improved method enhance the recognition of ship targets and reduce the number of false targets. In image 1, three ship targets are still missed by the proposed method, mainly because the image's background is too complex and the ship targets are too dense.
Next, the detection results for the simple sea background multitarget image 4 from HRSID were analyzed. Compared with complex scenes, the various methods show improved performance in simple scenes. Faster R-CNN, YOLOv4, and DETR performed well in image 4, but some duplicate or false targets remain. In the simple background, all ship targets are accurately detected by the proposed method, YOLOv5s, TPH-YOLOv5, and FCOS. In summary, the multitarget detection capabilities of faster R-CNN and YOLOv4 are poor in complex scenes because they suffer from large interference when dealing with complex backgrounds and dense targets. The proposed method and TPH-YOLOv5 perform better in these complex scenarios than the competitor methods, owing to the attention mechanisms they introduce and their deeper network structures. Most methods perform well in simple scenarios, especially the proposed method, YOLOv5s, TPH-YOLOv5, and FCOS.
Compared with HRSID, the background of the SSDD dataset is relatively simple. Therefore, as depicted in Figs. 18–20, the detection performance of the compared methods improves when detecting multiple ship targets in complex backgrounds. In the simple background, multiple methods can effectively detect ship targets. The detection results of ship targets based on the CNN methods are reported in Table IX.
The proposed method, YOLOv5s, and TPH-YOLOv5 perform better on the complex nearshore background multitarget image 5 and single-target image 6. These methods use deeper and wider network models to extract depth features that benefit target identification, so they achieve high detection accuracy for ship targets. DETR, faster R-CNN, YOLOv4, and FCOS have relatively poor detection performance in complex scenes because their network structures and feature extraction capabilities are insufficient to deal with complex backgrounds. In the simple background multitarget detection of image 7, the proposed method, TPH-YOLOv5, and YOLOv5s performed well, accurately detecting all ship targets; these methods adapt well to simple backgrounds and achieve high detection accuracy. However, the detection performance of faster R-CNN, YOLOv4, and DETR remains unsatisfactory even in simple backgrounds, especially when faster R-CNN identifies a single ship target as multiple overlapping targets. The feature extraction and target discrimination capabilities of these methods still need further optimization.
In summary, the proposed method, YOLOv5s, and TPH-YOLOv5 demonstrate high detection accuracy in different scenarios and show good adaptability when dealing with complex backgrounds. These methods have significant advantages over the other CNN methods, mainly due to their deeper and wider network models and outstanding feature extraction and object discrimination capabilities.
E. Comparative Analysis With State-of-the-Art Methods
Table X reports the experimental results of the proposed method and state-of-the-art methods on the HRSID and SSDD datasets. Table X highlights that, compared to the other methods, the proposed method achieves optimal results in P and AP0.5 on the HRSID dataset. This is mainly attributed to our detection algorithm's ability to capture ship target features, significantly improving detection accuracy and precision. At the same time, the proposed method is slightly inferior to DB-YOLO in R, but the overall performance is still appealing. On SSDD, the proposed method's AP0.5 reaches 97.3%, the best performance, and its P and R also perform well. This indicates that our method has higher accuracy and recall in detecting ships under complex inshore backgrounds and can accurately identify ship targets in highly complex backgrounds. The DB-YOLO method's high recall and low precision arise because DB-FPN enhances semantic and spatial information fusion, which helps capture small-scale targets and improves the recall rate, whereas its single-stage network with CSP blocks that reduce redundant parameters may weaken the feature representation capability, resulting in a lower precision rate. The proposed method has only 7.6M parameters, fewer than the other detection methods. Thus, the proposed method is more practical in scenarios with limited computational resources and achieves lower computational costs while maintaining high performance.
Compared to various advanced methods, the proposed method obtains optimal or near-optimal results on the HRSID and SSDD datasets, achieving better precision, recall, and average precision with relatively few parameters. This means the proposed method has low computational complexity while maintaining high performance. Hence, the proposed method has high practical value and wide potential application prospects in various scenarios.
F. High-Resolution Complex Large-Scale SAR Image Verification
In order to verify the generalization ability of the trained model, ship detection was performed on two large-scale SAR ship images based on the high-resolution SAR ship dataset AIR-SARShip-1. The results of ship detection in high-resolution SAR far-sea and nearshore scenes are illustrated in Figs. 21 and 22, respectively. The blue box indicates missed ship targets, while the green box indicates false ones.
Tables XI and XII report the detailed detection results of the proposed method and the four representative CNN methods presented in Section IV-D. Table XI reveals that 47 ship targets were detected by the proposed method in the high-resolution SAR distant-sea large-scene multitarget ship detection, including 45 real ship targets and 2 false ship targets, while five ship targets were missed. The proposed method thus performs optimally regarding the P, R, and F1 composite indexes. Specifically, its precision rate is 95.7%, significantly better than the other methods, indicating that the proposed method generates fewer false alarms when detecting vessels. The R metric reaches 90.0%, on par with TPH-YOLOv5, indicating that the proposed method detects most real vessels. The F1 is 92.8%, and the combined evaluation of precision and recall indicates that the proposed method outperforms the other methods in overall performance. Fig. 21 suggests that the proposed method localizes some ship target positions poorly. The reason is that we use the model weights trained on HRSID for testing, where the ship targets are relatively homogeneous, while the AIR-SARShip-1 targets cover more than 10 categories, such as transport ships, oil tankers, and fishing vessels, resulting in poor localization of ships with large shape differences. Table XII shows that 10 ship targets were detected by the proposed method in the high-resolution SAR nearshore large-scene multitarget ship detection, including 7 real ship targets and 3 false ship targets, while two ship targets were missed. According to Table XII, the proposed method performs better regarding the precision rate (P), recall rate (R), and F1 composite index. Specifically, the proposed method has a precision rate of 70%, significantly better than the other methods, because it has been effectively optimized in terms of feature extraction and target localization, which improves detection accuracy and produces fewer false alarms when detecting vessels. The proposed method achieves a recall rate of 77.8%, second only to FCOS, indicating that it performs well in detecting real vessels; this is due to its adaptability to the input scales and anchor frames. The F1 is 73.7%, indicating that the proposed method outperforms the other methods in overall performance. It is worth noting that although the recall rate of FCOS reaches 100%, its high false alarm rate leads to a low precision rate, affecting F1. On the one hand, FCOS abandons anchor boxes and directly regresses object bounding boxes; although this simplifies training, the lack of prior information provided by anchor boxes may lead to inaccurate bounding boxes and false alarms in complex backgrounds. On the other hand, for background areas with uneven distributions, FCOS may not distinguish targets from backgrounds well, resulting in false alarms. Therefore, the proposed method performs better in ship detection tasks from the three perspectives of P, R, and F1 due to its improved network design, feature fusion, and optimization strategies.
Discussion
In this section, the improved YOLOv5 is analyzed through visualization experiments to evaluate the detection performance of the proposed method, and the experiments conducted on different scenes are discussed comprehensively.
A. Analysis of Visual Feature Map Results
In order to intuitively demonstrate the role of our improved modules, we conduct feature map visualization experiments. Fig. 23 shows the heat maps of the improved modules. Fig. 23(a) and (b) shows the original image and the visual feature maps of the original YOLOv5 model, respectively; the target information is not clear, and detection is severely affected. Fig. 23(c) is the visual feature map after adding the APNB module. Since APNB obtains the weight relationship between the target to be identified and other regions globally, it helps the model better focus on the target region in the complex background. The ship features extracted with APNB are clearly more complete and distinct than those of the traditional YOLOv5 under complex background interference, which greatly improves the accuracy of target detection in complex coastal scenes and effectively highlights the target area of the image. Building on (c), Fig. 23(d) shows the visual feature map with SimAM added; the SimAM attention mechanism is introduced to mitigate nearshore background interference and locate the ship target more accurately. In Fig. 23(d), the attention area of the right ship target is significantly larger, and the influence of the nearshore scene is reduced. The effect of adding channel shuffling after the C3 output is shown in Fig. 23(e); the detected target regions all show stronger highlighting, which allows the model to make full use of the multichannel features and enhances feature expression. Building on Fig. 23(e), Fig. 23(f) shows the visual feature map with Ghost convolution added. Because of the feature reuse mechanism of Ghost convolution, the model can integrate features from different levels in the neck part more effectively, which improves the ability to detect targets of different sizes. In Fig. 23(f), the characteristics of the ship are effectively highlighted while the background characteristics are significantly suppressed, and the target area of the ship is more clearly defined. These visualization experiments verify the necessity and importance of our improved modules.
B. Analysis of the Experimental Results
In order to demonstrate the superiority of the improved model, we performed experimental detection on the HRSID and SSDD datasets. Compared with the other CNN methods, the proposed method based on HRSID has a recall rate of 89.1%, 1.1% lower than that of FCOS but significantly better than the other CNN methods. The precision and F1 reach 93.4% and 91.3%, respectively, and AP0.5, AP0.75, and MAP (0.5:0.95) are significantly improved to 94.6%, 93.9%, and 73.6%, respectively. While maintaining high precision, the proposed method improves the recall rate, reduces the number of missed detections, and achieves better ship detection performance. In comparison with the other CNNs, the proposed method based on SSDD achieves recall, F1, and MAP (0.5:0.95) of 96.4%, 95.8%, and 66.9%, respectively, significantly better than the other CNN methods, indicating that this method is more effective at extracting target features and reducing the false alarm rate. To verify the generalization of our method, ship detection was performed on two large-scale SAR ship images from the high-resolution SAR ship dataset AIR-SARShip-1. According to the test results in Figs. 21 and 22, our method performs well on large-scene SAR images, correctly detecting most of the apparent ship targets. This shows that our method has high generalization performance and can give satisfactory results in different SAR scenarios.
Conclusion
The proposed method significantly improves ship target detection in SAR images. First, we optimized the resolution and anchor boxes of the input SAR images to improve the ability to locate small targets. Second, nonlocal and SimAM attention mechanisms were introduced to alleviate nearshore background interference and achieve accurate target positioning. Furthermore, the C3 module was improved by adopting a channel shuffling strategy to enrich feature information. Finally, to reduce training costs, the ordinary convolutions in the neck part were replaced with Ghost convolutions. The experimental results on the HRSID and SSDD datasets show that the proposed method significantly improves ship target detection performance with fewer parameters. Especially on the SSDD dataset, the proposed method's precision, recall rate, and MAP outperform all other CNN methods, demonstrating its robustness and generalization ability. Meanwhile, multitarget detection has been achieved in high-resolution SAR large scenes using AIR-SARShip-1 data; although the detection performance still needs improvement in nearshore scenarios, it performs well in offshore scenarios. In summary, despite some shortcomings, the proposed method makes significant progress in ship target detection tasks. Future work will further optimize the network structure and interpretability to improve recognition accuracy while keeping the number of parameters as small as possible.