[Figure 1: Improved network architecture of the proposed CenterNet-based method]
Chinese Research Team Achieves Breakthrough in Real-Time Ship Detection Using Advanced AI
A research team from the University of South China, led by Associate Professor Xiao Tang, has developed a groundbreaking method for detecting ships in radar imagery that could significantly improve maritime navigation and safety. The new system, which builds upon the widely used CenterNet detection framework, achieved an average precision of 94.25% on a standard benchmark dataset while maintaining real-time processing speeds.
The team's innovation, detailed in a recent IEEE journal publication, enhances the popular CenterNet detection system with several sophisticated improvements. By incorporating advanced neural network components and optimization techniques, they created a system that can rapidly identify ships in synthetic aperture radar (SAR) images, even in challenging conditions such as poor weather and low visibility.
"Our improved method significantly outperforms existing technologies, with a 5.26% increase in detection accuracy while maintaining processing speeds of 49 frames per second," explained Dr. Tang. The system's ability to detect small ships and vessels near shorelines – traditionally challenging scenarios for maritime surveillance – marks a particular advancement in the field.
The collaborative effort, which included researchers from the China Electronics Technology Group Corporation and Shanghai Academy of Spaceflight Technology, addresses a critical need in maritime safety and navigation. The system's real-time processing capabilities make it especially valuable for predicting ships' navigational intentions and preventing potential collisions or maritime accidents.
Working alongside Tang, team members Jiufeng Zhang and Yunzhi Xia played crucial roles in developing the system's innovative features, including a new attention mechanism that helps the AI focus on relevant details in radar images. Their work was supported by the National Natural Science Foundation of China, highlighting the project's national significance.
The breakthrough has immediate applications for maritime safety and could lead to improved traffic management in busy shipping lanes. While the team acknowledges some limitations with their current system, such as challenges with densely packed vessels, they are already working on enhancements that will integrate multiple data sources for even more accurate ship detection and tracking capabilities.
Summary
This paper introduces an improved real-time ship detection method for synthetic aperture radar (SAR) images based on CenterNet, aimed at enhancing navigational intent prediction. Here are the key points:
Main Contributions:
1. The researchers improved the original CenterNet network by:
- Adding a feature pyramid network (FPN) fusion structure
- Replacing upsampling deconvolution with Deformable Convolution Networks (DCNets)
- Integrating BiFormer attention mechanism and spatial pyramid pooling (SPP)
- Optimizing the loss functions using improved Focal Loss and Smooth L1 loss
2. Performance Results:
- Achieved Average Precision (AP) values of 82.87% on HRSID dataset and 94.25% on SSDD dataset
- Maintained detection speeds of 49 FPS on both datasets
- Showed improvements of 5.26% and 4.04% in AP compared to original CenterNet
- Outperformed other methods like Faster R-CNN, SSD, and YOLOv7-tiny
3. Key Advantages:
- Better feature extraction capabilities
- Improved detection of small and nearshore ships
- Enhanced accuracy while maintaining real-time processing speeds
- More robust performance in complex environments
Limitations/Future Work:
- Performance may degrade with dense or overlapping targets
- Challenges with unified data platforms in high-resolution images
- Resource constraints when deploying on edge devices
- Future work will focus on integrating multiple data sources and continuous model updates
The paper demonstrates that the improved CenterNet method provides a good balance between detection accuracy and processing speed, making it suitable for real-time SAR ship detection applications that support navigational intent prediction.
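As a rough illustration of how the components listed above fit together, a minimal PyTorch-style skeleton of the pipeline might look like the following. All module names, channel widths, and the exact ordering of stages are assumptions for illustration, not the authors' code:

```python
import torch
import torch.nn as nn

class ImprovedCenterNetSketch(nn.Module):
    """Hypothetical skeleton of the pipeline described above: backbone with
    BiFormer attention at the end of downsampling, FPN-style fusion, SPP,
    DCN-based upsampling, and the three CenterNet detection heads."""

    def __init__(self, backbone, biformer, fpn, spp, dcn_upsample,
                 head_channels=64, num_classes=1):
        super().__init__()
        self.backbone = backbone          # feature extractor (e.g. a ResNet)
        self.biformer = biformer          # bi-level routing attention block
        self.fpn = fpn                    # feature pyramid fusion
        self.spp = spp                    # spatial pyramid pooling
        self.dcn_upsample = dcn_upsample  # deformable-convolution upsampling
        # CenterNet-style heads: centre heatmap, box size, centre offset.
        self.heatmap_head = nn.Conv2d(head_channels, num_classes, kernel_size=1)
        self.size_head = nn.Conv2d(head_channels, 2, kernel_size=1)
        self.offset_head = nn.Conv2d(head_channels, 2, kernel_size=1)

    def forward(self, x):
        feats = self.backbone(x)          # downsampled multi-scale features
        feats = self.biformer(feats)      # attention at the end of downsampling
        fused = self.fpn(feats)           # multi-scale feature fusion
        fused = self.spp(fused)           # enlarge the receptive field
        fused = self.dcn_upsample(fused)  # recover spatial resolution
        heatmap = torch.sigmoid(self.heatmap_head(fused))
        return heatmap, self.size_head(fused), self.offset_head(fused)
```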
Major Developments in Detail
Here's a detailed breakdown of the main research contributions:
1. Enhanced Feature Extraction Architecture
- Integration of a feature pyramid network (FPN) fusion structure into the backbone network
- This allows the system to process ship features at multiple scales simultaneously
- Replacement of traditional upsampling deconvolution with Deformable Convolution Networks (DCNets)
- DCNets enable adaptive adjustment of the convolution sampling positions, and thus the effective receptive field, based on ship features
- Results in more detailed and informative feature maps of ship targets
- Particularly effective for capturing complex ship characteristics in varying conditions
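One way to realize the DCN-based upsampling described above is sketched below using torchvision's DeformConv2d. The kernel size, normalization, and 2x bilinear upsampling are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class DCNUpsampleBlock(nn.Module):
    """Hypothetical replacement for a deconvolution upsampling stage: a 3x3
    deformable convolution whose sampling offsets are predicted from the
    input, followed by 2x bilinear upsampling."""

    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        # A plain conv predicts the 2*k*k (x, y) sampling offsets per position.
        self.offset_conv = nn.Conv2d(in_channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=padding)
        self.dcn = DeformConv2d(in_channels, out_channels, kernel_size,
                                padding=padding)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        offsets = self.offset_conv(x)              # learned sampling positions
        x = F.relu(self.bn(self.dcn(x, offsets)))  # deformable feature extraction
        return F.interpolate(x, scale_factor=2, mode="bilinear",
                             align_corners=False)  # recover spatial resolution

# Usage sketch: upsample a 256-channel feature map to 128 channels at 2x size.
if __name__ == "__main__":
    block = DCNUpsampleBlock(256, 128)
    print(block(torch.randn(1, 256, 16, 16)).shape)  # torch.Size([1, 128, 32, 32])
```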
2. Advanced Attention and Pooling Mechanisms
- Implementation of the BiFormer attention mechanism at the end of the downsampling stage
- BiFormer uses Bi-level Routing Attention as its core building block
- Reduces computational burden while maintaining high performance
- Integration of Spatial Pyramid Pooling (SPP) module after feature fusion
- SPP enlarges the network's receptive field
- Enables recognition of ships at different scales and resolutions
- Particularly effective for identifying small ships and vessels near shorelines
- Helps manage complex background interference in coastal areas
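A minimal sketch of an SPP block of this kind is shown below. The pooling window sizes (5, 9, 13) follow the common YOLO-style configuration and are an assumption; the paper's exact settings are given in its Figure 2:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Sketch of a spatial pyramid pooling block: parallel max-pool branches at
    several window sizes are concatenated with the input to enlarge the
    receptive field without changing spatial resolution."""

    def __init__(self, in_channels, out_channels, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
        )
        # A 1x1 conv fuses the concatenated branches back to a fixed width.
        self.fuse = nn.Conv2d(in_channels * (len(pool_sizes) + 1), out_channels, 1)

    def forward(self, x):
        branches = [x] + [pool(x) for pool in self.pools]
        return self.fuse(torch.cat(branches, dim=1))

# Usage sketch: spatial size is preserved while the receptive field grows.
if __name__ == "__main__":
    spp = SPP(256, 256)
    print(spp(torch.randn(1, 256, 32, 32)).shape)  # torch.Size([1, 256, 32, 32])
```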
3. Optimized Loss Function Design
- Enhancement of the heatmap Focal Loss for better target center point detection
- Replacement of traditional L1 loss with Smooth L1 loss for:
- Width and height measurements
- Center point offset calculations
- These improvements lead to:
- Better convergence speed during training
- Enhanced detection accuracy
- Improved model generalization
- More stable training process
- Reduced impact of outliers
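For concreteness, a sketch of the two loss components follows: a CenterNet-style penalty-reduced focal loss on the centre heatmap and a masked Smooth L1 loss for the size and offset regressions. The alpha/beta exponents and the normalization are standard CenterNet choices, not values confirmed from the paper:

```python
import torch
import torch.nn.functional as F

def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss on the centre-point heatmap.
    `pred` and `gt` are (N, C, H, W); `gt` is a Gaussian-splatted heatmap in [0, 1]."""
    pred = pred.clamp(eps, 1 - eps)
    pos = gt.eq(1).float()                  # exact centre-point locations
    neg = 1.0 - pos
    pos_loss = -((1 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = -((1 - gt) ** beta) * (pred ** alpha) * torch.log(1 - pred) * neg
    num_pos = pos.sum().clamp(min=1.0)
    return (pos_loss.sum() + neg_loss.sum()) / num_pos

def regression_smooth_l1(pred, target, mask):
    """Smooth L1 loss for width/height and centre-offset regression, computed
    only at positive locations; `mask` is broadcastable to `pred`."""
    mask = mask.float()
    loss = F.smooth_l1_loss(pred * mask, target * mask, reduction="sum")
    return loss / mask.sum().clamp(min=1.0)
```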
4. Performance Improvements
- Significant accuracy gains:
- HRSID dataset: 82.87% AP (5.26% improvement)
- SSDD dataset: 94.25% AP (4.04% improvement)
- Maintained real-time processing capability:
- 49 FPS processing speed on both datasets
- Balanced trade-off between accuracy and speed
- Superior performance compared to existing methods:
- Outperformed Faster R-CNN, SSD, and YOLOv7-tiny
- Better handling of challenging detection scenarios
5. Architectural Efficiency
- Anchor-free design simplifies the detection process
- Reduced computational parameters compared to traditional methods
- Efficient deployment potential on edge devices
- Real-time processing capability maintained despite added features
- Effective balance between model complexity and performance
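The anchor-free design keeps the inference step simple. A sketch of how heatmap peaks are turned into boxes, assuming a single ship class, batch size 1, and an output stride of 4 as in the original CenterNet, might look like:

```python
import torch
import torch.nn.functional as F

def decode_detections(heatmap, size, offset, top_k=100, stride=4):
    """Sketch of anchor-free CenterNet decoding: local maxima on the heatmap
    act as a pseudo-NMS, the top-k peaks become detections, and boxes are
    rebuilt from the regressed width/height and centre offsets."""
    _, _, h, w = heatmap.shape
    # Keep only local peaks (a 3x3 max-pool acts as non-maximum suppression).
    peaks = heatmap * (heatmap == F.max_pool2d(heatmap, 3, stride=1, padding=1))
    scores, idx = torch.topk(peaks.flatten(), top_k)
    ys, xs = idx // w, idx % w                  # peak coordinates on the heatmap
    off = offset[0, :, ys, xs]                  # (2, k) centre-point offsets
    wh = size[0, :, ys, xs]                     # (2, k) box widths and heights
    cx = (xs.float() + off[0]) * stride         # centre x in input-image pixels
    cy = (ys.float() + off[1]) * stride         # centre y in input-image pixels
    bw, bh = wh[0] * stride, wh[1] * stride
    boxes = torch.stack([cx - bw / 2, cy - bh / 2,
                         cx + bw / 2, cy + bh / 2], dim=1)
    return boxes, scores                        # (k, 4) boxes and (k,) scores
```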
6. Practical Applications
- Enhanced capability for nearshore ship detection
- Improved small target recognition
- Better performance in complex environmental conditions
- Real-time monitoring support for navigational intent prediction
- Potential integration with existing maritime surveillance systems
These contributions collectively represent a significant advancement in SAR ship detection technology, offering both theoretical innovations and practical improvements for real-world applications in maritime safety and surveillance.
The research team's comprehensive approach to improving multiple aspects of the detection system - from feature extraction to loss function optimization - demonstrates a thorough understanding of both the technical challenges and practical requirements in the field of maritime surveillance and ship detection.
Figures and Tables
Here's a comprehensive breakdown of all figures and tables in the paper:
FIGURES:
1. Figure 1: Network Architecture of the Improved CenterNet Method
- Illustrates the complete network structure
- Shows preprocessing, backbone network, and detection heads
- Highlights FPN structure and component connections
- Demonstrates the flow from input to final detection output
2. Figure 2: Detailed Configuration of the SPP Block
- Shows the structure of the Spatial Pyramid Pooling module
- Illustrates different pooling layers and their connections
- Details the feature map processing pathway
- Demonstrates concatenation and output processes
3. Figure 3: Samples of HRSID Dataset
- Shows three types of samples:
a) Multi-scale ship samples
b) Inshore ship samples
c) Small ship samples
4. Figure 4: Samples of Official-SSDD Dataset
- Presents two types of samples:
a) Offshore ship samples
b) Inshore ship samples
5. Figure 5: P-R Curves of Different Improvement Experiments
- Shows Precision-Recall curves
- Compares performance of different experimental improvements
6. Figure 6: Loss Curves for Different Loss Functions
- Compares loss values over time for different functions
- Shows convergence patterns of various loss function implementations
7. Figure 7: Performance Comparison on HRSID Dataset
- Shows six sub-images:
a) Original label image 1
b) Original label image 2
c) CenterNet visualization results of image 1
d) CenterNet visualization results of image 2
e) Improved CenterNet visualization results of image 1
f) Improved CenterNet visualization results of image 2
8. Figure 8: Performance Comparison on SSDD Dataset
- Similar structure to Figure 7, showing detection results
TABLES:
1. Table I: Description of Experimental Setup
- Details the experimental environment
- Lists hardware and software configurations
- Specifies training parameters
2. Table II: Detailed Information of Different Dataset
- Compares characteristics of HRSID and SSDD datasets
- Includes image counts, ship counts, and other relevant metrics
3. Table III: Comparative Experiments of Different Backbone Network Performance
- Compares performance metrics of various backbone networks
- Includes AP50, AP50:95, and FPS measurements
4. Table IV: Ablation Experiments on the Performance of the Improved Method of CenterNet
- Shows impact of different improvements
- Details performance metrics for each modification
5. Table V: Comparative Experiments on the Performance of Different Attention Mechanisms
- Compares various attention mechanisms
- Includes precision, recall, and AP measurements
6. Table VI: Ablation Experiment on Improving the Performance of Loss Function
- Shows impact of different loss function modifications
- Includes performance metrics for each variant
7. Table VII: Performance Comparison Experiments of Different Models on HRSID and SSDD Datasets
- Compares the improved method with other detection models
- Shows comprehensive performance metrics across datasets
Each figure and table serves to validate the research findings and demonstrate the improvements achieved through the proposed methods. They provide both qualitative and quantitative evidence of the system's performance and effectiveness.
Artifacts
Datasets Used
HRSID Dataset
- Type: SAR Ship Image Dataset
- Composition: 5,604 SAR images containing 16,951 ships
- Image Properties:
- Size: 800 x 800 pixels
- Resolution: 1-5m
- 25% overlap rate
- Status: Publicly available
- Independent Validation: Yes, can be used for benchmark comparison
- No direct link provided in paper
Official-SSDD Dataset
- Type: SAR Ship Image Dataset
- Composition: 1,160 images containing 2,456 ships
- Sources: Sentinel-1, TerraSAR, and RadarSat-2
- Image Properties:
- Resolution: 1-15m
- Approximate size: 600 x 600 pixels
- Status: Publicly available
- Independent Validation: Yes, can be used for benchmark comparison
- No direct link provided in paper
Code and Implementation
Model Implementation
- No public code repository mentioned
- No reference to implementation availability
- Key components detailed in paper include:
- CenterNet modifications
- BiFormer attention mechanism
- Loss function implementations
- Feature extraction network
Validation Materials
- Training parameters provided in Table I
- Network architecture detailed in Figure 1
- Detailed configuration of SPP block in Figure 2
- No mention of publicly available model weights or configurations
Gaps in Artifact Availability
- Missing Source Code
- Implementation details of improved CenterNet
- BiFormer attention mechanism integration
- Custom loss function implementations
- Training scripts and configurations
- Missing Model Artifacts
- Trained model weights
- Model checkpoints
- Pre-trained models
- Missing Validation Tools
- Evaluation scripts
- Performance measurement tools
- Testing frameworks
Recommendations for Independent Validation
- Dataset Access
- Contact authors for dataset access information
- Use publicly available HRSID and SSDD datasets
- Follow dataset split ratios mentioned in paper (8:2 training/testing)
- Implementation
- Follow detailed network architecture in Figure 1
- Implement loss functions as described in Section II
- Use provided experimental parameters from Table I
- Parameters to match (a training setup sketch follows these recommendations):
- Learning rate: 0.01
- Batch size: 64
- Input image size: 512 x 512
- Training epochs: 200
- Optimizer: SGD
- Performance Validation
- Use metrics provided in paper:
- Average Precision (AP)
- Frames Per Second (FPS)
- Precision-Recall curves
- Compare against baseline models mentioned:
- Original CenterNet
- Faster R-CNN
- SSD
- YOLOv7-tiny
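As a starting point for reproduction, the hyperparameters listed above (from Table I) could be wired into a PyTorch training setup along these lines. Momentum, weight decay, and the learning-rate schedule are not given in this summary, so they are left at defaults or omitted; `model` and `train_dataset` are placeholders for a re-implementation:

```python
import torch
from torch.utils.data import DataLoader

def build_training_setup(model, train_dataset):
    """Wire the reported Table I hyperparameters into a basic training setup."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)          # SGD, learning rate 0.01
    loader = DataLoader(train_dataset, batch_size=64, shuffle=True)   # batch size 64
    num_epochs = 200          # training epochs
    input_size = (512, 512)   # images resized to 512 x 512 before training
    return optimizer, loader, num_epochs, input_size
```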
Contact Information
- Corresponding author: Yunzhi Xia
- Email: yzxia@hust.edu.cn
- Institution: University of South China
Source: "A Real-Time SAR Ship Detection Method Based on Improved CenterNet for Navigational Intent Prediction," IEEE Journals & Magazine, IEEE Xplore.