Multilevel Pyramid Feature Extraction and Task Decoupling Network for SAR Ship Detection
Synthetic aperture radar (SAR) target detection plays a crucial role in both military and civilian fields, attracting significant attention from researchers globally. CenterNet, a single-stage target detection method, achieves high detection speed and accuracy by eliminating anchor-related calculations and nonmaximum suppression. However, directly applying CenterNet to SAR ship detection poses challenges due to the distinctive characteristics of SAR images, including lower resolution, lower signal-to-noise ratio, and larger ship aspect ratios.
To address these challenges, we propose MPDNet, which introduces a multilevel pyramid feature extraction module (MP-FEM) to replace the encoding–decoding structure in CenterNet. MP-FEM employs a multilevel pyramid and channel compression to fuse multiscale SAR image features and quickly acquire deep features.
Second, we propose the convolution channel attention module (Conv-CAM), which replaces the multilayer perceptron in the common pooling attention mechanism with multistage 1-D convolutions, further refining the feature extraction capability of MP-FEM.
Furthermore, we propose the detection task decoupling module (DTDM), which accounts for the characteristics of SAR ships, effectively detecting small targets of different sizes and distinguishing the centers and sizes of densely arranged ships. DTDM extracts task-related features from the original feature map before it enters the three detection headers, thereby addressing the task coupling problem in CenterNet's detection header module for SAR ship detection.
Finally, experimental results on the SSDD dataset and the SAR-Ship-Dataset show that the proposed network significantly improves SAR target detection accuracy.
SECTION I. Introduction
Synthetic aperture radar (SAR), due to its unique imaging mechanism, enables data acquisition under all-weather and all-day conditions, unaffected by factors such as weather and lighting [1], [2], [3]. Therefore, target detection algorithms based on SAR images find extensive applications in military fields, such as situation analysis and strategic defense, as well as in civilian fields, including marine monitoring, maritime search and rescue, and disaster monitoring [4]. Numerous scholars worldwide have conducted research on target detection methods based on SAR images [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]. These algorithms improve the YOLO [9], [10], [11] and CenterNet [12], [13] networks, originally developed for RGB images, and adapt them to SAR image target detection [4], [5], [6], [7]. However, due to the fundamental differences in imaging principles, shooting angles, and shooting distances between SAR images and conventional optical images, research on SAR image target detection presents numerous challenges beyond those in typical target detection. Zhang et al. [8] proposed a miniaturized plug-and-play module to select target areas from SAR images and filter out large areas of ocean and coastal backgrounds with minimal computation. Qu et al. [15] introduced transformer encoding and mask guidance modules to address issues in traditional methods, effectively learning dependencies between ship targets and reducing false alarms from complex backgrounds. Ma et al. [16], through the design of an anchor-free framework, key-point estimation module, and channel attention module, successfully alleviated challenges in detecting multiscale and dense ship targets in SAR images. Fig. 1 illustrates typical SAR images with significant challenges in target detection.
Fig. 1. (a), (b) and (e)–(h) correspond to images with significant variations in target size and large aspect ratios, where (g) and (h) depict images with complex target environments, dense target presence, and severe background interference, while (c) and (d) illustrate images with low resolution and low signal-to-noise ratio.
SAR images exhibit a vast range of target sizes and substantial variations in aspect ratios, as evidenced by the contrasting example image sets in Fig. 1(a), (b), and (e)–(h).
SAR images feature complex target environments with background interference from suspected targets and dense target arrangements. Examples include ships near the coastline and coastal structures, ships at sea, and small islands, as shown in Fig. 1(g) and (h).
SAR images suffer from low resolution and low signal-to-noise ratios, as evident in Fig. 1(c) and (d). Therefore, conventional target detection algorithms cannot be directly applied to ship detection in SAR images, highlighting the significance of developing target detection networks tailored to SAR image characteristics.
CenterNet, known for its low model complexity, fast inference speed, and capacity to extract distinct features for detecting large objects, holds great promise for SAR ship detection. Based on the characteristics of SAR images, this article proposes the multilevel pyramid feature extraction and task decoupling network (MPDNet), a single-stage, anchor-free ship detection network. MPDNet can effectively detect ships with large aspect ratios and dense arrangements in SAR images characterized by low resolution and low signal-to-noise ratio. The main contributions of this article are as follows.
A new ship detection network for SAR images (MPDNet) is proposed. MPDNet consists of the multilevel pyramid feature extraction module (MP-FEM), the detection task decoupling module (DTDM), and the detection header module (DHM). MP-FEM, combined with the convolution channel attention module (Conv-CAM), can effectively extract features from SAR images with low resolution and low signal-to-noise ratio. DTDM and DHM are used to accurately detect ships with large aspect ratios and dense arrangements.
Considering that SAR images are characterized by low resolution, low signal-to-noise ratio, and large ship aspect ratios, MP-FEM is introduced to extract SAR image features with strong representational power. The MP-FEM is composed of the residual module (RM) and the multilevel pyramid channel compression module (MP-CCM).
Taking into account the lack of selectivity during the channel compression of MP-CCM, we design Conv-CAM to improve the feature extraction ability of MP-FEM.
DTDM is put forward to accurately detect ships with large aspect ratios and dense arrangements in SAR images. DTDM decouples the target size, center point, and class prediction tasks, effectively improving ship detection accuracy.
SECTION II. MPDNet Network
A. Structure of MPDNet
MPDNet mainly consists of three parts: MP-FEM, DTDM, and DHM. The overall network structure is shown in Fig. 2.

Fig. 2. Architecture of the proposed MPDNet.
The MP-FEM is composed of the RM and the MP-CCM, as shown in the gray box in Fig. 2. First, the RM performs initial feature extraction on the input SAR image; the MP-CCM then further extracts, enhances, and compresses the features to produce the feature map $F$.

DTDM involves the size path, center path, and class path, as shown in the blue box in Fig. 2. These three paths, respectively, apply shunt adaptive optimization to the feature map $F$, producing the task-specific feature maps $F_{wh}$, $F_{offset}$, and $F_{cls}$.

The DHM consists of the size header, center header, and class header, as shown in the orange box in Fig. 2. Unlike CenterNet, whose header module has only one input, the header of MPDNet takes three feature maps as inputs: the size header takes $F_{wh}$, the center header takes $F_{offset}$, and the class header takes $F_{cls}$, from which the target sizes, center-point offsets, and class heatmap are predicted, respectively.
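To make the dataflow concrete, the following is a minimal PyTorch sketch of the top-level wiring implied by Fig. 2; the class and attribute names are our own placeholders, not the authors' code.

```python
import torch.nn as nn

class MPDNetSketch(nn.Module):
    """Hypothetical top-level wiring of MPDNet (module internals omitted)."""

    def __init__(self, mp_fem: nn.Module, dtdm: nn.Module,
                 size_header: nn.Module, center_header: nn.Module,
                 class_header: nn.Module):
        super().__init__()
        self.mp_fem = mp_fem                   # MP-FEM: RM + MP-CCM
        self.dtdm = dtdm                       # DTDM: size/center/class paths
        self.size_header = size_header
        self.center_header = center_header
        self.class_header = class_header

    def forward(self, image):
        f = self.mp_fem(image)                 # extract, enhance, compress -> F
        f_wh, f_offset, f_cls = self.dtdm(f)   # task-decoupled feature maps
        wh = self.size_header(f_wh)            # target width/height regression
        offset = self.center_header(f_offset)  # center-point offset regression
        heatmap = self.class_header(f_cls)     # per-class center heatmap
        return heatmap, offset, wh
```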
B. Multilevel Pyramid Feature Extraction Module
Considering SAR images with low resolution and low signal-to-noise ratio and ships with large aspect ratios, MP-FEM is proposed to improve feature representational ability by extracting and fusing multiscale features. The MP-FEM consists of the RM and the MP-CCM. Among them, the RM contains a 7 × 7 convolution and a ResNet residual block [17]. The detailed network structure is shown in Fig. 3. The RM carries out the initial feature extraction of the input SAR image.
In order to further extract, enhance, and compress features, we design the MP-CCM, whose network structure is shown in Fig. 4. MP-CCM takes the output feature map of the RM as input and outputs a feature map after three feature pyramid network blocks (FPN Blocks) [18]. In addition to feature extraction and enhancement, MP-CCM also compresses the feature map: except for the first FPN Block (see below), the number of channels is halved after each FPN Block, so that through the three FPN Blocks the 256-channel feature map is compressed into a 64-channel feature map. The processing of a feature map by an FPN Block consists of five parts: bottom-up feature extraction, top-down feature extraction, lateral connection, additive feature fusion, and channel attention emphasis. To begin with, the feature extraction capability of an FPN Block is mainly realized by its bottom-up operation. One block contains two bottom-up operations, each of which doubles the number of channels of the feature map while reducing its length and width by half.
Then, the top-down and lateral connection operations in the FPN Block fuse high-level semantic features with high-resolution spatial features to enhance the feature map. Among them, the topmost feature map is upsampled step by step and added to the laterally connected lower-level feature maps. Finally, the FPN Block aggregates and fuses the information at the various scales: the three feature maps at different scales are aligned to a common resolution and fused to form the block output.
It is worth noting that the first FPN Block of MP-CCM does not carry out channel compression, which ensures that the network does not compress the features until sufficient features have been extracted, thus avoiding inadequate feature extraction. The feature map $F$ output by the last FPN Block serves as the input to DTDM.
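As an illustration, the following is a minimal PyTorch sketch of one such FPN Block under our own assumptions: stride-2 convolutions for the bottom-up stages, nearest-neighbor upsampling in the top-down path, and a 3 × 3 output convolution for fusion; the channel attention step is omitted, and the paper's exact layer choices may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as fn

class FPNBlock(nn.Module):
    """Sketch of one MP-CCM FPN Block: two bottom-up stages, a top-down path
    with lateral connections, additive fusion, and optional channel halving."""

    def __init__(self, in_ch: int, compress: bool = True):
        super().__init__()
        out_ch = in_ch // 2 if compress else in_ch
        # Bottom-up: each stage doubles channels and halves H and W.
        self.down1 = nn.Conv2d(in_ch, in_ch * 2, 3, stride=2, padding=1)
        self.down2 = nn.Conv2d(in_ch * 2, in_ch * 4, 3, stride=2, padding=1)
        # Lateral 1x1 convolutions align channel counts for additive fusion.
        self.lat1 = nn.Conv2d(in_ch * 2, in_ch * 4, 1)
        self.lat0 = nn.Conv2d(in_ch, in_ch * 4, 1)
        self.out_conv = nn.Conv2d(in_ch * 4, out_ch, 3, padding=1)

    def forward(self, c0: torch.Tensor) -> torch.Tensor:
        c1 = fn.relu(self.down1(c0))
        c2 = fn.relu(self.down2(c1))
        # Top-down: upsample and add the laterally connected lower levels.
        p1 = self.lat1(c1) + fn.interpolate(c2, scale_factor=2, mode="nearest")
        p0 = self.lat0(c0) + fn.interpolate(p1, scale_factor=2, mode="nearest")
        return self.out_conv(p0)  # fused map at input resolution
```

Stacking three such blocks, with compress=False for the first, reproduces the 256 → 256 → 128 → 64 channel schedule described above.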
C. Convolution Channel Attention Module
In order to selectively extract features, emphasize useful features, and suppress interfering features when MP-CCM compresses channels, Conv-CAM is put forward; its structure is shown in Fig. 5.
First, Conv-CAM conducts mean-pooling and max-pooling operations over the spatial dimensions of the input feature map, producing two channel descriptors. Second, Conv-CAM replaces the MLP layer in the channel attention module of CBAM with two 1-D convolutions. This change not only reduces the number of parameters and the amount of computation, but also makes full use of the prior knowledge that adjacent channels carry related information, which is more conducive to mining important feature information. After the two convolution operations, Conv-CAM maps the two pooled descriptors to channel weights, sums them, and applies a sigmoid activation to obtain the channel weight map. Finally, Conv-CAM multiplies the weight map with the input feature map channel by channel, so that informative channels are emphasized and interfering channels are suppressed.
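The following is a minimal PyTorch sketch of this design, assuming a kernel size of 3 and convolution weights shared between the two pooling branches (details not fully specified here):

```python
import torch
import torch.nn as nn

class ConvCAM(nn.Module):
    """Channel attention with two 1-D convolutions in place of CBAM's shared MLP."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # Two stacked 1-D convolutions slide along the channel dimension,
        # exploiting the locality of adjacent channels with few parameters.
        self.conv = nn.Sequential(
            nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Pool to (B, C, 1, 1), then treat channels as a 1-D sequence: (B, 1, C).
        avg = self.conv(self.avg_pool(x).view(b, 1, c))
        mx = self.conv(self.max_pool(x).view(b, 1, c))
        w = self.sigmoid(avg + mx).view(b, c, 1, 1)  # channel weight map
        return x * w  # channel-wise reweighting of the input
```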
D. Detection Task Decoupling Module
SAR ships are characterized by small size, dense arrangement, and large aspect ratios, so the target detection network must effectively detect small targets of different sizes and distinguish the centers and sizes of densely arranged ships. However, the size header, center header, and class header in CenterNet all receive the same input feature map, and the DHM only performs a simple convolution with a kernel of size 3 before the output. As a result, the DHM suffers from serious task coupling, and the headers of different tasks strongly influence one another during parameter updates. This hinders the regression convergence of each task and reduces detection accuracy, especially for small targets. This article holds that although the predictions of these three tasks are all regression calculations, they differ greatly from one another because they belong to different types of regression tasks. Extracting task-specific features for each task can therefore effectively improve the detection accuracy of each task.
Therefore, we introduce the detection task decoupling module (DTDM). Before the feature map is input into the three detection headers, task-related features are further extracted and sent to the size header, center header, and class header, respectively, to achieve task decoupling.
As shown in the blue boxes in Fig. 2, the DTDM has three paths that modify, align, and optimize the input feature maps of the size header, center header, and class header according to their different prediction tasks. Fig. 6 shows the specific structure of the three paths: block A is the center path, block B the size path, block C the class path, and block D shows the concrete structure of the modules used in the first three. There are similarities and differences among the three detection tasks, and accordingly the three paths of DTDM share similarities and differences in structural design. In terms of common ground, all three paths use a multistream structure, residual connections, and a feature fusion mechanism. First, DTDM enhances the feature map from different angles and receptive fields through the multistream structure. Then, the streams are aggregated and fused through channel-dimension concatenation, pixelwise addition, and residual skip connections. Finally, a comprehensive and refined feature map is output. As for the differences, the three paths adopt corresponding decoupling designs according to the characteristics of their specific tasks.
Center path: For the task of predicting the offset of the target center point, the sizes, spans, offset sensitivities, and receptive fields of the targets to be detected differ, which means the convolution kernels used to extract features should also differ in size. Therefore, a multiscale convolution kernel scheme is adopted, as shown in block A of Fig. 6.
First of all, in order to reduce the amount of computation, the center path downsamples the channel dimension of the input feature map $F$ to obtain the lightweight feature map $F_{ctp}$ for the center path, as shown in the following:

$$F_{ctp} = f_{chn\_down}(F)\tag{10}$$

where $f_{chn\_down}(\cdot)$ is a channel downsampling function formed by a 2-D convolution with a kernel size of 1.

Then, the center path uses convolution kernels with sizes of 1, 3, and 5 in the multistream structure to shunt the features and obtains three shunt feature maps $F_{ctp\_s1}$, $F_{ctp\_s2}$, and $F_{ctp\_s3}$, which have small, medium, and large receptive fields, respectively, as expressed in the following:

$$\begin{cases}F_{ctp\_s1}=conv_{1\times 1}(F_{ctp})\\F_{ctp\_s2}=conv_{3\times 3}(F_{ctp})\\F_{ctp\_s3}=conv_{5\times 5}(F_{ctp})\end{cases}\tag{11}$$

where $conv_{k\times k}(\cdot)$ denotes the combination of a convolutional layer, a BN layer, and a ReLU activation layer, and its subscript $k$ denotes the size of the convolutional kernel.

Next, the shunt branches are aggregated through channel-dimension concatenation, and the multistream aggregation feature map $F_{ctp\_ms}$, which has a wide receptive field, is obtained as shown in the following:

$$F_{ctp\_ms}=f_{chn\_adjust}(concat(F_{ctp\_s1},F_{ctp\_s2},F_{ctp\_s3},F_{ctp}))\tag{12}$$

where $f_{chn\_adjust}(\cdot)$ is the channel adjustment function and $concat(\cdot)$ is the channel-dimension concatenation function.

Finally, the original lightweight feature map $F_{ctp}$ is enhanced by the attention mechanism and then aggregated and fused with the multistream aggregation feature map $F_{ctp\_ms}$ to output $F_{offset}$, as shown in the following:

$$F_{offset}=f_{fuse}(F_{ctp\_ms}+f_{am}(F_{ctp}))\tag{13}$$

where $f_{fuse}(\cdot)$ is the feature alignment fusion function composed of a 3 × 3 convolution, $f_{am}(\cdot)$ is the attention module composed of CBAM, and $F_{offset}$ is the output of the center path.

Size path: For the target size prediction task, the detector needs to obtain the target boundary information; the specific structure is shown in block B of Fig. 6. First, like the center path, the size path downsamples the channel dimension of the input feature map $F$ to obtain the lightweight feature map $F_{sp}$, as expressed in the following:

$$F_{sp}=f_{chn\_down}(F).\tag{14}$$

Second, deformable convolution [20] is used to obtain an enhanced feature map that can adapt to targets of different shapes. The concat function concatenates the enhanced feature map with $F_{sp}$, and a convolution layer aligns the fused features. Thus, the multistream aggregation feature map $F_{sp\_ms}$ of the size path is obtained, as shown in the following:

$$F_{sp\_ms}=conv_{3\times 3}(concat(dfconv_{3\times 3}(F_{sp}),F_{sp}))\tag{15}$$

where $dfconv_{k\times k}(\cdot)$ denotes the deformable convolution function, and its subscript $k$ denotes the size of the convolution kernel.

Finally, the lightweight feature map $F_{sp}$ is enhanced by the attention mechanism and aggregated and fused with $F_{sp\_ms}$, as shown in the following:

$$F_{wh}=f_{fuse}(F_{sp\_ms}+f_{am}(F_{sp}))\tag{16}$$

where $F_{wh}$ is the final output feature map of the size path.

Class path: For the target class prediction task, the detector needs to obtain the texture, contour, and other specific details of the target; the specific structure is shown in block C of Fig. 6. The reasoning process of the class path is similar to that of the center path.

First, the class path downsamples the channel dimension of the input feature map $F$ to obtain the lightweight feature map $F_{clsp}$, as expressed in the following:

$$F_{clsp}=f_{chn\_down}(F).\tag{17}$$

Second, dilated convolutions [21] with dilation rates of 0, 1, and 2 are used to build the multistream structure of the class path, which is conducive to obtaining high-frequency spatial structure information of targets, as shown in the following:

$$\begin{cases}F_{clsp\_s1}=dlconv_{0}(F_{clsp})\\F_{clsp\_s2}=dlconv_{1}(F_{clsp})\\F_{clsp\_s3}=dlconv_{2}(F_{clsp})\end{cases}\tag{18}$$

where $dlconv_{r}(\cdot)$ represents the dilated convolution function, and its subscript $r$ represents the dilation rate. Then, the multistream aggregation feature map $F_{clsp\_ms}$ is obtained by aggregating the shunt branches, as expressed in the following:

$$F_{clsp\_ms}=f_{chn\_adjust}(concat(F_{clsp\_s1},F_{clsp\_s2},F_{clsp\_s3},F_{clsp})).\tag{19}$$

Finally, the feature map $F_{clsp}$ emphasized by the attention mechanism is aggregated and fused with $F_{clsp\_ms}$, as shown in the following:

$$F_{cls}=f_{fuse}(F_{clsp\_ms}+f_{am}(F_{clsp}))\tag{20}$$

where $F_{cls}$ is the final output feature map of the class path.
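To illustrate the center path concretely, here is a minimal PyTorch sketch of (10)–(13). The channel widths, the 1 × 1 channel adjustment, and the reuse of the ConvCAM sketch from Section II-C in place of CBAM are our assumptions; the size and class paths follow the same pattern with deformable and dilated convolutions, respectively.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch: int, out_ch: int, k: int) -> nn.Sequential:
    """conv_{k×k}: convolution + BN + ReLU, as in (11)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class CenterPath(nn.Module):
    """Sketch of the DTDM center path, Eqs. (10)-(13)."""

    def __init__(self, in_ch: int = 64, mid_ch: int = 32):
        super().__init__()
        self.chn_down = nn.Conv2d(in_ch, mid_ch, 1)         # f_chn_down, Eq. (10)
        self.s1 = conv_bn_relu(mid_ch, mid_ch, 1)           # small receptive field
        self.s2 = conv_bn_relu(mid_ch, mid_ch, 3)           # medium receptive field
        self.s3 = conv_bn_relu(mid_ch, mid_ch, 5)           # large receptive field
        self.chn_adjust = nn.Conv2d(4 * mid_ch, mid_ch, 1)  # f_chn_adjust, Eq. (12)
        self.am = ConvCAM()                                 # stand-in for CBAM (f_am)
        self.fuse = conv_bn_relu(mid_ch, mid_ch, 3)         # f_fuse, Eq. (13)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f_ctp = self.chn_down(f)                            # Eq. (10)
        streams = [self.s1(f_ctp), self.s2(f_ctp), self.s3(f_ctp), f_ctp]
        f_ctp_ms = self.chn_adjust(torch.cat(streams, dim=1))  # Eqs. (11)-(12)
        return self.fuse(f_ctp_ms + self.am(f_ctp))         # F_offset, Eq. (13)
```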
SECTION III. Experiment Results and Analysis
A. Datasets and Evaluation Indicators
Considering the diversity and universality of datasets, the SSDD dataset [22] is selected to test the proposed algorithm. The characteristics of the SSDD dataset are as follows.
Abundant data sources. The images in the SSDD dataset come from RadarSat-2, TerraSAR-X, and Sentinel-1. It contains 1160 images and 2456 ships in total, that is, 2.12 ships per image on average. The specific distribution is shown in Fig. 7.

Large resolution span and large size span. Image resolution varies from 1 to 15 m, with the smallest object occupying 7 × 2 pixels and the largest 368 × 69 pixels.

Diverse target backgrounds and distributions, such as near shore, far shore, densely arranged multiship scenes, and sparsely distributed multiship scenes.
Wide versatility. At present, many relevant studies verify the validity of their proposed models on the SSDD dataset. Mao et al. [23] verified the effectiveness of advanced algorithms on the SSDD dataset, and our experiments build on their findings.
Referring to the work of Mao et al. [23], images with file names ending in 1 or 9 are used as the test set, while the remaining images are used as the training set, giving a test set of 232 images and a training set of 928 images.
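For reproducibility, the following is a small sketch of this filename-based split; the directory layout and the .jpg extension are our assumptions.

```python
from pathlib import Path

def split_ssdd(image_dir: str):
    """SSDD split used here: file names ending in 1 or 9 form the test set."""
    train, test = [], []
    for p in sorted(Path(image_dir).glob("*.jpg")):  # extension is an assumption
        (test if p.stem[-1] in "19" else train).append(p)
    return train, test

# Example usage: train_imgs, test_imgs = split_ssdd("SSDD/JPEGImages")
```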
This article adopts the same evaluation indicators as Mao et al.: the MS COCO evaluation metrics [24].
In order to verify the effectiveness of the proposed method in detecting multiscale targets, and in particular the ability of MP-FEM to extract multiscale features, four indicators from the MS COCO evaluation metrics are adopted, covering overall precision as well as precision on small, medium, and large targets.
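For reference, the COCO-style average precision is computed by averaging precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05, while the small, medium, and large variants restrict evaluation to targets with pixel areas below 32², between 32² and 96², and above 96², respectively.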
B. Setting
In this article, the PyTorch framework is adopted to implement the proposed algorithm. The version of torch is 1.10.1+cu111 and that of torchvision is 0.11.2+cu111. The program is trained and tested on a 64-bit Linux system and accelerated with a 24-GB GeForce RTX 3090 GPU. In the training process, the learning rate is decayed to a fixed minimum value.
C. Performance Evaluation
1) Comparison Experiments Between MPDNet and CenterNet
We conduct a comparison experiment between MPDNet and CenterNet to verify the effectiveness of the proposed algorithm. The experimental results are shown in Table I, where the bold entry in the Method column denotes the method in this article, and bold indicator values are the best results in the experiment.
As shown in Table I, MPDNet improves considerably on all indicators compared with CenterNet, including the most important one.
Fig. 8 shows the visualized detection results for nearshore and densely arranged targets. There are three rows (a), (b), and (c) from top to bottom, showing three representative images, and three columns from left to right, showing the ground truth and the detection results of CenterNet and MPDNet. The ground truth marks target locations with green boxes; CenterNet and MPDNet mark target locations with red boxes, with the target confidence (ranging from 0 to 1) shown at the upper left corner of each box. When CenterNet deals with nearshore and densely arranged targets, its results contain false alarms caused by the interference of suspected-target buildings on shore: the three ships on the left in Fig. 8(a) and (c) are detected as four ships by CenterNet, while MPDNet accurately detects three ships. In addition, when nearshore, densely arranged targets are small, CenterNet misses detections owing to the insufficient feature extraction capability of its backbone: in Fig. 8(b), CenterNet misses the middle two targets, while MPDNet accurately detects all five targets.
Fig. 9 shows the visualized detection results in multiscale target scenes. Fig. 9(a) contains a large number of small targets and (b) contains two large targets, which jointly verify MPDNet's detection of multiscale targets. Owing to its weak feature extraction ability and lack of a feature enhancement process, CenterNet struggles with multiscale scenes and produces false alarms: in the upper right corner of the target in Fig. 9(a) and in the upper left corner of the target in Fig. 9(b). In contrast, our proposed method handles these problems well. However, MPDNet also has defects: influenced by suspected-target buildings on shore, its results also contain a false alarm, as shown in the lower right corner of the MPDNet detection result in Fig. 9(b).
2) Comparison Experiments Between MPDNet and Other Single-Stage Methods
MPDNet is a single-stage target detection network, so we conduct comparison experiments between MPDNet and other representative single-stage methods. The results are shown in Table II. The comparison methods are FCOS [27], SSD [28], YOLOv3 [29], YOLOv7 [30], RetinaNet GA [31], Reppoints Moment [32], Fovea Align [33], Deformable_DETR [34], PVT [35], and PyCenterNet [36]. PyCenterNet, proposed by Duan et al., is an enhanced bottom-up CenterNet variant that detects each object as a triplet of keypoints, enabling it to locate objects with arbitrary geometries and perceive global information within objects.
As shown in Table II, in terms of small-target detection, although MPDNet improves greatly over CenterNet, both CenterNet and MPDNet are slightly less effective than the other algorithms; Lin et al.'s method, Fovea Align, obtains the best small-target results. Meanwhile, MPDNet achieves a detection accuracy of 95% on this dataset. In addition, for medium- and large-target detection, MPDNet achieves the best results.
3) Comparative Experiments Between MPDNet and Other Representative Single-Stage Methods on the SAR-Ship-Dataset
To further validate the model's generalization performance, we compare MPDNet against several representative single-stage methods on the large SAR ship detection dataset, SAR-Ship-Dataset. The SAR-Ship-Dataset, created by researchers from the Institute of Electronics, Chinese Academy of Sciences, is designed for deep learning-based SAR ship detection. It comprises 102 images from the Gaofen-3 (GF-3) satellite and 108 images from the Sentinel-1 satellite, all meticulously annotated. The dataset contains 39 729 ship chips, each of 256 × 256 pixels, with variations in scale and background, and is partitioned into training and test sets at a 4:1 ratio.
To verify the superiority of the proposed MPDNet, we compare it with several representative approaches, including the transformer-based Deformable_DETR, YOLOv7, the baseline CenterNet, and other enhanced techniques such as PyCenterNet. These methods are all anchor-free object detection approaches. The results are presented in Table III.
MPDNet achieves an outstanding result of 95.9% on this dataset.
D. Ablation Experiments
1) Verification Experiments of MP-FEM
In order to verify the effect of MP-FEM, two backbone networks, ResNet50 and ConvNeXt-S [37], are chosen for comparison. ResNet50 is the backbone adopted in the original CenterNet. ConvNeXt, put forward in 2022, draws on the structural design and training methods of transformer networks and makes a series of improvements on the basis of ResNet50. With a very small number of parameters and little computation, ConvNeXt achieves better results than transformers on the ImageNet-1K dataset. Here, we choose ConvNeXt-S, which has a number of parameters similar to ResNet50, for comparison.
The experimental results demonstrating the impact of MP-FEM on the final outcomes are presented in Table IV. These results clearly showcase the strong detection capability of MP-FEM on the SSDD dataset, with a notable improvement in the most critical metric.
The above experimental results show that MP-FEM has a very strong feature extraction ability. Specifically, MP-FEM can mine multiscale target information from images. Within MP-CCM, the extracted multiscale feature map is further refined and compressed by the multistage FPN Blocks, so that the final output feature map contains multiscale information. This effectively resolves the detection difficulties caused by ships with large aspect ratios and complex backgrounds in SAR images, and also addresses the missed detection of small targets and false alarms on large targets caused by single-scale feature extraction backbones. Therefore, the comprehensive detection performance of the network is greatly improved.
2) Verification Experiments of Conv-CAM
Experimental results for Conv-CAM are shown in Table V. The table reveals that integrating Conv-CAM into MP-FEM improves the final detection performance indicators, including a 0.1% gain on one of the reported metrics.
In conclusion, Conv-CAM enables MP-FEM to selectively screen and fuse multiscale features along the channel dimension and to extract features of various scales more efficiently and accurately, making the overall network performance more balanced and the detection results better.
3) Verification Experiments of DTDM
The experimental results for DTDM are shown in Table VI. Here, CenterNet is the baseline model, and CenterNet with DTDM refers to the task-decoupled CenterNet based on our proposed DTDM. Table VI demonstrates notable improvements in the performance metrics of CenterNet under the task decoupling effect of DTDM, including in the most crucial metric.
This obvious performance improvement shows that DTDM can better distinguish the different types of prediction tasks. Before the feature map is input to the corresponding detection header, it is modified and optimized according to the characteristics of the prediction task, so that a feature map matching each task is generated for each detection header, maximizing the header's effectiveness. The proposed DTDM can thus effectively detect small targets of different sizes and distinguish the centers and sizes of densely arranged ships.
SECTION IV. Discussion
We conducted research on target detection algorithms in SAR images, starting from the detection concept of “treating targets as points” in CenterNet, and designed the MPDNet detection network. MPDNet exhibits several advantages as follows.
MPDNet possesses robust feature extraction and enhancement capabilities. It can extract multiscale features from SAR images, effectively addressing the challenge of detecting targets with varying sizes in SAR images.
MPDNet can acquire more accurate target features. In addition, it considers the contextual information of target backgrounds, enabling it to handle cases where SAR targets near the coastline may be affected by suspected shore-based objects.
MPDNet offers higher resolution, allowing for precise detection of densely arranged targets. In terms of overall performance, MPDNet outperforms CenterNet and other mainstream single-stage target detection algorithms when applied to SAR images. However, MPDNet still has some limitations.
SECTION V. Conclusion
In this article, MPDNet is proposed. It effectively addresses the difficulty that CenterNet-based models have in achieving good results on images with low resolution and low signal-to-noise ratio and ships with large aspect ratios and dense arrangements. The proposed MPDNet mainly consists of MP-FEM, Conv-CAM, and DTDM. First, MP-FEM carries out feature extraction, enhancement, and compression of SAR images and extracts multilevel features, dealing with the low resolution, low signal-to-noise ratio, and large aspect ratios that characterize SAR images. Second, Conv-CAM is embedded into the channel compression process of MP-CCM, making the refinement and compression of the feature map more selective. Third, DTDM decouples the target size prediction, target center offset prediction, and target class prediction tasks, so that the network can effectively detect small targets of different sizes and distinguish the centers and sizes of densely arranged ships. Finally, all proposed methods are experimentally verified on the SSDD dataset and compared with other single-stage methods, with additional validation on the SAR-Ship-Dataset. The results show that the detection performance of MPDNet is significantly improved compared with CenterNet, and MPDNet also achieves the best results on several indicators compared with other mainstream algorithms.