Figure: Block diagram of underwater acoustic signal preprocessing.
Figure: Comparison of traditional convolution and depthwise separable convolution processing.
Underwater Acoustic Target Recognition in Passive Sonar Using Spectrogram and Modified MobileNet Network Classifier
Akbarian, Hassan; Sedaaghi, Mohammad Hossein (2023). Underwater Acoustic Target Recognition in Passive Sonar Using Spectrogram and Modified MobileNet Network Classifier. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.23915418.v2
In this article, to achieve reliable results with deep learning methods, we collected the raw acoustic signals received by the hydrophones, labeled by class, in the relevant database, and performed the necessary preprocessing to render them stationary before passing them to the spectrogram stage. Next, the short-time Fourier transform (STFT) is used to obtain the spectrogram of the high-resonance components, which serves as the input to the modified MobileNet classifier for model training and evaluation. Simulation results in Python indicate that the suggested technique can reach a classification accuracy of 97.37% with a validation loss below 3%.
Here is a summary of the key points from the paper:
- The paper proposes a method for underwater acoustic target recognition (UATR) using spectrograms and a modified MobileNet neural network classifier.
- Raw acoustic signals from underwater targets such as ships are collected by hydrophones. These signals are pre-processed by filtering, noise removal, resampling, etc., to make them stationary.
- Short-time Fourier transform (STFT) is applied to the processed signals to generate a spectrogram showing the frequency content over time. The spectrogram images are used as input to the classifier (a minimal preprocessing sketch follows this list).
- A modified MobileNet CNN architecture is designed using depthwise separable convolutions to reduce computational complexity while maintaining good accuracy. Layers at the end of the standard architecture are removed to cut computation further.
- The model is trained on spectrogram images from the ShipsEar dataset covering 5 classes: 4 ship categories plus background noise. 70% of the data is used for training, 20% for validation, and 10% for testing.
- The model achieves 97.37% test accuracy and a validation loss under 3% after training for 50 epochs, outperforming standard CNN, VGG, ResNet, and LeNet models.
- The use of spectrogram images and an efficient CNN architecture allows accurate ship classification from acoustic signals in a computationally efficient manner compared to other methods.
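A minimal sketch of this preprocessing-plus-STFT stage is shown below. This is not the authors' exact pipeline: the file name, window length, and overlap are illustrative assumptions; only the resampling factor and 4-second segmentation come from the paper.

```python
# Sketch of the preprocessing + STFT spectrogram stage (assumed parameters).
import numpy as np
from scipy.io import wavfile
from scipy.signal import resample_poly, stft
import matplotlib.pyplot as plt

fs_orig, x = wavfile.read("ship_recording.wav")  # hypothetical ShipsEar clip
x = x.astype(np.float32)
if x.ndim > 1:                                   # collapse to mono if needed
    x = x.mean(axis=1)

# Resample 52.734 kHz -> 26.367 kHz (factor of 2, as in the paper).
x = resample_poly(x, up=1, down=2)
fs = fs_orig // 2

# Split into 4-second segments and compute one STFT spectrogram per segment.
seg_len = 4 * fs
for i in range(len(x) // seg_len):
    seg = x[i * seg_len:(i + 1) * seg_len]
    f, t, Z = stft(seg, fs=fs, nperseg=1024, noverlap=512)  # assumed window
    spec_db = 20 * np.log10(np.abs(Z) + 1e-10)              # log magnitude

    # Save as an image for the CNN (resized to 224x224x3 downstream).
    plt.imsave(f"spectrogram_{i:04d}.png", spec_db,
               origin="lower", cmap="viridis")
```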
The paper does not specifically mention handling transient signals from the underwater targets. However, here are a few relevant points:
- The acoustic signals are split into small 4-second segments during preprocessing. This helps deal with any transients by localizing them into smaller time segments.
- STFT is used to generate the spectrograms rather than wavelet transforms. STFT provides a constant frequency resolution, while wavelets provide variable frequency resolution.
- STFT is suitable for quasi-stationary signals whose spectral content does not change abruptly over time. The ship acoustic signals likely match this profile reasonably well.
- Wavelet transforms are more suitable for analyzing fast transient signals where the frequency content changes rapidly. The paper does not indicate the presence of such transients in the ship data.
- Using small fixed windows with STFT helps limit the effect of any transients to local regions in time. Wavelets may provide better localization but were not explored in this work.
In summary, the paper does not directly address the handling of transients, but the preprocessing and the use of STFT likely accommodate them to some extent without the need for wavelets. Exploring wavelets to better analyze transients could be an area of future work. The window-length trade-off underlying this discussion is sketched below.
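The fixed time-frequency resolution of the STFT is set entirely by the window length: a longer window sharpens frequency resolution but smears transients in time. A quick check at the paper's resampled rate, using two illustrative window sizes:

```python
# Time/frequency resolution of an STFT is fixed by the window length.
fs = 26367  # Hz, the resampled rate used in the paper

for nperseg in (256, 4096):  # illustrative short vs long windows
    dt = nperseg / fs        # time resolution (window duration, seconds)
    df = fs / nperseg        # frequency bin spacing (Hz)
    print(f"window={nperseg:5d}: dt={dt*1e3:6.1f} ms, df={df:6.1f} Hz")
# window=  256: dt=   9.7 ms, df= 103.0 Hz
# window= 4096: dt= 155.3 ms, df=   6.4 Hz
```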
Here are some key details about the ShipsEar dataset used in this paper:
- It contains acoustic recordings of 90 samples from 11 different ship and boat types, as well as background noise recordings.
- The samples were recorded using a hydrophone with a sensitivity of −193.5 dB re 1 V/μPa and a flat frequency response from 1 Hz to 28 kHz.
- The hydrophone position was chosen to get high quality recordings of the target ship while minimizing interference from other ships.
- The 11 vessel types are grouped into 4 experimental classes based on size: Class 1 (fishing boats, tugboats, etc.), Class 2 (motorboats, pilot boats), Class 3 (passenger ferries), and Class 4 (ocean liners).
- There is a 5th class consisting of background noise recordings without any ships.
- The dataset was originally recorded at a sampling rate of 52.734 kHz; the paper resamples it to 26.367 kHz for faster processing.
- The dataset contains labeled recordings, so it can be used for supervised training of machine learning models.
- A total of 5671 spectrogram images of size 224×224×3 are extracted from the resampled dataset for training, validation, and testing of the models.
In summary, the ShipsEar dataset provides a collection of labeled real-world acoustic recordings of different ship types and background noise for developing ship classification systems. The paper preprocesses it and extracts spectrogram images for input to their deep learning model.
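A hedged sketch of the 70/20/10 split over the extracted spectrogram images follows. The `.npy` file names are hypothetical stand-ins; any step that loads the 5671 images and their 5-way labels into arrays would do.

```python
# Sketch of the 70/20/10 train/validation/test split (assumed file names).
import numpy as np
from sklearn.model_selection import train_test_split

images = np.load("spectrograms.npy")  # hypothetical, shape (5671, 224, 224, 3)
labels = np.load("labels.npy")        # hypothetical, shape (5671,)

# Carve off 70% for training, then split the rest 2:1 (20%/10% overall).
X_train, X_rest, y_train, y_rest = train_test_split(
    images, labels, train_size=0.7, stratify=labels, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=2 / 3, stratify=y_rest, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # roughly 3969 / 1134 / 568
```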
The paper does not provide detailed quantitative metrics such as precision or recall for the standard CNN, VGG, ResNet, and LeNet models.
However, it does compare the classification accuracy of these models when trained on the ShipsEar spectrogram images:
- LeNet model achieves 70% classification accuracy.
- VGG model achieves 78% accuracy.
- Standard CNN model achieves 87% accuracy.
- ResNet is not explicitly evaluated but its accuracy is mentioned to be lower than the proposed MobileNet model.
- The proposed MobileNet model achieves the best accuracy of 97.37% on the test set.
In addition, the MobileNet model has much lower computational complexity than VGG, ResNet and standard CNN models, as it uses only 2.2 million parameters versus 20-36 million for the other models.
In summary, the standard deep learning models show significantly lower accuracy than the proposed MobileNet model for ship classification on the ShipsEar spectrograms. The paper emphasizes achieving a good trade-off between accuracy and computational complexity; a sketch of the depthwise separable convolution behind that efficiency follows.
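To make the parameter savings concrete, here is a hedged Keras sketch contrasting a standard 3×3 convolution with its depthwise separable counterpart, followed by the general shape of a MobileNet-based 5-class head. The layer shapes and head are illustrative assumptions, not the authors' exact modified architecture.

```python
# Standard convolution vs the depthwise separable block used by MobileNet.
import tensorflow as tf
from tensorflow.keras import layers

standard = tf.keras.Sequential([
    layers.Conv2D(64, 3, padding="same", input_shape=(224, 224, 3)),
])
separable = tf.keras.Sequential([
    # Depthwise 3x3 filter per channel, then 1x1 pointwise channel mixing:
    layers.SeparableConv2D(64, 3, padding="same", input_shape=(224, 224, 3)),
])

print(standard.count_params())   # 1792 = 3*3*3*64 + 64
print(separable.count_params())  # 283  = 3*3*3 + 3*64 + 64

# General shape of a MobileNet-based 5-class classifier (an assumption of
# the overall structure, not the paper's exact modification):
base = tf.keras.applications.MobileNet(
    input_shape=(224, 224, 3), include_top=False, weights=None)
model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),
])
```

The per-layer arithmetic in the comments shows where the roughly order-of-magnitude reduction in parameters comes from: the depthwise stage filters each input channel separately, and the cheap 1×1 pointwise stage does the channel mixing that a full convolution would otherwise fold into every kernel.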