[2603.28141] Intelligent Road Condition Monitoring using 3D In-Air SONAR Sensing
A Comprehensive Technical Review and Contextual Analysis
University of Antwerp, CoSys-Lab, and the Broader Pavement Management Research Community
Researchers at the University of Antwerp's CoSys-Lab have demonstrated that 3D in-air SONAR sensing, specifically using the eRTIS (embedded Real-Time Imaging Sonar) 32-channel MEMS microphone array, constitutes a viable low-cost, all-weather sensor modality for opportunistic road surface monitoring when mounted on fleet vehicles. In a rigorous 10-fold cross-validated study, road material classification (asphalt, concrete, element paving) was achieved with F1 scores approaching 90% on held-out test data, while road damage detection (alligator cracking, transversal cracking, fraying, subsidence, and six additional damage types) yielded F1 scores approaching 75%. The sensing approach is uniquely immune to the optical degradation that defeats cameras and LiDAR in fog, rain, smoke, and dust—the exact conditions under which surface damage most urgently demands detection. Gradient-boosted tree classifiers consistently outperformed all other models including CNN-based architectures, and the work is directly embedded within the imec.ICON HAIROAD project, a 2023–2025 Flemish government and imec co-funded initiative to automate predictive road maintenance across one of Europe's densest municipal road networks. These results position in-air SONAR as a worthwhile complementary modality in opportunistic sensing-based pavement management systems (PMS), with further research required on beamforming algorithms, 3D energyscape representations, and geographically stratified dataset splitting.
Introduction
Pavement management systems (PMS) serve as the operational backbone through which municipal and regional authorities plan, prioritize, and execute road maintenance. The intelligence of a PMS depends entirely on the quality and frequency of road condition data fed into it. Traditionally, this data was gathered by trained workers conducting visual field inspections—a process that is labor-intensive, subjective, and provides only sparse coverage at infrequent intervals.[1]
Dedicated mapping vehicles equipped with camera arrays, LiDAR sensors, and ground-penetrating radar have partially automated and improved this process, but introduce a different set of constraints. In municipalities with large road networks, individual roads may be revisited at very low frequency since such vehicles require expensive acquisition, operation, and maintenance budgets—often prohibitively so for smaller municipalities with limited tax bases.[1,2]
The concept of opportunistic sensing has emerged as a compelling alternative paradigm. In this model, sensors are mounted on vehicles that already travel the road network for other purposes—mail delivery trucks, garbage collection vehicles, utility vans. Because these fleets inherently cover a high proportion of urban and suburban road networks at frequent intervals, they provide natural mobile sensing platforms at a fraction of the cost of dedicated survey vehicles.[2] The critical constraint introduced by this model is cost and robustness: sensors must be inexpensive enough for deployment across entire fleets, and must continue to function reliably in the same adverse conditions—heavy rain, fog, road dust, industrial smoke—during which road surfaces are most at risk.
It is precisely within this operational context that the CoSys-Lab research group at the University of Antwerp has investigated 3D in-air SONAR as a sensing modality. Their work, conducted under the imec.ICON Hybrid AI for Predictive Road Maintenance (HAIROAD) project, funded jointly by imec and the Flemish Agency for Innovation and Entrepreneurship (VLAIO, project no. HBC.2023.0170), presents a fully annotated dataset and systematic multi-model comparison for both road surface material classification and road damage type detection tasks.[Primary Paper; 32] The HAIROAD project itself runs from October 2023 through September 2025 and seeks to automate acquisition and interpretation of road condition indicators across one of the world's densest municipal road networks—Flanders, Belgium, where up to 30% of municipal budgets are devoted to mobility and road infrastructure, yet less than 5% of municipalities have formally adopted structured road condition assessment practices.[32]
Related Work and Sensing Modality Landscape
Camera-Based Road Damage Detection
Visual camera sensing remains the most mature and widely deployed technology for automated pavement condition assessment. Trinh, Anwar, and Mercelis applied road area extraction and contrastive learning to RGB images for road surface condition classification using a segmentation model to isolate road from surrounding pixels, improving classification across all damage categories.[3] Shim et al. employed a lightweight hierarchical auto-encoder network to detect the presence of surface damages from camera images, though their approach did not address damage type identification or overall surface quality assessment.[4]
The National Academies of Sciences, Engineering, and Medicine published a comprehensive 2024 synthesis report on AI Applications for Automatic Pavement Condition Evaluation, finding that 3D laser scanning data has become the predominant approach for automated collection due to its ability to gather depth information and its relative robustness to lighting conditions compared to 2D imaging.[3-NAS] Oklahoma State University's PaveVision3D system exemplifies this trajectory, enabling rapid collection of millimeter-level 3D pavement data across entire lane widths. Despite this maturity, all camera and LiDAR modalities share a fundamental vulnerability: optical degradation in adverse weather.
Acoustic and Vibration-Based Sensing
Li et al. demonstrated that tire noise alone can be used to classify pavement as being in normal condition, exhibiting crack damage, or showing pothole damage, achieving 88.4% overall accuracy using a random forest model combined with gradient boosting—at essentially zero additional hardware cost given that microphones are increasingly standard on commercial vehicles.[5] Dong and Li combined smartphone accelerometers with GPS data to identify surface distortions, patching, potholes, and rutting on moving vehicles using k-means clustering, reaching 84% average accuracy.[6]
A parallel line of research from the smart cities domain has directly explored acoustic-ultrasonic road sensing. In work published in Computational Urban Science, a wheel-rim-mounted module integrating a microphone with an ultrasonic depth sensor was coupled to Multi-Layer Perceptron, SVM, and Random Forest classifiers to distinguish smooth, slippery, grassy, and rough road conditions, demonstrating the feasibility of microphone-based road surface discrimination in smart city deployment scenarios.[19]
Kim et al. conducted a direct predecessor study using conventional (single-channel) ultrasonic sensors mounted at the front of vehicles, coupling short-time Fourier transform features to deep neural networks to identify road surface type across eight surface categories, achieving accuracies exceeding 95%.[9] This result is particularly informative as a benchmark: the 3D SONAR approach investigated by CoSys-Lab may be understood as a substantial multi-channel generalization of this prior ultrasonic work, potentially unlocking spatial resolution and directional information not available to single-element sensors.
Radar and Combined Modalities
Sattar, Li, and Chapman evaluated both 24 GHz RADAR and 40 kHz SONAR for classifying five road surface types, achieving up to 80% accuracy with individual modalities and 92% combined.[10] A 2025 study published in Sensors MDPI specifically examined automotive radar range spectra for road surface classification using Random Forest classifiers, achieving 84.5% generalization error under dry conditions and 88.7% distinguishing wet from dry asphalt, while also noting that no radar sensor is currently available commercially specifically for road classification purposes.[27] The 2025 IET Radar, Sonar and Navigation review by Bystrov provides a comprehensive survey of automotive microwave sensor approaches to this classification problem, noting the complementary characteristics of active radar and passive radiometry.[24]
In automotive ultrasonic object sensing—a related but distinct domain—a 2025 Journal of the Acoustical Society of America study by Eisele et al. demonstrated that replacing conventional single-element parking sensors with compact 2×2 MEMS transducer arrays and applying CNN classifiers substantially improved obstacle classification in automotive surround sensing, underscoring the broader applicability of small-aperture array ultrasonic sensing with machine learning.[14]
Drone- and UAV-Based Monitoring
A systematic literature review published through ASCE in 2025, surveying 60 articles from a pool of 619 publications between 2014 and 2022 on drone-based road condition monitoring (D-RCM), found consistent evidence of cost and time savings, safety improvements, and improved spatial coverage compared to ground-based methods.[8-ASCE] LiDAR-equipped drones have been operationally deployed by Alaska DOT, Caltrans, New York Thruway Authority, and Utah DOT for bridge and elevated corridor inspection with BVLOS (Beyond Visual Line of Sight) flights, reducing lane closures and enhancing inspector safety.[6-ALG]
While UAV-LiDAR represents a powerful high-resolution approach for targeted survey campaigns, it shares the limitation of all optically dependent modalities: performance degrades in fog, heavy precipitation, and smoke. It is also not well suited to the continuous, fleet-scale opportunistic data collection that is the principal motivation for the SONAR-based approach under review.
Sensor Technology and System Architecture
The eRTIS Platform
The eRTIS (Embedded Real-Time Imaging Sonar) is a fully embedded 32-channel 3D in-air SONAR sensor developed at the CoSys-Lab, University of Antwerp.[11,12,13,25,26] Inspired by bat echolocation, the system emits a hyperbolic broadband chirp spanning 20 kHz to 50 kHz, generated at 450 kHz sample rate with a 2.5 ms pulse duration, and records the resulting wavefield reflections across a 32-element MEMS microphone array at 4.5 MHz, producing 163,840 samples per capture. A dedicated on-board compute unit handles real-time signal processing, including PDM (Pulse Density Modulation) decoding, matched filtering, delay-and-sum beamforming across 91 directions spanning the full ±90° frontal hemisphere in both azimuth and elevation, envelope detection, and an artifact suppression stage analogous to a constant false-alarm rate (CFAR) detector. The processed output is a matrix of acoustic backscatter intensity as a function of angle and range, termed an energyscape.[Primary Paper]
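The emit-and-matched-filter front end of such a pipeline can be illustrated with a minimal single-channel sketch. The chirp band (20–50 kHz), emission sample rate (450 kHz), and 2.5 ms pulse duration follow the text; the echo delay, noise level, and reduction to one channel are illustrative assumptions, not the eRTIS implementation.

```python
import numpy as np
from scipy.signal import chirp, correlate

FS = 450_000          # emission sample rate (Hz), per the paper
DUR = 2.5e-3          # pulse duration (s)
t = np.arange(0, DUR, 1 / FS)

# Hyperbolic broadband chirp spanning 20 kHz to 50 kHz
pulse = chirp(t, f0=20_000, f1=50_000, t1=DUR, method='hyperbolic')

# Simulate a single echo delayed by 2 ms (hypothetical target range)
delay_s = 2e-3
n_delay = int(round(delay_s * FS))
rx = np.zeros(8192)
rx[n_delay:n_delay + len(pulse)] += 0.3 * pulse
rx += 0.01 * np.random.default_rng(0).standard_normal(len(rx))

# Matched filtering: cross-correlate the received signal with the pulse.
# The correlation peak marks the round-trip delay of the echo.
mf = correlate(rx, pulse, mode='valid')
est_delay = np.argmax(np.abs(mf)) / FS
print(f"estimated echo delay: {est_delay * 1e3:.2f} ms")  # ≈ 2.00 ms
```

In the real sensor this step runs per microphone channel before delay-and-sum beamforming combines the 32 matched-filtered signals into the 91-direction energyscape.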
The sensor architecture is modular: a high-performance microcontroller manages excitation signal generation and data acquisition, while an NVIDIA Jetson compute module handles GPU-accelerated beamforming and post-processing. An anodized aluminum enclosure with passive cooling and IP-rated connectors provides environmental protection suitable for vehicle mounting in harsh outdoor conditions.[26] The sensor is mounted on the rear of the vehicle, oriented downward and toward the road, with elevation angles sweeping from left to right from the driver's perspective and azimuth sweeping up and down—a geometry that captures the road surface immediately behind the vehicle.
Signal Processing Pipeline
Two parallel preprocessing pipelines are employed depending on the downstream classifier. For conventional machine learning models (logistic regression, SVM, random forest, gradient-boosted trees, multi-layer perceptron), a vector preprocessing pipeline applies max-pooling along the range dimension to the energyscape, flattens the resulting matrix, and applies whitened PCA reducing dimensionality to 256 principal components. For the CNN-based classifier, an image preprocessing pipeline introduces time-shift data augmentation (±45 samples at 450 kHz, corresponding to ±100 μs delays) to improve range-invariance, followed by per-energyscape normalization and random horizontal/vertical flips. Mean-subtraction across training samples is applied in both pipelines to emphasize inter-sample variation over absolute backscatter level.[Primary Paper]
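The vector pipeline for the conventional classifiers can be sketched as follows. The 91 directions, mean-subtraction, and whitened 256-component PCA follow the text; the batch size, range-bin count, and pooling factor are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical batch of energyscapes: (samples, directions, range_bins).
# 91 beamformed directions as in the paper; 512 range bins is illustrative.
X = rng.standard_normal((400, 91, 512)).astype(np.float32)

POOL = 8  # max-pooling factor along the range dimension (assumed value)
n_bins = X.shape[2] // POOL
pooled = X[:, :, :n_bins * POOL].reshape(400, 91, n_bins, POOL).max(axis=-1)

flat = pooled.reshape(len(pooled), -1)        # flatten direction x range
flat -= flat.mean(axis=0, keepdims=True)      # mean-subtraction across samples

# Whitened PCA to 256 components, as described for the vector pipeline
pca = PCA(n_components=256, whiten=True, random_state=0)
feats = pca.fit_transform(flat)
print(feats.shape)  # (400, 256)
```

Whitening rescales each principal component to unit variance, so no single high-energy component dominates the distance metrics used by SVM and k-nearest-neighbor-style decision boundaries.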
Dataset Construction and Experimental Design
The dataset was collected using vehicle-mounted eRTIS sensors operating at approximately 10 Hz sample rate, simultaneously recording camera images used exclusively for labeling. Temporal synchronization was enforced with a 150 ms maximum allowed offset; samples exceeding this threshold were discarded. Labels encode both material type (asphalt, concrete, element) and up to nine damage categories (transversal crack, alligator crack, missing material, longitudinal crack, fraying, open longitudinal joint, subsidence, open transversal joint, loose stones), with multi-label capability to handle transitions between surface types and co-occurring damage types within a single acquisition.
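The 150 ms synchronization rule amounts to nearest-frame matching with a rejection threshold, which a short sketch makes concrete. The function name and timestamp values are hypothetical; only the 150 ms window comes from the text.

```python
import numpy as np

MAX_OFFSET = 0.150  # seconds, per the paper's synchronization window


def pair_captures(sonar_ts, camera_ts, max_offset=MAX_OFFSET):
    """Pair each sonar capture with its nearest camera frame; discard
    captures whose nearest-frame offset exceeds max_offset."""
    camera_ts = np.asarray(camera_ts)
    pairs = []
    for i, t in enumerate(sonar_ts):
        j = int(np.argmin(np.abs(camera_ts - t)))
        if abs(camera_ts[j] - t) <= max_offset:
            pairs.append((i, j))
    return pairs


sonar = [0.00, 0.10, 0.20, 0.90]      # ~10 Hz sonar captures (s)
camera = [0.02, 0.12, 0.24, 0.55]     # camera frames (s)
print(pair_captures(sonar, camera))   # last sonar capture is dropped
```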
Class filtering eliminated any damage category with fewer than 100 samples to reduce dataset imbalance. The full dataset was split into 10% held-out test set and 90% training/validation partitioned into 10 stratified folds with 90%/10% split per fold. The resulting test set contains 513 samples versus approximately 4,386 training samples per fold—a size asymmetry noted by the authors as a significant factor in the test-to-validation performance gap observed across models. Stratification was conducted on a synthetic integer derived from the one-hot encoded multi-label combination, thereby approximately preserving class distributions across splits.[Primary Paper]
A critical methodological limitation acknowledged by the authors is that dataset splitting does not respect the geographic origin of samples. Data points from physically proximate locations on the same road segment may be distributed across the training, validation, and test sets, introducing potential data leakage and weakening any assessment of generalization to novel road environments. The authors recommend road-segment-based clustering as a prerequisite for future dataset construction.
Dataset at a glance:
- ~5,386 total samples; 513 held-out test, ~4,386 training per fold
- Labels: 3 material types × 9 damage categories (multi-label)
- 10-fold stratified cross-validation; each model trained twice with different random seeds
- Annotation source: synchronized camera images (not used for inference)
- Sampling rate: ~10 Hz; synchronization window: ±150 ms
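The stratification trick described above, encoding each one-hot multi-label combination as a single integer so that standard stratified splitting can be applied, can be sketched with scikit-learn. The label count, probabilities, and feature dimensions below are illustrative assumptions; the 10% test split and 10 stratified folds follow the text.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(1)
# Hypothetical multi-hot label matrix: (samples, label_count)
Y = (rng.random((1000, 3)) < 0.5).astype(int)

# Encode each multi-label combination as one integer (binary digits), so
# standard stratified splitting approximately preserves combinations.
strat_key = Y.dot(1 << np.arange(Y.shape[1]))

X = rng.standard_normal((1000, 16))
X_trainval, X_test, Y_trainval, Y_test, key_trainval, _ = train_test_split(
    X, Y, strat_key, test_size=0.10, stratify=strat_key, random_state=0)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
folds = list(skf.split(X_trainval, key_trainval))
print(len(folds), len(X_test))  # 10 folds, 100-sample held-out test set
```

Note that this preserves the distribution of label *combinations*, not geographic provenance, which is exactly why the authors flag road-segment-level splitting as future work.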
Results: Road Surface Material Classification
Across logistic regression, decision tree, random forest, support vector machine (SVM), multi-layer perceptron (MLP), and gradient-boosted trees (GBT), all non-linear models substantially outperformed the linear baseline, confirming that the relationship between PCA components of acoustic energyscapes and road surface material type is inherently non-linear. The GBT model achieved the best test F1 score of 89.55% ± 0.60% and test κ of 78.10% ± 1.40%, consistent with pre-existing ultrasonic sensing literature showing above-90% accuracy for road type identification.[9, Primary Paper] Table I summarizes results.
| Model | Test κ (%) | Val κ (%) | Test F1 (%) | Val F1 (%) |
|---|---|---|---|---|
| Logistic Regression | 40.34 ± 1.22 | 48.00 ± 3.18 | 71.77 ± 0.90 | 75.73 ± 1.73 |
| Decision Tree | 76.82 ± 1.90 | 92.03 ± 0.48 | 89.03 ± 0.63 | 95.04 ± 0.47 |
| Random Forest | 76.84 ± 0.69 | 93.25 ± 1.46 | 89.44 ± 0.26 | 96.64 ± 0.54 |
| SVM | 74.67 ± 1.66 | 89.97 ± 2.59 | 88.03 ± 2.02 | 95.02 ± 1.73 |
| MLP | 77.52 ± 2.25 | 91.95 ± 2.21 | 89.12 ± 1.18 | 95.95 ± 1.08 |
| Gradient-Boosted Trees | 78.10 ± 1.40 | 93.58 ± 1.83 | 89.55 ± 0.60 | 96.79 ± 1.01 |
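The evaluation protocol for these tables, fitting a gradient-boosted model on the reduced features and scoring held-out predictions with F1 and Cohen's kappa, can be sketched on synthetic stand-in data. The classifier, dataset, and macro averaging here are illustrative assumptions; only the metric pair and the three-class material task mirror the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import cohen_kappa_score, f1_score
from sklearn.model_selection import train_test_split

# Stand-in for PCA-reduced energyscape features: 3 classes ~ 3 materials
X, y = make_classification(n_samples=2000, n_features=64, n_informative=20,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# The paper reports F1 and Cohen's kappa; macro-averaging is assumed here
print(f"F1 (macro):    {f1_score(y_te, pred, average='macro'):.3f}")
print(f"Cohen's kappa: {cohen_kappa_score(y_te, pred):.3f}")
```

Cohen's kappa is the more conservative of the two metrics because it discounts chance agreement, which is why the κ columns sit well below the F1 columns in both tables.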
Results: Road Damage Detection
The damage detection task proved substantially more challenging. GBT again led all models with test F1 of 71.74% ± 1.15% and test κ of 58.00% ± 2.38%. Notably, the CNN (ResNet-variant) model performed significantly worse than all other classifiers, with test F1 of only 28.68% ± 8.03%—a result attributed to a geometric discontinuity in the energyscape image representation arising from the interleaved azimuth/elevation indexing, which violates the spatial locality assumption underlying convolutional operations. Table II presents results.
| Model | Test κ (%) | Val κ (%) | Test F1 (%) | Val F1 (%) |
|---|---|---|---|---|
| Logistic Regression | 25.88 ± 1.03 | 30.14 ± 2.35 | 49.16 ± 0.58 | 52.69 ± 1.17 |
| Decision Tree | 54.98 ± 1.51 | 65.23 ± 2.36 | 68.42 ± 0.78 | 75.76 ± 1.60 |
| Random Forest | 57.87 ± 1.46 | 65.96 ± 2.03 | 70.69 ± 0.69 | 76.40 ± 1.39 |
| SVM | 52.64 ± 1.25 | 59.90 ± 1.63 | 68.33 ± 1.01 | 73.15 ± 1.40 |
| MLP | 54.15 ± 1.82 | 63.54 ± 2.15 | 69.25 ± 0.95 | 75.83 ± 1.25 |
| Gradient-Boosted Trees | 58.00 ± 2.38 | 66.66 ± 2.66 | 71.74 ± 1.15 | 77.75 ± 1.54 |
| ResNet (CNN) | 2.57 ± 1.67 | 5.50 ± 2.18 | 28.68 ± 8.03 | 31.59 ± 8.35 |
Discussion
The fundamental value proposition of in-air SONAR for opportunistic road sensing lies not in raw classification accuracy relative to best-in-class optical systems under favorable conditions, but in its resilience under adverse conditions where optical systems fail. Fog, heavy rain, snow, smoke, and road dust—precisely the environmental conditions correlated with deteriorating road surfaces and increasing damage risk—cause minimal degradation to acoustic sensing. This characteristic directly enables the core operational requirement of an opportunistic sensing system: reliable, continuous data collection regardless of weather. Contemporary AI road monitoring reviews consistently identify optical degradation under adverse weather as a primary limitation constraining camera and LiDAR-based continuous monitoring.[6-ALG]
The ~90% F1 material classification result is broadly consistent with the Kim et al. (2021) result of over 95% accuracy using conventional ultrasonic sensing across eight road surface types,[9] and with the 92% combined RADAR+SONAR accuracy reported by Sattar et al.[10] These comparisons validate the approach and suggest that the 3D beamforming dimension of eRTIS—while adding spatial resolution and enabling concurrent obstacle detection—does not degrade acoustic surface characterization performance relative to simpler transducer configurations.
The ~72% damage detection F1 is more difficult to contextualize because direct analogues are scarce in the literature. Camera-based damage detection typically reports higher accuracy on well-curated datasets but is sensitive to lighting, contrast, and viewing angle. The acoustic approach faces a different challenge: many surface damage types (fraying, loose stones, open joints) produce subtle backscatter differences that are at or below the detection threshold of a 10 Hz, vehicular-speed acquisition regime. The label distribution imbalance—several damage types appearing fewer than 100 times in the dataset—further constrains achievable performance on rarer classes.
The CNN (ResNet-variant) failure in damage detection is theoretically well-understood and practically important. The eRTIS energyscape is formed by concatenating rows from a 91-direction beamforming grid, but azimuth and elevation are collapsed into a single matrix dimension. Whenever the azimuth index wraps, a spatial discontinuity occurs in the 2D image representation. Convolutional kernels, operating under the assumption that adjacent pixels are spatially proximate, are therefore processing features across a geometric boundary—producing corrupted learned representations. Two remediation strategies are proposed: (1) reorganizing energyscapes into proper 3D tensors and applying 3D convolutions, at substantial computational cost; or (2) restricting beamforming to a 2D azimuthal slice, sacrificing some dimensional information but enabling valid 2D convolution. Neither approach was evaluated in the study, representing a clear priority for follow-on work.
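Remediation strategy (1) is essentially an indexing fix, as a short sketch shows. The factoring of the paper's 91 directions into a 7 × 13 azimuth-elevation grid is a hypothetical choice for illustration; the actual eRTIS grid layout may differ.

```python
import numpy as np

N_AZ, N_EL, N_RANGE = 7, 13, 256   # hypothetical grid; 7 * 13 = 91 directions

# Flattened energyscape as fed to the 2D CNN: directions x range.
# Adjacent rows are NOT always spatially adjacent: every N_EL rows the
# elevation index wraps, creating the geometric discontinuity that
# breaks the locality assumption of 2D convolutions.
flat = np.random.default_rng(0).random((N_AZ * N_EL, N_RANGE)).astype(np.float32)

# Remediation (1): restore a true (azimuth, elevation, range) tensor so a
# 3D CNN sees spatially neighboring directions as neighboring entries.
cube = flat.reshape(N_AZ, N_EL, N_RANGE)

# Row k of the flat image maps to direction (k // N_EL, k % N_EL):
k = 40
assert np.array_equal(flat[k], cube[k // N_EL, k % N_EL])
print(cube.shape)  # (7, 13, 256)
```

The reshape itself is free; the cost the authors flag comes from replacing 2D convolutions with 3D ones over the resulting volume.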
The current implementation employs a linear delay-and-sum (DAS) beamformer—the simplest member of the beamforming family. More advanced algorithms—Minimum Variance Distortionless Response (MVDR) beamforming and Delay-Multiply-and-Sum (DMAS)—are known to produce substantially cleaner spatial spectra, reduced sidelobe contamination, and improved target contrast, at higher on-board computational cost.[24,25] Given the computational constraints of the eRTIS on-board processing unit, the authors identify advanced beamforming as a promising direction conditional on continued hardware capability improvements. The recent development of the HiRIS (High-Resolution Imaging Sonar) platform at CoSys-Lab—featuring a 1024-channel 32×32 uniform rectangular microphone array with MVDR forward-backward spatial smoothing—represents the architectural endpoint of this trajectory, offering 70 dB main lobe-to-sidelobe ratio in single-source scenarios.[25]
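For context, the MVDR weights mentioned above follow the closed form w = R⁻¹a / (aᴴR⁻¹a), where R is the snapshot covariance and a the steering vector. A minimal narrowband sketch, assuming a uniform linear array, a single simulated source at 20°, and arbitrary spacing/frequency values (none of which reflect the eRTIS or HiRIS geometry):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 32                             # channels, matching the eRTIS array size
d, c, f = 0.005, 343.0, 40_000.0   # spacing (m), sound speed, frequency: assumed


def steering(theta):
    """Narrowband steering vector of a uniform linear array (illustrative)."""
    n = np.arange(M)
    return np.exp(-2j * np.pi * f * d * n * np.sin(theta) / c)


# Snapshot covariance from simulated noisy snapshots, one source at 20 degrees
a_src = steering(np.deg2rad(20))
snaps = (a_src[:, None] * rng.standard_normal(200)
         + 0.1 * (rng.standard_normal((M, 200))
                  + 1j * rng.standard_normal((M, 200))))
R = snaps @ snaps.conj().T / 200 + 1e-3 * np.eye(M)  # diagonal loading


def mvdr_power(theta):
    a = steering(theta)
    w = np.linalg.solve(R, a)
    w /= a.conj() @ w              # w = R^-1 a / (a^H R^-1 a)
    return float(np.real(w.conj() @ R @ w))


# The MVDR spatial spectrum peaks sharply at the true source direction
angles = np.deg2rad(np.linspace(-90, 90, 181))
spec = [mvdr_power(t) for t in angles]
peak_deg = float(np.rad2deg(angles[int(np.argmax(spec))]))
print(peak_deg)  # ≈ 20
```

Unlike delay-and-sum, the weights depend on the data through R, which is what buys the sidelobe suppression, and what costs the extra on-board computation (a covariance estimate and a linear solve per look direction).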
Three structural dataset limitations warrant emphasis. First, annotation is based on camera images rather than certified ground-truth pavement survey data, introducing potential labeling bias toward visually obvious damage modes and under-representation of subsurface or low-contrast defects. Second, the absence of geographic stratification in dataset splitting means that physically correlated samples (from the same road segment) may leak across splits, potentially inflating reported performance. Third, the test set at 513 samples is small enough that a single misclassification in a rare class (which may appear only once or twice) can substantially shift the test κ for that class, creating high variance in the test performance estimate. Road-segment-level splitting and larger held-out geographic areas are essential prerequisites for credible generalizability claims.
The HAIROAD project is funded by imec and VLAIO (Flanders Innovation and Entrepreneurship) with a two-year execution window (October 2023–September 2025) and targets a Flemish road network context in which fewer than 5% of municipalities have formally adopted Belgian Road Research Centre (BRRC) standard road condition surveys, largely due to cost.[32] The sensor fusion and hybrid AI framework being developed through HAIROAD aims to automate BRRC indicator acquisition and supplement it with new indicators—including water management and debris accumulation—while providing forecasting capability sufficient to support cost-benefit-weighted maintenance recommendations.[32]
The broader European context features comparable initiatives. A 2025 AI blueprint review covering UK, US, and European highway agencies documented widespread adoption of AI vision systems on routine maintenance vehicles—exemplified by Hertfordshire County Council's 2024–2025 trials of the Robotiz3d ARRES Eye system, combining high-resolution cameras and LiDAR for real-time pothole, cracking, and rutting detection during routine van patrols.[39] In the United States, a 2024 National Academies report on AI applications for automatic pavement condition evaluation documented the accelerating transition from manual inspection to automated systems using laser profiling, accelerometer-based roughness measurement, and computer vision.[3-NAS] California's 2025 AI transportation initiative, announced by Governor Newsom, further signals political commitment to AI-driven infrastructure monitoring at scale.[39]
Within this landscape, SONAR-based opportunistic sensing occupies a distinctive niche: it complements rather than competes with these primarily optical approaches. Smart sensing platforms such as those developed by Vaisala—which aggregate accelerometer friction and roughness data from vehicle fleets in real time—and ASIMOB's accelerometer-derived impact index demonstrate that non-optical modalities are already penetrating mainstream PMS infrastructure at scale.[6-ALG] The CoSys-Lab eRTIS work extends this non-optical sensing trajectory into the acoustic imaging domain, with the potential to provide surface material discrimination and damage type classification beyond what accelerometers or single-element ultrasonic sensors currently deliver.
Future Research Directions
- 3D Energyscape Representation: Restructure beamforming output into proper (azimuth × elevation × range) tensors and apply 3D CNNs to enable spatially valid convolution operations.
- Advanced Beamforming: Evaluate MVDR and DMAS beamformers within the on-board compute budget; assess impact on damage F1 across all damage categories.
- Geographic Dataset Stratification: Reprocess dataset with road-segment-level train/test splits to provide credible generalizability metrics on previously unseen road environments.
- Sensor Fusion: Combine eRTIS energyscapes with GPS, accelerometer, and camera inputs in a multi-modal fusion architecture to exploit complementary sensing strengths.
- Larger Balanced Dataset: Expand data collection to increase representation of rare damage classes (open transversal joint, subsidence, loose stones) and mitigate class imbalance effects on test performance stability.
- Damage Severity and Localization: Extend the classification framework to include severity grading and spatial localization of damage within the road surface footprint.
- Adverse Weather Validation: Formally characterize the all-weather performance advantage of SONAR versus optical sensors under quantified fog, rain, and dust conditions.
Conclusion
This review has examined the University of Antwerp CoSys-Lab contribution on 3D in-air SONAR sensing for opportunistic road condition monitoring within the broader context of pavement management systems research. The study establishes that the eRTIS 32-channel MEMS sonar array, combined with gradient-boosted tree classifiers operating on PCA-reduced energyscape features, achieves road material classification F1 scores approaching 90% and road damage detection F1 scores approaching 75%—performance levels that are promising and competitive with alternative acoustic modalities, while remaining below the thresholds required for industrial deployment without human verification.
The core technical value of the SONAR modality—imperviousness to optical degradation in adverse weather—provides a genuinely distinct capability for opportunistic sensing deployments compared to camera, LiDAR, or hybrid visual systems. Identified paths to improvement include advanced beamforming algorithms, geometrically valid 3D convolutional architectures, road-segment-stratified dataset construction, and multi-modal sensor fusion. Within the HAIROAD project framework and the broader European commitment to AI-driven infrastructure management, this work represents a meaningful and technically rigorous step toward cost-effective, continuous, all-weather pavement management at municipal scale.
