Wednesday, January 8, 2025

A Digital Engineering Approach to Testing Modern AI and Complex Systems

[Figure: Range and Doppler MAE for all algorithms on the excursion dataset.]

 Air Force Research Lab Pioneers New AI Testing Framework for Military Systems

ROME, NY - In a groundbreaking development, researchers from the Air Force Research Laboratory (AFRL) and Information Systems Laboratories, Inc. (ISL) have introduced a novel framework for testing artificial intelligence systems used in military applications. The research, detailed in a recently approved technical report, addresses one of the most significant challenges in modern military technology: how to thoroughly test and validate AI-driven systems before deployment.

Dr. Joseph R. Guerci, an IEEE Fellow from ISL and lead author of the study, along with colleagues Dr. Sandeep Gogineni and Dr. Daniel L. Stevens, developed what they call "DE-T&E" (Digital Engineering Testing & Evaluation). The framework builds upon decades of AFRL's experience in radar systems and recent advances in digital engineering.

"Traditional testing methods simply weren't designed for the complexity of modern AI systems," explains Dr. Guerci. (He worked with us at GA-ASI on space-time adaptive processing (STAP) for the Lynx GMTI dual-beam radar system, as well as many other systems.) "Our approach combines digital twin technology with generative AI to identify potential failures before they occur in real-world operations."

The team demonstrated their framework using an advanced radar system, showcasing how it can detect potential problems that conventional testing might miss. The work leverages ISL's RFView simulation software, which has been refined over decades of radar systems modeling.

The research comes at a crucial time, following the Department of Defense's recent Instruction 5000.97, which mandates digital engineering approaches for new military programs. The mandate reflects lessons learned from successful programs like the B-21 Raider and Next Generation Air Dominance (NGAD) fighter, which heavily utilized digital engineering in their development.

"What makes this approach particularly valuable is its ability to discover 'Black Swan' events - rare but potentially catastrophic scenarios that traditional testing might miss," notes Dr. Gogineni, a Senior Member of IEEE and expert in radar systems.

The framework's development involved collaboration between ISL's San Diego facility and AFRL's Information Directorate in Rome, NY. The research team also included Robert W. Schutz, Gavin I. McGee, Brian C. Watson, and Hoan K. Nguyen from ISL, contributing expertise in various aspects of systems engineering and AI.

This breakthrough comes as the military increasingly relies on AI-driven systems, from autonomous vehicles to advanced radar systems. The new testing framework provides a path forward for validating these complex systems while meeting rigorous military specifications.

The research has been approved for public release by AFRL and represents a significant step forward in ensuring the reliability and safety of AI systems in military applications. As AI continues to play a larger role in defense technology, frameworks like DE-T&E will be crucial in maintaining the U.S. military's technological edge while ensuring system safety and reliability.
 

 A Digital Engineering Approach to Testing Modern AI and Complex Systems

Joseph R. Guerci, Fellow, IEEE, Sandeep Gogineni, Senior Member, IEEE, Robert W. Schutz, Gavin I. McGee, Brian C. Watson, Hoan K. Nguyen, Senior Member, IEEE, John Don Carlos, Daniel L. Stevens, Senior Member, IEEE
 
Modern AI (i.e., deep learning and its variants) is here to stay. However, its enigmatic "black box" nature presents a fundamental challenge to traditional methods of test and evaluation (T&E). In this paper we introduce a Digital Engineering (DE) approach to T&E (DE-T&E), combined with generative AI, that can achieve requisite mil-spec statistical validation as well as uncover potentially deleterious "Black Swan" events that might otherwise go undetected until it is too late. An illustration of these concepts is presented for an advanced modern radar employing deep learning AI.
 
Approved for Public Release; Distribution Unlimited: AFRL-2024-5306, 20240926

 Summary

 This paper presents a new approach to testing and evaluating AI systems and other complex military systems using Digital Engineering (DE). Here are the key points:

1. The authors introduce a three-phase Digital Engineering Testing & Evaluation (DE-T&E) approach:

  - Phase I (Baseline): Uses digital twin models to conduct extensive Monte Carlo simulations and establish baseline performance
  - Phase II (Excursion): Tests system robustness by introducing variations and modeling errors to identify potential issues
  - Phase III (Black Swan): Employs generative AI to discover unexpected but potentially catastrophic scenarios that human testers might not anticipate

2. The approach is demonstrated using a radar application that uses deep learning to detect and track targets in cluttered environments:

  - Multiple CNN architectures (MobileNet, RetinaNet, YOLO) were tested
  - Performance was evaluated under both baseline and stressed conditions
  - System improvements (like increased antenna size) were made based on test results

3. The paper introduces an innovative use of Generative Adversarial Networks (GANs) to:

  - Generate synthetic radar clutter data much faster than traditional modeling
  - Identify potential "Black Swan" events during ongoing system deployment
  - Help validate system performance against unexpected scenarios

4. Key benefits of this approach include:

  - Meets military statistical validation requirements
  - Reduces reliance on expensive physical testing
  - Provides ongoing validation during deployment
  - Can identify potential failures before they occur in real operations

5. The work aligns with Department of Defense Instruction 5000.97, which mandates Digital Engineering for new programs.

This represents a significant advancement in testing complex AI systems, particularly for military applications where traditional testing methods may be insufficient or impractical.

Radar Applications Demonstrated

The paper demonstrated the DE-T&E framework using a Ground Moving Target Indicator (GMTI) radar application. Here are the key technical details:

System Configuration:
- X-band radar (10 GHz)
- Platform flying at 1000 m altitude, 100 m/s speed
- Initially used 10 horizontal × 5 vertical antenna array elements
- Later upgraded to 20 × 10 elements to improve performance
- Located along Southern California coast for testing
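One way to see why the array upgrade helps: doubling the array in both dimensions quadruples the element count, which to first order adds about 6 dB of array gain, matching the 6 dB clutter-power increase applied in the excursion phase. A quick back-of-the-envelope check in Python (the simple gain model here is my own illustration, not from the paper):

```python
import math

def array_gain_db(n_elements: int) -> float:
    """First-order gain of an N-element array relative to a single element."""
    return 10 * math.log10(n_elements)

baseline = 10 * 5    # 10 horizontal x 5 vertical elements
upgraded = 20 * 10   # 20 x 10 elements after the redesign

extra_gain = array_gain_db(upgraded) - array_gain_db(baseline)
print(f"Additional array gain: {extra_gain:.1f} dB")  # about 6.0 dB
```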

Test Scenario:
- Radar platform flying northward parallel to Earth's surface
- Target locations varied between:
  - Latitude: 32.5439°N to 32.5571°N
  - Longitude: 116.9577°W to 117.1406°W
- Ground targets moving at either 7 m/s or 14 m/s
- Complex ground clutter environment including terrain features

Three Deep Learning Approaches Tested:
1. MobileNet - 12-layer CNN architecture
2. RetinaNet
3. YOLOv7 (achieved best performance)

Key Performance Metrics:
- Mean Absolute Error (MAE) in range and Doppler measurements
- False positive/negative detection rates
- Percentage of detections within 1 bin of true location
- Each range bin = 0.0162 nautical miles
- Each Doppler frequency bin = 3.4375 Hz
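Those bin sizes make it easy to translate bin-level MAE into physical units. A small helper, assuming the reported MAEs are expressed in bins (the function and variable names are my own):

```python
RANGE_BIN_NMI = 0.0162   # nautical miles per range bin (from the paper)
DOPPLER_BIN_HZ = 3.4375  # Hz per Doppler frequency bin (from the paper)

def mae_physical(range_mae_bins: float, doppler_mae_bins: float):
    """Convert MAE in bins to (nautical miles, Hz)."""
    return range_mae_bins * RANGE_BIN_NMI, doppler_mae_bins * DOPPLER_BIN_HZ

# YOLO's baseline errors (0.15 range bins, 0.38 Doppler bins):
rng_nmi, dop_hz = mae_physical(0.15, 0.38)
print(f"{rng_nmi:.4f} nmi, {dop_hz:.2f} Hz")  # 0.0024 nmi, 1.31 Hz
```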

Testing Phases:
1. Baseline Phase:
   - Used RFView to generate 5000 range-Doppler maps
   - YOLO achieved 0.15 range error and 0.38 Doppler error

2. Excursion Phase:
   - Increased clutter power by 6 dB to stress the system
   - Performance degraded significantly
   - Led to antenna array redesign

3. Black Swan Phase:
   - Used GANs to generate synthetic radar data
   - Demonstrated ability to generate realistic clutter maps
   - GAN processing time < 4 ms vs. RFView's 900 ms
   - Validated against real terrain features like rivers and lakes
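The quoted timings imply a large throughput gain for the GAN surrogate over physics-based simulation. A quick comparison using the paper's numbers (the per-map throughput framing is mine):

```python
gan_ms = 4.0       # upper bound on GAN generation time per clutter map
rfview_ms = 900.0  # RFView physics-based generation time per map

speedup = rfview_ms / gan_ms
maps_per_min_gan = 60_000 / gan_ms       # ms per minute / ms per map
maps_per_min_rfview = 60_000 / rfview_ms

print(f"Speedup: {speedup:.0f}x "
      f"({maps_per_min_gan:.0f} vs {maps_per_min_rfview:.0f} maps/min)")
```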

The radar demonstration showed how the framework could:
- Identify system limitations under stressed conditions
- Guide design improvements (like antenna array size)
- Generate synthetic test data much faster than traditional methods
- Maintain high detection accuracy even with challenging targets
- Handle complex environmental factors like terrain-induced clutter

This application was particularly relevant because radar systems are fundamental to military operations and increasingly use AI for target detection and classification. 

DL defined

Here's a detailed breakdown of the three Deep Learning (DL) approaches tested in the radar application:

1. MobileNet
- Type: Convolutional Neural Network (CNN)
- Architecture: 12 layers including:
  - Input layer (680×320×1)
  - 8 convolutional layers with varying dimensions
  - 2 dense layers (256 nodes and 2 output nodes)
- Features:
  - Uses ReLU (Rectified Linear Unit) activation
  - Includes batch normalization
  - Employs max pooling
- Performance:
  - Range Error: 4.90
  - Doppler Error: 3.60
  - False Positive Rate: 0.00
  - False Negative Rate: 0.00

2. RetinaNet
- Type: One-stage object detection model
- Features:
  - Focal Loss
  - Feature Pyramid Network (FPN)
  - Architectural improvements that balance speed and accuracy
- Performance:
  - Range Error: 0.54
  - Doppler Error: 0.35
  - False Positive Rate: 0.00
  - False Negative Rate: 0.06

3. YOLOv7 (You Only Look Once, version 7)
- Best performing model
- Performance:
  - Range Error: 0.15
  - Doppler Error: 0.38
  - False Positive Rate: 0.00
  - False Negative Rate: 0.00

Notably, the YOLO architecture achieved the best overall performance in target detection and localization, with the lowest error rates and consistent performance across different testing scenarios. The paper doesn't provide the complete architectural details for RetinaNet and YOLO but focuses on their performance metrics and implementation results.
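For readers implementing a similar evaluation, the headline metrics (range/Doppler MAE, false-positive and false-negative rates, and the within-1-bin fraction) can be computed along these lines. This is a generic sketch, not the authors' code, and it assumes at most one true target and one detection per range-Doppler map:

```python
def evaluate(detections, truths, tol_bins=1):
    """detections/truths: per-map (range_bin, doppler_bin) tuples, or None."""
    abs_rng, abs_dop, fp, fn = [], [], 0, 0
    for det, tru in zip(detections, truths):
        if det is None and tru is not None:
            fn += 1                      # missed target
        elif det is not None and tru is None:
            fp += 1                      # spurious detection
        elif det is not None and tru is not None:
            abs_rng.append(abs(det[0] - tru[0]))
            abs_dop.append(abs(det[1] - tru[1]))
    n = len(detections)
    mae_rng = sum(abs_rng) / len(abs_rng) if abs_rng else 0.0
    mae_dop = sum(abs_dop) / len(abs_dop) if abs_dop else 0.0
    within = (sum(1 for r, d in zip(abs_rng, abs_dop)
                  if r <= tol_bins and d <= tol_bins) / len(abs_rng)
              if abs_rng else 0.0)
    return mae_rng, mae_dop, fp / n, fn / n, within

# Toy usage: three maps, one miss, one detection off by a single range bin.
dets = [(10, 5), (22, 7), None]
trus = [(10, 5), (21, 7), (3, 3)]
mae_r, mae_d, fpr, fnr, within = evaluate(dets, trus)
```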

Background of the study:
This paper discusses a new approach to testing and validating complex systems, particularly those that use advanced AI techniques like deep learning. The traditional methods of testing and validation may not be sufficient for these complex systems, as they are often treated as "black boxes" whose inner workings are not easily understood.

Research objectives and hypotheses:
The paper's main objective is to introduce a new Digital Engineering (DE) approach to Testing and Evaluation (T&E), which can achieve the required statistical validation while also uncovering potential "Black Swan" events that may not be easily predicted. The authors hypothesize that this new approach can effectively address the challenges posed by advanced AI systems.

Methodology:
The authors propose a three-phase approach to T&E:
1. Baseline phase: Establish a baseline digital twin of the system under test and its operating environment, and conduct extensive Monte Carlo simulations to achieve statistical convergence.
2. Excursion phase: Introduce excursions from the baseline models to ensure the robustness of the results, representing the "known unknowns".
3. "Black Swan" phase: Utilize generative AI (specifically, Generative Adversarial Networks) to create scenarios that are far from the "norm", representing the "unknown unknowns".
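The Baseline phase's "statistical convergence" criterion can be checked with a simple running-mean stability test over the Monte Carlo trials. A minimal sketch (the stopping rule and the toy metric below are my own illustration, not the paper's procedure):

```python
import random

def run_monte_carlo(metric_fn, max_trials=5000, window=500, tol=1e-3):
    """Run trials until the running mean moves by < tol across a window."""
    vals, prev_mean = [], None
    for i in range(1, max_trials + 1):
        vals.append(metric_fn())
        if i % window == 0:
            mean = sum(vals) / len(vals)
            if prev_mean is not None and abs(mean - prev_mean) < tol:
                return mean, i           # converged
            prev_mean = mean
    return sum(vals) / len(vals), max_trials

random.seed(0)
# Toy stand-in for a digital-twin trial: a noisy detection-error metric.
mean_err, n_trials = run_monte_carlo(lambda: random.gauss(0.15, 0.05))
```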

Results and findings:
The authors demonstrate the proposed approach using a radar application that employs deep learning for target detection and localization. They show that the deep learning algorithms outperform a simple peak-based approach, and that the performance can be further improved by modifying the system design (e.g., increasing the antenna size) to address the "known unknowns" identified in the Excursion phase.

Discussion and interpretation:
The authors argue that the proposed DE-T&E approach can effectively address the challenges posed by advanced AI systems, as it combines the strengths of digital twins, statistical validation, and generative AI to uncover potential "Black Swan" events.

Contributions to the field:
The paper introduces a novel, comprehensive approach to testing and validating complex systems, particularly those that employ advanced AI techniques. This approach can help ensure the reliability and robustness of these systems, which is crucial for their deployment in critical applications.

Achievements and significance:
The proposed DE-T&E approach represents a significant advancement in the field of system testing and validation, as it addresses the limitations of traditional methods and leverages the power of emerging technologies like generative AI.

Limitations and future work:
The paper focuses on a relatively simple radar application as an example, and the authors acknowledge the need to apply the proposed approach to more complex, integrated systems and systems of systems. Future work may involve further refinement and validation of the approach across a wider range of applications.
 

Key Acronyms Used:

- AI: Artificial Intelligence
- AFRL: Air Force Research Laboratory
- CNN: Convolutional Neural Network
- DE: Digital Engineering
- DE-T&E: Digital Engineering Testing & Evaluation
- DL: Deep Learning
- DLNN: Deep Learning Neural Network
- DTED: Digital Terrain Elevation Data
- ERP: Effective Radiated Power
- GAI: Generative Artificial Intelligence
- GAN: Generative Adversarial Network
- GMTI: Ground Moving Target Indicator
- HPC: High Performance Computing
- IADS: Integrated Air Defense Systems
- LCLU: Land Cover Land Use
- MAE: Mean Absolute Error
- MBSE: Model-Based Systems Engineering
- MC: Monte Carlo
- NGAD: Next Generation Air Dominance
- OSD: Office of the Secretary of Defense
- RCS: Radar Cross Section
- RD: Range-Doppler
- RF: Radio Frequency
- ReLU: Rectified Linear Unit
- SUT: System Under Test
- T&E: Testing & Evaluation
- XAI: Explainable AI
- YOLO: You Only Look Once (object detection system)
 
