Saturday, May 25, 2024

Multi-Frequency Progressive Refinement for Learned Inverse Scattering



Interpreting scattered acoustic and electromagnetic wave patterns is a computational task that enables remote imaging in a number of important applications, including medical imaging, geophysical exploration, sonar and radar detection, and nondestructive testing of materials. However, accurately and stably recovering an inhomogeneous medium from far-field scattered wave measurements is a computationally difficult problem, due to the nonlinear and non-local nature of the forward scattering process.

We design a neural network, called Multi-Frequency Inverse Scattering Network (MFISNet), and a training method to approximate the inverse map from far-field scattered wave measurements at multiple frequencies. We consider three variants of MFISNet, with the strongest performing variant inspired by the recursive linearization method -- a commonly used technique for stably inverting scattered wavefield data -- that progressively refines the estimate with higher frequency content.
Comments: 25 pages, 8 figures
Subjects: Computational Physics (physics.comp-ph)
Cite as: arXiv:2405.13214 [physics.comp-ph]
  (or arXiv:2405.13214v1 [physics.comp-ph] for this version)
  https://doi.org/10.48550/arXiv.2405.13214

Submission history

From: Owen Melia
[v1] Tue, 21 May 2024 21:37:31 UTC (3,755 KB)

Summary

This paper presents a new neural network architecture and training method called Multi-Frequency Inverse Scattering Network (MFISNet) for solving inverse scattering problems. The key aspects are:

  1. MFISNet-Refinement is inspired by recursive linearization algorithms. It uses a residual structure to progressively refine estimates of the scattering potential as higher-frequency data is introduced.
  2. A novel training method matches intermediate network outputs to low-pass filtered versions of the ground truth, ensuring network blocks learn to reconstruct specific frequency information.
  3. Experiments show MFISNet-Refinement outperforms single-frequency methods and other multi-frequency approaches across various settings, including with measurement noise.
  4. Ablation studies confirm the importance of the progressive refinement structure enabled by the network architecture and training method.
  5. The method is evaluated on a challenging dataset with high-contrast scattering potentials and unknown, inhomogeneous backgrounds.

In summary, MFISNet provides a neural network approach to multi-frequency inverse scattering that yields state-of-the-art performance by emulating key aspects of classical recursive linearization algorithms. The progressive refinement architecture and training are crucial to its success.

MFISNet-Refinement

Multi-Frequency Inverse Scattering Network (MFISNet) is a neural network architecture and training method proposed in the paper for solving inverse scattering problems. The key aspects of MFISNet are:

1. Architecture: MFISNet-Refinement, the strongest performing variant, uses a residual structure inspired by recursive linearization algorithms. It progressively refines estimates of the scattering potential as higher frequency data is introduced. The network comprises multiple blocks, one for each incident wave frequency. Each block takes in the measurement data at a particular frequency, passes it through a network that approximately inverts the forward scattering model, and concatenates the output with the estimate from the previous block. This is further refined through convolutional layers and a skip connection.

2. Training method: A novel training approach is used where the output of each block is trained to match a low-pass filtered version of the ground truth scattering potential. The cutoff frequency of the low-pass filter is progressively increased for the blocks handling higher-frequency measurement data. This ensures that each block learns to reconstruct specific frequency information. The blocks are first trained sequentially and then fine-tuned jointly (a code sketch follows this list).

3. Performance: Experiments show that MFISNet-Refinement outperforms single-frequency methods and alternative approaches for combining multi-frequency data across various settings, including in the presence of measurement noise. The progressive refinement architecture and training method are shown to be crucial to its success.

4. Problem setting: MFISNet is evaluated on a challenging dataset with high-contrast scattering potentials and unknown, inhomogeneous backgrounds. This reflects more realistic experimental conditions compared to previous work.
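
As a concrete illustration of the training method in item 2, here is a minimal sketch of the per-block supervision idea. The helper `lowpass_filter` and the block interface are illustrative assumptions, not the authors' code:

```python
# Hedged sketch: per-block losses matched to band-limited targets.
# `blocks` holds one refinement block per frequency k_1 < ... < k_Nk;
# `lowpass_filter(q, cutoff)` is an assumed helper (cf. Eq. (14) in the paper).
import torch

def progressive_losses(blocks, measurements, q_true, wavenumbers, lowpass_filter):
    """Return one loss per block, each compared to a low-pass filtered target."""
    losses, q_hat = [], None
    for block, d_k, k in zip(blocks, measurements, wavenumbers):
        q_hat = block(d_k, q_hat)                      # refine the previous estimate
        target = lowpass_filter(q_true, cutoff=2 * k)  # Born limit: content up to 2k
        losses.append(torch.mean((q_hat - target) ** 2))
    return losses
```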

In summary, MFISNet provides a state-of-the-art neural network approach for multi-frequency inverse scattering by emulating key aspects of classical recursive linearization algorithms through its progressive refinement architecture and training method.

Inverse Scattering

Inverse scattering is a broad term that encompasses techniques used to determine the characteristics of an object or medium by measuring how it scatters incoming waves. It is the inverse problem to forward scattering, which refers to predicting how an object with known properties will scatter incoming waves. Inverse scattering has applications in various fields, including medical imaging, geophysical exploration, non-destructive testing, and yes, radar.

In a typical inverse scattering problem, a wave (such as electromagnetic or acoustic) is sent towards an object or medium, and the scattered wave field is measured at various positions. The goal is to use these measurements to infer properties of the scattering object, such as its shape, location, or internal structure.

Radar is indeed an example of inverse scattering in practice. In a radar system:

1. An electromagnetic wave (usually in the radio or microwave frequency range) is emitted by a transmitter.

2. The wave interacts with objects in its path and is scattered in various directions.

3. Some of the scattered waves are reflected back to the radar system and measured by a receiver.

4. The time delay and strength of the received signals are used to infer properties of the scattering objects, such as their distance, size, and velocity relative to the radar.

So while radar is a specific application, inverse scattering refers to the general mathematical and computational techniques used to solve problems of this nature. The methods developed in the MFISNet paper are focused on the inverse scattering problem of inferring a 2D map of the wave speed in a medium from scattered wave measurements at multiple frequencies. This type of inverse problem arises in seismic and ultrasound imaging applications. 

Holography

Inverse scattering techniques have some similarities to holography. In holography, the interference pattern between a reference light wave and light scattered by an object is recorded. From this interference pattern (the hologram), the original light field can be reconstructed, allowing a 3D image of the object to be viewed.

Similarly, in inverse scattering, measurements of the scattered wave field (which in some cases include phase information) are used to reconstruct properties of the scattering object or medium.

However, there are some key differences:

  1. Holography typically uses optical frequencies (visible light), while inverse scattering techniques can be applied to a variety of wave types and frequencies, including radio, microwave, acoustic, and elastic waves.
  2. In holography, the scattered light field is directly measured and recorded. In inverse scattering problems, the measurements are often made in the far field (i.e., at a large distance from the object), and the goal is to infer properties of the object that are not directly measured.
  3. Holography aims to reconstruct the entire 3D light field, allowing a 3D image to be viewed. Inverse scattering methods often aim to reconstruct specific properties of the object or medium, such as its shape, location, or internal structure.

Despite these differences, both holography and inverse scattering leverage the wave nature of light (or other types of waves) and the information contained in the scattered field to reconstruct information about the object or medium of interest. The MFISNet method proposed in the paper is a computational imaging technique that uses machine learning to solve a specific type of 2D inverse scattering problem.

Applications

In this paper, the authors apply inverse scattering to the problem of reconstructing a 2D map of the wave speed in a medium from measurements of the scattered wave field at multiple frequencies. This type of problem arises in applications such as seismic imaging and medical ultrasound.

Key aspects of their approach:

  1. Forward model: They consider a scalar wave equation (the Helmholtz equation) with a variable wave speed. The goal is to reconstruct the wave speed map from measurements of the scattered wave field at multiple frequencies.
  2. Data generation: They define a distribution of wave speed maps (scattering potentials) with high-contrast inclusions and inhomogeneous backgrounds. The scattered wave field is simulated by solving the wave equation numerically for each wave speed map and incident wave frequency.
  3. Neural network: They propose a neural network architecture (MFISNet-Refinement) that progressively refines the reconstructed wave speed map as higher frequency data is introduced. The network is trained using a novel loss function that matches intermediate outputs to low-pass filtered versions of the ground truth.
  4. Performance metrics: The main performance metric used is the relative L2 error between the true and reconstructed wave speed maps. They compare their method to baselines using different neural network architectures and training strategies.

Key results:

  1. Improvement over baselines: MFISNet-Refinement outperforms single-frequency methods and alternative multi-frequency approaches, achieving relative L2 errors around 0.08-0.10 on the test set (compared to 0.11-0.26 for baselines).
  2. Robustness to noise: The method maintains its performance advantage in the presence of moderate measurement noise (relative L2 errors around 0.09-0.10 with 10% noise).
  3. Importance of architecture and training: Ablation studies confirm that the progressive refinement architecture and training strategy are crucial to the method's success.

In terms of resolution and signal-to-noise ratio (SNR), the paper does not provide explicit quantitative results. However, the visual examples of reconstructed wave speed maps suggest that the method is able to accurately reconstruct high-contrast inclusions and inhomogeneous backgrounds with relatively sharp boundaries. The robustness to 10% measurement noise suggests a good SNR in the reconstructions.

Overall, the paper demonstrates state-of-the-art performance for a challenging 2D inverse scattering problem using a novel neural network approach inspired by classical recursive linearization algorithms.

Authors

The authors of this paper are affiliated with the University of Chicago, with appointments in the Department of Computer Science, the Department of Statistics, and the Data Science Institute. Here is a brief overview of each author and their relevant prior work:

  1. Owen Melia (Department of Computer Science): This appears to be one of Melia's early works, as I did not find prior related publications.
  2. Olivia Tsang (Department of Computer Science): Similar to Melia, this seems to be one of Tsang's early publications in the field.
  3. Vasileios Charisopoulos (Data Science Institute): Charisopoulos has prior publications on optimization algorithms and machine learning, but this appears to be his first work on inverse scattering.
  4. Yuehaw Khoo (Department of Statistics, Computational and Applied Mathematics): Khoo has several prior works related to inverse scattering and computational imaging, including:
    1.  "SwitchNet: A Neural Network Model for Forward and Inverse Scattering Problems" (2019), which proposes a neural network architecture for solving inverse scattering problems in a single-frequency setting.
  5. Jeremy Hoskins (Department of Statistics, Computational and Applied Mathematics): This appears to be one of Hoskins' early works in the field.
  6. Rebecca Willett (Departments of Computer Science and Statistics, Data Science Institute): Willett has an extensive background in machine learning, signal processing, and computational imaging. Some of her relevant prior works include:
    1.  "Learning to Solve Inverse Problems Using Wasserstein Loss" (2018), which explores the use of Wasserstein loss functions for solving inverse problems with neural networks.
    2.  "Deep Learning for Inverse Problems" (2020), a book chapter that provides an overview of deep learning techniques for solving inverse problems in imaging.

The authors' affiliations and prior works suggest that this paper represents a collaboration between researchers with backgrounds in machine learning, optimization, and computational imaging. The senior authors (Khoo and Willett) have prior experience in applying machine learning techniques to inverse problems, while this work appears to be an early contribution for the more junior authors (Melia, Tsang, Charisopoulos, and Hoskins).

The paper builds upon prior work on using neural networks for inverse scattering problems (e.g., Khoo's SwitchNet), while introducing novel contributions in the multi-frequency setting and the progressive refinement architecture and training strategy inspired by recursive linearization algorithms.

Artifacts

Based on the information provided in the paper, the authors used the following datasets, tests, and artifacts:

Databases:

The authors did not use any existing public databases. Instead, they generated their own dataset by simulating scattered wave field measurements from a distribution of synthetic wave speed maps (scattering potentials). The wave speed maps were defined to have high-contrast inclusions and inhomogeneous backgrounds to reflect more realistic experimental conditions compared to previous work.

Tests conducted:

  1. Comparison to baselines: The authors compared the performance of MFISNet-Refinement to single-frequency methods (FYNet) and alternative multi-frequency approaches (MFISNet-Fused, MFISNet-Parallel, and Wide-Band Butterfly Network) in terms of the relative L2 error between the true and reconstructed wave speed maps.
  2. Robustness to noise: The authors evaluated the performance of MFISNet-Refinement and the baseline methods in the presence of 10% additive measurement noise to assess the robustness of the approach.
  3. Ablation studies: The authors conducted experiments to investigate the importance of the progressive refinement architecture and training strategy by comparing MFISNet-Refinement to variants with modified architectures and training procedures.

Artifacts for independent validation:

The paper does not mention any specific artifacts that have been made publicly available for independent validation, such as:

  1. Code: The paper does not provide a link to a public code repository containing the implementation of MFISNet-Refinement and the baseline methods.
  2. Data: The paper does not provide access to the synthetic dataset used for training and evaluating the models.
  3. Pre-trained models: The paper does not provide access to pre-trained models that could be used to reproduce the results or apply the method to new data.

The lack of publicly available artifacts makes it difficult for other researchers to independently validate the results or build upon the work without implementing the methods from scratch based on the descriptions provided in the paper. However, it is not uncommon for papers to omit these artifacts, especially if the work is still in the early stages of development or if there are concerns about intellectual property or future publications.

Paper


Multi-Frequency Progressive Refinement for Learned Inverse Scattering

Owen Melia (Department of Computer Science, University of Chicago, US); Olivia Tsang (Department of Computer Science, University of Chicago, US); Vasileios Charisopoulos (Data Science Institute, University of Chicago, US); Yuehaw Khoo (Computational and Applied Mathematics, Department of Statistics, University of Chicago, US); Jeremy Hoskins (Computational and Applied Mathematics, Department of Statistics, University of Chicago, US); Rebecca Willett (Department of Computer Science; Data Science Institute; Computational and Applied Mathematics, Department of Statistics, University of Chicago, US)
Abstract

Interpreting scattered acoustic and electromagnetic wave patterns is a computational task that enables remote imaging in a number of important applications, including medical imaging, geophysical exploration, sonar and radar detection, and nondestructive testing of materials. However, accurately and stably recovering an inhomogeneous medium from far-field scattered wave measurements is a computationally difficult problem, due to the nonlinear and non-local nature of the forward scattering process. We design a neural network, called Multi-Frequency Inverse Scattering Network (MFISNet), and a training method to approximate the inverse map from far-field scattered wave measurements at multiple frequencies. We consider three variants of MFISNet, with the strongest performing variant inspired by the recursive linearization method — a commonly used technique for stably inverting scattered wavefield data — that progressively refines the estimate with higher frequency content. MFISNet outperforms past methods in regimes with high-contrast, heterogeneous large objects, and inhomogeneous unknown backgrounds.

1 Introduction

Wave scattering is an important imaging technology with applications in medical and seismic imaging, sonar and radar detection, and nondestructive testing of materials. In this setting, a known source transmits incident waves through a penetrable medium, and due to an inhomogeneity in the spatial region of interest, the incident waves are scattered. Several receivers measure the scattered wave field at distant locations. We are interested in the inverse wave scattering problem: given a set of scattered wave field measurements, we want to recover the inhomogeneity in the spatial region of interest that produced the measurements. In this paper, we focus on the inverse wave scattering problem with unknown medium and full-aperture measurements at multiple incident wave frequencies. This problem is characterized by a highly nonlinear forward measurement operator, making the recovery of the scattering potential challenging. We propose a machine learning solution to this problem: given a training set of pairs of scattering potentials and scattered wavefield measurements, we seek to approximate the inversion map with a deep neural network that predicts a scattering potential from scattered wavefield measurements at multiple frequencies. We design a new training method and a new neural network architecture to achieve this goal.

The aforementioned inverse wave scattering problem has been widely studied. While the measurement operator is known to be injective when there are infinitely many sensors positioned in a ring around the scattering potential (Colton and Kress, 2018), computational approaches must always operate in the ill-posed case where only finitely many receivers are present. Thus, past research has focused on optimization approaches to solving the inverse problem. Simple gradient-based optimization approaches to this problem face two major difficulties: computing a gradient requires solving a partial differential equation (PDE), which can be computationally expensive; additionally, the nonlinearity of the forward model induces a non-convex objective function. Therefore, convergence of local search methods such as gradient descent is not guaranteed without careful initialization.

A classical strategy to alleviate the optimization challenges associated with a nonlinear forward model is to use a linear approximation. This linear approximation allows one to formulate the inverse problem as a linear least squares problem, which is relatively easy to solve. A well-known method for inverting this global linear approximation is the so-called filtered back-projection (FBP) method (Natterer, 2001). While this method is relatively easy to implement, it suffers from modeling errors and produces inaccurate reconstructions. To remedy this, many machine learning approaches aim to construct data-driven approximations of the inverse map by designing architectures to imitate FBP (Fan and Ying, 2022; Khoo and Ying, 2019; Li et al., 2022). At a high level, these works approximate the two key components of FBP – namely, an application of the adjoint of a linearization of the forward operator followed by a filtering step – by suitably chosen neural network blocks. As a result, they suffer from similar drawbacks as the FBP method: in particular, they provide low-quality reconstructions of high-contrast scattering potentials, especially in the presence of unknown inhomogeneous backgrounds or measurement noise. A natural alternative is integrating machine-learning models into other iterative methods, which are computationally demanding relative to FBP but can provide higher quality reconstructions.

A standard approach, which has been successful in the strongly nonlinear scattering regime, is to use data collected at multiple incoming wave frequencies. Recursive linearization methods (Chen, 1995; Bao and Liu, 2003; Borges et al., 2017) use multi-frequency measurements to solve a sequence of sub-problems, starting at the lowest frequency to provide an initial estimate of the scattering potential and refining that estimate at progressively higher frequencies using warm-started local search methods. Algorithms in this family offer two benefits: first, they alleviate the need for careful initialization since the loss landscape of the lowest frequency sub-problem is typically well-behaved; second, they greatly reduce the number of PDE solves by relying on first-order approximations of the forward model that are relatively inexpensive to invert. However, these methods require measurements at a large number of incident wave frequencies and still involve solving large-scale PDEs and least-squares problems for each frequency. This requires, for example, multiple CPU core-hours to recover a single image, even with a state-of-the-art PDE solver (Borges et al., 2017).

In light of these advances, we propose a new architecture and training method inspired by the recursive linearization algorithm. Our primary approach is based on a residual update architecture and training method that ensures specific network blocks solve specific sub-problems. In addition, we introduce two new methods of extending FBP-inspired neural networks to the multi-frequency setting. We call our networks “Multi-Frequency Inverse Scattering Networks”, or “MFISNets”. We note that our methods are based on a neural network architecture from Fan and Ying (2022) but could, in principle, use any other neural network architectures designed for the inverse scattering problem in the single-frequency setting.

1.1 Contributions & paper outline

In Section 2, we formally define the inverse scattering problem and the machine learning objective. We present standard results about inverse scattering and survey related work in Section 3. In Section 4, we review the recursive linearization algorithm and introduce our method, MFISNet-Refinement. Finally, we introduce our other two methods, MFISNet-Fused and MFISNet-Parallel, and present a numerical evaluation of our methods in Section 5. Our main contributions can be summarized as follows:

  1. We introduce "MFISNet-Refinement", short for "Multi-Frequency Inverse Scattering Network with Refinement", a neural network architecture and training method that is inspired by recursive linearization algorithms (Chen, 1995; Bao and Liu, 2003; Borges et al., 2017). (Section 4)

  2. We show that our network achieves lower errors than single-frequency methods (Fan and Ying, 2022) and multi-frequency methods (Li et al., 2022) in the literature, as well as the other newly-introduced MFISNets, in a high-contrast, noiseless, full-aperture setting. (Section 5.3)

  3. We demonstrate numerically that our method is robust to moderate measurement noise. (Section 5.4)

  4. We consider alternative training strategies and find that the proposed progressive training strategy made possible by the MFISNet-Refinement architecture is important to its performance. (Section 5.5)

2 Problem Setup and Notation

The forward model for our imaging setup is implicitly defined by a PDE problem involving the Helmholtz equation. Let $x \in \mathbb{R}^2$ be the spatial variable. Suppose $u^{\mathrm{in}}(x;s) = e^{ikx \cdot s}$ is an incoming plane wave with direction $s \in \mathbb{S}^1$, wavelength $\lambda$, and angular wavenumber $k = 2\pi/\lambda$. We normalize the problem's units so this wave travels at speed $c_0 \equiv 1$ in free space. The incoming wave interacts with a real-valued scattering potential $q(x)$ to produce an additive perturbation, called the scattered wave field $u_{\mathrm{scat}}[q](x;s)$. We define $q(x) = c_0^2/c^2(x) - 1$, where $c(x)$ is the wave speed at $x$. The total wave field $u[q](x;s) = u_{\mathrm{scat}}[q](x;s) + u^{\mathrm{in}}(x;s)$ solves the following inhomogeneous Helmholtz equation, and the scattered wave field satisfies the Sommerfeld radiation boundary condition:

$\begin{cases} \Delta u[q](x;s) + k^2(1 + q(x))\,u[q](x;s) = 0, & x \in \mathbb{R}^2; \\ \dfrac{\partial u_{\mathrm{scat}}[q](x;s)}{\partial \|x\|} - ik\,u_{\mathrm{scat}}[q](x;s) = o(\|x\|^{-1/2}), & \text{as } \|x\| \to \infty. \end{cases}$
(1)

We assume $q(x)$ is supported on a square domain $\Omega = [-0.5, 0.5]^2$, and we work with $q \in \mathbb{R}^{N_q \times N_q}$, the discretization of $q(x)$ onto a regular grid with $(N_q, N_q)$ grid points. We place the receivers equally spaced around a large ring of radius $R \gg 1$, centered at the origin. We identify individual receivers by their unit-vector directions $r_j \in \mathbb{S}^1$. We compute the solution for a set of $N_r$ receiver directions and $N_s$ incoming wave directions $s_\ell$ equally spaced about the unit circle. This results in a set of $(N_r, N_s)$ observations, $\{u_{\mathrm{scat}}[q](R r_j; s_\ell)\}_{(j,\ell) \in [N_r] \times [N_s]}$, which we arrange in a data array $d_k \in \mathbb{C}^{N_r \times N_s}$. We call the mapping from $q$ to $d_k$ the forward model with incoming wave frequency $k$:

$(d_k)_{j,\ell} = \mathcal{F}_k[q]_{j,\ell} := u_{\mathrm{scat}}[q](R r_j; s_\ell)$
(2)

Because we are interested in multi-frequency algorithms, we consider observations of the forward model evaluated on the same $q$ with a set of incoming wave frequencies $[k_1, \ldots, k_{N_k}]$. In particular, our goal is to approximate the following mapping:

$[\mathcal{F}_{k_1}[q], \ldots, \mathcal{F}_{k_{N_k}}[q]] \mapsto q.$
(3)

Our goal is to train a neural network $g_\theta$ with parameters $\theta$ to approximate this mapping: $g_\theta(\mathcal{F}_{k_1}[q], \ldots, \mathcal{F}_{k_{N_k}}[q]) \approx q$. Given a distribution $\mathcal{D}$ over scattering potentials $q$, we draw a training set of $n$ independent samples from $\mathcal{D}$ to generate data to train the neural network. After evaluating the forward model $nN_k$ times, we have a training set

$\mathcal{D}_n := \{(q^{(j)}, \mathcal{F}_{k_1}[q^{(j)}], \ldots, \mathcal{F}_{k_{N_k}}[q^{(j)}])\}_{j=1}^n$
(4)

We evaluate networks in this setting by measuring the relative $\ell_2$ error:

$\mathrm{RelativeL2Error}(g_\theta) = \mathbb{E}_{q \sim \mathcal{D}}\left[\frac{\big\|g_\theta(\mathcal{F}_{k_1}[q], \ldots, \mathcal{F}_{k_{N_k}}[q]) - q\big\|_2}{\|q\|_2}\right]$
(5)

In practice, we approximate the expected relative $\ell_2$ error in (5) by an empirical mean over a held-out test set of 1,000 samples drawn independently from $\mathcal{D}$.
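
For concreteness, the test-set estimate of (5) can be computed along the following lines (a sketch; array shapes of (batch, Nq, Nq) are assumed):

```python
import numpy as np

def relative_l2_error(q_pred, q_true):
    """Per-sample relative l2 error: ||q_pred - q_true||_2 / ||q_true||_2."""
    diff = (q_pred - q_true).reshape(len(q_true), -1)
    num = np.linalg.norm(diff, axis=1)
    den = np.linalg.norm(q_true.reshape(len(q_true), -1), axis=1)
    return num / den

# The empirical mean over a held-out test set approximates the expectation in (5):
# test_error = relative_l2_error(q_pred, q_true).mean()
```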

3 Background and Related Work

3.1 Background

In this section, we review standard results about $\mathcal{F}_k$ relevant to our study. In particular, we focus on a linear approximation of $\mathcal{F}_k$ that gives insights into the inverse scattering problem.

The first result is that $\mathcal{F}_k$ becomes more nonlinear as the magnitude of the scatterer $q$ or the wavenumber $k$ increases. Indeed, the solution to (1) can be equivalently defined as the solution to the Lippmann-Schwinger integral equation:

$u_{\mathrm{scat}}[q](x;s) = k^2 \int_\Omega G_k(x - x')\, q(x')\, \big(u^{\mathrm{in}}(x';s) + u_{\mathrm{scat}}[q](x';s)\big)\, dx'$
(6)

where $G_k$ is the Green's function for the homogeneous Helmholtz operator. This recursive equation provides a nonlinear map from $q(x)$ to $u_{\mathrm{scat}}[q](\cdot\,;s)$ and therefore to $\mathcal{F}_k[q]$.

One way to view this nonlinearity is to interpret the Lippmann-Schwinger equation as a power series in $q(x)$ by iteratively substituting the value of $u_{\mathrm{scat}}[q](x')$ into its appearance on the right-hand side of (6). For example, performing this substitution once yields a linear and a quadratic term in $q(x)$ that are independent of $u_{\mathrm{scat}}[q]$, as well as a "remainder" term involving the unknown $u_{\mathrm{scat}}[q]$ that accounts for higher-order terms. This power series does not converge for general $q(x)$, but it helps illustrate which parts of the problem drive the nonlinearity of the operator $\mathcal{F}_k$: as $q$ or $k$ grow, the size of these nonlinear terms also grows, and as a result $\mathcal{F}_k$ becomes highly nonlinear.

The next result is that, under a linear approximation, the far-field measurements are diffraction-limited and can only capture frequency components of $q$ up to $2k$. Equivalently, the measurements depend on $q$ only up to a spatial resolution of $\lambda/2$. We consider the first-order Born approximation (Born et al., 1999), which approximates (6) by dropping the $u_{\mathrm{scat}}[q](x';s)$ term from the right-hand side. This is further simplified with an approximation of the Green's function in the far-field limit (Born et al., 1999), yielding

$d_k(r,s) \approx k^2 \int_\Omega e^{-ik(r - s)\cdot x}\, q(x)\, dx.$
(7)

We will refer to this linear approximation of the map from $q(x)$ to $d_k(r,s)$ as $F_k$. Note that $F_k q$ is proportional to the Fourier transform of $q$ evaluated at frequency vectors of the form $k(r - s)$. Since $r, s \in \mathbb{S}^1$ range over the unit circle, the frequency vectors $k(r - s)$ take on values throughout a disk of radius $2k$ centered at the origin. Thus, evaluations of the linearized forward model $F_k q$ only contain the low-frequency components of $q$, while high-frequency components are in the kernel of $F_k$ (Chen, 1995).
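
The disk-coverage claim is easy to check numerically: sampling $r$ and $s$ on the unit circle, the probed frequency vectors $k(r - s)$ have norms up to $2k$. A small sketch (the sampling sizes are illustrative):

```python
import numpy as np

k, n = 16 * np.pi, 192                       # wavenumber, number of angular samples
theta = 2 * np.pi * np.arange(n) / n
dirs = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # directions on the unit circle

# All frequency vectors k(r - s) probed by the linearized model F_k:
freqs = k * (dirs[:, None, :] - dirs[None, :, :]).reshape(-1, 2)
print(np.linalg.norm(freqs, axis=1).max() / k)           # -> 2.0, radius of the disk
```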

Owing to its simplicity, (7) is often used as inspiration for the design of neural network architectures approximating the inverse map $d_k \mapsto q$. The networks emulate the FBP method (Natterer, 2001), which produces an estimate $\hat{q}$ of the scattering potential $q$ as

$\hat{q} = (F_k^* F_k + \mu I)^{-1} F_k^* d_k,$
(8)

where $F_k^*$ is the adjoint of $F_k$. The operator $(F_k^* F_k + \mu I)^{-1}$ can be implemented as a two-dimensional spatial convolution (Khoo and Ying, 2019; Fan and Ying, 2022; Li et al., 2022), while novel network architectures have been proposed to emulate $F_k^* \in \mathbb{C}^{N_q^2 \times N_r N_s}$ in a parameter-efficient manner. In particular, Fan and Ying (2022) and Zhang et al. (2023) suggest leveraging the rotational equivariance of the forward model to emulate $F_k^*$ with one-dimensional convolutions, after applying a far-field scaling and an appropriate coordinate transformation (Fan and Ying, 2022, Equation 6). We refer to the network described by Fan and Ying (2022) as FYNet.
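
As a toy instance of (8), the regularized back-projection can be formed directly whenever $F_k$ is available as an explicit matrix; the sketch below uses a random stand-in for $F_k$ rather than the true linearized operator:

```python
import numpy as np

def fbp_estimate(F, d, mu):
    """Solve (F* F + mu I) q = F* d, i.e., the regularized estimate in (8)."""
    A = F.conj().T @ F + mu * np.eye(F.shape[1])
    return np.linalg.solve(A, F.conj().T @ d)

# Toy usage with a random stand-in for the linearized forward operator F_k:
rng = np.random.default_rng(0)
F = rng.standard_normal((64, 32)) + 1j * rng.standard_normal((64, 32))
q_true = rng.standard_normal(32)
q_hat = fbp_estimate(F, F @ q_true, mu=1e-2)
```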

3.2 Related Work

Deep learning has revolutionized linear inverse problems in imaging, advancing methods for superresolution, inpainting, deblurring, and medical imaging. Many of these advances stem from methods combining deep neural networks with optimization algorithms. For example, the deep unrolling paradigm (Monga et al., 2021) performs a fixed number of steps of an iterative algorithm and replaces certain operations with learnable mappings, which are parameterized by neural networks whose weights are learned from data. Components that remain fixed throughout training may reflect prior knowledge of problem parameters, such as explicit knowledge of the forward measurement model. In this setting, the network is usually trained end-to-end by minimizing the Euclidean distance between the network outputs and the true data. Another paradigm, called plug-and-play denoising (Venkatakrishnan et al., 2013), suggests that general image denoisers can be used in place of proximal operators for regularization functions, an important subroutine in many optimization routines for linear inverse problems in imaging. In this setting, neural network blocks are often trained to solve a different task, such as denoising corrupted signals, and then used inside the inversion algorithm. Ongie et al. (2020) provides a review of deep learning for inverse problems in imaging.

Several works in the wave scattering literature attempt to solve the inverse scattering problem by augmenting an optimization algorithm with components learned from data. At inference time, these methods require running an iterative optimization algorithm. Kamilov et al. (2017) develops a plug-and-play algorithm for inverse scattering, and show that various off-the-shelf denoisers can be applied as proximal operators. Zhao et al. (2023) use an encoder network paired with a network emulating the forward model and suggest optimizing a latent representation of the scattering potential using stochastic gradient descent. When only phaseless measurements of the scattered wave field uscat are available, Deshmukh et al. (2022) propose a network unrolling proximal gradient descent, where the proximal operator is a neural network learned from data. For the inverse obstacle scattering problem in two dimensions, Zhou et al. (2023) propose using a fully-connected neural network to warm-start a Gauss-Newton algorithm. Ding et al. (2022) train a neural network to approximately invert a forward scattering process depending on temporal data, and use this approximate inverse as a nonlinear preconditioner for a nonlinear least squares optimization routine.

Other methods propose to learn the inverse map directly from data. Recently, neural networks that are approximately invariant to discretization size have been proposed as methods of learning maps between general function spaces (Li et al., 2021b; Lu et al., 2021), and these general-purpose networks have been applied to inverse scattering (Ong et al., 2022). Other networks have been designed to invert the forward scattering model in particular; see Chen et al. (2020) for a broad review of such approaches. In our work, we are particularly interested in FBP-inspired methods. The work of Khoo and Ying (2019) leverages the approximate low-rank structure of scattering operators to design their SwitchNet network. Fan and Ying (2022) propose a data transformation that facilitates emulating the adjoint of the linearized forward model via 1D convolutions. In our work, we consider how to combine multiple such network blocks to invert the multi-frequency forward map in (3). One way of combining these blocks is to learn each adjoint operator $F_{k_t}^*$ as a separate neural network block and combine data to jointly emulate the learned filtering operators $(F_{k_t}^* F_{k_t} + \mu I)^{-1}$. This strategy is employed in Zhang et al. (2023), which is similar to our MFISNet-Parallel but uses a different parameterization for the layers emulating $F_{k_t}^*$ and $(F_{k_t}^* F_{k_t} + \mu I)^{-1}$. Another strategy is to use the Wide-Band Butterfly Network (Li et al., 2022, 2021a), which hierarchically merges information from different frequencies in the network block emulating $F_{k_t}^*$. We provide more detail and commentary about methods of combining network blocks to form multi-frequency networks in Section 5.2.

Finally, we note in passing that the design of our method is inspired by homotopy methods (Watson and Haftka, 1989). These methods solve a sequence of sub-problems of increasing difficulty, gradually transforming a simple (but uninformative) optimization problem to the optimization problem of interest and using solutions to a given sub-problem to warm-start local search methods for subsequent sub-problems. Such a sequence can be constructed explicitly (e.g., by varying regularization levels) or implicitly; for example, curriculum learning (Bengio et al., 2009) progressively adjusts the training data distribution from “easy” to “hard” samples and has been used to train physics-informed neural networks in challenging problem settings (Krishnapriyan et al., 2021; Huang et al., 2022).

4 Recursive Linearization and Our Method

We propose a neural network that learns to approximate the multi-frequency inversion map from training data. To design the network and training algorithm, we draw inspiration from the recursive linearization method for inverse scattering, which we briefly review below.

4.1 Recursive Linearization

Recursive linearization is a classical method for solving the inverse scattering problem, introduced by Chen (1995). In spite of the nonlinearity of the true forward scattering model described in (6), recursive linearization breaks the inverse problem into a series of simpler problems, each of which corresponds to a linear inverse problem. In this section we discuss the intuition behind this strategy.

Recall from our discussion in Section 3.1 that the forward map evaluated at a low incident wave frequency $k$ acts approximately like a low-pass filter with cutoff frequency $2k$. At first glance, this suggests that observing $\mathcal{F}_k[q]$ for a high value of $k$ is sufficient for high-resolution recovery of $q$. However, when viewed from an optimization perspective, it becomes clear that this problem is increasingly challenging for large values of $k$. For example, one might consider the nonlinear least-squares problem


$\underset{\hat{q}}{\text{minimize}}\ \|d_k - \mathcal{F}_k[\hat{q}]\|_2^2.$
(9)

To illustrate the challenges for the optimization formulation with increasing values of $k$, we consider a simple example where $q$ is known to be a Gaussian bump with a given spread parameter and unknown amplitude. Given observations $d_k = \mathcal{F}_k[q]$ and a numerical PDE solver to calculate $\mathcal{F}_k[\cdot]$, one could estimate the amplitude of $q$ by solving a minimization problem similar to (9), but searching only over the unknown amplitude. In Figure 1, we plot this objective as a function of the amplitude of $\hat{q}$ for a range of values of $k$. For large values of $k$, the objective function is highly oscillatory and contains many spurious local minima. Typically this suggests that it can be challenging to locate the global minimum unless there is a scheme to initialize estimates close enough to the global minimum to avoid getting stuck elsewhere. However, Figure 1 also shows that the objective function oscillates much more slowly at the smallest value of $k$ while sharing the same global minimum. This suggests that the low-frequency observations can be used to get close to the global minimum, even though they are diffraction-limited and cannot resolve high-frequency components.

Figure 1: Even in a highly stylized setting, accurately and reliably inverting $\mathcal{F}_k$ is difficult for high frequencies $k$. Suppose the ground-truth scattering potential is $q(x) = \beta \exp(-\|x\|^2/(2\sigma^2))$, a Gaussian bump with known spread parameter $\sigma = 0.1$ but unknown amplitude $\beta$. We show the optimization landscape that arises from searching over different amplitudes. At low incident wave frequencies, this optimization landscape is smooth and has a large basin of attraction. However, as the incident wave frequency increases, the optimization landscape becomes highly oscillatory, requiring a nearly exact initialization to guarantee convergence to the ground truth. In experimental settings, the parameterization of $q$ is often high-dimensional, which requires higher frequency data to resolve high-frequency information in $q$. This numerical example was inspired by Bao and Liu (2003).

The recursive linearization algorithm leverages this insight by solving a sequence of inversion problems at increasing wave frequencies $k$. Crucially, each sub-problem uses the output of the previous sub-problem to initialize a new optimization problem. This method was introduced in Chen (1995) and further developed in Bao and Liu (2003); Borges et al. (2017). At iteration $t$, the algorithm uses the previous estimate $\hat{q}_{k_{t-1}}$ along with a new set of observations $d_{k_t}$, and it calculates an update $\delta q$ that minimizes the $\ell_2$ distance in measurement space:

$\underset{\delta q}{\operatorname{argmin}}\ \|d_{k_t} - \mathcal{F}_{k_t}[\hat{q}_{k_{t-1}} + \delta q]\|_2^2$
(10)

The value of $\hat{q}_{k_{t-1}}$ may make it possible to avoid spurious local minima, but this problem is still difficult, since an iterative optimizer would require solving a PDE at each of its iterations. However, $\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}} + \delta q]$ is well-approximated by a first-order Taylor expansion about $\hat{q}_{k_{t-1}}$ when $\delta q$ is small or when it does not contain low-frequency information (Chen, 1995). This motivates the following surrogate for the optimization problem in (10):

$\underset{\delta q}{\operatorname{argmin}}\ \|d_{k_t} - (\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}}] + D\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}}]\,\delta q)\|_2^2,$
(11)

where $D\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}}]$ denotes the Fréchet derivative of the forward model at $\hat{q}_{k_{t-1}}$. The action of $D\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}}]$ and its adjoint, $D\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}}]^*$, can be computed using the adjoint-state method (Bao and Liu, 2003; Borges et al., 2017). The resulting algorithm is akin to a Gauss-Newton method; critically, each sub-problem of the form shown in (11) is a linear least-squares problem. We outline a sketch of the recursive linearization algorithm in Algorithm 1.

The recursive linearization algorithm is very demanding computationally. Each iteration requires solving $N_s$ large-scale PDEs and a high-dimensional least-squares problem, which quickly creates a large computational burden when producing high-resolution solutions. In a classical setting without machine learning, the frequencies should be spaced close to each other for best results. Chen (1997) uses $k = 1, 2, \ldots, 9$ in their numerical experiments, while Borges et al. (2017) use $k = 1, 1.25, \ldots, 70$, which they report takes around 40-50 hours per sample to produce a single $241 \times 241$ pixel image.

Input: Multi-frequency data $\{d_{k_1}, d_{k_2}, \ldots, d_{k_{N_k}}\}$
1: $\hat{q}_{k_1} \leftarrow (F_{k_1}^* F_{k_1} + \mu I)^{-1} F_{k_1}^* d_{k_1}$
2: for $t = 2, \ldots, N_k$ do
3:   Compute $\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}}]$ and $D\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}}]$
4:   $\delta q_{k_t} \leftarrow \operatorname{argmin}_{\delta q} \|d_{k_t} - (\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}}] + D\mathcal{F}_{k_t}[\hat{q}_{k_{t-1}}]\,\delta q)\|_2^2$
5:   $\hat{q}_{k_t} \leftarrow \hat{q}_{k_{t-1}} + \delta q_{k_t}$
Result: Final estimate $\hat{q}_{k_{N_k}}$
Algorithm 1: Recursive Linearization for Inverse Scattering, based on Chen (1995, 1997); Bao and Liu (2003); Borges et al. (2017)
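
In code, the outer loop of Algorithm 1 might be sketched as follows; `fbp_init`, `forward`, `frechet_derivative`, and `lstsq_solve` are assumed oracles standing in for the PDE solver and the linear least-squares subproblem:

```python
def recursive_linearization(data, wavenumbers, fbp_init, forward,
                            frechet_derivative, lstsq_solve):
    """Sketch of Algorithm 1; every named argument is an assumed oracle."""
    q_hat = fbp_init(data[0], wavenumbers[0])    # line 1: linearized inversion at k_1
    for d_k, k in zip(data[1:], wavenumbers[1:]):
        residual = d_k - forward(q_hat, k)       # line 3: evaluate F_k at the warm start
        J = frechet_derivative(q_hat, k)         # line 3: Frechet derivative DF_k
        delta_q = lstsq_solve(J, residual)       # line 4: linear least-squares update
        q_hat = q_hat + delta_q                  # line 5: refine the estimate
    return q_hat
```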

Although recursive linearization is computationally expensive as stated, we believe that one of the key features of recursive linearization is the way that it breaks the recovery process into multiple steps, each of which refines the estimate from the previous step using data of a higher frequency. We will refer to this step-wise recovery strategy as progressive refinement.

Progressive refinement facilitates the recovery process, since each step is only responsible for a correction to the estimate of the scattering potential within a frequency band. Focusing on this strategy also allows us to look for machine learning methods that do not explicitly emulate $\mathcal{F}_k[\cdot]$ or $D\mathcal{F}_k[\cdot]$, which are expensive to compute.

To this end, we consider a generalization of recursive linearization where we replace lines 3 and 4 of Algorithm 1 by describing the inner loop as a generic refinement step:

$\delta q_{k_t} = \mathrm{RefinementStep}_{k_t}(\hat{q}_{k_{t-1}}, d_{k_t}) \quad \text{for } t = 2, \ldots, N_k$
(12)

where $\mathrm{RefinementStep}_{k_t}(\hat{q}_{k_{t-1}}, d_{k_t})$ refers to the update calculated for estimate $\hat{q}_{k_{t-1}}$ given data $d_{k_t}$, and can be implemented using a neural network. We propose and discuss a network architecture in the next section.

4.2 Our Method

We use Algorithm 1 as inspiration for the design of our neural network architecture and training method. In particular, we focus on the following two crucial aspects of Algorithm 1:

Progressive refinement:

The algorithm builds intermediate estimates of the scattering potential which are progressively refined with the introduction of new data.

Homotopy through frequency:

The iterative refinements from the first step form a homotopy from low- to high-frequency measurements. As a result, updates at step $t$ contain high-frequency information relative to $k_{t-1}$.

To emulate the progressive refinement structure, we propose a network with a residual structure and skip connections. The network comprises multiple blocks, one for each incident wave frequency $k_t$, $t = 1, \ldots, N_k$. The input to each block is measurement data $d_{k_t}$ collected at a particular incident wave frequency $k_t$. The input passes through an FYNet block (Fan and Ying, 2022), which approximately inverts the forward model. The output of the FYNet block is then concatenated with the output of the previous block, $\hat{q}_{k_{t-1}}$, and the concatenation is passed to 2D convolutional layers for a second filtering step. Finally, a skip connection adds $\hat{q}_{k_{t-1}}$ to the output of the last convolutional layer of the block, producing the next estimate $\hat{q}_{k_t}$. The network's architecture is shown in Figure 2; we call the resulting network "MFISNet-Refinement." Note that, under this construction, the FYNet blocks could be replaced by any other neural network architecture designed for the single-frequency inverse scattering problem.
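
A schematic PyTorch version of one such refinement block is sketched below. The `single_freq_inverter` stands in for the FYNet block, and all layer sizes and shapes are illustrative assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class RefinementBlock(nn.Module):
    """Sketch of one MFISNet-Refinement block: invert d_k, merge with q_prev, refine."""
    def __init__(self, single_freq_inverter: nn.Module, channels: int = 16):
        super().__init__()
        self.inverter = single_freq_inverter   # e.g., an FYNet-style inversion block
        self.refine = nn.Sequential(           # 2D convolutional filtering of the pair
            nn.Conv2d(2, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(channels, 1, kernel_size=5, padding=2),
        )

    def forward(self, d_k, q_prev):
        # d_k: measurements at frequency k_t; q_prev: (B, 1, H, W) previous estimate.
        q_new = self.inverter(d_k)                  # approximate single-frequency inversion
        merged = torch.cat([q_new, q_prev], dim=1)  # concatenate along the channel axis
        return q_prev + self.refine(merged)         # skip connection: predict an update
```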

(a) MFISNet-Refinement Architecture
(b) Refinement Block
Figure 2: Our MFISNet-Refinement architecture is designed to emulate the recursive linearization algorithm. Figure 2(a) shows that our network proceeds by making an initial low-frequency reconstruction and then making a series of updates given higher-frequency data and an estimate of the scattering potential. The network is trained to match the intermediate reconstructions to low-pass filtered versions of scattering potentials from the training set. One example collection of such filtered scattering potentials is shown. Figure 2(b) shows that our refinement block is a simple extension of the FYNet architecture from Fan and Ying (2022). By using a skip connection in this block, we ensure the network only needs to predict an update to the estimated scattering potential.

To emulate the homotopy through frequency, we design a training method to ensure each successive block adds higher-frequency information to the estimate of the scattering potential. Under the Born approximation (7), we know $d_k$ contains information about $q$ up to frequency limit $2k$. This suggests that given data $d_{k_t}$, we should be able to reconstruct the frequency components of $q$ up to $2k_t$. To reflect this, we train the output of block $t$ with the following loss function:

$L_t(\hat{q}_{k_t}; q) = \mathbb{E}_{q \sim \mathcal{D}_n}\left[\|\hat{q}_{k_t} - \mathsf{LPF}_{2k_t}\, q\|_2\right]$
(13)

In (13), $\mathsf{LPF}_{2k_t}$ is a low-pass filter with approximate cutoff frequency $2k_t$, implemented as a Gaussian filter to avoid ringing artifacts; its frequency response is given by

$\widetilde{\mathsf{LPF}}_{f_{\mathsf{cut}}}(f) := \exp\left(-\frac{1}{2}\left(\frac{f}{f_{\mathsf{cut}}/\sqrt{2\log 2}}\right)^2\right),$
(14)

given a target cutoff frequency $f_{\mathsf{cut}}$. Note that we shrink the standard deviation by a factor of $\sqrt{2\log 2}$ so that the filter's half-width at half-maximum matches $f_{\mathsf{cut}}$.
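
A possible NumPy implementation of the filter in (14), applied in the Fourier domain, is sketched below; the frequency-grid convention is an assumption:

```python
import numpy as np

def gaussian_lowpass(q, f_cut, dx=1.0):
    """Apply the Gaussian low-pass filter of (14); response is 1/2 at f = f_cut."""
    fy = np.fft.fftfreq(q.shape[0], d=dx)
    fx = np.fft.fftfreq(q.shape[1], d=dx)
    f = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)   # radial frequency grid
    sigma = f_cut / np.sqrt(2 * np.log(2))             # matches the HWHM to f_cut
    response = np.exp(-0.5 * (f / sigma) ** 2)
    return np.fft.ifft2(np.fft.fft2(q) * response).real
```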

To train the network, we first adjust the weights of each block in a sequential fashion and then perform a final training step which fine-tunes all of the blocks jointly; the training procedure is summarized in Algorithm 2.

Input: Randomly-initialized neural network parameters $\{\theta_1, \ldots, \theta_{N_k}\}$;
training data samples $\mathcal{D}_n := \{(q^{(j)}, d_{k_1}^{(j)}, \ldots, d_{k_{N_k}}^{(j)})\}_{j=1}^n$.
1: for $t = 1, \ldots, N_k$ do
2:   Set $\theta_t$ as trainable, and freeze all other weights
3:   if $t < N_k$ then
4:     Train $\theta_t$ by optimizing $L_t$   // Equation 13
5:   else
6:     Train $\theta_t$ by optimizing $\|\hat{q}_{k_{N_k}} - q\|_2^2$
7: Set all weights as trainable
8: Train all weights by optimizing $\|\hat{q}_{k_{N_k}} - q\|_2^2$
Result: Trained neural network parameters $\{\theta_1, \ldots, \theta_{N_k}\}$.
Algorithm 2: Training Procedure
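
Algorithm 2 translates naturally into a freeze/unfreeze loop. In the sketch below, `train_stage` is an assumed helper that runs an optimizer on the given parameters with the named loss:

```python
def train_mfisnet(blocks, train_stage):
    """Sketch of Algorithm 2: sequential per-block training, then joint fine-tuning."""
    n_freqs = len(blocks)
    for t in range(n_freqs):
        for i, block in enumerate(blocks):      # freeze everything except theta_t
            for p in block.parameters():
                p.requires_grad_(i == t)
        loss_name = f"L_{t}" if t < n_freqs - 1 else "final_l2"  # Eq. (13) vs. final loss
        train_stage(blocks[t].parameters(), loss_name)
    all_params = [p for b in blocks for p in b.parameters()]
    for p in all_params:                        # set all weights as trainable
        p.requires_grad_(True)
    train_stage(all_params, "final_l2")         # joint fine-tuning stage
```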

4.3 Implementation of FYNet

We implement the inversion network described in Fan and Ying (2022) and call it FYNet. This method suggests applying the far-field scaling and a transformation from the receiver and source direction coordinates $(r, s)$ to a new set of variables, as summarized in (Fan and Ying, 2022, Equation (6)), which we perform using bicubic interpolation. We split the complex-valued input into real and imaginary parts along a channel dimension. To implement the action of the adjoint operator $F_k^*$, we use a composition of three 1-dimensional convolutional layers. We make the convolutional filters learnable in the Fourier domain because the expression for $F_k^*$ derived in (Fan and Ying, 2022) is local in frequency, but not in space. To implement the filtering operator $(F_k^* F_k + \mu I)^{-1}$, we use a composition of three 2-dimensional convolutional layers. All layers but the last use ReLU activations. This network outputs an estimate of the scattering potential on a regular polar grid, in coordinates $(\rho, \phi)$. Following Fan and Ying (2022), we train the network by minimizing the difference between predictions and targets on the polar grid. We transform to Cartesian coordinates for visualization and for computing final test statistics.

5 Experiments

In this section, we describe the setting and results for our numerical investigation of the efficacy of our proposed method.

5.1 Dataset and Data Generation

(a) Scattering potential $q$
(b) Measured $d_k$, $k = 4\pi$
(c) Measured $d_k$, $k = 32\pi$

Figure 3: Figure 3(a) shows a typical example from our distribution of scattering potentials, drawn from the test set. Our distribution of scattering potentials has a random low-frequency background field, occluded by piecewise-constant geometric shapes. Figures 3(b) and 3(c) show the output of the forward model applied to this scattering potential, with incident wave frequencies $4\pi$ and $32\pi$, respectively. The real part of $u_{\mathrm{scat}}$ is shown.

Distribution of Scattering Potentials

We define a distribution of scattering potentials $\mathcal{D}$ which has nonzero spatial support on the disk of radius 0.4, with a smoothly varying random background occluded by three randomly placed and randomly sized piecewise-constant shapes. We normalize the scattering potential so the background has minimum and maximum values 0 and 2, respectively, and we normalize the piecewise-constant shapes to have value 2. Figure 3(a) shows one such scattering potential from our distribution.

Notably, the contrast of these scattering potentials is $\|q\|_\infty = 2$, which is much larger than the contrast used in the distributions that evaluate other machine learning methods in the shape reconstruction regime (Fan and Ying, 2022; Li et al., 2022). The high-contrast regime is an important experimental setting because it ensures that the nonlinearity of the forward model, which is the difficult and interesting problem setting, is captured. The non-constant background also adds to the difficulty of the problem by increasing $\|q\|_2$, which adds to the nonlinearity of the forward model. It also adds much more entropy to $\mathcal{D}$. We use this model to reflect experimental conditions in imaging tasks, wherein backgrounds are rarely known, constant, or homogeneous.

Implementation of the Forward Model

To implement the forward model $\mathcal{F}_k$, we implement a numerical PDE solver to compute solutions of (1). We discretize the scattering domain $\Omega$ with an $(N_q, N_q)$ regular grid, with $N_q = 192$. We transform (1) into the Lippmann-Schwinger integral equation and recast the latter as a sparse linear system, which we solve with an iterative method to approximate convergence. This formulation allows us to compute the solution $u_{\mathrm{scat}}$ on a large, distant ring placed at radius $R = 100$. We compute the solution at $N_r = 192$ equally-spaced positions on this ring, and we repeat this process for $N_s = 192$ equally-spaced source directions. We use a PyTorch implementation of GMRES (Saad and Schultz, 1986) through the CoLA library (Potapczynski et al., 2023) to facilitate batched computation over multiple source directions, fast sparse linear system solves, and GPU acceleration; altogether, evaluating the full forward model for a given $q$ takes only about 5 seconds using our implementation on one NVIDIA A40 GPU. Figures 3(b) and 3(c) show the solution $u_{\mathrm{scat}}$ produced by our implementation for low and high incident wave frequencies.
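
A much-simplified, single-source sketch of this pipeline using SciPy's GMRES is shown below. The Green's-function convolution `apply_Gk` is an assumed black box, and the authors' batched CoLA/PyTorch implementation is considerably more involved:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def solve_scattered_field(q, u_in, k, apply_Gk):
    """Solve the Lippmann-Schwinger equation (6) for one incident direction.

    Rearranging u_scat = k^2 G_k[q (u_in + u_scat)] gives the linear system
    (I - k^2 G_k diag(q)) u_scat = k^2 G_k[q u_in], solved iteratively here.
    """
    n = q.size
    matvec = lambda u: u - k**2 * apply_Gk(q.ravel() * u)
    A = LinearOperator((n, n), matvec=matvec, dtype=complex)
    b = k**2 * apply_Gk(q.ravel() * u_in.ravel())
    u_scat, info = gmres(A, b)
    assert info == 0, "GMRES did not converge"
    return u_scat.reshape(q.shape)
```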

We use bicubic interpolation to implement the data transformation described in (Fan and Ying, 2022, Equation (6)), resulting in a transformed grid of size $(N_m, N_h) = (192, 96)$ for the input data. The FYNet blocks reconstruct images on a regular polar grid with $(N_\rho, N_\phi) = (96, 192)$ pixels, and the radial dimension of our polar grid extends to $\rho_{\max} = 0.5$. Finally, we use bicubic interpolation to transform the model's outputs to the $(N_q, N_q)$ Cartesian grid for final visualization and error measurement.

5.2 Alternative Multi-Frequency Methods

We wish to compare our method, MFISNet-Refinement, with other methods of learning an inverse to the multi-frequency forward map. We design two new methods of extending FBP-inspired single-frequency architectures to the multi-frequency setting. We use the FYNet architecture (Fan and Ying, 2022) to instantiate all three of our MFISNet methods, which allows us to focus on the effects caused by different methods of combining multi-frequency data. We show the architectures in Figure 4. We also compare our method with the Wide-Band Butterfly Network (Li et al., 2022, 2021a).

(a) Wide-Band Butterfly Network
(b) MFISNet-Parallel
(c) MFISNet-Fused
Figure 4: Alternative neural network architectures for learning the multi-frequency inverse map. All blocks in dark green contain trainable parameters.

MFISNet-Fused

In MFISNet-Fused, we use an FYNet architecture that is constrained to learn all of the adjoint operators $F_{k_1}^*, \ldots, F_{k_{N_k}}^*$ jointly. We concatenate the inputs $[d_{k_1}, \ldots, d_{k_{N_k}}]$ along a new channel dimension, so the input array has shape $(N_m, N_h, N_k, 2)$. The concatenated input is then passed into a standard FYNet architecture, with the first layer having slightly wider convolutional channels. The shape of the weights in the 2D convolutional layers is constant in the problem dimensions, so the number of parameters in this network is dominated by the 1D convolutional weights and scales as $O(N_k^2 N_h)$.

MFISNet-Parallel

In MFISNet-Parallel, we use an extension of FYNet that allows each adjoint operator $F_{k_1}^*, \ldots, F_{k_{N_k}}^*$ to be learned individually. Each input $d_{k_t}$ is fed to a unique 1D CNN which emulates $F_{k_t}^*$. After the adjoint operators are applied separately, the results are concatenated along a channel dimension, and the filtering operators $(F_{k_t}^* F_{k_t} + \mu I)^{-1}$ are emulated jointly by 2D CNN layers. Again, the number of 2D CNN weights does not scale with the problem dimensions, so the number of parameters in this network is dominated by the $N_k$ 1D CNN blocks and scales as $O(N_k N_h)$.

Wide-Band Butterfly Network

The Wide-Band Butterfly Network is introduced and defined in Li et al. (2022, 2021a). Similar to MFISNet-Fused, this architecture also jointly parameterizes the adjoint operators $F_{k_1}^*, \ldots, F_{k_{N_k}}^*$, but it leverages the complementary low-rank property of $F_{k_t}$ (Khoo and Ying, 2019) to hierarchically merge the data using a butterfly network. For this network, we use code provided by the authors (https://github.com/borongzhang/ISP_baseline). The reference implementation is limited to using data at three incident frequencies, so we only present results in this setting.

5.3 Stabilizing Reconstruction by Adding Frequencies

First, we test whether the intuition built in Section 4 holds in a machine learning context: are machine learning methods that operate on data with multiple incoming wave frequencies more accurate and stable than single-frequency machine learning methods? To make this comparison fair, we create a sequence of training datasets with number of incident wave frequencies $N_k \in \{1, \ldots, 5\}$ and keep the amount of training data, $nN_k$, constant for each dataset by suitably adjusting the number of training samples $n$.

For the $N_k = 1$ dataset, we train an FYNet model, and for the other datasets, we train our three models: MFISNet-Fused, MFISNet-Parallel, and MFISNet-Refinement. For each model, we train for a fixed number of epochs and choose the model weights from the epoch at which the loss on a validation set of size $n/10$ is minimized. We also use the validation set to search over various hyperparameters, such as the size of the 1D and 2D convolutional kernels, the number of channels in the convolutional layers, and optimization hyperparameters, such as step size and weight decay. For the Wide-Band Butterfly Network, we search over the rank of the butterfly factorization, as well as optimization hyperparameters, such as initial learning rate, batch size, and learning rate decay (Appendix B).

We present the quantitative results of this experiment in Table 1 and Figure 6 (see also Figures 5 and 8 for a qualitative comparison on representative test samples). The relatively poor performance of FYNet confirms our belief that we are in a challenging nonlinear problem regime. As more frequencies are added, the multi-frequency methods improve. The performances of MFISNet-Fused and MFISNet-Parallel are comparable, indicating that the distinction between learning adjoint operators separately or jointly does not have a large effect in this setting. All of the tested methods are uniformly outperformed by our method, MFISNet-Refinement, at all values of $N_k$.

In the case of $N_k = 3$, the Wide-Band Butterfly Network underperforms the other methods. We hypothesize that its weaker performance may be attributed to several factors: the butterfly factorization was inspired by analysis in a weak (linear) scattering regime, while our experiments are in a strong (nonlinear) scattering regime; moreover, the Wide-Band Butterfly Network was previously tested in low-contrast settings with sub-wavelength scatterers and a known background, whereas our experimental setting has high contrast and an unknown, inhomogeneous background.

Figure 5: Sample predictions from models trained on different datasets. Predictions and errors on one held-out test sample are shown. The first row shows the ground-truth scattering potential, as well as the prediction and error from the single-frequency baseline method, FYNet. The second and third rows show predictions and errors, respectively, from MFISNet models trained on $N_k = 3$ frequencies. The fourth and fifth rows show predictions and errors, respectively, from MFISNet models trained on $N_k = 5$ frequencies. See Figure 8 in the appendix for additional samples and outputs from the Wide-Band Butterfly Network.
Performance Comparison (Noiseless)

| Nk | [k1, k2, …] | n | Method Name | Relative ℓ2 Error |
|---|---|---|---|---|
| 1 | [32π] | 10,000 | FYNet | **0.261 ± 0.036** |
| 2 | [16π, 32π] | 5,000 | MFISNet-Fused | 0.177 ± 0.033 |
| | | | MFISNet-Parallel | 0.168 ± 0.029 |
| | | | MFISNet-Refinement (Ours) | **0.154 ± 0.034** |
| 3 | [8π, 16π, 32π] | 3,333 | Wide-Band Butterfly Network | 0.160 ± 0.037 |
| | | | MFISNet-Fused | 0.130 ± 0.025 |
| | | | MFISNet-Parallel | 0.114 ± 0.021 |
| | | | MFISNet-Refinement (Ours) | **0.098 ± 0.020** |
| 4 | [4π, 8π, 16π, 32π] | 2,500 | MFISNet-Fused | 0.105 ± 0.021 |
| | | | MFISNet-Parallel | 0.110 ± 0.021 |
| | | | MFISNet-Refinement (Ours) | **0.086 ± 0.017** |
| 5 | [2π, 4π, 8π, 16π, 32π] | 2,000 | MFISNet-Fused | 0.115 ± 0.022 |
| | | | MFISNet-Parallel | 0.108 ± 0.021 |
| | | | MFISNet-Refinement (Ours) | **0.084 ± 0.018** |

Table 1: When holding the number of forward model evaluations, $n N_k$, constant, methods trained on more frequencies outperform methods trained on fewer frequencies. The final column reports the relative ℓ2 error mean ± one standard deviation computed over 1,000 held-out test samples. The lowest mean for each incident frequency set is marked in boldface.
[Plot: relative ℓ2 error versus input frequency set, with curves for FYNet, Wide-Band Butterfly Net, MFISNet-Fused, MFISNet-Parallel, and MFISNet-Refinement (Ours).]
Figure 6: Illustration of the results of Table 1. The advantage of MFISNet-Refinement grows as more low-frequency data is used, even though the total volume of training data is held constant regardless of the number of frequencies.

5.4 Measurement Noise

We now turn to the question of robustness against measurement noise. We repeat the experiment above, but train and test the models using noisy inputs. We assume an additive noise model similar to Borges et al. (2017). Given a clean input $d_k \in \mathbb{C}^{N_m \times N_h}$ and a desired noise-to-signal ratio $\delta$, we define a noisy input $\tilde{d}_k$ as


$$\tilde{d}_k = d_k + \sigma \left( Z_1 + i Z_2 \right), \qquad (15)$$

where $\sigma = \delta \, \frac{\|d_k\|_2}{\sqrt{2 N_m N_h}}$, and $[Z_j]_{m,h} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, 1)$ for $j = 1, 2$ and $(m, h) \in [N_m] \times [N_h]$.

Under this noise model, the expected noise-to-signal ratio satisfies $\mathbb{E}_{Z \sim \mathcal{N}(0, I)}\left[\|\tilde{d}_k - d_k\| / \|d_k\|\right] \approx \delta$. We do not alter the scattering potentials $q_i$ in any way. We train and test all models using noisy inputs. For the Wide-Band Butterfly Network, which takes $d_k$ inputs in the original $(r, s)$ coordinates, we add noise to the input in its original coordinates and divide by $\sqrt{2 N_r N_s}$ instead; because the normalization is grid-invariant, this yields an equivalent noise profile. We note that the models lose between 1% and 2% accuracy on the test set, suggesting the methods are relatively robust to the presence of measurement noise. Again, our method, MFISNet-Refinement, outperforms all other methods at all values of $N_k$. We present the results of this experiment in Table 2 and visualize them in Figure 7.
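For concreteness, the noise model (15) can be implemented in a few lines; this NumPy sketch (function name ours) also checks the realized noise-to-signal ratio:

```python
import numpy as np

def add_measurement_noise(d_k, delta, rng):
    """Additive complex Gaussian noise as in (15); d_k has shape (N_m, N_h)."""
    n_m, n_h = d_k.shape
    sigma = delta * np.linalg.norm(d_k) / np.sqrt(2 * n_m * n_h)
    z1, z2 = rng.standard_normal((2, n_m, n_h))
    return d_k + sigma * (z1 + 1j * z2)

rng = np.random.default_rng(0)
d = rng.standard_normal((96, 96)) + 1j * rng.standard_normal((96, 96))
d_noisy = add_measurement_noise(d, delta=0.1, rng=rng)
print(np.linalg.norm(d_noisy - d) / np.linalg.norm(d))  # approximately 0.1
```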

Performance Comparison (Noisy, δ = 0.1)

| Nk | [k1, k2, …] | n | Method Name | Relative ℓ2 Error |
|---|---|---|---|---|
| 1 | [32π] | 10,000 | FYNet | **0.251 ± 0.037** |
| 2 | [16π, 32π] | 5,000 | MFISNet-Fused | 0.170 ± 0.033 |
| | | | MFISNet-Parallel | 0.169 ± 0.031 |
| | | | MFISNet-Refinement (Ours) | **0.155 ± 0.034** |
| 3 | [8π, 16π, 32π] | 3,333 | Wide-Band Butterfly Network | 0.159 ± 0.037 |
| | | | MFISNet-Fused | 0.135 ± 0.025 |
| | | | MFISNet-Parallel | 0.116 ± 0.024 |
| | | | MFISNet-Refinement (Ours) | **0.100 ± 0.021** |
| 4 | [4π, 8π, 16π, 32π] | 2,500 | MFISNet-Fused | 0.107 ± 0.020 |
| | | | MFISNet-Parallel | 0.105 ± 0.020 |
| | | | MFISNet-Refinement (Ours) | **0.087 ± 0.017** |
| 5 | [2π, 4π, 8π, 16π, 32π] | 2,000 | MFISNet-Fused | 0.118 ± 0.024 |
| | | | MFISNet-Parallel | 0.110 ± 0.021 |
| | | | MFISNet-Refinement (Ours) | **0.095 ± 0.018** |

Table 2: Model evaluation on noisy train and test inputs with δ = 0.1 (see (15) for a description of the noise model). The final column reports the relative ℓ2 error mean ± one standard deviation computed over 1,000 held-out test samples. The lowest mean for each incident frequency set is marked in boldface.
[Plot: relative ℓ2 error versus input frequency set in the noisy regime, with curves for FYNet, Wide-Band Butterfly Net, MFISNet-Fused, MFISNet-Parallel, and MFISNet-Refinement (Ours).]
Figure 7: Illustration of results in the noisy measurement regime (Table 2). The relative performance of the models remains the same as in the noiseless case.

5.5 Investigating the Training Method

We hypothesize that our method is successful because it effectively breaks the reconstruction problem into a sequence of simpler frequency-dependent refinement steps, emulating the sequential, frequency-dependent structure of the recursive linearization method. A natural question is whether the success of our method is primarily driven by emulating sequential refinement steps, or can be attributed to the residual architecture alone. To answer this question, we investigate two different changes to our training method, each of which removes the progressive refinement structure. We describe each adjustment below.

No Homotopy through Frequency

Rather than sequentially training each network block after the previous blocks have been optimized, we jointly train all of the blocks by optimizing the loss function


$$\left\| \hat{q}_{k_{N_k}} - q \right\|^2 + \sum_{t=1}^{N_k - 1} \gamma^{N_k - t} \left\| \hat{q}_{k_t} - \mathsf{LPF}_{2 k_t}\, q \right\|^2 \qquad (16)$$

Here, $\gamma$ is a hyperparameter which controls the relative importance of the different loss terms. We tuned over a few choices of $\gamma$; see Table 8 for details, and the sketch below for one way this loss can be assembled.
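As a minimal sketch (function and variable names are ours, assuming the intermediate outputs and filtered targets are precomputed as tensors), the joint loss in (16) might look like:

```python
import torch

def joint_loss(q_hats, q, lpf_targets, gamma):
    """Sketch of loss (16). q_hats: intermediate outputs
    [q_hat_{k_1}, ..., q_hat_{k_{N_k}}]; lpf_targets[t] holds the
    precomputed low-pass-filtered ground truth LPF_{2 k_{t+1}} q for
    t = 0, ..., N_k - 2; q is the unfiltered ground truth."""
    n_k = len(q_hats)
    loss = torch.sum((q_hats[-1] - q) ** 2)   # final-frequency term
    for t in range(n_k - 1):                  # intermediate terms
        loss = loss + gamma ** (n_k - 1 - t) * torch.sum(
            (q_hats[t] - lpf_targets[t]) ** 2)
    return loss
```

The "No Progressive Refinement" variant described next corresponds to keeping only the first term.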

No Progressive Refinement

Instead of using the intermediate loss terms $\|\hat{q}_{k_t} - \mathsf{LPF}_{2 k_t}\, q\|^2$, which are designed to encourage specific network blocks to learn different parts of the reconstruction, we train the network by optimizing only the final loss term $\|\hat{q}_{k_{N_k}} - q\|^2$. This is the standard training loss used in most other works, including Khoo and Ying (2019), Fan and Ying (2022), and Li et al. (2022).

| Training Method | [k1, k2, …] | Relative ℓ2 Error |
|---|---|---|
| No Progressive Refinement | [2π, 4π, 8π, 16π, 32π] | 0.095 ± 0.018 |
| No Homotopy through Frequency | [2π, 4π, 8π, 16π, 32π] | 0.090 ± 0.017 |
| Our Method | [2π, 4π, 8π, 16π, 32π] | **0.084 ± 0.018** |

Table 3: Our training method, described in Algorithm 2, produces better results than the alternatives, which were designed to remove parts of the recursive linearization structure. We describe these alternate training methods in Section 5.5.

6 Conclusion

This paper investigates the use of multi-frequency data in deep learning approaches to the inverse medium scattering problem in a highly nonlinear, full-aperture regime. We review standard optimization results for this problem, identify recursive linearization as an algorithm particularly well-suited to it, and use this insight to design a neural network architecture and training method. We experimentally evaluate our proposed approach, comparing against both novel and previously-published methods for combining multi-frequency data. In these comparisons, our method outperforms the others across a wide range of data settings, both with and without measurement noise.

This work also leaves open important questions about machine learning in other multi-frequency data settings, which we leave to future work. The first setting is seismic imaging, where sources and receivers are located on one side of the scattering potential, resulting in limited-aperture measurements. Another setting is full-aperture measurements with a small fixed or frequency-dependent number of source and receiver directions, since $N_s$ and $N_r$ drive real-world costs when implementing an imaging system. Finally, we suggest studying a distribution of scattering potentials with an unknown, smoothly varying background occluded by strongly scattering shapes; an important open problem here is to learn to segment the reconstruction into disjoint regions containing only background or only the strong scatterers.

Author Contributions

  • Owen Melia: Conceptualization; Data curation; Methodology; Software; Writing - original draft.

  • Olivia Tsang: Conceptualization; Data curation; Methodology; Software; Writing - original draft.

  • Vasileios Charisopoulos: Conceptualization; Methodology; Supervision; Writing - review & editing.

  • Yuehaw Khoo: Conceptualization; Methodology; Supervision; Writing - review & editing.

  • Jeremy Hoskins: Conceptualization; Methodology; Supervision; Writing - review & editing.

  • Rebecca Willett: Conceptualization; Methodology; Supervision; Funding acquisition; Project administration; Writing - review & editing.

Acknowledgements

OM, OT, VC, and RW gratefully acknowledge the support of AFOSR FA9550-18-1-0166 and NSF DMS-2023109. RW and YK gratefully acknowledge the support of DOE DE-SC0022232. The team gratefully acknowledges the support of the Margot and Tom Pritzker Foundation.

References

  • Bao and Liu [2003] Gang Bao and Jun Liu. Numerical Solution of Inverse Scattering Problems with Multi-experimental Limited Aperture Data. SIAM Journal on Scientific Computing, 25(3):1102–1117, 2003. doi:10.1137/S1064827502409705.
  • Bengio et al. [2009] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09), pages 41–48, 2009. doi:10.1145/1553374.1553380.
  • Borges et al. [2017] Carlos Borges, Adrianna Gillman, and Leslie Greengard. High resolution inverse scattering in two dimensions using recursive linearization. SIAM Journal on Imaging Sciences, 10(2):641–664, 2017. doi:10.1137/16M1093562.
  • Born et al. [1999] Max Born, Emil Wolf, and A. B. Bhatia. Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light. Cambridge University Press, 7th (expanded) edition, 1999.
  • Chen et al. [2020] Xudong Chen, Zhun Wei, Maokun Li, and Paolo Rocca. A review of deep learning approaches for inverse scattering problems (invited review). Progress In Electromagnetics Research, 167:67–81, 2020. doi:10.2528/PIER20030705.
  • Chen [1995] Yu Chen. Recursive Linearization for Inverse Scattering. Technical Report YALEU/DCS/RR-1088, Yale University, 1995.
  • Chen [1997] Yu Chen. Inverse scattering via Heisenberg's uncertainty principle. Inverse Problems, 13(2):253, 1997. doi:10.1088/0266-5611/13/2/005.
  • Colton and Kress [2018] David Colton and Rainer Kress. Looking Back on Inverse Scattering Theory. SIAM Review, 60(4):779–807, 2018. doi:10.1137/17M1144763.
  • Deshmukh et al. [2022] Samruddhi Deshmukh, Amartansh Dubey, and Ross Murch. Unrolled Optimization With Deep Learning-Based Priors for Phaseless Inverse Scattering Problems. IEEE Transactions on Geoscience and Remote Sensing, 60:1–14, 2022. doi:10.1109/TGRS.2022.3214495.
  • Ding et al. [2022] Wen Ding, Kui Ren, and Lu Zhang. Coupling Deep Learning with Full Waveform Inversion, 2022. arXiv:2203.01799.
  • Fan and Ying [2022] Yuwei Fan and Lexing Ying. Solving inverse wave scattering with deep learning. Annals of Mathematical Sciences and Applications, 7(1):23–48, 2022.
  • Huang et al. [2022] Yao Huang, Wenrui Hao, and Guang Lin. HomPINNs: Homotopy physics-informed neural networks for learning multiple solutions of nonlinear elliptic differential equations. Computers & Mathematics with Applications, 121:62–73, 2022. doi:10.1016/j.camwa.2022.07.002.
  • Kamilov et al. [2017] Ulugbek S. Kamilov, Hassan Mansour, and Brendt Wohlberg. A Plug-and-Play Priors Approach for Solving Nonlinear Imaging Inverse Problems. IEEE Signal Processing Letters, 24(12):1872–1876, 2017. doi:10.1109/LSP.2017.2763583.
  • Khoo and Ying [2019] Yuehaw Khoo and Lexing Ying. SwitchNet: A Neural Network Model for Forward and Inverse Scattering Problems. SIAM Journal on Scientific Computing, 41(5):A3182–A3201, 2019. doi:10.1137/18M1222399.
  • Krishnapriyan et al. [2021] Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, and Michael W. Mahoney. Characterizing possible failure modes in physics-informed neural networks. In Advances in Neural Information Processing Systems, volume 34, pages 26548–26560, 2021. URL https://proceedings.neurips.cc/paper_files/paper/2021/file/df438e5206f31600e6ae4af72f2725f1-Paper.pdf.
  • Li et al. [2021a] Matthew Li, Laurent Demanet, and Leonardo Zepeda-Núñez. Accurate and Robust Deep Learning Framework for Solving Wave-Based Inverse Problems in the Super-Resolution Regime, 2021. arXiv:2106.01143.
  • Li et al. [2022] Matthew Li, Laurent Demanet, and Leonardo Zepeda-Núñez. Wide-Band Butterfly Network: Stable and Efficient Inversion Via Multi-Frequency Neural Networks. Multiscale Modeling & Simulation, 20(4):1191–1227, 2022. doi:10.1137/20M1383276.
  • Li et al. [2021b] Zongyi Li, Nikola Borislavov Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=c8P9NQVtmnO.
  • Lu et al. [2021] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3(3):218–229, 2021. doi:10.1038/s42256-021-00302-5.
  • Monga et al. [2021] Vishal Monga, Yuelong Li, and Yonina C. Eldar. Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing. IEEE Signal Processing Magazine, 38(2):18–44, 2021.
  • Natterer [2001] F. Natterer. The Mathematics of Computerized Tomography. Society for Industrial and Applied Mathematics, 2001. doi:10.1137/1.9780898719284.
  • Ong et al. [2022] Yong Zheng Ong, Zuowei Shen, and Haizhao Yang. Integral autoencoder network for discretization-invariant learning. Journal of Machine Learning Research, 23(286):1–45, 2022.
  • Ongie et al. [2020] Gregory Ongie, Ajil Jalal, Christopher A. Metzler, Richard G. Baraniuk, Alexandros G. Dimakis, and Rebecca Willett. Deep Learning Techniques for Inverse Problems in Imaging. IEEE Journal on Selected Areas in Information Theory, 1(1):39–56, 2020. doi:10.1109/JSAIT.2020.2991563.
  • Potapczynski et al. [2023] Andres Potapczynski, Marc Finzi, Geoff Pleiss, and Andrew Gordon Wilson. CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra, 2023. arXiv:2309.03060.
  • Saad and Schultz [1986] Yousef Saad and Martin H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, 7(3):856–869, 1986. doi:10.1137/0907058.
  • Venkatakrishnan et al. [2013] Singanallur V. Venkatakrishnan, Charles A. Bouman, and Brendt Wohlberg. Plug-and-play priors for model based reconstruction. In 2013 IEEE Global Conference on Signal and Information Processing, pages 945–948, 2013. doi:10.1109/GlobalSIP.2013.6737048.
  • Watson and Haftka [1989] Layne T. Watson and Raphael T. Haftka. Modern homotopy methods in optimization. Computer Methods in Applied Mechanics and Engineering, 74(3):289–305, 1989. doi:10.1016/0045-7825(89)90053-4.
  • Zhang et al. [2023] Borong Zhang, Leonardo Zepeda-Núñez, and Qin Li. Solving the Wide-band Inverse Scattering Problem via Equivariant Neural Networks, 2023. arXiv:2212.06068.
  • Zhao et al. [2023] Qingqing Zhao, Yanting Ma, Petros Boufounos, Saleh Nabi, and Hassan Mansour. Deep Born operator learning for reflection tomographic imaging. In ICASSP 2023 – 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5, 2023. doi:10.1109/ICASSP49357.2023.10095494.
  • Zhou et al. [2023] Mo Zhou, Jiequn Han, Manas Rachh, and Carlos Borges. A neural network warm-start approach for the inverse acoustic obstacle scattering problem. Journal of Computational Physics, 490:112341, 2023. doi:10.1016/j.jcp.2023.112341.

Appendix A Distribution of Scattering Potentials

To generate samples from $\mathcal{D}$, our distribution of scattering potentials, we draw a random smoothly-varying background and three shapes with random sizes, positions, and rotations. This section provides details about the generation of these scattering objects; a simplified sketch follows below.

The random low-frequency backgrounds were generated by drawing random Fourier coefficients and filtering out the high frequencies using $\mathsf{LPF}_{4.0}$. The resulting background was transformed to Cartesian coordinates, and then shifted and scaled so the maximum value was 2.0 and the minimum value was 0.0. The background was truncated to 0.0 outside of the disk of radius 0.4. Three shapes were randomly chosen among equilateral triangles, squares, and ellipses. The three shapes had randomly-chosen centers and rotations, constrained to be non-overlapping and to fit inside the disk of radius 0.4. The side lengths of the squares and triangles were uniformly sampled from $[0.1, 0.15]$. The major axis lengths of the ellipses were uniformly sampled from $[0.1, 0.15]$, and the minor axis lengths were uniformly sampled from $[0.05, 0.1]$. Finally, $\mathsf{LPF}_{32\pi}$ was applied to the scattering potential.
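As an illustration, here is a simplified NumPy sketch of the background generation, working directly on a Cartesian grid and omitting the coordinate transform and shape placement; the grid size and filtering details are our own assumptions, not the paper's exact pipeline:

```python
import numpy as np

def random_low_frequency_background(n=192, cutoff=4.0, rng=None):
    """Sketch: random Fourier coefficients, low-pass filtered (emulating
    LPF_4.0), rescaled to [0, 2], truncated outside the radius-0.4 disk."""
    rng = rng or np.random.default_rng()
    coeffs = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    k = np.fft.fftfreq(n, d=1.0 / n)             # integer frequency grid
    kmag = np.sqrt(k[:, None] ** 2 + k[None, :] ** 2)
    coeffs[kmag > cutoff] = 0.0                  # keep only low frequencies
    bg = np.real(np.fft.ifft2(coeffs))
    bg = 2.0 * (bg - bg.min()) / (bg.max() - bg.min())   # range [0, 2]
    x = np.linspace(-0.5, 0.5, n)
    inside = (x[:, None] ** 2 + x[None, :] ** 2) <= 0.4 ** 2
    return bg * inside                           # zero outside the disk
```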

Appendix B Hyperparameter Search

For our hyperparameter searches, we trained models on a grid of hyperparameters and evaluated them on a validation set every 5 epochs. We found the epoch and hyperparameter setting which produced the lowest error on the validation set, and used those model weights for final evaluation on a held-out test set.

B.1 MFISNet Models

We train all models using the Adam algorithm with a batch size of 16 samples. All of the FYNet, MFISNet-Parallel, MFISNet-Fused, and MFISNet-Refinement models tested have three 1D convolutional layers followed by three 2D convolutional layers, with ReLU activations between layers. In the FYNet, MFISNet-Parallel, and MFISNet-Refinement models, we use 1D and 2D convolutional kernels with 24 channels, following Fan and Ying [2022]; in the MFISNet-Fused models, we use 1D and 2D convolutional kernels with $24 N_k$ channels to account for the increased input size. To train MFISNet-Fused, MFISNet-Parallel, and MFISNet-Refinement, we search over a grid of architecture and optimization hyperparameters. We report the optimal hyperparameters we found in Tables 4, 5, 6, 7, and 8.

1d kernel size

This is the number of frequency components in the 1D convolutional filters emulating $F_k^*$. We search over values {20, 40, 60}.

2d kernel size

This is the size (in pixels) of the 2D convolutional kernel used in the layers emulating $(F_k^* F_k + \mu I)^{-1}$. We search over values {5, 7}.

Weight decay

The weight decay parameter adds an ℓ2 weight regularization term to the loss function. This hyperparameter determines the coefficient of this regularization term. We search over values {0.0, 1×10⁻³}.

Learning rate

This is the step size for the Adam optimization algorithm. We search over values {1×10⁻⁴, 5×10⁻⁴, 1×10⁻³}.

LR decrease

For MFISNet-Refinement models only, we decrease the learning rate each time we begin training a new network block. This parameter determines the multiplicative decrease that we apply to the learning rate (see the sketch after this list). We search over values {1.0, 0.25}.
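For example (a small sketch of our own), an LR decrease of 0.25 yields the following per-block learning rates:

```python
# Per-block learning rates for an initial LR of 1e-3 and decrease 0.25.
lr0, lr_decrease, n_blocks = 1e-3, 0.25, 5
block_lrs = [lr0 * lr_decrease ** t for t in range(n_blocks)]
print(block_lrs)  # [0.001, 0.00025, 6.25e-05, 1.5625e-05, 3.90625e-06]
```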

| Hyperparameter | δ = 0.0, Nk = 1 | δ = 0.1, Nk = 1 |
|---|---|---|
| 1d kernel size | 60 | 40 |
| 2d kernel size | 7 | 5 |
| Weight decay | 1×10⁻³ | 0.0 |
| Learning rate | 1×10⁻³ | 1×10⁻³ |

Table 4: Optimal hyperparameters for FYNet.
| Hyperparameter | Nk=2 (δ=0.0) | Nk=3 (δ=0.0) | Nk=4 (δ=0.0) | Nk=5 (δ=0.0) | Nk=2 (δ=0.1) | Nk=3 (δ=0.1) | Nk=4 (δ=0.1) | Nk=5 (δ=0.1) |
|---|---|---|---|---|---|---|---|---|
| 1d kernel size | 20 | 20 | 20 | 40 | 20 | 20 | 20 | 20 |
| 2d kernel size | 5 | 5 | 7 | 5 | 5 | 5 | 7 | 7 |
| Weight decay | 1×10⁻³ | 0.0 | 1×10⁻³ | 1×10⁻³ | 1×10⁻³ | 0.0 | 1×10⁻³ | 0.0 |
| Learning rate | 5×10⁻⁴ | 1×10⁻³ | 1×10⁻³ | 1×10⁻³ | 5×10⁻⁴ | 1×10⁻³ | 1×10⁻³ | 1×10⁻³ |
| LR decrease | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |

Table 5: Optimal hyperparameters for MFISNet-Refinement.
| Hyperparameter | Nk=2 (δ=0.0) | Nk=3 (δ=0.0) | Nk=4 (δ=0.0) | Nk=5 (δ=0.0) | Nk=2 (δ=0.1) | Nk=3 (δ=0.1) | Nk=4 (δ=0.1) | Nk=5 (δ=0.1) |
|---|---|---|---|---|---|---|---|---|
| 1d kernel size | 40 | 20 | 20 | 20 | 40 | 40 | 20 | 40 |
| 2d kernel size | 7 | 7 | 7 | 7 | 7 | 5 | 7 | 7 |
| Weight decay | 1×10⁻³ | 0.0 | 0.0 | 1×10⁻³ | 1×10⁻³ | 0.0 | 1×10⁻³ | 0.0 |
| Learning rate | 5×10⁻⁴ | 1×10⁻³ | 1×10⁻³ | 1×10⁻³ | 5×10⁻⁴ | 1×10⁻³ | 1×10⁻³ | 5×10⁻⁴ |

Table 6: Optimal hyperparameters for MFISNet-Parallel.
| Hyperparameter | Nk=2 (δ=0.0) | Nk=3 (δ=0.0) | Nk=4 (δ=0.0) | Nk=5 (δ=0.0) | Nk=2 (δ=0.1) | Nk=3 (δ=0.1) | Nk=4 (δ=0.1) | Nk=5 (δ=0.1) |
|---|---|---|---|---|---|---|---|---|
| 1d kernel size | 40 | 20 | 20 | 60 | 40 | 20 | 20 | 60 |
| 2d kernel size | 7 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Weight decay | 0.0 | 1×10⁻³ | 1×10⁻³ | 0.0 | 0.0 | 0.0 | 0.0 | 1×10⁻³ |
| Learning rate | 1×10⁻³ | 5×10⁻⁴ | 5×10⁻⁴ | 5×10⁻⁴ | 5×10⁻⁴ | 5×10⁻⁴ | 1×10⁻⁴ | 5×10⁻⁴ |

Table 7: Optimal hyperparameters for MFISNet-Fused.
| Hyperparameter | No Progressive Refinement | No Homotopy through Frequency | Our Method |
|---|---|---|---|
| 1d kernel size | 20 | 40 | 40 |
| 2d kernel size | 7 | 7 | 5 |
| Weight decay | 1×10⁻³ | 1×10⁻³ | 1×10⁻³ |
| Learning rate | 5×10⁻⁴ | 1×10⁻⁴ | 1×10⁻³ |
| LR decrease | N/A | N/A | 0.25 |
| γ | N/A | 1.1 | N/A |

Table 8: Optimal hyperparameters for the MFISNet-Refinement models in the training method experiment (Section 5.5). Here, γ is the factor which weights the different loss terms in the "No Homotopy through Frequency" condition (cf. (16)). We searched over values γ ∈ {0.9, 1.0, 1.1}. All of these models were trained with incident frequencies [2π, 4π, 8π, 16π, 32π] and n = 2,000 noiseless training samples.

B.2 Wide-Band Butterfly Network

To find the optimal Wide-Band Butterfly Network, we optimized over the following hyperparameters. We defined the grid of hyperparameters by taking the original hyperparameter values from Li et al. [2022] and adding both higher and lower values, where possible. See Table 9 for the selected values.

Rank

This parameter controls the rank of the compression of local patches in the butterfly factorization. Increasing this rank parameter increases the number of learnable parameters in the part of the network emulating $F_k^*$. We found that increasing the rank decreased the train and validation errors, and we increased the rank until we were unable to fit the model and data onto a single GPU. We searched over values {2, 3, 5, 10, 15, 20, 30, 50}.

Initial Learning Rate

We decrease the learning rate by a multiplicative factor after 2,000 minibatches, as suggested by Li et al. [2022]. This parameter is the initial learning rate for the optimization algorithm. We searched over values {5×10⁻⁴, 1×10⁻³, 5×10⁻³}.

Learning Rate Decay

This is the multiplicative decay parameter for the learning rate schedule. We searched over values {0.85, 0.95}.

Sigma

Li et al. [2022] suggest training the network to match slightly-filtered versions of the ground-truth $q$. This is performed by applying a Gaussian filter to the targets $q^{(i)}$ before training; Sigma is the standard deviation of this Gaussian filter (see the sketch below). We do not blur the targets in the test set. We searched over values {0.75, 1.125, 1.5}.
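A minimal sketch of this target-blurring step, assuming SciPy's gaussian_filter as the smoothing operator and an illustrative grid size:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Blur each training target with the chosen Sigma; test targets are
# left unblurred. The array shape here is illustrative.
sigma = 1.5
q_train = np.random.default_rng(0).standard_normal((192, 192))
q_train_blurred = gaussian_filter(q_train, sigma=sigma)
```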

Batch Size

This is the number of samples per minibatch. We searched over values {16, 32}.

| Hyperparameter | δ = 0.0, Nk = 3 | δ = 0.1, Nk = 3 |
|---|---|---|
| Rank | 50 | 50 |
| Initial Learning Rate | 5×10⁻⁴ | 5×10⁻⁴ |
| Learning Rate Decay | 0.85 | 0.95 |
| Sigma | 1.5 | 1.5 |
| Batch Size | 16 | 16 |

Table 9: Optimal hyperparameters for Wide-Band Butterfly Networks.

Appendix C Additional Empirical Results

In this section, we illustrate the predictions generated by the different models used in our experiments on four randomly-selected test samples from our dataset. For each prediction, we include the associated error plot. The visualizations are provided in Figure 8.

Figure 8: Sample predictions from four randomly-selected test samples.
