[2310.11835] T3P: Demystifying Low-Earth Orbit Satellite Broadband
The Internet is going through a massive infrastructural revolution with the advent of low-flying satellite networks, 5/6G, WiFi7, and hollow-core fiber deployments. While these networks could unleash enhanced connectivity and new capabilities, it is critical to understand the performance characteristics to efficiently drive applications over them. Low-Earth orbit (LEO) satellite mega-constellations like SpaceX Starlink aim to offer broad coverage and low latencies at the expense of high orbital dynamics leading to continuous latency changes and frequent satellite hand-offs.
This paper aims to quantify Starlink's latency and its variations and components using a real testbed spanning multiple latitudes from the North to the South of Europe. We identify tail latencies as a problem. We develop predictors for latency and throughput and show their utility in improving application performance by up to 25%. We also explore how transport protocols can be optimized for LEO networks and show that this can improve throughput by up to 115% (with only a 5% increase in latency). Also, our measurement testbed with a footprint across multiple locations offers unique trigger-based scheduling capabilities that are necessary to quantify the impact of LEO dynamics.
Submission history
From: Saksham Bhushan [view email][v1] Wed, 18 Oct 2023 09:39:09 UTC (12,703 KB)
Overview
Based on the paper, here are the key points about demystifying low-Earth orbit (LEO) satellite broadband:
- LEO networks like SpaceX's Starlink aim to provide global low-latency connectivity, but the high speeds of LEO satellites lead to frequent handoffs and latency changes.
- The paper built a testbed called LEOScope across Europe to measure Starlink's performance. They found median latencies of 30-50 ms but tail latencies 11-16x higher due to factors like sub-optimal handoffs.
- They built custom predictors to estimate Starlink's latency and throughput, and showed a video streaming application using the throughput predictor improved QoE by 25%.
- They explored tuning BBRv2's parameters for Starlink and achieved 115% higher throughput with only 5% higher latency compared to default BBRv2.
- The paper proposes a "T3P stack" - Telemetry, Triggers, and Predictors - to provide LEO-awareness to applications and transport protocols. LEOScope provides triggers to initiate measurements on events of interest.
- They identify opportunities like developing clean-slate LEO-aware congestion control, stochastic models for handoff prediction, and leveraging physical layer information exposed by user terminals.
In summary, the paper provides a comprehensive evaluation of Starlink performance, demonstrates benefits of LEO-aware optimization, and lays out an agenda for continued research to support the next generation of satellite megaconstellations.
User Experience and QoE
Based on the results in the paper, frequent handovers in LEO networks like Starlink can negatively impact customer experience for applications like video streaming and interactive simulations in a few key ways:
- Handoffs can cause sudden latency spikes that last 15+ seconds. This can cause stuttering and rebuffering in video streaming. For interactive simulations that require low latency, these spikes would cause lag and unresponsiveness.
- The variable latency makes it hard for applications to accurately predict throughput and adapt. Without custom predictors tailored to LEO dynamics, video quality selection can be suboptimal and simulation state can get out of sync.
- The paper showed high loss rates up to 5% during handovers. This packet loss can disrupt video streams and simulation state.
- Tail latencies in the 100ms+ range occur due to handovers. This high latency affects real-time interactivity in simulations and video conferencing.
- Frequent handoffs lead to fluctuations in latency and throughput over short time scales. This variability makes it difficult for streaming and simulations to adapt smoothly.
The paper demonstrates that custom predictors and transport protocols tuned for LEO dynamics can mitigate some of these effects. But fundamentally, the high mobility of LEO satellites results in handover effects that can substantially degrade experience for real-time interactive applications. Addressing this impact requires tighter integration of application logic with lower-layer LEO network dynamics.
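The paper does not spell out its predictor design here, but as an illustration of how a throughput predictor can feed an adaptive-bitrate (ABR) decision, the sketch below uses a harmonic mean over recent samples, which discounts the short spikes common around hand-offs. The function names, window size, bitrate ladder, and safety factor are all assumptions for illustration, not the paper's design:

```python
def predict_throughput(samples, window=5):
    """Harmonic mean of the most recent throughput samples (Mbps).

    The harmonic mean is dominated by the low samples, so brief
    hand-off dips pull the estimate down conservatively."""
    recent = samples[-window:]
    return len(recent) / sum(1.0 / s for s in recent)

def select_bitrate(samples, ladder=(1.0, 2.5, 5.0, 8.0), safety=0.8):
    """Pick the highest ladder rung below a safety-scaled prediction."""
    est = safety * predict_throughput(samples)
    chosen = ladder[0]
    for rung in ladder:
        if rung <= est:
            chosen = rung
    return chosen

# A single 2 Mbps dip inside a run of 10 Mbps samples pulls the
# harmonic-mean estimate down to ~5.6 Mbps, so the safety-scaled
# pick drops to the 2.5 Mbps rung rather than risking 5 Mbps.
print(select_bitrate([10, 10, 10, 2, 10]))
```

A simple exponentially weighted moving average is another common choice; the harmonic mean is used here only because it reacts sharply to the downward spikes that LEO hand-offs introduce.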
LEOScope
LEOScope is the distributed testbed built by the authors to evaluate Starlink's performance and run experiments over its network. Here are some key details about LEOScope:
- It consists of measurement clients behind Starlink user terminals deployed across Europe (Spain, UK) and servers in Azure cloud regions.
- The clients measure performance to public Internet services like DNS, CDNs, and custom servers.
- An orchestrator in Azure schedules and monitors experiments across clients/servers.
- For volunteer nodes, it provides a "scavenger mode" in which experiments are preempted whenever user traffic is detected.
- It enables "trigger-based scheduling" to initiate measurements based on events like latency spikes, weather changes, satellite positions.
- Triggers allow zooming in on periods of interest like handovers and capturing their network impact.
- Clients continuously collect telemetry from Starlink terminals, such as latency, terminal orientation, and traffic volume.
- The telemetry and measurements feed predictors for latency/throughput and transport optimizations.
- It generates link profiles that can drive simulations and emulations of LEO networks.
- The LEOScope code is public to allow researchers to build on it for further LEO experimentation.
In summary, LEOScope provides a programmable platform with unique triggers and telemetry collection tailored to evaluating and enhancing performance over dynamic LEO broadband constellations. The authors plan to grow it into a shared community testbed.
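LEOScope's actual trigger implementation is not reproduced here, but the core idea, a predicate evaluated against live telemetry that launches an experiment when it first holds, can be sketched in a few lines. All names below are illustrative, not LEOScope's API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Trigger:
    """Fires an experiment when a predicate over telemetry holds."""
    name: str
    predicate: Callable[[dict], bool]   # telemetry snapshot -> bool
    action: Callable[[dict], None]      # experiment launcher
    fired: bool = False                 # fire at most once

class TriggerScheduler:
    def __init__(self):
        self.triggers = []

    def register(self, trigger):
        self.triggers.append(trigger)

    def on_telemetry(self, snapshot):
        """Evaluate every pending trigger against a telemetry sample."""
        for t in self.triggers:
            if not t.fired and t.predicate(snapshot):
                t.fired = True
                t.action(snapshot)

# Example: start a measurement when RTT spikes past 100 ms,
# e.g. to capture a suspected hand-off event.
fired = []
sched = TriggerScheduler()
sched.register(Trigger("latency-spike",
                       lambda s: s["rtt_ms"] > 100,
                       lambda s: fired.append(s["rtt_ms"])))
sched.on_telemetry({"rtt_ms": 45})    # below threshold: nothing happens
sched.on_telemetry({"rtt_ms": 180})   # trigger fires here
```

The same pattern extends to the paper's other trigger sources (weather changes, satellite positions) by swapping in different predicates over the telemetry snapshot.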
Latency Statistics
The paper evaluates the variable latency in Starlink using several key statistics:
- Median latency - To measure typical latency, they report median latencies to various services like Google DNS, CDNs, and custom servers. The median gives a sense of common latency.
- Tail latency (95th, 99th percentile) - To quantify extreme outliers, they look at 95th and 99th percentile latencies. Tail latency shows impact of issues like handovers.
- Latency over time - Plotting latency time series shows variability at short time scales and persistent spikes during events like handovers.
- Latency distribution (CDF) - The latency CDF presents the full distribution and shows how tails compare to the median.
- Latency components - Breaking out different network segments via traceroute reveals the impact of the LEO bent-pipe.
- Relative metrics (e.g. 99th/median ratio) - Ratios show how bad the tails are relative to typical latency. Higher ratios indicate more variability.
- Prediction error (MAPE) - Mean Absolute Percentage Error of their predictors reflects how difficult latency is to anticipate due to dynamics.
- Spatial diversity - Comparing latency across testbed locations shows geographic variability in the constellation's performance.
- Temporal diversity - Repeated measurements over days/weeks reveal changes over time as satellite geometry and ground infrastructure evolve.
In general, the paper leverages percentiles, distributions, time series, component analysis, and spatial/temporal diversity to provide a comprehensive view of the highly variable latency induced by LEO satellites and handovers. These statistics highlight the need for LEO-aware optimizations proposed in the paper.
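All of these statistics are straightforward to compute from a trace of RTT samples. A minimal sketch using only the standard library (the example data is synthetic, not from the paper's measurements):

```python
import statistics

def latency_stats(rtts_ms):
    """Summarise a latency trace the way the paper does:
    median, tail percentiles, and the tail-to-median ratio."""
    qs = statistics.quantiles(rtts_ms, n=100)  # qs[k-1] = k-th percentile
    median = statistics.median(rtts_ms)
    p95, p99 = qs[94], qs[98]
    return {"median": median, "p95": p95, "p99": p99,
            "p99_over_median": p99 / median}

def mape(actual, predicted):
    """Mean Absolute Percentage Error of a latency predictor."""
    return 100.0 * sum(abs(a - p) / a
                       for a, p in zip(actual, predicted)) / len(actual)

# Synthetic trace: a 99th/median ratio near 2 would be mild by
# Starlink standards, where the paper reports ratios of 11-16x.
stats = latency_stats(list(range(1, 101)))
print(stats["median"], stats["p99_over_median"])
```

`statistics.quantiles` requires Python 3.8+; for large traces a NumPy `percentile` call would be the usual substitute.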
Tail latency refers to the latency experienced by a small fraction of requests or packets in a network. Specifically, tail latency looks at high percentile latencies, such as 95th, 99th or 99.9th percentile.
In contrast to median latency, which captures typical latency, tail latency focuses on the outliers and worst case. Some key aspects of tail latency:
- It reflects the experienced latency of the slowest requests. These high latencies often result from queueing, congestion, or atypical delays.
- Applications care about tail latency as it impacts overall service quality. For example, 99th percentile latency affects the smoothness of video streaming.
- Causes of tail latency include queueing, congestion, hand-offs, packet loss recovery, background tasks, GC pauses, etc.
- Techniques like hedged requests and delay isolation can help mitigate tail latency. Designing networks and systems to minimize it is important.
- LEO satellite networks exhibit very high tail latency due to effects like hand-offs between fast-moving satellites. The paper reports Starlink's 99th-percentile latency at 11-16x the median.
- Tail latency is made worse by variability and unpredictability. The paper's latency predictors help applications anticipate and handle LEO's tail latency.
In summary, tail latency focuses on the high latency experienced by a small fraction of traffic. It is a critical metric in systems where overall quality depends on the slowest components. The long tails resulting from LEO satellite networks' mobility make optimizing tail latency essential.
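As one example of the mitigations mentioned above, a hedged request issues a backup copy of a request only when the first copy exceeds a latency budget, so the faster of the two copies determines completion time. The sketch below simulates the policy's effect on a single request rather than implementing a real network client; the parameter names are illustrative:

```python
def hedged_completion(primary_ms, backup_ms, hedge_after_ms):
    """Completion time under a hedged-request policy: the backup
    copy is issued only if the primary has not returned within
    hedge_after_ms, and the faster of the two copies wins."""
    if primary_ms <= hedge_after_ms:
        return primary_ms  # backup never issued
    return min(primary_ms, hedge_after_ms + backup_ms)

# A 500 ms tail event (e.g. a bad hand-off) is capped near the
# hedge budget plus one typical RTT: 50 + 40 = 90 ms.
print(hedged_completion(primary_ms=500, backup_ms=40, hedge_after_ms=50))
```

The cost is extra load from the duplicated requests, which is why the hedge budget is usually set near a high percentile (e.g. the 95th) so that only the tail pays for a backup copy.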
Artifacts
Based on the information in the paper, here are some key artifacts related to this work that are available online:
- LEOScope code: The code for the LEOScope testbed is available on GitHub [1]. This allows researchers to deploy their own measurements and experiments on Starlink using the framework.
- Simulation profiles: The paper mentions that LEOScope generates LEO link profiles [2] based on its measurements. These can serve as inputs to simulators and emulators for researcher experiments.
- Starlink gRPC tools: The paper uses an open-source gRPC toolkit [3] to collect telemetry data exposed by Starlink user terminals. This could be useful for others wanting to tap into the telemetry.
- Public TLE data: The satellite trajectory data used by their predictors comes from public sources like Celestrak [4]. The TLEs are updated frequently.
- Measurement servers: The custom measurement servers are hosted by the authors themselves on Azure cloud, so they don't seem to be publicly accessible.
- Measurement results: The raw measurement results from LEOScope do not seem to be available publicly at this time. Only specific results included in the paper are available.
In summary, the key public artifacts are the LEOScope testbed code, the link profiles it can generate, tools to access Starlink's gRPC API, and satellite TLE data. The raw measurement data itself does not appear to be publicly shared. However, researchers should be able to reproduce similar measurements using the available artifacts.
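TLE records from sources like Celestrak are plain text with a simple per-line checksum: the sum of all digits in the line, counting each minus sign as 1, modulo 10, must equal the last character. A small sketch for sanity-checking downloaded elements before feeding them to a predictor (the ISS line below is the standard textbook example, not data from the paper):

```python
def tle_checksum_ok(line):
    """Verify a TLE line's checksum: sum of digits, with each '-'
    counted as 1, modulo 10, must equal the final character."""
    total = sum(int(c) for c in line[:-1] if c.isdigit())
    total += line[:-1].count('-')
    return total % 10 == int(line[-1])

# Widely reproduced ISS TLE line 1 (from the TLE format description):
iss_line1 = "1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927"
print(tle_checksum_ok(iss_line1))
```

Since TLEs age quickly for low orbits, predictors that consume them (as the paper's do) need to refresh from the source frequently, which is why the frequent Celestrak updates matter.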
[1] https://github.com/leoscope-testbed
[2] https://anonymous.4open.science/r/leoscope-simulation-profiles
Authors
Here are some details about the authors' affiliations and their previous relevant work:
Affiliations:
- The authors are from Microsoft Research India, University of Surrey UK, Microsoft Research Asia, Shanghai Jiao Tong University, Telefonica Research Spain.
- The work thus brings together researchers from major tech companies, universities, and a telecom operator.
Relevant prior papers:
- Ankit Singla et al. had a paper in HotNets 2020 exploring LEO satellite networks using simulations in Hypatia. This seems to have motivated the real-world measurements in this paper.
- Ilker Nadi Bozkurt et al. had a PAM 2017 paper examining why Internet latency is high, which provides context on how DNS and CDNs affect Web latency.
- Neal Cardwell et al. from Google proposed the BBR congestion control algorithms in SIGCOMM 2017 and IETF 2019. This paper evaluates BBRv2 over Starlink.
- Ravi Netravali et al. built the Mahimahi emulator in USENIX ATC 2015, which is used for the video streaming experiments here.
- Debopam Bhattacherjee and Ankit Singla modeled Starlink network topology in CoNEXT 2019, providing architecture background.
So in summary, the authors have expertise in satellite networks, Internet measurements, congestion control and emulation. Their previous work on modeling, simulating and evaluating LEO satellites, Internet latency, BBR, and emulation likely motivated and informed this comprehensive study.