SPAX: Fully Sparse Framework with Hierarchical Spatio-temporal Fusion for Moving Object Tracking in Satellite Videos

Zhehao Xiao 1
Fang Xu 1,2
Chuandong Liu 1
Wen Yang 3
Gui-Song Xia 1,2

1 School of Computer Science, Wuhan University; 2 School of Artificial Intelligence, Wuhan University; 3 School of Electronic Information, Wuhan University

Comparison between our proposed SPAX (bottom) and existing paradigm (top).

Abstract

Moving object tracking is a fundamental task in video satellite technologies. In the spatial dimension, remote sensing scenes contain small objects against a large proportion of background, so existing methods built on dense computation spend considerable unnecessary computation on redundant regions. In the temporal dimension, the coupled motion of the satellite platform and ground objects introduces complexity that current methods find difficult to handle. To address these issues, we propose SPAX, a fully sparse framework with hierarchical spatio-temporal fusion. Specifically, SPAX adopts an object-centric, fully sparse paradigm that reduces computational redundancy by focusing only on foreground regions. Furthermore, we adopt hierarchical spatio-temporal fusion (HSF) to address the complexity of dual-motion coupling through intra-frame multi-scale feature fusion, inter-frame symmetric feature interaction, and inter-frame asymmetric feature interaction, thereby enabling comprehensive use of temporal information. Additionally, we propose a plug-and-play Gaussian-based trajectory association (GTA) strategy to mitigate the negative impact of observational drift and accumulated errors. Experiments show that SPAX outperforms previous methods on two popular benchmarks, improving MOTA by 5.1 and 7.6. While achieving state-of-the-art (SOTA) performance, SPAX-Base reduces GFLOPs by 88.4% and delivers a 2.7x speedup on the SatVideoDT dataset, along with a 93.2% GFLOPs reduction and up to a 3.1x speedup on the SatMTB-MOT dataset, compared to our baseline. Furthermore, SPAX-Light outperforms the previous SOTA method by 6.6 MOTA and runs at 5.9x its inference speed on the SatMTB-MOT dataset.
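
For readers unfamiliar with the metric behind these gains, MOTA follows the standard CLEAR MOT definition: one minus the ratio of all tracking errors (missed objects, false alarms, identity switches) to the number of ground-truth objects, summed over all frames. The sketch below is a minimal illustration only. The function name `mota` and the counts in the example are assumptions made up for this page, not code or results from SPAX.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy (CLEAR MOT definition).

    All arguments are totals accumulated over every frame of a sequence.
    The result is at most 1.0 and can be negative when the number of
    errors exceeds the number of ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Made-up counts, for illustration only (not taken from any benchmark).
example = mota(false_negatives=300, false_positives=150,
               id_switches=50, num_gt=2000)
print(f"MOTA = {100 * example:.1f}")  # MOTA = 75.0
```

MOTA is commonly reported scaled by 100, as in the print statement above, which is presumably the scale on which the 5.1, 7.6, and 6.6 gains are expressed.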


Method

Overall structure of SPAX.

Feature comparison

Comparison of computational feature maps between the dense and sparse modes, shown for the sequences VISO-car-001 and SatMTB-ship-029.
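
As a rough illustration of why restricting computation to foreground regions pays off when objects are small, the sketch below counts multiply-accumulate operations (MACs) for a single 3 x 3 convolution evaluated at every pixel versus only at foreground pixels. It is an idealised back-of-the-envelope model, not SPAX's implementation: the frame size, box positions, channel widths, and the helper names `foreground_mask` and `conv_macs` are all hypothetical, and a real sparse pipeline adds indexing overhead and dense stages that this count ignores.

```python
import numpy as np


def foreground_mask(height, width, boxes):
    """Boolean mask that is True inside each (x, y, w, h) box."""
    mask = np.zeros((height, width), dtype=bool)
    for x, y, w, h in boxes:
        mask[y:y + h, x:x + w] = True
    return mask


def conv_macs(num_sites, c_in, c_out, kernel=3):
    """MACs for a kernel x kernel convolution evaluated at num_sites outputs."""
    return num_sites * c_in * c_out * kernel * kernel


# Hypothetical 1024 x 1024 frame containing three non-overlapping 8 x 8 objects.
boxes = [(100, 200, 8, 8), (500, 640, 8, 8), (900, 30, 8, 8)]
mask = foreground_mask(1024, 1024, boxes)

active_sites = int(mask.sum())                 # 3 * 8 * 8 = 192 foreground pixels
dense = conv_macs(mask.size, c_in=64, c_out=64)
sparse = conv_macs(active_sites, c_in=64, c_out=64)
saving = 1.0 - sparse / dense

print(f"foreground fraction: {active_sites / mask.size:.6f}")
print(f"idealised MAC saving for this layer: {100 * saving:.2f}%")
```

The per-layer saving in this toy setting is far larger than an end-to-end figure would be, since it covers one layer under ideal conditions; it is meant only to show how the foreground fraction drives the cost of a sparse layer.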


Experimental results