TransPathNet: A Novel Two-Stage Framework for Indoor Radio Map Prediction

Abstract

Accurate indoor pathloss prediction is crucial for optimizing wireless communication systems, but the diverse materials and complex electromagnetic interactions found indoors make it difficult to model. This paper introduces TransPathNet, a novel two-stage deep learning framework that combines transformer-based feature extraction with multiscale convolutional attention decoding to generate high-precision indoor radio pathloss maps. TransPathNet achieves state-of-the-art performance in the ICASSP 2025 Indoor Pathloss Radio Map Prediction Challenge, with an overall root mean squared error (RMSE) of 10.397 dB on the full challenge test set and 9.73 dB on the Kaggle test set, and generalizes well across different indoor geometries, frequencies, and antenna patterns.

Fig 1: Overview of TransPathNet training process. The framework employs a two-stage architecture: a coarse stage and a fine stage.

TransPathNet follows a U-Net-like architecture with a transformer-based encoder and a multiscale convolutional attention decoder. The encoder is based on TransNeXt, a state-of-the-art backbone that extracts hierarchical features from complex environmental data. The decoder is the Efficient Multiscale Convolutional Attention Decoder (EMCAD), which refines and reconstructs pathloss maps at multiple scales through an attention mechanism.

To capture the complexity of indoor propagation, TransPathNet extends the three default input channels (reflectance, transmittance, and distance) with the following additional features:

Free-Space Pathloss (FSPL): Precomputed free-space pathloss estimate
Transmission Ray Encoding: Precomputed direct transmissions that highlight multipath effects
Antenna Embeddings: Encodes the antenna's radiation pattern and angle information
Spatial-Frequency Embeddings: Combines positional encoding and frequency embedding

These additional channels collectively aid the model in generalizing to diverse indoor layouts, materials, and operating conditions.
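As an illustration of the FSPL channel, the sketch below precomputes a per-pixel map from the transmitter position and carrier frequency using the Friis free-space formula, FSPL(d, f) = 20 log10(4πdf/c) in dB. It is a minimal sketch, not the authors' code: the pixel size, the minimum distance used at the transmitter pixel, and the function names fspl_db and fspl_map are hypothetical, and the paper's exact resolution and normalization may differ.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Friis free-space pathloss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def fspl_map(height: int, width: int, tx_row: int, tx_col: int, freq_hz: float,
             pixel_size_m: float = 0.25, min_dist_m: float = 0.1) -> list[list[float]]:
    """Return an H x W FSPL channel for a transmitter at pixel (tx_row, tx_col).

    pixel_size_m and min_dist_m are hypothetical defaults, not values
    taken from the paper.
    """
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            # Euclidean pixel distance to the transmitter, converted to metres
            d = math.hypot(r - tx_row, c - tx_col) * pixel_size_m
            # Clamp so the transmitter pixel itself does not evaluate log10(0)
            row.append(fspl_db(max(d, min_dist_m), freq_hz))
        grid.append(row)
    return grid


# Example: a small 5 x 5 map at 2.4 GHz with the transmitter in the centre
fspl = fspl_map(5, 5, tx_row=2, tx_col=2, freq_hz=2.4e9)
```

Each doubling of distance adds 20·log10(2) ≈ 6.02 dB, so a channel like this can hand the network the distance- and frequency-dependent trend explicitly instead of leaving it to be learned.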

Our system employs a two-stage coarse-to-fine training strategy to achieve high-precision prediction results:

Coarse Stage: Generates a rough approximation of the pathloss map
Fine Stage: Concatenates the coarse result with the input features and feeds them into a second, refinement model that focuses on residual details
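The coarse-to-fine data flow can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: channels are nested lists, and toy_coarse and toy_fine are hypothetical stand-ins for the two trained networks. Whether the fine model outputs the final map directly or a residual added to the coarse map is not specified here; the toy fine model simply applies a correction to the coarse channel.

```python
from typing import Callable, List

Grid = List[List[float]]      # one H x W channel
Channels = List[Grid]         # C stacked channels


def two_stage_predict(inputs: Channels,
                      coarse_model: Callable[[Channels], Grid],
                      fine_model: Callable[[Channels], Grid]) -> Grid:
    """Run the coarse model, then refine with the coarse map appended as a channel."""
    coarse = coarse_model(inputs)          # stage 1: rough pathloss map
    fine_inputs = inputs + [coarse]        # channel-wise concatenation (new list)
    return fine_model(fine_inputs)         # stage 2: refined pathloss map


def toy_coarse(channels: Channels) -> Grid:
    """Hypothetical coarse model: per-pixel mean over all input channels."""
    return [[sum(vals) / len(vals) for vals in zip(*rows)] for rows in zip(*channels)]


def toy_fine(channels: Channels) -> Grid:
    """Hypothetical fine model: nudges the coarse map (last channel) toward channel 0."""
    first, coarse = channels[0], channels[-1]
    return [[c + 0.5 * (f - c) for f, c in zip(fr, cr)] for fr, cr in zip(first, coarse)]


inputs = [[[1.0, 2.0], [3.0, 4.0]],
          [[3.0, 2.0], [1.0, 0.0]]]
refined = two_stage_predict(inputs, toy_coarse, toy_fine)
```

Because the fine model receives the coarse map as an ordinary input channel, the two stages could be trained separately, with the coarse predictions precomputed for the fine stage; this is one possible design, not necessarily the authors'.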

Implementation Details: The model is implemented in PyTorch and trained with the Adam optimizer at an initial learning rate of 10^-4, which is halved at 50% and again at 75% of training. All input features are resized to 384×384 for both training and evaluation. Random flips and rotations are applied to the input features to improve generalization. The training loss is the mean squared error (MSE). All experiments use a batch size of 4 and 30 training epochs on an NVIDIA RTX 4090 GPU.
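The learning-rate schedule and augmentation described above can be sketched in plain Python; in PyTorch the same step schedule is commonly expressed with torch.optim.lr_scheduler.MultiStepLR and gamma=0.5. The sketch assumes rotations are multiples of 90° (so a square 384×384 map needs no interpolation) and that one random flip/rotation is applied identically to every input channel and to the target map. The function names are hypothetical. Channels that store absolute directions, such as antenna angle information, may also need their values adjusted after a rotation, which this sketch ignores.

```python
import random
from typing import List, Tuple

Grid = List[List[float]]


def learning_rate(step: int, total_steps: int, base_lr: float = 1e-4) -> float:
    """Step schedule: base_lr, halved at 50% and halved again at 75% of training."""
    progress = step / total_steps
    if progress < 0.5:
        return base_lr
    if progress < 0.75:
        return base_lr * 0.5
    return base_lr * 0.25


def rot90(grid: Grid) -> Grid:
    """Rotate a square grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def hflip(grid: Grid) -> Grid:
    """Mirror a grid left-to-right."""
    return [row[::-1] for row in grid]


def augment(channels: List[Grid], target: Grid,
            rng: random.Random) -> Tuple[List[Grid], Grid]:
    """Apply one random flip/rotation identically to all channels and the target."""
    k = rng.randrange(4)          # number of 90-degree rotations
    flip = rng.random() < 0.5     # whether to mirror afterwards

    def transform(grid: Grid) -> Grid:
        for _ in range(k):
            grid = rot90(grid)
        return hflip(grid) if flip else grid

    return [transform(c) for c in channels], transform(target)


rng = random.Random(0)
x = [[1.0, 2.0], [3.0, 4.0]]
aug_channels, aug_target = augment([x], x, rng)
```

Stepping this schedule per epoch over 30 epochs puts the second milestone at epoch 22.5, so a per-epoch implementation has to round it; stepping per iteration avoids that.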

Fig 2: Visual comparison of pathloss predictions across different stages of our pipeline for a particular input.

RMSE (in dB) is the primary evaluation metric. The test set is divided into three tasks that assess how well the model adapts to new (1) geometric environments, (2) frequencies, and (3) antenna patterns, weighted 30%, 30%, and 40%, respectively.
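As an illustration of the metric, the sketch below computes a per-map RMSE in dB and combines per-task RMSEs with the 30/30/40 weights. It assumes the overall score is a weighted average of the three task RMSEs and that each task RMSE is pooled over all evaluated pixels; the official challenge scoring may aggregate differently (for example, per sample or with masking), and the task labels are hypothetical.

```python
import math
from typing import Dict, List

Grid = List[List[float]]

# Task weights from the challenge description; the keys are hypothetical labels.
TASK_WEIGHTS: Dict[str, float] = {"geometry": 0.3, "frequency": 0.3, "antenna": 0.4}


def rmse_db(pred: Grid, target: Grid) -> float:
    """Root mean squared error (dB) over all pixels of two same-shaped maps."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("prediction and target must have the same shape")
    sq_errors = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def weighted_score(task_rmse: Dict[str, float]) -> float:
    """Weighted average of the per-task RMSEs (assumed aggregation rule)."""
    return sum(w * task_rmse[name] for name, w in TASK_WEIGHTS.items())


example = weighted_score({"geometry": 10.0, "frequency": 12.0, "antenna": 9.0})
```

Under this assumed rule, a 1 dB improvement on the antenna-pattern task lowers the overall score by 0.4 dB, versus 0.3 dB for either of the other two tasks.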

Case	Two-Stage	Post-Process	Kaggle RMSE (dB)	Full RMSE (dB)
Coarse only	✗	✗	9.93	10.327
+ Two-Stage Training	✓	✗	9.75	10.430
Full pipeline	✓	✓	9.73	10.397

The average inference time is about 43.8 ms per sample on the RTX 4090.

Key Contributions

Two-Stage Architecture

Our coarse-to-fine training strategy enables high-precision prediction results by first generating a rough approximation and then refining details.

TransNeXt Encoder

State-of-the-art transformer-based backbone for robust feature extraction from complex environmental data.

EMCAD Decoder

Efficient Multiscale Convolutional Attention Decoder refines predictions at multiple scales through an attention mechanism.

Enhanced Input Features

Comprehensive feature set including free-space pathloss (FSPL), transmission ray encoding, antenna embeddings, and spatial-frequency embeddings.

SOTA Performance

Winner of the ICASSP 2025 Indoor Pathloss Radio Map Prediction Challenge with state-of-the-art RMSE scores across different environments.

Conclusion

This paper presents TransPathNet, a deep learning framework for indoor pathloss prediction that combines transformer-based feature extraction with multiscale convolutional attention decoding. The model achieves state-of-the-art performance in the ICASSP 2025 Indoor Pathloss Radio Map Prediction Challenge and generalizes robustly across different geometries, frequencies, and antenna patterns. However, accurately predicting pathloss in regions dominated by reflections remains difficult. Future work will focus on network designs that improve accuracy in these regions.

BibTeX

@inproceedings{li2025transpathnet,
  title={TransPathNet: A Novel Two-Stage Framework for Indoor Radio Map Prediction},
  author={Li, Xin and Liu, Ran and Xu, Saihua and Razul, Sirajudeen Gulam and Yuen, Chau},
  booktitle={Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year={2025},
  month={April}
}