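Tohoku University, Graduate School of Information Sciences, Department of Computer and Mathematical Sciences, Computer Structures Laboratory
(Tohoku University, School of Engineering, Department of Electrical, Information and Physics Engineering, Information Engineering Course)
Aoki / Ito (Koichi) Laboratory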

A Study on Zero-Shot Semantic Segmentation with Multi-Resolution and PCA-Based Feature Refinement

Nagito Saito (Tohoku University), Shintaro Ito (Tohoku University), Koichi Ito (Tohoku University), Takafumi Aoki (Tohoku University)
2025 Tohoku-Section Joint Convention of Institutes of Electrical and Information Engineers, Japan, September 2025.
Abstract

Semantic segmentation, which assigns a class to each pixel in an image, is fundamental to high-level image understanding and plays a crucial role in applications such as autonomous driving and medical image analysis. While deep learning has substantially improved the performance of semantic segmentation, the need for costly and time-consuming pixel-wise annotations remains a major bottleneck. To overcome this limitation, zero-shot semantic segmentation (ZS3), which enables segmentation without any annotations, has emerged as a promising approach. Conventional ZS3 methods leverage vision-language models, such as Contrastive Language-Image Pre-training (CLIP) [1], to perform segmentation by computing pixel-wise similarity between image features and text features. These methods are limited, however, because CLIP image features have low spatial resolution and CLIP text features contain redundant information. To address these problems, we propose a novel ZS3 method that enhances the resolution of the image features by leveraging multi-resolution images and refines the corresponding text features using Principal Component Analysis (PCA). Experiments on public datasets demonstrate the effectiveness of the proposed method.
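
The pixel-wise similarity step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how PCA is applied, so the sketch assumes one plausible reading, in which both feature sets are centered on the text-feature mean and projected onto the top-k principal directions of the text features before cosine similarity is computed. The names `segment`, `pca_basis`, and `l2_normalize` are hypothetical, the feature arrays stand in for CLIP outputs, and the multi-resolution step is omitted.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def pca_basis(text_feats, k):
    """Return the mean and the top-k principal directions of the text features.

    text_feats: array of shape (C, D), one row per class prompt.
    """
    mean = text_feats.mean(axis=0, keepdims=True)
    centered = text_feats - mean
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def segment(image_feats, text_feats, k=None):
    """Assign each spatial location the class with the most similar text feature.

    image_feats: (H, W, D) dense image features (stand-in for CLIP patch features).
    text_feats:  (C, D) text features, one per class.
    k:           if given, compare features in a k-dimensional PCA subspace
                 of the text features (an assumption of this sketch).
    Returns an (H, W) array of class indices.
    """
    h, w, d = image_feats.shape
    pix = image_feats.reshape(-1, d)
    txt = text_feats
    if k is not None:
        mean, basis = pca_basis(text_feats, k)
        pix = (pix - mean) @ basis.T
        txt = (txt - mean) @ basis.T
    sim = l2_normalize(pix) @ l2_normalize(txt).T  # (H*W, C) cosine similarities
    return sim.argmax(axis=1).reshape(h, w)


if __name__ == "__main__":
    # Synthetic data only: each location is a noisy copy of one class's text feature.
    rng = np.random.default_rng(0)
    text = rng.normal(size=(3, 16))
    labels = rng.integers(0, 3, size=(4, 5))
    image = text[labels] + 0.01 * rng.normal(size=(4, 5, 16))
    print(segment(image, text))
    print(segment(image, text, k=2))
```

With real CLIP features, the label map has the resolution of the feature grid and is upsampled to the image size afterwards, which is where the low-resolution limitation noted in the abstract shows up.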
