Computer Structures Laboratory
Graduate School of Information Sciences
Tohoku University

Project page

2025
Graphical Abstract
Sparse2DGS: Sparse-View Surface Reconstruction using 2D Gaussian Splatting with Dense Point Cloud

IEEE International Conference on Image Processing

Gaussian Splatting (GS) has gained attention as a fast and effective method for novel view synthesis. It has also been applied to 3D reconstruction using multi-view images and can achieve fast and accurate 3D reconstruction. However, GS assumes that the input contains a large number of multi-view images, and therefore, the reconstruction accuracy significantly decreases when only a limited number of input images are available. One of the main reasons is the insufficient number of 3D points in the sparse point cloud obtained through Structure from Motion (SfM), which results in a poor initialization for optimizing the Gaussian primitives. We propose a new 3D reconstruction method, called Sparse2DGS, to enhance 2DGS in reconstructing objects using only three images. Sparse2DGS employs DUSt3R, a foundation model for stereo images, together with COLMAP MVS to generate highly accurate and dense 3D point clouds, which are then used to initialize 2D Gaussians. Through experiments on the DTU dataset, we show that Sparse2DGS can accurately reconstruct the 3D shapes of objects using just three images.
Graphical Abstract
ERPGS: EQUIRECTANGULAR IMAGE RENDERING ENHANCED WITH 3D GAUSSIAN REGULARIZATION

IEEE International Conference on Image Processing

Multi-view images acquired by a 360-degree camera can be used to reconstruct a 3D space covering a wide area. 3D reconstruction and Novel View Synthesis (NVS) methods from equirectangular images based on NeRF and 3DGS have been proposed. On the other hand, the large distortion caused by the projection model of a 360-degree camera must be overcome when equirectangular images are used. In 3DGS-based methods, the large distortion of the 360-degree camera model generates extremely large 3D Gaussians, resulting in poor rendering accuracy. We propose ErpGS, an omnidirectional GS method based on 3DGS, to realize NVS while addressing these problems. ErpGS introduces several techniques to improve rendering accuracy: geometric regularization, scale regularization, distortion-aware weights, and a mask to suppress the effects of obstacles in equirectangular images. Through experiments on public datasets, we demonstrate that ErpGS can render novel view images more accurately than conventional methods.
Graphical Abstract
ZERO-SHOT PSEUDO LABELS GENERATION USING SAM AND CLIP FOR SEMI-SUPERVISED SEMANTIC SEGMENTATION

IEEE International Conference on Image Processing

Semantic segmentation is a fundamental task in medical image analysis and autonomous driving, and it suffers from the high cost of annotating the labels required in training. To address this problem, semantic segmentation methods based on semi-supervised learning with a small amount of labeled data have been proposed. For example, one approach is to train a semantic segmentation model using images with annotated labels and pseudo labels. In this approach, the accuracy of the semantic segmentation model depends on the quality of the pseudo labels, and the quality of the pseudo labels depends on the performance of the model being trained and the amount of data with annotated labels. In this paper, we generate pseudo labels using zero-shot annotation with the Segment Anything Model (SAM) and Contrastive Language-Image Pretraining (CLIP), improve the accuracy of the pseudo labels using the Unified Dual-Stream Perturbations Approach (UniMatch), and use them as enhanced labels to train a semantic segmentation model. The effectiveness of the proposed method is demonstrated through experiments using the public PASCAL and MS COCO datasets.
Graphical Abstract
Stereo Radargrammetry Using Deep Learning from Airborne SAR Images

IEEE International Geoscience and Remote Sensing Symposium

In this paper, we propose a stereo radargrammetry method using deep learning from airborne Synthetic Aperture Radar (SAR) images. Deep learning-based methods are considered to suffer less from geometric image modulation, but there is no public SAR image dataset for training such methods. We create a SAR image dataset and fine-tune a deep learning-based image correspondence method on it. The proposed method suppresses the degradation of image quality caused by pixel interpolation by skipping ground projection of the SAR image, and divides the SAR image into patches for processing, which makes it possible to apply deep learning. Through a set of experiments, we demonstrate that the proposed method provides a wider measurement range and more accurate elevation measurements than conventional methods.
Graphical Abstract
Leveraging Intermediate Features of Vision Transformer for Face Anti-Spoofing

IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops

Face recognition systems are designed to be robust against changes in head pose, illumination, and blurring during image capture. If a malicious person presents a face photo of a registered user, they may illegally bypass the authentication process. Such spoofing attacks need to be detected before face recognition. In this paper, we propose a spoofing attack detection method based on Vision Transformer (ViT) to detect minute differences between live and spoofed face images. The proposed method utilizes the intermediate features of ViT, which offer a good balance between the local and global features important for spoofing attack detection, to calculate the loss in training and the score in inference. The proposed method also introduces two data augmentation methods, face anti-spoofing data augmentation and patch-wise data augmentation, to improve the accuracy of spoofing attack detection. We demonstrate the effectiveness of the proposed method through experiments using the OULU-NPU and SiW datasets. The project page is available at: https://gsisaoki.github.io/FAS-ViT-CVPRW/
Graphical Abstract
ErpGS: Novel View Synthesis of 360-degree Images Using 3D Gaussian Regularization

31st Symposium on Sensing via Image Information (SSII)

A 360-degree camera projects the entire surrounding scene, centered on the camera, onto the image plane using the Equirectangular Projection (ERP), yielding an ERP image (360-degree image). Since ERP images can cover a wide area with few images, Novel View Synthesis (NVS) methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have been proposed to exploit this advantage. NVS from ERP images must cope with the large distortion introduced by the ERP-based projection, but the methods proposed to date do not fully address it. In particular, in 3DGS, the ERP distortion produces large 3D Gaussians, which degrades the rendering accuracy of novel view images. In this paper, we propose ErpGS, an NVS method for ERP images using 3D Gaussian regularization. ErpGS addresses the ERP distortion by introducing geometric regularization, regularization of the scale of 3D Gaussians, distortion-aware weighting, and view-specific masking. Through performance evaluation experiments on public datasets, we demonstrate that ErpGS can render novel view images more accurately than conventional methods.
Graphical Abstract
Image quality improvement in single plane-wave imaging using deep learning

Ultrasonics

In ultrasound image diagnosis, single plane-wave imaging (SPWI), which can acquire ultrasound images at more than 1,000 fps, has been used to observe detailed tissue and evaluate blood flow. SPWI achieves high temporal resolution by sacrificing the spatial resolution and contrast of ultrasound images. To improve spatial resolution and contrast in SPWI, coherent plane-wave compounding (CPWC) is used to obtain high-quality ultrasound images, i.e., compound images, by coherent addition of radio frequency (RF) signals acquired by transmitting plane waves in different directions. Although CPWC produces high-quality ultrasound images, its temporal resolution is lower than that of SPWI. To address this problem, some methods have been proposed to reconstruct an ultrasound image comparable to a compound image from RF signals obtained by transmitting a small number of plane waves in different directions. These methods do not fully consider the properties of RF signals, resulting in lower image quality compared to a compound image. In this paper, we propose methods to reconstruct high-quality ultrasound images in SPWI by considering the characteristics of the RF signal of a single plane wave, obtaining ultrasound images with image quality comparable to CPWC. The proposed methods employ encoder–decoder models of 1D U-Net, 2D U-Net, and their combination to generate high-quality ultrasound images by minimizing a loss that accounts for the point spread effect of plane waves and the frequency spectrum of RF signals during training. We also create a public large-scale SPWI/CPWC dataset for developing and evaluating deep-learning methods.
Through a set of experiments using the public dataset and our dataset, we demonstrate that the proposed methods can reconstruct higher-quality ultrasound images from RF signals in SPWI than conventional methods.
2024
Graphical Abstract
Enhancing Remote Adversarial Patch Attacks on Face Detectors with Tiling and Scaling

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

This paper discusses the attack feasibility of the Remote Adversarial Patch (RAP) targeting face detectors. A RAP that targets face detectors is similar to one that targets general object detectors, but the former faces multiple issues in the attack process that the latter does not. (1) Face detectors can detect objects of various scales; in particular, the area of small objects that is convolved during feature extraction by a CNN is small, so the area that affects the inference results is also small. (2) Face detection is a two-class classification, so there is a large gap in characteristics between the classes, which makes it difficult to attack the inference results by directing them to a different class. In this paper, we propose a new patch placement method and a new loss function to address each problem. The proposed patches targeting face detectors showed superior detection-obstruction effects compared to patches targeting general object detectors.
Graphical Abstract
Multibiometrics Using a Single Face Image

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

Multibiometrics, which uses multiple biometric traits instead of a single trait to improve recognition performance when authenticating individuals, has been investigated. Previous studies have combined individually acquired biometric traits or have not fully considered the convenience of the system. Focusing on a single face image, we propose a novel multibiometric method that combines five biometric traits, i.e., face, iris, periocular, nose, and eyebrow, all of which can be extracted from a single face image. The proposed method does not sacrifice the convenience of biometrics since only a single face image is used as input. Through a variety of experiments using the CASIA Iris Distance database, we demonstrate the effectiveness of the proposed multibiometric method.
Graphical Abstract
High-resolution Microscopic Image Dataset of Freshwater Plankton in Japanese Lakes and Reservoirs (FREPJ): I. Zooplankton

Bulletin of the National Museum of Nature and Science. Series B, Botany

Plankton are important organisms that structure food webs in aquatic ecosystems and are also effective environmental indicators. However, the identification and enumeration of these organisms for environmental monitoring is challenging in terms of sustainability and accuracy. To overcome these difficulties, we collected plankton images that would be usable for developing an AI-based plankton monitoring system. As the first in a series of plankton image collections, we made an image dataset for zooplankton. This dataset contains a total of 88,653 images of 214 freshwater zooplankton taxa collected from 87 lakes and reservoirs located in different areas of the Japanese archipelago. To obtain these images, zooplankton samples collected at various locations were first scanned using an intelligent microscope, and high-resolution photographs containing multiple plankton individuals were taken. Then, each plankton individual was cropped and extracted from the photographs as a single image, classified and labeled with multiple taxonomic ranks (phylum, class, order, family, genus, and species), and stored in the dataset. The present dataset will be useful not only as an atlas of freshwater zooplankton in Japan, but also for the construction, training, and evaluation of an automatic plankton identification and enumeration system based on machine learning.
Graphical Abstract
LabellessFace: Fair metric learning for face recognition without attribute labels

IEEE International Joint Conference on Biometrics

Demographic bias is one of the major challenges for face recognition systems. The majority of existing studies on demographic bias depend heavily on specific demographic groups or demographic classifiers, making it difficult to address performance for unrecognized groups. This paper introduces "LabellessFace," a novel framework that reduces demographic bias in face recognition without requiring the demographic group labeling typically needed for fairness considerations. We propose a novel fairness enhancement metric called the class favoritism level, which assesses the extent of favoritism towards specific classes across the dataset. Leveraging this metric, we introduce the fair class margin penalty, an extension of existing margin-based metric learning. This method dynamically adjusts learning parameters based on class favoritism levels, promoting fairness across all attributes. By treating each class as an individual in face recognition systems, we facilitate learning that minimizes biases in authentication accuracy among individuals. Comprehensive experiments have demonstrated that our proposed method is effective for enhancing fairness while maintaining authentication accuracy.
Graphical Abstract
FSErasing: Improving face recognition with data augmentation using face parsing

IET Biometrics

We propose original semantic labels for detailed face parsing to improve the accuracy of face recognition by focusing on parts of the face. The part labels used in conventional face parsing are defined based on biological features, and thus one label is given to a large region, such as skin. Our semantic labels are defined by separating parts with large areas based on the structure of the face and by distinguishing the left and right sides of all parts to account for head pose changes, occlusion, and other factors. By utilizing the capability of assigning detailed part labels to face images, we propose a novel data augmentation method based on detailed face parsing, called Face Semantic Erasing (FSErasing), to improve the performance of face recognition. FSErasing randomly masks a part of the face image based on the detailed part labels, allowing erasing-type data augmentation that considers the characteristics of the face. Through experiments using public face image datasets, we demonstrate that FSErasing is effective for improving the performance of face recognition and face attribute estimation. In face recognition, adding FSErasing in training ResNet-34 with Softmax using CelebA improves the average accuracy by 0.354 points and the average equal error rate (EER) by 0.312 points, and with ArcFace, the average accuracy and EER improve by 0.752 and 0.802 points, respectively. ResNet-50 with Softmax using CASIA-WebFace improves the average accuracy by 0.442 points and the average EER by 0.452 points, and with ArcFace, the average accuracy and EER improve by 0.228 and 0.500 points, respectively. In face attribute estimation, adding FSErasing as a data augmentation method in training with CelebA improves the estimation accuracy by 0.54 points.
We also apply our detailed face parsing model to visualize face recognition models and demonstrate its higher explainability than general visualization methods.