Computer Science ›› 2024, Vol. 51 ›› Issue (1): 13-25. doi: 10.11896/jsjkx.yg20240103

• Special Topic for the 50th Anniversary of the Journal •

Survey on Cross-modality Object Re-identification Research

CUI Zhenyu, ZHOU Jiahuan, PENG Yuxin

  1. Wangxuan Institute of Computer Technology, Peking University, Beijing 100871, China
    National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
  • Received: 2023-10-12 Revised: 2023-12-01 Online: 2024-01-15 Published: 2024-01-12
  • Corresponding author: PENG Yuxin (pengyuxin@pku.edu.cn)
  • About author: CUI Zhenyu (cuizhenyu@stu.pku.edu.cn), born in 1995, postgraduate. His main research interests include computer vision and deep learning.
    PENG Yuxin, born in 1974, Ph.D, professor. His main research interests include cross-media analysis and reasoning, image and video recognition and understanding, and computer vision.
  • Supported by:
    National Natural Science Foundation of China (61925201, 62132001).

Abstract: Object re-identification (ReID) aims to match the same object captured by cameras in different areas at different times. Its core is to distinguish different objects through fine-grained differences between individuals, and it is therefore widely used in security control, criminal investigation, and surveillance. Traditional ReID methods are usually designed for visible-light data captured under good lighting, and their performance is severely limited under low-light conditions such as at night. Because of their outstanding night-vision performance, infrared cameras are often used to capture infrared images of objects in low light. Cross-modality object re-identification, i.e., visible-infrared ReID (VI-ReID), therefore aims to match visible images with infrared images, and vice versa, to achieve uninterrupted ReID across day and night. In recent years, VI-ReID has made significant progress; however, a comprehensive summary and in-depth analysis of existing models is still lacking. To this end, this paper surveys and summarizes relevant research and novel methods in the field of VI-ReID, discusses the challenges that existing methods face in real-world scenarios, and summarizes and analyzes existing methods from two aspects: model classification and model evaluation. First, focusing on the research difficulties of the problem, VI-ReID methods are categorized into generative and non-generative methods. Second, the widely used evaluation datasets and evaluation metrics are reviewed and summarized. Finally, the remaining challenges in VI-ReID are discussed and future development trends are outlined.

Key words: Computer vision, Object re-identification, Cross-modality, Fine-grained feature, Representation learning

CLC Number: 

  • TP391