How does Alipay's "scan the nose, find the dog" feature work? The answer is in this CVPR paper

Heart of the Machine, 2021-10-14 02:35:35

Remember Alipay's "scan the nose, find the dog" feature? What's new? Recently, the researchers behind it published their paper.

There are no two identical leaves in the world, and no two identical dog or cat noses.

Some time ago, Heart of the Machine reported on a new Alipay feature: using nose-print recognition to help owners find lost pets. The feature is simple to use. First, open Alipay and search for "Prevent Losing", then enter your pet's nose-print information to obtain a unique electronic "ID card" for the pet. If the pet is lost, you can report the loss with one click; if a passer-by finds the lost pet, they can scan its nose with Alipay to identify it, contact you via a virtual phone number, and send the pet home.

This seemingly simple feature actually rests on careful research, and many difficulties had to be overcome: pets' noses are small and their prints are faint, and pets are active, so clear photos are hard to capture.

In a recent CVPR 2021 paper, the researchers described the technology behind this feature. Beyond recognizing cat and dog nose prints, this work on fine-grained retrieval can also solve many other problems. Interested readers can consult the original paper.

Paper link:

Fine-grained Retrieval

In a fine-grained retrieval task, the dataset comes from one specific category, such as dogs, cats, people, vehicles, or birds, and the goal is to match individual identities within that category, e.g. Person A vs. Person B. Training usually combines a classification loss with a metric-learning loss to jointly supervise the network, in the hope of obtaining a robust feature extractor: features extracted from the same identity should be as similar as possible, and features from different identities as dissimilar as possible, so that images sharing the query's identity can be retrieved.
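The joint supervision described above can be sketched as follows. This is a minimal illustration, not the paper's code: the classification term is a softmax cross-entropy over projections onto per-identity classifier weights, and the metric term is a simple contrastive loss; the dimensions, margin, and loss form are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of joint supervision for fine-grained retrieval:
# identity classification loss + a metric-learning loss that pulls
# same-identity features together and pushes different identities apart.
rng = np.random.default_rng(0)
feat_dim, num_ids, batch = 8, 4, 6
W = rng.normal(size=(num_ids, feat_dim))          # classifier weights, one vector per identity
feats = rng.normal(size=(batch, feat_dim))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit-norm features
labels = rng.integers(0, num_ids, size=batch)

def cross_entropy(feats, labels):
    logits = feats @ W.T                           # project features onto classifier weights
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def contrastive(feats, labels, margin=0.5):
    sim = feats @ feats.T                          # cosine similarity (unit-norm features)
    eye = np.eye(len(labels), dtype=bool)
    pos_mask = (labels[:, None] == labels[None, :]) & ~eye
    neg_mask = labels[:, None] != labels[None, :]
    pos = (1 - sim[pos_mask]).mean() if pos_mask.any() else 0.0   # pull positives together
    neg = np.maximum(0.0, sim[neg_mask] - margin).mean() if neg_mask.any() else 0.0
    return pos + neg

loss = cross_entropy(feats, labels) + contrastive(feats, labels)
```

Both terms act on the same feature vector, which is why the choice of which elements of that vector carry the supervision signal matters in what follows.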

The difficulty of this task is twofold. First, the same identity can look very different because of shooting angle, lighting, and time. Second, since all identities belong to the same category, different identities can look very similar: poodles of the same breed, for instance, share similar body shape, appearance, and coat color, and only a few subtle regions carry discriminative information, such as nose texture, eye shape, and fine patterns. The learned features therefore need to capture subtle, discriminative differences in order to tell the identities of image samples apart.

Existing methods usually supervise and optimize all elements of the feature as a whole: better-designed loss functions [1,2], attention mechanisms that make the network focus on important regions [3], or randomly erasing image patches or feature elements during training to improve generalization [4,5]. This is suboptimal for fine-grained tasks: once some elements of the feature are already discriminative, training converges and the network stops learning further discriminative details.

Method

The goal of this work is to learn a feature in which every element is discriminative, so as to extract as much information as possible, improve the discriminability of the whole feature, and thereby distinguish the identities of fine-grained samples. To make every element discriminative, the researchers propose a discrimination-aware mechanism (DAM): iteratively erase the already-discriminative elements and keep training on the less discriminative ones, continually constructing a harder feature space so that cyclic optimization yields a more robust final feature.

To identify which feature elements need further learning, we first need to measure each element's discriminability. For samples of different identities, if a feature element differs greatly between them, that element is discriminative; otherwise, it still needs to be learned. Discriminability is therefore defined by the per-element difference between identities. The network's classifier weights have the capacity to classify features: training optimizes a cross-entropy over the similarities between features and each identity's classifier weights, so the classifier weights can serve as proxies for the identities, and the element-wise differences between different identities' classifier weights reflect the differences between the corresponding feature elements of those identities.
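The weight-difference proxy can be made concrete with a short sketch. The absolute-difference form below is my assumption of how "the difference of each element between identities" is measured; the paper may use a different normalization.

```python
import numpy as np

# Sketch of the discriminability proxy: the element-wise absolute difference
# between the classifier weight vectors of two identities indicates which
# feature elements already separate them.
rng = np.random.default_rng(1)
num_ids, feat_dim = 5, 8
W = rng.normal(size=(num_ids, feat_dim))   # classifier weights, one row per identity

def pairwise_discriminability(i, j):
    """W_{i,j}: element-wise difference between identity i's and j's weights."""
    return np.abs(W[i] - W[j])

d = pairwise_discriminability(0, 1)
# Large entries mark feature elements that already discriminate identities 0 and 1;
# small entries mark elements that still need to be learned.
```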

Here w_i is a vector with the same dimension as the feature, denoting the classifier weights for identity i, and W_{i,j} is a vector of the same dimension, denoting the element-wise difference between identity i and identity j.

After obtaining the per-element differences between samples of different identities, feature elements are erased or retained during training according to these differences: elements that are already highly discriminative are erased, while elements with low discriminability are retained so that the network continues learning on them.

For different samples of the same identity, the discriminative elements are those that distinguish this identity from all other categories; the difference is therefore replaced by the average of the differences between this identity's classifier weights and the classifier weights of all other identities.
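The erase/retain step above can be sketched as follows. This is a simplified illustration under stated assumptions: per-element discriminability for an identity is the average absolute weight difference to all other identities, and thresholding at the mean is my simplification of "the average value" mentioned later in the article.

```python
import numpy as np

# Sketch of the erasing step: elements whose average discriminability across
# all other identities is already high get erased, forcing the remaining
# (less discriminative) elements to keep learning.
rng = np.random.default_rng(2)
num_ids, feat_dim = 5, 8
W = rng.normal(size=(num_ids, feat_dim))   # classifier weights, one row per identity

def erased_feature(feat, identity):
    diffs = np.abs(W[identity] - np.delete(W, identity, axis=0))  # |w_i - w_j| for all j != i
    disc = diffs.mean(axis=0)              # average discriminability per element
    keep = disc <= disc.mean()             # retain the less-discriminative elements
    return feat * keep                     # erase the already-discriminative ones

feat = rng.normal(size=feat_dim)
new_feat = erased_feature(feat, identity=0)
```

The loss is then computed on `new_feat`, so gradient signal concentrates on the elements that have not yet become discriminative.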

The whole process of generating the new features is shown in the figure:

Finally, the classification loss and metric loss supervise the features in the new feature space, continually optimizing the features to obtain better representations.

Experimental Results

The effect of the discrimination-aware mechanism is verified on multiple fine-grained retrieval datasets, including the bird dataset CUB-200-2011, the car dataset Cars196, and the pedestrian datasets Market-1501 and MSMT17.

The new method also shows clear advantages over state-of-the-art methods.

Compared with the random erasing method:


DAM makes more feature elements discriminative. This advantage has been verified on multiple tasks, including public datasets and pet scenarios: 1:1 identity verification, 1:N lost-pet search, and breed identification. In the current implementation, the method simply uses the average value to choose which elements continue learning; more dynamic selection strategies deserve further exploration.

[1] Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, and Matthew R Scott. Multi-similarity loss with general pair weighting for deep metric learning. In CVPR, 2019.
[2] Yifan Sun, Changmao Cheng, Yuhan Zhang, Chi Zhang, Liang Zheng, Zhongdao Wang, and Yichen Wei. Circle loss: A unified perspective of pair similarity optimization. In CVPR, 2020.
[3] Ruibing Hou, Bingpeng Ma, Hong Chang, Xinqian Gu, Shiguang Shan, and Xilin Chen. Interaction-and-aggregation network for person re-identification. In CVPR, 2019.
[4] Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. Random erasing data augmentation. In AAAI, 2020.
[5] Golnaz Ghiasi, Tsung-Yi Lin, and Quoc V Le. Dropblock: A regularization method for convolutional networks. In NeurIPS, 2018.

This article was created by [Heart of the Machine]. Please include a link to the original when reposting. Thanks.