FaceNet: A Unified Embedding for Face Recognition and Clustering

Give the formula for the Triplet Loss.


$$\mathcal{L}_\text{triplet}(\mathbf{x}, \mathbf{x}^+, \mathbf{x}^-) = \sum_{\mathbf{x} \in \mathcal{X}} \max\big( 0, \|f(\mathbf{x}) - f(\mathbf{x}^+)\|^2_2 - \|f(\mathbf{x}) - f(\mathbf{x}^-)\|^2_2 + \alpha \big)$$ where $\mathbf{x}$ is an anchor input (e.g. an image of a specific person), $\mathbf{x}^+$ is a positive sample (meaning that it belongs to the same class as $\mathbf{x}$) and $\mathbf{x}^-$ is a negative sample (i.e. from a different class). $\alpha$ is a margin that is enforced between positive and negative pairs.

[Figure: triplet-loss.png] The Triplet Loss minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity.
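The formula above can be sketched in NumPy. This is a minimal illustration on precomputed embedding vectors; the embedding network $f$ and the summation over the training set are omitted, and the function name and margin default are our own choices, not from the paper:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss on embedding vectors (last axis = embedding dim).

    A sketch of the formula above; in FaceNet the embeddings come
    from a deep network f and are L2-normalized before this step.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # ||f(x) - f(x+)||^2
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # ||f(x) - f(x-)||^2
    # hinge: only triplets violating the margin contribute
    return np.maximum(0.0, d_pos - d_neg + alpha)
```

An "easy" triplet, where the negative is already farther away than the positive by more than the margin, yields zero loss and therefore zero gradient, which is what motivates the triplet-selection question below.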

What is an important limitation of the triplet loss function?


It requires large batches, on the order of a few thousand examples. In order to have meaningful positive and negative pairs, a minimal number of samples of each class needs to be present in every batch.

In triplet loss, how are the anchor $\mathbf{x}$, positive $\mathbf{x}^+$ and negative $\mathbf{x}^-$ triplets selected?


Using large batches, all anchor-positive pairs are selected. Negatives are selected such that $\|f(\mathbf{x}) - f(\mathbf{x}^+)\|^2_2 < \|f(\mathbf{x}) - f(\mathbf{x}^-)\|^2_2$. These negatives are called semi-hard: they are farther from the anchor than the positive, but still lie inside the margin $\alpha$, so they contribute a positive loss.
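This selection rule can be sketched as follows. The function name and the strategy of returning the hardest candidate among the semi-hard ones are illustrative assumptions; the paper performs this mining online within each mini-batch:

```python
import numpy as np

def select_semi_hard_negative(anchor, positive, candidates, alpha=0.2):
    """Return the index of a semi-hard negative among candidate
    embeddings (rows), or None if no candidate qualifies.

    Semi-hard: farther from the anchor than the positive, but still
    within the margin alpha, so the triplet has a positive loss.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - candidates) ** 2, axis=1)
    mask = (d_neg > d_pos) & (d_neg < d_pos + alpha)
    if mask.any():
        idx = np.where(mask)[0]
        # illustrative choice: the hardest (closest) semi-hard negative
        return int(idx[np.argmin(d_neg[idx])])
    return None
```

Negatives closer to the anchor than the positive ("hard" negatives) are excluded by the first condition; the paper notes that selecting the hardest negatives can lead to bad local minima early in training.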

Machine Learning Research Flashcards is a collection of flashcards associated with scientific research papers in the field of machine learning. Best used with Anki or Obsidian.