Relative Attribute Rank
Relative ranks are an approach to move from a binary classification to a continuous ranking of attributes, which can be used in recognition tasks (e.g. someone is smiling 70%) or in generative tasks (e.g. generate speech with 20% emotional valence and 70% emotional intensity).
The authors describe how earlier work either used binary classifications or utilized relative similarities between instances to facilitate learning (e.g. a zebra has a similar texture to a crosswalk, so the model can learn a shared representation). Ranking relative attributes allows the model to learn any relative similarities between instances without stating them explicitly.
Approach¶
The authors employ two training sets \(O\) and \(S\) for a given attribute \(a_m\), each containing pairs of samples. \(S\) contains pairs whose samples have similar intensity in \(a_m\); \(O\) contains pairs where the second instance has a stronger presence of \(a_m\) than the first. The goal is to find a ranking vector \(w\) which, when multiplied with a sample \(x\), …
- assigns similar values to both samples of a pair in \(S\), and
- assigns a greater value to the second instance of each pair in \(O\) than to the first.
The ranking vector \(w\) is found using an adapted [[Support Vector Machine]] approach:
An SVM finds a decision boundary for a given training set that not only separates the samples, but does so optimally by choosing the boundary with the maximal margin to each group. In contrast, the ranking vector is trained to maximize the margin between samples in the direction of their ordering. The authors note that the vector orthogonal to an SVM's decision boundary, \(w_b\), is not necessarily the best ranking vector \(w_m\).
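The ordered-pair objective can be sketched with a plain subgradient solver for the pairwise hinge loss. This is a minimal illustration under assumptions of mine, not the authors' actual solver: the similarity pairs in \(S\) are omitted for brevity, and all names and hyperparameters are made up.

```python
import numpy as np

def fit_ranking_vector(X, ordered_pairs, C=10.0, lr=0.01, epochs=500):
    """Learn w so that w @ X[j] > w @ X[i] + 1 for each ordered pair (i, j).

    Minimizes 0.5 * ||w||^2 + C * sum(max(0, 1 - w @ (X[j] - X[i])))
    by fixed-step subgradient descent (sketch only; S-pairs ignored).
    """
    w = np.zeros(X.shape[1])
    diffs = np.array([X[j] - X[i] for i, j in ordered_pairs])
    for _ in range(epochs):
        margins = diffs @ w                 # w @ (x_j - x_i) per pair
        violated = margins < 1.0            # pairs inside the margin
        grad = w - C * diffs[violated].sum(axis=0)
        w -= lr * grad
    return w

# Toy data: the attribute strength is (roughly) the first feature.
X = np.array([[0.1, 0.3], [0.5, -0.2], [0.9, 0.1]])
pairs = [(0, 1), (1, 2), (0, 2)]            # (weaker, stronger)
w = fit_ranking_vector(X, pairs)
scores = X @ w                              # ranks recover the ordering
```

The ordering constraints are reduced to hinge losses on difference vectors \(x_j - x_i\), which is the standard RankSVM reduction the paper's formulation builds on.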

Zero-shot Learning¶
The authors apply their ranking vector to a zero-shot learning problem. They divide their dataset into two parts: a set \(S\) of categories seen during training (unrelated to the similarity set \(S\) above) and a set \(U\) of categories unseen during training. The model should predict the relative strength of an attribute for samples in \(U\) without having seen any of them. Because instances of both the seen and unseen categories have been ranked by human supervision beforehand, the category of each instance in the unseen set \(U\) can be related to the categories of \(S\) for a given attribute \(a_m\) in one of the following ways:
- \(c_{i}^{(s)} > c_{j}^{(u)} > c_{k}^{(s)}\): The unseen category ranks between two seen categories.
- \(c_{i}^{(s)} > c_{j}^{(u)}\): The unseen category ranks lower than a seen category.
- \(c_{j}^{(u)} > c_{k}^{(s)}\): The unseen category ranks higher than a seen category.
- The unseen category has not been ranked in regards to \(a_m\).
For case (1), the relative strength of attribute \(a_m\) is set midway between the mean attribute values of the two seen categories. For case (2), the authors compute the average rank distance \(d_m\) of attribute \(a_m\) between all seen classes and subtract \(d_m\) from the mean attribute value of the seen class; for case (3), \(d_m\) is added instead. For case (4), the attribute strength is sampled at random from the distribution of \(a_m\) across all seen samples. This way, for each attribute \(a_m\), a strength vector can be found which represents the relative presence of the attribute in the samples of the novel category.
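The four cases can be sketched as a small helper. This is a hedged illustration: the function and argument names are my own, and \(d_m\) is computed here simply as the mean gap between consecutive sorted class means, which may differ from the paper's exact definition.

```python
import random

def infer_unseen_strength(seen_means, relation, seen_samples=None, rng=None):
    """Assign a relative attribute strength to an unseen category.

    seen_means: per-class mean attribute ranks of the seen classes.
    relation: one of
        ("between", mu_i, mu_k)  # case 1: ranked between two seen classes
        ("below",   mu_i)        # case 2: ranked lower than a seen class
        ("above",   mu_k)        # case 3: ranked higher than a seen class
        ("unranked",)            # case 4: no relation given
    """
    ms = sorted(seen_means)
    # average rank distance d_m between consecutive seen classes (assumption)
    d_m = sum(b - a for a, b in zip(ms, ms[1:])) / (len(ms) - 1)
    kind = relation[0]
    if kind == "between":
        return (relation[1] + relation[2]) / 2   # midpoint of the two means
    if kind == "below":
        return relation[1] - d_m                 # one average gap lower
    if kind == "above":
        return relation[1] + d_m                 # one average gap higher
    rng = rng or random
    return rng.choice(seen_samples or seen_means)  # draw from seen values

means = [1.0, 2.0, 4.0]                          # d_m = 1.5
infer_unseen_strength(means, ("between", 2.0, 4.0))  # midpoint: 3.0
infer_unseen_strength(means, ("below", 1.0))         # 1.0 - 1.5 = -0.5
```

With these per-attribute strengths in place, an unseen sample can be scored by comparing its predicted ranks against the inferred strength vector of each novel category.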
The authors show that this method outperforms other contemporary approaches, especially when the number of relations from the unseen to the seen categories is large.