math.md 11 KB

Color Difference Calculation

To determine how much a pokemon resembles a given color, we need some notion of the difference between colors, which can then be averaged over all the pixels in the pokemon's sprite. There is extensive prior art on this, most of which centers around selecting a perceptually uniform color space, and then simply using the Euclidean distance between the two colors within the space.

For ease of calculation, we use the square of the Euclidean distance. Put more explicitly, the distance metric used is

where q is the target color, and P is the set of pixels in the image. For notational convenience, when q and p are shown with a vector arrow, they refer to the embedding of the actual represented color in some chosen color space.

Statistical Justification

The choice to use the squared norm arises primarily for ease of calculation, but it does have an alternative statistical interpretation. For each sprite, consider P to instead be a set of observed colors, and q to be a proposed estimator of the "average" of those colors (where that is intentionally left ill-defined for the moment).

To evaluate how good of an estimator q is of the "average" (or expectation) of P, we can use the root-mean-square error.

Of course, q - P doesn't really correspond to a meaningful value mathematically. What we actually mean is the color difference between q and each pixel in P, which as above we can perform as the Euclidean distance between the vectors in a color space that reasonably approximates human perception (e.g., CIELUV).

Continuing this abuse of notation, we can replace q - P with the Euclidean norm, and now the expectation can be expressed as a sum over P.

Where f is the probability mass function of the uniform distribution over P, namely, f(p) = 1 / |P|. Since sqrt is monotonic over the values we care about here, and f(p) is constant, we can instead choose to minimize RMS(q) by minimizing the value

This is equivalent to the original D(q, P) above. The main benefit of this alternative formulation is it allows alternative intuition - for a fixed q, we seek the sprite P with a minimum standard deviation in its color difference from q.

Implementation

The D(q, P) metric above can be further manipulated to ease computation, since as it stands it involves calculating a potentially expensive operation for every pixel in every sprite, to find the best sprite for a single color.

In the following derivation, subscripts denote vector components.

The most important steps above are in the last 3 lines, where the second term is rearranged into a dot product by way of swapping the order of summation.

Since q is fixed when minimizing D(q, P), and the squared norm of the q vector is always positive, the final term can actually be dropped. Additionally, we define two more functions - X(P) and Y(P) to simplify notation as follows.

X(P) is therefore the mean squared norm of the colors of the sprite, and Y(P) is the mean of all colors in the sprite. These can both be pre-computed for each sprite, and produce only n+1 values - X(P) and the n components of Y(P), which is 3 for all color spaces of interest.

This is the purpose of ingest.py, which converts each png file in the given directory into these 4 values, and produces more easily-consumable databases.

Finally, while the factor of 2 on the second term of D(q, P) arises naturally from the original distance metric, it can be made into a tuning parameter, which will be called c. Increasing the value of c thus forces the metric to prefer images that are more in line with the strongest components of q. Taking c too low causes the metric to prefer images with more black pixels (minimizing X(P)), while taking c too high causes the metric to prefer images with more white pixels (maximizing Y(P)).

In addition to tweaking c, nearness implementations allow normalizing q and Y(P) in order to focus more on "angle" between the target color and the mean color of the image. Then, either c can be raised to offset the reduced value of q · Y(P), or the X(P) term can simply be omitted.

Thus, the final difference metric that is actually minimized is

Color Spaces

The above difference metric is independent of color space, provided Euclidean distances align to color differences. The default behavior uses sRGB, which works well enough but does produce some strange results. The application also uses the CIECAM02 color space, a more perceptually uniform color space than standard RGB. Because the J, a, and b components of the CAM02 colors can be compared with the same Euclidean distance, all that is needed is to calculate new X(P) and Y(P) values from the converted RGB pixels, and then to also convert q to this space before calculation. The ingest.py script generates database-cam02.csv, which has these modified X and Y values, and the nearness implementations allow the use of these values instead of the RGB-based database.