To determine how much a pokemon resembles a given color, we need some notion of the difference between colors, which can then be averaged over all the pixels in the pokemon's sprite. There is extensive prior art on this, most of which centers around selecting a perceptually uniform color space, and then simply using the Euclidean distance between the two colors within the space.
For ease of calculation, we use the square of the Euclidean distance. Put more explicitly, the distance metric used is
where q
is the target color, and P
is the set of pixels in the image. For
notational convenience, when q
and p
are shown with a vector arrow, they
refer to the embedding of the actual represented color in some chosen color
space.
The choice to use the squared norm arises primarily for ease of calculation,
but it does have an alternative statistical interpretation. For each sprite,
consider P
to instead be a set of observed colors, and q
to be a proposed
estimator of the "average" of those colors (where that is intentionally left
ill-defined for the moment).
To evaluate how good of an estimator q
is of the "average" (or expectation)
of P
, we can use the root-mean-square error.
Of course, q - P
doesn't really correspond to a meaningful value
mathematically. What we actually mean is the color difference between q
and
each pixel in P
, which as above we can perform as the Euclidean distance
between the vectors in a color space that reasonably approximates human
perception (e.g., CIELUV).
Continuing this abuse of notation, we can replace q - P
with the Euclidean
norm, and now the expectation can be expressed as a sum over P
.
Where f
is the probability mass function of the uniform distribution over
P
, namely, f(p) = 1 / |P|
. Since sqrt
is monotonic over the values we
care about here, and f(p)
is constant, we can instead choose to minimize
RMS(q)
by minimizing the value
This is equivalent to the original D(q, P)
above. The main benefit of this
alternative formulation is it allows alternative intuition - for a fixed
q
, we seek the sprite P
with a minimum standard deviation in its color
difference from q
.
The D(q, P)
metric above can be further manipulated to ease computation,
since as it stands it involves calculating a potentially expensive operation
for every pixel in every sprite, to find the best sprite for a single color.
In the following derivation, subscripts denote vector components.
The most important steps above are in the last 3 lines, where the second term is rearranged into a dot product by way of swapping the order of summation.
Since q
is fixed when minimizing D(q, P)
, and the squared norm of the q
vector is always positive, the final term can actually be dropped. Additionally,
we define two more functions - X(P)
and Y(P)
to simplify notation as follows.
X(P)
is therefore the mean squared norm of the colors of the sprite, and
Y(P)
is the mean of all colors in the sprite. These can both be pre-computed
for each sprite, and produce only n+1
values - X(P)
and the n
components
of Y(P)
, which is 3 for all color spaces of interest.
This is the purpose of ingest.py
, which converts each png file in the given
directory into these 4 values, and produces more easily-consumable databases.
Finally, while the factor of 2 on the second term of D(q, P)
arises naturally
from the original distance metric, it can be made into a tuning parameter, which
will be called c
. Increasing the value of c
thus forces the metric to prefer
images that are more in line with the strongest components of q
. Taking c
too low causes the metric to prefer images with more black pixels (minimizing
X(P)
), while taking c
too high causes the metric to prefer images with more
white pixels (maximizing Y(P)
).
In addition to tweaking c
, nearness implementations allow normalizing q
and
Y(P)
in order to focus more on "angle" between the target color and the mean
color of the image. Then, either c
can be raised to offset the reduced value
of q · Y(P)
, or the X(P)
term can simply be omitted.
Thus, the final difference metric that is actually minimized is
The above difference metric is independent of color space, provided Euclidean
distances align to color differences. The default behavior uses sRGB, which works
well enough but does produce some strange results. The application also uses the
CIECAM02 color space, a more perceptually
uniform color space than standard RGB. Because the J, a, and b components of the
CAM02 colors can be compared with the same Euclidean distance, all that is needed
is to calculate new X(P)
and Y(P)
values from the converted RGB pixels,
and then to also convert q
to this space before calculation. The ingest.py
script generates database-cam02.csv
, which has these modified X
and Y
values,
and the nearness implementations allow the use of these values instead of the
RGB-based database.