Pokemon Color Search

Utility for quickly finding pokemon by the sprite's "distance" from a given color.

Included Files

  • tools/download.py downloads the pokedex and images, populating the images directory by default. Only run this if you want to download a lot of images. Requires aiohttp.
  • tools/analyze.py runs statistics on the downloaded images for the entire pokedex, producing a database file (and an ingest record for debugging). This database is used by the UI for actual sorting. There is already a generated database provided as part of this repo, and running this script is unnecessary. Requires reasonably up to date versions of numpy, scipy, and Pillow.
  • web/index.html along with the web directory serve as a UI for doing pokemon search. Fuse.js is loaded statically from unpkg. Designed to work when served or browsed to as a local file.
  • data directory has some legacy data and the latest pokedex.

Problem Statement

Given a color q and a collection of sets of colored pixels, determine which set is "most similar" to q. Put another way, find a measure that scores each set of pixels I according to a "perceptual distance" from q.

Color difference measurement is an active area of study. While there are many strategies, the most common approach uses so-called uniform color spaces, in which the perceptual difference between colors correlates strongly with their Euclidean distance.

This project uses the OKLab color space, which is simple to compute but still largely perceptually uniform. The core of the measure should therefore be the Euclidean distance. A related vector comparison method, cosine similarity (and by extension, angular difference), is another good candidate. Additionally, comparing certain correlates directly may be useful, especially chroma and hue, which are derived from the a and b correlates in OKLab.

As a side note, the units for any of these distances discussed below can be a bit fuzzy - ideally the JND (just-noticeable-difference) of OKLab could be better studied and these distances could be expressed in those terms. But specific values are less important for this project than relative comparisons.

But this choice of metric does not solve the whole problem - how do you take the distance between a point and a set of points?

Distance to Mean - δ

The simplest useful metric is to simply take the centroid of the set I (the arithmetic mean of all points in I), call this μ(I), and assign

δ(q, I) = μ(I) - q

This δ is a vector-valued function, and ||δ||, the Euclidean/L2 norm of δ, is the distance from the target to the centroid. This is cheap to compute, since μ(I) can be pre-computed for each set. When sorting, the square root could be skipped, since it is monotonically increasing and so preserves the ordering, but the UI does not do this, in the interest of giving slightly more intuitive values.

Additionally, the individual channels of δ can be used to compare the individual channels of the image with those of the target color. The absolute values of these differences are defined as ΔL, Δa, and Δb. Notably, only the first of these has a strongly intuitive meaning, and the ΔC and Δh defined below may be more useful than Δa and Δb.
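As a rough sketch of how this might look in code (numpy, with `pixels` assumed to be an (N, 3) array of OKLab values and `q` a length-3 target color; names and signatures are illustrative, not the repo's actual ingest code):

```python
import numpy as np

def delta(q, pixels):
    """δ(q, I) and its norm, for a length-3 OKLab target q and an (N, 3) pixel array."""
    mu = pixels.mean(axis=0)        # centroid μ(I); precomputable once per image
    d = mu - q                      # δ(q, I); the absolute components give ΔL, Δa, Δb
    return d, np.linalg.norm(d)     # ||δ|| is the distance actually sorted on
```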

Average of Distance to Each Point

In some sense, the most perfect metric would be to calculate the Euclidean distance from q to each point p in I, and take the arithmetic mean of these distances. This involves a large sum of square roots, which is hard to do meaningful precomputation for, and makes calculation prohibitively expensive. For this reason, this distance metric is not used. Instead, there is an approximation of it in the next section.

Additionally, the geometric median of the set would be the point minimizing this value. While there is no closed form for this measure, an approximation can be computed and target colors can be compared to it. Thanks to the relative perceptual uniformity of OKLab, actual testing has borne out that this is not a significantly better metric than δ above. However, since this was only determined after all these approximations were calculated, it is still present in the data. Additionally, it is slightly better at finding a color that nicely "summarizes" an image without falling prey to outliers, so the UI displays (the hex code of) this median. There are no additional metrics using this value for the time being.
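How that approximation is computed is not spelled out here; one common choice is Weiszfeld's iteration, sketched below purely for illustration (same assumed `pixels` array as above; the function name and parameters are illustrative, not the repo's actual code).

```python
import numpy as np

def geometric_median(pixels, iters=100, eps=1e-9):
    """Approximate geometric median of an (N, 3) point set via Weiszfeld's iteration."""
    x = pixels.mean(axis=0)                                            # start from the centroid
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(pixels - x, axis=1), eps)        # avoid division by zero
        w = 1.0 / d
        x_new = (pixels * w[:, None]).sum(axis=0) / w.sum()            # weighted re-estimate
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
    return x
```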

Average Square Distance to Each Point - Ψ

Rather than taking the arithmetic mean of the Euclidean distances to the target color, the quadratic mean (also called root-mean-square) gives a useful result while being comparatively trivial to compute.

Assign the distance metric

Ψ(q, I) = sqrt(sum(||p-q||² ∀ p ∈ I) / |I|)

where |I| is the cardinality of the set. This is essentially the square root of the Fréchet variance of the image relative to our target color q. As with δ, the square root can be ignored during the sort, as it is monotonically increasing, but it is not ignored on the UI, to give a more useful final value.
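For reference, a direct (unoptimized) numpy translation of this definition might look like the sketch below; the form actually useful for sorting, built from precomputed statistics, falls out of the derivation that follows.

```python
import numpy as np

def psi_bruteforce(q, pixels):
    """Root-mean-square Euclidean distance from q to every pixel (direct definition)."""
    sq_dists = np.sum((pixels - q) ** 2, axis=1)   # ||p - q||² for every p in I
    return np.sqrt(sq_dists.mean())
```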

However, in the interest of having more options in the UI, this metric is actually defined for each channel of the image (L, a, and b). More precisely, for each channel x define a function

xᵣₘₛ(q, I) = sqrt(sum((pₓ - qₓ)² ∀ p ∈ I) / |I|)

where vₓ is the x channel of the color v. Now redefine Ψ to be a new vector-valued function

Ψ(q, I) = (Lᵣₘₛ(q, I), aᵣₘₛ(q, I), bᵣₘₛ(q, I))

which has a Euclidean distance equal to the original Ψ definition above. This will be proved below, but first there is some algebraic manipulation to be done:

xᵣₘₛ(q, I)     = sqrt(sum((pₓ - qₓ)² ∀ p ∈ I) / |I|)
xᵣₘₛ(q, I)²    = sum((pₓ - qₓ)² ∀ p ∈ I) / |I|
xᵣₘₛ(q, I)²|I| = sum((pₓ - qₓ)² ∀ p ∈ I)
               = sum((pₓ² + qₓ² - 2qₓ*pₓ) ∀ p ∈ I)
               = sum(pₓ² ∀ p ∈ I)       + sum(qₓ² ∀ p ∈ I) - 2*sum(qₓ*pₓ ∀ p ∈ I)
               = sum(pₓ² ∀ p ∈ I)       + |I|*qₓ²          - 2qₓ*sum(pₓ ∀ p ∈ I)
xᵣₘₛ(q, I)²    = sum(pₓ² ∀ p ∈ I) / |I| + qₓ²              - 2qₓ*sum(pₓ ∀ p ∈ I) / |I|

Note that sum(pₓ ∀ p ∈ I) / |I| is just the x channel of the centroid μ(I). Two terms can now be added to complete a square.

xᵣₘₛ(q, I)² = sum(pₓ² ∀ p ∈ I) / |I| + qₓ² - 2qₓ*sum(pₓ ∀ p ∈ I) / |I|
xᵣₘₛ(q, I)² = sum(pₓ² ∀ p ∈ I) / |I| + qₓ² - 2qₓ*μₓ(I)
xᵣₘₛ(q, I)² = sum(pₓ² ∀ p ∈ I) / |I| - μₓ²(I) + μₓ²(I) + qₓ² - 2qₓ*μₓ(I)
xᵣₘₛ(q, I)² = sum(pₓ² ∀ p ∈ I) / |I| - μₓ²(I) + (μₓ(I) - qₓ)²
xᵣₘₛ(q, I)² = sum(pₓ² ∀ p ∈ I) / |I| - μₓ²(I) + (μ(I) - q)ₓ²
xᵣₘₛ(q, I)² = sum(pₓ² ∀ p ∈ I) / |I| - μₓ²(I) + δₓ²(q, I)

Where that last step requires noting that μₓ(I) - qₓ is just the difference from the centroid in the channel x, the x component of δ(q, I).

To further clean this up, define a new function σₓ

σₓ²(I) = sum(pₓ² ∀ p ∈ I) / |I| - μₓ²(I)

and simplify xᵣₘₛ to

xᵣₘₛ(q, I)² = σₓ²(I) + δₓ²(q, I)

and find a simpler form of the ||Ψ(q, I)|| metric as

Ψ(q, I) = (Lᵣₘₛ(q, I), aᵣₘₛ(q, I), bᵣₘₛ(q, I))
||Ψ(q, I)||² = sum(xᵣₘₛ(q, I)² ∀ x ∈ {L, a, b})
||Ψ(q, I)||² = sum(σₓ²(I) + δₓ²(q, I) ∀ x ∈ {L, a, b})
||Ψ(q, I)||² = sum(σₓ²(I) ∀ x ∈ {L, a, b}) + sum(δₓ²(q, I) ∀ x ∈ {L, a, b})
||Ψ(q, I)||² = ||σ(I)||² + ||δ(q, I)||²

where σ(I) is a vector-valued function, the components of which are the σₓ values.

The choice of symbol is no accident - σₓ² is of course the variance in each channel of the image, and the square root is the standard deviation. This σₓ can be pre-computed for each channel x for each set I, enabling each channel-specific RMS deviation to be used for sorting, as well as the overall ||Ψ(q, I)|| metric above.
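A sketch of what that precomputation and the resulting sort-time evaluation could look like, under the same `pixels`/`q` assumptions as above (again illustrative, not the repo's actual code):

```python
import numpy as np

def precompute_stats(pixels):
    """Per-image statistics: centroid μ(I) and per-channel standard deviation σ(I)."""
    return pixels.mean(axis=0), pixels.std(axis=0)   # np.std is the population form, matching σₓ

def psi_from_stats(q, mu, sigma):
    """Per-channel RMS deviations and ||Ψ(q, I)||, using only precomputed statistics."""
    d = mu - q                                             # δ(q, I)
    channel_rms = np.sqrt(sigma ** 2 + d ** 2)             # (L_rms, a_rms, b_rms)
    total = np.sqrt(np.sum(sigma ** 2) + np.sum(d ** 2))   # ||Ψ|| = sqrt(||σ||² + ||δ||²)
    return channel_rms, total
```

For any `pixels` and `q`, `psi_from_stats(q, *precompute_stats(pixels))[1]` should match the brute-force RMS from the earlier sketch, which is exactly the equivalence shown next.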

The alternative meaning of that Ψ metric, the root-mean-square of the color distance to each pixel in I, can be shown as

sqrt(sum(||p-q||² ∀ p ∈ I) / |I|)
= sqrt(sum((p-q)⋅(p-q) ∀ p ∈ I) / |I|)
= sqrt(sum((p⋅p + q⋅q - 2q⋅p) ∀ p ∈ I) / |I|)
= sqrt((sum(p⋅p ∀ p ∈ I) + sum(q⋅q ∀ p ∈ I) - 2*sum(q⋅p ∀ p ∈ I)) / |I|)
= sqrt((sum(p⋅p ∀ p ∈ I) + |I|q⋅q - 2q⋅sum(p ∀ p ∈ I)) / |I|)
= sqrt((sum(p⋅p ∀ p ∈ I) / |I|) + q⋅q - 2q⋅μ(I))
= sqrt((sum(p⋅p ∀ p ∈ I) / |I|) - μ(I)⋅μ(I) + μ(I)⋅μ(I) + q⋅q - 2q⋅μ(I))
= sqrt((sum(p⋅p ∀ p ∈ I) / |I|) - μ(I)⋅μ(I) + ||μ(I) - q||²)
= sqrt((sum(sum(pₓ² ∀ x ∈ {L, a, b}) ∀ p ∈ I) / |I|) - μ(I)⋅μ(I) + ||δ(q, I)||²)
= sqrt((sum(sum(pₓ² ∀ p ∈ I) ∀ x ∈ {L, a, b}) / |I|) - μ(I)⋅μ(I) + ||δ(q, I)||²)
= sqrt((sum(sum(pₓ² ∀ p ∈ I) / |I| ∀ x ∈ {L, a, b})) - μ(I)⋅μ(I) + ||δ(q, I)||²)
= sqrt((sum(sum(pₓ² ∀ p ∈ I) / |I| ∀ x ∈ {L, a, b})) - sum(μₓ²(I) ∀ x ∈ {L, a, b}) + ||δ(q, I)||²)
= sqrt(sum(sum(pₓ² ∀ p ∈ I) / |I| - μₓ²(I) ∀ x ∈ {L, a, b}) + ||δ(q, I)||²)
= sqrt(sum(σₓ²(I) ∀ x ∈ {L, a, b}) + ||δ(q, I)||²)
= sqrt(||σ(I)||² + ||δ(q, I)||²)
= ||Ψ(q, I)||

A Note on Terminology

The value σ is referred to as the standard deviation of the image. While the components of the vector are definitely standard deviations in their channels (and the norm of the vector is the root of a sum of variances, which is itself a standard deviation of the combined distribution), the vector itself only has intuitive meaning under a hidden assumption.

Consider the definition of statistical variance of a distribution X.

Var(X) = E[X²] - E[X]²

What could a definition of Var(I) be? If we treat I as a uniform distribution over the pixels it contains, we get the following definition of expectation.

E[I] = sum(p ∀ p ∈ I) / |I| = μ(I)

But, this is a vector-valued function. How can we "square" it? Here is where the choice of definition comes into play.

Define x² = x⋅x = ||x||². This is not far off from how some geometric algebras define the square of a vector!

Define E[I²] = sum(p² ∀ p ∈ I) / |I|. This is usually how the second moment is defined for discrete uniform probability distributions anyway, but it is worth stating explicitly here.

With just these definitions we can now compute the variance of the "distribution" I as follows:

Var(I) = E[I²] - E[I]²
       = sum(p² ∀ p ∈ I) / |I| - (sum(p ∀ p ∈ I) / |I|)²
       = sum(p² ∀ p ∈ I) / |I| - μ²(I)
       = sum(||p||² ∀ p ∈ I) / |I| - ||μ(I)||²
       = ||σ(I)||²
       = σ²(I)

And so we recover our definition of σ² from before. If this definition of a vector square is accepted, then it is accurate to say that σ² is the variance in color of the pixel set. But this distinction is important, since otherwise there is a risk of using this value imprecisely.

For example, it is not accurate to say σ² is the variance of the norms of the pixels, because it actually uses the norm of the mean and not the mean of the norms. σ² is first and foremost a computational tool, and any statistical meaning (or lack thereof) does not have any bearing on the performance of the distance metric.
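For what it's worth, the convention is easy to sanity-check numerically. The snippet below is a throwaway check with random stand-in data, not part of the ingest:

```python
import numpy as np

# With the vector-square convention x² = ||x||², Var(I) = E[||p||²] - ||μ(I)||²
# should equal the sum of the per-channel variances, i.e. ||σ(I)||².
rng = np.random.default_rng(0)
pixels = rng.random((1000, 3))                   # random stand-in for an image in OKLab
lhs = np.mean(np.sum(pixels ** 2, axis=1)) - np.sum(pixels.mean(axis=0) ** 2)
rhs = np.sum(pixels.var(axis=0))                 # ||σ(I)||²
assert np.isclose(lhs, rhs)
```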

Cosine Similarity - φ and Θ

The cosine similarity of two vectors is defined as the cosine of the angle between them, and is computed as the dot product of the two normalized vectors. To convert a measure of similarity into one of difference, subtract the similarity from 1. Alternatively, take the inverse cosine to get the actual angle between the vectors, and minimize that.

One option is to point this at the centroid μ(I) immediately, and define

φ(q, I) = acos((μ(I)/||μ(I)||) ⋅ (q/||q||))

The normalizations of the centroid and target are of course easy to compute before the sort, leaving only a dot product and inverse cosine.

Like with δ and Ψ, this metric cannot be directly averaged over the whole set, since a sum of inverse cosines will be expensive to compute. However, the cosine difference measure itself linearizes well, and an arithmetic mean can be used instead of a root-mean-square. One potential metric could then be

sum(1 - (p/||p||)⋅(q/||q||) ∀ p ∈ I) / |I|
= |I|/|I| - sum((p/||p||)⋅(q/||q||) ∀ p ∈ I) / |I|
= 1 - sum((p/||p||)⋅(q/||q||) ∀ p ∈ I) / |I|
= 1 - sum(p/||p|| ∀ p ∈ I)⋅(q/||q||) / |I|
= 1 - (sum(p/||p|| ∀ p ∈ I) / |I|)⋅q/||q||
= 1 - τ(I)⋅q/||q||

where we define a new vector τ(I), called the "tilt" since it is a measure of how "off-center" the set is.

τ(I) = sum(p/||p|| ∀ p ∈ I) / |I|

This is similar to a spherical mean of the set, but without the final normalization. This can of course be pre-computed for each set I.

But, this metric is remarkably similar to a cosine difference in its own right. The τ(I)⋅q/||q|| term is, after all, the arithmetic mean of the cosines of the angles between q and each pixel in I. These angles only ever lie in the range [0, π], meaning the cosines lie in [-1, 1], and therefore so does their mean. This prompts the question - could we simply take the inverse cosine of this mean of cosines? That would be a quasi-arithmetic mean, with f equal to cosine.

Since cosine is monotonically decreasing over the range in question, if I is treated as a distribution of angles (from q), it is actually possible to reason about the expectation of the cosine of that distribution. The details are left to the reader, but a probability density function can even be defined in terms of the density function of I, which of course is unknown, but which should tend toward normal for larger and larger I. If that is the case, the expectation of this cosine distribution can be approximated, in terms of the mean and variance of the angle distribution, by a horrible integral.

If this integral were solvable, we could compare it to the mean of the angle distribution and see how the error trends.

Alternatively, just take the inverse cosine, look at the results, and see how it does. And it does well! Well enough to skip all the cosine distance nonsense, and define the metric as just

Θ(q, I) = acos(τ(I)⋅q/||q||)

which is typically given in degrees in the UI for readability.
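A sketch of the tilt precomputation and the two angular metrics, under the same assumptions as the earlier snippets (nonzero OKLab pixels in an (N, 3) array; illustrative only):

```python
import numpy as np

def tilt(pixels):
    """τ(I): the mean of the unit vectors of every pixel (precomputable per image)."""
    norms = np.linalg.norm(pixels, axis=1, keepdims=True)   # assumes no pixel is exactly (0, 0, 0)
    return np.mean(pixels / norms, axis=0)

def phi(q, mu):
    """φ(q, I): angular difference between the target and the centroid, in radians."""
    cos_phi = (mu / np.linalg.norm(mu)) @ (q / np.linalg.norm(q))
    return np.arccos(np.clip(cos_phi, -1.0, 1.0))

def theta(q, tau, degrees=True):
    """Θ(q, I): quasi-arithmetic mean angular difference between q and the pixels."""
    cos_mean = np.clip(tau @ (q / np.linalg.norm(q)), -1.0, 1.0)   # clamp for float safety
    angle = np.arccos(cos_mean)
    return np.degrees(angle) if degrees else angle
```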

Correlate Comparison - ΔL, ΔC, Δh

Like most Lab-style color spaces, each pixel in OKLab is composed of a lightness correlate L, a green/red correlate a, and a blue/yellow correlate b. From these there can also be derived a chroma correlate C, and a hue angle h, where C = sqrt(a² + b²) and h = atan2(b, a) (i.e., a transformation to polar coordinates).

As alluded to above, the most useful of these for intuitive comparison are L, C, and h. The components of μ(I) are of course the (arithmetic) mean of the L, a, and b values in the image, called L̅(I), a̅(I), and b̅(I) for simplicity.

The other correlates are more complicated. For each pixel in I, both C and h are computed, and then these values are averaged.

The chroma values can be put into a typical arithmetic mean, giving

C̅(I) = sum(sqrt(a² + b²) ∀ (L, a, b) ∈ I) / |I|

and the natural metric

ΔC(q, I) = |C̅(I) - sqrt(q_a² + q_b²)|

which could also be used to compute an RMS deviation, in a similar way to the other channels of the image. In fact, the variance of the chroma can be very simply defined, thanks to the linearity of the expectation operator

Var[C] = E[C²] - E[C]²
       = E[a² + b²] - C̅²(I)
       = E[a²] + E[b²] - C̅²(I)
       = Var[a] + a̅²(I) + Var[b] + b̅²(I) - C̅²(I)

The variances in a and b are already given by the (square of the) components of the σ vector, and so this could be easily recovered during any potential sort, but for speed this chroma deviation is precomputed.

From the chroma variance, define a new metric in the same way as Ψ, as

Cᵣₘₛ(q, I) = sqrt(Var[C] + ΔC²(q, I))

which is the RMS deviation in chroma from the chroma of the target.
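Under the same assumptions, the chroma statistics and Cᵣₘₛ might be computed as follows (illustrative sketch):

```python
import numpy as np

def chroma_stats(pixels):
    """Mean chroma C̅(I) and chroma variance Var[C], via the identity above."""
    a, b = pixels[:, 1], pixels[:, 2]
    c_mean = np.mean(np.hypot(a, b))                                        # C̅(I)
    c_var = a.var() + a.mean() ** 2 + b.var() + b.mean() ** 2 - c_mean ** 2
    return c_mean, c_var

def chroma_rms(q, c_mean, c_var):
    """Cᵣₘₛ(q, I) = sqrt(Var[C] + ΔC²(q, I))."""
    delta_c = abs(c_mean - np.hypot(q[1], q[2]))                            # ΔC(q, I)
    return np.sqrt(c_var + delta_c ** 2)
```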

Chroma is straightforward, but the hue angles require a circular mean. Using the typical definition of this mean, combined with the definition of h above, define

h̅(I) = atan2(
  sum(sin(atan2(b, a)) ∀ (L, a, b) ∈ I) / |I|,
  sum(cos(atan2(b, a)) ∀ (L, a, b) ∈ I) / |I|
) = atan2(
  sum(b / sqrt(a² + b²) ∀ (L, a, b) ∈ I) / |I|,
  sum(a / sqrt(a² + b²) ∀ (L, a, b) ∈ I) / |I|
) = atan2(
  sum(b / sqrt(a² + b²) ∀ (L, a, b) ∈ I),
  sum(a / sqrt(a² + b²) ∀ (L, a, b) ∈ I)
) = atan2(sum((b, a) / sqrt(a² + b²) ∀ (L, a, b) ∈ I))
  = atan2(sum((b, a) / ||(b, a)|| ∀ (L, a, b) ∈ I))

with a small abuse of notation that atan2((y, x)) = atan2(y, x).

The precomputed mean hue is provided in radians, rotated to lie in the range [0, 2π). The UI displays this value in degrees. Naturally, a hue difference metric can be defined as

Δh(q, I) = min(|h̅(I) - atan2(q_b, q_a)|, 2π - |h̅(I) - atan2(q_b, q_a)|)

where q = (q_L, q_a, q_b). The minimization logic with Δh ensures we use the smallest angle difference. While a circular variance/deviation measure of hue is possible, it is complicated and of limited utility, and is therefore omitted.
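A minimal sketch of the circular mean and the hue difference, under the same assumptions (the target's hue is also rotated into [0, 2π) so the wrap-around comparison is well defined; not necessarily how the actual ingest/UI code handles it):

```python
import numpy as np

def mean_hue(pixels):
    """Circular mean hue h̅(I), rotated into [0, 2π)."""
    a, b = pixels[:, 1], pixels[:, 2]
    chroma = np.hypot(a, b)                        # assumes no pixel has a = b = 0 exactly
    h = np.arctan2(np.mean(b / chroma), np.mean(a / chroma))
    return h % (2 * np.pi)

def delta_h(q, h_mean):
    """Δh(q, I): the smaller of the two angular differences between h̅(I) and q's hue."""
    q_hue = np.arctan2(q[2], q[1]) % (2 * np.pi)   # rotate the target's hue into [0, 2π) as well
    d = abs(h_mean - q_hue)
    return min(d, 2 * np.pi - d)
```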

It would also of course be possible to compare q to the C and h values of the centroid μ(I), but this provides little utility over the Euclidean distance, while also losing some information of the original image. These potential metrics are omitted for this reason, in favor of the higher fidelity ΔC and Δh.

Composite Metric - Ω

Of the above metrics, the subjectively best performing on initial examination were Ψ and Θ.

On certain colors, the RMS deviation leans too dark, as the central location of "black" in the OKLab space causes it to be moderately close to any color that is not exceptionally bright. Additionally, it will prefer highly regular images, sometimes to the detriment of the actual difference in color (e.g., suggesting Piloswine for lavender).

On the other hand, the quasi-mean angular difference has no consideration for highly variable images, which can cause it to be too "forgiving" for certain target colors and to ignore lightness differences entirely for others (e.g., suggesting Copperajah for some bright greens).

All of this is according to subjective and cursory analysis of course, but overall, these two metrics tend to be at least as "accurate" as the simpler distance metrics, and for the vast majority of target colors at least one of these two metrics will "outperform" those simple metrics.

So, why not try to optimize both at the same time? Define a new metric

Ω(q, I) = ω * ||Ψ(q, I)|| * Θ(q, I)
        = ω * sqrt(||σ(I)||² + ||μ(I) - q||²) * acos(τ(I)⋅q/||q||)

composed of both Ψ and Θ (still in radians) and a scaling factor ω to make the final values easier to compare. In testing, this metric has performed excellently, nicely balancing out the bad behavior of both component metrics except for some very specific tricky colors.

What is the meaning of Ω? Obviously it is primarily just a computational tool, but is there any intuitional reason for it to perform well? Imagine the vector Ψ, in some way representing the average deviation of the entire image from the target. Now imagine rotating that vector Θ radians, itself in some way representing the average rotation needed to turn the image into the target color. The vector sweeps out an arc, whose length is ||Ψ|| * Θ, the (unscaled) value of Ω.

The value of ω is arbitrary, and to save an extra computation on the UI is set to 180/π (i.e., the conversion factor to change Θ to degrees).
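Putting the precomputed pieces together, a sketch of Ω under the same assumptions as the earlier snippets:

```python
import numpy as np

OMEGA_SCALE = 180 / np.pi   # ω: chosen so Θ's contribution is effectively in degrees

def omega(q, mu, sigma, tau):
    """Ω(q, I) = ω · ||Ψ(q, I)|| · Θ(q, I), from the precomputed μ(I), σ(I), and τ(I)."""
    psi = np.sqrt(np.sum(sigma ** 2) + np.sum((mu - q) ** 2))              # ||Ψ(q, I)||
    theta = np.arccos(np.clip(tau @ (q / np.linalg.norm(q)), -1.0, 1.0))   # Θ in radians
    return OMEGA_SCALE * psi * theta
```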

Summary

The ingester records the following statistics on each pixel set I

| Statistic | Definition |
| --- | --- |
| Centroid | μ(I) = sum(p ∀ p ∈ I) / \|I\| = (L̅(I), a̅(I), b̅(I)) |
| Tilt | τ(I) = sum(p/\|\|p\|\| ∀ p ∈ I) / \|I\| |
| Std Deviation | σₓ(I) = sqrt(sum(pₓ² ∀ p ∈ I) / \|I\| - μₓ²(I)) ∀ x ∈ {L, a, b} |
| Chroma | C̅(I) = sum(sqrt(a² + b²) ∀ (L, a, b) ∈ I) / \|I\| |
| Chroma Dev | Var[C] = Var[a] + a̅²(I) + Var[b] + b̅²(I) - C̅²(I) |
| Hue | h̅(I) = atan2(sum((b, a) / \|\|(b, a)\|\| ∀ (L, a, b) ∈ I)) |

The size of the set is also included, primarily for cluster metrics described below. In addition, there is the geometric median alluded to above as well as the sRGB hex code of both that median and the centroid, primarily for display purposes.

The search UI provides the following metrics to sort on, for a given target q = (q_L, q_a, q_b)

| Metric | Definition |
| --- | --- |
| "Arc Difference" | Ω(q, I) ∝ \|\|Ψ(q, I)\|\| * Θ(q, I) |
| RMS of the distance to each point | Ψ(q, I) = sqrt(σ²(I) + δ²(q, I)) |
| Distance to the centroid | δ(q, I) = \|\|μ(I) - q\|\| |
| Quasi-mean angular difference from each point | Θ(q, I) = acos(τ(I) ⋅ (q/\|\|q\|\|)) |
| Angular difference from the centroid | φ(q, I) = acos((μ(I)/\|\|μ(I)\|\|) ⋅ (q/\|\|q\|\|)) |
| Difference from the mean lightness | ΔL(q, I) = \|L̅(I) - q_L\| |
| Difference from the mean red/green | Δa(q, I) = \|a̅(I) - q_a\| |
| Difference from the mean blue/yellow | Δb(q, I) = \|b̅(I) - q_b\| |
| Difference from the mean chroma | ΔC(q, I) = \|C̅(I) - sqrt(q_a² + q_b²)\| |
| Difference from the mean hue | Δh(q, I) = min(\|h̅(I) - atan2(q_b, q_a)\|, 2π - \|h̅(I) - atan2(q_b, q_a)\|) |
| RMS of the difference in lightness | Lᵣₘₛ(q, I) = sqrt(σ²(I_L) + ΔL²(q, I)) |
| RMS of the difference in red/green | aᵣₘₛ(q, I) = sqrt(σ²(I_a) + Δa²(q, I)) |
| RMS of the difference in blue/yellow | bᵣₘₛ(q, I) = sqrt(σ²(I_b) + Δb²(q, I)) |
| RMS of the difference in chroma | Cᵣₘₛ(q, I) = sqrt(Var[C] + ΔC²(q, I)) |

Data

TODO but explain animated gifs, back sprites, etc., all combined

Clustering

TODO more detail but

  • k-means
  • choose k to maximize mean(Δa² + Δb² ∀ mean combos)
  • all the above works on clusters
  • size factors
  • whole metric + rescaled cluster metric
  • cluster chosen by metric or stat, default is chroma * size ("visual importance")