No Description

Kirk Trombley 8691602c7b gen8 - note there's one corrupted urshifu image that was skipped 2 years ago
data 8691602c7b gen8 - note there's one corrupted urshifu image that was skipped 2 years ago
legacy 10c247bb9a Organize repo a bit 2 years ago
tools 8691602c7b gen8 - note there's one corrupted urshifu image that was skipped 2 years ago
web 8691602c7b gen8 - note there's one corrupted urshifu image that was skipped 2 years ago
.gitignore 156e2a796b start moving download code to new script 2 years ago
.prettierrc ebed92f2f0 Tweak prettier settings 2 years ago
README.md fad548bb60 Add logic for treating deviations as per-channel 2 years ago
requirements.txt 47f5ef4a69 Add reqs for anim_ingest 3 years ago

README.md

Pokemon Color Search

Utility for quickly finding pokemon by the sprite's "distance" from a given color.

Included Files

  • tools/download.py downloads the pokedex and images, populating the images directory by default. Only run this if you want to download a lot of images. Requires aiohttp.
  • tools/analyze.py runs statistics on the downloaded images for the entire pokedex, producing a database file (and an ingest record for debugging). This database is used by the UI for actual sorting. There is already a generated database provided as part of this repo, and running this script is unnecessary. Requires reasonably up to date versions of numpy, scipy, and Pillow.
  • web/index.html along with the web directory serve as a UI for doing pokemon search. Fuse.js is loaded statically from unpkg. Designed to work when served or browsed to as a local file.
  • data directory has some legacy data and the latest pokedex.

Problem Statement

Given a color q and a collection of sets of colored pixels, determine which set is "most similar" to q. Put another way, find a measure that scores each set of pixels I according to a "perceptual distance" from q.

Color difference measurement is an active area of study. While there are many strategies, the most common uses so-called uniform color spaces, where the perceptual difference between colors correlates strongly with their Euclidean distance.

This project uses the OKLab color space, a simple to compute but still largely perceptually uniform color space. The core of the measure should therefore be the Euclidean distance. A related vector comparison method, cosine similarity (and by extension, angular difference), is another good candidate. Additionally, comparing certain correlates directly may be useful, especially chroma and hue, which are derived from the a and b correlates in OKLab.

But this choice of metric does not solve the whole problem - how do you take the distance between a point and a set of points?

TODO tons of updates now that things are even more complicated TODO explanation of how arccos(mean(cos(x))) vs mean(x) on [0, pi] isn't actually that bad

Distance to Mean - δ

The simplest useful metric is to simply take the centroid of the set I (the arithmetic mean of all points in I), call this μ(I), and assign

δ(q, I) = ||μ(I) - q||

where ||x|| denotes the Euclidean/L2-norm of x. This is cheap to compute, since μ(I) can be pre-computed for each set. When sorting the sets, the square root inherent in the norm can be ignored, since it is monotonically increasing.

Average of Distance to Each Point

In some sense, the most perfect metric would be to calculate the Euclidean distance from q to each point p in I, and take the arithmetic mean of these distances. This involves a large sum of square roots, which is hard to do meaningful precomputation for, and makes calculation prohibitively expensive. For this reason, this distance metric is not used. Instead, there is an approximation of it.

Average Square Distance to Each Point - Ψ

Rather than taking the arithmetic mean of the Euclidean distances to the target color, the quadratic mean (also called root-mean-square) gives a useful result while being comparatively trivial to compute.

Assign the distance metric

Ψ(q, I) = sqrt(sum(||p-q||² ∀ p ∈ I) / |I|)

where |I| is the cardinality of the set. This is essentially the square root of the Fréchet variance, of the image relative to our target color q. As with δ, the square root can be ignored during the sort, as it is monotonically increasing.

This can be further improved through algebraic manipulation:

Ψ(q, I)     = sqrt(sum(||p-q||² ∀ p ∈ I) / |I|)
Ψ(q, I)²    = sum(||p-q||² ∀ p ∈ I) / |I|
Ψ(q, I)²|I| = sum(||p-q||² ∀ p ∈ I)
            = sum((p-q)⋅(p-q) ∀ p ∈ I)
            = sum((p⋅p + q⋅q - 2q⋅p) ∀ p ∈ I)
            = sum(p⋅p ∀ p ∈ I)       + sum(q⋅q ∀ p ∈ I) - 2*sum(q⋅p ∀ p ∈ I)
            = sum(p⋅p ∀ p ∈ I)       + |I|q⋅q           - 2q⋅sum(p ∀ p ∈ I)
Ψ(q, I)²    = sum(p⋅p ∀ p ∈ I) / |I| + q⋅q              - 2q⋅sum(p ∀ p ∈ I) / |I|

Note that sum(p ∀ p ∈ I) / |I| is just the centroid μ(I). Two terms can now be added to complete a square.

Ψ(q, I)² = sum(p⋅p ∀ p ∈ I) / |I| + q⋅q - 2q⋅sum(p ∀ p ∈ I) / |I|
Ψ(q, I)² = sum(p⋅p ∀ p ∈ I) / |I| + q⋅q - 2q⋅μ(I)
Ψ(q, I)² = sum(p⋅p ∀ p ∈ I) / |I| + q⋅q - 2q⋅μ(I) + μ(I)⋅μ(I) - μ(I)⋅μ(I)
Ψ(q, I)² = sum(p⋅p ∀ p ∈ I) / |I| + (μ(I) - q)⋅(μ(I) - q)    - μ(I)⋅μ(I)

Ψ(q, I)² = sum(||p||² ∀ p ∈ I) / |I| - ||μ(I)||² + ||μ(I) - q||²

Note that ||μ(I) - q|| is just the distance to the centroid again. Take the first two terms of this polynomial and define a new function σ as

σ²(I) = sum(||p||² ∀ p ∈ I) / |I| - ||μ(I)||²

This σ can be pre-computed for each set I. Now we can simply define the metric as

Ψ(q, I) = sqrt(σ²(I) + δ²(q, I))

A Note on Terminology

The value σ² is referred to as the variance of the image, and that is why the symbol σ was chosen, but it is important to note this notion has a hidden assumption. The above equations look remarkably similar to calculations of RMS error in a sample, but this is only because of a certain choice of definition.

Consider the definition of statistical variance of a distribution X.

Var(X) = E[X²] - E[X]²

What could a definition of Var(I) be? If we treat I as a uniform distribution over the pixels it contains, we get the following definition of expectation.

E[I] = sum(p ∀ p ∈ I) / |I| = μ(I)

But, this is a vector-valued function. How can we "square" it? Here is where the choice of definition comes into play.

Define x² = x⋅x = ||x||². This not far off from how some geometric algebras define the square of a vector!

Define E[I²] = sum(p² ∀ p ∈ I) / |I|. This is usually how the second moment is defined for discrete uniform probability distributions anyway, but it is worth stating explicitly here.

With just these definitions we can now compute the variance of the "distribution" I as follows:

Var(I) = E[I²] - E[I]²
       = sum(p² ∀ p ∈ I) / |I| - (sum(p ∀ p ∈ I) / |I|)²
       = sum(p² ∀ p ∈ I) / |I| - μ²(I)
       = sum(||p||² ∀ p ∈ I) / |I| - ||μ(I)||²
       = σ²(I)

And so we recover our definition of σ² from before. If this definition of a vector square is accepted, then it is accurate to say that σ² is the variance in color of the pixel set. But this distinction is important, since otherwise there a risk of using this value imprecisely.

For example, it is not accurate to say σ² is the variance of the norms of the pixels, because it actually uses the norm of the mean and not the mean of the norms. σ² is first and foremost a computational tool, and any statistical meaning (or lack thereof) does not have any bearing on the performance of the distance metric.

Cosine Similarity - θ and Ω

The cosine similarity of two vectors is defined as the cosine of the angle between them, and computed as the dot product of two normalized vectors. To convert measure of similarity to one of difference, subtract the similarity from 1. Alternatively, use inverse cosine to get the actual angle between the vectors, and minimize that.

One option is to point this at the centroid μ(I) immediately, and define

θ(q, I) = acos((μ(I)/||μ(I)||) ⋅ (q/||q||))

The normalizations of the centroid and target are of course easy to compute before the sort, leaving only a dot product and inverse cosine.

Like with δ and Ψ, this metric cannot be directly averaged over the whole set, since a sum of inverse cosines will be expensive to compute. However, the cosine difference measure itself linearizes well, and an arithmetic mean can be used instead of a root-mean-square. Assign and simplify another metric

Ω(q, I) = sum(1 - (p/||p||)⋅(q/||q||) ∀ p ∈ I) / |I|
        = |I|/|I| - sum((p/||p||)⋅(q/||q||) ∀ p ∈ I) / |I|
        = 1 - sum((p/||p||)⋅(q/||q||) ∀ p ∈ I) / |I|
        = 1 - sum(p/||p|| ∀ p ∈ I)⋅(q/||q||) / |I|
        = 1 - (sum(p/||p|| ∀ p ∈ I) / |I|)⋅q/||q||

Define a new vector τ(I), called the "tilt" since it is a measure of how "off-center" the set is.

τ(I) = sum(p/||p|| ∀ p ∈ I) / |I|

This is similar to a spherical mean of the set, but without the final normalization. This can of course be pre-computed for each set I, and simplifies the distance function.

Ω(q, I) = 1 - τ(I)⋅q/||q||

Correlate Comparison - ΔL, ΔC, Δh

Like most Lab-style color spaces, each pixel in OKLab is composed of a lightness correlate L, a green/red correlate a, and a blue/yellow correlate b. From these there can also be derived a chroma correlate C, and a hue angle h, where C = sqrt(a² + b²) and h = atan2(b, a) (i.e., a transformation to polar coordinates).

The most useful of these for intuitive comparison are L, C, and h. The first component of μ(I) is the (arithmetic) mean of the L values in the image. This can be called L̅(I) for simplicity.

The other correlates are more complicated. For each pixel in I, both C and h are computed, and then these values are averaged.

The chroma values can be put into a typical arithmetic mean, giving

C̅(I) = sum(sqrt(a² + b²) ∀ (L, a, b) ∈ I) / |I|

but the hue angles require a circular mean. Using the typical definition of this mean, combined with the definition of h above, define

h̅(I) = atan2(
  sum(sin(atan2(b, a)) ∀ (L, a, b) ∈ I) / |I|,
  sum(cos(atan2(b, a)) ∀ (L, a, b) ∈ I) / |I|
) = atan2(
  sum(b / sqrt(a² + b²) ∀ (L, a, b) ∈ I) / |I|,
  sum(a / sqrt(a² + b²) ∀ (L, a, b) ∈ I) / |I|
) = atan2(
  sum(b / sqrt(a² + b²) ∀ (L, a, b) ∈ I),
  sum(a / sqrt(a² + b²) ∀ (L, a, b) ∈ I)
) = atan2(sum((b, a) / sqrt(a² + b²) ∀ (L, a, b) ∈ I))
  = atan2(sum((b, a) / ||(b, a)|| ∀ (L, a, b) ∈ I))

with a small abuse of notation that atan2((y, x)) = atan2(y, x)

The precomputed is provided in radians, and rotated to be in the range [0, 2π). The UI displays this value in

There are three natural distance metrics that can be defined from these values.

ΔL(q, I) = |L̅(I) - q_L|

ΔC(q, I) = |C̅(I) - sqrt(q_a² + q_b²)|

Δh(q, I) = |h̅(I) - atan2(q_b, q_a)| mod π

where q = (q_L, q_a, q_b) and the mod operator is defined "correctly" for non-integers (i.e., will properly choose the minimal angle difference for Δh).

It would also of course be possible to compare q to the L, C, and h values of the centroid μ(I), but this provides little utility over the Euclidean distance, while also losing some information of the original image. These potential metrics are omitted for this reason, in favor of the higher fidelity ΔL, ΔC, and Δh.

Composite Metric - α

TODO but α(q, I) = 100 * sqrt(Ψ(q, I) * Ω(q, I))

Summary

The ingester records the following statistics on each pixel set I

Statistic Definition
Centroid μ(I) = sum(p ∀ p ∈ I) / \|I\| = (L̅(I), a̅(I), b̅(I))
Tilt τ(I) = sum(p/\|\|p\|\| ∀ p ∈ I) / \|I\|
Variance σ²(I) = sum(\|\|p\|\|² ∀ p ∈ I) / \|I\| - \|\|μ(I)\|\|²
Chroma C̅(I) = sum(sqrt(a² + b²) ∀ (L, a, b) ∈ I) / \|I\|
Hue h̅(I) = atan2(sum((b, a) / \|\|(b, a)\|\| ∀ (L, a, b) ∈ I))

The size of the set is also included, primarily for cluster metrics described below, as well as the sRGB hex code of the centroid, primarily for display purposes.

The search UI provides the following metrics to sort on, for a given target q = (q_L, q_a, q_b)

Metric Definition
Distance to the centroid δ(q, I) = \|\|μ(I) - q\|\|
RMS of the distance to each point Ψ(q, I) = sqrt(σ²(I) + δ²(q, I))
Angular difference from the centroid θ(q, I) = acos((μ(I)/\|\|μ(I)\|\|) ⋅ (q/\|\|q\|\|))
Mean cosine difference from each point Ω(q, I) = 1 - τ(I) ⋅ (q/\|\|q\|\|)
"Geometric Difference" α(q, I) = 100 * sqrt(Ψ(q, I) * Ω(q, I))
Difference from the mean lightness ΔL(q, I) = \|L̅(I) - q_L\|
Difference from the mean chroma ΔC(q, I) = \|C̅(I) - sqrt(q_a² + q_b²)\|
Difference from the mean hue Δh(q, I) = \|h̅(I) - atan2(q_b, q_a)\| mod π

Data

TODO but explain animated gifs, back sprites, etc., all combined

Clustering

TODO more detail but

  • k-means
  • choose k to maximize mean(Δa² + Δb² ∀ mean combos)
  • all the above works on clusters
  • size factors
  • cluster * whole
  • cluster chosen by metric, stat, or "visual importance"