|
2 سال پیش | |
---|---|---|
data | 2 سال پیش | |
tools | 2 سال پیش | |
web | 2 سال پیش | |
.gitignore | 2 سال پیش | |
.prettierrc | 2 سال پیش | |
README.md | 2 سال پیش | |
download-all.sh | 2 سال پیش | |
requirements.txt | 3 سال پیش |
Utility for quickly finding pokemon by the sprite's "distance" from a given color.
tools/download.py
downloads the pokedex and images, populating the images directory
by default. Only run this if you want to download a lot of images. Requires aiohttp
.tools/analyze.py
runs statistics on the downloaded images for the entire pokedex,
producing a database file (and an ingest record for debugging). This database is
used by the UI for actual sorting. There is already a generated database provided as
part of this repo, and running this script is unnecessary. Requires reasonably up to
date versions of numpy
, scipy
, and Pillow
.web/index.html
along with the web
directory serve as a UI for doing pokemon search.
Fuse.js is loaded statically from unpkg
. Designed to work when
served or browsed to as a local file.data
directory has some legacy data and the latest pokedex.Given a color q
and a collection of sets of colored pixels, determine which
set is "most similar" to q
. Put another way, find a measure that scores
each set of pixels I
according to a "perceptual distance" from q
.
Color difference measurement is an active area of study. While there are many strategies, the most common uses so-called uniform color spaces, where the perceptual difference between colors correlates strongly with their Euclidean distance.
This project uses the OKLab color space,
a simple to compute but still largely perceptually uniform color space. The core of
the measure should therefore be the Euclidean distance. A related vector comparison
method, cosine similarity (and by extension, angular difference), is another good
candidate. Additionally, comparing certain correlates directly may be useful,
especially chroma and hue, which are derived from the a
and b
correlates in OKLab.
But this choice of metric does not solve the whole problem - how do you take the distance between a point and a set of points?
TODO tons of updates now that things are even more complicated TODO explanation of how arccos(mean(cos(x))) vs mean(x) on [0, pi] isn't actually that bad
δ
The simplest useful metric is to simply take the centroid of the set I
(the
arithmetic mean of all points in I
), call this μ(I)
, and assign
δ(q, I) = ||μ(I) - q||
where ||x||
denotes the Euclidean/L2-norm of x
. This is cheap to compute,
since μ(I)
can be pre-computed for each set. When sorting the sets, the square
root inherent in the norm can be ignored, since it is monotonically increasing.
In some sense, the most perfect metric would be to calculate the Euclidean distance
from q
to each point p
in I
, and take the arithmetic mean of these distances.
This involves a large sum of square roots, which is hard to do meaningful
precomputation for, and makes calculation prohibitively expensive. For this reason,
this distance metric is not used. Instead, there is an approximation of it.
Ψ
Rather than taking the arithmetic mean of the Euclidean distances to the target color, the quadratic mean (also called root-mean-square) gives a useful result while being comparatively trivial to compute.
Assign the distance metric
Ψ(q, I) = sqrt(sum(||p-q||² ∀ p ∈ I) / |I|)
where |I|
is the cardinality of the set. This is essentially the square root of the
Fréchet variance, of the image
relative to our target color q
. As with δ
, the square root can be ignored during
the sort, as it is monotonically increasing.
This can be further improved through algebraic manipulation:
Ψ(q, I) = sqrt(sum(||p-q||² ∀ p ∈ I) / |I|)
Ψ(q, I)² = sum(||p-q||² ∀ p ∈ I) / |I|
Ψ(q, I)²|I| = sum(||p-q||² ∀ p ∈ I)
= sum((p-q)⋅(p-q) ∀ p ∈ I)
= sum((p⋅p + q⋅q - 2q⋅p) ∀ p ∈ I)
= sum(p⋅p ∀ p ∈ I) + sum(q⋅q ∀ p ∈ I) - 2*sum(q⋅p ∀ p ∈ I)
= sum(p⋅p ∀ p ∈ I) + |I|q⋅q - 2q⋅sum(p ∀ p ∈ I)
Ψ(q, I)² = sum(p⋅p ∀ p ∈ I) / |I| + q⋅q - 2q⋅sum(p ∀ p ∈ I) / |I|
Note that sum(p ∀ p ∈ I) / |I|
is just the centroid μ(I)
. Two terms can now be
added to complete a square.
Ψ(q, I)² = sum(p⋅p ∀ p ∈ I) / |I| + q⋅q - 2q⋅sum(p ∀ p ∈ I) / |I|
Ψ(q, I)² = sum(p⋅p ∀ p ∈ I) / |I| + q⋅q - 2q⋅μ(I)
Ψ(q, I)² = sum(p⋅p ∀ p ∈ I) / |I| + q⋅q - 2q⋅μ(I) + μ(I)⋅μ(I) - μ(I)⋅μ(I)
Ψ(q, I)² = sum(p⋅p ∀ p ∈ I) / |I| + (μ(I) - q)⋅(μ(I) - q) - μ(I)⋅μ(I)
Ψ(q, I)² = sum(||p||² ∀ p ∈ I) / |I| - ||μ(I)||² + ||μ(I) - q||²
Note that ||μ(I) - q||
is just the distance to the centroid again. Take the first
two terms of this polynomial and define a new function σ
as
σ²(I) = sum(||p||² ∀ p ∈ I) / |I| - ||μ(I)||²
This σ
can be pre-computed for each set I
. Now we can simply define the metric as
Ψ(q, I) = sqrt(σ²(I) + δ²(q, I))
The value σ²
is referred to as the variance of the image, and that is why the symbol
σ
was chosen, but it is important to note this notion has a hidden assumption. The
above equations look remarkably similar to calculations of RMS error in a sample, but
this is only because of a certain choice of definition.
Consider the definition of statistical variance of a distribution X
.
Var(X) = E[X²] - E[X]²
What could a definition of Var(I)
be? If we treat I
as a uniform distribution over
the pixels it contains, we get the following definition of expectation.
E[I] = sum(p ∀ p ∈ I) / |I| = μ(I)
But, this is a vector-valued function. How can we "square" it? Here is where the choice of definition comes into play.
Define x² = x⋅x = ||x||²
. This not far off from how some geometric algebras define the
square of a vector!
Define E[I²] = sum(p² ∀ p ∈ I) / |I|
. This is usually how the second moment is defined
for discrete uniform probability distributions anyway, but it is worth stating explicitly
here.
With just these definitions we can now compute the variance of the "distribution" I
as
follows:
Var(I) = E[I²] - E[I]²
= sum(p² ∀ p ∈ I) / |I| - (sum(p ∀ p ∈ I) / |I|)²
= sum(p² ∀ p ∈ I) / |I| - μ²(I)
= sum(||p||² ∀ p ∈ I) / |I| - ||μ(I)||²
= σ²(I)
And so we recover our definition of σ²
from before. If this definition of a vector
square is accepted, then it is accurate to say that σ²
is the variance in color of
the pixel set. But this distinction is important, since otherwise there a risk of using
this value imprecisely.
For example, it is not accurate to say σ²
is the variance of the norms of the pixels,
because it actually uses the norm of the mean and not the mean of the norms. σ²
is first and foremost a computational tool, and any statistical meaning (or lack thereof)
does not have any bearing on the performance of the distance metric.
θ
and Ω
The cosine similarity of two vectors is defined as the cosine of the angle between them, and computed as the dot product of two normalized vectors. To convert measure of similarity to one of difference, subtract the similarity from 1. Alternatively, use inverse cosine to get the actual angle between the vectors, and minimize that.
One option is to point this at the centroid μ(I)
immediately, and define
θ(q, I) = acos((μ(I)/||μ(I)||) ⋅ (q/||q||))
The normalizations of the centroid and target are of course easy to compute before the sort, leaving only a dot product and inverse cosine.
Like with δ
and Ψ
, this metric cannot be directly averaged over the whole set, since
a sum of inverse cosines will be expensive to compute. However, the cosine difference
measure itself linearizes well, and an arithmetic mean can be used instead of a
root-mean-square. Assign and simplify another metric
Ω(q, I) = sum(1 - (p/||p||)⋅(q/||q||) ∀ p ∈ I) / |I|
= |I|/|I| - sum((p/||p||)⋅(q/||q||) ∀ p ∈ I) / |I|
= 1 - sum((p/||p||)⋅(q/||q||) ∀ p ∈ I) / |I|
= 1 - sum(p/||p|| ∀ p ∈ I)⋅(q/||q||) / |I|
= 1 - (sum(p/||p|| ∀ p ∈ I) / |I|)⋅q/||q||
Define a new vector τ(I)
, called the "tilt" since it is a measure of how "off-center"
the set is.
τ(I) = sum(p/||p|| ∀ p ∈ I) / |I|
This is similar to a spherical mean
of the set, but without the final normalization. This can of course be pre-computed for
each set I
, and simplifies the distance function.
Ω(q, I) = 1 - τ(I)⋅q/||q||
ΔL
, ΔC
, Δh
Like most Lab-style color spaces, each pixel in OKLab is composed of a lightness
correlate L
, a green/red correlate a
, and a blue/yellow correlate b
. From
these there can also be derived a chroma correlate C
, and a hue angle h
, where
C = sqrt(a² + b²)
and h = atan2(b, a)
(i.e., a transformation to polar
coordinates).
The most useful of these for intuitive comparison are L
, C
, and h
. The first
component of μ(I)
is the (arithmetic) mean of the L
values in the image. This can
be called L̅(I)
for simplicity.
The other correlates are more complicated. For each pixel in I
, both C
and h
are computed, and then these values are averaged.
The chroma values can be put into a typical arithmetic mean, giving
C̅(I) = sum(sqrt(a² + b²) ∀ (L, a, b) ∈ I) / |I|
but the hue angles require a circular mean. Using the
typical definition
of this mean, combined with the definition of h
above, define
h̅(I) = atan2(
sum(sin(atan2(b, a)) ∀ (L, a, b) ∈ I) / |I|,
sum(cos(atan2(b, a)) ∀ (L, a, b) ∈ I) / |I|
) = atan2(
sum(b / sqrt(a² + b²) ∀ (L, a, b) ∈ I) / |I|,
sum(a / sqrt(a² + b²) ∀ (L, a, b) ∈ I) / |I|
) = atan2(
sum(b / sqrt(a² + b²) ∀ (L, a, b) ∈ I),
sum(a / sqrt(a² + b²) ∀ (L, a, b) ∈ I)
) = atan2(sum((b, a) / sqrt(a² + b²) ∀ (L, a, b) ∈ I))
= atan2(sum((b, a) / ||(b, a)|| ∀ (L, a, b) ∈ I))
with a small abuse of notation that atan2((y, x)) = atan2(y, x)
The precomputed h̅
is provided in radians, and rotated to be in the range [0, 2π)
.
The UI displays this value in
There are three natural distance metrics that can be defined from these values.
ΔL(q, I) = |L̅(I) - q_L|
ΔC(q, I) = |C̅(I) - sqrt(q_a² + q_b²)|
Δh(q, I) = |h̅(I) - atan2(q_b, q_a)| mod π
where q = (q_L, q_a, q_b)
and the mod
operator is defined "correctly" for
non-integers (i.e., will properly choose the minimal angle difference for Δh
).
It would also of course be possible to compare q
to the L
, C
, and h
values
of the centroid μ(I)
, but this provides little utility over the Euclidean distance,
while also losing some information of the original image. These potential metrics
are omitted for this reason, in favor of the higher fidelity ΔL
, ΔC
, and Δh
.
α
TODO but α(q, I) = 100 * sqrt(Ψ(q, I) * Ω(q, I))
The ingester records the following statistics on each pixel set I
Statistic | Definition |
---|---|
Centroid | μ(I) = sum(p ∀ p ∈ I) / \|I\| = (L̅(I), a̅(I), b̅(I)) |
Tilt | τ(I) = sum(p/\|\|p\|\| ∀ p ∈ I) / \|I\| |
Variance | σ²(I) = sum(\|\|p\|\|² ∀ p ∈ I) / \|I\| - \|\|μ(I)\|\|² |
Chroma | C̅(I) = sum(sqrt(a² + b²) ∀ (L, a, b) ∈ I) / \|I\| |
Hue | h̅(I) = atan2(sum((b, a) / \|\|(b, a)\|\| ∀ (L, a, b) ∈ I)) |
The size of the set is also included, primarily for cluster metrics described below, as well as the sRGB hex code of the centroid, primarily for display purposes.
The search UI provides the following metrics to sort on, for a given target q = (q_L, q_a, q_b)
Metric | Definition |
---|---|
Distance to the centroid | δ(q, I) = \|\|μ(I) - q\|\| |
RMS of the distance to each point | Ψ(q, I) = sqrt(σ²(I) + δ²(q, I)) |
Angular difference from the centroid | θ(q, I) = acos((μ(I)/\|\|μ(I)\|\|) ⋅ (q/\|\|q\|\|)) |
Mean cosine difference from each point | Ω(q, I) = 1 - τ(I) ⋅ (q/\|\|q\|\|) |
"Geometric Difference" | α(q, I) = 100 * sqrt(Ψ(q, I) * Ω(q, I)) |
Difference from the mean lightness | ΔL(q, I) = \|L̅(I) - q_L\| |
Difference from the mean chroma | ΔC(q, I) = \|C̅(I) - sqrt(q_a² + q_b²)\| |
Difference from the mean hue | Δh(q, I) = \|h̅(I) - atan2(q_b, q_a)\| mod π |
TODO but explain animated gifs, back sprites, etc., all combined
TODO more detail but
mean(Δa² + Δb² ∀ mean combos)