説明なし

Kirk Trombley c81f5bbd2f Update ingest logic to take arbitrary dir 3 年 前
.gitignore 38622fc9b5 Implement explore script 3 年 前
README.md 778549f8ee Update README 3 年 前
convert.py 3af774f436 Misc cleanup 3 年 前
database-luv.csv da9559e8f3 Generate CIELUV database and update JS database 3 年 前
database.csv a6a1784055 Apply the Hunt pruning metric 3 年 前
database.js da9559e8f3 Generate CIELUV database and update JS database 3 年 前
explore.py 3af774f436 Misc cleanup 3 年 前
ingest.py c81f5bbd2f Update ingest logic to take arbitrary dir 3 年 前
nearest.html dc09d10c9e Add pokemon lookup 3 年 前
nearest.js 3af774f436 Misc cleanup 3 年 前
nearest.py d5a4312a49 Reformatting 3 年 前
requirements.txt b4fa7e32cd Implement ingester 3 年 前

README.md

Pokemon Color Search

Utility for quickly finding pokemon by the sprite's "distance" from a given color.

  • nearest.py provides multiple options for finding pokemon "near" a color. No external dependencies, but database.csv must be present and populated.
  • ingest.py generates database.csv which is needed for nearest.py, database-luv.csv which is an alternative database for nearest.py using the CIE LUV color space, and database.js which is needed for nearest.html. Requires Pillow (PIL).
  • nearest.html and nearest.js allow you to use a front-end in your browser, by opening nearest.html directly. The implementation is somewhat lazy to avoid the use of an actual build system, and statically imports the dependencies Fuse.js, jQuery, and colorspaces. Requires database.js to be present, meaning ingest.py must be run first, unless you are using the included database.
  • explore.py traverses a subset of the 24-bit RGB color space and finds the pokemon that most closely match each color, and produces best.csv and counts.csv as results.
  • convert.py provides an RGB -> CIELUV implementation

The included database files correspond to small sprites downloaded for each Pokemon. No alternate forms or regional variants are included

PNG Source

Currently using pokemondb's sprite archive.

Download the entire page (i.e., Ctrl-S in most browsers), then take the folder of PNGs it downloads and place them in the pngs directory in this repository. Then run ingest.py to generate database.csv

Distance Calculation

For ease of calculation, a pokemon's distance from a certain color is the mean of the squared Euclidean distances between the given color and each pixel of the sprite, treating the RGB components as vector components. Transparent pixels are omitted, as are pure black pixels, since these are typically used for outlines.

Put more explicitly, if a pokemon's sprite's pixels form the set P, then the distance to a color q is:

D(P, q) = sum(||p - q||^2, p in P) / |P|

This distance measure was chosen because it can be easily reformulated. For a pixel p, let p_r, p_g, and p_b be the red, green, and blue components respectively. The above function can be rewritten as follows:

= sum(||p - q||^2, p in P) / |P|
# expand 2-norm definition
= sum(sum((p_c - q_c)^2, c in [r, g, b]), p in P) / |P|
# FOIL
= sum(sum(p_c^2 - 2*p_c*q_c + q_c^2, c in [r, g, b]), p in P) / |P|
# split sums, extract constants
= (sum(sum(p_c^2, c in [r, g, b]), p in P) 
  - 2*sum(sum(p_c*q_c, c in [r, g, b]), p in P) 
  + sum(sum(q_c^2, c in [r, g, b]), p in P)) / |P|
# collapse 2-norm definition in first and third terms
= (sum(||p||^2, p in P) 
  - 2*sum(sum(p_c*q_c, c in [r, g, b]), p in P) 
  + sum(||q||^2, p in P)) / |P|
# evaluate third summation (no dependency on p)
= (sum(||p||^2, p in P) 
  - 2*sum(sum(p_c*q_c, c in [r, g, b]), p in P) 
  + |P|*||q||^2) / |P|
# invert order of second summation (sums are finite)
= (sum(||p||^2, p in P) 
  - 2*sum(sum(p_c*q_c, p in P), c in [r, g, b]) 
  + |P|*||q||^2) / |P|
# pull out q_c term in inner sum of second summation (no dependency on p)
= (sum(||p||^2, p in P) 
  - 2*sum(q_c*sum(p_c, p in P), c in [r, g, b]) 
  + |P|*||q||^2) / |P|
# distribute 1/|P| factor
= (sum(||p||^2, p in P)/|P|)
  - 2*sum(q_c*sum(p_c, p in P)/|P|, c in [r, g, b])
  + ||q||^2
# let Y be a vector-valued function such that Y(P)_c = sum(p_c, p in P)/|P|
= (sum(||p||^2, p in P)/|P|)
  - 2*sum(q_c*Y(P)_c, c in [r, g, b])
  + ||q||^2
# let X(P) = sum(||p||^2, p in P)/|P|
= X(P)
  - 2*sum(q_c*Y(P)_c, c in [r, g, b])
  + ||q||^2
# collapse dot product definition in second term
D(P, q) = X(P) - 2q . Y(P) + ||q||^2

Notably, X(P) and and Y(P) can be computed ahead of time for each sprite. This is the purpose of ingest.py, which converts each png file in the pngs directory into an item in database.csv. The columns of this CSV file are name,X,Y_r,Y_g,Y_b. Additionally, the last term, ||q||^2 has no dependence on P, and can thus be eliminated when trying to find the best P for a given q. Finally, while the factor of 2 on the second term arises naturally from the original distance metric, it can be made into a tuning parameter, which will be called c for now. Thus, the final distance metric is:

D(P, q) = X(P) - cq . Y(P) 

To recap, X(P) is the mean squared distance of all pixels in the image from zero, and Y(P) is the mean of all pixels in the image. Increasing the value of c thus forces the metric to prefer images that are more in line with the strongest components of q. Taking c too low causes the metric to prefer images with more black pixels (minimizing X(P)), while taking c too high causes the metric to prefer images with more white pixels (maximizing Y(P)).

Intuitively, X(P) is a measure of how bright the image is over all, and Y(P) is the mean color of the image. q . Y(P) is then a measure of how much the image's mean color aligns with the target color.

In addition to tweaking c, nearness implementations allow normalizing q and Y(P) in order to focus more on "angle" between the target color and the mean color of the image. Then, either c can be raised to offset the reduced value of q . Y(P), or the X(P) term can simply be omitted

CIELUV Color Space

The above distance metric can also be made to work with the CIELUV color space, a (more) perceptually uniform color space than standard RGB. Because the L, u, and v components of the LUV pixels can be compared with the same Euclidean distance, all that is needed is to calculate new X(P) and Y(P) values from the converted RGB pixels, and then to also convert q to this space before calculation. The ingest.py script generates database-luv.csv, which has these modified X and Y values, and the nearness implementations allow the use of these values instead of the RGB-based database.