No Description

Kirk Trombley 85835bcb72 Add extremely lazy web implementation 3 years ago
.gitignore 5ab634204c Initial commit 3 years ago
README.md ee7e329ed0 Docs 3 years ago
database.csv 72dbd98fe5 Generate database 3 years ago
ingest.py 5865ae7efb Cleanup, add shebang 3 years ago
nearest.html 85835bcb72 Add extremely lazy web implementation 3 years ago
nearest.js 85835bcb72 Add extremely lazy web implementation 3 years ago
nearest.py a77bd4eca2 Improve random color logic, add shebang 3 years ago
requirements.txt b4fa7e32cd Implement ingester 3 years ago

README.md

Pokemon Color Search

Utility for quickly finding pokemon by the sprite's "distance" from a given color.

  • nearest.py provides multiple options for finding pokemon "near" a color. No external dependencies, but database.csv must be present and populated.
  • ingest.py generates database.csv which is needed for nearest.py Requires Pillow (PIL).

PNG Source

Currently using pokemondb's sprite archive.

Download the entire page (i.e., Ctrl-S in most browsers), then take the folder of PNGs it downloads and place them in the pngs directory in this repository. Then run ingest.py to generate database.csv

Distance Calculation

For ease of calculation, a pokemon's distance from a certain color is the mean of the squared Euclidean distances between the given color and each pixel of the sprite, treating the RGB components as vector components. Transparent pixels are omitted.

Put more explicitly, if a pokemon's sprite's pixels form the set P, then the distance to a color q is:

D(P, q) = sum(||p - q||^2, p in P) / |P|

This distance measure was chosen because it can be easily reformulated. For a pixel p, let p_r, p_g, and p_b be the red, green, and blue components respectively. The above function can be rewritten as follows:

= sum(||p - q||^2, p in P) / |P|
# expand 2-norm definition
= sum(sum((p_c - q_c)^2, c in [r, g, b]), p in P) / |P|
# FOIL
= sum(sum(p_c^2 - 2*p_c*q_c + q_c^2, c in [r, g, b]), p in P) / |P|
# split sums, extract constants
= (sum(sum(p_c^2, c in [r, g, b]), p in P) 
  - 2*sum(sum(p_c*q_c, c in [r, g, b]), p in P) 
  + sum(sum(q_c^2, c in [r, g, b]), p in P)) / |P|
# collapse 2-norm definition in first and third terms
= (sum(||p||^2, p in P) 
  - 2*sum(sum(p_c*q_c, c in [r, g, b]), p in P) 
  + sum(||q||^2, p in P)) / |P|
# evaluate third summation (no dependency on p)
= (sum(||p||^2, p in P) 
  - 2*sum(sum(p_c*q_c, c in [r, g, b]), p in P) 
  + |P|*||q||^2) / |P|
# invert order of second summation (sums are finite)
= (sum(||p||^2, p in P) 
  - 2*sum(sum(p_c*q_c, p in P), c in [r, g, b]) 
  + |P|*||q||^2) / |P|
# pull out q_c term in inner sum of second summation (no dependency on p)
= (sum(||p||^2, p in P) 
  - 2*sum(q_c*sum(p_c, p in P), c in [r, g, b]) 
  + |P|*||q||^2) / |P|
# distribute 1/|P| factor, move factor of 2 inside second summation
= (sum(||p||^2, p in P)/|P|)
  - sum(q_c*2*sum(p_c, p in P)/|P|, c in [r, g, b])
  + ||q||^2
# let Y be a vector-valued function such that Y(P)_c = 2*sum(p_c, p in P)/|P|
= (sum(||p||^2, p in P)/|P|)
  - sum(q_c*Y(P)_c, c in [r, g, b])
  + ||q||^2
# let X(P) = sum(||p||^2, p in P)/|P|
= X(P)
  - sum(q_c*Y(P)_c, c in [r, g, b])
  + ||q||^2
# collapse dot product definition in second term
D(P, q) = X(P) - q . Y(P) + ||q||^2

Notably, X(P) and and Y(P) can be computed ahead of time for each sprite. This is the purpose of ingest.py, which converts each png file in the pngs directory into an item in database.csv. The columns of this CSV file are name,X,Y_r,Y_g,Y_b. Additionally, the last term, ||q||^2 has no dependence on P, and can thus be eliminated when trying to find the best P for a given q. Thus, the final distance metric is:

D(P, q) = X(P) - q . Y(P)