# Pokemon Color Search

Utility for quickly finding pokemon by the sprite's "distance" from a given color.

- `nearest.py` provides multiple options for finding pokemon "near" a color.
No external dependencies, but `database.csv` must be present and populated.
- `ingest.py` generates `database.csv` which is needed for `nearest.py`
Requires Pillow (PIL).
- `nearest.html` and `nearest.js` allow you to use a very rudimentary front-end
in your browser, by opening `nearest.html` directly. The implementation is very
lazy in order to allow usage with no dependencies.
- `expore.py` traverses a subset of the 24-bit RGB color space and finds the
pokemon that most closely match each color, and produces `best.csv` and 
`counts.csv` as results.

## PNG Source

Currently using [pokemondb](https://pokemondb.net/sprites)'s sprite archive.

Download the entire page (i.e., Ctrl-S in most browsers), then take the folder
of PNGs it downloads and place them in the `pngs` directory in this repository.
Then run `ingest.py` to generate `database.csv`

## Distance Calculation

For ease of calculation, a pokemon's distance from a certain color is the 
mean of the squared Euclidean distances between the given color and each pixel
of the sprite, treating the RGB components as vector components. Transparent 
pixels are omitted.

Put more explicitly, if a pokemon's sprite's pixels form the set P, then the
distance to a color q is:

```
D(P, q) = sum(||p - q||^2, p in P) / |P|
```

This distance measure was chosen because it can be easily reformulated. For a
pixel p, let p_r, p_g, and p_b be the red, green, and blue components 
respectively. The above function can be rewritten as follows:

```
= sum(||p - q||^2, p in P) / |P|
# expand 2-norm definition
= sum(sum((p_c - q_c)^2, c in [r, g, b]), p in P) / |P|
# FOIL
= sum(sum(p_c^2 - 2*p_c*q_c + q_c^2, c in [r, g, b]), p in P) / |P|
# split sums, extract constants
= (sum(sum(p_c^2, c in [r, g, b]), p in P) 
  - 2*sum(sum(p_c*q_c, c in [r, g, b]), p in P) 
  + sum(sum(q_c^2, c in [r, g, b]), p in P)) / |P|
# collapse 2-norm definition in first and third terms
= (sum(||p||^2, p in P) 
  - 2*sum(sum(p_c*q_c, c in [r, g, b]), p in P) 
  + sum(||q||^2, p in P)) / |P|
# evaluate third summation (no dependency on p)
= (sum(||p||^2, p in P) 
  - 2*sum(sum(p_c*q_c, c in [r, g, b]), p in P) 
  + |P|*||q||^2) / |P|
# invert order of second summation (sums are finite)
= (sum(||p||^2, p in P) 
  - 2*sum(sum(p_c*q_c, p in P), c in [r, g, b]) 
  + |P|*||q||^2) / |P|
# pull out q_c term in inner sum of second summation (no dependency on p)
= (sum(||p||^2, p in P) 
  - 2*sum(q_c*sum(p_c, p in P), c in [r, g, b]) 
  + |P|*||q||^2) / |P|
# distribute 1/|P| factor, move factor of 2 inside second summation
= (sum(||p||^2, p in P)/|P|)
  - sum(q_c*2*sum(p_c, p in P)/|P|, c in [r, g, b])
  + ||q||^2
# let Y be a vector-valued function such that Y(P)_c = 2*sum(p_c, p in P)/|P|
= (sum(||p||^2, p in P)/|P|)
  - sum(q_c*Y(P)_c, c in [r, g, b])
  + ||q||^2
# let X(P) = sum(||p||^2, p in P)/|P|
= X(P)
  - sum(q_c*Y(P)_c, c in [r, g, b])
  + ||q||^2
# collapse dot product definition in second term
D(P, q) = X(P) - q . Y(P) + ||q||^2
```

Notably, `X(P)` and and `Y(P)` can be computed ahead of time for each sprite. 
This is the purpose of `ingest.py`, which converts each png file in the `pngs`
directory into an item in `database.csv`. The columns of this CSV file are
`name,X,Y_r,Y_g,Y_b`. Additionally, the last term, `||q||^2` has no dependence
on `P`, and can thus be eliminated when trying to find the best `P` for a given
`q`. Thus, the final distance metric is:

```
D(P, q) = X(P) - q . Y(P) 
```