Monday, December 3, 2012

Project: Synesthesia

Recently in Hacker School, I finished a project using Python I called "Synesthesia." The concept was devised by my housemate Natan, but it seemed to be an interesting engineering problem that would allow me to revisit my psycholinguistics knowledge. And then we can hang our art in our living room.

The concept: A generative art program into which you feed an image and get back another image in which all the colored regions of the original are replaced with words related to those colors. So, a previously red patch might sport the words "cherry," "communism," and "balloon." Here's what happens:

This program has several parts, and I will explain the logic behind each in more detail later. They are:

  1. Making the image's color palette simpler based on the user's preferences. For the image above, I chose green, blue, orange, and black.
  2. Figuring out which words are related to those selected colors
  3. Putting the words where the colors are

1. Simplifying the Image:

In order to best figure out what words should go where, I needed to simplify the color space of the image. We as humans know that the blue section in the original above is all, well, blue. However, neighboring pixels in an image are a) different colors and b) defined not by their names but by the amounts of red, green, and blue they comprise. This trio is called the "RGB value."

To do this, I used a nice module called webcolors that allowed me to make a list of the RGB values for each of the colors specified by the user. Then, I calculated the Euclidean distance between each pixel in the original image and each RGB value in that list. Finally, I replaced the pixel with its nearest match from the list, creating a simplified image.
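Here's a minimal sketch of that nearest-color step. It is not the project's actual code: the palette is hard-coded with the standard CSS RGB values for the four colors I chose (the real program builds the list with webcolors), and the function names are made up for illustration.

```python
# Hypothetical palette: standard CSS RGB values for the four chosen colors.
# The real program builds this list with the webcolors module instead.
PALETTE = {
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "orange": (255, 165, 0),
    "black": (0, 0, 0),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB triples.

    Skipping the square root does not change which color is nearest.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_color(pixel, palette=PALETTE):
    """Return the name of the palette color closest to the pixel."""
    return min(palette, key=lambda name: squared_distance(pixel, palette[name]))


def simplify(pixels, palette=PALETTE):
    """Replace every pixel in a 2D list of RGB triples with its nearest palette color."""
    return [[palette[nearest_color(p, palette)] for p in row] for row in pixels]
```

For example, `nearest_color((10, 20, 240))` picks "blue", since that pixel is far closer to (0, 0, 255) than to any other palette entry.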

2. Related Words:

In order to get a list of words that are related to the colors in the image, I appealed to the idea that related words co-occur with one another. I created a module that finds words that co-occur with any target word. To do this, I first scraped the Wikipedia articles that come up when you search for the target word, compiling them all into a single corpus. I did this using Beautiful Soup. Following that, I created a frequency distribution of the words that occur within a certain distance of the target word in its scraped corpus, counting how many times each word appeared. I then used this distribution to determine which words co-occur significantly with the target.
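The counting part can be sketched roughly like this. The sketch assumes the corpus is already a plain string (no scraping), uses a simple regex tokenizer, and picks a default window of 5 words; all of those, along with the function name, are assumptions for illustration. It only builds the frequency distribution and does not attempt the significance step.

```python
import re
from collections import Counter


def cooccurrence_counts(text, target, window=5):
    """Count words appearing within `window` tokens of each occurrence of `target`.

    The tokenizer and the default window size are illustrative assumptions.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Skip the target itself so it never counts as its own neighbor.
            if tokens[j] != target:
                counts[tokens[j]] += 1
    return counts
```

Calling `cooccurrence_counts(corpus, "red", window=1).most_common(20)` would then list the 20 words that most often sit directly next to "red" in the corpus.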

3. Creating the Art:

To create the "synesthesized" image, I started at the top left corner, figured out what color the pixel was, popped the first word related to that color from the list of related words, and put it in a new image in the corresponding space. Each letter in the word stands in for a pixel, and words are separated by a space, which also takes up a pixel. The "color" of the word is determined by the color of the pixel-space it begins on. So, if the word "cat" were placed, the next word would begin 4 pixel-spaces after "cat" began. If the next word in line for the color doesn't fit on the line, it's put back in the stack and I start again at the new line. I know that's a bit confusing, so here's a visualization:
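In code, filling a single row this way might look something like the sketch below. The data layout is an assumption: a row is a list of color names (one per pixel-space) and each color has a deque of related words. The function name is hypothetical too.

```python
from collections import deque


def fill_row(row_colors, words_by_color):
    """Greedily place words along one row; returns (start, color, word) tuples."""
    width = len(row_colors)
    placed = []
    pos = 0
    while pos < width:
        color = row_colors[pos]  # the word takes the color it begins on
        queue = words_by_color[color]
        if not queue:
            break  # assumption: stop the row if a color has run out of words
        word = queue.popleft()
        if pos + len(word) > width:
            queue.appendleft(word)  # doesn't fit: put it back, move to the next line
            break
        placed.append((pos, color, word))
        pos += len(word) + 1  # one pixel-space per letter, plus one for the space
    return placed
```

With a row of four red pixel-spaces followed by four blue ones, "cat" lands at position 0 and the next word is chosen at position 4. If that word is "ocean", it runs past the edge, so it goes back on the stack and the rest of the row stays empty, which is exactly where the jagged right border comes from.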

I admit that this is a lazy approach, which both muddles the image and gives the right side of the image a terribly jagged border. I should perhaps create a dynamic programming algorithm (it's a knapsack-y problem) that will pick words based on which ones will fit in the amount of the color that is left.

EDIT 12/5/12:
Writing this post and having to defend all my lazy design decisions made me realize that well, I was just being lazy and I could learn so much from actually solving the problem instead of just saying "close enough." I wanted to fill in each line of color without wasting space or running off the edge of the color's space. Sounds like a linear packing problem to me! A month of Hacker School has passed since I made my original (lazy) design decision, and in the past month, I've realized that I can solve any hard problem if I just work enough at it. Even an NP-hard problem!

I paired up with another Hacker Schooler, Betsy, because two brains are better than one, and we set to making a better painting algorithm. After joking a bit about solving the bin-packing problem and then getting a Fields Medal, we decided that just making this algorithm better would be sufficient, resigning ourselves to our O(n!) fate. The way to find the optimal solution is to find all the solutions and choose the best one.

Quite expensive, yes, but we know storage is our best friend. Because the corpora from which the co-occurrences are calculated aren't scraped freshly every time, we realized that we could store the results of our combinatorics in a pickled dictionary keyed by color and then by length, greatly reducing the online runtime. However, we also realized that we didn't need to calculate all of the combinations. We computed only the combinations that are shorter than a certain number of characters (25), realizing that anything longer than that could be recursively split into two shorter segments, each of which would have solutions within the dictionary. Then, when filling in the image line by line, we see how long each color segment is and pop off the next entry of that length from the "reconstituted" dictionary. In the event that a segment of color is shorter than any of the words, we put in exclamation points as a placeholder. This method allowed us to reproduce the image extremely precisely, but we are going to look into different solutions.
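A rough sketch of the lookup-table idea follows, for a single color. Several details are assumptions made to keep it short and runnable: the names are made up, combinations are capped at three words (`max_words`), the 25-character limit is treated as inclusive, phrases are rotated rather than popped so a length never runs dry, and any length with no exact match (not just very short ones) falls back to exclamation points.

```python
from itertools import combinations

MAX_LEN = 25  # longest segment stored directly in the table


def build_table(words, max_len=MAX_LEN, max_words=3):
    """Map each total length to the space-joined word combinations of that length."""
    table = {}
    for k in range(1, max_words + 1):
        for combo in combinations(words, k):
            phrase = " ".join(combo)
            if len(phrase) <= max_len:
                table.setdefault(len(phrase), []).append(phrase)
    return table


def fill_segment(length, table, max_len=MAX_LEN):
    """Return a string of exactly `length` characters for one color segment."""
    if length > max_len:
        # Too long for the table: split into two halves joined by a space.
        half = length // 2
        left = fill_segment(half, table, max_len)
        right = fill_segment(length - half - 1, table, max_len)
        return left + " " + right
    phrases = table.get(length)
    if not phrases:
        return "!" * length  # placeholder when nothing fits
    phrases.append(phrases.pop(0))  # rotate so consecutive lines differ
    return phrases[-1]
```

The full structure described above would be one such table per color, e.g. `{color: build_table(words) for color, words in related.items()}`, written to disk once with `pickle.dump` and loaded at paint time.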

Another interesting little problem I'll touch on was that there were originally vertical bands of spaces running down the entirely orange segment in the Rothko you can see as the sample image above (the orange and blue one). We realized that this was because the line was always being split in half at the same place, so the space at the junction between the spliced segments fell in the same place on every line. We added a random shift from 0 to 4 when splitting longer segments so as to add noise and eliminate the band. Sweet!
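The fix is tiny. Here's a sketch of a jittered split, assuming the shift is added to the midpoint and that one pixel-space is reserved for the joining space; the function name and signature are hypothetical.

```python
import random


def split_lengths(length, rng=random, max_shift=4):
    """Split a long segment into two lengths around a randomly shifted midpoint."""
    shift = rng.randint(0, max_shift)  # inclusive on both ends: 0..max_shift
    left = length // 2 + shift
    right = length - left - 1  # one pixel-space goes to the space between halves
    return left, right
```

Because the shift differs from line to line, the joining space no longer lines up vertically across rows.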

It's really exciting to me that more detailed images now actually look better than less detailed ones, like the above Rothko. Look, it's me!

Check it all out on github!

Wednesday, September 26, 2012


Hello friends. Long time, no post from me. This summer I was focusing my creative energies into making a food blog about my market share experience, the fruits of which gave me biological energies. It was a winning situation all around. Check it out, if you want to, at

Anyway, I've returned to this, my lovely domain, because I am going to try to blog about my experience this "semester" as part of batch[4] of Hacker School. It is a three-month program where we become better programmers by just doing a lot of projects and talking about them. Hacker School is a neat environment because it seems to be very nurturing of people of all skill levels. I hope to use this blog as a way to chronicle my experiences, which I anticipate will be very different from the "scholastic" programming environments I'm used to.

Though my academic interests may have seemed to shift away from the human languages poked at by (psycho)linguistics and more towards programming languages, I really hope and anticipate that many of my projects will revolve around words, patterns, and networks. Other things too though...

Follow Hacker School at @hackerschool, if ya want.

I start Monday!

Tuesday, June 26, 2012

Interesting article about neural network research out of the Google X lab. They used 16,000 processors to learn the concept of 'cat' without supervision. That's actually not an easy task. The article mentions that this success is great for the field of speech recognition.

Sunday, February 12, 2012

Logical Separation

I have been coding an HTML5 version of Brick Breaker for my Web Apps class and perhaps the most challenging part of it was grasping the concept that the game's logical updates are separate from its drawing updates.

Why was this so hard? It gets at the binding problem and, more scientifically, Treisman and Gelade's feature integration theory. In cognition, we conflate object and form. We perceive something shaped like a dog, and we assign it a dog-concept. When we perceive an object, we see it as a whole object, composed of concept and form, not a mere representation or projection of that object. When programming the game in object-oriented JavaScript that updated an HTML5 canvas, I needed to shift my cognitive framework to think in the latter way. The shapes on the screen are mere projections of the behind-the-scenes logical operations of the game. The objects themselves might change their position logically, but unless the forms are updated, you don't know. And that's really weird to think about: you might change something about the location of objects, but that isn't necessarily reflected in your environment. When a player plays the game, they see the moving ball as a complete ball, not a projection of a JavaScript ball object that is merely a bunch of numbers in a program. (And this doesn't even get to the lower level representation of the ball-object and ball-picture by the computer...).
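To make the split concrete, here is a small sketch of the idea in Python rather than the JavaScript I used for the game. The class and function names are made up, and a list of text rows stands in for the canvas.

```python
class Ball:
    """The logical ball: nothing but numbers."""

    def __init__(self, x, y, dx, dy):
        self.x, self.y, self.dx, self.dy = x, y, dx, dy

    def update(self, width, height):
        """Advance one step, bouncing off the walls. Draws nothing."""
        nx, ny = self.x + self.dx, self.y + self.dy
        if not 0 <= nx < width:
            self.dx = -self.dx
            nx = self.x + self.dx
        if not 0 <= ny < height:
            self.dy = -self.dy
            ny = self.y + self.dy
        self.x, self.y = nx, ny


def render(ball, width, height):
    """Project the ball's current state onto a fresh text frame."""
    rows = [["."] * width for _ in range(height)]
    rows[ball.y][ball.x] = "o"
    return ["".join(row) for row in rows]
```

If you call `ball.update(4, 3)` without calling `render` again, the last frame still shows the ball at its old position: the logic has moved on, but the picture hasn't.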

In the beginning stages of my game, there was an even more off-putting bug. By the nature of the HTML5 canvas functionality, any drawings you do there are permanent unless you "paint over" them. So, as the ball moved, it left a trail behind it, a ghost in full color, a map of time passed as an object moved through the environment.
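The trail is easy to reproduce with a toy persistent canvas. This is a Python stand-in with hypothetical names, not the game's code; on the real HTML5 canvas, the usual way to wipe the previous frame is the 2D context's `clearRect`.

```python
class Canvas:
    """A tiny text canvas whose marks persist until explicitly cleared."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.clear()

    def clear(self):
        """Wipe every cell back to blank."""
        self.cells = [[" "] * self.width for _ in range(self.height)]

    def draw_ball(self, x, y):
        """Paint the ball at (x, y) without touching any other cell."""
        self.cells[y][x] = "o"

    def row(self, y):
        return "".join(self.cells[y])


def animate(canvas, positions, clear_each_frame):
    """Draw the ball at each position in turn, optionally clearing first."""
    for x, y in positions:
        if clear_each_frame:
            canvas.clear()
        canvas.draw_ball(x, y)
    return canvas
```

Moving the ball across three positions without clearing leaves `"ooo  "` on a five-wide row; clearing before each draw leaves only the latest ball, `"  o  "`.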

Challenges like these are why I love programming.