Monday, December 3, 2012

Project: Synesthesia

Recently in Hacker School, I finished a project using Python I called "Synesthesia." The concept was devised by my housemate Natan, but it seemed to be an interesting engineering problem that would allow me to revisit my psycholinguistics knowledge. And then we can hang our art in our living room.

The concept: a generative art program into which you feed an image and get back another image in which all the colored regions of the original are replaced with words related to those colors. So, a previously red patch might sport the words "cherry," "communism," and "balloon." Here's what happens:

This program has several parts, and I will explain the logic behind each in more detail below. They are:

  1. Making the image's color palette simpler based on the user's preferences. For the image above, I chose green, blue, orange, and black.
  2. Figuring out which words are related to those selected colors
  3. Putting the words where the colors are

1. Simplifying the Image:

In order to best figure out what words should go where, I needed to simplify the color space of the image. So, we as humans know that the blue section in the original above is all, well, blue. However, neighboring pixels in an image are a) different colors and b) defined not by their names but by the amount of red, green, and blue they comprise. This trio is called the "RGB value."

To do this, I used a nice module called webcolors that allowed me to make a list of the RGB values for each of the colors specified by the user. Then, I calculated the Euclidean distance between each pixel in the original image and each of the RGB values in that list. Finally, I replaced each pixel with its nearest match, creating a simplified image.
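As a rough illustration of the nearest-color step (a sketch, not the project's actual code), the idea could look like the following. The `PALETTE` dictionary and the function names are hypothetical; the RGB triples are hard-coded here, whereas the project looked them up with webcolors.

```python
import math

# Hypothetical palette: hard-coded CSS RGB values so the sketch stands alone.
# The original project obtained these from the webcolors module instead.
PALETTE = {
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "orange": (255, 165, 0),
    "black": (0, 0, 0),
}

def nearest_color(pixel, palette=PALETTE):
    """Return the name of the palette color closest to an (r, g, b) pixel."""
    # math.dist computes the Euclidean distance between two points.
    return min(palette, key=lambda name: math.dist(pixel, palette[name]))

def simplify(pixels, palette=PALETTE):
    """Replace every pixel in a 2-D grid with its nearest palette RGB value."""
    return [[palette[nearest_color(p, palette)] for p in row] for row in pixels]
```

Here `pixels` is assumed to be a list of rows of `(r, g, b)` tuples; reading them out of an actual image file is left out.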

2. Related Words:

In order to get a list of words that are related to the colors in the image, I appealed to the idea that related words co-occur with one another. I created a module that will find words that co-occur with any target word. To do this, I first scraped the Wikipedia articles that come up when you do a search for the target word, compiling them all into a single corpus. I did this using Beautiful Soup. Following that, I created a frequency distribution of the words that occur within a certain distance of the target word in its scraped corpus, seeing how many times each word appeared. I then used this distribution to determine which words co-occur significantly with the target.
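The windowed counting step can be sketched roughly as below. This is an assumption-laden illustration: the function name, the tokenizing regex, and the default window of 5 tokens are all made up for the example, and it only produces raw counts (how the project decided what counts as "significant" is not shown here).

```python
import re
from collections import Counter

def cooccurrence_counts(corpus, target, window=5):
    """Count words appearing within `window` tokens of each `target` occurrence."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", corpus.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo = max(0, i - window)
        # Neighbors on both sides of this occurrence, excluding the target itself.
        neighbors = tokens[lo:i] + tokens[i + 1:i + 1 + window]
        counts.update(w for w in neighbors if w != target)
    return counts
```

Calling `cooccurrence_counts(text, "red").most_common(20)` would then give the most frequent neighbors of "red" in `text`.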

3. Creating the Art:

To create the "synesthesized" image, I started at the top left corner, figured out what color the pixel was, popped the first word related to that color from the list of related words, and put it in a new image in the corresponding space. Each letter in the word stands in for a pixel, and each word is separated by a space, which is also a pixel. The "color" of the word is determined by the color of the pixel-space it begins on. So, if the word "cat" were placed, the next word would begin 4 pixel-spaces after "cat" began. If the next word in line for the color doesn't fit on the line, it's put back in the stack and I start again at the new line. I know that's a bit confusing, so here's a visualization:
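For readers who prefer code to prose, here is a minimal sketch of that greedy pass over a single row. The names `fill_row` and `words_by_color` are hypothetical, and stopping when a color runs out of words is my own assumption rather than something the project necessarily did.

```python
from collections import deque

def fill_row(row_colors, words_by_color):
    """Greedily lay words across one row of color names.

    row_colors: list of color names, one per pixel-space.
    words_by_color: dict mapping color name -> deque of related words.
    Returns a list of (start_index, color, word) placements.
    """
    placements = []
    width = len(row_colors)
    x = 0
    while x < width:
        color = row_colors[x]  # the word takes the color of the space it starts on
        queue = words_by_color[color]
        if not queue:
            break  # assumption: stop if this color has no words left
        word = queue.popleft()
        if x + len(word) > width:
            queue.appendleft(word)  # does not fit: put it back for the next line
            break
        placements.append((x, color, word))
        x += len(word) + 1  # one extra pixel-space for the separating space
    return placements
```

Note that a word keeps going even when the underlying color changes partway through it, which is exactly what muddles the image.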

I admit that this is a lazy approach, which both muddles the image and makes the right side of the image have a terribly jagged border. I should perhaps create a dynamic programming algorithm (it's a knapsack-y problem) that will pick words based on which ones will fit in the amount of the color that is left.

EDIT 12/5/12:
Writing this post and having to defend all my lazy design decisions made me realize that well, I was just being lazy and I could learn so much from actually solving the problem instead of just saying "close enough." I wanted to fill in each line of color without wasting space or running off the edge of the color's space. Sounds like a linear packing problem to me! A month of Hacker School has passed since I made my original (lazy) design decision, and in the past month, I've realized that I can solve any hard problem if I just work enough at it. Even an NP-hard problem!

I paired up with another Hacker Schooler, Betsy, because two brains are better than one, and we set to making a better painting algorithm. After joking a bit about solving the bin-packing problem and then getting a Fields Medal, we decided that just making this algorithm better would be sufficient, resigning ourselves to our O(n!) fate. The way to find the optimal solution is to find all the solutions and choose the best one.

Quite expensive, yes, but we know storage is our best friend. Because the corpuses from which the co-occurrences are calculated aren't scraped freshly every time, we realized that we could store the results from our combinatorics in a pickled dictionary keyed by color and then by length, greatly reducing the online runtime. We also realized that we didn't need to calculate all of the combinations. We found only the combinations that are shorter than a certain number of characters (25), realizing that anything longer than that could be recursively split into two shorter segments which would have solutions within the dictionary. Then, when filling in the images line by line, we see how long each color segment is and then pop off the next word from the "reconstituted" dictionary. In the event that a segment of color is shorter than any of the words, we put in exclamation points as a placeholder. This method allowed us to reproduce the image extremely precisely, but we are going to look into different solutions.
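A rough sketch of the precomputation for one color might look like this. The function name is hypothetical, the brute-force enumeration is only practical for a short word list, and treating the limit as "at most 25 characters" is an assumption. The per-color results could then be collected into a dictionary keyed by color and saved with `pickle`.

```python
from collections import defaultdict

def combos_by_length(words, max_len=25):
    """Map each total length <= max_len to the word sequences that fill it exactly.

    A sequence's length counts its letters plus one space between words.
    """
    table = defaultdict(list)

    def extend(prefix, length, remaining):
        for i, word in enumerate(remaining):
            new_len = length + len(word) + (1 if prefix else 0)
            if new_len > max_len:
                continue  # too long for the table; longer runs get split instead
            seq = prefix + [word]
            table[new_len].append(" ".join(seq))
            # Recurse with this word removed so no word repeats in a sequence.
            extend(seq, new_len, remaining[:i] + remaining[i + 1:])

    extend([], 0, list(words))
    return dict(table)
```

Looking up `table[n]` then gives every stored way to fill a run of exactly `n` pixel-spaces of that color.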

Another interesting little problem I'll touch on was that there were originally vertical bands of spaces running down the entirely orange segment in the Rothko you can see as the sample image above (the orange and blue one). We realized that this was because the line was always being split in half at the same place, leading to there being a space in the same place on each line at the junctions between the spliced segments. We added a random shift from 0 to 4 when splitting longer segments so as to add noise and eliminate the band. Sweet!
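The split-with-jitter idea can be sketched as follows. This is an illustration under assumptions: the function name is made up, one pixel-space is reserved for the space at each junction, and `max_len` is assumed to be at least 2.

```python
import random

def split_segment(length, max_len=25, max_shift=4, rng=random):
    """Split a run of `length` pixel-spaces into pieces no longer than max_len.

    The split point is the midpoint nudged right by a random 0..max_shift,
    so the junction space does not land in the same column on every line.
    One pixel-space is reserved for the space at each junction.
    """
    if length <= max_len:
        return [length]
    shift = rng.randint(0, max_shift)
    # Keep at least one pixel-space for the right piece after the junction space.
    left = min(length // 2 + shift, length - 2)
    right = length - left - 1
    return (split_segment(left, max_len, max_shift, rng)
            + split_segment(right, max_len, max_shift, rng))
```

Each returned piece is short enough to look up in the precomputed dictionary, and the pieces plus the junction spaces add back up to the original length.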

It's really exciting to me that more detailed images now actually look better than less detailed ones, like the above Rothko. Look, it's me!

Check it all out on github!

Wednesday, September 26, 2012


Hello friends. Long time, no post from me. This summer I was focusing my creative energies into making a food blog about my market share experience, the fruits of which gave me biological energies. It was a winning situation all around. Check it out, if you want to, at

Anyway, I've returned to this, my lovely domain, because I am going to try to blog about my experience this "semester" as part of batch[4] of Hacker School. It is a three-month program where we become better programmers by just doing a lot of projects and talking about them. Hacker School is a neat environment because it seems to be very nurturing of people of all skill levels. I hope to use this blog as a way to chronicle my experiences, which I anticipate will be very different from the "scholastic" programming environments I'm used to.

Though my academic interests may have seemed to shift away from the human languages poked at by (psycho)linguistics and more towards programming languages, I really hope and anticipate that many of my projects will revolve around words, patterns, and networks. Other things too though...

Follow Hacker School at @hackerschool, if ya want.

I start Monday!

Tuesday, June 26, 2012

Interesting article about neural network research out of the Google X lab. They used 16,000 processors to learn the concept of 'cat' without supervision, which is actually not that easy a task. The article mentions that this success is great for the field of speech recognition.

Sunday, February 12, 2012

Logical Separation

I have been coding an HTML5 version of Brick Breaker for my Web Apps class and perhaps the most challenging part of it was grasping the concept that the game's logical updates are separate from its drawing updates.

Why was this so hard? It gets at the binding problem and, more scientifically, Treisman and Gelade's feature integration theory. In cognition, we conflate object and form. We perceive something shaped like a dog, and we assign it a dog-concept. When we perceive an object, we see it as a whole object, composed of concept and form, not a mere representation or projection of that object. When programming the game in Object-Oriented JavaScript that updated an HTML5 canvas, I needed to shift my cognitive framework to think in the latter way. The shapes on the screen are mere projections of the behind-the-scenes logical operations of the game. The objects themselves might change their position logically, but unless the forms are updated, you don't know. And that's really weird to think about, that you might change something about the location of objects but that isn't necessarily reflected in your environment. When a player plays the game, they see the moving ball as a complete ball, not a projection of a JavaScript ball object that is merely a bunch of numbers in a program. (And this doesn't even get to the lower-level representation of the ball-object and ball-picture by the computer...)

In the beginning stages of my game, there was an even more off-putting bug. By the nature of the HTML5 canvas functionality, any drawings you do there are permanent unless you "paint over" them. So, as the ball moved, it left a trail behind it, a ghost in full color, a map of time passed as an object moved through the environment.
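To make the split between logical updates and drawing concrete, here is a small sketch. It is written in Python with a one-line text "canvas" rather than the JavaScript and HTML5 canvas the game actually used, and every name in it is hypothetical. The `clear` flag reproduces the trail bug when it is turned off.

```python
class Ball:
    """Logical state only: a position and a velocity, no pixels."""

    def __init__(self, x, dx):
        self.x = x
        self.dx = dx

    def update(self, width):
        """Advance the logical position, bouncing off the walls."""
        nxt = self.x + self.dx
        if nxt < 0 or nxt >= width:
            self.dx = -self.dx
            nxt = self.x + self.dx
        self.x = nxt


def draw(ball, frame, clear=True):
    """Project the logical ball onto the frame (a list of characters)."""
    if clear:
        for i in range(len(frame)):
            frame[i] = "."  # paint over whatever was drawn last time
    frame[ball.x] = "o"
    return "".join(frame)
```

Calling `update` several times without calling `draw` changes where the ball "is" while the frame stays exactly as it was, and calling `draw` with `clear=False` leaves a ghost of every earlier position behind.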

Challenges like these are why I love programming.

Tuesday, May 24, 2011

Post Script...

It's been interesting to see how I've progressed since I started the blog. See my October 5th entry!

Psychological Word Space

This is the final project for the Computational Cognitive Science class I took this semester. The graph is a comparison of two mental word space approximators...

Step 1: Latent Semantic Analysis
The first was a purely text-based analysis called Latent Semantic Analysis. The data is represented by the red x's in the picture. What this algorithm does, in short, is count how many times a bunch of words appear in a bunch of documents, and from that it can give a pretty good picture of the relationship between words. Cool!

To implement LSA, we needed a huge corpus of text to evaluate. Websites about LSA recommended using over 1000 documents, and our specific algorithm needed all these texts in a single plain text document. To get this, we first used a pretty simple bash script to collect 1000 URLs from Google Blog Search for "education" using a text-based browser called Lynx. Be aware that this is actually against Google’s Terms of Service, and we ran into a little trouble with Google noticing some funny activity from our IP address. (All in the name of science! If you are going to attempt this project, I encourage you to plan ahead and look into this.) Then, we ran another pretty simple script that again used Lynx and went through the URL list, pulling the text from each of the websites and putting it all into a single plain text document. (The code was not developed by me, so I do not feel comfortable sharing it here!)

The next step was to pass the corpus to a script from a MATLAB toolbox that built a document-term matrix from it. In doing this, we also asked the script to disregard stop words (high-frequency words that are low in content but help us form meaningful syntax), as is common practice in natural language processing. From this matrix, we were able to determine the 100 most prevalent words in the corpus. Despite fears we had about the quality of a corpus gathered via automated script from the Internet, the most popular was indeed our search term, “education.” From these 100 words, we pulled 10 that were salient in the list and that we thought would make for meaningful comparisons for the second part of our experiment.

To make the matrix manageable, we created a smaller document-term matrix with just these 10 terms, then ran a dimensionality reduction algorithm called singular value decomposition (also in that MATLAB toolbox) on it. We then plotted the data using the dimension associated with the second singular value as the X and the one associated with the third as the Y, resulting in a word space. We chose not to use the first because it mostly reflects the number of times each word has been used across all documents, and thus it made more sense to use the second and third.
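The document-term matrix step described above can be illustrated with a small sketch. To be clear, this is a Python stand-in and not the MATLAB toolbox script we used; the stop-word list, the tokenizer, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}

def document_term_matrix(documents, stop_words=STOP_WORDS):
    """Return (terms, matrix) where matrix[d][t] counts term t in document d."""
    counts = []
    for doc in documents:
        tokens = re.findall(r"[a-z]+", doc.lower())
        counts.append(Counter(t for t in tokens if t not in stop_words))
    # Column order: every term seen in any document, sorted alphabetically.
    terms = sorted(set().union(*counts)) if counts else []
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix

def top_terms(terms, matrix, n):
    """Return the n terms with the highest total count across all documents."""
    totals = [sum(col) for col in zip(*matrix)]
    ranked = sorted(zip(terms, totals), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:n]]
```

A singular value decomposition would then be run on the (reduced) matrix; that part is left to a numerical library and is not shown here.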

Step 2: Multidimensional Scaling
We also wanted to compare the findings using LSA to a human psychological space. To do this, we first gathered all unique pairwise similarity ratings of salient terms from the list of the top 100 most popular words in the LSA corpus: college, university, state, students, teacher, school, research, business, government, job. This resulted in 45 unique comparisons. We gathered this data from 116 subjects using Google Forms. After translating this data to a similarity matrix, we ran a different dimensionality reduction algorithm, called multidimensional scaling, on it to make a plottable data set that supposedly represents the psychological word space surrounding the concept of education (the green o's on the graph).
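As a quick check on the arithmetic, 10 terms give 10 × 9 / 2 = 45 unordered pairs. The sketch below enumerates them and averages ratings into a symmetric similarity matrix. The function names, the shape of the `ratings` input, and the value placed on the diagonal are assumptions for illustration, not a description of our actual data handling.

```python
from itertools import combinations

TERMS = ["college", "university", "state", "students", "teacher",
         "school", "research", "business", "government", "job"]

def unique_pairs(terms):
    """All unordered pairs of distinct terms: n * (n - 1) / 2 of them."""
    return list(combinations(terms, 2))

def similarity_matrix(terms, ratings, self_similarity=1.0):
    """Average per-pair ratings into a symmetric matrix.

    ratings: dict mapping (term_a, term_b) -> list of subject ratings.
    self_similarity: value placed on the diagonal (an assumed convention).
    """
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = self_similarity
    for (a, b), scores in ratings.items():
        mean = sum(scores) / len(scores)
        matrix[index[a]][index[b]] = mean
        matrix[index[b]][index[a]] = mean  # similarity is treated as symmetric
    return matrix
```

Multidimensional scaling would then take this matrix (usually converted to dissimilarities) as its input.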

Step 3: Comparison!
Upon comparing the output of SVD and MDS run just for the ten terms and then graphed, we found similarities between the two graphs. Yay! Our project worked! Most striking was the fact that “state” was far away from the rest of the words, off to one side, but on opposite sides in each graph. Also in both plots, “students” was on the opposite side of the graph from “state”. Upon noticing this, we realized that both LSA and MDS had created a similar spectrum from learning (“students”) to bureaucracy (“state”). LSA showed that “job” and “business” were similar concepts, as well as “college” and “university”. Unfortunately, everything in MDS was close to each other and therefore we cannot comment as meaningfully on it as we can on the LSA plot. Due to the striking similarities we mentioned earlier, we decided that, legitimately or not, we would superimpose the graphs, flipping the LSA graph so that “state” and “students” matched in sides. This was accomplished by multiplying the SVD matrix by -1. This plot can be seen in Figure 3. In doing this, we realized that the more bureaucracy-oriented terms matched pretty closely while the learning-oriented ones did not.

Overall, we found that our intuitions about the similarity of the 10 terms were actually captured better by the LSA plot than by the MDS plot. We think this might have had something to do with the fact that people had trouble with the task, often treating it as an all-or-nothing rating rather than a scale that would be appropriate for approximating a psychological space. If we were to make adjustments to this part, we would choose instead a three-way comparison task, forcing people to make judgements that our previous two-way comparison task did not properly encourage.

Yeah, it looks like a measly graph with x's and o's and twenty words, but it's a guess at our mental lexicon. It was quite fun to develop and code. Here's to a future in computational psycholinguistics!

Monday, January 24, 2011

Taking a Wrecking Ball to the Tower of Babel

At about 40:30, this sensational documentary by Hans Rosling talks about the efforts Google is making in real-time translation so people speaking different languages can communicate in a flash. I'm still parsing through my thoughts on it. On the one hand, it is fascinating and mind-blowing that this technology can exist.

My torn-ness revolves around the place of the variety of different languages that there are. Does language difference serve a purpose? Sapir, Whorf, and the theory of linguistic relativity would say that because these people speak different languages, they think differently, at least to an extent. So does it go the other way too? Does diversity of language reflect the nuances in culture? This is getting a little out of my purview, but it makes me think— is there any advantage to having the thousands of different languages that people across the globe speak? In this global age, is it just a burden to progress? The idea of getting people across the world able to communicate has been around for ages, probably not even starting with the creators of Esperanto. But why don't we speak Esperanto these days? I really don't know. It seems awesome to me. Did it fail because it was a synthetic language?

But this Google project seems a little different. It lets people have the cultural differences, to allow them those nuances that their language provides them, and then translates it. There is a reason why the term "lost in translation" exists. Can Google's gizmo get good enough at translating the extra-linguistic nuances? This becomes super relevant then with the real-time audio translation. Supralinguistic features of language (inflection, intonation, etc.) are different across different languages. Will Google's translator-voice take that into account? I mean, even human translation is not perfect...

I'm excited to see what comes of this. Imagine the globalization opportunities!