wevi: Word Embedding Visual Inspector
This tool visualizes the basic working mechanism of word2vec, a popular word embedding model, originally published by Mikolov et al. (https://code.google.com/p/word2vec/). It is the result of a few days' hack during my preparation for the a2-dlearn (Ann Arbor Deep Learning Meetup) event hosted by Daniel Pressel.
Here are the video and slides of my talk. To read more about this work, please read my paper explaining the parameter learning of word embedding models. I plan to keep adding functionality to the tool, and I would greatly appreciate your help in the development process! The repository is at https://github.com/ronxin/wevi.
The model by default has randomly initialized weights, so you will need to train it in order to see some interesting patterns emerge. You can simply click on the button, and it will train the model using 500 training instances. When it stops, you can hover your mouse over a neuron in the input layer; that neuron will be set as active, and you can inspect the activation levels of all the neurons.
You may also click on the button, and it will take one training instance at a time, so that you can inspect training at your own pace. Clicking on the button recalculates PCA from the current positions of the vectors, which usually gives you a better layout.
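To make the PCA recomputation concrete, here is a minimal sketch using NumPy. This is illustrative only; wevi uses an existing PCA package, and the function name and toy vectors here are hypothetical:

```python
import numpy as np

def pca_2d(vectors):
    """Project row vectors onto their top two principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                  # center the data
    # SVD of the centered matrix yields the principal directions
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T                     # 2-D coordinates for the scatterplot

# Toy "word vectors": rerunning this after training re-derives the layout
vecs = np.array([[2.0, 0.1, 0.0],
                 [1.9, 0.0, 0.1],
                 [0.0, 2.0, 0.1],
                 [0.1, 1.9, 0.0]])
coords = pca_2d(vecs)                       # shape (4, 2)
```

Because PCA is recomputed from the current vector positions, words that have drifted close together during training also land close together in the plot.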
Load the "King and Queen" preset and train 1000 iterations (click on the button, wait until it stops, and click again). Then click on the button. Look at the lower-right scatterplot and check the blue markers (input vectors); you will see the famous analogy: "king - queen = man - woman."
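The analogy means the two difference vectors point in roughly the same direction. A hedged sketch with made-up 2-D vectors (a trained model would produce similar geometry, but these numbers are purely illustrative):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical input vectors, chosen only to illustrate the relationship
vec = {
    "king":  np.array([2.0, 1.0]),
    "queen": np.array([2.0, -1.0]),
    "man":   np.array([0.5, 1.0]),
    "woman": np.array([0.5, -1.0]),
}

# "king - queen = man - woman": the two difference vectors align
d1 = vec["king"] - vec["queen"]
d2 = vec["man"] - vec["woman"]
similarity = cosine(d1, d2)      # near 1.0 when the analogy holds
```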
In addition, each of the presets has something interesting in it. You are encouraged to explore on your own!
Yes. But Chrome is the recommended browser. The font looks weird on Firefox.
Yes, you may follow the same format as the preset training data in the textbox. Note that you may also specify multiple words as context or target words; just separate the words with ^. Input pairs should be separated by |. The vocabulary will be derived automatically from your training data. Below are some examples:
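As one hedged illustration of parsing this format: the sketch below assumes that | separates the context side from the target side of each pair and that pairs themselves are comma-separated; the sample words and the comma delimiter are assumptions of this sketch, not guaranteed wevi behavior:

```python
def parse_training_data(text):
    """Parse pairs like "eat^drink|apple" into (context_words, target_words).

    "^" joins multiple words on either side, as described above. Splitting
    on "|" between context and target, and on "," between pairs, is an
    assumption made for this sketch.
    """
    pairs = []
    vocab = set()
    for chunk in text.split(","):
        context, target = chunk.strip().split("|")
        ctx_words = context.split("^")
        tgt_words = target.split("^")
        pairs.append((ctx_words, tgt_words))
        vocab.update(ctx_words, tgt_words)
    return pairs, sorted(vocab)      # vocabulary derived from the data

pairs, vocab = parse_training_data("drink|juice,eat^drink|apple")
```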
I only spent a few days on this, so I picked the linear transformation I was most familiar with. More advanced dimensionality reduction methods are on the TODO list; you are welcome to help by sending in pull requests!
I implemented weight initialization, feedforward, and backpropagation myself, because they are actually very simple to implement. For PCA I used an existing package.
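To show just how simple they are, here is a minimal NumPy sketch of feedforward and backpropagation for a single context-target pair with a full softmax output (word2vec itself also offers hierarchical softmax and negative sampling). The variable names, sizes, and learning rate are hypothetical, not wevi's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

V, N = 4, 2                          # vocabulary size, hidden-layer size
W_in = rng.normal(0, 0.1, (V, N))    # input->hidden weights (input vectors)
W_out = rng.normal(0, 0.1, (N, V))   # hidden->output weights (output vectors)

def train_pair(x_idx, y_idx, lr=0.1):
    """One feedforward + backpropagation step for a (context, target) pair."""
    global W_out
    h = W_in[x_idx]                       # hidden activation: one-hot row lookup
    u = h @ W_out                         # output-layer scores
    y = np.exp(u - u.max()); y /= y.sum() # softmax probabilities
    e = y.copy(); e[y_idx] -= 1.0         # prediction error (dL/du)
    grad_in = W_out @ e                   # dL/dh, via the old output weights
    W_out = W_out - lr * np.outer(h, e)   # backprop into output weights
    W_in[x_idx] -= lr * grad_in           # backprop into the active input row
    return y[y_idx]                       # probability of the target word

before = train_pair(0, 2)
for _ in range(50):
    train_pair(0, 2)
after = train_pair(0, 2)                  # rises as the pair is learned
```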
The inputs and outputs of wevi are identical to the neural network portion of word2vec. Of course, the latter has components that actually process input sentences and generate context/target word pairs. Below is an illustration of the entire word2vec working pipeline, and wevi simulates the part in the red circle.
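The pair-generating part of the pipeline that wevi does not simulate can be sketched as follows; the function name and the window size are illustrative assumptions:

```python
def generate_pairs(sentence, window=1):
    """Produce (context, target) word pairs from a sentence, as the
    word2vec pipeline does before the neural-network stage."""
    words = sentence.split()
    pairs = []
    for i, target in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:                       # skip the target word itself
                pairs.append((words[j], target))
    return pairs

pairs = generate_pairs("i drink apple juice", window=1)
```

Pairs like these are exactly what you would type into wevi's training data textbox.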
Yes! Please do whatever you want! Pull requests on GitHub are more than welcome! The repository is at https://github.com/ronxin/wevi.
Yes. Here is a list of things on my mind. If you would like to pick up any of these, please let me know; just shoot me an email (see "How to contact you").
Sorry... I have not had (and perhaps will not have) much time to work on that. But the code should be quite self-explanatory; I tried to leave useful comments here and there.
Please create an "issue" in https://github.com/ronxin/wevi.
Just send me an email to firstname.lastname@example.org and I will be happy to reply.