Home | Jason {osa-jima}

word2vec - Negative Sampling

Tue 10 December 2019

In a follow-up to the original word2vec paper, the authors introduced Negative Sampling, a technique to overcome the computational cost of vanilla Skip-Gram. Recall that in the previous post we had a vocabulary of only 6 words, so the output of Skip-Gram was a vector of 6 binary elements. With a vocabulary of, say, 170,000 words, however, computing the loss at every training step would mean scoring all 170,000 words, which quickly becomes prohibitively expensive.

In this post, we will discuss how negative sampling changes Skip-Gram and update our TensorFlow word2vec implementation to use it.
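The core idea can be sketched in plain Python (this is not the post's TensorFlow code). It is a minimal sketch assuming the negative-sampling objective as commonly stated for word2vec: score the true context word against the center word, plus k "negative" words drawn from the unigram distribution raised to the 3/4 power. The vocabulary size, vector dimension, k, the word counts and all function names are hypothetical values chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to the 3/4 power, then normalized."""
    weights = [c ** power for c in counts]
    total = sum(weights)
    return [w / total for w in weights]


def sample_negatives(noise_probs, k, context_idx, rng):
    """Draw k word indices from the noise distribution, skipping the true context word."""
    negatives = []
    indices = range(len(noise_probs))
    while len(negatives) < k:
        idx = rng.choices(indices, weights=noise_probs, k=1)[0]
        if idx != context_idx:
            negatives.append(idx)
    return negatives


def negative_sampling_loss(center_vec, output_vecs, context_idx, negative_idxs):
    """-log σ(u_o·v_c) - Σ_k log σ(-u_k·v_c): touches only k + 1 output vectors."""
    loss = -math.log(sigmoid(dot(output_vecs[context_idx], center_vec)))
    for idx in negative_idxs:
        loss -= math.log(sigmoid(-dot(output_vecs[idx], center_vec)))
    return loss


def softmax_loss(center_vec, output_vecs, context_idx):
    """Full softmax cross-entropy: touches every output vector in the vocabulary."""
    scores = [dot(u, center_vec) for u in output_vecs]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[context_idx]


rng = random.Random(0)
vocab_size, dim, k = 6, 4, 2
counts = [10, 5, 3, 2, 2, 1]  # hypothetical word frequencies
center = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
outputs = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]

probs = noise_distribution(counts)
negs = sample_negatives(probs, k, context_idx=1, rng=rng)
print("negatives:", negs)
print("negative sampling loss:", negative_sampling_loss(center, outputs, 1, negs))
print("full softmax loss:", softmax_loss(center, outputs, 1))
```

With a real vocabulary, `softmax_loss` would sum over every output vector at each step, while `negative_sampling_loss` still touches only k + 1 of them. The 3/4 power flattens the noise distribution slightly, so frequent words are sampled a bit less often than their raw frequency would suggest.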

machine learning natural language processing word2vec

word2vec - CBOW and Skip-Gram

Mon 09 December 2019

word2vec is an iterative model that can be used to create embeddings of words (or of pretty much anything). In this post, we will briefly discuss why you would want to use word2vec, break down the Continuous Bag of Words (CBOW) and Skip-Gram word2vec models, and implement them in TensorFlow.
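To make the difference between the two models concrete, here is a minimal sketch (plain Python, not the post's TensorFlow code) of how each one turns a sentence into training examples, assuming a symmetric context window. The function names and the sample sentence are hypothetical.

```python
def skipgram_pairs(tokens, window):
    """Skip-Gram: each (center, context) pair is one example; predict the context from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """CBOW: the surrounding words together predict the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


sentence = "the quick brown fox jumps".split()
print(skipgram_pairs(sentence, window=1))  # 8 (center, context) pairs
print(cbow_examples(sentence, window=1))   # 5 (context words, center) examples
```

Skip-Gram produces one example per (center, context) pair, while CBOW produces one example per center word, with the context words combined (typically averaged) into a single input.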

machine learning natural language processing word2vec

Singular Value Decomposition

Fri 22 November 2019

Some quick notes about Singular Value Decomposition (SVD) to develop an intuition that will help with problems in collaborative filtering, natural language processing (NLP), dimensionality reduction, image compression, denoising data, and more.
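As a small taste of that intuition, here is a hedged sketch in plain Python that finds only the largest singular value and its singular vectors by power iteration (one common way to get a truncated SVD, not a full SVD routine), then rebuilds the rank-1 approximation σ u vᵀ. The example matrix, iteration count and helper names are assumptions for illustration; in practice you would call a library routine such as NumPy's `numpy.linalg.svd`.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def top_singular_triplet(A, iters=200):
    """Power iteration on A^T A converges to the top right singular vector v;
    then sigma = ||A v|| and u = A v / sigma."""
    At = transpose(A)
    v = [1.0] * len(A[0])  # assumed start vector; must not be orthogonal to the top v
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        n = norm(w)
        v = [x / n for x in w]
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [x / sigma for x in Av]
    return u, sigma, v


def rank1_approx(u, sigma, v):
    """sigma * u v^T, the best rank-1 approximation of A (Eckart-Young theorem)."""
    return [[sigma * ui * vj for vj in v] for ui in u]


A = [[3.0, 4.0, 5.0],
     [6.0, 8.0, 10.0]]  # outer product of [1, 2] and [3, 4, 5], so A has rank 1
u, sigma, v = top_singular_triplet(A)
print(sigma)  # sqrt(5) * sqrt(50) = sqrt(250), about 15.811
print(rank1_approx(u, sigma, v))
```

Because this A has rank 1, the single triplet reconstructs it exactly; for a general matrix, keeping only the top few triplets is what drives the compression and denoising applications listed above.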

machine learning linear algebra natural language processing

Convolutional Networks - VGG16

Sat 18 August 2018

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual computer vision competition. Two of its tasks matter here. The first is object localization: identifying the object in an image and drawing a bounding box around it. The second is image classification: labeling each image with one of 1000 categories.

In 2012, Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton won the competition by a sizable margin using a convolutional network (ConvNet) named AlexNet. This became a watershed moment for deep learning.

Two years later, Karen Simonyan and Andrew Zisserman took 1st place in localization and 2nd place in classification. Their model, also a ConvNet, is known as VGG and comes in 16- and 19-layer variants (VGG-16 and VGG-19). VGG is the acronym for their lab at Oxford (Visual Geometry Group), and the number is the count of layers in the model with trainable parameters.

What attracted me to this model was its simplicity: it shares most of its basic architecture and algorithms with LeNet-5, one of the first ConvNets, from the 1990s. The main difference is the addition of many more layers (from 5 to 19), which seems to support the idea that deeper networks learn better representations. This trend continued with Residual Networks (ResNets), which won ILSVRC the following year with a whopping 152 layers.
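The "layers with trainable parameters" naming can be checked with a short sketch. It assumes the standard 16- and 19-layer configurations from the VGG paper (3x3 convolutions padded so they preserve spatial size, 2x2 max-pooling with stride 2, then three fully connected layers) on 224x224 RGB input. It only counts weight layers and trainable parameters rather than building a real network; the names `VGG16`, `VGG19`, `FC_SIZES` and `summarize` are hypothetical.

```python
# Convolutional part of two VGG configurations: numbers are 3x3 conv output
# channels, "M" is a 2x2 max-pool with stride 2.
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
VGG19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
         512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]  # three fully connected layers, 1000 ImageNet classes


def summarize(config, in_channels=3, image_size=224):
    """Return (weight layers, trainable parameters, final feature-map size)."""
    params, conv_layers, channels, size = 0, 0, in_channels, image_size
    for layer in config:
        if layer == "M":
            size //= 2  # pooling halves height and width and has no parameters
        else:
            params += 3 * 3 * channels * layer + layer  # 3x3 kernel weights + biases
            channels = layer
            conv_layers += 1
    fan_in = size * size * channels  # flattened 7x7x512 feature map
    for units in FC_SIZES:
        params += fan_in * units + units  # dense weights + biases
        fan_in = units
    return conv_layers + len(FC_SIZES), params, size


print(summarize(VGG16))  # (16, 138357544, 7)
print(summarize(VGG19))  # (19, 143667240, 7)
```

Pooling layers are not counted because they have no weights, which is why 13 conv layers plus 3 fully connected layers give "16". Most of the parameters sit in the first fully connected layer (25088 x 4096 weights).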

neural networks machine learning convolutional networks VGG

The Math behind Neural Networks - Backpropagation

Wed 18 July 2018

The hardest part of deep learning for me was backpropagation. Forward propagation made sense: you do a bunch of matrix multiplications, add some bias terms, and throw in non-linearities so the whole thing doesn't collapse into one large matrix multiplication. Gradient descent also made intuitive sense: we use the partial derivatives of our cost function (\(J\)) with respect to our parameters to update those parameters in order to minimize \(J\).

The objective of backpropagation is pretty clear: we need to calculate the partial derivatives of the cost function (\(J\)) with respect to our parameters so that we can use them for gradient descent. The difficult part lies in keeping track of the calculations, since the partial derivatives in each layer depend on values from the layers around it: the activations coming in from the previous layer and the gradients flowing back from the next one. Maybe the fact that we are going backwards also makes it hard to wrap my head around.
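The bookkeeping described above can be seen in a toy example. This is a hedged sketch, not the post's network: it assumes one tanh hidden unit, one sigmoid output unit and a binary cross-entropy cost, with hypothetical parameter values. `backward` computes \(\partial J / \partial \theta\) for each parameter by applying the chain rule from the output back towards the input, and the result is checked against central finite differences.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x, y):
    """Two one-unit layers: tanh hidden unit, sigmoid output, cross-entropy cost J."""
    w1, b1, w2, b2 = params
    z1 = w1 * x + b1
    a1 = math.tanh(z1)
    z2 = w2 * a1 + b2
    a2 = sigmoid(z2)
    J = -(y * math.log(a2) + (1 - y) * math.log(1 - a2))
    return J, (a1, a2)


def backward(params, x, y):
    """Chain rule, output layer first: each layer reuses the gradient from the layer after it."""
    _, _, w2, _ = params
    _, (a1, a2) = forward(params, x, y)
    dz2 = a2 - y               # dJ/dz2 for sigmoid output + cross-entropy
    dw2, db2 = dz2 * a1, dz2   # uses the activation a1 from the previous (hidden) layer
    da1 = dz2 * w2             # gradient flowing back into the hidden layer
    dz1 = da1 * (1 - a1 ** 2)  # tanh'(z1) = 1 - tanh(z1)^2
    dw1, db1 = dz1 * x, dz1
    return [dw1, db1, dw2, db2]


def numerical_grad(params, x, y, eps=1e-6):
    """Central differences: (J(p + eps) - J(p - eps)) / (2 eps), one parameter at a time."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((forward(plus, x, y)[0] - forward(minus, x, y)[0]) / (2 * eps))
    return grads


params = [0.5, -0.1, 1.5, 0.2]  # hypothetical w1, b1, w2, b2
x, y = 0.8, 1.0
analytic = backward(params, x, y)
numeric = numerical_grad(params, x, y)
for a, n in zip(analytic, numeric):
    print(f"{a:.8f}  {n:.8f}")
```

Note how `dz1` cannot be computed until `dz2` is known: that dependency on the later layer is exactly why the calculation has to run backwards.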

neural networks machine learning

The Math behind Neural Networks - Forward Propagation

Wed 18 July 2018

When I started learning about neural networks, I found several articles and courses that guided you through implementing them in NumPy. But when I started my research, I couldn't see past these basic implementations. In other words, I couldn't understand the concepts in research papers, and I couldn't think of any interesting research ideas.

In order to go forward, I had to go backwards and relearn many fundamental concepts. The two concepts that are probably the most fundamental to neural networks are forward propagation and backpropagation, so I decided to write two blog posts explaining in depth how they work. My hope is that by the end of this two-part series you will have a deeper understanding of the fundamental underpinnings of both.
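A core idea of forward propagation, that each layer is a matrix multiplication plus a bias followed by a non-linearity, and that without the non-linearity stacked layers collapse into a single matrix multiplication, can be checked with a minimal plain-Python sketch. The matrices, the choice of ReLU and the helper names are assumptions for illustration. Two linear layers give exactly the same output as one layer with weights \(W_1 W_2\) and bias \(b_1 W_2 + b_2\); inserting a ReLU between them breaks that equivalence.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add_bias(Z, b):
    return [[z + bi for z, bi in zip(row, b)] for row in Z]


def relu(Z):
    return [[max(0.0, z) for z in row] for row in Z]


def layer(X, W, b, activation=None):
    """One layer of forward propagation: X @ W + b, then an optional non-linearity."""
    Z = add_bias(matmul(X, W), b)
    return activation(Z) if activation else Z


X = [[1.0, -2.0]]                               # one example with two features (hypothetical)
W1, b1 = [[1.0, -1.0], [0.5, 2.0]], [0.0, 1.0]  # hidden layer: 2 -> 2
W2, b2 = [[1.0], [1.0]], [0.5]                  # output layer: 2 -> 1

linear_net = layer(layer(X, W1, b1), W2, b2)     # two layers, no non-linearity
W = matmul(W1, W2)                               # collapsed weights W1 W2
b = [bw + b2i for bw, b2i in zip(matmul([b1], W2)[0], b2)]  # collapsed bias b1 W2 + b2
collapsed = layer(X, W, b)
relu_net = layer(layer(X, W1, b1, relu), W2, b2) # ReLU between the layers

print(linear_net, collapsed)  # [[-3.5]] [[-3.5]]
print(relu_net)               # [[0.5]]
```

The same algebra holds for any number of purely linear layers, which is why every hidden layer in practice applies a non-linearity such as ReLU, tanh or sigmoid.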

neural networks machine learning