Figure 1: MNIST digit recognition sample
In this blog post we’ll review an example of using a Deep Belief Network to classify images from the MNIST dataset of handwritten digits. The MNIST dataset is extremely well studied and serves as a benchmark against which new models test themselves.
However, in my opinion, this benchmark doesn’t necessarily translate into real-world viability, mainly because of the dataset itself: each and every image has been pre-processed, including cropping, clean thresholding, and centering.
In the real-world, your dataset will not be as “nice” as the MNIST dataset. Your digits won’t be as cleanly pre-processed.
Still, this is a great starting point to get our feet wet utilizing Deep Belief Networks and nolearn.
Deep learning is all about hierarchies and abstractions. These hierarchies are controlled by the number of layers in the network along with the number of nodes per layer. Adjusting the number of layers and nodes per layer can be used to provide varying levels of abstraction.
In general, the goal of deep learning is to take low level inputs (feature vectors) and then construct higher and higher level abstract “concepts” through the composition of layers. The assumption here is that the data follows some sort of underlying pattern generated by many interactions between different nodes on many different layers of the network.
Now that we have a high level understanding of Deep Learning concepts and assumptions, let’s look at some definitions to aid us in our learning.
Figure 2: Example of training a Deep Belief Network by constructing multiple Restricted Boltzmann Machines stacked on top of each other. Each layer consists of multiple nodes which feed into the next layer. (source)
Before we get to the code, let’s quickly discuss what Deep Belief Networks are, along with a bit of terminology.
This review is by no means meant to be complete and exhaustive. And in some cases I am greatly simplifying the details. But that’s okay. This is meant to be a gentle introduction to DBNs and not a hardcore review with tons of mathematical notation. If that’s what you’re looking for, then sorry, this isn’t the post for you. I would suggest reading up on the DeepLearning.net Tutorials (trust me, they are really good, but if this is your first exposure to deep learning, you might want to get through this post first).
Deep Belief Networks consist of multiple layers, or more concretely, a hierarchy of unsupervised Restricted Boltzmann Machines (RBMs) where the output of each RBM is used as input to the next.
The major breakthrough came in 2006 when Hinton et al. published their A Fast Learning Algorithm for Deep Belief Networks paper. Their seminal work demonstrated that each of the hidden layers in a neural net can be treated as an unsupervised Restricted Boltzmann Machine with a supervised back-propagation step for fine-tuning. Furthermore, these RBMs can be trained greedily — and thus were feasible as highly scalable and efficient machine learning models.
This notion of efficiency was further demonstrated in the following years, as deep nets trained on GPUs rather than CPUs reduced training time by over an order of magnitude. What once took weeks now takes only days.
From there, deep learning has taken off.
But before we get too far, let’s quickly discuss this concept of “layers” in our DBN.
The first layer is a type of visible layer called the input layer. This layer contains an input node for each of the entries in our feature vector.
For example, in the MNIST dataset each image is 28 x 28 pixels. If we use the raw pixel intensities for the images, our feature vector would be of length 28 x 28 = 784, thus there would be 784 nodes in the input layer.
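The flattening step described above can be sketched in a few lines of NumPy (the image here is just random intensities standing in for a real MNIST digit):

```python
import numpy as np

# a stand-in 28x28 grayscale "image" with random intensities in [0, 255]
image = np.random.randint(0, 256, size=(28, 28)).astype("uint8")

# flattening row-by-row yields the 784-dimensional feature vector that
# feeds the 784 nodes of the input layer
feature_vector = image.flatten()
print(feature_vector.shape)  # (784,)
```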
From there, these nodes connect to a series of hidden layers. In the simplest terms, each hidden layer is an unsupervised Restricted Boltzmann Machine, where the output of each RBM in the hidden layer sequence is used as input to the next.
The final hidden layer then connects to an output layer.
Finally, we have another visible layer called the output layer. This layer contains the output probabilities for each class label. For example, in our MNIST dataset we have 10 possible class labels (one for each of the digits 0-9). The output node that produces the largest probability is chosen as the overall classification.
Of course, we could always sort the output probabilities and choose all class labels that fall within some epsilon of the largest one. Doing this is a good way to find the most likely class labels rather than committing to a single prediction. In fact, this is essentially what is done for many popular deep learning challenges, including ImageNet, where top-5 accuracy is commonly reported.
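The epsilon trick can be sketched with a few lines of NumPy; the probabilities below are made up purely for illustration:

```python
import numpy as np

# hypothetical output-layer probabilities for the 10 digit classes
probs = np.array([0.01, 0.02, 0.01, 0.48, 0.01, 0.02, 0.01, 0.42, 0.01, 0.01])

# keep every label within epsilon of the top score
eps = 0.10
best = probs.max()  # 0.48
candidates = np.where(probs >= best - eps)[0]
print(candidates)  # [3 7] -- both "3" and "7" fall within epsilon of the top score
```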
Now that we have some terminology under our belts, it’s time for the fun part: let’s write some code.
It is important to note that this tutorial is (by and large) based on the excellent example on the nolearn website. My goal here is simply to take that example, tweak it slightly, throw in a few extra demonstrations, and provide a detailed review of the code, of course.
Anyway, open up a new file, name it dbn.py, and let’s get started.
We’ll start by importing the packages that we’ll need. We’ll import train_test_split (to generate our training and testing splits of the MNIST dataset) and classification_report (to display a nicely formatted table of accuracies) from the scikit-learn package. We’ll import the dataset module from scikit-learn to download the MNIST dataset.
Next up, we’ll import our Deep Belief Network implementation from the nolearn package.
And finally we’ll wrap up our import statements by importing NumPy for numerical processing and cv2 for our OpenCV bindings.
Let’s go ahead and download the MNIST dataset:
We make a call to the fetch_mldata function on Line 13 that downloads the original MNIST dataset from the mldata.org repository.
The actual dataset is roughly 55MB, so it may take a few seconds to download. However, once downloaded, the dataset is cached locally on your machine so you will not have to download it again.
If you take the time to examine the data, you’ll notice that each feature vector contains 784 entries in the range [0, 255]. These values are the grayscale pixel intensities of the flattened 28 x 28 image. Background pixels are black (0) whereas foreground pixels appear to be lighter shades of gray or white.
Time to generate our training and testing splits:
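The split can be sketched as follows. To keep the snippet self-contained, a small random array stands in for dataset.data and dataset.target; in the real script those come from the fetch step above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# stand-ins for dataset.data and dataset.target (100 fake 784-dim samples)
data = np.random.randint(0, 256, size=(100, 784)).astype("float64")
target = np.random.randint(0, 10, size=(100,))

# scale pixel intensities into [0, 1] (the DBN expects this range) and
# hold out 33% of the samples for testing
(trainX, testX, trainY, testY) = train_test_split(
    data / 255.0, target, test_size=0.33, random_state=42)
print(trainX.shape, testX.shape)  # (67, 784) (33, 784)
```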
In order to train our Deep Belief network, we’ll need two sets of data — a set for training our algorithm and a set for evaluating or testing the performance of the classifier.
We perform the split on Lines 17 and 18 by making a call to train_test_split. The first argument we specify is the data itself, which we scale to be in the range [0, 1.0]. The Deep Belief Network assumes our data is scaled to this range, so this is a necessary step.
We then specify the “target” or the “class labels” for each feature vector as the second argument.
The last argument to train_test_split is the size of our testing set. We’ll utilize 33% of the data for testing, while the remaining 67% will be utilized for training our Deep Belief Network.
Speaking of training the Deep Belief Network, let’s go ahead and do that:
We initialize our Deep Belief Network on Lines 23-28.
The first argument details the structure of our network, represented as a list. The first entry in the list is the number of nodes in our input layer. We’ll want an input node for each entry in our feature vector, so we’ll specify the length of the feature vector for this value.
Our input layer will now feed forward into our second entry in the list, a hidden layer. This hidden layer will be represented as an RBM with 300 nodes.
Finally, the output of the 300-node hidden layer will be fed into the output layer, which consists of an output node for each of the class labels.
We can then define learn_rates (the learning rate of the algorithm), learn_rate_decays (the decay applied to the learning rate after each epoch), the number of epochs (iterations over the training data), and the verbosity level.
Both learn_rates and learn_rate_decays can be specified as a single floating point value or as a list of floating point values. If you specify only a single value, that learning rate/decay rate is applied to all layers in the network. If you specify a list of values, then the corresponding learning rate and decay rate will be used for the respective layers.
Training the actual algorithm takes place on Line 29. If you have a slow machine, you may want to make a cup of coffee or go for a quick walk during this time.
Now that our Deep Belief Network is trained, let’s go ahead and evaluate it:
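The evaluation step follows standard scikit-learn conventions, as sketched below. In the real script, testY comes from the MNIST split and the predictions come from dbn.predict(testX); a handful of made-up labels keep this snippet self-contained:

```python
import numpy as np
from sklearn.metrics import classification_report

# stand-ins for the real test labels and the network's predictions
testY = np.array([0, 1, 2, 3, 1, 0, 2, 3])
preds = np.array([0, 1, 2, 3, 1, 0, 2, 1])  # one mistake: a 3 predicted as a 1

# display a nicely formatted table of per-class accuracies
report = classification_report(testY, preds)
print(report)
```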
Here we make a call to the predict method of the network on Line 33 which takes our testing data and makes predictions regarding which digit each image contains. If you have worked with scikit-learn at all, then this should feel very natural and comfortable.
We then present a table of accuracies on Line 34.
Finally, I thought it might be interesting to inspect images individually rather than on aggregate as a further demonstration of the network:
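The inspection loop can be sketched as below. To keep it runnable on its own, random arrays stand in for the test split, the real dbn.predict call is shown as a comment, and the OpenCV display calls (which need a GUI) are commented out as well:

```python
import numpy as np

# stand-ins for the real MNIST test split
testX = np.random.rand(50, 784)
testY = np.random.randint(0, 10, size=(50,))

# loop over 10 randomly chosen test samples
for i in np.random.choice(np.arange(0, len(testY)), size=(10,), replace=False):
    # pred = dbn.predict(np.atleast_2d(testX[i]))   # the real prediction step
    pred = [int(testY[i])]  # placeholder so this sketch runs standalone

    # scale back to [0, 255], reshape to 28x28, convert to unsigned 8-bit
    image = (testX[i] * 255).reshape((28, 28)).astype("uint8")

    print("Actual digit is {0}, predicted {1}".format(testY[i], pred[0]))
    # cv2.imshow("Digit", image)  # the original script displays each image
    # cv2.waitKey(0)
```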
On Line 37 we loop over 10 randomly chosen feature vectors from the test data.
We then predict the digit in the image on Line 39.
To display our image on screen, we need to reshape it on Line 43. Since our data is in the range [0, 1.0], we first multiply by 255 to put it back in the range [0, 255], change the shape to be a 28 x 28 pixel image, and then change the data type from floating point to an unsigned 8-bit integer.
Finally, we display the results of the prediction on Lines 46-48.
Now that the code is done, let’s look at the results.
Fire up a shell, navigate to the directory containing your dbn.py file, and issue the following command:
If all goes well, you should have something similar to my output below:
Here you can see that our Deep Belief Network is trained over 10 epochs (iterations over the training data). At each epoch the loss function is minimized further and the error on the training set decreases.
Taking a look at our classification report, we see that we have obtained 98% accuracy (the precision column) on our testing set. The “1” and “7” digits were accurately classified 99% of the time. We could perhaps have obtained higher accuracy on the other digits had we let our network train for more epochs.
And below we can see some screenshots of our Deep Belief Network correctly classifying the digit in their respective images.
Note: You’ll notice that the loss, error, and accuracy values do not 100% match the output above. That is because I gathered these sample images on a separate run of the algorithm. Deep Belief Networks are stochastic algorithms, meaning that the algorithm utilizes random variables; thus, it is normal to obtain slightly different results when running the learning algorithm multiple times. To account for this, it is normal to obtain multiple sets of results and average them together prior to reporting final accuracies.
Figure 3: Correctly classifying a “1” digit using our Deep Belief Network.
Here we can see that we have correctly classified the “1” digit.
Figure 4: Correctly classifying a “4” digit using our Deep Belief Network.
Again, we can see that our digit is correctly classified.
But take a look at this “8” digit below. This is far from a “legible digit”, but the Deep Belief Network is still able to sort it out:
Figure 5: Correctly classifying an “8” digit using our Deep Belief Network.
Finally, let’s try a “7”:
Figure 6: Correctly classifying a “7” digit using our Deep Belief Network.
Yep, that one is correctly classified as well!
So there you have it: a brief, gentle introduction to Deep Belief Networks.
In this post we reviewed the structure of a Deep Belief Network (at a very high level) and looked at the nolearn Python package.
We then utilized nolearn to train and evaluate a Deep Belief Network on the MNIST dataset.
If this is your first experience with DBNs, I highly recommend that you spend the next few days researching and reading up on Artificial Neural Networks (ANNs); specifically, feed-forward networks, the back-propagation algorithm, and Restricted Boltzmann Machines.
Honestly, if you are serious about exploring Deep Learning, the algorithms I mentioned above are required, non-optional reading!
You won’t get very far into deep learning without reading up on these techniques. And don’t be afraid of the academic papers either! That’s where you’ll find all the gory details.
Training a Deep Belief Network on a CPU can take a long, long time.
Luckily, we can speed up the training process using our GPUs, leading to training times being reduced by an order of magnitude or more.
In my next post I’ll show you how to setup your system to train a Deep Belief Network on your GPU. I think the speedup in training time will be quite surprising…
Be sure to enter your email address in the form at the bottom of this post to be updated when the next post goes live! You definitely won’t want to miss it.
Figure 7: Learn how to use HOG and a Linear Support Vector Machine to recognize handwritten text in my Practical Python and OpenCV book.
Did you enjoy this post on handwriting recognition?
If so, you’ll definitely want to check out my Practical Python and OpenCV book!
Chapter 6, Handwriting Recognition with HOG, details the techniques the pros use, allowing you to become a pro yourself! From pre-processing the digit images to utilizing the Histogram of Oriented Gradients (HOG) image descriptor to training a Linear SVM, this chapter covers handwriting recognition from front to back.
Simply put — if you loved this blog post, you’ll love this book.
Sound interesting?
Click here to pick up a copy of Practical Python and OpenCV!