Wednesday, January 14, 2015

Understanding Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are among the most popular deep learning algorithms. CNNs were proposed as a deep learning framework motivated by minimal data preprocessing requirements. A CNN can also be regarded as a learned feature extraction process.

CNNs are a family of multi-layer neural networks. A CNN is composed of one or more convolutional layers (often each paired with a subsampling layer), followed by one or more fully connected layers as in a standard multilayer neural network. A concrete layer stack is sketched below.
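As a rough illustration of that ordering, here is a small LeNet-style stack for a 32 x 32 RGB input, written as plain Python data. All layer sizes here are hypothetical choices for the example, not values prescribed by the text:

```python
# input image -> [conv -> pool] x 2 -> fully connected -> output
# Shapes assume 'valid' convolutions: 32 -> 28 -> 14 -> 10 -> 5.
architecture = [
    ("input", (3, 32, 32)),      # RGB image, 32 x 32 pixels
    ("conv",  (16, 3, 5, 5)),    # 16 filters of size 5 x 5 over 3 input maps
    ("pool",  (2, 2)),           # 2 x 2 subsampling
    ("conv",  (32, 16, 5, 5)),   # 32 filters over the 16 previous maps
    ("pool",  (2, 2)),
    ("fc",    (32 * 5 * 5, 10)), # fully connected layer to 10 output classes
]
```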

The architecture of a CNN is designed to take advantage of the 2D structure of an input image. This is achieved with local connections and tied weights, followed by some form of pooling, which yields translation-invariant features. Another benefit of CNNs is that they are easier to train and have far fewer parameters than fully connected networks with the same number of hidden units, thanks to the weight-sharing strategy; the comparison below makes the difference concrete.
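A minimal sketch of that parameter count, using the hypothetical 32 x 32 RGB input and 16 filters of size 5 x 5 from the example above:

```python
# Compare parameter counts for the same number of hidden units,
# with and without weight sharing (numbers are illustrative).
M, N, q = 32, 32, 3   # input image: 32 x 32 RGB
k, m, n = 16, 5, 5    # 16 filters of size 5 x 5

hidden_units = k * (M - m + 1) * (N - n + 1)  # 16 * 28 * 28 = 12544

# Convolutional layer: each of the k filters is shared across all positions.
conv_params = k * q * m * n + k               # weights + biases = 1216

# Fully connected layer producing the same number of hidden units:
fc_params = hidden_units * (M * N * q) + hidden_units

print(conv_params, fc_params)                 # 1216 vs 38547712
```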

Given an RGB input image of size M x N, the first convolutional layer has k filters (or weight matrices) of size m x n x q, where q equals 3 (the number of input maps: RGB) and k is the number of output maps. Together, these filters can be represented as a 4D tensor of size k x q x m x n. Because of the convolution, each output map has size (M-m+1) x (N-n+1). If the next layer is a subsampling layer, each map is then subsampled, typically with mean or max pooling over p x p contiguous regions.

Either before or after the subsampling layer, an additive bias and a sigmoid (or tanh) nonlinearity are applied to each feature map. The outputs of the hyperbolic tangent function will typically be near zero on average, whereas the outputs of a sigmoid will be non-zero on average. In either case, normalizing the training data to have mean 0 and variance 1 along the features can often improve convergence during gradient descent.
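To make the shapes concrete, here is a minimal NumPy sketch of one convolution-plus-pooling stage under the assumptions above. The function names and the 32 x 32 input size are illustrative, and the filtering is implemented as cross-correlation (no filter flip), as is common in practice:

```python
import numpy as np

def conv_valid(image, filters, bias):
    """'Valid' convolution: image (q, M, N), filters (k, q, m, n), bias (k,).
    Returns k feature maps of size (M-m+1) x (N-n+1)."""
    q, M, N = image.shape
    k, _, m, n = filters.shape
    out = np.zeros((k, M - m + 1, N - n + 1))
    for f in range(k):
        for i in range(M - m + 1):
            for j in range(N - n + 1):
                out[f, i, j] = np.sum(image[:, i:i+m, j:j+n] * filters[f]) + bias[f]
    return out

def max_pool(maps, p):
    """Max pooling over non-overlapping p x p contiguous regions."""
    k, H, W = maps.shape
    return maps[:, :H//p*p, :W//p*p].reshape(k, H//p, p, W//p, p).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32))          # q = 3 input maps (RGB)
image = (image - image.mean()) / image.std()      # normalize: mean 0, variance 1
filters = rng.standard_normal((16, 3, 5, 5)) * 0.1
bias = np.zeros(16)

maps = np.tanh(conv_valid(image, filters, bias))  # (16, 28, 28): (M-m+1) x (N-n+1)
pooled = max_pool(maps, 2)                        # (16, 14, 14) after 2 x 2 pooling
print(maps.shape, pooled.shape)
```

Here the bias and tanh are applied before pooling; as the text notes, they could equally be applied after the subsampling step.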



