Introduction to Neural Networks

Q1: I am having some trouble understanding backpropagation when training the neural net.

Resource:
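A minimal sketch of backpropagation in NumPy, assuming a single hidden layer with a sigmoid activation, a linear output and a squared-error loss (the function and variable names are illustrative, not from the course material). The backward pass applies the chain rule layer by layer, reusing the values stored during the forward pass. A finite-difference check at the end compares one backprop gradient with a numerical estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, y, W1, b1, W2, b2):
    """Forward and backward pass for one example through a 1-hidden-layer network.

    Hidden layer: sigmoid; output layer: linear; loss: 0.5 * squared error.
    Returns the loss and the gradients (dW1, db1, dW2, db2).
    """
    # Forward pass: store intermediate values, the backward pass reuses them
    h = sigmoid(W1 @ x + b1)          # hidden activations, shape (H,)
    y_hat = W2 @ h + b2               # network output, shape (O,)
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # Backward pass: chain rule, from the loss back towards the input
    d_out = y_hat - y                 # dL/dy_hat
    dW2 = np.outer(d_out, h)          # dL/dW2
    db2 = d_out                       # dL/db2
    d_h = W2.T @ d_out                # error sent back to the hidden layer
    d_pre = d_h * h * (1.0 - h)       # multiply by sigmoid'(z) = h * (1 - h)
    dW1 = np.outer(d_pre, x)          # dL/dW1
    db1 = d_pre                       # dL/db1
    return loss, (dW1, db1, dW2, db2)

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), np.array([0.5])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
loss, (dW1, db1, dW2, db2) = forward_backward(x, y, W1, b1, W2, b2)

# Gradient check: nudge one weight up and down and compare with the backprop gradient
eps = 1e-6
W1_up, W1_down = W1.copy(), W1.copy()
W1_up[0, 0] += eps
W1_down[0, 0] -= eps
numeric = (forward_backward(x, y, W1_up, b1, W2, b2)[0]
           - forward_backward(x, y, W1_down, b1, W2, b2)[0]) / (2 * eps)
print(numeric, dW1[0, 0])  # the two numbers should agree to many decimal places
```

During training, these gradients are what gradient descent uses to update each parameter, e.g. W1 -= lr * dW1.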

Q2: What's the perceptron algorithm?

Resource:

What the Hell is Perceptron?
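The perceptron algorithm fits in a few lines of NumPy. This is a minimal sketch assuming labels of -1/+1 and the classic update rule: whenever a point is misclassified, add lr * y * x to the weights and lr * y to the bias, and stop once a full pass makes no mistakes. The function names and the logical-AND dataset are illustrative choices, not taken from the course.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=100):
    """Perceptron learning rule; labels y must be -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            # Misclassified (or exactly on the boundary): move the line toward this point
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:  # converged: every point is on the correct side
            break
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b > 0, 1, -1)

# Logical AND, which is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(predict(X, w, b))
```

On linearly separable data like this the loop is guaranteed to stop; on data that is not separable (e.g. XOR) it would simply run out of epochs.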

Q3: How do I find the optimal learning rate?

This paper by Leslie Smith is a great resource for finding the optimal learning rate: A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. You can find an implementation of this paper in this blog post: Estimating an Optimal Learning Rate For a Deep Neural Network
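The learning-rate range test discussed in these resources can be sketched as a toy NumPy example: train briefly while increasing the learning rate exponentially, record the loss at each step, and stop once the loss blows up. The linear-regression model, the name lr_range_test, the 4x divergence cutoff and the "learning rate at the minimum loss divided by 10" pick are all assumptions (common heuristics), not the exact procedure from the paper or the blog post.

```python
import numpy as np

def lr_range_test(X, y, lr_min=1e-5, lr_max=10.0, num_steps=100, batch_size=32, seed=0):
    """Toy LR range test: linear regression (MSE) trained with mini-batch SGD.

    The learning rate grows exponentially from lr_min to lr_max, one step per
    mini-batch, and the test stops early once the loss starts to diverge.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    history_lr, history_loss = [], []
    best = np.inf
    for lr in np.geomspace(lr_min, lr_max, num_steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        xb, yb = X[idx], y[idx]
        err = xb @ w - yb
        loss = np.mean(err ** 2)
        if not np.isfinite(loss) or loss > 4 * best:
            break  # loss is exploding: larger learning rates are useless
        best = min(best, loss)
        history_lr.append(lr)
        history_loss.append(loss)
        w -= lr * 2 * xb.T @ err / batch_size  # one SGD step at this learning rate
    return np.array(history_lr), np.array(history_loss)

# Synthetic regression data (illustrative)
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=1000)

lrs, losses = lr_range_test(X, y)
suggested_lr = lrs[np.argmin(losses)] / 10  # heuristic: back off from the minimum-loss LR
print(f"suggested learning rate ~ {suggested_lr:.3g}")
```

In practice you would plot losses against lrs on a logarithmic x-axis and choose a value in the region where the loss is still falling steeply, before it turns upward.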

Q4: What is cross-entropy loss?

Resource:

Understanding binary cross-entropy / log loss: a visual explanation
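Binary cross-entropy can be written directly from its formula, L = -mean(y log p + (1 - y) log(1 - p)), where y is the true label (0 or 1) and p is the predicted probability of class 1. This NumPy sketch (the function name and the clipping epsilon are illustrative assumptions) shows that confident correct predictions give a small loss while confident wrong predictions give a large one.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy (log loss) for labels in {0, 1} and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs when a prediction is exactly 0 or 1
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

confident_right = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
confident_wrong = binary_cross_entropy([1, 0, 1], [0.1, 0.9, 0.2])
print(confident_right, confident_wrong)
```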

Q5: What is bias?

Resource:

Q6: What is Gradient Descent?

Answered by @Clement:

Batch Gradient Descent, also known simply as Gradient Descent, loads the entire training set into the network at once and updates the weights based on all the training examples. Stochastic Gradient Descent loads one training example at a time and updates the weights using only that example. Mini-Batch Gradient Descent is a combination of the two: instead of the entire dataset, it takes in a batch of N training examples, where N is the batch size you choose. These N examples are loaded into the network and used to update the weights once, and each subsequent batch of N examples updates the weights again until the entire dataset has been seen.
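Since the three variants differ only in how many examples feed each weight update, one loop with a batch_size parameter can sketch all of them. This assumes a linear-regression model with mean-squared-error loss on synthetic data; gradient_descent and its parameters are hypothetical names chosen for illustration.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.05, epochs=100, seed=0):
    """Linear regression trained with (mini-)batch gradient descent on MSE.

    batch_size == len(X): Batch Gradient Descent (one update per epoch)
    batch_size == 1:      Stochastic Gradient Descent (one update per example)
    otherwise:            Mini-Batch Gradient Descent with N = batch_size
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)               # shuffle once per epoch
        for start in range(0, n, batch_size):    # continue until every example has been seen
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            err = xb @ w + b - yb                # prediction error on this batch only
            w -= lr * 2 * xb.T @ err / len(idx)  # one weight update per batch
            b -= lr * 2 * err.mean()
    return w, b

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0  # true weights [3, -2], true bias 1

for bs in (len(X), 1, 32):
    w, b = gradient_descent(X, y, batch_size=bs)
    print(bs, np.round(w, 2), round(b, 2))
```

All three settings should recover weights close to [3, -2] and a bias close to 1; they differ in how many updates each epoch makes and how noisy each update is.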

Resource:

Gradient descent, how neural networks learn | Deep learning, chapter 2

Q7: In the softmax function, why do we take the exponential?

Resource:

In softmax classifier, why use exp function to do normalization?
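A small NumPy sketch of what the exponential buys you (the softmax helper is illustrative, not a specific library's function): it maps every score, including negative ones, to a positive number while preserving their order, so dividing by the sum gives valid probabilities. Plain division by the sum of raw scores does not. Subtracting the maximum before exponentiating leaves the result unchanged and avoids overflow, which also shows that softmax depends only on the differences between scores.

```python
import numpy as np

def softmax(z):
    """Softmax over the last axis."""
    z = np.asarray(z, dtype=float)
    # exp(z - max) / sum(exp(z - max)) == exp(z) / sum(exp(z)), but cannot overflow
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, -1.0])  # raw scores (logits); one is negative
p = softmax(scores)
print(p, p.sum())

# Normalizing raw scores directly yields a negative "probability"
print(scores / scores.sum())

# Shifting every score by a constant leaves the probabilities unchanged
print(softmax(scores + 1000.0))
```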

Q8: Why do we need an activation function?

Answered by @Clement:

Hi, the purpose of an activation function is to introduce non-linearity into the neural network. When we first build a neural network, the formula y = w1x1 + w2x2 + b is a linear function, which means it can only separate data points with a straight line. Stacking more of these layers does not help on its own, because a linear function of a linear function is still linear. Adding a non-linearity, i.e. an activation function, allows the model to form other decision boundaries instead of just a straight line.
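Both halves of this argument can be checked numerically. The sketch below uses randomly drawn weights and hand-picked XOR weights that are purely illustrative: it first shows that two linear layers with no activation reduce to a single linear layer, then shows that one ReLU hidden layer is enough to represent XOR, which no single straight line can separate.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Two stacked linear layers with no activation collapse into one linear layer
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)
x = rng.normal(size=2)
two_layers = W2 @ (W1 @ x + b1) + b2
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)  # a single weight matrix and bias
print(np.allclose(two_layers, one_layer))

# 2) With a ReLU in between, hand-picked weights represent XOR
def relu(z):
    return np.maximum(z, 0.0)

def xor_net(x1, x2):
    h1 = relu(x1 + x2)        # number of inputs that are on
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are on
    return h1 - 2.0 * h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))
```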

Resource:

Activation functions and it’s types-Which is better?

Q9: What's the difference between np.dot() and np.matmul() for matrix multiplication, and when should each be used?

Resource:

numpy.dot vs numpy.matmul
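A quick NumPy sketch of where the two functions agree and where they differ, based on their documented behavior: for 2-D arrays both perform ordinary matrix multiplication, for stacks of matrices np.matmul (and the @ operator) broadcasts over the leading dimensions while np.dot sums over different axes, and only np.dot accepts scalars. The array shapes are arbitrary examples.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

# 2-D arrays: both do ordinary matrix multiplication, and so does the @ operator
print(np.array_equal(np.dot(A, B), np.matmul(A, B)), np.array_equal(A @ B, np.matmul(A, B)))

# 1-D arrays: both return the inner product
v = np.array([1, 2, 3])
print(np.dot(v, v), np.matmul(v, v))  # 14 14

# Stacks of matrices (N-D): the results differ
S = np.ones((5, 2, 3))
T = np.ones((5, 3, 4))
print(np.matmul(S, T).shape)  # (5, 2, 4): five independent (2x3) @ (3x4) products
print(np.dot(S, T).shape)     # (5, 2, 5, 4): last axis of S against second-to-last of T

# Scalars: np.dot accepts them, np.matmul does not
print(np.dot(3, A))
try:
    np.matmul(3, A)
except ValueError as err:
    print("matmul rejects scalars:", err)
```

As a rule of thumb, np.matmul or @ is the clearer choice for matrix products, especially batched ones, while np.dot remains handy for 1-D inner products and scalar scaling.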

Q10: Having a problem with gradient descent?

Here is a tutorial on gradient descent with NumPy (using the notebook provided by Udacity), created by @Beata.

Q11: Are there any notes for these lessons?

There are notes created by our fellow scholars. You can refer to these notes through this spreadsheet created and maintained by @DylanGoh.
