Q1: I am having some trouble understanding backpropagation when training the neural net.
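A minimal sketch of backpropagation done by hand. It assumes a tiny network with one sigmoid hidden layer, a sigmoid output and a squared-error loss; the sizes, seed and function names are illustrative choices, not taken from the resources listed below. The backward pass applies the chain rule one layer at a time, from the loss back to each weight matrix. A finite-difference check then confirms that one of the gradients is right. Frameworks such as PyTorch compute these same gradients automatically (automatic differentiation).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, W2, x, y):
    """Forward pass: returns hidden activations, prediction and loss."""
    h = sigmoid(W1 @ x)            # hidden layer activations, shape (4,)
    y_hat = sigmoid(W2 @ h)        # scalar prediction
    loss = 0.5 * (y_hat - y) ** 2  # squared-error loss
    return h, y_hat, loss

def backward(W1, W2, x, y):
    """Backward pass: chain rule from the loss back to each weight matrix."""
    h, y_hat, _ = forward(W1, W2, x, y)
    d_out = (y_hat - y) * y_hat * (1 - y_hat)   # dL/dz2, where z2 = W2 @ h
    grad_W2 = d_out * h                          # dL/dW2 = dL/dz2 * dz2/dW2
    d_hidden = d_out * W2 * h * (1 - h)          # dL/dz1, where z1 = W1 @ x
    grad_W1 = np.outer(d_hidden, x)              # dL/dW1 = outer(dL/dz1, x)
    return grad_W1, grad_W2

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0                   # one training example
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4)
grad_W1, grad_W2 = backward(W1, W2, x, y)

# Gradient check: nudge one weight up and down and measure the change in loss.
eps = 1e-6
W1_up, W1_down = W1.copy(), W1.copy()
W1_up[0, 0] += eps
W1_down[0, 0] -= eps
numeric = (forward(W1_up, W2, x, y)[2] - forward(W1_down, W2, x, y)[2]) / (2 * eps)
print(grad_W1[0, 0], numeric)   # the two numbers should agree closely
```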
Resource:
- Michael Nielsen: Neural Networks and Deep Learning - Chapter 2
- Getting Started with PyTorch Part 1: Understanding how Automatic Differentiation works
Q2: What's the perceptron algorithm?
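A minimal sketch of the classic perceptron learning rule. It assumes 0/1 labels, a step activation, zero-initialized weights and a learning rate of 1. To predict, compute step(w · x + b). Whenever a prediction is wrong, move w and b toward the correct answer by (target - prediction) times the input. The AND-gate data is a toy example, chosen because it is linearly separable, which is the case where the perceptron is guaranteed to converge. The function names are illustrative.

```python
import numpy as np

def perceptron_train(X, y, lr=1.0, epochs=20):
    """Classic perceptron rule with a step activation and 0/1 labels."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0   # step activation
            update = lr * (target - pred)       # zero when the prediction is correct
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:   # every example classified correctly: stop early
            break
    return w, b

def perceptron_predict(X, w, b):
    return (X @ w + b > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])   # AND gate
w, b = perceptron_train(X, y)
print(perceptron_predict(X, w, b))   # [0 0 0 1]
```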
Resource:
Q3: How to find the optimal learning rate?
This paper by Leslie Smith is a great resource for finding the optimal learning rate: A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. You can find an implementation of this approach in this blog post: Estimating an Optimal Learning Rate For a Deep Neural Network
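A rough sketch of the learning-rate range test discussed in Smith's work. Start with a tiny learning rate and increase it exponentially on every mini-batch. Record the loss at each step, and stop once the loss blows up. Several choices here are assumptions for illustration only:

- the toy linear-regression data and model
- the sweep bounds (1e-5 to 10)
- the divergence threshold (4x the best loss seen so far)
- the "divide by 10" rule of thumb

In practice you would run the sweep with your own model and plot loss against learning rate on a log scale.

```python
import numpy as np

def lr_range_test(X, y, lr_min=1e-5, lr_max=10.0, num_steps=100, batch_size=32, seed=0):
    """Sweep the learning rate upward while training a linear model with mini-batch SGD.

    Returns the learning rates tried and the mini-batch loss recorded at each one.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    lrs = np.geomspace(lr_min, lr_max, num_steps)   # exponentially increasing schedule
    losses = []
    for lr in lrs:
        idx = rng.choice(len(X), size=batch_size, replace=False)
        xb, yb = X[idx], y[idx]
        err = xb @ w - yb
        losses.append(np.mean(err ** 2))             # MSE on this mini-batch
        w -= lr * 2 * xb.T @ err / batch_size        # one SGD step at the current lr
        if losses[-1] > 4 * min(losses):             # loss has blown up: stop the sweep
            break
    return lrs[:len(losses)], np.array(losses)

# Toy data (an assumption for illustration): noisy linear targets.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(1000, 5))
y = X @ data_rng.normal(size=5) + 0.1 * data_rng.normal(size=1000)

lrs, losses = lr_range_test(X, y)
best_lr = lrs[np.argmin(losses)]
print(f"lowest loss at lr={best_lr:.2g}; rule-of-thumb pick: {best_lr / 10:.2g}")
```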
Q4: What is cross-entropy loss?
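A minimal sketch of categorical cross-entropy. For each sample, take the negative log of the probability the model assigned to the correct class, then average over samples. With one-hot targets y this is the same as -Σ_k y_k log p_k. The sketch assumes the inputs are already probabilities (for example, softmax outputs). Library versions often expect raw scores instead; PyTorch's nn.CrossEntropyLoss, for instance, takes unnormalized logits. The function name and example numbers are illustrative.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean categorical cross-entropy.

    probs:  (n_samples, n_classes) predicted probabilities, each row summing to 1
    labels: (n_samples,) integer index of the correct class for each sample
    """
    # Probability the model assigned to the correct class of each sample.
    p_correct = probs[np.arange(len(labels)), labels]
    # Clip to avoid log(0) if the model gives the true class zero probability.
    return -np.mean(np.log(np.clip(p_correct, eps, 1.0)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])
print(cross_entropy(probs, labels))   # mean of -log(0.7) and -log(0.6)

# Confident and right -> small loss; confident and wrong -> large loss.
print(cross_entropy(np.array([[0.99, 0.01]]), np.array([0])))  # about 0.01
print(cross_entropy(np.array([[0.01, 0.99]]), np.array([0])))  # about 4.6
```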
Resource:
Q5: What is bias?
Resource:
Q6: What is Gradient Descent?
Answered by @Clement:
Batch Gradient Descent, also known simply as Gradient Descent, feeds the entire training set into the network at once and updates the weights based on all of the training examples. Stochastic Gradient Descent feeds in one training example at a time and updates the weights using only that example. Lastly, Mini-Batch Gradient Descent is a compromise between the two: instead of the entire dataset, it takes in a batch of N training examples, where N (the batch size) is a number you choose. These N examples are passed through the network and used to update the weights once, and each subsequent batch of N updates the weights again until the entire dataset has been seen (one epoch).
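A minimal sketch of the three variants described above. It assumes plain linear regression with mean-squared-error loss on synthetic data. The model, data, learning rate, epoch count and the function name `gradient_descent` are illustrative choices. One function covers all three variants; only `batch_size` changes:

- `batch_size` equal to the dataset size gives batch gradient descent.
- `batch_size=1` gives stochastic gradient descent.
- Anything in between gives mini-batch gradient descent.

```python
import numpy as np

def gradient_descent(X, y, batch_size, lr=0.05, epochs=50, seed=0):
    """Linear regression trained with (mini-)batch gradient descent on MSE.

    batch_size == len(X) -> batch gradient descent (one update per epoch)
    batch_size == 1      -> stochastic gradient descent (one update per example)
    otherwise            -> mini-batch gradient descent (one update per batch of N)
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    updates = 0
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            grad = 2 * xb.T @ (xb @ w - yb) / len(idx)   # MSE gradient on this batch
            w -= lr * grad
            updates += 1
    return w, updates

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5])   # noise-free targets

for name, bs in [("batch", len(X)), ("stochastic", 1), ("mini-batch", 32)]:
    w, updates = gradient_descent(X, y, batch_size=bs)
    print(f"{name:>10}: {updates:5d} weight updates, w = {np.round(w, 3)}")
```

With 200 examples and 50 epochs, the update counts are:

- batch gradient descent: 50 weight updates
- stochastic gradient descent: 10,000
- mini-batches of 32: 350 (seven batches per epoch, the last holding the remaining 8 examples)

Reshuffling every epoch is a common convention, not part of the definitions above.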
Resource:
Q7: In the softmax function, why do we take the exponential?
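A small sketch of what the exponential buys you:

- exp maps every real score, including negative ones, to a positive number, so dividing by the sum gives valid probabilities.
- exp is strictly increasing, so the ranking of the scores is preserved.
- exp exaggerates gaps between scores, pushing more probability toward the largest one.

Dividing the raw scores by their sum can produce negative "probabilities" instead. When softmax is paired with cross-entropy loss, the gradient with respect to the scores is simply the predicted probabilities minus the one-hot target. The max-subtraction line is a standard numerical-stability trick, not part of the definition, and the example scores are arbitrary.

```python
import numpy as np

def softmax(z):
    # Subtracting the max leaves the result unchanged (it cancels in the ratio)
    # but keeps np.exp from overflowing for large scores.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])
print(softmax(scores))                      # all positive, sums to 1, same ranking as scores
print(scores / scores.sum())                # naive normalization gives a negative "probability"
print(softmax(np.array([1000.0, 999.0])))   # no overflow thanks to the max shift
```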
Resource:
Q8: Why do we need activation functions?
Answered by @Clement:
Hi, the purpose of an activation function is to introduce non-linearity into the neural network. Without one, a neuron computes y = w1x1 + w2x2 + b, which is a linear function, so it can only separate data points with a straight line. Stacking more such layers does not help, because a composition of linear functions is still linear. Adding a non-linearity, i.e. an activation function, allows the model to form more complex decision boundaries instead of just a line.
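A quick numerical check of the argument above, using NumPy:

- Two linear layers without an activation collapse into a single linear layer.
- Putting a ReLU between them lets a tiny network compute XOR, a classic example of data that no single straight line can separate.

ReLU and the hand-picked XOR weights are illustrative choices, not something the answer prescribes.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# 1) Two linear layers with no activation collapse into one linear layer.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)
two_layers = (X @ W1 + b1) @ W2 + b2
one_layer = X @ (W1 @ W2) + (b1 @ W2 + b2)   # a single weight matrix and bias
print(np.allclose(two_layers, one_layer))    # True

# 2) With a ReLU in between, hand-picked weights compute XOR.
V1 = np.array([[1.0, 1.0], [1.0, 1.0]])
c1 = np.array([0.0, -1.0])
V2 = np.array([1.0, -2.0])
hidden = np.maximum(X @ V1 + c1, 0.0)        # ReLU activation
print(hidden @ V2)                           # [0. 1. 1. 0.]
```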
Resource:
Q9: What's the difference between np.dot() and np.matmul() for matrix multiplication, and when should each be used?
Resource:
- numpy.dot vs numpy.matmul
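A short comparison based on how the NumPy documentation describes the two functions. For 1-D and 2-D arrays they agree. They differ in two ways:

- Higher-dimensional arrays: np.matmul (the @ operator) treats them as stacks of matrices. np.dot instead sums over the last axis of the first argument and the second-to-last axis of the second.
- Scalars: np.matmul refuses them, while np.dot accepts them.

The shapes below are arbitrary examples. A common rule of thumb is to use np.matmul or @ for matrix products, and * for scalar or element-wise multiplication.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
# 2-D inputs: both perform ordinary matrix multiplication (same as A @ B).
assert np.array_equal(np.dot(A, B), np.matmul(A, B))

# Stacks of matrices: shapes (batch, n, m) and (batch, m, p).
a = np.ones((5, 2, 3))
b = np.ones((5, 3, 4))
print(np.matmul(a, b).shape)  # (5, 2, 4): one product per item in the stack
print(np.dot(a, b).shape)     # (5, 2, 5, 4): every item of a paired with every item of b

# Scalars: np.dot accepts them, np.matmul does not.
print(np.dot(3, A))           # same as 3 * A
try:
    np.matmul(3, A)
except ValueError as err:
    print("matmul rejects scalars:", err)
```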
Q10: Having a problem with gradient descent?
- Here is a tutorial on Gradient Descent with numpy (using the notebook provided by Udacity). Created by @Beata.
Q11: Are there any notes for these lessons?
There are notes created by our fellow scholars. You can refer to these notes through this spreadsheet created and maintained by @DylanGoh.