Recurrent Neural Networks

Q: I found this lesson difficult to understand. What additional resources are recommended?

  • At the beginning of the lesson Luis Serrano suggests these resources:

    Understanding LSTM Networks blogpost from Chris Olah

    Exploring LSTMs blogpost from Edwin Chen

    The Unreasonable Effectiveness of Recurrent Neural Networks blogpost from Andrej Karpathy

    Lecture on RNNs and LSTMs from Stanford University’s CS231n by Andrej Karpathy

  • From @Vlad:

LSTMs from Richard Socher and Stanford NLP for mathematical but clean explanations.

Q: Please explain the significance of n_hidden in nn.LSTM(input_size, n_hidden, n_layers, dropout=drop_prob, batch_first=True)

  • Answered by @José Fernández Portal:

n_hidden defines the size of the hidden state. The hidden state is a tensor that the RNN outputs at every sequence step (t) and that serves as an input to the next sequence step (t+1). In your diagram, it is represented by the right-pointing arrows. In short, the hidden state carries information along the sequence. Regarding its size (n_hidden), I think a larger hidden state can carry more information along the sequence, but the network becomes harder to train.
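To make the role of n_hidden concrete, here is a minimal sketch that builds the layer from the question and prints the shapes it returns. The sizes (input_size=10, n_hidden=32, and so on) are made up for illustration and do not come from the lesson notebook.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
input_size, n_hidden, n_layers, drop_prob = 10, 32, 2, 0.5
batch_size, seq_length = 4, 7

lstm = nn.LSTM(input_size, n_hidden, n_layers,
               dropout=drop_prob, batch_first=True)

# With batch_first=True the input is (batch, sequence, features).
x = torch.randn(batch_size, seq_length, input_size)
output, (h_n, c_n) = lstm(x)

# One hidden vector of size n_hidden per sequence step (top layer only).
print(output.shape)  # torch.Size([4, 7, 32])
# Final hidden and cell state for every layer; the last dimension is n_hidden.
print(h_n.shape)     # torch.Size([2, 4, 32])
print(c_n.shape)     # torch.Size([2, 4, 32])
```

Changing n_hidden only changes the last dimension of these tensors, which is the amount of information that can be passed from one step to the next.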

Q: Are there any resources to help in the understanding of LSTM batches and sequences?

  • Answered by @sundeep:

A helpful video from course instructor Mat.

A step-through of the sizes used in the Anna Karenina text character example may help in understanding how batches work here.
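As a rough companion to that step-through, the sketch below walks through the sizes on a toy array. It assumes the batching scheme implied by the get_batches snippet further down (the encoded text is trimmed to full batches, reshaped to batch_size rows, and then sliced into windows of seq_length columns). The data and sizes are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 50 integer-encoded characters.
encoded = np.arange(50)
batch_size, seq_length = 2, 5

chars_per_batch = batch_size * seq_length      # 10 characters per batch
n_batches = len(encoded) // chars_per_batch    # 5 full batches

# Keep only enough characters to make full batches, then use one row per sequence.
arr = encoded[:n_batches * chars_per_batch].reshape((batch_size, -1))
print(arr.shape)     # (2, 25)

# Each batch is a window of seq_length columns across all rows.
windows = [arr[:, n:n + seq_length] for n in range(0, arr.shape[1], seq_length)]
print(len(windows))  # 5
print(windows[0])    # [[ 0  1  2  3  4]
                     #  [25 26 27 28 29]]
```

Each row of arr is one long, contiguous stretch of the text, so consecutive batches continue the same sequences row by row.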

Q: In get_batches (Character_Level_RNN_Solution.ipynb), is there a more elegant way of creating y?

    # The targets, shifted by one
    y = np.zeros_like(x)
    try:
        y[:, :-1], y[:, -1] = x[:, 1:], arr[:, n+seq_length]
    except IndexError:
        y[:, :-1], y[:, -1] = x[:, 1:], arr[:, 0]

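  • Try numpy.roll(a, shift, axis=None), which rolls array elements along an axis. Pass axis=1 so that each row is shifted independently; without an axis, the array is flattened before rolling. Note that the last column then wraps around to the first character of the same row instead of taking the next character from arr, so the result differs from the original code in that one column.

    # The targets, shifted by one
    y = np.roll(x, -1, axis=1)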
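To see how the np.roll variants behave on the last column, here is a small comparison on a toy array. The names y_next, y_rows, and y_flat and the toy data are hypothetical and used only for this sketch.

```python
import numpy as np

# Hypothetical toy batch array: 2 rows of 6 encoded characters.
arr = np.arange(12).reshape(2, 6)
seq_length, n = 3, 0
x = arr[:, n:n + seq_length]                # [[0 1 2], [6 7 8]]

# Targets taken from arr: the true next character for every position.
y_next = arr[:, n + 1:n + seq_length + 1]   # [[1 2 3], [7 8 9]]

# Rolling along axis=1 wraps each row onto itself.
y_rows = np.roll(x, -1, axis=1)             # [[1 2 0], [7 8 6]]

# Rolling without an axis flattens first, so rows leak into each other.
y_flat = np.roll(x, -1)                     # [[1 2 6], [7 8 0]]

print(y_next)
print(y_rows)
print(y_flat)
```

All three agree except in the last column, so np.roll is a compact shortcut as long as one slightly inaccurate target per row is acceptable.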