The paper says the method needs to sample from previous and future frames, but during inference only the first frame is available. Or is the first frame sampled from the processed dataset, which already includes phase features, and the network then predicts the next phase? Also, does the network need to be trained on the whole dataset to successfully generate the next few frames?
Thank you in advance for the clarification, @paulstarke.