thiennhanng191/essay_title_generator

Generate Essay Titles with Context

Nhan Nguyen - CMPU 366 Spring 2021

The following files are included in this submission folder:

  • user_interact.py:
    • Python program that lets the user try out the models through a command-line interface
    • Run with "python3 user_interact.py"
    • May take a while to load (~1 min on my machine)
  • NOTE: I am using PyInquirer version 1.0.3, which requires prompt-toolkit==1.0.14. However, running the .ipynb files needs (and will automatically install) a newer version of prompt-toolkit (3.0.18 on my machine), as described in this issue on the library's GitHub: CITGuru/PyInquirer#1 (comment). So if you run the .ipynb files, or if the error "cannot import name 'Token' from 'prompt_toolkit.token'" ever comes up, run "pip3 install prompt_toolkit==1.0.14" before running user_interact.py.
  • eval_pred.ipynb: code to compute the ROUGE-1 and ROUGE-L F-similarity scores for the models' predictions on 1,000 samples from the validation set

  • ./sample_essays:

    • Folder of .txt files containing example essays that can be used to try out the models (e.g. through user_interact.py)
  • ./inference:

    • Folder containing code for inference models
  • ./build_model:

    • .ipynb files: Jupyter Notebook containing code to build and train models
      • These notebooks were run on Google Colaboratory as well as Vassar's lambda-quad machine, so the model names and file paths they reference differ between environments; they may not run successfully as-is on a local machine
    • articles_2.json: JSON file containing 100,000 articles used to train the models
    • glove.6B.200d.txt: GloVe embeddings
    • attention.py: custom Bahdanau Attention layer in Keras
  • ./models:

    • Folder containing the saved models
    • Not included here due to large size
  • ./training_val_nd_array:

    • Tokenized training and validation sets of word sequences for each model
    • They should be identical across models; they are separated only for consistency in file-path naming
  • ./word_idx_dict:

    • Folder containing JSON dictionaries that map each word in a model's corpus to its unique token id
  • ./training_history:

    • Folder containing the models' training results as CSV files
  • ./util:

    • preprocess_text.py: Python program that preprocesses the input essay/article
  • requirements.txt:

    • List of dependencies to install before running user_interact.py
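The preprocessing and word-to-id lookup described under ./util and ./word_idx_dict can be sketched roughly as follows. This is an illustrative assumption, not the repository's actual scheme: the regex, the "<unk>" token, and the toy dictionary are made up for the example.

```python
import json
import re

def preprocess(text):
    """Hypothetical preprocessing: lowercase, strip punctuation, split on whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return text.split()

# A toy stand-in for one of the ./word_idx_dict JSON files (the real files
# map every word in a model's training corpus to a unique token id).
word_idx = json.loads('{"<unk>": 1, "climate": 2, "change": 3, "essay": 4}')

def encode(text, word_idx):
    """Map each token to its id, falling back to the <unk> id for unseen words."""
    return [word_idx.get(tok, word_idx["<unk>"]) for tok in preprocess(text)]

print(encode("Climate change essay!", word_idx))  # → [2, 3, 4]
```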
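The ROUGE-1 and ROUGE-L F-scores that eval_pred.ipynb reports can be computed as below. This is a minimal pure-Python sketch of the standard definitions (unigram overlap and longest common subsequence), not the exact implementation used in the notebook:

```python
from collections import Counter

def rouge_1_f(reference, prediction):
    """ROUGE-1 F-score: F1 over unigram overlap between reference and prediction."""
    ref, pred = reference.split(), prediction.split()
    overlap = sum((Counter(ref) & Counter(pred)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def rouge_l_f(reference, prediction):
    """ROUGE-L F-score, based on the longest common subsequence (LCS) of tokens."""
    ref, pred = reference.split(), prediction.split()
    # Dynamic-programming table for LCS length
    dp = [[0] * (len(pred) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, p in enumerate(pred, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == p else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_1_f("the effects of climate change", "effects of climate change"))
```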

Code reference:

About

RNN seq-2-seq models with Bi-LSTM / LSTM encoder and LSTM decoder using attention to generate titles from essays with context
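The additive (Bahdanau) attention used by the decoder can be illustrated with a small NumPy sketch. The dimensions and parameter names here are assumptions for the example; the real layer lives in ./build_model/attention.py as a custom Keras layer:

```python
import numpy as np

rng = np.random.default_rng(0)
enc_len, enc_dim, dec_dim, attn_dim = 6, 8, 8, 4  # toy sizes (assumptions)

# Stand-ins for the learned parameters of the additive attention layer
W1 = rng.normal(size=(enc_dim, attn_dim))  # projects each encoder state
W2 = rng.normal(size=(dec_dim, attn_dim))  # projects the current decoder state
v = rng.normal(size=(attn_dim,))           # scoring vector

def bahdanau_attention(enc_states, dec_state):
    """score_i = v^T tanh(W1 h_i + W2 s); weights = softmax(scores)."""
    scores = np.tanh(enc_states @ W1 + dec_state @ W2) @ v
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ enc_states  # weighted sum of encoder states
    return context, weights

enc_states = rng.normal(size=(enc_len, enc_dim))  # one state per input token
dec_state = rng.normal(size=(dec_dim,))
context, weights = bahdanau_attention(enc_states, dec_state)
print(weights.sum())  # attention weights sum to 1
```

At each decoding step, the context vector is concatenated with the decoder state to predict the next title word, which is what lets the model focus on different parts of the essay per output word.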
