We need to extract every sentence that contains the word Galaxy from an input document. This requires splitting the input document on sentence boundaries.
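A minimal sketch of the extraction step, assuming "use the word Galaxy" means a whole-word match (case-sensitive by default). The regex splitter here is only a hypothetical stand-in for a real sentence splitter such as NLTK's `sent_tokenize`. It breaks after `.`, `!` or `?` when whitespace and an uppercase letter or digit follow, so it will mis-handle abbreviations like "e.g." or "Dr.". The names `split_sentences` and `sentences_with_word` are illustrative only.

```python
import re
from typing import List

# Naive boundary rule (stand-in for a real splitter such as nltk.sent_tokenize):
# split on whitespace that follows ., ! or ? and precedes an uppercase letter or digit.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(text: str) -> List[str]:
    """Split text on approximate sentence boundaries, dropping empty pieces."""
    pieces = _SENTENCE_BOUNDARY.split(text.strip())
    return [p.strip() for p in pieces if p.strip()]


def sentences_with_word(text: str, word: str = "Galaxy", ignore_case: bool = False) -> List[str]:
    """Return the sentences that contain `word` as a whole word."""
    flags = re.IGNORECASE if ignore_case else 0
    # \b keeps "Galaxy" from matching inside longer tokens such as "Galaxyfoo".
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags)
    return [s for s in split_sentences(text) if pattern.search(s)]


if __name__ == "__main__":
    doc = ("Galaxy is a workflow platform. We ran the pipeline locally! "
           "Did the Galaxy server respond? Yes.")
    for sentence in sentences_with_word(doc):
        print(sentence)
```

Because the splitter is isolated in one function, swapping in a more robust implementation later does not change the filtering logic.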
NLTK will be sufficient for testing and development, but it may not scale (in running time or memory) to large-scale production. For common tasks such as tokenization and sentence splitting, consider running Stanford CoreNLP, Apache OpenNLP, or a similar tool as a standalone service. The Lappsgrid can provide such standalone, Dockerized services that communicate via REST or AMQP.
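One way to keep the NLTK-to-service switch cheap is to hide the splitter behind a small interface, sketched below. Everything about the remote service here is a hypothetical assumption: the URL, and the JSON contract (POST `{"text": ...}`, reply `{"sentences": [...]}`). This is not the Lappsgrid/LIF API or the CoreNLP server API. The NLTK-backed class uses `nltk.sent_tokenize`, which needs the Punkt model data to be downloaded first. The class and function names are illustrative only.

```python
import json
import re
import urllib.request
from typing import Callable, List, Optional, Protocol


class SentenceSplitter(Protocol):
    """Anything with a split(text) -> list of sentences method."""
    def split(self, text: str) -> List[str]: ...


class NltkSentenceSplitter:
    """Development-time splitter; requires nltk plus its Punkt model data."""
    def split(self, text: str) -> List[str]:
        import nltk  # imported lazily so this module loads without nltk installed
        return nltk.sent_tokenize(text)


class RestSentenceSplitter:
    """Client for a HYPOTHETICAL REST splitting service.

    Assumed contract (not a real Lappsgrid/CoreNLP API): POST JSON
    {"text": ...} to `url`, receive JSON {"sentences": [...]}.
    """
    def __init__(self, url: str,
                 post: Optional[Callable[[str, bytes], bytes]] = None,
                 timeout: float = 10.0) -> None:
        self.url = url
        self.timeout = timeout
        # `post` is injectable so the client can be tested without a network.
        self._post = post or self._urllib_post

    def _urllib_post(self, url: str, body: bytes) -> bytes:
        req = urllib.request.Request(
            url, data=body, method="POST",
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return resp.read()

    def split(self, text: str) -> List[str]:
        body = json.dumps({"text": text}).encode("utf-8")
        reply = json.loads(self._post(self.url, body).decode("utf-8"))
        return list(reply["sentences"])


def find_sentences(splitter: SentenceSplitter, text: str, word: str = "Galaxy") -> List[str]:
    """Whole-word filter that works with any splitter implementation."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return [s for s in splitter.split(text) if pattern.search(s)]
```

During development, `find_sentences(NltkSentenceSplitter(), doc)` runs in-process; in production the same call can take a `RestSentenceSplitter` pointed at whichever service is deployed, once its real request and response format is substituted for the assumed one.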