Sentence Segmentation #4

@ksuderman

We need to extract every sentence that uses the word "Galaxy" from an input document, which means we must first be able to split the document on sentence boundaries.
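A minimal sketch of that extraction step, to make the requirement concrete. The names `naive_split` and `sentences_mentioning` are hypothetical, and the regex splitter is only a standard-library stand-in so the example runs without extra dependencies. It will mis-split on abbreviations such as "e.g."; a real tokenizer (for example NLTK's `sent_tokenize`) could be passed in through the `split` parameter instead. Matching is assumed to be whole-word and case-sensitive.

```python
import re
from typing import Callable, List

# Naive stand-in splitter (an assumption, not the project's tokenizer):
# break after '.', '!' or '?' when followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_split(text: str) -> List[str]:
    """Split text into sentences using a simple punctuation rule."""
    return [s.strip() for s in _BOUNDARY.split(text) if s.strip()]


def sentences_mentioning(
    text: str,
    word: str = "Galaxy",
    split: Callable[[str], List[str]] = naive_split,
) -> List[str]:
    """Return the sentences of `text` that contain `word` as a whole word."""
    # Case-sensitive, whole-word match: "Galaxy" matches, "galaxy" and
    # "Galaxies" do not.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [s for s in split(text) if pattern.search(s)]
```

Keeping the splitter as a parameter means the filtering logic does not change if the segmentation backend is swapped later.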

NLTK will be sufficient for testing and development, but may be too slow or too memory-intensive for large-scale production. Consider running Stanford CoreNLP, Apache OpenNLP, or a similar tool as a standalone service for common tasks like tokenization and sentence splitting. The Lappsgrid can provide standalone Dockerized services for this that communicate via REST or AMQP.
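One way to keep that option open is to hide sentence splitting behind a small interface, so a local tokenizer and a remote service are interchangeable. The sketch below is only an illustration: `SentenceSplitter` and `RemoteSplitter` are hypothetical names, and the JSON request and response shapes (`{"text": ...}` in, `{"sentences": [...]}` out) are assumptions, not the actual API of the Lappsgrid, CoreNLP, or OpenNLP services.

```python
import json
import urllib.request
from typing import List, Protocol


class SentenceSplitter(Protocol):
    """Anything that can turn a document into a list of sentences."""

    def split(self, text: str) -> List[str]: ...


class RemoteSplitter:
    """Client for a hypothetical standalone REST splitting service."""

    def __init__(self, url: str, timeout: float = 10.0):
        self.url = url          # assumed endpoint; depends on the deployment
        self.timeout = timeout  # seconds

    def build_request(self, text: str) -> urllib.request.Request:
        # Assumed payload shape: a JSON object with a single "text" field.
        body = json.dumps({"text": text}).encode("utf-8")
        return urllib.request.Request(
            self.url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )

    @staticmethod
    def parse_response(raw: bytes) -> List[str]:
        # Assumed response shape: {"sentences": ["...", "..."]}.
        return list(json.loads(raw.decode("utf-8"))["sentences"])

    def split(self, text: str) -> List[str]:
        request = self.build_request(text)
        with urllib.request.urlopen(request, timeout=self.timeout) as resp:
            return self.parse_response(resp.read())
```

Code that depends only on `SentenceSplitter` can use an NLTK-backed implementation during development and switch to a service-backed one in production without other changes.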

Labels: enhancement
