Commit 90d72aa

attempting to formulate a clearer scope #2
1 parent e2338c2 commit 90d72aa

1 file changed

Lines changed: 28 additions & 15 deletions

README.md

@@ -3,6 +3,8 @@
 This repository is intended to organize the work, output and documentation of the CLARIAH Interest Group (IG) on Text
 Processing.

+*(note: in the current stage, all of this should be interpreted as a proposal and open for discussion)*
+
 ## Introduction

 There is a CLARIAH-wide need for robust text processing technologies that can handle historical as well as contemporary
@@ -12,40 +14,51 @@ Dutch texts. Partners like VU, INT and RU have contributed different components

 The aims of the IG on Text are:

-- foster discussion and knowledge sharing regarding text processing
+- foster discussion and knowledge sharing regarding automatic text processing
 - enhance interoperability between various text processing solutions
 - develop and share best practices
 - inform development of CLARIAH text processing tools and services

 ## Scope of the Interest Group

-- support of automatic text processing, including but not limited to linguistic enrichment and NLP
+Our scope is automatic text processing, including but not limited to linguistic enrichment and NLP:
+
+- automatic linguistic enrichment for multiple languages and multiple time periods
+- named entity extraction & linking
+- dependency parsing, syntactic parsing, morphological analysis
+- part-of-speech tagging
+- lemmatisation
+- sentiment analysis
+- tokenisation and sentence segmentation
+- text normalisation (including post-OCR/HTR correction)
+- optical character recognition & handwriting recognition
+- machine translation
+- language modelling
+- text analysis
+- text retrieval, indexing, and querying (raw text, querying of annotations is covered by the [annotation group](https://github.com/CLARIAH/IG-Annotation))
+- (list is not exhaustive)
+
+Though our scope is not limited to Dutch, it is probably fair to say that Dutch, Flemish and Frisian, merit most
+attention, as we are a project in the Netherlands.

 Aspects that are outside the scope of this Interest Group (because they are covered by other IGs):

 - manual text annotation (covered by the [annotation group](https://github.com/CLARIAH/IG-Annotation))
 - annotation models and formats (covered by the [annotation group](https://github.com/CLARIAH/IG-Annotation))
+- speech recognition (covered by the AV group)

 ## Communication

 We use the following communication channel:

-- slack (to be announced)
+- [slack](clariah-workspace.slack.com) (if you don't have access yet, please contact one of the coordinators)

 ## Tasks

-Text processing problems that we try to tackle are:
-
-- automatic linguistic enrichment for multiple languages and multiple time periods
-- named entity extraction
-- dependency relations
-- part-of-speech tagging
-- lemmatisation
-- sentiment analysis
-- text normalisation (including post-OCR/HTR correction)
-- interoperability
-- choosing standards
-- ... (todo)
+1. Provide [an inventory](docs/inventory.md) of current text processing tools, services and models in CLARIAH,
+either developed in CLARIAH (WP3 or WP6), or third party projects that are adopted as solutions.
+2. Specify what requirements we want text processing solutions to adhere to for CLARIAH, to facilitate interoperability
+between tools/services. Indicate to what extent the existing solutions adhere to these requirements.

 ## Group Members
