consider adding as_conll2002 #15

@jwijffels

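as_conll2002 <- function(x, sep = "\t"){
  ## x: data.frame with token and chunk_entity columns plus sentence identifiers
  ## CoNLL-2002 layout: one token per line, with an empty line after each sentence.
  id <- udpipe::unique_identifier(x, fields = intersect(c("doc_id", "paragraph_id", "sentence_id"), colnames(x)))
  ## Prefix per token: blank line before the first token of a sentence, plain newline otherwise
  add <- ifelse(!duplicated(id), "\n\n", "\n")
  add[1] <- ""  ## nothing before the very first token
  word   <- x$token
  entity <- x$chunk_entity
  sprintf("%s%s%s%s", add, word, sep, entity)
}
## sep = "" keeps cat() from inserting a space between the elements
cat(as_conll2002(head(x, 100)), sep = "")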

To be used when training models based on BERTje.
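
To make the intended output format concrete, here is a minimal sketch on a made-up toy data frame. Everything in it is an assumption for illustration only: the tokens and entity labels are invented, and `as_conll2002_sketch` is a hypothetical base-R stand-in that builds the sentence key with `paste()` instead of `udpipe::unique_identifier()`, so that the example runs without udpipe installed.

```r
# Hypothetical toy input: two sentences in one document (invented data).
toy <- data.frame(
  doc_id       = rep("d1", 8),
  sentence_id  = c(1, 1, 1, 1, 1, 2, 2, 2),
  token        = c("Jan", "woont", "in", "Gent", ".", "Hij", "werkt", "."),
  chunk_entity = c("B-PER", "O", "O", "B-LOC", "O", "O", "O", "O"),
  stringsAsFactors = FALSE
)

# Base-R stand-in for the proposed function (assumption: not the udpipe API).
as_conll2002_sketch <- function(x, sep = "\t") {
  # Use whichever identifier columns are present in x
  keys <- intersect(c("doc_id", "paragraph_id", "sentence_id"), colnames(x))
  # One key per row; rows of the same sentence share the same key
  id <- do.call(paste, c(x[keys], sep = "\r"))
  # Blank line before the first token of a sentence, plain newline otherwise
  add <- ifelse(!duplicated(id), "\n\n", "\n")
  add[1] <- ""  # nothing before the very first token
  paste0(add, x$token, sep, x$chunk_entity)
}

out <- as_conll2002_sketch(toy)
cat(out, sep = "")  # sep = "" avoids stray spaces between elements
```

Printed this way, each token sits on its own line with a tab between token and label, and a single empty line separates the two sentences.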
