Anne Ferger & Daniel Jettka | Paderborn University, Germany
Slides at digitalhumanists.github.io/CLARIN2021
After introducing continuous quality control mechanisms in the INEL project, we noticed a big difference in both the time needed to fix inconsistencies manually and the time needed to prepare the data for publication.
Combining a simplified git solution (the tool Lama) with automatic quality control checks and fixes (corpus services, run through automatic git scripts) leaves the existing data creation workflows largely unchanged.
Simplified git solution: Lama
Quality checks and automatic fixes for linguistic corpora: corpus services
Automatic git solution for corpus services: cubo
https://github.com/digitalhumanists
Anne Ferger
anne.ferger@uni-paderborn.de
https://twitter.com/anneferger1
Daniel Jettka
daniel.jettka@uni-paderborn.de
https://twitter.com/DJettka