These are some projects I have been working on.
Sustainable Multilingualism in the South African Context (2014-present)
I currently work at the University of South Africa (UNISA) in Pretoria, South Africa, as a member of the core research group of the Academy for African Languages and Science (AALS), under Prof Laurette Pretorius. AALS is a strategic project of the School of Interdisciplinary Research and Graduate Studies (SIRGS) in the College of Graduate Studies (CGS), based on UNISA's Muckleneuk Campus.
Broadly speaking, my aim is to contribute to the development of language technology for lesser-resourced languages in South Africa. I want to apply my knowledge of corpora, alignment and machine translation in this context, and have already done so to some extent (see publications). Beyond these areas, I am interested in bootstrapping techniques for resource development, terminology extraction and processing, and computational morphology for agglutinative languages such as Zulu. I am currently also involved in broader corpus development, going beyond my earlier work on bitexts and treebanks (see AALS website).
Ph.D. thesis (2008-2013)
In the context of the PaCo-MT project (see below), I investigated various approaches to the problem of tree alignment, focusing on the alignment of constituents (non-terminal nodes) and their interplay with existing word alignments and other relevant features. I approached the problem from both a statistical and a rule-based perspective. Most notably, I implemented Eric Brill's transformation-based learning algorithm to build a classifier that can align trees either from scratch or by correcting errors in the output of a statistical aligner.
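Transformation-based learning (TBL) is a greedy loop: start from an initial labelling, generate candidate rewrite rules, keep the rule with the highest net gain (errors fixed minus correct labels broken) on the training data, apply it, and repeat. The following is a minimal, simplified sketch of that loop for node alignment, not the implementation used in the thesis. It assumes that each candidate (source node, target node) pair is described by a few string features (the names `word_links` and `cat_match` are hypothetical), that labels are binary (aligned or not), and that there is a single rule template: "if feature f has value v and the current label is L, change it". An all-unaligned initial labelling corresponds to aligning from scratch. Passing a statistical aligner's output as the initial labelling corresponds to error correction.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Each candidate (source node, target node) pair is described by string features.
# Feature names are hypothetical, chosen only for illustration.
Features = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """If `feature` == `value` and the current label is `from_label`, relabel to `to_label`."""
    feature: str
    value: str
    from_label: bool
    to_label: bool

    def applies(self, feats: Features, label: bool) -> bool:
        return label == self.from_label and feats.get(self.feature) == self.value


def apply_rule(rule: Rule, data: List[Features], labels: List[bool]) -> List[bool]:
    return [rule.to_label if rule.applies(f, l) else l for f, l in zip(data, labels)]


def candidate_rules(data, labels, gold) -> Set[Rule]:
    # Only propose rules that would fix at least one current error.
    return {
        Rule(feat, val, cur, g)
        for feats, cur, g in zip(data, labels, gold) if cur != g
        for feat, val in feats.items()
    }


def net_gain(rule, data, labels, gold) -> int:
    # Errors fixed minus correct labels broken. Labels are binary and from_label != to_label,
    # so a relabelling that does not produce the gold label always breaks a correct one.
    return sum(
        1 if rule.to_label == g else -1
        for feats, cur, g in zip(data, labels, gold)
        if rule.applies(feats, cur)
    )


def learn_tbl(data, initial, gold, min_gain=1, max_rules=50):
    """Greedy TBL: repeatedly select and apply the rule with the highest net gain."""
    labels, learned = list(initial), []
    for _ in range(max_rules):
        # Sort so that ties are broken deterministically (max keeps the first maximum).
        rules = sorted(candidate_rules(data, labels, gold),
                       key=lambda r: (r.feature, r.value, r.from_label, r.to_label))
        if not rules:
            break
        gain, best = max(((net_gain(r, data, labels, gold), r) for r in rules),
                         key=lambda t: t[0])
        if gain < min_gain:
            break
        labels = apply_rule(best, data, labels)
        learned.append(best)
    return learned, labels


def apply_rules(rules, data, initial):
    """Replay learned rules, in order, on new data."""
    labels = list(initial)
    for rule in rules:
        labels = apply_rule(rule, data, labels)
    return labels


# Toy training data (hypothetical): how many word alignments support the node pair,
# and whether the syntactic categories of the two nodes agree.
data = [
    {"word_links": "all", "cat_match": "yes"},
    {"word_links": "all", "cat_match": "no"},
    {"word_links": "some", "cat_match": "yes"},
    {"word_links": "some", "cat_match": "yes"},
    {"word_links": "some", "cat_match": "no"},
    {"word_links": "none", "cat_match": "yes"},
    {"word_links": "none", "cat_match": "no"},
]
gold = [True, True, True, True, False, False, False]

if __name__ == "__main__":
    rules, labels = learn_tbl(data, [False] * len(data), gold)  # "from scratch"
    for rule in rules:
        print(rule)
```

In a realistic setting the features would be richer and there would be several rule templates. The key design choice of TBL survives this simplification: the learned rules form an ordered, human-readable list that can be inspected and replayed on new data with `apply_rules`.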
Parse and Corpus-Based Machine Translation (PaCo-MT) (2008-2011)
PaCo-MT was a syntax-based machine translation project on which I worked as a Ph.D. candidate at the University of Groningen from 2008 to 2011, in collaboration with the University of Leuven and OneLiner bvba. It was funded by the STEVIN programme (STE07007) of the Dutch Language Union (Nederlandse Taalunie).
I worked on creating large-scale, richly annotated parallel corpora for the language pairs Dutch/English and Dutch/French. The main processing steps were sentence alignment, word alignment, parsing and constituent alignment. The project proved to be a valuable testing ground for my Ph.D. research.
ALEXANDER and Afrikaans WordNet (2006-2007)
While appointed at the CTexT research centre on the Potchefstroom campus of North-West University, South Africa, I was project leader for the initial construction stages of a wordnet for Afrikaans, my mother tongue. This was also the main topic of my Master's thesis.
Soon afterwards, work began on the Afrikaans lexical database ALEXANDER, and I was also appointed its project leader. The wordnet was later integrated into the database. For more information and to inquire about availability, click here.