A list of downloads.

My PhD thesis can be downloaded here. The title is "Complementary approaches to tree alignment: Combining statistical and rule-based methods". [Short English abstract] [Short Dutch abstract] [Long Dutch abstract]


You may download my Curriculum Vitae, my Master's Thesis and a list of all my publications in BibTeX format.


Data


Automatic English-Zulu sentence and word aligned parallel corpus (2321 sentence pairs)

The download links are currently broken and will be updated soon. For reference, see the following paper:

Kotzé, G and Wolff, F. 2014. Experiments with syllable-based English-Zulu alignment. Proceedings of the SaLTMiL Workshop on free/open-source language resources for the machine translation of less-resourced languages, at LREC 2014, May 2014, Reykjavík, Iceland. [BibTeX]


Dutch/English and Dutch/French phrase-structure parse trees (448 sentence pairs) from the PaCo-MT project

The tree alignment data sets from the PaCo-MT project (2008-2011), which were used to train the statistical tree aligner Lingua::Align, are available for download. The languages involved are Dutch, English and French. I created the constituent alignments manually. For the Dutch-to-English and English-to-Dutch sets, the word alignments were also corrected. Please refer to the included README files for further information.


Here are the alignment sets:


  • Dutch to English, with corrected word alignments .zip .tgz (140 sentence pairs)
  • English to Dutch, with corrected word alignments .zip .tgz (150 sentence pairs)
  • Dutch to French, with uncorrected word alignments .zip .tgz (158 sentence pairs)

Software


TBLign, the transformation-based tree alignment system that I worked on for my doctoral thesis, is available for download. The download also includes all alignment data sets used in the experiments (Dutch-to-English only). I will fix bugs and update the documentation from time to time, but I intend to discontinue Perl development in favour of a Python 3 reimplementation at some point. Download here: .zip, tarball, at GitHub or Bitbucket.


I expect to have an update in the coming months on the GitHub repository hosting the code for the adaptation of the Terminator software that I am using for the terminology web application at Unisa. Watch this space.