A list of downloads.

You may download my Curriculum Vitae (in English and in Afrikaans), my Master's thesis, and a list of all my publications in BibTeX format.


Data


Automatically sentence- and word-aligned English-Zulu parallel corpus (2,321 sentence pairs)


Please cite the following paper if you use this data in your research:


Kotzé, G. and Wolff, F. 2014. Experiments with syllable-based English-Zulu alignment. Proceedings of the SaLTMiL Workshop on free/open-source language resources for the machine translation of less-resourced languages, at LREC 2014, May 2014, Reykjavík, Iceland. [BibTeX]


Dutch/English and Dutch/French phrase-structure parse trees (448 sentence pairs) from the PaCo-MT project

The tree alignment data sets from the PaCo-MT project (2008-2011), which were used to train the statistical tree aligner Lingua-Align, are available for download. The languages involved are Dutch, English and French. I created the constituent alignments manually. In the Dutch-to-English and English-to-Dutch sets, the word alignments were also corrected. The Dutch-to-French set still needs to be adapted before it can be processed by the newest version of the Stockholm TreeAligner (a tool for viewing and editing trees and their alignments). Please refer to the included README files for further information.


Here are the alignment sets:


  • Dutch to English, with corrected word alignments (140 sentence pairs): .zip, .tgz
  • English to Dutch, with corrected word alignments (150 sentence pairs): .zip, .tgz
  • Dutch to French, with uncorrected word alignments (158 sentence pairs): .zip, .tgz

Software


TBLign, the transformation-based tree alignment system that I worked on for my doctoral thesis, is available for download. The download also includes all alignment data sets used in the experiments (Dutch-to-English only). I will fix bugs and update the documentation from time to time, but I intend eventually to discontinue development of the Perl version in favour of a Python 3 reimplementation. Download: .zip, tarball, or from GitHub or Bitbucket.


Other

My PhD thesis, "Complementary approaches to tree alignment: Combining statistical and rule-based methods", can be downloaded here. [Short English abstract] [Short Dutch abstract] [Long Dutch abstract]