Call for Participation
Event Dates: 
26 May 2012
Lütfi Kırdar Istanbul Exhibition and Congress Centre
Reinhard Rapp
Marko Tadić
Contact Email: 
reinhardrapp [at] gmx [dot] de
Contact Email: 
marko [dot] tadic [at] ffzg [dot] hr

Apologies for multiple postings



Language Resources for Machine Translation

in Less-Resourced Languages and Domains

Co-located with LREC 2012

Lütfi Kırdar Istanbul Exhibition and Congress Centre

Saturday, 26 May 2012


Endorsed by

* ACL SIGWAC (Special Interest Group on Web as Corpus)

* FLaReNet (Fostering Language Resources Network)

* META-NET (Multilingual Europe Technology Alliance)


WORKSHOP PROGRAMME

Saturday, 26 May 2012

09:00 Opening

Oral Presentations 1: Multilinguality (Chair: Pierre Zweigenbaum)


09:10 Philipp Petrenz, Bonnie Webber: Robust Cross-Lingual Genre Classification through Comparable Corpora

09:30 Qian Yu, François Yvon, Aurélien Max: Revisiting sentence alignment algorithms for alignment visualization and evaluation

Invited Project Session (Chair: Serge Sharoff)


09:50 Inguna Skadiņa: Analysis and Evaluation of Comparable Corpora for Under-Resourced Areas of Machine Translation (ACCURAT, http://www.accurat-project.eu)

10:10 Andrejs Vasiļjevs: LetsMT! — Platform to Drive Development and Application of Statistical Machine Translation (LetsMT!, http://www.letsmt.eu)

10:30 Coffee Break

11:00 Núria Bel, Vassilis Papavasiliou, Prokopis Prokopidis, Antonio Toral, Victoria Arranz: Mining and Exploiting Domain-Specific Corpora in the PANACEA Platform (PANACEA, http://panacea-lr.eu)

11:20 Adam Kilgarriff, George Tambouratzis: The PRESEMT Project (PRESEMT, http://www.presemt.eu)

11:40 Béatrice Daille: Building Bilingual Terminologies from Comparable Corpora: The TTC TermSuite (TTC, http://www.ttc-project.eu)

12:00 Panel Discussion with Invited Speakers

12:30 Lunch Break

Oral Presentations 2: Building Comparable Corpora (Chair: Reinhard Rapp)


14:00 Aimée Lahaussois, Séverine Guillaume: A viewing and processing tool for the analysis of a comparable corpus of Kiranti mythology

14:20 Nancy Ide: MultiMASC: An Open Linguistic Infrastructure for Language Research

Poster Presentations with Booster Session (Chair: Marko Tadić)


14:40 Elena Irimia: Experimenting with Extracting Lexical Dictionaries from Comparable Corpora for the English-Romanian Language Pair

14:45 Iustina Ilisei, Diana Inkpen, Gloria Corpas, Ruslan Mitkov: Romanian Translational Corpora: Building Comparable Corpora for Translation Studies

14:50 Angelina Ivanova: Evaluation of a Bilingual Dictionary Extracted from Wikipedia

14:55 Quoc Hung-Ngo, Werner Winiwarter: A Visualizing Annotation Tool for Semi-Automatically Building a Bilingual Corpus

15:00 Lene Offersgaard, Dorte Haltrup Hansen: SMT systems for less-resourced languages based on domain-specific data

15:05 Magdalena Plamada, Martin Volk: Towards a Wikipedia-extracted Alpine Corpus

15:10 Sanja Štajner, Ruslan Mitkov: Using Comparable Corpora to Track Diachronic and Synchronic Changes in Lexical Density and Lexical Richness

15:15 Dan Stefanescu: Mining for Term Translations in Comparable Corpora

15:20 George Tambouratzis, Michalis Troullinos, Sokratis Sofianopoulos, Marina Vassiliou: Accurate phrase alignment in a bilingual corpus for EBMT systems

15:25 Kateřina Veselovská, Nguy Giang Linh, Michal Novák: Using Czech-English Parallel Corpora in Automatic Identification of 'It'

15:30 Manuela Yapomo, Gloria Corpas, Ruslan Mitkov: CLIR- and Ontology-Based Approach for Bilingual Extraction of Comparable Documents

15:35 Poster Session and Coffee Break (coffee from 16:00 to 16:30)

Oral Presentations 3: Lexicon Extraction and Corpus Analysis (Chair: Andrejs Vasiļjevs)


16:30 Amir Hazem, Emmanuel Morin: ICA for Bilingual Lexicon Extraction from Comparable Corpora

16:50 Hiroyuki Kaji, Takashi Tsunakawa, Yoshihiro Komatsubara: Improving Compositional Translation with Comparable Corpora

17:10 Nikola Ljubešić, Špela Vintar, Darja Fišer: Multi-word term extraction from comparable corpora by combining contextual and constituent clues

17:30 Robert Remus, Mathias Bank: Textual Characteristics of Different-sized Corpora

17:50 Wrap-up discussion and end of the workshop


WORKSHOP ORGANISERS

Reinhard Rapp, Universities of Mainz (Germany) and Leeds (UK)

Marko Tadić, University of Zagreb (Croatia)

Serge Sharoff, University of Leeds (UK)

Andrejs Vasiļjevs, Tilde SIA, Riga (Latvia)

Pierre Zweigenbaum, LIMSI, CNRS, Orsay, and ERTIM, INALCO, Paris (France)


PROGRAMME COMMITTEE

* Srinivas Bangalore (AT&T Labs, USA)

* Caroline Barrière (National Research Council Canada)

* Chris Biemann (Microsoft / Powerset, San Francisco, USA)

* Lynne Bowker (University of Ottawa, Canada)

* Hervé Déjean (Xerox Research Centre Europe, Grenoble, France)

* Andreas Eisele (DFKI, Saarbrücken, Germany)

* Rob Gaizauskas (University of Sheffield, UK)

* Éric Gaussier (Université Joseph Fourier, Grenoble, France)

* Nikos Glaros (ILSP, Athens, Greece)

* Gregory Grefenstette (Exalead/Dassault Systemes, Paris, France)

* Silvia Hansen-Schirra (University of Mainz, Germany)

* Kyo Kageura (University of Tokyo, Japan)

* Adam Kilgarriff (Lexical Computing Ltd, UK)

* Natalie Kübler (Université Paris Diderot, France)

* Philippe Langlais (Université de Montréal, Canada)

* Tony McEnery (Lancaster University, UK)

* Emmanuel Morin (Université de Nantes, France)

* Dragos Stefan Munteanu (Language Weaver Inc., USA)

* Lene Offersgaard (University of Copenhagen, Denmark)

* Reinhard Rapp (Universities of Mainz, Germany, and Leeds, UK)

* Sujith Ravi (Yahoo! Research, Santa Clara, CA, USA)

* Serge Sharoff (University of Leeds, UK)

* Michel Simard (National Research Council Canada)

* Inguna Skadiņa (Tilde, Riga, Latvia)

* Monique Slodzian (INALCO, Paris, France)

* Benjamin Tsou (The Hong Kong Institute of Education, China)

* Dan Tufiş (Romanian Academy, Bucharest, Romania)

* Justin Washtell (University of Leeds, UK)

* Michael Zock (LIF, CNRS Marseille, France)

* Pierre Zweigenbaum (LIMSI-CNRS, Orsay, France)

For further information, please contact

Reinhard Rapp reinhardrapp (at) gmx (dot) de

or Marko Tadić marko.tadic (at) ffzg (dot) hr