Two Internships at UPC in Open-Source Natural Language Parsing

The TALP Research Center at the UPC offers two internships in Machine
Learning and Natural Language Processing, funded by the PASCAL2
Harvest Programme. The internships will take place at the UPC campus
in Barcelona for 8 weeks during May-July of 2011.

The goal of the project is to develop an open-source library of
high-performance methods for tagging, parsing and structured
prediction tasks, with a focus on NLP tasks. Our aim is
for this software library to become a reference implementation of
modern techniques for natural language parsing and related problems.

The project team will be led by Xavier Carreras and Lluís Padró of the
TALP Research Center, both experts in NLP and statistical methods. The
two interns will work closely with the other members of the team.

These internships are an excellent opportunity for PhD students and
researchers interested in parsing and machine learning. We will
implement a generic, extensible structured prediction framework
consisting of several learning algorithms (including Perceptron,
max-margin methods and CRFs) and several models for predicting
structures (such as Markov taggers and dependency parsers). The first
four weeks of the internship will take a tutorial-style form with a
series of lectures and guided experimental sessions. After this phase
we should all be familiar with the methods and existing code, and we
should be able to reproduce top-performing systems for a variety of
multi-lingual parsing tasks.

The second part of the internship will be devoted to collaborative
engineering and research. Our main interest is to improve the efficiency,
scalability, flexibility and usability of the software library. Ideas
from interns will be welcome, ranging from extending the library with
new models and features, to designing generic parsing schemes, to
improving efficiency of algorithms and data structures.

The interns will be entitled to a per-diem for accommodation, meals and
local travel, as well as a round-trip ticket from their place of residence.

The ideal candidates should have:

* A clear understanding of natural language parsing and structured
prediction methods. Ideally, candidates would be Master's or PhD
students who have already taken a class on the subject. More experienced
researchers are also welcome to participate.

* A passion for programming, algorithms and data structures, and
excellent skills in C++, Unix environments, and generic/template
programming.

* Experience implementing a large NLP or ML project.

* Ability to work in a group.

* Fluent English.

For applications and inquiries, send an email to Xavier Carreras and
Lluís Padró ({carreras,padro}(at)lsi.upc.edu). To apply, send a CV, a
short description of programming/research projects you’ve been
involved in, and a statement of your interests in research and engineering
related to NLP. Let us also know about any time constraints you may have
for the period of May-July, as we will choose the dates that best fit the
whole team.

The positions will remain open until filled, though ideally we would
receive applications by mid-March and decide by early April.