HW6: Machine translation

Due: April 12, 2013

Problem 1 (20 points)

Go to Google Translate, a site that lets you type in text and translate it into another language. By copying and pasting, you can also backtranslate, i.e., translate the result back into the original language.

For this problem, you must translate the following sentence (from a speech by Jon Stewart) into three different languages and then translate each result back into English.

The press can hold its magnifying glass up to our problems bringing them into focus, illuminating issues heretofore unseen or they can use that magnifying glass to light ants on fire and then perhaps host a week of shows on the sudden, unexpected dangerous flaming ant epidemic.

For example, this translates into Portuguese as:

A imprensa pode conter a sua ampliação até nossos problemas trazê-los para o foco, iluminando as questões até então jamais vistos ou eles podem usar a lupa para formigas leve em fogo e, talvez, de acolhimento de uma semana de shows na repentina epidemia de formiga, inesperado perigosas chamas.

And then this translates back into English as:

The release may contain its expansion to our problems bring them into focus, illuminating the issues heretofore unseen or they can use the magnifying glass to light on fire ants, and perhaps host a week of concerts in the sudden epidemic ant, unexpected dangerous flames.

So, some things have obviously gone terribly wrong.

Part A (5 points)

Choose one European language (not Portuguese), one African language (not Afrikaans), and one Asian language. Write down the languages, and the English backtranslations associated with each one. Based on the backtranslations, do you think that some languages are better matched with English than others with respect to ease of translation? Give evidence for your answer using the backtranslations as examples.

Part B (15 points)

Find three lexical or syntactic errors that are apparent from the backtranslations, such as "press" becoming "release" and "shows" becoming "concerts" (lexical), or "light ants on fire" becoming "light on fire ants" and "ant epidemic" becoming "epidemic ant" (syntactic). Discuss why you think they may have come about. Focus especially on words that receive different translations in different languages, or syntactic constructions that seem especially hard. How "fatal" are these errors for understanding the backtranslation (and thus possibly also the translation into the other language)?

Feel free to use any knowledge you have of the other languages in answering this, though such knowledge is not required. Also, feel free to construct and backtranslate other English sentences in answering this question.

Problem 2 (40 points)

This exercise deals with word translation probabilities from Tagalog to English.

Part A (10 points)

Consider the examples below:

1a. the teacher bought a book .
1b. bumili ng libro ang titser .

2a. the thing a teacher bought is a book .
2b. libro ang binili ng titser .

3a. the teacher said that Linda bought the car .
3b. nagsabi ang titser na binili ni Linda ang kotse .

Using the bag of words alignment model, provide the probabilities for translating:

  1. bumili, binili, libro, and titser given teacher.
  2. bumili, binili, libro, and titser given book.
  3. bumili, binili, libro, and titser given bought.
  4. ng, ang, na, and titser given a.
  5. ng, ang, na, and titser given the.
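
If the mechanics of the method are unclear, the following is a minimal sketch of one common reading of the bag of words model. It rests on assumptions that you should check against the course slides: for each English word, pool the foreign words from every sentence pair whose English side contains that word, then divide each foreign word's count by the size of the pool. The function name `bag_of_words_probs` and the toy English-Spanish pairs are made up for illustration and are not part of the assignment data.

```python
from collections import Counter, defaultdict


def bag_of_words_probs(pairs):
    """Estimate P(foreign word | English word) by pooling co-occurring words.

    `pairs` is a list of (english_sentence, foreign_sentence) strings.
    """
    bags = defaultdict(Counter)
    for english, foreign in pairs:
        foreign_tokens = foreign.split()
        # Assumption: a sentence pair contributes its foreign words once per
        # distinct English word, even if that English word is repeated.
        for e_word in set(english.split()):
            bags[e_word].update(foreign_tokens)

    probs = {}
    for e_word, bag in bags.items():
        total = sum(bag.values())  # size of the pooled bag for this word
        probs[e_word] = {f_word: count / total for f_word, count in bag.items()}
    return probs


# Hypothetical toy corpus, not taken from the assignment.
toy_pairs = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a book", "un libro"),
]

probs = bag_of_words_probs(toy_pairs)
print(probs["book"])
```

In this toy corpus, "book" occurs in two sentence pairs, so its bag holds four words, two of which are libro. Under this reading the estimate for libro given book is 2/4, while el and un each get 1/4.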

Part B (15 points)

The bag of words model, of course, gets better with more aligned sentences. Here are two more:

4a. who bought a book ?
4b. sino ang bumili ng libro ?

5a. who said that Linda bought the dress ?
5b. sino ang nagsabi na binili ni Linda ang damit ?

Recompute the probabilities you did for Part A using sentences 4 and 5 in addition to 1-3.

Part C (5 points)

Describe how the extra sentences in (4) and (5) help you translate certain words. That is, which words become easier to translate, and why?

Part D (10 points)

Here are the actual translations (ng and ang are in fact more complicated than these simple glosses indicate, but you don't need to worry about that here):

  • bumili = bought (active voice)
  • binili = bought (passive voice)
  • libro = book
  • titser = teacher
  • ng = a
  • ang = the

Which words were translated well using the bag of words method? Discuss any problems you see with the translation probabilities you calculated. Can you think of any ways to change the method so that it would produce more accurate translation probabilities? (You don't have to actually do it, just suggest possibilities.)

Problem 3 (15 points)

The English verb to know translates into Portuguese as conhecer “to be familiar with (a person)” or saber “to know (a thing)”. For example, Maria conhece Bill means "Mary knows Bill," and Maria sabe a resposta means "Mary knows the answer."

Part A (5 points)

In terms of hyponymy/hypernymy, describe the relationship between the English verb to know and the Portuguese verbs conhecer and saber.

Part B (10 points)

Conhecer also means "to meet." For example, ela conheceu ele can mean "she met him."

Draw a Venn diagram showing how the English verbs to know and to meet overlap with the Portuguese verbs conhecer and saber.

Problem 4 (25 points)

How much do you think language influences thought? Does it determine thought, influence it to some degree, or not influence it at all? Give arguments and evidence for your position. You can use outside sources for answering this question, as well as the course slides. For example, you might look at these:

You can also consider words in one language that correspond to lexical gaps in others, and how much work can go into explaining them; for example: