######################################
Internationalization coding guidelines
######################################

Preparing code to be presented in many languages can be complex and difficult.
The rules here give the best practices for marking English strings in source
so that it can be extracted, translated, and presented to the user in the
language of their choice.

See also:

* `Django Internationalization <https://docs.djangoproject.com/en/dev/topics/i18n/>`_ (overview)
* `Django: Internationalizing Python code <https://docs.djangoproject.com/en/dev/topics/i18n/translation/#internationalization-in-python-code>`_
* `Django Translation guidelines <https://docs.djangoproject.com/en/dev/topics/i18n/translation/>`_
* `Django Format localization <https://docs.djangoproject.com/en/dev/topics/i18n/formatting/>`_

Presented in this document are the following sections:

* `General internationalization rules`_
* `Editing source files`_
* `Coverage testing`_
* `Style guidelines`_


General internationalization rules
**********************************

In order to localize source files, we need to prepare them so that the
human-readable strings can be extracted by a pre-processing step, and then have
localized strings used at runtime.  This requires attention to detail, and
unfortunately limits what you can do with strings in the code.  In general:

1. Always mark complete sentences for translation.  If you combine fragments at
   runtime, there is no way for the translator to construct a proper sentence
   in their language.

2. Don't join strings together at runtime to create sentences.

3. Limit the amount of text in strings that is not presented to the user.  HTML
   markup is better applied after the translation.  If you give HTML to the
   translators, there's a good chance they will translate your tags or
   attributes.

4. Use placeholders with descriptive names: ``"Welcome {student_name}"`` is
   much better than ``"Welcome {0}"``.

See the detailed Style Guidelines at the end for details.


Editing source files
********************

While editing source files (including Python, JavaScript, or HTML template
files), use the appropriate conventions.  There are a few things to know how to
do:

1. What has to be at the top of the file (if anything) to prepare it for i18n.

2. How are strings marked for internationalization?  This takes the form of a
   function call with the string as an argument.

3. How are translator comments indicated?  These are comments in the file that
   will travel with the strings to the translators, giving them context to
   produce the best translation.  They have a "Translators:" marker. They must
   appear on the line preceding the text they describe. Multi-line comments
   are supported for Python in case the translator comment needs to be wrapped.

The code samples below show how to do each of these things for:

* `Python source code`_
* `Django template files`_
* `Mako template files`_
* `JavaScript files`_
* `Coffeescript files`_
* `Underscore template files`_
* `Other kinds of code`_

Note that you have to take into account not just the programming language involved,
but the type of file: JavaScript embedded in an HTML Mako template is treated differently
than JavaScript in a pure .js file.

Python source code
==================

.. highlight:: python

In most Python source code (read the Django docs for more details)::

    from django.utils.translation import ugettext as _
    
    # Translators: This will help the translator
    message = _("Welcome!")
    
    # Translators: This is a very long comment that needs to wrap
    # over multiple lines because it would be too long otherwise.
    message = _("Hello world")

Some edX code cannot use Django imports. To maintain portability, XBlocks,
XModules, Inputtypes and Responsetypes forbid importing Django.  Each of these
has its own way of accessing translations.  You'll use lines like these
instead::

    ### for XBlock & XModule:
    _ = self.runtime.service(self, "i18n").ugettext
    # Translators: a greeting to newly-registered students.
    message = _("Welcome!")

    # for InputType and ResponseType:
    _ = self.capa_system.i18n.ugettext
    # Translators: a greeting to newly-registered students.
    message = _("Welcome!")

"Translators" comments will work in these places too, so don't be shy about
providing clarifying comments to the translators.


Django template files
=====================

.. highlight:: django

In Django template files (`templates/*.html`)::

    {% load i18n %}
    
    {# Translators: this will help the translator. #}
    {% trans "Welcome!" %}

Mako template files
===================

.. highlight:: mako

In Mako template files (`templates/*.html`), you can use all of the tools
available to python programmers. Just make sure to import the relevant
functions first. Here's a Mako template example::

    <%! from django.utils.translation import ugettext as _ %>
 
    ## Translators: message to the translator
    ${_("Welcome!")}

JavaScript files
================

.. highlight:: javascript

In order to internationalize JavaScript, first the HTML template (base.html)
must load a special JavaScript library (and Django must be configured to serve
it)::

    <script type="text/javascript" src="jsi18n/"></script>

Then, in JavaScript files (`*.js`)::

    // Translators: this will help the translator.
    var message = gettext('Welcome!');

Note that JavaScript embedded in HTML in a Mako template file is handled
differently.  There, you use the Mako syntax even within the JavaScript.

Coffeescript files
==================

.. highlight:: coffeescript

Coffeescript files are compiled to JavaScript files, so it works mostly like
JavaScript::

    `// Translators: this will help the translator.`
    message = gettext('Hey there!')
    # Interpolation has to be done in JavaScript, not Coffeescript:
    message = gettext("Error getting student progress url for '<%= student_id %>'.")
    full_message = _.template(message, {student_id: unique_student_identifier})

But because we extract strings from the compiled .js files, there are some
native Coffeescript features that break the extraction from the .js files:

1. You cannot use Coffeescript string interpolation:  This results in string
   concatenation in the .js file, so string extraction won't work.

2. You cannot use Coffeescript comments for translator comments, since they are
   not passed through to the JavaScript file.

::

    # NO NO not like this:
    # Translators: this won't get to the translators!
    message = gettext("Welcome, #{student_name}!")  # This won't work!
    
    # YES like this:
    `// Translators: this will get to the translators.`
    message = gettext("This works")

    ###
    Translators: This will work, but takes three lines :(
    ###
    message = gettext("Hey there")
 
.. highlight:: python

Underscore template files
=========================

Underscore template files are used in conjunction with JavaScript, and so the
same techniques are used for localization. Ensure that the i18n JavaScript
library has already been loaded, and then use the regular i18n functions
such as ``gettext`` and ``interpolate`` from your template.

For example::

    <%-
        interpolate(
            gettext('This post is visible only to %(group_name)s.'),
                {group_name: group.group_name},
                true
        )
    %>

Note: it is recommended that you use ``<%-`` for all translated strings
as this will HTML escape the string before including it in the page. This
ensures that translations are free to use non-HTML characters.

Other kinds of code
===================

We have not yet established guidelines for internationalizing the following.

* Course content (such as subtitles for videos)

* Documentation (written for Sphinx as .rst files)


Building and testing your code
******************************

These instructions assume you are a developer writing new code to check in to
Github. For other use cases in the translation life cycle (such as translating
the strings, or checking the translations into Github, see use cases).

1. Create human-readable .po files with the latest strings. This command may
   take a minute or two to complete::

    $ cd edx-platform
    $ paver i18n_extract

2. Generate dummy strings:  See coverage testing (below) for more details. This
   will create an "Esperanto" translation that is actually over-accented
   English.  Use this to create fake translations::

    $ paver i18n_dummy
    
3. Run the paver i18n_generate command to create machine-readable .mo files::
 
    $ paver i18n_generate

4. Django should be ready to go. The next time you run Studio or LMS, append
   ``?preview-lang=eo`` to the URL to turn on Esperanto as a dark language. The
   accented-English strings (from step 3, above) should be displayed.

   If you experience issues, be sure that your settings for ``USE_I18N`` and
   ``USE_L10N`` are both set to True.

5. With Esperanto turned on as a dark language (see Step 4), review the pages
   affected by your code and verify that you see fake translations. If you see
   plain English instead, your code is not being properly translated. Review 
   the steps in editing source files (above). 

6. When you are done reviewing, append ``?clear-lang`` to the LMS or Studio URL
   to reset your session to English.


Coverage testing
****************

This tool is used during the bootstrap phase, when presumably (1) there is a
lot of edX source code to be converted, and (2) there are not a lot of
available translations for externalized edX strings. At the end of the
bootstrap phase, we will eventually deprecate this tool in favor of other
processes. Once most of the edX source code has been successfully converted,
and there are several full translations available, it will be easier to detect
and correct specific gaps in compliance.

Use the coverage tool to generate dummy files::

    $ paver i18n_dummy
    
This will create new dummy translations in the Esperanto directory
(edx-platform/conf/local/eo/LC_MESSAGES).

You can then configure your browser preferences to view Esperanto as your
preferred language. Instead of plain English strings, you should see something
like this:

    Thé Fütüré øf Ønlïné Édüçätïøn Ⱡσяєм ι#
    Før änýøné, änýwhéré, änýtïmé Ⱡσяєм #

This dummy text is distinguished by extra accent characters. If you see plain
English instead (without these accents), it most likely means the string has
not been externalized yet. To fix this: 

* Find the string in the source tree (either in Python, JavaScript, or HTML
  template code). 

* Refer to the above coding guidelines to make sure it has been externalized
  properly. 

* Rerun the scripts and confirm that the strings are now properly converted
  into dummy text.

This dummy text is also distinguished by Lorem ipsum text at the end of each
string, and is always terminated with "#". The original English string is
padded by about 30% extra characters, to simulate some language (like German)
which tend to have longer strings than English. If you see problems with your
page layout, such as columns that don't fit, or text that is truncated (the
``#`` character should always be displayed on every string), then you will
probably need to fix the page layouts accordingly to accommodate the longer
strings.


Style guidelines
****************

Don't append strings, interpolate values
========================================

It is harder for translators to provide reasonable translations of small
sentence fragments. If your code appends sentence fragments, even if it seems
to work OK for English, the same concatenation is very unlikely to work
properly for other languages.

Bad::

    message = _("The directory has ") + len(directory.files) + _(" files.")

In this scenario, the translator will have to figure out how to translate these
two separate strings. It is very difficult to translate a fragment like "The
directory has." In some languages the fragments will be in different order. For
example, in Japanese, "files" will come before "has."

It is much easier for a translator to figure out how to translate the entire
sentence, using the pattern "The directory has {file_count} files."

Good::

    message = _("The directory has {file_count} files.").format(file_count=directory.files)


Use named placeholders
======================

Python string formatting provides both positional and named placeholders.  Use
named placeholders, never use positional placeholders.  Positional placeholders
can't be translated into other languages which may need to re-order them to
make syntactically correct sentences.  Even with a single placeholder, a named
placeholder provides more context to the translator.

Bad::

    message = _('Today is %s %d.') % (m, d)

OK::

    message = _('Today is %(month)s %(day)s.') % {'month': m, 'day': d}

Best::

    message = _('Today is {month} {day}.').format(month=m, day=d)

Notice that in English, the month comes first, but in Spanish the day comes
first. This is reflected in the .po file like this::

    # fragment from edx-platform/conf/locale/es/LC_MESSAGES/django.po
    msgid "Today is {month} {day}."
    msgstr "Hoy es {day} de {month}."

The resulting output is correct in each language::

    English output: "Today is November 26."
    Spanish output: "Hoy es 26 de Noviembre."


Only translate literal strings
==============================

As programmers, we're used to using functions in flexible ways.  But the
translation functions like ``_()`` and ``gettext()`` can't be used like other
functions.  At runtime, they are real functions like any other, but they also
serve as markers for the string extraction process.

For string extraction to work properly, the translation functions must be
called with only literal strings.  If you use them with a computed value,
the string extracter won't have a string to extract.

The difference between the right way and the wrong way can be very subtle:

::

    # BAD: This tries to translate the result of .format()
    _("Welcome, {name}".format(name=student_name))

    # GOOD: Translate the literal string, then use it with .format()
    _("Welcome, {name}").format(name=student_name))

::

    # BAD: The dedent always makes the same string, but the extractor can't find it.
    _(dedent("""
    .. very long message ..
    """))

    # GOOD: Dedent the translated string.
    dedent(_("""
    .. very long message ..
    """))

::

    # BAD: The string is separated from _(), the extractor won't find it.
    if hello:
        msg = "Welcome!"
    else:
        msg = "Goodbye."
    message = _(msg)

    # GOOD: Each string is wrapped in _()
    if hello:
        message = _("Welcome!")
    else:
        message = _("Goodbye.")


Be aware of nested syntax
=========================

When translating strings in templated files, you have to be careful of nested
syntax.  For example, consider this JavaScript fragment in a Mako template::

    <script>
    var feeling = '${_("I love you.")';
    </script>

When rendered for a French speaker, it will produce this::

    <script>
    var feeling = 'Je t'aime.';
    </script>

which is now invalid JavaScript.  This can be avoided by using double-quotes
for the JavaScript string.  The better solution is to use a filtering function
that properly escapes the string for JavaScript use::

    <%!
        from django.utils.translation import ugettext as _
        from django.utils.html import escapejs
    %>
    ...
    <script>
    var feeling = '${escapejs(_("I love you."))}';
    </script>

which produces::

    <script>
    var feeling = 'Je t\'aime.';
    </script>

Other places that might be problematic are HTML attributes::

    <img alt='${_("I love you.")}'>


Singular vs plural
==================

It's tempting to improve a message by selecting singular or plural based on a
count::

    if count == 1:
        msg = _("There is 1 file.")
    else:
        msg = _("There are {file_count} files.").format(file_count=count)

This is not the correct way to choose a string, because other languages have
different rules for when to use singular and when plural, and there may be more
than two choices!

One option is not to use different text for different counts::

    msg = _("Number of files: {file_count}").format(file_count=count)

If you want to choose based on number, you need to use another gettext variant
to do it::

    from django.utils.translation import ungettext
    msg = ungettext("There is {file_count} file", "There are {file_count} files", count)
    msg = msg.format(file_count=count)

This will properly use count to find a correct string in the translation file,
and then you can use that string to format in the count.


Translating too early
=====================

When the ``_()`` function is called, it will fetch a translated string.  It
will use the current user's language to decide which string to fetch.  If you
invoke it before we know the user, then it will get the wrong language.

For example::

    from django.utils.translation import ugettext as _

    HELLO = _("Hello")
    GOODBYE = _("Goodbye")

    def get_greeting(hello):
        if hello:
            return HELLO
        else:
            return GOODBYE

Here the HELLO and GOODBYE constants are assigned when the module is first
imported, at server startup.  There is no current user then, so ugettext will
use the server's default language.  When we eventually use those constants to
show a message to the user, they won't be looked up again, and the user will
get the wrong language.

There are a few ways to deal with this.  The first is to avoid calling ``_()``
until we have the user::

    def get_greeting(hello):
        if hello:
            return _("Hello")
        else:
            return _("Goodbye")

Another way is to use Django's ugettext_lazy function.  Instead of returning
a string, it returns a lazy object that will wait to do the lookup until it is
actually used as a string:

    from django.utils.translation import ugettext_lazy as _

This can be tricky because the lazy object doesn't act like a string in all
cases.

The last way to solve the problem is to mark the string so that it will be
extracted properly, but not actually do the lookup when the constant is
defined::

    from django.utils.translation import ugettext

    _ = lambda text: text

    HELLO = _("Hello")
    GOODBYE = _("Goodbye")

    def get_greeting(hello):
        if hello:
            return ugettext(HELLO)
        else:
            return ugettext(GOODBYE)

Here we define ``_()`` as a pass-through function, so the string will be found
during extraction, but won't be translated too early.  Then we use the real
translation function at runtime to get the localized string.

Multiline Strings
=================

Translator notes must directly precede the string literals to which they refer.
For example, the translator note here will not be passed along to translators::

    # Translators: you will not be able to see this note because
    # I do not directly prepend the line with the translated string literal.
    # See the line directly below this one does not contain part of the string?
    long_translated_string = _(
        "I am a long string, with many, many words. So many words that it is "
        "advisable that I be split over this line."
    )

In such a case, make sure you format your code so that the string begins on
a line directly below the translator note::

    # Translators: you will be able to see this note.
    # See how the line directly below this one contains the start of the string?
    long_translated_string = _("I am a long string, with many, many words. "
                               "So many words that it is advisable that I "
                               "be split over this line.")