Diacritization added to the Google Language API

June 24, 2010

Earlier this year, we launched the Tashkeel (Diacritization) service on Google Labs. I'm pleased to announce that we've added an experimental Diacritization component to the Google Language API. This is a simple JSON API which you can use to add diacritic symbols to strings of Arabic text.

To test it out, try clicking this link:
https://www.googleapis.com/language/diacritize/v1?lang=ar&message=مرحبا%20العالم&last_letter=false&callback=result

A URL-encoded string is supplied as the message parameter, and it's returned by the API with diacritics included. These symbols are useful to people just learning the language and as an important pre-step for several text processing applications.

Right now, the API only supports Arabic, but we're working on adding more languages, as well as a JavaScript API, so be sure to watch this blog for details. For more information, see the documentation and our post on the Google Arabia blog (you may want to click "view post in English").


Posted by: Adam Feldman, Product Manager and Jeff Scudder, Software Engineer