Tool:Indicwiki Transliteration Tool
| Website | https://indicwiki-transliterate-api.toolforge.org |
| Description | The Indicwiki Transliteration Tool consists of an API and a UserScript for transliterating text between the scripts used to write Indic languages. It focuses on Hindi and Urdu and also supports additional script pairs. |
| Keywords | transliteration, hindi, urdu, indic, api, userscript, wikipedia |
| Author(s) | Agamya Samuel |
| Maintainer(s) | Agamya Samuel |
| Source code | API: https://gitlab.wikimedia.org/toolforge-repos/indicwiki-transliterate-api; UserScript: https://meta.wikimedia.org/wiki/User:Agamyasamuel/Indicwiki-Transliterate-User-Script.js |
| License | MIT License |
Overview
The Indicwiki Transliteration Tool is a Toolforge-hosted service that transliterates text between scripts used for Indic languages. Its main pair is Hindi (Devanagari) and Urdu (Perso-Arabic). It also converts between Gurmukhi and Shahmukhi, the two scripts used for Punjabi, and among the Devanagari, Perso-Arabic and Roman forms of Sindhi. It includes a REST API that acts as a proxy for transliteration requests and a UserScript that brings transliteration into Wikipedia editing workflows.
The tool's purpose is to enhance language interoperability on Wikimedia platforms, allowing users to convert text between different scripts without leaving the wiki environment. This is particularly useful for contributors working on multilingual content, cross-wiki coordination, or content creation in related languages.
Key Features:
- Proxy API for transliteration between specific Indic language pairs.
- UserScript that adds an in-browser transliteration interface to Wikipedia pages.
- Support for multiple transliteration directions, including auto-detection for certain scripts.
- Designed for bots, tools and gadgets, and for interactive use in the browser through the UserScript.
- Backed by open-source code for community contributions and custom deployments.
While the core focus is Hindi-Urdu, the API also supports Gurmukhi-Shahmukhi and several Sindhi script pairs. It also offers auto-detection endpoints for Perso-Arabic and Sindhi/Hindi input, which covers a wider range of South Asian scripts. It does not cover every Indic language, however. It may also not handle complex linguistic edge cases perfectly, such as dialectal variation or words with more than one valid transliteration.
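A client may need to guess which endpoint to call before sending text. The sketch below is not part of the tool; it guesses the dominant script of a string using JavaScript Unicode property escapes (ES2018+). The function name `dominantScript` is hypothetical. It cannot tell Urdu, Shahmukhi and Sindhi apart, because all three use the Arabic script. Arabic-script input therefore still needs a language-specific endpoint or one of the auto-detect endpoints.

```javascript
// Hypothetical client-side helper, not part of the Indicwiki tool.
// Counts characters per script and returns the most frequent one.
const SCRIPT_PATTERNS = {
  Devanagari: /\p{Script=Devanagari}/u,
  Gurmukhi: /\p{Script=Gurmukhi}/u,
  Arabic: /\p{Script=Arabic}/u, // Urdu, Shahmukhi and Sindhi all land here
};

function dominantScript(text) {
  const counts = { Devanagari: 0, Gurmukhi: 0, Arabic: 0 };
  for (const ch of text) { // for...of iterates by code point
    for (const [script, pattern] of Object.entries(SCRIPT_PATTERNS)) {
      if (pattern.test(ch)) counts[script] += 1;
    }
  }
  let best = null;
  let bestCount = 0;
  for (const [script, count] of Object.entries(counts)) {
    if (count > bestCount) {
      best = script;
      bestCount = count;
    }
  }
  return best; // null when none of the three scripts occurs
}

console.log(dominantScript('नमस्ते दुनिया')); // Devanagari
```

Digits, Latin letters and punctuation are ignored, so mixed input is classified by its Indic-script characters only.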
Web Service and API
The web service is hosted on Toolforge and provides a RESTful API for transliteration. The API base URL is: https://indicwiki-transliterate-api.toolforge.org
Interactive API documentation and testing (likely Swagger UI) is available at: https://indicwiki-transliterate-api.toolforge.org/docs
The API is built to handle POST requests for transliteration, returning JSON responses suitable for scripts, bots, web frontends, and the accompanying UserScript. No authentication is required, but users should respect Toolforge usage guidelines to avoid excessive requests.
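No rate limit is documented. One way to keep request volume modest is to space requests out on the client. The sketch below assumes a minimum interval that you choose yourself. `createThrottle` is a hypothetical helper, not part of the tool.

```javascript
// Hypothetical helper: successive tasks start at least minIntervalMs apart.
// The interval is your own choice; the API documents no official limit.
function createThrottle(minIntervalMs) {
  let nextSlot = 0; // earliest time (ms since epoch) the next task may start
  return async function throttled(task) {
    const now = Date.now();
    const startAt = Math.max(now, nextSlot);
    nextSlot = startAt + minIntervalMs; // reserve the slot before waiting
    if (startAt > now) {
      await new Promise((resolve) => setTimeout(resolve, startAt - now));
    }
    return task();
  };
}

// Usage (network call, so not run here):
// const throttle = createThrottle(500);
// const response = await throttle(() => fetch(url, init));
```

Slots are reserved synchronously when `throttled` is called. Concurrent callers are therefore queued one interval apart rather than all firing together.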
API Endpoints
All endpoints use the POST method and expect a JSON body with a "text" field containing the string to transliterate. Responses are JSON and are expected to contain a "transliterated_text" field. This field name is inferred from standard practice, not confirmed; check the interactive docs for the exact schema.
- POST /transliterate/AutoDetectPersioArabicScript
- Automatically detects and transliterates text in Perso-Arabic scripts.
- Body: {"text": "string"}
- Useful for mixed or unknown Perso-Arabic input.
- POST /transliterate/AutoDetectSindhiHindiScript
- Automatically detects and transliterates text in Sindhi or Hindi-related scripts.
- Body: {"text": "string"}
- Handles auto-detection for Sindhi Devanagari or Hindi variants.
- POST /transliterate/GurmukhiToShahmukhi
- Transliterates from Gurmukhi (Punjabi) to Shahmukhi (Punjabi in Perso-Arabic).
- Body: {"text": "string"}
- POST /transliterate/HindiToUrdu
- Transliterates from Hindi (Devanagari) to Urdu (Perso-Arabic).
- Body: {"text": "string"}
- Core endpoint for Hindi-Urdu conversion.
- POST /transliterate/ShahmukhiToGurmukhi
- Transliterates from Shahmukhi to Gurmukhi.
- Body: {"text": "string"}
- POST /transliterate/SindhiDEVToRoman
- Transliterates from Sindhi Devanagari to Roman (Latin) script.
- Body: {"text": "string"}
- POST /transliterate/SindhiDEVToSindhiUR
- Transliterates from Sindhi Devanagari to Sindhi Urdu (Perso-Arabic).
- Body: {"text": "string"}
- POST /transliterate/SindhiURToSindhiDEV
- Transliterates from Sindhi Urdu (Perso-Arabic) to Sindhi Devanagari.
- Body: {"text": "string"}
- POST /transliterate/UrduToHindi
- Transliterates from Urdu (Perso-Arabic) to Hindi (Devanagari).
- Body: {"text": "string"}
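The endpoint list above can be captured once in client code, so that typos in endpoint names fail locally instead of returning 404. The sketch below uses only the base URL and endpoint names from this page. `buildTransliterationRequest` is a hypothetical helper, and the empty-input check is an assumption based on the Error Handling notes.

```javascript
// Base URL and endpoint names as listed in this documentation.
const BASE_URL = 'https://indicwiki-transliterate-api.toolforge.org';
const ENDPOINTS = new Set([
  'AutoDetectPersioArabicScript', // spelled as in the published endpoint path
  'AutoDetectSindhiHindiScript',
  'GurmukhiToShahmukhi',
  'HindiToUrdu',
  'ShahmukhiToGurmukhi',
  'SindhiDEVToRoman',
  'SindhiDEVToSindhiUR',
  'SindhiURToSindhiDEV',
  'UrduToHindi',
]);

// Hypothetical helper: returns the URL and fetch() options for one call.
function buildTransliterationRequest(endpoint, text) {
  if (!ENDPOINTS.has(endpoint)) {
    throw new Error(`Unknown endpoint: ${endpoint}`);
  }
  if (typeof text !== 'string' || text.trim() === '') {
    throw new Error('text must be a non-empty string');
  }
  return {
    url: `${BASE_URL}/transliterate/${endpoint}`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text }), // the documented {"text": "..."} body
    },
  };
}

// Usage (network call, so not run here):
// const { url, init } = buildTransliterationRequest('UrduToHindi', 'اسلام علیکم');
// const response = await fetch(url, init);
```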
Refer to the interactive API documentation for full parameters, potential query options, and live testing.
Example JSON Response
For a request to /transliterate/HindiToUrdu with {"text": "नमस्ते दुनिया"}:
{
  "transliterated_text": "نمستے دنیا"
}
Field meanings:
transliterated_text: the converted text in the target script.
Note: This schema is inferred from typical transliteration APIs and has not been confirmed; check the docs for the precise format. Handling of characters that cannot be transliterated (e.g., emoji or digits) is also undocumented. They may be passed through unchanged or treated specially.
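The response field name is only inferred. Client code can check for it explicitly, so that a schema mismatch fails loudly instead of yielding `undefined`. The sketch below has a hypothetical name, `extractTransliteration`. It assumes the field is called "transliterated_text" but lets you override that name once you have confirmed the real schema in /docs.

```javascript
// Hypothetical helper: pulls the converted string out of a parsed JSON body.
// "transliterated_text" is an inferred field name; pass another if /docs differs.
function extractTransliteration(data, field = 'transliterated_text') {
  if (data === null || typeof data !== 'object') {
    throw new TypeError('Response body is not a JSON object');
  }
  const value = data[field];
  if (typeof value !== 'string') {
    // Report the keys that were present so the real schema is easy to spot.
    throw new Error(`Missing "${field}"; response keys: ${Object.keys(data).join(', ')}`);
  }
  return value;
}

console.log(extractTransliteration({ transliterated_text: 'نمستے دنیا' })); // نمستے دنیا
```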
Quick Start Examples
Using curl
1. Transliterate Hindi to Urdu:
curl -X POST "https://indicwiki-transliterate-api.toolforge.org/transliterate/HindiToUrdu" \
-H "Content-Type: application/json" \
-d '{"text": "नमस्ते"}'
Illustrative response (actual output may differ): {"transliterated_text": "نمستے"}
2. Transliterate Urdu to Hindi:
curl -X POST "https://indicwiki-transliterate-api.toolforge.org/transliterate/UrduToHindi" \
-H "Content-Type: application/json" \
-d '{"text": "اسلام علیکم"}'
Illustrative response (actual output may differ): {"transliterated_text": "इस्लाम अलैकुम"}
3. Auto-detect Perso-Arabic:
curl -X POST "https://indicwiki-transliterate-api.toolforge.org/transliterate/AutoDetectPersioArabicScript" \
-H "Content-Type: application/json" \
-d '{"text": "نمستے"}'
Using JavaScript (fetch)
async function transliterateHindiToUrdu(text) {
  const response = await fetch('https://indicwiki-transliterate-api.toolforge.org/transliterate/HindiToUrdu', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text })
  });
  if (!response.ok) {
    // Non-2xx status: log it and signal failure to the caller.
    console.error('Transliteration failed', response.status);
    return null;
  }
  const data = await response.json();
  // Field name as described above (inferred; confirm via /docs).
  return data.transliterated_text;
}

transliterateHindiToUrdu('नमस्ते दुनिया').then((result) => {
  if (result !== null) console.log(`Transliterated: ${result}`);
});
Error Handling
Standard HTTP status codes are used:
- 200 OK: Successful transliteration.
- 400 Bad Request: Invalid input (e.g., missing "text" field, unsupported script).
- 404 Not Found: Endpoint not available.
- 500 Internal Server Error: Backend issue, such as a transliteration service failure.
Error responses may include JSON with a "detail" or "error" field explaining the issue, e.g., {"detail": "Invalid script detection"}. Validate input length and content before sending. Test two edge cases in particular: empty strings, which may return 400, and very long texts. Long texts may exceed a size limit; none is documented, so determine it empirically.
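The status codes and the optional "detail"/"error" fields above can be turned into one readable message, with a rule for which failures are worth retrying. The sketch below has hypothetical helper names, `describeApiError` and `isRetryable`. It also lists 422, on an assumption: the /docs page suggests a FastAPI service, and FastAPI returns 422 for request-validation errors such as a missing body field, but the framework is not confirmed.

```javascript
// Labels for the documented codes, plus 422 (assumed FastAPI validation error).
const STATUS_LABELS = {
  400: 'Bad Request',
  404: 'Not Found',
  422: 'Unprocessable Entity',
  500: 'Internal Server Error',
};

// Hypothetical helper: builds a readable message from a non-2xx response.
// `body` is the parsed JSON, or undefined if the body was not JSON.
function describeApiError(status, body) {
  const label = STATUS_LABELS[status] ?? 'Unexpected status';
  let reason = '';
  if (body !== null && typeof body === 'object') {
    const raw = body.detail ?? body.error; // either field may be absent
    if (raw !== undefined && raw !== null) {
      // FastAPI-style details can be arrays of objects, so stringify non-strings.
      reason = typeof raw === 'string' ? raw : JSON.stringify(raw);
    }
  }
  return reason ? `${status} ${label}: ${reason}` : `${status} ${label}`;
}

// 5xx errors may be transient and worth a delayed retry; 4xx errors mean the
// request itself must change, so resending it unchanged will not help.
function isRetryable(status) {
  return status >= 500 && status <= 599;
}

console.log(describeApiError(400, { detail: 'Invalid script detection' }));
// 400 Bad Request: Invalid script detection
```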
Usage Notes and Best Practices
- Cache responses for repeated transliterations to reduce API load.
- For high-traffic applications, implement client-side caching or rate limiting.
- The API proxies an external transliteration backend (possibly Google or AI4Bharat; not confirmed), so accuracy depends on that backend. Report linguistic issues upstream where possible.
- When using the UserScript:
- Install on supported wikis (Hindi, Urdu, etc.).
- It modifies page content in-place; use cautiously on live articles.
- Supports dropdown selection for different transliteration modes.
- Respect Wikimedia's terms: do not use the tool for bulk scraping or non-Wikimedia purposes without permission.
- For large texts, split into chunks to avoid timeouts or limits.
- Test with diverse inputs, including loanwords, proper names, and punctuation, as transliteration rules vary by language.
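The caching and chunking advice in the list above can be combined in a few lines. The sketch below rests on three assumptions. First, a chunk size of 1000 characters is arbitrary, since no limit is documented. Second, chunks break only at whitespace. Third, you supply the actual network call as an async `(endpoint, text) => string` function. All names here (`chunkText`, `createCachedTransliterator`, `transliterateLongText`) are hypothetical.

```javascript
// Split text into chunks of at most maxChars UTF-16 code units, breaking only at
// whitespace. maxChars = 1000 is an arbitrary assumption (no limit is documented).
// A single word longer than maxChars is kept whole rather than cut mid-word.
function chunkText(text, maxChars = 1000) {
  const pieces = text.split(/(\s+)/); // the capturing group keeps the whitespace
  const chunks = [];
  let current = '';
  for (const piece of pieces) {
    if (current !== '' && current.length + piece.length > maxChars) {
      chunks.push(current);
      current = '';
    }
    current += piece;
  }
  if (current !== '') chunks.push(current);
  return chunks; // chunks.join('') === text
}

// Wrap an async (endpoint, text) => string function with an in-memory cache.
// The promise itself is cached, so concurrent identical requests share one call.
function createCachedTransliterator(transliterateFn) {
  const cache = new Map();
  return function cachedTransliterate(endpoint, text) {
    const key = `${endpoint}\u0000${text}`;
    if (!cache.has(key)) {
      const pending = transliterateFn(endpoint, text).catch((err) => {
        cache.delete(key); // never cache failures
        throw err;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}

// Transliterate long input chunk by chunk, one request at a time, to avoid bursts.
async function transliterateLongText(endpoint, text, transliterateFn, maxChars) {
  let result = '';
  for (const chunk of chunkText(text, maxChars)) {
    result += await transliterateFn(endpoint, chunk);
  }
  return result;
}
```

Chunks after the first keep their leading whitespace. If the backend trims whitespace, rejoined output may lose spaces at chunk boundaries, which is worth checking with real input.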
Development and Source Code
The API is likely implemented in Python, which is common on Toolforge. Possibly it uses FastAPI, whose default Swagger UI route is /docs. It would act as a proxy to underlying transliteration libraries (e.g., AI4Bharat or similar). The UserScript is JavaScript-based and integrates with MediaWiki's API.
- Git repository (API): https://gitlab.wikimedia.org/toolforge-repos/indicwiki-transliterate-api
- UserScript: https://meta.wikimedia.org/wiki/User:Agamyasamuel/Indicwiki-Transliterate-User-Script.js
- License: MIT License (as listed in the tool's infobox).
- To run locally:
- Clone the repository.
- Set up a Python environment with dependencies (e.g., FastAPI if used).
- Configure any required API keys for backend transliteration services.
- Run the server (e.g., with uvicorn).
- For the UserScript, load it from your user common.js page on the wiki where you edit.
Repository may include setup instructions; contribute via merge requests.
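Loading the UserScript from common.js could look like the sketch below, assuming you want the script from its Meta-Wiki page. `mw.loader.load` and the `action=raw&ctype=text/javascript` URL form are the usual MediaWiki way to import a script hosted on another wiki. `rawScriptUrl` is a hypothetical helper. The script page itself may carry the author's own installation instructions, which take precedence.

```javascript
// Builds the action=raw URL from which MediaWiki serves a page as JavaScript.
function rawScriptUrl(wikiBase, title) {
  const url = new URL('/w/index.php', wikiBase);
  url.searchParams.set('title', title);
  url.searchParams.set('action', 'raw');
  url.searchParams.set('ctype', 'text/javascript');
  return url.toString();
}

const INDICWIKI_SCRIPT_URL = rawScriptUrl(
  'https://meta.wikimedia.org',
  'User:Agamyasamuel/Indicwiki-Transliterate-User-Script.js'
);

// `mw` exists only inside a MediaWiki page, so guard the call.
if (typeof mw !== 'undefined') {
  mw.loader.load(INDICWIKI_SCRIPT_URL);
}
```

Paste this into Special:MyPage/common.js on the wiki where you edit, then reload a page.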
Reporting Bugs and Feature Requests
- GitLab Issues: https://gitlab.wikimedia.org/toolforge-repos/indicwiki-transliterate-api/issues (search existing issues before opening a new one)
- Phabricator: Search for related tasks or create one tagged with Toolforge.
- When reporting:
- Include the endpoint, request body, and response.
- Steps to reproduce, including input text.
- Browser/OS details for UserScript issues.
- Specify if the bug is in accuracy, performance, or functionality.
Consider linguistic nuances: Bugs might stem from backend libraries; provide examples of expected vs. actual output.
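A bug report is easiest to act on when it contains a command that reproduces the request exactly. The sketch below renders a curl command in the same form as the Quick Start examples. `toCurlCommand` is a hypothetical helper. It assumes a POSIX shell, where the JSON body is single-quoted and any embedded `'` is escaped as `'\''`.

```javascript
// Hypothetical helper: renders a reproducible curl command for a bug report.
function toCurlCommand(endpoint, text) {
  const url = `https://indicwiki-transliterate-api.toolforge.org/transliterate/${endpoint}`;
  const body = JSON.stringify({ text }); // non-ASCII text is kept as-is
  // Single-quote the body for POSIX shells; an embedded ' becomes '\''.
  const quotedBody = `'${body.replace(/'/g, `'\\''`)}'`;
  return [
    `curl -X POST "${url}" \\`,
    '  -H "Content-Type: application/json" \\',
    `  -d ${quotedBody}`,
  ].join('\n');
}

console.log(toCurlCommand('HindiToUrdu', 'नमस्ते'));
```

Paste the printed command into the report, together with the actual and expected output.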
See Also
- Toolforge documentation (Help:Toolforge): https://wikitech.wikimedia.org/wiki/Help:Toolforge. General guidance on documenting and maintaining Toolforge tools.
- Wikimedia Meta-Wiki on UserScripts: https://meta.wikimedia.org/wiki/User_scripts
- Related tools: AI4Bharat Indic transliteration projects (e.g., https://github.com/AI4Bharat)