[Tutorial] How to fully localize ERPNext using AI (Custom App approach) + Fix for Bench update error

Hello Community! :waving_hand:

I recently faced the challenge of fully localizing ERPNext (v15) for my region. The default translations were incomplete, and manual editing was too slow. I developed a workflow that uses AI (LLM) to translate thousands of strings in minutes and a Custom App to ensure these translations survive updates.

I also encountered a specific error in recent Bench versions (AttributeError: 'App' object has no attribute 'tag') when using local apps, and found a workaround.

Here is the complete guide to replicate this setup.

:warning: IMPORTANT NOTES:

  1. Site Name: Replace site.local with your actual site name (e.g., mysite.com).

  2. Language: You must edit the SYSTEM_PROMPT inside the script to specify your target language (e.g., Spanish, German, Hindi) and your region’s specific ERP terminology.


Step 1: Create a Container App

Never modify core files in apps/erpnext. Create a dedicated app to hold your translations.

Bash

cd ~/frappe-bench
# Create a new app (we use --no-git for speed, but we will fix the repo later)
bench new-app custom_translations --no-git

# Install it on your site
bench --site site.local install-app custom_translations


Step 2: Extract Source Strings

To get a full list of translatable strings (including hidden system messages):

Bash

bench --site site.local build-message-files

Locate the generated source file (all_english_strings.txt or similar) and use it as your INPUT_FILE for the script below.
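The extracted file often contains duplicates and blank lines. As an optional cleanup step before running the translation script, you can dedupe it; the helper below is my own suggestion (function and file names are not part of the tutorial):

```python
def prepare_source(lines):
    """Strip whitespace, drop blanks and duplicates, preserve order."""
    seen = set()
    cleaned = []
    for line in lines:
        s = line.strip()
        if s and s not in seen:
            seen.add(s)
            cleaned.append(s)
    return cleaned

# Example: duplicates and blanks are removed, order is preserved
print(prepare_source(["Submit", "  Submit  ", "", "Cancel"]))  # ['Submit', 'Cancel']
```

To use it, read your extracted strings file, pass the lines through prepare_source, and write the result to the file you set as INPUT_FILE below.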


Step 3: AI Translation and CSV Formatting Script

This script performs the translation, saves progress to a local JSON database so it can resume after interruptions, and outputs a clean, Frappe-compatible CSV. Quoting every field (quoting=csv.QUOTE_ALL) protects commas and quotes inside strings, eliminating the need for a separate cleaning script.
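To see why QUOTE_ALL matters, here is a minimal standalone round-trip check (not part of the script itself):

```python
import csv
import io

def roundtrip(row):
    """Write one row with QUOTE_ALL, then read it back."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
    return next(csv.reader(io.StringIO(buf.getvalue())))

# Commas and embedded quotes survive intact because every field is quoted
row = ['Total, incl. VAT', 'Say "hello" to {0}']
assert roundtrip(row) == row
```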

File: translate_erpnext.py

Python

import asyncio
import json
import os
import csv
# Ensure you install the library for your chosen LLM (e.g., `pip install google-generativeai tqdm`)
import google.generativeai as genai 
from tqdm.asyncio import tqdm

# --- CONFIGURATION ---
API_KEY = "YOUR_API_KEY_HERE"  # Replace with your actual API key
INPUT_FILE = 'source_strings.txt'  # File containing English strings (one per line)
PROGRESS_DB_FILE = 'translation_db.json' # Stores progress to allow resuming
FINAL_OUTPUT_FILE = 'ru.csv' # CHANGE THIS: Use your language code (e.g., 'es.csv', 'de.csv')

BATCH_SIZE = 30
CONCURRENCY_LIMIT = 1  # Adjust based on your API plan's concurrency limit
RPM_DELAY = 4.5        # Delay to respect Rate Limits per minute

# --- MODEL SETUP ---
genai.configure(api_key=API_KEY)
model = genai.GenerativeModel('gemini-2.0-flash')

# --- SYSTEM PROMPT (CRITICAL: CUSTOMIZE THIS!) ---
SYSTEM_PROMPT = """
Role: Senior ERPNext Localization Expert.
Context: Accounting, Supply Chain, HR, and System Administration.

# !!! CHANGE THE TARGET LANGUAGE BELOW !!!
Target Language: Russian (RF/KZ context). 

RULES:
1. Use professional terminology accepted in your region (e.g., "Submit" -> "ŠŸŃ€Š¾Š²ŠµŃŃ‚Šø").
2. NEVER translate code tags: {{ name }}, %s, {0}, <div>, etc.
3. If input is technical code or numbers only, return it as is.
4. Output strict JSON dictionary/object format where keys match the input indices.
"""

def load_progress() -> dict:
    """Loads previously translated strings to avoid duplicates."""
    if not os.path.exists(PROGRESS_DB_FILE):
        return {}
    try:
        with open(PROGRESS_DB_FILE, 'r', encoding='utf-8') as f:
            return json.load(f)
    except json.JSONDecodeError:
        print("Warning: Progress DB corrupted. Starting fresh.")
        return {}

def save_progress(new_data: dict):
    """Saves progress to JSON database."""
    current_db = load_progress()
    current_db.update(new_data)
    with open(PROGRESS_DB_FILE, 'w', encoding='utf-8') as f:
        json.dump(current_db, f, ensure_ascii=False, indent=2)

def export_to_csv(translation_db: dict):
    """
    Exports the database to a Frappe-compatible CSV format.
    Crucial: Uses csv.QUOTE_ALL to handle commas/quotes inside strings.
    """
    print(f"\nExporting {len(translation_db)} strings to {FINAL_OUTPUT_FILE}...")
    with open(FINAL_OUTPUT_FILE, 'w', encoding='utf-8', newline='') as f:
        # csv.QUOTE_ALL ensures that all strings are quoted, protecting commas inside phrases.
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        for source, target in translation_db.items():
            writer.writerow([source, target])
    print("Export complete.")

async def translate_batch(sem, batch_lines: list):
    """Handles API call for a single batch with retry logic."""
    async with sem:
        # Create a dictionary payload for the LLM to process
        batch_dict = {str(i): line for i, line in enumerate(batch_lines)}
        text_payload = json.dumps(batch_dict, ensure_ascii=False)
        
        prompt = f"Translate the following dictionary values: {text_payload}"
        
        # Retry logic for API stability
        for attempt in range(3):
            try:
                # Use to_thread for synchronous API calls within an async loop
                response = await asyncio.to_thread(
                    model.generate_content,
                    contents=[SYSTEM_PROMPT, prompt],
                    # Requesting a JSON response object
                    generation_config={"response_mime_type": "application/json"} 
                )
                result_json = json.loads(response.text)
                
                translated_batch = {}
                for idx_str, original_text in batch_dict.items():
                    # Map original text to its translated version from the returned JSON
                    # Fallback to original text if the translation is missing in the result
                    translated_batch[original_text] = result_json.get(idx_str, original_text)
                
                # Delay to respect RPM limits
                await asyncio.sleep(RPM_DELAY)
                return translated_batch
            except Exception as e:
                print(f"\n[Error on attempt {attempt+1}] {e}. Retrying...")
                await asyncio.sleep(5 * (attempt + 1))
        
        # Return original text for all lines if all attempts fail
        return {line: line for line in batch_lines} 

async def main():
    if not os.path.exists(INPUT_FILE):
        print(f"Error: Input file '{INPUT_FILE}' not found.")
        return

    with open(INPUT_FILE, 'r', encoding='utf-8') as f:
        all_lines = [line.strip() for line in f if line.strip()]

    progress_db = load_progress()
    remaining_lines = [line for line in all_lines if line not in progress_db]
    
    print(f"Total strings: {len(all_lines)} | Already translated: {len(all_lines) - len(remaining_lines)} | Remaining: {len(remaining_lines)}")

    if remaining_lines:
        batches = [remaining_lines[i:i + BATCH_SIZE] for i in range(0, len(remaining_lines), BATCH_SIZE)]
        sem = asyncio.Semaphore(CONCURRENCY_LIMIT)

        async def process(batch):
            result = await translate_batch(sem, batch)
            # Save results to disk as soon as each batch finishes
            save_progress(result)

        # Run batches concurrently; the semaphore caps in-flight API calls
        # at CONCURRENCY_LIMIT, so raising that value actually takes effect.
        for task in tqdm.as_completed([process(b) for b in batches],
                                      total=len(batches), desc="Translating Batches"):
            await task

    # Final export after all batches are processed
    final_db = load_progress()
    export_to_csv(final_db)

if __name__ == "__main__":
    asyncio.run(main())

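Before deploying the CSV, it is worth verifying that the model really obeyed rule 2 of the prompt (never translate code tags). This standalone checker is my own addition (the regex and function names are assumptions, covering the token styles listed in the prompt):

```python
import re

# Matches the code tokens the prompt forbids translating:
# positional {0}, printf-style %s/%d, and Jinja-style {{ name }}.
PLACEHOLDER_RE = re.compile(r"\{\d+\}|%[sd]|\{\{\s*\w+\s*\}\}")

def placeholders_preserved(source: str, target: str) -> bool:
    """True if source and target contain the same placeholder tokens."""
    def norm(s):
        return sorted(re.sub(r"\s+", "", m) for m in PLACEHOLDER_RE.findall(s))
    return norm(source) == norm(target)

def broken_rows(pairs):
    """Return (source, target) pairs whose placeholders were damaged."""
    return [(s, t) for s, t in pairs if not placeholders_preserved(s, t)]
```

Run broken_rows over the pairs in translation_db.json before exporting; anything it flags should be re-translated or corrected by hand.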

Step 4: The ā€œBench Updateā€ Fix

If you created your app with --no-git, running bench update might crash with:

AttributeError: 'App' object has no attribute 'tag'

This happens because Bench expects a git repository. The fix is to initialize a dummy local git repo inside your app.

Bash

cd ~/frappe-bench/apps/custom_translations

# Initialize git to satisfy Bench requirements
git init
git config user.email "local@admin.com"
git config user.name "Local Admin"
git add .
git commit -m "Initial commit to fix bench update error"


Step 5: Deployment

  1. Upload the CSV: Copy your generated CSV (e.g., ru.csv) to your server: ~/frappe-bench/apps/custom_translations/custom_translations/translations/

  2. Apply Changes (Server-side): Translation files from Custom Apps have higher priority than core files, but you must trigger a migration to load them into the database.

    Bash

    cd ~/frappe-bench
    
    # 1. Force migration to parse the new CSV (Replace 'site.local')
    bench --site site.local migrate
    
    # 2. Clear Redis cache (Critical for translations)
    bench clear-cache
    
    # 3. Restart services
    sudo supervisorctl restart all
    
    

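Since the script falls back to the original English text whenever all API retries fail, you may want to count such rows before relying on the file. A small hypothetical helper (names are my own):

```python
import csv

def untranslated_rows(csv_path):
    """Return source strings whose translation equals the source,
    i.e. rows the translation script left unchanged as a fallback."""
    with open(csv_path, encoding="utf-8", newline="") as f:
        return [src for src, tgt in csv.reader(f) if src == tgt]

# Usage (path is an example):
# leftovers = untranslated_rows("ru.csv")
# print(f"{len(leftovers)} strings identical to the source")
```

Note that purely technical strings legitimately stay identical (rule 3 of the prompt), so treat the result as a review queue rather than a list of errors.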
Result

Your ERPNext instance is now fully localized. Since the translations live in a Custom App, they will not be overwritten when you update ERPNext framework code.

I can’t find the files. Can you let me know where your files are located? THX

Hi! When you run bench build-message-files, the command generates a CSV file (often named message_files.csv or similar) inside your sites folder, or it simply updates the translation JSON/CSV files within each app’s folder (e.g., frappe-bench/apps/erpnext/erpnext/translations/).

However, the command output in the terminal usually tells you exactly where the file was saved.

You can download the generated file from your server to your local machine using scp. For example:

# Run in terminal on your local machine (not the server)
scp user@your-server-ip:/home/frappe/frappe-bench/sites/site_name/message_files.csv .

(Replace paths with the actual path shown in your terminal output).

Thanks for your quick reply.

Unfortunately, the command produced no output on the terminal screen. Anyway, I can see some CSV files in apps/erpnext/erpnext/translations that were modified after running it, so those should be the files I’m looking for.

I will try to feed them into the Python script and see what happens.

THX again.