How to machine translate product descriptions

Machine translation systems such as Google Translate make it quick and easy to bulk translate product copy and other content from Python using google_trans_new. Here’s how it’s done.


Whether you’re analysing content written in other languages using Natural Language Processing, or you want to assist your content team by translating their writing into other languages, machine translation software can be extremely useful.

I’m told that Google Translate can now come close to the quality of some human translators. It’s never going to be perfect, but it could be perfectly adequate for many projects, and it’s far quicker and cheaper than paying people to do the job.

Thanks to some excellent Python packages, it is also now very easy to bulk translate content strings, Pandas dataframe columns, or entire websites using Python and Google Translate. Here’s how it’s done.

Load the packages

For this project we’ll be using pandas and google_trans_new. I’ve previously used googletrans, but it hasn’t been working as well as it once did since a change to the Google Translate API, and it looks like it’s awaiting an update. You can install google_trans_new by entering pip3 install google_trans_new in your terminal.

import pandas as pd
from google_trans_new import google_translator

Detecting the language

Before you can do anything in google_trans_new, you need to instantiate google_translator(). I’ve assigned the instance to a variable called translator. Then I’ve passed my foreign-language string to the detect() method, which returns the language code and the language name it detected in our text.

translator = google_translator()
detect_text = translator.detect('Hasta la vista, Baby!')  
detect_text
['es', 'spanish']
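
If some of your content is already in the target language, you could use the detected language code to skip it. Here’s a minimal sketch of that idea. The needs_translation() helper and the fake_detect() stand-in are hypothetical names I’ve made up for illustration, and I’m assuming detect() returns a [code, name] list as in the output above.

```python
def needs_translation(text, detect, target_language="en"):
    """Return True when text is non-empty and not already in the target language."""
    # Skip missing values (such as NaN) and blank strings.
    if not isinstance(text, str) or not text.strip():
        return False
    # detect is assumed to return [language_code, language_name].
    language_code = detect(text)[0]
    return language_code != target_language


# Hypothetical stand-in for translator.detect so the sketch runs offline.
def fake_detect(text):
    return ["es", "spanish"] if "vista" in text else ["en", "english"]


print(needs_translation("Hasta la vista, Baby!", fake_detect))  # True
print(needs_translation("See you later!", fake_detect))  # False
```

With the real package, you’d pass translator.detect in place of fake_detect.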

Translating the text

To translate a text string into another language, you pass the text to the translate() method along with lang_tgt, the language code (such as en) for the target language you want to translate to. This turns our Thai text string สวัสดีจีน into Hello china.

translate_text = translator.translate('สวัสดีจีน', lang_tgt='en')
translate_text
'Hello china '
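
Calls to a web translation service can fail from time to time, so it may be worth wrapping the call in a simple retry. This is only a sketch: translate_with_retry() and flaky_translate() are hypothetical names, and I’m assuming a failed request shows up as a Python exception, which I haven’t confirmed for google_trans_new.

```python
import time


def translate_with_retry(translate, text, target_language, attempts=3, base_delay=1.0):
    """Call translate(text, target_language), retrying with exponential backoff."""
    for attempt in range(attempts):
        try:
            return translate(text, target_language)
        except Exception:
            # After the final attempt, let the caller see the error.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x and so on between attempts.
            time.sleep(base_delay * 2 ** attempt)


# Hypothetical stand-in that fails twice before succeeding, so the sketch runs offline.
failures = {"remaining": 2}


def flaky_translate(text, target_language):
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise ConnectionError("temporary failure")
    return f"[{target_language}] {text}"


print(translate_with_retry(flaky_translate, "Hello", "fr", base_delay=0))
```

To use it with the real translator, you could pass lambda text, lang: translator.translate(text, lang_tgt=lang) as the first argument.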

Translating product descriptions

Next, we’ll do some bulk translations. Load up a Pandas dataframe containing the text you want to translate. Mine contains a mixture of ecommerce product detail page content from various electronics sites.

df = pd.read_csv('products.csv')
df.head()
content
0 The portable Google Pixelbook Go laptop has be...
1 The ASUS Chromebook has a 14” Full HD touch di...
2 The Lenovo S345 Chromebook laptop uses Google’...
3 The Acer 314 laptop has been created with a ba...

To bulk translate the text into our chosen languages, we’ll create a little function called translate_column() so we can use the apply() function to run it on our Pandas dataframe column. It takes the text string and target_language, the language code to translate into.

def translate_column(text, target_language):
    return translator.translate(text, lang_tgt=target_language)
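
Product catalogues often repeat the same copy across several rows, so you may want to translate each unique string only once. The sketch below shows one way to do that with a plain dictionary cache. translate_unique() and fake_translate() are hypothetical names used for illustration, not part of google_trans_new.

```python
def translate_unique(texts, translate, target_language):
    """Translate each distinct text once and reuse the result for duplicates."""
    cache = {}
    results = []
    for text in texts:
        if text not in cache:
            cache[text] = translate(text, target_language)
        results.append(cache[text])
    return results


# Hypothetical stand-in for translate_column() that records each call,
# so the sketch runs offline.
calls = []


def fake_translate(text, target_language):
    calls.append(text)
    return f"[{target_language}] {text}"


descriptions = ["Red laptop", "Blue laptop", "Red laptop"]
translated = translate_unique(descriptions, fake_translate, "fr")
print(translated)
print(len(calls))  # 2, because the duplicate was served from the cache
```

Since translate_column() takes the same two arguments, you could call translate_unique(df['content'].tolist(), translate_column, 'fr') and assign the returned list to a new column.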

Now, we’ll define a few new Pandas columns and use apply() with a lambda function that calls translate_column() for each product description. We’ll do this three times to translate our product page content into French, German, and Igbo. Run the code and your text will be automatically translated and saved into the Pandas dataframe, so you can use it as you wish.

# Translate the content column once per target language.
df['content_french'] = df['content'].apply(lambda text: translate_column(text, 'fr'))
df['content_german'] = df['content'].apply(lambda text: translate_column(text, 'de'))
df['content_igbo'] = df['content'].apply(lambda text: translate_column(text, 'ig'))
df[['content_french']].head()
content_french
0 L'ordinateur portable Google Pixelbook Go a ét...
1 Le Chromebook ASUS est doté d'un écran tactile...
2 L'ordinateur portable Lenovo S345 Chromebook u...
3 L'ordinateur portable Acer 314 a été créé avec...
df[['content_german']].head()
content_german
0 Der tragbare Google Pixelbook Go-Laptop wurde ...
1 Das ASUS Chromebook verfügt über ein 14-Zoll-F...
2 Der Lenovo S345 Chromebook-Laptop verwendet da...
3 Der Acer 314-Laptop wurde mit einem Akku mit e...
df[['content_igbo']].head()
content_igbo
0 Ejirila laptọọpụ Google Pixelbook Go dị obere ...
1 ASUS Chromebook nwere ihe ngosi 14 ”Full HD na...
2 Laptọọpụ Lenovo S345 Chromebook na-eji Google ...
3 Emepụtara laptọọpụ Acer 314 na batrị nke na-en...
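
If you need more than a handful of languages, repeating one line per language gets tedious. Here’s a rough sketch of looping over a dictionary of column names and language codes instead. translate_into_languages() and fake_translate() are hypothetical names I’ve used for illustration.

```python
def translate_into_languages(texts, languages, translate):
    """Return {column_name: [translated texts]} for each target language."""
    columns = {}
    for column_name, language_code in languages.items():
        columns[column_name] = [translate(text, language_code) for text in texts]
    return columns


# Hypothetical stand-in for translate_column() so the sketch runs offline.
def fake_translate(text, target_language):
    return f"[{target_language}] {text}"


languages = {"content_french": "fr", "content_german": "de", "content_igbo": "ig"}
columns = translate_into_languages(["Fast laptop"], languages, fake_translate)
print(columns["content_german"])  # ['[de] Fast laptop']
```

With the real dataframe, you could pass df['content'].tolist() and translate_column, then add the results with df.assign(**columns).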

Matt Clarke, Sunday, March 14, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.
