Translating a Memoir: A Technical Journey | by Valeria Cortez | Dec, 2024

AiNEWS2025 by AiNEWS2025
2024-12-12


Leveraging GPT-3.5 and unstructured APIs for translations

Valeria Cortez

Towards Data Science

This blog post details how I utilised GPT to translate the personal memoir of a family friend, making it accessible to a broader audience. Specifically, I employed GPT-3.5 for translation and Unstructured’s APIs for efficient content extraction and formatting.

The memoir, a heartfelt account by my family friend Carmen Rosa, chronicles her upbringing in Bolivia and her romantic journey in Paris with an Iranian man during the vibrant 1970s. It was originally written in Spanish, and we aimed to preserve the essence of her narrative while expanding its reach to English-speaking readers through LLM technologies.

Cover image of “Un Destino Sorprendente”, used with permission of author Carmen Rosa Wichtendahl.

Below you can read about the translation process in more detail, or you can access the Colab Notebook.

I followed these steps to translate the book:

  1. Import Book Data: I imported the book from a Docx document using the Unstructured API and divided it into chapters and paragraphs.
  2. Translation Technique: I translated each chapter using GPT-3.5. For each paragraph, I provided the latest three translated sentences (if available) from the same chapter. This approach served two purposes:
    • Style Consistency: Maintaining a consistent style throughout the translation by providing context from previous translations.
    • Token Limit: Limiting the number of tokens processed at once to avoid exceeding the model’s context limit.
  3. Exporting translation as Docx: I used the python-docx library to save the translated content in Docx format.

1. Libraries

We’ll start by installing and importing the necessary libraries.

pip install --upgrade openai
pip install python-dotenv
pip install unstructured
pip install python-docx

import openai

# Unstructured
from unstructured.partition.docx import partition_docx
from unstructured.cleaners.core import group_broken_paragraphs

# Data and other libraries
import pandas as pd
import re
from typing import List, Dict
import os
from dotenv import load_dotenv

2. Connecting to OpenAI’s API

The code below sets up the OpenAI client for use in a Python project. You need to save your API key in an .env file.

from openai import OpenAI

# Specify the path to the .env file
dotenv_path = '/content/.env'

_ = load_dotenv(dotenv_path) # read local .env file

# Client used by the translation functions below
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

3. Loading the book

The code below imports the book in Docx format and divides it into individual elements, such as titles and paragraphs.

elements = partition_docx(
    filename="/content/libro.docx",
    paragraph_grouper=group_broken_paragraphs
)

The code below prints the element at index 10 of elements.

print(elements[10])

# Returns: Destino sorprendente, es el título que la autora le puso ...

4. Group book into titles and chapters

The next step involves creating a list of chapters. Each chapter will be represented as a dictionary containing a title and a list of paragraphs. This structure simplifies the process of translating each chapter and paragraph individually. Here’s an example of this format:

[
{"title": title 1, "content": [paragraph 1, paragraph 2, ..., paragraph n]},
{"title": title 2, "content": [paragraph 1, paragraph 2, ..., paragraph n]},
...
{"title": title n, "content": [paragraph 1, paragraph 2, ..., paragraph n]},
]
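
As a sketch of this structure, the chapter dictionary can be given an explicit type. The name `Chapter` and the sample values are assumptions for illustration; they do not appear in the original notebook.

```python
from typing import List, TypedDict


class Chapter(TypedDict):
    """One chapter: a title plus its paragraphs (hypothetical helper type)."""
    title: str
    content: List[str]


# A small hand-written example in the same shape as the list above
example_book: List[Chapter] = [
    {"title": "Title 1", "content": ["Paragraph 1.", "Paragraph 2."]},
    {"title": "Title 2", "content": ["Paragraph 1."]},
]

for chapter in example_book:
    # Print each title with its number of paragraphs
    print(chapter["title"], len(chapter["content"]))
```

At runtime these are ordinary dictionaries, so the rest of the code can keep using `List[Dict]` annotations unchanged.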

To achieve this, we’ll create a function called group_by_chapter. Here are the key steps involved:

  1. Extract Relevant Information: We can get each narrative text and title by calling element.category. Those are the only categories we’re interested in translating at this point.
  2. Identify Narrative Titles: We recognise that some titles should be part of the narrative text. To account for this, we assume that italicised titles belong to the narrative paragraph.

def group_by_chapter(elements: List) -> List[Dict]:
    chapters = []
    current_chapter = None

    for element in elements:

        # emphasized_text_tags lists 'b' and/or 'i' tags, or is None when there is no emphasis
        text_style = element.metadata.emphasized_text_tags
        # sorted() keeps the order stable so that ['b', 'i'] can be compared reliably
        unique_text_style = sorted(set(text_style)) if text_style is not None else None

        # we consider an element a title if it is a title category and the style is bold
        is_title = element.category == "Title" and unique_text_style == ['b']

        # we consider an element a narrative content if it is a narrative text category or
        # if it is a title category with no emphasis, italic, or italic and bold
        is_narrative = element.category == "NarrativeText" or (
            element.category == "Title"
            and unique_text_style in (None, ['i'], ['b', 'i'])
        )

        # for new titles
        if is_title:
            print(f"Adding title {element.text}")

            # Add previous chapter when a new one comes in, unless there is none yet
            if current_chapter is not None:
                chapters.append(current_chapter)

            current_chapter = {"title": element.text, "content": []}

        # narrative text is only kept once a first title has been seen
        elif is_narrative and current_chapter is not None:
            print(f"Adding Narrative {element.text}")
            current_chapter["content"].append(element.text)

        else:
            print(f'### No need to convert. Element type: {element.category}')

    # Add the last chapter, which has no following title to trigger the append
    if current_chapter is not None:
        chapters.append(current_chapter)

    return chapters

Applying the function to elements gives the list of chapters. Below we can see one of them:

book_chapters = group_by_chapter(elements)
book_chapters[2]

# Returns
{'title': 'Proemio',
'content': [
'La autobiografía es considerada ...',
'Dentro de las artes literarias, ...',
'Se encuentra más próxima a los, ...',
]
}
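
The title-versus-narrative rule used in group_by_chapter can also be checked in isolation, without loading a document. The sketch below assumes a hypothetical helper named `classify` that takes the element category and its emphasis tags directly; it is not part of the original notebook.

```python
from typing import List, Optional


def classify(category: str, tags: Optional[List[str]]) -> str:
    """Return 'title', 'narrative' or 'skip' (hypothetical helper)."""
    # Deduplicate and sort the tags so the comparison does not depend on order
    styles = sorted(set(tags)) if tags is not None else None

    # Bold-only titles start a new chapter
    if category == "Title" and styles == ["b"]:
        return "title"
    # Narrative text always belongs to the chapter content
    if category == "NarrativeText":
        return "narrative"
    # Titles with no emphasis, italic, or bold and italic are treated as content
    if category == "Title" and styles in (None, ["i"], ["b", "i"]):
        return "narrative"
    return "skip"


print(classify("Title", ["b"]))          # title
print(classify("Title", ["i", "b"]))     # narrative
print(classify("NarrativeText", None))   # narrative
print(classify("ListItem", None))        # skip
```

Sorting the tags matters because converting a set back to a list gives no guaranteed order, so a plain comparison against ['b', 'i'] could fail for the same element from one run to the next.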

5. Book translation

To translate the book, we follow these steps:

  1. Translate Chapter Titles: We translate the title of each chapter.
  2. Translate Paragraphs: We translate each paragraph, providing the model with the latest three translated sentences as context.
  3. Save Translations: We save both the translated titles and content.

The function below automates this process.

def translate_book(book_chapters: List[Dict]) -> List[Dict]:
    translated_book = []
    for chapter in book_chapters:
        print(f"Translating following chapter: {chapter['title']}.")
        translated_title = translate_title(chapter['title'])
        translated_chapter_content = translate_chapter(chapter['content'])
        translated_book.append({
            "title": translated_title,
            "content": translated_chapter_content
        })
    return translated_book

For the title, we ask GPT for a simple translation as follows:

def translate_title(title: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "system",
            "content": f"Translate the following book title into English:\n{title}"
        }]
    )
    return response.choices[0].message.content

To translate a single chapter, we provide the model with the corresponding paragraphs. We instruct the model as follows:

  1. Identify the role: We inform the model that it is a helpful translator for a book.
  2. Provide context: We share the latest three translated sentences from the chapter.
  3. Request translation: We ask the model to translate the next paragraph.

During this process, the function combines all translated paragraphs into a single string.

# Function to translate a chapter using OpenAI API
def translate_chapter(chapter_paragraphs: List[str]) -> str:
    translated_content = ""

    for i, paragraph in enumerate(chapter_paragraphs):

        print(f"Translating paragraph {i + 1} out of {len(chapter_paragraphs)}")

        # Builds the message dynamically based on whether there is previous translated content
        messages = [{
            "role": "system",
            "content": "You are a helpful translator for a book."
        }]

        if translated_content:
            latest_content = get_last_three_sentences(translated_content)
            messages.append(
                {
                    "role": "system",
                    "content": f"This is the latest text from the book that you've translated from Spanish into English:\n{latest_content}"
                }
            )

        # Adds the user message for the current paragraph
        messages.append(
            {
                "role": "user",
                "content": f"Translate the following text from the book into English:\n{paragraph}"
            }
        )

        # Calls the API
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=messages
        )

        # Extracts the translated content and appends it
        paragraph_translation = response.choices[0].message.content
        translated_content += paragraph_translation + '\n\n'

    return translated_content
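
The message-building step inside translate_chapter can be pulled out into a pure function, which makes it possible to inspect the prompt without calling the API. This is a minimal sketch, assuming a hypothetical helper named `build_messages` that is not in the original notebook; the prompt wording is copied from the function above.

```python
from typing import Dict, List


def build_messages(paragraph: str, latest_content: str = "") -> List[Dict[str, str]]:
    """Build the chat messages for one paragraph (hypothetical helper)."""
    # The role message is always sent first
    messages = [{
        "role": "system",
        "content": "You are a helpful translator for a book."
    }]

    # The context message is added only when earlier translated text exists
    if latest_content:
        messages.append({
            "role": "system",
            "content": "This is the latest text from the book that you've "
                       f"translated from Spanish into English:\n{latest_content}"
        })

    # The paragraph to translate always comes last, as the user message
    messages.append({
        "role": "user",
        "content": f"Translate the following text from the book into English:\n{paragraph}"
    })
    return messages


# The first paragraph of a chapter has no context, so only two messages are built
print(len(build_messages("Hola.")))
# Later paragraphs carry the previous translated sentences as a third message
print(len(build_messages("Adiós.", "Hello.")))
```

Keeping this step separate from the API call also makes it easy to log exactly what was sent for any paragraph whose translation looks off.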

Finally, below we can see the supporting function to get the latest three sentences.

def get_last_three_sentences(paragraph: str) -> str:
    # Use regex to split the text into sentences: break after '.', '!' or '?'
    # followed by whitespace (assumed pattern)
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', paragraph.strip()) if s]

    # Get the last three sentences (or fewer if the paragraph has fewer than 3 sentences)
    last_three = sentences[-3:]

    # Join the sentences into a single string
    return ' '.join(last_three)
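
To illustrate the idea of carrying the last three sentences forward, here is a self-contained sketch with a made-up input. The splitting rule (a break after `.`, `!` or `?` followed by whitespace) and the name `last_sentences` are assumptions, and not necessarily what the original notebook used.

```python
import re


def last_sentences(text: str, n: int = 3) -> str:
    """Return the last n sentences of text as one string (hypothetical helper)."""
    # Split where sentence-ending punctuation is followed by whitespace (assumed rule)
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    return ' '.join(sentences[-n:])


# Translated text accumulates with a blank line after every paragraph
translated = "First sentence. Second sentence! Third sentence? Fourth sentence.\n\n"
print(last_sentences(translated))
# Second sentence! Third sentence? Fourth sentence.
```

A rule this simple will also split after abbreviations such as "Dr.", which may be acceptable here because the result is only used as style context for the model.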

6. Book export

Finally, we pass the list of translated chapters to a function that adds each title as a heading and each chapter's content as text below it. After each chapter, a page break is added to separate it from the next one. The resulting document is then saved locally as a Docx file.

from docx import Document

def create_docx_from_chapters(chapters: List[Dict], output_filename: str) -> None:
    doc = Document()

    for chapter in chapters:
        # Add chapter title as Heading 1
        doc.add_heading(chapter['title'], level=1)

        # Add chapter content as normal text
        doc.add_paragraph(chapter['content'])

        # Add a page break after each chapter
        doc.add_page_break()

    # Save the document
    doc.save(output_filename)
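
Because translate_chapter returns each chapter as a single string with blank lines between paragraphs, the export above writes the whole chapter into one Docx paragraph. If separate paragraphs are preferred, one option is to split the string first. A minimal sketch, assuming a hypothetical helper named `split_paragraphs`:

```python
from typing import List


def split_paragraphs(content: str) -> List[str]:
    """Split chapter text on blank lines and drop empty pieces (hypothetical helper)."""
    return [p.strip() for p in content.split("\n\n") if p.strip()]


chapter_text = "First paragraph.\n\nSecond paragraph.\n\n"
print(split_paragraphs(chapter_text))
# ['First paragraph.', 'Second paragraph.']
```

Each returned piece could then be passed to doc.add_paragraph inside the chapter loop, in place of the single call on the whole chapter string.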

While using GPT and APIs for translation is fast and efficient, there are key limitations compared to human translation:

  • Pronoun and Reference Errors: GPT misinterpreted pronouns or references in a few cases, potentially attributing actions or statements to the wrong person in the narrative. A human translator can better resolve such ambiguities.
  • Cultural Context: GPT missed subtle cultural references and idioms that a human translator could interpret more accurately. In this case, several slang terms unique to Santa Cruz, Bolivia, were retained in the original language without additional context or explanation.

Combining AI with human review can balance speed and quality, ensuring translations are both accurate and authentic.
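
One way to focus human review is to flag the paragraphs that most likely need attention. The sketch below is an assumption for illustration and not part of the original workflow: the helper `flag_for_review` and its thresholds are hypothetical, and it only catches coarse problems such as empty output, text left untranslated, or a translation far shorter or longer than its source.

```python
from typing import List, Tuple


def flag_for_review(
    pairs: List[Tuple[str, str]],
    min_ratio: float = 0.5,
    max_ratio: float = 2.0,
) -> List[int]:
    """Return indices of (source, translation) pairs that look suspicious."""
    flagged = []
    for i, (source, translation) in enumerate(pairs):
        # Empty output or an unchanged copy of the source needs a human look
        if not translation.strip() or translation.strip() == source.strip():
            flagged.append(i)
            continue
        # A large difference in length suggests dropped or added content
        ratio = len(translation) / max(len(source), 1)
        if ratio < min_ratio or ratio > max_ratio:
            flagged.append(i)
    return flagged


pairs = [
    ("Hola, ¿cómo estás?", "Hello, how are you?"),
    ("Buenos días.", "Buenos días."),
    ("Una frase bastante larga para probar.", "Ok."),
]
print(flag_for_review(pairs))
# [1, 2]
```

A check like this would not catch the pronoun or cultural-context errors described above; those still need a reader who knows both languages.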

This project demonstrates an approach to translating a book using a combination of GPT-3.5 and Unstructured's APIs. By automating the translation process, we significantly reduced the manual effort required. While the initial output may require some human revision to refine nuances and ensure quality, this approach serves as a strong foundation for efficient book translation.

If you have any feedback or suggestions on how to improve this process or the quality of the translations, please feel free to share them in the comments below.
