Preparation
Firstly, an archived version of the original website for the World Tipiṭaka Edition was accessed. A Python script was written to convert the XML-enclosed HTML files in this archive into plain HTML for initial inspection.
The modern recreation of the World Tipiṭaka Edition is then generated by converting the bespoke HTML into a modern static website built with standard HTML and Bootstrap. In the process, the original numeric id of each webpage is converted to a semantic identifier using a heuristic algorithm. These identifiers serve as human-readable references to the edition used for the translation.
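The heuristic itself is not described here, so the following is only a minimal sketch of one plausible approach. It assumes each page's position in the canon is known as a list of titles (the hypothetical `breadcrumb` argument), strips Pāḷi diacritics so the identifiers stay ASCII-safe in URLs, and falls back to the numeric id when no titles are available.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lower-case a Pāḷi title, drop diacritics and collapse other characters to hyphens."""
    # NFD splits e.g. 'ṭ' into 't' + a combining dot below, which is then discarded.
    decomposed = unicodedata.normalize("NFD", title)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def semantic_id(page_id: int, breadcrumb: list[str]) -> str:
    """Build a readable id from the page's (hypothetical) title hierarchy.

    Falls back to the original numeric id when no usable titles exist.
    """
    parts = [slug for title in breadcrumb if (slug := slugify(title))]
    return "/".join(parts) if parts else f"page-{page_id}"
```

For example, `semantic_id(1234, ["Dīgha Nikāya", "Brahmajāla Sutta"])` returns `"digha-nikaya/brahmajala-sutta"`, while a page with no known titles keeps a reference such as `"page-1234"`.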
Along a separate path, which feeds the translation, the same bespoke HTML files are also converted to Markdown using a Python script. The resulting Markdown files are the input to the translation process.
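The conversion script is not published, so this is a hypothetical minimal sketch using only Python's standard-library `html.parser`. It handles headings, paragraphs and bold/italic text; the real bespoke HTML almost certainly needs more tag handling, and a dedicated library such as markdownify may be a better fit.

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Minimal HTML-to-Markdown converter: headings, paragraphs and bold/italic only."""

    HEADINGS = {f"h{n}": "#" * n + " " for n in range(1, 7)}
    INLINE = {"b": "**", "strong": "**", "i": "*", "em": "*"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self.parts.append("\n\n")
        elif tag in self.INLINE:
            self.parts.append(self.INLINE[tag])

    def handle_data(self, data):
        # Collapse whitespace runs to a single space, as a browser would render them.
        self.parts.append(re.sub(r"\s+", " ", data))

    def markdown(self) -> str:
        text = "".join(self.parts)
        # Trim each line, then squeeze runs of blank lines down to one blank line.
        text = "\n".join(line.strip() for line in text.split("\n"))
        return re.sub(r"\n{3,}", "\n\n", text).strip() + "\n"


def html_to_markdown(html: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    return converter.markdown()
```

For example, `html_to_markdown("<h1>Brahmajāla Sutta</h1><p>Evaṃ <b>me</b> sutaṃ.</p>")` returns `"# Brahmajāla Sutta\n\nEvaṃ **me** sutaṃ.\n"`.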
Translation Process
The translation involves looking up each word of the text in the Digital Pali Dictionary (DPD). This is done algorithmically: a Python script queries the DPD's SQLite database (dpd.db) to determine each word's possible meanings and inflected forms. One day I may publish the script that does this.
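As the lookup script is not yet published, here is a minimal sketch of the idea using Python's built-in `sqlite3` module. The table and column names (`headwords`, `inflections`, `lemma`, `form`, `grammar`, `meaning`) are hypothetical stand-ins, not the actual dpd.db schema, so a toy in-memory database is built for the example; against the real file you would open `sqlite3.connect("dpd.db")` and adapt the query to its actual tables.

```python
import sqlite3


def lookup(conn: sqlite3.Connection, word: str) -> list[tuple[str, str, str]]:
    """Return (headword, grammar, meaning) rows for every inflected form equal to `word`.

    Uses a hypothetical schema, not the real dpd.db layout.
    """
    sql = """
        SELECT h.lemma, i.grammar, h.meaning
        FROM inflections AS i
        JOIN headwords AS h ON h.id = i.headword_id
        WHERE i.form = ?
        ORDER BY h.lemma, i.grammar
    """
    return conn.execute(sql, (word,)).fetchall()


def demo_db() -> sqlite3.Connection:
    """Build a toy in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE headwords (id INTEGER PRIMARY KEY, lemma TEXT, meaning TEXT);
        CREATE TABLE inflections (
            headword_id INTEGER REFERENCES headwords(id), form TEXT, grammar TEXT);
    """)
    conn.executemany("INSERT INTO headwords VALUES (?, ?, ?)", [
        (1, "dhamma", "teaching; nature; phenomenon"),
        (2, "bhikkhu", "monk"),
    ])
    conn.executemany("INSERT INTO inflections VALUES (?, ?, ?)", [
        (1, "dhammo", "masc nom sg"),
        (1, "dhammā", "masc nom pl"),
        (1, "dhammā", "masc abl sg"),
        (2, "bhikkhave", "masc voc pl"),
    ])
    return conn
```

Here `lookup(demo_db(), "dhammā")` returns two rows for the headword `dhamma`, one read as ablative singular and one as nominative plural: the kind of ambiguity that the context of the sentence has to resolve.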
From these possible meanings and inflected forms, the appropriate combination of meaning and inflection is then selected according to the context of the sentence.
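The selection itself is a contextual judgement, but a small sketch shows why it is needed: each word's candidate readings multiply into the set of combinations to choose from. The `Reading` type, the grammar strings and the `agrees` rule (an adjective must match its noun in gender, case and number) are simplifying assumptions for illustration, applied to the phrase kusalā dhammā (wholesome phenomena).

```python
from itertools import product
from typing import NamedTuple


class Reading(NamedTuple):
    word: str
    meaning: str
    grammar: str  # "gender case number", a simplified hypothetical tag format


def candidate_parses(sentence: list[list[Reading]]) -> list[tuple[Reading, ...]]:
    """Every combination that picks one reading per word (the Cartesian product)."""
    return list(product(*sentence))


def agrees(adjective: Reading, noun: Reading) -> bool:
    """Toy rule: an adjective must match its noun in gender, case and number."""
    return adjective.grammar == noun.grammar


# "kusalā dhammā" (wholesome phenomena): both words are ambiguous on their own.
kusala = [Reading("kusalā", "wholesome", g) for g in ("masc nom pl", "masc abl sg", "fem nom sg")]
dhamma = [Reading("dhammā", "phenomenon", g) for g in ("masc nom pl", "masc abl sg")]

parses = candidate_parses([kusala, dhamma])  # 3 x 2 = 6 combinations
viable = [p for p in parses if agrees(*p)]   # 2 survive the agreement rule
```

Six raw combinations reduce to two grammatically consistent ones; choosing the nominative plural over the ablative singular is where the sentence context comes in.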
Finally, the translated version of the text is created by rewriting the words into natural English. Buddhist technical terms are retained, with their meanings given in parentheses, to preserve the accuracy of the original text.
Where appropriate, the translation is compared to existing English translations to confirm accuracy.
Additional Steps
A summary of the translated text is also provided. This is generated using a Large Language Model (Gemini 2.5 Pro).
The translated text is also rendered as one or more Mermaid diagrams.
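How the diagrams are drawn is not specified. As a hypothetical sketch, a linear sequence of steps taken from a text can be emitted as Mermaid `flowchart` syntax in a few lines of Python; double quotes inside labels are replaced with Mermaid's `#quot;` entity code so they do not terminate the quoted label.

```python
def to_mermaid(steps: list[str], direction: str = "TD") -> str:
    """Render an ordered list of steps as a linear Mermaid flowchart."""
    lines = [f"flowchart {direction}"]
    for i, step in enumerate(steps):
        label = step.replace('"', "#quot;")  # Mermaid entity code for a double quote
        lines.append(f'    n{i}["{label}"]')
    # Connect each node to the next one in sequence.
    for i in range(len(steps) - 1):
        lines.append(f"    n{i} --> n{i + 1}")
    return "\n".join(lines)
```

For example, `to_mermaid(["Contact (phassa)", "Feeling (vedanā)", "Craving (taṇhā)"])` produces a three-node top-down chart with the edges `n0 --> n1` and `n1 --> n2`.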
I also provide a commentary on the translation, which is my interpretation of the text using a rational perspective based on a phenomenological framework. This represents my personal opinion of what the Buddha may have understood and taught, and is subject to change.
Where possible, I have identified parallels to the text in other versions of the Buddhist canon. I have also conducted a literature review of articles on the text published in academic journals. Where possible, and to the extent permitted by copyright law, I have converted open-access articles and books (treated as open access because they are freely available for download from public websites) into Markdown and included their full text on the website for easy reference.
The conversion of articles and books from PDF to Markdown is achieved by using a transformer model (Mistral OCR) and then manually editing the resultant output.
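Some of the manual editing can be eased by an automatic clean-up pass first. This is a hypothetical sketch, assuming the OCR output arrives as one Markdown string per page: it drops running headers or footers that sit at the edge of at least half the pages (the `threshold` parameter) as well as bare page numbers, then joins the pages.

```python
from collections import Counter


def strip_running_lines(pages: list[str], threshold: float = 0.5) -> str:
    """Join per-page OCR Markdown, dropping repeated running headers/footers and bare page numbers."""
    # Count how often each non-blank line appears as the first or last line of a page.
    edges: Counter = Counter()
    for page in pages:
        lines = [ln.strip() for ln in page.splitlines() if ln.strip()]
        if lines:
            edges.update({lines[0], lines[-1]})
    repeated = set()
    if len(pages) > 1:  # with a single page every edge line would look "repeated"
        repeated = {ln for ln, n in edges.items() if n >= threshold * len(pages)}

    def is_noise(line: str) -> bool:
        s = line.strip()
        return not s or s in repeated or s.isdigit()

    cleaned = []
    for page in pages:
        lines = page.splitlines()
        # Trim noise only at the top and bottom of each page, never mid-page.
        while lines and is_noise(lines[0]):
            lines.pop(0)
        while lines and is_noise(lines[-1]):
            lines.pop()
        if lines:
            cleaned.append("\n".join(lines))
    return "\n\n".join(cleaned) + "\n"
```

The output still needs the manual pass described above; the sketch only removes the most mechanical layout residue.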
Copyrighted articles and books may be included in summarised rather than full text forms. These summaries are generated using a Large Language Model (Gemini 2.5 Pro).
Finally, I often include an image, created with a generative model (Imagen 4), to accompany the translated text. These images are generated using very specific prompts that convey a consistent artistic style and reflect elements of the translated text.
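The prompts themselves are not reproduced here. As a hypothetical illustration of keeping the style consistent, the style description can live in one constant that is appended to every scene-specific prompt, so only the scene and its elements change from text to text. The wording of `STYLE` is an invented placeholder, not the prompt actually used.

```python
# Hypothetical house style; the real prompts are not published.
STYLE = "watercolour illustration, muted earth tones, soft natural light, no text or lettering"


def image_prompt(scene: str, elements: list[str], style: str = STYLE) -> str:
    """Combine a scene description and the text's key elements with the fixed house style."""
    details = ", ".join(elements)
    return f"{scene}. Include: {details}. Style: {style}."
```

Keeping the style in a single place means a change to it applies uniformly to every future image.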
Thank you
I would like to acknowledge and thank Google (Alphabet) for providing me with credits (over A$2000) via their Gen App Builder program, which enabled me to use their generative and large language models (Gemini and Imagen) for text summarisation and image generation, and for providing me with training in using the models (as well as some free swag!). I would also like to thank Mistral.ai for providing free access to their OCR API. Finally, I would like to thank Yuttadhammo Bhikkhu for archiving the World Tipiṭaka Edition, without which this translation would not have been possible.