@trickyloki3 Yes, I do mean comparing the different flavor texts. Your last message asked for help with flavor text, and that's what I was concerned with.
The problem: scaling up flavor text generation for item descriptions
Your program is about automation, which excites me, but I am not sure how to scale up or automate the flavor text part. That is a big chunk of work you'll need to do to make sure each item's complete description (including flavor text) is reliable.
Either we check manually (which Asheraf, Zackdreaver, and community contributors already do) or we follow the fundamental idea of your work: automating the item descriptions. I like the path you're taking: automate it. We have very few volunteers in the community, so it is better to make good use of their time and automate what we can.
We already have reliable, high-quality datasets for item descriptions (Asheraf's and Zackdreaver's). If we frame item description generation as a dataset generation problem, the goal is to match the quality of the gold standard: human-processed, manually checked item descriptions.
To compare the similarity between two things, we need a measure. A statistical measure can help you be more confident that your generated item descriptions are semantically equivalent to the manually checked ones.
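One well-known measure of this kind is Levenshtein edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one text into another. Below is a minimal Python sketch of the standard dynamic-programming version. It is only one possible choice, and because it compares characters, it does not capture meaning on its own.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    # Keep only the previous row of the DP table (O(len(b)) memory).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
print(edit_distance("Increases ATK by 10.", "Increases ATK by 15."))  # 1
```

The distance is case-sensitive and counts every character equally, so in practice you might normalize whitespace or case before comparing.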
What good would a statistical measure do? How can it be useful?
A statistical measure such as edit distance lets us compute statistics that compare the datasets. This is illustrated in the example below (the figures are illustrative):
Asheraf Translation vs. trickyloki AutoGen DB to Item Description
15,000 items automatically generated from item_db.conf and item_db2.conf
13,000 items compared between the two datasets.
Average edit distance: 4 characters (your automatically generated descriptions differ from Asheraf's translation by only 4 characters on average).
Zackdreaver Translation vs. trickyloki AutoGen DB to Item Description
15,000 items automatically generated from item_db.conf and item_db2.conf
12,000 items compared between the two datasets.
Average edit distance: 12 characters (your automatically generated descriptions differ from Zackdreaver's translation by 12 characters on average). This is curious, but since everything is automated, you can easily query which items had high edit distances and study why they differ, so you can better tailor your description algorithm.
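The comparison workflow in the example above (compare the items both datasets share, average their distances, then pull out the worst matches to study) could be sketched in Python roughly as follows. This assumes, hypothetically, that each dataset has already been parsed into a dict mapping item ID to description text; loaders for the real formats are not shown, and `compare_datasets` is an illustrative helper, not part of any existing tool.

```python
from __future__ import annotations

from statistics import mean


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (character-level)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # deletion, insertion, substitution (a bool adds as 0 or 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def compare_datasets(generated: dict[int, str], reference: dict[int, str], top_n: int = 5):
    """Compare two description datasets keyed by item ID (hypothetical format).

    Returns (number of items compared, average edit distance,
    list of (item_id, distance) pairs with the top_n largest distances).
    """
    # Only items present in both datasets can be compared.
    common_ids = sorted(generated.keys() & reference.keys())
    distances = {i: edit_distance(generated[i], reference[i]) for i in common_ids}
    average = mean(distances.values()) if distances else 0.0
    # The items with the largest distances are the ones worth studying by hand.
    worst = sorted(distances.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return len(common_ids), average, worst


# Toy data: made-up item IDs and texts, not taken from any real dataset.
generated = {1: "Recovers 45 HP.", 2: "Recovers 105 HP.", 3: "Only in the generated set."}
reference = {1: "Recovers 45 HP.", 2: "Recovers 150 HP."}

count, average, worst = compare_datasets(generated, reference)
print(f"{count} items compared, average edit distance {average}")
print("largest distances:", worst)
```

Sorting by distance turns the statistics into a work list: the items at the top are where the generator and the human translation disagree most, which is most likely where the description algorithm needs tuning. An average alone can hide a few very bad matches, so the worst cases are worth looking at as well as the mean.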
In the end, you're not aiming to be perfectly close to Asheraf or Zackdreaver, but to generate high-quality item script descriptions while still having reliable flavor text that we all love to read while playing RO.
These are only ideas and suggestions; it is your project, sir, and I wish you all the best. Thank you for your generous contribution to the community!