While reading the Upanishads and Vedas, the first thing that strikes you is the names of the Rishis and the Seekers of Knowledge: Vamadeva, Viswamitra, Angirasa, Bhrigu and more. Let alone understanding the mantras, the very mention of these names seems to wake some primordial impulse to Truth and Godhood. So it should be, since to the Rishi every God, every named power, could be invoked and meditated upon until it manifested in him in all its power. So my idea is to dwell on these ancient names and perhaps draw something from them; worth a try, I would think.
So it struck me: what of the works that I seem to know? What names lurk in there, and what powers or qualities do those names hold for the eager seeker? So I picked the Mahabharata, specifically the version available at Project Gutenberg, translated by Sri Kisari Mohan Ganguli from the original Sanskrit of Krishna Dwaipayana Vyasa. With some help from Python and NLTK, and by writing a basic entity recognition algorithm, I managed to extract the names from Ganguli's English text. That done, I fed them into the excellent WordCloud in Python tutorial code by Andreas Mueller. And just like that they came out: the heroes, villains, gods, demi-gods, all of them.
So without further ado, let me humbly offer this tiny fruit of my labour for your curiosity and pleasure. Note that each image links to a high-resolution version (1200 x 900).
Vana Parva 1
Vana Parva 2
Go ahead, download them and see how many names are familiar to you. It is interesting how the mentions of characters vary as the plot progresses. Vaisampayana is the narrator, hence his large number of mentions. Arjuna establishes himself as a key figure at the start and the end. In the middle sections Yudhisthira plays a bigger role, and so on.
If there is interest, I can release the code that did the basic entity recognition with some help from NLTK. I also plan to put out a clickable version of this word cloud, so that curious folks can head to Wikipedia directly.
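For the curious, here is a rough sketch of how name extraction of this kind can work. It is not the code behind these images, and it uses only the Python standard library rather than NLTK. The heuristic is an assumption made for illustration: a word that appears capitalized in the middle of a sentence is treated as a name, and the NOT_NAMES stop list and the sample text are made up.

```python
import re
from collections import Counter

# Hypothetical stop list: capitalized words that are not names.
NOT_NAMES = {"The", "And", "But", "Then", "Thus", "O", "I", "He", "She", "They"}

def extract_names(text):
    """Count likely proper names using a capitalization heuristic."""
    # Rough sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    tokenized = [re.findall(r"[A-Za-z]+", s) for s in sentences]
    # Pass 1: a word capitalized in mid-sentence is treated as a name.
    known = {
        token
        for tokens in tokenized
        for token in tokens[1:]
        if token[0].isupper() and token not in NOT_NAMES
    }
    # Pass 2: count every occurrence, including sentence-initial ones.
    return Counter(
        token for tokens in tokenized for token in tokens if token in known
    )

sample = (
    "Then Vaisampayana said to the king. Arjuna took up his bow. "
    "And Yudhisthira spoke to Arjuna and Bhima."
)
counts = extract_names(sample)
print(counts.most_common())
```

A heuristic like this misses names that only ever open a sentence and picks up capitalized words that are not names, which is one reason a proper tagger from a library such as NLTK is the better starting point.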
Let me know if this sort of analysis and visualization on Hindu scriptures and epics seems interesting to you.
The size of each name in the image is based on the frequency of its mentions in the text, normalized for overall word count. Only the top 200 names have been pulled out. As you can see, the entity recognition could be a lot better. The list of left-out names is much longer; I need to try another route to present all of them.
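The weighting described above can be sketched roughly as follows. This is a minimal illustration and not the original code: the function name, the choice of mentions per 1,000 words as the unit, and the counts are all assumptions.

```python
from collections import Counter

def top_name_weights(name_counts, total_words, top_n=200):
    """Return {name: mentions per 1,000 words} for the top_n names."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    # Keep only the most frequently mentioned names.
    top = Counter(name_counts).most_common(top_n)
    # Divide by the section's word count so that sections of
    # different lengths can be compared with each other.
    return {name: 1000.0 * count / total_words for name, count in top}

# Made-up counts for a single section, for illustration only.
parva_counts = {"Arjuna": 30, "Yudhisthira": 45, "Bhima": 15}
weights = top_name_weights(parva_counts, total_words=15000, top_n=2)
print(weights)
```

Within a single image the normalization does not change the relative sizes of the names; it matters when comparing one section with another. A dictionary like this can be handed to a word cloud generator that accepts name-to-weight mappings, for example generate_from_frequencies in the Python wordcloud package.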
Credits to the Python community, the awesome natural language processing library NLTK and Andreas Mueller for enabling this; it would not have been possible otherwise.