The day after the famous interview aired, the complete transcript was available online. On Facebook, I saw that one of my college juniors had written a simple shell script to count the number of occurrences of certain words. This by itself was quite insightful, and I knew I could take it to the next level without much effort by using R and a few of its packages.
I’ve uploaded the R script to GitHub. The basic flow of the script is as follows:
- Separate Rahul’s and Arnab’s sides of the conversation into two buckets
- Remove extra spaces
- Remove punctuation
- Convert the text to lower case
- Remove the stop words
- Convert the text to a term document matrix
- Rank the words based on their occurrences
- Generate the word cloud and the top 5 words for each speaker
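To make the steps above concrete, here is a minimal sketch in base R. It is not the script from GitHub: the four transcript lines are made up, the stop-word list is a tiny illustrative one, and the helper names (`clean_text`, `tokenize`, `top_words`, `draw_cloud`) are hypothetical. In practice the `tm` package (`tm_map`, `removePunctuation`, `stopwords("english")`, `TermDocumentMatrix`) covers most of this, though whether the original script uses it is an assumption.

```r
# Made-up transcript lines (assumption); the real transcript is not reproduced here.
transcript <- c(
  "Arnab: Rahul, what is your view on the system?",
  "Rahul: The system needs to change.  We must empower women.",
  "Arnab: But what about the system, Rahul?",
  "Rahul: I want to change the system and empower the youth."
)

# Tiny illustrative stop-word list (assumption); real lists are much longer.
stop_words <- c("the", "is", "a", "an", "and", "to", "on", "i", "we", "what",
                "your", "but", "about", "must", "want")

# Separate the two speakers into buckets, one string per speaker.
speaker <- sub(":.*$", "", transcript)
spoken  <- sub("^[^:]*:[[:space:]]*", "", transcript)
buckets <- vapply(split(spoken, speaker), paste, character(1), collapse = " ")

# Remove extra spaces, remove punctuation, convert to lower case.
clean_text <- function(text) {
  text <- gsub("[[:space:]]+", " ", text)
  text <- gsub("[[:punct:]]", "", text)
  tolower(trimws(text))
}

# Split into words and drop stop words and empty strings.
tokenize <- function(text) {
  words <- strsplit(clean_text(text), "[[:space:]]+")[[1]]
  words[nzchar(words) & !(words %in% stop_words)]
}
tokens <- lapply(buckets, tokenize)

# Term-document matrix: one row per term, one column per speaker.
terms <- sort(unique(unlist(tokens)))
tdm <- vapply(
  tokens,
  function(w) tabulate(match(w, terms), nbins = length(terms)),
  integer(length(terms))
)
rownames(tdm) <- terms

# Rank a speaker's words by count and keep the top n that actually occur.
top_words <- function(who, n = 5) {
  counts <- tdm[, who]
  counts <- sort(counts[counts > 0], decreasing = TRUE)
  head(counts, n)
}

# Draw a word cloud for one speaker; needs the wordcloud package installed.
draw_cloud <- function(who) {
  counts <- tdm[, who]
  counts <- counts[counts > 0]
  wordcloud::wordcloud(names(counts), counts, min.freq = 1)
}

print(top_words("Rahul"))
print(top_words("Arnab"))
```

Calling `draw_cloud("Rahul")` would plot that speaker's cloud if the `wordcloud` package is available. Keeping the counts in a single matrix with one column per speaker makes it easy to compare how often each of them used the same word.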
Here is the word cloud.