Comparing fashion words with Tweet
This post compares tweets grouped by language.
The dataset was split into two groups according to ISO 639-1 language codes: English (en) and non-English. The language is the default one that users specified in their Twitter profiles.
The map shows where in the world English is the dominant language.
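As a rough illustration of that split, here is a minimal sketch in Python. It assumes each tweet is a dict with a `lang` field holding the profile's ISO 639-1 code and a `text` field; those field names and the sample tweets are made up for the example and are not taken from the original dataset.

```python
from collections import defaultdict

def group_by_language(tweets):
    """Split tweet texts into 'en' and 'non-en' groups by profile language."""
    groups = defaultdict(list)
    for tweet in tweets:
        # 'lang' is assumed to hold an ISO 639-1 code such as 'en' or 'fr'.
        # A missing or empty code falls into the non-English group here.
        lang = (tweet.get("lang") or "").lower()
        key = "en" if lang == "en" else "non-en"
        groups[key].append(tweet["text"])
    return dict(groups)

# Hypothetical sample tweets, for illustration only
sample = [
    {"lang": "en", "text": "Loving the new fashion line"},
    {"lang": "fr", "text": "La fashion week commence"},
    {"lang": "EN", "text": "fashion tips for spring"},
]
groups = group_by_language(sample)
```

Keep in mind that the profile language is only a proxy: someone with an English profile can still tweet in another language.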
These tweets were found through the API by searching for the term “fashion”, so of course that word shows up most frequently. The data was collected March 7-20, 2013.
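Since the search term dominates the counts, one option is to count words with and without it. The sketch below is only an illustration of that idea, not the code used for this post; the `word_counts` helper and the sample texts are hypothetical.

```python
import re
from collections import Counter

def word_counts(texts, exclude=()):
    """Count lowercase words across texts, skipping any word in `exclude`."""
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of Unicode letters, so accented words survive;
        # apostrophes split words, which is a simplification.
        words = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(w for w in words if w not in exclude)
    return counts

# Hypothetical sample texts, for illustration only
texts = ["Fashion week is here", "Street fashion and fashion week style"]
with_term = word_counts(texts)
without_term = word_counts(texts, exclude={"fashion"})
```

Dropping the search term lets the next most common words stand out instead of being dwarfed by “fashion”.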
How might this work be improved in the future?
- Text processing: better natural language processing to account for foreign-language stop words, and better separation of hashtags from words used as part of a sentence.
- Timing: compare tweets sent during fashion week with those sent at other times.
- Geography: compare tweets from different cities and from the fashion weeks held in those cities.
- Visualizations: someone told me about D3, I’ll have to take a looksee and build something with it!
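The text processing idea above could start from something like the following sketch. It assumes per-language stop word lists keyed by ISO 639-1 code; the tiny lists here are placeholders I made up, and the `tokenize` helper is hypothetical, so treat this as a starting point and not a finished pipeline.

```python
import re

# Tiny illustrative stop word lists; real lists would be much longer.
STOP_WORDS = {
    "en": {"the", "a", "is", "and", "of"},
    "fr": {"la", "le", "et", "de", "est"},
}

# A hashtag is '#' followed by word characters; a plain word is a run of letters.
TOKEN = re.compile(r"#\w+|[^\W\d_]+")

def tokenize(text, lang):
    """Return (hashtags, words) for a tweet, dropping stop words for `lang`."""
    stops = STOP_WORDS.get(lang, set())
    hashtags, words = [], []
    for token in TOKEN.findall(text.lower()):
        if token.startswith("#"):
            hashtags.append(token[1:])  # keep the tag without the '#'
        elif token not in stops:
            words.append(token)
    return hashtags, words

fr_tags, fr_words = tokenize("La #mode est belle", "fr")
en_tags, en_words = tokenize("The #nyfw2013 of Paris", "en")
```

Keeping hashtags in their own list makes it possible to compare them separately from the words people use in ordinary sentences.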