Progress:
The following analysis methods have been implemented in Python so far:
- Average number of replies
- Most influential reply in the tree
- Count of replies per hour since the original tweet was posted
- Basic sentiment analysis
- An interactive map showing the locations of replies
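As a rough illustration of the first two metrics, here is a minimal sketch over a hypothetical `Tweet` node class (the project itself builds its tree with anytree). Both definitions are assumptions: "average number of replies" is taken as the mean number of direct replies per tweet in the tree, and the "most influential" reply as the reply with the most descendants in its thread.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Tweet:
    """Hypothetical minimal reply-tree node (the project uses anytree instead)."""
    tweet_id: str
    children: List["Tweet"] = field(default_factory=list)


def iter_nodes(root: Tweet) -> Iterator[Tweet]:
    """Yield every node in the tree, root first (pre-order, iterative)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(node.children))


def average_replies(root: Tweet) -> float:
    """Assumed definition: mean number of direct replies per tweet, leaves included."""
    nodes = list(iter_nodes(root))
    return sum(len(n.children) for n in nodes) / len(nodes)


def subtree_size(node: Tweet) -> int:
    """Number of descendants below `node` (not counting the node itself)."""
    return sum(1 + subtree_size(child) for child in node.children)


def most_influential_reply(root: Tweet) -> Optional[Tweet]:
    """Assumed definition: the reply (any non-root node) with the most descendants."""
    replies = [n for n in iter_nodes(root) if n is not root]
    if not replies:
        return None
    return max(replies, key=subtree_size)
```

For example, in a tree where the original tweet has two replies and one of those has two replies of its own, the average is 4 replies over 5 tweets (0.8), and the reply with the sub-thread is reported as the most influential.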
Tree Visualisation
We have started to develop a module that creates the visualization of the tweet tree. In our functional specification we initially planned to use Gephi and its GEXF file format to render the nodes in HTML. However, after researching it further, we found the XML styling hard to grasp, the documentation on installing and using the GEXF format vague, and our first trial runs of sample code produced no output.
We finally decided to use the D3 JavaScript library, as it has an extensive gallery of examples of representing data visually. For D3 to render our tree, the data needs to be a nested JSON object. Converting our tree to this structure proved harder than expected because we chose to do it recursively. Initial attempts displayed only the first few levels of the tree, and deeper threads were cut off. We eventually realised that the problem was the order in which the tweets were collected. Because they were collected in reverse chronological order, our attempts to append each node to its parent in the nested JSON object failed: the parent had not yet been attached to the root JSON object. To remedy this we changed the traversal from the anytree package's built-in 'PreOrderIter' to 'root.descendants'.
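The sketch below shows one order-independent way to build the nested structure that D3's hierarchy layouts read (objects with a `children` array). It is a different technique from the recursive approach described above: it first creates a dict for every tweet, indexes them by id, and only then attaches each child to its parent, so the order in which replies were collected no longer matters. The `Reply` class and its `tweet_id`/`parent_id`/`text` fields are hypothetical. (If the tree is already built with anytree, its `anytree.exporter.JsonExporter` may be a simpler route.)

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Reply:
    """Hypothetical flat record for one collected tweet."""
    tweet_id: str
    parent_id: Optional[str]  # None marks the original tweet (the root)
    text: str = ""


def to_nested(replies: List[Reply]) -> Optional[dict]:
    """Build a D3-style nested dict {"name", "text", "children"} from a flat list.

    Pass 1 creates a node for every tweet; pass 2 links children to parents.
    Because all parents exist before any linking happens, newest-first (or any)
    collection order works. Replies whose parent was never collected are dropped.
    """
    nodes: Dict[str, dict] = {
        r.tweet_id: {"name": r.tweet_id, "text": r.text, "children": []}
        for r in replies
    }
    root = None
    for r in replies:
        if r.parent_id is None:
            root = nodes[r.tweet_id]  # assumes exactly one original tweet
        elif r.parent_id in nodes:
            nodes[r.parent_id]["children"].append(nodes[r.tweet_id])
    return root


# Replies listed newest first, i.e. children appear before their parents.
collected = [
    Reply("3", "1", "reply to a"),
    Reply("2", "0", "b"),
    Reply("1", "0", "a"),
    Reply("0", None, "original tweet"),
]
tree_json = json.dumps(to_nested(collected), indent=2)
```

The resulting `tree_json` string can then be handed to the page and loaded with `d3.hierarchy`.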
While implementing the "count per hour" function and the interactive map, we used the Python packages matplotlib and Basemap to plot the data. However, this approach did not fit the workflow of the Django framework, in which views.py computes the graphs and passes them in the 'context' to be rendered. In our research we came across Plot.ly as an alternative.
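One way to fit the "count per hour" graph into the Django flow is to compute the figure as plain data in the view and let Plotly's JavaScript library draw it in the browser. The sketch below assumes reply timestamps are available as `datetime` objects; it buckets replies by whole hours elapsed since the original tweet and returns a Plotly-style figure dict (`data` plus `layout`). The function names are hypothetical, not part of any library.

```python
from collections import Counter
from datetime import datetime
from typing import List, Tuple


def replies_per_hour(original: datetime, reply_times: List[datetime]) -> Tuple[List[int], List[int]]:
    """Count replies in each whole hour since `original`, filling empty hours with 0."""
    counts = Counter(
        int((t - original).total_seconds() // 3600)
        for t in reply_times
        if t >= original  # ignore timestamps earlier than the original tweet
    )
    if not counts:
        return [], []
    hours = list(range(max(counts) + 1))
    return hours, [counts[h] for h in hours]


def hourly_chart_figure(original: datetime, reply_times: List[datetime]) -> dict:
    """Plotly-style figure spec: a bar trace plus axis titles (JSON-serialisable)."""
    hours, counts = replies_per_hour(original, reply_times)
    return {
        "data": [{"type": "bar", "x": hours, "y": counts}],
        "layout": {
            "title": {"text": "Replies per hour"},
            "xaxis": {"title": {"text": "Hours since original tweet"}},
            "yaxis": {"title": {"text": "Replies"}},
        },
    }
```

In a view, the returned dict could be added to the context and, in Django 2.1 or later, emitted with the `json_script` template filter, then passed to `Plotly.newPlot` on the client. Keeping the view free of rendering code is the point of this design: the server only supplies data.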