Third Year Project: 28th February

From 28th February to 3rd March we finished implementing the analysis phase of the project, as planned.

Progress:
The following analysis methods have been implemented in Python so far:

Average number of replies
Most influential reply in the tree
Count per hour of replies since the original tweet was posted
Basic sentiment analysis
An interactive map representation of locations of replies
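As an illustration of the "count per hour" method, here is a minimal sketch, assuming each reply's timestamp is available as a Python datetime. The function name replies_per_hour and the choice to bucket replies by whole hours elapsed are our own assumptions for illustration, not necessarily how the project code does it.

```python
from collections import Counter
from datetime import datetime


def replies_per_hour(original_time, reply_times):
    """Count replies by whole hours elapsed since the original tweet (hypothetical helper).

    Returns a list where index h holds the number of replies posted between
    h and h+1 hours after the original tweet. Replies timestamped before the
    original tweet are ignored.
    """
    counts = Counter()
    for t in reply_times:
        elapsed = (t - original_time).total_seconds()
        if elapsed < 0:
            continue  # skip replies that appear to predate the original tweet
        counts[int(elapsed // 3600)] += 1
    if not counts:
        return []
    last_hour = max(counts)
    # Fill hours with no replies with 0 so the list can be plotted directly
    return [counts[h] for h in range(last_hour + 1)]


original = datetime(2017, 2, 28, 12, 0)
replies = [
    datetime(2017, 2, 28, 12, 5),
    datetime(2017, 2, 28, 12, 59),
    datetime(2017, 2, 28, 14, 30),
]
per_hour = replies_per_hour(original, replies)
```

Filling empty hours with zero keeps the list aligned with the hour index, which makes it easy to hand straight to a bar chart.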

Tree Visualisation

We have started to develop a module that handles the visualisation of the tweet tree. In our functional specification we initially planned to use Gephi and the GEXF file format to represent the nodes in HTML. However, after further research we found XML styling hard to grasp, the documentation on installing and using the GEXF format was vague, and our initial trial runs of sample code yielded no results.

We finally decided to use the D3 JavaScript library, as it has an extensive gallery of examples for representing data visually. For D3 to display our tree, the data needs to be in a nested JSON object. Converting our tree to this structure proved tricky because we chose to do it recursively. Initial attempts displayed only the first few levels of the tree; threads that went deeper were cut off. We eventually realised that the problem was the order in which the tweets were being collected. Since they were collected in reverse chronological order relative to the original tweet, our attempts to append each node to its parent in the nested JSON failed, because the parents had not yet been attached to the root JSON object. To remedy this we changed the collection from the anytree package's built-in 'PreOrderIter' iterator to 'root.descendants'.
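The sketch below shows one order-independent way to build the nested JSON, assuming each tweet is a hypothetical (tweet_id, parent_id, text) tuple rather than an anytree node. It first groups children under their parent and then recurses from the root, so tweets collected newest-first still end up under the right parent. This is an alternative to our 'root.descendants' fix, not the project's code. The 'name' and 'children' keys follow the D3 tree gallery examples; d3.hierarchy reads the 'children' key by default.

```python
import json
from collections import defaultdict


def build_nested(tweets, root_id):
    """Convert flat (tweet_id, parent_id, text) records into a nested dict.

    Children are grouped by parent before the tree is built, so the result
    does not depend on the order in which the tweets were collected.
    """
    children_of = defaultdict(list)
    text_of = {}
    for tweet_id, parent_id, text in tweets:
        text_of[tweet_id] = text
        if parent_id is not None:
            children_of[parent_id].append(tweet_id)

    def to_dict(tweet_id):
        # Recursively attach every child below its parent
        node = {"name": text_of.get(tweet_id, ""), "id": tweet_id}
        kids = [to_dict(child) for child in children_of[tweet_id]]
        if kids:
            node["children"] = kids  # leaves get no "children" key
        return node

    return to_dict(root_id)


# Hypothetical thread, listed newest first (reverse chronological order)
tweets = [
    ("r3", "r2", "reply to a reply"),
    ("r2", "root", "second reply"),
    ("r1", "root", "first reply"),
    ("root", None, "original tweet"),
]
tree = build_nested(tweets, "root")
print(json.dumps(tree, indent=2))
```

Because it recurses, very deep threads could hit Python's default recursion limit (about 1000 frames); an explicit stack would avoid that if it ever became a problem.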

Graphs
During our implementation of the "count per hour" function and the interactive map, we used the Python packages matplotlib and Basemap to represent the data in graphs. However, these graphs did not integrate with the workflow of the Django framework, in which the graphs are computed in views.py and then passed in as part of the 'context' to be rendered. In our research we came across Plot.ly as an alternative.
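One way Plot.ly could fit the Django workflow is to build the figure as JSON in views.py and let Plotly.js draw it in the browser. The sketch below assumes that approach; per_hour_figure_json and the context key are hypothetical names. The Python plotly package can alternatively render a figure to an HTML div string with plotly.offline.plot(fig, output_type='div').

```python
import json


def per_hour_figure_json(counts):
    """Build a Plotly bar-chart figure (data + layout) as a JSON string.

    counts[h] is the number of replies in hour h after the original tweet.
    """
    figure = {
        "data": [
            {
                "type": "bar",
                "x": list(range(len(counts))),
                "y": list(counts),
                "name": "Replies",
            }
        ],
        "layout": {
            "title": {"text": "Replies per hour since the original tweet"},
            "xaxis": {"title": {"text": "Hours since original tweet"}},
            "yaxis": {"title": {"text": "Number of replies"}},
        },
    }
    return json.dumps(figure)


# In a hypothetical Django view the string would go into the context, e.g.
#   context = {"per_hour_figure": per_hour_figure_json(counts)}
# and the template would parse it and call Plotly.newPlot(div, fig.data, fig.layout).
figure_json = per_hour_figure_json([2, 0, 1])
```

Keeping the figure as plain JSON means the view only passes data to the template, and the drawing happens client-side.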

Sunday, 5 March 2017

28th February - 5th March
