Wednesday, 8 March 2017

6th March - 10th March 2017

This is the final week of our project, so we aim to start testing and writing the documentation. We met with our supervisor on Wednesday to discuss some queries we had and to update him on how the overall project is going.

Last week we encountered some issues when setting up a server to host our web application, which set us back a little time-wise. The first problem was getting the initial NGINX and uWSGI configuration working on AWS. Once that was established, further problems with connecting the AWS name servers to the domain name complicated matters. Finally, we lost the ability to ssh into the AWS instance, which made it impossible to work on the server configuration, and troubleshooting the ssh failure was leading nowhere. Having already struggled with the initial NGINX and uWSGI configuration, starting again from scratch would have been too much. In a meeting with our supervisor, he advised us to scrap the idea of using the online server and to use the Django development server on localhost for the demo instead. With that decided, we were able to focus on the presentation of the user interface and begin testing.

We developed a simple two-page website. The first page has a form for submitting the brand campaign details. On submission it automatically redirects to the results page, which shows the tree of tweets and the various analysis results. We chose a minimal website for ease of use while still maintaining high design standards.

As for the graph representations of our analysis results, we finally settled on the "graphos" Python package, which wraps several JavaScript graphing libraries so that they can be used from Python in Django web development specifically. This eased our workload considerably, as Plot.ly, which we had mentioned looking into before, proved to be too much of a hassle to integrate with Django because it produced a separate HTML file for viewing the generated graphs. Had we used it, we would have had to render that file or redirect the user to a different page, which means resolving a different URL and writing a different view. We felt this would have hurt the design, since we wanted all the graphs and results on one page for ease of access. In contrast, the "graphos" package allowed us to integrate all the results in one page, reducing the number of click-throughs and page redirections needed.

So now our main goal for the rest of this week is to make the user interface more appealing, complete all necessary documentation, finish testing and create the video walkthrough before the deadline on Friday.

Sunday, 5 March 2017

28th February - 5th March

From the 28th February to the 3rd March we finished implementing the analysis phase of the project, as we had planned beforehand.

Progress:
The following analysis methods have thus far been implemented in Python:

  • Average number of replies
  • Most influential reply in the tree
  • Count per hour of replies since original tweet was posted
  • Basic sentiment analysis
  • An interactive map representation of locations of replies
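As an illustration of the "count per hour" method in the list above, here is a minimal sketch using only the standard library. It is not our actual project code: the function name replies_per_hour and the sample timestamps are made up for the example, and it assumes each reply's creation time is already available as a datetime.

```python
from collections import Counter
from datetime import datetime


def replies_per_hour(original_time, reply_times):
    """Count replies in each whole hour elapsed since the original tweet.

    Hypothetical helper for illustration; assumes reply_times are datetimes
    that are not earlier than original_time.
    """
    counts = Counter()
    for reply_time in reply_times:
        # Whole hours elapsed between the original tweet and this reply.
        elapsed_hours = int((reply_time - original_time).total_seconds() // 3600)
        counts[elapsed_hours] += 1
    # Return a plain dict ordered by hour so it is easy to plot.
    return dict(sorted(counts.items()))


# Made-up sample data: two replies in the first hour, one in the third.
original = datetime(2017, 3, 1, 12, 0)
replies = [
    datetime(2017, 3, 1, 12, 5),
    datetime(2017, 3, 1, 12, 40),
    datetime(2017, 3, 1, 14, 10),
]
hourly = replies_per_hour(original, replies)
```

Hours with no replies are simply absent from the result, so a plotting step would need to fill them with zero if a continuous axis is wanted.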

Tree Visualisation
We have started to develop a module to handle the visualisation of the tweets tree. Initially, in our project spec (functional spec), we were looking into using Gephi and its GEXF file format to represent the nodes in HTML. However, having researched it further, we found the area of XML styling a bit hard to grasp. The documentation on installing and using the GEXF format is a bit vague, and our initial trial runs of sample code yielded no results.

We finally decided to use the D3 JavaScript library, as it has an extensive gallery of examples for representing data visually. For our tree to be rendered in JavaScript, the data needed to be in the form of a nested JSON object. Converting our tree to this structure proved to be a little hard, since we chose to do it recursively. Initial attempts showed only the first few levels of the tree, and threads that went deeper were cut off. We finally realised that the problem was the order in which the tweets were being collected. Since they were collected in reverse time order relative to the original tweet, our attempts to append each node to its parent in the nested JSON format failed because the parents had not yet been attached to the root JSON object. To remedy this, we simply changed the collection from the anytree package's built-in 'PreOrderIter' method to 'root.descendants'.
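The ordering problem described above can be sketched as follows. This is not the recursive anytree code we used; it is a different, two-pass technique, shown with plain tuples so that it runs without any packages. The function name build_nested, the (id, parent_id, text) record layout and the "name"/"children" keys are assumptions for the example, the latter following the shape commonly used in D3 tree examples.

```python
import json


def build_nested(records, root_id):
    """Build a nested dict from flat (id, parent_id, text) records.

    Hypothetical helper for illustration. Records may arrive in any order,
    including children before their parents.
    """
    # First pass: create every node up front, so a parent always exists
    # by the time one of its children needs to be attached.
    nodes = {
        record_id: {"name": text, "children": []}
        for record_id, _parent_id, text in records
    }
    # Second pass: attach each non-root node to its parent's children list.
    for record_id, parent_id, _text in records:
        if record_id != root_id:
            nodes[parent_id]["children"].append(nodes[record_id])
    return nodes[root_id]


# Made-up sample data, deliberately newest first (children before parents).
records = [
    (3, 2, "reply to reply"),
    (2, 1, "first reply"),
    (1, None, "original tweet"),
]
tree = build_nested(records, root_id=1)
tree_json = json.dumps(tree)  # string that could be handed to the page
```

Because all nodes exist before any are linked, deep threads are not cut off regardless of collection order.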

Graphs
While implementing the "count per hour" function and the interactive map, we used the Python packages matplotlib and basemap to represent the data in graphs. However, these graphs did not fit the workflow of the Django framework, in which the graphs are computed in views.py and then passed to the template as part of the 'context' to be rendered. In our research we came across Plot.ly as an alternative.
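The workflow mentioned above, computing the graph data and passing it in as part of the 'context', can be sketched without Django itself. This is only an assumed shape, not our views.py: the function name chart_context and the context keys are hypothetical, and the header-plus-rows table is just one common layout for JavaScript charting libraries.

```python
import json


def chart_context(hourly_counts):
    """Turn {hour: count} into a context dict a view could hand to a template.

    Hypothetical helper for illustration; the key names are assumptions.
    """
    # Header row followed by one [hour, count] row per hour, sorted by hour.
    rows = [["Hour", "Replies"]]
    rows.extend([hour, count] for hour, count in sorted(hourly_counts.items()))
    return {
        "hourly_rows": rows,
        # JSON string for embedding directly in a script on the page.
        "hourly_rows_json": json.dumps(rows),
    }


# Made-up sample counts, given out of order on purpose.
context = chart_context({2: 1, 0: 2})
```

In a real view, a dict like this would be passed as the context argument when rendering the results template.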