Third Year Project

Wednesday, 8 March 2017

6th March - 10th March 2017

This is the final week of our project, so our aim is to start testing and writing the documentation. We met with our supervisor on Wednesday to discuss some queries we had and to update him on how the overall project is going.

During last week, we encountered some issues when setting up a server to host our web application, which set us back a little time-wise. First, we had problems setting up the initial NGINX and uWSGI configuration on AWS. Once that was established, further problems with connecting the domain name to the AWS name servers complicated matters. Finally, we lost the ability to SSH into the AWS instance, which made it impossible to work on the server configuration, and troubleshooting the SSH failure was leading nowhere. Having already struggled with the initial NGINX and uWSGI configuration, starting again from scratch would have been too much. In a meeting with our supervisor, he advised us to scrap the idea of using the online server and just use the Django development server on localhost for the demo. With that decided, we were able to focus on the presentation of the user interface and begin testing.

We developed a simple two-page website. The first page has a form for submitting the brand campaign details; it then automatically redirects to the results page, which shows the tree of tweets and the various analysis results. We chose a minimal website for ease of use while still maintaining high design standards.

As for the graph representations of our analysis results, we finally settled on the "graphos" Python package, which wraps several JavaScript graphing libraries for use in Django web development specifically. This eased our workload considerably, as Plot.ly, which we had mentioned looking into before, proved too much of a hassle to integrate with Django: it produced a separate HTML file for viewing the generated graphs. Had we used it, we would have had to render that file or redirect the user to a different page, requiring a separate URL and a separate view, which would have undermined our design goal of keeping all the graphs and results on one page for ease of access. In contrast, the "graphos" package allowed us to integrate all the results in one page, reducing the number of click-throughs and page redirections needed.

So now our main goal for the rest of this week is to make the user interface more appealing, complete all necessary documentation, finish testing and create the video walkthrough before the deadline on Friday.

Sunday, 5 March 2017

28th February - 5th March

From 28th February to 3rd March we finished implementing the analysis phase of the project, as planned.

Progress:
The following analysis methods have been implemented in Python so far:

Average number of replies
Most influential reply in the tree
Count per hour of replies since original tweet was posted
Basic sentiment analysis
An interactive map representation of locations of replies
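
As an illustration of two of the measures above, here is a minimal sketch using only the standard library. It is not our project code: the sample tweets are made up, and it assumes that "average number of replies" means the mean number of direct replies per tweet in the tree and that "count per hour" buckets replies by whole hours elapsed since the original tweet.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample data: (tweet_id, parent_id, created_at).
# A parent_id of None marks the original campaign tweet.
tweets = [
    ("t0", None, datetime(2017, 3, 1, 12, 0)),
    ("t1", "t0", datetime(2017, 3, 1, 12, 20)),
    ("t2", "t0", datetime(2017, 3, 1, 13, 5)),
    ("t3", "t1", datetime(2017, 3, 1, 13, 40)),
    ("t4", "t1", datetime(2017, 3, 1, 15, 10)),
]

def average_replies(tweets):
    """Mean number of direct replies per tweet in the tree."""
    direct = Counter(parent for _, parent, _ in tweets if parent is not None)
    return sum(direct.values()) / len(tweets)

def replies_per_hour(tweets):
    """Count replies by whole hours elapsed since the original tweet."""
    start = next(ts for _, parent, ts in tweets if parent is None)
    counts = Counter()
    for _, parent, ts in tweets:
        if parent is not None:
            counts[int((ts - start).total_seconds() // 3600)] += 1
    return dict(sorted(counts.items()))

print(average_replies(tweets))   # 0.8
print(replies_per_hour(tweets))  # {0: 1, 1: 2, 3: 1}
```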

Tree Visualisation

We have started to develop a module to handle the visualisation of the tweet tree. Initially, in our functional spec, we were looking into using Gephi and its GEXF file format to represent the nodes in HTML. However, after researching it further, we found the XML styling involved hard to grasp. The documentation on installing and using the GEXF format is also vague, and our initial trial runs of sample code yielded no results.

We finally decided to use the D3 JavaScript library, as it has an extensive gallery of examples for representing data visually. For our tree to be represented in JavaScript, the data needed to be in the form of a nested JSON object. Converting our tree to this structure proved to be a little hard to do since we chose to do it recursively. Initial attempts saw only the first few levels of the tree being displayed, while threads that went deeper were cut off. We finally realised that the problem was the order in which the tweets were being collected. Since they were collected in reverse chronological order relative to the original tweet, our attempts to append each node to its parent in the nested JSON failed because the parent had not yet been attached to the root JSON object. To remedy this, we simply changed the collection from the anytree package's built-in 'PreOrderIter' method to 'root.descendants'.
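
The ordering problem can be shown with a small standard-library sketch. This is not our anytree-based code but an alternative that sidesteps the issue with two passes: every node is created first, so a parent always exists by the time its children are attached, whatever order the tweets arrive in. The records and the "name"/"children" keys (the shape commonly used in D3 tree examples) are assumptions for illustration.

```python
import json

# Hypothetical flat records: (tweet_id, parent_id, text),
# deliberately listed with a child before its parent.
records = [
    ("t3", "t1", "reply to a reply"),
    ("t1", "t0", "first reply"),
    ("t0", None, "original tweet"),
    ("t2", "t0", "second reply"),
]

def to_nested(records):
    """Build a nested {"name", "children"} dict from flat parent links."""
    # Pass 1: create every node so parents exist regardless of input order.
    nodes = {tid: {"name": text, "children": []} for tid, _, text in records}
    root = None
    # Pass 2: attach each node to its parent.
    for tid, parent, _ in records:
        if parent is None:
            root = nodes[tid]
        else:
            nodes[parent]["children"].append(nodes[tid])
    return root

tree = to_nested(records)
print(json.dumps(tree, indent=2))
```

Because the nodes are shared by reference, attaching "t3" to "t1" before "t1" is attached to the root still ends up in the final tree.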

Graphs
During our implementation of the "count per hour" function and the interactive map, we used the Python packages matplotlib and basemap to represent the data in graphs. However, these did not fit well with the workflow of the Django framework, that is, computing the graphs and then passing them in as part of the 'context' to be rendered in views.py. In our research we came across Plot.ly as an alternative.

Tuesday, 28 February 2017

27th February - 28th February 2017

This week we are working on the analysis phase of our project.
This includes:

Finding the most influential node in the tree of tweets, and checking whether a user's follower count correlates with their level of influence
Finding the most common time tweets are sent, looking at the time difference between tweets and replies
Conducting simple sentiment analysis to see if most replies to brand campaigns are positive or negative
Seeing if location has any influence, i.e. which countries tweet about a specific brand campaign the most
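
For the follower-count question, one possible check is a Pearson correlation coefficient. The sketch below is only an illustration: the numbers are made up, and it assumes "influence" is measured as the number of replies a user's tweet received, which is one option among several.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Made-up numbers: each user's follower count and the replies their tweet received.
followers = [100, 250, 400, 1200, 5000]
replies_received = [1, 3, 2, 8, 20]

r = pearson(followers, replies_received)
print(round(r, 3))
```

A value near 1 would suggest that users with more followers tend to attract more replies, a value near 0 would suggest no linear relationship, and in either case a handful of tweets is too few to draw conclusions from.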

We have divided the tasks above equally between us to work on during the week, so that they are completed for our meeting with our supervisor on Friday morning.

We will conduct testing and finish up the documentation and video walkthrough next week.

Sunday, 26 February 2017

23rd - 26th February 2017

This week Ina and I met up to discuss our progress. Following on from our last blog post, the database is now set up to receive tweets, and the tree can now display tweets as branching nodes in the terminal, which we use for testing.

Plans: Django web application + server configuration
We are using Django, the Python web framework, to write the app interface. We plan to use the following server configuration:

the web client <-> the web server <-> the socket <-> uWSGI <-> Python

This is in addition to using Amazon Web Services to host the NGINX web server and uWSGI.

Progress:
Started to implement simple analysis methods on the established tree structure. This includes:

Finding the average number of replies in a tree
Finding the longest reply-chain in the tree
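
A minimal sketch of the longest reply-chain search, assuming the tree is stored as a plain dictionary from each tweet id to its direct replies (a hypothetical stand-in for our actual tree structure):

```python
# Hypothetical reply tree stored as parent -> list of direct replies.
replies = {
    "t0": ["t1", "t2"],
    "t1": ["t3"],
    "t2": [],
    "t3": ["t4"],
    "t4": [],
}

def longest_chain(node, replies):
    """Return the longest path from `node` down to a leaf, as a list of ids."""
    best = []
    for child in replies.get(node, []):
        candidate = longest_chain(child, replies)
        if len(candidate) > len(best):
            best = candidate
    return [node] + best

chain = longest_chain("t0", replies)
print(chain)           # ['t0', 't1', 't3', 't4']
print(len(chain) - 1)  # 3 replies deep
```

The recursion is fine for small trees; a very deep thread could hit Python's recursion limit, in which case an explicit stack would be safer.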

We tried to contact our supervisor but unfortunately, he was unavailable so we will schedule another meeting with him next week.

Wednesday, 22 February 2017

18th - 22nd February 2017

While trying to implement the code for tracing retweets through the different users, we ran into a few problems. It turns out that the Tweet object returned by the Twitter API does not hold any data about which user a particular Tweet was retweeted from. It only links back to the original author of the Tweet, regardless of any intermediary links. This poses a great problem for us, as a large part of the project relies on being able to trace a retweet's journey. Without the links in between to show who a user retweeted a Tweet from, building a tree network of retweets seems impossible.

Here are some suggested workarounds:

Analysing retweets based on social network theories such as node centrality etc. instead of the tree-based approach in order to determine which users/retweets have more of an impact on the dispersion of the brand campaign.
Focusing on analysing the retweets in relation to other factors such as time.

In the meantime, we are focusing on tracing replies to the campaign tweets while we wait to run these alternatives, or any other fixes, by our supervisor.
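
The difference between replies and retweets can be seen in the payloads themselves. The sketch below uses made-up, heavily trimmed tweets with field names from Twitter's v1.1 Tweet object ('in_reply_to_status_id_str' and 'retweeted_status'); the helper 'parent_links' is hypothetical. Each reply names the exact tweet it answers, so a tree can be built, whereas a retweet only embeds the original tweet.

```python
# Trimmed, made-up payloads using field names from Twitter's v1.1 Tweet object.
campaign = {"id_str": "100", "user": {"screen_name": "brand"},
            "in_reply_to_status_id_str": None}
reply_a = {"id_str": "101", "user": {"screen_name": "alice"},
           "in_reply_to_status_id_str": "100"}
reply_b = {"id_str": "102", "user": {"screen_name": "bob"},
           "in_reply_to_status_id_str": "101"}
retweet = {
    "id_str": "103",
    "user": {"screen_name": "carol"},
    "in_reply_to_status_id_str": None,
    # Points at the original tweet only, even if carol saw it
    # through somebody else's retweet.
    "retweeted_status": {"id_str": "100", "user": {"screen_name": "brand"}},
}

def parent_links(tweets):
    """Map each reply's id to the id of the tweet it replies to."""
    return {
        t["id_str"]: t["in_reply_to_status_id_str"]
        for t in tweets
        if t.get("in_reply_to_status_id_str") is not None
    }

links = parent_links([campaign, reply_a, reply_b, retweet])
print(links)  # {'101': '100', '102': '101'}
```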

Meanwhile, Rachel is working on the database setup and the functions for passing in structured Tweet data so that we can use it for analysis. We also ran into a few problems in this area: the Tweet structure can get quite convoluted, with nested objects and elements that make it hard for a human to read, which in turn makes querying the database a little hard to do. Along with this, we realised that we are unsure whether we have to create an actual MySQL server for the database or whether there is a way around that. We plan to discuss this with our supervisor in our next meeting.

Tuesday, 14 February 2017

13th-17th February 2017

After falling slightly behind our initial schedule, we met with our supervisor (Ray Walshe) last Friday to discuss how to proceed with our project. After a successful meeting, we managed to kick ourselves into gear and divided up some work to be done individually. As it stands, we have managed to access the Twitter API, gather a number of tweets from the public stream and make these tweets readable using the json package. While this was being done, Ina was working with a controlled batch of tweets to try to create the basic tree-like structure that will be needed to represent our results. This is just our first blog post, to try to get into the habit of keeping up with the progress that we make.
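
For readers unfamiliar with the json package, here is a small sketch of what "making tweets readable" can look like. The raw line is made up and heavily trimmed; real tweets from the stream carry many more fields.

```python
import json

# A made-up, heavily trimmed line of the kind a streaming client might receive.
raw_line = (
    '{"id_str": "101", "text": "Loving the new campaign!", '
    '"user": {"screen_name": "alice", "followers_count": 42}}'
)

tweet = json.loads(raw_line)          # JSON text -> Python dict
print(tweet["user"]["screen_name"])   # nested objects become nested dicts
print(json.dumps(tweet, indent=2, sort_keys=True))  # pretty-printed for reading
```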