Scraping and number crunching for a sentiment analysis website

Harrison Kinsley uses PythonAnywhere to generate the data he hosts at his sentiment analysis website, sentdex.com. We asked him a bit about how the site works, and how he uses PythonAnywhere.

Screenshot of the Sentdex front page

Editor's note, aka the shameless marketing bit: the Sentdex code runs on our $99/month Startup Plan.

What does sentdex.com do?

We do sentiment analysis for stocks/forex/bitcoin, politics, and global sentiment, plotted on a three.js [ed: WebGL-based in-browser 3D graphics] globe.

What first gave you the idea to build a site like this?

I have always been interested in financial markets. I got the idea to attempt to get a program to sift through all the news for public companies to generate a sentiment rating on them, which could help long term investors find price inconsistencies in the short term for buying undervalued companies, or even short term traders trade on the sentiment.

I had a developer at the time who I used to develop all my silly ideas, but he wasn't able to do this one, so I had to go at it alone. I somewhat felt like I was a pioneer in the idea that you could use computers to gauge sentiment. I had seen people track Twitter mentions, but not much else. Now that I've been here for a while, I can see I am hardly alone. Nothing new under the sun, as the saying goes.

Who uses the site?

It started out purely for stocks, and since then I have moved to politics and global sentiment. About 40% of my viewers come for the stocks, another ~30% comes for the political or general sentiment, and then the other 30% comes for the tutorials I give on Python.

How busy is the site (hits/day etc)?

I'd say it is still pretty small. We're currently getting 500-700 visits a day during the week, with 1000-2000 pageviews, and more like 300 or so visits over the weekend (fewer stock pages get viewed when markets are closed). It varies a lot and has been growing pretty much every day. The website is fairly new, with doors opening about 6 months ago. We've started getting a lot more organic results, so that is nice! We also recently got listed on the Chrome Experiments website for our WebGL globe of general sentiment. That's been a huge help.

How do you use PythonAnywhere?

Currently, I run the crawl bot, stock prices, and some of the number crunching on PythonAnywhere. This powers the stocks, politics, and bitcoin aspects of the website almost entirely. I'd like to eventually run the entire back-end on PythonAnywhere, but the other aspects involve processing 5-15+ million tweets a day, plus some larger daily number crunching. Those two appear to really kill the CPU time, but I am also trying to shift the loads around a bit to make things more efficient and fit them better into the scheduled tasks that can be set up.

What frameworks/Python modules are you using?

I ended up creating my own natural language processing module, and usually wind up writing a lot of things myself. Matplotlib and NumPy are the two major modules that I can't thank enough for visualization and number crunching.
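Editor's note: the Sentdex natural language processing module isn't shown here, so the following is only a minimal sketch of what a home-grown sentiment scorer could look like. The word list, its weights, and the function name `sentiment_score` are all invented for illustration and are not taken from the Sentdex code.

```python
import re

# Hypothetical toy lexicon: the words and weights are made up for this sketch.
# A real system would need a far larger, carefully tuned vocabulary.
LEXICON = {
    "gain": 1.0,
    "beat": 1.0,
    "strong": 0.5,
    "weak": -0.5,
    "miss": -1.0,
    "loss": -1.0,
}


def sentiment_score(text, lexicon=LEXICON):
    """Return the mean lexicon weight of the known words in `text`.

    Returns 0.0 when none of the words appear in the lexicon.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    print(sentiment_score("Strong quarter: earnings beat estimates"))
    print(sentiment_score("Weak sales and a big loss"))
```

Averaging word weights like this ignores negation, sarcasm, and context, which is a large part of why real sentiment analysis is harder than it first looks.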

Other than that, I use the heck out of urllib2, re, and cookielib (for the exceedingly tricky websites that don't have love for bots :( ).
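Editor's note: for readers who haven't combined these modules before, here is a rough sketch of the general pattern, not the Sentdex crawler itself. It uses the Python 3 names for the same standard-library pieces (`urllib.request` for urllib2 and `http.cookiejar` for cookielib). The user-agent string and the helper names are assumptions made up for this example.

```python
import re
import urllib.request
from http.cookiejar import CookieJar

# Matches the contents of the first <title> element in a page.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def build_cookie_opener(user_agent="example-crawler/0.1"):
    """Build an opener that stores cookies between requests.

    The user-agent value is a placeholder chosen for this sketch.
    """
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def extract_title(html):
    """Return the stripped page title, or None if no <title> is found."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def fetch_title(url, opener):
    """Download `url` with the cookie-aware opener and return its title."""
    with opener.open(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_title(html)
```

Regular expressions are fine for pulling out one simple field like this, but they get brittle on messy markup, so a proper HTML parser is usually the safer choice for anything more involved.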

Were there any interesting problems/challenges with getting it working?

I started out with very few serious programming skills. It all started when I decided that I wanted to make a computer read and understand text. That was probably my biggest challenge.

What's your background? How long have you been programming?

I am 23 years old, a university graduate, with no CS schooling, and getting married in a few months! I double majored in Philosophy and Criminal Justice. If I could change it, I probably would have done Philosophy and Economics instead.

I am self-taught at programming. I started programming by algorithmically trading virtual currencies. I made a living from ~12 to 21 in the virtual currency market, but I wanted to move on to other things. I was also heavily flipping domains at that time for side income. That is when I decided on the natural language processing idea, but I didn't know anything about that field.

Even though I was algo trading with a program I wrote myself, I still didn't even understand what a function was. It was basically 1 giant loop with a lot of if statements.

A friend pointed me towards the NLTK text book, which covered Natural Language Processing in Python. At the time, I was trying to figure out what language to even learn. The algo trading was done in Pascal, so... yeah... it was time to do something else! I tried Java and C... but I just couldn't learn it, mostly because I felt like I wasn't making any progress towards the goal I wanted to achieve.

Even though NLP wasn't the best or easiest thing to start on, I went through the book in a couple days, simply because that was exactly what I wanted to learn. It was really nice to have found a huge body of work aimed at NLP, while also being written in a way that a true noob could understand.


Got a PythonAnywhere story you'd like to share next month?

Drop us a line at support@pythonanywhere.com and we'll have a chat!

