The goal of this demo is to show how IPython notebooks can be used together with different data sources (e.g. Quandl) and useful Python libraries (e.g. pandas) to do financial analysis. It introduces new and useful functions gradually, for the benefit of the new Python user.
Since oil-equity correlation has been all the talk these days (this demo was written in Jan 2016), let's take a look at it!
# PythonAnywhere comes pre-installed with Quandl, so you just need to import it
import Quandl
# first, go to quandl.com and search for the ticker symbol that you want
# let's say we want to look at (continuous) front month crude vs e-mini S&Ps
cl = Quandl.get('CHRIS/CME_CL1')
es = Quandl.get('CHRIS/CME_ES1')
# Quandl.get() returns a pandas dataframe, so you can use all the pandas goodies
# For example, you can use tail to look at the most recent data, just like the unix tail binary!
es.tail()
# you can also get statistics
es.describe()
But wait!
What do we have here? Did you notice that the count is different for the different columns?
Let's take a look at what the missing values are:
# select the rows where Open has missing data points
es[es['Open'].isnull()].head()
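To put a number on the gap that describe() hints at, one option is to count the missing values per column. This is only a sketch on a tiny made-up frame: the column names mirror the Quandl data, but `toy` and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the futures DataFrame; the values are made up.
toy = pd.DataFrame(
    {
        'Open': [np.nan, 101.0, np.nan, 103.0],
        'Settle': [100.5, 101.5, 102.5, 103.5],
    },
    index=pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07']),
)

# isnull() flags each missing cell; summing the booleans counts them per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# describe() counts only the non-missing values, hence the differing counts
non_missing = toy.describe().loc['count']
print(non_missing)
```

On the real data, the same two calls would be es.isnull().sum() and es.describe().loc['count'].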
Hmmm. Time to spend money and buy good data?
Eh. We really only need the daily close here anyway (i.e. the Settle column). Let's zoom in on that.
es_close = es.Settle # WHAT IS THIS SORCERY? Attribute access!
es_close.head()
print(type(es))
print(type(es_close))
Oh ok. A column of a DataFrame is a Series (also a pandas object).
Note that it may still be linked to the DataFrame (i.e. changing the Series can change the DataFrame as well, depending on the pandas version and settings), so take a .copy() before modifying it.
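A small sketch of the two points above, on a made-up frame (`toy` and its values are assumptions for illustration). It shows that attribute access and bracket access give the same Series, and that an explicit copy can be changed without touching the DataFrame, whatever the view/copy behaviour of your pandas version happens to be.

```python
import pandas as pd

toy = pd.DataFrame({'Settle': [100.0, 101.0, 102.0]})  # made-up values

# Attribute access and bracket access return the same column
by_attribute = toy.Settle
by_bracket = toy['Settle']
same_values = by_attribute.equals(by_bracket)
print(type(by_attribute), same_values)

# An explicit copy is safe to modify without affecting the DataFrame
independent = toy['Settle'].copy()
independent.iloc[0] = -1.0
original_first = toy['Settle'].iloc[0]
print(original_first)
```

Bracket access is the safer habit when a column name clashes with a DataFrame method or contains spaces, since attribute access cannot reach those.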
# Okay- time to quickly check the crude time series as well
cl.describe()
# Hmm. That's a lot more counts. Does the crude time series start earlier than e-mini's?
cl.head()
earliest_es_date = es.index[0]
# at first glance, you could just do
cl[earliest_es_date:].head()
# but to be explicit about what happens when there is no exact matching date,
# we can look up the position of the first row on or after that date:
closest_row = cl.index.searchsorted(earliest_es_date)
cl_close = cl.iloc[closest_row:].Settle
cl_close.head()
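To see what searchsorted does when the exact date is missing, here is a sketch on a made-up index with a weekend gap (the dates and values are invented for illustration). On a sorted index it returns the insertion position, which is the first row on or after the target rather than the nearest row in either direction.

```python
import pandas as pd

# Hypothetical sorted index with a gap over the weekend of 4-5 Jan 2020
dates = pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06', '2020-01-07'])
toy = pd.Series([10.0, 11.0, 12.0, 13.0], index=dates)

target = pd.Timestamp('2020-01-04')  # a Saturday, not present in the index

# Position where target would be inserted to keep the index sorted
pos = dates.searchsorted(target)
by_position = toy.iloc[pos:]

# Label-based slicing on a sorted index selects the same rows
by_label = toy.loc[target:]

print(pos)
print(by_position.index[0])
```

Both approaches rely on the index being sorted, which is worth checking with `index.is_monotonic_increasing` on real data.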
# ok, let's just plot this guy
import matplotlib
import matplotlib.pyplot as plt
# use new pretty plots
matplotlib.style.use('ggplot')
# get ipython notebook to show graphs
%pylab inline
es_close.plot()
That was satisfying: our all too familiar S&P chart. Let's try to plot both S&P and oil in the same graph.
plt.figure()
es_close.plot()
cl_close.plot()
plt.yscale('log')
Meh. Okay. LET'S ACTUALLY DO SOME MATH!
(... ahem. stats)
es['Settle'].corr(cl['Settle'])
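One thing worth knowing about Series.corr: it lines the two series up on their index first and only uses the labels they share. A minimal sketch with made-up numbers (the series `a` and `b` are purely illustrative):

```python
import pandas as pd

# Two made-up series that only share the labels 0, 1 and 2
a = pd.Series([1.0, 2.0, 3.0, 4.0], index=[0, 1, 2, 3])
b = pd.Series([2.0, 4.0, 6.0, 100.0], index=[0, 1, 2, 9])

# Only the overlapping labels are used, so the outlier at label 9 is ignored
# and the result is a perfect positive (Pearson) correlation
overlap_corr = a.corr(b)
print(round(overlap_corr, 6))
```

That is why the call above runs without complaint even though the crude series is longer than the e-mini one.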
okay... MOAR GRAPHS! I hear you say
import pandas as pd
# pd.rolling_corr() was removed from newer pandas releases; .rolling().corr() replaces it
es_close.rolling(window=252).corr(cl_close).dropna()
# why 252? because that's roughly the number of trading days in a year
That's weird. You'd expect the first year to drop out (because the rolling correlation window starts after the first year), but it should have started after Sept 1998. Instead it is starting in 2014...
print(len(cl_close))
print(len(es_close))
merged = pd.concat({'es': es_close, 'cl': cl_close}, axis=1)
# maybe this is the culprit?
merged[merged['cl'].isnull()].head()
merged.dropna(how='any', inplace=True)
# BAD DATA BEGONE!
merged[merged['cl'].isnull()]
merged.es.rolling(window=252).corr(merged.cl).dropna().plot()
plt.axhline(0, color='k')
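Why did a handful of missing rows do so much damage? With the default settings, a rolling window that contains even one missing value produces no result, so every gap wipes out a whole window's worth of output. Here is a sketch on synthetic data (random numbers and a short window, both chosen just for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.bdate_range('2020-01-01', periods=30)
a = pd.Series(rng.normal(size=30), index=idx)
b = pd.Series(rng.normal(size=30), index=idx)
b.iloc[10] = np.nan  # a single missing observation

window = 5

# Every window that touches the missing row comes out as NaN
raw = a.rolling(window).corr(b)

# Dropping incomplete rows first leaves only full windows
merged_toy = pd.concat({'a': a, 'b': b}, axis=1).dropna(how='any')
clean = merged_toy['a'].rolling(window).corr(merged_toy['b'])

print(raw.iloc[8:17])
print(clean.notna().sum())
```

Note the trade-off: after dropna the windows are counted in rows, so a window can straddle a gap in calendar time.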
Brilliant! But this is still quite inconclusive in terms of equity/crude corr. Why? Well we are forgetting about one HUGE HUGE factor affecting correlation here.
# D'oh
import numpy as np
print('Autocorrelation for a random series is {:.3f}'.format(
    pd.Series(np.random.randn(100000)).autocorr()
))
print('But, autocorrelation for S&P is {:.3f}'.format(es_close.autocorr()))
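The same effect can be reproduced without any market data. The sketch below builds a synthetic price path from random returns (the seed, volatility and starting price are arbitrary assumptions) and compares the autocorrelation of the price levels with that of the daily percentage changes.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Independent random daily log returns, compounded into a price path
log_returns = rng.normal(0, 0.01, size=5000)
prices = pd.Series(100 * np.exp(np.cumsum(log_returns)))

# Price levels inherit yesterday's level, so they are highly autocorrelated
level_autocorr = prices.autocorr()

# Percentage changes recover the (independent) returns
return_autocorr = prices.pct_change().dropna().autocorr()

print('levels:  {:.3f}'.format(level_autocorr))
print('returns: {:.3f}'.format(return_autocorr))
```

Two series of price levels can therefore look correlated simply because each one drifts, even when their day-to-day moves are unrelated.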
So that's why we should look at %-change
instead of $-close
or $-change
...
daily_returns = merged.pct_change()
rolling_correlation = daily_returns.es.rolling(window=252).corr(daily_returns.cl).dropna()
rolling_correlation.plot()
plt.axhline(0, color='k')
plt.title('Rolling 1 yr correlation between Oil and S&P')
Great. Now this is much more interesting. It is quite clear that the period of higher correlation between oil prices and equities came after 2009. Qualitatively, we know (if you worked in finance back then) that this was the case: previously, extremely high oil prices (over $100/bbl) were seen as a drag on the economy. Nowadays, extremely low oil prices are seen as an indication of weakness in global demand, with oil, equities, credit, etc. all selling off hand in hand when there is risk-off sentiment.
Let's plot some pretty graphs to show what we know qualitatively, and make sure our memory was correct.
# vertically split into two subplots, and align x-axis
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
fig.suptitle('Checking our intuition about correlation', fontsize=14, fontweight='bold')
# make space for the title
fig.subplots_adjust(top=0.85)
rolling_correlation.plot(ax=ax1)
ax1.set_title('Rolling correlation of WTI returns vs S&P returns')
ax1.axhline(0, color='k')
ax1.tick_params(
    which='both',  # both major and minor ticks
    bottom=False, top=False, right=False,  # booleans; newer matplotlib no longer accepts 'off'
    labelbottom=False  # labels along the bottom edge are off
)
cl_close.plot(ax=ax2)
ax2.set_title('Price of front month WTI crude')
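ax2.tick_params(which='both', top=False, right=False)
ax2.tick_params(which='minor', bottom=False)
from matplotlib.ticker import MaxNLocator  # explicit import instead of relying on %pylab
ax2.yaxis.set_major_locator(MaxNLocator(5))  # cap the number of y-axis tick intervals at 5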
Alright, fine. So we can distinctly see the regime change starting from the European debt crisis, after oil came back down from $150/bbl. Traders no longer saw high oil prices as a drag on the economy, and instead focused their attention on global demand as we entered a period of slow growth.
Also, for all the recent talk about equity-oil correlation, we actually saw higher correlations in the 2011-2013 period.
So this is an interesting observation. But as data scientists, we must test this hypothesis! If this recent spike in equity/crude correlation is really driven by risk-off sentiment, let's see if there is also much stronger cross-asset correlation in other risk assets. Stay tuned for the next part of this series!