The goal of this demo is to show how ipython notebooks can be used in conjunction with different datasources (eg: Quandl) and useful python libraries (eg: pandas) to do financial analysis. It will try to slowly introduce new and useful functions for the new python user.

Since oil-equity corr has been all the talk these days (this demo was written in Jan 2016), let's take a look at it!

# PythonAnywhere comes pre-installed with Quandl, so you just need to import it
import Quandl

# first, go to quandl.com and search for the ticker symbol that you want
# let's say we want to look at (continuous) front month crude vs e-mini S&Ps

cl = Quandl.get('CHRIS/CME_CL1')
es = Quandl.get('CHRIS/CME_ES1')

# Quandl.get() returns a pandas dataframe, so you can use all the pandas goodies
# For example, you can use tail to look at the most recent data, just like the unix tail binary!
es.tail()

# you can also get statistics
es.describe()

But wait!

What do we have here? Did you notice that the count is different for the different columns?

Let's take a look at what the missing values are:

# select the rows where Open has missing data points
es[es['Open'].isnull()].head()

Hmmm. Time to spend money and buy good data?

Eh. We really only need the daily close here anyways (ie. the settle column). Let's zoom in on that.

es_close = es.Settle  # WHAT IS THIS SORCERY? Attribute access!
es_close.head()

Date
1997-09-09    934
1997-09-10    915
1997-09-11    908
1997-09-12    924
1997-09-15    922
Name: Settle, dtype: float64

print(type(es))
print(type(es_close))

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

Oh ok. A column of a DataFrame is a Series (also a pandas object).

Note that it is still linked to the DataFrame (ie. changing the Series will change the DataFrame as well)

# Okay- time to quickly check the crude time series as well
cl.describe()

# Hmm. That's a lot more counts. Does the crude time series start earlier than e-mini's?
cl.head()

earliest_es_date = es.index[0]

# at first glance, you could just do
cl[earliest_es_date:].head()

# but just in case there is no matching precise date, we can also take the closest date:
closest_row = cl.index.searchsorted(earliest_es_date)
cl_close = cl.iloc[closest_row:].Settle
cl_close.head()

Date
1997-09-09    19.42
1997-09-10    19.42
1997-09-11    19.37
1997-09-12    19.32
1997-09-15    19.27
Name: Settle, dtype: float64

# ok lets just plot this guy
import matplotlib
import matplotlib.pyplot as plt
# use new pretty plots
matplotlib.style.use('ggplot')
# get ipython notebook to show graphs
%pylab inline

es_close.plot()

Populating the interactive namespace from numpy and matplotlib

<matplotlib.axes._subplots.AxesSubplot at 0x7fe1ef10cf28>

That was satisfying- our all too familar S&P chart. Let's try to plot both S&P and oil in the same graph.

plt.figure()
es_close.plot()
cl_close.plot()
plt.yscale('log')

Meh. Okay. LET's ACTUALLY DO SOME MATH!

(... ahem. stats)

es['Settle'].corr(cl['Settle'])

0.34046181767351186

okay... MOAR GRAPHS! I hear you say

import pandas as pd
pd.rolling_corr(es_close, cl_close, window=252).dropna()
# why 252? because that's the number of trading days in a year

Date
2014-11-21   -0.369646
2014-11-24   -0.386641
2014-11-25   -0.404232
2014-11-26   -0.421947
2014-11-28   -0.441608
2014-12-01   -0.457250
2014-12-02   -0.473815
2014-12-03   -0.488969
2014-12-04   -0.502630
2014-12-05   -0.516329
2014-12-08   -0.526947
2014-12-09   -0.536849
2014-12-10   -0.541803
2014-12-11   -0.548337
2014-12-12   -0.549632
2014-12-15   -0.549979
2014-12-16   -0.546815
2014-12-17   -0.551038
2014-12-18   -0.560348
2014-12-19   -0.569132
2014-12-22   -0.577833
2014-12-23   -0.586649
2014-12-24   -0.594898
2014-12-26   -0.603125
2014-12-29   -0.610925
2014-12-30   -0.617783
2014-12-31   -0.622196
2015-01-02   -0.626952
2015-01-05   -0.628422
2015-01-06   -0.627138
                ...   
2016-01-26    0.613887
2016-01-27    0.620853
2016-01-28    0.626633
2016-01-29    0.630797
2016-02-01    0.636777
2016-02-02    0.644177
2016-02-03    0.650006
2016-02-04    0.655308
2016-02-05    0.661402
2016-02-08    0.668422
2016-02-09    0.676663
2016-02-10    0.684008
2016-02-11    0.692094
2016-02-12    0.697270
2016-02-16    0.701433
2016-02-17    0.703887
2016-02-18    0.706600
2016-02-19    0.709578
2016-02-22    0.711551
2016-02-23    0.714437
2016-02-24    0.717001
2016-02-25    0.718281
2016-02-26    0.720613
2016-02-29    0.722511
2016-03-01    0.723214
2016-03-02    0.723336
2016-03-03    0.722921
2016-03-04    0.722644
2016-03-07    0.722566
2016-03-08    0.722830
Name: Settle, dtype: float64

That's weird. You'd expect the first year to drop out (because the rolling correlation window starts after the first year), but it should have started after Sept 1998. Instead it is starting in 2014...

print(len(cl_close))
print(len(es_close))

4646
4738

merged = pd.concat({'es': es_close, 'cl': cl_close}, axis=1)
# maybe this is the culprit?
merged[merged['cl'].isnull()].head()

merged.dropna(how='any', inplace=True)
# BAD DATA BEGONE!
merged[merged['cl'].isnull()]

pd.rolling_corr(merged.es, merged.cl, window=252).dropna().plot()
plt.axhline(0, color='k')

<matplotlib.lines.Line2D at 0x7fe1e2e5fef0>

Brilliant! But this is still quite inconclusive in terms of equity/crude corr. Why? Well we are forgetting about one HUGE HUGE factor affecting correlation here.

# D'oh
import numpy as np
print('Autocorrelation for a random series is {:.3f}'.format(
    pd.Series(np.random.randn(100000)).autocorr())
)
print('But, autocorrelation for S&P is {:3f}'.format(es_close.autocorr()))

Autocorrelation for a random series is -0.003
But, autocorrelation for S&P is 0.998803

So that's why we should look at %-change instead of $-close or $-change...

daily_returns = merged.pct_change()
rolling_correlation = pd.rolling_corr(daily_returns.es, daily_returns.cl, window=252).dropna()
rolling_correlation.plot()
plt.axhline(0, color='k')
title('Rolling 1 yr correlation between Oil and S&P')

<matplotlib.text.Text at 0x7fe1e2c89ba8>

Great. Now this is much more interesting. It is quite clear that the period of higher correlation in oil prices came after 2009. Qualitatively, we know (if you worked in finance back then) that this was the case: previously, extreme high oil prices (over $100/bbl) were seen as a drag on the economy. Nowadays, extreme low oil prices are seen as an indication of weakness in global demand, with oil prices, equity, credit etc all selling off hand in hand when there is risk off sentiment.

Let's plot some pretty graphs to show what we know qualitatively, and make sure our memory was correct.

# vertically split into two subplots, and align x-axis
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
fig.suptitle('Checking our intuition about correlation', fontsize=14, fontweight='bold')
# make space for the title
fig.subplots_adjust(top=0.85)

rolling_correlation.plot(ax=ax1)
ax1.set_title('Rolling correlation of WTI returns vs S&P returns')
ax1.axhline(0, color='k')
ax1.tick_params(
    which='both',  # both major and minor ticks
    bottom='off', top='off', right='off',
    labelbottom='off'  # labels along the bottom edge are off
)

cl_close.plot(ax=ax2)
ax2.set_title('Price of front month WTI crude')
ax2.tick_params(which='both', top='off', right='off')
ax2.tick_params(which='minor', bottom='off')
ax2.yaxis.set_major_locator(MaxNLocator(5)) # how many ticks

Alright, fine. So we can distinctly see the regime change starting from the European debt crisis, when oil came back down from $150/bbl. Traders no longer saw high oil prices as a drag on the economy, and instead focused on their intention on global demand instead as we entered a period of slow growth.

Also, all the recent talk about equity oil correlation, we have actually seen higher correlations in the 2011-2013 period.

So this is an interesting observation. But as data scientists, we must test this hypothesis! If the cause of this recent spike in equity/crude corr is really driven by risk off sentiment, let's see if there is also much stronger cross asset correlation in other risk assets. Stay tuned for the next part of this series!

	Open	High	Low	Last	Change	Settle	Volume	Open Interest
Date
2016-03-02	1976.25	1984.75	1966.25	1982.25	5.5	1983.5	1814091	3008297
2016-03-03	1981.75	1992.50	1974.75	1991.50	7.0	1990.5	1541249	3008594
2016-03-04	1991.00	2007.50	1984.00	1994.75	4.5	1995.0	2232860	3018684
2016-03-07	1994.50	2004.50	1984.50	1999.75	4.0	1999.0	1623905	3012243
2016-03-08	1999.00	2000.25	1976.00	1982.50	18.0	1981.0	1928239	3005808

	Open	High	Low	Last	Change	Settle	Volume	Open Interest
count	4723.000000	4736.000000	4738.000000	4738.000000	512.000000	4738.000000	4738.000000	4738.000000
mean	1313.871427	1325.847867	1305.396581	1316.367885	13.185547	1316.353480	1076569.155129	1451989.689743
std	310.180613	312.158070	311.821112	312.201487	12.239507	312.167612	956268.606849	1159894.330244
min	674.750000	694.750000	665.750000	676.000000	0.250000	676.000000	0.000000	0.000000
25%	1109.250000	1117.000000	1102.500000	1109.750000	4.000000	1109.750000	182620.250000	193737.500000
50%	1268.500000	1277.875000	1259.500000	1269.500000	9.500000	1269.500000	857774.500000	1351206.000000
75%	1437.000000	1449.500000	1429.000000	1438.687500	19.562500	1438.687500	1728705.250000	2678325.750000
max	2129.250000	2134.000000	2122.750000	2128.750000	100.250000	2128.000000	6285917.000000	3594453.000000

	Open	High	Low	Last	Change	Settle	Volume	Open Interest
Date
2015-11-17	NaN	2063.50	2041.50	2049.75	1.00	2049.0	1610071	2803541
2015-11-27	NaN	2098.25	2081.50	2090.50	2.00	2090.0	653079	2761335
2015-12-01	NaN	2101.50	2083.50	2099.25	20.25	2100.0	1479676	2764688
2015-12-02	NaN	2105.00	2075.00	2083.50	18.50	2081.5	1709808	2759024
2015-12-09	NaN	2079.75	2034.25	2045.25	16.75	2042.0	2660114	2624311

	Open	High	Low	Last	Change	Settle	Volume	Open Interest
count	8273.000000	8275.000000	8275.000000	8275.000000	518.000000	8275.000000	8275.000000	8274.000000
mean	41.777084	42.333422	41.186375	41.777544	1.025753	41.777361	112520.504411	125494.688301
std	29.568643	29.946338	29.134537	29.559547	0.871786	29.559366	123411.769025	110104.481053
min	10.000000	11.020000	9.750000	10.420000	0.010000	10.420000	0.000000	0.000000
25%	19.600000	19.790000	19.400000	19.610000	0.400000	19.610000	30618.000000	46379.500000
50%	28.300000	28.650000	27.970000	28.320000	0.810000	28.320000	58659.000000	87553.000000
75%	61.290000	62.150000	60.485000	61.365000	1.470000	61.365000	166439.000000	176041.500000
max	145.190000	147.270000	143.220000	145.290000	7.540000	145.290000	824242.000000	608831.000000

	Open	High	Low	Last	Change	Settle	Volume	Open Interest
Date
1983-03-30	29.01	29.56	29.01	29.40	NaN	29.40	949	470
1983-03-31	29.40	29.60	29.25	29.29	NaN	29.29	521	523
1983-04-04	29.30	29.70	29.29	29.44	NaN	29.44	156	583
1983-04-05	29.50	29.80	29.50	29.71	NaN	29.71	175	623
1983-04-06	29.90	29.92	29.65	29.90	NaN	29.90	392	640

	Open	High	Low	Last	Change	Settle	Volume	Open Interest
Date
1997-09-09	19.43	19.61	19.37	19.42	NaN	19.42	32299	88070
1997-09-10	19.57	19.57	19.35	19.42	NaN	19.42	41858	86872
1997-09-11	19.49	19.72	19.30	19.37	NaN	19.37	52342	80434
1997-09-12	19.42	19.47	19.27	19.32	NaN	19.32	28540	80440
1997-09-15	19.29	19.38	19.23	19.27	NaN	19.27	31610	76590

	cl	es
Date
1997-11-27	NaN	959.50
1997-11-28	NaN	955.00
1998-01-19	NaN	972.25
1998-02-16	NaN	1019.00
1998-05-25	NaN	1116.50

Jupyter notebooks finance demo