You Are What You Decide:
A Journey in Automation of Our Selves
OK. Out of the box, Python is not convenient for statistical data analysis, because every time you want to do mathematical operations you have to import a number of packages: NumPy, SciPy, Pandas, scikit-learn, StatsModels, SymPy, etc. Follow this guide to create your own package of imports. I have started one that suits me, the stt package, which you can use too once you have installed everything from http://ipython.org/. When that is done, just install it:
```shell
pip install -U stt
```

Then, whenever you need statistical capabilities inside Python, run

```python
from stt import *
```

and it will import the things most often needed in statistical analysis. You can see exactly what it imports by running

```python
from stt.verbose import *
```
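The idea behind such a package can be sketched as an ordinary package whose `__init__.py` does the imports and lists them in `__all__`, so that one star import brings them all into scope. The package name `mystats` is hypothetical, and only NumPy and pandas are included; this is not the actual contents of stt. The sketch writes the package to a temporary directory so it runs without installing anything.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Contents of a hypothetical mystats/__init__.py: import once, reuse everywhere
INIT_PY = textwrap.dedent("""
    import numpy as np
    import pandas as pd

    # __all__ controls what "from mystats import *" brings into scope
    __all__ = ["np", "pd"]
""")

# Write the package into a throwaway directory
pkg_root = tempfile.mkdtemp()
os.makedirs(os.path.join(pkg_root, "mystats"))
with open(os.path.join(pkg_root, "mystats", "__init__.py"), "w") as f:
    f.write(INIT_PY)

# Make the directory importable and let the import system notice the new files
sys.path.insert(0, pkg_root)
importlib.invalidate_caches()

from mystats import *  # noqa: E402,F403  (brings np and pd into scope)

print(np.__name__, pd.__name__)
```

In a real package you would add lines such as `import scipy.stats as stats` or `import statsmodels.formula.api as smf` and extend `__all__` to match.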
Anyway, once it is installed, I can work in Python almost as I would in R.
So, open IPython with the command below, and let's try:

```shell
# older IPython syntax; newer setups use `jupyter qtconsole` plus %matplotlib inline
ipython qtconsole --pylab inline
```
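```python
# imports: stt is expected to bring pd, smf, plt, rcParams, etc. into scope
from stt import *

# read data; column 0 (Date) is parsed as datetimes and used as the index
df = pd.read_csv('http://www.quandl.com/api/v1/datasets/GOOG/NASDAQ_GOOG.csv?sort_order=asc',
                 parse_dates=[0], index_col=0)
```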
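To try the `parse_dates`/`index_col` options without network access, here is a self-contained sketch of the same `read_csv` call on an in-memory CSV. It assumes a `Date,Open,High,Low,Close,Volume` header with the date in the first column; the rows are made up for illustration and are not real GOOG prices.

```python
import io

import pandas as pd

# Made-up rows in an assumed Date,Open,High,Low,Close,Volume layout
csv_text = """Date,Open,High,Low,Close,Volume
2013-01-02,100.0,102.0,99.5,101.5,1000
2013-01-03,101.5,103.0,101.0,102.0,1200
2013-01-04,102.0,102.5,100.0,100.5,900
"""

# parse_dates=[0] converts column 0 to datetimes; index_col=0 makes it the index
df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], index_col=0)
print(df_demo.head())
```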
```python
df.head()
df.loc['2013-01-01':'2013-01-10', ['Open', 'Close']]   # rows by date, then columns
```
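Label-based slicing on a sorted `DatetimeIndex` includes both endpoints, so a 2013-01-01 to 2013-01-10 slice spans ten calendar days (trading data will have fewer rows, since weekends are missing). A minimal check on a made-up daily frame:

```python
import pandas as pd

# Hypothetical daily prices on a DatetimeIndex (made-up values)
idx = pd.date_range("2013-01-01", periods=15, freq="D")
prices = pd.DataFrame({"Open": range(15), "Close": range(100, 115)}, index=idx)

# Both endpoints are included when slicing by date labels
window = prices.loc["2013-01-01":"2013-01-10", ["Open", "Close"]]
print(len(window))
```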
```python
df.describe()
s = df.describe().T   # transpose: one row per column, one column per statistic
s
```
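```python
s['skew'] = df.skew()            # add skewness, aligned on the column names
df['Id'] = range(df.shape[0])    # row counter 0..n-1, used as the regressor below
```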
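A quick sketch of the same summary table on synthetic data (random values, purely illustrative), showing that the `skew` Series lines up with the rows of the transposed `describe()` output by column name:

```python
import numpy as np
import pandas as pd

# Synthetic columns standing in for the price data
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Open": rng.normal(100, 5, 50),
                     "Close": rng.exponential(2.0, 50)})

summary = demo.describe().T        # rows: Open, Close; columns: count, mean, std, ...
summary["skew"] = demo.skew()      # Series indexed by column name, so it aligns row-wise
demo["Id"] = np.arange(len(demo))  # same as range(df.shape[0])
print(summary[["mean", "std", "skew"]])
```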
```python
model = smf.ols('Close ~ Id', df).fit()   # regress Close on a linear time trend
model.summary()
```
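The formula `Close ~ Id` asks statsmodels for an ordinary least squares fit of Close = b0 + b1 * Id + error, with the intercept added automatically. With a single regressor the estimates have a closed form: b1 = cov(Id, Close) / var(Id) and b0 = mean(Close) - b1 * mean(Id). The NumPy sketch below computes them on a synthetic trend-plus-noise series (made-up data) and cross-checks them against `np.polyfit`; on the same data they should match the `Intercept` and `Id` coefficients in `model.summary()`.

```python
import numpy as np

# Synthetic series: x plays the role of Id, y the role of Close
rng = np.random.default_rng(1)
x = np.arange(100, dtype=float)
y = 50.0 + 0.3 * x + rng.normal(0, 2.0, x.size)

# Closed-form OLS for y = b0 + b1 * x
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# Cross-check with NumPy's degree-1 least-squares fit (returns slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
print(b0, b1)
```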
```python
model.predict(df)
df['EY'] = model.predict(df)                # fitted values, the expected Y
df['Residuals'] = df['Close'] - df['EY']    # observed minus fitted
```
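Because the model includes an intercept, the OLS residuals sum to zero and are orthogonal to the regressor, up to floating-point error. That makes a handy sanity check on a residuals column; here it is on synthetic data with NumPy:

```python
import numpy as np

# Synthetic data (made up for illustration)
rng = np.random.default_rng(2)
x = np.arange(60, dtype=float)
y = 10.0 + 0.5 * x + rng.normal(0, 1.0, x.size)

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x   # analogue of df['EY']
residuals = y - fitted           # analogue of df['Residuals']

# Both quantities should be numerically zero
print(residuals.sum(), residuals @ x)
```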
```python
df[['EY', 'Close']].plot()
df['Residuals'].plot()
```
Both plots side by side in one figure:
```python
rcParams['figure.figsize'] = 12, 5
fig, axes = plt.subplots(ncols=2)
plot1 = df[['EY', 'Close']].plot(ax=axes[0], title='Regression Line')
plot2 = df['Residuals'].plot(ax=axes[1], title='Residuals')
```
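For reference, a self-contained version of the two-panel figure on a synthetic random-walk series (made-up data), fitted with `np.polyfit` instead of statsmodels so it needs only NumPy, pandas and Matplotlib. It passes `figsize` to `plt.subplots` rather than changing `rcParams`, which would affect every later figure, and selects the headless `Agg` backend so it runs outside IPython:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic price-like series with a linear trend fitted to it
rng = np.random.default_rng(3)
demo = pd.DataFrame({"Close": 100 + np.cumsum(rng.normal(0, 1, 80))})
demo["Id"] = np.arange(len(demo))
slope, intercept = np.polyfit(demo["Id"], demo["Close"], 1)
demo["EY"] = intercept + slope * demo["Id"]
demo["Residuals"] = demo["Close"] - demo["EY"]

# Two panels in one figure; figsize applies to this figure only
fig, axes = plt.subplots(ncols=2, figsize=(12, 5))
demo[["EY", "Close"]].plot(ax=axes[0], title="Regression Line")
demo["Residuals"].plot(ax=axes[1], title="Residuals")
fig.tight_layout()
```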