10. Hypothesis Testing

import pandas as pd
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
%matplotlib inline
# Import microsoft.csv and add a new feature: logReturn
ms = pd.read_csv('data/microsoft.csv',index_col=0)
ms.index=pd.to_datetime(ms.index)
ms['logReturn'] = np.log(ms['Close'].shift(-1)) - np.log(ms['Close'])
# Log return goes up and down during the period
ms['logReturn'].plot(figsize=(20, 8))
plt.axhline(0, color='red')
plt.show()
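
In symbols, the logReturn column built above holds, for each date t, the log return to the next trading day. The symbols r_t and P_t (the closing price on date t) are notation introduced here for illustration, not taken from the original:

```latex
r_t = \ln P_{t+1} - \ln P_t = \ln\frac{P_{t+1}}{P_t}
```

Because of shift(-1), the last row has no next-day price, so its logReturn is NaN.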

Steps involved in testing a claim by hypothesis testing

Step 1: Set hypothesis

H0: the average stock return is 0

H1: the average stock return is not equal to 0

Step 2: Calculate test statistic

sample_mean = ms['logReturn'].mean()
sample_std = ms['logReturn'].std(ddof=1)
n = ms['logReturn'].shape[0]  # row count; includes the last row, whose logReturn is NaN

# if the sample size n is large enough, we can use the z-distribution instead of the t-distribution
# mu = 0 under the null hypothesis
zhat = (sample_mean - 0)/(sample_std/n**0.5)
print(zhat)

1.6141477140003675
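
Written out, the statistic computed above is the standardized sample mean. Here \bar{x} stands for sample_mean, s for sample_std, n for the sample size, and \mu_0 = 0 for the mean under H0; these symbols are notation added here for illustration:

```latex
\hat{z} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \mu_0 = 0
```

For large n, \hat{z} is approximately standard normal under H0, which is why the z-distribution can stand in for the t-distribution.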

Step 3: Set decision criteria

# significance level
alpha = 0.05

zleft = norm.ppf(alpha/2, 0, 1)
zright = -zleft  # z-distribution is symmetric 
print(zleft, zright)

-1.95996398454 1.95996398454
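
As a cross-check on the critical values above, here is a minimal sketch that uses Python's standard-library statistics.NormalDist instead of scipy.stats.norm. The helper name two_sided_critical_values is hypothetical, introduced only for this example. It also confirms that the area between the two critical values is 1 - alpha.

```python
from statistics import NormalDist

def two_sided_critical_values(alpha):
    """Return (zleft, zright) cutting off alpha/2 in each tail of N(0, 1)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    zleft = std_normal.inv_cdf(alpha / 2)
    zright = -zleft  # the standard normal is symmetric around 0
    return zleft, zright

zleft, zright = two_sided_critical_values(0.05)

# The probability mass between the critical values should equal 1 - alpha
coverage = NormalDist().cdf(zright) - NormalDist().cdf(zleft)
print(zleft, zright, coverage)
```

Here inv_cdf plays the role of norm.ppf in the cell above.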

Step 4: Make decision - shall we reject H0?

print('At significance level of {}, shall we reject: {}'.format(alpha, zhat>zright or zhat<zleft))

At significance level of 0.05, shall we reject: False
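
Steps 2 to 4 can be bundled into one function. The sketch below is illustrative only: z_test_two_sided is a hypothetical helper, and the two samples are made-up numbers rather than the Microsoft returns. It uses the standard library so that it runs without the CSV file.

```python
from statistics import NormalDist, mean, stdev

def z_test_two_sided(sample, mu0=0.0, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0. Returns (zhat, reject)."""
    n = len(sample)
    # stdev() is the sample standard deviation (ddof=1), as in step 2
    zhat = (mean(sample) - mu0) / (stdev(sample) / n ** 0.5)
    zright = NormalDist().inv_cdf(1 - alpha / 2)
    return zhat, abs(zhat) > zright

# Hypothetical data: 200 values centered exactly on 0
centered = [-0.01, 0.01] * 100
# Same spread, but the mean is moved to 0.005
shifted = [x + 0.005 for x in centered]

z0, reject0 = z_test_two_sided(centered)
z1, reject1 = z_test_two_sided(shifted)
print(z0, reject0)
print(z1, reject1)
```

The centered sample gives a statistic of 0 and H0 is not rejected. The shifted sample gives a statistic of about 7, far beyond the critical value, so H0 is rejected.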

Try a one-tailed test yourself! Here H0 is that the average stock return is at most 0, and H1 is that it is greater than 0.

# step 2
sample_mean = ms['logReturn'].mean()
sample_std = ms['logReturn'].std(ddof=1)
n = ms['logReturn'].shape[0]

# if the sample size n is large enough, we can use the z-distribution instead of the t-distribution
# mu = 0 under the null hypothesis
zhat = (sample_mean - 0)/(sample_std/n**0.5)
print(zhat)

Expected output: 1.6141477140003675

# step 3
alpha = 0.05

zright = norm.ppf(1-alpha, 0, 1)
print(zright)

Expected output: 1.64485362695

# step 4
print('At significance level of {}, shall we reject: {}'.format(alpha, zhat>zright))

Expected output: At significance level of 0.05, shall we reject: False
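
The right-tailed decision rule can be sketched as a small function. reject_right_tailed is a hypothetical helper name, and the statistic is hard-coded from the step 2 output so the example runs without the data file. It assumes the alternative hypothesis is that the mean return is greater than 0.

```python
from statistics import NormalDist

def reject_right_tailed(zhat, alpha=0.05):
    """Right-tailed test: reject H0 when zhat exceeds the (1 - alpha) quantile."""
    zright = NormalDist().inv_cdf(1 - alpha)
    return zright, zhat > zright

zhat = 1.6141477140003675  # statistic printed in step 2
zright, reject = reject_right_tailed(zhat)
print(zright, reject)
```

All of alpha sits in the right tail, so the critical value (about 1.645) is lower than the two-tailed one (about 1.960). The statistic still falls short of it, so H0 is not rejected.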

An alternative method: the p-value (for the one-tailed test above)

# step 3 (p-value)
p = 1 - norm.cdf(zhat, 0, 1)
print(p)

0.053247694997

# step 4
print('At significance level of {}, shall we reject: {}'.format(alpha, p < alpha))

At significance level of 0.05, shall we reject: False
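
For comparison, the sketch below computes both the right-tailed and the two-sided p-value for the same statistic. p_values is a hypothetical helper, and the standard-library NormalDist stands in for scipy.stats.norm. The two-sided p-value doubles the tail area because extreme values in either direction count against H0.

```python
from statistics import NormalDist

def p_values(zhat):
    """Return (right-tailed, two-sided) p-values for a z statistic."""
    std_normal = NormalDist()
    p_right = 1 - std_normal.cdf(zhat)
    p_two_sided = 2 * (1 - std_normal.cdf(abs(zhat)))
    return p_right, p_two_sided

alpha = 0.05
zhat = 1.6141477140003675  # statistic printed in step 2
p_right, p_two = p_values(zhat)
print(p_right, p_right < alpha)
print(p_two, p_two < alpha)
```

Comparing the p-value with alpha gives the same decision as comparing zhat with the critical value: neither p-value is below 0.05, so H0 is not rejected in either test.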