''' Expected return (ordinary sample mean)

Variance
Var(X) = (1/N) * sum_{i=1..N} (x_i - mean_x)^2
(use N-1 instead of N for small samples, see the rule under Covariance)

Standard deviation = sqrt(variance)

Covariance
Use N or N-1 depending on the sample size:
- If the sample is large (> 100), use N
- Otherwise, use N-1
q_{x,y} = Cov(x, y) = (1/(N-1)) * sum_{i=1..N} (x_i - mean_x) * (y_i - mean_y)
NOTE: Cov(x, x) = Var(x)

Empirical correlation
Correlation = normalized covariance
r_{x,y} = Cov(x, y) / (sqrt(Var(x)) * sqrt(Var(y)))
The correlation takes values between -1 and 1:
- Correlation close to 1: strong positive relationship between the variables.
- Correlation equal to 0: no linear relationship between the variables.
- Correlation close to -1: strong negative relationship between the variables.

Linear regression
y_i = B1 + B2 * x_i + e_i

Coefficient of determination
Note on the adjusted coefficient of determination:
1. It penalizes the addition of extra predictors to the model.
2. The adjusted coefficient is smaller than the (unadjusted) coefficient of determination.
R^2 takes values between 0 and 1:
- Coefficient close to 1: strong relationship between the variables.
- Coefficient close to 0: no linear relationship between the variables.
R^2 = 1 - sum(e_i^2) / sum((y_i - mean_y)^2)
OR, in the case of a simple linear regression:
R^2 = r_{x,y}^2 = (empirical correlation)^2
OR, in the case of a multiple linear regression:
the coefficient of determination equals the squared correlation with the fitted values:
R^2 = r_{y_hat,y}^2
r^2 = squared correlation coefficient = coefficient of determination
Coefficient close to 1: strong relationship between the variables, because it means the SSR is small, so the explanatory variables account for almost all of the variation in y.
Coefficient close to 0: no linear relationship between the variables.
SSR = sum of squared residuals. The smaller the SSR, the better the fit.
Residuals = the part of y not explained by the explanatory variables (e_i = y_i - y_hat_i).

Two-sided statistical tests:
- Z-statistic
  Z_{B2} = (B2_hat - B2*) / SE(B2_hat), where B2* is the value under the null
  SE(B2_hat) = sqrt(sigma^2 / sum((x_i - mean_x)^2)), with sigma^2 the variance of the error term
  Choose a significance level alpha.
  Find the normal critical value for that significance level.
  Reject the null if |Z_{B2}| > z_crit.
  Fluctuation interval and z_crit:
  60%: 0.84
  80%: 1.28
  90%: 1.64
  95%: 1.96
  99%: 2.58
- T-statistic
  Compute the standard error of the estimator: SE(B2_hat) = sqrt(sigma^2 / sum((x_i - mean_x)^2))
  Compute the t-statistic: t = (B2_hat - B2*) / SE(B2_hat)
  Choose a significance level.
  Reject the null if |t| > t_crit.
- F-statistic
  The F-test requires the estimation of two models, unrestricted and restricted:
  F = ((SSR_r - SSR_u) / SSR_u) * ((N - K - 1) / K*)
  where K* is the number of restrictions and K the number of explanatory variables in the unrestricted model.

Interpret the correlation coefficient. Does it surprise you that the correlation coefficient is negative?
It is not surprising. One explanation could be that more police implies that criminals prefer committing crimes elsewhere.

The regression equation
Intercept (B1): B1_hat = mean_y - B2_hat * mean_x
Slope (B2): B2_hat = sum((x_i - mean_x) * (y_i - mean_y)) / sum((x_i - mean_x)^2) = Cov(x, y) / Var(x)
Therefore B2_hat = covariance / variance.
The intercept is the expected sales revenue when there is no advertising expense.
Computing the intercept: the revenue is expected to equal 1.5 million dollars on average when no advertising expenses are paid.
The sales revenue is expected to increase by 2.2 million when the advertising expense is increased by one million dollars.
The expected sales revenue amounts to 8.1 million for an expense of 3 million.
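A minimal sketch of the formulas above in plain Python, using a small made-up advertising/revenue dataset (the numbers below are hypothetical illustrations, not the course example; only the formulas come from these notes):

import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]        # advertising expense (million dollars, invented)
y = [3.9, 6.2, 7.8, 10.5, 12.4]      # sales revenue (million dollars, invented)
N = len(x)

mean_x = sum(x) / N
mean_y = sum(y) / N

# Small sample (N <= 100), so divide by N - 1.
var_x = sum((xi - mean_x) ** 2 for xi in x) / (N - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (N - 1)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (N - 1)

# Empirical correlation = normalized covariance, between -1 and 1.
r_xy = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

# OLS estimates: slope = Cov(x, y) / Var(x), intercept = mean_y - slope * mean_x.
b2_hat = cov_xy / var_x
b1_hat = mean_y - b2_hat * mean_x

# Residuals, SSR and R^2 = 1 - SSR / total sum of squares.
residuals = [yi - (b1_hat + b2_hat * xi) for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)
sst = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ssr / sst

# Two-sided t-test of the null hypothesis B2* = 0.
sigma2_hat = ssr / (N - 2)                      # error variance estimate
se_b2 = math.sqrt(sigma2_hat / sum((xi - mean_x) ** 2 for xi in x))
t_stat = (b2_hat - 0.0) / se_b2                 # reject the null if |t| > t_crit

print(f"r = {r_xy:.3f}, b1 = {b1_hat:.3f}, b2 = {b2_hat:.3f}, "
      f"R^2 = {r_squared:.3f}, t = {t_stat:.2f}")

Because this is a simple linear regression, the printed R^2 equals the squared correlation r_{x,y}^2, and the slope equals Cov(x, y) / Var(x), as stated above.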
Assumptions of the linear regression model:
1. Linearity: the variables are linearly related.
2. Strict exogeneity: the errors cannot be explained by the explanatory variables.
   This assumption holds if the errors are independent of the explanatory variables.
3. Homoskedasticity: the variance of the error term is constant.
4. Distribution: the error term is normally distributed.

Properties of an estimator: the ordinary least squares (OLS) estimator is
- Unbiased
- Consistent
- Efficient

Definitions
A dummy variable is a variable which takes on the value 0 or 1.
A categorical variable is a variable which takes a finite number d of possible values.
An interaction variable is a variable that is the product of two (or more) variables.
Multicollinearity is the presence of an explanatory variable that is (almost) perfectly correlated with one (or several) other explanatory variable(s).

Several clues that indicate problems with multicollinearity (a simple correlation-based diagnostic is sketched after these notes):
1. An independent variable known to be an important predictor ends up being insignificant.
2. A parameter that should have a positive sign turns out to be negative, or vice versa.
3. When an independent variable is added or removed, there is a drastic change in the values of the remaining coefficients.
'''
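# A minimal sketch, in plain Python, of the definitions and multicollinearity
# warning signs above. All variable values are made up purely for illustration.
import math

def corr(a, b):
    """Empirical correlation: Cov(a, b) / (sd(a) * sd(b))."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)
    va = sum((ai - ma) ** 2 for ai in a) / (n - 1)
    vb = sum((bi - mb) ** 2 for bi in b) / (n - 1)
    return cov / (math.sqrt(va) * math.sqrt(vb))

# Hypothetical explanatory variables.
income = [20.0, 35.0, 50.0, 65.0, 80.0]      # in thousand dollars
income_eur = [xi * 0.9 for xi in income]     # same information, other currency
is_urban = [0, 1, 1, 0, 1]                   # dummy variable: takes the value 0 or 1

# Interaction variable: the product of two explanatory variables.
income_x_urban = [xi * di for xi, di in zip(income, is_urban)]

# Clue for multicollinearity: two regressors that are (almost) perfectly
# correlated. Here income and income_eur carry the same information up to a
# scale factor, so their correlation is 1 and only one of them should be kept.
r = corr(income, income_eur)
if abs(r) > 0.95:
    print(f"correlation {r:.2f}: drop one of the two regressors")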