''' Expected return (ordinary sample mean)

Variance
Var(X) = (1/N) * sum_{i=1..N} (x_i - mean_x)^2
(use N-1 instead of N for small samples, see the rule under Covariance)

Standard deviation = sqrt(variance)

Covariance
Use N or N-1 depending on the sample size:
- If the sample is large (> 100), use N
- Otherwise, use N-1
q_{x,y} = Cov(x, y) = (1/(N-1)) * sum_{i=1..N} (x_i - mean_x) * (y_i - mean_y)
NOTE: Cov(x, x) = Var(x)

Empirical correlation
Correlation = normalized covariance
r_{x,y} = Cov(x, y) / (sqrt(Var(x)) * sqrt(Var(y)))
The correlation takes values between -1 and 1:
- Correlation close to 1: strong positive relationship between the variables.
- Correlation equal to 0: no linear relationship between the variables.
- Correlation close to -1: strong negative relationship between the variables.

Linear regression
y_i = B1 + B2 * x_i + e_i

Coefficient of determination
Note on the adjusted coefficient of determination:
1. It penalizes the addition of extra predictors to the model.
2. The adjusted coefficient is smaller than the (unadjusted) coefficient of determination.
R^2 takes values between 0 and 1:
- Coefficient close to 1: strong relationship between the variables.
- Coefficient close to 0: no linear relationship between the variables.
R^2 = 1 - sum(e_i^2) / sum((y_i - mean_y)^2)
OR, in the case of a simple linear regression:
R^2 = r_{x,y}^2 = (empirical correlation)^2
OR, in the case of a multiple linear regression:
the coefficient of determination equals the squared correlation with the fitted values:
R^2 = r_{y_hat,y}^2
r^2 = squared correlation coefficient = coefficient of determination
Coefficient close to 1: strong relationship between the variables, because it means the SSR is small, so the explanatory variables account for almost all of the variation in y.
Coefficient close to 0: no linear relationship between the variables.
SSR = sum of squared residuals. The smaller the SSR, the better the fit.
Residuals = the part of y not explained by the explanatory variables (e_i = y_i - y_hat_i).

Two-sided statistical tests:
- Z-statistic
  Z_{B2} = (B2_hat - B2*) / SE(B2_hat), where B2* is the value under the null
  SE(B2_hat) = sqrt(sigma^2 / sum((x_i - mean_x)^2)), with sigma^2 the variance of the error term
  Choose a significance level alpha.
  Find the normal critical value for that significance level.
  Reject the null if |Z_{B2}| > z_crit.
  Fluctuation interval and z_crit:
  60%: 0.84
  80%: 1.28
  90%: 1.64
  95%: 1.96
  99%: 2.58
- T-statistic
  Compute the standard error of the estimator: SE(B2_hat) = sqrt(sigma^2 / sum((x_i - mean_x)^2))
  Compute the t-statistic: t = (B2_hat - B2*) / SE(B2_hat)
  Choose a significance level.
  Reject the null if |t| > t_crit.
- F-statistic
  The F-test requires the estimation of two models, unrestricted and restricted:
  F = ((SSR_r - SSR_u) / SSR_u) * ((N - K - 1) / K*)
  where K* is the number of restrictions and K the number of explanatory variables in the unrestricted model.

Interpret the correlation coefficient. Does it surprise you that the correlation coefficient is negative?
It is not surprising. One explanation could be that more police implies that criminals prefer committing crimes elsewhere.

The regression equation
Intercept (B1): B1_hat = mean_y - B2_hat * mean_x
Slope (B2): B2_hat = sum((x_i - mean_x) * (y_i - mean_y)) / sum((x_i - mean_x)^2) = Cov(x, y) / Var(x)
Therefore B2_hat = covariance / variance.
The intercept is the expected sales revenue when there is no advertising expense.
Computing the intercept: the revenue is expected to equal 1.5 million dollars on average when no advertising expenses are paid.
The sales revenue is expected to increase by 2.2 million when the advertising expense is increased by one million dollars.
The expected sales revenue amounts to 8.1 million for an expense of 3 million.
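A minimal sketch of the formulas above in plain Python, using a small made-up advertising/revenue dataset (the numbers below are hypothetical illustrations, not the course example; only the formulas come from these notes):

import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]        # advertising expense (million dollars, invented)
y = [3.9, 6.2, 7.8, 10.5, 12.4]      # sales revenue (million dollars, invented)
N = len(x)

mean_x = sum(x) / N
mean_y = sum(y) / N

# Small sample (N <= 100), so divide by N - 1.
var_x = sum((xi - mean_x) ** 2 for xi in x) / (N - 1)
var_y = sum((yi - mean_y) ** 2 for yi in y) / (N - 1)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (N - 1)

# Empirical correlation = normalized covariance, between -1 and 1.
r_xy = cov_xy / (math.sqrt(var_x) * math.sqrt(var_y))

# OLS estimates: slope = Cov(x, y) / Var(x), intercept = mean_y - slope * mean_x.
b2_hat = cov_xy / var_x
b1_hat = mean_y - b2_hat * mean_x

# Residuals, SSR and R^2 = 1 - SSR / total sum of squares.
residuals = [yi - (b1_hat + b2_hat * xi) for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)
sst = sum((yi - mean_y) ** 2 for yi in y)
r_squared = 1 - ssr / sst

# Two-sided t-test of the null hypothesis B2* = 0.
sigma2_hat = ssr / (N - 2)                      # error variance estimate
se_b2 = math.sqrt(sigma2_hat / sum((xi - mean_x) ** 2 for xi in x))
t_stat = (b2_hat - 0.0) / se_b2                 # reject the null if |t| > t_crit

print(f"r = {r_xy:.3f}, b1 = {b1_hat:.3f}, b2 = {b2_hat:.3f}, "
      f"R^2 = {r_squared:.3f}, t = {t_stat:.2f}")

Because this is a simple linear regression, the printed R^2 equals the squared correlation r_{x,y}^2, and the slope equals Cov(x, y) / Var(x), as stated above.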
Assumptions of the linear regression model:
1. Linearity: the variables are linearly related.
2. Strict exogeneity: the errors cannot be explained by the explanatory variables.
   This assumption holds if the errors are independent of the explanatory variables.
3. Homoskedasticity: the variance of the error term is constant.
4. Distribution: the error term is normally distributed.

Properties of an estimator: the ordinary least squares (OLS) estimator is
- Unbiased
- Consistent
- Efficient

Definitions
A dummy variable is a variable which takes on the value 0 or 1.
A categorical variable is a variable which takes a finite number d of possible values.
An interaction variable is a variable that is the product of two (or more) variables.
Multicollinearity is the presence of an explanatory variable that is (almost) perfectly correlated with one (or several) other explanatory variable(s).

Several clues that indicate problems with multicollinearity (a simple correlation-based diagnostic is sketched after these notes):
1. An independent variable known to be an important predictor ends up being insignificant.
2. A parameter that should have a positive sign turns out to be negative, or vice versa.
3. When an independent variable is added or removed, there is a drastic change in the values of the remaining coefficients.
'''
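# A minimal sketch, in plain Python, of the definitions and multicollinearity
# warning signs above. All variable values are made up purely for illustration.
import math

def corr(a, b):
    """Empirical correlation: Cov(a, b) / (sd(a) * sd(b))."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)
    va = sum((ai - ma) ** 2 for ai in a) / (n - 1)
    vb = sum((bi - mb) ** 2 for bi in b) / (n - 1)
    return cov / (math.sqrt(va) * math.sqrt(vb))

# Hypothetical explanatory variables.
income = [20.0, 35.0, 50.0, 65.0, 80.0]      # in thousand dollars
income_eur = [xi * 0.9 for xi in income]     # same information, other currency
is_urban = [0, 1, 1, 0, 1]                   # dummy variable: takes the value 0 or 1

# Interaction variable: the product of two explanatory variables.
income_x_urban = [xi * di for xi, di in zip(income, is_urban)]

# Clue for multicollinearity: two regressors that are (almost) perfectly
# correlated. Here income and income_eur carry the same information up to a
# scale factor, so their correlation is 1 and only one of them should be kept.
r = corr(income, income_eur)
if abs(r) > 0.95:
    print(f"correlation {r:.2f}: drop one of the two regressors")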