Definitions
Define Regression.
The term ‘regression’ was first used in 1877 by Francis Galton while studying the relationship between the heights of fathers and sons. He found that the average height of children born of parents of a given height tended to move, or “regress”, toward the average height in the population as a whole. Galton’s law of universal regression was confirmed by his friend Karl Pearson, who collected more than a thousand records of heights of members of family groups. The literal meaning of the word “regression” is “stepping back towards the average”.
Regression is a statistical method used to predict the value of one variable from the known value of another.
Dependent Variable (Y)
Variable being predicted.
Independent Variable (X)
Variable used for prediction.
Regression Equations
An equation expressing one variable in terms of the other, used for prediction.
Fitting a Regression Line:
Finding the straight line that best represents the relationship between X and Y using the given sample data.
Scatter Diagram:
A graphical representation of paired data (X, Y).
Each pair is plotted as a point.
Formulae
The best-fit line is the one that minimises the sum of the squared residuals:
\[S^2=\sum(y_i-\hat{y}_i)^2\]
Residual: \[e_i=y_i-\hat{y}_i\]
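To make the least-squares criterion concrete, here is a minimal Python sketch. The helper names `residuals` and `sum_sq_residuals` are illustrative, not from any library, and the data X = 1, 2, 3 and Y = 2, 1, 6 are assumed for the demonstration. It computes the residuals and S² for a candidate line ŷ = a + bx and checks that moving a or b away from ŷ = −1 + 2x (the least-squares line for these three points) only makes S² larger.

```python
def residuals(x, y, a, b):
    """e_i = y_i - y_hat_i, where y_hat_i = a + b*x_i is the candidate line."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def sum_sq_residuals(x, y, a, b):
    """S^2 = sum of the squared residuals; the best-fit line minimises this."""
    return sum(e ** 2 for e in residuals(x, y, a, b))


x = [1, 2, 3]  # illustrative data
y = [2, 1, 6]

best = sum_sq_residuals(x, y, a=-1, b=2)  # least-squares line y = -1 + 2x
# Nudging the intercept or the slope away from the least-squares values
# should only increase S^2.
for da, db in [(0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]:
    assert sum_sq_residuals(x, y, a=-1 + da, b=2 + db) > best

print(residuals(x, y, a=-1, b=2), best)  # [1, -2, 1] 6
```

For these points the residuals of ŷ = −1 + 2x are 1, −2 and 1, so S² = 6, while the four nudged lines give S² = 6.75 (intercept changed) or 9.5 (slope changed).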
\[Y=a+bX\]
where b = bYX is the regression coefficient of Y on X.
\[b_{YX}=\frac{\operatorname{cov}(X,Y)}{\operatorname{var}(X)}\]
\[=\frac{\frac{\sum\left(x_i-\overline{x}\right)\left(y_i-\overline{y}\right)}{n}}{\frac{\sum\left(x_i-\overline{x}\right)^2}{n}}\]
\[=\frac{\sum x_iy_i-n\bar{x}\bar{y}}{\sum x_i^2-n\bar{x}^2}\]
\[a=\bar{y}-b\bar{x}\]
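The Y-on-X formulas can be checked with a short Python sketch. It assumes the illustrative data X = 1, 2, 3 and Y = 2, 1, 6; `fit_y_on_x` is a hypothetical helper name, and `statistics.fmean` is from the Python standard library (3.8+). It computes bYX once as cov(X, Y)/var(X) and once with the shortcut form, confirms the two agree, and then sets a = ȳ − b x̄.

```python
from statistics import fmean


def fit_y_on_x(x, y):
    """Hypothetical helper: return (a, b) for the line of Y on X, Y = a + bX."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    # cov(X, Y) / var(X), both computed with divisor n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - x_bar) ** 2 for xi in x) / n
    b = cov_xy / var_x
    # Shortcut form: (sum x_i*y_i - n*x_bar*y_bar) / (sum x_i^2 - n*x_bar^2)
    b_short = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
        sum(xi ** 2 for xi in x) - n * x_bar ** 2
    )
    assert abs(b - b_short) < 1e-9, "the two forms of bYX should agree"
    a = y_bar - b * x_bar
    return a, b


a, b = fit_y_on_x([1, 2, 3], [2, 1, 6])
print(f"Y = {a} + {b}X")  # Y = -1.0 + 2.0X
print(a + b * 4)          # 7.0
```

Both forms give b = 2 and a = −1, so the fitted line is Y = −1 + 2X and the predicted Y at X = 4 is 7. The shortcut form needs only the totals ∑x, ∑y, ∑x² and ∑xy, not the individual deviations.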
\[X=a'+b'Y\]
where b' = bXY is the regression coefficient of X on Y.
\[b_{XY}=\frac{\operatorname{cov}(X,Y)}{\operatorname{var}(Y)}\]
\[=\frac{\frac{\sum\left(x_i-\overline{x}\right)\left(y_i-\overline{y}\right)}{n}}{\frac{\sum\left(y_i-\overline{y}\right)^2}{n}}\]
\[=\frac{\sum x_iy_i-n\bar{x}\bar{y}}{\sum y_i^2-n\bar{y}^2}\]
\[a'=\bar{x}-b'\bar{y}\]
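The X-on-Y line can be sketched the same way in Python, again assuming the illustrative data X = 1, 2, 3 and Y = 2, 1, 6. The name `fit_x_on_y` is hypothetical; the function applies the shortcut form of bXY and then a′ = x̄ − b′ȳ.

```python
from statistics import fmean


def fit_x_on_y(x, y):
    """Hypothetical helper: return (a_prime, b_prime) for X = a' + b'Y."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    # b' = bXY = (sum x_i*y_i - n*x_bar*y_bar) / (sum y_i^2 - n*y_bar^2)
    b_prime = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
        sum(yi ** 2 for yi in y) - n * y_bar ** 2
    )
    # a' = x_bar - b' * y_bar
    a_prime = x_bar - b_prime * y_bar
    return a_prime, b_prime


a_p, b_p = fit_x_on_y([1, 2, 3], [2, 1, 6])
print(f"X = {a_p:.4f} + {b_p:.4f} Y")
```

Here b′ = 4/14 = 2/7 and a′ = 2 − (2/7)(3) = 8/7. Solving the Y-on-X line Y = −1 + 2X for X would instead give a slope of 1/2, which is why the line of X on Y is fitted separately; both lines still pass through (x̄, ȳ) = (2, 3).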
Key Points
Simple Linear Regression:
One independent variable.
Multiple Linear Regression:
Two or more independent variables.
1. \[b_{XY}\cdot b_{YX}=r^{2}\]
2. If bYX > 1, then bXY < 1.
3. \[\left|\frac{b_{YX}+b_{XY}}{2}\right|\geq|r|\]
4. Regression coefficients are independent of a change of origin but are affected by a change of scale.
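Points 1 to 4 can be checked numerically with a small Python sketch, assuming the illustrative data X = 1, 2, 3 and Y = 2, 1, 6 (for which bYX = 2, bXY = 2/7 and r = 4/√28 ≈ 0.756). The helper `coefficients` is hypothetical, and the origin shift (X − 10) and scale change (X/2) are arbitrary choices. Point 3 holds because bYX and bXY have the same sign, so the mean of their absolute values is at least their geometric mean √(bYX·bXY) = |r|. A check on one data set illustrates the properties; it does not prove them.

```python
import math
from statistics import fmean


def coefficients(x, y):
    """Hypothetical helper: return (bYX, bXY, r) from deviations about the means."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)


x = [1, 2, 3]  # illustrative data
y = [2, 1, 6]
b_yx, b_xy, r = coefficients(x, y)

# Point 1: the product of the two regression coefficients equals r^2.
assert math.isclose(b_yx * b_xy, r ** 2)
# Point 2: the product is at most 1, so bYX > 1 forces bXY < 1.
assert b_yx > 1 and b_xy < 1
# Point 3: the mean of the two coefficients is at least |r| in absolute value.
assert abs((b_yx + b_xy) / 2) >= abs(r)

# Point 4: a change of origin (X - 10) leaves both coefficients unchanged...
s_yx, s_xy, _ = coefficients([xi - 10 for xi in x], y)
assert math.isclose(s_yx, b_yx) and math.isclose(s_xy, b_xy)
# ...but a change of scale (X / 2) alters them: bYX doubles and bXY halves.
s_yx, s_xy, _ = coefficients([xi / 2 for xi in x], y)
assert math.isclose(s_yx, 2 * b_yx) and math.isclose(s_xy, b_xy / 2)
```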
Important Questions
- bXY and bYX are _______.
- |bXY + bYX| ≥ ______.
- For the following data, find the regression line of Y on X (X: 1, 2, 3; Y: 2, 1, 6). Hence find the most likely value of Y when X = 4.
- The following table shows the all-India infant mortality rates (IMR, per '000) for the years 1980 to 2010 (Year: 1980, 1985, 1990, 1995, 2000, 2005, 2010; IMR: 10, 7, 5, 4, 3, 1, 0).
- Find the graphical solution for the following system of linear inequations: 3x + 2y ≤ 180, x + 2y ≤ 120, x ≥ 0, y ≥ 0. Hence find the coordinates of the corner points of the common region.
- Compute the product moment coefficient of correlation for the following data:
- Information on vehicles (in thousands) passing through seven different highways during a day (X) and the number of accidents reported (Y) is given as follows:
- For the following bivariate data obtain the equations of two regression lines:
- The equations of the two regression lines are 2x + 3y − 6 = 0 and 5x + 7y − 12 = 0. Find: (a) the correlation coefficient, (b) σx/σy.
- Identify the regression equations of X on Y and Y on X from the following equations: 2x + 3y = 6 and 5x + 7y − 12 = 0.
- Find the feasible solution for the following system of linear inequations: 0 ≤ x ≤ 3, 0 ≤ y ≤ 3, x + y ≤ 5, 2x + y ≥ 4.
- If ∑X = 56, ∑Y = 56, ∑X² = 478, ∑Y² = 476, ∑XY = 469 and n = 7, find (a) the regression equation of Y on X, (b) Y if X = 12.
- bYX is ______.
- Find the equation of the line of regression of Y on X for the following data: n = 8, ∑(xᵢ − x̄)(yᵢ − ȳ) = 120, x̄ = 20, ȳ = 36, σx = 2, σy = 3.
- For certain bivariate data on 5 pairs of observations, ∑x = 20, ∑y = 20, ∑x² = 90, ∑y² = 90 and ∑xy = 76. Then bXY = ______.
- For bivariate data, x̄ = 10, ȳ = 12, V(X) = 9, σy = 4 and r = 0.6. Estimate y when x = 5. Solution: The line of regression of Y on X is Y − ȳ = □(X − x̄) ∴ Y − 12 = r·(σy/σx)(X − 10) ∴ Y − 12 = (0.6 × 4/□)(X − 10)
- For 50 students of a class, the regression equation of marks in statistics (X) on the marks in accountancy (Y) is 3y − 5x + 180 = 0.
- You are given the following information about advertising expenditure and sales. The correlation coefficient between X and Y is 0.8.
- For bivariate data, ∑(x − x̄)² = 1200, ∑(y − ȳ)² = 300 and ∑(x − x̄)(y − ȳ) = −250. Find: (i) bYX, (ii) bXY, (iii) the correlation coefficient between x and y.
- The following results were obtained from records of age (X) and systolic blood pressure (Y) of a group of 10 men: mean of X = 50, mean of Y = 140; variance of X = 150, variance of Y = 165; and ∑(xᵢ − x̄)(yᵢ − ȳ) = 1120. Find the prediction of
- The equations of two regression lines are 10x − 4y = 80 and 10y − 9x = −40. Find: (i) x̄ and ȳ, (ii) bYX and bXY, (iii) var(X), if var(Y) = 36, (iv) r.
- bXY · bYX = ______.
- For bivariate data, the regression coefficient of Y on X is 0.4 and the regression coefficient of X on Y is 0.9. Find the value of the variance of Y if the variance of X is 9.
- For bivariate data, x̄ = 53, ȳ = 28, bYX = −1.5 and bXY = −0.2. Estimate y when x = 50.
- The two regression equations are 5x − 6y + 90 = 0 and 15x − 8y − 130 = 0. Find x̄, ȳ and r.
- For bivariate data, x̄ = 53, ȳ = 28, bYX = −1.2 and bXY = −0.3. Find the correlation coefficient between x and y.
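Several of the questions above give a pair of regression lines (for example 5x − 6y + 90 = 0 and 15x − 8y − 130 = 0) and ask for x̄, ȳ, r or which line is which. A minimal Python sketch of the usual method rests on two facts already in these notes: both lines pass through (x̄, ȳ), since a = ȳ − b x̄ and a′ = x̄ − b′ȳ, and bYX·bXY = r² must lie between 0 and 1. The function `analyse_two_lines` and its (p, q, c) encoding of px + qy + c = 0 are hypothetical; the sketch assumes the lines are not parallel and that no x or y coefficient is zero.

```python
import math


def analyse_two_lines(line1, line2):
    """Hypothetical helper. Each line is (p, q, c), meaning p*x + q*y + c = 0.

    Returns (x_bar, y_bar, b_yx, b_xy, r).
    """
    (p1, q1, c1), (p2, q2, c2) = line1, line2
    # Both regression lines pass through (x_bar, y_bar), so solve the
    # 2x2 system p*x + q*y = -c with Cramer's rule.
    det = p1 * q2 - p2 * q1
    if det == 0:
        raise ValueError("the lines are parallel")
    x_bar = (q1 * c2 - q2 * c1) / det
    y_bar = (p2 * c1 - p1 * c2) / det
    # Try the first line as Y on X and the second as X on Y, then swapped.
    for (pa, qa, _), (pb, qb, _) in ((line1, line2), (line2, line1)):
        b_yx = -pa / qa  # pa*x + qa*y + c = 0 solved for y: slope of y on x
        b_xy = -qb / pb  # pb*x + qb*y + c = 0 solved for x: slope of x on y
        product = b_yx * b_xy  # equals r^2, so it must lie in [0, 1]
        if 0 <= product <= 1:
            # r takes the common sign of the two regression coefficients.
            r = math.copysign(math.sqrt(product), b_yx)
            return x_bar, y_bar, b_yx, b_xy, r
    raise ValueError("neither assignment gives 0 <= bYX * bXY <= 1")


# 5x - 6y + 90 = 0 and 15x - 8y - 130 = 0
print(analyse_two_lines((5, -6, 90), (15, -8, -130)))
# 10x - 4y = 80 and 10y - 9x = -40, rewritten as 10x - 4y - 80 = 0 and -9x + 10y + 40 = 0
print(analyse_two_lines((10, -4, -80), (-9, 10, 40)))
```

For 5x − 6y + 90 = 0 and 15x − 8y − 130 = 0 the sketch should give x̄ = 30, ȳ = 40, bYX = 5/6, bXY = 8/15 and r = 2/3, identifying the first equation as the line of Y on X. For 10x − 4y = 80 and 10y − 9x = −40 the first assignment is rejected because 2.5 × 10/9 > 1, and the swapped assignment gives x̄ = 10, ȳ = 5, bYX = 0.9, bXY = 0.4 and r = 0.6.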
