Definitions [5]
Of all curves approximating a given set of data points, the curve for which \(D_1^2 + D_2^2 + \cdots + D_n^2\) is a minimum is called the best-fitting curve.
The statistical method that helps us estimate or predict the unknown value of one variable from the known value of a related variable is called regression.
When the best-fitting curve is a straight line, it is called a line of regression (or line of best fit) and the regression is said to be linear.
A line of regression is the straight line which gives the best fit in the least squares sense to the given set of data.
A scatter diagram is a graph with points plotted to show a relationship between two sets of data.
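The least-squares criterion in the definitions above can be checked numerically. The sketch below is only an illustration: it assumes that each \(D_i\) is the vertical deviation of a data point from the line, and the data set and the function name `sum_squared_deviations` are made up for this example. For this particular data, the line y = 0.6x + 2.2 is the least-squares line, so its total should be smaller than that of any other line, such as y = 0.5x + 2.5.

```python
def sum_squared_deviations(xs, ys, m, c):
    """Return D_1^2 + ... + D_n^2 for the line y = m*x + c.

    Assumption: D_i is the vertical deviation y_i - (m*x_i + c).
    """
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Total for the least-squares line of this data set.
least_squares_total = sum_squared_deviations(xs, ys, m=0.6, c=2.2)
# Total for a nearby line chosen only for comparison.
other_total = sum_squared_deviations(xs, ys, m=0.5, c=2.5)

print(least_squares_total, other_total)
```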
Formulae [5]
Regression coefficient of Y on X:
\[b_{yx}=\frac{n\sum xy-(\sum x)\left(\sum y\right)}{n\sum x^2-\left(\sum x\right)^2}\]
Regression coefficient of X on Y:
\[b_{xy}=\frac{n\sum xy-(\sum x)(\sum y)}{n\sum y^{2}-(\sum y)^{2}}\]
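As a minimal sketch of how these two formulae might be evaluated, the function below accumulates the sums and substitutes them directly. The function name `regression_coefficients` and the sample data are assumptions made for illustration only.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from the raw sums of the data."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    # Both coefficients share the same numerator.
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    return b_yx, b_xy


# Made-up illustrative data, not taken from the text.
b_yx, b_xy = regression_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b_yx, b_xy)
```

The division fails if all x values (or all y values) are equal, because the corresponding denominator is then zero.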
Normal equations:
Y on X:
\[\Sigma y=nc+m\Sigma x\]
\[\Sigma xy=c\Sigma x+m\Sigma x^{2}\]
X on Y:
\[\Sigma x=nc+m\Sigma y\]
\[\Sigma xy=c\Sigma y+m\Sigma y^{2}\]
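Each pair of normal equations is a system of two linear equations in the unknowns c and m. The sketch below solves the Y on X pair by Cramer's rule, assuming the fitted line is written as y = mx + c; passing the arguments in the opposite order solves the X on Y pair for x = my + c. The function name `solve_normal_equations` and the data are hypothetical.

```python
def solve_normal_equations(xs, ys):
    """Solve  sum(y)  = n*c      + m*sum(x)
              sum(xy) = c*sum(x) + m*sum(x^2)
    for (c, m) using Cramer's rule."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Determinant of the coefficient matrix [[n, sum_x], [sum_x, sum_x2]].
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal; the system has no unique solution")

    c = (sum_y * sum_x2 - sum_x * sum_xy) / det
    m = (n * sum_xy - sum_x * sum_y) / det
    return c, m


# Made-up illustrative data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

c_yx, m_yx = solve_normal_equations(xs, ys)  # Y on X: y = m*x + c
c_xy, m_xy = solve_normal_equations(ys, xs)  # X on Y: x = m*y + c
print(c_yx, m_yx, c_xy, m_xy)
```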
Regression coefficient of Y on X (in terms of the means):
\[b_{yx}=\frac{\sum xy-n\overline{x}\overline{y}}{\sum x^{2}-n\overline{x}^{2}}\]
Regression coefficient of X on Y (in terms of the means):
\[b_{xy}=\frac{\sum xy-n\overline{x} \overline{y}}{\sum y^{2}-n\overline{y}^{2}}\]
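The mean form agrees with the sum form of the same coefficient. A short derivation for \(b_{yx}\), using only \(\sum x=n\overline{x}\) and \(\sum y=n\overline{y}\), runs as follows; the argument for \(b_{xy}\) is identical with the roles of x and y exchanged in the denominator.

\[b_{yx}=\frac{n\sum xy-(\sum x)(\sum y)}{n\sum x^{2}-(\sum x)^{2}}=\frac{n\sum xy-n^{2}\,\overline{x}\,\overline{y}}{n\sum x^{2}-n^{2}\,\overline{x}^{2}}=\frac{\sum xy-n\overline{x}\,\overline{y}}{\sum x^{2}-n\overline{x}^{2}}\]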
Regression coefficient of Y on X (in terms of u and v):
\[b_{yx}=\frac{\sum uv-\frac{\sum u\cdot\sum v}{n}}{\sum u^2-\frac{\left(\sum u\right)^2}{n}}\]
Regression coefficient of X on Y (in terms of u and v):
\[b_{xy}=\frac{\sum uv-\frac{\sum u\cdot\sum v}{n}}{\sum v^{2}-\frac{\left(\sum v\right)^{2}}{n}}\]
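The text does not define the two transformed variables. The sketch below assumes the common change-of-origin convention, where the first is x − A and the second is y − B for conveniently chosen constants A and B. Under that assumption the coefficients come out the same whatever origins are picked. The function name, the values of A and B, and the data are all hypothetical.

```python
def coefficients_from_deviations(xs, ys, A, B):
    """Return (b_yx, b_xy) using u = x - A and v = y - B.

    Assumption: u and v are deviations from assumed origins A and B.
    """
    n = len(xs)
    us = [x - A for x in xs]
    vs = [y - B for y in ys]

    sum_u = sum(us)
    sum_v = sum(vs)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    sum_u2 = sum(u * u for u in us)
    sum_v2 = sum(v * v for v in vs)

    # Both coefficients share the same numerator.
    numerator = sum_uv - sum_u * sum_v / n
    b_yx = numerator / (sum_u2 - sum_u ** 2 / n)
    b_xy = numerator / (sum_v2 - sum_v ** 2 / n)
    return b_yx, b_xy


# Made-up illustrative data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Two different choices of origin give the same coefficients.
first = coefficients_from_deviations(xs, ys, A=3, B=4)
second = coefficients_from_deviations(xs, ys, A=2, B=5)
print(first, second)
```

This invariance holds for a change of origin only. If u and v were also rescaled, the coefficients would change by the ratio of the scale factors.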
Angle θ between the two lines of regression:
\[\tan\theta=\frac{1-r^2}{r}\cdot\frac{\sigma_x\sigma_y}{\sigma_x^2+\sigma_y^2}\]
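The formula for tan θ can be cross-checked against the slopes of the two regression lines. The sketch below assumes that r is the correlation coefficient and that \(\sigma_x\), \(\sigma_y\) are the population standard deviations (divisor n). It compares the formula with the usual two-slope expression for the angle between lines, taking the slope of the Y on X line as \(b_{yx}\) and the slope of the X on Y line as \(1/b_{xy}\). The function name and the data are made up.

```python
import math


def tan_angle_between_regression_lines(xs, ys):
    """Return (tan_theta_from_formula, tan_theta_from_slopes)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Population variances and covariance (divisor n), an assumption here.
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y

    sigma_x = math.sqrt(var_x)
    sigma_y = math.sqrt(var_y)
    r = cov / (sigma_x * sigma_y)  # correlation coefficient; must be non-zero below

    # Formula from the text.
    from_formula = (1 - r ** 2) / r * (sigma_x * sigma_y) / (var_x + var_y)

    # Cross-check using the slopes of the two lines in the x-y plane.
    slope_y_on_x = cov / var_x  # b_yx
    slope_x_on_y = var_y / cov  # 1 / b_xy
    from_slopes = (slope_x_on_y - slope_y_on_x) / (1 + slope_y_on_x * slope_x_on_y)
    return from_formula, from_slopes


# Made-up illustrative data, not taken from the text.
from_formula, from_slopes = tan_angle_between_regression_lines(
    [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
)
print(from_formula, from_slopes)
```

When r = 0 the formula divides by zero, which corresponds to the two lines being perpendicular; when r = ±1 the result is 0 and the lines coincide.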
Key Points
| Type of correlation | Appearance on the scatter diagram |
|---|---|
| Strong positive | Points close, rising left → right |
| Weak positive | Scattered but upward trend |
| Weak negative | Scattered but downward trend |
| Strong negative | Points close, falling left → right |
| No correlation | Random dots |
