Missing data can broadly be classified into three types:
First, several predictors of the variable with missing values are identified using a correlation matrix. The best predictors are selected and used as independent variables in a regression equation; the variable with missing data is used as the dependent variable.
Second, cases with complete data for the predictor variables are used to generate the regression equation.
Third, the equation is used to predict the missing values for the incomplete cases, in an iterative process.
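The three steps above can be sketched in a few lines of NumPy. This is a minimal, single-pass illustration, not a definitive implementation: the function name `regression_impute` and the demo arrays are hypothetical, the predictors are assumed to be already selected and complete, and the iterative refinement mentioned above is left out.

```python
import numpy as np

def regression_impute(X, y):
    """Fill NaN entries of y using a linear regression on the columns of X.

    X: (n, k) array of predictor values, assumed complete (no NaN).
    y: (n,) array of the variable with missing values, NaN where missing.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)

    # Append a column of ones so the fitted equation includes a constant term.
    A = np.column_stack([X, np.ones(len(X))])

    # Step 2: fit the regression equation on the complete cases only.
    coef, *_ = np.linalg.lstsq(A[~missing], y[~missing], rcond=None)

    # Step 3: use the equation to predict the missing values.
    filled = y.copy()
    filled[missing] = A[missing] @ coef
    return filled, coef

# Hypothetical demo data generated from y = 2*x1 + 3*x2 + 1, last y missing.
X_demo = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y_demo = np.array([9.0, 8.0, 19.0, 18.0, np.nan])
filled_demo, coef_demo = regression_impute(X_demo, y_demo)
print(filled_demo)
```

Because the demo data are exactly linear, the imputed value should match the value the generating equation would give for the last row.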
The above describes single-variable linear regression.
Linear regression has significant limitations, such as:
This is where multiple regression comes in. It is designed to fit models with a single dependent variable and multiple independent variables.
The equation for multiple regression takes the form:
$$y = b_1 x_1 + b_2 x_2 + \dots + b_n x_n + a$$
$b_i$: coefficients
$x_i$: independent variables, also called predictor variables
$y$: dependent variable, also called the criterion variable
$a$: a constant stating the value of the dependent variable when all independent variables are zero
Just as we minimized the sum of squared errors to find the coefficient $b$ in linear regression, we minimize the sum of squared errors to find all the $b_i$ terms in multiple regression.
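To make the minimization concrete, here is a small sketch that finds the coefficients and the constant by ordinary least squares. It uses NumPy's `np.linalg.lstsq` as one way to minimize the sum of squared errors; the function names and demo data are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Return (b, a) minimizing sum((y - (X @ b + a))**2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A trailing column of ones lets the solver estimate the constant a.
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def sum_squared_errors(X, y, b, a):
    """Sum of squared differences between observed and predicted y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (X @ b + a)
    return float(residuals @ residuals)

# Hypothetical data generated from y = 2*x1 + 3*x2 + 1.
X_ls = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y_ls = 2 * X_ls[:, 0] + 3 * X_ls[:, 1] + 1

b_ls, a_ls = fit_multiple_regression(X_ls, y_ls)
sse_ls = sum_squared_errors(X_ls, y_ls, b_ls, a_ls)
print(b_ls, a_ls, sse_ls)
```

Any other choice of coefficients gives a sum of squared errors at least as large as the fitted one, which is what "minimizing" means here.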
Specifically, we use stochastic gradient descent.
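A rough sketch of stochastic gradient descent for this model is shown below. It updates the coefficients one sample at a time using the gradient of that sample's squared error. The function name, learning rate, epoch count, and demo data are all assumptions made for illustration; real data usually needs feature scaling and a tuned learning rate.

```python
import numpy as np

def sgd_multiple_regression(X, y, lr=0.01, epochs=5000, seed=0):
    """Estimate (b, a) in y = X @ b + a with per-sample gradient updates."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    b = np.zeros(k)
    a = 0.0
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.permutation(n):
            error = (X[i] @ b + a) - y[i]
            # Gradient of 0.5 * error**2 with respect to b and a.
            b -= lr * error * X[i]
            a -= lr * error
    return b, a

# Hypothetical data generated from y = 2*x1 + 3*x2 + 1.
X_sgd = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y_sgd = 2 * X_sgd[:, 0] + 3 * X_sgd[:, 1] + 1

b_sgd, a_sgd = sgd_multiple_regression(X_sgd, y_sgd)
print(b_sgd, a_sgd)
```

On this small, noise-free example the estimates should approach the generating values (coefficients near 2 and 3, constant near 1); with noisy data SGD only hovers around the least-squares solution unless the learning rate is decayed.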
Use the same $r^2$ value that was used for linear regression.
$r^2$, called the coefficient of determination, states the proportion of the variation in the dependent variable that is predicted by the model. For a least-squares fit it ranges from 0 to 1, with 0 meaning that the model has no ability to predict the result and 1 meaning that the model predicts the result perfectly.
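As a small illustration, $r^2$ can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The helper name `r_squared` and the sample values below are hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

y_obs = np.array([1.0, 2.0, 3.0, 4.0])
r2_perfect = r_squared(y_obs, y_obs)                     # predictions match exactly
r2_mean = r_squared(y_obs, np.full(4, y_obs.mean()))     # always predict the mean
print(r2_perfect, r2_mean)
```

A model that always predicts the mean explains none of the variation, while one that reproduces every observation explains all of it; these are the two ends of the range described above.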
Reposted from: http://pjge.baihongyu.com/