**5. Linear Regression**
================================================

5.1. What is Linear Regression?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Linear regression is a supervised machine learning (ML) technique for modeling one or more dependent values based on one or more independent values. This ML method is used for prediction and for finding relationships between values, and it is of interest in several scientific and industrial fields. The purpose of the method is to describe the relationship between the dependent variable and the independent variable(s) with a straight line. If there is only one independent variable (:math:`x`) with a linear relationship to the dependent variable (:math:`y`), the model is called **Simple Linear Regression**; if there are several independent variables, it is called **Multiple Linear Regression**. It should be noted that most real-world problems involve many independent variables, so the multiple linear regression model is the one that usually applies.

5.2. Mathematical Framework
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let us say we have a dependent variable :math:`y_i` that is related to a linear combination of different independent variables :math:`x_{ij}`, where :math:`i = 1, \dots, n` indexes the data points and :math:`j = 1, \dots, m` indexes the independent variables. This relationship can be expressed as:

.. math::
   :name: eq.500

   y_i = \alpha_0 \cdot 1 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \dots + \alpha_m x_{im}

The :ref:`above equation <eq.500>` can be rewritten in matrix form:

.. math::
   :name: eq.501

   \vec{\boldsymbol{y}} = \boldsymbol{x} \, \vec{\boldsymbol{\alpha}}

In the above equation, the parameters are expressed as follows (the leading column of ones in :math:`\boldsymbol{x}` multiplies the intercept :math:`\alpha_0`):

.. math::
   :name: eq.502

   \vec{\boldsymbol{y}} =
   \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
   \qquad
   \vec{\boldsymbol{\alpha}} =
   \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_m \end{bmatrix}
   \qquad
   \boldsymbol{x} =
   \begin{bmatrix}
   1 & x_{11} & x_{12} & \cdots & x_{1m} \\
   1 & x_{21} & x_{22} & \cdots & x_{2m} \\
   \vdots & \vdots & \vdots & \ddots & \vdots \\
   1 & x_{n1} & x_{n2} & \cdots & x_{nm}
   \end{bmatrix}

Again, it should be noted that each :math:`y_i` corresponds to a specific data point, and its value is given by the linear combination obtained by multiplying the values of the independent variables (:math:`x_{i1}, \dots, x_{im}`) by their corresponding coefficients (:math:`\alpha_{1}, \dots, \alpha_{m}`) and adding the intercept :math:`\alpha_0`.

5.3. Analytical Solution
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The purpose of performing the linear regression is to find the values of :math:`\alpha_{0}, \dots, \alpha_{m}` that define the straight line best fitted to the data. To do that, we minimize the sum of the squares of the differences between the real values :math:`y_i` and the values predicted by the linear regression model; this is usually referred to as the **Ordinary Least Squares** (**OLS**) method. The objective function is defined as:

.. math::
   :name: eq.503

   F = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} x_{ij} \, \alpha_{j} \right)^2

where :math:`n` and :math:`m` are the number of data points and the number of independent variables respectively, and :math:`x_{i0} = 1` accounts for the intercept.
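
Before stating the solution, it helps to spell out one intermediate step that is skipped above (this is the standard OLS derivation, added here for completeness): setting the gradient of :math:`F` with respect to :math:`\vec{\alpha}` to zero gives the so-called normal equations,

.. math::

   \frac{\partial F}{\partial \vec{\alpha}} = -2 \, \boldsymbol{x}^{T} \left( \vec{y} - \boldsymbol{x} \, \vec{\alpha} \right) = 0
   \quad \Longrightarrow \quad
   \boldsymbol{x}^{T} \boldsymbol{x} \, \vec{\alpha} = \boldsymbol{x}^{T} \vec{y}

Solving this linear system for :math:`\vec{\alpha}` leads directly to the closed-form expression below.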
The minimization of the above objective function results in a unique solution for :math:`\vec{\alpha}` (provided :math:`\boldsymbol{x}^{T}\boldsymbol{x}` is invertible):

.. math::
   :name: eq.504

   \vec{\alpha} = (\boldsymbol{x}^{T}\boldsymbol{x})^{-1} \, \boldsymbol{x}^{T} \, \vec{y}

This solution can be used to calculate the unknown :math:`\alpha` values directly. Although it may look perfect for small problems with only a few independent variables, the approach has a prominent disadvantage: it requires the calculation of a matrix inverse. When the number of independent variables is large, calculating the inverse matrix becomes computationally expensive.

5.4. Gradient Descent
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To resolve this issue, we take an iterative approach to minimize the difference between the real values :math:`y_i` and the values predicted by the linear regression model. This method is called **Gradient Descent**, and the function to minimize is the **Mean Squared Error (MSE)**, defined here:

.. math::
   :name: eq.505

   MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} x_{ij} \, \alpha_{j} \right)^2

The minimization is driven by the derivative of the above equation with respect to each coefficient :math:`\alpha_j`:

.. math::
   :name: eq.506

   \frac{\partial (MSE)}{\partial \alpha_j} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \sum_{k=0}^{m} x_{ik} \, \alpha_{k} \right) x_{ij}

.. note::
   The following steps are taken to reach the minimum of the MSE function with the Gradient Descent approach (a compact matrix form of the update is given after this note):

   **1.** Initialize the :math:`\alpha` values with zeros.

   **2.** Specify a parameter called the **Learning Rate** (:math:`\lambda`), which controls at what rate we move toward the extremum of the function.

   **3.** Update each :math:`\alpha_j` value with:

   .. math::
      :name: eq.507

      \alpha_j = \alpha_j + \frac{2 \lambda}{n} \sum_{i=1}^{n} \left( y_i - \sum_{k=0}^{m} x_{ik} \, \alpha_{k} \right) x_{ij}

   **4.** Repeat step 3 until the MSE value reaches a reasonably small minimum.
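
For reference, the component-wise update in step 3 can also be written compactly in matrix form (this form is not spelled out in the steps above, but follows directly from the matrix notation of Section 5.2):

.. math::

   \vec{\alpha} \leftarrow \vec{\alpha} + \frac{2 \lambda}{n} \, \boldsymbol{x}^{T} \left( \vec{y} - \boldsymbol{x} \, \vec{\alpha} \right)

so that all coefficients are updated at once using the vector of residuals :math:`\vec{y} - \boldsymbol{x}\vec{\alpha}`.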
5.5. C++ Implementation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we implement the linear regression model for a simple problem to show how it works. We consider a problem with only one independent variable (:math:`x`) and 3 input data points, 1, 2 and 3, with corresponding dependent variable (:math:`y`) values of 6, 11 and 16. We implement the model using both the direct approach and the iterative approach (Gradient Descent). In this case, the equation of the line is :math:`y = \alpha_{0} + \alpha_{1} x`, and the purpose is to find :math:`\alpha_{0}` and :math:`\alpha_{1}`. After that, we will use the obtained :math:`\alpha` values to predict the dependent value for a new independent variable (i.e., 4).

In order to perform the linear algebra operations (e.g., the calculation of the inverse of a matrix), we use the C++ library `Dlib <http://dlib.net>`_.

5.5.1 Direct Solver
""""""""""""""""""""""""""""""""""""""""""

Here is the implementation of the linear regression using the direct approach:

.. code-block:: c

   #include <iostream>
   #include <vector>
   #include <cstdint>
   #include <dlib/matrix.h>

   using namespace std;

   // Here we define 3 input data points corresponding to the independent variable (1, 2, 3).
   // Each data point is stored in a vector starting with the number 1, which multiplies
   // the intercept coefficient alpha_0.
   std::vector<std::vector<float>> x_values = { { 1, 1 }, { 1, 2 }, { 1, 3 } };

   // Here we define the 3 values of the dependent variable (6, 11, 16).
   std::vector<float> target_values = { 6, 11, 16 };

   static constexpr uint8_t number_of_data = 3;
   static constexpr uint8_t dimensions = 2;

   dlib::matrix<float, number_of_data, dimensions> matrix_x;
   dlib::matrix<float, number_of_data, 1> matrix_y;

   class Linear_Regression {
   public:
       // Copy the input data into Dlib matrices and evaluate the closed-form solution (eq.504).
       dlib::matrix<float, dimensions, 1> return_alpha (std::vector<std::vector<float>> vec_x,
                                                        std::vector<float> vec_y){
           for (int i = 0 ; i < number_of_data ; i++){
               for (int j = 0 ; j < dimensions ; j++){
                   matrix_x(i,j) = vec_x[i][j];
               }
           }
           for (int k = 0 ; k < number_of_data ; k++){
               matrix_y(k,0) = vec_y[k];
           }
           // alpha = (x^T x)^-1 x^T y
           return inv(trans(matrix_x) * matrix_x) * trans(matrix_x) * matrix_y;
       }
   };

   // Predict the dependent value for a new input as the dot product of the coefficients
   // with the input vector (which again starts with 1).
   float predict (dlib::matrix<float, dimensions, 1> coefficients, std::vector<float> new_inputs){
       float new_predict = 0;
       for (std::size_t p = 0 ; p < new_inputs.size(); p++){
           new_predict += coefficients(p,0) * new_inputs[p];
       }
       return new_predict;
   }

   int main(){
       Linear_Regression lg;
       dlib::matrix<float, dimensions, 1> alpha = lg.return_alpha(x_values, target_values);

       cout << "The calculated coefficients are: " << endl;
       for (int z = 0 ; z < alpha.size() ; z++){
           cout << alpha(z,0) << endl;
       }

       // Here we define a new independent variable (4) to predict the dependent variable.
       // This new value is again stored in a vector starting with the number 1.
       std::vector<float> new_values_to_predict = {1, 4};
       cout << "The predicted value is: " << predict(alpha, new_values_to_predict) << endl;
       return 0;
   }

The output of the above code is:

.. code-block:: c

   The calculated coefficients are:
   1.00001
   5
   The predicted value is: 21

The above results are as expected: based on the input data the equation of the line is :math:`y = 1 + 5x`, so the predicted value for the new input 4 is equal to 21.
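
As a quick sanity check (this snippet is not part of the original example; the coefficient values are simply taken from the output above), one can plug the reported coefficients back into the model and evaluate the MSE of Section 5.4 over the three training points, which should be essentially zero for this data:

.. code-block:: c

   #include <iostream>
   #include <vector>

   int main(){
       // Training data from the example above.
       std::vector<std::vector<float>> x_values = { { 1, 1 }, { 1, 2 }, { 1, 3 } };
       std::vector<float> target_values = { 6, 11, 16 };

       // Coefficients reported by the direct solver (alpha_0 is approximately 1, alpha_1 = 5).
       std::vector<float> alpha = { 1, 5 };

       // Evaluate the Mean Squared Error (eq.505) over the training points.
       float mse = 0;
       for (std::size_t i = 0; i < x_values.size(); i++){
           float prediction = alpha[0] * x_values[i][0] + alpha[1] * x_values[i][1];
           float error = target_values[i] - prediction;
           mse += error * error;
       }
       mse /= x_values.size();
       std::cout << "MSE of the fitted line: " << mse << std::endl;
       return 0;
   }

The same check can be reused for the coefficients produced by the iterative solver in the next subsection.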
5.5.2 Iterative Solver (Gradient Descent)
""""""""""""""""""""""""""""""""""""""""""

Here is the implementation of the linear regression using the gradient descent approach (iterative solver):

.. code-block:: c

   #include <iostream>
   #include <vector>

   using namespace std;

   // Here we define 3 input data points corresponding to the independent variable (1, 2, 3).
   // Each data point is stored in a vector starting with the number 1 (the intercept column).
   vector<vector<float>> x_values = { { 1, 1 }, { 1, 2 }, { 1, 3 } };

   // Here we define the 3 values of the dependent variable (6, 11, 16).
   vector<float> target_values = { 6, 11, 16 };

   // Here we define the learning rate.
   float lambda = 0.001;

   // Here we define the number of iterations.
   int iterations = 50000;

   int number_of_data = 3;
   int dimensions = 2;

   ////////////////////////////////////////////////////
   vector<vector<float>> matrix_x;
   vector<float> matrix_y_real;
   vector<float> matrix_y_predict;
   ////////////////////////////////////////////////////

   float gamma = 0;
   vector<float> alpha = {0, 0};
   vector<float> new_values_to_predict = {1, 4};

   // Predict the dependent value as the dot product of the coefficients with the input vector.
   float predict (vector<float> coefficients, vector<float> new_inputs){
       float new_predict = 0;
       for (std::size_t p = 0 ; p < new_inputs.size(); p++){
           new_predict += coefficients[p] * new_inputs[p];
       }
       return new_predict;
   }

   int main(){
       // Copy the input data into the working containers.
       for (int i = 0 ; i < number_of_data ; i++){
           matrix_x.push_back(vector<float>());
           for (int j = 0 ; j < dimensions ; j++){
               matrix_x[i].push_back(x_values[i][j]);
           }
       }
       for (int k = 0 ; k < number_of_data ; k++){
           matrix_y_real.push_back(target_values[k]);
           matrix_y_predict.push_back(0);
       }

       ////////////////////////////////////
       for (int f = 0 ; f < iterations ; f++){
           // Compute the predictions with the current alpha values.
           for (int m = 0 ; m < number_of_data ; m++){
               float result = 0;
               for (int n = 0 ; n < dimensions ; n++){
                   result += alpha[n] * matrix_x[m][n];
               }
               matrix_y_predict[m] = result;
           }
           // Gradient descent update of each coefficient (cf. eq.507).
           for (int p = 0 ; p < dimensions; p++){
               for (int q = 0 ; q < number_of_data ; q++){
                   gamma += (matrix_y_predict[q] - matrix_y_real[q]) * matrix_x[q][p];
               }
               alpha[p] = alpha[p] - (2./number_of_data) * lambda * gamma;
               gamma = 0;
           }
           cout << "Iteration number: " << f << endl;
           cout << "alpha 1 is: " << alpha[0] << endl;
           cout << "alpha 2 is: " << alpha[1] << endl;
       }

       cout << "The predicted value is: " << predict(alpha, new_values_to_predict) << endl;
       return 0;
   }

The output of the above code is:

.. code-block:: c

   Iteration number: 0
   alpha 1 is: 0.022
   alpha 2 is: 0.0506667
   Iteration number: 1
   alpha 1 is: 0.0437533
   alpha 2 is: 0.100772
   Iteration number: 2
   alpha 1 is: 0.0652627
   alpha 2 is: 0.150324
   Iteration number: 3
   alpha 1 is: 0.0865309
   alpha 2 is: 0.199326
   Iteration number: 4
   alpha 1 is: 0.107561
   alpha 2 is: 0.247786
   Iteration number: 5
   alpha 1 is: 0.128354
   alpha 2 is: 0.29571
   Iteration number: 6
   alpha 1 is: 0.148915
   alpha 2 is: 0.343103
   Iteration number: 7
   alpha 1 is: 0.169245
   alpha 2 is: 0.389972
   Iteration number: 8
   alpha 1 is: 0.189346
   alpha 2 is: 0.436322
   Iteration number: 9
   alpha 1 is: 0.209222
   alpha 2 is: 0.482159
   Iteration number: 10
   alpha 1 is: 0.228875
   alpha 2 is: 0.527489
   .
   .
   .
   Iteration number: 49999
   alpha 1 is: 1.00001
   alpha 2 is: 5
   The predicted value is: 21

This is again the same as the expected values.

.. note::
   :math:`\bullet` It should be noted again that the direct approach works seamlessly and is recommended when the number (dimensions) of independent variables is small. On the other hand, when dealing with a large number of independent variables, it is recommended to use the iterative approach based on the gradient descent method, to avoid the high computational cost of calculating the inverse matrix.

   :math:`\bullet` It is instructive to play with the value of the learning rate :math:`\lambda` and the number of iterations to see how altering these values affects the accuracy of the prediction (a minimal sketch for such an experiment is given after this note).
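
As a minimal sketch of such an experiment (not part of the original text; the helper name ``fit_gradient_descent`` and the particular learning rates below are illustrative assumptions), the gradient descent loop of Section 5.5.2 can be wrapped in a function and run with a few different settings, comparing the resulting coefficients against the expected values of 1 and 5:

.. code-block:: c

   #include <iostream>
   #include <vector>

   using namespace std;

   // Same toy data as above: x = 1, 2, 3 (with a leading 1 for the intercept) and y = 6, 11, 16.
   vector<vector<float>> x_values = { { 1, 1 }, { 1, 2 }, { 1, 3 } };
   vector<float> target_values = { 6, 11, 16 };

   // Hypothetical helper: runs the same gradient descent update as in Section 5.5.2
   // for a given learning rate and number of iterations and returns the coefficients.
   vector<float> fit_gradient_descent(float lambda, int iterations){
       int number_of_data = (int) x_values.size();
       int dimensions = 2;
       vector<float> alpha = { 0, 0 };
       vector<float> predictions(number_of_data, 0);
       for (int f = 0; f < iterations; f++){
           // Predictions with the current coefficients.
           for (int m = 0; m < number_of_data; m++){
               float result = 0;
               for (int n = 0; n < dimensions; n++){
                   result += alpha[n] * x_values[m][n];
               }
               predictions[m] = result;
           }
           // Update each coefficient with its gradient component (cf. eq.507).
           for (int p = 0; p < dimensions; p++){
               float gradient = 0;
               for (int q = 0; q < number_of_data; q++){
                   gradient += (predictions[q] - target_values[q]) * x_values[q][p];
               }
               alpha[p] -= (2.f / number_of_data) * lambda * gradient;
           }
       }
       return alpha;
   }

   int main(){
       // Try a few illustrative learning rates with a fixed iteration budget.
       vector<float> learning_rates = { 0.0005f, 0.001f, 0.01f };
       int iterations = 50000;
       for (float lambda : learning_rates){
           vector<float> alpha = fit_gradient_descent(lambda, iterations);
           cout << "lambda = " << lambda
                << "  ->  alpha_0 = " << alpha[0]
                << ", alpha_1 = " << alpha[1] << endl;
       }
       return 0;
   }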