**5. Linear Regression**
================================================

5.1. What is Linear Regression?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Linear regression is a supervised machine learning (ML) technique for modeling one or more dependent values based on one or more independent values. This ML method is used for prediction and for finding relationships between values, and it is of interest in several scientific and industrial fields. The purpose of the method is to describe the relationship between the dependent variable and the independent variable(s) with a straight line. If there is only one independent variable (:math:`x`) with a linear relationship to the dependent variable (:math:`y`), the model is called **Simple Linear Regression**; if there are several independent variables, it is called **Multiple Linear Regression**. It should be noted that most real-world problems involve many independent variables, so the multiple linear regression model is the one that usually applies.

5.2. Mathematical Framework
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let us say we have a dependent variable :math:`y_i` that is related to a linear combination of different independent variables :math:`x_{ij}`, where :math:`i = 1, \dots, n` indexes the data points and :math:`j = 1, \dots, m` indexes the independent variables. This relationship can be expressed as:

.. math::
   :name: eq.500

   y_i = \alpha_0 \cdot 1 + \alpha_1 x_{i1} + \alpha_2 x_{i2} + \dots + \alpha_m x_{im}

The :ref:`above equation <eq.500>` can be rewritten in matrix form:

.. math::
   :name: eq.501

   \vec{\boldsymbol{y}} = \boldsymbol{x} \, \vec{\boldsymbol{\alpha}}

In the above equation, the parameters are expressed as follows (the leading column of ones in :math:`\boldsymbol{x}` multiplies the intercept :math:`\alpha_0`):

.. math::
   :name: eq.502

   \vec{\boldsymbol{y}} =
   \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
   \qquad
   \vec{\boldsymbol{\alpha}} =
   \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_m \end{bmatrix}
   \qquad
   \boldsymbol{x} =
   \begin{bmatrix}
   1 & x_{11} & x_{12} & \cdots & x_{1m} \\
   1 & x_{21} & x_{22} & \cdots & x_{2m} \\
   \vdots & \vdots & \vdots & \ddots & \vdots \\
   1 & x_{n1} & x_{n2} & \cdots & x_{nm}
   \end{bmatrix}

Again, it should be noted that each :math:`y_i` corresponds to a specific data point, and its value is given by the linear combination obtained by multiplying the values of the independent variables (:math:`x_{i1}, \dots, x_{im}`) by their corresponding coefficients (:math:`\alpha_{1}, \dots, \alpha_{m}`) and adding the intercept :math:`\alpha_0`.

5.3. Analytical Solution
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The purpose of performing the linear regression is to find the values of :math:`\alpha_{0}, \dots, \alpha_{m}` that define the straight line best fitted to the data. To do that, we minimize the sum of the squares of the differences between the real values :math:`y_i` and the values predicted by the linear regression model; this is usually referred to as the **Ordinary Least Squares** (**OLS**) method. The objective function is defined as:

.. math::
   :name: eq.503

   F = \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} x_{ij} \, \alpha_{j} \right)^2

where :math:`n` and :math:`m` are the number of data points and the number of independent variables respectively, and :math:`x_{i0} = 1` accounts for the intercept.
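
Before stating the solution, it helps to spell out one intermediate step that is skipped above (this is the standard OLS derivation, added here for completeness): setting the gradient of :math:`F` with respect to :math:`\vec{\alpha}` to zero gives the so-called normal equations,

.. math::

   \frac{\partial F}{\partial \vec{\alpha}} = -2 \, \boldsymbol{x}^{T} \left( \vec{y} - \boldsymbol{x} \, \vec{\alpha} \right) = 0
   \quad \Longrightarrow \quad
   \boldsymbol{x}^{T} \boldsymbol{x} \, \vec{\alpha} = \boldsymbol{x}^{T} \vec{y}

Solving this linear system for :math:`\vec{\alpha}` leads directly to the closed-form expression below.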
The minimization of the above objective function results in a unique solution for :math:`\vec{\alpha}` (provided :math:`\boldsymbol{x}^{T}\boldsymbol{x}` is invertible):

.. math::
   :name: eq.504

   \vec{\alpha} = (\boldsymbol{x}^{T}\boldsymbol{x})^{-1} \, \boldsymbol{x}^{T} \, \vec{y}

This solution can be used to calculate the unknown :math:`\alpha` values directly. Although it may look perfect for small problems with only a few independent variables, the approach has a prominent disadvantage: it requires the calculation of a matrix inverse. When the number of independent variables is large, calculating the inverse matrix becomes computationally expensive.

5.4. Gradient Descent
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To resolve this issue, we take an iterative approach to minimize the difference between the real values :math:`y_i` and the values predicted by the linear regression model. This method is called **Gradient Descent**, and the function to minimize is the **Mean Squared Error (MSE)**, defined here:

.. math::
   :name: eq.505

   MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \sum_{j=0}^{m} x_{ij} \, \alpha_{j} \right)^2

The minimization is driven by the derivative of the above equation with respect to each coefficient :math:`\alpha_j`:

.. math::
   :name: eq.506

   \frac{\partial (MSE)}{\partial \alpha_j} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \sum_{k=0}^{m} x_{ik} \, \alpha_{k} \right) x_{ij}

.. note::
   The following steps are taken to reach the minimum of the MSE function with the Gradient Descent approach (a compact matrix form of the update is given after this note):

   **1.** Initialize the :math:`\alpha` values with zeros.

   **2.** Specify a parameter called the **Learning Rate** (:math:`\lambda`), which controls at what rate we move toward the extremum of the function.

   **3.** Update each :math:`\alpha_j` value with:

   .. math::
      :name: eq.507

      \alpha_j = \alpha_j + \frac{2 \lambda}{n} \sum_{i=1}^{n} \left( y_i - \sum_{k=0}^{m} x_{ik} \, \alpha_{k} \right) x_{ij}

   **4.** Repeat step 3 until the MSE value reaches a reasonably small minimum.
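
For reference, the component-wise update in step 3 can also be written compactly in matrix form (this form is not spelled out in the steps above, but follows directly from the matrix notation of Section 5.2):

.. math::

   \vec{\alpha} \leftarrow \vec{\alpha} + \frac{2 \lambda}{n} \, \boldsymbol{x}^{T} \left( \vec{y} - \boldsymbol{x} \, \vec{\alpha} \right)

so that all coefficients are updated at once using the vector of residuals :math:`\vec{y} - \boldsymbol{x}\vec{\alpha}`.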
5.5. C++ Implementation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Here we implement the linear regression model for a simple problem to show how it works. We consider a problem with only one independent variable (:math:`x`) and 3 input data points, 1, 2 and 3, with corresponding dependent variable (:math:`y`) values of 6, 11 and 16. We implement the model using both the direct approach and the iterative approach (Gradient Descent). In this case, the equation of the line is :math:`y = \alpha_{0} + \alpha_{1} x`, and the purpose is to find :math:`\alpha_{0}` and :math:`\alpha_{1}`. After that, we will use the obtained :math:`\alpha` values to predict the dependent value for a new independent variable (i.e., 4).

In order to perform the linear algebra operations (e.g., the calculation of the inverse of a matrix), we use the C++ library `Dlib <http://dlib.net>`_.

5.5.1 Direct Solver
""""""""""""""""""""""""""""""""""""""""""

Here is the implementation of the linear regression using the direct approach:

.. code-block:: c

   #include <iostream>
   #include <vector>
   #include <cstdint>
   #include <dlib/matrix.h>

   using namespace std;

   // Here we define 3 input data points corresponding to the independent variable (1, 2, 3).
   // Each data point is stored in a vector starting with the number 1, which multiplies
   // the intercept coefficient alpha_0.
   std::vector<std::vector<float>> x_values = { { 1, 1 }, { 1, 2 }, { 1, 3 } };

   // Here we define the 3 values of the dependent variable (6, 11, 16).
   std::vector<float> target_values = { 6, 11, 16 };

   static constexpr uint8_t number_of_data = 3;
   static constexpr uint8_t dimensions = 2;

   dlib::matrix<float, number_of_data, dimensions> matrix_x;
   dlib::matrix<float, number_of_data, 1> matrix_y;

   class Linear_Regression {
   public:
       // Copy the input data into Dlib matrices and evaluate the closed-form solution (eq.504).
       dlib::matrix<float, dimensions, 1> return_alpha (std::vector<std::vector<float>> vec_x,
                                                        std::vector<float> vec_y){
           for (int i = 0 ; i < number_of_data ; i++){
               for (int j = 0 ; j < dimensions ; j++){
                   matrix_x(i,j) = vec_x[i][j];
               }
           }
           for (int k = 0 ; k < number_of_data ; k++){
               matrix_y(k,0) = vec_y[k];
           }
           // alpha = (x^T x)^-1 x^T y
           return inv(trans(matrix_x) * matrix_x) * trans(matrix_x) * matrix_y;
       }
   };

   // Predict the dependent value for a new input as the dot product of the coefficients
   // with the input vector (which again starts with 1).
   float predict (dlib::matrix<float, dimensions, 1> coefficients, std::vector<float> new_inputs){
       float new_predict = 0;
       for (std::size_t p = 0 ; p < new_inputs.size(); p++){
           new_predict += coefficients(p,0) * new_inputs[p];
       }
       return new_predict;
   }

   int main(){
       Linear_Regression lg;
       dlib::matrix<float, dimensions, 1> alpha = lg.return_alpha(x_values, target_values);

       cout << "The calculated coefficients are: " << endl;
       for (int z = 0 ; z < alpha.size() ; z++){
           cout << alpha(z,0) << endl;
       }

       // Here we define a new independent variable (4) to predict the dependent variable.
       // This new value is again stored in a vector starting with the number 1.
       std::vector<float> new_values_to_predict = {1, 4};
       cout << "The predicted value is: " << predict(alpha, new_values_to_predict) << endl;
       return 0;
   }

The output of the above code is:

.. code-block:: c

   The calculated coefficients are:
   1.00001
   5
   The predicted value is: 21

The above results are as expected: based on the input data the equation of the line is :math:`y = 1 + 5x`, so the predicted value for the new input 4 is equal to 21.
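
As a quick sanity check (this snippet is not part of the original example; the coefficient values are simply taken from the output above), one can plug the reported coefficients back into the model and evaluate the MSE of Section 5.4 over the three training points, which should be essentially zero for this data:

.. code-block:: c

   #include <iostream>
   #include <vector>

   int main(){
       // Training data from the example above.
       std::vector<std::vector<float>> x_values = { { 1, 1 }, { 1, 2 }, { 1, 3 } };
       std::vector<float> target_values = { 6, 11, 16 };

       // Coefficients reported by the direct solver (alpha_0 is approximately 1, alpha_1 = 5).
       std::vector<float> alpha = { 1, 5 };

       // Evaluate the Mean Squared Error (eq.505) over the training points.
       float mse = 0;
       for (std::size_t i = 0; i < x_values.size(); i++){
           float prediction = alpha[0] * x_values[i][0] + alpha[1] * x_values[i][1];
           float error = target_values[i] - prediction;
           mse += error * error;
       }
       mse /= x_values.size();
       std::cout << "MSE of the fitted line: " << mse << std::endl;
       return 0;
   }

The same check can be reused for the coefficients produced by the iterative solver in the next subsection.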
5.5.2 Iterative Solver (Gradient Descent)
""""""""""""""""""""""""""""""""""""""""""

Here is the implementation of the linear regression using the gradient descent approach (iterative solver):

.. code-block:: c

   #include <iostream>
   #include <vector>

   using namespace std;

   // Here we define 3 input data points corresponding to the independent variable (1, 2, 3).
   // Each data point is stored in a vector starting with the number 1 (the intercept column).
   vector<vector<float>> x_values = { { 1, 1 }, { 1, 2 }, { 1, 3 } };

   // Here we define the 3 values of the dependent variable (6, 11, 16).
   vector<float> target_values = { 6, 11, 16 };

   // Here we define the learning rate.
   float lambda = 0.001;

   // Here we define the number of iterations.
   int iterations = 50000;

   int number_of_data = 3;
   int dimensions = 2;

   ////////////////////////////////////////////////////
   vector<vector<float>> matrix_x;
   vector<float> matrix_y_real;
   vector<float> matrix_y_predict;
   ////////////////////////////////////////////////////

   float gamma = 0;
   vector<float> alpha = {0, 0};
   vector<float> new_values_to_predict = {1, 4};

   // Predict the dependent value as the dot product of the coefficients with the input vector.
   float predict (vector<float> coefficients, vector<float> new_inputs){
       float new_predict = 0;
       for (std::size_t p = 0 ; p < new_inputs.size(); p++){
           new_predict += coefficients[p] * new_inputs[p];
       }
       return new_predict;
   }

   int main(){
       // Copy the input data into the working containers.
       for (int i = 0 ; i < number_of_data ; i++){
           matrix_x.push_back(vector<float>());
           for (int j = 0 ; j < dimensions ; j++){
               matrix_x[i].push_back(x_values[i][j]);
           }
       }
       for (int k = 0 ; k < number_of_data ; k++){
           matrix_y_real.push_back(target_values[k]);
           matrix_y_predict.push_back(0);
       }

       ////////////////////////////////////
       for (int f = 0 ; f < iterations ; f++){
           // Compute the predictions with the current alpha values.
           for (int m = 0 ; m < number_of_data ; m++){
               float result = 0;
               for (int n = 0 ; n < dimensions ; n++){
                   result += alpha[n] * matrix_x[m][n];
               }
               matrix_y_predict[m] = result;
           }
           // Gradient descent update of each coefficient (cf. eq.507).
           for (int p = 0 ; p < dimensions; p++){
               for (int q = 0 ; q < number_of_data ; q++){
                   gamma += (matrix_y_predict[q] - matrix_y_real[q]) * matrix_x[q][p];
               }
               alpha[p] = alpha[p] - (2./number_of_data) * lambda * gamma;
               gamma = 0;
           }
           cout << "Iteration number: " << f << endl;
           cout << "alpha 1 is: " << alpha[0] << endl;
           cout << "alpha 2 is: " << alpha[1] << endl;
       }

       cout << "The predicted value is: " << predict(alpha, new_values_to_predict) << endl;
       return 0;
   }

The output of the above code is:

.. code-block:: c

   Iteration number: 0
   alpha 1 is: 0.022
   alpha 2 is: 0.0506667
   Iteration number: 1
   alpha 1 is: 0.0437533
   alpha 2 is: 0.100772
   Iteration number: 2
   alpha 1 is: 0.0652627
   alpha 2 is: 0.150324
   Iteration number: 3
   alpha 1 is: 0.0865309
   alpha 2 is: 0.199326
   Iteration number: 4
   alpha 1 is: 0.107561
   alpha 2 is: 0.247786
   Iteration number: 5
   alpha 1 is: 0.128354
   alpha 2 is: 0.29571
   Iteration number: 6
   alpha 1 is: 0.148915
   alpha 2 is: 0.343103
   Iteration number: 7
   alpha 1 is: 0.169245
   alpha 2 is: 0.389972
   Iteration number: 8
   alpha 1 is: 0.189346
   alpha 2 is: 0.436322
   Iteration number: 9
   alpha 1 is: 0.209222
   alpha 2 is: 0.482159
   Iteration number: 10
   alpha 1 is: 0.228875
   alpha 2 is: 0.527489
   .
   .
   .
   Iteration number: 49999
   alpha 1 is: 1.00001
   alpha 2 is: 5
   The predicted value is: 21

This is again the same as the expected values.

.. note::
   :math:`\bullet` It should be noted again that the direct approach works seamlessly and is recommended when the number (dimensions) of independent variables is small. On the other hand, when dealing with a large number of independent variables, it is recommended to use the iterative approach based on the gradient descent method, to avoid the high computational cost of calculating the inverse matrix.

   :math:`\bullet` It is instructive to play with the value of the learning rate :math:`\lambda` and the number of iterations to see how altering these values affects the accuracy of the prediction (a minimal sketch for such an experiment is given after this note).
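
As a minimal sketch of such an experiment (not part of the original text; the helper name ``fit_gradient_descent`` and the particular learning rates below are illustrative assumptions), the gradient descent loop of Section 5.5.2 can be wrapped in a function and run with a few different settings, comparing the resulting coefficients against the expected values of 1 and 5:

.. code-block:: c

   #include <iostream>
   #include <vector>

   using namespace std;

   // Same toy data as above: x = 1, 2, 3 (with a leading 1 for the intercept) and y = 6, 11, 16.
   vector<vector<float>> x_values = { { 1, 1 }, { 1, 2 }, { 1, 3 } };
   vector<float> target_values = { 6, 11, 16 };

   // Hypothetical helper: runs the same gradient descent update as in Section 5.5.2
   // for a given learning rate and number of iterations and returns the coefficients.
   vector<float> fit_gradient_descent(float lambda, int iterations){
       int number_of_data = (int) x_values.size();
       int dimensions = 2;
       vector<float> alpha = { 0, 0 };
       vector<float> predictions(number_of_data, 0);
       for (int f = 0; f < iterations; f++){
           // Predictions with the current coefficients.
           for (int m = 0; m < number_of_data; m++){
               float result = 0;
               for (int n = 0; n < dimensions; n++){
                   result += alpha[n] * x_values[m][n];
               }
               predictions[m] = result;
           }
           // Update each coefficient with its gradient component (cf. eq.507).
           for (int p = 0; p < dimensions; p++){
               float gradient = 0;
               for (int q = 0; q < number_of_data; q++){
                   gradient += (predictions[q] - target_values[q]) * x_values[q][p];
               }
               alpha[p] -= (2.f / number_of_data) * lambda * gradient;
           }
       }
       return alpha;
   }

   int main(){
       // Try a few illustrative learning rates with a fixed iteration budget.
       vector<float> learning_rates = { 0.0005f, 0.001f, 0.01f };
       int iterations = 50000;
       for (float lambda : learning_rates){
           vector<float> alpha = fit_gradient_descent(lambda, iterations);
           cout << "lambda = " << lambda
                << "  ->  alpha_0 = " << alpha[0]
                << ", alpha_1 = " << alpha[1] << endl;
       }
       return 0;
   }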