It was hard for me to grasp why positive definite matrices are important and why we need them.
So, to complement this great video explanation, I want to add a visual representation of the matrices from Lecture 27: Positive Definite Matrices and Minima.
Let's first list all the types of matrices:
- ⬆️⬇️ indefinite
- ⬆️⬆️ positive definite
- ⬆️➡️ positive semi definite
- ⬇️➡️ negative semi definite
- ⬇️⬇️ negative definite
Indefinite
There is no standard single notation for an indefinite matrix; it is usually just described in words as "indefinite." If you picked a random symmetric matrix, it would most likely turn out to be indefinite.
The quadratic form has neither a local maximum nor a local minimum; it has a saddle point.
Example
Consider the matrix \( \mathbf{A} \): \[ \mathbf{A} = \begin{pmatrix} 1 & 2 \\ 2 & -3 \end{pmatrix} \]
and \[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \]
The quadratic form for the matrix \( \mathbf{A} \) is given by: \[ \mathbf{x}^T \mathbf{A} \mathbf{x} = x_1^2 + 4x_1x_2 - 3x_2^2 \]
This matrix is indefinite because the form takes both positive and negative values depending on the choice of \( \mathbf{x} \): for example, \( \mathbf{x} = (1, 0)^T \) gives \( 1 \), while \( \mathbf{x} = (0, 1)^T \) gives \( -3 \).
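A quick numerical check (my own sketch using NumPy; the matrix is the one from the example above) confirms this: the eigenvalues have mixed signs, and the quadratic form changes sign with \( \mathbf{x} \).

```python
import numpy as np

# The indefinite matrix from the example above.
A = np.array([[1.0,  2.0],
              [2.0, -3.0]])

# A symmetric matrix is indefinite iff it has both positive
# and negative eigenvalues.
print(np.linalg.eigvalsh(A))  # ~[-3.83, 1.83] -> mixed signs

# The quadratic form changes sign depending on x.
for x in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(x @ A @ x)  # 1.0 (positive), then -3.0 (negative)
```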
Positive Definite (PD)
\[ \mathbf{A} \succ 0 \]
There is a local minimum (which is also the global minimum).
Example
Consider the matrix \( \mathbf{A} \): \[ \mathbf{A} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \]
and \[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \]
This matrix is positive definite because all its eigenvalues are positive (here both equal \( 2 \)). The quadratic form for the matrix \( \mathbf{A} \) is given by: \[ \mathbf{x}^T \mathbf{A} \mathbf{x} = 2x_1^2 + 2x_2^2 \]
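As a sketch (not from the lecture), NumPy's Cholesky factorization doubles as a positive-definiteness test: for a symmetric matrix, it succeeds exactly when the matrix is positive definite.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 2.0]])

# np.linalg.cholesky raises LinAlgError for symmetric matrices
# that are not positive definite.
try:
    L = np.linalg.cholesky(A)
    print("positive definite; Cholesky factor:\n", L)
except np.linalg.LinAlgError:
    print("not positive definite")
```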
Positive Semi-Definite (PSD)
\[ \mathbf{A} \succeq 0 \]
There is a minimum, but it is not strict: along at least one direction the quadratic form stays flat at its minimum value instead of curving upward.
Example
Consider the matrix \( \mathbf{A} \): \[ \mathbf{A} = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} \]
and \[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \]
This matrix is positive semi-definite because it satisfies the condition \( \mathbf{x}^T \mathbf{A} \mathbf{x} \geq 0 \) for all vectors \( \mathbf{x} \); equivalently, its eigenvalues are non-negative (here \( 0 \) and \( 5 \)).
The quadratic form for the matrix \( \mathbf{A} \) is given by: \[ \mathbf{x}^T \mathbf{A} \mathbf{x} = x_1^2 + 4x_1x_2 + 4x_2^2 = (x_1 + 2x_2)^2 \]
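To see the "semi" part numerically (a minimal sketch, matrix from the example above): one eigenvalue is exactly zero, so the form reaches \( 0 \) at non-zero vectors in the null space of \( \mathbf{A} \).

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.eigvalsh(A))  # [0., 5.] -> non-negative, so PSD

# The zero eigenvalue means a non-zero x can make the form 0:
# any x in the null space of A, e.g. x = (2, -1).
x = np.array([2.0, -1.0])
print(x @ A @ x)  # 0.0 -- the minimum is flat along this direction
```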
Negative Semi-Definite (NSD)
\[ \mathbf{A} \preceq 0 \]
There is a maximum, but it is not strict: along at least one direction the quadratic form stays flat at its maximum value instead of curving downward.
Example
Consider the matrix \( \mathbf{A} \): \[ \mathbf{A} = \begin{pmatrix} -1 & -2 \\ -2 & -4 \end{pmatrix} \]
and \[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \]
This matrix is negative semi-definite because it satisfies the condition \( \mathbf{x}^T \mathbf{A} \mathbf{x} \leq 0 \) for all vectors \( \mathbf{x} \); equivalently, its eigenvalues are non-positive (here \( -5 \) and \( 0 \)).
The quadratic form for the matrix \( \mathbf{A} \) is given by: \[ \mathbf{x}^T \mathbf{A} \mathbf{x} = -x_1^2 - 4x_1x_2 - 4x_2^2 = -(x_1 + 2x_2)^2 \]
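This is just the negation of the PSD example, so the same numerical check works with flipped signs (sketch):

```python
import numpy as np

A = np.array([[-1.0, -2.0],
              [-2.0, -4.0]])

# A is negative semi-definite exactly when -A is positive
# semi-definite, i.e. all eigenvalues of A are <= 0.
print(np.linalg.eigvalsh(A))  # [-5., 0.]
```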
Negative Definite (ND)
\[ \mathbf{A} \prec 0 \]
There is a local maximum (which is also the global maximum).
Example
Consider the matrix \( \mathbf{A} \): \[ \mathbf{A} = \begin{pmatrix} -2 & -1 \\ -1 & -2 \end{pmatrix} \]
and \[ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \]
This matrix is negative definite because it satisfies the condition \( \mathbf{x}^T \mathbf{A} \mathbf{x} < 0 \) for all non-zero vectors \( \mathbf{x} \); equivalently, all its eigenvalues are negative (here \( -1 \) and \( -3 \)).
The quadratic form for the matrix \( \mathbf{A} \) is given by: \[ \mathbf{x}^T \mathbf{A} \mathbf{x} = -2x_1^2 - 2x_1x_2 - 2x_2^2 \]
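A numerical sanity check (my own sketch): the eigenvalues are all negative, and the form comes out negative on a batch of random non-zero vectors.

```python
import numpy as np

A = np.array([[-2.0, -1.0],
              [-1.0, -2.0]])

print(np.linalg.eigvalsh(A))  # [-3., -1.] -> all negative, so ND

# The quadratic form is negative for every non-zero x;
# check it on a batch of random vectors.
rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 2))
forms = np.einsum('ij,jk,ik->i', xs, A, xs)  # x^T A x per row
print(np.all(forms < 0))  # True
```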
Wait but why?
Positive Semi-Definite (PSD) matrices are crucial in many applications, particularly in optimization and machine learning. A matrix is PSD if all its eigenvalues are non-negative, and this has significant implications.
When dealing with Hessian matrices, which contain the second-order partial derivatives of a function, a Hessian that is PSD everywhere means the function is convex. Convexity is essential because it ensures that any local minimum is also a global minimum, simplifying optimization. In particular, with a suitably chosen step size, gradient descent converges to a global minimum of a convex function.
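A tiny illustration of that convergence claim (my own sketch, reusing the positive definite matrix from the PD example and an assumed fixed step size):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 2.0]])  # positive definite -> f(x) = x^T A x is convex

def grad(x):
    # For symmetric A, the gradient of x^T A x is 2 A x.
    return 2 * A @ x

x = np.array([3.0, -4.0])  # arbitrary starting point
step = 0.1                 # fixed step size, small enough to converge
for _ in range(100):
    x = x - step * grad(x)

print(x)  # ~[0, 0]: gradient descent reached the global minimum
```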
In the context of quadratic functions, a PSD matrix ensures that the function is bounded below and has a minimum. Specifically, the quadratic form \( \mathbf{x}^T \mathbf{A} \mathbf{x} \), where \( \mathbf{A} \) is a PSD matrix, is guaranteed to be convex, so it has a minimum rather than a maximum or a saddle point.
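One way to see this (a short derivation, assuming \( \mathbf{A} \) is symmetric): setting the gradient of the form to zero finds the stationary points, \[ \nabla\left(\mathbf{x}^T \mathbf{A} \mathbf{x}\right) = 2\mathbf{A}\mathbf{x} = \mathbf{0} \] so the stationary points are exactly the null space of \( \mathbf{A} \). The form equals \( 0 \) there, and since \( \mathbf{x}^T \mathbf{A} \mathbf{x} \geq 0 \) everywhere, those points are global minima.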
Moreover, in machine learning and statistics, covariance matrices must be PSD to make sense: a non-PSD covariance matrix would imply that some linear combination of the variables has negative variance, which is impossible.
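For instance (a sketch), a sample covariance matrix built from data is PSD by construction, and the variance of any linear combination of the variables is a quadratic form in it:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 3))   # 500 samples of 3 variables
cov = np.cov(data, rowvar=False)   # 3x3 sample covariance matrix

print(np.linalg.eigvalsh(cov))     # all >= 0 (up to rounding): PSD

# The variance of the combination w^T (variables) is w^T cov w,
# which PSD-ness keeps non-negative.
w = rng.normal(size=3)
print(w @ cov @ w >= 0)            # True
```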
Thus, verifying that a matrix is PSD provides stability and convergence guarantees in optimization problems, making it indispensable in mathematical, statistical, and engineering applications.