## Principal Component Analysis

### Introduction

In medical statistics, and in clinical experiments in particular, the results recorded for each observed subject usually contain several response variables. Blood pressure records, for example, include systolic blood pressure, diastolic blood pressure, and pulse pressure. Data of this kind, with multiple variables per observation, is called multivariate data. Principal component analysis (PCA) is one of the methods of multivariate analysis; other common ones include multivariate analysis of variance (MANOVA), factor analysis, canonical correlation analysis, and cluster analysis. PCA is a statistical method for capturing the dominant features of a data set: it identifies the main influencing factors among many correlated variables, reveals the underlying structure, and simplifies complex problems. The purpose of calculating principal components is to project high-dimensional data into a lower-dimensional space while keeping as much of the original variation as possible.

### The Relationship Between Principal Components and Original Variables

PCA replaces the original variables with a set of new variables, the principal components, each of which is a linear combination of the original variables. Keeping only the leading components preserves the main information while simplifying the data and reducing its dimensionality. The relationship between the principal components and the original variables can be summarized as follows:

- Each principal component is a linear combination of the original variables.
- The number of principal components retained is smaller than the number of original variables.
- The retained principal components preserve most of the information (variance) in the original variables.
- The principal components are uncorrelated with each other.
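These properties can be checked numerically. The following is a minimal sketch using NumPy on synthetic data; the blood pressure values, the sample size, and all variable names are illustrative assumptions, not data from a real study.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for blood pressure records (illustrative values only).
n = 200
systolic = rng.normal(120, 15, n)
diastolic = 0.6 * systolic + rng.normal(8, 5, n)
pulse_pressure = systolic - diastolic + rng.normal(0, 2, n)  # with measurement noise
X = np.column_stack([systolic, diastolic, pulse_pressure])

# Standardize each column to zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix, sorted by decreasing eigenvalue.
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each column of `scores` is a linear combination of the standardized
# variables, with the coefficients given by one eigenvector.
scores = Z @ eigvecs

# The components are mutually uncorrelated: their covariance matrix is
# diagonal, with the eigenvalues on the diagonal.
score_cov = np.cov(scores, rowvar=False)
print(np.round(score_cov, 6))

# Share of the total variance carried by each component.
explained = eigvals / eigvals.sum()
print(np.round(explained, 3))
```

The off-diagonal entries of `score_cov` vanish up to rounding error, and `explained` shows how much of the total variance each component carries, which is the basis for keeping only the first few.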

### Advantages of Principal Component Analysis

- The raw data are not required to be normally distributed. PCA rotates the coordinate axes so that they point along the directions in which the data are most dispersed (that is, have the largest variance). Because this requires no distributional assumption, the method applies to a wide range of data.
- Because the principal components combine and simplify the original variables, the weight of each indicator is determined objectively from the data, which avoids the arbitrariness of subjective judgment.
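The rotation view can be illustrated on data that are clearly not normal. The sketch below is only an illustration under stated assumptions: the exponential draws and the `mixing` matrix are made up for the example. It checks that the eigenvector matrix is orthogonal (a rotation, possibly combined with a reflection), that the total variance is unchanged by the change of basis, and that the first new axis carries at least as much variance as any single original variable.

```python
import numpy as np

rng = np.random.default_rng(1)

# Deliberately non-normal data: skewed exponential draws, mixed so that
# the three columns are correlated (the mixing matrix is arbitrary).
raw = rng.exponential(scale=1.0, size=(500, 3))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.4],
                   [0.0, 0.0, 1.0]])
X = raw @ mixing
Xc = X - X.mean(axis=0)  # center the data

# Eigenvectors of the covariance matrix define the new axes.
S = np.cov(Xc, rowvar=False)
eigvals, V = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# V is orthogonal, so projecting onto its columns is a change of basis
# that preserves distances.
is_orthogonal = np.allclose(V.T @ V, np.eye(3))

# Express the data in the new basis and compare total variance.
rotated = Xc @ V
total_var_before = np.trace(S)
total_var_after = rotated.var(axis=0, ddof=1).sum()

print(is_orthogonal)
print(total_var_before, total_var_after)
print(eigvals[0], np.diag(S).max())
```

Nothing in the computation depends on the shape of the distribution; only the covariance matrix of the data is used.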

### Principal Component Analysis Process

Fig 1. Flow chart of principal component analysis.

- First run suitability tests, such as the Kaiser-Meyer-Olkin (KMO) test and Bartlett's test of sphericity, to determine whether the variables are correlated enough for principal component analysis.
- Select the initial variables, bring them to comparable units, and standardize the data.
- Decide, based on the characteristics of the initial variables, whether to derive the principal components from the covariance matrix or from the correlation matrix. The correlation matrix is the usual choice when the variables are measured in different units.
- Calculate the eigenvalues and eigenvectors of the covariance matrix or correlation matrix.
- Determine the number of principal components to retain and extract them.
- Interpret the principal components: the meaning of each component is determined by the few variables that have the largest weights in its linear combination.
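The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function names `bartlett_statistic` and `pca_sketch`, the 85% cumulative-variance cutoff, and the synthetic data are all assumptions made for the example. The KMO test is omitted, and Bartlett's test is reduced to its statistic and degrees of freedom (the p-value would come from a chi-square distribution).

```python
import numpy as np


def bartlett_statistic(X):
    """Bartlett's sphericity statistic and its degrees of freedom."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # log|R| is at most 0 for a correlation matrix
    statistic = -((n - 1) - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof


def pca_sketch(X, use_correlation=True, var_threshold=0.85):
    """Hypothetical helper: PCA via eigen-decomposition."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    if use_correlation:
        # After dividing by the standard deviation, the covariance matrix
        # of the data equals the correlation matrix of X.
        centered = centered / X.std(axis=0, ddof=1)
    M = np.cov(centered, rowvar=False)

    # Eigenvalues and eigenvectors, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Keep the smallest number of components whose cumulative share of
    # variance reaches the threshold (an assumed selection rule).
    cumulative = np.cumsum(eigvals / eigvals.sum())
    k = min(int(np.searchsorted(cumulative, var_threshold)) + 1, len(eigvals))

    weights = eigvecs[:, :k]      # coefficients of each linear combination
    scores = centered @ weights   # component values for every observation
    return {"eigenvalues": eigvals, "cumulative": cumulative,
            "k": k, "weights": weights, "scores": scores}


# Synthetic example: five observed variables driven by two latent factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(300, 2))
mix = rng.normal(size=(2, 5))
X = latent @ mix + 0.3 * rng.normal(size=(300, 5))

statistic, dof = bartlett_statistic(X)
result = pca_sketch(X)
print("Bartlett statistic:", round(statistic, 1), "dof:", dof)
print("components kept:", result["k"])
print("weights of the first component:", np.round(result["weights"][:, 0], 3))
```

To interpret a component, look at the entries of its column in `weights` with the largest absolute values; they identify the original variables that dominate that component.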

