We say that a factor model \[\begin{aligned} \mathbf y_i &=\boldsymbol \Lambda \boldsymbol \eta_i + \boldsymbol \epsilon_i, \quad \text{for } i=1, \ldots, I, \\ \text{where } \operatorname{Cov}(\boldsymbol \epsilon_i) &= \boldsymbol \Psi, \quad \operatorname{Cov}(\boldsymbol \eta_i) = \boldsymbol \Phi, \quad \boldsymbol \epsilon_i \perp \boldsymbol \eta_i, \end{aligned} \] with population covariance structure \[ \boldsymbol \Sigma_{yy}= \boldsymbol\Lambda\boldsymbol \Phi \boldsymbol\Lambda^{\intercal} + \boldsymbol \Psi \]

is not identified if there are two or more distinct parameter points that can generate the same joint probability distribution. More concretely, if we can find a matrix \(\textbf{T}\) not equal to \(\textbf{I}\) such that the two distinct sets of parameters \(({\boldsymbol{\Lambda}},{\boldsymbol{\Phi}})\) and \(({\boldsymbol{\Lambda}}^\bigstar,{\boldsymbol{\Phi}}^\bigstar)\) produce the same matrix decomposition

\[ \begin{align} {\boldsymbol{\Sigma}}_{yy} &= {\boldsymbol{\Lambda}}{\boldsymbol{\Phi}}{\boldsymbol{\Lambda}}^{\intercal}+{\boldsymbol{\Psi}}\\ &=({\boldsymbol{\Lambda}}\textbf{T}) \bigl({\textbf{T}}^{-1}{\boldsymbol{\Phi}}\,({\textbf{T}}^{\intercal})^{-1}\bigr)({\textbf{T}}^{\intercal}{\boldsymbol{\Lambda}}^{\intercal})+{\boldsymbol{\Psi}}\\ &={\boldsymbol{\Lambda}}^\bigstar{\boldsymbol{\Phi}}^\bigstar({\boldsymbol{\Lambda}}^{\bigstar})^{\intercal}+{\boldsymbol{\Psi}}\end{align} \]
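To make this observational equivalence concrete, here is a minimal numerical sketch in R; the loading, factor covariance, and residual values are hypothetical, and any invertible \(\textbf{T}\) will do.

```r
# Hypothetical 4 x 2 loading matrix, factor covariance, and residual covariance
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.5,
                   0.1, 0.2, 0.3, 0.4), ncol = 2)
Phi    <- matrix(c(1.0, 0.3,
                   0.3, 1.0), nrow = 2)
Psi    <- diag(0.5, 4)

# Any invertible T yields a second, observationally equivalent parameter set
T_mat       <- matrix(c(1.0, 0.4,
                        0.2, 1.0), nrow = 2)
Lambda_star <- Lambda %*% T_mat
Phi_star    <- solve(T_mat) %*% Phi %*% t(solve(T_mat))

Sigma      <- Lambda %*% Phi %*% t(Lambda) + Psi
Sigma_star <- Lambda_star %*% Phi_star %*% t(Lambda_star) + Psi
all.equal(Sigma, Sigma_star)   # TRUE: two parameter points, one covariance matrix
```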

then it follows that the model is not identified. Conceptually, think of the latent variables \(\boldsymbol\eta\) as spanning a \(D\)-dimensional space, the latent space, and any combination of the traits that define a particular person \(i\) can be represented by a point residing in that space. However, since \(\boldsymbol\eta\) is latent, it has no frame of reference. To establish a frame, we need to specify a point of origin, a unit of measurement or scale, and a set of \(D\) basis vectors that define the coordinate directions. These indeterminacies correspond to location, scale, rotation, and reflection invariance, and to label switching if the basis vectors are not ordered.

Resolving the Invariance of Scale and Location

To specify a metric of measurement, we need to set the location and scale of the latent variables in the model. The location and scale of \(\boldsymbol\eta\) can be determined by setting its mean to zero and its variances to one, that is, \(\operatorname{Diag}({\boldsymbol{\Phi}})=\mathbf{I}\).

Now, to show that the model is identified, we need to determine that the matrix decomposition of \({\boldsymbol{\Sigma}}_{yy}\) is unique. More specifically, the problem reduces to showing that the decomposition \({\boldsymbol{\Lambda}}{\boldsymbol{\Phi}}{\boldsymbol{\Lambda}}^{\intercal}\) is unique.

Rotation Invariance

Parameters are said to be invariant under rotation when we can find an orthogonal rotation matrix \(\textbf{T}\), satisfying \(\textbf{T}^{\intercal}\textbf{T}=\mathbf{I}\), such that when we rotate \({\boldsymbol{\Lambda}}\) and \({\boldsymbol{\eta}}\), we obtain the two rotated quantities \({\boldsymbol{\Lambda}}^{\ast}={\boldsymbol{\Lambda}}\textbf{T}^{\intercal}\) and \({\boldsymbol{\eta}}^{\ast}=\textbf{T}{\boldsymbol{\eta}}\) whose product gives us back the original product \[ \begin{align*} {\boldsymbol{\Lambda}}^{\ast}{\boldsymbol{\eta}}^{\ast}={\boldsymbol{\Lambda}}\textbf{T}^{\intercal}\textbf{T}{\boldsymbol{\eta}}={\boldsymbol{\Lambda}}{\boldsymbol{\eta}}\end{align*} \] Note that when \({\boldsymbol{\Phi}}=\mathbf{I}\), the rotated factors retain an identity covariance matrix, since \(\operatorname{Cov}(\textbf{T}{\boldsymbol{\eta}})=\textbf{T}\textbf{T}^{\intercal}=\mathbf{I}\).

In factor analysis, there are two approaches to dealing with rotational invariance. In Exploratory Factor Analysis (Thurstone 1947), we set \({\boldsymbol{\Phi}}=\textbf{I}\) to make \({\boldsymbol{\Lambda}}{\boldsymbol{\Lambda}}^{\intercal}\) identifiable and choose \({\boldsymbol{\Lambda}}\) such that it meets certain criteria for interpretability. For example, one criterion, the VARIMAX criterion (Kaiser 1958), finds a simple interpretable solution by rotating the factors so that each factor has a large number of loadings with values near zero and a small number of loadings with large values. Note that the resulting rotated matrix is not unique. For a \(D\)-dimensional model, there are still \(2^D \times D!\) matrices in its equivalence class. We should add that label switching (i.e., column permutations) and sign reversals are not considered a problem in the frequentist framework because inference can be restricted to one likelihood mode with the proper choice of initial values. Moreover, after applying a rotation criterion, the common practice is for the researcher to flip the signs and order the \(D\) columns of the orthogonally rotated loading matrix to obtain the most interpretable solution among the \(2^D \times D!\) possible matrices that satisfy the rotation criterion used. In a two-dimensional model, for instance, we have \(2^2\) possible column sign changes and \(2!\) column permutations, for a total of 8 matrices in the equivalence class.
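As an illustration, the following R sketch (with a hypothetical loading matrix) obscures a simple structure with an orthogonal rotation and uses `varimax()` from the stats package to recover an interpretable solution; sign flips and column permutations of the result satisfy the criterion equally well.

```r
# Hypothetical loadings with simple structure, obscured by an orthogonal rotation
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.5, 0.6, 0.7), ncol = 2)
theta  <- pi / 6
R      <- matrix(c(cos(theta), -sin(theta),
                   sin(theta),  cos(theta)), nrow = 2, byrow = TRUE)
rotated <- Lambda %*% R        # same Lambda %*% t(Lambda), but harder to interpret

vm   <- varimax(rotated)       # rotate back toward simple structure
L_vm <- unclass(vm$loadings)

# Members of the same equivalence class (2^D sign flips x D! permutations):
L_vm %*% diag(c(-1, 1))        # flip the sign of the first column
L_vm[, 2:1]                    # swap the two columns
```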

In the second approach, Confirmatory Factor Analysis (Joreskog 1973), the off-diagonal elements of \({\boldsymbol{\Phi}}\) are left unconstrained and the rotational invariance of \({\boldsymbol{\Lambda}}\) and \({\boldsymbol{\Phi}}\) is resolved by imposing theoretically informed constraints on \({\boldsymbol{\Lambda}}\). If no constraints are imposed on \({\boldsymbol{\Lambda}}\), then we can find a transformation matrix \(\textbf{T}\) satisfying \(\operatorname{Diag}({\boldsymbol{\Phi}})=\operatorname{Diag}({\textbf{T}}^{-1}{\boldsymbol{\Phi}}({\textbf{T}}^{\intercal})^{-1})=\mathbf{I}\) such that

\[ \begin{align*} {\boldsymbol{\Lambda}}{\boldsymbol{\Phi}}{\boldsymbol{\Lambda}}^{\intercal}=({\boldsymbol{\Lambda}}\textbf{T}) \bigl({\textbf{T}}^{-1}{\boldsymbol{\Phi}}\,({\textbf{T}}^{\intercal})^{-1}\bigr)({\textbf{T}}^{\intercal}{\boldsymbol{\Lambda}}^{\intercal}) \end{align*} \]

Note that although this transformation matrix is referred to in the literature as an oblique rotation matrix (Browne 2001), it is not necessarily orthogonal and is therefore not exactly a rotation matrix.
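To see how such a transformation can be constructed, here is a minimal R sketch with hypothetical values: start from any invertible matrix and rescale its columns so that the transformed factor covariance matrix retains a unit diagonal.

```r
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0,
                   0.0, 0.5, 0.6, 0.7), ncol = 2)   # hypothetical loadings
Phi    <- matrix(c(1.0, 0.3,
                   0.3, 1.0), nrow = 2)             # factor correlation matrix

A     <- matrix(c(1.0, 0.4,
                  0.2, 1.0), nrow = 2)              # any invertible starting matrix
Phi_A <- solve(A) %*% Phi %*% t(solve(A))
T_mat <- A %*% diag(sqrt(diag(Phi_A)))              # rescale the columns of A

Phi_star    <- solve(T_mat) %*% Phi %*% t(solve(T_mat))
Lambda_star <- Lambda %*% T_mat

diag(Phi_star)                 # 1 1: the unit-diagonal constraint is preserved
all.equal(Lambda %*% Phi %*% t(Lambda),
          Lambda_star %*% Phi_star %*% t(Lambda_star))   # TRUE
crossprod(T_mat)               # not the identity: T is not an orthogonal rotation
```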

Identification Conditions

Anderson and Rubin (1956) suggested that model identification can be achieved by imposing at least \(D^2\) restrictions on \(({\boldsymbol{\Lambda}},{\boldsymbol{\Phi}})\). Howe (1955) stated three sufficient conditions for the local uniqueness of \({\boldsymbol{\Lambda}}\), and more recently Peeters (2012) added a fourth condition to ensure global rotational uniqueness. To ensure uniqueness according to the four conditions, we need to

  1. Make \({\boldsymbol{\Phi}}\) a positive definite correlation matrix such that \(\operatorname{Diag}({\boldsymbol{\Phi}})= \mathbf{I}\)
  2. Fix at least \(D-1\) zeros in every column of \({\boldsymbol{\Lambda}}\)
  3. Check that the submatrix \({\boldsymbol{\Lambda}}_{d}\) has rank \(D-1\) for every column \(d=1,\ldots, D\), where \({\boldsymbol{\Lambda}}_{d}\) is the submatrix consisting of the rows of \({\boldsymbol{\Lambda}}\) that have fixed zero entries in column \(d\).
  4. Constrain one parameter in each column of \({\boldsymbol{\Lambda}}\) to take only positive or negative values.

Note that the first condition imposes \(D\) restrictions and the second condition imposes another \(D(D-1)\) constraints, for a total of \(D^2\) constraints. Jointly, the first three conditions reduce the number of possible modes from infinitely many to just the \(2^D\) modes produced by sign reversals. The fourth condition further reduces that number to 1, so that the resulting parameter space is unimodal, provided the remaining parameters in the model are identified. Therefore, to identify our two-dimensional model, we need to fix one of the cross-loadings to zero while making sure that the rank condition (the third condition) is satisfied. In any case, if we are not sure whether our specified constraints make the parameters locally identifiable, we can check for local identification algebraically by using the Wald Rank Rule, which evaluates the rank of the Jacobian (Bekker 1989); or, if we are conducting maximum-likelihood-based analysis, we can empirically assess local identification from the quality of the information matrix produced by the model fit.
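As a sketch of how the rank condition might be checked, the helper below (a hypothetical function written for this illustration) extracts, for each column \(d\), the rows of \({\boldsymbol{\Lambda}}\) with fixed zeros in that column and verifies that the resulting submatrix has rank \(D-1\).

```r
# For each column d, does the submatrix of rows with fixed zeros
# in column d have rank D - 1?
check_rank_condition <- function(Lambda) {
  D <- ncol(Lambda)
  sapply(seq_len(D), function(d) {
    rows <- which(Lambda[, d] == 0)          # rows with a fixed zero in column d
    qr(Lambda[rows, , drop = FALSE])$rank == D - 1
  })
}

# Hypothetical 4 x 2 pattern with one fixed zero per column (D - 1 = 1)
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0,
                   0.0, 0.5, 0.6, 0.7), ncol = 2)
check_rank_condition(Lambda)                 # TRUE TRUE
```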

The Problematic Consequences of Imposing Constraints

Although some parameter constraints can guarantee a unimodal likelihood, they can have the unintended consequence of altering the shape of the likelihood. In particular, the constraints can induce local modes in the likelihood and make estimation problematic. For example, Millsap (2001) has shown that some common constraint choices can produce a discrimination or loading matrix that lies outside the true equivalence class. Moreover, in maximum likelihood estimation, fixing the \(\frac{D(D-1)}{2}\) elements to nonzero constants renders the large-sample normal approximation inappropriate (Loken 2005). But that is not all: since the constraints are placed on particular elements of the parameter matrix, both estimation and inference will depend on the ordering of the variables in the data matrix. Sometimes, however, restricted confirmatory model estimation is not so problematic. For example, if our data are very informative, we restrict the factors to be orthogonal, and we constrain \(\frac{D(D-1)}{2}\) elements of \({\boldsymbol{\Lambda}}\) to zero while observing the rank condition (Algina 1980), then the modes will most likely be well separated, allowing us to obtain valid estimates of the standard errors of the loadings based on a unimodal normal approximation (Dolan and Molenaar 1991).




References

Algina, James. 1980. “A Note on Identification in the Oblique and Orthogonal Factor Analysis Models.” Psychometrika 45: 393–96.

Anderson, Theodore W., and Herman Rubin. 1956. “Statistical Inference in Factor Analysis.” In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 5:111–50.

Bekker, Paul A. 1989. “Identification in Restricted Factor Models and the Evaluation of Rank Conditions.” Journal of Econometrics 41.

Browne, Michael W. 2001. “An Overview of Analytic Rotation in Exploratory Factor Analysis.” Multivariate Behavioral Research 36: 111–50.

Dolan, Conor V., and Peter C.M. Molenaar. 1991. “A Comparison of Four Methods of Calculating Standard Errors of Maximum-Likelihood Estimates in the Analysis of Covariance Structure.” British Journal of Mathematical and Statistical Psychology, 359–68.

Howe, William Gerow. 1955. Some Contributions to Factor Analysis. Oak Ridge, Tenn.: Oak Ridge National Laboratory.

Joreskog, Karl G. 1973. A General Method for Estimating a Linear Structural Equation System. Uppsala.

Kaiser, Henry F. 1958. “The Varimax Criterion for Analytic Rotation in Factor Analysis.” Psychometrika 23: 187–200.

Loken, Eric. 2005. “Identification Constraints and Inference in Factor Models.” Structural Equation Modeling: A Multidisciplinary Journal 12: 232–44.

Millsap, Roger E. 2001. “When Trivial Constraints Are Not Trivial: The Choice of Uniqueness Constraints in Confirmatory Factor Analysis.” Structural Equation Modeling 8 (1): 1–17.

Peeters, Carel F. 2012. “Rotational Uniqueness Conditions Under Oblique Factor Correlation Metric.” Psychometrika 77: 288–92.

Thurstone, Louis L. 1947. Multiple Factor Analysis. Chicago: University of Chicago Press.