Simulate Data that Conforms to the Theory of Exploratory Factor Analysis

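This function simulates data that conform to the theory of exploratory factor analysis. The number of factors, the number of items per factor, the sample size, the response distribution, the factor correlations, and the sizes of the primary and cross-loadings can all be customized.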

EFAsim.data(
  nfact,
  vpf,
  N = 500,
  distri = "normal",
  fc = "R",
  pl = "R",
  cl = "R",
  low.vpf = 5,
  up.vpf = 15,
  a = NULL,
  b = NULL,
  vis = TRUE
)

Arguments

nfact: A numeric value specifying the number of factors to simulate.
vpf: A numeric or character value specifying the number of items under each factor. If a numeric value is provided, it must be larger than 2, and the number of items under each factor will be fixed to this value. If a character value is provided, it must be one of 'S', 'M', 'L', or 'R', in which case the number of items under each factor is drawn at random from $U(5, 10)$, $U(5, 15)$, $U(5, 20)$, or $U(low.vpf, up.vpf)$, respectively.
N: A numeric value specifying the number of examinees to simulate.
distri: A character string, either 'normal' or 'beta', indicating whether the simulated data will follow a standard multivariate normal distribution or a multivariate beta distribution.
fc: A numeric or character value specifying the degree of correlation between factors. If a numeric value is provided, it must be within the range of 0 to 0.75, and the correlation between all factors will be fixed at this value. If a character value is provided, it must be 'R', and the correlations between factors will be randomly selected from $U(0.0, 0.5)$.
pl: A numeric or character value specifying the size of the primary factor loadings. If a numeric value is provided, it must be within the range of 0 to 1, and all primary factor loadings in the loading matrix will be fixed at this value. If a character value is provided, it must be one of 'L', 'M', 'H', or 'R', representing $pl \sim U(0.35, 0.50)$, $pl \sim U(0.50, 0.65)$, $pl \sim U(0.65, 0.80)$, or $pl \sim U(0.35, 0.80)$, respectively, consistent with the settings in Goretzko & Buhner (2020).
cl: A numeric or character value specifying the size of cross-loadings. If a numeric value is provided, it must be within the range of 0 to 0.5, and all cross-loadings in the loading matrix will be fixed at this value. If a character value is provided, it must be one of 'L', 'H', 'None', or 'R', representing $cl \sim U(-0.1, 0.1)$, $cl \sim U(-0.2, -0.1) \cup U(0.1, 0.2)$, $cl = 0$, or $cl \sim U(-0.2, 0.2)$, respectively, consistent with the settings in Auerswald & Moshagen (2019).
low.vpf: A numeric value specifying the minimum number of items per factor. It must be larger than 2 and takes effect only when vpf is 'R'. (default = 5)
up.vpf: A numeric value specifying the maximum number of items per factor, effective only when vpf is 'R'. (default = 15)
a: A numeric or NULL specifying the 'a' parameter of the beta distribution, effective only when distri = 'beta'. If a numeric value is provided, it will be used as the 'a' parameter of the beta distribution. If NULL, a random integer between 1 and 10 will be used. (default = NULL)
b: A numeric or NULL specifying the 'b' parameter of the beta distribution, effective only when distri = 'beta'. If a numeric value is provided, it will be used as the 'b' parameter of the beta distribution. If NULL, a random integer between 1 and 10 will be used. (default = NULL)
vis: A logical value indicating whether to print process information. (default = TRUE)
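
The character options for vpf, pl, and cl describe uniform sampling ranges. The following base R sketch illustrates what draws from some of those ranges look like. It is not the package's internal code: the variable names are hypothetical, the item counts are assumed to be sampled as integers, and the two halves of the 'H' cross-loading range are assumed to be equally likely.

```r
set.seed(42)
n <- 1000  # number of illustrative draws (hypothetical)

# vpf = 'S': items per factor from U(5, 10), assumed here to be integer-valued
vpf_S <- sample(5:10, size = n, replace = TRUE)

# pl = 'M': primary loadings from U(0.50, 0.65)
pl_M <- runif(n, min = 0.50, max = 0.65)

# cl = 'H': cross-loadings from U(-0.2, -0.1) union U(0.1, 0.2),
# drawn as a magnitude in [0.1, 0.2] with a random sign (assumed 50/50)
cl_H <- runif(n, min = 0.1, max = 0.2) * sample(c(-1, 1), size = n, replace = TRUE)

range(vpf_S)
range(pl_M)
range(abs(cl_H))
```

Drawing a magnitude and then a sign is one simple way to sample from a union of two symmetric intervals; other schemes would serve equally well for illustration.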

Value

An object of class EFAdata, which is a list containing the following components:

loadings: A simulated loading matrix.
items: A list containing all factors and the item indices under each factor.
cor.factors: A simulated factor correlation matrix.
cor.items: A simulated item correlation matrix.
response: A simulated response data matrix.

Details

A population correlation matrix is created for each data set based on the following decomposition: $$\mathbf{\Sigma} = \mathbf{\Lambda} \mathbf{\Phi} \mathbf{\Lambda}^T + \mathbf{\Delta}$$ where $\mathbf{\Lambda}$ is the loading matrix, $\mathbf{\Phi}$ is the factor correlation matrix, and $\mathbf{\Delta}$ is a diagonal matrix, with $\mathbf{\Delta} = \mathbf{I} - \text{diag}(\mathbf{\Lambda} \mathbf{\Phi} \mathbf{\Lambda}^T)$. The purpose of $\mathbf{\Delta}$ is to ensure that the diagonal elements of $\mathbf{\Sigma}$ are 1.
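
The decomposition can be checked numerically with a small base R sketch. The loading and factor correlation values below are hypothetical and chosen only for illustration; they are not output of EFAsim.data.

```r
# Hypothetical 6-item, 2-factor loading matrix (illustrative values only)
Lambda <- matrix(0, nrow = 6, ncol = 2,
                 dimnames = list(paste("item", 1:6), c("factor1", "factor2")))
Lambda[1:3, 1] <- c(0.70, 0.60, 0.50)  # primary loadings on factor1
Lambda[4:6, 2] <- c(0.65, 0.55, 0.75)  # primary loadings on factor2
Lambda[1:3, 2] <- 0.10                 # cross-loadings on factor2
Lambda[4:6, 1] <- -0.10                # cross-loadings on factor1

# Hypothetical factor correlation matrix
Phi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2, byrow = TRUE)

common <- Lambda %*% Phi %*% t(Lambda)  # common part: Lambda Phi Lambda^T
Delta  <- diag(1 - diag(common))        # diagonal matrix of unique variances
Sigma  <- common + Delta                # population correlation matrix

round(diag(Sigma), 10)  # every diagonal element equals 1
```

For the result to be a valid correlation matrix, each diagonal element of the common part must stay below 1, so that the corresponding unique variance in Delta is positive.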

The response data for each subject are simulated using the following formula: $$X_i = L_i + \epsilon_i, \quad 1 \leq i \leq I$$ where $L_i$ follows a standard normal distribution (distri = 'normal') or a beta distribution (distri = 'beta') and represents the contribution of the latent factors, and $\epsilon_i$ is the residual term, which follows a standard normal distribution (distri = 'normal') or a beta distribution (distri = 'beta'). $L_i$ and $\epsilon_i$ are uncorrelated, and $\epsilon_i$ and $\epsilon_j$ ($i \neq j$) are also uncorrelated.
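
For the normal case, one way to generate responses of this form is sketched below in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the loading and correlation values are hypothetical, and the common and residual parts are scaled so that each item has unit population variance.

```r
set.seed(1)
N <- 500  # number of simulated examinees

# Hypothetical loadings (6 items, 2 factors) and factor correlations
Lambda <- cbind(c(0.70, 0.60, 0.50, 0, 0, 0),
                c(0, 0, 0, 0.65, 0.55, 0.75))
Phi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2)

# Unique variances: 1 - diag(Lambda Phi Lambda^T)
uniq <- 1 - diag(Lambda %*% Phi %*% t(Lambda))

# Factor scores with correlation Phi: independent normals times chol(Phi)
scores <- matrix(rnorm(N * 2), nrow = N) %*% chol(Phi)

L   <- scores %*% t(Lambda)                                  # common part, N x 6
eps <- matrix(rnorm(N * 6), nrow = N) %*% diag(sqrt(uniq))   # independent residuals
X   <- L + eps                                               # simulated responses

round(cor(X), 2)  # sample correlations, close to the population matrix
```

Because the residuals are generated independently of the factor scores and of each other, the sample correlation matrix of X approaches the population correlation matrix as N grows.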

References

Goretzko, D., & Buhner, M. (2020). One model to rule them all? Using machine learning algorithms to determine the number of factors in exploratory factor analysis. Psychological Methods, 25(6), 776-786. https://doi.org/10.1037/met0000262

Auerswald, M., & Moshagen, M. (2019). How to determine the number of factors to retain in exploratory factor analysis: A comparison of extraction methods under realistic conditions. Psychological Methods, 24(4), 468-491. https://doi.org/10.1037/met0000200

Examples

library(EFAfactors)

## Run EFAsim.data with three factors and five items per factor;
## all other arguments are written out at their default values.
data.obj <- EFAsim.data(nfact = 3, vpf = 5, N = 500, distri = "normal",
                        fc = "R", pl = "R", cl = "R",
                        low.vpf = 5, up.vpf = 15, a = NULL, b = NULL, vis = TRUE)
#> $factor1
#> [1]  2  3 10 14 15
#> 
#> $factor2
#> [1]  4  5  6  7 12
#> 
#> $factor3
#> [1]  1  8  9 11 13
#> 

head(data.obj$loadings)
#>             factor1     factor2     factor3
#> item 1  0.118186967 -0.05246182  0.48012188
#> item 2  0.497564324 -0.13902210 -0.15124030
#> item 3  0.779526642 -0.14447757  0.02437919
#> item 4 -0.190154526  0.79742140 -0.11738744
#> item 5 -0.008881612  0.64506761 -0.14898734
#> item 6  0.103383815  0.66883871  0.10132315