EFAsim.data.Rd
This function is used to simulate data that conforms to the theory of exploratory factor analysis, with a high degree of customization for the variables involved.
EFAsim.data(
nfact,
vpf,
N = 500,
distri = "normal",
fc = "R",
pl = "R",
cl = "R",
low.vpf = 5,
up.vpf = 15,
a = NULL,
b = NULL,
vis = TRUE,
seed = NULL
)
A numeric value specifying the number of factors to simulate.
A numeric or character value specifying the number of items under each factor. If a numeric value is provided, it must be larger than 2, and the number of items under each factor will be fixed to this value. If a character value is provided, it must be one of 'S', 'M', 'L', or 'R'. These represent random selection of the number of items under each factor from \(U(5, 10)\), \(U(5, 15)\), \(U(5, 20)\), or \(U(low.vpf, up.vpf)\), respectively.
A numeric value specifying the number of examinees to simulate.
A character, either 'normal' or 'beta', indicating whether the simulated data will follow a standard multivariate normal distribution or a multivariate beta distribution.
A numeric or character value specifying the degree of correlation between factors. If a numeric value is provided, it must be within the range of 0 to 0.75, and the correlation between all factors will be fixed at this value. If a character value is provided, it must be 'R', and the correlations between factors will be randomly selected from \(U(0.0, 0.5)\).
A numeric or character value specifying the size of the primary factor loadings. If a numeric value is provided, it must be within the range of 0 to 1, and all primary factor loadings in the loading matrix will be fixed at this value. If a character value is provided, it must be one of 'L', 'M', 'H', or 'R', representing \(pl \sim U(0.35, 0.50)\), \(pl \sim U(0.50, 0.65)\), \(pl \sim U(0.65, 0.80)\), or \(pl \sim U(0.35, 0.80)\), respectively, consistent with the settings in Goretzko & Buhner (2020).
A numeric or character value specifying the size of the cross-loadings. If a numeric value is provided, it must be within the range of 0 to 0.5, and all cross-loadings in the loading matrix will be fixed at this value. If a character value is provided, it must be one of 'L', 'H', 'None', or 'R', representing \(cl \sim U(-0.1, 0.1)\), \(cl \sim U(-0.2, -0.1) \cup U(0.1, 0.2)\), \(cl = 0\), or \(cl \sim U(-0.2, 0.2)\), respectively, consistent with the settings in Auerswald & Moshagen (2019).
A numeric value specifying the minimum number of items per factor. It must be larger than 2 and is effective only when vpf is 'R'. (default = 5)
A numeric value specifying the maximum number of items per factor, effective only when vpf is 'R'. (default = 15)
A numeric or NULL specifying the 'a' parameter of the beta distribution, effective only when distri = 'beta'. If a numeric value is provided, it will be used as the 'a' parameter of the beta distribution. If NULL, a random integer between 1 and 10 will be used. (default = NULL)
A numeric or NULL specifying the 'b' parameter of the beta distribution, effective only when distri = 'beta'. If a numeric value is provided, it will be used as the 'b' parameter of the beta distribution. If NULL, a random integer between 1 and 10 will be used. (default = NULL)
A logical value indicating whether to print process information. (default = TRUE)
A numeric or NULL specifying the random seed. If a numeric value is provided, it will be used as the seed. If NULL, the current timestamp will be used. (default = NULL)
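The range codes accepted by pl and cl can be read as uniform draws from the intervals listed above. The following is a minimal sketch of that reading, not the package's internal code: the helpers draw_pl and draw_cl are hypothetical names introduced here, and the equal-probability sign choice for the 'H' cross-loading code is an assumption about how the union of the two intervals is sampled.

```r
# Hypothetical helper (not part of EFAfactors): draw n primary loadings
# for one of the documented pl codes.
draw_pl <- function(n, code = "R") {
  ranges <- list(L = c(0.35, 0.50), M = c(0.50, 0.65),
                 H = c(0.65, 0.80), R = c(0.35, 0.80))
  r <- ranges[[code]]
  if (is.null(r)) stop("pl code must be one of 'L', 'M', 'H', 'R'")
  runif(n, min = r[1], max = r[2])
}

# Hypothetical helper (not part of EFAfactors): draw n cross-loadings
# for one of the documented cl codes.
draw_cl <- function(n, code = "R") {
  switch(code,
    L    = runif(n, -0.1, 0.1),
    # Assumption: each of the two intervals is chosen with probability 1/2.
    H    = sample(c(-1, 1), n, replace = TRUE) * runif(n, 0.1, 0.2),
    None = rep(0, n),
    R    = runif(n, -0.2, 0.2),
    stop("cl code must be one of 'L', 'H', 'None', 'R'"))
}

set.seed(42)
range(draw_pl(10, "M"))       # lies inside [0.50, 0.65]
range(abs(draw_cl(10, "H")))  # lies inside [0.10, 0.20]
```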
An object of class EFAdata, which is a list containing the following components:
A simulated loading matrix.
A list containing all factors and the item indices under each factor.
A simulated factor correlation matrix.
A simulated item correlation matrix.
A simulated response data matrix.
A population correlation matrix was created for each data set based on the following decomposition: $$\mathbf{\Sigma} = \mathbf{\Lambda} \mathbf{\Phi} \mathbf{\Lambda}^T + \mathbf{\Delta}$$ where \(\mathbf{\Lambda}\) is the loading matrix, \(\mathbf{\Phi}\) is the factor correlation matrix, and \(\mathbf{\Delta}\) is a diagonal matrix, with \(\mathbf{\Delta} = \mathbf{I} - \text{diag}(\mathbf{\Lambda} \mathbf{\Phi} \mathbf{\Lambda}^T)\), where \(\mathbf{I}\) is the identity matrix. The purpose of \(\mathbf{\Delta}\) is to ensure that the diagonal elements of \(\mathbf{\Sigma}\) are 1.
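The decomposition can be checked numerically with a small worked example. This is a sketch in base R under illustrative assumptions: the 6-item, 2-factor loading matrix and the factor correlation of 0.3 are made-up values, not output of EFAsim.data.

```r
# Illustrative loading matrix (6 items, 2 factors); values are made up.
Lambda <- matrix(c(0.7, 0.6, 0.5, 0.1, 0.0, -0.1,    # loadings on factor 1
                   0.0, 0.1, -0.1, 0.6, 0.7, 0.5),   # loadings on factor 2
                 nrow = 6, ncol = 2)

# Illustrative factor correlation matrix.
Phi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2, byrow = TRUE)

common <- Lambda %*% Phi %*% t(Lambda)  # Lambda Phi Lambda^T
Delta  <- diag(1 - diag(common))        # diagonal matrix of unique variances
Sigma  <- common + Delta                # population correlation matrix

diag(Sigma)  # every diagonal element equals 1
```

Because Delta only modifies the diagonal, the off-diagonal elements of Sigma are exactly those implied by the loadings and the factor correlations.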
The response data for each subject was simulated using the following formula:
$$X_i = L_i + \epsilon_i, \quad 1 \leq i \leq I$$
where \(L_i\) follows a standard normal distribution (distri = 'normal') or a beta distribution (distri = 'beta') and represents the contribution of the latent factors, and \(\epsilon_i\) is the residual term, which follows a standard normal distribution (distri = 'normal') or a beta distribution (distri = 'beta'). \(L_i\) and \(\epsilon_i\) are uncorrelated, and \(\epsilon_i\) and \(\epsilon_j\) (\(i \neq j\)) are also uncorrelated.
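For the normal case, the split into a latent part and an uncorrelated residual can be sketched with a standard common-factor construction in base R. This is an assumption about one way to realize the formula, not the package's internal implementation, and it does not cover distri = 'beta'. The loading matrix, factor correlation, and variable names (scores, L, E) are illustrative.

```r
set.seed(1)
N <- 500

# Illustrative population parameters (made-up values).
Lambda <- matrix(c(0.7, 0.6, 0.5, 0.1, 0.0, -0.1,
                   0.0, 0.1, -0.1, 0.6, 0.7, 0.5),
                 nrow = 6, ncol = 2)
Phi <- matrix(c(1.0, 0.3,
                0.3, 1.0), nrow = 2, byrow = TRUE)
common <- Lambda %*% Phi %*% t(Lambda)
uniq   <- 1 - diag(common)              # unique variances
Sigma  <- common + diag(uniq)           # population correlation matrix

# Factor scores with correlation Phi: chol(Phi) is upper triangular R
# with t(R) %*% R = Phi, so Z %*% R has covariance Phi.
scores <- matrix(rnorm(N * 2), N, 2) %*% chol(Phi)

L <- scores %*% t(Lambda)                             # latent contribution
E <- matrix(rnorm(N * 6), N, 6) %*% diag(sqrt(uniq))  # uncorrelated residuals
X <- L + E                                            # simulated responses

# Largest absolute gap between sample and population correlations;
# it shrinks as N grows.
max(abs(cor(X) - Sigma))
```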
Goretzko, D., & Buhner, M. (2020). One model to rule them all? Using machine learning algorithms to determine the number of factors in exploratory factor analysis. Psychological Methods, 25(6), 776-786. https://doi.org/10.1037/met0000262
Auerswald, M., & Moshagen, M. (2019). How to determine the number of factors to retain in exploratory factor analysis: A comparison of extraction methods under realistic conditions. Psychological Methods, 24(4), 468-491. https://doi.org/10.1037/met0000200
library(EFAfactors)
## Run EFAsim.data function with default parameters.
data.obj <- EFAsim.data(nfact = 3, vpf = 5, N = 500, distri = "normal",
                        fc = "R", pl = "R", cl = "R",
                        low.vpf = 5, up.vpf = 15, a = NULL, b = NULL,
                        vis = TRUE, seed = NULL)
#> $factor1
#> [1] 1 3 11 12 15
#>
#> $factor2
#> [1] 2 6 7 8 13
#>
#> $factor3
#> [1] 4 5 9 10 14
#>
head(data.obj$loadings)
#> factor1 factor2 factor3
#> item 1 0.50667021 -0.104841927 0.18592048
#> item 2 0.09759483 0.709403526 0.15411184
#> item 3 0.67106578 0.008141584 -0.09725999
#> item 4 -0.12067013 -0.029733123 0.48969366
#> item 5 -0.11266620 -0.121623561 0.69300440
#> item 6 0.13821580 0.607314936 -0.19221646