Randomly generates a response data matrix according to specified conditions, including the attribute distribution, item quality, sample size, Q-matrix, and cognitive diagnosis model (CDM).

sim.data(
  Q = NULL,
  N = NULL,
  IQ = list(P0 = NULL, P1 = NULL),
  model = "GDINA",
  distribute = "uniform",
  control = NULL,
  verbose = TRUE
)

Arguments

Q

The Q-matrix. If NULL, a random 30 × 5 Q-matrix generated by sim.Q is used.

N

Sample size. Default = 500 (used when N = NULL).

IQ

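A list containing two vectors of length I, P0 and P1, giving for each item the probability of a correct response for examinees who master none (P0) and all (P1) of the attributes the item requires.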
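Reading P0 as the success probability for examinees who master none of an item's required attributes and P1 as that for examinees who master all of them (an assumption following common CDM usage), the two vectors map onto the guessing and slip parameters familiar from the DINA model, and P1 - P0 is the usual item quality (discrimination) index. The helper `iq_summary` below is a hypothetical sketch for inspecting an IQ list before simulation; it is not part of Qval.

```r
# Hypothetical helper (not part of Qval): summarise item quality from P0/P1,
# assuming P0 = P(correct | no required attribute mastered) and
# P1 = P(correct | all required attributes mastered).
iq_summary <- function(IQ) {
  stopifnot(length(IQ$P0) == length(IQ$P1), all(IQ$P1 > IQ$P0))
  data.frame(
    guess = IQ$P0,          # guessing parameter g
    slip  = 1 - IQ$P1,      # slip parameter s
    IQ    = IQ$P1 - IQ$P0   # item quality (discrimination) index
  )
}

set.seed(123)
I <- 10
IQ <- list(P0 = runif(I, 0.0, 0.2), P1 = runif(I, 0.8, 1.0))
summ <- iq_summary(IQ)
round(colMeans(summ), 3)
```

With P0 drawn from [0, 0.2] and P1 from [0.8, 1], as in the examples, every item quality value falls between 0.6 and 1.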

model

Type of model used to generate the data; can be "GDINA", "LCDM", "DINA", "DINO", "ACDM", "LLM", or "rRUM".
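The listed models differ in how an item's success probability rises from P0 to P1 as more of its required attributes are mastered. As an illustration only, the sketch below computes that probability for one item under DINA (all required attributes needed), DINO (any one suffices) and an additive ACDM on the identity link. The equal split of P1 - P0 across required attributes in the ACDM branch and the function name `item_prob` are assumptions for illustration, not how sim.data chooses its parameters.

```r
# Illustrative sketch (not Qval's implementation): success probability of one
# item for a given attribute pattern under three reduced CDMs.
# q: 0/1 q-vector of the item; alpha: 0/1 attribute pattern; P0, P1: item quality.
item_prob <- function(q, alpha, P0, P1, model = c("DINA", "DINO", "ACDM")) {
  model <- match.arg(model)
  req <- which(q == 1)          # attributes required by the item
  mastered <- alpha[req] == 1   # which required attributes are mastered
  switch(model,
    DINA = if (all(mastered)) P1 else P0,   # conjunctive
    DINO = if (any(mastered)) P1 else P0,   # disjunctive
    # additive on the identity link; assumption: equal main effects
    ACDM = P0 + (P1 - P0) * sum(mastered) / length(req)
  )
}

q <- c(1, 1, 0)
alpha <- c(1, 0, 1)   # masters attribute 1 but not attribute 2
sapply(c("DINA", "DINO", "ACDM"), function(m) item_prob(q, alpha, 0.1, 0.9, m))
```

For this partial-mastery pattern the three models return 0.1, 0.9 and 0.5 respectively, which shows why the choice of model changes the simulated responses even when P0 and P1 are fixed.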

distribute

Attribute distribution; can be "uniform" for the uniform distribution, "mvnorm" for the multivariate normal distribution (Chiu, Douglas, & Li, 2009), or "horder" for the higher-order distribution (Tu et al., 2022).
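Under the uniform option, every one of the 2^K attribute patterns is equally likely. A minimal base-R sketch of that idea follows; `sim_attr_uniform` is a hypothetical name, and sim.data's internal method may differ. It enumerates all patterns and samples rows with equal probability.

```r
# Hypothetical sketch: draw N attribute patterns uniformly from all 2^K patterns.
sim_attr_uniform <- function(N, K) {
  # all 2^K binary patterns, one per row
  patterns <- as.matrix(expand.grid(rep(list(0:1), K)))
  colnames(patterns) <- paste0("A", seq_len(K))
  idx <- sample.int(nrow(patterns), N, replace = TRUE)  # equal probabilities
  patterns[idx, , drop = FALSE]
}

set.seed(1)
att <- sim_attr_uniform(N = 1000, K = 3)
colMeans(att)   # each attribute is mastered by roughly half of the examinees
```

Because all patterns are equally likely, each attribute is mastered with marginal probability 0.5 and the attributes are independent.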

control

A list of control parameters with elements:

  • sigma A positive-definite symmetric matrix specifying the variance-covariance matrix when distribute = "mvnorm". Default = 0.5 (Chiu, Douglas, & Li, 2009); as in Example 2, a single number such as this default may also be supplied.

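  • cutoffs A vector giving the cutoff for each attribute when distribute = "mvnorm". Default = \(\Phi^{-1}(k/(K+1))\) for attribute \(k = 1, \dots, K\), i.e. the \(k/(K+1)\) quantile of the standard normal distribution (Chiu, Douglas, & Li, 2009).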

  • theta A vector of length N giving the higher-order ability of each examinee when distribute = "horder". By default, it is generated randomly from a normal distribution (Tu et al., 2022).

  • a The slopes of the higher-order model when distribute = "horder". Default = 1.5 (Tu et al., 2022).

  • b The intercepts of the higher-order model when distribute = "horder". By default, K equally spaced values between -1.5 and 1.5 are used, where K is the number of attributes (Tu et al., 2022).
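The two structured distributions controlled by these parameters can be sketched in base R as follows. Both functions are hypothetical illustrations, not Qval's internals. For "mvnorm", continuous abilities are drawn from a multivariate normal and dichotomised at the cutoffs; the expansion of a scalar sigma into unit variances with a common off-diagonal covariance is an assumption. For "horder", each attribute is mastered with a logistic probability in a higher-order ability; the form plogis(a * (theta - b_k)) is an assumed parameterisation, and some formulations use a * theta - b_k instead.

```r
# Hypothetical sketches (not Qval's internals) of the two structured
# attribute distributions.

# mvnorm (after Chiu, Douglas, & Li, 2009): abilities ~ N(0, Sigma),
# dichotomised at attribute-specific cutoffs.
# Assumption: a scalar sigma means unit variances and common covariance sigma.
sim_attr_mvnorm <- function(N, K, sigma = 0.5,
                            cutoffs = qnorm(seq_len(K) / (K + 1))) {
  Sigma <- matrix(sigma, K, K)
  diag(Sigma) <- 1
  Z <- matrix(rnorm(N * K), N, K)
  theta <- Z %*% chol(Sigma)   # rows ~ N(0, Sigma) since chol gives R with R'R = Sigma
  # alpha_k = 1 when theta_k reaches the k-th cutoff
  (theta >= matrix(cutoffs, N, K, byrow = TRUE)) * 1L
}

# horder: mastery probability is logistic in a higher-order ability theta.
# Assumption: P(alpha_k = 1 | theta) = plogis(a * (theta - b_k)).
sim_attr_horder <- function(N, K, theta = rnorm(N), a = 1.5,
                            b = seq(-1.5, 1.5, length.out = K)) {
  P <- plogis(a * outer(theta, b, "-"))   # N x K mastery probabilities
  matrix(rbinom(N * K, 1, P), N, K)
}

set.seed(123)
att_mv <- sim_attr_mvnorm(N = 2000, K = 5)
att_ho <- sim_attr_horder(N = 2000, K = 5)
round(colMeans(att_mv), 2)   # expected to decrease as the cutoff rises
round(colMeans(att_ho), 2)   # expected to decrease as b_k rises
```

With the default cutoffs, the expected mastery rate of attribute k under the mvnorm sketch is 1 - k/(K+1), so a Q-matrix simulated this way contains both common and rare attributes; shuffling the cutoffs, as in Example 2, only changes which attribute is which.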

verbose

Logical; whether to print information about the simulation settings. Default = TRUE.

Value

An object of class simGDINA, as returned by the simGDINA function from the GDINA package. Elements that can be extracted using the extract method include:

dat

An N × I simulated item response matrix.

Q

The Q-matrix.

attribute

An N × K matrix of individuals' attribute patterns.

catprob.parm

A list of non-zero category success probabilities for each latent group.

delta.parm

A list of delta parameters.

higher.order.parm

Higher-order parameters.

mvnorm.parm

Multivariate normal distribution parameters.

LCprob.parm

A matrix of item/category success probabilities for each latent class.
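The elements attribute, LCprob.parm and dat are related: each entry of dat can be understood as one Bernoulli draw whose success probability is the item's probability for the examinee's latent class. The sketch below illustrates that relation under its own assumptions. The helper `draw_responses` is hypothetical, and the ordering of latent classes (rows of `patterns`, from expand.grid) is chosen here and need not match the column order used by the GDINA package.

```r
# Hypothetical sketch: turn attribute patterns plus an I x L matrix of
# latent-class success probabilities into an N x I 0/1 response matrix.
draw_responses <- function(attribute, patterns, LCprob) {
  # latent class of each examinee = index of the matching row of `patterns`
  key <- function(m) apply(m, 1, paste, collapse = "")
  cls <- match(key(attribute), key(patterns))
  stopifnot(!anyNA(cls))
  P <- t(LCprob[, cls, drop = FALSE])   # N x I success probabilities
  matrix(rbinom(length(P), 1, P), nrow(P), ncol(P))
}

K <- 2
patterns <- as.matrix(expand.grid(rep(list(0:1), K)))   # 4 latent classes
# two DINA-type items: item 1 requires A1, item 2 requires A1 and A2
LCprob <- rbind(ifelse(patterns[, 1] == 1, 0.9, 0.1),
                ifelse(patterns[, 1] == 1 & patterns[, 2] == 1, 0.9, 0.1))
set.seed(42)
attribute <- patterns[sample.int(4, 500, replace = TRUE), ]
dat <- draw_responses(attribute, patterns, LCprob)
dim(dat)   # 500 x 2
```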

References

Chiu, C.-Y., Douglas, J. A., & Li, X. (2009). Cluster Analysis for Cognitive Diagnosis: Theory and Applications. Psychometrika, 74(4), 633-665. DOI: 10.1007/s11336-009-9125-0.

Tu, D., Chiu, J., Ma, W., Wang, D., Cai, Y., & Ouyang, X. (2022). A multiple logistic regression-based (MLR-B) Q-matrix validation method for cognitive diagnosis models: A confirmatory approach. Behavior Research Methods. DOI: 10.3758/s13428-022-01880-x.

Author

Haijiang Qin <Haijiang133@outlook.com>

Examples


################################################################
#                           Example 1                          #
#       generate data following the uniform distribution       #
################################################################
library(Qval)

set.seed(123)

K <- 5
I <- 10
Q <- sim.Q(K, I)

IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)

data <- sim.data(Q = Q, N = 10, IQ=IQ, model = "GDINA", distribute = "uniform")
#> distribute =  uniform 
#>  model =  GDINA 
#>  number of attributes:  5 
#>  number of items:  10 
#>  num of examinees:  10 
#>  average of P0 =  0.116 
#>  average of P1 =  0.926 

print(data$dat)
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]    1    0    0    1    0    1    0    0    0     1
#>  [2,]    1    0    1    1    1    1    1    1    0     0
#>  [3,]    1    1    1    1    0    0    0    1    0     0
#>  [4,]    0    0    1    0    0    1    1    1    1     0
#>  [5,]    1    1    0    1    0    0    1    0    1     0
#>  [6,]    0    0    1    0    0    1    1    1    0     1
#>  [7,]    1    1    0    1    0    0    0    0    1     0
#>  [8,]    1    1    1    1    1    1    1    1    1     1
#>  [9,]    0    1    0    0    1    0    1    0    1     1
#> [10,]    0    1    1    0    0    0    0    1    1     1

################################################################
#                           Example 2                          #
#       generate data following the mvnorm distribution        #
################################################################
set.seed(123)
K <- 5
I <- 10
Q <- sim.Q(K, I)

IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)

# cutoffs: standard-normal quantiles at k/(K+1), shuffled across attributes
example_cutoffs <- sample(qnorm(c(1:K)/(K+1)), ncol(Q))
data <- sim.data(Q = Q, N = 10, IQ=IQ, model = "GDINA", distribute = "mvnorm",
                 control = list(sigma = 0.5, cutoffs = example_cutoffs))
#> distribute =  mvnorm 
#>  model =  GDINA 
#>  number of attributes:  5 
#>  number of items:  10 
#>  num of examinees:  10 
#>  average of P0 =  0.116 
#>  average of P1 =  0.926 
#> sigma = 0.5 
#>  cutoffs = -0.967 0 0.431 0.967 -0.431 

print(data$dat)
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]    1    1    1    1    1    0    1    1    1     0
#>  [2,]    0    1    0    0    1    0    1    0    0     0
#>  [3,]    1    1    1    1    1    0    1    1    1     1
#>  [4,]    1    1    1    1    1    0    1    1    1     1
#>  [5,]    1    1    1    1    1    0    1    1    1     0
#>  [6,]    1    1    1    1    1    0    1    1    1     0
#>  [7,]    0    0    0    0    0    0    0    0    0     1
#>  [8,]    0    1    0    0    1    0    1    0    1     0
#>  [9,]    1    1    1    1    0    1    1    1    1     1
#> [10,]    0    0    1    0    0    0    1    1    0     0

#################################################################
#                            Example 3                          #
#        generate data following the horder distribution        #
#################################################################
set.seed(123)
K <- 5
I <- 10
Q <- sim.Q(K, I)

IQ <- list(
  P0 = runif(I, 0.0, 0.2),
  P1 = runif(I, 0.8, 1.0)
)

example_theta <- rnorm(10, 0, 1)           # one higher-order ability per examinee (N = 10)
example_b <- seq(-1.5,1.5,length.out=K)    # K equally spaced intercepts
data <- sim.data(Q = Q, N = 10, IQ=IQ, model = "GDINA", distribute = "horder",
                 control = list(theta = example_theta, a = 1.5, b = example_b))
#> distribute =  horder 
#>  model =  GDINA 
#>  number of attributes:  5 
#>  number of items:  10 
#>  num of examinees:  10 
#>  average of P0 =  0.116 
#>  average of P1 =  0.926 
#> theta_mean =  -0.625 , theta_sd = 0.906 
#>  a =  1.5 
#>  b =  -1.5 -0.75 0 0.75 1.5 

print(data$dat)
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]    0    0    0    0    0    1    0    0    0     0
#>  [2,]    1    1    0    1    1    1    1    1    1     1
#>  [3,]    0    0    0    0    0    0    0    0    0     0
#>  [4,]    0    0    0    0    0    0    0    0    1     0
#>  [5,]    1    1    0    0    0    0    0    0    1     1
#>  [6,]    1    0    0    1    0    1    0    0    0     1
#>  [7,]    0    1    0    1    0    1    0    0    1     1
#>  [8,]    0    1    0    0    0    0    0    0    1     0
#>  [9,]    0    0    0    0    0    0    0    0    0     0
#> [10,]    0    0    1    0    1    1    1    1    0     1