Statistical Report on Numerus Strictus Logicae

Estimated norm:

Raw score   IQ (SD = 15)   95% CI
9           124
12          132
15          143
18          151
21          158
24          166
27          173
30          181
33          193
36          ≥205
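
The norm table lists an IQ only for every third raw score. The report does not say how scores between these points are normed. The sketch below assumes straight-line interpolation between the tabulated anchors. The function name `estimate_iq` is hypothetical, and the top anchor treats "≥205" as 205.

```python
from bisect import bisect_left

# (raw score, IQ) anchors from the norm table; raw 36 is listed as ">=205"
# and is treated here as 205 (assumption).
NORM = [(9, 124), (12, 132), (15, 143), (18, 151), (21, 158),
        (24, 166), (27, 173), (30, 181), (33, 193), (36, 205)]

def estimate_iq(raw):
    """Hypothetical linear interpolation between tabulated norm anchors."""
    xs = [r for r, _ in NORM]
    if raw < xs[0] or raw > xs[-1]:
        # No extrapolation outside the published table.
        raise ValueError("raw score outside the tabulated norm range")
    i = bisect_left(xs, raw)
    if xs[i] == raw:
        return float(NORM[i][1])
    (x0, y0), (x1, y1) = NORM[i - 1], NORM[i]
    return y0 + (y1 - y0) * (raw - x0) / (x1 - x0)

print(estimate_iq(16))
```

For example, a raw score of 16 lies one third of the way from 15 (IQ 143) to 18 (IQ 151), which gives about 145.7. Raw scores below 9 fall outside the table and raise an error. This includes the two lowest observed scores, 4 and 7.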

Disclaimer:

These norms are valid only under conditions of fully independent completion. Any external aid invalidates the resulting estimate.

Histogram of scores

       x            
 x  x  x  x  x  x  x
04 07 15 17 19 20 23

Histogram of ages

 x
 x  x  x  x  x  x  x
25 28 29 38 44 54 55

Sex distribution

Male Female
6 2

Country distribution

CN IT JP RO US
1 1 1 1 4

Date when taken

Year  JAN  FEB  JUN  JUL  AUG  SEP  NOV
2024    0    0    0    1    0    0    1
2025    0    0    1    0    1    1    0
2026    1    2    0    0    0    0    0

Sample information

[↓1] N = 8

Lowest raw score = 4

Highest raw score = 23

[↓2] Range = 20

[↓3] Resolution = 5.38
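
The individual raw scores can be read off the score histogram: 4, 7, 15 (twice), 17, 19, 20 and 23. The sketch below recomputes the sample information from that list. The plain difference between the highest and lowest score is 23 − 4 = 19. The reported range of 20 equals highest − lowest + 1. This suggests the report counts both end points, although that convention is inferred here and not stated in the source, so both versions are computed.

```python
# Raw scores read off the score histogram above (15 occurs twice).
SCORES = [4, 7, 15, 15, 17, 19, 20, 23]

def sample_info(scores):
    """Return N, lowest, highest, and the range under two conventions."""
    lo, hi = min(scores), max(scores)
    return {
        "n": len(scores),
        "lowest": lo,
        "highest": hi,
        "range_exclusive": hi - lo,      # plain difference
        "range_inclusive": hi - lo + 1,  # counts both end points
    }

print(sample_info(SCORES))
```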


Central tendency and dispersion

[↓4] Mean = 15.0 (95% CI: 9.58–20.42)

[↓5] Median = 16

[↓6] Mode = 15

[↓7] Raw score standard deviation = 6.06

[↓8] Quartile deviation = 3.13

[↓9] Standard error of measurement = 1.66 raw score points
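
The report does not state its formulas. The following sketch uses only Python's `statistics` module and reproduces the printed values under these assumptions:

- The standard deviation uses N in the denominator (population form).
- The 95% confidence interval for the mean uses the N − 1 sample standard deviation with a Student's t critical value for 7 degrees of freedom. This value is hard-coded, because the standard library has no t quantile function.
- Quartiles use linear interpolation (`method="inclusive"`).
- The standard error of measurement is SD × √(1 − reliability), with Cronbach's alpha (0.9254) taken as the reliability.

```python
import math
import statistics

# Raw scores read off the score histogram.
SCORES = [4, 7, 15, 15, 17, 19, 20, 23]

# Upper 2.5% point of Student's t with 7 degrees of freedom (n - 1 for n = 8),
# hard-coded because the standard library has no t quantile function.
T_975_DF7 = 2.364624

def describe(scores, reliability, t_crit=T_975_DF7):
    n = len(scores)
    mean = statistics.fmean(scores)
    # CI half-width: t * (sample SD, n - 1 denominator) / sqrt(n).
    half_width = t_crit * statistics.stdev(scores) / math.sqrt(n)
    # Quartiles by linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    sd = statistics.pstdev(scores)  # population SD, n denominator
    return {
        "mean": mean,
        "ci95": (mean - half_width, mean + half_width),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "sd": sd,
        "quartile_deviation": (q3 - q1) / 2,
        "sem": sd * math.sqrt(1 - reliability),
    }

for key, value in describe(SCORES, reliability=0.9254).items():
    print(key, value)
```

All of these agree with the report after rounding. Using the N − 1 form for the standard deviation would instead give about 6.48.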


Distribution shape

[↓10] Skewness = -0.79

[↓11] Excess kurtosis = -0.29
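
Both shape statistics match the bias-adjusted sample estimators G1 and G2. These are the values that `scipy.stats.skew` and `scipy.stats.kurtosis` return with `bias=False`. The sketch below is a dependency-free version that again uses the scores read off the histogram.

```python
import math

SCORES = [4, 7, 15, 15, 17, 19, 20, 23]

def central_moment(xs, k):
    """k-th central moment with n in the denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Adjusted Fisher-Pearson sample skewness G1."""
    n = len(xs)
    g1 = central_moment(xs, 3) / central_moment(xs, 2) ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)

def excess_kurtosis(xs):
    """Bias-adjusted sample excess kurtosis G2."""
    n = len(xs)
    g2 = central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

print(round(skewness(SCORES), 2), round(excess_kurtosis(SCORES), 2))
```

The negative skewness reflects the two low scores, 4 and 7, which lie well below the cluster between 15 and 23.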


Test difficulty

Items solved by all = 4

Items solved by none = 12

[↓12] Mean item facility (p-value) = 0.4166

[↓13] Sample-dependent hardness = 0.56

[↓14] Sample-independent hardness = 0.59

[↓15] Overall test complexity = 0.76
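
Item facility is the column mean of a candidates × items matrix of 0/1 item scores. The mean item facility is therefore the mean raw score divided by the number of items. The item-level data are not published, so the sketch below uses a small hypothetical matrix `toy`.

The sketch also assumes the test has 36 items. The item counts in the discrimination table sum to 36, and the norm table ends at a raw score of 36. Under that assumption, the reported mean facility follows from 15.0 / 36 = 0.41666…

```python
from statistics import fmean

def item_facilities(responses):
    """Proportion of candidates who answered each item correctly.

    `responses` holds one list of 0/1 item scores per candidate.
    """
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]

def difficulty_summary(responses):
    p = item_facilities(responses)
    return {
        "solved_by_all": sum(1 for x in p if x == 1.0),
        "solved_by_none": sum(1 for x in p if x == 0.0),
        "mean_facility": fmean(p),
    }

# Hypothetical 3 candidates x 4 items (not the real item data).
toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]
print(difficulty_summary(toy))

# The same mean facility via mean raw score / number of items.
print(fmean(sum(row) for row in toy) / len(toy[0]))

# Applied to the report: mean raw score 15.0, assuming 36 items.
print(15.0 / 36)
```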


Reliability

[↓16] Cronbach's alpha = 0.9254

[↓17] Split-half reliability index = 0.9063

[↓18] Spearman–Brown corrected = 0.9509

[↓19] McDonald's omega = 0.9403

[↓20] Test dimensionality = unidimensional
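
The alpha and Spearman–Brown formulas are standard and can be sketched in a few lines. Item-level data are unavailable, so `cronbach_alpha` is shown on a hypothetical four-candidate, three-item matrix. `spearman_brown` is applied to the reported split-half index.

Applied to the rounded 0.9063, `spearman_brown` gives about 0.9508, which is within rounding of the reported 0.9509. The report presumably started from the unrounded coefficient. McDonald's omega requires fitting a factor model and is not sketched here.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """alpha = k / (k - 1) * (1 - sum of item variances / total-score variance).

    `responses` holds one list of item scores per candidate. Population
    variances are used throughout; the n vs n - 1 choice cancels in the ratio.
    """
    k = len(responses[0])
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def spearman_brown(r_half):
    """Step a correlation between two half-tests up to full test length."""
    return 2 * r_half / (1 + r_half)

# Hypothetical 4 candidates x 3 items (not the real item data).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(toy))
print(spearman_brown(0.9063))
```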

Mean discrimination = 0.6703

Discrimination range = 0.1972 – 0.8904

Designation         Discrimination index   Number of items
Excellent           ≥0.7                   8
Very Good           0.6–0.6999             5
Good                0.5–0.5999             2
Acceptable          0.4–0.4999             4
Borderline          0.3–0.3999             0
Poor                0.2–0.2999             0
Nonfunctional       ≤0.1999                1
Insufficient data   NaN                    16
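
The designation bands in the table can be encoded directly. The sketch below assumes each lower bound is inclusive and treats a missing (NaN) index as "Insufficient data".

Note that 4 + 12 = 16: the number of items with insufficient data equals the items solved by all plus those solved by none. That is what one would expect if the index is a correlation-type statistic, which is undefined for an item on which every candidate scored the same. The computation method is not described here, so this is an assumption.

```python
import math

# Lower bounds (inclusive) copied from the designation table.
BANDS = [
    (0.7, "Excellent"),
    (0.6, "Very Good"),
    (0.5, "Good"),
    (0.4, "Acceptable"),
    (0.3, "Borderline"),
    (0.2, "Poor"),
]

def designation(d):
    """Map a discrimination index to its designation label."""
    if d is None or math.isnan(d):
        return "Insufficient data"
    for lower, label in BANDS:
        if d >= lower:
            return label
    return "Nonfunctional"  # anything below 0.2

print([designation(x) for x in (0.8904, 0.65, 0.1972, float("nan"))])
```

The reported extremes, 0.8904 and 0.1972, map to Excellent and Nonfunctional. This is consistent with the single nonfunctional item in the table.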

Explanations of how the discrimination index is computed are provided on a separate page.

[↑1] N: the total number of candidates who took the test.

[↑2] Range: the difference between the highest and lowest score in the dataset.

[↑3] Resolution: the number of consecutive possible raw scores contained within a unit of spread.

[↑4] Mean: the arithmetic average of all scores in the dataset.

[↑5] Median: the middle score when all scores are ordered from lowest to highest, or the average of the two middle scores when N is even.

[↑6] Mode: the value that occurs most frequently in the dataset.

[↑7] Standard deviation: the square root of the variance, indicating the typical distance of scores from the mean (the reported value uses N, not N − 1, in the denominator); most informative for normal distributions.

[↑8] Quartile deviation: half the interquartile range, (Q3 − Q1) / 2; a spread measure useful for skewed or non-normal score distributions.

[↑9] Standard error of measurement: the expected variability of a test score due to measurement error, commonly estimated as SD × √(1 − reliability).

[↑10] Skewness: a measure of how asymmetric a score distribution is around its mean; a negative value indicates a longer tail toward low scores.

[↑11] Excess kurtosis: a measure of how much the heaviness of a distribution's tails differs from that of the normal distribution, whose excess kurtosis is 0.

[↑12] Item facility: the proportion of candidates who answer an item correctly; the mean item facility averages this proportion over all items.

[↑13] Sample-dependent hardness: the average proportion of the possible raw score range that candidates fail to achieve.

[↑14] Sample-independent hardness: a difficulty index anchored to a fixed normative ability level instead of the sample mean.

[↑15] Overall test complexity: a sample-independent index of global difficulty across the test's entire effective measurement range.

[↑16] Cronbach's alpha: an internal-consistency coefficient based on how strongly the items covary relative to the total-score variance; it is often read as the extent to which the items measure the same underlying construct.

[↑17] Split-half reliability: estimates consistency by correlating scores on two halves of the test, often averaged over multiple random splits.

[↑18] Spearman–Brown correction: scales split-half reliability up to the length of the entire test, as 2r / (1 + r), where r is the split-half correlation.

[↑19] McDonald's omega: a reliability coefficient estimating internal consistency from a latent factor model.

[↑20] Test dimensionality: the number of distinct latent constructs or abilities measured by the test.

Credits: Paul Cooijmans, Marc-André Nydegger

