Statistics for the Numerus Strictus Logicae

Normalisation

Estimated norm:

Raw score	IQ (SD = 15)	95% CI
9	124
12	132
15	143
18	151
21	158
24	166
27	173
30	181
33	≥193
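
To read the IQ column as a rarity, a value can be converted to a z-score and a percentile. The sketch below assumes the conventional IQ mean of 100 (the table states only SD = 15) and a normal distribution of IQ; the function names `iq_to_z` and `iq_to_percentile` are illustrative and not part of the report.

```python
from statistics import NormalDist

IQ_MEAN = 100.0  # assumption: conventional IQ mean; the table states only the SD
IQ_SD = 15.0     # from the table header

def iq_to_z(iq: float) -> float:
    """Distance of an IQ value from the assumed mean, in SD units."""
    return (iq - IQ_MEAN) / IQ_SD

def iq_to_percentile(iq: float) -> float:
    """Percentage of a normal population scoring at or below this IQ."""
    return NormalDist(IQ_MEAN, IQ_SD).cdf(iq) * 100.0

# A few rows taken from the norm table above.
for raw, iq in [(9, 124), (15, 143), (21, 158)]:
    print(f"raw {raw}: IQ {iq}, z = {iq_to_z(iq):.2f}, "
          f"percentile = {iq_to_percentile(iq):.3f}")
```

At the upper end of the table the normal model is extrapolated far into the tail, so percentiles there should be read as rough indications only.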

Disclaimer:

These norms are valid only under conditions of fully independent completion. Any external aid invalidates the resulting estimate.

Participant data

Histogram of scores

       x
 x  x  x  x  x  x  x

04 07 15 17 19 20 23

Histogram of ages

 x  x  x  x  x  x  x

25 28 29 38 44 54 55

Sex distribution

Male	Female
6	2

Country distribution

CN	IT	JP	RO	US
1	1	1	1	4

Date when taken

	JAN	FEB	JUN	JUL	AUG	SEP	NOV
2024	0	0	0	1	0	0	1
2025	0	0	1	0	1	1	0
2026	1	2	0	0	0	0	0

Statistical summary

Sample information

^[↓1] N = 8

Lowest raw score = 4

Highest raw score = 23

^[↓2] Range = 20

^[↓3] Resolution = 5.38

Central tendency and dispersion

^[↓4] Mean = 15.0 (95% CI: 9.58–20.42)

^[↓5] Median = 16

^[↓6] Mode = 15

^[↓7] Raw score standard deviation = 6.06

^[↓8] Quartile deviation = 3.13

^[↓9] Standard error of measurement = 1.66 raw score points
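
The values in this subsection can be reproduced from the raw scores. The sketch below assumes the score list is the seven distinct histogram values plus a second 15, which is what N = 8 and a mode of 15 imply. The specific conventions (population SD for the reported SD, sample SD with a t critical value for the confidence interval, linearly interpolated quartiles, Cronbach's alpha as the reliability in the SEM) are inferred because they match the reported figures; they are not confirmed as the authors' method.

```python
import math
import statistics

# Assumption: seven distinct histogram values plus a second 15 (N = 8, mode = 15).
scores = [4, 7, 15, 15, 17, 19, 20, 23]
n = len(scores)

mean = statistics.fmean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

sd_population = statistics.pstdev(scores)  # divides by n
sd_sample = statistics.stdev(scores)       # divides by n - 1

# 95% CI of the mean from the t distribution; 2.3646 is the tabulated
# two-sided critical value for n - 1 = 7 degrees of freedom.
T_CRIT_DF7 = 2.3646
half_width = T_CRIT_DF7 * sd_sample / math.sqrt(n)
ci_low, ci_high = mean - half_width, mean + half_width

# Quartile deviation: half the interquartile range (linear interpolation).
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

# SEM = SD * sqrt(1 - reliability), using the reported Cronbach's alpha.
ALPHA = 0.9254
sem = sd_population * math.sqrt(1 - ALPHA)

print(f"mean={mean:.1f} CI=({ci_low:.2f}, {ci_high:.2f}) median={median} mode={mode}")
print(f"SD={sd_population:.2f} QD={quartile_deviation:.3f} SEM={sem:.2f}")
```

With only eight scores the interval around the mean is wide, which is why the t distribution rather than the normal approximation is the natural choice here.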

Distribution shape

^[↓10] Skewness = -0.79

^[↓11] Excess kurtosis = -0.29
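
Both shape statistics match the small-sample (bias-adjusted) estimators rather than the plain moment ratios. The sketch below assumes the same score list as the histogram with 15 occurring twice; the choice of adjusted estimators is inferred from the reported values and is not stated in the report.

```python
import math

# Assumption: seven distinct histogram values plus a second 15 (N = 8, mode = 15).
scores = [4, 7, 15, 15, 17, 19, 20, 23]
n = len(scores)
mean = sum(scores) / n

def central_moment(k: int) -> float:
    """k-th central moment with divisor n."""
    return sum((x - mean) ** k for x in scores) / n

m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)

# Plain moment-based coefficients.
g1 = m3 / m2 ** 1.5
g2 = m4 / m2 ** 2 - 3

# Bias-adjusted sample versions.
skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
excess_kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(f"skewness={skewness:.2f} excess kurtosis={excess_kurtosis:.2f}")
```

The negative skewness reflects the two low scores (4 and 7) lying further from the mean than any of the high scores.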

Test difficulty

Items solved by all = 4

Items solved by none = 12

^[↓12] Mean item facility (p-value) = 0.4166

^[↓13] Sample-dependent hardness = 0.56

^[↓14] Sample-independent hardness = 0.59

^[↓15] Overall test complexity = 0.76
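
The mean item facility can be cross-checked against other figures in this report. The sketch below assumes every item is scored 0 or 1, so that mean facility equals the mean raw score divided by the number of items; the item count is taken as the total of the discrimination table in the item analysis section. Under these assumptions the result is 0.4167, which agrees with the reported 0.4166 apart from rounding in the last digit. The hardness and complexity indices are not reproduced here because the report does not give their formulas.

```python
# Item counts per designation, copied from the discrimination table.
items_per_band = {
    "Excellent": 8, "Very Good": 5, "Good": 2, "Acceptable": 4,
    "Borderline": 0, "Poor": 0, "Nonfunctional": 1, "Insufficient data": 16,
}
n_items = sum(items_per_band.values())

mean_raw_score = 15.0  # reported mean

# Assumption: one point per item, so facility = mean raw score / item count.
mean_item_facility = mean_raw_score / n_items

# Items solved by everyone or by no one have no score variance.
solved_by_all, solved_by_none = 4, 12
zero_variance_items = solved_by_all + solved_by_none

print(n_items, round(mean_item_facility, 4), zero_variance_items)
```

The 16 zero-variance items equal the "Insufficient data" count in the item analysis, which would be expected if a discrimination index cannot be computed for an item that all or no candidates solved.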

Reliability

^[↓16] Cronbach's alpha = 0.9254

^[↓17] Split-half reliability index = 0.9063

^[↓18] Spearman–Brown corrected = 0.9509

^[↓19] McDonald's omega = 0.9403

^[↓20] Test dimensionality = unidimensional
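
The Spearman–Brown step from the split-half index to the corrected value, and the standard formula for Cronbach's alpha, can be sketched as follows. The function names are illustrative; the alpha function expects an examinee-by-item score matrix, which the report does not publish, so only the Spearman–Brown step can be checked against the figures above. Applied to the rounded input 0.9063 it agrees with the reported 0.9509 to within rounding of the input.

```python
from statistics import variance

def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Step a part-test correlation up to a test `length_factor` times as long."""
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)

def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha; rows are examinees, columns are items."""
    k = len(item_scores[0])
    item_vars = [variance(col) for col in zip(*item_scores)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Reported split-half reliability index, stepped up to full test length.
print(f"{spearman_brown(0.9063):.4f}")
```

With eight candidates, all of these coefficients carry wide uncertainty, so differences in the third or fourth decimal are not meaningful.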

Item analysis

Mean discrimination = 0.6703

Discrimination range = 0.1972 – 0.8904

Designation	Discrimination index	Number of items
Excellent	≥0.7	8
Very Good	0.6–0.6999	5
Good	0.5–0.5999	2
Acceptable	0.4–0.4999	4
Borderline	0.3–0.3999	0
Poor	0.2–0.2999	0
Nonfunctional	≤0.1999	1
Insufficient data	NaN	16

You can find explanations about the discrimination computation here.
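
The designation bands in the table above can be expressed as a small lookup. This is a sketch of the table only, not of how the discrimination index itself is computed; the function name `designation` is illustrative, and treating each band as a half-open interval (so that a value such as 0.69995 falls into "Very Good") is an assumption, since the table lists band limits to four decimals.

```python
import math

# Lower bounds of each band, highest first, as listed in the table.
BANDS = [
    (0.7, "Excellent"),
    (0.6, "Very Good"),
    (0.5, "Good"),
    (0.4, "Acceptable"),
    (0.3, "Borderline"),
    (0.2, "Poor"),
]

def designation(index: float) -> str:
    """Map a discrimination index to its designation in the table."""
    if math.isnan(index):
        return "Insufficient data"
    for lower_bound, label in BANDS:
        if index >= lower_bound:
            return label
    return "Nonfunctional"

# The two ends of the reported discrimination range.
print(designation(0.1972), designation(0.8904))
```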

Explanatory notes

^[↑1] N — the total number of candidates who took the test.

^[↑2] Range — the difference between the highest and lowest score in the dataset.

^[↑3] Resolution — the number of consecutive possible raw scores contained within a unit of spread.

^[↑4] Mean — the arithmetic average of all scores in the dataset.

^[↑5] Median — the middle score when all scores are ordered from lowest to highest.

^[↑6] Mode — the value that occurs most frequently in the dataset.

^[↑7] Standard deviation — the square root of variance, indicating typical distance of scores from the mean; most informative for normal distributions.

^[↑8] Quartile deviation — a spread measure useful for skewed or non-normal score distributions.

^[↑9] Standard error of measurement — the expected variability of a test score due to measurement error.

^[↑10] Skewness — a measure of how asymmetric a score distribution is around its mean.

^[↑11] Excess kurtosis — measures how much distribution tails differ in heaviness from those of the normal distribution.

^[↑12] Item facility — the probability that an examinee answers the item correctly, estimated as the proportion of examinees who solved it.

^[↑13] Sample-dependent hardness — the average proportion of the possible raw score range that candidates fail to achieve.

^[↑14] Sample-independent hardness — a difficulty index anchored to a fixed normative ability level instead of the sample mean.

^[↑15] Overall test complexity — a sample-independent index of global difficulty across the test’s entire effective measurement range.

^[↑16] Cronbach’s alpha — quantifies the extent to which items measure the same underlying construct.

^[↑17] Split-half reliability — estimates consistency by correlating scores from two halves of the test, often averaged over multiple random splits.

^[↑18] Spearman–Brown correction — scales split-half reliability to reflect the reliability of the entire test.

^[↑19] McDonald’s omega — a reliability coefficient estimating internal consistency using a latent factor model.

^[↑20] Test dimensionality — the number of distinct latent constructs or abilities measured by the test.

Credits: Paul Cooijmans, Marc-André Nydegger

