In The Pattern Seekers: How Autism Drives Human Invention (2020), Baron-Cohen proposes the Systemizing Mechanism as an explanation for human progress through invention, from the first tools to the digital revolution.
Regarding the association between autism and family achievement in the field of engineering, is this controlling for the fact that those who have higher SES might also have a higher average parental age? While it is true that autism is heritable, it may also be epigenetic. The genes for autism may lie dormant or latent in a population, and only become expressed or activated due to late parental age. When the genes for autism remain dormant, they may express themselves in positive ways, but when they become expressed due to reproductive damage from parental aging, they are deleterious.
Autism is not simply a heightened level of testosterone, but also an atrophy of social functioning; otherwise, a person on steroids would become autistic. Autism could also be thought of as analogous to sickle cell anemia, where certain expressions of a gene are beneficial, but other expressions are deleterious.
Good point. I had to look at multiple papers I have cited in that article. And indeed, I couldn't find where they said they accounted for father's age or parent age.
Smart observations, DLA. My only caveat is the possibility of reader confusion from the implication of a single traceable "autism gene", similar to sickle cell disease. I doubt that was intended, but some readers might get the wrong idea. So, to review:
The red blood cell mutation known as 'sickle cell' that leads to the vulnerability to its signature form of anemia is due entirely to a single mutation of one hemoglobin gene, which allows for ready identification, sure diagnosis, and amenability to modification with currently extant gene technologies. It is transmitted entirely by hereditary means.
Unlike sickle cell anemia, "autism" is not one easily identifiable malady with a clearly outlined differential diagnosis. Autism is a chronic condition: a cluster of symptoms of varying severity or intensity, expressed with a high quotient of idiosyncrasy, and without significant correlation between the mental and emotional disruptions and tropisms of the symptom cluster and any visible physical trait markers (such as those often found in connection with many common hereditary maladies and disease syndromes).
The etiology of autism is currently unknown. No single clearly identifiable gene has been found to provide a sure diagnosis of "autism" the way one has been identified to cause the sickle cell mutation, or the genetic disease phenylketonuria, etc.

Mystery that it is, autism etiology is now a hot topic of research--one that appears to be developing fruitful leads, to judge from recent research findings. Very recent ones, in fact; just found this report earlier this evening. "Hot off of the presses", as the saying used to go: https://www.princeton.edu/news/2025/07/09/major-autism-study-uncovers-biologically-distinct-subtypes-paving-way-precision
"...The study defines four subtypes of autism — Social and Behavioral Challenges, Mixed ASD with Developmental Delay, Moderate Challenges, and Broadly Affected. Each subtype exhibits distinct developmental, medical, behavioral and psychiatric traits, and importantly, different patterns of genetic variation.
1) Individuals in the Social and Behavioral Challenges group show core autism traits, including social challenges and repetitive behaviors, but generally reach developmental milestones at a pace similar to children without autism. They also often experience co-occurring conditions like ADHD, anxiety, depression or obsessive-compulsive disorder alongside autism. One of the larger groups, this constitutes around 37% of the participants in the study.
2) The Mixed ASD with Developmental Delay group tends to reach developmental milestones, such as walking and talking, later than children without autism, but usually does not show signs of anxiety, depression or disruptive behaviors. “Mixed” refers to differences within this group with respect to repetitive behaviors and social challenges. This group represents approximately 19% of the participants.
3) Individuals with Moderate Challenges show core autism-related behaviors, but less strongly than those in the other groups, and usually reach developmental milestones on a similar track to those without autism. They generally do not experience co-occurring psychiatric conditions. Roughly 34% of participants fall into this category.
4) The Broadly Affected group faces more extreme and wide-ranging challenges, including developmental delays, social and communication difficulties, repetitive behaviors and co-occurring psychiatric conditions like anxiety, depression and mood dysregulation. This is the smallest group, accounting for around 10% of the participants.
“These findings are powerful because the classes represent different clinical presentations and outcomes, and critically we were able to connect them to distinct underlying biology,” said Aviya Litman, a Ph.D. student at Princeton and co-lead author..."
-----
That improvement in the ability to outline four subtypes of autism is worthwhile in itself, but the study also addressed causative factors. Speculation has been rife for decades about the relative importance of inherited genetic factors versus epigenetic influences (i.e., gene expression shaped by biological/chemical factors in prenatal or early childhood development). The Princeton study found only one autism subtype out of the four--Mixed ASD with Developmental Delay [19% of total]--that was strongly correlated with a particular and identifiable parental genetic inheritance (albeit not definitively traced). By contrast, the "children in the Broadly Affected group [10% of total] showed the highest proportion of damaging de novo mutations — those not inherited from either parent." (To me, that bespeaks toxic exposure, severe autoimmune reaction, and the likelihood of proximal contact with chemical/biological agents in the surrounding environment that can heavily influence epigenetic expression, often in deleterious ways. I find that worrying. A particularly ominous mystery.) Meanwhile, 71% of the total ASD population falls into the two remaining categories, where correlative clues to causation remain unclear.
another quote from the Princeton article:
"The team also found that autism subtypes differ in the timing of genetic disruptions’ effects on brain development. Genes switch on and off at specific times, guiding different stages of development. While much of the genetic impact of autism was thought to occur before birth, in the Social and Behavioral Challenges subtype — which typically has substantial social and psychiatric challenges, no developmental delays, and a later diagnosis — mutations were found in genes that become active later in childhood. This suggests that, for these children, the biological mechanisms of autism may emerge after birth, aligning with their later clinical presentation..."
SPARK, the project that includes the authors of the study, isn't just about one research paper. It's a long-term project reviewing a massive population sample. Family-oriented. Impressive. Everyone should read the whole article, and click on some of its hyperlinks.
Also, this is an example of a well-done metastudy: https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2024.1514678/full

To reiterate my criticism in an earlier post about the limitations of the Metastudy concept and how metastudies can be misused, even the best ones cannot be taken as conclusive. To its credit, the linked metastudy doesn't even try. If I were a researcher, I think I'd like it a lot, for the clues it develops as the most promising avenues for future research. As I said earlier, metastudies are for offering good starting points, not asserting fiat conclusions. The graphics are easy to read, and thought-provoking. It's easy to follow back to the primary source studies included in the aggregation. The metastudy doesn't pretend to think for the human researcher. It's more like a search engine with a centrifuge function. Metaphorically, of course.
"What appears paradoxical given that autism is characterized by below-average IQ can be resolved under the hypothesis that autism involves enhanced, but imbalanced, components of intelligence. Crespi reviews several studies suggesting that increased local brain connectivity in autism is linked with specific enhanced abilities such as hyper-sensitivities and attention to detail but that comes at the cost of reduced long-range brain connectivity which could contribute to such imbalances by reducing general intelligence. Autism is indeed the only psychiatric condition characterized by notable rates of savant skills (Treffert, 2009), which account for their highly limited range of enhancements.
Another study consilient with IQ research found that autistic people had a higher SD in IQ. They are 12 times more likely to score within the intellectual disability range but also 1.5 times more likely to score within the superior range (Billeiter & Froiland, 2022)."
The autism spectrum is only one trait cluster within a wider set of "non-neurotypical" features thus far found to appear in a significant fraction of the human population. Most of the research is very, very recent. Even considering only the presentation of autism, beyond the simple interpretive categories of low-functioning and high-functioning, verbal and nonverbal, there may be even more subsets to be identified. For example, the "idiot savant" manifestations of autism may have specific neurological pathways that strengthen those uncanny specialized abilities, or they may be enabled by the innate weakening of other brain capacities. (Or both. Or maybe it's a field effect. Etc.)
Some other forms of "non-neurotypical" variation include the attention deficit disorders; dyslexia; patterning and looping behaviors like OCD; variations in short-term memory recall capacity and long-term memory recording (possibly including idiosyncratic variation in regard to accuracy); eidetic imagery; extraordinary spatial modeling facility (Tesla, Beethoven); etc. Autism ("verbal", not low functioning) has already been shown to be correlated with a test-dependent variation in scores on IQ tests, often to a pronounced degree. Now results of that sort are showing up in the case of diagnosed ADD individuals (although the test-dependent correlation is quite different from that associated with verbal autistic individuals.) More and more attention is being given to examining the role of non-neurotypical variation as a confounding factor on IQ exam performance, and it's a subject that's only been given concerted research attention within the last 20 years or so.
What does this say about Spearman's "g" concept--that all intelligence abilities emerge from the same substrate, posited as a general foundation amenable to linear measurement, similar to, say, grip strength?
Except for the means of measurement, of course, with grip strength determined by a quite straightforward testing process that yields an unambiguously quantifiable result, whereas...let's just say that IQ tests don't rely on a similarly direct approach. A grip strength test and an IQ test are identical in only one respect: they yield a measurable score that's amenable to linear ranking.
Yet the result of an IQ test is somehow considered similarly precise and accurate--and considered much more significant than the result of a grip strength test, in terms of its probative value both for the evaluation of individuals and of populations. This, despite the fact that the only result a written IQ examination can ever offer is a probability estimate, generated indirectly by a design built to deliver an interpretive conclusion.
What does it say about g? You would have to test for measurement invariance and check whether the group score difference reflects the same construct. You could also test for the strength of positive manifold and check whether the g loadings are lower in the autism group.
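Purely as an illustration of what that kind of check can look like (simulated placeholder data, and crude principal-axis loadings rather than a proper multi-group CFA), a sketch in Python:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_scores(n, loadings):
    """Simulate subtest scores driven by a single latent factor plus noise."""
    g = rng.normal(size=(n, 1))                       # latent factor scores
    noise = rng.normal(size=(n, len(loadings)))
    return g @ np.atleast_2d(loadings) + noise        # observed subtest scores

def first_factor_loadings(scores):
    """Loadings on the first principal axis of the correlation matrix (a crude g proxy)."""
    corr = np.corrcoef(scores, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    return np.abs(eigvecs[:, -1] * np.sqrt(eigvals[-1]))

comparison = simulate_scores(1000, [0.8, 0.7, 0.75, 0.6, 0.65])
autistic   = simulate_scores(1000, [0.8, 0.4, 0.75, 0.3, 0.65])   # weaker on two subtests

print("comparison-group loadings:", np.round(first_factor_loadings(comparison), 2))
print("autism-group loadings:    ", np.round(first_factor_loadings(autistic), 2))
# Diverging loading patterns across groups are the kind of signal that formal
# configural/metric/scalar invariance models are designed to test.
```

In practice one would fit nested multi-group CFA models and compare their fit, but the logic is the same: estimate the loadings in each group and ask whether they agree.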
On measurement, you might want to read this paper:
I can't help but wonder whether the gist of my comment has eluded you.
Does extended personal interaction with other human beings ever count as an acceptable means of evaluation to yield fruitful evidence, regarding this topic of research? Such a methodology has the advantages of allowing for considerably more depth and breadth of inquiry than a written exam, along with allowing for considerably more evaluative subtlety.
Dynamic interactive analysis of that sort also allows for the opportunity to evaluate features of human intelligence like creativity. No written exam can measure creativity, because the correct answers are all plotted by the designers of the test in advance! Without the unexpected leap, how is one to detect creative synthesis, much less measure it as a linear capacity? Or doesn't impromptu resourcefulness count? Wit and humor also strongly rely on the presence of the unpredictable twist. Doesn't that count as a feature of intelligence, despite its resistance to being summed and plotted on a graph?
And no, psychological research isn't a field where I'm content to defer entirely to the Appeal to Authority. I read that Jensen paper already. As with most psychological theory papers written to favor Spearman's g hypothesis, it's a question-beg. The validity of the premise is assumed, along with the narrowness of the criteria held to determine it and measure it.
Although references are certainly welcome, I'm not interested in entering into a discussion that revolves around the writings of a third party, or an extended elaboration on them. I'm here to discuss your ideas, not theirs. Your suggestions interest me; they might yield some worthwhile findings. I can follow the logic of your proposals. But your reference to "checking the g loadings" indicates that you've already assumed the validity of the g premise, and I've been contending that the g premise is to be skeptically examined, not assumed. Your assumption has a way of foreclosing other possible interpretive frames.
I also have to mention a recurrent problem I've found in the field of intelligence testing research, of insufficient attention to possible confounders. I've gotten all too familiar with studies that lean on the same few controlling factors, when there's actually an abundance of them. I'm gratified to notice the recent (<20 years) focus on examining the impact of neurotype variation on IQ test score results; that's the first broadening of the inquiry in decades. The psychology of human behaviors is a field where confounders happen to abound--measurable confounders, at that. So many of them, in fact, that I recognize the difficulty of including them all in one study. But it should be recognized that the way forward in the field is to broaden the inquiry to account for the role played by as yet unmeasured confounders (null, or measurably significant) and their importance (relative impact on scores.) I'm not implying an endless rabbit hole. If the usual run of human intelligence studies includes five variables to sort cohorts and subsets, adding another five would supply another level of insight. Even if the factors examined were shown to have no importance, the findings would be on the record.
I realize that assessing a wider range of variables and then controlling for their importance complicates the studies. But it's the only way to accumulate a base of findings that qualify as Authentic Big Data. In regard to evaluating levels of human intelligence abilities, we don't have Authentic Big Data yet. The default of reifying Raven's Progressive Matrices as some all-purpose reductive distillation process that supplies the gold standard of intelligence measurement worldwide in accordance with the g hypothesis is not to be confused with a position based on Authentic Big Data.
Metastudies per se are not to be confused with Authentic Big Data, either. Although I'm getting the unsettling impression that they're increasingly viewed that way, simply due to their quantitative aspect, of aggregating a great many studies--and hence a vast amount of data, albeit not all that thoroughly evaluated by the researchers. Metastudies are carried out largely for the convenience of the researchers. As such, they need to be considered as starting points, and not as authoritative conclusions. Metastudies per se don't even work reliably as reinforcement for conclusions. Some of the ones I've reviewed appear to function more like exercises in confirmation bias. Metastudies aggregate by skimming and scanning. There's no Thought involved. To the extent that they winnow, they do so in accordance with the criteria pre-selected by the human researchers (the actual thinkers in the process, who are also responsible for evaluating the probity of the conclusions.) That's a process that can be done well or poorly, but it isn't to be confused with analytical depth.
There's a crucial difference between the unprecedentedly new access to vast databases and Authentic Big Data. Vast troves of information that all ask the same few questions the same ways to achieve a continually reinforcing loop of the same results simply do not constitute a robust set of considerations. The mere existence of a lot of data is not to be confused with Authentic Big Data.
The Collateralized Debt Obligations that imploded in the 2008 mortgage collapse bundled a large amount of data, too. I don't think enough attention has been paid to the similar liability latent in Metastudies: bundling preliminary research, studies reliant on slipshod methodology, etc., and then certifying the aggregated product as AA, so to speak. Big Data implies the ability to supply cogent answers to relevant questions that no one has yet asked. When financial analyst Vincent Daniel began to ask his own questions about CDOs, his in-depth analytical scrutiny revealed that many of those financial products lacked validity. I'm finding similar problems with some of the metastudies I've read, including studies asserting conclusions about "national intelligence" that purport to represent the state of the art in intelligence research.
As you see, I have my own thoughts about the study of human intelligence. Those, I'm willing to argue. I've published several essays on my Substack page, the first of which I linked upthread. The others can be easily found by navigating my page.
I welcome detailed criticisms of the observations and inferences I've made to guide the formation of my opinions. Thus far I haven't received any replies of that sort. As a result, I find myself in peril of embracing a false self-assurance about my views. Could it be that my insights are that ironclad, and my views that flawlessly stated?
You said, “I can’t help but wonder whether the gist of my comment has eluded you.”
On my end, I see you didn’t understand my point at all.
With measurement invariance testing, cognitive imbalances as well as other heterogeneity between groups will be detected. And Jensen’s paper directly addresses your earlier point on IQ measurement being merely indirect. If you don’t think it addresses your point, perhaps it's because your definition of intelligence is esoteric (i.e., IQ can’t be measured well because IQ designs are too restrictive), which I strongly suspect given your first message (and the second reinforcing my impression). If you have time, I highly suggest you read Jensen’s The g Factor, at least chapters 4, 5, 8 and 10.
As for your new question, “Does extended personal interaction…”
This can be easily checked by entering such a predictor variable, using a proxy like specific knowledge, along with IQ in a typical regression. If it's more important than IQ at predicting important life outcomes, or if it adds predictive value above IQ, then you would be able to see it. You can also use factor analyses, with several cognitive domains and several specific knowledge tests, to check whether these would yield different factors.
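To make the regression idea concrete, here is a rough sketch with simulated data (all variable names and effect sizes are invented placeholders), comparing the variance explained with and without the extra predictor:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
iq        = rng.normal(100, 15, n)
knowledge = 0.5 * (iq - 100) + rng.normal(0, 15, n)      # proxy correlated with IQ
outcome   = 0.04 * iq + 0.01 * knowledge + rng.normal(0, 1, n)

base = sm.OLS(outcome, sm.add_constant(iq)).fit()
full = sm.OLS(outcome, sm.add_constant(np.column_stack([iq, knowledge]))).fit()

print(f"R^2 with IQ only:        {base.rsquared:.3f}")
print(f"R^2 with IQ + knowledge: {full.rsquared:.3f}")
print(f"incremental R^2:         {full.rsquared - base.rsquared:.4f}")
# A negligible increment means the proxy adds little beyond IQ for this
# (simulated) outcome; a real analysis needs real criterion data.
```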
If intelligence is imperfectly measured due to being narrowly defined (which is doubtful given available evidence), then it only means its current predictive validity is lower than what it should be. If on the other hand, intelligence cannot be measured at all, then the anti-IQ crusaders should start explaining why all the correlates are consistent with the idea that g is causal and real and expected by theory (see below).
You said, “Dynamic interactive analysis of that sort also allows for the opportunity to evaluate features of human intelligence like creativity.”
This is not comparable. In your situation, you argue that there is no single correct answer, while in IQ there is a single correct answer. Furthermore, multiple correct answers for creativity items can be allowed. You just have to change the design to permit multiple correct answers. If you require more extensive tests, then have a battery measuring non-overlapping dimensions, and check whether a high score in one yields a high score in another, i.e., whether creativity in one type “transfers” to another. Along with correct responses, you can check patterns of wrong responses. Among all wrong answers, some are more wrong than others, and some response choices are more common than others. By looking at the frequency of each wrong answer and the frequency of each good answer, you get a measure of “originality” using the rarity of the response as a proxy. To validate it, you will obviously need to check the loading of each of the responses and response patterns.
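A toy sketch of the rarity-as-originality scoring described above (the responses are invented examples, not items from any real instrument):

```python
from collections import Counter

responses_by_person = {
    "p1": ["paperweight", "bookmark", "zipper pull"],
    "p2": ["paperweight", "bookmark"],
    "p3": ["paperweight", "lock pick"],
    "p4": ["bookmark", "sculpture wire", "lock pick"],
}

freq = Counter(r for rs in responses_by_person.values() for r in rs)
n_people = len(responses_by_person)

def originality(response):
    # responses given by fewer people score closer to 1; universal ones near 0
    return 1.0 - freq[response] / n_people

for person, rs in responses_by_person.items():
    mean_score = sum(originality(r) for r in rs) / len(rs)
    print(f"{person}: mean originality = {mean_score:.2f}")
# Validation would then proceed as described: check these scores against item
# loadings and against external judgements of creativity.
```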
As for nonlinearity, IRT can model it very well.
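For what it's worth, the nonlinearity in question is built into the item response function itself: in a two-parameter logistic (2PL) model the probability of a correct answer is an S-shaped function of the latent trait, not a straight line. A minimal sketch with arbitrary illustrative parameters:

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: discrimination a, difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

for theta in np.linspace(-3, 3, 7):
    easy = p_correct(theta, a=1.5, b=-1.0)
    hard = p_correct(theta, a=1.5, b=1.5)
    print(f"theta={theta:+.1f}  P(correct | easy item)={easy:.2f}  P(correct | hard item)={hard:.2f}")
```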
You said, “And no, psychological research isn’t a field where I'm content to defer entirely to the Appeal to Authority.”
People who aren’t content with the actual measurement should have a better proposition of measurement, therefore a testable one at that. If your point is that you can’t measure creativity or IQ, then your hypothesis can never be tested. It’s a sort of pseudo science, because it’s made irrefutable by definition.
You said, “But your reference to “checking the g loadings” indicates that you’ve already assumed the validity of the g premise”
Before we go further, you should read this article:
It shows several interesting things, one of which is that several psychologists devised their intelligence tests based on the assumption that g is a myth. They never meant for the abilities to correlate, yet they always do. The same error is observed with Gardner’s supposedly multiple, uncorrelated intelligences, which turned out to be correlated. It shows, therefore, that psychologists did try their best to use alternative tests that were by design not related to g, yet statistically they were not different from g. Thus, IQ measures intelligence, but not because it’s “by design”. To illustrate, consider what Jensen said here: “Another puzzle in terms of sampling theory is that tests such as forward and backward digit span memory, which must tap many common elements, are not as highly correlated as are, for instance, vocabulary and block designs, which would seem to have few elements in common. Of course, one could argue trivially in a circular fashion that a higher correlation means more elements in common, even though the theory can’t tell us why seemingly very different tests have many elements in common and seemingly similar tests have relatively few.”
Maybe, as I suggested, you should read Jensen’s book first. He explains well why g is a valid construct. Functional deficiencies are isolated from the person’s total repertoire of abilities. That a deficiency in ability A does not affect ability B is a strong suggestion that g is causal. In fact, g measured in different ways is still highly correlated: whether using a psychometric or a chronometric test, the g loading of a psychometric test is highly correlated with chronometric measures (purported to measure basic processing speed). More generally, when you examine the complex correlational network in which g is a major node, you see a pervasive impact of g on varying, lifetime outcomes. As Gottfredson noted, with respect to health outcomes, “As gambling houses know well, even small odds in one’s favor can produce big profits in the long term when they remain consistently in one’s favor and other influences are more erratic. Information processing is involved in all daily tasks, even if only to a minor degree, so higher g always provides an edge, even if small.” Not only does IQ testing behave as expected if it truly measures intelligence, it behaves as expected if g is causal. Surely, if it were not an accurate measure, it wouldn’t predict outcomes as expected by theory. And I haven’t started commenting on the biological reality of g yet, because there is a lot to say…
Obviously, g loading and g score aren’t the same thing. The g loading is a statistic showing how well each variable measures (i.e., correlates with) g. Typically, the higher the g loading, the higher the predictive validity, exactly as predicted by theory. Test complexity and the ability to decontextualize what has been learned form the core of g and its predictive value. This strengthens the point that g is not an artefact. There are many ways to check its validity. You can use stratified samples with varying background levels and see if the g loadings of each non-g factor vary across these levels. Nonlinear factor analyses can offer some relevant information too. You can check whether g loadings vary across g score levels. If you suspect high-g-score individuals exhibit lower g loadings, due to external factors like schooling or other IQ-relevant environments, then this would be detectable. For instance, the fact that g loadings are inversely correlated with test-retest gains is consistent with the idea that g is the causal ingredient. This is predicted by theory: the test loses its g loading as practice continues.
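As a toy illustration of that last point (the "correlated vectors" logic, with invented numbers standing in for real subtest statistics):

```python
import numpy as np

# invented per-subtest values, for illustration only
g_loadings   = np.array([0.82, 0.75, 0.70, 0.55, 0.45])
retest_gains = np.array([0.10, 0.15, 0.20, 0.35, 0.40])   # practice gains in SD units

r = np.corrcoef(g_loadings, retest_gains)[0, 1]
print(f"correlation between g loadings and practice gains: {r:.2f}")
# The argument above predicts a strongly negative correlation in real batteries;
# whether it holds is an empirical matter for real subtest data.
```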
But really, I don’t think you understand how much information you can obtain from statistical analyses. All of the stuff you mention, about things not being measurable, etc., can be detected using the appropriate methods.
You said, “I’ve gotten all too familiar with studies that lean on the same few controlling factors, when there's actually an abundance of them.”
It’s the same problem with people who aren’t content with the actual measurements. They always complain, but they never have a solution. If you have a better proposition, stop complaining and construct some new, measurable variables that do not overlap with the ones in current use. Then we will test your variables and check whether they have added value. Some people tried this before with their new variables, but after thorough evaluation the conclusion was that g is still a very strong predictor of life outcomes.
Another issue with such an obsession with confounders is the well-known sociologist's fallacy. You don’t want to include all possible “confounders” just because they look like confounders, but because you believe they are causal.
You said, “Metastudies are carried out largely for the convenience of the researchers ... There’s no Thought involved”
In case you didn’t know, meta-analyses are conducted to check for patterns. The claim that there’s no thought involved is oversimplistic. While a meta-analysis doesn't replace theory and design, it almost always involves the use of moderators, especially theoretically driven moderators.
About your blog post, “improvement of up to 2 standard deviations improvement with RPM over the WISC, by verbal autistic test takers--for test supposedly measuring the same quality, ‘g’, general intelligence”
That definitely proves what I suspected. You don’t have basic knowledge of IQ. First, RPM is not the gold standard, unlike what you claimed. Moreover, the WISC is not unidimensional, which is your mistaken assumption. Using CFA to specify either a bifactor or a hierarchical g model will show you RPM clustering with the fluid reasoning tests, not the verbal tests. And it will allow you to see whether g is affected by this imbalance in score differences between groups, just as I said before, using measurement invariance testing.
By saying in your post that nonverbal is equivalent to culture free, you are not seeing what cultural bias is all about. It's the difference in exposure to the cultural content elicited by the test, not its cultural content (i.e., load). In fact, you can check it using, like I said, measurement invariance. Culture load is not the same as culture bias.
Meng Hu: "On my end, I see you didn’t understand my point at all.
With measurement invariance testing, cognitive imbalances as well as other heterogeneity between groups will be detected."
That's the suggestion offered in your comment that I praised as worthwhile. You've made it sound as if those studies have yet to be done, and I can only wonder why not.
"Jensen’s paper directly addresses your earlier point on IQ measurement being merely indirect. If you don’t think it addresses your point, perhaps it's because your definition of intelligence is esoteric (i.e., IQ can’t be measured well because IQ designs are too restrictive), which I strongly suspect given your first message (and the second reinforcing my impression)."
I already told you, I have no time for a discussion centering on the writings of another author. Don't you have enough thoughts of your own?
I don't find anything "esoteric" about my criticisms of the Spearman g hypothesis and IQ testing (readers can find them outlined in various posts on my Substack page.) The empirical approach yields much higher quality insight than the Theory I've read on the subject. The empirical approach is based on close reading of history and biography, acquaintance with the arts, thick description, detailed observation, and personal experience. The approach that emphasizes Theory concerns itself primarily--arguably exclusively--with metrics. That's a terribly remote way of assessing qualities like intelligence abilities. Another one of its principal weaknesses is the requirement of the numerical quantification of qualities that are defined verbally, and often only in rough outline, or silhouette. Metaphorically speaking.
"As for your new question, “Does extended personal interaction…”
This can be easily checked by entering such a predictor variable using proxy like specific knowledge along with IQ in a typical regression"
I've always preferred the more straightforward approach, of actually interacting personally with other humans on an individual basis. Interact with enough of them, and it's possible to build quite a database.
Your approach is superior to mine only in the respect that it's much more amenable to numerical plotting and graphing. By any other standard, it's risible. I mean, have another look at that last sentence in quotes, above. And you accused me of "esotericism"? In its own way, your pronouncement is as jargon-laden and empty of insight as the most determinedly abstruse postmodern critical theory. The emphasis is different, of course--the verbal obfuscation of pomo CT is overwhelmingly Rhetorical, whereas your Metrics-insistent approach partakes of the hollow buzzword techspeak associated with mechanistic formulas intended to generate "hard data" numerical representations of vaguely defined verbal abstractions. Yes, vaguely defined verbal abstractions. Example:
"If it’s more important than IQ at predicting important life outcomes"
"Predicting important life outcomes..." Well, that sounds like a tall order to me, pardner. Arguably outright pretentious. But for the purpose of discussion, I'll take a look at the authoritative definition of that phrase that's printed in your textbook, just the same. Assuming that you can supply it.
"If intelligence is imperfectly measured due to being narrowly defined (which is doubtful given available evidence)"
Wait. Hold on. You're asserting a turf claim as if it were a fact claim. As a truism, with no evidence offered in support. (As if I haven't read what passes for evidence to uphold claims of that sort.)
"then it only means its current predictive validity is lower than what it should be"
You've just allowed yourself a lot of climb-down room with that phrasing. After all, "predictive validity lower than it should be" logically extends down to None, in principle. Zero. Even if the validity is merely Very Low, that should tell you something. But not about the test subject(s).
"If on the other hand, intelligence cannot be measured at all, then the anti-IQ crusaders should start explaining why all the correlates are consistent with the idea that g is causal and real and expected by theory"
I agree with the position that some intelligence-related abilities are amenable to measurement (although idiosyncratic variables and influences proximate to the act of taking the examination can affect the results; there's a default assumption that everyone taking an IQ test is at their sharpest--ready, willing, and able to give their peak performance, like an Olympic trial. But wouldn't you know, science marches on, and factors that might put someone off their game are increasingly being studied.) But I'm more intrigued by research into specific components (long-term memory, short-term numerical recall, frustration tolerance) associated with strengths and weaknesses of the various intelligence abilities than I am with some claim to have found a Secret Substrate of quantifiable Abstract Reasoning Acuity that provides the Master Key to unlock all other Intelligence abilities.
"You said, “Dynamic interactive analysis of that sort also allows for the opportunity to evaluate features of human intelligence like creativity.”
This is not comparable. In your situation, you argue that there is no single correct answer, while in IQ, there is a single correct answer. Furthermore, multiple correct answers for creativity items can be achieved. You just have to change the design to allow for possible, multiple correct answers."
I'm intrigued by your assertion, given that I've never encountered an IQ test that allowed more than one correct answer. But even if a test is designed to allow for more than one correct answer, the correct answers--plural--are still pre-ordained.
"If you require more extensive tests, then have a battery measuring non-overlapping dimensions, and check whether high score in one yields high score in another, i.e., whether creativity in one type “transfers” to another. Along with correct responses, you can check patterns of wrong responses. Among all wrong answers, some are more wrong than others, and some response checks are more common than some others. By looking at the frequency of each wrong answer and frequency of each good answer, you get a measure of “originality” by using the rarity of response as a proxy. To validate it, you will obviously need to check the loading of each of the responses and response patterns."
You seem to think you can enclose creativity, and then quantify it and rank it--like a prize for "creative" maze-running, with the prize given for an assessment of the fanciest footwork pattern rather than linear speed. Prizes awarded according to the Judges, of course.
"As for nonlinearity, IRT can model it very well." IRT doesn't model "nonlinearity". It's able to model degrees of complexity and difficulty, and run a comparator scan on it. But even an exponential level of increased difficulty still amounts to a progression on a one-line track. Porbably beneficial for incorporating into a knowledge exam like the GRE. Considered as a way of modeling human attitudes and other complex behaviors, it's a scale built on sand. To be kind. Convenient for bureaucracy.
"People who aren’t content with the actual measurement should have a better proposition of measurement, therefore a testable one at that."
That's hardline dogmatic Positivism, which is one of those rigid idealist paradigms that takes itself so seriously that it doesn't realize how absurd it is. I'm not a hardline dogmatic Positivist. No one with any ordinary good sense is, in my opinion. Granted, I have no way to plot that view on a graph.
"If your point is that you can’t measure creativity or [the one true source of all Intelligence worth caring about, presumably], then your hypothesis can never be tested."
I'm not interested in asserting a hypothesis intended to quantify Creativity Quotient. Do you actually think it's possible to walk into an art gallery and use a set of standardized criteria to assign quantifiable values to the amount of creativity expressed in each of the works, either as a linear score or a multivariate assessment yielding a summed result?
If you don't think that such a quantification is objectively possible to construct, does that mean that Creativity doesn't exist?
If you do think such a quantification is possible to construct in a way that yields objectively coherent findings that support impartially evaluative conclusions, I invite you to provide an example.
I will only answer the parts where you actually provided some argument. Because your comment is almost void of content. Next time I won’t even bother to answer.
> “Don't you have enough thoughts of your own?”
What a tackle! Beware, I bite people. And I bite hard.
In your first comment you said “grip strength is determined by quite straightforward testing process to yield an unambiguously quantifiable result” while IQ indirectly measures intelligence. Jensen’s paper admits that psychometric tests are merely rank-order measures but not ratio scale measures (while RT are actually ratio scale measures). He argues that “Intelligence is the periodicity of neural oscillation in the action potentials of the brain and central nervous system (CNS)” and that “The periodicity and oscillation of electrical potentials in the CNS, commonly called ‘brain waves,’ is an established phenomenon. Reliable correlations between specific identified brain waves and measures of psychometric g have only recently been reported.” Here RT is used as a proxy for such a measure. You might still criticize it as being yet another proxy, rather than a “real” measure. Yet EEG correlates with IQ, and not only that but it is the complexity of the EEG (rather than the frequency) that correlates best with IQ. Exactly what Jensen expected.
> “But even if a test is designed to allow for more than one correct answer, the correct answers--plural--are still pre-ordained.”
To let you know, I have no patience with trolls. They don’t last long here. Let me ask you. Are you trying to argue there is no objective correct answer to an IQ item, such that the correct answer depends on the test taker, such that there is no way to objectively define what is the correct answer?
> “You seem to think you can enclose creativity, and then quantify it and rank it”
What a dumb way to put it. As I noted above, you lean very close to pseudo science by insinuating creativity cannot be measured. A non testable hypothesis is a pseudo science. Don’t make me repeat this.
And once again you are not even trying to answer at all. You don’t seem to understand the flexibility of statistical analyses. That flexibility grows as creativity runs wild. You repeatedly argue that creativity is so complex that it is almost impossible to measure, yet you fail to realize that statistical analyses are also creative methodologies, yielding many direct and indirect assessments (the latter dealing with the “arbitrariness” of the measure) to cope with the complexity of creativity measurement. A particular analysis is meant to test a given hypothesis. As there are multiple ways one could appreciate and evaluate creativity, there are potentially as many ways to measure that creativity. You can have a “restrictive” test with a fixed number of response options (e.g., 6) but a varying number of correct responses (2, 3 or 4). Or you can have a free test in which you give a participant a paperclip (or some other object) and assess the number of novel uses they can generate with it, and even assess whether any novel use has a significant impact, as evaluated by peers or experts, etc.
This achievement is not reduced to a single value. As explained, complex psychological behaviors/outcomes are typically multidimensional. If you want to evaluate the ability to apply one’s knowledge to other domains, you can compare how the change in one dimension (each corresponding to a theoretically-driven construct) impacts other dimensions. If you want to assess novelty, you examine the rarity of correct responses and rarity of wrong responses, weighted by response time if needed. If there are more critical aspects of creativity one needs to consider, then add those.
Even some simple correlational analyses can tell you a lot. If the creativity measure does not reflect creativity well, it shouldn’t correlate well with peer-report evaluation of someone’s creativity. And there are so many ways to evaluate how people appreciate someone’s creativity.
> “IRT doesn’t model “nonlinearity”.”
I’m reaching my limit. Next post with such nonsense, and I will ban you. I hate wasting my precious time with people who don’t deserve it. Learn about IRT before talking nonsense.
If by one-line track, you are referring to unidimensionality, IRT can handle multidimensionality. Again, learn IRT before talking.
> “Do you actually think it’s possible to walk into an art gallery and use a set of standardized criteria to assign quantifiable values”
As I said, if you had any basic knowledge on statistics, you know it is quite easy to correlate any measured creativity variable with peer-reported evaluation of creativity and creative achievement, expert judgements, psychophysiological responses (brain activity when viewing art), and then compare the predictive validity of these creativity variables.
If you are not satisfied with the criteria, again name which critical aspect has been ignored, and then it will be incorporated in the latent scores.
Of course, I can see a counterpoint coming. What if people can’t recognize its creative value? What about Picasso’s art not being appreciated by some of his peers? But this point applies equally well to non-statistical assessment of fine art. It’s a temporal and cultural bias. Not statistically related.
"Maybe, as I suggested, you should read Jensen’s book first."
What makes you think I haven't? I told you, I'm not interested in shifting the focus of the conversation to the thoughts of a third party. Readers can refer to the summary you included in your comment. I'm moving on.
"...But really, I don’t think you understand how much information you can obtain from statistical analyses..."
You have no way of knowing that.
I have some idea of how much information cannot be obtained by statistical analyses. While you apparently find the very idea that some phenomena might be wholly or partly impenetrable to quantification and statistical analysis to be blasphemous.
"All of the stuff you mention, about not being not measurable etc., can be detected using the appropriate methods."
Your vague allusion to "appropriate methods" is a mumble. A hand-wave.
To repeat my challenge: If you do think such a quantification is possible to construct in a way that yields objectively coherent findings that support impartially evaluative conclusions, I invite you to provide an example.
"You said, “I’ve gotten all too familiar with studies that lean on the same few controlling factors, when there's actually an abundance of them.”
It’s the same problem with people who aren’t content with the actual measurements. They always complain but they never have a solution..."
Well, that isn't so. I've already mentioned the recent increase in studies that include research into some of those factors, which were formerly taken for granted. I welcome the findings. If the studies find that a particular influence is null, so be it.
"Another issue with such an obsession with confounders is the known sociologist fallacy. You don’t want to include all possible “confounders” just because they appear like so, but because you believe they are causal."
Yes. I agree that relevance is important. The decision about whether they're important enough to be examined is a decision of the researcher, of course, and also subject to budget constraints. I happen to think that there are some variables that haven't been given sufficient attention. Listing them and outlining my reasoning is a topic that deserves more thought, and it's a digression I don't want to include in an already lengthy comment exchange.
"About your blog post, “improvement of up to 2 standard deviations improvement with RPM over the WISC, by verbal autistic test takers--for test supposedly measuring the same quality, ‘g’, general intelligence”
That definitely proves what I suspected. You don’t have the basic knowledge on IQ. RPM is not the gold standard first, unlike what you claimed."
Support for my claim: "Raven’s Progressive Matrices remains a gold standard in nonverbal intelligence testing due to its emphasis on fluid intelligence and culture-fair design." https://www.cogn-iq.org/rpm-guide.php#heading-conclusion
From the Summary provided by Richard Lynn and Tatu Vanhanen, for their 2006 study "Intelligence and the Wealth and Poverty of Nations": "National IQs assessed by the Progressive Matrices were calculated for 60 nations and examined in relation to per capita incomes in the late 1990s and to post World War Two rates of economic growth." http://www.rlynn.co.uk/index.php?page=intelligence-and-the-wealth
"Intelligence and the Wealth and Poverty of Nations" is one of the most widely cited studies in the field of intelligence research. It's certainly the one that's been most widely referenced on Substack, most often by writers who assume that its conclusions are authoritative. The summary confirms that much of that data is based on the RPM test.
I don't view the RPM as a probative measure of "general intelligence", but it's undeniable that some people do.
"Moreover, WISC is not unidimensional, which is your mistaken assumption."
I've assumed no such thing! My observations on Raven's vs. the WISC can be found in the posts on the topic of intelligence testing found on my Substack page. It's obvious that RPM is the unidimensional test.*
I've always felt that Wechsler tests like the WISC and WAIS have superior value in assessing facility with knowledge concepts and scholastic skills; those abilities are the ones that count in modern literate societies and technological civilizations, after all. And yes, I realize that the Wechsler exams include matrix block diagrams similar to RPM. The RPM alone is comparatively narrow in its assessments--although this limitation doesn't seem to have gotten its due consideration until the disparities in performance between RPM and WISC by test subjects with verbal autism were brought to the awareness of researchers. I give tests like the WAIS a lot of credit for recognizing that cognitive ability has multiple components, and that there's a lot of individual variance in regard to specific strengths and weaknesses. The plain fact is that--quite unlike Arthur Jensen and yourself--Wechsler did not adhere to the original concept of Spearman's g!
"American psychologist David Wechsler developed a new tool due to his dissatisfaction with the limitations of the Stanford-Binet test (Cherry, 2020).
Like Thurstone, Gardner, and Sternberg, Wechsler believed intelligence involved many different mental abilities and felt that the Stanford-Binet scale too closely reflected the idea of one general intelligence.
Because of this, Wechsler created the Wechsler Intelligence Scale for Children (WISC) and the Wechsler Adult Intelligence Scale (WAIS) in 1955, with the most up-to-date version being the WAIS-IV (Cherry, 2020)..." https://www.simplypsychology.org/intelligence.html
"This study examines the question of whether or not average Full Scale IQ (FSIQ) differences between groups that differ in their academic level can be attributed to g, because IQ results from g plus a mixture of specific cognitive abilities and skills...The results support the conclusion that the Wechsler FSIQ does not directly or exclusively measure g across the full range of the population distribution of intelligence. There is no significant association between the scientific construct of general intelligence (g) and the differences in intelligence in general (IQ) assessed by the WAIS-III..." https://www.sciencedirect.com/science/article/abs/pii/S0160289602001228
"Wechsler never lost sight of the limitations of his intelligence tests. Although his tests often are interpreted as a clear measure of intelligence, Wechsler himself believed that they were useful only in conjunction with other clinical measurements. To Wechsler, assessments were far superior to mere testing..." https://psychology.jrank.org/pages/650/David-Wechsler.html#google_vignette
David Wechsler, 1958: "I have...become increasingly convinced that intelligence is most usefully interpreted as an aspect of the total personality. I look upon intelligence as an effect rather than a cause, that is, as a resultant of interacting abilities—nonintellective included. The problem confronting psychologists today is how these abilities interact to give the resultant effect we call intelligence. At this writing it seems clear that factorial analysis alone is not the answer..." https://psycnet.apa.org/fulltext/2006-09607-000-FRM.pdf
The value of the Wechsler tests has shifted back and forth over time; for a while in the 1990s, they were viewed as "atheoretical". For a while, RPM seems to have been viewed as a more reliable measure. But the Wechsler tests continue to make minor adjustments, and the newest versions are presently considered the most comprehensive and well-rounded assessments.
That said, the WISC and WAIS cannot be viewed as examinations intended to probe and rank some innate biological substrate of intelligence in the test subjects.
So why do you and the rest of the agenda-laden "HBD researcher" crowd continue to insist on the paramount importance of Spearman's g, and on Arthur Jensen's insistence that it's a biologically determined foundation of intellectual abilities that's largely inherited? No one has yet traced any specific etiology for the generation of human mental capacities and their means of functioning. Wechsler speculated about "field effects". An elusive process to map and label. Why can't you simply be satisfied with the WISC and WAIS as approximate measures of the activated skill sets of the majority of test subjects, rather than insisting that in every case they must be measuring the limit of potential of intellectual ability in a given individual?
[* re: Raven's vs. the Wechsler WAIS: I've just found a quick and dirty aggregated comparison of the possible variance in results between the two tests on an open-source psychometrics page. The results of the comparison that was performed indicated that the "weighted average of all the correlations found in the literature review between the WAIS and RPM was 0.67.
If we assume that this is the true value and that IQ scores are normally distributed like they are supposed to be, the average expected difference between the two scores an individual would receive if they took both is 9.7 points..." Huge, if true. https://openpsychometrics.org/info/wais-raven-correlation/ ]
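The 9.7-point figure in that quote checks out arithmetically under its own stated assumptions (both tests scaled to SD 15, correlation 0.67, normality); a quick back-of-the-envelope verification:

```python
import math

sd, r = 15.0, 0.67
sd_diff = sd * math.sqrt(2 * (1 - r))               # SD of the score difference
mean_abs_diff = sd_diff * math.sqrt(2 / math.pi)    # E|X - Y| for a zero-mean normal
print(f"SD of the difference: {sd_diff:.1f} points; expected |difference|: {mean_abs_diff:.1f} points")
# prints roughly 12.2 and 9.7, matching the openpsychometrics estimate
```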
"if you had any basic knowledge on statistics, you know it is quite easy to correlate any measured creativity variable with peer-reported evaluation of creativity and creative achievement, expert judgements, psychophysiological responses (brain activity when viewing art), and then compare the predictive validity of these creativity variables."
Thank you for stating your position in regard to Creativity so clearly and succinctly. Some of the readers in the vast Substack Notes audience will undoubtedly find your observation edifying.
Regarding the association between autism and family achievement in the field of engineering, is this controlling for the fact that those who have higher SES might also have a higher average parental age? While it is true that autism is heritable, it may also be epigenetic. The genes for autism may lie dormant or latent in a population, and only become expressed or activated due to late parental age. When the genes for autism remain dormant, they may express themselves in positive ways, but when they become expressed due to reproductive damage from parental aging, they are deleterious.
Autism is not simply a heightened level of testosterone, but also an atrophy of social functioning. Otherwise, a person on steroids would become autistic. Autism could also be thought of as sickle cell anemia, where certain expressions of a gene are positive, but other expressions are deleterious.
Good point. I had to look at multiple papers I have cited in that article. And indeed, I couldn't find where they said they accounted for father's age or parent age.
smart observations, DLA. Other than the possibility for reader confusion from implying the existence of a single traceable "autism gene", similar to sickle cell disease. I doubt that was intended, but some readers might get the wrong idea. So, to review:
The red blood cell mutation known as 'sickle cell' that leads to the vulnerability to its signature form of anemia is due entirely to one gene mutation of hemoglobin, which allows for ready identification, sure diagnosis, and amenability to being modified with currently extant gene technologies. Transmitted entirely be hereditary means.
Unlike sickle cell anemia, "autism" is not one easily identifiable malady with a clearly outlined differential diagnosis. Autism is a chronic condition, a cluster of symptoms of varying degrees of severity or intensity that are expressed with a high quotient of idiosyncrasy, without significant correlation between the mental and emotional disruptions and tropisms of the symptom cluster and visible physical trait markers (such as those often found in connection with many common hereditary maladies and disease syndromes.)
The etiology of autism is currently unknown. No single clearly identifiable gene has been found to provide a sure diagnosis of "autism" the way one has been identified to cause the sickle cell mutation, or the genetic disease phenylketonuria, etc.
Mystery that it is, autism etiology is now a hot topic of research--one that appears to be developing fruitful leads, to judge from recent research findings. Very recent ones, in fact; just found this report earlier this evening. "Hot off of the presses", as the saying used to go: https://www.princeton.edu/news/2025/07/09/major-autism-study-uncovers-biologically-distinct-subtypes-paving-way-precision
"...The study defines four subtypes of autism — Social and Behavioral Challenges, Mixed ASD with Developmental Delay, Moderate Challenges, and Broadly Affected. Each subtype exhibits distinct developmental, medical, behavioral and psychiatric traits, and importantly, different patterns of genetic variation.
1) Individuals in the Social and Behavioral Challenges group show core autism traits, including social challenges and repetitive behaviors, but generally reach developmental milestones at a pace similar to children without autism. They also often experience co-occurring conditions like ADHD, anxiety, depression or obsessive-compulsive disorder alongside autism. One of the larger groups, this constitutes around 37% of the participants in the study.
2) The Mixed ASD with Developmental Delay group tends to reach developmental milestones, such as walking and talking, later than children without autism, but usually does not show signs of anxiety, depression or disruptive behaviors. “Mixed” refers to differences within this group with respect to repetitive behaviors and social challenges. This group represents approximately 19% of the participants.
3) Individuals with Moderate Challenges show core autism-related behaviors, but less strongly than those in the other groups, and usually reach developmental milestones on a similar track to those without autism. They generally do not experience co-occurring psychiatric conditions. Roughly 34% of participants fall into this category.
4) The Broadly Affected group faces more extreme and wide-ranging challenges, including developmental delays, social and communication difficulties, repetitive behaviors and co-occurring psychiatric conditions like anxiety, depression and mood dysregulation. This is the smallest group, accounting for around 10% of the participants.
“These findings are powerful because the classes represent different clinical presentations and outcomes, and critically we were able to connect them to distinct underlying biology,” said Aviya Litman, a Ph.D. student at Princeton and co-lead author..."
-----
That improvement in the ability to outline four subtypes of autism is worthwhile in itself, but the study also addressed causative factors. Speculation has been rife for decades about the relative importance of inherited genetic factors versus epigenetics influence (i.e., matters of gene expression influenced by biological/chemical influences in prenatal or early childhood development.) The Princeton study found only one autism subtype out of the four--Mixed ASD with Developmental Delay [19% of total]-- that was strongly correlated with a particular and identifiable parental genetic inheritance (albeit not definitively traced). By contrast, the "children in the Broadly Affected group [10% of total] showed the highest proportion of damaging de novo mutations — those not inherited from either parent." (To me, that bespeaks toxic exposure, severe autoimmune reaction, and the likelihood of proximal contact with chemical/biological influences in the surrounding environment that can heavily influence epigenetic expression, often in deleterious ways. I find that worrying. A particularly ominous mystery.) Meanwhile, 71% of the total ASD population falls into the two remaining categories, where correlative clues to causation remain unclear.
Another quote from the Princeton article:
"The team also found that autism subtypes differ in the timing of genetic disruptions’ effects on brain development. Genes switch on and off at specific times, guiding different stages of development. While much of the genetic impact of autism was thought to occur before birth, in the Social and Behavioral Challenges subtype — which typically has substantial social and psychiatric challenges, no developmental delays, and a later diagnosis — mutations were found in genes that become active later in childhood. This suggests that, for these children, the biological mechanisms of autism may emerge after birth, aligning with their later clinical presentation..."
SPARK, the project that includes the authors of the study, isn't just about one research paper; it's a long-term effort reviewing a massive population sample. Family-oriented. Impressive. Everyone should read the whole article, and click on some of its hyperlinks.
~~
Also, this is an example of a well-done metastudy: https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins.2024.1514678/full
To reiterate my criticism in an earlier post about the limitations of the metastudy concept and how metastudies can be misused: even the best ones cannot be taken as conclusive. To its credit, the linked metastudy doesn't even try. If I were a researcher, I think I'd like it a lot, for the clues it develops as the most promising avenues for future research. As I said earlier, metastudies are for offering good starting points, not asserting fiat conclusions. The graphics are easy to read, and thought-provoking. It's easy to follow back to the primary source studies included in the aggregation. The metastudy doesn't pretend to think for the human researcher. It's more like a search engine with a centrifuge function. Metaphorically, of course.
"What appears paradoxical given that autism is characterized by below-average IQ can be resolved under the hypothesis that autism involves enhanced, but imbalanced, components of intelligence. Crespi reviews several studies suggesting that increased local brain connectivity in autism is linked with specific enhanced abilities such as hyper-sensitivities and attention to detail but that comes at the cost of reduced long-range brain connectivity which could contribute to such imbalances by reducing general intelligence. Autism is indeed the only psychiatric condition characterized by notable rates of savant skills (Treffert, 2009), which account for their highly limited range of enhancements.
Another study consilient with IQ research found that autistic people had higher SD in IQ. They are 12 times more likely to score within the intellectual disability range but also 1.5 times more likely to score within the superior range (Billeiter & Froiland, 2022)."
The autism spectrum is only one trait cluster within a wider set of "non-neurotypical" features thus far found to appear in a significant fraction of the human population. Most of the research is very, very recent. Even considering just the presentation of autism, beyond the simple interpretive categories of low-functioning and high-functioning, verbal and nonverbal, there may be more subsets yet to be identified. For example, the "idiot savant" manifestations of autism may have specific neurological pathways that strengthen those uncanny specialized abilities, or they may be enabled by the innate weakening of other brain capacities. (Or both. Or maybe it's a field effect. Etc.)
Some other forms of "non-neurotypical" variation include the attention deficit disorders; dyslexia; patterning and looping behaviors like OCD; variations in short-term memory recall capacity and long-term memory recording (possibly including idiosyncratic variation in regard to accuracy); eidetic imagery; extraordinary spatial modeling facility (Tesla, Beethoven); etc. Autism ("verbal", not low functioning) has already been shown to be correlated with a test-dependent variation in scores on IQ tests, often to a pronounced degree. Now results of that sort are showing up in the case of diagnosed ADD individuals (although the test-dependent correlation is quite different from that associated with verbal autistic individuals.) More and more attention is being given to examining the role of non-neurotypical variation as a confounding factor on IQ exam performance, and it's a subject that's only been given concerted research attention within the last 20 years or so.
What does this say about Spearman's "g" concept--that all intelligence abilities emerge from the same substrate, posited as a general foundation amenable to linear measurement, similar to, say, grip strength?
Except for the means of measurement, of course: grip strength is determined by a quite straightforward testing process that yields an unambiguously quantifiable result, whereas...let's just say that IQ tests don't rely on a similarly direct approach. A grip strength test and an IQ test are identical in only one respect: they yield a measurable score that's amenable to linear ranking.
Yet the result of an IQ test is somehow considered similarly precise and accurate--and considered much more significant than the result of a grip strength test, in terms of its probative value for the evaluation both of individuals and of populations. This, despite the fact that the only result a written IQ examination can ever offer is a probability estimate, generated indirectly from a design built to deliver an interpretive conclusion.
What does it say about g? You would have to test for measurement invariance and check whether the group score difference reflects the same construct. You could also test for the strength of positive manifold and check whether the g loadings are lower in the autism group.
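As a rough illustration of that second check, here is a minimal sketch (not anything from a published study) that extracts first-principal-component loadings per group and compares them; the subtest structure, group sizes, and loading values are all hypothetical stand-ins:

```python
import numpy as np

def g_loadings(scores):
    """First-principal-component loadings of a subjects x subtests score matrix,
    used here as a rough stand-in for g loadings from a full factor model."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    evals, evecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    loadings = evecs[:, -1] * np.sqrt(evals[-1])           # largest component
    return loadings if loadings.sum() > 0 else -loadings   # fix arbitrary sign

def simulate_group(n, true_loadings, rng):
    """Simulate subtest scores as one latent factor plus independent noise."""
    latent = rng.normal(size=(n, 1))
    return latent * true_loadings + rng.normal(size=(n, len(true_loadings)))

rng = np.random.default_rng(0)
# Hypothetical loading patterns: a stronger positive manifold in one group
comparison_pattern = np.array([0.8, 0.7, 0.7, 0.6, 0.7, 0.6])
autism_pattern     = np.array([0.5, 0.4, 0.6, 0.3, 0.5, 0.4])

print("comparison group:", np.round(g_loadings(simulate_group(500, comparison_pattern, rng)), 2))
print("autism group:    ", np.round(g_loadings(simulate_group(500, autism_pattern, rng)), 2))
```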
On measurement, you might want to read this paper:
https://arthurjensen.net/wp-content/uploads/2022/12/The-theory-of-intelligence-and-its-measurement-Arthur-Jensen-2011.pdf
I can't help but wonder whether the gist of my comment has eluded you.
Does extended personal interaction with other human beings ever count as an acceptable means of evaluation to yield fruitful evidence, regarding this topic of research? Such a methodology has the advantages of allowing for considerably more depth and breadth of inquiry than a written exam, along with allowing for considerably more evaluative subtlety.
Dynamic interactive analysis of that sort also allows for the opportunity to evaluate features of human intelligence like creativity. No written exam can measure creativity, because the correct answers are all plotted by the designers of the test in advance! Without the unexpected leap, how is one to detect creative synthesis, much less measure it as a linear capacity? Or doesn't impromptu resourcefulness count? Wit and humor also strongly rely on the presence of the unpredictable twist. Doesn't that count as a feature of intelligence, despite its resistance to being summed and plotted on a graph?
And no, psychological research isn't a field where I'm content to defer entirely to the Appeal to Authority. I read that Jensen paper already. As with most psychological theory papers written to favor Spearman's g hypothesis, it begs the question. The validity of the premise is assumed, along with the narrowness of the criteria held to determine and measure it.
Although references are certainly welcome, I'm not interested in entering into a discussion that revolves around the writings of a third party, or an extended elaboration on them. I'm here to discuss your ideas, not theirs. Your suggestions interest me; they might yield some worthwhile findings. I can follow the logic of your proposals. But your reference to "checking the g loadings" indicates that you've already assumed the validity of the g premise, and I've been contending that the g premise is to be skeptically examined, not assumed. Your assumption has a way of foreclosing other possible interpretive frames.
I also have to mention a recurrent problem I've found in the field of intelligence testing research, of insufficient attention to possible confounders. I've gotten all too familiar with studies that lean on the same few controlling factors, when there's actually an abundance of them. I'm gratified to notice the recent (<20 years) focus on examining the impact of neurotype variation on IQ test score results; that's the first broadening of the inquiry in decades. The psychology of human behaviors is a field where confounders happen to abound--measurable confounders, at that. So many of them, in fact, that I recognize the difficulty of including them all in one study. But it should be recognized that the way forward in the field is to broaden the inquiry to account for the role played by as yet unmeasured confounders (null, or measurably significant) and their importance (relative impact on scores.) I'm not implying an endless rabbit hole. If the usual run of human intelligence studies includes five variables to sort cohorts and subsets, adding another five would supply another level of insight. Even if the factors examined were shown to have no importance, the findings would be on the record.
I realize that assessing a wider range of variables and then controlling for their importance complicates the studies. But it's the only way to accumulate a base of findings that qualify as Authentic Big Data. In regard to evaluating levels of human intelligence abilities, we don't have Authentic Big Data yet. The default of reifying Raven's Progressive Matrices as some all-purpose reductive distillation process that supplies the gold standard of intelligence measurement worldwide in accordance with the g hypothesis is not to be confused with a position based on Authentic Big Data.
Metastudies per se are not to be confused with Authentic Big Data, either. Although I'm getting the unsettling impression that they're increasingly viewed that way, simply due to their quantitative aspect, of aggregating a great many studies--and hence a vast amount of data, albeit not all that thoroughly evaluated by the researchers. Metastudies are carried out largely for the convenience of the researchers. As such, they need to be considered as starting points, and not as authoritative conclusions. Metastudies per se don't even work reliably as reinforcement for conclusions. Some of the ones I've reviewed appear to function more like exercises in confirmation bias. Metastudies aggregate by skimming and scanning. There's no Thought involved. To the extent that they winnow, they do so in accordance with the criteria pre-selected by the human researchers (the actual thinkers in the process, who are also responsible for evaluating the probity of the conclusions.) That's a process that can be done well or poorly, but it isn't to be confused with analytical depth.
There's a crucial difference between the unprecedentedly new access to vast databases and Authentic Big Data. Vast troves of information that all ask the same few questions the same ways to achieve a continually reinforcing loop of the same results simply do not constitute a robust set of considerations. The mere existence of a lot of data is not to be confused with Authentic Big Data.
The Collateralized Debt Obligations that imploded in the 2008 mortgage collapse bundled a large amount of data, too. I don't think enough attention has been paid to the similar liability latent in Metastudies: bundling preliminary research, studies reliant on slipshod methodology, etc., and then certifying the aggregated product as AA, so to speak. Big Data implies the ability to supply cogent answers to relevant questions that no one has yet asked. When financial analyst Vincent Daniel began to ask his own questions about CDOs, his in-depth analytical scrutiny revealed that many of those financial products lacked validity. I'm finding similar problems with some of the metastudies I've read, including studies asserting conclusions about "national intelligence" that purport to represent the state of the art in intelligence research.
As you see, I have my own thoughts about the study of human intelligence. Those, I'm willing to argue. I've published several essays on my Substack page, the first of which I linked upthread. The others can be easily found by navigating my page.
I welcome detailed criticisms of the observations and inferences I've made to guide the formation of my opinions. Thus far I haven't received any replies of that sort. As a result, I find myself in peril of embracing a false self-assurance about my views. Could it be that my insights are that ironclad, and my views that flawlessly stated?
Here's a different, shorter post I've published on the topic https://adwjeditor.substack.com/p/on-chris-langan-at-al-and-iq
You said, “I can’t help but wonder whether the gist of my comment has eluded you.”
On my end, I see you didn’t understand my point at all.
With measurement invariance testing, cognitive imbalances as well as other heterogeneity between groups will be detected. And Jensen’s paper directly addresses your earlier point on IQ measurement being merely indirect. If you don’t think it addresses your point, perhaps it's because your definition of intelligence is esoteric (i.e., IQ can’t be measured well because IQ designs are too restrictive), which I strongly suspect given your first message (and the second reinforcing my impression). If you have time, I highly suggest you read Jensen’s The g Factor, at least chapters 4, 5, 8 and 10.
As for your new question, “Does extended personal interaction…”
This can be easily checked by entering such a predictor variable, using a proxy like specific knowledge, along with IQ in a typical regression. If it’s more important than IQ at predicting important life outcomes, or if it adds predictive value above IQ, then you would be able to see it. You can also use factor analyses, with several cognitive domains and several specific knowledge tests, to check whether these would yield different factors.
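A minimal sketch of that incremental-validity check, using simulated stand-in variables (the names `iq`, `specific_knowledge`, and `outcome`, and the effect sizes, are hypothetical) and assuming statsmodels is available:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 1000
iq = rng.normal(size=n)
specific_knowledge = 0.5 * iq + rng.normal(size=n)            # partly overlapping proxy
outcome = 0.4 * iq + 0.1 * specific_knowledge + rng.normal(size=n)

base = sm.OLS(outcome, sm.add_constant(iq)).fit()
full = sm.OLS(outcome, sm.add_constant(np.column_stack([iq, specific_knowledge]))).fit()

print(f"R^2 with IQ alone:                  {base.rsquared:.3f}")
print(f"R^2 with IQ + knowledge proxy:      {full.rsquared:.3f}")
print(f"added predictive value (delta R^2): {full.rsquared - base.rsquared:.3f}")
```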
If intelligence is imperfectly measured due to being narrowly defined (which is doubtful given available evidence), then it only means its current predictive validity is lower than what it should be. If on the other hand, intelligence cannot be measured at all, then the anti-IQ crusaders should start explaining why all the correlates are consistent with the idea that g is causal and real and expected by theory (see below).
You said, “Dynamic interactive analysis of that sort also allows for the opportunity to evaluate features of human intelligence like creativity.”
This is not comparable. In your situation, you argue that there is no single correct answer, while in IQ, there is a single correct answer. Furthermore, multiple correct answers for creativity items can be achieved. You just have to change the design to allow for possible, multiple correct answers. If you require more extensive tests, then have a battery measuring non-overlapping dimensions, and check whether a high score in one yields a high score in another, i.e., whether creativity of one type “transfers” to another. Along with correct responses, you can check patterns of wrong responses. Among all wrong answers, some are more wrong than others, and some responses are chosen more often than others. By looking at the frequency of each wrong answer and the frequency of each good answer, you get a measure of “originality” by using the rarity of response as a proxy. To validate it, you will obviously need to check the loading of each of the responses and response patterns.
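A minimal sketch of the rarity-as-originality idea described above, using a handful of made-up free responses; the scoring rule here (negative log relative frequency) is just one reasonable choice, not the only one:

```python
from collections import Counter
import math

# Hypothetical free responses from ten test takers to a single open-ended item
responses = ["brick", "brick", "doorstop", "brick", "paperweight",
             "doorstop", "antenna", "brick", "sculpture", "doorstop"]

freq = Counter(responses)
n = len(responses)

# Rarity-based originality: rarer responses get a higher score
originality = {resp: -math.log(count / n) for resp, count in freq.items()}

for resp, score in sorted(originality.items(), key=lambda kv: -kv[1]):
    print(f"{resp:12s} frequency={freq[resp]}  originality={score:.2f}")
```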
As for nonlinearity, IRT can model it very well.
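For readers unfamiliar with IRT, the standard two-parameter logistic (2PL) item response function is explicitly nonlinear in the latent trait; a minimal sketch with hypothetical item parameters:

```python
import numpy as np

def p_correct_2pl(theta, a, b):
    """2PL item response function: probability of answering an item correctly
    given latent ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 7)                  # range of latent ability levels
for a, b in [(1.0, 0.0), (2.0, 1.0)]:          # hypothetical item parameters
    print(f"a={a}, b={b}:", np.round(p_correct_2pl(theta, a, b), 2))
```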
You said, “And no, psychological research isn’t a field where I'm content to defer entirely to the Appeal to Authority.”
People who aren’t content with the actual measurement should have a better proposition of measurement, therefore a testable one at that. If your point is that you can’t measure creativity or IQ, then your hypothesis can never be tested. It’s a sort of pseudo science, because it’s made irrefutable by definition.
You said, “But your reference to “checking the g loadings” indicates that you’ve already assumed the validity of the g premise”
Before we go further, you should read this article:
https://humanvarieties.org/2013/04/03/is-psychometric-g-a-myth/
It shows several interesting things. One of which is that several psychologists devised their intelligence tests based on the assumption that g is a myth. They never meant for the abilities to correlate, yet they always do. The same error is observed from Gardner’s supposedly multiple, uncorrelated intelligences, which turned out to be correlated. It shows therefore that psychologists did try their best to use alternative tests that were by design not related with g, yet statistically they were not different than g. Thus, IQ measures intelligence but not because it’s “by design”. To illustrate, consider what Jensen said here: “Another puzzle in terms of sampling theory is that tests such as forward and backward digit span memory, which must tap many common elements, are not as highly correlated as are, for instance, vocabulary and block designs, which would seem to have few elements in common. Of course, one could argue trivially in a circular fashion that a higher correlation means more elements in common, even though the theory can’t tell us why seemingly very different tests have many elements in common and seemingly similar tests have relatively few.”
Maybe, as I suggested, you should read Jensen’s book first. He explains well why g is a valid construct. Functional deficiencies are isolated from the person’s total repertoire of abilities. That a deficiency in ability A does not affect ability B is a strong suggestion that g is causal. In fact, g measured in different ways is still highly correlated: whether using a psychometric or a chronometric test, the g loading of a psychometric test is highly correlated with performance on chronometric tests (purported to measure basic processing speed). More generally, when you examine the complex correlational network in which g is a major node, you see a pervasive impact of g on varying lifetime outcomes. As Gottfredson noted, with respect to health outcomes, “As gambling houses know well, even small odds in one’s favor can produce big profits in the long term when they remain consistently in one’s favor and other influences are more erratic. Information processing is involved in all daily tasks, even if only to a minor degree, so higher g always provides an edge, even if small.” IQ testing thus behaves exactly as expected if it truly measures intelligence and if g is causal. Surely, if it weren’t an accurate measure, it wouldn’t predict outcomes as expected by theory. And I haven’t started commenting on the biological reality of g yet, because there is a lot to say…
Obviously, g loading and g score aren’t the same thing. The g loading is a statistic showing how well each variable measures (i.e., correlates with) g. Typically, the higher the g loading, the higher the predictive validity, exactly as predicted by theory. Test complexity and the ability to decontextualize what has been learned form the core of g and its predictive value. This strengthens the point that g is not an artefact. There are many ways to check its validity. You can use stratified samples with varying background levels and see if the g loadings of each non-g factor vary across these levels. Nonlinear factor analyses can offer some relevant information too. You can check whether g loadings vary across g score levels. If you suspect high g score individuals exhibit lower g loadings, due to external factors like schooling or other IQ-relevant environments, then this would be detectable. For instance, the fact that g loadings are inversely correlated with test-retest gains is consistent with the idea that g is the causal ingredient. This is predicted by theory: the test loses its g loading as practice continues.
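That last point is an instance of Jensen's method of correlated vectors, which is easy to sketch: take a vector of per-subtest g loadings and a vector of test-retest gains and correlate them. The values below are hypothetical placeholders; an inverse correlation is the pattern being described:

```python
import numpy as np

# Hypothetical per-subtest g loadings and test-retest (practice) gains, in SD units
g_loading   = np.array([0.75, 0.68, 0.80, 0.55, 0.62, 0.48])
retest_gain = np.array([0.10, 0.15, 0.08, 0.30, 0.22, 0.35])

r = np.corrcoef(g_loading, retest_gain)[0, 1]
print(f"correlation between g loadings and retest gains: {r:.2f}")
```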
But really, I don’t think you understand how much information you can obtain from statistical analyses. All of the stuff you mention, about things not being measurable, etc., can be detected using the appropriate methods.
You said, “I’ve gotten all too familiar with studies that lean on the same few controlling factors, when there's actually an abundance of them.”
It’s the same problem with people who aren’t content with the actual measurements. They always complain but they never have a solution. If you have a better proposition, stop complaining and come up with new, measurable variables that do not overlap with the ones currently in use. Then we will test your variables and check whether they have added value. Some people tried this before, and they used their new variables, but after thorough evaluations, the conclusion is that g is still a very strong predictor of life outcomes.
Another issue with such an obsession with confounders is the well-known sociologist’s fallacy. You don’t want to include all possible “confounders” just because they look like confounders; you include them because you believe they are causal.
You said, “Metastudies are carried out largely for the convenience of the researchers ... There’s no Thought involved”
In case you didn’t know, meta-analyses are conducted to check for patterns. Saying there’s no thought involved is oversimplistic. While it doesn't replace theory and design, meta-analyses almost always involve the use of moderators, especially theoretically driven moderators.
About your blog post, “improvement of up to 2 standard deviations improvement with RPM over the WISC, by verbal autistic test takers--for test supposedly measuring the same quality, ‘g’, general intelligence”
That definitely proves what I suspected: you don’t have basic knowledge about IQ. First, RPM is not the gold standard, unlike what you claimed. Moreover, the WISC is not unidimensional, which is your mistaken assumption. Using CFA to specify either a bifactor or a hierarchical g model will show you RPM clustering with fluid reasoning tests, not verbal tests. And it will allow you to see whether g is affected by this imbalance in score difference between groups, just as I said before, using measurement invariance testing.
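One way a hierarchical-g CFA of that sort might be specified, as a rough sketch only: this assumes the `semopy` package, lavaan-style model syntax, and a simulated dataset with made-up subtest names (including a stand-in RPM column); it is not the analysis from any particular study.

```python
import numpy as np
import pandas as pd
import semopy

# Simulate hypothetical subtest scores driven by a single general factor,
# purely to make the sketch self-contained.
rng = np.random.default_rng(0)
n = 500
general = rng.normal(size=(n, 1))
cols = ["vocabulary", "similarities", "information",
        "matrix_reasoning", "figure_weights", "raven_rpm",
        "digit_span", "arithmetic", "letter_number"]
df = pd.DataFrame(0.7 * general + rng.normal(size=(n, len(cols))), columns=cols)

# Hierarchical model: three first-order factors with g as a second-order factor
model_desc = """
Verbal  =~ vocabulary + similarities + information
Fluid   =~ matrix_reasoning + figure_weights + raven_rpm
Working =~ digit_span + arithmetic + letter_number
g       =~ Verbal + Fluid + Working
"""

model = semopy.Model(model_desc)
model.fit(df)
print(model.inspect())   # estimated loadings and variances
```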
By saying in your post that nonverbal is equivalent to culture free, you are not seeing what cultural bias is all about. It's the difference in exposure to the cultural content elicited by the test, not its cultural content (i.e., load). In fact, you can check it using, like I said, measurement invariance. Culture load is not the same as culture bias.
Meng Hu: "On my end, I see you didn’t understand my point at all.
With measurement invariance testing, cognitive imbalances as well as other heterogeneity between groups will be detected."
That's the suggestion offered in your comment that I praised as worthwhile. You've made it sound as if those studies have yet to be done, and I can only wonder why not.
"Jensen’s paper directly addresses your earlier point on IQ measurement being merely indirect. If you don’t think it addresses your point, perhaps it's because your definition of intelligence is esoteric (i.e., IQ can’t be measured well because IQ designs are too restrictive), which I strongly suspect given your first message (and the second reinforcing my impression)."
I already told you, I have no time for a discussion centering on the writings of another author. Don't you have enough thoughts of your own?
I don't find anything "esoteric" about my criticisms of the Spearman g hypothesis and IQ testing (readers can find them outlined in various posts on my Substack page.) The empirical approach yields much higher quality insight than the Theory I've read on the subject. The empirical approach is based on close reading of history and biography, acquaintance with the arts, thick description, detailed observation, and personal experience. The approach that emphasizes Theory concerns itself primarily--arguably exclusively--with metrics. That's a terribly remote way of assessing qualities like intelligence abilities. Another one of its principal weaknesses is the requirement of the numerical quantification of qualities that are defined verbally, and often only in rough outline, or silhouette. Metaphorically speaking.
"As for your new question, “Does extended personal interaction…”
This can be easily checked by entering such a predictor variable using proxy like specific knowledge along with IQ in a typical regression"
I've always preferred the more straightforward approach, of actually interacting personally with other humans on an individual basis. Interact with enough of them, and it's possible to build quite a database.
Your approach is superior to mine only in the respect that it's much more amenable to numerical plotting and graphing. By any other standard, it's risible. I mean, have another look at that last sentence in quotes, above. And you accused me of "esotericism"? In its own way, your pronouncement is as jargon-laden and empty of insight as the most determinedly abstruse postmodern critical theory. The emphasis is different, of course--the verbal obfuscation of pomo CT is overwhelmingly Rhetorical, whereas your Metrics-insistent approach partakes of the hollow buzzword techspeak associated with mechanistic formulas intended to generate "hard data" numerical representations of vaguely defined verbal abstractions. Yes, vaguely defined verbal abstractions. Example:
"If it’s more important than IQ at predicting important life outcomes"
"Predicting important life outcomes..." Well, that sounds like a tall order to me, pardner. Arguably outright pretentious. But for the purpose of discussion, I'll take a look at the authoritative definition of that phrase that's printed in your textbook, just the same. Assuming that you can supply it.
"If intelligence is imperfectly measured due to being narrowly defined (which is doubtful given available evidence)"
Wait. Hold on. You're asserting a turf claim as if it were a fact claim. As a truism, with no evidence offered in support. (As if I haven't read what passes for evidence to uphold claims of that sort.)
"then it only means its current predictive validity is lower than what it should be"
You've just allowed yourself a lot of climb-down room with that phrasing. After all, "predictive validity lower than it should be" logically extends down to None, in principle. Zero. Even if the validity is merely Very Low, that should tell you something. But not about the test subject(s).
"If on the other hand, intelligence cannot be measured at all, then the anti-IQ crusaders should start explaining why all the correlates are consistent with the idea that g is causal and real and expected by theory"
I agree with the position that some intelligence-related abilities are amenable to measurement (although idiosyncratic variables and influences proximate to the act of taking the examination can influence the results; there's a default assumption that everyone taking an IQ test is at their sharpest--ready, willing, and able to give their peak performance, like an Olympic trial. But wouldn't you know, science marches on, and factors that might put someone off their game are increasingly being studied.) But I'm more intrigued by research into specific components (long-term memory, short-term numerical recall, frustration tolerance) associated with strengths and weaknesses of the various intelligence abilities than I am with some claim to have found a Secret Substrate of quantifiable Abstract Reasoning Acuity that provides the Master Key to unlock all other Intelligence abilities.
"You said, “Dynamic interactive analysis of that sort also allows for the opportunity to evaluate features of human intelligence like creativity.”
This is not comparable. In your situation, you argue that there is no single correct answer, while in IQ, there is a single correct answer. Furthermore, multiple correct answers for creativity items can be achieved. You just have to change the design to allow for possible, multiple correct answers."
I'm intrigued by your assertion, given that I've never encountered an IQ test that allowed more than one correct answer. But even if a test is designed to allow for more than one correct answer, the correct answers--plural--are still pre-ordained.
"If you require more extensive tests, then have a battery measuring non-overlapping dimensions, and check whether high score in one yields high score in another, i.e., whether creativity in one type “transfers” to another. Along with correct responses, you can check patterns of wrong responses. Among all wrong answers, some are more wrong than others, and some response checks are more common than some others. By looking at the frequency of each wrong answer and frequency of each good answer, you get a measure of “originality” by using the rarity of response as a proxy. To validate it, you will obviously need to check the loading of each of the responses and response patterns."
You seem to think you can enclose creativity, and then quantify it and rank it--like a prize for "creative" maze-running, with the prize given for an assessment of the fanciest footwork pattern rather than linear speed. Prizes awarded according to the Judges, of course.
"As for nonlinearity, IRT can model it very well." IRT doesn't model "nonlinearity". It's able to model degrees of complexity and difficulty, and run a comparator scan on it. But even an exponential level of increased difficulty still amounts to a progression on a one-line track. Porbably beneficial for incorporating into a knowledge exam like the GRE. Considered as a way of modeling human attitudes and other complex behaviors, it's a scale built on sand. To be kind. Convenient for bureaucracy.
"People who aren’t content with the actual measurement should have a better proposition of measurement, therefore a testable one at that."
That's hardline dogmatic Positivism, which is one of those rigid idealist paradigms that takes itself so seriously that it doesn't realize how absurd it is. I'm not a hardline dogmatic Positivist. No one with any ordinary good sense is, in my opinion. Granted, I have no way to plot that view on a graph.
"If your point is that you can’t measure creativity or [the one true source of all Intelligence worth caring about, presumably], then your hypothesis can never be tested."
I'm not interested in asserting a hypothesis intended to quantify Creativity Quotient. Do you actually think it's possible to walk into an art gallery and use a set of standardized criteria to assign quantifiable values to the amount of creativity expressed in each of the works, either as a linear score or a multivariate assessment yielding a summed result?
If you don't think that such a quantification is objectively possible to construct, does that mean that Creativity doesn't exist?
If you do think such a quantification is possible to construct in a way that yields objectively coherent findings that support impartially evaluative conclusions, I invite you to provide an example.
I will only answer the parts where you actually provided some argument, because your comment is almost devoid of content. Next time I won’t even bother to answer.
> “Don't you have enough thoughts of your own?”
What a tackle! Beware, I bite people. And I bite hard.
In your first comment you said “grip strength is determined by quite straightforward testing process to yield an unambiguously quantifiable result” while IQ indirectly measures intelligence. Jensen’s paper admits that psychometric tests are merely rank-order measures but not ratio scale measures (while RT are actually ratio scale measures). He argues that “Intelligence is the periodicity of neural oscillation in the action potentials of the brain and central nervous system (CNS)” and that “The periodicity and oscillation of electrical potentials in the CNS, commonly called ‘brain waves,’ is an established phenomenon. Reliable correlations between specific identified brain waves and measures of psychometric g have only recently been reported.” Here RT is used as a proxy for such a measure. You might still criticize it as being yet another proxy, rather than a “real” measure. Yet EEG correlates with IQ, and not only that but it is the complexity of the EEG (rather than the frequency) that correlates best with IQ. Exactly what Jensen expected.
> “But even if a test is designed to allow for more than one correct answer, the correct answers--plural--are still pre-ordained.”
To let you know, I have no patience with trolls. They don’t last long here. Let me ask you. Are you trying to argue there is no objective correct answer to an IQ item, such that the correct answer depends on the test taker, such that there is no way to objectively define what is the correct answer?
> “You seem to think you can enclose creativity, and then quantify it and rank it”
What a dumb way to put it. As I noted above, you lean very close to pseudoscience by insinuating creativity cannot be measured. A non-testable hypothesis is pseudoscience. Don’t make me repeat this.
And once again you are not even trying to answer at all. You don’t seem to understand the flexibility of statistical analyses; that flexibility only grows as creativity runs wild. You repeatedly argue that creativity is so complex that it is almost impossible to measure, yet you fail to realize that statistical analyses are also creative methodologies, yielding many direct and indirect assessments (the latter dealing with the “arbitrariness” of the measure) to cope with the complexity of creativity measurement. A particular analysis is meant to test a given hypothesis. As there are multiple ways one could appreciate and evaluate creativity, there are potentially as many ways to measure this creativity. You can have a “restrictive” test with a number of responses (e.g., 6) but varying the number of correct responses (2, 3 or 4). Or you can have a free test in which you give a paperclip and an object, and assess the number of novel uses a participant can generate with this paperclip, and even assess whether any novel use has a significant impact, as evaluated by peers or experts, etc.
This achievement is not reduced to a single value. As explained, complex psychological behaviors/outcomes are typically multidimensional. If you want to evaluate the ability to apply one’s knowledge to other domains, you can compare how the change in one dimension (each corresponding to a theoretically-driven construct) impacts other dimensions. If you want to assess novelty, you examine the rarity of correct responses and rarity of wrong responses, weighted by response time if needed. If there are more critical aspects of creativity one needs to consider, then add those.
Even some simple correlational analyses can tell you a lot. If the creativity measure does not reflect creativity well, it shouldn’t correlate well with peer-report evaluation of someone’s creativity. And there are so many ways to evaluate how people appreciate someone’s creativity.
> “IRT doesn’t model “nonlinearity”.”
I’m reaching my limit. Next post with such nonsense, and I will ban you. I hate wasting my precious time with people who don’t deserve it. Learn about IRT before talking nonsense.
If by one-line track, you are referring to unidimensionality, IRT can handle multidimensionality. Again, learn IRT before talking.
> “Do you actually think it’s possible to walk into an art gallery and use a set of standardized criteria to assign quantifiable values”
As I said, if you had any basic knowledge of statistics, you would know it is quite easy to correlate any measured creativity variable with peer-reported evaluation of creativity and creative achievement, expert judgements, psychophysiological responses (brain activity when viewing art), and then compare the predictive validity of these creativity variables.
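A minimal sketch of that kind of check, comparing how two made-up creativity measures correlate with a peer-rated criterion; all variables and effect sizes are simulated stand-ins, and scipy is assumed to be available:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
n = 300
latent_creativity = rng.normal(size=n)
rarity_score  = 0.6 * latent_creativity + rng.normal(size=n)   # rarity-based measure
fluency_score = 0.3 * latent_creativity + rng.normal(size=n)   # raw count of responses
peer_rating   = 0.5 * latent_creativity + rng.normal(size=n)   # external criterion

for name, measure in [("rarity score", rarity_score), ("fluency score", fluency_score)]:
    r, p = pearsonr(measure, peer_rating)
    print(f"{name}: r with peer-rated creativity = {r:.2f} (p = {p:.3g})")
```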
If you are not satisfied with the criteria, again name which critical aspect has been ignored, and then it will be incorporated in the latent scores.
Of course, I can see a counterpoint coming. What if people can’t recognize its creative value? What about Picasso’s art not being appreciated by some of his peers? But this point applies equally well to non-statistical assessment of fine art. It’s a temporal and cultural bias. Not statistically related.
"Maybe, as I suggested, you should read Jensen’s book first."
What makes you think I haven't? I told you, I'm not interested in shifting the focus of the conversation to the thoughts of a third party. Readers can refer to the summary you included in your comment. I'm moving on.
"...But really, I don’t think you understand how much information you can obtain from statistical analyses..."
You have no way of knowing that.
I have some idea of how much information cannot be obtained by statistical analyses. While you apparently find the very idea that some phenomena might be wholly or partly impenetrable to quantification and statistical analysis to be blasphemous.
"All of the stuff you mention, about not being not measurable etc., can be detected using the appropriate methods."
Your vague allusion to "appropriate methods" is a mumble. A hand-wave.
To repeat my challenge: If you do think such a quantification is possible to construct in a way that yields objectively coherent findings that support impartially evaluative conclusions, I invite you to provide an example.
"You said, “I’ve gotten all too familiar with studies that lean on the same few controlling factors, when there's actually an abundance of them.”
It’s the same problem with people who aren’t content with the actual measurements. They always complain but they never have a solution..."
Well, that isn't so. I've already mentioned the recent increase in studies that include research into some of those factors, which were formerly taken for granted. I welcome the findings. If the studies find that a particular influence is null, so be it.
"Another issue with such an obsession with confounders is the known sociologist fallacy. You don’t want to include all possible “confounders” just because they appear like so, but because you believe they are causal."
Yes. I agree that relevance is important. The decision about whether they're important enough to be examined is a decision of the researcher, of course, and also subject to budget constraints. I happen to think that there are some variables that haven't been given sufficient attention. Listing them and outlining my reasoning is a topic that deserves more thought, and it's a digression I don't want to include in an already lengthy comment exchange.
"About your blog post, “improvement of up to 2 standard deviations improvement with RPM over the WISC, by verbal autistic test takers--for test supposedly measuring the same quality, ‘g’, general intelligence”
That definitely proves what I suspected. You don’t have the basic knowledge on IQ. RPM is not the gold standard first, unlike what you claimed."
Support for my claim: "Raven’s Progressive Matrices remains a gold standard in nonverbal intelligence testing due to its emphasis on fluid intelligence and culture-fair design." https://www.cogn-iq.org/rpm-guide.php#heading-conclusion
"The Progressive Matrices is widely accepted as one of the best tests of reasoning ability, general intelligence and Spearman’s g (Jensen, 1998)." https://richardlynn.net/wp-content/uploads/2025/02/5-6.pdf
From the Summary provided by Richard Lynn and Tatu Vanhanen, for their 2006 study "Intelligence and the Wealth and Poverty of Nations": "National IQs assessed by the Progressive Matrices were calculated for 60 nations and examined in relation to per capita incomes in the late 1990s and to post World War Two rates of economic growth." http://www.rlynn.co.uk/index.php?page=intelligence-and-the-wealth
"Intelligence and the Wealth and Poverty of Nations" is one of the most widely cited studies in the field of intelligence research. It's certainly the one that's been most widely referenced on Substack, most often by writers who assume that its conclusions are authoritative. The summary confirms that much of that data is based on the RPM test.
I don't view the RPM as a probative measure of "general intelligence", but it's undeniable that some people do.
"Moreover, WISC is not unidimensional, which is your mistaken assumption."
I've assumed no such thing! My observations on Raven's vs. the WISC can be found in the posts on the topic of intelligence testing on my Substack page. It's obvious that RPM is the unidimensional test.*
I've always felt that Wechsler tests like the WISC and WAIS have superior value in assessing facility with knowledge concepts and scholastic skills; those abilities are the ones that count in modern literate societies and technological civilizations, after all. And yes, I realize that the Wechsler exams include matrix block diagrams similar to RPM. The RPM alone is comparatively narrow in its assessments--although this limitation doesn't seem to have gotten its due consideration until the disparities in performance between RPM and WISC by test subjects with verbal autism were brought to the awareness of researchers. I give tests like the WAIS a lot of credit for recognizing that cognitive ability has multiple components, and that there's a lot of individual variance in regard to specific strengths and weaknesses. The plain fact is that--quite unlike Arthur Jensen and yourself--Wechsler did not adhere to the original concept of Spearman's g!
"American psychologist David Wechsler developed a new tool due to his dissatisfaction with the limitations of the Stanford-Binet test (Cherry, 2020).
Like Thurstone, Gardner, and Sternberg, Wechsler believed intelligence involved many different mental abilities and felt that the Stanford-Binet scale too closely reflected the idea of one general intelligence.
Because of this, Wechsler created the Wechsler Intelligence Scale for Children (WISC) and the Wechsler Adult Intelligence Scale (WAIS) in 1955, with the most up-to-date version being the WAIS-IV (Cherry, 2020)..." https://www.simplypsychology.org/intelligence.html
"This study examines the question of whether or not average Full Scale IQ (FSIQ) differences between groups that differ in their academic level can be attributed to g, because IQ results from g plus a mixture of specific cognitive abilities and skills...The results support the conclusion that the Wechsler FSIQ does not directly or exclusively measure g across the full range of the population distribution of intelligence. There is no significant association between the scientific construct of general intelligence (g) and the differences in intelligence in general (IQ) assessed by the WAIS-III..." https://www.sciencedirect.com/science/article/abs/pii/S0160289602001228
"Wechsler never lost sight of the limitations of his intelligence tests. Although his tests often are interpreted as a clear measure of intelligence, Wechsler himself believed that they were useful only in conjunction with other clinical measurements. To Wechsler, assessments were far superior to mere testing..." https://psychology.jrank.org/pages/650/David-Wechsler.html#google_vignette
David Wechsler, 1958: "I have...become increasingly convinced that intelligence is most usefully interpreted as an aspect of the total personality. I look upon intelligence as an effect rather than a cause, that is, as a resultant of interacting abilities—nonintellective included. The problem confronting psychologists today is how these abilities interact to give the resultant effect we call intelligence. At this writing it seems clear that factorial analysis alone is not the answer..." https://psycnet.apa.org/fulltext/2006-09607-000-FRM.pdf
The standing of the Wechsler tests has shifted back and forth over time; for a while in the 1990s, they were viewed as "atheoretical," and for a while RPM seems to have been viewed as the more reliable measure. But the Wechsler tests continue to make minor adjustments, and the newest versions are presently considered the most comprehensive and well-rounded assessments.
That said, the WISC and WAIS cannot be viewed as examinations intended to probe and rank some innate biological substrate of intelligence in the test subjects.
So why do you and the rest of the agenda-laden "HBD researcher" crowd continue to insist on the paramount importance of Spearman's g, and on Arthur Jensen's insistence that it's a biologically determined foundation of intellectual abilities that's largely inherited? No one has yet traced any specific etiology for the generation of human mental capacities and their means of functioning. Wechsler speculated about "field effects"--an elusive process to map and label. Why can't you simply be satisfied with the WISC and WAIS as approximate measures of the activated skill sets of the majority of test subjects, rather than insisting that in every case they must be measuring the limit of potential of intellectual ability in a given individual?
[* re: Raven's vs. the Wechsler WAIS: I've just found a quick and dirty aggregated comparison of the possible variance in results between the two tests on an open-source psychometrics page. The results of the comparison indicated that the "weighted average of all the correlations found in the literature review between the WAIS and RPM was 0.67.
If we assume that this is the true value and that IQ scores are normally distributed like they are supposed to be, the average expected difference between the two scores an individual would receive if they took both is 9.7 points..." Huge, if true. https://openpsychometrics.org/info/wais-raven-correlation/ ]
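For what it's worth, the arithmetic behind that 9.7-point figure checks out: if two scores on an SD-15 scale correlate at r, their difference has standard deviation 15 * sqrt(2 * (1 - r)), and the mean absolute value of a normal variable is its SD times sqrt(2 / pi). A quick verification, assuming the reported r = 0.67:

```python
import math

r = 0.67    # reported WAIS-RPM correlation
sd = 15     # IQ-scale standard deviation

sd_of_difference = sd * math.sqrt(2 * (1 - r))                    # SD of the score difference
expected_abs_difference = sd_of_difference * math.sqrt(2 / math.pi)

print(f"SD of the difference:         {sd_of_difference:.1f}")          # about 12.2
print(f"expected absolute difference: {expected_abs_difference:.1f}")   # about 9.7
```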
"if you had any basic knowledge on statistics, you know it is quite easy to correlate any measured creativity variable with peer-reported evaluation of creativity and creative achievement, expert judgements, psychophysiological responses (brain activity when viewing art), and then compare the predictive validity of these creativity variables."
Thank you for stating your position in regard to Creativity so clearly and succinctly. Some of the readers in the vast Substack Notes audience will undoubtedly find your observation edifying.
Autism comes from one thing - to the best of my knowledge: Vaccine injuries.