What predicts best how much you will earn? A very novel way to study this...

This morning Jan Demol sent me this study, and at first I looked only at the outcomes, with the current marshmallow test debate in mind. But when I looked at the study again tonight, I noticed that the method is the bigger news. The researchers used machine learning to analyze a large amount of data, and it left me wondering how this will influence further research.

Let’s look at the summary in the press release:

For the first time, Temple University researchers have used machine learning to rank the most important determinants of future affluence. Education and occupation were the best predictors — but surprisingly, a person’s ability to delay instant gratification was also among the most important determinants of higher income, beating age, race, ethnicity and height. Published in Frontiers in Psychology, the study suggests that interventions to improve this “delay discounting” could have literal payoffs in terms of higher income attainment.

Many factors are related to how much money a person will earn, including age, occupation, education, gender, ethnicity and even height. Behavioral variables are also implicated, such as one relating to the famous “marshmallow test.” This study of delay discounting, or how much a person discounts the value of future rewards compared to immediate ones, showed children with greater self-control were more likely to have higher salaries later in life.

But the study’s lead author, Dr William Hampton, now at the University of St. Gallen in Switzerland, says more traditional ways of analyzing data have been unable to indicate which of these factors are more important than others.

“All sorts of things predict income. We knew that this behavioral variable, delay discounting, was also predictive — but we were really curious how it would stack up against more common-sense predictors like education and age. Using machine learning, our study was the first to create a validated rank ordering of age, occupation, education, geographic location, gender, race, ethnicity, height and delay discounting in income prediction.”

Traditional methods used by psychologists (such as correlations and regression) haven’t allowed for a simultaneous comparison of different factors relating to an individual’s affluence. This study collected a large amount of data — from more than 2,500 diverse participants — and split them into a training set and a test set. The test set was put aside while the training set produced model results. The researchers then went back to the test set to test the accuracy of their findings.
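An aside from me, not part of the press release: the holdout idea described above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions. The data are synthetic and made up, the helper names (`train_test_split`, `fit_line`, `mean_abs_error`) are hypothetical, and a simple one-variable least-squares fit stands in for the three machine learning algorithms the researchers actually used.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(pairs, slope, intercept):
    """Average absolute gap between predicted and observed y."""
    return sum(abs(y - (slope * x + intercept)) for x, y in pairs) / len(pairs)


# Synthetic, made-up data (NOT the study's data): a hypothetical
# "years of education" predictor and an "income in thousands" outcome.
rng = random.Random(42)
data = []
for _ in range(2500):
    x = rng.uniform(8, 20)
    data.append((x, 3.0 * x + rng.gauss(0, 5)))

# The test set is put aside; only the training set is used to fit the model.
train, test = train_test_split(data)
slope, intercept = fit_line(train)

# Only afterwards is the model checked against the untouched test set.
train_error = mean_abs_error(train, slope, intercept)
test_error = mean_abs_error(test, slope, intercept)
print(f"train error: {train_error:.2f}, test error: {test_error:.2f}")
```

If the error on the test set is close to the error on the training set, the model has probably captured something that generalizes rather than noise specific to one sample. A much larger test error would be a warning sign of overfitting.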

Unsurprisingly, the models indicated that occupation and education were the best predictors of high income, followed by location (as determined by zip code) and gender — with males earning more than females. Delay discounting was the next most-important factor, being more predictive than age, race, ethnicity or height.

Dr Hampton hopes the research approach will be part of a new era in data analysis. “This was amazing because it allowed us to check our findings and replicate them, giving us much greater confidence that they were accurate. This is particularly important given the recent wave of findings across science that do not seem to replicate. Using this machine learning approach could lead to more research that replicates — and we hope this spurs the use of more sophisticated analytic approaches in general.”

The study’s authors caution that the data sample was purposely limited to the United States and it is possible that the rank order of variables that predict salary may differ in other countries. Dr Hampton says he is looking forward to exploring this analytical approach in a broader context.

“I would love to see a replication of this study in another culture. I also would be very interested in future studies aiming to reduce delay discounting. There is much debate about whether delay discounting is a stable trait or whether it is malleable — longitudinal studies could help settle that.”

Finally, Dr Hampton has an interesting observation for parents: “If you want your child to grow up to earn a good salary, consider instilling in them the importance of passing on smaller, immediate rewards in favor of larger ones that they have to wait for. This is probably easier said than done, as very few people naturally enjoy waiting, but our results suggest that those who develop the ability to delay gratification are likely investing in their own earning potential.”

I am still not sure whether I’m a fan of this new approach, because it could be regarded as being pretty close to p-hacking. On the other hand, it can be regarded as an interesting new way to do exploratory research.

Abstract of the study:

Income is a primary determinant of social mobility, career progression, and personal happiness. It has been shown to vary with demographic variables like age and education, with more oblique variables such as height, and with behaviors such as delay discounting, i.e., the propensity to devalue future rewards. However, the relative contribution of each of these salary-linked variables to income is not known. Further, much of past research has often been underpowered, drawn from populations of convenience, and produced findings that have not always been replicated. Here we tested a large (n = 2,564), heterogeneous sample, and employed a novel analytic approach: using three machine learning algorithms to model the relationship between income and age, gender, height, race, zip code, education, occupation, and discounting. We found that delay discounting is more predictive of income than age, ethnicity, or height. We then used a holdout data set to test the robustness of our findings. We discuss the benefits of our methodological approach, as well as possible explanations and implications for the prominent relationship between delay discounting and income.

2 thoughts on “What predicts best how much you will earn? A very novel way to study this…”

Mark Jasicon

September 5, 2018 at 4:24 pm

Machine learning and cross validation are the opposite of p-value hacking.
