Measuring children’s emotional and behavioural problems: are SDQ parent reports from native and immigrant parents comparable?
Child and Adolescent Psychiatry and Mental Health volume 13, Article number: 46 (2019)
The number of immigrants worldwide is growing, and migration might be a risk factor for the mental health of children. A reliable instrument is needed to measure the mental health of immigrants' children. The aim of the study was to test the measurement invariance of the parent version of the Strengths and Difficulties Questionnaire (SDQ) between German native parents and immigrant parents of Turkish and Russian origin in Germany. The SDQ is one of the most frequently used screening instruments for mental health disorders in children.
Differential Item Functioning (DIF) was tested in samples matched by socio-economic status, age and gender of the child. A logistic regression/item response theory hybrid method and a multiple indicators, multiple causes (MIMIC) model were used to test for DIF. Multi Group Confirmatory Factor Analysis (MGCFA) was used to test for configural invariance. Parent reports of 10,610 German native, 534 Russian origin and 668 Turkish origin parents of children aged 3–17 years were analysed.
DIF items were found in both groups and with both methods. An adequate fit of the original five factor model of the SDQ was found for the Russian origin group but not for the Turkish origin group. An analysis of functional equivalence indicated that the SDQ is equally useful for the screening of mental health disorders in all three groups.
Using the SDQ in order to compare the parent reports of native and immigrant parents should be done cautiously. Thus, the use of the SDQ in epidemiological studies and for prevention planning is questionable. However, the SDQ turns out to be a valid instrument for screening purposes in parents of native and immigrant children.
The number of international immigrants is increasing rapidly worldwide: from 1990 to 2017 it rose by 69%. In 2017, Germany hosted the third largest number of immigrants in the world, and 16.1% of the German population had migrated from another country. Among children under five years of age, children of immigrants accounted for 39% in 2017. Monitoring the mental health of these children is a societal task, given that being an immigrant might be a risk factor for children’s mental health. To obtain high quality data and to assess the need for specific preventive interventions and treatment programs, a reliable instrument for measuring mental health problems is needed, one that measures the same underlying constructs and thus provides comparable scores for native children and children of immigrants.
For younger children, parent reports are generally used. Immigrant parents, however, might be rooted in the culture of their country of origin, which might affect the way they report about their children. This could lead to non-comparable parent reports between groups of different cultural origin. Differences in reporting could be due to specific response styles (tendencies to agree or disagree with items of a questionnaire) in different countries, the use of different reference groups when evaluating oneself, or differing societal norms, which are associated with different expectations of how a child should behave or when certain developmental steps should happen. Different degrees of social desirability of a behaviour could result in different probabilities that problematic behaviour of one’s own child is reported [7,8,9,10,11].
In Germany, the largest immigrant groups are from Turkey, Poland and Russia. In the current study, we focus on Turkish and Russian immigrants. The majority of the Russian immigrants are ethnic Germans who came to Germany after the collapse of the Soviet Union (as Spätaussiedler) and received German citizenship after arrival. Most people of Turkish origin living in Germany are labour immigrants (or their descendants and family members) who came during the economic boom in Germany between the 1950s and 1970s (as guest workers). Turkish citizens are the largest group of people without German citizenship living in Germany [2, 12].
Harzing found differences in response styles between people in Germany, Turkey and Russia: disacquiescence, the tendency to disagree with an item, was found more often in Russia than in Germany, and acquiescence, the tendency to agree with an item, was found more often in Turkey than in Germany. If these response styles still prevail among immigrants from these countries, scale values might be biased.
To date, some research on developmental expectations and parenting values has been conducted among Turkish immigrants in Germany, and less among Russian immigrants. Turkish immigrant parents in Germany expected their children to have close relations within the family, to support the family and to be obedient and well-mannered more often than German native parents, and they were less likely to value autonomy or self-control [13,14,15]. Parents from Russia expected their children to be obedient more often than German parents.
In the current study we investigate whether, despite the potential differences in parental response styles and societal norms mentioned above, a widely used instrument for the screening of mental health, the Strengths and Difficulties Questionnaire (SDQ) by Goodman, provides comparable scores when answered by German native parents and parents of Turkish or Russian origin. The SDQ was developed in the United Kingdom but is in use worldwide. Several studies have used the SDQ to compare the mental health of children of native and immigrant parents in Germany [18,19,20,21] and in other western countries [22,23,24]. Goodman proposed a five factor structure for his questionnaire (representing the subscales hyperactivity, peer problems, conduct problems, emotional problems and prosocial behaviour), with each subscale containing five items. The factor structure and the psychometric characteristics of the questionnaire have mostly been investigated separately for different countries (for reviews see e.g. [25,26,27,28]). Many of these studies confirm the five factor structure, whereas others support a three factor solution (internalizing problem behaviour, externalizing problem behaviour and prosocial behaviour, as either first order or second order factors) or other solutions. Studies examining the cross-cultural validity of the parent version of the SDQ draw inconsistent conclusions. While Stone et al. found satisfactory internal consistency, test–retest reliability and inter-rater agreement for the parent version of the SDQ for different countries in their review, Kersten et al. reported a lack of evidence for cross-cultural validity, and Stevanovic et al. concluded that there is only weak evidence for cross-cultural validity of the SDQ parent version.
Apart from the factor structure, further differences have been reported: people in different countries, or in different ethnic groups within one country, do not rate the same amount of reported behaviour as similarly problematic, they show different SDQ sum scores, and the correlations between SDQ scores and the results of diagnostic interviews for mental disorders vary between countries [34,35,36,37,38,39,40]. Concerning Turkey and Russia, two of the most relevant countries of origin of immigrants in Germany, there is only limited research on the validity of the SDQ parent version. Güvenir et al. reported a high internal consistency (except for the peer problems scale) and a good convergent and discriminative validity of the SDQ in Turkey but did not test the fit of the proposed five-factor structure. Stevanovic et al. could not confirm the five-factor structure for adolescents’ self-reports in Turkey. Husky et al. found that the SDQ score predicted mental health disorders equally well in Turkey and Germany, but also found low internal consistency for the peer problems subscale in the Turkish sample. In Russia, adolescents’ SDQ self-reports also showed inadequate psychometric characteristics. Goodman et al. investigated the comparability of the parent version of the SDQ in Britain, Russia and other countries and concluded that cross-national differences in SDQ indicators do not necessarily reflect comparable differences in disorder rates. In Russia, the SDQ total difficulties score led to an overestimation of disorder prevalence. A study investigating the factor structure of the SDQ parent version in Russia does not seem to exist so far.
Few studies have tested the comparability of SDQ results between ethnic groups within one country. Zwirs et al. compared the factor structure of the SDQ rated by Dutch and Surinamese teachers and found measurement invariance. Richter et al. explored self-reports of ethnic Norwegian and ethnic minority adolescents in Norway and found a good fit of the five-factor model in ethnic Norwegian adolescents and an acceptable fit in ethnic minority subsamples, but no measurement invariance between the samples. To our knowledge, only one study so far has investigated measurement invariance of the parent version of the SDQ in native and immigrant parents: Goodman et al. compared a British Indian with a native British sample and found strict invariance in the parent version when excluding the prosocial scale from the analysis.
In the current study we aim to test the measurement invariance, and therefore the comparability, of the SDQ parent version between native German parents and parents of Russian and Turkish origin. We were also interested in whether the SDQ has the same predictive value for mental health disorders in these three groups, thus testing the SDQ’s functional equivalence.
We used data from two waves of the German Health Interview and Examination Survey for Children and Adolescents (KiGGS), a nationwide survey in Germany, representative for children and adolescents, conducted by the Robert Koch Institute (RKI). For the analysis of measurement invariance, we used the data from the first survey wave, conducted from 2003 to 2006. To increase sample size, data from the second survey wave (2009–2012) were added for respondents who had not taken part in the first wave. Several steps were taken to ensure a representative sample of migrants in the first wave: migrants were oversampled; invitation and interview material was translated into six languages (including Turkish and Russian); non-responders were contacted by phone or visited to reduce worries and fears; and interviewers were culturally trained. In the second wave, these extra steps were not taken, resulting in a non-representative sample of migrants. For the analysis of functional equivalence, cross-sectional data (within the first study wave) and longitudinal data were used.
Children’s emotional and behavioural problems were assessed with the parent version of the Strengths and Difficulties Questionnaire, a short questionnaire measuring behavioural strengths and weaknesses of children or adolescents aged 4–17 years. Five subscales (hyperactivity, peer relationship problems, conduct problems, emotional problems and prosocial behaviour) are proposed, each consisting of five items. Each item can be answered with “not true” (0), “somewhat true” (1) or “certainly true” (2). While most items describe problematic behaviour and are therefore phrased negatively, some items are formulated positively.
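The scoring implied by this description can be sketched as follows. This is a minimal illustration only: the item-to-subscale key and the set of reverse-scored items follow the commonly published SDQ scoring key and are assumptions of this sketch, since neither is spelled out in the text, and the function name `score_sdq` is hypothetical.

```python
# Illustrative SDQ scoring sketch (not the authors' code).
# ASSUMPTION: item-to-subscale key and reverse-scored items follow the
# commonly published SDQ scoring key; they are not listed in the article.
SUBSCALES = {
    "emotional": [3, 8, 13, 16, 24],
    "conduct": [5, 7, 12, 18, 22],
    "hyperactivity": [2, 10, 15, 21, 25],
    "peer": [6, 11, 14, 19, 23],
    "prosocial": [1, 4, 9, 17, 20],
}
# Positively worded items inside the problem subscales are reverse-scored.
REVERSED = {7, 11, 14, 21, 25}


def score_sdq(responses):
    """Return subscale scores and the total difficulties score.

    responses: dict mapping item number (1-25) to 0 ("not true"),
    1 ("somewhat true") or 2 ("certainly true").
    """
    for item in range(1, 26):
        if responses.get(item) not in (0, 1, 2):
            raise ValueError(f"item {item} must be answered with 0, 1 or 2")

    scores = {}
    for name, items in SUBSCALES.items():
        scores[name] = sum(
            2 - responses[i] if i in REVERSED else responses[i] for i in items
        )
    # Total difficulties: sum of the four problem subscales (prosocial excluded).
    scores["total_difficulties"] = sum(
        scores[name] for name in SUBSCALES if name != "prosocial"
    )
    return scores
```

Because five positively worded items are reverse-scored, a respondent answering “not true” to every item does not obtain a total difficulties score of zero.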
Socioeconomic status (SES)
An overall SES measure was used, containing information about income, education and employment status. Children in the lowest SES score quintile are defined as “low SES”, in the second lowest to second highest quintile as “medium SES” and in the highest quintile as “high SES”. See  for a more detailed description.
The interview partner was allocated to the group of persons of Russian/Turkish origin if he or she was born in Russia/Turkey, held Russian/Turkish citizenship or reported speaking primarily Russian/Turkish at home. If mothers and fathers were interviewed together, they were allocated to the group only if both of them met at least one of these criteria. Two couples (N = 2) were excluded because they answered the interview together but only one partner was of Turkish/Russian origin.
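The allocation rule can be expressed as a short sketch. The field names (`birth_country`, `citizenships`, `home_language`) and the return labels are hypothetical and chosen only for illustration; the survey's actual variable coding is not described here.

```python
# Illustrative sketch of the group allocation rule (hypothetical field names).
def meets_origin_criteria(person, country, language):
    """True if the person meets at least one of the three origin criteria."""
    return (
        person["birth_country"] == country
        or country in person["citizenships"]
        or person["home_language"] == language
    )


def allocate_group(respondents, country, language):
    """Allocate an interview (one parent, or both parents together).

    Returns "origin" if every interviewed parent meets a criterion,
    "excluded" if only some of them do (mixed couple interviewed together),
    and "not_origin" otherwise.
    """
    flags = [meets_origin_criteria(p, country, language) for p in respondents]
    if all(flags):
        return "origin"
    if any(flags):
        return "excluded"
    return "not_origin"
```

For example, `allocate_group([mother, father], "Turkey", "Turkish")` would return `"excluded"` when only one of the two jointly interviewed parents meets a criterion.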
Functional equivalence measures
We used the sum score of the short form of the Patient Health Questionnaire, the PHQ-8, as an indicator of depression. Parents were asked whether the child had ever been diagnosed with Attention Deficit Hyperactivity Disorder (ADHD) and whether the child had ever been diagnosed with any mental health disorder. Additionally, they were asked whether the child had had contact with a psychiatrist, psychologist or psychotherapist in the last 12 months. Answers for diagnoses and contact were dichotomous (yes/no).
To examine differences in response behaviour due to cultural origin, we wanted to minimize the influence of other factors potentially causing bias. Therefore, for testing measurement invariance, we drew two subsamples from the German native parents group: one was matched in SES, child’s age and gender to the Russian origin group (matched sample 1), the other to the Turkish origin group (matched sample 2). This was done using the IBM Statistical Package for the Social Sciences (SPSS) version 25.0 for Windows.
Measurement invariance was examined by testing for Differential Item Functioning (DIF) in the subscales and the total difficulties scale and by checking for equivalence of the factor structure. DIF testing was performed with the lordif package in R, which uses a logistic regression/Item Response Theory (IRT) hybrid DIF detection method, with a change in McFadden’s pseudo R² greater than 0.02 as detection criterion. To check the stability of results, we also used the multiple indicators, multiple causes (MIMIC) confirmatory factor analysis method with scale purification as proposed by Wang, Shih and Yang within the lavaan package in R. The MIMIC approach tests for uniform DIF. As recommended for ordinal data with medium sample sizes, diagonally weighted least squares (DWLS) was used to estimate the model parameters. Robust test statistics are reported. To evaluate the size of DIF effects in the MIMIC framework, a MIMIC effect size (MIMIC-ES) as proposed by Jin et al. was calculated, with 0.3 indicating a small, 0.5 a medium and 0.7 a large effect. Additionally, Multi Group Confirmatory Factor Analysis (MGCFA) in lavaan was performed to examine equivalence of the factor structure with and without items flagged for DIF in the previous step. Model parameters in the MGCFA were also estimated using DWLS. In order to compare results with other studies using MGCFA to test for measurement invariance [e.g. 31, 33, 45], we additionally tested measurement invariance within this approach. We followed the process recommended by Hirschfeld and Von Brachel: first, establishing a configural model; second, testing for configural equivalence (the same loadings are significant across groups); third, testing for weak/metric equivalence (loadings are constrained to be equal); and fourth, testing for strong/scalar invariance (intercepts are constrained to be equal).
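The two decision rules named above can be illustrated with a small sketch. It assumes the usual three nested logistic models of the logistic regression DIF approach (trait only; trait plus group; trait plus group plus their interaction) and takes their log-likelihoods as given; it does not reproduce lordif's internal estimation or flagging logic. The function names and the treatment of 0.3, 0.5 and 0.7 as lower bounds of the effect size classes are assumptions of this sketch.

```python
# Illustrative sketch of the DIF decision rules (not the lordif/lavaan code).
def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo R-squared: 1 - LL(model) / LL(null)."""
    return 1.0 - loglik_model / loglik_null


def flag_dif(ll_null, ll_m1, ll_m2, ll_m3, threshold=0.02):
    """Compare nested models via changes in McFadden's pseudo R-squared.

    ll_m1: trait only; ll_m2: trait + group; ll_m3: trait + group + interaction.
    The step from model 1 to 2 reflects uniform DIF, the step from model 2
    to 3 non-uniform DIF.
    """
    r1, r2, r3 = (mcfadden_r2(ll, ll_null) for ll in (ll_m1, ll_m2, ll_m3))
    delta_12 = r2 - r1
    delta_23 = r3 - r2
    return {
        "delta_r2_12": delta_12,
        "delta_r2_23": delta_23,
        "uniform": delta_12 > threshold,
        "nonuniform": delta_23 > threshold,
    }


def mimic_es_label(effect_size):
    """Label a MIMIC-ES value (ASSUMPTION: cut-offs act as lower bounds)."""
    size = abs(effect_size)
    if size >= 0.7:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.3:
        return "small"
    return "negligible"
```

Calling `flag_dif` with hypothetical log-likelihoods such as `flag_dif(-100.0, -80.0, -75.0, -74.5)` shows how an item can be flagged for uniform DIF without being flagged for non-uniform DIF.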
We used χ², the Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA) to evaluate the model fit. A CFI > 0.90 was rated as acceptable and > 0.95 as good; an RMSEA < 0.06 was rated as good. To evaluate the meaningfulness of changes in model fit we used the change in the CFI (ΔCFI) because this index is proposed to be independent of overall model fit and sample size. A decrease in CFI of no more than 0.01 (ΔCFI ≥ −0.01) indicates that the null hypothesis of invariance should not be rejected. Cases with missing values were excluded listwise.
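These cut-offs can be written down as a small helper. This is a sketch under stated assumptions: the default RMSEA cut-off of 0.06 is the conventional value and is an assumption of this sketch, the ΔCFI rule is read as "the CFI may drop by at most 0.01 when constraints are added", and the function names and labels are hypothetical.

```python
# Illustrative sketch of the fit evaluation rules (hypothetical helper names).
def rate_fit(cfi, rmsea, rmsea_good=0.06):
    """Rate CFI and RMSEA against the cut-offs described in the text.

    ASSUMPTION: rmsea_good defaults to the conventional cut-off of 0.06.
    """
    if cfi > 0.95:
        cfi_label = "good"
    elif cfi > 0.90:
        cfi_label = "acceptable"
    else:
        cfi_label = "not acceptable"
    rmsea_label = "good" if rmsea < rmsea_good else "not good"
    return {"cfi": cfi_label, "rmsea": rmsea_label}


def invariance_retained(cfi_less_constrained, cfi_more_constrained, max_drop=0.01):
    """True if adding equality constraints lowers the CFI by at most max_drop."""
    return (cfi_less_constrained - cfi_more_constrained) <= max_drop
```

Applied to values reported in the Results, `rate_fit(0.91, 0.044)` would rate the CFI as acceptable and the RMSEA as good.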
We used linear and logistic regressions within SPSS to test the functional equivalence of the SDQ. The SDQ total difficulties score or SDQ subscales and the sample subgroup (categorical variable with the German native group as reference group) were used as predictors; mental health diagnoses, use of mental health services or depressive symptoms were the outcome variables. We tested for an interaction effect of group and SDQ scores, which would indicate a different predictive power of the SDQ scores between the groups. Cross-sectional and longitudinal data were used.
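For a dichotomous outcome, the model described here can be written out as follows. This is one plausible formalization rather than the authors' exact specification; the symbols $R_i$ and $T_i$ (dummy indicators for the Russian origin and Turkish origin groups, with the German native group as reference) and the coefficient names are assumptions introduced for illustration.

```latex
\operatorname{logit} P(Y_i = 1)
  = \beta_0
  + \beta_1\,\mathrm{SDQ}_i
  + \beta_2\,R_i
  + \beta_3\,T_i
  + \beta_4\,(\mathrm{SDQ}_i \times R_i)
  + \beta_5\,(\mathrm{SDQ}_i \times T_i)
```

Under this formulation, functional equivalence corresponds to $\beta_4 = \beta_5 = 0$: the SDQ score predicts the outcome equally well in all three groups. For the continuous depression score, the same linear predictor would be used in a linear regression.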
The full sample (N = 11,812) used in this study comprises answers from N = 10,610 native German interview partners (n = 10,560 first wave respondents and n = 50 second wave respondents), N = 534 Russian origin interview partners (n = 477 first wave respondents and n = 57 second wave respondents), and N = 668 Turkish origin interview partners (n = 620 first wave respondents and n = 48 second wave respondents). The three subsamples of German native, Russian origin and Turkish origin parents differed from one another in some aspects. Whereas mothers were the interview partners in most cases in the German native and the Russian origin group (88.5% and 83.5%), this was true for only 57.9% in the Turkish origin group. All native German interview partners were born in Germany, compared with only 1.7% in the Russian origin group and 19.5% in the Turkish origin group. German native children had a higher SES than children of Russian origin, and children of Turkish origin had the lowest SES. Children in the Turkish origin group were more often male (55.7%) and were slightly younger (M = 9.01) compared to the other two groups (Table 1). To avoid biasing effects due to age, gender and SES in the measurement invariance analyses, two subsamples were drawn from the large German native group: in each stratum (e.g. boys or high SES) a random sample was drawn with the same sample size as the corresponding stratum in the Turkish/Russian origin group. After matching, there were no longer significant differences in age, gender and SES between the German native and the Turkish/Russian origin groups, and the groups were of equal sample size (matched German native sample for the Russian origin group N = 550, for the Turkish origin group N = 670).
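The stratified draw described above can be sketched in a few lines. The matching in the study was carried out in SPSS; the Python version below is only an illustration, and it assumes that strata are defined jointly by all matching variables, which the text does not state explicitly. The function name, the row format (dicts) and the fixed seed are hypothetical.

```python
# Illustrative sketch of stratified matching (not the authors' SPSS procedure).
import random
from collections import defaultdict


def draw_matched_sample(reference_pool, target_group, strata_keys, seed=0):
    """Draw, per stratum, as many reference rows as the target group contains.

    ASSUMPTION: a stratum is the joint combination of all strata_keys,
    e.g. (gender, SES category, age band).
    """
    rng = random.Random(seed)

    # Group the large reference pool by stratum.
    pool_by_stratum = defaultdict(list)
    for row in reference_pool:
        pool_by_stratum[tuple(row[k] for k in strata_keys)].append(row)

    # Count how many rows each stratum holds in the target group.
    needed = defaultdict(int)
    for row in target_group:
        needed[tuple(row[k] for k in strata_keys)] += 1

    matched = []
    for stratum, n in needed.items():
        candidates = pool_by_stratum.get(stratum, [])
        # Sample without replacement; take all candidates if too few exist.
        matched.extend(rng.sample(candidates, min(n, len(candidates))))
    return matched
```

If a stratum of the reference pool contains fewer rows than required, this sketch simply takes all of them, so the matched sample can end up slightly smaller than the target group.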
The SDQ response behaviour of the groups is displayed in Additional file 1.
Differential item functioning
German native/Russian origin group
When comparing the functioning of the items in the originally proposed 5-factor model with the logistic regression/IRT hybrid method (lordif), only Item 22 ‘Steals from home, school or elsewhere’ in the conduct problems scale was flagged for DIF ($\Delta R^2_{1,2} = 0.0733$ and $\Delta R^2_{2,3} = 0.0868$). When testing the total difficulties scale, four items were flagged: Item 22 ‘Steals from home, school or elsewhere’, Item 11 ‘Has at least one good friend’, Item 14 ‘Generally liked by other children’ and Item 23 ‘Gets on better with adults than with other children’ (Items 11, 14 and 23 are from the peer problems subscale). Results are shown in Table 2 and Fig. 1. All the flagged items show uniform DIF; Item 22 also shows non-uniform DIF. For this item, the three answer categories were collapsed into two categories. Item thresholds and the individual-level DIF impact figure indicate that accounting for DIF leads to lower total difficulties scores in Russian origin children and higher scores in German native children (Fig. 1).
The MIMIC approach flagged several items for DIF (Table 3). In the conduct problems scale, all items were flagged for DIF; therefore, a combined externalizing problems scale (conduct problems and hyperactivity) was tested. Taking the MIMIC-ES into account, items 15 (‘Easily distracted, concentration wanders’), 7 (‘Generally obedient, usually does what adults request’), 18 (‘Often lies or cheats’), 6 (‘Rather solitary, tends to play alone’), 19 (‘Picked on or bullied by other children’) and 23 (‘Gets on better with adults than with other children’) show small DIF effects, item 5 (‘Often has temper tantrums or hot tempers’) shows a medium and item 22 (‘Steals from home, school or elsewhere’) shows a large DIF effect. Thus, only items 22 and 23 show DIF within both analytic strategies.
German native/Turkish origin group
Using the logistic regression/IRT hybrid method, item 22 from the conduct problems scale was flagged for DIF. Within the peer problems scale, 4 of 5 items were flagged for DIF. When testing the total difficulties scale, items 22 (conduct problems), 11 and 23 (peer problems) were flagged for DIF (see Fig. 2 and Table 4). All items showed uniform DIF. Thresholds and the individual-level DIF impact figure indicate that, at lower levels of the trait, a purified scale without DIF items leads to a lower total difficulties score in Turkish origin children and a higher score in German native children. This effect seems to be less strong at higher levels of the trait.
When considering only DIF with an effect size above 0.3 (small effect), the MIMIC method also detects items 11 (medium effect) and 23 (large effect; Table 3).
Testing the configural model
In light of existing literature questioning the validity of the five factor solution and the results described above, which indicate validity problems (in particular regarding the peer problems scale), the fit of six different models was tested separately for the three subgroups: (1) a five factor model as proposed by Goodman (hyperactivity, peer problems, conduct problems, emotional problems and prosocial behaviour), (2) a model with two additional higher order factors: internalizing behaviour (containing the subscales emotional problems and peer problems) and externalizing behaviour (containing the subscales hyperactivity and conduct problems), (3) a three factor model (internalizing behaviour, externalizing behaviour and prosocial behaviour), (4) a bifactor model with a general problem behaviour factor and the five factors proposed by Goodman, (5) a five factor model with an additional higher order general problem behaviour factor (containing the subscales hyperactivity, peer problems, conduct problems and emotional problems) and (6) a two factor model (general problem behaviour and prosocial behaviour). Because of the problems with the peer problems subscale, we additionally tested (7) a model with a combined internalizing scale and the original three other scales.
The models were tested with and without the items detected for DIF by both methods in the previous analyses. Table 5 (with DIF items) and Table 6 (without DIF items) show the fits of the models tested for each subgroup. The bifactor model (model 4) did not converge in any analysis. Only the original five factor model proposed by Goodman reached an acceptable fit, and only in the German native group; no model reached an acceptable fit in the other groups. Although the model fits were better in the Russian origin subgroup (CFI M = 0.78) than in the Turkish origin subgroup (CFI M = 0.72), they were not acceptable in either.
Deleting the DIF items did not improve most of the model fits for the Russian origin group. The original five factor model fit the Russian origin data best (CFI = 0.79 without DIF items).
When allowing residual correlation within subscales and between positively worded items, the original five factor model showed an acceptable model fit in the Russian origin group (χ²(210) = 402.121, CFI = 0.91, RMSEA(CI) = 0.044 (0.038–0.051), SRMR = 0.076) and in the German native group (matched sample; χ²(210) = 432.913, CFI = 0.94, RMSEA(CI) = 0.044 (0.039–0.051), SRMR = 0.072).
Configural invariance was reached between the Russian origin and the German native group, but not weak invariance (Table 7). Thus, strong invariance was not tested.
When the items flagged for DIF in the previous analysis were deleted for each subgroup, most of the model fits improved for the Turkish origin group, while the first, second and fifth models were no longer identified. The seventh model without the DIF items reached the best fit (CFI = 0.77) in the Turkish origin group, but did not reach an acceptable fit even after allowing residual correlation within subscales and between positively worded items.
One reason for the insufficient fit might be the wording of the items. Since positively worded items tend to cluster together, some studies included a positive construal factor to deal with the impact of wording [4, 60, 61]. However, including a common method factor might be problematic because it is impossible to estimate the exact effect of the common method variance without directly measuring the common source variable, possibly leading to a bias in the loadings of the other factors. Because most practitioners use only the subscales that describe problem behaviour, and not the prosocial behaviour subscale, to screen for mental health problems anyway, we decided to test a configural model without the prosocial subscale items.
When allowing residual correlation within subscales and between positively worded items and omitting the prosocial behaviour scale, an acceptable model fit was reached in the Turkish origin group (χ²(122) = 302.201, CFI = 0.92, RMSEA(CI) = 0.051 (0.043–0.056), SRMR = 0.067). The same model also showed an acceptable to good fit in the German native group (matched sample; χ²(122) = 261.949, CFI = 0.957, RMSEA(CI) = 0.047 (0.039–0.054), SRMR = 0.082). Testing invariance within the MGCFA framework revealed configural, metric and scalar invariance between the groups (Table 8).
We compared the total difficulties scores before and after exclusion of the DIF items. In both analyses, problem behaviour was rated higher for children in the Turkish origin group and the Russian origin group than in the German native group, but the score difference was smaller after excluding the DIF items (Turkish origin/German native comparison: original score ΔM = 1.85, new score ΔM = 1.04; Russian origin/German native comparison: original score ΔM = 1.16, new score ΔM = 0.90).
We tested the predictive power of the SDQ total difficulties score within the first survey wave and the predictive power of the SDQ total difficulties score, hyperactivity subscale and emotional problems subscale in a longitudinal design using logistic and linear regression analysis with the German native group as reference group. The SDQ total difficulties scale and the emotional and hyperactivity subscales predicted mental health problems. However, we did not find interaction effects between the SDQ scores and the group of origin (German, Russian, Turkish). Results are displayed in Table 9.
People from different cultural backgrounds may differ in the way they answer a questionnaire due to different response styles, reference groups or societal norms [5,6,7], and measures might thus be biased. Comparing measures across cultures requires cross-cultural comparability or, in methodological terms, measurement invariance, which needs to be tested beforehand. In the current study we examined the measurement invariance of the SDQ, a questionnaire measuring behavioural problems and strengths of children, for native German parents and parents of Russian and Turkish origin in Germany. To our knowledge, the current study is only the second to test measurement invariance in the parent report version of the SDQ between native parents and immigrant parents, the first to do so with parents of Russian or Turkish origin and the first in Germany. Items were flagged for DIF in both the Russian origin/German native and the Turkish origin/German native comparisons. Whereas in the German native/Turkish origin analysis the logistic regression/IRT hybrid method and the MIMIC method flagged similar items for DIF, in the Russian origin/German native sample many more items were detected in the MIMIC framework. Moreover, when the MGCFA framework was applied to the items not flagged for DIF in the Russian origin/German native comparison, only configural invariance was reached. One reason for the unstable results could be an insufficient sample size in the Russian origin/German native comparison. Differing properties of the analyses might be another: MIMIC analyses for DIF detection were found to work better in scales with a high percentage of DIF items and with smaller sample sizes, but also seem to be prone to false positives. Moreover, finding only configural invariance might be a result of deleting items only if they were flagged for DIF in both preliminary analyses (MIMIC approach and logistic regression/IRT hybrid method). Thus, DIF items remaining in the questionnaire may have limited the result to configural invariance.
We replicated the five factor structure of the SDQ as proposed by Goodman for the Russian origin but not for the Turkish origin parents group. However, using a three factor structure (without the prosocial behaviour scale and with the peer problems and emotional problems scales combined into an internalizing problems scale), configural invariance (and also metric and scalar invariance) was found for the German native/Turkish origin comparison. Thus, given the original five factor structure of the SDQ, it is uncertain, at least for the Turkish origin parents, whether the same underlying construct is measured as in the German native parents.
The five factor structure of the SDQ has already been questioned by other studies: Mellor and Stokes evaluated the five factor structure as inadequate, and several studies found a better fit for a three factor solution [29, 67]. A higher order factor model or a bifactor model (as proposed in [46, 68, 69]) did not reach an acceptable fit in our analyses. Some studies suspected the prosocial subscale to be problematic. This might be a result of the combination of the positively worded prosocial subscale with positively worded (reversed) items in the problem subscales, because positively worded items tend to cluster together. Essau et al. chose another solution and removed the reversed items, after which they found an improved fit. We also found acceptable model fits in the immigrant groups only after allowing positively worded item residuals to correlate.
Whereas research on child rearing values among Russian immigrants in Germany is very scarce, some studies have compared German native with Turkish origin parents. Parents of Turkish origin in Germany were more likely than German native parents to expect close family relations, mutual support in the family, obedience and being well-mannered, and they were less likely to value autonomy or self-control in their children [13,14,15]. First and second generation mothers had quite similar socialization goals; second generation mothers still highly valued their traditional Turkish socialization patterns. Unfortunately, we do not have the data necessary to investigate the underlying reasons for the DIF and the missing equivalence of the factor structure in our study. However, because we matched the samples according to SES, age and gender of the child, none of these factors appears to be the reason for the lack of invariance when using the whole set of items. Hypotheses to be tested in future research could be that the item ‘Gets on better with adults than with other children’ from the original peer problems subscale, which was flagged for DIF, is understood as a part of family closeness or obedience and thus does not belong to a peer problem construct in Turkish origin and Russian origin parents, or that the item ‘Steals from home, school or elsewhere’ is biased by social desirability less strongly in the Russian and Turkish origin subgroups than in the German native group. The peer problems subscale, to which two of the three items flagged for DIF belong, was also found to have a low internal consistency in other studies; Husky et al. recommend excluding the scale when one wants to predict internalizing mental health disorders.
Despite the need for caution when comparing SDQ results, our study supports the usability of the SDQ as a screening tool in groups of different cultural origin. We did not find a difference in the predictive power of SDQ scores between the groups (concerning depressive symptoms, ADHD and mental disorders in general).
With regard to limitations of our study, first of all, the sample size may have been too small to detect all DIF items or to obtain stable results in the Russian origin sample. We could not cross-validate the results with data from the second available survey wave, because the immigrant sample was too small for a separate analysis. Instead, we added respondents from this wave to the sample of the first wave to increase power. The lack of representativeness of the second sample might have affected our longitudinal functional equivalence analysis. Additionally, we do not have objective data to evaluate the actual behavioural problems of the children; the reports of depressive symptoms and of an ADHD diagnosis are also possibly biased, the former by response styles and the latter, for example, by differences in health care utilization. Accordingly, other measures, such as observational data or vignettes, might give more insight into the equivalence of the SDQ results. It would also be interesting to test measurement invariance between immigrant groups and the populations in the countries of origin.
However, our study also has strong implications. It is not clear whether differences in the level of behavioural problems between immigrant and native German children (e.g. in the studies [18,19,20,21]) are actual differences or consequences of a lack of measurement invariance. Our results are in line with those of other studies that found a lack of measurement invariance in SDQ self-report data of adolescents of different cultural origins (e.g. [42, 45]). It is worth mentioning that we did not use very strict criteria when testing DIF and model fit: we reported MIMIC-ES instead of just significant effects and used two approaches to validate the results. In the analysis of model fit, we allowed residual correlations and accepted CFI values of 0.90 instead of 0.95.
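To make the relaxed fit criterion concrete, the following minimal sketch computes the CFI from the chi-square statistics of a fitted model and of the baseline (independence) model, using the standard textbook definition, and classifies the result against the 0.90 and 0.95 cutoffs mentioned above. The function names and the example chi-square values are hypothetical illustrations; they are not results or code from this study.

```python
def comparative_fit_index(chi2_model, df_model, chi2_baseline, df_baseline):
    """Return the CFI from model and baseline chi-square statistics.

    Uses the standard definition
        CFI = 1 - max(chi2_M - df_M, 0) / max(chi2_M - df_M, chi2_B - df_B, 0).
    """
    excess_model = max(chi2_model - df_model, 0.0)
    excess_baseline = max(chi2_baseline - df_baseline, 0.0)
    denominator = max(excess_model, excess_baseline)
    if denominator == 0.0:
        # Neither model shows misfit beyond its degrees of freedom.
        return 1.0
    return 1.0 - excess_model / denominator


def fit_label(cfi, lenient_cutoff=0.90, strict_cutoff=0.95):
    """Classify a CFI value against the lenient and strict cutoffs."""
    if cfi >= strict_cutoff:
        return "meets strict cutoff"
    if cfi >= lenient_cutoff:
        return "meets lenient cutoff only"
    return "below both cutoffs"


# Hypothetical chi-square values, not taken from the study.
example_cfi = comparative_fit_index(
    chi2_model=600.0, df_model=265, chi2_baseline=4000.0, df_baseline=300
)
example_label = fit_label(example_cfi)
print(f"CFI = {example_cfi:.3f} ({example_label})")
```

With these made-up inputs the model would pass only under the lenient criterion, which is the kind of case the relaxed cutoff is meant to admit.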
For both immigrant groups, the comparison with the German native group revealed smaller differences in the total difficulties scale after exclusion of DIF items. Thus, it is possible that the use of the original questionnaire leads to an overestimation of differences between native and immigrant groups. This is relevant when the SDQ is used to examine whether immigrant children are at special risk for mental illness, e.g. for prevention planning. We only tested equivalence in two immigrant groups, but it is quite possible that the issue also affects measurement in immigrants from other countries of origin. The limited amount of research in African countries [72, 73] and the research conducted with refugee children also suggest caution when using the SDQ.
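The comparison described above can be sketched as follows: compute a standardized mean difference between two groups on the sum score, once with all items and once after dropping the items flagged for DIF. This is only an illustration under stated assumptions: the toy responses, the flagged item index, the function names, and the choice of Cohen's d as the effect size are all hypothetical and are not the study's data or procedure.

```python
from math import sqrt
from statistics import mean, variance


def sum_scores(responses, excluded_items=()):
    """Sum each respondent's item scores, skipping excluded item indices."""
    skip = set(excluded_items)
    return [
        sum(score for index, score in enumerate(row) if index not in skip)
        for row in responses
    ]


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Toy data (rows = respondents, columns = items scored 0-2).
# Constructed so that the groups differ only on item index 3,
# which plays the role of a hypothetical DIF item.
native = [[0, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [2, 1, 1, 1]]
immigrant = [[0, 1, 0, 1], [1, 1, 0, 2], [1, 0, 1, 1], [2, 1, 1, 2]]
flagged_items = [3]

d_full = cohens_d(sum_scores(immigrant), sum_scores(native))
d_purified = cohens_d(
    sum_scores(immigrant, flagged_items), sum_scores(native, flagged_items)
)
print(f"d with all items: {d_full:.2f}; d without flagged items: {d_purified:.2f}")
```

In this deliberately extreme toy case the whole group difference comes from the flagged item, so it disappears once that item is dropped; with real data the difference would typically shrink rather than vanish.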
In summary, our results indicate that one has to be cautious when using the SDQ to compare behavioural problems in groups of different cultural origins. It is not advisable to directly compare the scores of the original scales. Measurement invariance should always be tested before drawing conclusions. If there is a lack of invariance, adapted scales or latent models should be used. However, the SDQ still seems to be a valuable instrument for screening for mental disorders in native children as well as in children of immigrants.
Availability of data and materials
The data that support the findings of this study are available from the RKI, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Data are, however, available from the RKI upon reasonable request.
Abbreviations
SDQ: Strengths and Difficulties Questionnaire
KiGGS: German Health Interview and Examination Survey for Children and Adolescents
RKI: Robert Koch Institute
ADHD: Attention Deficit Hyperactivity Disorder
DIF: Differential Item Functioning
IRT: Item response theory
MIMIC-ES: The multiple indicators, multiple causes model effect size
CFI: Comparative Fit Index
RMSEA: Root Mean Square Error of Approximation
United Nations, Department of Economic and Social Affairs, Population Division. International Migration Report 2017. Highlights. 2017. http://www.un.org/en/development/desa/population/migration/publications/migrationreport/docs/MigrationReport2017_Highlights.pdf. Accessed 11 Feb 2019.
Statistisches Bundesamt. Bevölkerung und Erwerbstätigkeit: Bevölkerung mit Migrationshintergrund–Ergebnisse des Mikrozensus 2017. Fachserie 1. 2018. https://www.destatis.de/DE/Publikationen/Thematisch/Bevoelkerung/MigrationIntegration/Migrationshintergrund2010220177004.pdf;jsessionid=C233EA949D1EFB30CBE3E090668B5D47.InternetLive2?__blob=publicationFile. Accessed 11 Feb 2019.
Belhadj Kouider E, Koglin U, Petermann F. Emotional and behavioral problems in migrant children and adolescents in Europe: a systematic review. Eur Child Adolesc Psychiatry. 2014;23:373–91.
D’Souza S, Waldie KE, Peterson ER, Underwood L, Morton SMB. Psychometric properties and normative data for the preschool strengths and difficulties questionnaire in two-year-old children. J Abnorm Child Psychol. 2017;45:345–57.
Harzing A-W. Response styles in cross-national survey research: a 26-country study. Int J Cross Cult Manag. 2006;6:243–66.
Heine SJ, Lehman DR, Peng K, Greenholtz J. What’s wrong with cross-cultural comparisons of subjective Likert scales?: the reference-group effect. J Pers Soc Psychol. 2002;82:903–18.
Bornstein MH. Parenting and child mental health: a cross-cultural perspective. World Psychiatry. 2013;12:258–65.
Hackett L, Hackett R. Parental ideas of normal and deviant child behaviour. A comparison of two ethnic groups. Br J Psychiatry J Ment Sci. 1993;162:353–7.
Junger M. Discrepancies between police and self-report data for Dutch racial minorities. Br J Criminol. 1989;29:273–84.
Otyakmaz BO. Erziehungsverhalten und Entwicklungserwartungen von Müttern. In: Frühe Kindheit in der Migrationsgesellschaft. Wiesbaden: Springer; 2015. p. 67–81.
Pachter LM, Dworkin PH. Maternal expectations about normal child development in 4 cultural groups. Arch Pediatr Adolesc Med. 1997;151:1144–50.
Bundesamt für Migration und Flüchtlinge. Migrationsbericht der Bundesregierung 2016/2017. Berlin; 2019. http://www.bamf.de/SharedDocs/Anlagen/DE/Publikationen/Migrationsberichte/migrationsbericht-2016-2017.pdf?__blob=publicationFile. Accessed 5 June 2019.
Citlak B, Leyendecker B, Schölmerich A, Driessen R, Harwood RL. Socialization goals among first- and second-generation migrant Turkish and German mothers. Int J Behav Dev. 2008;32:56–65.
Döge P. Sozialisationsziele von Müttern und Vätern mit türkischem, russischem und ohne Migrationshintergrund. In: Frühe Kindheit in der Migrationsgesellschaft. Wiesbaden: Springer; 2015. p. 49–66.
Durgel ES, Leyendecker B, Yagmurlu B, Harwood R. Sociocultural influences on German and Turkish immigrant mothers’ long-term socialization goals. J Cross-Cult Psychol. 2009;40:834–52.
Goodman R. The Strengths and Difficulties Questionnaire: a research note. J Child Psychol Psychiatry. 1997;38:581–6.
Achenbach TM, Becker A, Döpfner M, Heiervang E, Roessner V, Steinhausen H-C, et al. Multicultural assessment of child and adolescent psychopathology with ASEBA and SDQ instruments: research findings, applications, and future directions. J Child Psychol Psychiatry. 2008;49:251–75.
Hölling H, Erhart M, Ravens-Sieberer U, Schlack R. Verhaltensauffälligkeiten bei Kindern und Jugendlichen. Bundesgesundheitsblatt-Gesundheitsforschung-Gesundheitsschutz. 2007;50:784–93.
Jäkel J, Leyendecker B, Agache A. Family and individual factors associated with Turkish immigrant and German children’s and adolescents’ mental health. J Child Fam Stud. 2015;24:1097–105.
Kuschel A, Heinrichs N, Bertram H, Naumann S, Hahlweg K. Psychische Auffälligkeiten bei Kindergartenkindern aus der Sicht der Eltern und Erzieherinnen in Abhängigkeit von soziodemografischen Merkmalen. Kindh Entwickl. 2008;17(3):161–72.
Schreyer I, Petermann U. Verhaltensauffälligkeiten und Lebensqualität bei Kindern im Vorschulalter und deren Mütter. Z Für Gesundheitspsychologie. 2010;18:119–29.
Goodman A, Patel V, Leon DA. Child mental health differences amongst ethnic groups in Britain: a systematic review. BMC Public Health. 2008;8:258.
Sagatun A, Lien L, Søgaard AJ, Bjertness E, Heyerdahl S. Ethnic Norwegian and ethnic minority adolescents in Oslo, Norway. A longitudinal study comparing changes in mental health. Soc Psychiatry Psychiatr Epidemiol. 2008;43:87–95.
Washbrook E, Waldfogel J, Bradbury B, Corak M, Ghanghro AA. The development of young children of immigrants in Australia, Canada, the United Kingdom and the United States. Child Dev. 2012;83:1591–607.
Kersten P, Czuba K, McPherson K, Dudley M, Elder H, Tauroa R, et al. A systematic review of evidence for the psychometric properties of the Strengths and Difficulties Questionnaire. Int J Behav Dev. 2016;40:64–75.
Marzocchi GM, Capron C, Pietro MD, Tauleria ED, Duyme M, Frigerio A, et al. The use of the Strengths and Difficulties Questionnaire (SDQ) in Southern European countries. Eur Child Adolesc Psychiatry. 2004;13:40–6.
Vostanis P. Strengths and Difficulties Questionnaire: research and clinical applications. Curr Opin Psychiatry. 2006;19:367.
Woerner W, Fleitlich-Bilyk B, Martinussen R, Fletcher J, Cucchiaro G, Dalgalarrondo P, et al. The Strengths and Difficulties Questionnaire overseas: evaluations and applications of the SDQ beyond Europe. Eur Child Adolesc Psychiatry. 2004;13:47–54.
Cefai C, Camilleri L, Cooper P, Said L. The structure and use of the teacher and parent Maltese Strengths and Difficulties Questionnaire. Int J Emot Educ. 2011;3:4–19.
Niclasen J, Skovgaard AM, Andersen AM, Sømhovd MJ, Obel C. A confirmatory approach to examining the factor structure of the Strengths and Difficulties Questionnaire (SDQ): a large scale cohort study. J Abnorm Child Psychol. 2013;41:355–65.
Goodman A. Why do British Indian children have an apparent mental health advantage? [doctoral]. London School of Hygiene & Tropical Medicine; 2009.
Stone LL, Otten R, Engels RCME, Vermulst AA, Janssens JMAM. Psychometric properties of the parent and teacher versions of the Strengths and Difficulties Questionnaire for 4- to 12-year-olds: a review. Clin Child Fam Psychol Rev. 2010;13:254–74.
Stevanovic D, Jafari P, Knez R, Franic T, Atilola O, Davidovic N, et al. Can we really use available scales for child and adolescent psychopathology across cultures? A systematic review of cross-cultural measurement invariance data. Transcult Psychiatry. 2017;54:125–52.
Becker A, Steinhausen H-C, Baldursson G, Dalsgaard S, Lorenzo MJ, Ralston SJ, et al. Psychopathological screening of children with ADHD: Strengths and Difficulties Questionnaire in a pan-European study. Eur Child Adolesc Psychiatry. 2006;15(Suppl 1):I56–62.
Bevaart F, Mieloo CL, Jansen W, Raat H, Donker MCH, Verhulst FC, et al. Ethnic differences in problem perception and perceived need for care for young children with problem behaviour. J Child Psychol Psychiatry. 2012;53:1063–71.
Bevaart F, Mieloo CL, Donker MCH, Jansen W, Raat H, Verhulst FC, et al. Ethnic differences in problem perception and perceived need as determinants of referral in young children with problem behaviour. Eur Child Adolesc Psychiatry. 2014;23:273–81.
Goodman A, Heiervang E, Fleitlich-Bilyk B, Alyahri A, Patel V, Mullick MSI, et al. Cross-national differences in questionnaires do not necessarily reflect comparable differences in disorder prevalence. Soc Psychiatry Psychiatr Epidemiol. 2012;47:1321–31.
Heiervang E, Goodman A, Goodman R. The Nordic advantage in child mental health: separating health differences from reporting style in a cross-cultural comparison of psychopathology. J Child Psychol Psychiatry. 2008;49:678–85.
Leijten P, Raaijmakers MA, Orobio de Castro B, Matthys W. Ethnic differences in problem perception: immigrant mothers in a parenting intervention to reduce disruptive child behavior. Am J Orthopsychiatry. 2016;86:323–31.
Zwirs B, Burger H, Schulpen T, Vermulst AA, HiraSing RA, Buitelaar J. Teacher ratings of children’s behavior problems and functional impairment across gender and ethnicity: construct equivalence of the Strengths and Difficulties Questionnaire. J Cross-Cult Psychol. 2011;42:466–81.
Güvenir T, Özbek A, Baykara B, Arkar H, Şentürk B, İncekaş S. Psychometric properties of the Turkish version of the Strengths and Difficulties Questionnaire (SDQ). Turk J Child Adolesc Ment Health. 2008;15:65–74.
Stevanovic D, Urbán R, Atilola O, Vostanis P, Balhara YPS, Avicenna M, et al. Does the Strengths and Difficulties Questionnaire—self report yield invariant measurements across different nations? Data from the International Child Mental Health Study Group. Epidemiol Psychiatr Sci. 2015;24:323–34.
Husky MM, Otten R, Boyd A, Pez O, Bitfoi A, Carta MG, et al. Psychometric properties of the Strengths and Difficulties Questionnaire in children aged 5–12 years across seven European Countries. Eur J Psychol Assess. 2018. https://doi.org/10.1027/1015-5759/a000489.
Ruchkin V, Koposov R, Schwab-Stone M. The strength and difficulties questionnaire: scale validation with Russian adolescents. J Clin Psychol. 2007;63:861–9.
Richter J, Sagatun A, Heyerdahl S, Oppedal B, Roysamb E. The Strengths and Difficulties Questionnaire (SDQ)—Self-Report. An analysis of its structure in a multiethnic urban adolescent sample. J Child Psychol Psychiatry. 2011;52:1002–11.
Goodman A, Patel V, Leon DA. Why do British Indian children have an apparent mental health advantage? J Child Psychol Psychiatry. 2010;51:1171–83.
Hölling H, Kamtsiuris P, Lange M, Thierfelder W, Thamm M, Schlack R. Der Kinder- und Jugendgesundheitssurvey (KiGGS): Studienmanagement und Durchführung der Feldarbeit. Bundesgesundheitsbl. 2007;50:557–66.
Lange M, Butschalowsky H, Jentsch F, Kuhnert R, Rosario AS, Schlaud M, et al. Die erste KiGGS-Folgebefragung (KiGGS Welle 1). Bundesgesundheitsbl. 2014;57:747–61.
Schenk L, Ellert U, Neuhauser H. Kinder und Jugendliche mit Migrationshintergrund in Deutschland. Bundesgesundheitsbl. 2007;50:590–9.
Lampert T, Müters S, Stolzenberg H, Kroll LE. Messung des sozioökonomischen Status in der KiGGS-Studie. Bundesgesundheitsbl. 2014;57:762–70.
Löwe B, Spitzer RL, Zipfel S, Herzog W. PHQ-D: Gesundheitsfragebogen für Patienten. Manual (Komplettversion und Kurzform): Autorisierte deutsche Version des »Prime MD Patient Health Questionnaire (PHQ)«. 2nd ed. Karlsruhe: Pfizer; 2002.
Choi SW, Gibbons LE, Crane PK. lordif: an R package for detecting differential item functioning using iterative hybrid ordinal logistic regression/item response theory and Monte Carlo simulations. J Stat Softw. 2011;39:1–30.
Wang W-C, Shih C-L, Yang C-C. The MIMIC method with scale purification for detecting differential item functioning. Educ Psychol Measur. 2009;69:713–31.
Rosseel Y. lavaan: an R package for structural equation modeling. J Stat Softw. 2012;48:1–36.
Brown TA. Confirmatory factor analysis for applied research. New York: Guilford Press; 2006.
Jin Y, Myers ND, Ahn S, Penfield RD. A comparison of uniform DIF effect size estimators under the MIMIC and Rasch models. Educ Psychol Measur. 2013;73:339–58.
Hirschfeld G, Von Brachel R. Multiple-group confirmatory factor analysis in R-A tutorial in measurement invariance with continuous and ordinal indicators. Pract Assess Res Eval. 2014;19:1–12.
Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Model Multidiscip J. 1999;6:1–55.
Cheung GW, Rensvold RB. Evaluating goodness-of-fit indexes for testing measurement invariance. Struct Equ Model Multidiscip J. 2002;9:233–55.
van de Looij-Jansen PM, Goedhart AW, de Wilde EJ, Treffers PDA. Confirmatory factor analysis and factorial invariance analysis of the adolescent self-report Strengths and Difficulties Questionnaire: how important are method effects and minor factors? Br J Clin Psychol. 2011;50:127–44.
Van Roy B, Veenstra M, Clench-Aas J. Construct validity of the five-factor Strengths and Difficulties Questionnaire (SDQ) in pre-early and late adolescence. J Child Psychol Psychiatry. 2008;49:1304–12.
Antonakis J, Bendahan S, Jacquart P, Lalive R. On making causal claims: a review and recommendations. Leadersh Q. 2010;21:1086–120.
Berry JW, Poortinga YH, Segall MH, Dasen PR. Cross-cultural psychology: research and applications. 2nd ed. New York: Cambridge University Press; 2002.
Woods CM. Evaluation of MIMIC-Model methods for DIF testing with comparison to two-group analysis. Multivar Behav Res. 2009;44:1–27.
Wang W-C, Shih C-L. MIMIC methods for assessing differential item functioning in polytomous items. Appl Psychol Meas. 2010;34:166–80.
Mellor D, Stokes M. The factor structure of the Strengths and Difficulties Questionnaire. Eur J Psychol Assess. 2007;23:105–12.
Di Riso D, Salcuni S, Chessa D, Raudino A, Lis A, Altoè G. The Strengths and Difficulties Questionnaire (SDQ). Early evidence of its reliability and validity in a community sample of Italian children. Personal Individ Differ. 2010;49:570–5.
Kóbor A, Takács Á, Urbán R. The bifactor model of the Strengths and Difficulties Questionnaire. Eur J Psychol Assess. 2013;29:299–307.
Goodman A, Lamping DL, Ploubidis GB. When to use broader internalising and externalising subscales instead of the hypothesised five subscales on the Strengths and Difficulties Questionnaire (SDQ): data from British parents, teachers and children. J Abnorm Child Psychol. 2010;38:1179–91.
Essau CA, Olaya B, Anastassiou-Hadjicharalambous X, Pauli G, Gilvarry C, Bray D, et al. Psychometric properties of the Strength and Difficulties Questionnaire from five European countries. Int J Methods Psychiatr Res. 2012;21:232–45.
Durgel ES. Parenting beliefs and practices of Turkish immigrant mothers in Western Europe [doctoral]. Tilburg University; 2011.
Hoosen N, Davids EL, de Vries PJ, Shung-King M. The Strengths and Difficulties Questionnaire (SDQ) in Africa: a scoping review of its application and validation. Child Adolesc Psychiatry Ment Health. 2018;12:6.
Sharp C, Venta A, Marais L, Skinner D, Lenka M, Serekoane J. First evaluation of a population-based screen to detect emotional-behavior disorders in orphaned children in sub-Saharan Africa. AIDS Behav. 2014;18:1174–85.
Stolk Y, Kaplan I, Szwarc J. Review of the strengths and difficulties questionnaire translated into languages spoken by children and adolescents of refugee background. Int J Methods Psychiatr Res. 2017. https://doi.org/10.1002/mpr.1568.
We would like to thank the KiGGS study group for data provision. We acknowledge financial support by Stiftung Universität Hildesheim for the Open Access publication.
Ethics approval and consent to participate
The KiGGS study was approved by the Charité/Universitätsmedizin Berlin ethics committee and the Federal Office for the Protection of Data.
Informed consent was obtained by the KiGGS study group from all individual participants included in the study.
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Runge, R.A., Soellner, R. Measuring children’s emotional and behavioural problems: are SDQ parent reports from native and immigrant parents comparable? Child Adolesc Psychiatry Ment Health 13, 46 (2019). https://doi.org/10.1186/s13034-019-0306-z