The validity, reliability and normative scores of the parent, teacher and self report versions of the Strengths and Difficulties Questionnaire in China
Child and Adolescent Psychiatry and Mental Health volume 2, Article number: 8 (2008)
The Strengths and Difficulties Questionnaire (SDQ) has become one of the most widely used measurement tools in child and adolescent mental health work across the globe. The SDQ was originally developed and validated within the UK and whilst its reliability and validity have been replicated in several countries important cross cultural issues have been raised. We describe normative data, reliability and validity of the Chinese translation of the SDQ (parent, teacher and self report versions) in a large group of children from Shanghai.
The SDQ was administered to the parents and teachers of students from 12 of Shanghai's 19 districts, aged between 3 and 17 years old, and to those young people aged between 11 and 17 years. Retest data was collected from parents and teachers for 45 students six weeks later. Data was analysed to describe normative scores, bandings and cut-offs for normal, borderline and abnormal scores. Reliability was assessed from analyses of internal consistency, inter-rater agreement, and temporal stability. Structural validity, convergent and discriminant validity were assessed.
Full parent and teacher data was available for 1965 subjects and self report data for 690 subjects. Normative data for this Chinese urban population with bandings and cut-offs for borderline and abnormal scores are described. Principal components analysis indicates partial agreement with the original five factor subscale structure; however, this appears to hold more strongly for the Prosocial Behaviour, Hyperactivity – Inattention and Emotional Symptoms subscales than for Conduct Problems and Peer Problems. Internal consistency as measured by Cronbach's α coefficient was generally low, ranging between 0.30 and 0.83, with only the parent and teacher Hyperactivity – Inattention and teacher Prosocial Behaviour subscales having α > 0.7. Inter-rater correlations were similar to those reported previously (range 0.23 – 0.49) whilst test retest reliability was generally lower than would be expected (range 0.40 – 0.79). Convergent and discriminant validity are supported.
We report mixed findings with respect to the psychometric properties of the Chinese translation of the SDQ. Reliability is a concern, particularly for Peer Problems and for self ratings by adolescents. There is good support for convergent validity but only partial support for structural validity. It may be possible to resolve some of these issues by carefully examining the wording and meaning of some of the current questions.
Mental health problems in children and adolescents result in significant burden and impact not only on the individual child but also on their families, schools and communities [1–3]. In China, as in the rest of the world, increasing numbers of children and adolescents are being identified as suffering from a wide range of mental health problems [4–6]. In recent years, China has had a more open policy, and Chinese society has been changing rapidly. There has been a shift from traditional cultural models towards a multi-culture model with traditional ideas increasingly being influenced by different cultures and in particular those from the West. There remain, however, many differences between contemporary Chinese and Western societies. It seems likely that these differences, and the inevitable tensions between Western and traditional Chinese values, will impact on the lives of children. For the children born during the "one family one child" era life has become very competitive. These changes are thought by many to have increased the stresses placed upon the child and, potentially, to have increased the incidence of child and adolescent mental health problems. Also, particularly in South China, where the economy has developed more rapidly, an increasing number of students have been living away from their parents, either boarding in schools or living in their teachers' homes. As a consequence teachers have become much more aware of their students' emotional functioning and their strengths and difficulties. The development and validation of tools that allow teachers' views to be considered has therefore become increasingly important.
Despite a trend towards increased recognition of children and adolescents with mental health problems, studies of service use generally suggest that only a minority of those with mental health needs are in contact with specialist services [8, 9]. Unfortunately strategies for both primary prevention (the prevention of the onset of a condition), and secondary prevention (the identification and treatment of asymptomatic individuals who have already developed risk factors or preclinical disease but in whom the condition is not clinically apparent), are not well developed in child and adolescent mental health fields. It is therefore clearly important that clinicians develop effective, reliable, valid and usable tools that can facilitate the early identification of child and adolescent mental health problems as well as the detection of hidden comorbidities in those presenting with either general physical or mental health problems. Parent, teacher and self report questionnaires can potentially play an important role in this process. A range of questionnaires is available to evaluate behavioural and emotional problems of children and adolescents. Several of these have been validated for use in Chinese populations, including the Child Behaviour Checklist, the Rutter Questionnaires, and the Conner's Questionnaires [10–13]. Although these instruments are useful they have several shortcomings. They are felt by many clinicians to be too long, cumbersome to score and to place too great an emphasis on certain behaviours. Their focus on problem behaviours, such as hyperactivity, has also resulted in a reduced acceptance by non-medical professionals. Goodman initially developed the Strengths and Difficulties Questionnaire (SDQ) in the UK. It has now been translated into 66 different languages and has become an internationally recognized tool which is extensively used in both research and clinical settings.
Use of the SDQ as an assessment of children's behaviour and emotional problems has been supported by the Chairman of the World Psychiatric Association Children's Mental Health Projects. The SDQ has several advantages over the other scales mentioned above. It is relatively short, with only 25 questions and a simple scoring system, making it quick and easy to complete and to score. It has a simple factor structure with good face validity. Perhaps the most important feature of the SDQ is its emphasis on an individual's strengths as well as their difficulties which has resulted in a very broad acceptance by non health professionals, children and their parents.
The structure, normative scoring and psychometric properties of the SDQ have been extensively investigated in samples from the UK and Europe [15–24], the Americas [25–29], Australia [30, 31], the Middle East [32–35] and Asia [35, 36]. Although these studies have generally supported reliability and validity, several important cross cultural issues have been raised. For example, several recent studies have questioned whether the original subscale structure of the SDQ is equally valid in all cultures [21, 27, 33]. It is therefore essential that the reliability and validity of the SDQ continue to be assessed across differing cultural settings, particularly in situations such as in China, where issues of tradition or social structure and organization may result in subtle alterations in the meaning of specific items which could impact on reliability and validity.
There are currently no published data on the use of the SDQ in China. In order to assist with the preparation and implementation of the World Psychiatric Association Children's Mental Health Projects in Shanghai, a densely populated and rapidly developing urban area, we collected normative data from a large representative community sample in order to address five broad research questions.
Do the Chinese translations of the parent, teacher and self report versions of the SDQ have the same five subscale factor structure in this population as was demonstrated for the original English version in a UK population?
What are the mean scores and subscale scores for each version of the questionnaire in this population?
What are the appropriate normal, borderline and abnormal bandings and cut-off scores for these scales in this population?
Do the Chinese translations of the SDQ have acceptable reliability in this population?
Do the Chinese translations of the SDQ have acceptable validity in this population?
This is a cross sectional epidemiological study investigating the structure, reliability and validity of the parent, teacher and self report versions of the SDQ.
As it was not possible, for logistic reasons, to include children from across the whole of Shanghai, we used a mixture of stratified cluster, random sampling and stratification to identify children from nursery, primary and secondary schools from 12 of Shanghai's 19 administrative districts. These twelve districts were chosen to be representative of the whole of Shanghai. Within each district schools were randomly chosen and all children within a chosen school were approached. Prior to commencing data collection, we met with all school principals and psychological counselling teachers to explain the significance of the investigation and discuss the research strategy. They in turn informed the students and their parents about the study. We sampled a total of 2128 students aged between 3 and 17 years, including 535 nursery school students, 693 primary school students and 900 secondary school students.
The official Chinese translations of the parent, teacher and self report versions of the Strengths and Difficulties Questionnaire were used. These versions were translated and back-translated by academic staff at the Centre for Clinical Trials and Epidemiological Research at the Chinese University of Hong Kong, and by Iris Tan Mink. Each of these questionnaires includes 25 items, each of which is scored on a three point scale (0 = not true, 1 = somewhat true, 2 = certainly true). Fifteen of the questions ask about difficulties and ten ask about strengths. The ten questions asking about strengths are positively worded. Five of these make up the prosocial behaviour subscale for which, unlike the other four subscales, a higher score signifies fewer problems. The other five positively worded questions are reverse scored. Five subscale scores are generated, each of which relates to five of the questions. These are: emotional symptoms, conduct problems, hyperactivity/inattention, peer relationship problems and prosocial behaviour. A total difficulties score is calculated by summing four of the subscale scores (emotional symptoms, conduct problems, hyperactivity/inattention and peer relationship problems). In addition, but not used in this study, an impact rating can be generated using separate questions from an impact supplement. In general a high score represents greater difficulties, except for the prosocial scale score where a lower score indicates greater difficulties. General information on the SDQ, the Chinese versions, and the SDQ scoring can be found online [37, 38]. Parents and teachers were asked to rate the behavioural and emotional aspects of the child's behaviour over the past six months as per their general observations of the child; young people aged 11 – 17 were asked to rate themselves over the past six months. Parents were also asked to complete the Chinese version of the Conner's Parent Symptoms Questionnaire (PSQ).
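The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration rather than the official scoring program: the item positions assigned to each subscale (`SUBSCALES`) and the positions of the five reverse scored items (`REVERSED`) are placeholders introduced here, and would need to be replaced with the published SDQ scoring key before any real use.

```python
# Placeholder layout: which answer positions belong to which subscale.
# This is NOT the official SDQ item order; substitute the published key.
SUBSCALES = {
    "emotional": [0, 1, 2, 3, 4],
    "conduct": [5, 6, 7, 8, 9],
    "hyperactivity": [10, 11, 12, 13, 14],
    "peer": [15, 16, 17, 18, 19],
    "prosocial": [20, 21, 22, 23, 24],
}

# Placeholder positions of the five positively worded items that sit
# outside the prosocial subscale and are therefore reverse scored.
REVERSED = {6, 13, 14, 16, 17}


def score_sdq(answers):
    """Return the five subscale scores and the total difficulties score.

    `answers` holds 25 item responses coded 0 (not true),
    1 (somewhat true) or 2 (certainly true).
    """
    if len(answers) != 25 or any(a not in (0, 1, 2) for a in answers):
        raise ValueError("expected 25 answers, each coded 0, 1 or 2")

    scores = {}
    for name, items in SUBSCALES.items():
        # Reverse scored items contribute 2 - answer instead of answer.
        scores[name] = sum(
            2 - answers[i] if i in REVERSED else answers[i] for i in items
        )

    # Total difficulties sums the four problem subscales only.
    scores["total_difficulties"] = sum(
        scores[name] for name in SUBSCALES if name != "prosocial"
    )
    return scores
```

Each subscale ranges from 0 to 10 and the total difficulties score from 0 to 40. The prosocial score is reported separately and does not enter the total.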
Parents, teachers who knew the children well and young people aged between 11 and 17 years completed questionnaires. Questionnaires were completed in the classrooms at the children's schools, guided by a trained psychological counselling teacher. If, whilst completing the questionnaire, the parent, the teacher or the young person had doubts about how to proceed, the psychological counselling teachers would explain. Each parent and teacher completed the questionnaire alone, and handed in the questionnaires to the psychological counselling teachers. We received a total of 2,101 (98.7%) questionnaires from parents, 2,123 (99.7%) from teachers and 816 (90.6%) from young people. A questionnaire was considered invalid if answers were missing for one or more questions. Only subjects with complete parent and teacher data were analysed, and data from the one subject younger than 3 years and the one subject older than 17 years were excluded. One thousand nine hundred and sixty five subjects had complete parent questionnaires and teacher questionnaires (93.5% of the parent questionnaires and 92.5% of the teacher questionnaires) and 690 subjects had complete self report, parent and teacher questionnaires (84.6% of eligible subjects). There were no differences with respect to district, age or gender between those with complete and incomplete questionnaires (social class data were not available) and the sample was representative of the Shanghai population with respect to age and gender distribution. There were no other exclusion criteria. Retest data was collected from parents and teachers for 45 students six weeks later (practical limitations precluded a shorter re-testing interval).
We established the database of the raw data in FoxPro; data description and statistical analyses were performed in SPSS (versions 11.0 and 14.0). Statistical analyses were conducted on unweighted data. Normative data is presented descriptively. Distributions of raw scores were used to determine the cut-off scores to identify normal, borderline and abnormal bandings. Where appropriate, analyses were repeated for two age bandings (3 – 10 years and 11 – 17 years). A principal components analysis was conducted to investigate the subscale structure of the scales. Reliability was assessed from analyses of internal consistency using Cronbach's α, inter-rater agreement, and temporal stability (test retest reliability), for which test-retest reliability ≥ 0.7 is deemed to be satisfactory. Structural validity was assessed via cross scale correlations. Convergent validity was assessed by calculating correlations between the parent completed SDQ and the parent completed PSQ. Discriminant validity was assessed by comparing 47 subjects from the normative sample with 47 age and gender matched ADHD outpatients using receiver operating characteristic (ROC) curves, employing area under the curve (AUC) as an index of discriminant ability. For the AUC a score ≤ 0.6 suggests that discrimination is no better than chance; 0.6 – 0.75 is fair; 0.75 – 0.90 is good; 0.90 – 0.97 is very good and 0.97 – 1.0 is excellent.
Complete parent and teacher data were available for 1965 children and complete parent, teacher and self report data were available for 690 cases. There were no differences with respect to age and gender between those cases with and without complete data. Data on social class were not available. These data were used to generate the following results.
Scale means, age and gender effects
The mean SDQ subscale scores for parent, teacher and self ratings subdivided by age-band (3 to 10 years and 11 to 17 years) and gender are presented in tables 1, 2 and 3 respectively. For all three raters boys of all ages were rated as having statistically significantly greater difficulties on the total problems score and on the conduct problems, hyperactivity/inattention, peer problems, and prosocial behaviour subscales, with one exception: parent ratings of peer problems in the younger age group showed no gender differences. On the emotional symptoms subscale younger but not older girls were rated as having statistically significantly greater difficulties on the parent rated scale. There were no gender differences seen on this subscale on the teacher or self reported scales (all significant p values ≤ 0.001).
For parent ratings there was a main effect of age on the emotional symptoms [F (1, 1963) = 11.8, p < .001] and hyperactivity/inattention [F (1, 1963) = 40.7, p < .001] subscales. For both of these subscales the scores decreased as age increased. There was no main effect of age on parent rated conduct problems, peer problems or prosocial behaviour. There was a gender × age interaction for peer problems [F (2, 1962) = 11.7, p < .001] whereby boys' peer relations were rated as getting worse as they got older and girls' peer relations were rated as improving.
For teacher ratings there was a main effect of age on hyperactivity/inattention [F (1, 1963) = 12.7, p < .001], peer problems [F (1, 1963) = 34.8, p < .001] and prosocial behaviour [F (1, 1963) = 14.2, p < .001]. Hyperactivity/inattention and prosocial behaviour were adjudged to have improved as the children got older, whereas peer relations were rated as worse for older children than for younger children. There was no main effect of age on teacher rated emotional symptoms or conduct problems. There was a gender × age interaction for teacher rated prosocial behaviour [F (2, 1962) = 12.7, p < .01] whereby older boys were rated as less prosocial and older girls as more prosocial.
Age effects were not calculated for the self reports due to the constricted age range in this sample.
Bandings and cut-offs
Bandings and cut-offs were estimated from the distributions of raw values in the manner described by Woerner et al. For the total difficulties scores cut-offs were calculated with the intention of placing approximately 10% of the sample with the most extreme scores in the "abnormal" banding, the next 10% in the "borderline" banding and the remaining 80% in the "normal" banding. As prevalences for individual disorders are necessarily lower than those for any disorder, it was felt more appropriate to place a slightly lower percentage of subjects in the abnormal and borderline bandings for each of the subscales. Cut-offs were therefore determined for each subscale such that approximately 85% of subjects were placed in the normal banding and 7.5% in each of the abnormal and borderline bandings. However, since each of the subscales can only take a limited number of scores (i.e. 11, between 0 and 10), the actual percentages could only be approximated. These bandings are shown in table 4 along with the actual percentage of subjects in each of the three banding categories. In view of the extended age range of the sample these bandings were also calculated separately for younger and older age ranges for the parent and teacher completed scales. The bandings for the different age groups were very similar with few differences (data not shown).
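The idea of choosing integer cut-offs that approximate target percentages can be illustrated with a short Python sketch. It is an assumption of this sketch, not a description of the procedure actually followed in the study, that each cut-off is the integer score whose upper tail is closest in size to the target proportion. The function names and the synthetic scores in the usage note are hypothetical.

```python
def fraction_at_or_above(scores, cutoff):
    """Share of the sample scoring at or above `cutoff`."""
    return sum(s >= cutoff for s in scores) / len(scores)


def closest_cutoff(scores, target_fraction):
    """Integer cut-off whose upper tail is closest in size to the target.

    Ties are resolved in favour of the lowest candidate cut-off.
    """
    candidates = range(min(scores), max(scores) + 2)
    return min(
        candidates,
        key=lambda c: abs(fraction_at_or_above(scores, c) - target_fraction),
    )


def derive_bandings(scores, abnormal=0.10, borderline=0.10):
    """Cut-offs for the abnormal and borderline bands, with actual shares."""
    abnormal_from = closest_cutoff(scores, abnormal)
    borderline_from = closest_cutoff(scores, abnormal + borderline)
    share_abnormal = fraction_at_or_above(scores, abnormal_from)
    share_borderline = (
        fraction_at_or_above(scores, borderline_from) - share_abnormal
    )
    return {
        "abnormal_from": abnormal_from,
        "borderline_from": borderline_from,
        "share_abnormal": share_abnormal,
        "share_borderline": share_borderline,
        "share_normal": 1.0 - share_abnormal - share_borderline,
    }


def band(score, bandings):
    """Classify one raw score using previously derived bandings."""
    if score >= bandings["abnormal_from"]:
        return "abnormal"
    if score >= bandings["borderline_from"]:
        return "borderline"
    return "normal"
```

For example, `derive_bandings([s for s in range(10) for _ in range(10)])` works on a synthetic sample in which every score from 0 to 9 occurs ten times. Because scores are integers, the achieved shares generally differ from the targets, which is why the actual percentages are worth reporting alongside the cut-offs. For subscales, the targets described in the text would correspond to `abnormal=0.075, borderline=0.075`.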
The Cronbach's α coefficients for the parent and teacher SDQ subscales and total score are reported in table 5. As above, data from Goodman et al. (2001) have been included in this table for comparison. Overall the α coefficients were lower than hoped for. The α coefficient directly reflects the degree of internal consistency of the factors, and an α ≥ 0.70 is generally considered to indicate good internal consistency sufficient for group comparison. For the parent subscales only the hyperactivity/inattention subscale (α = 0.76) had an α ≥ 0.70, with the other α coefficients ranging between 0.30 and 0.68. The alphas for the teacher subscales were consistently higher than those for the parent subscales; however, good reliability was only found for the hyperactivity/inattention (α = 0.82) and prosocial behaviour (α = 0.83) subscales. The other subscale alphas ranged between 0.48 and 0.63. For the self reported scale the subscale α coefficients were lower than for the other two informants and none of the subscales had an α coefficient > 0.7 (range 0.30 – 0.64).
These analyses were repeated for the two main age bands (3 – 10 years and 11 – 17 years). The results of these analyses were very similar to those for the whole group and are not reported further (range for 3 – 10 years: parent 0.29 – 0.74, teacher 0.45 – 0.84, self 0.29 – 0.62; range for 11 – 17 years: parent 0.32 – 0.77, teacher 0.49 – 0.83, self 0.30 – 0.65).
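For readers unfamiliar with the statistic, Cronbach's α for a subscale of k items is k/(k − 1) × (1 − sum of item variances / variance of the summed score). The following Python sketch computes it from raw item responses. It is an illustration only (the analyses reported here were run in SPSS), and the function name and input layout are assumptions made for the example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one subscale.

    `rows` holds one list of item scores per respondent; every list
    must have the same length (the number of items, k).
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in rows):
        raise ValueError("every respondent needs a score for every item")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed subscale score.
    total_variance = variance([sum(row) for row in rows])

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

When all items rise and fall together, the variance of the summed score is large relative to the item variances and α approaches 1. When items are unrelated, the two quantities are similar and α approaches 0. With only five items per subscale, α is also limited by scale length, a point taken up in the discussion. The function raises a `ZeroDivisionError` if every respondent has the same summed score.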
The inter-rater correlations between parents and teachers are reported in table 6. To keep consistency with the Goodman paper, the mean cross-informant correlations for other similar measures, based on the meta-analysis conducted by Achenbach et al., have been included for comparison. These data were also analyzed by age. The correlations between parents and teachers were consistently higher for the younger children (3 – 10 years) than for the older children (11 – 17 years) (data not shown).
Parents and teachers of sixth grade students completed the SDQ for a second time 6 weeks after their first completion. Test retest correlations of ≥ 0.7 are generally considered to indicate satisfactory reliability. The correlations between these scores are reported in table 7. All the coefficients were statistically significant (P < 0.001).
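A test retest coefficient of this kind can be computed as the correlation between first and second administration scores for the same children. The sketch below uses the Pearson product-moment correlation, which is an assumption made for illustration: the text does not state which correlation coefficient was used. The function names are hypothetical.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between paired first and second scores."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two equally long lists of at least 2 scores")

    mean_first = sum(first) / n
    mean_second = sum(second) / n
    covariation = sum(
        (x - mean_first) * (y - mean_second) for x, y in zip(first, second)
    )
    ss_first = sum((x - mean_first) ** 2 for x in first)
    ss_second = sum((y - mean_second) ** 2 for y in second)
    return covariation / sqrt(ss_first * ss_second)


def is_satisfactory(r, threshold=0.7):
    """Apply the >= 0.7 convention for test retest reliability."""
    return r >= threshold
```

Calling `pearson_r` once per subscale and informant, on the scores of the retested children, would yield one coefficient per cell of a table such as table 7.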
Principal Components Analyses
The results of the principal components analyses with subsequent Varimax rotation for the parent, teacher and self rated SDQs are detailed in tables 8, 9 and 10 respectively. In each analysis a fixed five component solution was chosen in order to obtain comparability with the original SDQ papers.
For the parent ratings the prosocial behaviour, hyperactivity/inattention and emotional symptoms items loaded on the predicted components; the conduct items loaded onto two separate components. Two of the peer problems items (good friend and popular) loaded onto the prosocial component, "good friend" loaded onto the emotional symptoms component and "bullied" loaded onto one of the conduct components. The "best with adults" question did not load onto any of the components. The other three peer problems items (solitary, popular, bullied) each loaded independently onto one of the other components. Three items (somatic, restless and fidgeting) also loaded onto conduct components with higher loadings than they did onto their predicted component.
For the teacher ratings the outcome was less clear. The five prosocial items loaded onto a single component on which there were also high loadings for five other positively worded questions: two hyperactivity/inattention items (reflective, persistent), one conduct item (obedient) and two peer problems items (good friend and popular). All five hyperactivity/inattention items loaded onto a single component; however, two items had higher loadings on another component that also included the highest loadings for two conduct items (tempers and fights) and one emotional symptoms item (somatic), and moderate loadings for another two conduct items (obedient and argues with adults) that nevertheless loaded higher onto other components. The four other emotional symptoms items had their highest loading onto a single component. Four of the peer problems items (bullied, best with adults, good friend and popular) loaded onto a single component along with two conduct items (argues with adults and spiteful); however, two of these peer problems items (good friend and popular) loaded more highly onto the prosocial behaviours component. Both the parent and teacher rated "prosocial" components could also have been labelled as a "positive" component, as the additional items which loaded highly on them were all positively worded.
For the self reported ratings the prosocial behaviour, hyperactivity/inattention and emotional symptoms items again loaded on the predicted components. There were two less well defined "mixed" components. The first included two conduct items (argues with adults and spiteful), one emotional symptoms item (fears) and two peer relationships items (bullied and best with adults). The second included two conduct items (tempers and fights) and two items negatively correlated with these, one from the emotional subscale (clingy) and one prosocial item (kind to kids).
The parent and teacher principal components analyses were repeated with the sample split into two age groupings (3 – 10 years and 11 – 17 years). The results from each of these analyses were very similar to those described above (data not shown) and are not discussed further.
The cross-scale correlations between the three psychopathological subscales are reported separately for each informant in table 11. As a comparison the figures for the same analysis from the original UK description of the psychometric properties of the SDQ have been included. As expected the conduct – hyperactivity/inattention correlations (parent = .46, teacher = .61, self = .39) are considerably higher than either the conduct – emotional (parent = .22, teacher = .22, self = .27) or the hyperactivity/inattention – emotional ones (parent = .21, teacher = .19, self = .33).
The Conner's Parent Symptom Questionnaire (PSQ) is frequently used to evaluate children's behaviour. Su has developed and validated a Chinese version of the PSQ. We conducted a convergent validity analysis between the SDQ and the PSQ. All the parents were asked to complete the PSQ at the same time as completing the SDQ. Data was available for 1940 subjects. The scores of the SDQ and PSQ subscales were correlated with each other. The results of this analysis are reported in table 12. As expected the correlations are highest for matching subscales and between externalizing – externalizing pairs and internalizing – internalizing pairs, lower for externalizing – internalizing pairs, and in between for pairings of the peer and prosocial subscales of the SDQ with subscales of the PSQ, which does not attempt to measure these domains. Similarly the correlations between the physical and mental problems subscale of the PSQ and the SDQ subscales are low.
We compared 48 respondents from the normative sample with 47 ADHD outpatients matched for age and gender. As expected the hyperactivity/inattention subscale and total difficulties scores were rated higher by all raters for the ADHD group than for the control group. Parents and teachers also scored the ADHD group higher for conduct problems and the teachers scored them higher for emotional symptoms. ROC analyses supported the ability of the Chinese SDQ to discriminate between these two groups. For this purpose the underlying assumption was that children with ADHD were substantially more likely to have problems with hyperactivity/inattention, conduct, peer relationships, prosocial behaviours and total difficulties than the control children. In ROC analyses sensitivity and specificity are calculated for all possible cut-offs on the questionnaire. These are then combined to give a single statistic, the "area under the curve" (AUC). Values for AUC are between 0 and 1.0. The convention for interpreting AUC is that an AUC ≤ 0.6 suggests that discrimination is no better than chance; 0.6 – 0.75 is fair; 0.75 – 0.90 is good; 0.90 – 0.97 is very good and 0.97 – 1.0 is excellent. The results for the ROC analyses are summarized in table 13. All of the SDQ scales and subscales, except for the parent scored peer relations and prosocial behaviours subscales, discriminated between the ADHD and control cases better than chance. Whilst most of the AUCs were in the "fair" range (0.6 – 0.75), several (parent and teacher hyperactivity – inattention, and teacher conduct problems and total difficulties) were "good" (0.75 – 0.90). The teacher ratings were significantly better at discriminating hyperactivity – inattention, conduct problems and total difficulties than either the parental or the self report ratings.
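The AUC described above can be computed directly from the two groups' scores, without drawing the curve. The area under the empirical ROC curve equals the proportion of case and control pairs in which the case scores higher, with ties counted as one half. The Python sketch below uses this pairwise form. The function names are hypothetical, and treating each band's upper bound as inclusive in `describe_auc` is an assumption, since the stated convention leaves the boundary values ambiguous.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (case, control) pairs in which the case has the
    higher score; tied pairs count as half a win.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")

    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def describe_auc(value):
    """Label an AUC using the convention quoted in the text.

    Upper bounds are treated as inclusive (an assumption).
    """
    if value <= 0.6:
        return "no better than chance"
    if value <= 0.75:
        return "fair"
    if value <= 0.90:
        return "good"
    if value <= 0.97:
        return "very good"
    return "excellent"
```

This form assumes that higher scores indicate more difficulty. For the prosocial subscale, where lower scores indicate greater difficulty, the scores would need to be reversed (or the resulting AUC subtracted from 1) before interpretation.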
The normative scores, bandings and cut-offs and the psychometric properties of the Chinese version of the SDQ were evaluated for a representative sample of children and adolescents aged between 3 and 17 years from 12 of the 19 districts of Shanghai. The collection and description of normative data within specific populations is important as differing means are possible both as a consequence of actual differences in the prevalence of particular difficulties between different populations and as a result of cultural biases and expectations as to what is "normal" on the part of raters with differing backgrounds and experiences. In general the Chinese normative data closely resembles that from the UK. In particular the age and gender patterns were similar to those seen in the UK sample. It was however noticeable that the Chinese scores for the peer problems subscale were consistently higher than those for the UK. As we failed to replicate the "peer problems" grouping in our principal components analysis it seems likely that these differences in scoring may reflect a difference in meaning for these questions rather than a true difference in peer relationships. In addition the Chinese teachers also tended to rate conduct problems, hyperactivity – inattention, and total difficulties somewhat higher than their UK counterparts. With respect to the bandings and cut-off scores for the total scale and subscales there were again only minor differences. The teachers' higher scoring on several subscales was associated with slightly broader "normal" bands, meaning that some Chinese children who would be rated as "normal" would have been within the "borderline" band had they had the same score in a UK sample.
Whilst these normative data provide important information for future researchers and clinicians who wish to use the SDQ in China, the overall usefulness of these scales in this setting is dependent on the SDQ, originally designed for use in a Western cultural setting, proving to be reliable and valid in a Chinese population. Our findings extend and partially replicate previous findings from community and clinic samples from around the world and suggest reasonable but not unequivocal validity and reliability.
When the psychometric properties of the SDQ have previously been examined in differing cultural contexts the results have generally supported reliability and validity. However several important cross cultural issues have been raised. Whilst several studies have supported the original five factor structure of the SDQ in both clinical and epidemiological samples [15, 20, 30, 32, 45–47], others have raised questions about the structural validity of this model. Studies across several cultures have reported low internal consistencies for the parent and self report Conduct Problems subscale and the self-report Peer Problems subscale [16, 18–21, 47–49]. These may simply be due to the fact that each subscale only contains 5 questions, or they may suggest that, at least in some cultures, these subscales represent and tap into more heterogeneous constructs than originally intended. Several recent studies have questioned the original subscale structure of the SDQ and more specifically whether it is equally applicable across differing cultures. Thabet et al conducted a confirmatory factor analysis of the Arabic version of the SDQ scored by parents of children within the Gaza Strip. Whilst there was some support for the original five factor structure they found that certain items appeared to have a different function or meaning than is seen in western children and their parents. These included being unhappy, scared, and distractible, stealing, and being picked on or bullied. As a consequence the emotional and peer relationship subscales and the total difficulties scores seemed to be either more heterogeneous or more multifactorial than is typically seen in western cultures. Dickey and Blumberg, in a US sample, also failed to replicate the original five factor structure. They concluded that a three factor model, consisting of externalizing problems, internalizing problems and positively worded items, was the most stable and best accounted for their parent reported data.
Koskelainen et al. also reported a three factor solution as the most adequate representation for a Finnish sample. Using the self report version of the Dutch SDQ, Muris et al. reported a four factor solution (Emotional Symptoms, Prosocial Behaviour including positively worded items from other scales, Hyperactivity-Inattention and a mixed Peer Problems-Conduct Problems scale) as the most satisfactory. Most recently, Palmieri and Smith used confirmatory factor analysis to investigate three models of the SDQ's factor structure using data from a US sample of custodial grandmothers. They found that the best representation of the latent structure was provided by a model which included the original five factors and an additional "positive construal" factor made up of the positively worded questions.
In our Chinese sample the principal components analyses in the main support the Hyperactivity-Inattention, Emotional and Prosocial subscales but provide less support for the Conduct and Peer Problems subscales. There was also some support for a positive construal component as suggested by Palmieri & Smith. It is possible that this pattern of results reflects the underlying nature of the subscales and represents a greater cross cultural acceptance and consistency of what should be regarded as a prosocial behaviour, and as a behaviour indicative of hyperactivity/impulsivity disorders (i.e. ADHD) and emotional disorders (i.e. anxiety and depression), than there is about what types of behaviours indicate the presence of oppositionality and conduct problems and positive peer relationships. The problems with the Peer Problems subscale were, as would be expected, mirrored by low estimates of internal consistency for this subscale across all three raters.
Other aspects of reliability as measured by internal consistency were also rather disappointing. Other than for prosocial behaviours and hyperactivity/inattention, the internal consistency coefficients for the Chinese sample were all somewhat lower than those reported in the original analysis of the psychometric properties of the SDQ. None of the self reported measures had an α > 0.70, and only the hyperactivity – inattention subscale for the parent scale and the hyperactivity – inattention and prosocial behaviour subscales for the teacher scale reached this level. As was previously reported by Goodman, the reliabilities for the teachers were consistently higher than those for the parents, and both of these were more reliable than the self report scale. Inter-rater correlations were, however, reasonable, and in this respect our sample was again very similar to Goodman's, with all but one of the inter-rater correlations exceeding the meta-analytic mean reported by Achenbach et al.
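For readers less familiar with the internal consistency coefficient discussed above, the following is a minimal Python sketch of how Cronbach's α is obtained for a five item subscale. The function name and the item scores are hypothetical and are used for illustration only; they are not data from this study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Transpose so that each tuple holds all respondents' scores for one item.
    items = list(zip(*item_scores))
    sum_item_variances = sum(variance(item) for item in items)
    # Variance of each respondent's subscale total.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: 5 respondents x 5 items, each item scored 0, 1 or 2.
example_scores = [
    [2, 1, 2, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 2, 1, 1, 2],
    [2, 2, 2, 1, 1],
    [0, 1, 0, 0, 1],
]

if __name__ == "__main__":
    print(round(cronbach_alpha(example_scores), 2))
```

With only five items per subscale, α is sensitive to how strongly the items covary, which is one reason short subscales often fall below the conventional 0.70 level.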
The validity of the Chinese versions of the SDQ was supported by the cross scale correlations, which were very similar to those previously reported by Goodman in a UK sample. The convergent validity with the Conners' Parent Symptom Questionnaire and the discriminant validity, as measured by the ability of the Chinese SDQ to discriminate between a community sample of children and children with ADHD, were also very good. With respect to discriminant validity, the AUC values from this sample are similar to those previously reported for a German sample.
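The AUC referred to above can be read as the probability that a randomly chosen child from the clinical group scores higher than a randomly chosen child from the community group. A minimal Python sketch of this interpretation follows; the function name and the scores are hypothetical and are not taken from this study.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical subscale scores for a clinical and a community group.
clinical = [7, 9]
community = [6, 8]

if __name__ == "__main__":
    print(auc(clinical, community))
```

An AUC of 0.5 indicates discrimination no better than chance, whereas a value of 1.0 indicates that every clinical case scores above every community control.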
The SDQ has generally been thought of as a screening instrument rather than a measure of outcome. We are aware, however, of several clinical centres using the SDQ as an outcome measure. Unfortunately the test-retest reliability of the SDQ, a prerequisite for measuring outcome, has not yet been extensively investigated. A test-retest reliability ≥ 0.7 is generally reported as satisfactory. Goodman reported data from a small sample of UK parents retested 3–4 weeks after initial testing; the intra-class correlations ranged from 0.44 for the "burden" item from the impact scale to 0.85 for total difficulties. Unfortunately the coefficients for the five subscales are not reported. Hawes & Dadds reported correlations for retesting on the parent instrument after 12 months. As they acknowledge, correlations over this period of time will reflect real changes in the child's behaviour due to development, environmental changes etc., as well as instrument instability, and as a consequence they would be expected to under-estimate stability. It is therefore notable that these correlations, which ranged between 0.61 for peer problems and 0.77 for hyperactivity-inattention, were as high as they were. Indeed the test retest reliabilities for the Emotional Symptoms, Hyperactivity – Inattention and Prosocial Behaviour subscales and for the Total Difficulties score for this Australian sample were larger than those reported here. Only Muris et al. have reported the test retest stability of the self report scale. They obtained retest data from 91 young people and their parents two months after initial testing. With the exception of the self reported prosocial subscale, the intra-class correlations for both informants on all subscales were above 0.70. As far as we are aware ours is the first study to report the test retest reliability of the teacher SDQ. Our results are less positive than previously reported.
Although the intra-class correlations were all significant with p < 0.001, they were lower than expected, ranging from 0.40 for teacher rated Emotional Symptoms to 0.79 for parent rated Peer Problems, with only two other correlations ≥ 0.70 (parent rated Conduct Problems and Total Difficulties).
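To make the test retest coefficients above more concrete, the following minimal Python sketch computes an intra-class correlation from paired scores on two occasions. It assumes the one-way random effects form of the coefficient, which may not be the form used in the analyses reported here; the function name and the scores are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random effects ICC for n subjects each measured k times."""
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Between-subject mean square.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical (test, retest) subscale scores for three children.
retest_pairs = [(1, 2), (3, 3), (5, 4)]

if __name__ == "__main__":
    print(round(icc_oneway(retest_pairs), 3))
```

A coefficient at or above 0.70 would meet the conventional level for satisfactory stability mentioned earlier.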
It must be noted that Shanghai is a densely populated and rapidly developing urban area and that these findings may not generalize to other more rural provinces.
In summary we report mixed findings with respect to the psychometric properties of the Chinese translation of the SDQ. The structural analysis suggests that whilst there is support for the Prosocial Behaviour, Hyperactivity/Inattention and Emotional Problems subscales, there appear to be differences in the way Chinese respondents interpret the questions relating to Conduct and Peer Problems. These differences may also underpin the lower internal consistencies of the parent and self reported scales. These issues require further investigation, and it may be the case that certain questions would need to be altered or reworded in order to capture the intended constructs. The normative scores, cut-offs and bandings differ only slightly from those reported in other cultures. Convergent and discriminant validity and inter-rater agreement appear good; however, there are issues relating to stability as measured by test retest reliability. These findings clearly need to be replicated in other Chinese samples, including those from rural rather than urban settings. However, until such data are available these results should be taken into account by clinicians and researchers using this instrument.
ADHD: Attention Deficit/Hyperactivity Disorder
PSQ: Conners' Parent Symptom Questionnaire
SDQ: Strengths and Difficulties Questionnaire
Meltzer H, Goodman R, Ford T: Mental health of children and adolescents in Great Britain. 2000, London, HMSO
Costello EJ, Egger H, Angold A: 10-year research update review: the epidemiology of child and adolescent psychiatric disorders: I. Methods and public health burden. J Am Acad Child Adolesc Psychiatry. 2005, 44: 972-986. 10.1097/01.chi.0000172552.41596.6f.
Angold A, Messer SC, Stangl D, Farmer EM, Costello EJ, Burns BJ: Perceived parental burden and service use for child and adolescent psychiatric disorders. Am J Public Health. 1998, 88: 75-80.
Hong KM, Yamazaki K, Banaag CG, Du Y: Systems of care in Asia. Facilitating Pathways: Care Treatment and Prevention in Child and Adolescent Mental Health. Edited by: Remschmidt H, Belfer ML and Goodyer I. 2004, Berlin, Springer - Verlag, 58-70.
Du Y: Child Mental Health Care. 1999, Shanghai, Shanghai Science and Technology Press
Remschmidt H, Belfer ML, Goodyer I: Facilitating Pathways: Care Treatment and Prevention in Child and Adolescent Mental Health. 2004, Berlin, Springer - Verlag
Du Y, Xu T, Tong J: Assessment of emotional state and self-concept of students in high school. Chinese Journal of Clinical Psychology. 1998, 6: 106-107.
Angold A, Erkanli A, Farmer EM, Fairbank JA, Burns BJ, Keeler G, Costello EJ: Psychiatric disorder, impairment, and service use in rural African American and white youth. Arch Gen Psychiatry. 2002, 59: 893-901. 10.1001/archpsyc.59.10.893.
Ford T, Hamilton H, Meltzer H, Goodman R: Predictors of service use for mental health problems among British schoolchildren. Child and Adolescent Mental Health. 2008, 13: 32-40.
Du Y, Su L, Li X: Usage of Conners Parent Symptom Questionnaire in boys with Attention Deficit Hyperactivity Disorder. Chinese Journal of Clinical Psychology. 1995, 1: 44-45.
Fan J, Du Y: Urban Norm and reliability of Conners Teacher Rating Scale. Shanghai Archives of Psychiatry. 2004, 16: 69-71.
Wang X: Rating Scales Manual for Mental Health. Chinese Journal of Mental Health. 1993, 54-68.
Xu T: Conners Children Behavior Checklist. Shanghai Archives of Psychiatry. 1990, 2: 46-47.
Goodman R: The Strengths and Difficulties Questionnaire: a research note. J Child Psychol Psychiatry. 1997, 38: 581-586. 10.1111/j.1469-7610.1997.tb01545.x.
Woerner W, Becker A, Rothenberger A: Normative data and scale properties of the German parent SDQ. Eur Child Adolesc Psychiatry. 2004, 13 Suppl 2: II3-10.
Goodman R: Psychometric properties of the strengths and difficulties questionnaire. J Am Acad Child Adolesc Psychiatry. 2001, 40: 1337-1345. 10.1097/00004583-200111000-00015.
Goodman R, Meltzer H, Bailey V: The Strengths and Difficulties Questionnaire: a pilot study on the validity of the self-report version. Eur Child Adolesc Psychiatry. 1998, 7: 125-130. 10.1007/s007870050057.
van Widenfelt BM, Goedhart AW, Treffers PD, Goodman R: Dutch version of the Strengths and Difficulties Questionnaire (SDQ). Eur Child Adolesc Psychiatry. 2003, 12: 281-289. 10.1007/s00787-003-0341-3.
Malmberg M, Rydell AM, Smedje H: Validity of the Swedish version of the Strengths and Difficulties Questionnaire (SDQ-Swe). Nord J Psychiatry. 2003, 57: 357-363. 10.1080/08039480310002697.
Smedje H, Broman JE, Hetta J, von Knorring AL: Psychometric properties of a Swedish version of the "Strengths and Difficulties Questionnaire". Eur Child Adolesc Psychiatry. 1999, 8: 63-70. 10.1007/s007870050086.
Koskelainen M, Sourander A, Kaljonen A: The Strengths and Difficulties Questionnaire among Finnish school-aged children and adolescents. Eur Child Adolesc Psychiatry. 2000, 9: 277-284. 10.1007/s007870070031.
Ronning JA, Handegaard BH, Sourander A, Morch WT: The Strengths and Difficulties Self-Report Questionnaire as a screening instrument in Norwegian community samples. Eur Child Adolesc Psychiatry. 2004, 13: 73-82. 10.1007/s00787-004-0356-4.
Obel C, Heiervang E, Rodriguez A, Heyerdahl S, Smedje H, Sourander A, Guethmundsson OO, Clench-Aas J, Christensen E, Heian F, Mathiesen KS, Magnusson P, Njarethvik U, Koskelainen M, Ronning JA, Stormark KM, Olsen J: The Strengths and Difficulties Questionnaire in the Nordic countries. Eur Child Adolesc Psychiatry. 2004, 13 Suppl 2: II32-II39.
Marzocchi GM, Capron C, Di Pietro M, Duran TE, Duyme M, Frigerio A, Gaspar MF, Hamilton H, Pithon G, Simoes A, Therond C: The use of the Strengths and Difficulties Questionnaire (SDQ) in Southern European countries. Eur Child Adolesc Psychiatry. 2004, 13 Suppl 2: II40-II46.
Cury CR, Golfeto JH: Strengths and difficulties questionnaire (SDQ): a study of school children in Ribeirao Preto. Rev Bras Psiquiatr. 2003, 25: 139-145. 10.1590/S1516-44462003000300005.
Woerner W, Fleitlich-Bilyk B, Martinussen R, Fletcher J, Cucchiaro G, Dalgalarrondo P, Lui M, Tannock R: The Strengths and Difficulties Questionnaire overseas: evaluations and applications of the SDQ beyond Europe. Eur Child Adolesc Psychiatry. 2004, 13 Suppl 2: II47-II54.
Dickey WC, Blumberg SJ: Revisiting the factor structure of the strengths and difficulties questionnaire: United States, 2001. J Am Acad Child Adolesc Psychiatry. 2004, 43: 1159-1167. 10.1097/01.chi.0000132808.36708.a9.
Bourdon KH, Goodman R, Rae DS, Simpson G, Koretz DS: The Strengths and Difficulties Questionnaire: U.S. normative data and psychometric properties. J Am Acad Child Adolesc Psychiatry. 2005, 44: 557-564. 10.1097/01.chi.0000159157.57075.c8.
Palmieri PA, Smith GC: Examining the structural validity of the Strengths and Difficulties Questionnaire (SDQ) in a U.S. sample of custodial grandmothers. Psychol Assess. 2007, 19: 189-198. 10.1037/1040-3590.19.2.189.
Hawes DJ, Dadds MR: Australian data and psychometric properties of the Strengths and Difficulties Questionnaire. Aust N Z J Psychiatry. 2004, 38: 644-651. 10.1111/j.1440-1614.2004.01427.x.
Mathai J, Anderson P, Bourne A: Comparing psychiatric diagnoses generated by the Strengths and Difficulties Questionnaire with diagnoses made by clinicians. Aust N Z J Psychiatry. 2004, 38: 639-643. 10.1111/j.1440-1614.2004.01428.x.
Almaqrami MH, Shuwail AY: Validity of the self-report version of the strengths and difficulties questionnaire in Yemen. Saudi Med J. 2004, 25: 592-601.
Thabet AA, Stretch D, Vostanis P: Child mental health problems in Arab children: application of the strengths and difficulties questionnaire. Int J Soc Psychiatry. 2000, 46: 266-280. 10.1177/002076400004600404.
Alyahri A, Goodman R: Validation of the Arabic Strengths and Difficulties Questionnaire and the Development and Well-Being Assessment. East Mediterr Health J. 2006, 12 Suppl 2: S138-S146.
Samad L, Hollis C, Prince M, Goodman R: Child and adolescent psychopathology in a developing country: testing the validity of the strengths and difficulties questionnaire (Urdu version). Int J Methods Psychiatr Res. 2005, 14: 158-166. 10.1002/mpr.3.
Goodman R, Renfrew D, Mullick M: Predicting type of psychiatric disorder from Strengths and Difficulties Questionnaire (SDQ) scores in child mental health clinics in London and Dhaka. Eur Child Adolesc Psychiatry. 2000, 9: 129-134. 10.1007/s007870050008.
SDQ: Information for researchers and professionals about the Chinese version of Strengths & Difficulties Questionnaire. 2008, [http://www.sdqinfo.com/d4a.html]
SDQ: Information for researchers and professionals about the Strengths & Difficulties Questionnaires. 2008, [http://www.sdqinfo.com]
Su L, Li X, Huang C: Chinese urban norm of Conner's Parent Symptom Questionnaire. Chinese Journal of Clinical Psychology. 2001, 9: 241-243.
Murphy KR, Davidshofer CO: Psychological testing: Principles and applications. 1996, New Jersey, Prentice Hall International Inc., 4th
Swets JA: Measuring the accuracy of diagnostic systems. Science. 1988, 240: 1285-1293. 10.1126/science.3287615.
Nunnally JC, Bernstein IH: Psychometric Theory. 1994, New York, McGraw-Hill Companies
Achenbach TM, McConaughy SH, Howell CT: Child/adolescent behavioral and emotional problems: implications of cross-informant correlations for situational specificity. Psychol Bull. 1987, 101: 213-232. 10.1037/0033-2909.101.2.213.
Youth in Mind: Normative data for the SDQ. 2007, [http://www.sdqinfo.com/b8.html]
Becker A, Woerner W, Hasselhorn M, Banaschewski T, Rothenberger A: Validation of the parent and teacher SDQ in a clinical sample. Eur Child Adolesc Psychiatry. 2004, 13 Suppl 2: II11-II16.
Becker A, Steinhausen HC, Baldursson G, Dalsgaard S, Lorenzo MJ, Ralston SJ, Dopfner M, Rothenberger A, Coghill D, Curatolo P, Falissard B, Hervas A, Le Heuzey MF, Novik TS, Pereira RR, Preuss U, Rasmussen P, Riley AW, Spiel G, Vlasveld L: Psychopathological screening of children with ADHD: Strengths and difficulties questionnaire in a pan-European study. Eur Child Adolesc Psychiatry. 2006, 15 Suppl 1: I56-I62.
Muris P, Meesters C, van den BF: The Strengths and Difficulties Questionnaire (SDQ)--further evidence for its reliability and validity in a community sample of Dutch children and adolescents. Eur Child Adolesc Psychiatry. 2003, 12: 1-8. 10.1007/s00787-003-0298-2.
Koskelainen M, Sourander A, Vauras M: Self-reported strengths and difficulties in a community sample of Finnish adolescents. Eur Child Adolesc Psychiatry. 2001, 10: 180-185. 10.1007/s007870170024.
Muris P, Meesters C, Eijkelenboom A, Vincken M: The self-report version of the Strengths and Difficulties Questionnaire: its psychometric properties in 8- to 13-year-old non-clinical children. Br J Clin Psychol. 2004, 43: 437-448. 10.1348/0144665042388982.
CAMHS outcome research consortium. 2006
Goodman R: The extended version of the Strengths and Difficulties Questionnaire as a guide to child psychiatric caseness and consequent burden. J Child Psychol Psychiatry. 1999, 40: 791-799. 10.1017/S0021963099004096.
(Special thanks to Cao Qingwen of Shanghai Songjiang Lida school; Mei Jie of Lu Wan High school; Yang Lingdi of Jianjiang secondary school; Teng Jin of Huangpu school; Cai Suwen of Da Chang Zhen primary school; Zhu qin of Lian Jian school; Wang Rongfang of Shan Hai kindergarten; Wang Shunli of Yin Chuhu kindergarten; Zhang Bei of Jing Gu No.1 kindergarten; Zou Ruhao of Bei Jiao school; Wang Xiuling of Jiang Ning secondary school; Wu Junlin of Xing Zhi secondary school)
YD has received research funding from Xi'an-Janssen Pharmaceutical Ltd, Eli Lilly and Company.
JK has no competing interests to declare.
DC is an advisory board member for Cephalon, Eli Lilly, Janssen Cilag, Shire and UCB and has received research funding from Eli Lilly and Janssen Cilag.
YD and JK designed the study and collected the data; DC designed and conducted the analysis; DC and YD wrote the paper. All authors revised and agreed the final paper.
Yasong Du and David Coghill contributed equally to this work.