Developing a test battery for diagnosis of childhood apraxia of speech in Arabic speakers

Childhood apraxia of speech (CAS) is a speech sound disorder in which the precision and consistency of movements underlying speech are impaired in the absence of neuromuscular deficits. It is important to differentiate between language disorders and CAS to avoid misdiagnosis. The objective of this study was to develop a test battery for CAS in order to identify its possible presence in Arabic-speaking children, thus allowing the planning of appropriate therapy programs. The constructed test battery for CAS was administered to 70 monolingual Arabic-speaking Egyptian children, including 10 children with suspected CAS, 20 children with phonological disorders, and 40 typically developing children. Participants' responses were statistically analyzed to assess the validity and reliability of the test battery and to evaluate its sensitivity and specificity. Statistically significant differences were found between the three groups with regard to all subtotal and total scores of the CAS test battery, and the test showed good validity and reliability. The constructed test battery for diagnosis of CAS is a reliable, valid, and sensitive tool that can be used to detect the presence of CAS in Arabic-speaking children and to differentiate it from phonological disorders.


Background
Childhood apraxia of speech (CAS) is a neurological childhood (pediatric) speech sound disorder in which the precision and consistency of movements underlying speech are impaired in the absence of neuromuscular deficits (e.g., abnormal reflexes and abnormal tone). CAS may occur as a result of a known neurological impairment, in association with complex neuro-behavioral disorders of known or unknown origin, or as an idiopathic neurogenic speech sound disorder. The core impairment in planning and/or programming spatiotemporal parameters of movement sequences results in errors in speech sound production and prosody [1].
The diagnostic criteria used to identify CAS have been at the center of controversy for decades. Forrest [2] stated that the most common speech behaviors proposed to characterize CAS were inconsistent productions, difficulty with oral motor skills, difficulty with imitation of sounds, poor sound sequencing, increasing difficulty with increased utterance length, struggle, and groping [3]. Both receptive and expressive language deficits were frequently reported in CAS, with receptive skills often being superior to expressive skills [4]. Academic difficulties including reading (decoding and comprehension) and spelling difficulties were also noted. The complex of behavioral features associated with CAS leads to severely reduced speech intelligibility even to family members [5].
It is important to differentiate between language disorders and CAS to avoid misdiagnosis. A differential diagnosis of CAS is often not possible for children under the age of 2 years. Even when children are between 2 and 3 years of age, a clear diagnosis cannot always be made because, at this age, children may still be unable to focus on, or cooperate with, diagnostic testing [6]. Davis and Velleman [7] stated that the child's speech inventory should be analyzed for limited consonant and vowel inventory, flat or monotone vocal quality, and/or the lack of consistent speech patterns. Further assessment of apraxia can include articulation and phonological performance on standardized tests. Non-speech motor tasks assess the child's oral structure, function, and movements. A language assessment should also be completed to assess the child's level of comprehension and expression, in addition to the intelligibility of their speech in conversation.
Many tests have been designed for diagnosis of CAS in English-speaking children, such as the Kaufman Speech Praxis Test for Children (KSPT) [8], The Apraxia Profile [9], the Quick Assessment for Apraxia of Speech [10], and the Screening Test for Developmental Apraxia of Speech-Second Edition (STDAS-2) [11].
To the best of the authors' knowledge, there is no published test battery to diagnose childhood apraxia in Arabic-speaking children.
The aim of this work was to develop an Arabic screening test for CAS in order to identify its possible presence in Arabic-speaking children, thus allowing the planning of appropriate therapy programs for these children.

Subjects
This cross-sectional descriptive study was conducted on 70 monolingual Arabic-speaking Egyptian children aged 4-16 years, who were divided into three groups. Group I consisted of 10 children with suspected CAS, identified through analysis of speech output obtained from a spontaneous speech sample and an articulation test. Children in this group were selected because they displayed some or all of the characteristics suggested by Davis et al. [12]: markedly reduced speech intelligibility, limited phonemic repertoire, predominant use of simple syllable shapes, high incidence of vowel errors, increased errors on longer units of speech output, inconsistent articulation errors, frequent omission errors, and prosodic abnormalities. Group II consisted of 20 children with phonological disorders, who showed persistent phonological processes (whether developmental or non-developmental) but did not show signs suggestive of CAS based on clinical judgment. Group III consisted of 40 typically developing children who displayed age-appropriate speech and language skills.
Only children with an IQ ≥ 90, with mean length of utterance of at least two words, were included in the study. Children with other developmental disorders, hearing or visual impairment, evidence of dysarthria, or history of previous language or speech therapy were excluded from the study.
Children in groups I and II were recruited from children attending the Phoniatric outpatient clinic. Children in group III were recruited from kindergartens and schools or from patients' relatives who agreed to participate in this study.

Methods
The Arabic test battery for childhood apraxia of speech passed through the following stages, namely, design stage, pilot study, test application stage, and data analysis stage.

Design stage
The Arabic test battery for CAS consisted of 4 test items: receptive-expressive discrepancy, consistency of speech productions, assessment of speech and nonspeech motor tasks of the articulators, and assessment of prosody.
Receptive-expressive discrepancy and consistency of speech productions were assessed using the Preschool Language Scale-4 "Arabic Version" [13] and the Mansoura Arabic Articulation Test (MAAT) [14], respectively. Psychometric evaluation using the Stanford-Binet Intelligence Scale "4th Arabic Version" [15] was done to determine IQ. Only children with an IQ of 90 or above participated in the study.
A specially designed test battery for diagnosing CAS was constructed, taking into consideration the social, cultural, and linguistic appropriateness of each proposed item to Egyptian Arabic culture and society. The test was evaluated twice by three independent, experienced phoniatricians: once before it was presented to the pilot study group, and again after it was modified as suggested by the pilot study. The final form of the test, incorporating the experts' opinions, was presented to the study group. The following is a description of the test items (see Additional file 1):
(1) Receptive-expressive discrepancy (RED)
After determination of receptive language age (R age) and expressive language age (E age) using the Preschool Language Scale-4 "Arabic Version" [13], RED was calculated by subtracting receptive age from expressive age (E age - R age). Based on calculation of cutoff scores between group II and group III, each child's score on RED was interpreted as either 0% if RED was high (a discrepancy above 6 months) or 100% if RED was in the normal range (a discrepancy of 0-6 months).
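The RED scoring rule can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' scoring software: the function name `red_score`, the parameter names, and the use of months as the unit are assumptions. The sketch also assumes that the size of the gap is what matters and therefore takes the absolute value of E age minus R age, which the text does not state explicitly.

```python
def red_score(receptive_age_months, expressive_age_months, cutoff_months=6.0):
    """Return the RED subtotal as a percentage (0 or 100).

    Hypothetical helper: ages are assumed to be expressed in months.
    """
    # Assumption: only the magnitude of the discrepancy is scored,
    # so the sign of (E age - R age) is discarded.
    discrepancy = abs(expressive_age_months - receptive_age_months)
    # A discrepancy above the cutoff scores 0%; 0-6 months scores 100%.
    return 0 if discrepancy > cutoff_months else 100
```

For example, under these assumptions a child with a receptive age of 60 months and an expressive age of 50 months would receive 0%, since the 10-month gap exceeds the 6-month cutoff.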
(2) Consistency of speech productions
Twenty-five words of the Mansoura Arabic Articulation Test (MAAT) [14] were chosen for testing consistency of speech production. These were the words presenting the sounds in word-initial position. The child was asked to name the pictures 3 times, with a 10-min interval between each naming or with the namings separated by another activity. A score of 0%, 50%, or 100% was given based on the following criteria:
0% = inconsistent: if ≥ 40% of the words were inconsistent (≥ 10 words).
50% = borderline: if 10% to < 40% of the words were inconsistent.
100% = consistent: if < 10% of the words were inconsistent (< 3 words).
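The consistency criteria can be expressed as a small scoring function. The sketch below is illustrative only: the name `consistency_score` and the input layout (one group of three transcriptions per word) are assumptions, as is the rule that a word counts as inconsistent when its three transcriptions are not all identical.

```python
def consistency_score(productions):
    """Return the consistency subtotal as a percentage (0, 50, or 100).

    productions: a list with one entry per word, each entry holding the
    child's three transcribed productions of that word (hypothetical layout).
    """
    total_words = len(productions)
    # Assumption: a word is inconsistent if its productions are not all identical.
    inconsistent = sum(1 for trials in productions if len(set(trials)) > 1)
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    if inconsistent * 100 >= 40 * total_words:
        return 0      # inconsistent: at least 40% of the words
    if inconsistent * 100 >= 10 * total_words:
        return 50     # borderline: 10% to less than 40% of the words
    return 100        # consistent: fewer than 10% of the words
```

With the 25-word list described above, 10 or more inconsistent words give 0%, 3 to 9 give 50%, and 0 to 2 give 100%.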
(3) Assessment of speech and non-speech motor tasks of the articulators
Non-speech motor tasks were assessed by asking the child to carry out some orders (6 single and 2 sequenced orders) on command and on imitation. A score of 0-II was given based on the following criteria: 0 = if the child failed to carry out the order on both command and imitation, I = if the child failed to carry out the order on command but could carry it out on imitation, and II = if the child carried out the order on command. Then, a total score was calculated as a percentage (total score = sum of item scores × 100/16).
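As a rough sketch of the non-speech motor scoring, the functions below map each order to a score of 0, 1, or 2 (standing for 0, I, and II) and convert the eight item scores to a percentage out of 16. The function names and the boolean inputs are hypothetical and chosen only for illustration.

```python
def nonspeech_item_score(on_command, on_imitation):
    """Score one oral non-speech order as 0, 1, or 2 (0, I, or II)."""
    if on_command:
        return 2              # II: carried out on command
    if on_imitation:
        return 1              # I: failed on command, succeeded on imitation
    return 0                  # 0: failed on both command and imitation


def nonspeech_total_percent(item_scores):
    """Convert the 8 item scores (6 single + 2 sequenced orders) to a percentage."""
    if len(item_scores) != 8:
        raise ValueError("expected scores for 8 orders")
    # Maximum raw score is 8 orders x 2 points = 16.
    return sum(item_scores) * 100 / 16
```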
For assessment of speech motor tasks, the child was asked to perform each of twenty-four motor speech tasks twice, once immediately and once after a 2-s delay. The tasks included six isolated sounds (3 vowels and 3 consonants), three CV syllables, four CVC syllables, two clusters (CVCC), two disyllabic words, two trisyllabic words, two phrases, two sentences, and counting (automatic). For immediate imitation, a score of 0-II was given based on the child's performance as follows: 0 = if the child could not perform the task, I = if the child produced the task immediately but with cueing, and II = if the child produced the task immediately without cueing. Cueing means helping the child to perform the task by slowing the movement with a visual and/or tactile cue. Delayed imitation was given a score of 0 or II as follows: 0 = if the child could not perform the task and II = if the child could perform the task.
Accordingly, each task was given a subtotal score of 0-IV by summing the scores of immediate and delayed imitation. Then, a total score was calculated as a percentage (total score = sum of task subtotals × 100/96).
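The speech motor scoring described above can be sketched as follows. All function and parameter names are hypothetical; the sketch uses integers 0-2 for the scores 0-II and assumes one (immediate, delayed) pair of scores per task.

```python
def immediate_score(performed, needed_cue):
    """Immediate imitation: 0 = failed, 1 = produced with cueing, 2 = without cueing."""
    if not performed:
        return 0
    return 1 if needed_cue else 2


def delayed_score(performed):
    """Delayed imitation (after the 2-s delay): 0 = failed, 2 = performed."""
    return 2 if performed else 0


def speech_motor_total_percent(task_scores):
    """task_scores: 24 (immediate, delayed) pairs, one per speech motor task."""
    if len(task_scores) != 24:
        raise ValueError("expected scores for 24 tasks")
    # Each task subtotal is 0-4, so the maximum raw score is 24 x 4 = 96.
    raw = sum(immediate + delayed for immediate, delayed in task_scores)
    return raw * 100 / 96
```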
Diadochokinesis (DDK) was tested by asking the child to repeat |pa|, |ta|, |ka|, and |pa ta ka|, each as fast as he/she could for 6 s (measured with a timer). The task was audio-recorded and scored according to both frequency and accuracy. Frequency refers to the number of repetitions the child could execute in 6 s. The reference values, as calculated from group III children, were 15-20 repetitions for |pa|, |ta|, or |ka| and 6-12 repetitions for |pa ta ka|. Accordingly, each of the four repetition tasks was given a score of I if the child's frequency was in the normal range and 0 if it was below the normal range. Accuracy refers to the ability of the child to produce the sequence |pa ta ka| correctly. A correct sequence scored I, and an incorrect sequence scored 0. Then, a total score was calculated as a percentage (total score = sum of item scores × 100/5).
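A minimal sketch of the diadochokinesis scoring is given below. The names are hypothetical, and one point needs a loud caveat: the text only says that a frequency below the normal range scores 0, so the sketch assumes that any count at or above the lower reference bound scores 1, including counts above the upper bound.

```python
# Reference ranges (repetitions in 6 s) reported for group III children.
SINGLE_SYLLABLE_RANGE = (15, 20)   # |pa|, |ta|, or |ka|
SEQUENCE_RANGE = (6, 12)           # |pa ta ka|


def frequency_point(repetitions, normal_range):
    """Return 1 if the repetition count is not below the normal range, else 0."""
    lower_bound, _upper_bound = normal_range
    # Assumption: counts above the upper bound are not penalised.
    return 1 if repetitions >= lower_bound else 0


def ddk_total_percent(pa, ta, ka, pataka, sequence_correct):
    """Combine four frequency points and one accuracy point into a percentage."""
    points = sum(frequency_point(n, SINGLE_SYLLABLE_RANGE) for n in (pa, ta, ka))
    points += frequency_point(pataka, SEQUENCE_RANGE)
    points += 1 if sequence_correct else 0   # accuracy of |pa ta ka|
    return points * 100 / 5
```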

(4) Assessment of prosody
The child was shown five pictures and asked to comment on each picture with appropriate prosody. Two practice trials were given first for illustration. A score of 0-III was given as follows: 0 = if the child produced incorrect prosody even with a prompt (imitation), I = if the child produced borderline prosody with a prompt, II = if the child produced correct prosody with a prompt, and III = if the child produced correct prosody spontaneously without a prompt. Then, a total score was calculated as a percentage (total score = sum of picture scores × 100/15).
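The prosody rubric can be sketched in the same way. The function names and the string labels for the prompted response are hypothetical; integers 0-3 stand for the scores 0-III.

```python
def prosody_item_score(spontaneous_correct, prompted_quality="incorrect"):
    """Score one picture as 0-3 (0-III).

    spontaneous_correct: True if prosody was correct without any prompt.
    prompted_quality: 'correct', 'borderline', or 'incorrect' after the prompt
    (hypothetical labels; only used when the spontaneous attempt was not correct).
    """
    if spontaneous_correct:
        return 3
    return {"correct": 2, "borderline": 1, "incorrect": 0}[prompted_quality]


def prosody_total_percent(picture_scores):
    """Convert the 5 picture scores to a percentage (maximum raw score 15)."""
    if len(picture_scores) != 5:
        raise ValueError("expected scores for 5 pictures")
    return sum(picture_scores) * 100 / 15
```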
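The total score of the CAS test battery was calculated by summing the subtotal scores, expressed as percentages (%), and dividing by 6, the number of subtotals (RED, consistency of speech productions, non-speech motor tasks, speech motor tasks, diadochokinesis, and prosody). The rationale for converting scores into percentages was to equalize the statistical weight of the individual items, as was statistically recommended.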
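The total score is therefore an unweighted mean of the percentage subtotals. The sketch below assumes that the six subtotals are RED, consistency, non-speech motor tasks, speech motor tasks, diadochokinesis, and prosody, which is inferred from the item descriptions and the divisor of 6; the dictionary keys and the function name are hypothetical.

```python
EXPECTED_SUBTOTALS = {
    "red", "consistency", "nonspeech_motor", "speech_motor", "ddk", "prosody",
}


def cas_total_percent(subtotals):
    """Average six percentage subtotals into the total battery score.

    subtotals: dict mapping each hypothetical key above to a 0-100 percentage.
    """
    if set(subtotals) != EXPECTED_SUBTOTALS:
        raise ValueError("expected exactly the six subtotal keys")
    # Every subtotal is already a percentage, so each carries equal weight.
    return sum(subtotals.values()) / 6
```

Because every subtotal is first converted to a percentage, a 5-point task such as diadochokinesis contributes as much to the total as the 96-point speech motor section.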

Pilot study and test application stages
The Arabic test battery for childhood apraxia of speech (CAS) was initially administered to 15 typically developing children to ensure that the test items were understood. In testing consistency of speech production, the word /jas.mi:n/, which was initially chosen as representative of the sound /j/, was found to be difficult for most children to identify, so the picture of the word /taj.ja:.ra./ (pilot) was used instead. The test was then administered to the three study groups.

Statistical analysis
The results were collected, tabulated, and analyzed using the SPSS statistical package, Version 15. Qualitative data were presented as numbers and percentages. Comparison between groups was done by the chi-square test. Spearman's correlation coefficient was used to test correlation between variables and to estimate test reliability. P < 0.05 was considered statistically significant. Reliability was scaled as follows: 0-0.25 weak reliability, 0.25-0.75 moderate reliability, 0.75-1 strong reliability, and 1 optimum. The receiver operating characteristic (ROC) curve, with the area under the curve (AUC) and its statistical significance, was used as the indicator for the total score. Identification of the cutoff value for diagnosis was

Results
The results were arranged into:
1. Descriptive data.
2. Reliability and validity of the test battery.
3. Sensitivity and specificity of the test battery.

Descriptive data
This study was conducted on a sample of 70 children divided into 3 groups: Group I (CAS group) was composed of 10 children including 7 males and 3 females in the age range 4-16 years (mean 7.79 ± 3.92). Group II (phonological disorders group) was composed of 20 children including 16 males and 4 females in the age range 4-8 years (mean 4.85 ± 0.68). Group III (normal group) was composed of 40 children including 24 males and 16 females in the age range 4-8 years (mean 6.19 ± 1.04). The 3 groups were matched for age and sex (P-value > 0.05) (Table 1).

Reliability and validity of the test battery
Reliability was tested by test-retest and internal consistency reliability. For test-retest reliability, group III children were required to respond to the final form of the CAS test battery twice, with a 10-day interval. The results indicated excellent reliability of the test (Table 2).
The internal consistency reliability was analyzed using the reliability coefficient alpha (Cronbach's alpha) test. Values of alpha are considered excellent when α ≥ 0.9, good when 0.8 ≤ α < 0.9, and acceptable when 0.7 ≤ α < 0.8. The alpha value was good for groups I and III and acceptable for group II (Table 3).
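For readers unfamiliar with Cronbach's alpha, the sketch below shows the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), together with the interpretation bands quoted above. It is not the SPSS procedure used in the study: the function names are hypothetical, the input is assumed to be one row of subtotal scores per child, sample variances are used, and the label returned for values below 0.7 is an assumption because the text does not name that band.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per child, one column per item."""
    k = len(scores[0])                                   # number of items
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def interpret_alpha(alpha):
    """Map alpha onto the bands used in the text."""
    if alpha >= 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.7:
        return "acceptable"
    return "below acceptable"   # assumed label; not defined in the text
```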
Validity of CAS test battery was measured by content validity, internal consistency validity, and contrasted group validity. The CAS test battery proved to have excellent content validity. Internal consistency validity is a measure of test homogeneity by correlating each section subtotal with the total score. The CAS test battery proved to have good internal consistency as there were significant positive correlations between each subtotal score and the total score (Table 4).
Regarding contrasted group validity, a statistically significant difference was found between group I, group II, and group III with regard to all subtotal and total scores of the CAS test battery (P < 0.001). Group I (apraxia group) recorded the lowest scores while group III (normal group) recorded the highest scores (Table 5).

Sensitivity and specificity of the test battery
Using the ROC curve and AUC to identify cutoff values, highly statistically significant values were obtained when differentiating group I from group II and group I from group III, as indicated in Tables 6 and 7, respectively. Accordingly, a child with a score above 91.8% is considered to be in the normal range, a child with a score between 60% and 91.8% is considered to be in the phonological disorders range, and a child with a score below 60% is considered to be in the apraxic range (Table 8).
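The cutoff rule and the definitions of sensitivity and specificity can be sketched as follows. The function names are hypothetical, and the handling of scores that fall exactly on a cutoff (91.8% or 60%) is an assumption, since the text does not specify it. The second function simply counts true and false positives and negatives for a given cutoff; it does not reproduce the ROC analysis itself.

```python
def classify_total_score(total_percent):
    """Assign a total battery score to one of the three reported ranges."""
    if total_percent > 91.8:
        return "normal range"
    # Assumption: scores exactly at 91.8 or 60 fall in the middle band.
    if total_percent >= 60:
        return "phonological disorders range"
    return "apraxic range"


def sensitivity_specificity(records, cutoff=60.0):
    """Compute (sensitivity, specificity) for detecting CAS at a given cutoff.

    records: iterable of (total_percent, has_cas) pairs containing at least
    one child with CAS and one without. A score below the cutoff counts as
    a positive test result.
    """
    tp = fn = tn = fp = 0
    for total_percent, has_cas in records:
        flagged = total_percent < cutoff
        if has_cas and flagged:
            tp += 1
        elif has_cas:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)
```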

Discussion
This study was conducted on a sample of 70 children in the age range 4-16 years. Children above 4 years were selected because, as stated by ASHA [1], some primary characteristics of CAS (e.g., word inconsistency and a predominant error pattern) exist in the emerging speech of typically developing children under the age of 4 years. Also, below this age the speech sample may be too small to support a more definitive diagnosis. Only children with average intelligence were included because mental subnormality might affect children's performance on the different test items.
The Arabic Test battery for CAS consisted of 4 items including receptive-expressive language discrepancy, consistency of speech productions, assessment of speech and non-speech motor tasks of the articulators, and assessment of prosody. These items were selected depending on the main published symptoms of CAS and also based on reviewing tests for assessment of CAS in English-speaking children such as Screening Test for Developmental Apraxia of Speech-Second Edition (STDAS-2) [11], Apraxia Profile test [9], and Nuffield Centre Dyspraxia Program-3rd Edition (NDP3) [16].
In the present study, receptive-expressive language discrepancy was evident in the apraxic group of children. This finding extends the findings of a previous study [17], which demonstrated that children suspected to have CAS typically also have significant language deficits. A research challenge was to determine how such constraints are associated with the praxis deficit in planning and programming that defines CAS. One possibility is that language impairment is a consequence of having any type of disorder affecting neurological development [18]. Another possibility is that all expressive language deficits in children with CAS are due to their speech involvements [19].
Inconsistent articulation errors on repeated productions of the same word or utterance are considered one of the key features of CAS. When producing the same utterance on different occasions, a person with CAS may have difficulty using and maintaining the same articulation that was previously used for that utterance [20]. Previous research on preschool-aged children shows that phonemic inconsistency of speech sounds across multiple opportunities within and across word positions differentiates children with suspected CAS from those with phonological disorder [3,21]. The present study supports that view, as children in the apraxic group showed statistically significant impairment in their performance on the consistency subtest when compared with the other groups.
Motor non-speech examination is critical for differentiating CAS from childhood dysarthria and other speech sound disorders and for identifying both oral apraxia and apraxia of speech either of which may occur in the absence of the other. Differential performance on the pairs of tasks and across tasks of varying complexity may indicate motoric difficulty with speech [22]. In the Arabic test battery for CAS, the child was asked to perform some oral non-speech movements both on command and imitation. No statistically significant difference was found between the 3 groups in their scoring on this subtest which indicated that there was no associated oral apraxia with CAS in the study sample.
The speech motor items were graded in complexity from isolated sounds to sentences. Cues (e.g., slowed rate, visual, or tactile cue) were provided to better judge the speech production and to determine how much cueing was necessary to facilitate performance. A statistically significant difference was found in performance on speech motor tasks between the apraxic group and the two other groups, with the lowest scores in the apraxic group. Caruso [23] and Forrest [2] stated that children with CAS always have delayed speech motor activity that affects programming more than in those who suffer from phonological disorders. Diadochokinesis (DDK) was assessed by measuring the rate and the accuracy of the child's repetitions of C-V syllables. The score of the apraxic group was the lowest among the three groups, which agrees with previous studies [4,24,25]. All of these studies indicated that children with CAS show more inaccuracy, inconsistency, and misordering in DDK production than children with phonological disorders.
Inappropriate prosody, especially in the realization of lexical or phrasal stress, is considered one of the core features of CAS [1]. The speech motor impairment of CAS appears to interfere with the development of the fine, rapid control of articulatory muscles that is required for expression of subtle lexical stress contrasts across syllables [26,27]. Statistically significant differences were found between the scores of the studied groups on the prosodic assessment subtest, with the lowest scores in the apraxic group.
Results indicated that both the reliability and the validity of the CAS test battery were good to excellent. Also, sensitivity and specificity were both 100%, which supports the value of the test battery in differentiating between normal children and those with CAS or phonological disorders.

Conclusion
The designed Arabic test battery for diagnosis of childhood apraxia of speech is a valid and reliable tool to detect the presence of CAS in Arabic-speaking children and differentiate between it and phonological disorders.

Additional file 1. Arabic Test battery for diagnosis of Childhood Apraxia of Speech in Arabic speakers
Abbreviations
CAS: Childhood apraxia of speech; RED: Receptive-expressive discrepancy; CV: Consonant Vowel; CVC: Consonant Vowel Consonant; CVCC: Consonant Vowel Consonant Consonant; ASHA: American Speech-Language-Hearing Association; DDK: Diadochokinesis

Authors' contributions
DA shared in designing the research, conducted the clinical work, and contributed to data analysis and writing the paper. OA shared in designing the research, contributed to data analysis and to writing and reviewing the paper, and submitted the paper for publication. HB shared in designing the research and contributed to data analysis and reviewing the paper. TA shared in designing the research and contributed to data analysis and reviewing the paper. All authors read and approved the final manuscript.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Availability of data and materials
All data generated or analyzed during this study are included in this published article.

Ethics approval and consent to participate
Parents of the children have given their written informed consent to participate in the study. The study protocol has been approved by the IRB