Translation, cultural adaptation, and validation of an Arabic version of the Test of Narrative Language-Second Edition

Background The significance of narrative skills is evident from their role in language development and their connection to important social and academic skills. This study aimed to translate, adapt, and validate the Test of Narrative Language-Second Edition (TNL-2) for use as a tool for assessing narrative language in Arabic-speaking Egyptian children. Methods In a cross-sectional study design, the Arabic-translated version of the TNL-2 was administered to 200 typically developing Arabic-speaking Egyptian children ranging in age from 4 years to 15 years and 11 months for validation. The participants were categorized according to their age into ten groups and their scores were analyzed. Face validity was assessed by asking five expert phoniatricians to review the Arabic version of the TNL-2 and complete a questionnaire that assessed the test’s effectiveness in measuring different narrative skills. Results A statistically significant difference was found when comparing the TNL-2 scores among the age groups under study. In addition, there was a significant correlation between standardized Arabic language test scores and the total comprehension and total production subtests’ raw scores of the TNL-2. Test-retest reliability and inter-rater agreement were both high. The experts reached a consensus that the Arabic version of the TNL-2 is capable of evaluating the primary microstructural and macrostructural components of Arabic narratives. Furthermore, it can provide insights into the overall narrative skills of Egyptian Arabic-speaking children. Conclusion The Arabic-translated version of the TNL-2 demonstrated validity and reliability as an instrument for assessing narrative language comprehension and production skills in Arabic-speaking Egyptian children.


Background
Narrative ability refers to the ability to produce a fictional or factual account of meaningful, chronologically sequenced occurrences and experiences [1]. Narrative skills play a crucial role in language development and are closely connected to critical academic skills, namely, reading, comprehension, and writing [2][3][4][5]. Narratives are crucial for developing proficient social skills, as evidenced by the observation that children with delayed language development typically exhibit less proficient social communication skills [6].
Studies have shown that narrative competence increases with age alongside language, cognitive, and social skills [5,7,8]. Development of narrative abilities commences during the preschool period, typically around the age of 2, progresses throughout the school-age years, and continues through adolescence and even adulthood [5,7].
Language development is reflected in narrative skills, as the narrator must utilize age-appropriate linguistic abilities to communicate the primary narrative events, including the central theme. Furthermore, linguistic abilities are utilized to express the primary characters' affective states that motivate them to carry out specific actions [9].
In the early stages of development, narratives establish connections between the language used in various contexts, such as the language spoken at home and the language used in schools and literary contexts [10]. Several studies have demonstrated the role of narrative language in predicting academic skills, specifically reading comprehension, writing, and mathematics [2][3][4]. Likewise, children with poor reading and comprehension abilities have delayed narrative comprehension and production skills compared to their typically developing peers [11].
Delayed narrative skills are evident in various communication disorders, including developmental language disorder, hearing impairment, intellectual disabilities, and autism spectrum disorder [12,13].
The structural organization of a narrative consists of macrostructure and microstructure. The macrostructural level reflects story episodes based on Stein and Glenn's (1979) story grammar model [14]. The model analyzes a narrative into a number of temporally sequenced episodes. Episodes encompass a specific setting (person, place, and time), a beginning (a problem that motivates actions), an internal response of the characters (emotions), actions (attempts to solve the problem), and an ending (resolution). These elements comprise the primary components of the narrative, and traditional storytelling requires the incorporation of these pivotal episodes [14]. The microstructural level, on the other hand, refers to the utilization of productive and complex linguistic elements, including compound sentences, temporal and causal subordinate clauses, adverbial phrases, and adjectives [15]. By integrating all linguistic components and employing intricate microstructural elements, the story grammar elements (macrostructure) are clarified, meaning is communicated, and additional details are incorporated [16].

Assessment of narrative skills
Narrative abilities are assessed in terms of macrostructure and microstructure [17]. According to the existing literature, various tasks have been used to assess narrative language. Research has determined that both storytelling and retelling tasks are adequate for assessing narrative skills, as both methods have advantages that rely on the cognitive and linguistic abilities necessary for storytelling [18]. Story-retelling tasks focus on story structure, vocabulary acquisition, articulation, retrieval, and comprehension [18]. Retelling requires the narrator to possess a deep understanding of the original story to retell it accurately [15]. As a result, narratives serve as a means of coordinating sequencing, intricate language, pragmatic competence, and conceptual thinking [19]. Difficulties in retelling tasks arise from children's reluctance to generate the target vocabulary and from difficulty in scoring [20].
In contrast, storytelling tasks entail newly generated stories encompassing personal or fictional narratives [1]. Personal narratives are accounts of past experiences using characters and temporally coordinated events that might include problems and attempts to solve them [21]. A script is a relatively uncommon form of personal narrative in which an individual is expected to recount regularly recurring events based on multiple personal experiences, such as describing the typical way in which a person spends their holidays [22]. This discourse is characterized by its elaborative nature, as it centers around the narrator's personal experiences and recollections related to a specific subject rather than focusing on a specific occurrence [22].
Fictional narratives entail recounting a story constructed from fabricated events, which are not factual and are derived from the storyteller's imagination. Fictional narratives encompass untold stories, including those prompted by readily available stimuli, such as pictures [23].
Various assessment tools are available for English-speaking children. Examples include the Narrative Assessment Protocol by Bowles et al. [24], the Assessment of Story Comprehension by Spencer and Goldstein [25], the Monitoring Indicators of Scholarly Language by Gillam et al. [7], and the Test of Narrative Language by Gillam and Pearson (2004) [26].
The first version of the TNL [26] was designed to evaluate the narrative abilities of children between the ages of 5 years and 11 years 11 months in terms of their understanding and creation of narratives, utilizing both actual and fictional stories. This first version of the Test of Narrative Language (TNL) consists of two subtests: narrative comprehension and oral narration. Each subtest includes three tasks presented in three different formats: no picture, sequenced pictures, and a single picture. Narrative comprehension is evaluated by presenting a story in each of the formats; the child listens and then answers comprehension questions. These include literal questions, which require the child to recall information presented in the story about the main story elements (such as characters' names, setting, and main problem), and inferential questions, which evaluate the child's ability to make inferences beyond what was explicitly mentioned in the story. The child's answers are scored according to the examiner's manual provided [26]. The assessment of narrative production involves three tasks: retelling the first story with no picture, producing a story from a sequence of pictures, and producing a story from a single scene picture. Story production is scored for story content, macrostructure, and microstructure [26].
The authors published the second version of the TNL, the TNL-2, in 2017 [27]. This version is discussed further in the Methods section. The primary difference in format is that the McDonald's story is accompanied by a picture in the TNL-2, whereas in the first version it is presented without a picture. Furthermore, the age range evaluated is extended to encompass individuals between the ages of 4 years and 15 years 11 months.
TNL has been used in research to assess narrative skills in children with delayed language development, to assess the effects of narrative intervention, to correlate with other measures of language evaluation and working memory, and to predict academic performance in relation to narrative skills level [28][29][30][31].
Narratives can differ among cultures. Spanish-speaking children often emphasize the primary characters' internal responses [32]. Japanese children's narratives lean towards brief, concise stories, as they combine multiple experiences without much elaboration. In contrast, North American children often tell detailed narratives about a single event [33]. Despite cultural variations, certain story elements consistently stand out, including the introduction of main characters, setting, timeframe, and the existence of a problem requiring a solution [19].

Narrative research in Arabic
Narrative language development of children was previously assessed in several languages, including Arabic [34]. About 422 million people speak Arabic, including non-native speakers or those who speak Arabic as a second language. The largest number of native Arabic speakers live in Egypt, which has a population of over 100 million. Several Arabic dialects exist, with the Egyptian, Maghrebi, and Gulf being the most commonly spoken forms [35].
A special property of the Arabic language is diglossia, which refers to the use of two forms of the language by its speakers for different social situations: Colloquial Arabic (Spoken Arabic) and Modern Standard Arabic (MSA) [36]. Spoken Arabic is frequently utilized in daily interactions, while MSA is the formal variant utilized in educational contexts, writing, and formal events [37]. The spoken form is the first to be developed, as it is used in everyday contexts, and it is the variety in which narrative skills are first developed. The standard form is the formal variety used in literary and academic situations. It is usually first encountered by children in the academic context, in reading and writing, or earlier in the preschool years through their exposure to the media [37,38]. The two forms are similar in various respects [39]. However, the distance between the Spoken and the Standard forms of Arabic affects the phonological and lexical domains most [40].
The assessment of narrative language skills in Egyptian Arabic has not received much attention. To the best of our knowledge, only one study, by Safwat et al. [41], targeted the assessment of narrative skills in preschool children. The objective of the study was to create a battery for evaluating narrative language. This involved having children retell a story using a series of pictures without words and then analyzing both the microstructure and macrostructure of their narrative production. The study comprised a cohort of 60 children, ranging in age from 2 to 6 years, who were native speakers of Egyptian Arabic. The child's performance was evaluated based on the organization of the story, which included elements such as the introduction of the setting and topic, the chronological order of events, the use of references, and the coherence of the narrative. The study findings indicated that the initial component of narrative organization to emerge at the age of 2 years was the utilization of basic verbs to depict action and setting. The study examined various components of language structure, including adjectives and the utilization of prepositions, as well as sentence structure, such as the utilization of simple and compound sentences. The authors observed a rise in the complexity of the sentences produced as children grew older, including the utilization of verb tense and various noun forms [41]. Narrative productivity was assessed by calculating the total number of words, mean length of utterance, and type-token ratio, which refers to the number of different words in relation to the total number of words produced by the child. The mean length of utterance increased across the different age groups [41]. The study was limited in scope as it exclusively focused on preschool children, despite the fact that the literature indicates that narrative skills continue to progress throughout adolescence and even into adulthood [5,7]. Additionally, story comprehension was not assessed [41].
Another study by Khodeir et al. [42] reported on developing and standardizing a test for pragmatic skills in Egyptian Arabic. The test assessed various pragmatic aspects in children aged 4 years through 10 years, including narrative skills. Narrative comprehension was evaluated by assessing the child's understanding of the main story elements and the ability to answer questions about four stories. Narrative production was assessed by eliciting storytelling and retelling from pictures. The study reported a positive correlation between the children's scores and age. Nevertheless, the test did not focus on thoroughly evaluating narrative language proficiency, and the comprehension questions used were mainly literal.
Other studies investigating Arabic narrative production include a study by Ravid et al. [43], which examined the narrative skills of 97 monolingual Palestinian Arabic-speaking children using a story-retelling task. The study concluded that the length of narratives increased with age across seven different age groups. Interestingly, the authors reported the use of both the Standard Arabic and the Spoken Arabic forms even in younger preschool children, with an increase in the complexity of the lexical and morphosyntactic structure as the grade level increased.
A recent study by Kawar et al. [44] investigated the narrative skills of 30 monolingual Palestinian Arabic-speaking preschool children by comparing story comprehension and retelling abilities in both the Spoken and Modern Standard forms, focusing on microstructure and macrostructure. Unlike other studies, their findings indicated superior narrative comprehension in the MSA form (except for the theory of mind questions), while better production was observed in the Spoken Arabic form.
Asli-Badarneh et al. [45] conducted a study to assess the narrative abilities of 75 Arabic-speaking Canadian immigrants aged 7 to 12 years, using an Arabic translation of the Test of Narrative Language (TNL). The study aimed to investigate the impact of diglossia on their language skills. The study specifically examined the relationship between microstructure and macrostructure, focusing on the impact of the Standard Arabic lexicon on microstructural elements and its ability to predict macrostructure.
After reviewing the previous studies that focused on the effect of diglossia, it was concluded that children produce better narratives in the Spoken form than in MSA with regard to story length and morphosyntax, and that narrative comprehension is easier in Spoken Arabic [43][44][45]. Research shows that when oral language skills are examined, the Spoken variety is the one in which speakers are more proficient [46]. Therefore, the TNL-2 stories and the comprehension questions directed to the child were translated into the Spoken form of Egyptian Arabic. The instructions to the clinician and the scoring sheet were translated into MSA.
Based on this brief review of narrative research in Arabic, it is evident that there is a need for further studies focusing on Egyptian Arabic. Considering the significance of narrative language skills in language development and social interactions, a comprehensive assessment tool for narrative language is required. Due to the dearth of research targeting the assessment of narrative language skills of Arabic-speaking Egyptian children, this study focused on the translation, adaptation, and validation of the Test of Narrative Language-Second Edition (TNL-2) [27] for its use in assessing narrative language in Arabic-speaking Egyptian children. Our research question was: Do narrative language skills vary across different age groups, and is the translated version of the TNL-2 a valid and reliable tool to assess the overall narrative skills of an Egyptian Arabic-speaking child?

Methods
The study proceeded in the following steps:

Translation and adaptation of the TNL-2 [27]
The translation and cultural adaptation process was carried out in accordance with the principles of good practice [47], following these steps:
1-Preparation: Prior to translating the TNL-2, permission was obtained from the publisher to translate and culturally adapt the test and to administer it to the designated number of participants.
2-Forward translation and reconciliation: Forward translation was then carried out from the original language (English) to the target language (Arabic) by two independent bilingual certified translators; both were native speakers of the target language. The two forward-translated versions were then reconciled to resolve any discrepancies between them by an independent native speaker of Arabic who had not been involved in either of the forward translations. The translation was mainly intended to capture the concept rather than being a literal translation. Furthermore, the stories and the questions directed toward the child were translated into the spoken form of Egyptian Arabic, whereas the instructions directed toward the clinician and the scoring sheet were translated into MSA. A single forward-translated version was created.
3-Backward translation and review: A certified translator translated the agreed-upon forward-translated Arabic version back into the test's original language (English). The research team had the backward-translated version reviewed by two expert phoniatricians in the field of speech-language pathology to reach a final version and confirm the cultural appropriateness of the translated version.
Adaptations made to the tasks of the TNL-2 in the Egyptian Arabic translated version are shown in Table 1.

Face validity
Face validity was assessed by presenting the Arabic-translated version of the TNL-2 to five phoniatricians with at least 10 years of experience in child speech-language pathology and asking them to complete a questionnaire about the ability of the TNL-2 to assess different narrative comprehension and production skills. Each item was scored from 1 to 5, denoting a poor to excellent ability of the TNL-2 to assess the skill in question.

Pilot study
A pilot study was done on 20 participants to ensure the test's clarity and cultural appropriateness, the children's ability to understand the stories, and the examiner's ability to administer the test. Additionally, reliability and inter-rater agreement were investigated on the 20 participants (data for the pilot study are included as Supplementary material). The participants were selected from the relatives of the patients attending the outpatient clinic of the phoniatrics unit to assess the validity and reliability of the translated Arabic version of the TNL-2. The study was conducted from April 2022 to March 2023.

Table 1 Adaptations made to the tasks of the TNL-2 in the Egyptian Arabic translated version

Original version | Egyptian Arabic version | Rationale
On the weekend (Saturday/Sunday) | On the weekend (Friday/Saturday) | The days of the weekend were changed to match the weekend in Egypt (Friday and Saturday).
Monday | Sunday | The answer to the question "When did Maria take her ship to school?" was changed to Sunday, which is the first school day of the week in Egypt.
A | A or full mark | The grading system differs among Egyptian schools. Grading using numerical marks is more common in Egyptian national schools. During scoring of the comprehension questions, both "A" and a full mark such as 10/10 were counted as correct.

Late for school (Task 4)
No changes were made to adapt the story; it was found culturally appropriate.

Treasure (Task 5)
Erika and Michael | Rana and Karim | Common Egyptian first names.
Michael snuck up behind a large rock | Karim hid behind a large rock | The translation of the expression "snuck up" is not commonly used in Spoken Arabic.
Rolled their eyes | They found it strange and laughed | The literal translation of the expression "rolled their eyes" is not commonly used to show annoyance in Egyptian Arabic. We found it more appropriate that they expressed their disbelief by finding it strange and laughable.

Aliens (Task 6)
No changes were made to the original version.

Application of the
All participants were assessed by the following protocol of evaluation:
a. Elementary diagnostic procedures:
• History taking, which included personal data and history of delayed language development or other developmental disorders.
b. Clinical diagnostic procedures:
• Psychometric evaluation by the Stanford Binet Scale, 4th edition, to assess intelligence quotient and mental age in order to exclude intellectual disability. An Arabic validated version of the Stanford Binet scale was used in the study, and the scores obtained by the children were given according to Arabic norm-referenced measures [48].
• Standardized Arabic language test, as a screening tool for the children's linguistic skills and for exclusion of delayed language development [49]. The Arabic language test is formed of five subtests: semantics, expressive language, receptive language, pragmatics, and prosody. The child's score for each subtest and the total score were compared to the means for the child's age group to ensure all participants' linguistic skills were adequate for their age. The test was completed in about 20 min. The test was administered in the Spoken Arabic form.
• Finally, the Arabic-translated version of the TNL-2 was applied.

Description of the TNL-2
The TNL-2 is a measure of comprehension and production of connected speech used to tell stories. It assesses the ability of children from 4 years 0 months through 15 years 11 months to tell and comprehend three types of stories: scripts, personal narratives, and fictional narratives. The TNL-2 consists of six tasks organized into two subtests (comprehension and production). The comprehension and production tasks are presented alternately. Comprehension tasks include Task 1, Task 3, and Task 5, while production tasks comprise Task 2, Task 4, and Task 6.

Comprehension subtest
The comprehension subtest comprises task 1, "McDonald's story," task 3, "Shipwreck story," and task 5, "Treasure story." The stories are narrated to the child, who is required to listen carefully and answer comprehension questions that assess the ability to recall essential story elements and events (such as the names of the characters, time, and the main problem). The questions also assess the ability of the child to make inferences and non-literal interpretations about the story.

Production subtest
The production subtest is also composed of three tasks. The first production task is task 2, retelling the "McDonald's story." The second production task is task 4, "Late for school," in which the child is required to generate a story based on a sequence of five pictures. The third and final production task is task 6, "Aliens," in which the child is required to generate a fictional narrative based on a picture.

Test timing and scoring
After listening to each story, the child is asked the questions of the comprehension subtest. The child receives 1 point for every correct answer. Task 1 has a maximum score of 20, task 3 has a maximum score of 14, and task 5 has a maximum score of 13. The maximum total raw score for the comprehension subtest is 47.
The production subtest is evaluated by listening to a voice recording of the child's speech at least three times. Every production is evaluated based on its narrative content and grammatical elements. The child is assigned one point for every story element that is mentioned. The evaluated story grammar elements include the utilization of temporal relations, causal relations, accurate grammar, dialogue inclusion, sequencing, and complete episodes.
Task 2 has a maximum score of 31, task 4 has a maximum score of 25, and task 6 has a maximum score of 30. The maximum total production raw score is 86.
The results of the three comprehension tasks are combined to form a total comprehension raw score. Similarly, the results of the three production tasks form a production subtest raw score. Raw scores are converted to scaled scores and percentile ranks according to the child's age. Age equivalents can also be obtained. The comprehension and production scaled scores are combined to form a total scaled score from which a composite (narrative language ability index) can be obtained. Descriptive terms, ranging from very poor to very superior, are used to describe the scaled scores and the narrative language ability index.
A digital voice recorder was used to record the entire test during its application. The recordings were replayed later to fill in the scoring sheet as appropriate. Administration of the test required about 15-20 min. However, the scoring time varied from one participant to another, as the recordings of the production subtest had to be replayed at least three times to be scored appropriately.
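The raw-score arithmetic described above can be sketched in a few lines of Python. This is only an illustrative sketch: the task maxima are taken from the text, whereas the dictionary keys, the function name `subtest_total`, and the example scores are hypothetical and are not study data. The conversion of raw scores to scaled scores relies on the norm tables in the examiner's manual and is not reproduced here.

```python
# Maximum raw score per task, as stated in the text.
COMPREHENSION_MAX = {"task1_mcdonalds": 20, "task3_shipwreck": 14, "task5_treasure": 13}
PRODUCTION_MAX = {"task2_mcdonalds_retell": 31, "task4_late_for_school": 25, "task6_aliens": 30}


def subtest_total(task_scores, task_max):
    """Sum the task raw scores after checking that each lies within 0..max."""
    if set(task_scores) != set(task_max):
        raise ValueError("scores must be given for exactly the three tasks of the subtest")
    for task, score in task_scores.items():
        if not 0 <= score <= task_max[task]:
            raise ValueError(f"{task}: score {score} is outside 0..{task_max[task]}")
    return sum(task_scores.values())


# Hypothetical child, for illustration only (not study data).
comprehension = subtest_total(
    {"task1_mcdonalds": 15, "task3_shipwreck": 10, "task5_treasure": 9},
    COMPREHENSION_MAX,
)
production = subtest_total(
    {"task2_mcdonalds_retell": 20, "task4_late_for_school": 16, "task6_aliens": 18},
    PRODUCTION_MAX,
)
print(comprehension, production)  # 34 54
```

The range check mirrors the stated maxima, so a data-entry error such as a score of 21 on task 1 is rejected before any total is computed.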

Reliability testing
The 20 participants of the pilot study were reassessed after 2 weeks to obtain test-retest reliability. Additionally, two independent expert phoniatricians were asked to listen to the recordings of the 20 participants in the pilot study and score the participants separately to test for inter-rater agreement.

Statistical methodology
The data were analyzed with the Statistical Package for the Social Sciences (SPSS), version 25 (SPSS Inc, Chicago, IL). The Kolmogorov-Smirnov test of normality was significant for most variables, indicating non-normal distributions, so non-parametric statistics were adopted. Data were described using minimum, maximum, mean, standard deviation, standard error of the mean, 95% CI of the mean, median, 95% CI of the median, 25th-75th percentile, and inter-quartile range. Categorical variables were described using frequency and percentage. Comparisons between more than two independent, non-normally distributed subgroups were carried out using the Kruskal-Wallis test. Pearson's correlation was used to assess the relationships between scores. The intra-class correlation coefficient (ICC) was used to assess agreement and was interpreted according to the often-quoted guidelines of Cicchetti (1994) for kappa or ICC inter-rater agreement measures: less than 0.40 (poor), between 0.40 and 0.59 (fair), between 0.60 and 0.74 (good), and between 0.75 and 1.00 (excellent). For the sample size calculation, a beta error of up to 20% was accepted, corresponding to a study power of 80%. The alpha level was set to 5% (a 95% confidence level). Statistical significance was tested at p < .05.
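As an illustration of the agreement measure, the following Python sketch computes single-measure ICCs from a subjects-by-raters table and maps a coefficient onto the Cicchetti (1994) bands quoted above. The ICC model used in SPSS is not stated in the text, so the two-way consistency and absolute-agreement forms shown here are assumptions; the function names and the example ratings are hypothetical and are not study data.

```python
def icc_single(ratings):
    """Single-measure ICCs for an n-subjects x k-raters table of scores.

    Returns (consistency, absolute_agreement), both computed from the
    mean squares of a two-way ANOVA without replication.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    consistency = (msr - mse) / (msr + (k - 1) * mse)
    agreement = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    return consistency, agreement


def cicchetti_label(icc):
    """Descriptive band for an ICC or kappa value (Cicchetti, 1994)."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical ratings (not study data): rater 2 scores every child 2 points higher.
example = [[10, 12], [20, 22], [30, 32], [40, 42]]
consistency, agreement = icc_single(example)
print(cicchetti_label(consistency), cicchetti_label(agreement))
```

In this example the consistency form is perfect because the raters rank the children identically, while the absolute-agreement form is slightly lower because it penalizes the constant 2-point offset; which form is appropriate depends on whether systematic differences between raters or sessions matter.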

Results

Distribution of the participants according to sex, Stanford Binet scale, and Arabic language test results
Table 2 shows the distribution of the study participants in 10 age groups and their sex distribution. The results indicated that the children demonstrated at least average intelligence and overall general IQ on the Stanford Binet subtests. The participants' results on the standardized Arabic language test were within the range considered "adequate for their age" when compared to the normative data.

Face validity
The summary of the responses from the five expert phoniatricians is displayed in Table 3. All experts concurred that the TNL-2 is highly capable of providing a comprehensive understanding of a child's narrative abilities at a specific age. It also evaluates the child's capacity to produce the fundamental microstructural and macrostructural components of narratives.

Application of the TNL-2 results
Median scores for the comprehension subtests and the total raw score for comprehension are reported in Table 4. Median scores for the production subtests and production total raw score are reported in Table 5.
The Kruskal-Wallis test revealed a statistically significant difference in all assessed subtests of the TNL-2 across the different age groups. This difference was reflected in a statistically significant increase in raw scores for both the comprehension and production subtests across the age groups.
A statistically significant difference was found between the assessed age groups regarding the raw scores of the comprehension subtest. A statistically significant difference was found in the McDonald's story raw scores (p < 0.001), the Shipwreck story raw scores (p < 0.001), and the Treasure story raw scores (p < 0.001). The total comprehension raw score, obtained by combining the raw scores of the three previously mentioned stories, showed a statistically significant difference between the age groups (p < 0.001), as shown in Table 4.
A statistically significant difference was found between the assessed age groups regarding the raw scores of the production subtest. A statistically significant difference was found in the McDonald's story raw scores (p < 0.001), the Late for school story raw scores (p < 0.001), and the Aliens story raw scores (p < 0.001). The total production raw score, obtained by combining the raw scores of the three previously mentioned stories, showed a statistically significant difference between the age groups (p < 0.001; Table 5).
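For readers unfamiliar with the Kruskal-Wallis test used for these age-group comparisons, the sketch below computes its tie-corrected H statistic in plain Python. It is an illustration only: the function name and the example scores are hypothetical, the study's analysis was run in SPSS, and the p-value step (comparing H with a chi-square distribution on the number of groups minus one degrees of freedom, that is, 9 for ten age groups) is left to a statistics package such as `scipy.stats.kruskal`.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of score lists."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Assign every distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Correction factor for tied observations.
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical raw scores for three age groups (not study data).
younger, middle, older = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(round(kruskal_wallis_h([younger, middle, older]), 3))  # 7.2
```

A larger H indicates that the rank sums of the groups differ more than would be expected if all groups came from the same distribution.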

Correlation between the Arabic language test scores and the TNL-2 scores
Figure 1 shows a strong positive correlation between the TNL-2 comprehension total raw score and the Arabic language test total raw score across 200 measurement points (p < .0001).
Figure 2 shows a strong positive correlation between the TNL-2 production total raw score and the Arabic language test total raw score across 200 measurement points (p < .0001).
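The correlation coefficient behind Figures 1 and 2 is Pearson's r, which can be sketched in plain Python as follows. The function name and the paired scores are hypothetical and serve only to show the computation; they are not the study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired totals (not study data): language test vs. TNL-2 raw score.
language_test = [40, 55, 61, 70, 82]
tnl2_total = [12, 20, 22, 30, 35]
print(round(pearson_r(language_test, tnl2_total), 3))
```

Values close to +1 indicate that children with higher language test totals also tend to have higher TNL-2 raw scores.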

Test-retest reliability (data provided as Supplementary material)
A very high positive correlation was found between the total comprehension raw score of the TNL-2 at the test and the retest times of assessment (r = 0.977, p < .001). An excellent degree of reliability was found between the test and retest measurements of the total comprehension raw score. The single measure ICC was .970 with a 95% confidence interval from .908 to .989 (F(19,19) = 80.543, p < .001).
A strong positive correlation was found between the total production raw score of the TNL-2 at the initial assessment and at the subsequent retest (r = 0.981, p < .001). An excellent degree of reliability was found between the test and retest measurements of the total production raw score. The single measure ICC was .970 with a 95% confidence interval from .839 to .991 (F(19,19) = 106.179, p < .001).

Inter-rater agreement (data provided as Supplementary material)
There was an excellent degree of inter-rater agreement for the total comprehension raw score of the TNL-2. The single measure ICC was .994 with a 95% confidence interval from .986 to .998 (F(19,19) = 332.191, p < .001).
In addition, there was an excellent degree of inter-rater agreement for the total production raw score of the TNL-2. The single measure ICC was .993 with a 95% confidence interval from .991 to .999 (F(19,19) = 270.191, p < .001).
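The reported inter-rater F ratios and ICC values can be related by a short arithmetic check. For a consistency-type single-measure ICC with k raters, ICC = (F - 1) / (F + k - 1), where F is the ratio of the between-subject mean square to the residual mean square. The sketch below applies this identity assuming two raters and a consistency-type ICC; neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
def consistency_icc_from_f(f_value, k=2):
    """Consistency-type single-measure ICC implied by an F ratio and k raters."""
    return (f_value - 1) / (f_value + k - 1)


# F ratios as reported for the inter-rater analysis.
reported_f = {"comprehension": 332.191, "production": 270.191}

for label, f_value in reported_f.items():
    print(label, round(consistency_icc_from_f(f_value), 3))
# comprehension 0.994
# production 0.993
```

Under these assumptions the implied values agree with the reported ICCs of .994 and .993.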

Discussion
Narratives are regarded as a significant measure of language development and a means of structuring language comprehension, abstract reasoning, and sequencing of events [19]. Moreover, narratives are linked to social, literacy, and academic skills development [2][3][4][5][6].
To our knowledge, no currently available Egyptian Arabic tool for assessing narratives addresses the full range of narrative abilities and the broad age range assessed by the TNL-2. The primary objective of this study was to create an Egyptian Arabic version of the TNL-2 that can be used to evaluate the progress of Arabic language skills, particularly narrative development. The study was designed in view of the lack of literature examining narrative skills in Egyptian Arabic-speaking children.
Several studies have used the TNL to assess the narrative skills of children with delayed language development, to evaluate the effectiveness of narrative interventions, and to correlate the performance of children in narrative tasks to different academic skills such as reading [28][29][30][31].
The TNL [26] has been translated, culturally adapted, and validated in other languages, such as Portuguese, to assess children's narrative skills. The authors of that adaptation concluded that the TNL can differentiate between age groups in terms of their narrative skills [50]. Additionally, an Arabic version was used to assess Arabic microstructure in Arabic-speaking children in Canada with respect to diglossia. The study found a significant relationship between microstructure and story grammar elements, with evidence of the role of the standard Arabic lexicon in predicting macrostructural elements [45].
The TNL-2 was specifically chosen in the current study for translation and adaptation into Egyptian Arabic for several reasons. First, the test assesses the main narrative dimensions: macrostructure and microstructure. Both fundamental aspects are represented in the TNL-2, as the narrative production tasks are scored on story content and complexity. Children's narrative productions are scored on semantic and morphosyntactic elements, including conjunctions, temporal relations, correct grammar, story grammar elements, and the production of complete narratives [27]. These aspects are reported in the literature as the most critical linguistic indicators of narrative language [51].
Furthermore, the TNL-2 assesses both comprehension and production [27]. Comprehension tasks include literal and inferential questions that tap into the children's cognitive and pragmatic abilities [52]. Additionally, we aimed to validate the test so that it can later be used to assess narrative skills in children with normal and disordered language skills. Some language disorders, such as specific language disorders, are known to have a discrepancy between receptive and expressive language skills [26]. Incorporating both comprehension and production tasks in the TNL-2 would be advantageous for capturing these distinctions and facilitating the diagnosis and monitoring of intervention programs.
Furthermore, the TNL-2 assesses narrative comprehension and production across a wide age range, with normative data for children aged 4 years through 15 years and 11 months [27]. The assessment of narrative skills encompasses various formats, including story retelling, picture sequencing, and the interpretation of a single picture as a script, personal narrative, or fictional narrative. This approach enables the examination of children's narrative abilities through a diverse range of tasks [27]. The test does not evaluate the spontaneous production of narratives, known as open narrative assessment methods [53].
In contrast to structured methods, such as story retelling or using pictures to elicit narratives, open methods require the child to produce a spontaneous account of a familiar situation, which draws on memory skills, linguistic competence, and cognitive maturation. In that case, the examiner has no control over the narrative's subject, which makes it difficult to standardize assessment instruments and to draw comparisons among subjects [53,54]. The TNL-2 fills a gap in the evaluation of narratives that previously used tools for assessing narratives in Egyptian Arabic left open. One study by Safwat et al. targeted the assessment of narrative skills in preschool Arabic-speaking Egyptian children. Its limitations were that only preschool children were assessed and that narrative comprehension was not included [41].
Another study by Khodeir et al. (2017) assessed various pragmatic aspects, including narrative skills, in children aged 4 through 10 years. Although a broader age range was assessed and comprehension skills were evaluated, the questions were mainly literal, targeting the main story elements without evaluating the ability to make inferences [42].
In the present study, story comprehension was assessed by having the child listen to three stories, each followed by comprehension questions the child was required to answer. The results showed a statistically significant increase in the comprehension scores of the three stories across the age groups. This finding can be attributed to the ability of children to develop story comprehension skills with age, in parallel with receptive language and cognitive skills [55]. Our finding is also supported by the strong correlation between the Arabic language test scores, which comprise receptive and expressive components, and the total comprehension raw score of the TNL-2. Comprehension questions for each story were a mixture of literal and inferential questions. The ability to make inferences continues to develop with pragmatic language development as the child matures, and the ability of children to make implications starts as early as 4 or 5 years old and continues to develop with age [56]. Earlier studies have reported that making inferences is a late-acquired ability observed by the age of 8 years [57]. The development of these inferential skills enables children to answer more comprehension questions correctly. Therefore, increased comprehension scores in the current study were observed with increasing age.
A statistically significant increase in the scores of the production subtests was found across the age groups. Narrative production was assessed by retelling a story script while looking at the appropriate picture, producing a personal narrative based on a sequence of five pictures of familiar events, and producing a fictional narrative while looking at a picture. The scores for the production subtests were based on the number of correct elements produced and on the use of specific microstructural elements specified in the scoring sheet. The lower scores obtained in the younger age groups are explained by the younger children producing fewer story elements [58], with less use of temporal and causal relations [59], as observed in the sample included in the current study. Additionally, the use of correct morphosyntax, complete episodes, and dialogue was noticed in the narratives of older children. Because these elements were scored in the test, children in the younger age groups scored lower than the older ones. Consequently, significant differences were found among the groups. This finding agrees with Safwat et al. (2013), who reported that the use of references increased with age. The study also reported an increase with age in the complexity of the sentences produced, such as the use of verb tense and different noun forms [41].
These findings were also supported in the current study by a strong positive correlation between the total scores of the Arabic language test and the total production raw scores of the TNL-2, showing that language and narrative language skills continue to develop with age. However, it should be taken into consideration that the Arabic language test (ALT) used in the current study is a screening tool for children from 2 to 8 years, which is the age range of language development save for the more complex linguistic pragmatic skills that continue to develop throughout adolescence [8]. This explains the ceiling effect found in our study in the results of the ALT around the age of 8 years, as most of the children around and above this age obtained the total score.
The study results agree with research on the development of narrative skills, which shows that children continue to develop their narrative skills during maturation and produce narratives containing the main macrostructural elements in the form of initial events, problems, consequences, emotions, attempts at solving the problem, and resolution, all the while incorporating the use of language, temporal relations, and causal relations to produce a coherent narrative [60,61].
The present results also show that the median scores of the children in the retelling task were higher than those of the personal narratives. The scores for fictional narratives were the lowest, especially in the younger age groups. This finding is explained by the later development of fictional narratives as cognitive development continues [62]. Additionally, story-retelling tasks are easier for children than personal narratives, especially for the younger age groups. Our findings agree with literature demonstrating that story retelling is easier than story generation, and that personal events are recounted more readily than fictional stories [19].
The Arabic version of the TNL-2 was shown to be reliable in the current study by assessing the test-retest reliability of the test items, which showed a very high positive correlation between the test and retest results for the total comprehension raw scores, total production raw scores, and the narrative language index. Additionally, inter-rater agreement was measured and showed an excellent degree of agreement between raters on the same items. Face validity was verified through the evaluation of the translated version of the TNL-2 by five expert phoniatricians. They were asked to review the test and complete a questionnaire assessing its effectiveness in measuring different narrative skills. The experts unanimously agreed that the Arabic version of the TNL-2 can evaluate the primary microstructural and macrostructural components of narratives, providing insight into the overall narrative abilities of Egyptian children who speak Arabic. Furthermore, the test's internal validity was confirmed by the strong positive correlation between the test and retest scores.
The Arabic-translated version of the TNL-2 utilized in the current study demonstrated its validity, reliability, and comprehensiveness as an assessment tool for evaluating various narrative skills, encompassing both the understanding and production of narratives. The TNL-2 can be utilized to evaluate the progression of narrative skills in children across various age cohorts, enabling the determination of their present narrative proficiency levels in relation to normative data.
The current study has a number of limitations. The Arabic version of the TNL-2 was not administered to atypically developing children in order to evaluate the test's capacity to differentiate between children with typical language development and those with delayed language development caused by various factors. This could guide future research. Caution should be exercised when interpreting the results because the data from the study sample were not normally distributed. This was due to the cross-sectional design of the study, which resulted in the presence of outliers and caused some age groups to deviate from a normal distribution. As a result, the statistical analysis was conducted using a test designed for non-normally distributed data. Additional data should be collected from a larger number of cities and schools to obtain more substantial evidence for generalization, applicability, and test standardization.
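The normality check mentioned above can be sketched as follows. The table legends list the Kolmogorov-Smirnov (KS) test, so the sketch computes the KS distance between a sample and a normal distribution fitted to it; the function name and the scores are hypothetical, and the p-value is omitted. Note that estimating the mean and standard deviation from the same sample calls for adjusted critical values (the Lilliefors variant).

```python
from statistics import NormalDist, mean, stdev


def ks_distance_from_normal(sample):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    distance = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and
        # just before the jump at x.
        distance = max(distance, (i + 1) / n - cdf, cdf - i / n)
    return distance


# Hypothetical raw scores for two age groups (not study data).
roughly_symmetric = [8, 9, 10, 10, 11, 12]
with_outlier = [1, 1, 1, 1, 1, 30]

print(round(ks_distance_from_normal(roughly_symmetric), 3))
print(round(ks_distance_from_normal(with_outlier), 3))
```

A larger distance indicates a stronger departure from normality, which is the situation in which a rank-based test such as the Kruskal-Wallis H test is preferred over an analysis of variance.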

Conclusion
The Arabic-translated version of the TNL-2 is a valid and reliable tool that can be used to assess the comprehension and production of narrative language skills in Egyptian Arabic-speaking children. Further application of the test to a larger sample of children is recommended. We suggest using the Arabic version of the TNL-2 to evaluate narrative skills in children with delayed language development and to assess the results of language intervention on narrative language.

Table 5
Test of Narrative Language production subtests and total production raw score in the studied age groups
n number of patients, Min-Max minimum-maximum, Std.Dev. standard deviation, CI confidence interval, KS Kolmogorov-Smirnov, H:KW Kruskal-Wallis H, NS statistically not significant (p > 0.05)

Fig. 1
Correlation between the TNL-2 total comprehension raw score and the Arabic language test total raw score. Scatter plot with best-fit (regression) line showing a strong positive correlation between the Test of Narrative Language comprehension total raw score and the Arabic Language Test total raw score in 200 points of measurement

Fig. 2
Correlation between the TNL-2 total production raw score and the Arabic language test total raw score. Scatter plot with best-fit (regression) line showing a strong positive correlation between the Test of Narrative Language production total raw score and the Arabic Language Test total raw score in 200 points of measurement

Table 1
Adaptations made to the tasks of the TNL-2 in the Arabic version

Table 2
Sex distribution, Stanford Binet, and Arabic language test results in the studied age groups

Table 3
Summary of the rating of the five expert phoniatricians to the TNL-2 face validity questionnaire

Table 4
Test of Narrative Language comprehension subtests and total comprehension raw score in the studied age groups