Quality assurance processes of language assessment artefacts and the development of language teachers’ assessment competence

Background: This article revisits the quality assurance (QA) processes instituted during the development of the Teacher Assessment Resources for Monitoring and Improving Instruction (TARMII) e-assessment tool. This tool was developed in response to evidence of a dearth in assessment expertise among South African teachers. The tool comprises a test builder and a repository of high-quality curriculum-aligned language item pool and administration-ready tests available for teacher usage to enhance learning. All assessment artefacts in the repository were subjected to QA processes prior to being field-tested and uploaded into the repository.
Aim: The aim of this study was to extract from the assessment artefacts’ QA processes the lessons learned for possible development of language teachers’ assessment competence.
Setting: The reported work is based on the TARMII tool development project, which was jointly carried out by the Human Sciences Research Council (HSRC) and the national education department in South Africa.
Methods: Through employing an analytical reflective narrative approach, the article systematically retraces the steps followed in enacting the QA processes on the tool’s assessment artefacts. These steps include the recruitment of suitably qualified and experienced assessment quality assurers, the training they had received and the actual review of the various assessment artefacts. The QA processes were enacted with the aim of producing high-quality assessment artefacts.
Results: The language tests and item pool QA processes enacted are explained, followed by an explication of the lessons learned for language teachers’ assessment writing and test development for the South African schooling context.
Conclusion: A summary of the article is provided in conclusion.


Introduction
South African education continues to be characterised by underperformance of learners at both national and international levels. At national level, during the 2014 annual national testing, Grade 3 learners reportedly attained an average score of 52% on the verification (independently marked and moderated) literacy (home languages) component of the assessment (Department of Basic Education [DBE] 2014:50). On the Progress in International Reading Literacy Study (PIRLS) survey, an international comparative measurement of reading literacy competence, the country's Grade 5 learners continued to come last when compared with their peers internationally (Howie et al. 2018). Furthermore, the performance of their Grade 6 peers on the reading literacy component of the Southern and Eastern African Consortium for Monitoring Educational Quality (SACMEQ) regional learner attainment survey showed a marginal improvement (DBE 2017). South Africa joined the SACMEQ programme in the year 2000, beginning with SACMEQ II. However, it was only during the SACMEQ IV cycle that the country's Grade 6 learners, for the first time, performed above the mean SACMEQ reading score by a few score points (DBE 2017:26). This pattern of low learner literacy performance has continued uninterrupted for over two and a half decades under a post-apartheid democratic dispensation, precipitating 'a low achievement trap' (Carnoy, Chisholm & Chilisa 2012:158). Allied to this perpetual state of poor educational outcomes is 'the learning trap' (World Bank 2018:189). Causes of both the learning and low achievement entrapments are multiple, systemic and historical. Bloch (2009) referred to the challenged state of South African education as resulting from 'a toxic mix' (Bloch 2009:88), which perpetuates learning and learner achievement inequalities.
Spaull (2019), reflecting on the congruences and continuities of apartheid-era inequalities visible in present-day South African education, remarked that: 'It cannot be denied that the level of inequity that exists in South African education today has been heavily influenced by apartheid. Access to power, resources and opportunities - both in school and out - still follow the predictable fault lines of apartheid. Yet while these patterns are historically determined, it is also an ongoing choice to tolerate the extreme levels of inequity and injustice that are manifest in our schooling system' (p. 19). Cilliers (2020), echoing Bloch and Spaull, ascribed the South African education challenges to policy choices the country made, which are incongruous with the schooling context they were enacted for. Notwithstanding the multiple causes of poor learning and learner underperformance, one area which seems to constantly receive attention in the discourse on South African learners' inefficient learning and attendant poor learning outcomes is the effectiveness of teachers in the classrooms (Hoadley 2012; Wildsmith-Cromarty & Balfour 2019). This refers to teachers ensuring that learners do profit from classroom instruction. The necessary preconditions for achieving successful instruction are that teachers should possess sound content knowledge and pedagogical content knowledge (Shulman 1986). Research evidence points to South African teachers' instructional ineptitudes as resulting partly from their deficiencies in content knowledge and subject instructional knowledge (Carnoy et al. 2012; Spaull 2016; Taylor 2019; Wildsmith-Cromarty & Balfour 2019). Both content and pedagogical knowledge are allied to teachers' capacity to develop and utilise assessments and tests to support learning and thereby facilitate epistemological access (Singer-Freeman & Robinson 2020).
To counter their ineffectiveness in utilising assessments formatively, diagnostically or summatively, it is argued that teachers would require the following: available and accessible high-quality assessment artefacts (e.g. assessment items, assessment tasks and tests); the necessary assessment and testing knowledge and skills to utilise these artefacts; the ability to adjudicate the quality of the various assessment artefacts at their disposal before putting them to any intended use; and the ability to develop their own high-quality and curriculum-relevant assessment items and tests. The development of the Teacher Assessment Resources for Monitoring and Improving Instruction (TARMII) tool was an attempt at making high-quality assessment artefacts available to teachers to support teaching and learning in the classroom, and to concomitantly boost teachers' assessment competence.

Literature review
The work of teachers in schools, particularly in classrooms, is mainly concerned with ensuring that effective learning takes place. This means that teachers must possess the necessary instructional tools, namely the requisite subject (content) knowledge and the necessary teaching (pedagogical content) expertise (Deacon 2016), to enable them to drive forward the learning process, in the correct direction, while ensuring that learning does occur. Allied to possessing instructional knowledge and skills, teachers should possess the necessary evaluative competence to deploy in gauging their learners' learning progress. According to Herppich et al. (2018), educational assessment is 'the process of assessing school students with respect to those characteristics that are relevant for learning in order to inform educational decisions' (Herppich et al. 2018:183). In the context of language teaching and learning, language teachers are expected to possess the language evaluative competence they will use to ascertain how much language learning has occurred or still needs to occur. However, how language learning is evaluated in a school is linked to the learning culture found in that school (Inbar-Lourie 2008a, 2008b). Research points to the cultural contexts within which both language learning and language evaluation occur, making the distinction between the testing and assessment cultures that undergird the evaluation of language learning (Inbar-Lourie 2008a, 2008b). According to Inbar-Lourie, the testing culture with its behaviourist conception of knowledge is grounded in the psychometric positivist outlook of reality. It is analogous to what Shohamy (2001a:3) referred to as 'traditional testing', which emphasises the measuring of (language) knowledge learned or acquired to the exclusion of the context within which language learning takes place. Furthermore, it can be linked to the exclusive uses of testing by 'measurement experts'.
Thus, where a testing culture predominates, the learning context within which it is applied becomes inconsequential as more emphasis is placed on test-internal issues (such as reliability and validity) (Shohamy 2001a:xxiii) to the exclusion of the social context of testing. On the contrary, the assessment culture considers assessment as a context-relevant activity grounded in learning and emphasising the social context within which language evaluation occurs. According to Inbar-Lourie, 'the constructed tests will be sensitive to contextual variables, to the learners' culture and linguistic background and to the knowledge they bring with them to the assessment encounter' (Inbar-Lourie 2008a:295-296). It achieves this through its link to the constructivist view of language learning, which is pivoted on the notion of socially constructed reality. This is akin to test-external issues (Shohamy 2001a:xxiii) that incorporate the learners' language learning milieu in the assessment process or event. Furthermore, this has been argued for and evidenced by some researchers with reference to language learning (Heugh 2021; Makalela 2019) and testing and assessment in multilingual educational situations (Antia 2021; Heugh et al. 2016; Makgamatha et al. 2013) in the southern hemisphere and African contexts. This form of assessment utilisation is associated with the 'non-psychometric expert users' that include ordinary language teachers in schools. Thus, the language assessment practices are assessment user orientated and are 'embedded in the educational, social and political contexts' (Shohamy 2001a:4); they involve the roles of various stakeholders (e.g. test takers, teachers, broader society, etc.) (McNamara & Roever 2006), emphasise democratic participation of all involved in the assessment process in order to eliminate its detrimental effects, use assessments to benefit the educational process and, finally, contribute to yielding ethical and socially just assessment outcomes (Beets & Van Louw 2011; Shohamy 2001a, 2001b).
However, Inbar-Lourie (2008a, 2008b) reminded us that although the testing and assessment cultures evince distinct philosophical tenets and paradigmatic orientations, they nevertheless tend to exist side by side and in an entangled fashion in a variety of schooling and educational contexts. Furthermore, their coexistence in many countries' educational policy regimes and their attendant assessment systems is an undeniable reality, posing challenges to teachers' awareness of, and their knowledge and abilities to deal with, both testing and assessment demands in their work. While at the micro level (or inside school) language teachers may be agentially positioned to meet and deal with the testing and assessment demands in their work, the same cannot be said when their testing or assessment experiences are authorised from the macro level (or outside school). The latter experience leaves teachers with little or no say as authoritarian compliance is expected of them, resulting in teachers 'succumbing to what is perceived as binding assessment regulations similar to the external exams' (Levi & Inbar-Lourie 2020:178). As a result, language teachers would require both the general education and language learning-related assessment competencies and the awareness of the socio-political context of their work in order to navigate the testing and assessment minefield.
While research on teachers' assessment competence or assessment literacy has captured the attention of researchers globally (Campbell 2013; DiDonato-Barnes, Fives & Krause 2014), in South Africa research in this area is still in an embryonic stage (Kanjee & Mthembu 2015; Weideman 2019) despite existing evidence of a dearth of assessment expertise among the country's teachers (Kanjee 2020; Reyneke, Meyer & Nel 2010; Vandeyar & Killen 2003). However, Hill, Ell and Eyers (2017) posited that while it is the primary role of assessment and testing to ensure that both teaching and learning occur effectively, it is equally incumbent that teachers are competently equipped with the knowledge and skills to facilitate the learners' learning through assessment and testing. In other words, language teachers should possess what is referred to as language assessment literacy or competence (Coombe, Vafadar & Mohebbi 2020). According to Coombe et al. (2020), language assessment literacy could be described as a '[r]epertoire of competences, knowledge of using assessment methods, and applying suitable tools in an appropriate time that enables an individual to understand, assess, construct language tests, and analyze test data' (p. 2). Thus, language assessment competence refers to assessment capacities that language teachers can utilise in developing their own language assessment items and tests of sound quality, primarily to support their work inside the classrooms (e.g. for conducting classroom-based language assessment and testing), and secondarily to respond to assessment and testing demands from within and outside of their schools (such as responding to assessments and tests mandated from the district, provincial or national education office).
As a result, an ongoing development of language assessment literacy is a legitimate necessity for the language teachers' professional learning in order to close any gap or deficiency in their language assessment and testing competence (Popham 2009; Weideman 2019). Coombe et al. go further to argue for the de-isolation of language assessment literacy or competence from disciplinary pedagogical knowledge. To them, language teachers' professional armoury should be formed by an eclectic mix of their subject (disciplinary) pedagogical knowledge with their assessment and testing know-how (assessment literacy or competence), both constituted in the service of the teaching and learning process. Language teachers' assessment literacy or competence can be developed through pre-service teacher training for individuals intending to enter the teaching profession (Taylor 2009) or through in-service professional learning for teachers who are already working (Jeong 2013).
A language assessment-literate teacher workforce working in any of South Africa's schools will be a boon to the country's test-dominated and grading-orientated assessment regime (Kanjee & Sayed 2013). The assessor roles and functions of these teachers, in addition to utilising their language assessment competence to support language learning, could include serving as a national assessment resource for constructing or developing language assessment items and tests to support language measurement initiatives mediated through the various tiers of the education system (such as accountability-orientated formal assessment and testing) and, in future, replenishing the TARMII tool's repository with high-quality language assessment items and tests. More will be said about the TARMII tool in the next section.

A brief description of the teacher assessment tool
The Teacher Assessment Resources for Monitoring and Improving Instruction, abbreviated TARMII, is a web-based (or e-assessment) tool designed for South African teachers to enhance teaching and learning through assessment and testing. The TARMII tool could also be referred to as 'an integrated system of systems' (Drasgow, Luecht & Bennett 2006:471) because it has the following systems built into it: a repository comprising a collection of stand-alone assessment items (or item pool) and full-length administration-ready tests; a test builder for assembling a required test from the item pool; a test delivery and administration mechanism; and a test scoring and reporting mechanism (Nissan & French 2014). The items and tests housed in the tool's repository were developed following the South African Curriculum and Assessment Policy Statement (CAPS) (DBE 2011a), making the TARMII tool fully aligned to and in sync with the country's curriculum. Furthermore, this tool allows teachers to manipulate the items and their metadata in the process of building their own tests. Utilising this tool, teachers will be able to select and draw individual items from its repository to compile customised language tests for their learners. They will be able to assemble tests for their learners at the click of a (laptop or desktop) computer button or mouse or by touching the screen of an electronic device (such as a tablet or smartphone). Alternatively, teachers could draw from the tool a complete administration-ready language test and administer it to their learners. Each test comes with administration instructions, test item example(s), mark allocation and scoring procedures.
Learners can retrieve and take the teacher-composed language tests from anywhere in the country provided they have internet connectivity. Accessed tests are shown on the screen of an electronic device and can be taken from there. In addition, the tool has the capacity to auto-mark selected-response assessment items and the short-answer constructed-response ones. However, essay-type questions would require teachers to mark them and input the scores into the tool. Once marking is complete and scores have been inputted, teachers can generate diagnostic reports indicating their learners' strengths and weaknesses. From a pedagogical perspective, the TARMII tool provides teachers with low-stakes assessment items and tests which they could utilise for the purpose of enhancing language learning. The sophisticated technological functionalities built into this tool are complemented with a repository populated with high-quality assessment items and tests whose diversity is affirmed by the CAPS curriculum. It is thus argued that assessments and tests derived from this tool possess in-built sensitivity to CAPS-aligned language teaching and learning. Furthermore, the language assessment items are linked to pedagogical video clips that demonstrate how some language content (e.g. phonics) can be taught. These video clips are a language resource meant for language teachers' usage in conjunction with the diagnostic reports. They serve the purpose of demonstrating to teachers how to teach some aspects of language skills, abilities or content their learners reportedly found challenging. Thus, the TARMII tool interlinks language instruction as prescribed in the CAPS to language assessment and testing and their attendant practices.
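To make the flow described in this section concrete, the following minimal sketch illustrates how a test might be assembled from a tagged item pool, auto-marked and summarised in a diagnostic report. It is an illustration only and not the TARMII implementation: the `Item` fields, the function names, the exact-match marking rule and the sample items are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One pool item; every field name here is hypothetical."""
    item_id: str
    grade: int
    skill: str        # metadata tag, e.g. "phonics" or "comprehension"
    item_type: str    # "selected", "short" or "essay"
    answer_key: str   # unused for essay items
    marks: int = 1


def build_test(pool, grade, skills):
    """Select the pool items for one grade whose skill tag was requested."""
    return [i for i in pool if i.grade == grade and i.skill in skills]


def auto_mark(test, responses):
    """Score non-essay items by exact match (an assumed rule).

    Essay items get None, meaning the teacher must enter the score.
    """
    scores = {}
    for item in test:
        if item.item_type == "essay":
            scores[item.item_id] = None
            continue
        given = responses.get(item.item_id, "").strip().lower()
        correct = given == item.answer_key.lower()
        scores[item.item_id] = item.marks if correct else 0
    return scores


def diagnostic_report(test, scores):
    """Percentage of auto-marked marks earned per skill tag."""
    earned, possible = defaultdict(int), defaultdict(int)
    for item in test:
        score = scores[item.item_id]
        if score is None:  # still awaiting teacher marking
            continue
        earned[item.skill] += score
        possible[item.skill] += item.marks
    return {s: round(100 * earned[s] / possible[s]) for s in possible}


# Invented sample data for illustration only.
pool = [
    Item("p1", 3, "phonics", "selected", "b"),
    Item("p2", 3, "phonics", "selected", "c"),
    Item("c1", 3, "comprehension", "short", "a", marks=2),
    Item("w1", 3, "writing", "essay", ""),
    Item("g2", 2, "phonics", "selected", "a"),
]
test = build_test(pool, grade=3, skills={"phonics", "comprehension", "writing"})
scores = auto_mark(test, {"p1": "B", "p2": "a", "c1": "a", "w1": "My dog ..."})
report = diagnostic_report(test, scores)
```

In this sketch the per-skill percentages stand in for the strengths and weaknesses that a diagnostic report would surface; a real report would presumably draw on the full CAPS-derived tagging framework rather than a single skill tag.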

Theoretical framework
The development of assessment artefacts for an assessment tool repository follows a robust process that encapsulates the following broad stages: (1) defining the tests and item pool purposes; (2) writing of the items and test specifications (or blueprints) and the assessment tool metadata; (3) contracting suitable (i.e. qualified, knowledgeable and experienced) item and test writers; (4) training of these item and test writers on both item (or test) development and attendant quality review processes; (5) commencement of the actual item or test development and concomitant qualitative quality assurance (QA) processes; (6) assembly of draft 'testlets' (Reynolds & Livingston 2014:126) for field-testing (or piloting); and (7) compilation of final tests and finalisation of the item pool following the performance of quantitative (or statistical) quality reviews (e.g. Albano & Rodriguez 2018; Muckle 2016; Schmeiser & Welch 2006) (see also Figure 1). Although Figure 1 portrays the stages of developing the assessment item pool and tests as linear and sequential, in reality, the development of assessment artefacts is neither direct nor sequential but iterative and sometimes repetitive (Davidson & Lynch 2002). It involves moving forward and backward between various developmental stages in a systematic manner (Grabowski & Dakin 2014; Lane et al. 2016). For instance, while the assessment specifications may be produced earlier to guide the entire item writing process, they may be modified while the actual item writing or test composition is in progress. The rationale is that the resultant assessment artefacts should be in sync with the finally modified specifications. Also, the concomitant QA processes may necessitate that items or tests be modified if their initial forms are found to be defective.
Through the implementation of both qualitative and quantitative QA measures, good stand-alone assessment items (or items in tests) are retained, while faulty ones are sent back through the developmental trajectory for further corrective action, or are outrightly rejected and discarded if they are deemed irreparably flawed. It can be argued that while this assessment artefacts development process may seem to be an exclusive preserve for experts and professionals trained in test development, some of its aspects could be gleaned and packaged for language teachers' professional learning.
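The retain, revise or reject logic described above can be sketched as a small triage loop. This is a hedged illustration rather than the procedure followed in the project: the `review` and `revise` functions, the `max_rounds` cut-off and the sample flaws are hypothetical stand-ins for the judgements made by human reviewers and item writers.

```python
from enum import Enum


class Verdict(Enum):
    RETAIN = "retain"
    REVISE = "revise"
    REJECT = "reject"


def review(item):
    """Hypothetical reviewer rule based on the flaws flagged on a draft item."""
    flaws = item["flaws"]
    if not flaws:
        return Verdict.RETAIN
    if "irreparable" in flaws:
        return Verdict.REJECT
    return Verdict.REVISE


def revise(item):
    """Stand-in for an item writer fixing one flagged flaw per round."""
    return {**item, "flaws": item["flaws"][1:]}


def triage(items, max_rounds=3):
    """Send flawed items back for revision; keep sound ones, discard the rest.

    Discarding items still flawed after max_rounds is an assumption made
    for this sketch, not a rule stated in the article.
    """
    retained, rejected = [], []
    for item in items:
        for _ in range(max_rounds):
            verdict = review(item)
            if verdict is not Verdict.REVISE:
                break
            item = revise(item)
        else:
            verdict = review(item)  # final check after the last revision
        if verdict is Verdict.RETAIN:
            retained.append(item["id"])
        else:
            rejected.append(item["id"])
    return retained, rejected


# Invented draft items for illustration only.
drafts = [
    {"id": "A", "flaws": []},
    {"id": "B", "flaws": ["ambiguous stem", "not CAPS-aligned"]},
    {"id": "C", "flaws": ["irreparable"]},
    {"id": "D", "flaws": ["f1", "f2", "f3", "f4"]},
]
retained, rejected = triage(drafts)
```

The loop mirrors the iterative, back-and-forth character of the development process: an item may pass through several review rounds before it is either accepted into the pool or discarded.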
Given the seven stages outlined above for tests and item pool development for building a composite assessment item and test repository, what is abundantly clear is that teachers in schools have neither the capacity nor the time to engage in such an elaborate process for developing their own assessment activities (i.e. teacher-made items, tasks or tests). Some language teachers may display a lack of the assessment acuity necessary for making informed and dependable choices when required to select high-quality assessment artefacts (Campbell 2013; DiDonato-Barnes et al. 2014). Both Campbell and DiDonato-Barnes et al. argued that some teacher-made and teacher-selected assessment artefacts turn out to be of poor quality, testing cognitively low-order knowledge, skills and abilities (e.g. naming or recall). What is of concern is that these teachers tended to rely on such pedagogically questionable assessment artefacts for making educational decisions such as grading their learners, evaluating their own instructional effectiveness or determining the learning progress of their learners.

Aim of the article
This article focuses on the QA processes enacted to improve the quality of the language assessment artefacts during the development of the TARMII tool. These are the qualitative validation processes conducted prior to the field-testing of individual assessment or test items, or the compiled tests or testlets (also referred to as 'sensitivity review' or 'fairness review') (McNamara & Roever 2006:129). Thus, assessment items and tests developed for the TARMII tool were put through quality review processes for quality enhancement purposes. The article is concerned with the lessons learned from the QA or review processes enacted during the development of the Foundation Phase (i.e. Grades 1-3) English Home Language (EHL) tests and the item pool for the TARMII tool. Of interest is what lessons have been learned from these processes that could inform the development of language teachers' assessment competence.
The latter refers to language teachers' 'awareness and knowledge of assessment' (Weideman 2019:2) that they could use to develop their own language assessments for teaching and learning evaluation purposes. Both Popham (2009) and Weideman considered continuous language assessment literacy development as a legitimate necessity for the language teachers' professional learning as it may serve the potential of closing any gaps in their language assessment and testing competence.

Paper research framework
This article utilises a reflective narrative approach (Clandinin et al. 2009; Downey & Clandinin 2010; Willig 2014) to 'restory' (Connelly & Clandinin 1990:9) and reconstruct selected observations from the development of the TARMII tool. The author, as a member of the Human Sciences Research Council (HSRC) research team that collaborated with the DBE in building the TARMII tool, retrospectively reflects on selected aspects of building the tool's repository. This is performed with the view of offering possible future action towards the development of language teachers' assessment competence. This follows from Willig's (2014) declaration that '[a]ll narrative research is based on the theoretical premise that telling stories is fundamental to human experience, and that it is through constructing narratives that people make connections between events and interpret them in [a] way that creates something that is meaningful (at least to them)' (p. 147). Thus, from a participant observer perspective, the author makes a departure from the commonly agreed past story of and relational engagement between the HSRC and DBE in building the TARMII tool, thereby reconstructing a new narrative from the old (Clandinin & Caine 2008; Connelly & Clandinin 1990). The motivation for this (re)telling episode is the evidenced dearth of assessment expertise among South Africa's language teachers (Weideman 2019). So, the central focus of this article is the revisitation of the QA processes applied to the language assessment artefacts prior to field-testing and the restorying of the lessons gleaned from these processes to inform the possible development of language teachers' assessment competence.

The teacher assessment tool development context
The development of the TARMII tool was spearheaded by a tripartite partnership comprising the HSRC, the DBE and the United States Agency for International Development (USAID), which also funded the project. However, the actual development of this tool rested primarily with the HSRC researchers and DBE officials. From the DBE side, the Directorate for National Assessments (DBE-DNA) provided leadership with occasional participation and inputs received from the Curriculum, Teacher Development and e-Learning branches. Over the years since the post-1994 education reform, the DBE-DNA had amassed a wealth of knowledge and experience in initiating, preparing, implementing and reporting on national, regional and international assessments. For example, the national assessments it conducted included the systemic evaluation studies (DoE 2003, 2005) and the annual national assessments (ANAs) (DBE 2013, 2014). It further chaperoned the country's participation in learner achievement surveys such as the regional SACMEQ studies (DBE 2010) and the PIRLS international reading literacy survey (HSRC 2017). Both the national and international learner achievement studies resulted in the packaging of assessment or test item exemplars for supporting teaching and learning.
The HSRC researchers and DBE officials constituted a TARMII project implementing committee. The HSRC assumed the following responsibilities: (1) contracting the personnel responsible for language test item development and QA; (2) ensuring an evidence-informed development of the TARMII tool through working in collaboration with the software developers and guiding the development of various components (or functionalities) of the tool; (3) conducting field-testing of every component of the TARMII tool developed by software engineers in selected trial (or try-out) schools; (4) providing software developers with evidence of what worked or did not work in schools and classrooms with teachers and their learners; and (5) running general acceptance tests of various TARMII tool components or functionalities at the request of the software developers before such functionalities were integrated into the system. The DBE, through its DBE-DNA, was tasked with directing the development of the tool, ensuring that it matched the South African curriculum and schooling contexts. This responsibility conferred on the DBE a gatekeeping role of ensuring that the TARMII tool, across various stages of its development, was aligned to the CAPS in both structure and functionality. Furthermore, the DBE-DNA was tasked with approving the quality of the language assessment artefacts (items and tests) developed and their compliance with the curriculum.

Participant assessment quality assurers
Participants comprised 13 purposively selected education officials, 12 females and 1 male, identified by DBE-DNA for the HSRC to contract for reviewing all assessment artefacts destined for the TARMII tool's repository. Among them were provincial and district education officials and EHL and EFAL primary schoolteachers. All were regular participants in DBE's yearly ANA item and test development workshops, with extensive experience and in-depth knowledge of language item writing and test development for the Foundation Phase. They also had deep language content and teaching knowledge, a good understanding of the CAPS curriculum and its assessment prescripts and invaluable experience of the South African education system and its challenges.
The quality assurers were divided into three smaller groups, one per grade: the Grades 1 and 3 groups each had three reviewers and a moderator, whereas the Grade 2 group consisted of four reviewers and a moderator.
The moderators' role was to lead, coordinate, support and guide their teams in performing quality checks on all assessment items or language tests developed. The processes for conducting quality reviews on all assessment artefacts were planned in a series of phases aimed at achieving, as an end product, curriculum-aligned, high-quality language tests and an item pool. While the HSRC and DBE-DNA played different but complementary roles in the item and test quality verification processes, the DBE-DNA played a gatekeeping role of adjudicating on the quality of items and tests finalised for uploading into the repository. The language items and tests that met the DBE-DNA criterion of being of high or acceptable quality were signed off for either direct uploading into the tool's repository or field-testing first and then uploading. Substandard assessment artefacts were either recommended for further correction or outrightly rejected and discarded.

Sources of language assessment artefacts
The assessment artefacts destined for inclusion into the TARMII tool's repository comprised Foundation Phase (Grades 1-3) EHL items and the Grade 3 EHL tests. The EHL items were obtained from the following three sources: firstly, the HSRC's Foundation Phase language assessment item banking study (Frempong et al. 2015); secondly, the Foundation Phase EHL tests and exemplar items produced in DBE's yearly ANA testing; and thirdly, DBE's Foundation Phase (Grade 3 only) EHL diagnostic assessments.
In addition to existing language items, a new set of Grade 3 only EHL term tests was developed from scratch for the TARMII tool. These were full administration-ready tests produced in a series of four weekend group workshops conducted over a period of 4 months. The tests were produced following the CAPS-based test specifications as outlined in Stage 2 (in Figure 1). A total of eight language tests were developed: two tests for each of the four school terms, one for the beginning of the term and another for the end of the term (as per Stages 5 and 6 in Figure 1).
Both the language tests and existing items were aligned with CAPS derived metadata or item tagging framework (shown in Appendix 1). This item tagging framework was developed in consultation with DBE's Curriculum Unit to ensure its curriculum compliance. Furthermore, all the language items and the full administration-ready tests were reviewed prior to either being field-tested or included in the tool's repository. The language items constituting the tests were also made available as stand-alone items, adding to the tool's item pool.

Preparing assessment quality assurers
Both the assessment quality assurers and moderators received refresher training on language item writing and the attendant item quality review processes as outlined in Figure 1 (DoE 2016). The assessment quality review component included checking and ensuring that tests and items exhibited general curriculum compliance, language content correctness or accuracy, and accurate tagging of the items on the item tagging framework or metadata. Training for quality review of existing language items occurred in the following two residential workshops: an initial training workshop and a refresher training workshop. The DBE-DNA officials facilitated the workshops, aided by HSRC researchers.

Initial training workshop:
This was a full-day on-site workshop during which each group carried out activities that included: (1) selecting suitable assessment or test items from a pool of existing Foundation Phase EHL items for reviewing in accordance with the DBE-DNA guidelines (DBE 2015; DBE 2016); (2) retaining good items and flawed ones that could be improved, while rejecting outright and discarding irredeemable ones; (3) checking the alignment of items or tests to the CAPS for the respective grades; and (4) ensuring that the language content of the assessment artefacts was suitable for the children who would be using them.
Following the on-site training workshop, the quality assurers returned to their respective provinces and continued to work from there. A week later, reviewers were asked to e-mail the first batches of items they had reviewed in order to get feedback from the trainers. The item review challenges experienced and the feedback prepared showed that further corrective or refresher action was required (as discussed under the refresher training workshop below). Consequently, all item quality assurers and their moderators were invited to a follow-up full-day residential workshop aimed at correcting emerging missteps in the reviews.

Refresher training workshop:
All reviewers and moderators were assembled for a refresher workshop following the initial evaluation of their reviews. The purpose of this workshop was to deal with shortcomings observed in the reviewed items, such as the following: (1) reviewers approached their task from a summative or judgement angle more than from a formative or classroom assessment perspective, most likely because of their association with and participation in ANA processes; (2) reviewers paid insufficient attention to detail, because working part-time from their homes competed for time with the full-time jobs they held as either teachers in schools or officials in the district or provincial education offices; and (3) the established WhatsApp support group mechanism (Moodley 2019) was not used optimally, either because of these same time constraints or because of variances in the quality assurers' knowledge and understanding of their assignment.

Language assessment items and tests quality assurance processes
Following the quality assurers' training, the following four-phased QA processes were performed on both the tests and item pool in two off-site phases (Phases 1 and 2) and two on-site phases (Phases 3 and 4), as captured in Stage 5 (Figure 1):
The QA Phase 1: The QA process occurred off-site, from the quality assurers' homes. It involved the review of all stand-alone items and items in the administration-ready language tests. All language items were checked for language content correctness, appropriateness and CAPS alignment. Any language errors or misrepresentations were corrected. Untagged items were tagged following the tagging framework (see Appendix 1), whereas those already tagged had their tagging reviewed for correctness. Successfully reviewed items and tests were emailed to moderators for the next phase of the process.
The QA Phase 2: The moderation of all language tests and items received from Phase 1 also occurred off-site. All assessment artefacts were moderated for their language content correctness, grade appropriateness, general curriculum compliance or alignment and their tagging checked for correctness. Approved assessment artefacts were passed on (emailed) to the HSRC in Phase 3 and defective ones returned to Phase 1 to be corrected.
The QA Phase 3: Moderated language tests and items received from quality assurers were further inspected on-site by HSRC researchers for their language correctness, curriculum compliance and appropriateness of the item tagging before being passed on to the DBE-DNA officials for final review. Substandard assessment artefacts were sent back for corrective action.
The QA Phase 4: This last phase in the QA sequence entailed DBE-DNA receiving the language items and tests from the HSRC and conducting final checks on, among other things, the appropriateness of the tagging following the metadata specifications (see Appendix 1), language content correctness and appropriateness for the target learner users. Any deviation identified at this level necessitated sending the item or items back to the previous phase or phases for the necessary fixing. However, all tests and items deemed appropriate were signed off by the DBE-DNA for field-testing, and then uploaded into the repository by the HSRC.

Lessons learned for language teachers' assessment and test development competence
South African teachers, as is the case with their peers elsewhere, are expected to have acquired the skills to develop their own assessment artefacts from pre-service training or through in-service learning at their schools. However, research evidence points to persistent deficiencies in teachers' assessment expertise (Kanjee et al. 2012; Reyneke et al. 2010) in an inequitable post-apartheid education system. Teachers in general (language teachers included) tend to rely on low-quality and cognitively unchallenging assessment artefacts (Kanjee et al. 2012), with assessment practices leaning towards grading, recording and reporting (Kanjee & Sayed 2013), and barely inclined towards enhancement of instructional processes (Kanjee 2020). [...] (3) or enrolment for further education and training at a relevant institution of higher learning. However, Bachman and Palmer (2010), in dispelling the misconception that language test development is a preserve of highly technical 'experts', argue that ordinary teachers in schools could become competent language assessors (Bachman & Palmer 2010:8). As a result, lessons learned from the TARMII tool language assessments QA processes could offer some pointers towards processes of developing language teachers' assessment or testing competence. Such processes could focus on the non-technical aspects of assessment item writing or test development mediated through language teachers' in-service learning as proposed here.
Both test development and item writing are consultative and collaborative activities that require cooperation and teamwork. Teachers involved in the TARMII tool assessment quality review processes demonstrated this fact. Collaboration engenders a spirit of reliance on one's peers, with teachers contributing ideas to their colleagues and, in turn, receiving their feedback. Group dynamics among the language teacher assessor trainees are to be expected, however, and should be well managed.
Training for item writing or test development can occur in a centralised or decentralised format, or in a combination of the two. A centralised group training can be conducted from a central place as was the case with the TARMII tool development. Schools through their existing school-based structures might be ideal training places. Fortunately, the South African schooling system is already equipped with such structures that can engender cooperative or collaborative language teachers' assessment literacy development (e.g. school assessment teams, curriculum phase teams, etc.). Decentralised training could be conducted virtually as experienced with the adaptation of education provisioning under COVID-19. Another possibility is an adapted combination of both physical group training and virtual training. In this context, a mix of physically attended initial group training workshops and follow-up monitoring and support through e-mail exchanges and WhatsApp group communication (Moodley 2019) is a possibility. However, the challenge posed by virtual training is the digital divide. Language teachers from materially under-resourced contexts with limited, unreliable or no access to internet connectivity will be at a disadvantage compared with their counterparts from technologically enabled contexts.
The trainers of aspirant language teacher assessors should ideally be experienced and competent teachers who are literate in language assessment or testing (Weideman 2019). These should be teachers who have undergone such training themselves either during their initial teacher training or as part of their in-service teacher learning initiatives. Furthermore, trainers should possess sound content and pedagogical content knowledge, knowledge of the language curriculum, and knowledge of the South African teaching, learning and assessment or testing contexts (including policy and practice challenges). They should be language teacher assessors who are willing to share their experiences and expertise with their peers in a non-threatening capacity development environment. As already stated, trainers could be drawn from language facilitators from the district offices.
The training of aspirant language teacher assessors should be focused on a set of skills, knowledge and content dealing with the assessment and testing of language as prescribed in the curriculum and attendant assessment policies; language item content compliance with the curriculum (or curriculum alignment); issues of item difficulty (and inclusion of items of different cognitive levels); and language item fairness (freedom from any possible bias) and equity (accommodation of learners' socio-economic, linguistic and culturally diverse backgrounds). This will ensure that language item or test validity is enhanced (Davidson 2013). In addition, trainees should be exposed to, among other things, the general basics of language item writing, the dos and don'ts of item writing, the item quality review processes and the development of language tests for different purposes (summative, formative or diagnostic). What is crucial for teachers in the classroom is how assessment and testing can benefit the teaching and learning process. Consequently, training should attempt to disentangle the dominance of testing over other forms of assessment and allow for a balance between testing and assessment geared towards supporting pedagogical processes in the classroom.

Strengths and limitations
This article was conceived from a context of developing an e-assessment tool to support teaching and learning in the classroom. However, the tool's development was overshadowed by a context where testing had an overbearing presence within the assessment system. Consequently, assessment for accountability purposes dominated the testing ethos and was extended to the classroom level. Furthermore, language assessment quality assurers brought with them to the project their experiences of and exposure to developing summative language tests (such as the ANA language tests) for measuring the performance or state of health of the education system. The net effect of all this is that the TARMII tool was unintentionally orientated towards a dominant summative testing culture embedded within an assessment culture. It is this bias towards testing that should be corrected when it comes to developing language teachers' assessment competence towards supporting teaching and learning in the classroom.

Conclusion
This article located non-technical aspects in the pre-field-testing phases of the assessment artefacts' developmental sequence as potential starting points for the development of language teachers' assessment competence. Through a reflective narrative approach, it restoried the enactment of qualitative reviews of language assessment artefacts, offering, in turn, suggestions on how these non-technical QA processes could be harnessed to inform the development of language teachers' assessment competence. Of importance is how features of these quality enhancement processes could be appropriated to enable language teachers to become competent language assessors or testers even without possessing the skills of technically sophisticated assessment or testing experts. The argument made is that the subject content and pedagogical content knowledge that language teachers possess, which could be a rarity in assessment expert technical teams, is a promising starting point from which language teachers could begin to accumulate their assessor or tester expertise. However, this would require a training intervention focussed specifically on empowering language teachers to become assessors or testers. Such an intervention could be conducted at the school level utilising existing school-based structures. Training could take a centralised or decentralised format or a combination of the two. Centralised training could involve teacher trainees meeting physically as a group, whereas decentralised training could be performed virtually where internet connectivity is available. Training should be facilitated by knowledgeable and experienced language and assessment trainers with sound knowledge of the South African curriculum, assessment policies, language policies and their attendant practices and challenges in the country's schooling context. Such training should cover a wide spectrum of language assessment item writing and test development issues, paying attention to the various assessment or testing purposes.