Tests in adaptive control systems. The concept of adaptive testing and the principles of its implementation. The use-case diagram constructed for the adaptive testing subsystem is given in Appendix A.

One of the priority areas in the development of Russian education at the present stage is ensuring accessibility and equal opportunities for receiving a full-fledged education, as well as achieving a fundamentally new quality of professional educational services. Obviously, the main means of achieving these goals is increasing the role and importance of information technology. The construction of intelligent teaching systems is a major step toward the development and accumulation of electronic pedagogical content, which today consists of hypertext, electronic materials and tests. The main requirements for new learning systems include intelligence, scalability, openness, flexibility and adaptability at all stages of the learning process.

Recently, various kinds of electronic diagnostic tools (materials), computer tests, have come into increasing use at different stages of the educational process. Traditional testing, implemented with standardized tests, is gradually losing its relevance; it is developing and evolving into modern, more efficient intelligent forms of adaptive testing. These intelligent forms of knowledge diagnostics rest on theoretical and methodological foundations, and on technologies for constructing and delivering tests, that differ from the traditional ones. The system model must therefore include modules that implement adaptive algorithms.

The key advantage of adaptive testing over the traditional form is its effectiveness. An adaptive test can diagnose the test taker's level of knowledge using a significantly smaller number of questions. When interacting with the same adaptive test, test takers with a high level of training and test takers with a low level of training will solve completely different subsets of tasks: the former will see significantly more questions with a high difficulty coefficient, the latter more with a low one. The percentage of correct answers may be the same for both, but their point totals will differ significantly.

Adaptive testing allows a more accurate model of the test taker's knowledge (mastered competencies) to be built. The computer testing system adapts to the user's level directly during the testing process. Thanks to flexible adaptation mechanisms, the system can determine which question, and with what difficulty coefficient, to present to the subject at each specific moment. For example, a subject begins a diagnostic set and is presented with a task with difficulty coefficient b whose solution tests knowledge within some small didactic unit S. If the subject solves the presented task correctly, the analytical core of the system selects the next task within the same unit S but with a higher difficulty coefficient, and so on. If the subject answers the initial question of the didactic element incorrectly, he is presented with a task with a lower difficulty coefficient, and so on. The boundary values of the difficulty coefficients are described in the model used in the diagnosis.
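As an illustration, here is a minimal sketch of this selection rule (Python; the Task structure, the difficulty bounds and the bank layout are hypothetical, not taken from any concrete system):

    from dataclasses import dataclass
    import random

    @dataclass
    class Task:
        unit: str        # didactic unit S that the task belongs to
        difficulty: int  # difficulty coefficient b

    def next_task(bank, unit, current_b, answered_correctly, b_min=1, b_max=5):
        """Pick the next task within the same didactic unit: one difficulty
        step up after a correct answer, one step down after an incorrect one,
        staying within the boundary values fixed by the diagnostic model."""
        target = current_b + (1 if answered_correctly else -1)
        target = max(b_min, min(b_max, target))
        candidates = [t for t in bank if t.unit == unit and t.difficulty == target]
        return random.choice(candidates) if candidates else None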

A computer intelligent adaptive testing system must have the following set of characteristics:

Openness and extensibility. The system should be built on a modular basis. An approximate set of basic modules might be: “Base”, “Tester”, “Constructor”, “Configurator”, “Report Designer” and “Planning Module”. “Base” is intended for maintaining the list of users of the installed copy of the program, preparing the list of subjects, managing the directory of subject groups, and configuring the subject space (its decomposition into thematic blocks). “Constructor” is intended for working with the database of test tasks and developing test packages. “Configurator” is intended for setting up testing work items (connecting tests, assigning testing sessions). “Report Designer” is designed to process primary testing protocols and build various reports. “Planning Module” is designed to plan and monitor the testing process. “Tester” directly implements the adaptive mechanism for diagnosing the level of knowledge (a minimal interface sketch of such a modular core is given after this list of characteristics).

Nonlinearity of diagnostic content reproduction. Adaptive, intelligent selection of the next test task should be implemented depending on the results of solving the previous ones.

Known difficulty. All test tasks must be divided into difficulty categories and have a corresponding coefficient that can be manipulated during the adaptation process.

Universality of the diagnostic model. The system allows complete and high-quality testing of the knowledge of a large number of examinees, without significant expenditure of time and resources, within didactic units of any size.

Reliability and accuracy of adaptive testing results. An approach is used that completely excludes the subjective factor when analyzing the individual model of the subject's knowledge.
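Returning to the openness and extensibility requirement above, one way to read the modular basis is a plugin-style core in which modules are registered rather than hard-wired. A minimal sketch (Python; the interfaces and behavior here are illustrative assumptions, not the actual composition of any existing system):

    from abc import ABC, abstractmethod

    class Module(ABC):
        """Common contract for pluggable subsystem modules."""
        name: str

        @abstractmethod
        def start(self) -> None: ...

    class Tester(Module):
        name = "Tester"
        def start(self) -> None:
            print("running an adaptive diagnostic session")

    class ReportDesigner(Module):
        name = "Report Designer"
        def start(self) -> None:
            print("building reports from primary testing protocols")

    class TestingSystem:
        """Open, extensible core: new modules can be added without
        touching existing ones."""
        def __init__(self):
            self.modules = {}

        def register(self, module: Module) -> None:
            self.modules[module.name] = module

    system = TestingSystem()
    system.register(Tester())
    system.register(ReportDesigner())
    system.modules["Tester"].start()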

Currently, a huge number of computer testing systems have been developed and implemented. Such systems vary significantly in their classification parameters, and it can now be said with confidence that adaptive computer testing systems are actively occupying their niche in the market of software for organizing and supporting pedagogical processes.



According to the Concept of Modernization of Russian Education, the main efforts of the reform today are focused on increasing the role of information technology. It is used in different ways: to manage the educational process, for direct teaching, and to control and check trainees' assimilation and practical application of acquired knowledge. For this purpose, various kinds of test surveys have recently come into increasing use at different stages of training. The range of application of tests is very wide, from a short survey after the explanation of the current topic to final or entrance exams. At the same time, a topical issue for many higher educational institutions is the use of information technology in the development of automated training and knowledge control systems. The use of such systems in the educational process makes it possible to apply new adaptive test control algorithms, use the multimedia capabilities of computers in test tasks, reduce the amount of paperwork, speed up the calculation of survey results, simplify administration, and reduce the costs of organizing and conducting testing. In conclusion, computer knowledge control systems are becoming increasingly popular, which is explained by their objectivity, accessibility and cost-effectiveness.

Based on the above, a decision was made to develop a software package, a universal automated adaptive testing system (ASAT), which serves as a means of developing and creating various types of tests and is also used for conducting testing and processing results. The main requirement for the developed system was intelligence, achieved by making the testing process adaptive.

The ASAT software package provides the following capabilities for organizing the testing process:

Automation of the process of creating tests and of conducting high-quality testing.

Openness and scalability of the system.

The absence of rigid binding to any particular subject area.

Ease of creating and modifying tests.

Providing the possibility of multi-user work. Personalized access for all categories of users.

Protection against unauthorized access to test tasks.

Developed navigation tools at all levels of the testing process. Availability of means for the teacher to dynamically control the testing process.

Customization (adaptation) of test material to the individual characteristics of the learner (university student, school pupil, specialist, etc.).

Adaptive selection of the next question depending on the correctness of the student’s previous answers.

Filling the database with test tasks, with support for text, graphic and dynamic test information.

Ability to create different tasks from one set of questions.

Possibility of testing parts of the course and, as a result, conducting final examination testing for the entire course.

Ensuring complete and high-quality testing of the knowledge of a large number of trainees (students, pupils, specialists) without significant expenditure of time and material resources, in all sections of the educational process.

Reliability, accuracy and objectivity of testing results. Elimination of a subjective approach to assessing students' knowledge.

Reducing the likelihood of errors occurring when calculating test results and generating the final grade.

Freeing teachers from the labor-intensive work of processing test results.

Prompt collection and analysis of testing results at any time with the ability to generate periodic reports and statements for various requests.

Introduction of ASAT into the learning process of Surgut State University students and in educational institutions of the Khanty-Mansi Autonomous Okrug - Ugra.

By the method of assessing results, tests are of two types: traditional and adaptive. The advantage of an adaptive test over a traditional one is its effectiveness. An adaptive test can determine the test taker's knowledge level with fewer questions. When taking the same adaptive test, test takers with a high level of training and test takers with a low level of training will see completely different sets of questions: the former will see a larger number of difficult questions, the latter mostly easy ones. The percentage of correct answers for both may be the same, but since the former answered more complex questions, he will score more points. Another significant effect is increased reliability, since quick memorization of the task bank by simply clicking through options on the computer is ruled out (that way one could learn only the easy tasks, while the difficult ones and some of the medium ones remain unstudied).

In this testing system, adaptability is expressed in changing the relative proportions of easy, medium and difficult tasks presented, depending on the number of correct answers recorded during the testing session. It should be noted that the transition to the adaptive technique is possible only after accumulating a significant bank of tasks with empirically measured difficulty levels. Adaptability is combined with the principle of a “ladder algorithm”: tasks are presented with a systematic increase in difficulty. First easy tasks are presented, then medium ones and, if the test taker succeeds at the previous levels, difficult ones. After each answer, the testing program checks the validity of a so-called “early transfer” of the test taker to a higher difficulty level. At each step, the significance of the difference between the numbers of correct and incorrect answers is assessed; if the value falls below the 5% error level (rejecting the hypothesis that correct answers and errors are equally probable), the test taker is transferred to a higher difficulty level. If the tasks of a given level have been exhausted and the test taker has not moved to the next level, the testing process ends and the test taker's level of knowledge is determined.
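The “early transfer” check described above can be read as a one-sided sign test of the hypothesis that correct answers and errors are equally probable. A sketch under that assumption (Python; the exact statistic used by the system is not specified in the text):

    from math import comb

    def promote(n_correct: int, n_wrong: int, alpha: float = 0.05) -> bool:
        """One-sided sign test: under H0, correct and incorrect answers are
        equally probable. If P(X >= n_correct) for X ~ Binomial(n, 0.5)
        falls below alpha, H0 is rejected and the test taker is moved to a
        higher difficulty level."""
        n = n_correct + n_wrong
        if n == 0:
            return False
        p_value = sum(comb(n, k) for k in range(n_correct, n + 1)) / 2 ** n
        return p_value < alpha

    print(promote(8, 1))  # True:  p ~= 0.02, early transfer upward
    print(promote(5, 4))  # False: p = 0.5, stay at the current level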

The system is implemented in the form of three independent modules:

testing module (intended for test takers);

module for creating and editing tests (intended for teachers);

module for statistics and analysis of results (intended for the teacher). The modules can be installed independently of each other on different client machines.

To store the initial data and the results of the tests, a database is used that holds the bank of test tasks, the parameters of tests and of the testing process, information for user authentication, test results and other processing information.

The module for creating and editing tests identifies registered teachers or registers new ones, accesses the database that stores the test tasks and answers for each test as well as its parameters, and allows the teacher to create a new test, change the settings of an existing test, and edit questions and answers.

It should be noted that a teacher, having logged into the system using his login name and password, gains access only to his own set of tests, without being able to view or change the tests of another teacher.

Using the database, the testing module identifies registered users or registers new ones, selects a test, tests the subject by displaying a question on the screen and receiving an answer, processes the received data and writes the test results into the database for further analysis and use by the teacher.

Test takers can only access tests pre-assigned by the instructor. An adaptive knowledge control algorithm is used, which determines the choice of the next task depending on the test taker's answers to previous questions. There is no option to skip a question and return to it at the end of testing, because the choice of the next question depends on how the test taker answers the current one. Upon completion of the test, the result and a short comment are displayed to the test taker. The result is the score that the user receives according to the criteria specified by the teacher for this test.

The module for statistics and analysis of test results provides the teacher with the opportunity to view the test results of an individual student or an entire group, for one or more tests with varying degrees of detail. In this case, the report displays the results of all students for all tests they have taken related to the selected subject of a specific teacher.

Since testing is based on the principle of adaptability, the questions, as well as their number within one test, will not be the same for each user. Therefore, this module provides the ability to output not only general information, but also a more detailed report on the test, which contains information about what questions the user received and how he answered them.

The created system meets modern requirements for this class of systems, both in the field of pedagogical testing and in the field of information technology.

Bibliographic link

Bushmeleva K.I. AUTOMATED ADAPTIVE TESTING SYSTEM // Basic Research. – 2007. – No. 2. – P. 48-50;
URL: http://fundamental-research.ru/ru/article/view?id=2517 (date of access: 09/18/2019).

Adaptive test control is understood as a computerized system of scientifically based verification and assessment of learning results, whose high effectiveness comes from optimizing the procedures for generating, presenting and evaluating the results of adaptive tests. The effectiveness of control and evaluation procedures increases when a multi-step strategy for selecting and presenting tasks is used, based on algorithms with full context dependence, in which each next step is performed only after the results of the previous one have been evaluated. After the subject completes a task, a decision must be made each time about the difficulty of the next task, depending on whether the previous answer was correct or incorrect. The algorithm for selecting and presenting tasks is based on the feedback principle: a correct answer leads to a more difficult next task, while an incorrect answer entails an easier task than the one answered incorrectly. It is also possible to ask additional questions on topics the student knows poorly in order to determine the level of knowledge in those areas more precisely. Thus, the adaptive model resembles a teacher conducting an oral exam: if the student answers the questions confidently and correctly, the teacher quickly gives a positive mark. If the student begins to “float”, the teacher asks additional or guiding questions of the same difficulty or on the same topic. Finally, if the student answers poorly from the very beginning, the teacher also gives a grade quickly, but a negative one.
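Beyond the harder/easier feedback rule, the extra probing of weak topics mentioned above can be sketched as follows (Python; the 50% threshold and the history format are illustrative assumptions):

    from collections import defaultdict

    def weak_topics(history, threshold=0.5):
        """history is a list of (topic, answered_correctly) pairs.
        Returns topics whose share of correct answers is below the
        threshold, i.e. candidates for additional guiding questions."""
        stats = defaultdict(lambda: [0, 0])  # topic -> [correct, total]
        for topic, correct in history:
            stats[topic][0] += int(correct)
            stats[topic][1] += 1
        return [t for t, (c, n) in stats.items() if c / n < threshold]

    history = [("tenses", True), ("tenses", True),
               ("articles", False), ("articles", True), ("articles", False)]
    print(weak_topics(history))  # ['articles'] -> probe this topic further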

Advantages:

Allows for more flexible and accurate measurement of trainees’ knowledge;

Allows you to measure knowledge with fewer tasks than in the classical model;

Identifies topics that the student knows poorly and allows a number of additional questions to be asked about them.

Disadvantages:

It is not known in advance how many questions must be asked to determine the student's level of knowledge. If the questions available in the testing system run out, testing can be interrupted and the result evaluated on the basis of the questions the student has already answered;

Can only be used on a computer.

Classic knowledge assessment scales and Item Response Theory.

Classical Test Theory (CTT) was originally created for the interpretation of diagnostic procedures. The theory was built for purely applied problems, so some of the assumptions used in its foundations need clarification, especially since these foundations are almost never discussed in the literature.

Classical test theory explicitly assumes:

1. One-dimensionality, i.e., the test procedure measures only one quality, readiness or ability.

2. Representativeness, understood within the CTT framework as independence of the probability of a particular score from which subgroup of the general population performs the test.

3. Independence of tasks, i.e. tasks are independent of each other.

4. Independence of the test subjects' answers.

Both mentioned independences are understood at least in a statistical sense.

Since diagnostic procedures were in most cases carried out in the form of tests, and most tests consist of closed or, less often, open questions, the result of each answer was assumed to be measurable in points on some scale.

In addition to explicit assumptions, this theory contains some implicit assumptions. In particular, it is implicitly assumed:

- measurability of all possible answers, i.e., the existence of an effective procedure for obtaining an answer to any question posed,

- completeness of answers, i.e., answers are received to all questions asked, from which it follows that refusals to answer are not taken into account,

- equivalence of all questions and, therefore, equal weights of all received answers,

- equality of variances when using parallel response forms,

- a normal distribution of answers.

As in the case of technical measurements, it is implicitly assumed that any measurement result consists of the true value and the measurement error, and the measurement errors are assumed to be additive, which is necessary for the correct transition from sums of errors to one integral error, and the integral error is also assumed to be normally distributed.
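In the usual notation of classical test theory this reads: $x = \tau + \varepsilon$, where $x$ is the observed result, $\tau$ the true value and $\varepsilon$ the measurement error; errors are additive, so a sum of partial errors $\varepsilon = \sum_i \varepsilon_i$ yields one integral error, which is itself assumed to be normally distributed, $\varepsilon \sim N(0, \sigma_e^2)$.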

How correct these assumptions are is usually not discussed. If anything, the biggest questions about CTT relate to ensuring true task independence. The issue of choosing rating scales is also not discussed; the initial assumption is that “raw scores” have already been obtained.

A more subtle question concerns the metrological meaning of the category “error”. In technical measurements it is implicitly assumed that error is a property of the measurement procedure and can therefore, in principle, be estimated and taken into account on the basis of verification and calibration results. When measuring ergatic elements, another source of error appears: the instability of what is being measured, arising from the action of various factors, the most important of which include learning, forgetting, fatigue and the dynamics of the functional state. Corrections for these factors are not discussed in metrology.

To obtain the final estimate, various computational procedures are used. Most often the average score is calculated by the usual arithmetic mean formula $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $x_i$ is the final score of the $i$-th subject, together with the squared deviation from the mean or variants of this indicator, the variance $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and the standard deviation $s$. To compare results, the correlation coefficient between tasks and between subjects is used.

As an option, a weighted average score of the form $\bar{x}_w = \sum_i w_i x_i / \sum_i w_i$ is sometimes used, where $w_i$ are the corresponding weighting coefficients.

Of all the assumptions listed above, the most difficult to prove is the equivalence of the answers, since this requires proof of the subjective equality of all the difficulties of the corresponding answers and at the same time proof of the equal importance of all the questions posed. The assumption of the computability of the mentioned statistical indicators requires substantive proof of the correctness of the homeomorphic embedding of the scale of points in the scale of real numbers, in which such calculations are actually performed. In other words, questions about both criterion and construct validity usually remain open.

In addition to the aforementioned standard statistical indicators (the question of the mathematical correctness of which is usually not discussed) for the subjects, some psychometric properties of measurement procedures with a clear pragmatic, but dubious mathematical meaning are assessed, for example,

The ease factor of a task $p_j = \bar{x}_j / x_j^{\max}$ (or the complementary difficulty coefficient), where $\bar{x}_j$ is the average score received for the task and $x_j^{\max}$ is the maximum possible score for the same task, given that the minimum possible score for any task is assumed to be zero by default,

The task discrimination coefficient, i.e., the correlation coefficient between the task result and the final result or, in what is considered a more informative variant, the correlation coefficient between the task result and the final result computed without this task,

and some other coefficients whose interpretation in this science differs from the generally accepted one (a computational sketch of the first two coefficients follows this list).
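A computational sketch of the ease factor and the corrected discrimination coefficient (Python; it assumes scores behave like numbers on a ratio scale, which is precisely the assumption the surrounding text questions):

    from statistics import mean, correlation  # correlation requires Python 3.10+

    def ease_factor(item_scores, max_score):
        """Ease factor: average score on the task divided by the maximum
        possible score (the minimum score is assumed to be zero)."""
        return mean(item_scores) / max_score

    def discrimination(item_scores, total_scores):
        """Corrected item-total correlation: the task's own score is
        subtracted from the total so the item is not correlated with itself."""
        rest = [t - s for s, t in zip(item_scores, total_scores)]
        return correlation(item_scores, rest)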

In particular, reliability here, in contrast to the standard understanding, is considered a quality not of a system or object but of a measurement, and it is assessed not through the time of proper operation (or variants of that time, such as time between failures) but as the possibility of obtaining comparable indicators, estimated through a correlation coefficient. From this interpretation come consistent reliability, i.e., the correlation coefficient between the results of performing two tasks separated by enough time for them to be considered subjectively independent; parallel reliability, i.e., the correlation coefficient between the results of task variants; reliability of parts, i.e., the correlation coefficient between the results of the entire measurement procedure and some part of it; and other indicators. In other words, what this science calls consistent reliability is what professional measurement theory considers a quantitative measure of test-retest validity, parallel and form reliability are measures of test-subtest validity, and in general there is a terminological confusion that leads to conflating validity and reliability.

According to another version, the reliability coefficient is defined as $r_{xx} = 1 - \sigma_e^2 / \sigma_x^2$, where $\sigma_e^2$ is the variance of measurement errors and $\sigma_x^2$ is the variance of the points scored; time is not mentioned at all in this definition of the reliability coefficient.

The dubiousness of such calculations from a mathematical point of view stems from the fact that the initial data are obtained on a point scale, on which an order relation, sometimes even a linear order, is specified, but arithmetic operations are not defined. Therefore addition, and the subsequent calculation of means, weighted means, variances and correlations on a score scale, is undefined. Another assumption, understandable pragmatically but with clearly inadequate theoretical justification, amounts to claims of a normal distribution of responses and, therefore, of “raw scores” on a real-number scale. The assumption of a lognormal distribution of the same scores often seems more plausible, but it is usually not substantiated either. These assumptions make it possible to use well-known methods of statistical processing of results, but the mathematical correctness of all calculations that follow from them is not discussed.

Many problems of the traditional approach to constructing scales (metrics) of knowledge as points for completing some specially selected sets of tasks are widely discussed in the literature.

First of all, it is almost impossible to prove test-to-test and inter-test validity; therefore, the question of comparing, let alone jointly accounting for, the results of measurements performed by different methods remains open.

“Edge effects” have been repeatedly noted, i.e., results are relatively stable near the median of the response distribution and unstable at the edges of this distribution, which is usually explained by the growing role of extraneous factors in both the “lower” and “upper” parts of the distribution. To combat these effects, an empirically based recommendation is usually offered: fix certain “confidence quantiles” of the distribution and, if a result falls below the lower or above the upper quantile, adjust for instability, mainly by raising the obtained estimates using empirically selected correction formulas.

In the case of closed questions, random guessing is possible; to correct the data in this case it is proposed to apply corrections of the form $x_i' = x_i - \frac{w}{k_i - 1}$, where $x_i'$ is the result after correction, $x_i$ is the result (in points or on other scales) of the answer to the $i$-th question before correction, $k_i$ is the number of possible answers to the $i$-th question, and $w$ is the number of unfulfilled tasks in the series of measurements. This formula is justified empirically; in particular, the advisability of taking unfulfilled tasks into account in it is debated, since doing so reduces the resulting value, and there are discussions about the substantive meaning of such amendments.

In general, the metrics of knowledge quality in the classical approach are justified by statistically calibrating the methods on the corresponding population. Since the creation of IQ, the metrological substantiation of knowledge measurements has been carried out on the basis of the score distributions computed for the corresponding contingent of respondents. For example, average IQ values are given by age, social or professional groups. However, the difference in IQ does not reveal what fundamental differences in the structure of knowledge distinguish these groups.

Source:
  • http://cblis.utc.sk/cblis-cd-old/2003/3.PartB/Papers/Science_Ed/Testing-Assessment/Papanastasiou.pdf
  • abstract

    Computer-based learning can have great potential when used appropriately to improve learning. This potential can be enhanced through the use of computer-based testing and, more specifically, computer adaptive testing (CAT). In this paper, the author describes the mechanism and advantages of computer adaptive testing and how it can improve the learning process in the subject area of science. The educator is encouraged to consider some limitations and challenges of its implementation, which are also discussed in the context of science education. KEYWORDS: computer adaptive testing, CAT, computer-based testing, computer-based learning, science education, assessment, feedback.

    INTRODUCTION

    Computer-based learning has extremely great potential for improving learning in many fields and disciplines, including the subject area of science. However, computer-based training must be closely and continuously monitored to ensure its effectiveness. This is especially true since some prior research has shown that computer use is negatively correlated with achievement in mathematics and science (Papanastasiou and Ferdig, 2003), although it is not clear under what circumstances these negative consequences develop, or whether there is a cause-and-effect relationship between these variables. This relationship should remind the educator that the computer is not necessarily a “panacea”: it should not be used irresponsibly, nor should it absorb the attention of students who find it difficult to work with. This negative relationship between computer use and achievement should also remind educators that there is a significant need for ongoing formative and summative assessment in science. With proper assessment, problems that arise during learning can be identified and possibly corrected if they are detected early enough. However, assessment must also be used wisely, in such a way that it complements the learning process. Since computer-based learning is the focus of this conference, this article is concerned with computer-based assessment. The purpose of this paper is to go beyond simple computer-based training to describe computer adaptive testing and discuss its implications, its benefits, and how it can effectively complement computer-based training in science.

    Description

    Computer-based testing can be defined as any type of assessment that is carried out by means of a computer. However, computer testing can take various forms, depending on how tailored the test is (College Board, 2000). For example, some computerized tests, also called fixed computerized tests, are purely linear (Parshall, Spray, Kalohn, and Davey, 2002). These are the tests that most closely resemble paper-and-pencil testing in that they are of fixed form and fixed length, and the test items are arranged in advance in a specific order. Unlike fixed computerized tests, computer adaptive tests (CATs) have the maximum degree of adaptability, since they can be tailored to each student with respect to the overall difficulty and the order in which questions are presented. Thus computer adaptive tests are computer tests that are assembled and adapted individually for each test taker, based on an estimate of the test taker's ability and on the answers given in the previous steps.

    Advantages of computer adaptive tests

    The main advantages of computer adaptive testing stem from the fact that it is efficient in terms of both time and the resources used. These benefits are discussed in the next sections from the point of view of the test taker, of the teacher who wants to determine the student's level of knowledge, and of the test developer.

    Efficiency

    Adaptive tests make it possible to assess a subject's abilities more accurately and at a lower cost than paper tests. Typical paper-based tests are created for mass testing, where the test is administered to a large group of students of varying abilities. To make this possible, most of the questions in such a test are of medium difficulty (since most students are of average academic performance). As a result, test content of this type creates problems for both high- and low-achieving students. A test taker with a low level of knowledge is able to answer the first few relatively easy questions, but questions of medium and high difficulty will not be easy to answer. Consequently, the test taker may end up guessing the answers to these questions, or may simply leave them blank. In this case, it is difficult to really assess his knowledge and capabilities, since any conclusions must be based only on the answers to the first few questions that the student was able to understand. Another, more specific example of this situation: a teacher wants to administer a biology test on the topic of the liver. A low-level question asks the student to identify the location of the liver in pictures of a person, while high-level questions require the ability to diagnose liver disease from pictures. If a student cannot even locate the liver in a picture of the human body, there is no reason to ask him the more difficult question. From the perspective of a student with deep knowledge of biology, the situation is somewhat better, although still not perfect: most of the questions will be too easy for such a person. Adaptive tests make it possible to select questions aimed specifically at the knowledge level of each test taker. When all questions are clearly aimed at each student's ability, the teacher can reach more reliable and valid conclusions about the student's actual knowledge.

    Feedback

    Another advantage of computer-based testing in general, and of computer adaptive tests in particular, is that they can provide direct and immediate student-teacher feedback (Wise & Plake, 1990). With a typical paper-based test, there is always a time delay between the teacher and the test taker. Without formative assessment, teachers cannot determine whether computer-based instruction is actually helping a student learn. This is especially important because, without proper assessment, some students may find themselves at a disadvantage in computer-based training. Beyond the overall grade, this type of assessment shows how each student has mastered the material as a whole; it can also provide a list of areas and topics in which each student had difficulties, based on his performance in the adaptive test. A teacher might worry that, with continuous testing, some students will memorize the test questions and pass them on to others. However, if an adaptive test draws on a relatively large pool of questions, this problem will not arise, especially since different students are given different items according to their individual ability levels.

    Time

    From a test developer's point of view, creating an adaptive test is time-consuming, but from the teacher's point of view it is more efficient. In particular, during adaptive testing students must answer fewer questions than during regular testing. In addition, regular testing is usually carried out by the whole group within a fixed amount of time, which may not suit some students: the teacher and the entire group must wait until all students have completed the test before moving on to another activity. With computer adaptive testing, students can take the exam whenever they are ready (the only requirement being an available computer), without waiting until the entire group is ready to take the test or has completed it. From the teacher's perspective, adaptive testing saves time: the teacher no longer has to worry about creating tests for the group, as long as the CAT covers the material taught, and also saves time on checking papers, since the test is scored by the computer.

    Other benefits

    Computer adaptive testing also has some additional benefits. It offers a high level of security, since there is no fixed list of questions to steal, and copying from neighbors is pointless because most test questions are individual for each test taker. In addition, other kinds of data can be collected with CAT, such as the time taken to answer each question or the number of answer changes students make while taking the test.

    CONCLUSION

    Modern research in the field of testing and assessment has shown that the potential of computer adaptive tests has grown. The advantages and capabilities of computer adaptive testing make it possible to go even further. This is evident in the number of large-scale testing programs (e.g., GRE, TOEFL, ASVAB) that have become or are becoming adaptive (Papanastasiou, 2001). However, such a step must always be taken wisely, so that the assessment procedure is well integrated into the learning process to ensure its maximum effectiveness.

    References

    1. Bennett, R. E. (1999). Using new technology to improve assessment. RR99-6. Princeton, NJ: Educational Testing Service.
    2. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
    3. Meijer, R. R. & Nering, M. L. (1999). Computerized adaptive testing: Overview and introduction. Applied Psychological Measurement, 23(3), 187-194.
    4. O'Neill, K. (1995). Performance of examinees with disabilities on computer-based academic skills tests. Paper presented at the American Educational Research Association, San Francisco, April 1995.
    5. Papanastasiou, E. C. (2001). A ‘rearrangement procedure’ for administering adaptive tests when review options are permitted. Doctoral dissertation, Michigan State University.
    6. Papanastasiou, E. (2002a). A ‘rearrangement procedure’ for scoring adaptive tests with review options. Paper presented at the National Council on Measurement in Education, New Orleans, LA.
    7. Papanastasiou, E. (2002b). Factors that differentiate mathematics students in Cyprus, Hong Kong, and the USA. Educational Research and Evaluation, 8(1), 129-146.
    8. Papanastasiou, E. C. & Ferdig, R. E. (2003, January). Computer use and mathematical literacy: An analysis of existing and potential relationships. Paper presented at the Third Mediterranean Conference on Mathematics Education, Athens, Greece, January 3-5, 2003.
    9. Parshall, C. G., Spray, J. A., Kalohn, J. C. & Davey, T. (2002). Practical considerations in computer-based testing. NY: Springer.
    10. Parshall, C. G., Stewart, R., & Ritter, J. (1996). Innovations: Graphics, sound and alternative response modes. Paper presented at the National Council on Measurement in Education, April 9-11, 1996, New York.
    11. The College Board. (2000, April). An overview of computer-based testing. RN-09.
    12. Wainer, H. (2000). CATs: Whither and whenever. Psicologica, 21(1-2), 121-133.
    13. Wise, S. L. & Plake, B. S. (1990). Computer-based testing in higher education. Measurement and Evaluation in Counseling and Development, 23, 3-10.

    Elena C. Papanastasiou, Ph.D. University of Kansas and University of Cyprus Department of Education P.O. Box 20537 1678 Nicosia Cyprus

    One of the actively developing and promising areas in modern methods of teaching foreign languages is the use of computer technologies to control the level of development of speech skills and abilities.

    Computer testing makes it possible to integrate text, graphic, audio and video information into test tasks, as well as to fully automate the process of conducting control measurements.

    Computer testing allows you to:

    quickly process input information;

    ensure operational feedback, which allows the test subject to constantly and immediately receive reinforcement for the correctness of the answer, and the teacher to carry out step-by-step or operational control of the test taker’s actions;

    increase the test taker's motivation: working with a computer program has an element of novelty similar to a game situation, and a spirit of competition with the computer appears;

    significantly save time and costs on organizing and conducting testing.

    So, the first task that a computer can effectively solve is to store test tasks and create tests from them, namely, process the primary, original author’s material, make the necessary clarifications, corrections, and additions to it; store information, select tasks from an electronic data bank according to specified criteria and produce the required layout of tests.

    The second task that the computer implements is registering test takers and preparing them to complete tasks. For example, registration, which may be pre-test or just prior to testing, involves filling out a registration card on a computer screen. Having received the necessary information, the system gives the test taker an identification number.

    The computer can prepare the test taker to take the test by providing instructions. The computer program includes information about the methodology of working with the test: recommendations on the technology of performing the test, data on the testing time, the assessment procedure, etc. Preparation for the test may also include training that explains how to respond to particular task types and how to avoid random errors (unrelated to the test taker's language and speech competence), and that builds the necessary timing habits.

    The next stage is conducting a testing session using a computer. The key problem of this stage is the duration of the work. It is therefore important that the computer testing program display, account for and control the allotted, spent and remaining time of the test subject.

    To begin work, the test taker must indicate his identification number, i.e. the number received during registration. After this, he is presented with a test with tasks and instructions for completing them.

    Completion of a testing session can be either voluntary (at the request of the test taker and with the permission of the instructor, as tasks are completed) or forced (at the end of the time limit).

    If the authors-compilers of the test did not specifically order the test tasks by degree of difficulty, did not divide the test into subtest sections that were autonomous in terms of performance goals and types of speech activity, then it is permissible to perform the test in any order. Otherwise, skipping certain tasks, for example those that seemed difficult, and returning to them are prohibited by the computer program.

    After completing the test tasks, the stage of processing answers and scoring begins. According to the classification of V. I. Nardyuzhev and I. V. Nardyuzhev, processing can be:

    local, performed at the testing site;

    remote, carried out outside the location of testing sessions;

    formal, if a simple comparison with keys is possible;

    expert, if such a comparison is impossible and the involvement of experts and specialists is required (for example, to evaluate a detailed oral or written response);

    operational, allowing results to be demonstrated immediately after testing;

    postponed due to a complex algorithm for calculating points or the need to obtain an opinion from a rater or expert.

    The use of a computer allows for statistical analysis of information, i.e., on the one hand, to provide information about test participants, on the other - which is most important at the present stage of development of linguodidactic testing - to collect data on the quality of test materials.

    In the first case, the analysis algorithm assumes:

    1) selection of the object of statistical analysis (subtest);

    2) determination of the number of testing participants at a given level;

    3) ranking of test takers by the number of points scored;

    4) determination of the percentage of correct answers to each test task;

    5) construction of graphs from the digital data;

    6) if necessary, comparison of test results for various objects.

    In the second case, statistical analysis is carried out through:

    1) determination of the minimum, average and maximum values of the test results;

    2) establishing the statistical parameters of tasks: the level of difficulty and the discriminating ability (the ability of a task to distinguish strong students from weak ones);

    3) analysis of the work of distractors, including the frequency with which each answer option is chosen by all test takers, as well as by weak and strong ones;

    4) determination of the independence of tasks in the test.
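    A schematic illustration of these computations (Python, with made-up data; a real system would read the answers from its results database):

        from collections import Counter
        from statistics import mean

        # answers[s][j] == 1 if subject s answered task j correctly, else 0
        answers = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 1]]

        totals = [sum(row) for row in answers]            # points per subject
        ranking = sorted(range(len(totals)), key=lambda s: -totals[s])
        pct_correct = [100 * mean(col) for col in zip(*answers)]

        print(min(totals), mean(totals), max(totals))  # min / average / max result
        print(ranking)                                 # subjects ranked by score
        print(pct_correct)                             # % of correct answers per task

        # distractor analysis for one closed question: frequency of each option
        chosen_options = ["B", "C", "B", "A"]
        print(Counter(chosen_options))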

    Computer testing is possible with specially developed software that implements the information and pedagogical testing model proposed by the authors.

    Computer software significantly influences both the content of test tasks (for example, the use of sound requires equipping the computer with a sound card) and the method of implementing the information and pedagogical model (for example, connecting computers to the Internet allows you to organize and conduct testing in real time).

    Computer programs for foreign-language testing can be classified by programming method. A program can be linear: in this case only one possible direction of work with the test is provided, regardless of the quality of the student's answer to a specific question or task; for example, the test taker must choose one of the answer options when completing reading-comprehension tasks.

    A linear program can be complicated by an adjustment stage (for example, when performing tasks to test grammatical skills). In this case, if the answer is incorrect, the computer returns the test taker to the original task, instruction or rule.

    A branched program provides explanations, additional guiding questions and instructions that help complete the initial tasks and grant permission for sequential movement or movement from frame to frame.

    Programs that combine linear and branched sections are classified as mixed or combined. They provide greater flexibility of control and adapt the work to the individual capabilities of students. At the same time, computer testing in a foreign language has its own specifics and its own requirements for the presentation of controlled material and for completing tasks. One of the main tasks is to make maximum use of all channels for presenting information, using multimedia technologies (graphics, animation clips, video images), as well as various links to documents and resources (reference books, lexical minimums, intonation contours, etc.). In turn, the use of computer didactic visualization, simulating communication situations and organizing the completion of tasks and answer correction, increases the productivity of monitoring computer programs and the motivation of test takers to master a foreign language.
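    A mixed or branched program can be thought of as a small frame graph; a minimal sketch (Python; the frames and transitions are invented for illustration):

        # each frame: task text, next frame on a correct answer,
        # helper frame (explanation / guiding question) otherwise
        frames = {
            "q1":    {"task": "Grammar item", "ok": "q2", "help": "hint1"},
            "hint1": {"task": "Guiding question for q1", "ok": "q2", "help": "q1"},
            "q2":    {"task": "Reading-comprehension item", "ok": None, "help": "hint2"},
            "hint2": {"task": "Guiding question for q2", "ok": None, "help": None},
        }

        def run(frames, start, answer_is_correct):
            """Walk the frame graph: correct answers move forward,
            incorrect ones branch into explanations and guiding questions."""
            frame = start
            while frame is not None:
                item = frames[frame]
                frame = item["ok"] if answer_is_correct(item["task"]) else item["help"]

        run(frames, "q1", lambda task: True)  # a test taker who answers everything correctly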

    Computer testing capabilities

    Today various organizations around the world are engaged not only in developing testing theory but also in building computer testing systems. Among them is the Educational Testing Service (ETS), which has dealt with computer testing issues since 1970 and currently offers computer versions of TOEFL (Test of English as a Foreign Language). This test of English as a foreign language is used for admission to colleges in the USA and Canada.

    In France, the National Center for Distance Learning (Centre national d'enseignement à distance) offers a computer version of a test of French as a foreign language: Test FLE - Test de Français langue étrangère et seconde - niveau général (élémentaire, intermédiaire, avancé): compréhension écrite, grammaire, vocabulaire, compréhension orale. The test allows one to determine one's level of proficiency in French as a foreign language. The language school "L'Ecole des Trois Ponts" also offers interactive tests of general language proficiency.

    In Russia, staff of the Department of Humanitarian Technologies of Moscow State University were among the first to engage in computer testing. Technologies of computerized remote testing have been developed in which the functions of educational or psychological testing are distributed between the user's local computer (the "client") and the developer's central computer (the "server"). This new information technology enables the rapid and widespread dissemination of tests that meet international scientific standards. Every year during the spring holidays, the telecommunications olympiad "Teletesting" is held for school graduates. On the project's website one can interactively practice some tasks (with multiple-choice answers) from demo versions of tests from different years, including in English.

    Computer testing in English is also carried out by various language schools in Russia. For example, the language schools BKC-International House and Transparent Language offer tests to determine the level of English language proficiency.

    A computerized control system opens up wide opportunities for individualizing the learning process. The principle of individualization of learning underlies adaptive testing. Adaptive testing is a form of control that adjusts the difficulty and number of tasks presented to each student depending on his answer to the previous task: after a correct answer the student receives a more difficult task, after an incorrect one an easier task than the previous one. The adaptive mode (of training as well as testing) relies on a set of tasks in test form that requires the student to work at the limit of his capabilities, thereby ensuring maximum effect. Using tasks that match the student's level of preparation increases the accuracy of measurement and reduces individual testing time.

    Based on an analysis of adaptive testing results, the learning process can be built from the perspective of a personality-oriented approach, i.e., educational tasks can be selected at the optimal level of difficulty for each student. It is known that tasks that are too easy do not contribute to development, while tasks that are too difficult reduce learning motivation. Therefore, in testology the optimal difficulty level of a task is considered to be 50%.
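    In Item Response Theory terms this is natural: in the Rasch model the probability of a correct answer is $P(\theta, b) = \frac{1}{1 + e^{-(\theta - b)}}$, which equals 50% exactly when the task difficulty $b$ matches the test taker's ability $\theta$; the item information $I = P(1 - P)$ is also maximal at $P = 0.5$, which is why adaptive algorithms aim tasks at that level.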

    Computerization of education and the development of the theory of pedagogical measurements make it possible to create a rating control system for a more objective and accurate assessment of students’ knowledge, skills and abilities. A rating assessment of learning makes it possible to characterize with a high degree of reliability the quality of a student’s preparation in a given academic subject. "Rating" translated from English is an assessment, a certain numerical characteristic of a qualitative concept. Typically, a rating is understood as a “cumulative score” or a score that takes into account “prehistory”.

    Modular training requires strict structuring of educational information and training content, and organization of students' work with complete, logically closed educational blocks (modules). The content of a module coincides with a topic of the academic subject, for example, a module for studying the topic "Geography of England and America". However, unlike a plain topic, in a module everything is measured and assessed: completion of each task, work in class, attendance, and the starting, intermediate and final levels of student preparation. The module clearly defines the learning objectives, the tasks and levels of study, and names the skills and abilities to be acquired.

    During modular learning, students should always know a list of basic concepts, skills and abilities for each specific module, including a quantitative measure of assessing the quality of learning material. Based on this list, questions and training assignments are drawn up, covering all types of work on the module, and submitted for control after studying the module. As a rule, in modular technology training, a test form of control is used.

    Training modules and tests can easily be transferred to a computer-based learning environment. Many Russian distance-education institutions build their curricula on the basis of modules.

    In modular training, each task is assessed in points; its rating and deadlines are established (timely completion is also rewarded with a corresponding number of points). The main principle of rating control is thus the control and assessment of the quality of knowledge, skills and abilities, taking into account the systematic nature of students' work.

    After completion of training, an overall grade is determined based on module assessments, which is taken into account when determining the results of the final control in the subject.

    So, computer testing, along with training, is today one of the main methods of new information technology for assessing the level of foreign language proficiency.
