MODULE THREE ----
Test Assembling, Test
administration and Test Analysis
Module Coverage
v
Assembling of classroom tests
v
Test administration
v
Recording and scoring the answers
v
Summarizing test results
v
Item analysis: Assessing the level of difficulty and discrimination
v
Giving Feedback
v
Reporting test performance
Intended Learning Outcomes
By the end of this module,
students should be able:
§
To
define correctly the concepts test assembling, test administration and marking
§
To outline
the advantages of preparing and assembling tests in
advance
§
To describe
procedures of appraising the classroom tests
§
To describe
procedures for giving
constructive feedback
Assembling Classroom Tests
v Presentation Outline
q
Introduction
q
Definition of the concept of test assembling
q
Process of assembling classroom tests
q
Test administration
q
Recording and scoring the answers
Intended Learning Outcomes
By the end of this lecture,
EP 300 students should be able to:
Ø
Appreciate the importance of paying careful attention to the assembly, reproduction, administration, and scoring aspects of classroom tests.
Ø
Follow the guidelines for assembling the various item formats into a
test.
Ø
Appreciate the importance of having clear,
concise directions for the student.
Ø
Follow the guidelines
for writing test directions.
Ø
Appreciate the importance of encouraging all students
to attempt all test items, even though they may be unsure of the correctness of their answers.
Ø
Follow
the guidelines for laying out and reproducing
the test.
Ø
Recognize the importance of physical and psychological
conditions in test taking.
Ø
Understand
why cheating must be discouraged and know how to
minimize it.
Introduction
§
The results
of classroom tests provide very important information that can be used to make serious decisions
that affect the lives of individual students, their future lives as well as the lives of their families.
§
Thus,
tests and examinations should be well planned
and administered if they are to collect
valid and reliable information that truly reflects the individual's ability.
§
It should be noted that objective tests, such as multiple-choice tests and some variants of true-false items, cannot be administered orally. Neither can the items be written on the blackboard a few minutes before the test or examination is scheduled to begin.
§
Thus the test must be prepared in advance and reproduced.
In this lecture, therefore, we will focus on test assembling, reproduction and administration.
Definition of the Concept of Test Assembling
v What
is test assembling?
Ø
It is the preparation of test items for use in a
test
v It involves:
Ø
Writing test items at least several days before they are to be used,
Ø
Grouping together
similar item formats in order to have clear and concise directions, and
deciding upon the manner in which the pupils
are to record their answers,
Ø
Constructing extra
test items
Effective Test assembling
§
Effective test assembling calls for consideration of two important
sets of factors:
a.
Factors about the individuals to be tested: These include
consideration of the following:
Ø
Why test your students
at that particular point in time?
Ø
Is it the most opportune time for you to collect the data?
Ø
Is it the time when students are ready to be tested?
Ø
Are you really ready to assess the learners at that particular time?
b.
Factors about what to be tested:
These include
Ø
What shall be assessed?
Ø
Why should it be assessed?
Ø
How should it best be assessed?
Note: To best address these issues, one needs to have a table of specification of educational objectives (also known as the blueprint, test specification, test blueprint, or test grid).
§
A table of specification of educational objectives is a two-way chart that relates
the instructional objectives/competences and content.
§
Importance of the Table of Specification:
Ø Helps to balance between what is
tested and what was taught
Ø Helps to determine the kinds of
learning outcomes/skills that will be tested/assessed
Ø Helps to determine the kind of content knowledge to be covered
This implies that:
Ø
The test should reflect
the content and objectives/competences in proportion to the importance given in instruction as reflected in the amount
of time spent for that content.
Ø
Priority subject
matter and/or objectives will be assessed using more items than less
important subject matter and/or objectives.
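To make the proportional allocation described above concrete, here is a minimal sketch in Python; the content areas, weights, and test length are invented for illustration, not taken from the module.

```python
# Hypothetical blueprint: allocate a 40-item test across content areas (rows)
# and cognitive levels (columns) in proportion to instructional emphasis.
content_weights = {"Weather": 0.40, "Map work": 0.35, "Population": 0.25}   # share of teaching time
level_weights = {"Knowledge": 0.30, "Comprehension": 0.40, "Application": 0.30}
total_items = 40

blueprint = {
    content: {level: round(total_items * c_w * l_w) for level, l_w in level_weights.items()}
    for content, c_w in content_weights.items()
}

for content, row in blueprint.items():
    print(content, row, "row total:", sum(row.values()))
```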
Building a table of specifications
§
Involves:
Ø Preparing a list of instructional objectives/competences
Ø Outlining the instructional content
Ø Preparing the two way chart by listing the major content
areas down the left side of the table
§
According to Bloom's cognitive taxonomy, there are six major categories of objectives, arranged in hierarchical order on the basis of the complexity of the task. Each of these six classes is subdivided further.
These classes comprise:
a.
Knowledge (the
simplest): is defined
as the remembering of previously learned material.
v Verbs used: define, distinguish, acquire, identify, recall,
recognize, etc.
b.
Comprehension: is defined as the ability
to understand the meaning of material.
v Verbs used: Translate, transform, give in words, illustrate,
prepare, read, represent, change, rephrase, restate, interpret, reorder, rearrange, differentiate, distinguish, make, draw, explain, demonstrate, estimate, infer, conclude,
predict, differentiate, determine, extend, interpolate, extrapolate, fill in, draw, etc.
c.
Application: is defined as the ability
to use learned material in new situations.
v Verbs used: Apply, generalize, relate, choose, develop, organize, use, employ, transfer,
restructure, classify, etc.
d.
Analysis: refers to the ability
to break material
down into specific
parts so that the over-all
organizational structure may be comprehended.
v Verbs used: Distinguish, detect, identify, classify, discriminate, recognize, categorize, deduce, analyze, contrast, compare, etc.
e.
Synthesis: is the ability
to put parts together to form a whole.
v Verbs used: Write, tell, relate, produce, constitute, transmit, originate, modify, document, propose, plan, design, specify, derive, develop, combine, organize, synthesize, classify, deduce, formulate, etc.
f.
Evaluation (the
most complex): Refers to the ability
to judge the worth of material for a given purpose.
v Verbs used: judge, appraise, evaluate, assess, compare, contrast, argue, justify, criticize, etc.
§
Each test item should be evaluated
against the ability and content areas, and this is the
only way of achieving validity.
i.e. each item needs to be recorded with enough space so that one can also record such information as:
üThe
instructional objective being addressed
üThe
learning outcome being measured
üThe content area being used to measure
it
A. THE PROCESS OF ASSEMBLING CLASSROOM TESTS
§
Test assembling is not a one-time event, but a continuous process involving item setting and item reviewing.
§
A good teacher prepares
test items as she/he teaches
in every lesson
§ All
possible test items in every
topic or sub- topic should be identified and assembled
in the course of teaching
Questions to Consider
During Test
Assembling
i.
How should the various
item formats be organized in the
test?
ii.
How should
the various items within a particular format
be organized?
iii.
How
should the test be reproduced?
iv.
Should pupils be
encouraged to answer all test
items, even those they are unsure of?
v.
What
kinds of directions should the student be given?
vi.
Should the students record
their answers directly
in the test booklet or
should a separate answer sheet be used for objective-type tests?
vii.
How should
the test items be analysed?
viii.
How should
the test results be interpreted?
Important Steps to be Followed.
§ For valid and reliable assessment of students’ achievements, the following steps need to be followed:
1.
Recording test items
§
When
constructing the test items, it is desirable
to write each one on a separate sheet
of paper (index card). This allows modification
and improvement of the item.
§
The card should also contain:
ü Instructional objectives
ü Specific learning outcome
ü Content measured by the item
ü A
space for item analysis
An example of a test item card with item analysis data recorded at the back
SUBJECT: Geography TOPIC: Weather
OBJECTIVE: Identifies the use of weather measuring
instruments
ITEM
Which of the following instruments is used for measuring atmospheric pressure?
A. Anemometer
B. Barometer
C. Thermometer
D. Hygrometer
Back of the item card
ITEM ANALYSIS DATA

                        Alternatives
Dates      Pupils      A    B    C    D    E   Omits   Diff.   Disc.
1/4/2013   Upper 10    0   10    0    0    -     0      70%     0.6
           Lower 10    2    4    1    3    -     0
           Upper 10
           Lower 10
           Upper 10
           Lower 10
Comments:
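Item cards of this kind can also be kept electronically. The sketch below shows only one possible record layout (the field names are assumptions, not a prescribed format):

```python
from dataclasses import dataclass, field

@dataclass
class ItemCard:
    # Front of the card
    subject: str
    topic: str
    objective: str
    stem: str
    options: dict            # e.g. {"A": "Anemometer", "B": "Barometer", ...}
    key: str                 # letter of the correct answer
    # Back of the card: item-analysis results appended after each administration
    analyses: list = field(default_factory=list)

card = ItemCard(
    subject="Geography",
    topic="Weather",
    objective="Identifies the use of weather measuring instruments",
    stem="Which of the following instruments is used for measuring atmospheric pressure?",
    options={"A": "Anemometer", "B": "Barometer", "C": "Thermometer", "D": "Hygrometer"},
    key="B",
)
card.analyses.append({"date": "1/4/2013", "difficulty": "70%", "discrimination": 0.6})
```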
ii. Arranging the Test Items
§
Test items must be arranged as clearly and conveniently as possible.
§
One
way to achieve this is to group all items of the same format together rather than to intersperse them throughout the test.
§
Grouping
test items is advantageous for a variety of
reasons:
a.
Young
children may not realize that the first set of
directions is applicable to all items
of a particular format and may become confused
b.
It makes it easier
for the examinee to maintain
a particular mental set instead of having to change from one to another
c.
It
makes it easier for the teacher to score the test, especially if hand scoring is done.
§
There are various methods
of arranging or grouping
items. The method will vary according to the use of the results of the test items.
§
For most of the classroom purposes, the items can be arranged
by a systematic consideration of:
Ø
the types of items used
Ø
the learning
outcomes measured
Ø
the difficulty of the items
Ø
the subject
matter measured
§
The various
item formats should
be presented in such a way that the
complexity of mental activity required by the
student will progress from the simple to the complex. For example,
simple recall measured by the completion item
should precede the interpretive exercise.
§
Many experts
in the field of assessment recommend the following scheme to be used:
i.
True-false or alternative response
items
ii.
Matching items
iii.
Short answer items
iv.
Multiple choice
items
v.
Interpretive
exercises
vi.
Essay questions
§
The
test items should be arranged so that they are
easily read by the examinees i.e. the reproduction should be legible
and the items not be crowded together.
§
Test
items should also be arranged in such a way that the correct answers follow
a random pattern.
iii.
Writing test directions
§
Test
directions/instructions are rubrics that students are supposed to follow in answering the questions
§
Very often many teachers include no written directions, assuming that the test items are self-explanatory.
The directions provided should be clear and concise and should tell the students what they are to do, how they are to do it, and where they are to record their answers.
§
Whether written, oral, or both, the directions
should include the following:
i.
The purpose
of the test
ii.
The time to
be allotted to the various
sections,
iii.
The value
of the items
iv.
Basis
for answering each item e.g. select the correct answer,
select the best answer etc.
v.
Procedure for recording
answers
vi.
Whether or not students should guess at answers they are unsure of, i.e. what to do about guessing. For example, if guessing is to be penalized, students should be told, together with the method that will be used to penalize guessing.
iv.
Reviewing test items
§
No matter how carefully test items have been prepared, defects may always creep in during
construction. Such defects
can be most easily detected by:
Ø Reviewing the items a few days after they have been constructed.
Ø Asking a fellow teacher
to review and comment on them
§
In
reviewing test items, we should try to view the items from pupil’s
viewpoint as well as from that of the test maker.
§
Items review is important in that it:
Ø Helps to make items appropriate to
learners’ outcomes that are intended
to be measured
Ø Makes the items clear and free from ambiguity
Ø Rationalizes the level of difficulty
Ø Helps to check if the answers to the items
would be agreed upon by experts
Ø Helps to identify technical
errors and clues to answers
and correct them
v.
Reproducing the test
§
Careful attention to the reproduction phase will not only
make it easier for the examinee, but may also
make hand scoring much easier.
§
In preparing the test materials for reproduction, it is important that the items be spaced and
arranged so that they can be read,
answered, and scored with the least amount of difficulty
§
It is desirable
to proofread the entire test before it is administered. Charts, graphs, and other pictorial
material must be checked to ensure that the reproduction has been accurate and details clear.
§
To assist
both examinee and examiner, the following practices are recommended:
i.
Space the items so that they are not crowded
ii.
For the alternate response
test, have a column of T's
and F's at either the right
or left hand side of the items.
iii.
For matching
exercises, have the two lists on the same page.
iv.
For the multiple choice item that uses a key list,
try to keep all items using a
particular key on the same page. If
this is not possible, the key should be repeated on the new page.
v.
For the interpretive exercise, the introductory material, be it a graph, chart, diagram, or piece of prose, and the items based on it should be on the same page.
vi.
All items should be numbered consecutively.
vii.
For the short-answer items (1 to 2 words), the blanks
should be numbered and the responses recorded in blanks (vertically arranged and
numbered to correspond to the number of the blank) on one side of the answer sheet used. For
example:
The product of 10 and 7 is (i) times as large as the sum of 8 and 6.
viii.
If
the responses are recorded directly on the test booklet, it will make scoring
easier if all responses to objective items are recorded on one
side (left or right) of the page,
regardless of the item format used.
ix.
In
the elementary grades, if work space is needed
to solve numerical problems, provide this space in the test booklet rather than having
examinees use scratch paper.
x.
All illustrative material
used should be clear, legible,
and accurate.
xi.
Proofread the test carefully before it is reproduced.
xii.
Every pupil should have a copy of the test.
B. ADMINISTERING CLASSROOM TESTS
§
In
test administration we are concerned with providing optimum
conditions for obtaining
the pupils’ responses.
§
The
guiding principle in administering tests is that all pupils must be given a fair chance to demonstrate their achievement of the
learning outcomes being measured.
§
This can be done by doing the following:
Ø
Providing
conducive physical conditions such as
adequate work place, quiet, proper light and
ventilation, and comfortable temperature
Ø
Providing appropriate psychological climate such as tension-free environment.
•
Psychological
conditions such as tension and anxiety
that have significant influence on test results must be avoided.
•
Some of the things
that cause test anxiety are:
üThreatening pupils with test if
they do not behave.
ü
Warning pupils
to do their best “because
this test is important”.
ü Telling pupils they must work fast in order
to complete the test on time.
ü
Threatening dire consequences if they fail the test.
§
The following
things should be avoided DURING the administration of a classroom test:
Ø Do
not talk unnecessarily before the test.
Ø Keep interruptions to a minimum during the test
Ø Avoid giving hints to pupils who ask about
individual question items because the response provided
can give clues to the answers.
Ø Discourage cheating and, if necessary, take action against it.
§
The actual administration of the test is relatively simple, because a properly prepared
classroom test is self-administering
Some
additional considerations
§
When administering the test, the teacher should make
sure that the students understand the directions and that answer sheets, if they are being used with the younger pupils,
are being used correctly.
§
The
teacher should keep the students informed of
time remaining (e.g. writing the time left on the blackboard at 15-minute intervals).
§
Careful
proctoring should take place so that cheating
is eliminated, discouraged, and/or detected.
C. RECORDING AND SCORING THE
ANSWERS
§
Understanding
how examinees will record and score the
answers to objective tests is very important if teachers are to obtain valid and reliable data.
§
Whether
pupils will record their answers directly on
the test papers or use separate sheets depends upon:
Ø
The item format used,
Ø
The age,
Ø
The ability level of the
pupils, and
Ø
The nature
of the content.
§ Generally speaking, separate answer sheets will provide
more accurate and reliable scores, especially
if they are machine-scored.
§ However, for tests involving
computation, it might be better to
have pupils record their answers directly in
the test booklet, rather than use a separate answer sheet.
§ Thus, there are two methods by which
pupils can record their answers:
Ø
In the booklets themselves
Ø
On separate
answer sheets
§ The manner in which the answers are
to be recorded will be governed (in part at least) by:
i.
Availability of special scoring
equipment,
ii.
The speed with which test results are needed,
iii.
The monetary
resources available to have the answer sheets
scored by an independent scoring
service.
Marking/Scoring
§ Marking is an important part of teaching
Ø
A marking scheme is carefully designed beforehand; this is possible for standardized and objective-item questions
Ø
A marking scheme should be used just as a guideline for
the expected answer. It should include major
points, characteristics of the answer and the amount of credit to be allocated
to each point
Types of marking
§
Impressionistic marking: feeling of worth
Ø
common for essay
Ø
Some
implicit criteria may be developed for impressionistic marking
Ø
Involves
subjectivity: marking results tend to vary
from one marker to another (very little consistency)
§
Clinical analysis of questions and answers:
Common for mathematics and science marking
§
A combination of impression and marks allocation for specified responses
üRequires systematic analysis of
questions and answers
üTakes time but it helps to judge the sources of difficulties.
Some
Considerations in Marking
§
What key statements should receive marks?
§
How many marks should
be allocated to each point?
§
Are all the errors the candidate’s fault, and should marks therefore be automatically deducted?
§
What is the
use of the response pattern? Diagnostic, formative or summative?
Test
Scoring
§ There are essentially two types
of scoring processes:
a.
Hand scoring:
Ø
It
is done either in the booklets themselves or
on separate answer sheets
Ø
If
the pupils’ answers are recorded on the test
paper itself, a scoring key can be made by marking
the correct answers on a blank copy of the test.
Ø
When
separate answer sheets are used, a scoring
stencil is used.
ü A
scoring stencil is a blank answer sheet with
holes punched where the correct answer should appear.
ü
The stencil
is laid over each sheet, and the number of checks appearing
through the holes are counted
§ In
scoring the completion-type item, the teacher
may:
Ø
Prepare
a scoring key by writing out the answers on
a test paper or may
Ø
Make
a separate strip key that corresponds to the
column of blanks provided
to the students.
•
In
either of the two methods, the teacher or aide can place the scoring key next to the pupils' responses and score the papers
rather quickly.
Ø
Use the silver-overlay answer
sheet.
•
This
is a self-scoring procedure in which the correct answers are previously
placed in the appropriate
squares, and the total answer sheet is
covered by a silver overlay that conceals the
correct answers.
• Students erase the square they feel corresponds to the correct
answers.
• This procedure is used quite
effectively in classroom testing because it provides immediate
feedback.
§ Objective items could encourage
guessing.
§ Correction of guessing
is done by using the following formula:
S = R − W / (n − 1)
Where:
Ø
S = the corrected score,
Ø
R = the number of items answered correctly,
Ø
W = the number of items answered wrongly, and
Ø
n = the number of alternatives per item.
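A minimal computational sketch of the correction-for-guessing formula above (function and variable names are mine):

```python
def corrected_score(right: int, wrong: int, n_alternatives: int) -> float:
    """S = R - W / (n - 1); omitted items are counted neither right nor wrong."""
    return right - wrong / (n_alternatives - 1)

# e.g. 30 right and 8 wrong on a four-option multiple-choice test:
print(corrected_score(30, 8, 4))   # 30 - 8/3 = 27.33...
```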
b.
Machine Scoring
§
Is
the simplest way to score select-type objective test items
§ Here students record their answers on
separate answer sheets and the machine is used to score them.
D: SUMMARIZING
AND INTERPRETING TEST RESULTS
§
The process of marking/scoring is usually
followed by summarization and interpretation of test results
§
Effective summarization of the results
depends on various processes
such as scoring
and interpretation of scores.
Interpreting Test Scores
§ Interpretation of educational measurements refers to determining the meaning of test scores.
§ Interpreting educational measurements
is not as simple as it is in physical measurements such
as length, height, weight etc.
§ Physical measurements are based on scales that have a true zero point and equal units, while educational measurements do not have a true zero,
e.g. a zero score in PSY 2102 does not mean that a student knows completely nothing.
§ We can safely say Marx is two times taller than Alex, because for height we can ascertain the zero point.
§ But we cannot say Marx is two times more intelligent than Alex simply because Marx scored 80 marks and Alex scored 40 marks.
q Thus, we can only say Marx is more intelligent than Alex.
Summarizing and interpreting
educational measurement
§ Effective interpretation of test
scores calls for a clear understanding
of the nature of the information that can be inferred
from numbers.
§ Thus, to effectively and correctly interpret
test scores one needs to have
basic but clear knowledge about the following:
v
Different kinds of scores, such as nominal,
ordinal, interval, and ratio scales.
a.
Nominal
scale
Ø
Is the simplest scale of measurement.
Ø
Involves
assigning numerals to different categories that are qualitatively different.
ü For example, for purposes of storing
data on computer cards, we might use the symbol 0 to represent
a female and the symbol 1 to represent a male
Ø
These symbols
(or numerals) do not have any of the
three characteristics (order, distance, or origin) of the real number series.
b.
Ordinal Scales
Ø
An
ordinal scale has the order property of a real number
series and gives an indication of rank order.
Ø It
indicates the magnitude, though only in a very
gross fashion. For example, rankings in a music contest or in an athletic event denote who is best, second best, third best, and so on.
Ø However, the ranks provide
no information with regard
to the differences between the scores.
Ø Ranking is sufficient if our decision involves selecting the top pupils for some task, but insufficient if we wish to obtain any idea of the magnitude of differences or to perform certain kinds of statistical manipulations.
C. Interval
Scale:
Ø Here
we can interpret the distances
between scores. For example,
if Mabula has a score of 60, Mary a
score of 50, and Hamis a score of 30, we could say that the distance between Mary’s and Hamis’s scores (50 to 30) is twice the distance
between Mabula’s and Mary’s scores (60 to 50).
Ø This
additional information has potentially greater use than just knowing the rank order of the three students.
d. Ratio
Scales
Ø
If one measures with a
ratio scale, the ratio of the
scores has meaning.
For example, a person who is 86" is twice as tall as a person who
is 43".
Ø
Here a measurement of 0 actually indicates no height (i.e. there is a meaningful zero point).
v Methods of tabulating and graphing data
Ø
Frequency distribution
Ø
Histograms
Ø
Frequency polygons
v Various kinds of distributions
Ø
Normal distributions,
Ø
Positively skewed distributions,
Ø
Negatively skewed distributions, and
Ø
Rectangular distributions.
v Basic concepts of descriptive statistics, such as:
i.
Measures of central
tendency,
Ø
Mean
Ø
Mode
Ø
Median
ii.
Measures of variability
Ø
Standard deviation
Ø
Variance
iii.
Measures of correlations
Ø
Pearson product
moment correlation coefficient
(r) is the statistic most often used to give us an indication
of this relationship
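The descriptive statistics listed above can be computed directly with Python's standard library; the scores below are invented purely for illustration.

```python
import statistics

scores = [45, 52, 52, 60, 63, 70, 74, 81, 88, 95]   # hypothetical marks on test 1
other  = [40, 50, 55, 58, 66, 68, 75, 78, 85, 92]   # marks of the same pupils on test 2

print("mean     :", statistics.mean(scores))
print("median   :", statistics.median(scores))
print("mode     :", statistics.mode(scores))
print("std dev  :", statistics.stdev(scores))        # sample standard deviation
print("variance :", statistics.variance(scores))
print("Pearson r:", statistics.correlation(scores, other))   # requires Python 3.10+
```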
§ Several methods of reporting measurements have been proposed. These are broadly categorized into raw scores and derived scores.
Interpreting Raw scores
Ø
What is a
raw score?
Ø
It is simply the number of points received
by a student on a test, when the test has been scored according to the marking scheme.
Ø
It is a numerical
summary of students’
set of performance
Ø
Raw
score has no educational meaning on its own (i.e. it is not very meaningful without further information).
For example, John answered
25 items correctly
on an arithmetic test and his raw
score is 25.
Then:
ü
What does a 25 mean?
ü
Is
that a good score?
ü
How many items were in
the test?
ü
What kind of arithmetic problems were presented?
ü
How difficult was the test?
ü
How
does a score of 25 compare with the scores received
by other pupils in John’s class?
Ø
In
general, we can provide meaning to a raw score
by either:
i.
Describing it in relation to the specific tasks that the pupil can perform. This is known as criterion-referenced interpretation.
ii. By using a type of derived score that indicates the pupil’s relative position in a clearly defined reference group (norm-referenced interpretation).
Criterion-referenced interpretation
§ Here
we interpret pupil’s
performance in terms of certain
criteria: e.g.
Ø Speed with which a
task is performed (e.g. types 40 words
per minute without error)
Ø Precision with which a task is performed
(e.g. measures the length
of a line within one fourth of a millimeter).
Ø The percentage of items correct on some clearly
defined set of learning tasks (e.g. identifies 80% of the terms used to describe
elements of weather)
Ø
The percentage-correct score
is widely used
in criterion-referenced test interpretation.
Ø
Permits us to describe
an individual’s test
performance without referring to performance
of others.
Norm-Referenced Interpretation
§
This is the most common test interpretation
§
Tells
us how an individual compares with other persons
who have taken the same test.
§
To obtain a more general framework for norm-referenced interpretation, raw scores are converted into some type of derived score.
Derived Score
What
is a derived score?
§
Is
a numerical report of test performance on a score scale that has a well-defined characteristic and yields normative meaning.
§
The most common types of derived
scores are:
Ø
Percentile ranks
Ø
Standard scores
§
These are interpreted in relation to established norms
§
Both indicate
the individual student’s relative position within a particular group
§
Three forms of comparison can be made when raw scores are transformed into
comparable scores as follows:
Ø
One
student can be compared to other students who sat for the same test
Ø
Student’s performance in one test can be compared to performance in another test
Ø
Student’s performance in one form of a test can be
compared to another
form of a test
§
These
comparisons give us the power to predict students’
success in various
areas and enable us to diagnose
students’ strengths and weaknesses.
a.
Percentile rank (percentile score)
§ A percentile rank (or percentile score) indicates a person’s relative position (rank) in a group in terms of the percentage of individuals scoring at or below that person’s score.
§ For example, if John’s score on a history test gives him a rank of 10 in a class of 40, we can say that:
Ø
¾ of the class made a lower score
than John
Ø
75 is his percentile rank
Ø
His score is
at 75th percentile
Ø
25% of the pupils are above his
score
§
By
using percentile ranks, the position of individuals in groups of unequal size can be compared.
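A minimal sketch of a percentile-rank computation. Conventions differ; this version counts scores at or below the pupil's score, following the definition given above, and the class scores are invented.

```python
def percentile_rank(score: float, group_scores: list) -> float:
    """Percentage of scores in the group that are at or below the given score."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100 * at_or_below / len(group_scores)

class_scores = [35, 40, 42, 48, 50, 55, 58, 60, 62, 65,
                70, 72, 75, 78, 80, 82, 85, 88, 90, 95]
print(percentile_rank(78, class_scores))   # 70.0 -> the pupil is at the 70th percentile
```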
Standard score
§
What
is a
standard score?
Ø
Is
a score that indicates a pupil’s relative position in a group by showing
how far the raw
score is above or below the average.
Ø Is described in terms of standard
deviation units from the mean of the distribution
Ø
The mean (M) is the arithmetic average which is determined by adding all the scores
and dividing by the number of scores.
Ø The
standard deviation (SD) is the measure of spread of scores in a group.
Types of standard scores
§
The
most common types of standard scores are:
Ø Z-Score
Ø T-score
z- Score
§
The simplest
standard score
§
Expresses test performance simply
and directly as the
number of standard deviation units a raw score is above or below the mean.
§
Formula:
§
z-score =
M SD
Example:
Find z-score
for the raw scores
of 58 and 50 if M
=56 and SD = 4
– If
X =58 then z = =
0.5
– If
X =50 then z =
5856
4
= -1.5
– A
z-score is always
negative when the raw
score is
smaller than the mean
5056
4
T-score
•
Any set of normally
distributed standard scores
that has a mean of 50 and standard deviation of 10
•
T-score (linear
conversion) can be obtained by multiplying the z-score by 10 and adding 50 to the product.
•
i.e. T-score
= 50 + 10z
•
e.g. if z=0.5 then T = 50
+ 10(0.5)=55
if z =-1.5 then T =50 + 10 (-1.5) =
35
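A minimal sketch of the z-score and T-score conversions worked above:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    return (x - mean) / sd

def t_score(z: float) -> float:
    return 50 + 10 * z

for x in (58, 50):                       # the example above: M = 56, SD = 4
    z = z_score(x, mean=56, sd=4)
    print(x, "-> z =", z, ", T =", t_score(z))   # 58 -> z = 0.5, T = 55;  50 -> z = -1.5, T = 35
```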
Test
Norms
§ Test
norms merely represent
the typical performance of pupils in the reference groups on which the test was standardized; they should not be viewed as desired goals or standards.
§ How
to develop test norms
Ø The test developer decides on the population the norm sample is supposed to represent.
Ø Identifies an ideal sample to represent that population, using the most recent census data.
Ø Seeks the cooperation of school administrators.
Ø Tests the students in the sample.
Types of Test Norms
§ The
most common types are
Ø
Grade
norms: grade group in which pupil’s raw score is average
Ø
Percentile norm: percentage of pupils in the reference
group who fall below the pupil’s raw score
Ø
Standard score norm: Distance
of pupil’s raw score
above or below the mean of the reference group in terms
of standard deviation
units.
Difficulties in developing test norms:
§ Need scores from a large and representative sample of examinees (not easy)
§ Testing must always be done under standard conditions (with reference to Tanzania, this is not always possible)
§ Students must be willing to take the time involved in the norming process.
TEST (ITEM) ANALYSIS
§
This is a process of appraising classroom tests in order to determine the effectiveness of each test item
§
It
is the process of examining the students‘ responses
to each test item, to judge the quality of the item.
§
It is performed after a test has been done and scored
§
It is done by analyzing students’ responses to each item
§ Our
interest in item analysis is to establish
whether each item in
the test functioned as it was intended
§ Specifically, what one looks
for is the difficulty and discriminating
ability of the item as well as the effectiveness of each alternative.
§
We need to understand whether:
Ø
Each
item was able to discriminate between the best and
weak students in terms of achievement
Ø
Each item was able to measure
the effect of teaching
and learning process
Ø
The items were
of appropriate difficulty
Ø
The items were free from clues
Ø
Each alternative in the multiple choice questions was an effective
distracter
Types of Item Analysis
1.
Analysis for Norm-Referenced Test
§ Norm-referenced tests are
used to discriminate between
low and high achievers whereas criterion-referenced tests measure the effect of instruction
§
The
procedures for analyzing items in norm- referenced tests differ from those of analyzing items
in criterion-referenced tests
Procedures of analyzing items in a norm-referenced test
i.
Rank/arrange the test papers from the highest to the lowest score
ii.
From the
ordered set of papers make two groups: Put those with the highest
scores in one group (the top half) and those with the
lowest scores in the other group (the
bottom half). i.e. select about 25
percent of the papers from the top
and 25 percent from the bottom
iii.
Put aside
the middle papers as
they will not be used in
the analysis
iv.
For
each test item, tabulate the number of students in the upper and lower groups who selected each alternative (in the completion or
true-false item, it would be the number who answered
the item correctly)
i.e. Record the count as follows for each item (assume a total of 30 papers, 15 in each group, for this example), in which the asterisk indicates
the correct answer.
Alternatives                A    B    C*   D    E   Omits
Students in upper group     0    0   15    0    0     0
Students in lower group     4    2    8    1    0     0
v.
For each item, compute the percentage of students who got the item correct. This is called the item difficulty index (P), which can range from 0% to 100% (or from .00 to 1.00 when expressed as a proportion). The formula for item difficulty is:
Item Difficulty = P = R/T × 100
§
This means that the difficulty index is expressed in percentage
§
In the example in step v, R = 23 (this is the total number of students who answered C, the correct answer) and T = 30 (the number of students tested). Applying the formula,
P = 23/30 × 100 ≈ 77%
vi.
Compute the item discrimination index for each item by subtracting the number of students in the lower group who answered the item
correctly from the number in the upper group
who got the item right and dividing by the number of students
in either group (e.g., half the
total number of students when we divide the group into upper and lower halves).
§
In our example,
Discrimination = D = (RU − RL) / (½ × T) = (15 − 8) / 15 ≈ 0.47
Where D is the item discriminating power
RL is the number of students in the lower group who got the item right
RU is
the number of students in the
upper group who got the item right
T is
the number of students included in the analysis
§ This value is usually expressed as a decimal and can range from −1.00 to +1.00. If it has a positive value, the item has positive discrimination. This means that a larger proportion of the more knowledgeable students than of the poor students (as determined by the total test score) got the item right.
§ If
the value is zero, the item has zero discrimination. This can occur:
i.
Because the item is too easy or too hard (if everybody got the item right or everybody missed the item, there would be zero discrimination), or
ii.
Because the item is ambiguous.
§ If more poor students than better students
get the item right, one would obtain
a negative discrimination. With a small number of students, this could be a chance result. But it may indicate that
the item is ambiguous or miskeyed.
§ Generally, the higher the discrimination index, the better the item distinguishes between high and low achievers.
NOTE:
§ An item discriminates positively if
more students of the upper group get it right than students
in the lower group
§ Positive discrimination indicates that the item is discriminating in the same direction as
the total tests
§ If all students in the upper group
got the item right and all students
in the lower group got it wrong, the item will have an index of 1.00, which is the maximum positive discriminating power
§ The opposite will give an index of −1.00, which is perfect negative discriminating power.
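A minimal sketch of steps v and vi, using the upper/lower-group counts tabulated above (15 of the upper group and 8 of the lower group chose the keyed answer, with 15 pupils per group):

```python
def difficulty_index(right_total: int, tested_total: int) -> float:
    """P = R / T x 100: percentage of examinees answering the item correctly."""
    return 100 * right_total / tested_total

def discrimination_index(right_upper: int, right_lower: int, group_size: int) -> float:
    """D = (RU - RL) / (T/2), where T/2 is the size of each group."""
    return (right_upper - right_lower) / group_size

print(difficulty_index(15 + 8, 30))      # 76.7 -> about 77%
print(discrimination_index(15, 8, 15))   # 0.466... -> about .47
```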
vii. Evaluate the effectiveness of the distracters in each item, i.e. the attractiveness of the incorrect alternatives and how they worked. That is, more poor students than good students should choose the incorrect answers.
§ This is done by inspecting the number of students in the upper and lower groups who selected the distracters being evaluated. For example, the results for item 1 of a test were as follows:
Alternative                 A    B    C    D
Students in upper group    10    8    0    1
Students in lower group     6    4    0    4
Ø
From the example, there were at least 80 students. The 25% forming the upper group were 20 students, and the same for the lower group
Ø
If A is taken as the correct answer to item 1, the following interpretations can be made:
•
Option A (the correct answer): Functions as intended by attracting more students from the upper group than from the lower group
•
Distracter B: Is a poor distracter because it attracted more students from the upper group than from the lower group
•
Distracter C: Is a completely ineffective distracter because it attracted no students
•
Distracter D: Is a good distracter; it functioned as intended by attracting more students from the lower group than from the upper group
NOTE: To improve the discriminating power of the item, one has to revise or replace alternatives B and C
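A minimal sketch of step vii, flagging distracters that attract no one or that attract more upper-group than lower-group examinees; the counts are those from the example table, with A as the keyed answer.

```python
upper = {"A": 10, "B": 8, "C": 0, "D": 1}    # choices of the upper 25%
lower = {"A": 6,  "B": 4, "C": 0, "D": 4}    # choices of the lower 25%
key = "A"

for option in upper:
    if option == key:
        continue                              # the key is not a distracter
    if upper[option] + lower[option] == 0:
        verdict = "ineffective (never chosen) - revise or replace"
    elif upper[option] > lower[option]:
        verdict = "poor (attracts more upper-group pupils) - revise or replace"
    else:
        verdict = "working as intended"
    print(option, "->", verdict)
```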
Procedures of analyzing items
in a Criterion-referenced test
§ Here we are interested in the extent
to which test items are measuring
the effects of the teaching and learning process
§ It
is determined by administering the same test twice
(i.e. before the instruction [pre-test] and after the instruction [post-test]).
§ The
results are compared at two levels, namely item comparison and checking the effectiveness of distracters.
Ø Level One: Item Comparison
•
Its aim is to gauge the effectiveness of each item by obtaining a measure of its sensitivity to instructional effects:
S = (RA − RB) / T
Where: S is the index of sensitivity to instructional effects
RA is the number of students who got the item right after instruction
RB is the number of students who got the item right before instruction
T is the total number of students who tried the item both times
•
Interpretation of the Results
is as follows
üIf an item yields a value between
0.00 and 1.00, it is classified as an effective item.
üThe closer the value is to 1.00, the more sensitive the item is to instructional effects
üItems with negative values are not
effective since they do not reflect the intended effect
of instruction
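A minimal sketch of the sensitivity index defined above; the pre- and post-instruction counts are invented for illustration.

```python
def sensitivity_index(right_after: int, right_before: int, total: int) -> float:
    """S = (RA - RB) / T for pupils who attempted the item on both occasions."""
    return (right_after - right_before) / total

# Hypothetical example: 28 of 30 pupils answer correctly after instruction, 10 before.
print(sensitivity_index(28, 10, 30))   # 0.6 -> the item is sensitive to instruction
```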
Ø Level Two: Checking the effectiveness of
distracters
•
Is
done by checking the frequency with which each distracter is selected by those
failing the item.
•
Distracters not selected at all or rarely selected
should be revised
or replaced
Effect of item analysis
§ We
get an objective basis for efficient class
discussion of the test results
§ We
get a basis for remedial work in areas that seem to be difficult for students
§ We get a basis for improvement of
classroom instruction by revisiting the curriculum on the parts
that were too difficult or too
easy
§ We get a basis for improving skills in test construction
FEEDBACK
§ Feedback is an objective description
of a student’s performance intended to guide future
performance.
§ Is the process of helping our
students assess their performance, identify areas where they are on target, and provide them with tips on what they can do in the future to improve in areas that need correcting
Informative/meaningful Feedback
§ Informative feedback calls for consideration of the following:
Ø
Areas of strength
Ø
Areas of weakness
Ø
Why
students went wrong in answering
the question (s)
Ø
Helping students to improve future performance
Importance of effective feedback
§
Shows where we are in relationship to the objectives and what we need to do to get there.
§
Helps students see the assignments and tasks we give them as opportunities to learn and grow rather than as assaults on their self- concept.
§
Allows us to tap into a powerful means of not only helping students learn, but helping them get better at learning.
Cont…
§
Helps students see how to improve the next time they engage in the task.
Feedback Focus
Characteristics of Effective Feedback
§ Timely
Ø “The more delay that
occurs in giving feedback, the less improvement there is in achievement.” (Marzano(1), p. 97)
Ø As often as possible, for all major assignments
§ Constructive/Corrective
Ø What students
are doing that is correct
Ø What students
are doing that is not correct
Ø
Feedback areas should relate to major
learning goals and essential elements
of the assignment
§
Specific to a Criterion
Ø Precise language
on what to do to improve
Ø
Reference where a student stands in relation to a specific
learning target/goal
Ø Specific to the
learning at hand
Ø Based on personal observations
§
Focused on the product/behavior – not on the student
§
Verified
Ø Did the student understand the feedback?
Ø Opportunities are provided to modify assignments, products, etc. based on the feedback
REPORTING TEST PERFORMANCE
§ Test
scores (marks and grades) must be reported
and interpreted meaningfully and correctly
§ Several ways exist for reporting test scores but each has
its strengths and limitations. The following are the commonly used methods:
a.
Reporting places/positions of individuals in each subject
e.g. John got 64 out of 100
b.
Reporting scores as
a percentage
Summary and Concluding Remarks
§ Assessing and evaluating students’ learning is a necessary step in any formalized educational system
§ There is always a need to develop
some valid and reliable
assessment of students’ progress, potential
and standing relative to the peer group
§ Assessment should be clear and understood by key players - students, teachers, parents, and even employers
§ The
purpose of exams must be considered with respect to two criteria:
Ø Validity: Any evaluation must measure what it is supposedly designed
to measure
Ø Reliability: the evaluation must be consistent when used with the same or similar students
Thanks for your
attention