MODULE THREE ----
Test Assembling, Test
administration and Test Analysis
Module Coverage
v
Assembling of classroom tests
v
Test administration
v
Recording and scoring the answers
v
Summarizing test results
v
Item analysis: Assessing the level of difficulty and discrimination
v
Giving Feedback
v
Reporting test performance
Intended Learning Outcomes
By the end of this module,
students should be able:
§
To
define correctly the concepts test assembling, test administration and marking
§
To outline
the advantages of preparing and assembling tests in
advance
§
To describe
procedures of appraising the classroom tests
§
To describe
procedures for giving
constructive feedback
Assembling Classroom Tests
v Presentation Outline
q
Introduction
q
Definition of the concept of test assembling
q
Process of assembling classroom tests
q
Test administration
q
Recording and scoring the answers
Intended Learning Outcomes
By the end of this lecture,
EP 300 students should be able to:
Ø
Appreciate the importance of paying careful attention to the assembly, reproduction, administration, and scoring aspects of classroom tests.
Ø
Follow the guidelines for assembling the various item formats into a
test.
Ø
Appreciate the importance of having clear,
concise directions for the student.
Ø
Follow the guidelines
for writing test directions.
Ø
Appreciate the importance of encouraging all students
to attempt all test items, even though they may be unsure of the correctness of their answers.
Ø
Follow
the guidelines for laying out and reproducing
the test.
Ø
Recognize the importance of physical and psychological
conditions in test taking.
Ø
Understand
why cheating must be discouraged and know how to
minimize it.
Introduction
§
The results
of classroom tests provide very important information that can be used to make serious decisions
that affect the lives of individual students, their future lives as well as the lives of their families.
§
Thus,
tests and examinations should be well planned
and administered if they are to collect
valid and reliable information that truly reflects the individual's ability.
§
It should be noted that objective tests, such as multiple-choice tests and some variants of true-false items, cannot be administered orally. Neither can the items be written on the blackboard a few minutes before the test or examination is scheduled to begin.
§
Thus the test must be prepared in advance and reproduced.
In this lecture, therefore, we will focus on test assembling, reproduction and administration.
Definition of the Concept of Test Assembling
v What
is test assembling?
Ø
It is the preparation of test items for use in a
test
v It involves:
Ø
Writing test items at least several days before they are to be used,
Ø
Grouping together
similar item formats in order to have clear and concise directions, and
deciding upon the manner in which the pupils
are to record their answers,
Ø
Constructing extra
test items
Effective Test assembling
§
Effective test assembling calls for consideration of two important
sets of factors:
a.
Factors about the individuals to be tested: These include
consideration of the following:
Ø
Why test your students
at that particular point in time?
Ø
Is it the most opportune time for you to collect the data?
Ø
Is it the time when students are ready to be tested?
Ø
Are you really ready to assess the learners at that particular time?
b.
Factors about what to be tested:
These include
Ø
What shall be assessed?
Ø
Why should it be assessed?
Ø
How should it best be assessed?
Note: To best address these issues, one needs to have a table of specification of educational objectives (also known as the blueprint, test specification, test blueprint, or test grid).
§
A table of specification of educational objectives is a two-way chart that relates
the instructional objectives/competences and content.
§
Importance of the Table of Specification:
Ø Helps to balance between what is
tested and what was taught
Ø Helps to determine the kinds of
learning outcomes/skills that will be tested/assessed
Ø Helps to determine the kind of content knowledge to be covered
This implies that:
Ø
The test should reflect
the content and objectives/competences in proportion to the importance given in instruction as reflected in the amount
of time spent for that content.
Ø
Priority subject
matter and/or objectives will be assessed using more items than less
important subject matter and/or objectives.
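To make the proportional allocation described above concrete, here is a minimal sketch in Python; the content areas, weights, and test length are invented for illustration, not taken from the module.

```python
# Hypothetical blueprint: allocate a 40-item test across content areas (rows)
# and cognitive levels (columns) in proportion to instructional emphasis.
content_weights = {"Weather": 0.40, "Map work": 0.35, "Population": 0.25}   # share of teaching time
level_weights = {"Knowledge": 0.30, "Comprehension": 0.40, "Application": 0.30}
total_items = 40

blueprint = {
    content: {level: round(total_items * c_w * l_w) for level, l_w in level_weights.items()}
    for content, c_w in content_weights.items()
}

for content, row in blueprint.items():
    print(content, row, "row total:", sum(row.values()))
```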
Building a table of specifications
§
Involves:
Ø Preparing a list of instructional objectives/competences
Ø Outlining the instructional content
Ø Preparing the two way chart by listing the major content
areas down the left side of the table
§
According to Bloom's cognitive taxonomy, there are six major categories of objectives, arranged in hierarchical order on the basis of the complexity of the task. Each of these six classes is subdivided further.
These classes comprise:
a.
Knowledge (the
simplest): is defined
as the remembering of previously learned material.
v Verbs used: define, distinguish, acquire, identify, recall,
recognize, etc.
b.
Comprehension: is defined as the ability
to understand the meaning of material.
v Verbs used: Translate, transform, give in words, illustrate,
prepare, read, represent, change, rephrase, restate, interpret, reorder, rearrange, differentiate, distinguish, make, draw, explain, demonstrate, estimate, infer, conclude,
predict, differentiate, determine, extend, interpolate, extrapolate, fill in, draw, etc.
c.
Application: is defined as the ability
to use learned material in new situations.
v Verbs used: Apply, generalize, relate, choose, develop, organize, use, employ, transfer,
restructure, classify, etc.
d.
Analysis: refers to the ability
to break material
down into specific
parts so that the over-all
organizational structure may be comprehended.
v Verbs used: Distinguish, detect, identify, classify, discriminate, recognize, categorize, deduce, analyze, contrast, compare, etc.
e.
Synthesis: is the ability
to put parts together to form a whole.
v Verbs used: Write, tell, relate, produce, constitute, transmit, originate, modify, document, propose, plan, design, specify, derive, develop, combine, organize, synthesize, classify, deduce, formulate, etc.
f.
Evaluation (the
most complex): Refers to the ability
to judge the worth of material for a given purpose.
v Verbs used: judge, appraise, evaluate, assess, compare, contrast, argue, justify, criticize, etc.
§
Each test item should be evaluated
against the ability and content areas, and this is the
only way of achieving validity.
i.e. each item needs to be recorded with enough space so that one can also record such information as:
üThe
instructional objective being addressed
üThe
learning outcome being measured
üThe content area being used to measure
it
A. THE PROCESS OF ASSEMBLING CLASSROOM TESTS
§
Test assembling is not a one-time event, but a continuous process involving item setting and item reviewing.
§
A good teacher prepares
test items as she/he teaches
in every lesson
§ All
possible test items in every
topic or sub- topic should be identified and assembled
in the course of teaching
Questions to Consider
During Test
Assembling
i.
How should the various
item formats be organized in the
test?
ii.
How should
the various items within a particular format
be organized?
iii.
How
should the test be reproduced?
iv.
Should pupils be
encouraged to answer all test
items, even those they are unsure of?
v.
What
kinds of directions should the student be given?
vi.
Should the students record
their answers directly
in the test booklet or
should a separate answer sheet be used for objective-type tests?
vii.
How should
the test items be analysed?
viii.
How should
the test results be interpreted?
Important Steps to be Followed.
§ For valid and reliable assessment of students’ achievements, the following steps need to be followed:
1.
Recording test items
§
When
constructing the test items, it is desirable
to write each one on a separate sheet
of paper (index card). This allows modification
and improvement of the item.
§
The card should also contain:
ü Instructional objectives
ü Specific learning outcome
ü Content measured by the item
ü A
space for item analysis
An example of a test item card with item analysis data recorded at the back
SUBJECT: Geography TOPIC: Weather
OBJECTIVE: Identifies the use of weather measuring
instruments
ITEM
Which of the following instruments is used for measuring atmospheric pressure?
A. Anemometer
B. Barometer
C. Thermometer
D. Hygrometer
Back of the item card
ITEM ANALYSIS DATA

                        Alternatives
Dates      Pupils      A    B    C    D    E   Omits   Diff.   Disc.
1/4/2013   Upper 10    0   10    0    0    -     0      70%     0.6
           Lower 10    2    4    1    3    -     0
           Upper 10
           Lower 10
           Upper 10
           Lower 10
Comments:
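Item cards of this kind can also be kept electronically. The sketch below shows only one possible record layout (the field names are assumptions, not a prescribed format):

```python
from dataclasses import dataclass, field

@dataclass
class ItemCard:
    # Front of the card
    subject: str
    topic: str
    objective: str
    stem: str
    options: dict            # e.g. {"A": "Anemometer", "B": "Barometer", ...}
    key: str                 # letter of the correct answer
    # Back of the card: item-analysis results appended after each administration
    analyses: list = field(default_factory=list)

card = ItemCard(
    subject="Geography",
    topic="Weather",
    objective="Identifies the use of weather measuring instruments",
    stem="Which of the following instruments is used for measuring atmospheric pressure?",
    options={"A": "Anemometer", "B": "Barometer", "C": "Thermometer", "D": "Hygrometer"},
    key="B",
)
card.analyses.append({"date": "1/4/2013", "difficulty": "70%", "discrimination": 0.6})
```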
ii. Arranging the Test Items
§
Test items must be arranged as clearly and conveniently as possible.
§
One
way to achieve this is to group all items of the same format together rather than to intersperse them throughout the test.
§
Grouping
test items is advantageous for a variety of
reasons:
a.
Young
children may not realize that the first set of
directions is applicable to all items
of a particular format and may become confused
b.
It makes it easier
for the examinee to maintain
a particular mental set instead of having to change from one to another
c.
It
makes it easier for the teacher to score the test, especially if hand scoring is done.
§
There are various methods
of arranging or grouping
items. The method will vary according to the use of the results of the test items.
§
For most of the classroom purposes, the items can be arranged
by a systematic consideration of:
Ø
the types of items used
Ø
the learning
outcomes measured
Ø
the difficulty of the items
Ø
the subject
matter measured
§
The various
item formats should
be presented in such a way that the
complexity of mental activity required by the
student will progress from the simple to the complex. For example,
simple recall measured by the completion item
should precede the interpretive exercise.
§
Many experts
in the field of assessment recommend the following scheme to be used:
i.
True-false or alternative response
items
ii.
Matching items
iii.
Short answer items
iv.
Multiple choice
items
v.
Interpretive
exercises
vi.
Essay questions
§
The
test items should be arranged so that they are
easily read by the examinees i.e. the reproduction should be legible
and the items not be crowded together.
§
Test
items should also be arranged in such a way that the correct answers follow
a random pattern.
iii.
Writing test directions
§
Test
directions/instructions are rubrics that students are supposed to follow in answering the questions
§
Very often many teachers include no written directions, assuming that the test items are self-explanatory.
The directions provided should be clear and concise and should tell the students what they are to do, how they are to do it, and where they are to record their answers.
§
Whether written, oral, or both, the directions
should include the following:
i.
The purpose
of the test
ii.
The time to
be allotted to the various
sections,
iii.
The value
of the items
iv.
Basis
for answering each item e.g. select the correct answer,
select the best answer etc.
v.
Procedure for recording
answers
vi.
Whether or not students should guess at answers they are unsure of, i.e. what to do about guessing. For example, if guessing is to be penalized, students should be told, together with the method that will be used to penalize guessing.
iv.
Reviewing test items
§
No matter how carefully test items have been prepared, defects may always creep in during
construction. Such defects
can be most easily detected by:
Ø Reviewing the items a few days after they have been constructed.
Ø Asking a fellow teacher
to review and comment on them
§
In
reviewing test items, we should try to view the items from pupil’s
viewpoint as well as from that of the test maker.
§
Items review is important in that it:
Ø Helps to make items appropriate to
learners’ outcomes that are intended
to be measured
Ø Makes the items clear and free from ambiguity
Ø Rationalizes the level of difficulty
Ø Helps to check if the answers to the items
would be agreed upon by experts
Ø Helps to identify technical
errors and clues to answers
and correct them
v.
Reproducing the test
§
Careful attention to the reproduction phase will not only
make it easier for the examinee, but may also
make hand scoring much easier.
§
In preparing the test materials for reproduction, it is important that the items be spaced and
arranged so that they can be read,
answered, and scored with the least amount of difficulty
§
It is desirable
to proofread the entire test before it is administered. Charts, graphs, and other pictorial
material must be checked to ensure that the reproduction has been accurate and details clear.
§
To assist
both examinee and examiner, the following practices are recommended:
i.
Space the items so that they are not crowded
ii.
For the alternate response
test, have a column of T's
and F's at either the right
or left hand side of the items.
iii.
For matching
exercises, have the two lists on the same page.
iv.
For the multiple choice item that uses a key list,
try to keep all items using a
particular key on the same page. If
this is not possible, the key should be repeated on the new page.
v.
For the interpretive exercise, the introductory material, be it a graph, chart, diagram, or piece of prose, and the items based on it should be on the same page.
vi.
All items should be numbered consecutively.
vii.
For the short-answer items (1 to 2 words), the blanks
should be numbered and the responses recorded in blanks (vertically arranged and
numbered to correspond to the number of the blank) on one side of the answer sheet used. For
example:
The product of 10 and 7 is (i) times as large as the sum of 8 and 6.
viii.
If
the responses are recorded directly on the test booklet, it will make scoring
easier if all responses to objective items are recorded on one
side (left or right) of the page,
regardless of the item format used.
ix.
In
the elementary grades, if work space is needed
to solve numerical problems, provide this space in the test booklet rather than having
examinees use scratch paper.
x.
All illustrative material
used should be clear, legible,
and accurate.
xi.
Proofread the test carefully before it is reproduced.
xii.
Every pupil should have a copy of the test.
B. ADMINISTERING CLASSROOM TESTS
§
In
test administration we are concerned with providing optimum
conditions for obtaining
the pupils’ responses.
§
The
guiding principle in administering tests is that all pupils must be given a fair chance to demonstrate their achievement of the
learning outcomes being measured.
§
This can be done by doing the following:
Ø
Providing
conducive physical conditions such as
adequate work place, quiet, proper light and
ventilation, and comfortable temperature
Ø
Providing appropriate psychological climate such as tension-free environment.
•
Psychological
conditions such as tension and anxiety
that have significant influence on test results must be avoided.
•
Some of the things
that cause test anxiety are:
üThreatening pupils with test if
they do not behave.
ü
Warning pupils
to do their best “because
this test is important”.
ü Telling pupils they must work fast in order
to complete the test on time.
ü
Threatening dire consequences if they fail the test.
§
The following
things should be avoided DURING the administration of a classroom test:
Ø Do
not talk unnecessarily before the test.
Ø Keep interruptions to a minimum during the test
Ø Avoid giving hints to pupils who ask about
individual question items because the response provided
can give clues to the answers.
Ø Discourage cheating and, if necessary, take action against it.
§
The actual administration of the test is relatively simple, because a properly prepared
classroom test is self-administering
Some
additional considerations
§
When administering the test, the teacher should make
sure that the students understand the directions and that answer sheets, if they are being used with the younger pupils,
are being used correctly.
§
The
teacher should keep the students informed of
time remaining (e.g. writing the time left on the blackboard at 15-minute intervals).
§
Careful
proctoring should take place so that cheating
is eliminated, discouraged, and/or detected.
C. RECORDING AND SCORING THE
ANSWERS
§
Understanding
how examinees will record and score the
answers to objective tests is very important if teachers are to obtain valid and reliable data.
§
Whether
pupils will record their answers directly on
the test papers or use separate sheets depends upon:
Ø
The item format used,
Ø
The age,
Ø
The ability level of the
pupils, and
Ø
The nature
of the content.
§ Generally speaking, separate answer sheets will provide
more accurate and reliable scores, especially
if they are machine-scored.
§ However, for tests involving
computation, it might be better to
have pupils record their answers directly in
the test booklet, rather than use a separate answer sheet.
§ Thus, there are two methods by which
pupils can record their answers:
Ø
In the booklets themselves
Ø
On separate
answer sheets
§ The manner in which the answers are
to be recorded will be governed (in part at least) by:
i.
Availability of special scoring
equipment,
ii.
The speed with which test results are needed,
iii.
The monetary
resources available to have the answer sheets
scored by an independent scoring
service.
Marking/Scoring
§ Marking is an important part of teaching
Ø
A marking scheme is carefully designed beforehand; this is possible for standardized and objective-item questions
Ø
A marking scheme should be used just as a guideline for
the expected answer. It should include major
points, characteristics of the answer and the amount of credit to be allocated
to each point
Types of marking
§
Impressionistic marking: feeling of worth
Ø
common for essay
Ø
Some
implicit criteria may be developed for impressionistic marking
Ø
Involves
subjectivity: marking results tend to vary
from one marker to another (very little consistency)
§
Clinical analysis of questions and answers:
Common for mathematics and science marking
§
A combination of impression and marks allocation for specified responses
üRequires systematic analysis of
questions and answers
üTakes time but it helps to judge the sources of difficulties.
Some
Considerations in Marking
§
What key statements should receive marks?
§
How many marks should
be allocated to each point?
§
Are all the errors the candidate’s fault, and should marks therefore be automatically deducted?
§
What is the
use of the response pattern? Diagnostic, formative or summative?
Test
Scoring
§ There are essentially two types
of scoring processes:
a.
Hand scoring:
Ø
It
is done either in the booklets themselves or
on separate answer sheets
Ø
If
the pupils’ answers are recorded on the test
paper itself, a scoring key can be made by marking
the correct answers on a blank copy of the test.
Ø
When
separate answer sheets are used, a scoring
stencil is used.
ü A
scoring stencil is a blank answer sheet with
holes punched where the correct answer should appear.
ü
The stencil
is laid over each sheet, and the number of checks appearing
through the holes are counted
§ In
scoring the completion-type item, the teacher
may:
Ø
Prepare
a scoring key by writing out the answers on
a test paper or may
Ø
Make
a separate strip key that corresponds to the
column of blanks provided
to the students.
•
In
either of the two methods, the teacher or aide can place the scoring key next to the pupils' responses and score the papers
rather quickly.
Ø
Use the silver-overlay answer
sheet.
•
This
is a self-scoring procedure in which the correct answers are previously
placed in the appropriate
squares, and the total answer sheet is
covered by a silver overlay that conceals the
correct answers.
• Students erase the square they feel corresponds to the correct
answers.
• This procedure is used quite
effectively in classroom testing because it provides immediate
feedback.
§ Objective items could encourage
guessing.
§ Correction of guessing
is done by using the following formula:
S = R − W / (n − 1)
Where:
Ø
S = the corrected score,
Ø
R = the number of items answered correctly,
Ø
W = the number of items answered wrongly, and
Ø
n = the number of alternatives per item.
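A minimal computational sketch of the correction-for-guessing formula above (function and variable names are mine):

```python
def corrected_score(right: int, wrong: int, n_alternatives: int) -> float:
    """S = R - W / (n - 1); omitted items are counted neither right nor wrong."""
    return right - wrong / (n_alternatives - 1)

# e.g. 30 right and 8 wrong on a four-option multiple-choice test:
print(corrected_score(30, 8, 4))   # 30 - 8/3 = 27.33...
```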
b.
Machine Scoring
§
Is
the simplest way to score select-type objective test items
§ Here students record their answers on
separate answer sheets and the machine is used to score them.
D: SUMMARIZING
AND INTERPRETING TEST RESULTS
§
The process of marking/scoring is usually
followed by summarization and interpretation of test results
§
Effective summarization of the results
depends on various processes
such as scoring
and interpretation of scores.
Interpreting Test Scores
§ Interpretation of educational measurements refers to determining the meaning of test scores.
§ Interpreting educational measurements
is not as simple as it is in physical measurements such
as length, height, weight etc.
§ Physical measurements are based on scales that have a true zero point and equal units, while educational measurements do not have a true zero,
e.g. a zero score in PSY 2102 does not mean that a student knows completely nothing.
§ We can safely say Marx is two times taller than Alex, because for height we can ascertain the zero point.
§ But we cannot say Marx is two times more intelligent than Alex simply because Marx scored 80 marks and Alex scored 40 marks.
q Thus, we can only say Marx is more intelligent than Alex.
Summarizing and interpreting
educational measurement
§ Effective interpretation of test
scores calls for a clear understanding
of the nature of the information that can be inferred
from numbers.
§ Thus, to effectively and correctly interpret
test scores one needs to have
basic but clear knowledge about the following:
v
Different kinds of scores, such as nominal,
ordinal, interval, and ratio scales.
a.
Nominal
scale
Ø
Is the simplest scale of measurement.
Ø
Involves
assigning numerals to different categories that are qualitatively different.
ü For example, for purposes of storing
data on computer cards, we might use the symbol 0 to represent
a female and the symbol 1 to represent a male
Ø
These symbols
(or numerals) do not have any of the
three characteristics (order, distance, or origin) of the real number series.
b.
Ordinal Scales
Ø
An
ordinal scale has the order property of a real number
series and gives an indication of rank order.
Ø It
indicates the magnitude, though only in a very
gross fashion. For example, rankings in a music contest or in an athletic event denote who is best, second best, third best, and so on.
Ø However, the ranks provide
no information with regard
to the differences between the scores.
Ø Ranking is sufficient if our decision involves selecting the top pupils for some task, but insufficient if we wish to obtain any idea of the magnitude of differences or to perform certain kinds of statistical manipulations.
C. Interval
Scale:
Ø Here
we can interpret the distances
between scores. For example,
if Mabula has a score of 60, Mary a
score of 50, and Hamis a score of 30, we could say that the distance between Mary’s and Hamis’s scores (50 to 30) is twice the distance
between Mabula’s and Mary’s scores (60 to 50).
Ø This
additional information has potentially greater use than just knowing the rank order of the three students.
d. Ratio
Scales
Ø
If one measures with a
ratio scale, the ratio of the
scores has meaning.
For example, a person who is 86" is twice as tall as a person who
is 43".
Ø
Here a measurement of 0 actually indicates no height (i.e. there is a meaningful zero point).
v Methods of tabulating and graphing data
Ø
Frequency distribution
Ø
Histograms
Ø
Frequency polygons
v Various kinds of distributions
Ø
Normal distributions,
Ø
Positively skewed distributions,
Ø
Negatively skewed distributions, and
Ø
Rectangular distributions.
v Basic concepts of descriptive statistics, such as:
i.
Measures of central
tendency,
Ø
Mean
Ø
Mode
Ø
Median
ii.
Measures of variability
Ø
Standard deviation
Ø
Variance
iii.
Measures of correlations
Ø
Pearson product
moment correlation coefficient
(r) is the statistic most often used to give us an indication
of this relationship
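The descriptive statistics listed above can be computed directly with Python's standard library; the scores below are invented purely for illustration.

```python
import statistics

scores = [45, 52, 52, 60, 63, 70, 74, 81, 88, 95]   # hypothetical marks on test 1
other  = [40, 50, 55, 58, 66, 68, 75, 78, 85, 92]   # marks of the same pupils on test 2

print("mean     :", statistics.mean(scores))
print("median   :", statistics.median(scores))
print("mode     :", statistics.mode(scores))
print("std dev  :", statistics.stdev(scores))        # sample standard deviation
print("variance :", statistics.variance(scores))
print("Pearson r:", statistics.correlation(scores, other))   # requires Python 3.10+
```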
§ Several methods of reporting measurements have been proposed. These are broadly categorized into raw scores and derived scores.
Interpreting Raw scores
Ø
What is a
raw score?
Ø
It is simply the number of points received
by a student on a test, when the test has been scored according to the marking scheme.
Ø
It is a numerical
summary of students’
set of performance
Ø
Raw
score has no educational meaning on its own (i.e. it is not very meaningful without further information).
For example, John answered
25 items correctly
on an arithmetic test and his raw
score is 25.
Then:
ü
What does a 25 mean?
ü
Is
that a good score?
ü
How many items were in
the test?
ü
What kind of arithmetic problems were presented?
ü
How difficult was the test?
ü
How
does a score of 25 compare with the scores received
by other pupils in John’s class?
Ø
In
general, we can provide meaning to a raw score
by either:
i.
Describing it in relation to the specific tasks that the pupil can perform. This is known as criterion-referenced interpretation.
ii. By using a type of derived score that indicates the pupil’s relative position in a clearly defined reference group (norm-referenced interpretation).
Criterion-referenced interpretation
§ Here
we interpret pupil’s
performance in terms of certain
criteria: e.g.
Ø Speed with which a
task is performed (e.g. types 40 words
per minute without error)
Ø Precision with which a task is performed
(e.g. measures the length
of a line within one fourth of a millimeter).
Ø The percentage of items correct on some clearly
defined set of learning tasks (e.g. identifies 80% of the terms used to describe
elements of weather)
Ø
The percentage-correct score
is widely used
in criterion-referenced test interpretation.
Ø
Permits us to describe
an individual’s test
performance without referring to performance
of others.
Norm-Referenced Interpretation
§
This is the most common test interpretation
§
Tells
us how an individual compares with other persons
who have taken the same test.
§
To obtain a more general framework for norm-referenced interpretation, raw scores are converted into some type of derived score.
Derived Score
What
is a derived score?
§
Is
a numerical report of test performance on a score scale that has a well-defined characteristic and yields normative meaning.
§
The most common types of derived
scores are:
Ø
Percentile ranks
Ø
Standard scores
§
These are interpreted in relation to established norms
§
Both indicate
the individual student’s relative position within a particular group
§
Three forms of comparison can be made when raw scores are transformed into
comparable scores as follows:
Ø
One
student can be compared to other students who sat for the same test
Ø
Student’s performance in one test can be compared to performance in another test
Ø
Student’s performance in one form of a test can be
compared to another
form of a test
§
These
comparisons give us the power to predict students’
success in various
areas and enable us to diagnose
students’ strengths and weaknesses.
a.
Percentile rank (percentile score)
§ A percentile rank (or percentile score) indicates a person’s relative position (rank) in a group in terms of the percentage of individuals scoring at or below that person’s score.
§ For example, if John’s score on a history test gives him a rank of 10 in a class of 40, we can say that:
Ø
¾ of the class made a lower score
than John
Ø
75 is his percentile rank
Ø
His score is
at 75th percentile
Ø
25% of the pupils are above his
score
§
By
using percentile ranks, the position of individuals in groups of unequal size can be compared.
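A minimal sketch of a percentile-rank computation. Conventions differ; this version counts scores at or below the pupil's score, following the definition given above, and the class scores are invented.

```python
def percentile_rank(score: float, group_scores: list) -> float:
    """Percentage of scores in the group that are at or below the given score."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100 * at_or_below / len(group_scores)

class_scores = [35, 40, 42, 48, 50, 55, 58, 60, 62, 65,
                70, 72, 75, 78, 80, 82, 85, 88, 90, 95]
print(percentile_rank(78, class_scores))   # 70.0 -> the pupil is at the 70th percentile
```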
Standard score
§
What
is a
standard score?
Ø
Is
a score that indicates a pupil’s relative position in a group by showing
how far the raw
score is above or below the average.
Ø Is described in terms of standard
deviation units from the mean of the distribution
Ø
The mean (M) is the arithmetic average which is determined by adding all the scores
and dividing by the number of scores.
Ø The
standard deviation (SD) is the measure of spread of scores in a group.
Types of standard scores
§
The
most common types of standard scores are:
Ø Z-Score
Ø T-score
z- Score
§
The simplest
standard score
§
Expresses test performance simply
and directly as the
number of standard deviation units a raw score is above or below the mean.
§
Formula:
§
z-score =
M SD
Example:
Find z-score
for the raw scores
of 58 and 50 if M
=56 and SD = 4
– If
X =58 then z = =
0.5
– If
X =50 then z =
5856
4
= -1.5
– A
z-score is always
negative when the raw
score is
smaller than the mean
5056
4
T-score
•
Any set of normally
distributed standard scores
that has a mean of 50 and standard deviation of 10
•
T-score (linear
conversion) can be obtained by multiplying the z-score by 10 and adding 50 to the product.
•
i.e. T-score
= 50 + 10z
•
e.g. if z=0.5 then T = 50
+ 10(0.5)=55
if z =-1.5 then T =50 + 10 (-1.5) =
35
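A minimal sketch of the z-score and T-score conversions worked above:

```python
def z_score(x: float, mean: float, sd: float) -> float:
    return (x - mean) / sd

def t_score(z: float) -> float:
    return 50 + 10 * z

for x in (58, 50):                       # the example above: M = 56, SD = 4
    z = z_score(x, mean=56, sd=4)
    print(x, "-> z =", z, ", T =", t_score(z))   # 58 -> z = 0.5, T = 55;  50 -> z = -1.5, T = 35
```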
Test
Norms
§ Test
norms merely represent
the typical performance of pupils in the reference groups on which the test was standardized; they should not be viewed as desired goals or standards.
§ How
to develop test norms
Ø The test developer decides on the population the norm sample is supposed to represent.
Ø Identifies an ideal sample to represent that population, using the most recent census data.
Ø Seeks the cooperation of school administrators.
Ø Tests the students in the sample.
Types of Test Norms
§ The
most common types are
Ø
Grade
norms: grade group in which pupil’s raw score is average
Ø
Percentile norm: percentage of pupils in the reference
group who fall below the pupil’s raw score
Ø
Standard score norm: Distance
of pupil’s raw score
above or below the mean of the reference group in terms
of standard deviation
units.
Difficulties in developing test norms:
§ Need scores from a large and representative sample of examinees (not easy)
§ Testing must always be done under standard conditions (with reference to Tanzania, this is not always possible)
§ Students must be willing to take the time involved in the norming process.
TEST (ITEM) ANALYSIS
§
This is a process of appraising classroom tests in order to determine the effectiveness of each test item
§
It
is the process of examining the students‘ responses
to each test item, to judge the quality of the item.
§
It is performed after a test has been done and scored
§
It is done by analyzing students’ responses to each item
§ Our
interest in item analysis is to establish
whether each item in
the test functioned as it was intended
§ Specifically, what one looks
for is the difficulty and discriminating
ability of the item as well as the effectiveness of each alternative.
§
We need to understand whether:
Ø
Each
item was able to discriminate between the best and
weak students in terms of achievement
Ø
Each item was able to measure
the effect of teaching
and learning process
Ø
The items were
of appropriate difficulty
Ø
The items were free from clues
Ø
Each alternative in the multiple choice questions was an effective
distracter
Types of Item Analysis
1.
Analysis for Norm-Referenced Test
§ Norm-referenced tests are
used to discriminate between
low and high achievers whereas criterion-referenced tests measure the effect of instruction
§
The
procedures for analyzing items in norm- referenced tests differ from those of analyzing items
in criterion-referenced tests
Procedures of analyzing items in a norm-referenced test
i.
Rank/arrange the test papers from the highest to the lowest score
ii.
From the
ordered set of papers make two groups: Put those with the highest
scores in one group (the top half) and those with the
lowest scores in the other group (the
bottom half). i.e. select about 25
percent of the papers from the top
and 25 percent from the bottom
iii.
Put aside
the middle papers as
they will not be used in
the analysis
iv.
For
each test item, tabulate the number of students in the upper and lower groups who selected each alternative (in the completion or
true-false item, it would be the number who answered
the item correctly)
i.e. Record the count as follows for each item (assume a total of 30 papers, 15 in each group, for this example), in which the asterisk indicates
the correct answer.
Alternatives                A    B    C*   D    E   Omits
Students in upper group     0    0   15    0    0     0
Students in lower group     4    2    8    1    0     0
v.
For each item, compute the percentage of students who got the item correct. This is called the item difficulty index (P), which can range from 0% to 100% (or from .00 to 1.00 when expressed as a proportion). The formula for item difficulty is:
Item Difficulty = P = R/T × 100
§
This means that the difficulty index is expressed in percentage
§
In the example in step v, R = 23 (this is the total number of students who answered C, the correct answer) and T = 30 (the number of students tested). Applying the formula,
P = 23/30 × 100 ≈ 77%
vi.
Compute the item discrimination index for each item by subtracting the number of students in the lower group who answered the item
correctly from the number in the upper group
who got the item right and dividing by the number of students
in either group (e.g., half the
total number of students when we divide the group into upper and lower halves).
§
In our example,
Discrimination = D = (RU − RL) / (½ × T) = (15 − 8) / 15 ≈ 0.47
Where D is the item discriminating power
RL is the number of students in the lower group who got the item right
RU is
the number of students in the
upper group who got the item right
T is
the number of students included in the analysis
§ This value is usually expressed as a decimal and can range from −1.00 to +1.00. If it has a positive value, the item has positive discrimination. This means that a larger proportion of the more knowledgeable students than of the poor students (as determined by the total test score) got the item right.
§ If
the value is zero, the item has zero discrimination. This can occur:
i.
Because the item is too easy or too hard (if everybody got the item right or everybody missed the item, there would be zero discrimination), or
ii.
Because the item is ambiguous.
§ If more poor students than better students
get the item right, one would obtain
a negative discrimination. With a small number of students, this could be a chance result. But it may indicate that
the item is ambiguous or miskeyed.
§ Generally, the higher the discrimination index, the better the item distinguishes between high and low achievers.
NOTE:
§ An item discriminates positively if
more students of the upper group get it right than students
in the lower group
§ Positive discrimination indicates that the item is discriminating in the same direction as
the total tests
§ If all students in the upper group
got the item right and all students
in the lower group got it wrong, the item will have an index of 1.00, which is the maximum positive discriminating power
§ The opposite will give an index of −1.00, which is perfect negative discriminating power.
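A minimal sketch of steps v and vi, using the upper/lower-group counts tabulated above (15 of the upper group and 8 of the lower group chose the keyed answer, with 15 pupils per group):

```python
def difficulty_index(right_total: int, tested_total: int) -> float:
    """P = R / T x 100: percentage of examinees answering the item correctly."""
    return 100 * right_total / tested_total

def discrimination_index(right_upper: int, right_lower: int, group_size: int) -> float:
    """D = (RU - RL) / (T/2), where T/2 is the size of each group."""
    return (right_upper - right_lower) / group_size

print(difficulty_index(15 + 8, 30))      # 76.7 -> about 77%
print(discrimination_index(15, 8, 15))   # 0.466... -> about .47
```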
vii. Evaluate the effectiveness of the distracters in each item, i.e. the attractiveness of the incorrect alternatives and how they worked. That is, more poor students than good students should choose the incorrect answers.
§ This is done by inspecting the number of students in the upper and lower groups who selected the distracters being evaluated. For example, the results for item 1 of a test were as follows:
Alternative                 A    B    C    D
Students in upper group    10    8    0    1
Students in lower group     6    4    0    4
Ø
From the example, there were at least 80 students. The 25% forming the upper group were 20 students, and the same for the lower group
Ø
If A is taken as the correct answer to item 1, the following interpretations can be made:
•
Option A (the correct answer): Functions as intended by attracting more students from the upper group than from the lower group
•
Distracter B: Is a poor distracter because it attracted more students from the upper group than from the lower group
•
Distracter C: Is a completely ineffective distracter because it attracted no students
•
Distracter D: Is a good distracter; it functioned as intended by attracting more students from the lower group than from the upper group
NOTE: To improve the discriminating power of the item, one has to revise or replace alternatives B and C
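A minimal sketch of step vii, flagging distracters that attract no one or that attract more upper-group than lower-group examinees; the counts are those from the example table, with A as the keyed answer.

```python
upper = {"A": 10, "B": 8, "C": 0, "D": 1}    # choices of the upper 25%
lower = {"A": 6,  "B": 4, "C": 0, "D": 4}    # choices of the lower 25%
key = "A"

for option in upper:
    if option == key:
        continue                              # the key is not a distracter
    if upper[option] + lower[option] == 0:
        verdict = "ineffective (never chosen) - revise or replace"
    elif upper[option] > lower[option]:
        verdict = "poor (attracts more upper-group pupils) - revise or replace"
    else:
        verdict = "working as intended"
    print(option, "->", verdict)
```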
Procedures of analyzing items
in a Criterion-referenced test
§ Here we are interested in the extent
to which test items are measuring
the effects of the teaching and learning process
§ It
is determined by administering the same test twice
(i.e. before the instruction [pre-test] and after the instruction [post-test]).
§ The
results are compared at two levels, namely item comparison and checking the effectiveness of distracters.
Ø Level One: Item Comparison
•
Its aim is to gauge the effectiveness of each item by obtaining a measure of its sensitivity to instructional effects:
S = (RA − RB) / T
Where: S is the index of sensitivity to instructional effects
RA is the number of students who got the item right after instruction
RB is the number of students who got the item right before instruction
T is the total number of students who tried the item both times
•
Interpretation of the Results
is as follows
üIf an item yields a value between
0.00 and 1.00, it is classified as an effective item.
üThe closer the value is to 1.00, the more sensitive the item is to instructional effects
üItems with negative values are not
effective since they do not reflect the intended effect
of instruction
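A minimal sketch of the sensitivity index defined above; the pre- and post-instruction counts are invented for illustration.

```python
def sensitivity_index(right_after: int, right_before: int, total: int) -> float:
    """S = (RA - RB) / T for pupils who attempted the item on both occasions."""
    return (right_after - right_before) / total

# Hypothetical example: 28 of 30 pupils answer correctly after instruction, 10 before.
print(sensitivity_index(28, 10, 30))   # 0.6 -> the item is sensitive to instruction
```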
Ø Level Two: Checking the effectiveness of
distracters
•
Is
done by checking the frequency with which each distracter is selected by those
failing the item.
•
Distracters not selected at all or rarely selected
should be revised
or replaced
Effect of item analysis
§ We
get an objective basis for efficient class
discussion of the test results
§ We
get a basis for remedial work in areas that seem to be difficult for students
§ We get a basis for improvement of
classroom instruction by revisiting the curriculum on the parts
that were too difficult or too
easy
§ We get a basis for improving skills in test construction
FEEDBACK
§ Feedback is an objective description
of a student’s performance intended to guide future
performance.
§ Is the process of helping our
students assess their performance, identify areas where they are on target, and provide them with tips on what they can do in the future to improve in areas that need correcting
Informative/meaningful Feedback
§ Informative feedback calls for consideration of the following:
Ø
Areas of strength
Ø
Areas of weakness
Ø
Why
students went wrong in answering
the question (s)
Ø
Helping students to improve future performance
Importance of effective feedback
§
Shows where we are in relationship to the objectives and what we need to do to get there.
§
Helps students see the assignments and tasks we give them as opportunities to learn and grow rather than as assaults on their self- concept.
§
Allows us to tap into a powerful means of not only helping students learn, but helping them get better at learning.
Cont…
§
Helps students see how to improve the next time they engage in the task.
Feedback Focus
Characteristics of Effective Feedback
§ Timely
Ø “The more delay that
occurs in giving feedback, the less improvement there is in achievement.” (Marzano(1), p. 97)
Ø As often as possible, for all major assignments
§ Constructive/Corrective
Ø What students
are doing that is correct
Ø What students
are doing that is not correct
Ø
Feedback areas should relate to major
learning goals and essential elements
of the assignment
§
Specific to a Criterion
Ø Precise language
on what to do to improve
Ø
Reference where a student stands in relation to a specific
learning target/goal
Ø Specific to the
learning at hand
Ø Based on personal observations
§
Focused on the product/behavior – not on the student
§
Verified
Ø Did the student understand the feedback?
Ø Opportunities are provided to modify assignments, products, etc. based on the feedback
REPORTING TEST PERFORMANCE
§ Test
scores (marks and grades) must be reported
and interpreted meaningfully and correctly
§ Several ways exist for reporting test scores but each has
its strengths and limitations. The following are the commonly used methods:
a.
Reporting places/positions of individuals in each subject
e.g. John got 64 out of 100
b.
Reporting scores as
a percentage
Summary and Concluding Remarks
§ Assessing and evaluating students’ learning is a necessary step in any formalized educational system
§ There is always a need to develop
some valid and reliable
assessment of students’ progress, potential
and standing relative to the peer group
§ Assessment should be clear and understood by key players - students, teachers, parents, and even employers
§ The
purpose of exams must be considered with respect to two criteria:
Ø Validity: Any evaluation must measure what it is supposedly designed
to measure
Ø Reliability: the evaluation must be consistent when used with the same or similar students
Thanks for your
attention