Assessment

Assessment makes teaching into teaching. Mere presentation—without assessment of what the learners have made of what you have offered them—is not teaching. So assessment is not a discrete process, but integral to every stage of teaching, from minute to minute as much as module to module.

And informal assessment (or evaluation) is going on all the time. Every time a student answers a question, or asks one, or starts looking out of the window, or cracks a joke, he is providing you with feedback about whether learning is taking place. It's more an evaluation of the teaching session than of his learning, but the two are inextricable.

Assessment “reaches back” into the rest of teaching: in particular, poorly designed formal assessment regimes can severely hinder student learning and distort the process and subject matter.

All assessment is ultimately subjective: there is no such thing as an “objective test”. Even when there is a high degree of standardisation, the judgement of what things are tested and what constitutes a criterion of satisfactory performance is in the hands of the assessor.

However, we can still make every effort to ensure that assessment is valid, reliable and fair.

Validity

A valid form of assessment is one which measures what it is supposed to measure.

It does not assess memory, when it is supposed to be assessing problem-solving (and vice versa).
It does not grade someone on the quality of their writing, when writing skills are not relevant to the topic being assessed, but it does when they are.
It does seek to cover as much of the assessable material as practicable, not relying on inference from a small and arbitrary sample (and here it spills over into reliability).

Unfortunately, no assessment is completely valid, and assessment creep or drift is endemic.

Reliability

Or "replicability". A reliable assessment will produce the same results on re-test, and will produce similar results with a similar cohort of students, so it is consistent in its methods and criteria.
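One common way to put a number on "the same results on re-test" is to correlate the marks from two sittings of the same assessment. The sketch below is only an illustration: the function name `pearson_r` and the two lists of marks are invented for the example, and a high correlation shows only that candidates keep roughly the same relative positions, not that the assessment is valid.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of marks."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two lists of the same length (at least 2 marks)")
    mean_first = sum(first) / len(first)
    mean_second = sum(second) / len(second)
    # Sum of products of deviations, and the two sums of squared deviations.
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    if spread_first == 0 or spread_second == 0:
        raise ValueError("correlation is undefined when all marks are identical")
    return covariance / sqrt(spread_first * spread_second)


# Hypothetical marks for the same five students on a test and a re-test.
first_sitting = [52, 61, 70, 45, 88]
retest = [55, 60, 72, 47, 85]

reliability = pearson_r(first_sitting, retest)
print(f"test-retest correlation: {reliability:.2f}")
```

A value close to 1 suggests the assessment ranks the same students in much the same way each time; a value near 0 suggests the results owe a good deal to chance.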

Fairness

This is really an aspect of validity, but important enough to note in its own right. Fairness ensures that everyone has an equal chance of getting a good assessment. This may include (where appropriate) anonymity of submitted material, so that extraneous considerations (such as the quality of contributions in seminars, if they are not part of the assessment scheme) cannot influence the final result.

Purposes of Assessment

The traditional distinction is between summative and formative assessment.

Summative assessment is what students tend to focus on. It is the assessment, usually on completion of a course or module, which says whether or not you have "passed". It is—or should be—undertaken with reference to all the objectives or outcomes of the course, and is usually fairly formal.

Security (ensuring that the student who gets the credit is the person who did the work) assumes considerable importance in summative assessment. This may push assessors towards conservative approaches such as examinations, which are not necessarily highly valid.

Note that all summative assessment can also be formative, if the feedback offered is sufficient.

Formative assessment is going on all the time. Its purpose is to provide feedback on what students are learning:

to the student: to identify achievement and areas for further work
to the teacher: to evaluate the effectiveness of teaching to date, and to focus future plans.

While grades or marks may assume primary importance in summative assessment, their role in formative assessment is simply to contribute to the feedback process: marks awarded against specific criteria (such as "use of sources", "presentation of argument") may be much more use than global judgements.

One more distinction

It is also possible to distinguish between norm-referenced, criterion-referenced and even ipsative assessment schemes.

Norm-referencing is basically competitive: it is a ranking exercise. Out of any given group, the top 5% get "A"s, the next 10% get "B"s, and so on, and the bottom 50% fail. (The figures are, of course, arbitrary.) This may be fair enough when the purpose is to select for a fixed and limited number of positions, such as jobs or places on a course or a sports team. The quality, however, can vary widely from one group of candidates to another. It may reassure the public in sensitive areas, because a fixed proportion of candidates is always rejected, but it can be grossly unfair. It also effectively demands a test in which less able candidates are progressively rejected, like a high-jump competition in which the bar is progressively raised until competitors fail to jump it (or a reality-TV show from which contestants are progressively voted off). IQ tests tend to be structured like this, and of course the IQ is a norm-referenced measure.
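The ranking logic can be sketched in a few lines. This is a minimal illustration, not a recommended grading scheme: the function name, the cohorts and the 35% "C" band (which stands in for the unspecified middle of the range) are all assumptions made for the example. It shows how the same raw mark can earn a top grade in one cohort and a fail in another.

```python
def norm_referenced_grades(scores, bands):
    """Assign grades by rank position in the cohort, not by absolute mark.

    scores: dict mapping candidate name -> raw mark
    bands:  list of (grade, percent_of_cohort), best grade first;
            the percentages should add up to 100
    Ties are broken by the order in which candidates appear in `scores`.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    grades = {}
    start = 0
    cumulative_percent = 0
    for grade, percent in bands:
        cumulative_percent += percent
        # Index of the first candidate who falls outside this band.
        end = cumulative_percent * len(ranked) // 100
        for name in ranked[start:end]:
            grades[name] = grade
        start = end
    return grades


# Top 5% "A", next 10% "B", bottom 50% fail; the "C" band is assumed.
BANDS = [("A", 5), ("B", 10), ("C", 35), ("Fail", 50)]

# Two hypothetical cohorts of 20 candidates each.
strong_cohort = {f"s{mark}": mark for mark in range(51, 71)}  # marks 51..70
weak_cohort = {f"w{mark}": mark for mark in range(41, 61)}    # marks 41..60

strong_grades = norm_referenced_grades(strong_cohort, BANDS)
weak_grades = norm_referenced_grades(weak_cohort, BANDS)

print(strong_grades["s60"])  # Fail: a mark of 60 is in the bottom half here
print(weak_grades["w60"])    # A: the same mark is the best in this cohort
```

Whatever the cohort, exactly half of the candidates fail, which is the sense in which the outcome depends on the group rather than on the performance.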

Criterion-referencing is the term used for assessment against fixed criteria. [Personal beef here: "criterion" is the singular, "criteria" the plural: I heard someone refer to "criterias" the other day!] Theoretically, it can mean that everyone who undertakes a given assessment may pass it, or no-one might. Even norm-referencing requires reference to criteria, of course, but full criterion-referencing ignores the statistical implications of the assessment profile: it is thus inherently fairer, as long as the criteria are determined in advance, and they are valid and reliable. In practice, of course, the criteria are often worked out on the basis that some notional percentage of candidates will reach them... so norm-referencing creeps in by the back door.
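By way of contrast, a criterion-referenced scheme can be sketched as a comparison with a threshold fixed in advance. The function name, the pass mark of 50 and the two small cohorts are assumptions for illustration only, and real criteria are rarely reducible to a single number. The point is that the result for each candidate does not depend on anyone else's performance.

```python
def criterion_referenced_results(scores, pass_mark):
    """Pass or fail each candidate against a threshold fixed in advance."""
    return {name: ("Pass" if mark >= pass_mark else "Fail")
            for name, mark in scores.items()}


PASS_MARK = 50  # assumed for the example; set before anyone is assessed

# Two hypothetical cohorts assessed against the same criterion.
able_group = {"Asha": 55, "Ben": 62, "Chloe": 71}
struggling_group = {"Dev": 20, "Ellie": 35, "Finn": 49}

able_results = criterion_referenced_results(able_group, PASS_MARK)
struggling_results = criterion_referenced_results(struggling_group, PASS_MARK)

print(able_results)        # every candidate passes
print(struggling_results)  # no candidate passes
```

No fixed proportion is rejected: the first group passes in its entirety and the second fails in its entirety.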

And then there is ipsative assessment, which is assessment against yourself, or more particularly against your own "personal best" performance. It is more relevant to performance coaching, special needs education and therapy than to most mainstream teaching.
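An ipsative comparison can be sketched in the same spirit. Everything here is hypothetical (the function name, the wording of the feedback and the marks), and it assumes that a higher mark is better. The only reference point is the learner's own earlier record.

```python
def ipsative_feedback(previous_marks, latest_mark):
    """Compare the latest mark with the learner's own personal best."""
    if not previous_marks:
        return "first attempt: this sets the baseline"
    personal_best = max(previous_marks)
    if latest_mark > personal_best:
        return f"new personal best (+{latest_mark - personal_best})"
    if latest_mark == personal_best:
        return "equals personal best"
    return f"{personal_best - latest_mark} below personal best"


print(ipsative_feedback([42, 48, 45], 51))  # new personal best (+3)
print(ipsative_feedback([42, 48, 45], 44))  # 4 below personal best
```

Neither other candidates' marks nor any external pass mark enters into the judgement.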

The story goes that at a college of one of our ancient universities, the rowing crew desperately needed the services of a certain undergraduate, who was not noted for his academic prowess, and who was in danger of being thrown out if he did not pass his end-of-year history exams, which took the form of a viva.

He had to score a minimum of 50% to pass and retain his place. The examiner first asked him when the New Poor Law was passed. He guessed at 1650. This was incorrect. [Look it up!]

The examiners were getting desperate—the college's reputation was at stake—so the examiner said: "Now listen carefully, and do not guess. Do you know what significant event took place in 1776?"

The undergraduate thought for a moment and then said, regretfully, "No". This was obviously correct, so he passed and the college was saved.

Valid? Reliable? Fair? Norm-referenced? Criterion-referenced? Ipsative? Discuss!

This is an archived copy of Atherton J S (2013) Learning and Teaching [On-line: UK] Original material by James Atherton: last up-dated overall 10 February 2013

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.

"This site is independent and self-funded, although the contribution of the Higher Education Academy to its development via the award of a National Teaching Fellowship, in 2004 has been greatly appreciated."