
In article <[EMAIL PROTECTED]>, "Stephen J. Herschkorn" <[EMAIL PROTECTED]> writes:
>>>>>Q Julie kept a record of the number of pairs of shoes she had.
>>>>>
>>>>>year no of pairs of shoes
>>>>>
>>>>>1996 11
>>>>>1997 9
>>>>>1998 12
>>>>>1999 10
>>>>>2000 13
>>>>>
>>>>>The mode is
>>>>>a. 1996
>>>>>b 1997
>>>>>c 1998
>>>>>d 1999
>>>>>e 2000
>>>>>
>>>>>the correct answer given was e 2000.
>>>>>
...
>>>In any case, the teacher is wrong. S/he may have had something else
>>>(i.e., 1996 appearing eleven times, etc.) in mind, but that would be a
>>>rather unusual (even nonsensical) data set for this situation.
>>>
>>>
>>
>>Minor change in question produces a plausible situation:
>>
>>} Q Julie has marked each pair of shoes with the year in which she
>>} purchased them, and now wants to know which mark occurs most
>>} frequently.
>>}
>>} year no of pairs of shoes marked with that year
>>}
>>} 1996 11
>>} 1997 9
>>} 1998 12
>>} 1999 10
>>} 2000 13
>>}
>>} The mode is
>>} a. 1996
>>} b 1997
>>} c 1998
>>} d 1999
>>} e 2000
>>
>>I think we can agree that (e) is the right answer to this question.
>>
> Certainly. But here we are twisting the wording of the question to fit
> the intended answer. As far as I can tell (from this and other examples
> elsewhere in this thread), the teacher has the misconception that the
> mode is defined as the argmax of a natural-number-valued function.
One strategy when responding to a multiple choice question is to
choose a problem interpretation that fits the available answers,
even when this interpretation is literally incorrect.

I've been trying to come up with a good understanding of the
misunderstandings/misinterpretations/ambiguities here. And I have
a few thoughts.

One problem area is the presentation format for data sets. If
you have a set of 55 pairs of shoes and a year of purchase for each
pair, you have two choices. You can present that as a list of 55
year numbers. Or you can present it as a frequency table of 5
ordered pairs (year, number of pairs purchased in that year).

On a data set for which it is useful to calculate a mode, it is
likely that the frequency table presentation format is going to
be far more compact than the list format.

If we assume that the question is being asked about the 55-element
data set (year of purchase for each of 55 pairs of shoes) and that
this data set is being presented in frequency table format, then
everything makes sense.
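
To make the two presentation formats concrete, here is a minimal Python
sketch. The variable names are my own, and the counts are the ones from
the question, read as a frequency table. It expands the table into the
straight list and computes the mode both ways.

```python
from collections import Counter

# Frequency table from the question, read as:
# year -> number of pairs marked with that year.
frequency_table = {1996: 11, 1997: 9, 1998: 12, 1999: 10, 2000: 13}

# Expand the table into the equivalent straight list of year values.
year_list = [year
             for year, count in frequency_table.items()
             for _ in range(count)]

# Mode from the list format: the value that occurs most often.
mode_from_list = Counter(year_list).most_common(1)[0][0]

# Mode from the table format: the year with the largest count.
mode_from_table = max(frequency_table, key=frequency_table.get)

print(len(year_list), mode_from_list, mode_from_table)
```

Under this reading both routes give the same year, which matches the
answer (e) that the teacher marked as correct.
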
In a perfect world, we'd be able to look at a data set, see that
it is presented as a list of ordered pairs and say "ahah! -- it's a
data set in frequency table format" or we could look at a list of
values and say "ahah! -- it's a data set in straight list format".

But in the real world, we'll get data presented in many ways and
with lots of relevant and irrelevant data tacked on. Even with a
single-variable data set, we may find an index value presented, e.g.:
1 11
2 9
3 12
4 10
5 13
We are perfectly willing to ignore the line number when computing
the mean, median or mode of the second column.

And sometimes the index column is semi-meaningful:
1996 11
1997 9
1998 12
1999 10
2000 13
Again, we are perfectly correct to ignore the first column when
computing the mean, median or mode of the second column.
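
For comparison, here is a minimal Python sketch of the other reading,
where the first column is ignored and the second column is the data set.
The variable names are my own; the only library used is the standard
`statistics` module (`multimode` needs Python 3.8 or later).

```python
import statistics

# Second column only, read as five yearly shoe counts
# (the first column is treated as an index and ignored).
pairs_per_year = [11, 9, 12, 10, 13]

mean_pairs = statistics.mean(pairs_per_year)
median_pairs = statistics.median(pairs_per_year)

# multimode returns every value tied for most frequent. No count
# repeats here, so all five values come back and there is no single
# meaningful mode.
modes = statistics.multimode(pairs_per_year)

print(mean_pairs, median_pairs, modes)
```

Under this reading the mean and median are both 11, every count is tied
for "most frequent", and none of the offered answers (which are all
years) can be a mode.
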
But this last format is _indistinguishable_ from the frequency
table presentation format for the data set:
1996
1996
1996
1996
1996
1996
1996
1996
1996
1996
1996
1997
1997
1997
1997
1997
1997
1997
1997
1997
1998
1998
1998
1998
1998
1998
1998
1998
1998
1998
1998
1998
1999
1999
1999
1999
1999
1999
1999
1999
1999
1999
2000
2000
2000
2000
2000
2000
2000
2000
2000
2000
2000
2000
2000
I know which presentation format I'd rather see when I'm asked to
compute a mode.

John Briggs