Statistics Mode - Calculating with LINQ

by 1. August 2010 12:03

Measures of central tendency, also known as averages, attempt to find a single value that represents a series of values. The mode is the kind of average in statistics defined as the most frequently occurring value [1][2].
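As a minimal sketch of that definition, here is a Python stand-in for the article's C#/LINQ (not the author's code). It counts how often each value occurs and picks the most frequent one. `mode_of` is a hypothetical helper name.

```python
from collections import Counter
from typing import Hashable, Iterable


def mode_of(values: Iterable[Hashable]) -> Hashable:
    """Return the most frequently occurring value.

    If several values tie for the highest count, Counter.most_common
    returns the one encountered first, so ties are silently hidden.
    """
    counts = Counter(values)
    if not counts:
        raise ValueError("mode is undefined for an empty collection")
    value, _count = counts.most_common(1)[0]
    return value


print(mode_of([2, 3, 3, 5, 3, 2]))  # 3
```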

In the following collection there are 3 "F", 2 "Female", 3 "M" and 1 "Male". Since the most frequently occurring value is both "F" and "M" (each appears 3 times), this collection is bimodal [2]; had more than two values tied, it would have been multimodal [2]. As a result, the mode cannot be used as a measure of central tendency for this data [1].
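This Python sketch (again a stand-in for the LINQ version) detects bimodal or multimodal data by collecting every value whose count equals the maximum. The sample list is an assumption. It only reproduces the counts stated above (3 "F", 2 "Female", 3 "M", 1 "Male"), not the article's original ordering. `all_modes` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def all_modes(values: Iterable[Hashable]) -> List[Hashable]:
    """Return every value that shares the highest frequency.

    One result means unimodal, two bimodal, more than two multimodal.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Assumed sample with the counts described in the text.
genders = ["F", "F", "F", "Female", "Female", "M", "M", "M", "Male"]
modes = all_modes(genders)
print(sorted(modes))  # ['F', 'M'] -> bimodal
```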

However, we can see that there are really 5 females (3 "F" + 2 "Female") and 4 males (3 "M" + 1 "Male"), so the mode of this collection is actually Female, the most frequently observed data value. To get the proper mode, we first have to simplify this parameter of the population so that each gender is recorded in a single, consistent form.

Note: Dump() is a LINQPad extension method that displays the contents of an object.

Below, we generate a new list (genderLetters) in which each value is reduced to a single character; we then sort the list so that the results are easy to inspect visually.
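The simplification step could look like the following Python sketch, an analog of the LINQ projection rather than the article's code. Each value is reduced to its first character and upper-cased, then the list is sorted. The upper-casing, the assumption of non-empty strings, and the sample data (same counts as above) are illustrative additions. `gender_letters` mirrors the article's genderLetters.

```python
from collections import Counter

# Assumed sample data with the counts given in the text.
genders = ["F", "Female", "M", "F", "Male", "M", "Female", "F", "M"]

# Reduce every (non-empty) entry to a single upper-case character, then
# sort so the grouped values are easy to inspect.
gender_letters = sorted(g[0].upper() for g in genders)

print(gender_letters)           # ['F', 'F', 'F', 'F', 'F', 'M', 'M', 'M', 'M']
print(Counter(gender_letters))  # Counter({'F': 5, 'M': 4})
```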

With a couple of lines of LINQ code we get a clean, sorted list that better represents this parameter of the population. All that remains is to process the counts to determine the mode; below is all of the LINQ source code that is required:
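The counting step described above can be sketched in Python with `itertools.groupby`: group equal values, count each group, and order the groups by count in descending order. This is a rough analog of a LINQ group/orderby query, not the article's GetMode(). The name `get_mode` and the `(value, count)` tuple standing in for the "record" are assumptions.

```python
from itertools import groupby
from typing import Any, Iterable, List, Tuple


def get_mode(values: Iterable[Any]) -> List[Tuple[Any, int]]:
    """Return (value, count) pairs ordered by count, highest first.

    groupby only merges *adjacent* equal items, so the input is sorted
    first; the values must therefore be mutually comparable.
    """
    grouped = [(key, sum(1 for _ in group)) for key, group in groupby(sorted(values))]
    # sorted() stays stable even with reverse=True, so groups with equal
    # counts keep their ascending key order.
    return sorted(grouped, key=lambda pair: pair[1], reverse=True)


letters = ["F", "F", "F", "F", "F", "M", "M", "M", "M"]
print(get_mode(letters))  # [('F', 5), ('M', 4)]
```

Sorting before grouping is what makes `groupby` behave like LINQ's GroupBy here; without it, non-adjacent duplicates would land in separate groups.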

The GetMode() method above was originally written for an int array provided by reference [2], which explains the naming conventions used within the method. Note in the image below that we use the very same GetMode() method to process our list of integers, which produces a record representing the mode for the array; in this case there are three sixes, the most frequently observed data value.

Generics let us use the same LINQ query for both strings and ints. Note below that I was able to simply call GetMode(season) because the type "int" was inferred, whereas above we had to provide the type explicitly, i.e., GetMode<string>(genderLetters), because the compiler could not infer the type from the dynamically generated list.
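For comparison, here is a Python sketch of the same idea. A `TypeVar` plays roughly the role of a C# generic type parameter, so one function serves both `str` and `int`. Python never needs an explicit type argument like GetMode<string>(...), because types are only checked statically by tools such as mypy, if at all. The function name `ranked_counts` and both sample lists are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def ranked_counts(values: Sequence[T]) -> List[Tuple[T, int]]:
    """Count each distinct value and rank by frequency, highest first."""
    # most_common() with no argument returns every (value, count) pair;
    # ties keep the order in which values were first encountered.
    return Counter(values).most_common()


# The same generic function handles strings...
print(ranked_counts(["F", "M", "F"]))     # [('F', 2), ('M', 1)]
# ...and integers (hypothetical data containing three sixes).
print(ranked_counts([6, 3, 6, 1, 6, 3]))  # [(6, 3), (3, 2), (1, 1)]
```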

Below we use query.First() to get the mode, since the results are sorted in descending order; likewise, above we could have used g.First() to retrieve its mode.
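Because the ranked results are ordered by count in descending order, the mode is simply the first element, the Python counterpart of LINQ's First(). One caveat, made explicit in this sketch: taking only the first element hides ties, so comparing the top two counts shows whether the data is actually bimodal. The helper names `first_mode` and `is_tied` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Tuple


def first_mode(values: Iterable[Hashable]) -> Optional[Tuple[Hashable, int]]:
    """Return the top (value, count) pair, or None for empty input."""
    ranked = Counter(values).most_common()  # highest count first
    return ranked[0] if ranked else None


def is_tied(values: Iterable[Hashable]) -> bool:
    """True when the top two values share the highest count."""
    ranked = Counter(values).most_common(2)
    return len(ranked) == 2 and ranked[0][1] == ranked[1][1]


print(first_mode([6, 3, 6, 1, 6]))                      # (6, 3)
print(is_tied(["F", "F", "F", "M", "M", "M", "Male"]))  # True
```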

Source code for the examples above: Mode.linq (1.29 kb)

Go to http://www.linqpad.net/ to get LINQPad, the free program used to run the source code. After finding this gem of a program I purchased the book and was not disappointed; it is one of the few books I own that I regularly use for reference.

References

1. Clark and Schkade. "Statistical Analysis for Administrative Decisions". South-Western Publishing Co., 1983, pp. 24-26.
2. Statistics Canada. http://www.statcan.gc.ca/edu/power-pouvoir/ch11/mode/5214873-eng.htm

Visual Studio 2010 source code with unit tests for int, double and string: Statistics.Library.zip (56.78 kb)

Notice

Blog videos and references to CodePlex projects are no longer valid