Simple Method to Calculate Median in Python
March 17, 2008 at 10:14 pm 19 comments
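def getMedian(numericValues):
    theValues = sorted(numericValues)
    if len(theValues) % 2 == 1:
        # odd count: the single middle value ("//" keeps the index an int on Python 2 and 3)
        return theValues[(len(theValues)+1)//2-1]
    else:
        # even count: average the two middle values
        lower = theValues[len(theValues)//2-1]
        upper = theValues[len(theValues)//2]
        return (float(lower + upper)) / 2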
def validate(valueShouldBe, valueIs):
    print("Value Should Be: %.6f, Value Is: %.6f, Correct: %s" % (valueShouldBe, valueIs, valueShouldBe==valueIs))
validate(2.5, getMedian([0,1,2,3,4,5]))
validate(2, getMedian([0,1,2,3,4]))
validate(2, getMedian([3,1,2]))
validate(3, getMedian([3,2,3]))
validate(1.234, getMedian([1.234, 3.678, -2.467]))
validate(1.345, getMedian([1.234, 3.678, 1.456, -2.467]))
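The exact `==` comparison in validate is fragile for float inputs, because sums of decimal fractions are not always represented exactly. Below is a minimal sketch of a tolerance-based check, assuming Python 3.5+ for `math.isclose`. The names `get_median` and `validate_close` and the default tolerance are illustrative choices, not part of the original post.

```python
import math

def get_median(numeric_values):
    """Median of a non-empty sequence of numbers (same logic as getMedian)."""
    values = sorted(numeric_values)
    count = len(values)
    mid = count // 2
    if count % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2.0

def validate_close(expected, actual, rel_tol=1e-9):
    """Compare with a relative tolerance instead of exact equality."""
    ok = math.isclose(expected, actual, rel_tol=rel_tol)
    print("Value Should Be: %.6f, Value Is: %.6f, Correct: %s" % (expected, actual, ok))
    return ok

validate_close(2.5, get_median([0, 1, 2, 3, 4, 5]))
validate_close(1.345, get_median([1.234, 3.678, 1.456, -2.467]))
```

Returning the boolean as well as printing it makes the check usable from automated tests.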
Entry filed under: CodeSnippet, Python, Statistics. Tags: Python, Statistics.
1. cw | September 30, 2008 at 4:00 am
One less computation if you do this instead :)
    return theValues[(len(theValues)-1)//2]
2. drgoettel | July 16, 2009 at 10:05 am
This doesn’t work for continuous variables…
does it?
3. utah_guy | July 16, 2009 at 2:11 pm
You mean the code in the post? Or the one in the first comment?
4. drgoettel | July 16, 2009 at 4:31 pm
Both.
A simple way to calculate the median in Python is to use the numpy module; you can read the documentation at http://docs.scipy.org/doc/numpy/user/
5. utah_guy | October 9, 2009 at 6:05 pm
It should work for both. The numpy module can be used, too. This is partially for instructional purposes but also for those who don’t want to install external libraries such as numpy.
6. utah_guy | October 9, 2009 at 6:07 pm
Actually, I should correct that statement. This is designed to work with integers and floats. It should also work with discrete variables with some minor tweaks.
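For readers on Python 3.4 or later, the standard library's `statistics` module provides a median without installing anything. Here is a short sketch. Treating `median_low` and `median_high` as the kind of tweak that suits discrete data is an assumption here, not something stated in the thread.

```python
import statistics

data_even = [0, 1, 2, 3, 4, 5]
data_odd = [3, 1, 2]

# Interpolating median: averages the two middle values for even-length input.
print(statistics.median(data_even))       # 2.5
print(statistics.median(data_odd))        # 2

# For discrete data, these always return an actual member of the data set.
print(statistics.median_low(data_even))   # 2
print(statistics.median_high(data_even))  # 3
```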
7. Oliver Nina | February 26, 2010 at 3:44 pm
Why not use the mean() function in numpy?
numpy.mean(numericValues)
8. Oliver Nina | February 26, 2010 at 3:45 pm
or median
numpy.median(numericValues)
9. utah_guy | February 27, 2010 at 6:29 am
Oliver, that’s probably a great way to go, as long as you are willing to install that library. Part of the point of this post is to show the logic behind how you would find the median, rather than to say it’s the solution people should necessarily be using.
10. Troy McConaghy | June 4, 2010 at 8:23 pm
You do the len(theValues) calculation four times on the same theValues list. You could save some time by doing it once and storing the result in a variable, then using the value in that variable:
def getMedian(numericValues):
    theValues = sorted(numericValues)
    count = len(theValues)  # compute the length once and reuse it
    if count % 2 == 1:
        return theValues[(count+1)//2-1]
    else:
        lower = theValues[count//2-1]
        upper = theValues[count//2]
        return (float(lower + upper)) / 2
11. Troy McConaghy | June 4, 2010 at 8:25 pm
Note: I had proper Python indenting when I entered the comment above but the commenting system removed it.
12. Noe | June 28, 2010 at 4:20 pm
Hello, I did not understand the code. Can someone help me and send it in a clearer form? I need it urgently.
Regards
13. Noe | June 28, 2010 at 4:25 pm
Hello, I did not understand the code. Can someone help me and send it in a clearer form? It is urgent.
Greetings
14. aperture11 | February 18, 2011 at 9:07 am
I can’t; it keeps saying “list indices must be integers, not float”
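One likely cause of the error quoted above is running the code on Python 3, where `/` always returns a float even for two integers, while `//` performs floor division and yields an int. Here is a small illustration, assuming Python 3. The variable names are illustrative only.

```python
values = sorted([3, 1, 2])
count = len(values)

true_div = (count + 1) / 2 - 1    # 1.0 on Python 3: "/" returns a float
floor_div = (count + 1) // 2 - 1  # 1: "//" keeps the result an int

try:
    values[true_div]
    index_error = None
except TypeError as exc:          # float indices are rejected
    index_error = exc

print(type(true_div).__name__, type(floor_div).__name__)
print("indexing with '/':", index_error)
print("indexing with '//':", values[floor_div])
```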
15. aperture11 | February 18, 2011 at 9:09 am
Another way is to say % 2 != 0
16. Shears | May 1, 2012 at 9:03 pm
Why do “programmers” always want to show each other up? The post is useful and does what it says on the box. Either appreciate it or keep walking.
17. neurotik | May 13, 2012 at 5:05 pm
@Shears: I don’t think it’s about showing other people up, but rather a discussion of better / alternate solutions. To that end here’s my version of it (only works with v2.5+):
def median(values):
    """Returns the median value from a list of numbers"""
    s = sorted(values)
    l = len(s)
    # "/" on two ints is integer division here (Python 2)
    return float(s[(l-1)/2]) if (l%2 == 1) else float((s[l/2]+s[(l/2)-1]))/2
18. ms4py | July 10, 2012 at 2:14 pm
@neurotik You can speed up the floor division with the real floor division or with bit shifting:
In [6]: %timeit (13 - 1) / 2
10000000 loops, best of 3: 53.3 ns per loop
In [7]: %timeit 13 // 2
10000000 loops, best of 3: 21.4 ns per loop
In [8]: %timeit 13 >> 1
10000000 loops, best of 3: 21.1 ns per loop
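To repeat this kind of measurement outside IPython, the standard `timeit` module can be used. This is a rough sketch. With literal operands CPython may fold the whole expression into a constant at compile time, so this version reads `n` from a setup statement. That choice and the loop count are assumptions. Timings vary by machine and interpreter, so no expected numbers are shown.

```python
import timeit

LOOPS = 100000  # small enough to run quickly; raise for steadier numbers

expressions = ["(n - 1) // 2", "n // 2", "n >> 1"]
timings = {}
for expr in expressions:
    # setup runs once; expr is evaluated LOOPS times
    timings[expr] = timeit.timeit(expr, setup="n = 13", number=LOOPS)

for expr, seconds in timings.items():
    print("%-14s %.1f ns per loop" % (expr, seconds / LOOPS * 1e9))
```

For an odd `n` such as 13, all three expressions give the same index (6), so any of them can locate the middle element.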
19. Chad | July 25, 2012 at 6:02 pm
Since integer division truncates, we can get rid of the if statements:
def median(values):
    s = sorted(values)
    l = int(len(s))  # this is probably redundant
    return float(s[(l-1)//2] + s[l//2]) / 2
If the length is 6, this code returns the mean of s[2] and s[3], but if the length is 5, it returns the mean of s[2] and s[2].
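The index arithmetic can be checked directly: for odd lengths the two indices coincide, and for even lengths they straddle the middle. Below is a small sketch that compares the branch-free form against a conventional two-branch median. It uses `//` so that it also runs on Python 3, and the helper names are illustrative.

```python
def median_branchless(values):
    s = sorted(values)
    n = len(s)
    # (n-1)//2 and n//2 are equal for odd n and adjacent for even n
    return (s[(n - 1) // 2] + s[n // 2]) / 2.0

def median_branching(values):
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return float(s[n // 2])
    return (s[n // 2 - 1] + s[n // 2]) / 2.0

for n in range(1, 8):
    data = list(range(n, 0, -1))  # n, n-1, ..., 1 (unsorted on purpose)
    print(n, (n - 1) // 2, n // 2, median_branchless(data))
    assert median_branchless(data) == median_branching(data)
```

The trade-off is one extra addition and division for odd-length input in exchange for dropping the branch.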