A comprehensive benchmark suite comparing different median calculation algorithms in Python.
- Pure Python (sorting) - Built-in
sorted()function - NumPy -
np.median() - Pandas -
pd.Series.median() - Quickselect - Average O(n) selection algorithm
- Two Heaps - Maintains balanced min/max heaps
- heapq.nlargest - File-based heap approach
- Median of Medians - Guaranteed O(n) but slower in practice
- Python 3.11+
- pip
- Docker installed and running
Build the Docker image:
docker build -t median-benchmark .Run the benchmark:
docker run median-benchmarkThe container will:
- Generate a test file with 10 million random numbers
- Run all benchmark tests
- Display results and timing summary
Install dependencies:
pip install -r requirements.txtRun the benchmark:
python median_benchmark.pyYou can modify the test size by editing median_benchmark.py:
# Change the number of test values (default: 10,000,000)
generate_test_file(filename, n=10000000)The benchmark will display:
- Test name and result for each algorithm
- Execution time for each algorithm
- Summary table sorted by performance
Example output:
Testing Pure Python (sorting)...
Result: 5000123.45
Time: 2.1234 seconds
...
SUMMARY
NumPy : 0.5234 seconds
Quickselect : 1.2345 seconds
Pure Python : 2.1234 seconds
...
The timing functionality is implemented using a decorator pattern:
@benchmark('Test Name')
def my_function(data):
# function implementation
passThe decorator automatically:
- Prints the test name
- Times execution
- Displays the result and elapsed time
- Stores timing in
function.last_elapsed
.
├── Dockerfile # Docker configuration
├── .dockerignore # Files excluded from Docker build
├── requirements.txt # Python dependencies
├── median_benchmark.py # Main benchmark script
└── README.md # This file
- NumPy typically offers the best performance for large datasets
- Quickselect provides good average-case performance
- Median of Medians guarantees O(n) but has high constants
- Two Heaps is optimized for streaming data scenarios
- Test file is generated fresh each run and stored as
numbers.txt
MIT