Median of A Very Big Array

Image placeholder 9899

(This question has been seen in the interviews of the following companies: Airbnb)

Find the median of a very big array of integer. Only iterator of array is given

SOLUTION: Find the median of a very big array of integer. Only iterator is available
There are 3 common ways to find median of an array.
1. Quick/Merge Sort
2. Quick Select
3. Counting Sort
1 and 2 demands WRITE access to the arrays if the array is too big to fit in memory. However only the iterator (READ) is given.
Since the numbers are integer, the range of possible numbers is [-min_int, max_int]. Create a counter array or map to store frequency of numbers in array. Then find median from counter array in O(N).

public static double getMedian(Iterator iterator) {
        int size = 0;
        Map counter = new HashMap<>();
        while(iterator.hasNext()) {
            size++;
            int number = iterator.next();
            counter.put(number, counter.getOrDefault(number, 0) + 1);
        }
        int mid_idx1 = (size - 1) / 2, mid_idx2 = size / 2;
        int mid1 = Integer.MAX_VALUE, mid2 = Integer.MAX_VALUE;
        for(int value = Integer.MIN_VALUE; value < Integer.MAX_VALUE; value++) {
            if(counter.containsKey(value)) {
                size -= counter.get(value);
                if(size <= mid_idx1 && mid2 < Integer.MAX_VALUE) { //array has even size
                    return (mid2 + value) / 2.0;
                } else if(size <= mid_idx1) {       //array has odd size
                    return value;
                } else if(size == mid_idx2) {       //found the first middle number for array with even size
                    mid2 = value;
                }
            }
        }
        return (mid1 + mid2) / 2;
    }

More Optimal Solution:
Still this method requires storing the full range of integers, which has high demand for memory ~10G.
Another approach is distributed quick select - divide arrays into trunks that fits in memory. Find median of each trunk by in-memory sorting or quick select.
Next round ignore all numbers less than minimum median or bigger than maximum median. Divide remaining elements into trunks that fits in memory.
This way every round we'd be able to exclude one trunk of numbers and narrow down the search range, until eventually the range is small enough to fit-in memory.




Get one-to-one training from Google Facebook engineers

Top-notch Professionals

Learn from Facebook and Google senior engineers interviewed 100+ candidates.
Most recent interview questions and system design topics gathered from aonecode alumnus.
One-to-one online classes. Get feedbacks from real interviewers.

Customized Private Class

Already a coding expert? - Advance straight to hard interview topics of your interest.
New to the ground? - Develop basic coding skills with your own designated mentor.
Days before interview? - Focus on most important problems in target company question bank.