How to find the duplicate in an array using Big O of N?

Question

Write an algorithm that finds the duplicate in an array using Big O of N, not Big O of N^2.

Information

The array is an integer array and has a MAX_VALUE items (lets just use 1000 as an example). All integers are between 0 and MAX_VALUE (pretend the input is limited to values between 0 and MAX_VALUE). There is only one duplicate and you are not sure where it is. What is the most efficient way to find it.

    int[] i = new int[1000];

There is one duplicate and it is the worst case duplicate.  You can simulate a worst case duplicate as follows:

    for (int i = 0; i < 999; i++)
    {
        array[i] = i+1;
    }
    array[999] = 999;

So the duplicates are the second to last and the last.

Answer

This is one possible answer that assumes the values are between 0 and MAX_VALUE.

    /// <summary>
    /// Efficient because it has a Big O of n.
    /// </summary>
    public int FindDuplcate(int[] inArray)
    {
        bool[] tmpArray = new bool[1000];
        for (int i = 0; i < inArray.Length; i++)
        {
            if (tmpArray[inArray[i]] == true)
                return i;
            tmpArray[inArray[i]] = true;
        }
        return -1;
    }

Incorrect answers

Here is a little test app with two incorrect answers and two possible correct answer. Feel free to comment with your answers.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;

namespace FindDuplicateInArray
{
    class Program
    {
        const int MAX_VALUE = 1000;        

        static void Main(string[] args)
        {
            // Worst case
            int[] array = new int[MAX_VALUE];
            for (int i = 0; i < MAX_VALUE - 1; i++)
            {
                array[i] = i + 1;
            }
            array[MAX_VALUE - 1] = MAX_VALUE - 1;

            // Incorrect answer: Big O of N^2
            //IFindADuplicateInArrays dupFinder1 = new BruteForceDupFinder();
            //int dup = dupFinder1.FindDuplcate(array);

            // Incorrect answer: Big O of N^2
            //IFindADuplicateInArrays dupFinder2 = new LessBruteForceDupFinder();
            //int dup = dupFinder2.FindDuplcate(array);

            // Possible correct answer: Big O of N but with a limitation
            //IFindADuplicateInArrays dupFinder3 = new EfficientDupFinderWithLimitation();
            //int dup = dupFinder3.FindDuplcate(array);

            // Possbile correct answer: Big O of N
            IFindADuplicateInArrays dupFinder3 = new EfficientDupFinder();
            int dup = dupFinder3.FindDuplcate(array);

            Console.WriteLine(dup);
        }

        public interface IFindADuplicateInArrays
        {
            int FindDuplcate(int[] inArray);
        }

        /// <summary>
        /// Incorrect Answer: This is simple N*N alogithm. Big 0 = N^2
        /// </summary>
        public class BruteForceDupFinder : IFindADuplicateInArrays
        {
            #region IFindADuplicateInArrays Members

            public int FindDuplcate(int[] inArray)
            {
                for (int i = 0; i < inArray.Length; i++)
                {
                    for (int j = 0; j < inArray.Length; j++)
                    {
                        if (i != j && inArray[i] == inArray[j])
                            return j;
                    }
                }
                return -1;
            }

            #endregion
        }

        /// <summary>
        /// Incorrect Answer: This is N(N-1)/2 which is N*N/2-n/2 so remove the constants N^2-N. 
        /// When there is N^2-N you can keep the N that grows fastest: Big O = N2
        /// But the original alrogithm is less N^2 than just N^2.
        /// </summary>
        public class LessBruteForceDupFinder : IFindADuplicateInArrays
        {
            #region IFindADuplicateInArrays Members

            public int FindDuplcate(int[] inArray)
            {
                for (int i = 0; i < inArray.Length; i++)
                {
                    for (int j = i + 1; j < inArray.Length; j++)
                    {
                        if (inArray[i] == inArray[j])
                            return j;
                    }
                }
                return -1;
            }

            #endregion
        }

        /// <summary>
        /// Possible Answer: Efficient because it has a Big O of n.
        /// Limitation: If an item in the array is greater than the MAX_VALUE, 
        ///             an ArrayIndexOutOfBoundsException occurs.
        /// </summary>
        public class EfficientDupFinderWithLimitation : IFindADuplicateInArrays
        {
            #region IFindADuplicateInArrays Members

            public int FindDuplcate(int[] inArray)
            {
                bool[] tmpArray = new bool[1000];
                for (int i = 0; i < inArray.Length; i++)
                {
                    if (tmpArray[inArray[i]] == true)
                        return i;
                    tmpArray[inArray[i]] = true;
                }
                return -1;
            }

            #endregion
        }

        /// <summary>
        /// Possible Answer: Efficient because it has a Big O of n.
        /// </summary>
        public class EfficientDupFinder : IFindADuplicateInArrays
        {
            #region IFindADuplicateInArrays Members
            public int FindDuplcate(int[] inArray)
            {
                Dictionary<int, bool> table = new Dictionary<int, bool>();
                for (int i = 0; i < inArray.Length; i++)
                {
                    if (table.Keys.Contains(inArray[i]))
                        return i;
                    table.Add(inArray[i], true);
                }
                return -1;
            }
            #endregion
        }
    }
}

Run this through Visual Studio’s Performance Analysis tool once for each answer (comment and uncomment). To really see the difference, change MAX_VALUE to 1 million.

5 Comments

  1. Koutheir Attouchi says:

    tmpArray[inArray[i]] is wrong. Simply put 20000 in inArray[0] and you get an out of bounds exception.

    • Rhyous says:

      As I mentioned below, you are right. It depends on if your input is limited between 0 and MAX_VALUE. If so, it works, if not, you will get the ArrayIndexOutOfBoundsException.

      I tossed a fourth answer out there. Let me know what you think.

  2. Seems to me as error approach, or at least incorrect condition.

    What your program will do in such case:
    array[50] = 387468;
    bool[] tmpArray = new bool[1000];
    for (int i = 0; i < inArray.Length; i++)
    {
    if (tmpArray[inArray[i]] == true)
    ...

    Isn't this is ArrayIndexOutOfBoundsException Error? 🙂

    Seems for me, you taken as given condition, that if your input data array have length of 1000, then all of values more than or equals 0 and less than 1000.

    That's only subset of real-world cases 🙂

    To make usage of tmpArray you need make additional procedure with inArray[i], like hashing or something else. But even with that case, you should have not array of boolean, but instead array of indexes, to make exact comparison.

    I think most effective duplicate finding way is modified quick-sort algo, which quits at finding equals values.

Leave a Reply

How to post code in comments?

*