How to find the duplicate in an array using Big O of N?

December 9, 2011, 11:33 am by Rhyous

Question

Write an algorithm that finds the duplicate in an array using Big O of N, not Big O of N^2.

Information

The array is an integer array and has a MAX_VALUE items (lets just use 1000 as an example). All integers are between 0 and MAX_VALUE (pretend the input is limited to values between 0 and MAX_VALUE). There is only one duplicate and you are not sure where it is. What is the most efficient way to find it.

    int[] i = new int[1000];

There is one duplicate and it is the worst case duplicate. You can simulate a worst case duplicate as follows:

    for (int i = 0; i < 999; i++)
    {
        array[i] = i+1;
    }
    array[999] = 999;

So the duplicates are the second to last and the last.

Answer

This is one possible answer that assumes the values are between 0 and MAX_VALUE.

    /// <summary>
    /// Efficient because it has a Big O of n.
    /// </summary>
    public int FindDuplcate(int[] inArray)
    {
        bool[] tmpArray = new bool[1000];
        for (int i = 0; i < inArray.Length; i++)
        {
            if (tmpArray[inArray[i]] == true)
                return i;
            tmpArray[inArray[i]] = true;
        }
        return -1;
    }

Incorrect answers

Here is a little test app with two incorrect answers and two possible correct answer. Feel free to comment with your answers.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;

namespace FindDuplicateInArray
{
    class Program
    {
        const int MAX_VALUE = 1000;        

        static void Main(string[] args)
        {
            // Worst case
            int[] array = new int[MAX_VALUE];
            for (int i = 0; i < MAX_VALUE - 1; i++)
            {
                array[i] = i + 1;
            }
            array[MAX_VALUE - 1] = MAX_VALUE - 1;

            // Incorrect answer: Big O of N^2
            //IFindADuplicateInArrays dupFinder1 = new BruteForceDupFinder();
            //int dup = dupFinder1.FindDuplcate(array);

            // Incorrect answer: Big O of N^2
            //IFindADuplicateInArrays dupFinder2 = new LessBruteForceDupFinder();
            //int dup = dupFinder2.FindDuplcate(array);

            // Possible correct answer: Big O of N but with a limitation
            //IFindADuplicateInArrays dupFinder3 = new EfficientDupFinderWithLimitation();
            //int dup = dupFinder3.FindDuplcate(array);

            // Possbile correct answer: Big O of N
            IFindADuplicateInArrays dupFinder3 = new EfficientDupFinder();
            int dup = dupFinder3.FindDuplcate(array);

            Console.WriteLine(dup);
        }

        public interface IFindADuplicateInArrays
        {
            int FindDuplcate(int[] inArray);
        }

        /// <summary>
        /// Incorrect Answer: This is simple N*N alogithm. Big 0 = N^2
        /// </summary>
        public class BruteForceDupFinder : IFindADuplicateInArrays
        {
            #region IFindADuplicateInArrays Members

            public int FindDuplcate(int[] inArray)
            {
                for (int i = 0; i < inArray.Length; i++)
                {
                    for (int j = 0; j < inArray.Length; j++)
                    {
                        if (i != j && inArray[i] == inArray[j])
                            return j;
                    }
                }
                return -1;
            }

            #endregion
        }

        /// <summary>
        /// Incorrect Answer: This is N(N-1)/2 which is N*N/2-n/2 so remove the constants N^2-N. 
        /// When there is N^2-N you can keep the N that grows fastest: Big O = N2
        /// But the original alrogithm is less N^2 than just N^2.
        /// </summary>
        public class LessBruteForceDupFinder : IFindADuplicateInArrays
        {
            #region IFindADuplicateInArrays Members

            public int FindDuplcate(int[] inArray)
            {
                for (int i = 0; i < inArray.Length; i++)
                {
                    for (int j = i + 1; j < inArray.Length; j++)
                    {
                        if (inArray[i] == inArray[j])
                            return j;
                    }
                }
                return -1;
            }

            #endregion
        }

        /// <summary>
        /// Possible Answer: Efficient because it has a Big O of n.
        /// Limitation: If an item in the array is greater than the MAX_VALUE, 
        ///             an ArrayIndexOutOfBoundsException occurs.
        /// </summary>
        public class EfficientDupFinderWithLimitation : IFindADuplicateInArrays
        {
            #region IFindADuplicateInArrays Members

            public int FindDuplcate(int[] inArray)
            {
                bool[] tmpArray = new bool[1000];
                for (int i = 0; i < inArray.Length; i++)
                {
                    if (tmpArray[inArray[i]] == true)
                        return i;
                    tmpArray[inArray[i]] = true;
                }
                return -1;
            }

            #endregion
        }

        /// <summary>
        /// Possible Answer: Efficient because it has a Big O of n.
        /// </summary>
        public class EfficientDupFinder : IFindADuplicateInArrays
        {
            #region IFindADuplicateInArrays Members
            public int FindDuplcate(int[] inArray)
            {
                Dictionary<int, bool> table = new Dictionary<int, bool>();
                for (int i = 0; i < inArray.Length; i++)
                {
                    if (table.Keys.Contains(inArray[i]))
                        return i;
                    table.Add(inArray[i], true);
                }
                return -1;
            }
            #endregion
        }
    }
}

Run this through Visual Studio’s Performance Analysis tool once for each answer (comment and uncomment). To really see the difference, change MAX_VALUE to 1 million.

Tags: Interview Question
Category: csharp | Comment (RSS) | Trackback

5 Comments

Koutheir Attouchi says:

December 10, 2011 at 10:43 am

tmpArray[inArray[i]] is wrong. Simply put 20000 in inArray[0] and you get an out of bounds exception.

Reply to this comment
- Rhyous says:
  
  December 10, 2011 at 1:51 pm
  
  As I mentioned below, you are right. It depends on if your input is limited between 0 and MAX_VALUE. If so, it works, if not, you will get the ArrayIndexOutOfBoundsException.
  
  I tossed a fourth answer out there. Let me know what you think.
  
  Reply to this comment
Alexander Yerenkow says:

December 10, 2011 at 6:34 am

Seems to me as error approach, or at least incorrect condition.

What your program will do in such case:
array[50] = 387468;
bool[] tmpArray = new bool[1000];
for (int i = 0; i < inArray.Length; i++)
{
if (tmpArray[inArray[i]] == true)
...

Isn't this is ArrayIndexOutOfBoundsException Error? 🙂

Seems for me, you taken as given condition, that if your input data array have length of 1000, then all of values more than or equals 0 and less than 1000.

That's only subset of real-world cases 🙂

To make usage of tmpArray you need make additional procedure with inArray[i], like hashing or something else. But even with that case, you should have not array of boolean, but instead array of indexes, to make exact comparison.

I think most effective duplicate finding way is modified quick-sort algo, which quits at finding equals values.

Reply to this comment
- Rhyous says:
  
  December 10, 2011 at 1:10 pm
  
  Yes, you guys are correct. But that was how it was in the question. I didn't state that. I will update to state that.
  
  However, would a hashtable work better for a "real world" solution than sorting?
  
  Reply to this comment
- Rhyous says:
  
  December 10, 2011 at 1:49 pm
  
  I added a Dictionary option (as I was going to use Hashtable but Dictionary is the same without boxing. As Dictionary is O(1), it should still be O(n) overall.
  
  Reply to this comment

Rhyous

Knight of the Code

How to find the duplicate in an array using Big O of N?

Question

Information

Answer

Incorrect answers

Like this:

Related

5 Comments

Leave a Reply to Alexander Yerenkow

Are you a Jeek?

Categories

Recent Posts

My other blogs