How to find the duplicate in an array using Big O of N?

Question

Write an algorithm that finds the duplicate in an array using Big O of N, not Big O of N^2.

Information

The array is an integer array and has a MAX_VALUE items (lets just use 1000 as an example). All integers are between 0 and MAX_VALUE (pretend the input is limited to values between 0 and MAX_VALUE). There is only one duplicate and you are not sure where it is. What is the most efficient way to find it.

int[] i = new int;

There is one duplicate and it is the worst case duplicate.  You can simulate a worst case duplicate as follows:

for (int i = 0; i < 999; i++)
{
array[i] = i+1;
}
array = 999;

So the duplicates are the second to last and the last.

This is one possible answer that assumes the values are between 0 and MAX_VALUE.

/// <summary>
/// Efficient because it has a Big O of n.
/// </summary>
public int FindDuplcate(int[] inArray)
{
bool[] tmpArray = new bool;
for (int i = 0; i < inArray.Length; i++)
{
if (tmpArray[inArray[i]] == true)
return i;
tmpArray[inArray[i]] = true;
}
return -1;
}

Here is a little test app with two incorrect answers and two possible correct answer. Feel free to comment with your answers.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;

namespace FindDuplicateInArray
{
class Program
{
const int MAX_VALUE = 1000;

static void Main(string[] args)
{
// Worst case
int[] array = new int[MAX_VALUE];
for (int i = 0; i < MAX_VALUE - 1; i++)
{
array[i] = i + 1;
}
array[MAX_VALUE - 1] = MAX_VALUE - 1;

// Incorrect answer: Big O of N^2
//IFindADuplicateInArrays dupFinder1 = new BruteForceDupFinder();
//int dup = dupFinder1.FindDuplcate(array);

// Incorrect answer: Big O of N^2
//IFindADuplicateInArrays dupFinder2 = new LessBruteForceDupFinder();
//int dup = dupFinder2.FindDuplcate(array);

// Possible correct answer: Big O of N but with a limitation
//IFindADuplicateInArrays dupFinder3 = new EfficientDupFinderWithLimitation();
//int dup = dupFinder3.FindDuplcate(array);

// Possbile correct answer: Big O of N
IFindADuplicateInArrays dupFinder3 = new EfficientDupFinder();
int dup = dupFinder3.FindDuplcate(array);

Console.WriteLine(dup);
}

{
int FindDuplcate(int[] inArray);
}

/// <summary>
/// Incorrect Answer: This is simple N*N alogithm. Big 0 = N^2
/// </summary>
public class BruteForceDupFinder : IFindADuplicateInArrays
{

public int FindDuplcate(int[] inArray)
{
for (int i = 0; i < inArray.Length; i++)
{
for (int j = 0; j < inArray.Length; j++)
{
if (i != j && inArray[i] == inArray[j])
return j;
}
}
return -1;
}

#endregion
}

/// <summary>
/// Incorrect Answer: This is N(N-1)/2 which is N*N/2-n/2 so remove the constants N^2-N.
/// When there is N^2-N you can keep the N that grows fastest: Big O = N2
/// But the original alrogithm is less N^2 than just N^2.
/// </summary>
public class LessBruteForceDupFinder : IFindADuplicateInArrays
{

public int FindDuplcate(int[] inArray)
{
for (int i = 0; i < inArray.Length; i++)
{
for (int j = i + 1; j < inArray.Length; j++)
{
if (inArray[i] == inArray[j])
return j;
}
}
return -1;
}

#endregion
}

/// <summary>
/// Possible Answer: Efficient because it has a Big O of n.
/// Limitation: If an item in the array is greater than the MAX_VALUE,
///             an ArrayIndexOutOfBoundsException occurs.
/// </summary>
public class EfficientDupFinderWithLimitation : IFindADuplicateInArrays
{

public int FindDuplcate(int[] inArray)
{
bool[] tmpArray = new bool;
for (int i = 0; i < inArray.Length; i++)
{
if (tmpArray[inArray[i]] == true)
return i;
tmpArray[inArray[i]] = true;
}
return -1;
}

#endregion
}

/// <summary>
/// Possible Answer: Efficient because it has a Big O of n.
/// </summary>
public class EfficientDupFinder : IFindADuplicateInArrays
{
public int FindDuplcate(int[] inArray)
{
Dictionary<int, bool> table = new Dictionary<int, bool>();
for (int i = 0; i < inArray.Length; i++)
{
if (table.Keys.Contains(inArray[i]))
return i;
}
return -1;
}
#endregion
}
}
}

Run this through Visual Studio’s Performance Analysis tool once for each answer (comment and uncomment). To really see the difference, change MAX_VALUE to 1 million.

1. Koutheir Attouchi says:

tmpArray[inArray[i]] is wrong. Simply put 20000 in inArray and you get an out of bounds exception.

• Rhyous says:

As I mentioned below, you are right. It depends on if your input is limited between 0 and MAX_VALUE. If so, it works, if not, you will get the ArrayIndexOutOfBoundsException.

I tossed a fourth answer out there. Let me know what you think.

2. Alexander Yerenkow says:

Seems to me as error approach, or at least incorrect condition.

What your program will do in such case:
array = 387468;
bool[] tmpArray = new bool;
for (int i = 0; i < inArray.Length; i++)
{
if (tmpArray[inArray[i]] == true)
...

Isn't this is ArrayIndexOutOfBoundsException Error? 🙂

Seems for me, you taken as given condition, that if your input data array have length of 1000, then all of values more than or equals 0 and less than 1000.

That's only subset of real-world cases 🙂

To make usage of tmpArray you need make additional procedure with inArray[i], like hashing or something else. But even with that case, you should have not array of boolean, but instead array of indexes, to make exact comparison.

I think most effective duplicate finding way is modified quick-sort algo, which quits at finding equals values.

• Rhyous says:

Yes, you guys are correct. But that was how it was in the question. I didn't state that. I will update to state that.

However, would a hashtable work better for a "real world" solution than sorting?

• Rhyous says:

I added a Dictionary option (as I was going to use Hashtable but Dictionary is the same without boxing. As Dictionary is O(1), it should still be O(n) overall.