How to find the duplicate in an array using Big O of N?
Question
Write an algorithm that finds the duplicate in an array using Big O of N, not Big O of N^2.
Information
The array is an integer array and has a MAX_VALUE items (lets just use 1000 as an example). All integers are between 0 and MAX_VALUE (pretend the input is limited to values between 0 and MAX_VALUE). There is only one duplicate and you are not sure where it is. What is the most efficient way to find it.
int[] i = new int[1000];
There is one duplicate and it is the worst case duplicate. You can simulate a worst case duplicate as follows:
for (int i = 0; i < 999; i++)
{
array[i] = i+1;
}
array[999] = 999;
So the duplicates are the second to last and the last.
Answer
This is one possible answer that assumes the values are between 0 and MAX_VALUE.
/// <summary>
/// Efficient because it has a Big O of n.
/// </summary>
public int FindDuplcate(int[] inArray)
{
bool[] tmpArray = new bool[1000];
for (int i = 0; i < inArray.Length; i++)
{
if (tmpArray[inArray[i]] == true)
return i;
tmpArray[inArray[i]] = true;
}
return -1;
}
Incorrect answers
Here is a little test app with two incorrect answers and two possible correct answer. Feel free to comment with your answers.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
namespace FindDuplicateInArray
{
class Program
{
const int MAX_VALUE = 1000;
static void Main(string[] args)
{
// Worst case
int[] array = new int[MAX_VALUE];
for (int i = 0; i < MAX_VALUE - 1; i++)
{
array[i] = i + 1;
}
array[MAX_VALUE - 1] = MAX_VALUE - 1;
// Incorrect answer: Big O of N^2
//IFindADuplicateInArrays dupFinder1 = new BruteForceDupFinder();
//int dup = dupFinder1.FindDuplcate(array);
// Incorrect answer: Big O of N^2
//IFindADuplicateInArrays dupFinder2 = new LessBruteForceDupFinder();
//int dup = dupFinder2.FindDuplcate(array);
// Possible correct answer: Big O of N but with a limitation
//IFindADuplicateInArrays dupFinder3 = new EfficientDupFinderWithLimitation();
//int dup = dupFinder3.FindDuplcate(array);
// Possbile correct answer: Big O of N
IFindADuplicateInArrays dupFinder3 = new EfficientDupFinder();
int dup = dupFinder3.FindDuplcate(array);
Console.WriteLine(dup);
}
public interface IFindADuplicateInArrays
{
int FindDuplcate(int[] inArray);
}
/// <summary>
/// Incorrect Answer: This is simple N*N alogithm. Big 0 = N^2
/// </summary>
public class BruteForceDupFinder : IFindADuplicateInArrays
{
#region IFindADuplicateInArrays Members
public int FindDuplcate(int[] inArray)
{
for (int i = 0; i < inArray.Length; i++)
{
for (int j = 0; j < inArray.Length; j++)
{
if (i != j && inArray[i] == inArray[j])
return j;
}
}
return -1;
}
#endregion
}
/// <summary>
/// Incorrect Answer: This is N(N-1)/2 which is N*N/2-n/2 so remove the constants N^2-N.
/// When there is N^2-N you can keep the N that grows fastest: Big O = N2
/// But the original alrogithm is less N^2 than just N^2.
/// </summary>
public class LessBruteForceDupFinder : IFindADuplicateInArrays
{
#region IFindADuplicateInArrays Members
public int FindDuplcate(int[] inArray)
{
for (int i = 0; i < inArray.Length; i++)
{
for (int j = i + 1; j < inArray.Length; j++)
{
if (inArray[i] == inArray[j])
return j;
}
}
return -1;
}
#endregion
}
/// <summary>
/// Possible Answer: Efficient because it has a Big O of n.
/// Limitation: If an item in the array is greater than the MAX_VALUE,
/// an ArrayIndexOutOfBoundsException occurs.
/// </summary>
public class EfficientDupFinderWithLimitation : IFindADuplicateInArrays
{
#region IFindADuplicateInArrays Members
public int FindDuplcate(int[] inArray)
{
bool[] tmpArray = new bool[1000];
for (int i = 0; i < inArray.Length; i++)
{
if (tmpArray[inArray[i]] == true)
return i;
tmpArray[inArray[i]] = true;
}
return -1;
}
#endregion
}
/// <summary>
/// Possible Answer: Efficient because it has a Big O of n.
/// </summary>
public class EfficientDupFinder : IFindADuplicateInArrays
{
#region IFindADuplicateInArrays Members
public int FindDuplcate(int[] inArray)
{
Dictionary<int, bool> table = new Dictionary<int, bool>();
for (int i = 0; i < inArray.Length; i++)
{
if (table.Keys.Contains(inArray[i]))
return i;
table.Add(inArray[i], true);
}
return -1;
}
#endregion
}
}
}
Run this through Visual Studio’s Performance Analysis tool once for each answer (comment and uncomment). To really see the difference, change MAX_VALUE to 1 million.


tmpArray[inArray[i]] is wrong. Simply put 20000 in inArray[0] and you get an out of bounds exception.
As I mentioned below, you are right. It depends on if your input is limited between 0 and MAX_VALUE. If so, it works, if not, you will get the ArrayIndexOutOfBoundsException.
I tossed a fourth answer out there. Let me know what you think.
Seems to me as error approach, or at least incorrect condition.
What your program will do in such case:
array[50] = 387468;
bool[] tmpArray = new bool[1000];
for (int i = 0; i < inArray.Length; i++)
{
if (tmpArray[inArray[i]] == true)
...
Isn't this is ArrayIndexOutOfBoundsException Error? 🙂
Seems for me, you taken as given condition, that if your input data array have length of 1000, then all of values more than or equals 0 and less than 1000.
That's only subset of real-world cases 🙂
To make usage of tmpArray you need make additional procedure with inArray[i], like hashing or something else. But even with that case, you should have not array of boolean, but instead array of indexes, to make exact comparison.
I think most effective duplicate finding way is modified quick-sort algo, which quits at finding equals values.
Yes, you guys are correct. But that was how it was in the question. I didn't state that. I will update to state that.
However, would a hashtable work better for a "real world" solution than sorting?
I added a Dictionary option (as I was going to use Hashtable but Dictionary is the same without boxing. As Dictionary is O(1), it should still be O(n) overall.