# How to find the duplicate in an array using Big O of N?

## Question

Write an algorithm that finds the duplicate in an array using Big O of N, not Big O of N^2.

## Information

The array is an integer array and has a MAX_VALUE items (lets just use 1000 as an example). All integers are between 0 and MAX_VALUE (pretend the input is limited to values between 0 and MAX_VALUE). There is only one duplicate and you are not sure where it is. What is the most efficient way to find it.

int[] i = new int[1000];

There is one duplicate and it is the worst case duplicate. You can simulate a worst case duplicate as follows:

for (int i = 0; i < 999; i++) { array[i] = i+1; } array[999] = 999;

So the duplicates are the second to last and the last.

## Answer

This is one possible answer that assumes the values are between 0 and MAX_VALUE.

/// <summary> /// Efficient because it has a Big O of n. /// </summary> public int FindDuplcate(int[] inArray) { bool[] tmpArray = new bool[1000]; for (int i = 0; i < inArray.Length; i++) { if (tmpArray[inArray[i]] == true) return i; tmpArray[inArray[i]] = true; } return -1; }

## Incorrect answers

Here is a little test app with two incorrect answers and two possible correct answer. Feel free to comment with your answers.

using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Collections; namespace FindDuplicateInArray { class Program { const int MAX_VALUE = 1000; static void Main(string[] args) { // Worst case int[] array = new int[MAX_VALUE]; for (int i = 0; i < MAX_VALUE - 1; i++) { array[i] = i + 1; } array[MAX_VALUE - 1] = MAX_VALUE - 1; // Incorrect answer: Big O of N^2 //IFindADuplicateInArrays dupFinder1 = new BruteForceDupFinder(); //int dup = dupFinder1.FindDuplcate(array); // Incorrect answer: Big O of N^2 //IFindADuplicateInArrays dupFinder2 = new LessBruteForceDupFinder(); //int dup = dupFinder2.FindDuplcate(array); // Possible correct answer: Big O of N but with a limitation //IFindADuplicateInArrays dupFinder3 = new EfficientDupFinderWithLimitation(); //int dup = dupFinder3.FindDuplcate(array); // Possbile correct answer: Big O of N IFindADuplicateInArrays dupFinder3 = new EfficientDupFinder(); int dup = dupFinder3.FindDuplcate(array); Console.WriteLine(dup); } public interface IFindADuplicateInArrays { int FindDuplcate(int[] inArray); } /// <summary> /// Incorrect Answer: This is simple N*N alogithm. Big 0 = N^2 /// </summary> public class BruteForceDupFinder : IFindADuplicateInArrays { #region IFindADuplicateInArrays Members public int FindDuplcate(int[] inArray) { for (int i = 0; i < inArray.Length; i++) { for (int j = 0; j < inArray.Length; j++) { if (i != j && inArray[i] == inArray[j]) return j; } } return -1; } #endregion } /// <summary> /// Incorrect Answer: This is N(N-1)/2 which is N*N/2-n/2 so remove the constants N^2-N. /// When there is N^2-N you can keep the N that grows fastest: Big O = N2 /// But the original alrogithm is less N^2 than just N^2. /// </summary> public class LessBruteForceDupFinder : IFindADuplicateInArrays { #region IFindADuplicateInArrays Members public int FindDuplcate(int[] inArray) { for (int i = 0; i < inArray.Length; i++) { for (int j = i + 1; j < inArray.Length; j++) { if (inArray[i] == inArray[j]) return j; } } return -1; } #endregion } /// <summary> /// Possible Answer: Efficient because it has a Big O of n. /// Limitation: If an item in the array is greater than the MAX_VALUE, /// an ArrayIndexOutOfBoundsException occurs. /// </summary> public class EfficientDupFinderWithLimitation : IFindADuplicateInArrays { #region IFindADuplicateInArrays Members public int FindDuplcate(int[] inArray) { bool[] tmpArray = new bool[1000]; for (int i = 0; i < inArray.Length; i++) { if (tmpArray[inArray[i]] == true) return i; tmpArray[inArray[i]] = true; } return -1; } #endregion } /// <summary> /// Possible Answer: Efficient because it has a Big O of n. /// </summary> public class EfficientDupFinder : IFindADuplicateInArrays { #region IFindADuplicateInArrays Members public int FindDuplcate(int[] inArray) { Dictionary<int, bool> table = new Dictionary<int, bool>(); for (int i = 0; i < inArray.Length; i++) { if (table.Keys.Contains(inArray[i])) return i; table.Add(inArray[i], true); } return -1; } #endregion } } }

Run this through Visual Studio’s Performance Analysis tool once for each answer (comment and uncomment). To really see the difference, change MAX_VALUE to 1 million.

tmpArray[inArray[i]] is wrong. Simply put 20000 in inArray[0] and you get an out of bounds exception.

As I mentioned below, you are right. It depends on if your input is limited between 0 and MAX_VALUE. If so, it works, if not, you will get the ArrayIndexOutOfBoundsException.

I tossed a fourth answer out there. Let me know what you think.

Seems to me as error approach, or at least incorrect condition.

What your program will do in such case:

array[50] = 387468;

bool[] tmpArray = new bool[1000];

for (int i = 0; i < inArray.Length; i++)

{

if (tmpArray[inArray[i]] == true)

...

Isn't this is ArrayIndexOutOfBoundsException Error? ðŸ™‚

Seems for me, you taken as given condition, that if your input data array have length of 1000, then all of values more than or equals 0 and less than 1000.

That's only subset of real-world cases ðŸ™‚

To make usage of tmpArray you need make additional procedure with inArray[i], like hashing or something else. But even with that case, you should have not array of boolean, but instead array of indexes, to make exact comparison.

I think most effective duplicate finding way is modified quick-sort algo, which quits at finding equals values.

Yes, you guys are correct. But that was how it was in the question. I didn't state that. I will update to state that.

However, would a hashtable work better for a "real world" solution than sorting?

I added a Dictionary option (as I was going to use Hashtable but Dictionary is the same without boxing. As Dictionary is O(1), it should still be O(n) overall.