C# – Email Regular Expression

I wrote a regex for email that gets the best results of any I have found online. Along with getting better results, it is shorter too.

Download the C# project with unit tests here: EmailRegEx on GitHub
The pattern of an email is described as follows:

  1. It will always have a single @ sign
  2. 1 to 64 characters before the @ sign, called the local-part. It can contain the characters a-z, A-Z, 0-9, ! # $ % & ' * + - / = ? ^ _ ` { | } ~, and a period (.) as long as the period is not the first or last character of the local-part.
  3. Some characters after the @ sign, called the domain, that follow this pattern:
    1. It will always have a period “.”.
    2. One or more characters before the period.
    3. Two to four characters after the period.
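
The length limit in rule 2 is easy to check outside of a regex. Here is a minimal sketch, assuming a hypothetical helper named HasValidLocalPartLength (it is not part of the attached project), that checks only for a single @ sign and the 1 to 64 character limit:

```csharp
using System;

public static class LocalPartLengthCheck
{
    // Hypothetical helper: checks only rule 1 (exactly one @) and the
    // length limit from rule 2 (1 to 64 characters before the @).
    public static bool HasValidLocalPartLength(string email)
    {
        if (string.IsNullOrEmpty(email))
        {
            return false;
        }

        int at = email.IndexOf('@');

        // No @ at all, or more than one @.
        if (at < 0 || at != email.LastIndexOf('@'))
        {
            return false;
        }

        // The index of the @ is also the number of characters before it.
        return at >= 1 && at <= 64;
    }

    public static void Main()
    {
        Console.WriteLine(HasValidLocalPartLength("joe@home.org"));                    // True
        Console.WriteLine(HasValidLocalPartLength("@home.org"));                       // False
        Console.WriteLine(HasValidLocalPartLength(new string('a', 65) + "@home.org")); // False
    }
}
```

A check like this could run before or after the regex; it does not look at which characters are used.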

So simple patterns for an email address would be something like these:

  1. This one just makes sure there are characters before and after the @
    .+@.+
  2. This one makes sure there are characters before and after the @ as well as a character before and after the . in the domain.
    .+@.+\..+
  3. This one makes sure that there is only one @ symbol, provided the whole string has to match (for example, by anchoring the pattern with ^ and \z).
    [^@]+@[^@]+\.[^@]+

These are all quick and easy examples. They will not work in every instance but are usually accurate enough for casual programs.
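
To see how simple patterns like these differ, here is a minimal sketch that runs three of them against a few sample addresses. The patterns are written out in the code, and each one is wrapped in ^ and \z so that the whole string has to match. The SimplePatternDemo class and the IsWholeMatch helper are names made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimplePatternDemo
{
    // Three simple patterns: something@something, something@something.something,
    // and the same again without allowing a second @.
    public static readonly string[] Patterns =
    {
        @".+@.+",
        @".+@.+\..+",
        @"[^@]+@[^@]+\.[^@]+"
    };

    // Anchors the pattern so the whole input has to match, not just a part of it.
    public static bool IsWholeMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + @")\z");
    }

    public static void Main()
    {
        string[] samples = { "joe@home.org", "joe@home", "joe@bob@home.org" };
        foreach (string sample in samples)
        {
            foreach (string pattern in Patterns)
            {
                Console.WriteLine("{0,-20} {1,-22} {2}", sample, pattern, IsWholeMatch(sample, pattern));
            }
        }
    }
}
```

With these samples, the first pattern still accepts joe@home, and only the third pattern rejects joe@bob@home.org.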

But a comprehensive example is much more complex.

  1. I wrote one myself that is the shortest and gets the best results of any I have found:
    ^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))\z
    
  2. Here is another complex one I found: [reference]
    ^(([^<>()[\]\\.,;:\s@\""]+(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$
    

So let me explain the first one that I wrote as it passes my unit tests below:

^ The start of the string.
[\w!#$%&'*+\-/=?\^_`{|}~]+ At least one valid local-part character, not including a period.
(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)* Any number (including zero) of groups that each start with a single period followed by at least one valid local-part character.
@ The @ character.
( Start group 1
( Start group 2
([\-\w]+\.)+ At least one group of one or more word characters or hyphens followed by a period. The attached project has a more complex hostname regex option too.
[a-zA-Z]{2,4} Two to four letters for the top-level domain.
) End group 2
| An OR statement
( Start group 3
([0-9]{1,3}\.){3}[0-9]{1,3} A regular expression for an IP address. The attached project has a more complex IP regex example too.
) End group 3
) End group 1
\z The very end of the string. Unlike $, \z does not match before a trailing \n.
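
The same breakdown can be written as code. Here is a minimal sketch that assembles the pattern from named pieces and lets the end anchor be swapped, to show why \z is used instead of $. The class, constant, and method names are made up for this example and are not taken from the attached project.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailPatternParts
{
    // One or more valid local-part characters, not including a period.
    private const string LocalChars = @"[\w!#$%&'*+\-/=?\^_`{|}~]+";

    // Local-part: runs of valid characters separated by single periods.
    private const string LocalPart = LocalChars + @"(\." + LocalChars + ")*";

    // Simple hostname: one or more labels, then a two to four letter TLD.
    private const string HostName = @"([\-\w]+\.)+[a-zA-Z]{2,4}";

    // Simple IP address: four groups of one to three digits.
    private const string IpAddress = @"([0-9]{1,3}\.){3}[0-9]{1,3}";

    // endAnchor is either @"\z" (as in the pattern above) or "$" (for comparison).
    public static string Build(string endAnchor)
    {
        return "^" + LocalPart + "@((" + HostName + ")|(" + IpAddress + "))" + endAnchor;
    }

    public static void Main()
    {
        string strict = Build(@"\z");
        string loose = Build("$");

        Console.WriteLine(Regex.IsMatch("joe@home.org", strict));   // True
        Console.WriteLine(Regex.IsMatch("joe@home.org\n", strict)); // False
        Console.WriteLine(Regex.IsMatch("joe@home.org\n", loose));  // True
    }
}
```

In .NET, $ also matches just before a final \n, which is why the version ending in $ accepts an address with a trailing newline.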

Code for the Email Regular Expression

Here is code for both examples. My email regular expression is enabled and the one I found online is commented out. To see how they behave differently, just comment out mine and uncomment the one I found online.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace RegularExpressionsTest
{
    class Program
    {
        static void Main(string[] args)
        {
            String theEmailPattern = @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
                                   + "@"
                                   + @"((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))\z";

            // The string pattern from here does not work in all instances.
            // http://www.cambiaresearch.com/c4/bf974b23-484b-41c3-b331-0bd8121d5177/Parsing-Email-Addresses-with-Regular-Expressions.aspx
            //String theEmailPattern = @"^(([^<>()[\]\\.,;:\s@\""]+(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))"
            //                       + "@"
            //                       + @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])"
            //                       + "|"
            //                       + @"(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$";

            Console.WriteLine("Bad emails");
            foreach (String email in GetBadEmails())
            {
                Log(Regex.IsMatch(email, theEmailPattern));
            }

            Console.WriteLine("Good emails");
            foreach (String email in GetGoodEmails())
            {
                Log(Regex.IsMatch(email, theEmailPattern));
            }
        }

        private static void Log(bool inValue)
        {
            if (inValue)
            {
                Console.WriteLine("It matches the pattern");
            }
            else
            {
                Console.WriteLine("It doesn't match the pattern");
            }
        }

        private static List<String> GetBadEmails()
        {
            List<String> emails = new List<String>();
            emails.Add("joe"); // should fail
            emails.Add("joe@home"); // should fail
            emails.Add("a@b.c"); // should fail because .c is only one character but must be 2-4 characters
            emails.Add("joe-bob[at]home.com"); // should fail because [at] is not valid
            emails.Add("joe@his.home.place"); // should fail because place is 5 characters but must be 2-4 characters
            emails.Add("joe.@bob.com"); // should fail because there is a dot at the end of the local-part
            emails.Add(".joe@bob.com"); // should fail because there is a dot at the beginning of the local-part
            emails.Add("john..doe@bob.com"); // should fail because there are two dots in the local-part
            emails.Add("john.doe@bob..com"); // should fail because there are two dots in the domain
            emails.Add("joe<>bob@bob.com"); // should fail because <> are not valid
            emails.Add("joe@his.home.com."); // should fail because it can't end with a period
            // Note: the simple hostname part of this pattern, [\-\w]+, accepts the next two addresses;
            // rejecting them takes a stricter hostname pattern.
            emails.Add("john.doe@bob-.com"); // should fail because there is a dash at the end of a domain part
            emails.Add("john.doe@-bob.com"); // should fail because there is a dash at the start of a domain part
            emails.Add("a@10.1.100.1a"); // should fail because of the extra character
            emails.Add("joe<>bob@bob.com\n"); // should fail because it ends with \n
            emails.Add("joe<>bob@bob.com\r"); // should fail because it ends with \r
            return emails;
        }

        private static List<String> GetGoodEmails()
        {
            List<String> emails = new List<String>();
            emails.Add("joe@home.org");
            emails.Add("joe@joebob.name");
            emails.Add("joe&bob@bob.com");
            emails.Add("~joe@bob.com");
            emails.Add("joe$@bob.com");
            emails.Add("joe+bob@bob.com");
            emails.Add("o'reilly@there.com");
            emails.Add("joe@home.com");
            emails.Add("joe.bob@home.com");
            emails.Add("joe@his.home.com");
            emails.Add("a@abc.org");
            emails.Add("a@abc-xyz.org");
            emails.Add("a@192.168.0.1");
            emails.Add("a@10.1.100.1");
            return emails;
        }
    }
}
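
The program above prints whether each address matched, so you still have to compare the output against the lists by eye. Here is a minimal sketch that does that comparison in code and prints only the surprises. It uses the same pattern with a shortened set of sample addresses, and the ExpectationCheck class and FindMismatches method are names made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExpectationCheck
{
    public const string EmailPattern =
        @"^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*"
        + "@"
        + @"((([\-\w]+\.)+[a-zA-Z]{2,4})|(([0-9]{1,3}\.){3}[0-9]{1,3}))\z";

    // Returns the addresses whose match result differs from what was expected.
    public static List<string> FindMismatches(IEnumerable<string> emails, bool expected)
    {
        List<string> mismatches = new List<string>();
        foreach (string email in emails)
        {
            if (Regex.IsMatch(email, EmailPattern) != expected)
            {
                mismatches.Add(email);
            }
        }
        return mismatches;
    }

    public static void Main()
    {
        string[] bad = { "joe", "joe@home", "a@b.c", "joe.@bob.com", "john.doe@bob-.com" };
        string[] good = { "joe@home.org", "o'reilly@there.com", "a@192.168.0.1" };

        foreach (string email in FindMismatches(bad, false))
        {
            Console.WriteLine("Expected no match, but matched: " + email);
        }

        foreach (string email in FindMismatches(good, true))
        {
            Console.WriteLine("Expected a match, but did not match: " + email);
        }
    }
}
```

With this pattern, the only line printed is for john.doe@bob-.com, because the simple hostname part [\-\w]+ accepts a hyphen at the end of a label.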

Well, now you have the best C# Email Regular Expression out there.

Update: My attached project has an even better and more accurate one now too.

(Reference: wikipedia)

14 Comments

  1. Spot on with this write-up, I seriously feel this site
    needs much more attention. I'll probably be back again to
    see more, thanks for the info!

  2. […] For details, see here: C# – Email Regular Expression […]

  3. Alexandre says:

    joe@home is a valid email, not invalid, could you fix it?

    See this question on Stack Overflow: https://stackoverflow.com/questions/21810464/why-is-this-angular-email-validation-valid

    • Rhyous says:

      What is your use case? While, yes, joe@home is RFC valid, it is a special case that we don't want to allow to pass. Having this fail is desired.

      While joe@home may be valid in a highly localized environment, it is invalid 99.99999% of the time on the internet.
      On the internet, an email without a TLD is a bad email. Internet and remote environments require domain registration and DNS, which means the address must have a format like: {domain}.{tld}.

      You want me to support the 0.00001% instead of supporting the 99.99999%.

      However, you are more than welcome to update the regex yourself. You can change this part.

      ([\-\w]+\.)+[a-zA-Z]{2,4})
      

      To something like this: (Untested)

      ([\-\w]((\.[\-\w])+\.[a-zA-Z]{2,4})*)
      
  4. Cristi says:

    public static string ComplexEmailPattern4 = "..." ; I would add "readonly" there

  5. shah says:

    Hi

    It does not work when I enter xxx@xxx.xxxxxxxxxxxxxxx.
    When the part after the last period exceeds 4 characters, this does not work.
    Can someone fix this?

    • Rhyous says:

      It used to be that nearly all TLDs were 2-4 characters. However, that has recently changed.

      Here is a list of valid TLDs
      http://data.iana.org/TLD/tlds-alpha-by-domain.txt

      Using regex to validate against this list would be a bad idea. Here is a new regex that is more complete and no longer limits the length of the TLD.

      ^[\w!#$%&'*+\-/=?\^_`{|}~]+(\.[\w!#$%&'*+\-/=?\^_`{|}~]+)*@((([\w]+([-\w]*[\w]+)*\.)+[a-zA-Z]+)|((([01]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])\.){3}([01]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])))\z
      

      Then after that, I would have a following up check where your code checks that the email ends with a TLD contained in this list: http://data.iana.org/TLD/tlds-alpha-by-domain.txt
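
Here is a minimal sketch of that follow-up check, assuming the list has already been downloaded and read into lines. The class and method names are made up for this example, and the sample lines stand in for the real file. An address that uses an IP address as its domain would need to skip this check.

```csharp
using System;
using System.Collections.Generic;

public static class TldCheck
{
    // Builds a case-insensitive set from the lines of the TLD list.
    // Blank lines and lines starting with '#' (the header) are skipped.
    public static HashSet<string> ParseTldList(IEnumerable<string> lines)
    {
        HashSet<string> tlds = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            string trimmed = line.Trim();
            if (trimmed.Length > 0 && trimmed[0] != '#')
            {
                tlds.Add(trimmed);
            }
        }
        return tlds;
    }

    // Meant to run after the regex check; returns false if there is no @ or no period in the domain.
    public static bool EndsWithKnownTld(string email, HashSet<string> tlds)
    {
        int at = email.LastIndexOf('@');
        if (at < 0)
        {
            return false;
        }

        string domain = email.Substring(at + 1);
        int lastDot = domain.LastIndexOf('.');
        if (lastDot < 0)
        {
            return false;
        }

        return tlds.Contains(domain.Substring(lastDot + 1));
    }

    public static void Main()
    {
        // A tiny stand-in for the lines of the downloaded list.
        string[] sampleLines = { "# sample header line", "COM", "ORG", "MUSEUM" };
        HashSet<string> tlds = ParseTldList(sampleLines);

        Console.WriteLine(EndsWithKnownTld("joe@home.org", tlds));            // True
        Console.WriteLine(EndsWithKnownTld("joe@his.home.museum", tlds));     // True
        Console.WriteLine(EndsWithKnownTld("xxx@xxx.xxxxxxxxxxxxxxx", tlds)); // False
    }
}
```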

  6. greatonkar says:

    Tested... it works fine, with only one issue:
    the regular expression does not match test@-test.com or test@test-.com

    • Rhyous says:

      Are you sure? Neither test-.com nor -test.com is valid. I have specific unit tests for both of those being invalid and the unit tests are passing.

      From RFC 952
      The last character must not be a minus sign or period.
      The first character must be an alpha character.

      From RFC 1123
      One aspect of host name syntax is hereby changed: the restriction on the first character is relaxed to allow either a letter or a digit.

  7. MS says:

    If a newline character is at the end of the address it passes as valid even against the most complex regex when it should not I guess. Such as "test@test.com\n"

    • Rhyous says:

      Interesting...

      \r works
      \r\n works
      \n doesn't

      I am reading about it here:
      http://stackoverflow.com/questions/988951/net-regex-and-newline

      Looks like if I replace the very last $ with \z (lowercase z) it works.

  8. Thank you, the shortest and still accurate one I have come across so far.

  9. [...] See updates here: C# – Email Regular Expression [...]

  10. Rhyous says:

    I added a project with Unit tests and fixed a bug or two.

    In the project, in the constructor of the EmailValidator object, just set the Pattern to the desired static email regular expression.
