Archive for October 2014

Splitting sentences in C# using Stanford.NLP

So I need to break some sentences up. I have a pretty cool regex that does this, however, I want to try out Stanford.NLP for this. Let’s check it out.

  1. Create a Visual Studio C# project.
    I chose a New Console Project and named it SentenceSplitter.
  2. Right-click on the project and choose “Manage NuGet Packages.
  3. Add the Stanford.NLP.CoreNLP nuget package.
  4. Add the following code to Program.cs (This is a variation of the code provide here: http://sergey-tihon.github.io/Stanford.NLP.NET/StanfordCoreNLP.html
    using edu.stanford.nlp.ling;
    using edu.stanford.nlp.pipeline;
    using java.util;
    using System;
    using System.IO;
    using Console = System.Console;
    
    namespace SentenceSplitter
    {
        class Program
        {
            static void Main(string[] args)
            {
                // Path to the folder with models extracted from `stanford-corenlp-3.4-models.jar`
                var jarRoot = @"stanford-corenlp-3.4-models\";
    
                const string text = "I went or a run. Then I went to work. I had a good lunch meeting with a friend name John Jr. The commute home was pretty good.";
    
                // Annotation pipeline configuration
                var props = new Properties();
                props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
                props.setProperty("sutime.binders", "0");
    
                // We should change current directory, so StanfordCoreNLP could find all the model files automatically 
                var curDir = Environment.CurrentDirectory;
                Directory.SetCurrentDirectory(jarRoot);
                var pipeline = new StanfordCoreNLP(props);
                Directory.SetCurrentDirectory(curDir);
    
                // Annotation
                var annotation = new Annotation(text);
                pipeline.annotate(annotation);
    
                // these are all the sentences in this document
                // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
                var sentences = annotation.get(typeof(CoreAnnotations.SentencesAnnotation));
                if (sentences == null)
                {
                    return;
                }
                foreach (Annotation sentence in sentences as ArrayList)
                {
                    Console.WriteLine(sentence);
                }
            }
        }
    }
    

    Warning! If you try to run here, you will get the following exception: Unrecoverable error while loading a tagger model

    java.lang.RuntimeException was unhandled
      HResult=-2146233088
      Message=edu.stanford.nlp.io.RuntimeIOException: Unrecoverable error while loading a tagger model
      Source=stanford-corenlp-3.4
      StackTrace:
           at edu.stanford.nlp.pipeline.StanfordCoreNLP.4.create()
           at edu.stanford.nlp.pipeline.AnnotatorPool.get(String name)
           at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(Properties A_1, Boolean A_2)
           at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props, Boolean enforceRequirements)
           at edu.stanford.nlp.pipeline.StanfordCoreNLP..ctor(Properties props)
           at SentenceSplitter.Program.Main(String[] args) in c:\Users\jbarneck\Documents\Projects\NLP\SentenceSplitter\SentenceSplitter\Program.cs:line 20
           at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
           at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
           at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
           at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
           at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
           at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
           at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
           at System.Threading.ThreadHelper.ThreadStart()
      InnerException: edu.stanford.nlp.io.RuntimeIOException
           HResult=-2146233088
           Message=Unrecoverable error while loading a tagger model
           Source=stanford-corenlp-3.4
           StackTrace:
                at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(Properties config, String modelFileOrUrl, Boolean printLoading)
                at edu.stanford.nlp.tagger.maxent.MaxentTagger..ctor(String modelFile, Properties config, Boolean printLoading)
                at edu.stanford.nlp.tagger.maxent.MaxentTagger..ctor(String modelFile)
                at edu.stanford.nlp.pipeline.POSTaggerAnnotator.loadModel(String A_0, Boolean A_1)
                at edu.stanford.nlp.pipeline.POSTaggerAnnotator..ctor(String annotatorName, Properties props)
                at edu.stanford.nlp.pipeline.StanfordCoreNLP.4.create()
           InnerException: java.io.IOException
                HResult=-2146233088
                Message=Unable to resolve "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger" as either class path, filename or URL
                Source=stanford-corenlp-3.4
                StackTrace:
                     at edu.stanford.nlp.io.IOUtils.getInputStreamFromURLOrClasspathOrFileSystem(String textFileOrUrl)
                     at edu.stanford.nlp.tagger.maxent.MaxentTagger.readModelAndInit(Properties config, String modelFileOrUrl, Boolean printLoading)
                InnerException: 
    
    
  5. Download the stanford-corenlp-full-3.4.x.zip file from here: http://nlp.stanford.edu/software/corenlp.shtml#Download
  6. Extract the stanford-corenlp-full-2014-6-16.x.zip.
    Note: Over time, as new versions come out, make sure the version you download matches the version of your NuGet package.
  7. Extract the stanford-corenlp-3.4-models.jar file to stanford-corenlp-3.4-models.
    I used 7zip to extract the jar file.
  8. Copy the stanford-corenlp-3.4-models folder to your Visual Studio project files.
    Note: This is one way to include the jar file in your project. Other ways might be a copy action or another good way would be to use an app.config appSetting. I chose this way because it makes all my files part of the project for this demo. I would probably use the app.config method in production.
  9. In Visual Studio, use ctrl + left click to  highlight the stanford-corenlp-3.4-models folder and all subfolders.
  10. Open Properties (Press F4), and change the namespace provider setting to false.
  11. In Visual Studio, use ctrl + left click to  highlight the files under the stanford-corenlp-3.4-models folder and all files in all subfolders.
  12. Open Properties (Press F4), and change the Build Action to Content and the Copy to Output Directory setting to Copy if newer.
  13. Run the code.

 

Note: At first I tried to just load the model file. That doesn’t work. I got an exception. I had to set the @jarpath as shown above. I needed to copy all the contents of the jar file.

Results

Notice that I through it curve ball by ending a sentence with Jr. It still figured it out.

I went or a run. Then I went to work. I had a good lunch meeting with a friend name John Jr. The commute home was pretty good.

However, I just tried this paragraph and it did NOT detect the break after the first sentence.

Exit Room A. Turn right. Go down the hall to the first door. Enter Room B.

I am pretty sure this second failure is due to the similarity in string with a legitimate first name, middle initial, last name.

Jared A. Barneck
Room A. Turn

Now the question is, how do I train it to not make such mistakes?

Aspx CheckBoxList Alternative that allows for the OnCheckedChanged event

My team had to edit an older apsx website and add a list of CheckBoxes, which are actually html <input type="checkbox"/> tags.  We assumed that CheckBoxList would be perfect for our task. We assumed wrong.

I wrote desktop apps for years and now that I am writing web apps, I am using MVC and Razor, so I missed the aspx days. However, having to work on legacy aspx applications is catching me up.

My use case for list of CheckBoxes

Really, my use case is simple. There are icons that should show up for some products. We wanted to store a map between products and the icons in the database. On the product page, we want a list of the icons with a checkbox by each list item. This would allow the person who creates new products to select which icons to show on a product download page.

  1. Get a list of option from a database and display that list as html checkboxes.
  2. Checking a checkbox should add a row to a database table.
  3. Unchecking a checkbox should remove that row from a database table.

CheckBoxList Fails

So what I needed was an OnClick event which is actually called OnCheckedChanged in an asp:Checkbox. Well, it turns out that the CheckBoxList control doesn’t support the OnCheckedChanged event per CheckBox. The CheckBoxList doesn’t have any type of OnClick method. Instead, there is an OnSelectedIndexChanged event. But OnSelectedIndexChanged doesn’t even tell you which CheckBox was clicked. If a CheckBox is checked, then in the OnSelectedIndexChanged event, the SelectedItem equals the clicked item, however, if the CheckBox is unchecked, the SelectedItem was a completely different CheckBox that is checked.

So to me, this made the CheckBoxList control not usable. So I set out to replicate a CheckBoxList in a way that supports OnCheckedChanged.

Repeater with CheckBox for ItemTemplate

I ended up using the Repeater control with an ItemTemplate containing a CheckBox. Using this method worked pretty much flawlessly. It leaves me to wonder, why was CheckBoxList created in the first place if a Repeater with a CheckBox in the template works perfectly well.

Here is my code:

<%@ Page Title="Home Page" Language="C#" MasterPageFile="~/Site.Master" AutoEventWireup="true" CodeBehind="Default.aspx.cs" Inherits="CheckBoxListExample._Default" %>

<%@ Import Namespace="CheckBoxListExample" %>
<%@ Import Namespace="CheckBoxListExample.Models" %>

<asp:Content ID="BodyContent" ContentPlaceHolderID="MainContent" runat="server">
    <div>
        <asp:Repeater ID="Repeater1" runat="server">
            <ItemTemplate>
                <asp:CheckBox ID="cb1" runat="server" AutoPostBack="true" OnCheckedChanged="RepeaterCheckBoxChanged"
                    Text="<%# ((CheckBoxViewModel)Container.DataItem).Name %>"
                    Checked="<%# ((CheckBoxViewModel)Container.DataItem).IsChecked %>" />
            </ItemTemplate>
        </asp:Repeater>
    </div>
</asp:Content>
using System;
using System.Collections.Generic;
using System.Web.UI;
using System.Web.UI.WebControls;
using CheckBoxListExample.Models;

namespace CheckBoxListExample
{
    public partial class _Default : Page
    {
        private List<CheckBoxViewModel> _ViewModels;

        protected void Page_Load(object sender, EventArgs e)
        {
            if (!IsPostBack)
            {
                var _ViewModels = new List<CheckBoxViewModel>
                {
                    new CheckBoxViewModel {Name = "Test1", IsChecked = true},
                    new CheckBoxViewModel {Name = "Test2"},
                    new CheckBoxViewModel {Name = "Test3"}
                };
                Repeater1.DataSource = _ViewModels;
                Repeater1.DataBind();
            }
        }

        protected void RepeaterCheckBoxChanged(object sender, EventArgs e)
        {
            var cb = sender as CheckBox;
            if (cb == null) return;
            if (cb.Checked)
            {
                // Insert
            }
            else
            {
                // Delete
            }
        }
    }
}
namespace CheckBoxListExample.Models
{
    public class CheckBoxViewModel
    {
        public string Name { get; set; }
        public bool IsChecked { get; set; }
    }
}