Removing all XML or HTML tags using Notepad++

December 11, 2012, 6:00 am by Rhyous

Let’s say you have an XML or HTML document, such as this one, and you want to remove the tags.

<h2>Shopping List</h2>
<ol>
	<li>Milk</li>
	<li>eggs</li>
	<li>butter</li>
	<li>cereal</li>
	<li>bananas</li>
	<li>apples</li>
	<li>orange juice</li>
	<li>yogurt</li>
	<li>bread</li>
	<li>cheese</li>
</ol>

This can be done quickly in a tool like Notepad++ using its Find and Replace feature with regular expressions.

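1. Open Find and Replace (Search > Replace, or Ctrl+H).
2. In Find what, enter this regular expression, which matches a < followed by any characters up to the next >: <[^>]+>
3. Leave Replace with empty.
4. Under Search Mode, select Regular expression.
5. Make sure the cursor is at the start of the document.
6. Click Replace All.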

That is it.
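The same pattern works outside Notepad++ too. Below is a minimal Python sketch of the steps above, assuming Python 3.9 or later; `strip_tags` and `to_list` are names made up for this example, not part of any library. `to_list` goes one step further than the Notepad++ steps by trimming the leftover tab indentation and dropping the blank lines where tags like <ol> used to be.

```python
import re

# Same pattern as in the Notepad++ steps: a "<", one or more characters
# that are not ">", then the closing ">".
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(markup: str) -> str:
    """Delete every <...> tag, leaving the text between tags in place."""
    return TAG_PATTERN.sub("", markup)


def to_list(markup: str) -> list[str]:
    """Strip tags, then trim each line and drop the lines left empty."""
    lines = (line.strip() for line in strip_tags(markup).splitlines())
    return [line for line in lines if line]


# The shopping list from above; "\t" is the tab indentation.
shopping_list = """<h2>Shopping List</h2>
<ol>
\t<li>Milk</li>
\t<li>eggs</li>
\t<li>butter</li>
\t<li>cereal</li>
\t<li>bananas</li>
\t<li>apples</li>
\t<li>orange juice</li>
\t<li>yogurt</li>
\t<li>bread</li>
\t<li>cheese</li>
</ol>"""

for item in to_list(shopping_list):
    print(item)
```

Running it prints "Shopping List" followed by the ten items, one per line. Like the Notepad++ version, this is a quick-and-dirty approach: it assumes no > appears inside attribute values or comments.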

Category: Software Applications

6 Comments

Clifton Willard says:

November 3, 2019 at 3:27 pm

Does nothing. Replace All: 0 occurrences were replaced.

Chris says:

May 19, 2017 at 8:51 am

Thanks, great help!

Natasha C says:

April 23, 2017 at 3:27 pm

God bless. I have no idea what I'm doing, and you've saved me. I downloaded corpora wanting .txt files with no markup, but got .xml instead and was annoyed. You've given me an easy way out.

Rhyous says:

December 12, 2012 at 4:51 pm

Yeah... this is for quick-and-dirty lists from HTML code. For example, say you want to grab the list from an HTML drop-down menu with 100 items to add to your documentation. In Google Chrome, I right-click the menu, choose "Inspect element", copy the HTML, and paste it into Notepad++. I use the steps above to remove the HTML, and then I have a clean list to put in my documentation.

Bob Pelerson says:

December 11, 2012 at 8:44 am

echo "" | sed -E 's/<[^>]+>//g'

It removes the comment. Don't parse XML with regex.

- Bob Pelerson says:
  
  December 11, 2012 at 8:45 am
  
  %echo "[helo][foo id="hello"][--- book.xml (revision<yoda--][/foo][/hel]" | sed -E 's/<[^>]+>//g'
  
  (where [ and ] stand in for < and >)
  
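Bob's objection can be made concrete. The sketch below uses a made-up input in which an HTML comment contains a > character: the <[^>]+> pattern ends its match at that first > and leaves part of the comment behind, while Python's standard-library `html.parser.HTMLParser` recognizes the whole comment and skips it. `TextExtractor` and `parser_strip` are hypothetical names for illustration, and Python 3.9 or later is assumed. For XML specifically, an XML parser such as `xml.etree.ElementTree` would be the equivalent choice.

```python
import re
from html.parser import HTMLParser

# The same quick-and-dirty pattern used in the post.
TAG_PATTERN = re.compile(r"<[^>]+>")


class TextExtractor(HTMLParser):
    """Collect text content only; the parser decides what is a tag or comment."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

    # handle_comment is not overridden, so comment text is simply discarded.


def parser_strip(markup: str) -> str:
    extractor = TextExtractor()
    extractor.feed(markup)
    extractor.close()  # flush any buffered text
    return "".join(extractor.parts)


# Hypothetical input: a comment that itself contains ">".
sample = "<p>Milk<!-- old price > 2.00 --></p>"

print(TAG_PATTERN.sub("", sample))  # Milk 2.00 -->
print(parser_strip(sample))         # Milk
```

The regex version is still fine for simple pasted lists like the one in the post; the parser earns its extra lines only when the markup can contain comments, CDATA, or > inside attribute values.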

