
Play with JSoup

John Doe


Removing HTML tags, except for a few specific ones, from a String in Java

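import org.apache.commons.lang3.StringEscapeUtils; // assumed package; commons-text has the same class
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist; // renamed to Safelist in jsoup 1.14.2+

public String clean(String unsafe) {
    // Start from an empty whitelist and allow only the tags to keep (no attributes).
    Whitelist whitelist = Whitelist.none();
    whitelist.addTags("p", "br", "ul");

    // Jsoup drops every other tag but keeps its text, escaping stray '<' and '>'.
    String safe = Jsoup.clean(unsafe, whitelist);
    // Turn the escaped entities back into literal characters.
    return StringEscapeUtils.unescapeXml(safe);
}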

For the input string

String unsafe = "<p class='p1'>paragraph</p>< this is not html > <a link='#'>Link</a> <![CDATA[<sender>John Smith</sender>]]>";

I get the following output, which is pretty much what I require.

<p>paragraph</p>< this is not html > Link <sender>John Smith</sender>
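The literal `<` and `>` in that output come from the final unescape step, not from Jsoup itself, which emits stray angle brackets as entities. Here is a minimal, standard-library-only sketch of what that step does. The class `UnescapeSketch`, the method `unescapeBasicXml`, and the sample `cleaned` string are hypothetical names and data made up for illustration. It is a simplified stand-in for `StringEscapeUtils.unescapeXml`, not its actual implementation.

```java
class UnescapeSketch {
    // Simplified stand-in for StringEscapeUtils.unescapeXml: handles only the
    // five predefined XML entities, not numeric references such as &#60;.
    static String unescapeBasicXml(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&amp;", "&"); // last, so "&amp;lt;" becomes "&lt;" rather than "<"
    }

    public static void main(String[] args) {
        // Assumed cleaner output for part of the input above (illustrative only).
        String cleaned = "<p>paragraph</p>&lt; this is not html &gt; Link";
        // Prints: <p>paragraph</p>< this is not html > Link
        System.out.println(unescapeBasicXml(cleaned));
    }
}
```

Because the unescape step turns escaped text back into raw markup, the result is meant for use as plain text. If it is going to be embedded in an HTML page, it is probably better to skip that step.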
