November 2011

Regex to remove DOCTYPE prolog

While using HTML Tidy I needed to remove the DOCTYPE prolog to prevent an ‘org.xml.sax.SAXParseException: Already seen doctype.’ exception. The regex is quite simple; the only catch is that the selection must include \n and \r, and the quantifier must be non-greedy. convertedData = convertedData.replaceAll("<!DOCTYPE((.|\n|\r)*?)\">", ""); This will consume multiline …
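As a rough, self-contained sketch of the same idea, the snippet below strips a DOCTYPE that spans several lines. It is not the code from the post: the class name `DoctypeStripper`, the method `stripDoctype`, and the sample input are made up for illustration, and the pattern is a swapped-in variant that uses the negated class `[^>]` (which already matches \n and \r) instead of the `(.|\n|\r)*?` alternation. It assumes the DOCTYPE has no internal subset containing a `>` character.

```java
import java.util.regex.Pattern;

public class DoctypeStripper {
    // [^>] matches any character except '>', including \n and \r,
    // so the declaration may span several lines.
    // Assumption: no internal subset ([...]) with a '>' inside the DOCTYPE.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>]*>", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the input with every DOCTYPE declaration removed.
    public static String stripDoctype(String html) {
        return DOCTYPE.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input with a DOCTYPE split across two lines (CRLF in between).
        String convertedData =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\r\n"
              + "    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
              + "<html><body>Hello</body></html>";
        System.out.println(stripDoctype(convertedData).trim());
    }
}
```

Compiling the pattern once avoids recompiling it on every `replaceAll` call, and the negated character class sidesteps the per-character alternation, which in Java can become slow or overflow the stack on very long matches.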
