Regex to remove DOCTYPE prolog


While using HTML Tidy I needed to remove the DOCTYPE prolog to prevent an ‘org.xml.sax.SAXParseException: Already seen doctype.’ exception. The regex is quite simple; the only catch is that we need to include \n and \r in the selection and make the quantifier non-greedy. convertedData = convertedData.replaceAll("<!DOCTYPE((.|\n|\r)*?)\">", ""); This will consume multiline […]
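For anyone who wants to try this outside of a larger program, here is a minimal sketch of the same idea as a standalone class. The class name DoctypeStripper and the method stripDoctype are made up for illustration, and the pattern is a swapped-in variant rather than the one above: it uses a negated character class, [^>]*, which matches across line breaks on its own and avoids the (.|\n|\r)*? alternation. It assumes the DOCTYPE has no internal subset (no [...] block containing a '>').

```java
import java.util.regex.Pattern;

class DoctypeStripper {
    // Variant pattern (an assumption, not the regex from the post):
    // [^>]* matches any character except '>', including \n and \r,
    // so the match ends at the first '>' after "<!DOCTYPE".
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE[^>]*>", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the input with every DOCTYPE declaration removed.
    static String stripDoctype(String html) {
        return DOCTYPE.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // A DOCTYPE that spans two lines, as HTML Tidy output often does.
        String html = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
                + "    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                + "<html></html>";
        System.out.println(stripDoctype(html));
    }
}
```

Compiling the pattern once and reusing it is only a convenience; calling replaceAll on the string directly with the same pattern text should behave the same way for a one-off replacement.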