regex

Regex to remove DOCTYPE prolog

While using HTML Tidy I needed to remove the DOCTYPE prolog to prevent an ‘org.xml.sax.SAXParseException: Already seen doctype.’ exception. The regex is quite simple; the only catch is that we need to include \n and \r in our selection and make the match non-greedy.

convertedData = convertedData.replaceAll("<!DOCTYPE((.|\n|\r)*?)\">", "");

This will consume multiline …
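As a rough, self-contained sketch of the same idea, the snippet below wraps the replacement in a small helper. The class name `DoctypeStripper`, the method `stripDoctype`, and the sample input are made up for illustration. It also swaps the `(.|\n|\r)` alternation for the `(?s)` (DOTALL) flag, which lets `.` match line terminators; that is a substitution on my part, not the pattern from the post, although it should match the same text.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTML Tidy or any library.
final class DoctypeStripper {
    // (?s) enables DOTALL so '.' also matches \n and \r.
    // '.*?' is non-greedy: it stops at the first '">' it can find.
    private static final Pattern DOCTYPE = Pattern.compile("(?s)<!DOCTYPE.*?\">");

    static String stripDoctype(String input) {
        return DOCTYPE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input with a DOCTYPE that spans two lines.
        String html = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
                + "    \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                + "<html><body>hi</body></html>";
        System.out.println(stripDoctype(html));
    }
}
```

Note that, like the original pattern, this only ends the match at a `">` sequence. A short `<!DOCTYPE html>` declaration has no quote before the closing bracket, so it would not be matched on its own, and the match could run on to a later `">` elsewhere in the document.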


Trimming leading and trailing new lines with regex.

Here is some regex to trim leading/trailing newlines, carriage returns and spaces from some text.

Without replacing ‘spaces’, just new lines/carriage returns:

^(\n|\r)+|(\n|\r)+\Z

This will also trim spaces:

^(\n|\r|\s)+|(\n|\r|\s)+\Z

A quick explanation might be in order. This regex consists of two parts, …
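To show how the two patterns behave, here is a minimal Java sketch that applies each one to the same string. The class name `NewlineTrimmer`, the two method names, and the sample text are assumptions made for this example. The backslashes are doubled only because the patterns sit inside Java string literals.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class NewlineTrimmer {
    // Leading or trailing runs of \n and \r; spaces are left alone.
    private static final Pattern NEWLINES_ONLY =
            Pattern.compile("^(\\n|\\r)+|(\\n|\\r)+\\Z");

    // Same shape, but \s also matches spaces and tabs.
    private static final Pattern ALL_WHITESPACE =
            Pattern.compile("^(\\n|\\r|\\s)+|(\\n|\\r|\\s)+\\Z");

    static String trimNewlines(String text) {
        return NEWLINES_ONLY.matcher(text).replaceAll("");
    }

    static String trimWhitespace(String text) {
        return ALL_WHITESPACE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "\n\r\n  hello\nworld  \r\n\n";
        // Brackets make the remaining leading/trailing spaces visible.
        System.out.println("[" + trimNewlines(text) + "]");
        System.out.println("[" + trimWhitespace(text) + "]");
    }
}
```

In both cases the newline between `hello` and `world` is kept, because without the MULTILINE flag `^` only matches at the very start of the input and `\Z` only at the end. In Java, `\s` already covers `\n` and `\r`, so the second pattern could likely be shortened to `^\s+|\s+\Z`.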
