Removing Specific Punctuation Marks Using str_replace()
You could leverage the fact that str_replace() can take an array of values/needles to find and replace in a string, and do something like the following:
$string = 'Hello, how are you?';
echo str_replace(['?', '!', '.'], '', $string);
// output: 'Hello, how are you'
In the example above, we're only replacing the ?, !, and . characters. Therefore, the resulting string has the question mark stripped, but the comma remains.
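As a small illustration of the array form, the sketch below also passes the optional fourth argument of str_replace(), which receives the number of replacements made. The sample string and the variable names ($marks, $clean, $count) are only illustrative.

```php
<?php
$string = 'Wait... what?! Really?';
$marks  = ['?', '!', '.'];

// The optional fourth argument is filled with the total number of replacements.
$clean = str_replace($marks, '', $string, $count);

echo $clean, PHP_EOL; // Wait what Really
echo $count, PHP_EOL; // 6
```

The count can be handy when you only want to know whether a string contained any of the marks at all.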
Removing Punctuation Using preg_replace()
Below are some examples of how we can remove punctuation using regular expressions and built-in PCRE (Perl Compatible Regular Expressions) character classes:
Removing Specific Punctuation Marks:
You could specify which marks you wish to remove in a regular expression and replace them in a string using preg_replace() like so:
$string = 'Hello, how are you?';
echo preg_replace('/[?.!]/', '', $string);
// output: 'Hello, how are you'
In the example above, we're only replacing the ?, !, and . characters. Therefore, the resulting string has the question mark stripped, but the comma remains.
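In a character class (i.e. characters inside square brackets in a regular expression), any character except ^, -, ] or \ is a literal and does not need to be escaped. This also means that a | inside the brackets is not an "or" operator; it would simply be matched (and removed) as a literal pipe character.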
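When the marks you want to remove include one of those four special characters, one option is to let preg_quote() do the escaping for you. The following is a minimal sketch; the $marks list and the sample input are made up for illustration.

```php
<?php
// Hypothetical set of marks to strip; it contains all four characters
// that are special inside a character class: ^ - ] \
$marks = '^-]\\';

// preg_quote() escapes regex metacharacters as well as the given delimiter.
$pattern = '/[' . preg_quote($marks, '/') . ']/';

// The single-quoted subject below is: a^b-c]d\e
$clean = preg_replace($pattern, '', 'a^b-c]d\\e');
echo $clean, PHP_EOL; // abcde
```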
Removing Unicode Punctuation Characters Using PCRE Character Classes:
Using the PCRE Unicode character class \p{P} (or \pP) we can remove all Unicode punctuation characters from a string like so:
$string = 'Hello, how are you?';
echo preg_replace('/\p{P}/', '', $string);
// output: 'Hello how are you'
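The string above is plain ASCII. For text that contains multi-byte characters, you would normally add the u modifier so that PCRE treats both the pattern and the subject as UTF-8. A short sketch, assuming the source file is saved as UTF-8 (the Spanish sample string is just an example):

```php
<?php
$string = '¿Cómo estás?';

// The u modifier makes PCRE interpret the pattern and subject as UTF-8,
// so the inverted question mark is matched as one punctuation character.
$clean = preg_replace('/\p{P}/u', '', $string);
echo $clean, PHP_EOL; // Cómo estás
```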
There are, of course, other PCRE Unicode punctuation character sets that you can use in PHP:
Sequence | Description |
---|---|
\p{P} | All punctuation characters. |
\p{Pd} | All hyphens and dashes. |
\p{Ps} | Any kind of opening bracket. |
\p{Pe} | Any kind of closing bracket. |
\p{Pi} | Any kind of opening/initial quote. |
\p{Pf} | Any kind of closing/final quote. |
\p{Pc} | Any kind of character that connects words (e.g. underscore, etc.). |
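If these character classes are not available in your environment, then the PCRE library was probably built without Unicode property support; for the classic PCRE 8.x library this is turned on with the --enable-unicode-properties option at compilation time.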
Here is how you could combine multiple character sets using the pipe/alternation operator:
$string = 'Strip-All-Dashes_And_Underscores';
echo preg_replace('/\p{Pd}|\p{Pc}/', '', $string);
// output: 'StripAllDashesAndUnderscores'
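As an alternative to alternation, the same sets can be placed inside a single character class. The sketch below also adds the u modifier and an en dash to the sample string (an assumption for illustration) to show that \p{Pd} covers more than the ASCII hyphen.

```php
<?php
// "\u{2013}" is an en dash (category Pd); "_" is a connector (category Pc).
$string = "Strip\u{2013}All-Dashes_And_Underscores";

// One character class instead of alternation; u enables UTF-8 matching.
$clean = preg_replace('/[\p{Pd}\p{Pc}]/u', '', $string);
echo $clean, PHP_EOL; // StripAllDashesAndUnderscores
```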
Removing Punctuation Characters Using POSIX Character Class:
We could use the POSIX punct character class to find and replace all the punctuation characters with preg_replace(), like so:
$string = 'Hello, how are you?';
echo preg_replace('/[[:punct:]]/', '', $string);
// output: 'Hello how are you'
[:punct:] is a POSIX-style bracket expression that denotes a locale-aware punctuation character class/set. The syntax is also supported by PCRE regex syntax, which is what PHP uses.
POSIX character classes can be a bit flaky in some instances since they are locale-dependent. This means that if the vendor implementation of a character set for a locale is not up to date, the results might suffer. By comparison, Unicode character classes rely solely on standard Unicode punctuation and are, therefore, more reliable.
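The two classes also do not cover exactly the same characters. As a rough illustration (assuming default PCRE settings and an ASCII sample string), [[:punct:]] treats symbols such as $ and + as punctuation, whereas \p{P} does not, because Unicode places those in the symbol (\p{S}) categories:

```php
<?php
$string = 'Cost: $5 + tax!';

// POSIX class: every printable ASCII character that is not alphanumeric or a space.
$posix = preg_replace('/[[:punct:]]/', '', $string);

// Unicode class: punctuation only; "$" and "+" are symbols, so they stay.
$unicode = preg_replace('/\p{P}/u', '', $string);

echo $posix, PHP_EOL;   // Cost 5  tax
echo $unicode, PHP_EOL; // Cost $5 + tax
```

Which behavior you want depends on whether currency and math symbols count as "punctuation" for your use case.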
Removing All Punctuation Excluding Some:
We can combine a negative lookahead (?!) with a punctuation character set to exclude some punctuation characters from being removed, like so:
$string = 'Hello, how are you?';

// using PCRE character class:
echo preg_replace('/(?![!,])\p{P}/', '', $string);

// using POSIX character class:
echo preg_replace('/(?![!,])[[:punct:]]/', '', $string);

// output: 'Hello, how are you'
In the example above, we're removing all punctuation characters except the exclamation mark and the comma.
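If the list of marks to keep varies, the lookahead pattern can be built at runtime. The helper below is a hypothetical sketch (the name strip_punctuation_except() and its parameters are not part of PHP); it uses preg_quote() so that marks such as ] or - are safe inside the character class.

```php
<?php
// Hypothetical helper: removes Unicode punctuation except the marks in $keep.
function strip_punctuation_except(string $text, string $keep = ''): string
{
    if ($keep === '') {
        // Nothing to keep: an empty character class would be invalid.
        return preg_replace('/\p{P}/u', '', $text) ?? $text;
    }

    // Negative lookahead: skip any position where a kept mark comes next.
    $pattern = '/(?![' . preg_quote($keep, '/') . '])\p{P}/u';

    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back to the input.
    return preg_replace($pattern, '', $text) ?? $text;
}

echo strip_punctuation_except('Hello, how are you?', ',!'), PHP_EOL; // Hello, how are you
echo strip_punctuation_except('Hello, how are you?'), PHP_EOL;       // Hello how are you
```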
Another way of doing the same could be to strip off everything from the string except for the characters we allow. We can achieve this by using a negation (^) of all the characters we want to allow:
$string = 'Hello, how are you?';

// using a negated character class (keep letters, digits, !, comma and space):
echo preg_replace('/[^a-z0-9!, ]/i', '', $string);

// output: 'Hello, how are you'
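Note that an allow-list of a-z and 0-9 would also strip accented and other non-ASCII letters. If that is not what you want, a possible variation (a sketch, assuming UTF-8 input and a UTF-8 source file) is to allow the Unicode letter and number categories instead:

```php
<?php
$string = 'Héllo, wörld?';

// \p{L} matches any letter and \p{N} any number; u enables UTF-8 matching.
$clean = preg_replace('/[^\p{L}\p{N}!, ]/u', '', $string);
echo $clean, PHP_EOL; // Héllo, wörld
```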
This post was published by Daniyal Hamid. Daniyal currently works as the Head of Engineering in Germany and has 20+ years of experience in software engineering, design and marketing.