We are Brand SEO Beijing serving international business, your marketing partner, Contact us by mi@mgsh.com.cn

Regular Expressions for URL Submission: Search Engine PC and Mobile Adaptation Rules

When optimizing a website, and especially when revising one, we often need to replace URLs and adapt them between the PC and mobile versions of the site. When submitting these rules through Baidu Webmaster Tools, we are often required to express the replacement as a regular expression.


Regular Expression Concepts

Regular expressions (English: Regular Expression, often abbreviated as regex, regexp, or RE in code) are a concept in computer science. They are usually used to retrieve and replace text that matches a certain pattern (rule).

A regular expression is a logical formula for operating on strings, which are made up of ordinary characters (for example, the letters a through z) and special characters (called "metacharacters"). Predefined special characters and their combinations form a "rule string" that expresses a filtering logic for strings. In other words, a regular expression is a text pattern that describes one or more strings to match when searching text.

Regular Expression Tools

An online tool for testing a regular expression against real cases to see whether it matches: https://regex101.com/

A downloadable tool that helps write regular expression code for various programming languages during development: https://www.regexbuddy.com/

You can also find other tools online that offer a free trial download; that is as far as we can help on this point.

Regular Expression Purpose

Given a regular expression and another string, we can achieve the following:

1. Determine whether the given string matches the filtering logic of the regular expression (called a "match").
2. Extract the specific part we want from the string through the regular expression.
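Both purposes can be sketched in Python. This is a minimal illustration only: the host www.example.com, the /article/<id>.html path layout, and the helper names is_pc_article and article_id are hypothetical and are not taken from any real site or from Baidu Webmaster Tools.

```python
import re
from typing import Optional

# Hypothetical PC-site URL layout (an assumption for illustration only).
PC_ARTICLE = re.compile(r"^https?://www\.example\.com/article/(\d+)\.html$")

def is_pc_article(url: str) -> bool:
    """Purpose 1: does the URL match the filtering logic of the pattern?"""
    return PC_ARTICLE.match(url) is not None

def article_id(url: str) -> Optional[str]:
    """Purpose 2: extract the part we want (the captured article id)."""
    m = PC_ARTICLE.match(url)
    return m.group(1) if m else None

print(is_pc_article("https://www.example.com/article/123.html"))  # True
print(is_pc_article("https://m.example.com/article/123.html"))    # False
print(article_id("https://www.example.com/article/123.html"))     # 123
```

The captured id is the piece a PC-to-mobile adaptation rule would typically carry over to the corresponding mobile URL.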

Regular Expression Composition Symbols

Regular expressions consist of ordinary characters and metacharacters. Ordinary characters include uppercase and lowercase letters and digits, while metacharacters have special meanings, which we explain below.

In the simplest case, a regular expression looks like an ordinary search string. For example, the regular expression "testing" does not contain any metacharacters; it matches strings like "testing" and "testing123", but not "Testing".
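The "testing" example can be checked with a short Python sketch. The helper name contains and its ignore_case parameter are made up for this illustration.

```python
import re

def contains(text, ignore_case=False):
    """Return True if the literal pattern "testing" occurs anywhere in text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search("testing", text, flags) is not None

for sample in ["testing", "testing123", "Testing"]:
    print(sample, contains(sample))
# testing True
# testing123 True
# Testing False

# Matching is case-sensitive by default; a flag relaxes that.
print(contains("Testing", ignore_case=True))  # True
```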

Metacharacter
Description
\
Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, "\\n" matches the two characters \n, while "\n" matches a newline. The sequence "\\" matches "\" and "\(" matches "(". This is equivalent to the concept of an "escape character" found in many programming languages.
^
Matches the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after "\n" or "\r".
$
Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before "\n" or "\r".
*
Matches the preceding subexpression zero or more times. For example, "zo*" matches "z", as well as "zo" and "zoo". * is equivalent to {0,}.
+
Matches the preceding subexpression one or more times. For example, "zo+" matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?
Matches the preceding subexpression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}.
{n}
n is a non-negative integer. Matches exactly n times. For example, "o{2}" cannot match the "o" in "Bob", but matches the two o's in "food".
{n,}
n is a non-negative integer. Matches at least n times. For example, "o{2,}" does not match the "o" in "Bob", but matches all the o's in "foooood". "o{1,}" is equivalent to "o+". "o{0,}" is equivalent to "o*".
{n,m}
m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood" as one match and the last three o's as another. "o{0,1}" is equivalent to "o?". Note that there can be no spaces between the comma and the two numbers.
?
When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. Non-greedy mode matches as little of the searched string as possible, while the default greedy mode matches as much as possible. For example, for the string "oooo", "o+" matches as many "o" as possible, yielding ['oooo'], while "o+?" matches as few as possible, yielding ['o', 'o', 'o', 'o'].
. (dot)
Matches any single character except "\n" and "\r". To match any character including "\n" and "\r", use a pattern like "[\s\S]".
(pattern)
Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript and the $0…$9 properties in JScript. To match parenthesis characters, use "\(" or "\)".
(?:pattern)
Non-capturing match: matches pattern but does not capture the result or store it for later use. This is useful when using the alternation character "|" to combine parts of a pattern. For example, "industr(?:y|ies)" is a shorter expression than "industry|industries".
(?=pattern)
Non-capturing positive lookahead: matches the search string at any position where a string matching pattern begins. The match is not captured for later use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but not "Windows" in "Windows3.1". Lookahead consumes no characters; that is, after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)
Non-capturing negative lookahead: matches the search string at any position where a string not matching pattern begins. The match is not captured for later use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but not "Windows" in "Windows2000".
(?<=pattern)
Non-capturing positive lookbehind: similar to positive lookahead, but in the opposite direction. For example, "(?<=95|98|NT|2000)Windows" matches "Windows" in "2000Windows", but not "Windows" in "3.1Windows".
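* Note: Python's regular expressions do not implement every feature described here. In particular, the re module only accepts lookbehind patterns of fixed width, so a lookbehind whose alternatives differ in length, such as the one above, is rejected. For such advanced features, another language such as Java or Scala is recommended.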
(?<!pattern)
Non-capturing negative lookbehind: similar to negative lookahead, but in the opposite direction.
x|y
Matches x or y. For example, "z|food" matches "z" or "food" (be careful here: the alternation applies to the whole of each side). "(z|f)ood" matches "zood" or "food".
[xyz]
Character set. Matches any one of the included characters. For example, "[abc]" matches the "a" in "plain".
[^xyz]
Negated character set. Matches any character not included. For example, "[^abc]" matches any of the characters "p", "l", "i", "n" in "plain".
[a-z]
Character range. Matches any character in the specified range. For example, "[a-z]" matches any lowercase alphabetic character in the range "a" to "z".
Note: The hyphen represents a range only when it is inside a character group and appears between two characters; if it appears at the beginning of a character group, it represents only the hyphen itself.
[^a-z]
Negated character range. Matches any character not in the specified range. For example, "[^a-z]" matches any character that is not in the range "a" to "z".
\b
Matches a word boundary, that is, the position between a word and a space. (Regular expressions have two kinds of "matching": matching a character and matching a position; \b matches a position.) For example, "er\b" matches the "er" in "never", but not the "er" in "verb"; "\b1_" matches the "1_" in "1_23", but not the "1_" in "21_3".
\B
Matches a non-word boundary. "er\B" matches the "er" in "verb", but not the "er" in "never".
\cx
Matches the control character specified by x. For example, \cM matches Control-M, the carriage return. The value of x must be in A-Z or a-z; otherwise, c is treated as a literal "c" character.
\d
Matches a digit character. Equivalent to [0-9]. In grep, add the -P option (Perl regular expression support) to use it.
\D
Matches a non-digit character. Equivalent to [^0-9]. In grep, add the -P option (Perl regular expression support) to use it.
\f
Matches a form feed character. Equivalent to \x0c and \cL.
\n
Matches a newline character. Equivalent to \x0a and \cJ.
\r
Matches a carriage return. Equivalent to \x0d and \cM.
\s
Matches any whitespace character, including spaces, tabs, and form feeds. Equivalent to [ \f\n\r\t\v].
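Several rows of the table above can be tried out with a short Python sketch. It uses only the standard re module. The last part shows one possible workaround for Python's fixed-width lookbehind requirement; that rewrite is an assumption of this sketch, not part of the table.

```python
import re

# Quantifiers: "o{1,3}" takes up to three o's per match.
runs = re.findall(r"o{1,3}", "fooooood")
print(runs)    # ['ooo', 'ooo']

# Greedy vs. non-greedy matching on the string "oooo".
greedy = re.findall(r"o+", "oooo")
lazy = re.findall(r"o+?", "oooo")
print(greedy)  # ['oooo']
print(lazy)    # ['o', 'o', 'o', 'o']

# Non-capturing group: findall returns whole matches since nothing is captured.
words = re.findall(r"industr(?:y|ies)", "industry and industries")
print(words)   # ['industry', 'industries']

# Lookahead checks what follows "Windows" without consuming it.
pos = re.search(r"Windows(?=95|98|NT|2000)", "Windows2000")
neg = re.search(r"Windows(?!95|98|NT|2000)", "Windows2000")
print(pos.group(), neg)  # Windows None

# Word boundary: digits and "_" both count as word characters.
b1 = bool(re.search(r"\b1_", "1_23"))
b2 = bool(re.search(r"\b1_", "21_3"))
print(b1, b2)  # True False

# Inside brackets "|" is a literal character, so alternation needs a group.
g = bool(re.fullmatch(r"(z|f)ood", "|ood"))
c = bool(re.fullmatch(r"[z|f]ood", "|ood"))
print(g, c)    # False True

# Lookbehind: re wants a fixed width, so the alternatives are split by length
# (2 characters vs. 4 characters) into two separate lookbehinds.
behind = re.compile(r"(?:(?<=95|98|NT)|(?<=2000))Windows")
lb1 = bool(behind.search("2000Windows"))
lb2 = bool(behind.search("3.1Windows"))
print(lb1, lb2)  # True False
```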
