{"id":3807,"date":"2020-06-22T19:00:29","date_gmt":"2020-06-22T13:30:29","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=3807"},"modified":"2025-03-07T05:55:11","modified_gmt":"2025-03-07T10:55:11","slug":"python-regex-tutorial","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/python-regex-tutorial\/","title":{"rendered":"Python Regex Tutorial"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Regular expressions (Regex)<\/h2>\n\n\n\n<p>A regular expression is a sequence of characters that defines a search pattern for matching strings. One common use of <a href=\"https:\/\/www.h2kinfosys.com\/blog\/python-regex-tutorial\/\">regex<\/a> is email validation.<\/p>\n\n\n\n<p>When you sign in to a website, the browser normally sends a request to the server, which checks whether the email and password are correct. If millions of users are sending requests, it is better to filter out emails that have an incorrect format before they reach the server. To do this, programmers use a regex in the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Client-side\" rel=\"nofollow noopener\" target=\"_blank\">client-side application<\/a> to make sure the email has the correct format. For example, a regex can check that the user has entered \u201c@\u201d at the correct place in the email and \u201c.com\u201d at the end.<\/p>\n\n\n\n<p>Let\u2019s look at the special characters used to build these patterns.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>[ ]<\/strong>\u00a0 <em>used to specify a character class, for example 
[1234]<\/em><\/li>\n\n\n\n<li><strong>.<\/strong> <em>(dot) matches any character except a newline.<\/em><\/li>\n\n\n\n<li><strong>*<\/strong> <em>matches zero or more repetitions of the preceding element.<\/em><\/li>\n\n\n\n<li><strong>+<\/strong> <em>matches one or more repetitions of the preceding element.<\/em><\/li>\n\n\n\n<li><strong>-<\/strong> <em>expresses a range inside a character class, for example [a-z].<\/em><\/li>\n<\/ul>\n\n\n\n<p>Now let&#8217;s look at some predefined character classes.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>\\d<\/strong> <em>matches any decimal digit; for ASCII text this is equivalent to [0-9].<\/em><\/li>\n\n\n\n<li><strong>\\D<\/strong> <em>matches any character that is not a decimal digit: [^0-9].<\/em><\/li>\n\n\n\n<li><strong>\\w<\/strong> <em>matches any word character (letter, digit, or underscore); for ASCII text this is equivalent to [a-zA-Z0-9_].<\/em><\/li>\n\n\n\n<li><strong>\\W<\/strong> <em>matches any character that is not a word character: [^a-zA-Z0-9_].<\/em><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">re.match()<\/h2>\n\n\n\n<p>The re.match() function checks whether the beginning of a string matches a pattern. It returns a match object if it does and None otherwise.<\/p>\n\n\n\n<p>In the example below, we create the regex \u201c(A\\w+)\u201d. It matches a string that starts with \u201cA\u201d; \\w means any word character and + stands for one or more. 
Taken together, the pattern matches any string that starts with \u2018A\u2019 followed by one or more word characters.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import re\n\nnames = [\"Alina\", \"Alex\", \"Bob\"]\nfor element in names:\n    # re.match() returns a match object, or None if there is no match\n    z = re.match(r\"(A\\w+)\", element)\n    if z:\n        print(\"matched\")\n    else:\n        print(\"not matched\")<\/pre>\n\n\n\n<p><em>The following will be the output.<\/em><br><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img fetchpriority=\"high\" decoding=\"async\" width=\"669\" height=\"518\" src=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/06\/image-13.png\" alt=\"\" class=\"wp-image-23622\" title=\"\" srcset=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/06\/image-13.png 669w, https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/06\/image-13-300x232.png 300w\" sizes=\"(max-width: 669px) 100vw, 669px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">re.search()<\/h2>\n\n\n\n<p>The re.search() function scans a string and finds the first location where the pattern matches. If we have the string \u201cHave a nice day\u201d and want to find out whether \u201cbob\u201d is present in it, we can use re.search().<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import re\n\npatterns = ['nice', 'bob']\ntext = 'Have a nice day'\nfor pattern in patterns:\n    print('Looking for \"%s\" in \"%s\" -&gt;' % (pattern, text), end=' ')\n    # re.search() returns None when the pattern is not found anywhere in text\n    if re.search(pattern, text):\n        print('found a match!')\n    else:\n        print('no match')<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">re.findall()<\/h2>\n\n\n\n<p>The re.search() function stops at the first occurrence, whereas re.findall() returns a list of all non-overlapping occurrences of the pattern in the string.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import re\n\ntext = 'Have a nice day bob. bob is a nice boy. 
bob'\nname = re.findall('bob', text)\nnice = re.findall('nice', text)\n\nprint(name)\nprint(nice)<\/pre>\n\n\n\n<p><em>The output is ['bob', 'bob', 'bob'] followed by ['nice', 'nice'].<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">re.split()<\/h2>\n\n\n\n<p>Suppose you have a string of space-separated names and want to split it into individual names. The re.split() function does this by splitting the string wherever the pattern matches.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import re\n\nstring = 'Ana B0B Ali'\n# \\W matches any non-word character, such as the spaces between the names\npattern = r'\\W'\n\nresult = re.split(pattern, string)\nprint(result)<\/pre>\n\n\n\n<p><em>The output is ['Ana', 'B0B', 'Ali'].<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Regular expressions (Regex) A regular expression is a special kind of sequence of characters that is used to match a string in different applications. One big example of the regex is email validation. When you sign on any website the normal pattern is that the browser sends a request to the server and ask if [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3820,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[342],"tags":[1014,1018,1016,1017,1019,1015],"class_list":["post-3807","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python-tutorials","tag-python-regex","tag-re-findall","tag-re-match","tag-re-search","tag-re-split","tag-regular-expressions"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/3807","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=3807"}],"version-history":[{"count":0
,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/3807\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/3820"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=3807"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=3807"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=3807"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}