{"id":4571,"date":"2020-08-26T18:38:25","date_gmt":"2020-08-26T13:08:25","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=4571"},"modified":"2025-02-24T04:44:41","modified_gmt":"2025-02-24T09:44:41","slug":"part-of-speech-tagging-chunking-with-nltk","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/part-of-speech-tagging-chunking-with-nltk\/","title":{"rendered":"Chunking with NLTK: POS Tagging and Phrase Extraction"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><strong>What is Part of Speech (POS) Tagging?<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Introduction to Parts of Speech<\/strong><\/h3>\n\n\n\n<p><strong>Part of Speech (POS) tagging<\/strong> is a fundamental concept in <strong>linguistics and Natural Language Processing (NLP)<\/strong> that classifies words based on their grammatical roles in a sentence. In the English language, words are categorized into <strong>eight main parts of speech<\/strong>, each serving a specific function in sentence structure. Understanding these classifications is crucial for various NLP applications such as <strong>text analysis, machine translation, speech recognition, and information retrieval<\/strong>.<\/p>\n\n\n\n<p>The <strong>eight parts of speech<\/strong> in English are:<br>1&#xfe0f;&#x20e3; <strong>Nouns<\/strong><br>2&#xfe0f;&#x20e3; <strong>Pronouns<\/strong><br>3&#xfe0f;&#x20e3; <strong>Verbs<\/strong><br>4&#xfe0f;&#x20e3; <strong>Adverbs<\/strong><br>5&#xfe0f;&#x20e3; <strong>Adjectives<\/strong><br>6&#xfe0f;&#x20e3; <strong>Prepositions<\/strong><br>7&#xfe0f;&#x20e3; <strong>Conjunctions<\/strong><br>8&#xfe0f;&#x20e3; <strong>Interjections<\/strong><\/p>\n\n\n\n<p>Each of these categories plays a unique role in sentence formation. Let\u2019s explore them in detail with definitions and examples.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>1. 
Nouns \u2013 Naming Words<\/strong><\/h2>\n\n\n\n<p>Nouns are words that <strong>identify people, places, things, or ideas<\/strong>. They serve as the <strong>subject or object<\/strong> of a sentence.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Examples of Nouns:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Person:<\/strong> Mike, Jennifer, scientist<\/li>\n\n\n\n<li><strong>Place:<\/strong> Tokyo, beach, university<\/li>\n\n\n\n<li><strong>Thing:<\/strong> Laptop, elephant, vehicle<\/li>\n\n\n\n<li><strong>Idea:<\/strong> Happiness, freedom, democracy<\/li>\n<\/ul>\n\n\n\n<p><strong>Example Sentences:<\/strong><br>&#x2705; <strong>Tokyo<\/strong> is a beautiful city.<br>&#x2705; The <strong>elephant<\/strong> is the largest land animal.<br>&#x2705; <strong>Freedom<\/strong> is important for personal growth.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>2. Pronouns \u2013 Replacing Nouns<\/strong><\/h2>\n\n\n\n<p>Pronouns are used to <strong>replace nouns<\/strong> to avoid repetition and enhance fluency in sentences.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Types of Pronouns:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Personal Pronouns:<\/strong> I, you, he, she, it, we, they<\/li>\n\n\n\n<li><strong>Possessive Pronouns:<\/strong> His, hers, yours, ours, theirs<\/li>\n\n\n\n<li><strong>Demonstrative Pronouns:<\/strong> This, that, these, those<\/li>\n\n\n\n<li><strong>Interrogative Pronouns:<\/strong> Who, what, which, whom<\/li>\n\n\n\n<li><strong>Reflexive Pronouns:<\/strong> Myself, yourself, himself, herself<\/li>\n<\/ul>\n\n\n\n<p><strong>Example Sentences:<\/strong><br>&#x2705; <strong>He<\/strong> is my best friend. <em>(Replaces &#8220;Mike&#8221;)<\/em><br>&#x2705; <strong>They<\/strong> went to the beach. <em>(Replaces &#8220;John and Lisa&#8221;)<\/em><br>&#x2705; Is this <strong>yours<\/strong>? 
<em>(Refers to an object belonging to someone)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>3. Verbs \u2013 Action Words<\/strong><\/h2>\n\n\n\n<p>Verbs describe <strong>actions, occurrences, or states of being<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Types of Verbs:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Action Verbs:<\/strong> Run, eat, write, sleep, drive<\/li>\n\n\n\n<li><strong>Linking Verbs:<\/strong> Is, am, are, was, were, seem, become<\/li>\n\n\n\n<li><strong>Helping (Auxiliary) Verbs:<\/strong> Can, could, will, would, should, do, have<\/li>\n<\/ul>\n\n\n\n<p><strong>Example Sentences:<\/strong><br>&#x2705; She <strong>writes<\/strong> every day. <em>(Action verb: &#8220;writes&#8221;)<\/em><br>&#x2705; He <strong>is<\/strong> a doctor. <em>(Linking verb: &#8220;is&#8221;)<\/em><br>&#x2705; They <strong>have<\/strong> finished the project. <em>(Helping verb: &#8220;have&#8221;)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>4. Adverbs \u2013 Describing Verbs, Adjectives, or Other Adverbs<\/strong><\/h2>\n\n\n\n<p>Adverbs modify <strong>verbs, adjectives, or other adverbs<\/strong>, providing additional information about <strong>how, when, where, or to what extent<\/strong> an action is performed.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Types of Adverbs:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Manner (How?):<\/strong> Quickly, boldly, carefully<\/li>\n\n\n\n<li><strong>Time (When?):<\/strong> Yesterday, often, yearly<\/li>\n\n\n\n<li><strong>Place (Where?):<\/strong> Here, there, everywhere<\/li>\n\n\n\n<li><strong>Degree (To what extent?):<\/strong> Very, quite, too<\/li>\n<\/ul>\n\n\n\n<p><strong>Example Sentences:<\/strong><br>&#x2705; She speaks <strong>quickly<\/strong>. <em>(Modifies the verb &#8220;speaks&#8221;)<\/em><br>&#x2705; He is <strong>very<\/strong> tall. 
<em>(Modifies the adjective &#8220;tall&#8221;)<\/em><br>&#x2705; They arrived <strong>early<\/strong>. <em>(Modifies the verb &#8220;arrived&#8221;)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>5. Adjectives \u2013 Describing Nouns and Pronouns<\/strong><\/h2>\n\n\n\n<p>Adjectives describe or <strong>modify nouns and pronouns<\/strong>, adding details about <strong>quality, size, color, quantity, and more<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Examples of Adjectives:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Quality:<\/strong> Cheerful, intelligent, beautiful<\/li>\n\n\n\n<li><strong>Size:<\/strong> Small, huge, enormous<\/li>\n\n\n\n<li><strong>Color:<\/strong> Red, yellow, blue<\/li>\n\n\n\n<li><strong>Quantity:<\/strong> Few, many, several<\/li>\n<\/ul>\n\n\n\n<p><strong>Example Sentences:<\/strong><br>&#x2705; She has a <strong>cheerful<\/strong> personality. <em>(Describes &#8220;personality&#8221;)<\/em><br>&#x2705; The <strong>blue<\/strong> car is parked outside. <em>(Describes &#8220;car&#8221;)<\/em><br>&#x2705; We need <strong>many<\/strong> volunteers. <em>(Describes &#8220;volunteers&#8221;)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>6. Prepositions \u2013 Connecting Words<\/strong><\/h2>\n\n\n\n<p>Prepositions show the <strong>relationship<\/strong> between a noun (or pronoun) and other words in a sentence. They indicate <strong>position, direction, time, cause, manner, or possession<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Common Prepositions:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Place:<\/strong> In, on, under, above, between<\/li>\n\n\n\n<li><strong>Time:<\/strong> At, before, after, during, since<\/li>\n\n\n\n<li><strong>Direction:<\/strong> To, from, toward, into, out of<\/li>\n<\/ul>\n\n\n\n<p><strong>Example Sentences:<\/strong><br>&#x2705; The book is <strong>on<\/strong> the table. 
<em>(Position \u2013 &#8220;on&#8221;)<\/em><br>&#x2705; She arrived <strong>before<\/strong> noon. <em>(Time \u2013 &#8220;before&#8221;)<\/em><br>&#x2705; He walked <strong>toward<\/strong> the park. <em>(Direction \u2013 &#8220;toward&#8221;)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>7. Conjunctions \u2013 Joining Words and Phrases<\/strong><\/h2>\n\n\n\n<p>Conjunctions connect <strong>words, phrases, or clauses<\/strong> in a sentence, making it more structured and fluid.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Types of Conjunctions:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Coordinating Conjunctions:<\/strong> And, but, or, yet, so<\/li>\n\n\n\n<li><strong>Subordinating Conjunctions:<\/strong> Because, although, since, while<\/li>\n\n\n\n<li><strong>Correlative Conjunctions:<\/strong> Either\u2026or, neither\u2026nor, not only\u2026but also<\/li>\n<\/ul>\n\n\n\n<p><strong>Example Sentences:<\/strong><br>&#x2705; She likes coffee <strong>and<\/strong> tea. <em>(Joins two nouns)<\/em><br>&#x2705; He stayed home <strong>because<\/strong> he was sick. <em>(Joins two clauses)<\/em><br>&#x2705; You can <strong>either<\/strong> go <strong>or<\/strong> stay. <em>(Correlative conjunctions)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>8. Interjections \u2013 Expressing Emotions<\/strong><\/h2>\n\n\n\n<p>Interjections are words used to <strong>express strong emotions, feelings, or reactions<\/strong>. They are often followed by an exclamation mark.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Examples of Interjections:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Excitement:<\/strong> Wow!, Hurrah!, Yay!<\/li>\n\n\n\n<li><strong>Surprise:<\/strong> Oh!, Really?, What!?<\/li>\n\n\n\n<li><strong>Sorrow:<\/strong> Alas!, Oh no!, Oops!<\/li>\n<\/ul>\n\n\n\n<p><strong>Example Sentences:<\/strong><br>&#x2705; <strong>Wow!<\/strong> That\u2019s amazing. 
<em>(Expresses excitement)<\/em><br>&#x2705; <strong>Oh no!<\/strong> I forgot my keys. <em>(Expresses worry)<\/em><br>&#x2705; <strong>Hurrah!<\/strong> We won the match. <em>(Expresses happiness)<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Importance of POS Tagging in NLP<\/strong><\/h2>\n\n\n\n<p><strong>Part of Speech (POS) tagging<\/strong> is widely used in <strong>Natural Language Processing (NLP)<\/strong> to <strong>identify grammatical categories<\/strong> in a text. It is crucial for:<br>&#x1f539; <strong>Speech Recognition:<\/strong> Understanding sentence structure for accurate transcriptions.<br>&#x1f539; <strong>Machine Translation:<\/strong> Ensuring correct word usage across languages.<br>&#x1f539; <strong>Chatbots &amp; AI Assistants:<\/strong> Improving sentence interpretation.<br>&#x1f539; <strong>Text Mining &amp; Sentiment Analysis:<\/strong> Analyzing patterns and extracting meaning from texts.<\/p>\n\n\n\n<p><strong>Example of POS Tagging in Python using NLTK:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import nltk\n\n# One-time downloads: tokenizer models and the POS tagger model.\n# Newer NLTK releases may ask for 'punkt_tab' and\n# 'averaged_perceptron_tagger_eng' instead.\nnltk.download('punkt')\nnltk.download('averaged_perceptron_tagger')\n\ntext = \"The quick brown fox jumps over the lazy dog\"\ntokens = nltk.word_tokenize(text)  # split the sentence into word tokens\npos_tags = nltk.pos_tag(tokens)    # list of (word, tag) pairs\n\nprint(pos_tags)\n<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ'), \n('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]<\/code><\/pre>\n\n\n\n<p>Now that you know what each part of speech is, let&#8217;s discuss Part of Speech Tagging.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Part of Speech (POS) Tagging<\/strong><\/h2>\n\n\n\n<p>POS tagging, in simple terms, means assigning a part-of-speech label to every word in a sentence. 
<a href=\"https:\/\/www.h2kinfosys.com\/blog\/natural-language-processing-nlp-tutorial\/\">NLTK<\/a> has a function called pos_tag that performs POS tagging on a tokenized sentence. It applies a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Supervised_learning#:~:text=Supervised%20learning%20is%20the%20machine,on%20example%20input%2Doutput%20pairs.&amp;text=A%20supervised%20learning%20algorithm%20analyzes,used%20for%20mapping%20new%20examples.\" rel=\"nofollow noopener\" target=\"_blank\">supervised learning approach<\/a> that uses features such as context, the capitalization of words, punctuation, and so on to determine the part of speech.<\/p>\n\n\n\n<p>POS tagging is a key step in understanding the meaning of a sentence and the relationships between its words.<\/p>\n\n\n\n<p>By default, pos_tag labels English text with the Penn Treebank tagset. The most common tags are shown in the table below.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Description<\/strong><\/td><td><strong>Tag<\/strong><\/td><td><strong>Example Words<\/strong><\/td><\/tr><tr><td><strong>Coordinating Conjunction<\/strong><\/td><td>CC<\/td><td>And, But, Or, Yet<\/td><\/tr><tr><td><strong>Determiner<\/strong><\/td><td>DT<\/td><td>A, An, The, This, Some, Every<\/td><\/tr><tr><td><strong>Cardinal Number<\/strong><\/td><td>CD<\/td><td>One, Two, Three, Forty<\/td><\/tr><tr><td><strong>Existential There<\/strong><\/td><td>EX<\/td><td>There<\/td><\/tr><tr><td><strong>Foreign Word<\/strong><\/td><td>FW<\/td><td>En masse, bona fide, et cetera, et al<\/td><\/tr><tr><td><strong>Subordinating Conjunction or Preposition<\/strong><\/td><td>IN<\/td><td>Over, Behind, Into<\/td><\/tr><tr><td><strong>Adjective<\/strong><\/td><td>JJ<\/td><td>Beautiful, Slow, New<\/td><\/tr><tr><td><strong>Adjective, Comparative<\/strong><\/td><td>JJR<\/td><td>Greater, Better, Older<\/td><\/tr><tr><td><strong>Adjective, Superlative<\/strong><\/td><td>JJS<\/td><td>Greatest, Best, 
Oldest<\/td><\/tr><tr><td><strong>List Marker<\/strong><\/td><td>LS<\/td><td>i, ii, iii, iv, \u2026<\/td><\/tr><tr><td><strong>Modal<\/strong><\/td><td>MD<\/td><td>Can, Should, Shall<\/td><\/tr><tr><td><strong>Noun, Singular<\/strong><\/td><td>NN<\/td><td>School, Table, Pen<\/td><\/tr><tr><td><strong>Noun, Plural<\/strong><\/td><td>NNS<\/td><td>Schools, Tables, Pens<\/td><\/tr><tr><td><strong>Proper Noun, Singular<\/strong><\/td><td>NNP<\/td><td>Monday, Chicago, Mark<\/td><\/tr><tr><td><strong>Proper Noun, Plural<\/strong><\/td><td>NNPS<\/td><td>Koreans, Universities, Americans<\/td><\/tr><tr><td><strong>Predeterminer<\/strong><\/td><td>PDT<\/td><td>Both, All, Half<\/td><\/tr><tr><td><strong>Possessive Ending<\/strong><\/td><td>POS<\/td><td>The \u2019s in David\u2019s and Dan\u2019s, the \u2019 in Francis\u2019<\/td><\/tr><tr><td><strong>Personal Pronoun<\/strong><\/td><td>PRP<\/td><td>I, They, She<\/td><\/tr><tr><td><strong>Possessive Pronoun<\/strong><\/td><td>PRP$<\/td><td>His, Her, Their<\/td><\/tr><tr><td><strong>Adverb<\/strong><\/td><td>RB<\/td><td>Later, Very, Already<\/td><\/tr><tr><td><strong>Adverb, Comparative<\/strong><\/td><td>RBR<\/td><td>Better, More, Worse<\/td><\/tr><tr><td><strong>Adverb, Superlative<\/strong><\/td><td>RBS<\/td><td>Best, Most, Worst<\/td><\/tr><tr><td><strong>Particle<\/strong><\/td><td>RP<\/td><td>At, Across, About<\/td><\/tr><tr><td><strong>To<\/strong><\/td><td>TO<\/td><td>To<\/td><\/tr><tr><td><strong>Verb, Base Form<\/strong><\/td><td>VB<\/td><td>Jump, Eat, Play<\/td><\/tr><tr><td><strong>Verb, Past Tense<\/strong><\/td><td>VBD<\/td><td>Jumped, Ate, Played<\/td><\/tr><tr><td><strong>Verb, Present Participle<\/strong><\/td><td>VBG<\/td><td>Jumping, Eating, Playing<\/td><\/tr><tr><td><strong>Verb, Past Participle<\/strong><\/td><td>VBN<\/td><td>Taken, Given, Gone<\/td><\/tr><tr><td><strong>Verb, Present Tense but not Third Person Singular<\/strong><\/td><td>VBP<\/td><td>End, Go, 
Endure<\/td><\/tr><tr><td><strong>Verb, Present Tense, Third Person Singular<\/strong><\/td><td>VBZ<\/td><td>Jumps, Eats, Plays<\/td><\/tr><tr><td><strong>Wh \u2013 Determiner<\/strong><\/td><td>WDT<\/td><td>Which, What, Whichever<\/td><\/tr><tr><td><strong>Wh \u2013 Pronoun<\/strong><\/td><td>WP<\/td><td>Who, Whom, What<\/td><\/tr><tr><td><strong>Possessive Wh \u2013 Pronoun<\/strong><\/td><td>WP$<\/td><td>Whose<\/td><\/tr><tr><td><strong>Wh \u2013 Adverb<\/strong><\/td><td>WRB<\/td><td>Where, Why, When<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now that you know what the POS tags are, let&#8217;s walk through a code example that shows the steps involved in POS tagging.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#import the nltk library<\/em>\n<strong>import<\/strong> <strong>nltk<\/strong>\n<em>#define a text<\/em>\nsentence = \"The man was excited after he was informed about his promotion at work\"\n<em>#tokenize the text<\/em>\ntokens = nltk.word_tokenize(sentence)\n\n<em>#Perform POS tagging<\/em>\nnltk.pos_tag(tokens)<\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">[('The', 'DT'),\n&nbsp;('man', 'NN'),\n&nbsp;('was', 'VBD'),\n&nbsp;('excited', 'VBN'),\n&nbsp;('after', 'IN'),\n&nbsp;('he', 'PRP'),\n&nbsp;('was', 'VBD'),\n&nbsp;('informed', 'VBN'),\n&nbsp;('about', 'IN'),\n&nbsp;('his', 'PRP$'),\n&nbsp;('promotion', 'NN'),\n&nbsp;('at', 'IN'),\n&nbsp;('work', 'NN')]<\/pre>\n\n\n\n<p>You can also look up what a tag means with the nltk.help.upenn_tagset() function. 
Say you have forgotten what JJ means. You can find out with this line of code:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>nltk.help.upenn_tagset(\"JJ\")<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>JJ: adjective or numeral, ordinal\n    third ill-mannered pre-war regrettable oiled calamitous first separable\n    ectoplasmic battery-powered participatory fourth still-to-be-named\n    multilingual multi-disciplinary ...<\/code><\/pre>\n\n\n\n<p>The output tells us that JJ means \u2018adjective or ordinal numeral\u2019 and then lists some examples.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Chunking<\/strong><\/h2>\n\n\n\n<p>Chunking is the process of extracting phrases, or chunks, from unstructured text. There are situations where a single word cannot capture the complete meaning of a passage. In such cases, chunks can be used to extract more meaningful units, which makes the extraction process more flexible.<\/p>\n\n\n\n<p>Chunking works on top of POS tagging: it takes the POS-tagged tokens as input and outputs the chunks. A common kind of chunk is the noun phrase chunk (NP chunk). To create a noun phrase chunk, a chunk grammar is first defined in terms of POS tags. This chunk grammar contains the rules by which the chunks are created.<\/p>\n\n\n\n<p>The rules are written with regular expressions, using the following syntax:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>? means match 0 or 1 repetitions \n* means match 0 or more repetitions\n+ means match 1 or more\n. means any character but not a new line<\/code><\/pre>\n\n\n\n<p>Each POS tag pattern is placed inside &lt; &gt; delimiters, and a quantifier written after the closing &gt; applies to the whole tag. &lt;RB.?&gt;, for instance, matches any adverb tag: RB followed by at most one more character, that is RB, RBR, or RBS. 
Let\u2019s take a coding example to drive home our point.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><em>#import the library<\/em>\n<strong>import<\/strong> <strong>nltk<\/strong>\n<em>#define the text<\/em>\nsentence = \"I told the children I was going to tell them a story. They were excited\"\n<em>#tokenize the text<\/em>\ntokens = nltk.word_tokenize(sentence)\n<em>#perform POS tagging<\/em>\ntags = nltk.pos_tag(tokens)\n<em>#define a chunk grammar named mychunk<\/em>\nchunk_grammar = \"\"\" mychunk: {&lt;NNS.?&gt;*&lt;PRP.?&gt;*&lt;VBD?&gt;}\"\"\"\n<em>#build a regular expression parser from the grammar<\/em>\nparser = nltk.RegexpParser(chunk_grammar)\n<em>#apply the parser to the tagged tokens<\/em>\ntree = parser.parse(tags)\n<strong>print<\/strong>(tree)<\/pre>\n\n\n\n<p>After the POS tags are obtained, the chunk grammar selects zero or more plural-noun tags (&lt;NNS.?&gt;*), followed by zero or more pronoun tags (&lt;PRP.?&gt;*, which covers both PRP and PRP$), followed by exactly one verb tag that is either VB or VBD (&lt;VBD?&gt;, where the D is optional), anywhere in the text. A RegexpParser is built from the chunk grammar, and its parse() method is applied to the POS-tagged tokens to produce the chunk tree, which is then printed. 
See the output below.<\/p>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">(S\n&nbsp;&nbsp;(mychunk I\/PRP told\/VBD)\n&nbsp;&nbsp;the\/DT\n&nbsp;&nbsp;(mychunk children\/NNS I\/PRP was\/VBD)\n&nbsp;&nbsp;going\/VBG\n&nbsp;&nbsp;to\/TO\n&nbsp;&nbsp;(mychunk tell\/VB)\n&nbsp;&nbsp;them\/PRP\n&nbsp;&nbsp;a\/DT\n&nbsp;&nbsp;story\/NN\n&nbsp;&nbsp;.\/.\n&nbsp;&nbsp;(mychunk They\/PRP were\/VBD)\n&nbsp;&nbsp;excited\/VBN)<\/pre>\n\n\n\n<p>As seen, \u201cI told\u201d, \u201cchildren I was\u201d, \u201ctell\u201d and \u201cThey were\u201d are the selected chunks. Note that \u201ctell\u201d forms a chunk on its own because &lt;VBD?&gt; also matches the base-form tag VB. To visualize the result, you can call the draw() method, which opens the tree in a separate window.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>tree.draw()<\/code><\/pre>\n\n\n\n<p><strong>Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">(S\n&nbsp;&nbsp;(mychunk I\/PRP told\/VBD)\n&nbsp;&nbsp;the\/DT\n&nbsp;&nbsp;(mychunk children\/NNS I\/PRP was\/VBD)\n&nbsp;&nbsp;going\/VBG\n&nbsp;&nbsp;to\/TO\n&nbsp;&nbsp;(mychunk tell\/VB)\n&nbsp;&nbsp;them\/PRP\n&nbsp;&nbsp;a\/DT\n&nbsp;&nbsp;story\/NN\n&nbsp;&nbsp;.\/.\n&nbsp;&nbsp;(mychunk They\/PRP were\/VBD)\n&nbsp;&nbsp;excited\/VBN)\n(mychunk I\/PRP told\/VBD)\n(mychunk children\/NNS I\/PRP was\/VBD)\n(mychunk tell\/VB)\n(mychunk They\/PRP were\/VBD)<\/pre>\n\n\n\n<p><img decoding=\"async\" width=\"624\" height=\"63\" src=\"https:\/\/lh4.googleusercontent.com\/T5-P5BMKS2vxA1o8ZTtzW55VVGs9taC499H-ZsuqoC71g3j3-pb6xLOtlqucXY3fuyfMRc8vKCUhl_fOYl7oGjuX5x_cIpJ2JbmpEU42VpYXogxXK3X_5k2ll30gSb0cJap7eoM\" alt=\"\" title=\"\"><\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Why is Chunking Important in Natural Language Processing (NLP)?<\/strong><\/h2>\n\n\n\n<p>Chunking is a crucial technique in <strong>Natural Language Processing (NLP)<\/strong> that allows for <strong>structured information extraction<\/strong> from text data. 
While <strong>Part of Speech (POS) tagging<\/strong> classifies words into categories such as nouns, verbs, and adjectives, <strong>chunking goes a step further<\/strong> by grouping these tagged words into <strong>meaningful phrases<\/strong> (also called <strong>chunks<\/strong>).<\/p>\n\n\n\n<p>This process is particularly useful in <strong>entity detection, information retrieval, and text analysis<\/strong>, as it helps extract specific patterns from text without requiring a full syntactic parse of every sentence. Let\u2019s explore the importance of chunking, how it works, and why it is widely used in NLP applications.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>&#x1f539; The Role of Chunking in NLP<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1. Extracting Meaningful Phrases from Text<\/strong><\/h3>\n\n\n\n<p>Chunking helps identify <strong>groups of words that form coherent phrases<\/strong>, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Noun Phrases (NP):<\/strong> &#8220;The big brown fox&#8221;<\/li>\n\n\n\n<li><strong>Verb Phrases (VP):<\/strong> &#8220;is running quickly&#8221;<\/li>\n\n\n\n<li><strong>Prepositional Phrases (PP):<\/strong> &#8220;on the hill&#8221;<\/li>\n<\/ul>\n\n\n\n<p>Instead of analyzing individual words, <strong>chunking allows us to extract information at the phrase level<\/strong>, which provides better context and meaning.<\/p>\n\n\n\n<p>For example, consider the sentence:<br><em>&#8220;The quick brown fox jumps over the lazy dog.&#8221;<\/em><\/p>\n\n\n\n<p><strong>POS Tagging Output:<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), \n ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]\n<\/code><\/pre>\n\n\n\n<p>Conceptually, <strong>chunking<\/strong> groups these tagged words into phrases (a schematic view, not literal NLTK output):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>[('The quick brown fox', 'NP'), 
('jumps', 'VP'), ('over the lazy dog', 'PP')]\n<\/code><\/pre>\n\n\n\n<p>This makes <strong>text analysis more structured and meaningful<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2. Chunking for Entity Detection<\/strong><\/h3>\n\n\n\n<p>Chunking is particularly useful for <strong>Named Entity Recognition (NER)<\/strong>, where we need to <strong>extract specific entities such as names, dates, locations, or product details<\/strong> from large text datasets.<\/p>\n\n\n\n<p>For example, suppose you have a large set of customer transactions and only need to extract:<br>&#x2705; <strong>Customer Name<\/strong><br>&#x2705; <strong>Item Purchased<\/strong><br>&#x2705; <strong>Price<\/strong><br>&#x2705; <strong>Date of Purchase<\/strong><\/p>\n\n\n\n<p>You can define a <strong>chunk grammar<\/strong> that detects these patterns and pulls out just this information, <strong>without building a full parse of every sentence<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Example Use Case: Extracting Purchase Information<\/strong><\/h4>\n\n\n\n<p>Imagine we have the sentence:<br><em>&#8220;John Doe bought a Samsung Galaxy S21 for $999 on March 5, 2023.&#8221;<\/em><\/p>\n\n\n\n<p>With suitable <strong>chunking rules<\/strong> and some post-processing, the key details could be extracted along these lines (an illustrative target format, not literal NLTK output):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;('John Doe', 'CUSTOMER_NAME'), ('Samsung Galaxy S21', 'PRODUCT_NAME'),\n ('$999', 'PRICE'), ('March 5, 2023', 'DATE')]\n<\/code><\/pre>\n\n\n\n<p>This structured format makes it easier to process transactions, analyze consumer behavior, and automate <strong>data extraction tasks<\/strong> in business applications.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3. 
Faster Information Extraction<\/strong><\/h3>\n\n\n\n<p>One of the major benefits of chunking is its ability to <strong>rapidly filter and extract phrases based on defined grammar rules<\/strong>.<\/p>\n\n\n\n<p>For instance, when processing a <strong>large volume of customer reviews, news articles, or business documents<\/strong>, chunking allows us to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Extract key phrases without a full syntactic parse of each sentence<\/strong><\/li>\n\n\n\n<li><strong>Group related words together for better context<\/strong><\/li>\n\n\n\n<li><strong>Perform filtering and summarization more efficiently<\/strong><\/li>\n<\/ul>\n\n\n\n<p><strong>Example:<\/strong> If we have thousands of product reviews, chunking can help extract:<br>&#x2705; <strong>Customer names<\/strong><br>&#x2705; <strong>Product attributes<\/strong><br>&#x2705; <strong>Opinion phrases that can feed sentiment analysis (positive\/negative opinions)<\/strong><\/p>\n\n\n\n<p>Because chunk rules are simple pattern matches over POS tags, this kind of extraction is typically much cheaper than full parsing, which helps <strong>data processing speed<\/strong> in NLP applications.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>4. Chunking vs. 
POS Tagging: Why Both Are Needed<\/strong><\/h3>\n\n\n\n<p>While <strong>POS tagging helps identify the grammatical role of individual words<\/strong>, it does not provide <strong>structured phrase-level understanding<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\"><strong>Comparison of POS Tagging and Chunking:<\/strong><\/h4>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Feature<\/th><th>POS Tagging<\/th><th>Chunking<\/th><\/tr><\/thead><tbody><tr><td><strong>Focus<\/strong><\/td><td>Individual words<\/td><td>Groups of words (phrases)<\/td><\/tr><tr><td><strong>Purpose<\/strong><\/td><td>Identifies grammatical category<\/td><td>Identifies meaningful phrases<\/td><\/tr><tr><td><strong>Example Output<\/strong><\/td><td>(&#8216;fox&#8217;, &#8216;NN&#8217;), (&#8216;jumps&#8217;, &#8216;VBZ&#8217;)<\/td><td>(&#8216;The quick brown fox&#8217;, &#8216;NP&#8217;)<\/td><\/tr><tr><td><strong>Use Cases<\/strong><\/td><td>Syntax analysis, spell check<\/td><td>Entity recognition, information extraction<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Thus, <strong>using both POS tagging and chunking together<\/strong> ensures <strong>better language understanding<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>&#x1f539; Practical Applications of Chunking<\/strong><\/h2>\n\n\n\n<p>Chunking plays a key role in many NLP applications, including:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>&#x1f4cc; 1. Named Entity Recognition (NER)<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Extracts <strong>names, locations, organizations, dates, and product names<\/strong> from text.<\/li>\n\n\n\n<li>Used in <strong>customer service chatbots, sentiment analysis, and search engines<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>&#x1f4cc; 2. 
Information Retrieval<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Helps <strong>filter relevant content<\/strong> in <strong>news analysis, financial reports, and legal documents<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>&#x1f4cc; 3. Question Answering Systems<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Helps <strong>AI assistants like Siri, Alexa, and ChatGPT<\/strong> understand user queries better.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>&#x1f4cc; 4. Resume Screening in HR<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Automates <strong>extraction of candidate details such as skills, education, and experience<\/strong>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>&#x1f4cc; 5. Customer Sentiment Analysis<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Identifies <strong>positive and negative sentiments<\/strong> in product reviews and social media posts.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>&#x1f539; Example of Chunking in Python using NLTK<\/strong><\/h2>\n\n\n\n<p>Let\u2019s see <strong>how chunking works using the Natural Language Toolkit (NLTK)<\/strong> in Python.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import nltk\n\n# Sample sentence\nsentence = \"John Doe bought a Samsung Galaxy S21 for $999 on March 5, 2023.\"\n\n# Tokenizing words and assigning POS tags\nwords = nltk.word_tokenize(sentence)\npos_tags = nltk.pos_tag(words)\n\n# Defining a simple chunk grammar for extracting noun phrases.\n# NN.* covers NN, NNS, NNP and NNPS, so proper nouns are included.\nchunk_grammar = r\"NP: {&lt;DT>?&lt;JJ>*&lt;NN.*>+}\"  # NP = Noun Phrase\n\n# Creating a chunk parser\nchunk_parser = nltk.RegexpParser(chunk_grammar)\nchunked = chunk_parser.parse(pos_tags)\n\n# Displaying chunked structure\nprint(chunked)\nchunked.draw()  # Visualize the chunks (opens a separate window)\n<\/code><\/pre>\n\n\n\n<p><strong>Output (simplified illustration of the phrase groups):<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>(NP John\/NN)\n(VP 
bought\/VBD)\n(NP a\/DT Samsung\/NN Galaxy\/NN S21\/NN)\n(PP for\/IN)\n(NP $999\/CD)\n(PP on\/IN)\n(NP March\/NN 5\/CD 2023\/CD)\n<\/code><\/pre>\n\n\n\n<p>This listing is a simplified illustration of <strong>noun phrases (NP), verb phrases (VP), and prepositional phrases (PP)<\/strong> rather than the literal printout. The grammar above defines only an NP rule, so print(chunked) shows NP chunks nested inside a single S tree, and the exact tags depend on the tagger (proper nouns such as John are normally tagged NNP). Producing VP and PP chunks would require adding rules for them to the grammar.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Why Chunking is Essential<\/strong><\/h2>\n\n\n\n<p>Chunking is a <strong>powerful NLP technique<\/strong> that builds on <strong>POS tagging<\/strong> by grouping words into <strong>meaningful phrases<\/strong>.<\/p>\n\n\n\n<p>&#x2705; It is essential for <strong>entity detection, information extraction, and text analysis<\/strong>.<br>&#x2705; It allows <strong>rapid phrase extraction without the cost of a full syntactic parse<\/strong>.<br>&#x2705; It provides <strong>better phrase-level understanding<\/strong> than POS tagging alone.<br>&#x2705; It is widely used in <strong>search engines, AI assistants, financial analysis, and e-commerce applications<\/strong>.<\/p>\n\n\n\n<p>By <strong>combining POS tagging with chunking<\/strong>, NLP models can <strong>extract structured data<\/strong> from text more effectively, leading to <strong>better automation, search relevance, and business intelligence<\/strong>. &#x1f680;<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is Part of Speech (POS) Tagging? Introduction to Parts of Speech Part of Speech (POS) tagging is a fundamental concept in linguistics and Natural Language Processing (NLP) that classifies words based on their grammatical roles in a sentence. 
In the English language, words are categorized into eight main parts of speech, each serving a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4622,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[498],"tags":[1289,1290,1287,1288],"class_list":["post-4571","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence-tutorials","tag-chunking","tag-ntlk","tag-parts-of-speech","tag-tagging"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/4571","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=4571"}],"version-history":[{"count":0,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/4571\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/4622"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=4571"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=4571"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=4571"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}