{"id":10633,"date":"2022-02-15T15:18:45","date_gmt":"2022-02-15T09:48:45","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=10633"},"modified":"2022-02-17T19:18:17","modified_gmt":"2022-02-17T13:48:17","slug":"using-pandas-in-python","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/using-pandas-in-python\/","title":{"rendered":"Using Pandas in Python"},"content":{"rendered":"\n<p>Pandas is a Python library for working with data sets. It provides functions for analyzing, cleaning, exploring, and manipulating data. The name pandas refers to both \u201cpanel data\u201d and \u201cPython data analysis\u201d; the library was created by Wes McKinney in 2008.<\/p>\n\n\n\n<p>Pandas is used to explore large data sets and to draw conclusions based on statistical theory. It can clean messy data sets and make them readable and relevant, which is very important in data science. For example, pandas can delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data. Pandas is an open-source Python library for high-performance data manipulation and analysis, built around its powerful data structures. Python with pandas is used in a variety of academic and commercial domains, including finance, economics, statistics, advertising, and web analytics. 
With pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, organize, manipulate, model, and analyze.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Key features:<\/strong><\/h2>\n\n\n\n<ol class=\"wp-block-list\"><li>A fast and efficient DataFrame object with default and customised indexing.<\/li><li>Tools for loading data into in-memory data objects from many different file formats.<\/li><li>Data alignment and integrated handling of missing data.<\/li><li>Reshaping and pivoting of data sets.<\/li><li>Label-based slicing, indexing, and subsetting of large data sets.<\/li><li>Columns can be inserted into or deleted from a data structure.<\/li><li>Grouping of data for aggregation and transformation.<\/li><li>High-performance joining of data.<\/li><li>Time series functionality.<\/li><\/ol>\n\n\n\n<p>Pandas provides two main data structures:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Series<\/li><li>DataFrame<\/li><\/ol>\n\n\n\n<p>These data structures are built on top of NumPy arrays, which makes them fast and efficient.<\/p>\n\n\n\n<p><strong>Dimension and description<\/strong><\/p>\n\n\n\n<p>A good way to think of these data structures is that a higher-dimensional data structure is a container of its lower-dimensional data structure. 
For example, a DataFrame is a container of Series. Older pandas releases also provided a three-dimensional Panel, a container of DataFrames, but it has since been removed from the library.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Data structure<\/strong><\/td><td><strong>Dimension<\/strong><\/td><td><strong>Description<\/strong><\/td><\/tr><tr><td>Series<\/td><td>1<\/td><td>1D labeled homogeneous array, size-immutable.<\/td><\/tr><tr><td>DataFrame<\/td><td>2<\/td><td>General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The DataFrame is the most widely used and most important of these data structures.<\/p>\n\n\n\n<p>A Series is a one-dimensional array-like structure holding homogeneous data. For example, a Series holding the collection of integers 10, 23, 56, and so on can be written as<\/p>\n\n\n\n<p>10, 23, 56, 17, 52, 61, 73, 90, 26, 72<\/p>\n\n\n\n<p>The main points of a Series are:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Homogeneous data<\/li><li>Size immutable<\/li><li>Values of data mutable<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>DataFrame<\/strong><\/h2>\n\n\n\n<p>A DataFrame is a two-dimensional array with heterogeneous data. For example:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>Name<\/td><td>Age<\/td><td>Gender<\/td><td>Rating<\/td><\/tr><tr><td>Raghav<\/td><td>32<\/td><td>Male<\/td><td>3.45<\/td><\/tr><tr><td>Mia<\/td><td>28<\/td><td>Female<\/td><td>4.6<\/td><\/tr><tr><td>Rahul<\/td><td>45<\/td><td>Male<\/td><td>3.9<\/td><\/tr><tr><td>Meenal<\/td><td>38<\/td><td>Female<\/td><td>2.78<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>The table represents data for the sales team of an organization, with each member\u2019s overall performance rating. The data is represented in rows and columns. 
Each column represents an attribute and each row represents a person.<\/p>\n\n\n\n<p>Main points of a DataFrame:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Heterogeneous data<\/li><li>Size mutable<\/li><li>Data mutable<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Working with pandas<\/strong><\/h2>\n\n\n\n<p><strong>Loading and saving data with pandas<\/strong><\/p>\n\n\n\n<p>When we use pandas for data analysis, we usually start in one of three ways:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>By converting a Python list, dictionary, or NumPy array to a pandas DataFrame.<\/li><li>By opening a local file with pandas, usually a CSV file, but it could also be a delimited text file, an Excel file, etc.<\/li><li>By opening a remote file or database, such as a CSV or JSON file on a website through a URL, or by reading from an SQL table or database.<\/li><\/ol>\n\n\n\n<p>There is a different command for each of these options, but when we open a file the command generally looks like<\/p>\n\n\n\n<p>pd.read_filetype()<\/p>\n\n\n\n<p>Pandas can work with many file types, so we replace \u201cfiletype\u201d with the actual file type, for example pd.read_csv(), pd.read_excel(), or pd.read_json(). The path, filename, and other options go inside the parentheses.<\/p>\n\n\n\n<p><strong>Questions<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>What is meant by Python pandas? Explain its features.<\/li><li>What are the data structures of Python pandas?<\/li><\/ol>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Python panda is considered a python library used for working with data sets. This functions for analyzing, cleaning, exploring, and manipulating data. Its name panda is taken as both \u201cpanel data\u201d and \u201cPython data analysis\u201d and is created by Wes McKinney in 2008. 
Pandas used to get to know big data and also make conclusions [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":10675,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[342],"tags":[],"class_list":["post-10633","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-python-tutorials"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/10633","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=10633"}],"version-history":[{"count":0,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/10633\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/10675"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=10633"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=10633"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=10633"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}