What is Pandas?

Pandas is a Python library used for:

Data analysis
Data cleaning
Handling tables and CSV files
Working with structured data easily

Main data structures:

Series → 1D data
DataFrame → 2D tabular data

Installation

Install Pandas

pip install pandas

Importing Pandas

import pandas as pd

pd is an alias used for shorter syntax.

Series in Pandas

`pd.Series()`

Creates a one-dimensional labeled array.

s = pd.Series([10,20,30,40,50])

s

Output

0    10
1    20
2    30
3    40
4    50
dtype: int64

Explanation

Left side → index
Right side → values
dtype → datatype of values

Series Attributes

`.dtype`

Returns datatype of Series values.

s.dtype

Output

dtype('int64')

Explanation

All values are integers, so datatype is int64.

`.values`

Returns all values as NumPy array.

s.values

Output

array([10, 20, 30, 40, 50])

Explanation

Converts Series values into NumPy array format.

`.index`

Returns indexes of the Series.

s.index

Output

RangeIndex(start=0, stop=5, step=1)

Explanation

Indexes start from 0 and end at 4; the stop value 5 is excluded.

`.name`

Gets or sets the name of the Series.

s.name = "number"

s

Output

0    10
1    20
2    30
3    40
4    50
Name: number, dtype: int64

Explanation

The Series now has label "number".
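The four attributes above can be read together in one short script. This is a minimal sketch using the same Series as the notes; the variable names `value_dtype`, `raw_values`, and `labels` are only illustrative.

```python
import pandas as pd

# Same Series as in the notes above
s = pd.Series([10, 20, 30, 40, 50])
s.name = "number"              # .name can be assigned after creation

value_dtype = s.dtype          # datatype of the stored values
raw_values = s.values          # underlying NumPy array
labels = list(s.index)         # RangeIndex expanded to a plain list

print(value_dtype)
print(raw_values)
print(labels)
print(s.name)
```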

Indexing in Pandas Series

Access Single Value

s[0]

Output

10

Explanation

Gets value present at index 0.

Slicing

s[0:2]

Output

0    10
1    20
Name: number, dtype: int64

Explanation

Returns values from index 0 to 1.
End index is excluded.

`iloc` → Position Based Indexing

Uses numeric positions.

Single Position

s.iloc[3]

Output

40

Explanation

Returns value at position 3.

Multiple Positions

s.iloc[[1,2,3]]

Output

1    20
2    30
3    40
Name: number, dtype: int64

Explanation

Fetches multiple positions together.
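As a compact recap of position-based access, the sketch below collects the three forms of `iloc` in one place. It rebuilds the same Series so that it runs on its own; the result variable names are illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], name="number")

single = s.iloc[3]             # one position returns a single value
several = s.iloc[[1, 2, 3]]    # a list of positions returns a Series
first_two = s.iloc[0:2]        # a slice excludes the end position

print(single)
print(several)
print(first_two)
```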

Custom Index

index = ["apple","banana","grapes","orange","guava"]

s.index = index

s

Output

apple     10
banana    20
grapes    30
orange    40
guava     50
Name: number, dtype: int64

Explanation

Numeric indexes replaced with custom labels.

Label Based Access

s['apple']

Output

10

Explanation

Returns value associated with label "apple".

`loc` → Label Based Indexing

Includes both start and end labels.

s.loc['banana':'orange']

Output

banana    20
grapes    30
orange    40
Name: number, dtype: int64

Explanation

Returns values from "banana" to "orange" inclusive.
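The difference between label slices and position slices is easy to miss, so here is a small side-by-side sketch. It rebuilds the Series with the custom fruit index from the notes; `by_label` and `by_position` are illustrative names.

```python
import pandas as pd

s = pd.Series(
    [10, 20, 30, 40, 50],
    index=["apple", "banana", "grapes", "orange", "guava"],
    name="number",
)

# Label slice: both "banana" and "orange" are included
by_label = s.loc["banana":"orange"]

# Position slice: position 3 ("orange") is excluded
by_position = s.iloc[1:3]

print(by_label)
print(by_position)
```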

Creating Series from Dictionary

fruit_protein = {
    "apple":10,
    "banana":20,
    "grapes":30,
    "orange":40,
    "guava":50
}

s2 = pd.Series(fruit_protein)

s2

Output

apple     10
banana    20
grapes    30
orange    40
guava     50
dtype: int64

Explanation

Dictionary keys become indexes and values become Series values.

Conditional Indexing

Filtering Values

s2[s2 > 20]

Output

grapes    30
orange    40
guava     50
dtype: int64

Explanation

Returns only values greater than 20.

Logical Operators

AND `&`

s2[(s2>30) & (s2<50)]

Output

orange    40
dtype: int64

Explanation

Both conditions must be true.

OR `|`

s2[(s2>30) | (s2<50)]

Output

apple     10
banana    20
grapes    30
orange    40
guava     50
dtype: int64

Explanation

At least one condition should be true.

NOT `~`

s2[~(s2>10)]

Output

apple    10
dtype: int64

Explanation

Reverses the condition.
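Conditional indexing works because a comparison on a Series produces a boolean Series (a mask) with the same index. The sketch below makes the mask visible; the `either` condition is a different, illustrative pair of thresholds chosen so that OR does not select every row. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` and `<`.

```python
import pandas as pd

s2 = pd.Series(
    {"apple": 10, "banana": 20, "grapes": 30, "orange": 40, "guava": 50}
)

mask = s2 > 20                        # boolean Series, one flag per label
both = s2[(s2 > 30) & (s2 < 50)]      # AND: both conditions true
either = s2[(s2 < 20) | (s2 > 40)]    # OR: at least one condition true
negated = s2[~(s2 > 10)]              # NOT: flips every flag in the mask

print(mask)
print(both)
print(either)
print(negated)
```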

Modifying Series

s2['apple'] = 100

s2

Output

apple     100
banana     20
grapes     30
orange     40
guava      50
dtype: int64

Explanation

Updates value of "apple".

DataFrame in Pandas

`pd.DataFrame()`

Creates table-like data. Here `data` is a dictionary that maps each column name to a list of values.

df = pd.DataFrame(data)

df

Output

   Name  Age  Salary Department
0  John   25   50000         IT
1  Jane   30   60000         HR
2  Jack   35   70000    Finance
3  Jill   40   80000  Marketing

Explanation

Rows and columns together form a DataFrame.
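The notes do not show how `data` is defined, so the sketch below reconstructs one possible definition from the output above. The dictionary layout is an assumption; the values are taken from the printed table.

```python
import pandas as pd

# Assumed definition of `data`, reconstructed from the printed output
data = {
    "Name": ["John", "Jane", "Jack", "Jill"],
    "Age": [25, 30, 35, 40],
    "Salary": [50000, 60000, 70000, 80000],
    "Department": ["IT", "HR", "Finance", "Marketing"],
}

# Each dictionary key becomes a column; rows get a default 0..3 index
df = pd.DataFrame(data)
print(df)
```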

`head()`

Returns the first n rows (5 by default).

df.head(2)

Output

   Name  Age  Salary Department
0  John   25   50000         IT
1  Jane   30   60000         HR

Explanation

Useful for previewing dataset.

`tail()`

Returns the last n rows (5 by default).

df.tail()

Explanation

Shows ending rows of dataset.

`iloc` in DataFrame

df.iloc[1:3]

Output

   Name  Age  Salary Department
1  Jane   30   60000         HR
2  Jack   35   70000    Finance

Explanation

Selects rows using positions.

`loc` in DataFrame

df.loc[1:3, ["Name","Age"]]

Output

   Name  Age
1  Jane   30
2  Jack   35
3  Jill   40

Explanation

Selects rows and specific columns using labels.
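Note that `df.iloc[1:3]` returned two rows while `df.loc[1:3, ...]` returned three. The following sketch puts the two calls next to each other on an assumed reconstruction of the employee table (only the columns needed here).

```python
import pandas as pd

# Assumed reconstruction of the employee table used in the notes
df = pd.DataFrame(
    {
        "Name": ["John", "Jane", "Jack", "Jill"],
        "Age": [25, 30, 35, 40],
        "Salary": [50000, 60000, 70000, 80000],
    }
)

by_position = df.iloc[1:3]                 # positions 1 and 2, end excluded
by_label = df.loc[1:3, ["Name", "Age"]]    # labels 1, 2 and 3, end included

print(by_position)
print(by_label)
```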

`drop()`

Removes rows or columns. Returns a new DataFrame; the original stays unchanged unless the result is assigned back.

df.drop("Age", axis=1)

Explanation

axis=1 → column
axis=0 → row
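A short sketch of both axes, using an assumed three-column version of the employee table. It also shows that `drop()` leaves the original DataFrame untouched.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Jane", "Jack", "Jill"],
        "Age": [25, 30, 35, 40],
        "Salary": [50000, 60000, 70000, 80000],
    }
)

without_age = df.drop("Age", axis=1)       # drop a column by name
without_first_row = df.drop(0, axis=0)     # drop a row by index label

print(without_age.columns.tolist())
print(without_first_row.index.tolist())
print(df.columns.tolist())                 # original still has "Age"
```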

`.shape`

Returns dataset dimensions.

df.shape

Output

(4, 4)

Explanation

4 rows and 4 columns.

`info()`

Shows dataset summary.

df.info()

Explanation

Displays:

columns
datatype
null values
memory usage

`describe()`

Shows statistical summary.

df.describe()

Explanation

Provides:

mean
std
min
max
quartiles

Broadcasting

Applies an operation to every value in a column at once, without writing a loop.

df["Salary"] = df["Salary"] + 10000

Explanation

Adds 10000 to every salary value.
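A runnable version of the salary raise, as a minimal sketch. The small DataFrame is an assumed stand-in for the employee table in the notes.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Jane", "Jack", "Jill"],
        "Salary": [50000, 60000, 70000, 80000],
    }
)

# The scalar 10000 is broadcast across the whole column
df["Salary"] = df["Salary"] + 10000

print(df)
```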

`rename()`

Renames column names.

df.rename(columns={"Name":"Employee Name"}, inplace=True)

Explanation

Changes "Name" column to "Employee Name".

`unique()`

Returns unique values.

df["Department"].unique()

Output

['IT' 'HR' 'Finance' 'Marketing']

Explanation

Returns the distinct values as an array, in order of first appearance.

`value_counts()`

Counts occurrences of values.

df["Department"].value_counts()

Explanation

Counts frequency of each department.
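In the employee table every department appears once, so the counts are all 1. The sketch below uses a hypothetical column with repeated values (not from the notes) to make the difference between `unique()` and `value_counts()` visible.

```python
import pandas as pd

# Hypothetical department column with repeats, for illustration only
departments = pd.Series(["IT", "HR", "IT", "Finance", "IT", "HR"])

distinct = departments.unique()        # distinct values, first-seen order
counts = departments.value_counts()    # frequencies, most common first

print(distinct)
print(counts)
```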

Missing Values

`isnull()`

Returns a boolean DataFrame that marks missing values. Here `df1` is a DataFrame that contains some missing values.

df1.isnull()

`isnull().sum()`

Counts missing values column-wise.

df1.isnull().sum()

`dropna()`

Removes missing values.

df1.dropna()

Explanation

Returns a copy without the rows that contain null values.

`fillna()`

Fills missing values.

df1.fillna(0)

Explanation

Replaces null values with 0.

Fill Missing Values with Mean

df1['Age'].fillna(df1['Age'].mean())

Explanation

Uses average age to replace null.

Forward Fill

df1['Age'].ffill()

Explanation

Uses previous row value.

Backward Fill

df1['Age'].bfill()

Explanation

Uses next row value.
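The notes never define `df1`, so the sketch below builds a small hypothetical one with two missing ages and runs each missing-value method on it. It uses the `ffill()` and `bfill()` methods for forward and backward fill, since the `method=` argument of `fillna()` is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical df1 with missing ages, for illustration only
df1 = pd.DataFrame(
    {
        "Name": ["John", "Jane", "Jack", "Jill"],
        "Age": [25.0, np.nan, 35.0, np.nan],
    }
)

missing_per_column = df1.isnull().sum()               # Age has 2 nulls
dropped = df1.dropna()                                # keeps John and Jack
mean_filled = df1["Age"].fillna(df1["Age"].mean())    # mean of 25 and 35 is 30
forward = df1["Age"].ffill()                          # 25, 25, 35, 35
backward = df1["Age"].bfill()                         # 25, 35, 35, NaN

print(missing_per_column)
print(dropped)
print(mean_filled)
print(forward)
print(backward)
```

The last value of `backward` stays missing because there is no later row to copy from.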

`replace()`

Replaces specific values.

df1["Name"].replace('Jack','Tharun')

Explanation

Returns a new Series with "Jack" changed to "Tharun"; assign the result back to keep the change.

Duplicates

`duplicated()`

Finds duplicate rows.

df[df.duplicated()]
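`duplicated()` returns a boolean Series that is True for every row that repeats an earlier row. A minimal sketch on a hypothetical DataFrame with one repeated row:

```python
import pandas as pd

# Hypothetical data: the third row repeats the first
df_dup = pd.DataFrame(
    {"Name": ["John", "Jane", "John"], "Age": [25, 30, 25]}
)

flags = df_dup.duplicated()    # first occurrence is not flagged
repeats = df_dup[flags]        # only the repeated rows

print(flags)
print(repeats)
```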

Lambda Functions

`apply()`

Applies function to every value.

df["Age"] = df["Age"].apply(lambda x: x/2)

Explanation

Divides every age by 2.
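A small sketch of `apply()` on a standalone Series of ages. For simple arithmetic like this, the broadcast form `ages / 2` gives the same result; `apply()` is more useful when the function has no vectorized equivalent.

```python
import pandas as pd

ages = pd.Series([25, 30, 35, 40])

halved = ages.apply(lambda x: x / 2)   # function called once per value
vectorized = ages / 2                  # same result through broadcasting

print(halved)
print(halved.equals(vectorized))
```

Dividing turns the integer ages into floats.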

Concatenation

`concat()`

Combines DataFrames.

pd.concat([df,df_department])

Explanation

Stacks DataFrames vertically.
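Vertical stacking is easiest to see with two frames that share the same columns. The frames `first` and `second` below are hypothetical. By default the original index labels are kept, so they repeat; `ignore_index=True` renumbers the rows.

```python
import pandas as pd

# Hypothetical frames with identical columns
first = pd.DataFrame({"Name": ["John", "Jane"], "Age": [25, 30]})
second = pd.DataFrame({"Name": ["Jack", "Jill"], "Age": [35, 40]})

stacked = pd.concat([first, second])                        # index 0, 1, 0, 1
renumbered = pd.concat([first, second], ignore_index=True)  # index 0, 1, 2, 3

print(stacked)
print(renumbered)
```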

Merge / Join

`merge()`

Joins DataFrames using a common column (an inner join by default).

pd.merge(df,df_department,on="Department")

Explanation

Combines matching department rows together.
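The notes do not show `df_department`, so the sketch below assumes a small lookup table with a hypothetical `Floor` column. With the default inner join, only departments present in both frames survive.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["John", "Jane", "Jack"],
        "Department": ["IT", "HR", "Finance"],
    }
)

# Hypothetical department table; "Floor" is an invented column
df_department = pd.DataFrame(
    {"Department": ["IT", "HR", "Legal"], "Floor": [1, 2, 3]}
)

# Inner join: Finance and Legal have no match and are dropped
merged = pd.merge(df, df_department, on="Department")

print(merged)
```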

Reading CSV Files

`read_csv()`

Reads CSV files into DataFrame.

data = pd.read_csv("/content/house-prices.csv")
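To try `read_csv()` without the file above, the sketch below reads CSV text from memory with `io.StringIO`. The column names and values are invented for illustration and are not the contents of `house-prices.csv`.

```python
import io

import pandas as pd

# Invented CSV content, standing in for a real file on disk
csv_text = "price,bedrooms\n100000,2\n150000,3\n"

# read_csv accepts a path or any file-like object
data = pd.read_csv(io.StringIO(csv_text))

print(data)
print(data.shape)
```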

Dataset Summary

`data.info()`

Shows column information.

data.info()

Statistical Summary

`data.describe()`

Shows statistical details.

data.describe()
