What is Pandas?

Pandas is a Python library used for:
  • Data analysis
  • Data cleaning
  • Handling tables and CSV files
  • Working with structured data easily
Main data structures:
  • Series → 1D data
  • DataFrame → 2D tabular data

Installation

Install Pandas

pip install pandas

Importing Pandas

import pandas as pd
pd is the conventional alias for pandas; it keeps later calls short.

Series in Pandas

pd.Series()

Creates a one-dimensional labeled array.
s = pd.Series([10,20,30,40,50])

s

Output

0    10
1    20
2    30
3    40
4    50
dtype: int64

Explanation

  • Left side → index
  • Right side → values
  • dtype → datatype of values

Series Attributes

.dtype

Returns datatype of Series values.
s.dtype

Output

dtype('int64')

Explanation

All values are integers, so datatype is int64.

.values

Returns all values as NumPy array.
s.values

Output

array([10, 20, 30, 40, 50])

Explanation

Converts Series values into NumPy array format.

.index

Returns indexes of the Series.
s.index

Output

RangeIndex(start=0, stop=5, step=1)

Explanation

The index runs from 0 to 4; stop=5 is exclusive.

.name

Gets or sets the name of the Series.
s.name = "number"

s

Output

0    10
1    20
2    30
3    40
4    50
Name: number, dtype: int64

Explanation

The Series is now named "number".

Indexing in Pandas Series

Access Single Value

s[0]

Output

10

Explanation

Gets value present at index 0.

Slicing

s[0:2]

Output

0    10
1    20
dtype: int64

Explanation

Returns values from index 0 to 1.
End index is excluded.

iloc → Position Based Indexing

Uses numeric positions.

Single Position

s.iloc[3]

Output

40

Explanation

Returns value at position 3.

Multiple Positions

s.iloc[[1,2,3]]

Output

1    20
2    30
3    40
dtype: int64

Explanation

Fetches multiple positions together.

Custom Index

index = ["apple","banana","grapes","orange","guava"]

s.index = index

s

Output

apple     10
banana    20
grapes    30
orange    40
guava     50
Name: number, dtype: int64

Explanation

Numeric indexes replaced with custom labels.

Label Based Access

s['apple']

Output

10

Explanation

Returns value associated with label "apple".

loc → Label Based Indexing

Selects by label. A label slice includes both the start and end labels.
s.loc['banana':'orange']

Output

banana    20
grapes    30
orange    40
Name: number, dtype: int64

Explanation

Returns values from "banana" to "orange" inclusive.
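
The difference between label slices and position slices is easy to miss, so here is a minimal sketch that puts them side by side. It rebuilds the fruit Series from the earlier sections; the variable names by_label and by_position are only illustrative.

```python
import pandas as pd

# Same values and labels as the Series used above.
s = pd.Series([10, 20, 30, 40, 50],
              index=["apple", "banana", "grapes", "orange", "guava"])

# Label-based slice: both "banana" and "orange" are included.
by_label = s.loc["banana":"orange"]

# Position-based slice: position 3 ("orange") is excluded.
by_position = s.iloc[1:3]

print(by_label.tolist())     # [20, 30, 40]
print(by_position.tolist())  # [20, 30]
```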

Creating Series from Dictionary

fruit_protein = {
    "apple":10,
    "banana":20,
    "grapes":30,
    "orange":40,
    "guava":50
}

s2 = pd.Series(fruit_protein)

s2

Output

apple     10
banana    20
grapes    30
orange    40
guava     50
dtype: int64

Explanation

Dictionary keys become indexes and values become Series values.

Conditional Indexing

Filtering Values

s2[s2 > 20]

Output

grapes    30
orange    40
guava     50
dtype: int64

Explanation

Returns only values greater than 20.

Logical Operators

AND &

s2[(s2>30) & (s2<50)]

Output

orange    40
dtype: int64

Explanation

Both conditions must be true.

OR |

s2[(s2>30) | (s2<50)]

Output

apple     10
banana    20
grapes    30
orange    40
guava     50
dtype: int64

Explanation

At least one condition should be true.

NOT ~

s2[~(s2>10)]

Output

apple    10
dtype: int64

Explanation

Reverses the condition.
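
The three operators can be tried together in one short sketch. Each comparison produces a boolean mask, and the parentheses around each comparison are required because & and | bind more tightly than > and <. The mask names below are illustrative, not part of the original notes.

```python
import pandas as pd

s2 = pd.Series({"apple": 10, "banana": 20, "grapes": 30,
                "orange": 40, "guava": 50})

# Each comparison yields a boolean Series (a mask).
above_30 = s2 > 30
below_50 = s2 < 50

both = s2[above_30 & below_50]      # rows where both masks are True
either = s2[above_30 | below_50]    # rows where at least one mask is True
not_above_10 = s2[~(s2 > 10)]       # rows where the condition is False

print(both.index.tolist())          # ['orange']
print(len(either))                  # 5
print(not_above_10.index.tolist())  # ['apple']
```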

Modifying Series

s2['apple'] = 100

s2

Output

apple     100
banana     20
grapes     30
orange     40
guava      50
dtype: int64

Explanation

Updates value of "apple".

DataFrame in Pandas

pd.DataFrame()

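Creates table-like data. The input (data below) can be, for example, a dictionary that maps each column name to a list of values.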
df = pd.DataFrame(data)

df

Output

   Name  Age  Salary Department
0  John   25   50000         IT
1  Jane   30   60000         HR
2  Jack   35   70000    Finance
3  Jill   40   80000  Marketing

Explanation

Rows and columns together form a DataFrame.
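
The notes do not show how data was defined. One possible definition, reconstructed from the output table above and therefore an assumption, is a dictionary of columns:

```python
import pandas as pd

# Assumed definition, reconstructed from the output shown above.
data = {
    "Name": ["John", "Jane", "Jack", "Jill"],
    "Age": [25, 30, 35, 40],
    "Salary": [50000, 60000, 70000, 80000],
    "Department": ["IT", "HR", "Finance", "Marketing"],
}

df = pd.DataFrame(data)
print(df.shape)  # (4, 4)
```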

head()

Returns the first n rows (5 by default).
df.head(2)

Output

   Name  Age  Salary Department
0  John   25   50000         IT
1  Jane   30   60000         HR

Explanation

Useful for previewing dataset.

tail()

Returns the last n rows (5 by default).
df.tail()

Explanation

Shows ending rows of dataset.

iloc in DataFrame

df.iloc[1:3]

Output

   Name  Age  Salary Department
1  Jane   30   60000         HR
2  Jack   35   70000    Finance

Explanation

Selects rows by position. The end position (3) is excluded.

loc in DataFrame

df.loc[1:3, ["Name","Age"]]

Output

   Name  Age
1  Jane   30
2  Jack   35
3  Jill   40

Explanation

Selects rows and specific columns using labels. The end label (3) is included.
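
The same slice 1:3 returns a different number of rows under iloc and loc. A small sketch, assuming the four-row employee table from the output above:

```python
import pandas as pd

# Assumed to match the table shown earlier.
df = pd.DataFrame({
    "Name": ["John", "Jane", "Jack", "Jill"],
    "Age": [25, 30, 35, 40],
})

rows_by_position = df.iloc[1:3]              # positions 1 and 2
rows_by_label = df.loc[1:3, ["Name", "Age"]]  # labels 1, 2 and 3

print(len(rows_by_position))  # 2
print(len(rows_by_label))     # 3
```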

drop()

Removes rows or columns and returns a new DataFrame. df itself is unchanged unless the result is assigned back.
df.drop("Age", axis=1)

Explanation

  • axis=1 → column
  • axis=0 → row
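
Both axes can be tried on a small table. This is a sketch under the assumption that df looks like the employee table above; without_age and without_first are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Jack", "Jill"],
    "Age": [25, 30, 35, 40],
})

# axis=1 drops a column; df.drop(columns="Age") is an equivalent spelling.
without_age = df.drop("Age", axis=1)

# axis=0 drops a row by its index label.
without_first = df.drop(0, axis=0)

print(list(without_age.columns))     # ['Name']
print(without_first.index.tolist())  # [1, 2, 3]
print("Age" in df.columns)           # True, the original is untouched
```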

.shape

Returns dataset dimensions.
df.shape

Output

(4, 4)

Explanation

4 rows and 4 columns.

info()

Shows dataset summary.
df.info()

Explanation

Displays:
  • columns
  • datatype
  • null values
  • memory usage

describe()

Shows statistical summary.
df.describe()

Explanation

Provides:
  • mean
  • std
  • min
  • max
  • quartiles

Broadcasting

Applies a scalar operation to every value in a column.
df["Salary"] = df["Salary"] + 10000

Explanation

Adds 10000 to every salary value.
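
As a quick check of the effect, the sketch below applies the same operation to the salary values from the table above (assumed data) and prints the result.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Jack", "Jill"],
    "Salary": [50000, 60000, 70000, 80000],
})

# The scalar 10000 is broadcast across the whole column; no loop is needed.
df["Salary"] = df["Salary"] + 10000

print(df["Salary"].tolist())  # [60000, 70000, 80000, 90000]
```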

rename()

Renames column names.
df.rename(columns={"Name":"Employee Name"}, inplace=True)

Explanation

Changes "Name" column to "Employee Name".

unique()

Returns unique values.
df["Department"].unique()

Output

['IT' 'HR' 'Finance' 'Marketing']

Explanation

Removes duplicates and shows distinct values.

value_counts()

Counts occurrences of values.
df["Department"].value_counts()

Explanation

Counts frequency of each department.
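
In the table above every department appears exactly once, so the counts are all 1. The sketch below uses a hypothetical column with repeats to make the behaviour more visible; results are sorted from most to least frequent.

```python
import pandas as pd

# Hypothetical data with repeated departments.
departments = pd.Series(["IT", "HR", "IT", "Finance", "IT", "HR"])

counts = departments.value_counts()

print(counts["IT"])       # 3
print(counts["HR"])       # 2
print(counts["Finance"])  # 1
```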

Missing Values

isnull()

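Returns a DataFrame of booleans that is True wherever a value is missing. In this section, df1 stands for a DataFrame that contains missing values.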
df1.isnull()

isnull().sum()

Counts missing values column-wise.
df1.isnull().sum()

dropna()

Removes missing values.
df1.dropna()

Explanation

Returns a copy without the rows that contain any null value. df1 itself is unchanged.
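
df1 is never defined in these notes, so the sketch below builds a small hypothetical df1 with two missing ages to show what isnull().sum() and dropna() return.

```python
import numpy as np
import pandas as pd

# Hypothetical df1: two of the four ages are missing.
df1 = pd.DataFrame({
    "Name": ["John", "Jane", "Jack", "Jill"],
    "Age": [25, np.nan, 35, np.nan],
})

missing_per_column = df1.isnull().sum()  # count of nulls in each column
cleaned = df1.dropna()                   # rows with any null removed

print(missing_per_column["Age"])  # 2
print(len(cleaned))               # 2
print(len(df1))                   # 4
```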

fillna()

Fills missing values.
df1.fillna(0)

Explanation

Replaces null values with 0.

Fill Missing Values with Mean

df1['Age'].fillna(df1['Age'].mean())

Explanation

Replaces nulls in Age with the column mean. The result is a new Series; assign it back to df1['Age'] to keep it.
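
A worked example with hypothetical ages: the mean skips nulls, so the mean of 25 and 35 (30) fills both gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical df1 with two missing ages.
df1 = pd.DataFrame({"Age": [25, np.nan, 35, np.nan]})

mean_age = df1["Age"].mean()                 # NaN values are skipped
df1["Age"] = df1["Age"].fillna(mean_age)     # assign back to keep the result

print(mean_age)              # 30.0
print(df1["Age"].tolist())   # [25.0, 30.0, 35.0, 30.0]
```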

Forward Fill

df1['Age'].ffill()

Explanation

Fills each null with the previous non-null value. The older spelling fillna(method='ffill') is deprecated since pandas 2.1.

Backward Fill

df1['Age'].bfill()

Explanation

Fills each null with the next non-null value. The older spelling fillna(method='bfill') is deprecated since pandas 2.1.
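
The two fill directions behave differently at the edges of the data. In this sketch with hypothetical ages, the last value has no later row to copy from, so backward fill leaves it missing.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; the second and last values are missing.
ages = pd.Series([25, np.nan, 35, np.nan])

forward = ages.ffill()   # copy the previous non-null value downward
backward = ages.bfill()  # copy the next non-null value upward

print(forward.tolist())          # [25.0, 25.0, 35.0, 35.0]
print(pd.isna(backward.iloc[3])) # True, nothing follows the last row
```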

replace()

Replaces specific values.
df1["Name"].replace('Jack','Tharun')

Explanation

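Returns a copy of the Name column with "Jack" changed to "Tharun". The original column is unchanged unless the result is assigned back.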
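
A short sketch on a standalone Series of names (assumed data) shows that replace() returns a new object rather than editing in place.

```python
import pandas as pd

names = pd.Series(["John", "Jane", "Jack", "Jill"])

renamed = names.replace("Jack", "Tharun")

print(renamed.tolist())  # ['John', 'Jane', 'Tharun', 'Jill']
print(names.tolist())    # ['John', 'Jane', 'Jack', 'Jill']
```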

Duplicates

duplicated()

Marks rows that repeat an earlier row. Using the result as a filter shows only those repeated rows.
df[df.duplicated()]
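
The employee table above has no repeated rows, so this sketch uses a hypothetical table with one repeat. It also shows drop_duplicates(), which is not covered in the notes but is the usual companion for removing the rows that duplicated() flags.

```python
import pandas as pd

# Hypothetical table: the third row repeats the first.
people = pd.DataFrame({
    "Name": ["John", "Jane", "John"],
    "Age": [25, 30, 25],
})

mask = people.duplicated()          # True for rows that repeat an earlier row
repeats = people[mask]              # only the repeated rows
deduped = people.drop_duplicates()  # first occurrences kept

print(mask.tolist())   # [False, False, True]
print(len(deduped))    # 2
```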

Lambda Functions

apply()

Applies function to every value.
df["Age"] = df["Age"].apply(lambda x: x/2)

Explanation

Divides every age by 2.
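
For simple arithmetic like this, apply() with a lambda and a plain vectorized expression give the same values. The sketch below, using the ages from the table above, checks that.

```python
import pandas as pd

ages = pd.Series([25, 30, 35, 40])

halved_with_apply = ages.apply(lambda x: x / 2)  # calls the lambda per value
halved_vectorized = ages / 2                     # same result via broadcasting

print(halved_with_apply.tolist())  # [12.5, 15.0, 17.5, 20.0]
print(halved_with_apply.equals(halved_vectorized))  # True
```

apply() is most useful when the per-value logic cannot be written as a single arithmetic expression.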

Concatenation

concat()

Combines DataFrames.
pd.concat([df,df_department])

Explanation

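Stacks the DataFrames vertically (row-wise). Columns that exist in only one DataFrame are filled with NaN in the rows from the other.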
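
df_department is not defined in these notes, so the sketch below uses two small hypothetical tables with the same columns. It also shows ignore_index=True, which renumbers the rows instead of keeping each table's original index.

```python
import pandas as pd

# Two hypothetical tables with identical columns.
first = pd.DataFrame({"Name": ["John", "Jane"], "Age": [25, 30]})
second = pd.DataFrame({"Name": ["Jack", "Jill"], "Age": [35, 40]})

stacked = pd.concat([first, second])                        # keeps old labels
renumbered = pd.concat([first, second], ignore_index=True)  # fresh 0..n-1 index

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```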

Merge / Join

merge()

Joins DataFrames using common column.
pd.merge(df,df_department,on="Department")

Explanation

Keeps the rows whose Department value appears in both DataFrames (an inner join by default) and combines their columns.
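
A minimal sketch of the join, assuming a hypothetical df_department that carries a Manager column (both the column and its values are invented for illustration). The how parameter controls which rows survive.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Jane", "Jack"],
    "Department": ["IT", "HR", "Finance"],
})

# Hypothetical lookup table; Finance is deliberately missing.
df_department = pd.DataFrame({
    "Department": ["IT", "HR"],
    "Manager": ["Asha", "Ravi"],
})

inner = pd.merge(df, df_department, on="Department")             # matches only
left = pd.merge(df, df_department, on="Department", how="left")  # all of df

print(inner["Name"].tolist())  # ['John', 'Jane']
print(len(left))               # 3, Jack's Manager is NaN
```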

Reading CSV Files

read_csv()

Reads CSV files into DataFrame.
data = pd.read_csv("/content/house-prices.csv")
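
The path above points to a file in the author's environment. To try read_csv() without that file, the sketch below reads from an in-memory string instead; the column names and values are hypothetical and not taken from house-prices.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "price,bedrooms\n250000,3\n180000,2\n320000,4\n"

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)            # (3, 2)
print(data["price"].mean())  # 250000.0
```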

Dataset Summary

data.info()

Shows column information.
data.info()

Statistical Summary

data.describe()

Shows statistical details.
data.describe()