🐼 Top 5 Essential Pandas DataFrame Operations for Data Analysis
Whether you’re new to data analysis or brushing up on your Python skills, these five essential Pandas DataFrame operations will help you filter, summarize, combine, and sort data. In this post, we’ll walk through each operation with code examples and actual output so you can follow along step by step.
🎥 This tutorial is also available as a video — perfect for visual learners!
🎬 Intro
“Hey everyone! Welcome back to the channel. In today’s video, we’re diving into the top 5 essential Pandas DataFrame operations that every data analyst and Python enthusiast should know. These operations are crucial for data manipulation tasks like filtering, grouping, merging, and more. Let’s get started!”
1️⃣ Filtering Rows
🎯 Goal: Select specific rows based on conditions
import pandas as pd
# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Department': ['HR', 'Engineering', 'Engineering', 'Marketing', 'HR'],
    'Salary': [50000, 70000, 65000, 60000, 52000]
}
df = pd.DataFrame(data)
# Filter rows
filtered_df = df[(df['Department'] == 'Engineering') & (df['Salary'] > 60000)]
print(filtered_df)
Output:
      Name   Department  Salary
1      Bob  Engineering   70000
2  Charlie  Engineering   65000
✅ We’ve filtered the DataFrame to include only Engineering employees earning more than $60,000. Each condition needs its own parentheses because & binds more tightly than == and >.
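The same kind of filter can be written a few other ways. Here is a minimal sketch, reusing the sample data from this section, that shows a query string, an OR condition, and a membership test with isin. The variable names (via_query, hr_or_high, non_eng) are just illustrative.

```python
import pandas as pd

# Same sample data as in the filtering example
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Department': ['HR', 'Engineering', 'Engineering', 'Marketing', 'HR'],
    'Salary': [50000, 70000, 65000, 60000, 52000],
})

# The Engineering-and-over-60000 filter, written as a query string
via_query = df.query("Department == 'Engineering' and Salary > 60000")

# OR condition: use | instead of &
hr_or_high = df[(df['Department'] == 'HR') | (df['Salary'] >= 65000)]

# Membership test: keep rows whose Department is in a list of values
non_eng = df[df['Department'].isin(['HR', 'Marketing'])]

print(via_query)
print(hr_or_high)
print(non_eng)
```

Which style to use is mostly a matter of taste: boolean masks are explicit, while query strings can be easier to read when there are many conditions.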
2️⃣ Grouping and Aggregation
🎯 Goal: Summarize data by category
# Group by Department and calculate average salary
avg_salary = df.groupby('Department')['Salary'].mean()
print(avg_salary)
Output:
Department
Engineering    67500.0
HR             51000.0
Marketing      60000.0
Name: Salary, dtype: float64
✅ This shows the average salary in each department.
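If you need more than one statistic per group, named aggregation is one option. The sketch below, which assumes the same sample data, computes a headcount, an average, and a maximum in a single call. The output column names (headcount, avg_salary, max_salary) are made up for this example.

```python
import pandas as pd

# Same sample data as in the earlier sections
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Department': ['HR', 'Engineering', 'Engineering', 'Marketing', 'HR'],
    'Salary': [50000, 70000, 65000, 60000, 52000],
})

# Named aggregation: new_column=(source_column, aggregation_function)
summary = df.groupby('Department').agg(
    headcount=('Name', 'count'),
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
)

print(summary)
```

The result is a DataFrame indexed by Department, with one column per statistic.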
3️⃣ Merging DataFrames
🎯 Goal: Combine related data into one DataFrame
# Performance DataFrame
performance_data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Performance Score': [88, 92, 85, 90, 87]
}
performance_df = pd.DataFrame(performance_data)
# Merge on 'Name'
merged_df = pd.merge(df, performance_df, on='Name')
print(merged_df)
Output:
      Name   Department  Salary  Performance Score
0    Alice           HR   50000                 88
1      Bob  Engineering   70000                 92
2  Charlie  Engineering   65000                 85
3    David    Marketing   60000                 90
4      Eve           HR   52000                 87
✅ Now we have salary and performance score combined into one dataset.
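In the example above every name appears in both DataFrames, so the default inner join keeps all five rows. When the keys don't line up, the how argument decides which rows survive. The sketch below uses a hypothetical performance table (Eve is missing and a made-up employee, Frank, is added) to compare an inner join with a left join. The indicator and validate arguments are optional extras for checking a merge.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Department': ['HR', 'Engineering', 'Engineering', 'Marketing', 'HR'],
    'Salary': [50000, 70000, 65000, 60000, 52000],
})

# Hypothetical data for illustration: Eve has no score, Frank has no salary row
performance_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Frank'],
    'Performance Score': [88, 92, 85, 90, 75],
})

# Default how='inner': only names found in both DataFrames are kept
inner = pd.merge(df, performance_df, on='Name')

# how='left': keep every row of df; missing scores become NaN
# indicator=True adds a _merge column showing where each row came from
# validate='one_to_one' raises an error if a name is duplicated on either side
left = pd.merge(
    df, performance_df, on='Name',
    how='left', indicator=True, validate='one_to_one',
)

print(inner)
print(left)
```

With this data, the inner join drops Eve and Frank, while the left join keeps Eve with a missing score and marks her row as left_only.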
4️⃣ Sorting Data
🎯 Goal: Reorder rows based on a column value
# Sort by Performance Score descending
sorted_df = merged_df.sort_values(by='Performance Score', ascending=False)
print(sorted_df)
Output:
      Name   Department  Salary  Performance Score
1      Bob  Engineering   70000                 92
3    David    Marketing   60000                 90
0    Alice           HR   50000                 88
4      Eve           HR   52000                 87
2  Charlie  Engineering   65000                 85
✅ The DataFrame is now sorted from highest to lowest performance. Notice that each row keeps its original index label (1, 3, 0, 4, 2).
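Sorting often involves more than one column. As a rough sketch, here is how you might sort by department and then by salary within each department, renumber the index afterwards, and grab the top two performers with nlargest. The merged data is rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

# Same data as merged_df in the merge example
merged_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Department': ['HR', 'Engineering', 'Engineering', 'Marketing', 'HR'],
    'Salary': [50000, 70000, 65000, 60000, 52000],
    'Performance Score': [88, 92, 85, 90, 87],
})

# Sort by Department A-Z, then by Salary high-to-low within each department
# reset_index(drop=True) replaces the old index labels with 0, 1, 2, ...
by_dept = (
    merged_df
    .sort_values(by=['Department', 'Salary'], ascending=[True, False])
    .reset_index(drop=True)
)

# Shortcut for "top N rows by a column"
top2 = merged_df.nlargest(2, 'Performance Score')

print(by_dept)
print(top2)
```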
5️⃣ Pivot Tables
🎯 Goal: Summarize values by category in a table layout
# Create pivot table: average salary by department
pivot_table = pd.pivot_table(df, values='Salary', index='Department', aggfunc='mean')
print(pivot_table)
Output:
              Salary
Department          
Engineering  67500.0
HR           51000.0
Marketing    60000.0
✅ With one line of code, the pivot table gives the same department averages as the groupby in step 2, returned as a DataFrame instead of a Series.
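Pivot tables become more useful once you add a second dimension with the columns argument. The sample data has no natural second category, so the sketch below adds a hypothetical Level column (the values are invented purely for illustration) and spreads the average salary across it.

```python
import pandas as pd

staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Department': ['HR', 'Engineering', 'Engineering', 'Marketing', 'HR'],
    'Salary': [50000, 70000, 65000, 60000, 52000],
    # Hypothetical column, not part of the original sample data
    'Level': ['Junior', 'Senior', 'Junior', 'Senior', 'Senior'],
})

# Rows are departments, columns are levels, cells hold the average salary
wide = pd.pivot_table(
    staff,
    values='Salary',
    index='Department',
    columns='Level',
    aggfunc='mean',
)

print(wide)
```

Combinations that don't occur in the data, such as a Junior in Marketing here, show up as NaN. Passing fill_value to pivot_table is one way to replace them.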
🎬 Outro
“And that’s a wrap on the top 5 Pandas DataFrame operations! These techniques are fundamental for data analysis and will greatly enhance your data manipulation skills. If you found this post helpful, please leave a comment and share it with your peers. Don’t forget to subscribe to our channel for more tutorials — see you next time!”
🎯 Want this guide as a downloadable PDF? Let us know in the comments!