Data Loading, Cleaning, and Basic Analysis with Pandas



import pandas as pd

# Load data from a CSV file
data = pd.read_csv('data.csv')

# Display the first 5 rows of the DataFrame
print("First 5 rows:\n", data.head())

# Check for missing values
print("\nMissing values:\n", data.isnull().sum())

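# Fill missing ages with the mean of the Age column.
# Assign the result back: calling fillna(inplace=True) on a selected column is
# chained assignment, which recent pandas versions warn about and may not apply.
data['Age'] = data['Age'].fillna(data['Age'].mean())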

# Filter data based on a condition
filtered_data = data[data['Salary'] > 50000]
print("\nFiltered data (Salary > 50000):\n", filtered_data)

# Group data and calculate the average salary by department
grouped_data = data.groupby('Department')['Salary'].mean()
print("\nAverage salary by department:\n", grouped_data)

# Save the cleaned data to a new CSV file
data.to_csv('cleaned_data.csv', index=False)
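
The index=False argument in the last step matters: by default to_csv also writes the DataFrame's row labels, and those labels come back as an extra "Unnamed: 0" column when the file is read again. A small sketch that uses in-memory strings instead of files (the two-row DataFrame is illustrative, with values taken from the sample data below):

```python
import io

import pandas as pd

# A tiny stand-in for the cleaned DataFrame
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [60000, 55000]})

# to_csv returns a string when no path is given
csv_with_index = df.to_csv()                # default: row labels are written
csv_without_index = df.to_csv(index=False)  # row labels are omitted

# Read both strings back and compare the resulting columns
cols_with = list(pd.read_csv(io.StringIO(csv_with_index)).columns)
cols_without = list(pd.read_csv(io.StringIO(csv_without_index)).columns)

print(cols_with)     # ['Unnamed: 0', 'Name', 'Salary']
print(cols_without)  # ['Name', 'Salary']
```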

-

Explanation:

This code demonstrates fundamental Pandas operations: loading data from a CSV file, inspecting the data (first few rows, missing values per column), handling missing data (filling missing ages with the column mean), filtering rows based on a condition, grouping data and computing an aggregate per group, and saving the modified data to a new file.
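The mean-fill step relies on Series.mean() skipping missing values by default, so the fill value is the average of the ages that are actually present. A minimal sketch on an in-memory DataFrame built from the ages in the sample CSV below, with Frank's age missing:

```python
import pandas as pd

# None in a numeric column becomes NaN, and the column is stored as float64
data = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve", "Frank"],
        "Age": [25, 30, 28, 35, 22, None],
    }
)

print(data["Age"].isnull().sum())  # 1 missing value

# NaN is skipped, so this is (25 + 30 + 28 + 35 + 22) / 5
fill_value = data["Age"].mean()
print(fill_value)  # 28.0

# Assign back instead of using inplace=True on the selected column
data["Age"] = data["Age"].fillna(fill_value)
print(data["Age"].isnull().sum())  # 0
```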

Requirements:

  • Pandas library installed (pip install pandas)
  • A CSV file named data.csv in the same directory as the script. The script reads the 'Age', 'Department', and 'Salary' columns, so those must be present; 'Name' is included in the example for readability. Example:

Name,Age,Department,Salary
Alice,25,Sales,60000
Bob,30,Marketing,55000
Charlie,28,Sales,70000
David,35,IT,80000
Eve,22,Marketing,45000
Frank,,Sales,65000
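To try the steps without creating data.csv, read_csv also accepts a file-like object. The sketch below feeds it the sample rows through io.StringIO, with Frank's age written as an empty field, which read_csv treats as missing in every pandas version. The constant name SAMPLE_CSV is only for illustration.

```python
import io

import pandas as pd

# The sample rows from above; Frank's age is an empty (missing) field
SAMPLE_CSV = """Name,Age,Department,Salary
Alice,25,Sales,60000
Bob,30,Marketing,55000
Charlie,28,Sales,70000
David,35,IT,80000
Eve,22,Marketing,45000
Frank,,Sales,65000
"""

# read_csv works on any file-like object, so no file on disk is needed
data = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Same cleaning, filtering, and grouping steps as the script
data["Age"] = data["Age"].fillna(data["Age"].mean())
filtered_data = data[data["Salary"] > 50000]
grouped_data = data.groupby("Department")["Salary"].mean()

print(len(filtered_data))  # 5 (only Eve earns 50000 or less)
print(grouped_data)
```

With this data, the average salaries should come out as 80000 for IT, 50000 for Marketing, and 65000 for Sales, and Frank's missing age is filled with 28.0.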

-

Additional Notes:

Pandas is the standard library for tabular data manipulation and analysis in Python. Its core data structure, the DataFrame, holds data as labeled rows and columns, with each column having its own data type, and supports column-wise operations without explicit loops. This makes it the usual tool for data cleaning, transformation, and preparation for machine learning.
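As a sketch of what working with a DataFrame looks like outside of CSV loading, the example below builds one directly from a dictionary of columns (a few illustrative rows borrowed from the sample data), selects columns, and adds a derived column with vectorized arithmetic. The "Monthly" column name is hypothetical.

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Department": ["Sales", "Marketing", "Sales"],
        "Salary": [60000, 55000, 70000],
    }
)

print(df.shape)  # (3, 3): rows, columns

# One column label gives a Series; a list of labels gives a DataFrame
salaries = df["Salary"]
subset = df[["Name", "Salary"]]

# Arithmetic applies to the whole column at once, with no explicit loop
df["Monthly"] = df["Salary"] / 12
```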
