Lesson 08 · Tools & Methods
Python & Pandas for BA
When Excel starts to slow down as row counts climb toward a million, it is time to move up to Python. Pandas is Excel on steroids. The lesson also covers Jupyter Notebook, matplotlib, and seaborn for professional EDA (exploratory data analysis).
⏱ 35 Minutes · 🎯 Intermediate · 🐍 Python
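1. Why Python?
Excel Has Limits, Python Scales
Excel struggles beyond roughly 100K rows and cannot hold more than 1,048,576 rows per sheet. pandas handles around 10M rows comfortably, provided the data fits in memory. Python work is also reproducible, version-controllable, and integrates directly with APIs, databases, and cloud services. Serious BAs should up-skill to Python in 2026.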
2. Setup Environment
- Anaconda / Miniconda: Anaconda bundles Python, pandas, and Jupyter in one installer; Miniconda is the minimal variant where you add packages yourself.
- Google Colab: free, cloud-based, no installation. A good fit for beginners.
- Jupyter Notebook: interactive code, markdown, and plots in one document.
- VS Code + Python extension: if you prefer a full IDE.
- pip install: pandas, numpy, matplotlib, seaborn, openpyxl.
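Once the environment is set up, a quick check confirms that the packages listed above can be imported. The sketch below is a minimal helper; the name `check_environment` is hypothetical and not part of any library.

```python
import importlib.util

# Packages named in the setup list above.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "openpyxl"]


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


status = check_environment()
for name, installed in status.items():
    print(f"{name:<12} {'OK' if installed else 'MISSING'}")
```

Any package reported as MISSING can be added with `pip install <package>` (or `conda install <package>` in an Anaconda/Miniconda environment).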
3. PANDAS — DataFrame Fundamentals
Import & Load Data
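import pandas as pd
import numpy as np
# Each reader returns a DataFrame; use the one that matches your data source.
df = pd.read_csv('sales.csv')
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')  # reading .xlsx requires openpyxl
df = pd.read_sql(query, connection)  # query: an SQL string; connection: an open database connection
df.head(10)    # first 10 rows
df.info()      # column dtypes and non-null counts
df.describe()  # summary statistics for numeric columns
df.shape       # (number of rows, number of columns)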
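The loading calls above depend on files and a database that are not part of this lesson. The sketch below runs on its own: it reads a small made-up CSV from memory and uses an in-memory SQLite database as a stand-in for a real connection. The sample rows and the table name `sales` are assumptions for illustration.

```python
import io
import sqlite3

import pandas as pd

# Made-up sample standing in for sales.csv.
csv_text = """order_id,country,amount
1,Indonesia,1200
2,Malaysia,450
3,Indonesia,800
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows
df.info()             # dtypes and non-null counts (prints directly)
print(df.describe())  # statistics for numeric columns
print(df.shape)       # (rows, columns)

# read_sql needs a query and an open connection; SQLite in memory is used here.
conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, index=False)
from_db = pd.read_sql("SELECT country, amount FROM sales WHERE amount > 500", conn)
conn.close()
print(from_db)
```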
4. SELECT & FILTER
Indexing & Filtering
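df['name']                        # one column, returned as a Series
df[['name', 'email', 'country']]  # several columns, returned as a DataFrame
df[df['amount'] > 1000]           # rows where amount > 1000
df[(df['country'] == 'Indonesia') & (df['amount'] > 500)]  # combine conditions with &, each in parentheses
df.loc[0:5, ['name', 'amount']]   # label-based: the end label 5 is included
df.iloc[0:5, 2:5]                 # position-based: the end position is excluded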
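The difference between `.loc` and `.iloc` slicing is a common source of off-by-one errors, so it helps to see it on a small table. The data below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ani", "Budi", "Citra", "Dewi", "Eko", "Fajar", "Gita"],
    "country": ["Indonesia", "Malaysia", "Indonesia", "Indonesia",
                "Singapore", "Indonesia", "Malaysia"],
    "amount": [1500, 300, 700, 450, 2000, 900, 1100],
})

# Boolean mask: use & / | / ~ (not and / or / not), one pair of parentheses per condition.
mask = (df["country"] == "Indonesia") & (df["amount"] > 500)
filtered = df[mask]

by_label = df.loc[0:5, ["name", "amount"]]  # labels 0 through 5, end included: 6 rows
by_position = df.iloc[0:5, 0:2]             # positions 0 through 4, end excluded: 5 rows

print(filtered)
print(len(by_label), len(by_position))
```

With the default integer index the labels and positions happen to coincide, which is exactly why the inclusive/exclusive difference is easy to miss.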
5. GROUPBY — Aggregation
Group & Aggregate
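df.groupby('country')['amount'].sum()  # total amount per country
df.groupby('country').agg({            # several aggregations in one call
    'amount': ['sum', 'mean', 'count'],
    'order_id': 'nunique'              # number of distinct orders
})
df.groupby(['country', 'month'])['amount'].sum().unstack()  # countries as rows, months as columns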
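The dictionary form of `agg` shown above produces two-level column names. As an alternative sketch, named aggregation gives flat, readable column names; the data and the output column names (`total_amount`, `avg_amount`, `orders`) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5],
    "country": ["Indonesia", "Indonesia", "Malaysia", "Malaysia", "Indonesia"],
    "month": ["2024-01", "2024-02", "2024-01", "2024-01", "2024-01"],
    "amount": [100, 200, 300, 400, 500],
})

# Named aggregation: new_column=(source_column, function).
summary = df.groupby("country").agg(
    total_amount=("amount", "sum"),
    avg_amount=("amount", "mean"),
    orders=("order_id", "nunique"),
)

# Pivot-style view: one row per country, one column per month.
# fill_value=0 replaces country/month combinations that have no orders.
monthly = df.groupby(["country", "month"])["amount"].sum().unstack(fill_value=0)

print(summary)
print(monthly)
```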
6. MERGE — Join Tables
Pandas Merge / Join
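result = pd.merge(customers, orders, on='customer_id')              # inner join (the default)
result = pd.merge(customers, orders, on='customer_id', how='left')  # keep every customer, matched or not
result = pd.merge(c, o, left_on='id', right_on='cust_id')           # key columns with different names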
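To see how the join type changes the result, the sketch below merges two small made-up tables. It also uses two optional `merge` parameters: `indicator=True` adds a `_merge` column showing where each row came from, and `validate` raises an error if the key relationship is not what you expect.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ani", "Budi", "Citra"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 4],   # customer 4 does not exist in customers
    "amount": [250, 100, 900],
})

# Inner join: only keys present in both tables survive.
inner = pd.merge(customers, orders, on="customer_id")

# Left join: every customer is kept; missing order columns become NaN.
left = pd.merge(
    customers, orders,
    on="customer_id", how="left",
    indicator=True,            # adds the _merge column
    validate="one_to_many",    # fails if customer_id repeats in customers
)

# Customers without any order are flagged as left_only.
no_orders = left.loc[left["_merge"] == "left_only", "name"].tolist()
print(len(inner), len(left), no_orders)
```

Comparing row counts before and after a merge is a cheap way to catch accidental row duplication from non-unique keys.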
7. DATA CLEANING
Handle Missing & Duplicates
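df.isnull().sum()                                 # missing values per column
df = df.dropna(subset=['email'])                  # dropna returns a new DataFrame, so assign it back
df['age'] = df['age'].fillna(df['age'].median())  # assign the result; inplace=True on a single column is unreliable
df = df.drop_duplicates(subset=['email'], keep='first')  # also returns a new DataFrame
df['name'] = df['name'].str.strip().str.title()   # trim whitespace, then Title Case
df['email_lower'] = df['email'].str.lower()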
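The cleaning steps can be tried end to end on a small made-up table. One design choice in this sketch differs from the order above and is an assumption about what is usually wanted: emails are lowercased before deduplication, so `Ani@Mail.com` and `ani@mail.com` count as the same address.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["  ani wijaya ", "BUDI santoso", "Citra", "ani w."],
    "email": ["Ani@Mail.com", None, "citra@mail.com", "ani@mail.com"],
    "age": [30, 25, np.nan, 41],
})

missing = df.isnull().sum()   # missing values per column
print(missing)

# Drop rows without an email; .copy() makes the result an independent table.
clean = df.dropna(subset=["email"]).copy()

# Fill missing ages with the median of the remaining rows.
clean["age"] = clean["age"].fillna(clean["age"].median())

# Normalize text columns.
clean["name"] = clean["name"].str.strip().str.title()
clean["email_lower"] = clean["email"].str.lower()

# Deduplicate on the normalized email, keeping the first occurrence.
clean = clean.drop_duplicates(subset=["email_lower"], keep="first")
print(clean)
```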
8. APPLY & LAMBDA — Custom Logic
Custom Transformation
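# Column-wise: the lambda receives one value at a time.
df['segment'] = df['amount'].apply(
    lambda x: 'High' if x > 1000 else 'Low'
)
# Row-wise: with axis=1 the function receives one row at a time.
def classify(row):
    if row['amount'] > 5000 and row['frequency'] > 10:
        return 'VIP'
    return 'Regular'
df['tier'] = df.apply(classify, axis=1)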
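The sketch below runs both transformations on made-up data and, as an optional alternative, computes the same columns with `numpy.where`. Row-wise `apply` calls a Python function once per row, so the vectorized form is usually faster on large tables; the `_vec` column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "amount": [6000, 800, 7000, 1200],
    "frequency": [12, 3, 5, 20],
})

# apply + lambda on a single column.
df["segment"] = df["amount"].apply(lambda x: "High" if x > 1000 else "Low")


def classify(row):
    """Row-wise rule that uses two columns."""
    if row["amount"] > 5000 and row["frequency"] > 10:
        return "VIP"
    return "Regular"


df["tier"] = df.apply(classify, axis=1)

# Vectorized equivalents: the condition is evaluated for the whole column at once.
df["segment_vec"] = np.where(df["amount"] > 1000, "High", "Low")
df["tier_vec"] = np.where(
    (df["amount"] > 5000) & (df["frequency"] > 10), "VIP", "Regular"
)

print(df)
```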
9. VISUALIZATION with Matplotlib & Seaborn
Quick Visualization
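import matplotlib.pyplot as plt
import seaborn as sns
df.groupby('country')['amount'].sum().plot(kind='bar')  # bar chart: total per country
plt.show()  # render this chart before starting the next one
df.groupby('month')['amount'].sum().plot()              # line chart: monthly trend
plt.show()
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')  # numeric_only skips text columns
plt.show()
sns.histplot(df['amount'], bins=30)                     # distribution of amount
plt.show()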
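When several charts belong together, placing them on explicit axes keeps them from drawing over each other. The sketch below uses only pandas and matplotlib on made-up data, and writes the figure to the system temp directory so it can be shared; the file name `amount_overview.png` is an assumption.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "country": ["Indonesia", "Malaysia", "Indonesia", "Malaysia", "Indonesia", "Malaysia"],
    "month": [1, 1, 2, 2, 3, 3],
    "amount": [100, 80, 150, 90, 170, 120],
})

# One figure, two side-by-side axes.
fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

by_country = df.groupby("country")["amount"].sum()
by_country.plot(kind="bar", ax=ax_bar, title="Amount by country")
ax_bar.set_ylabel("amount")

by_month = df.groupby("month")["amount"].sum()
by_month.plot(ax=ax_line, marker="o", title="Amount by month")
ax_line.set_ylabel("amount")

# Correlation matrix of the numeric columns; this is the input a heatmap would take.
corr = df.corr(numeric_only=True)

fig.tight_layout()
out_path = Path(tempfile.gettempdir()) / "amount_overview.png"
fig.savefig(out_path, dpi=150)
print(out_path)
```

seaborn functions such as `sns.histplot` and `sns.heatmap` also accept an `ax=` argument, so they can be placed on the same grid of axes.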
10. PROFESSIONAL EDA WORKFLOW
| Step | Action | pandas functions / libraries |
| --- | --- | --- |
| 1. Load | Import data | read_csv, read_excel, read_sql |
| 2. Quick look | Inspect the structure | head(), info(), describe() |
| 3. Clean | Handle missing values and duplicates | isnull, dropna, fillna, drop_duplicates |
| 4. Engineer | Create new features | apply, lambda, str methods |
| 5. Aggregate | Group & summarize | groupby, agg, pivot_table |
| 6. Visualize | Plot insights | matplotlib, seaborn |
| 7. Export | Share the results | to_csv, to_excel, to_sql |
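The seven steps in the table can be chained into one short script. The sketch below is a minimal illustration on made-up data, not a full EDA: the function name `run_eda`, the sample rows, and the choice of CSV as the export format are assumptions.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in a notebook
import pandas as pd

# Made-up raw data with a duplicate row, a missing amount, and untrimmed text.
RAW = """order_id,customer,country,amount
1, ani ,Indonesia,1200
2,budi,Malaysia,
2,budi,Malaysia,
3,citra,Indonesia,300
"""


def run_eda(csv_source):
    df = pd.read_csv(csv_source)                                # 1. Load
    df.info()                                                   # 2. Quick look
    df = df.drop_duplicates()                                   # 3. Clean
    df["amount"] = df["amount"].fillna(df["amount"].median())
    df["customer"] = df["customer"].str.strip().str.title()
    df["segment"] = df["amount"].apply(                         # 4. Engineer
        lambda x: "High" if x > 1000 else "Low"
    )
    summary = df.groupby("country", as_index=False)["amount"].sum()  # 5. Aggregate
    return df, summary


clean, summary = run_eda(io.StringIO(RAW))

ax = summary.plot(kind="bar", x="country", y="amount", legend=False)  # 6. Visualize

buffer = io.StringIO()                                          # 7. Export
summary.to_csv(buffer, index=False)
print(buffer.getvalue())
```

In real use the `StringIO` objects would be replaced by file paths, for example `run_eda('sales.csv')` and `summary.to_csv('summary.csv', index=False)`.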
📝 Practice Assignment
- Open Google Colab. pandas is preinstalled, so there is nothing to install.
- Download the Online Retail dataset from Kaggle and load it into pandas.
- Run a full EDA: head, info, missing values, duplicates, basic statistics.
- Compute the top 10 customers by revenue, AOV (average order value) per country, and the monthly trend.
- Create 5 charts with matplotlib/seaborn for the main insights.
- Export the results to Excel with multiple sheets (raw, summary, charts).
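As a starting point for the metrics in the assignment, the sketch below computes them on a few made-up rows. The column names (`InvoiceNo`, `CustomerID`, `Country`, `InvoiceDate`, `Quantity`, `UnitPrice`) are an assumption about the Online Retail dataset; check them against `df.columns` after loading the real file. AOV is taken here as revenue divided by the number of distinct invoices.

```python
import pandas as pd

# Made-up rows; column names are assumed to match the Online Retail dataset.
df = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "B1", "B2"],
    "CustomerID": [1, 1, 1, 2, 3],
    "Country": ["UK", "UK", "UK", "France", "France"],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-05", "2011-01-05", "2011-02-10", "2011-01-20", "2011-02-15"]
    ),
    "Quantity": [2, 1, 5, 10, 4],
    "UnitPrice": [10.0, 20.0, 4.0, 3.0, 5.0],
})
df["Revenue"] = df["Quantity"] * df["UnitPrice"]

# Top customers by revenue (at most 10).
top_customers = df.groupby("CustomerID")["Revenue"].sum().nlargest(10)

# AOV per country = total revenue / number of distinct invoices.
per_country = df.groupby("Country").agg(
    revenue=("Revenue", "sum"),
    orders=("InvoiceNo", "nunique"),
)
per_country["aov"] = per_country["revenue"] / per_country["orders"]

# Monthly trend: group by the calendar month of the invoice date.
monthly = df.groupby(df["InvoiceDate"].dt.to_period("M"))["Revenue"].sum()

print(top_customers, per_country, monthly, sep="\n\n")
```

For the multi-sheet export, `pd.ExcelWriter` can be used as a context manager, calling `to_excel(writer, sheet_name=...)` once per sheet; writing `.xlsx` files requires openpyxl.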
Summary
- Python + pandas = Excel at enterprise scale. It handles millions of rows.
- The DataFrame is the core data structure: a table in pandas is a DataFrame.
- groupby + agg is the main tool for pivot-style aggregation.
- merge() is the pandas version of JOIN, similar to SQL.
- apply + lambda gives flexible custom transformations.
- 7-step EDA workflow: Load → Look → Clean → Engineer → Aggregate → Visualize → Export.