Materi 08 · Tools & Methods

Python & Pandas untuk BA

Saat Excel mulai lambat di juta rows, saatnya naik kelas ke Python. Pandas = Excel on steroids. Plus jupyter notebook, matplotlib, seaborn untuk EDA professional.

⏱ 35 Menit🎯 Intermediate🐍 Python

1. Kenapa Python?

Excel Limit, Python Unlimited

Excel struggle > 100K rows. Python tangani 10M rows tanpa keringat. Plus reproducible, version-controllable, integrasi langsung dengan API/database/cloud. BA serius wajib up-skill ke Python di 2026.

2. Setup Environment

3. PANDAS — DataFrame Fundamentals

Import & Load Data import pandas as pd
import numpy as np

# Baca data dari berbagai source
df = pd.read_csv('sales.csv')
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
df = pd.read_sql(query, connection)

# Eksplorasi cepat
df.head(10) # 10 row pertama
df.info() # tipe data, null count
df.describe() # statistik numerical
df.shape # (rows, cols)

4. SELECT & FILTER

Indexing & Filtering # Pilih kolom
df['name']
df[['name', 'email', 'country']]

# Filter row
df[df['amount'] > 1000]
df[(df['country'] == 'Indonesia') & (df['amount'] > 500)]

# Loc & iloc
df.loc[0:5, ['name', 'amount']]
df.iloc[0:5, 2:5]

5. GROUPBY — Aggregation

Group & Aggregate # Total revenue per country
df.groupby('country')['amount'].sum()

# Multiple metric
df.groupby('country').agg({
  'amount': ['sum', 'mean', 'count'],
  'order_id': 'nunique'
})

# Multiple group columns
df.groupby(['country', 'month'])['amount'].sum().unstack()

6. MERGE — Join Tables

Pandas Merge / Join # INNER JOIN
result = pd.merge(customers, orders, on='customer_id')

# LEFT JOIN
result = pd.merge(customers, orders, on='customer_id', how='left')

# Different column names
result = pd.merge(c, o, left_on='id', right_on='cust_id')

7. DATA CLEANING

Handle Missing & Duplicates # Cek missing
df.isnull().sum()

# Drop / fill
df.dropna(subset=['email'])
df['age'].fillna(df['age'].median(), inplace=True)

# Drop duplicates
df.drop_duplicates(subset=['email'], keep='first')

# String operations
df['name'] = df['name'].str.strip().str.title()
df['email_lower'] = df['email'].str.lower()

8. APPLY & LAMBDA — Custom Logic

Custom Transformation # Tagging segment
df['segment'] = df['amount'].apply(
  lambda x: 'High' if x > 1000 else 'Low'
)

# Function untuk multiple kolom
def classify(row):
  if row['amount'] > 5000 and row['frequency'] > 10:
    return 'VIP'
  return 'Regular'
df['tier'] = df.apply(classify, axis=1)

9. VISUALIZATION dengan Matplotlib & Seaborn

Quick Visualization import matplotlib.pyplot as plt
import seaborn as sns

# Bar chart
df.groupby('country')['amount'].sum().plot(kind='bar')

# Line chart trend
df.groupby('month')['amount'].sum().plot()

# Heatmap correlation
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')

# Distribution
sns.histplot(df['amount'], bins=30)

10. PROFESSIONAL EDA WORKFLOW

StepAksiFunction Pandas
1. LoadImport dataread_csv, read_excel, read_sql
2. Quick lookLihat strukturhead(), info(), describe()
3. CleanHandle missing/duplicatesisnull, dropna, fillna, drop_duplicates
4. EngineerBuat feature baruapply, lambda, str methods
5. AggregateGroup & summarizegroupby, agg, pivot_table
6. VisualizePlot insightmatplotlib, seaborn
7. ExportShare hasilto_csv, to_excel, to_sql

📝 Tugas Praktik

  1. Buka Google Colab. Install pandas (sudah preinstalled).
  2. Download dataset Online Retail dari Kaggle. Load ke pandas.
  3. Lakukan EDA lengkap: head, info, missing, duplicates, basic stats.
  4. Hitung top 10 customer by revenue, AOV per country, monthly trend.
  5. Buat 5 chart dengan matplotlib/seaborn untuk insight utama.
  6. Export hasil ke Excel dengan multiple sheet (raw, summary, charts).

Rangkuman