Lesson 11 · Deep Learning

NLP, RNN, LSTM, Transformer

How do machines understand language? From word embeddings to RNNs, LSTMs, and finally the Transformer: the architecture that gave rise to GPT, BERT, and the LLM revolution.

⏱ 32 Minutes · 🎯 Advanced · 📚 Module 11/15

1. NLP: Core Challenges

Human language is ambiguous, sequential, context-dependent, and complex. A model has to capture word meaning, word order, long-range context, and cultural nuance.

Main NLP Tasks

Classification (e.g. sentiment), Named Entity Recognition (NER), POS tagging, Machine Translation, Question Answering, Summarization, Text Generation, Semantic Search, Speech-to-Text.

2. Word Representation

📊

Bag of Words

Counts how often each word occurs. Loses word order and semantics. A baseline.

📈

TF-IDF

Term Frequency × Inverse Document Frequency. Gives more weight to words that are frequent in one document but rare across the corpus.
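To make the two count-based representations concrete, here is a minimal pure-Python sketch. The three-sentence corpus is made up for illustration, and the TF-IDF variant used (count divided by document length, times log(N / df)) is only one of several common definitions; libraries often add smoothing.

```python
import math
from collections import Counter

# Tiny made-up corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
tokenized = [d.split() for d in docs]

# Bag of Words: per-document term counts; word order is discarded.
bow = [Counter(tokens) for tokens in tokenized]

def idf(term, corpus):
    """Inverse document frequency: log(N / number of docs containing term).

    Assumes the term occurs in at least one document.
    """
    df = sum(1 for tokens in corpus if term in tokens)
    return math.log(len(corpus) / df)

def tf_idf(term, tokens, corpus):
    """Term frequency (count / document length) times IDF."""
    tf = tokens.count(term) / len(tokens)
    return tf * idf(term, corpus)

print(bow[0]["the"])                            # 2
print(tf_idf("the", tokenized[0], tokenized))   # common word, low weight
print(tf_idf("cat", tokenized[0], tokenized))   # rarer word, higher weight
```

Even though "the" appears twice in the first document and "cat" only once, "cat" ends up with the higher TF-IDF weight because it occurs in fewer documents.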

🌐

Word Embedding

Word2Vec, GloVe: dense vector representations that capture semantic relationships.

Word2Vec Magic
vec("king") − vec("man") + vec("woman") ≈ vec("queen")
// Embeddings capture semantic relationships
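The analogy arithmetic can be tried on toy data. The sketch below uses hand-made 2-D vectors (hypothetical axes: royalty and gender) rather than real Word2Vec output, which typically has hundreds of dimensions learned from a large corpus. It only shows the mechanics: do the vector arithmetic, then pick the nearest remaining word by cosine similarity.

```python
import math

# Hand-made toy vectors: (royalty, gender). Purely illustrative, not trained.
vectors = {
    "king":   (0.9,  0.8),
    "queen":  (0.9, -0.8),
    "man":    (0.1,  0.8),
    "woman":  (0.1, -0.8),
    "prince": (0.7,  0.8),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman"))  # queen
```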

3. RNN — Recurrent Neural Network

A network with a loop: the hidden state computed at one time step is fed back into the same layer as input at the next time step. Well suited to sequences (text, time series, audio).

RNN Step
h_t = tanh(W_h · h_{t−1} + W_x · x_t + b)
y_t = W_y · h_t + b_y
// h = hidden state (memory), x = input, y = output
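The recurrence above maps almost line for line onto NumPy. This is a minimal forward-pass sketch only; the layer sizes, the random initialization, and the name `rnn_step` are assumptions chosen for illustration, and no training is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 4, 8, 3  # arbitrary illustrative sizes

# Randomly initialized parameters (untrained).
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)
W_y = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_y = np.zeros(output_size)

def rnn_step(h_prev, x_t):
    """One time step: update the hidden state, then compute the output."""
    h_t = np.tanh(W_h @ h_prev + W_x @ x_t + b)
    y_t = W_y @ h_t + b_y
    return h_t, y_t

# Unroll over a random sequence of 5 time steps; the same weights are reused each step.
sequence = rng.normal(size=(5, input_size))
h = np.zeros(hidden_size)
outputs = []
for x_t in sequence:
    h, y = rnn_step(h, x_t)
    outputs.append(y)
outputs = np.stack(outputs)

print(outputs.shape)  # (5, 3)
```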

⚠️ Problem

Vanishing Gradient

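A plain RNN struggles to learn long-range dependencies. During backpropagation through time the gradient is multiplied by a similar factor at every step, so after roughly 10 steps it can shrink to nearly zero, and the network "forgets" the early context.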
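The shrinkage can be observed numerically in a one-dimensional RNN. The sketch below assumes a scalar recurrence h_t = tanh(w · h_{t−1} + x_t) with a hypothetical weight w = 0.5 and constant input; by the chain rule, each step multiplies the gradient by w · (1 − h_t²), which here is always below 0.5 in magnitude.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return [d h_t / d h_0 for t = 1..T] for h_t = tanh(w * h_{t-1} + x_t)."""
    h, grad = h0, 1.0
    grads = []
    for x_t in inputs:
        h = math.tanh(w * h + x_t)
        grad *= w * (1.0 - h * h)  # chain rule factor d h_t / d h_{t-1}
        grads.append(grad)
    return grads

grads = gradient_through_time(w=0.5, inputs=[0.1] * 20)
for step in (1, 5, 10, 20):
    print(step, abs(grads[step - 1]))
```

With these settings the gradient with respect to the initial state falls below 0.001 by step 10 and keeps shrinking, which is why early inputs contribute almost nothing to the weight updates.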

4. LSTM: A Fix for Vanishing Gradients

Long Short-Term Memory (Hochreiter & Schmidhuber, 1997). Adds a cell state plus three gates (forget, input, output) that control what information is kept, written, and exposed.

LSTM Gates
forget_gate = σ(W_f · [h_{t-1}, x_t] + b_f)
input_gate = σ(W_i · [h_{t-1}, x_t] + b_i)
output_gate = σ(W_o · [h_{t-1}, x_t] + b_o)
candidate = tanh(W_c · [h_{t-1}, x_t] + b_c)
cell_state = forget_gate × c_{t-1} + input_gate × candidate
h_t = output_gate × tanh(cell_state)
// × is element-wise multiplication; [h_{t-1}, x_t] is concatenation
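The gate equations can be sketched as a single NumPy function. Sizes, random weights, and the dictionary layout (`W["f"]`, `W["i"]`, `W["o"]`, `W["c"]`) are assumptions for illustration; the candidate term uses the standard tanh layer over the same concatenated input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 6  # arbitrary illustrative sizes
concat_size = hidden_size + input_size

# One weight matrix and bias per gate (f, i, o) plus one for the candidate (c).
W = {name: rng.normal(scale=0.1, size=(hidden_size, concat_size)) for name in "fioc"}
b = {name: np.zeros(hidden_size) for name in "fioc"}

def lstm_step(h_prev, c_prev, x_t):
    """One LSTM time step; returns the new hidden state and cell state."""
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    forget_gate = sigmoid(W["f"] @ z + b["f"])   # how much of the old cell to keep
    input_gate = sigmoid(W["i"] @ z + b["i"])    # how much new content to write
    output_gate = sigmoid(W["o"] @ z + b["o"])   # how much of the cell to expose
    candidate = np.tanh(W["c"] @ z + b["c"])     # proposed new content
    cell_state = forget_gate * c_prev + input_gate * candidate
    h_t = output_gate * np.tanh(cell_state)
    return h_t, cell_state

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(h, c, x_t)

print(h.shape, c.shape)  # (6,) (6,)
```

The cell state is updated additively (old content times a gate, plus new content), which gives gradients a path that is not repeatedly squashed through tanh.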

Bidirectional LSTM: reads the sequence both left-to-right and right-to-left.
GRU (Gated Recurrent Unit): a simplified LSTM that is faster, with comparable performance on many tasks.
Sequence-to-sequence: an encoder-decoder LSTM for translation and summarization.

5. Attention Mechanism

The 2014 insight (Bahdanau et al.): instead of compressing the entire input into a single vector, let the decoder "attend" to the parts of the input that are relevant for each output it produces.

Attention Score
score(query, key) = query · key^T / √d_k
attention_weights = softmax(scores)
output = Σ attention_weights · values
// "Ask every input: how relevant are you to this output?"
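The three lines of the attention formula translate directly to NumPy. This is a minimal sketch of the scaled dot-product form, with random matrices standing in for real queries, keys, and values; the shapes are arbitrary assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention; returns outputs and the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # one score per (query, key) pair
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    return weights @ V, weights          # weighted sum of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))  # 2 queries of dimension d_k = 4
K = rng.normal(size=(3, 4))  # 3 keys
V = rng.normal(size=(3, 5))  # 3 values of dimension 5

output, weights = attention(Q, K, V)
print(output.shape)   # (2, 5)
print(weights.shape)  # (2, 3)
```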

6. Transformer — "Attention Is All You Need" (2017)

A Revolution Without Recurrence

Vaswani et al., 2017: drop recurrence entirely and build the model on attention. The result: all positions in a sequence can be processed in parallel on GPUs, training is much faster, and long-range dependencies are captured better.

Transformer Components

Multi-Head Self-Attention: several attention "heads" run in parallel, each capturing different patterns.
Positional Encoding: adds position information, because self-attention on its own is agnostic to order.
Layer Normalization: stabilizes training in deep networks.
Feed-Forward Network: an MLP applied independently at each position.
Residual Connection: output = input + sublayer(input).
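Three of the components listed above can be sketched in a few lines of NumPy. Assumptions to note: the positional encoding uses the sinusoidal scheme from the original paper (learned position embeddings are a common alternative), layer normalization omits its learned scale and shift, and the sizes and random weights are illustrative only.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position table of shape (seq_len, d_model); d_model must be even."""
    positions = np.arange(seq_len)[:, None]
    even_dims = np.arange(0, d_model, 2)[None, :]
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

def layer_norm(x, eps=1e-5):
    """Normalize each position's vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def feed_forward(x, W1, b1, W2, b2):
    """Two-layer MLP with ReLU, applied to every position independently."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 5, 8, 32
W1 = rng.normal(scale=0.1, size=(d_model, d_ff))
b1 = np.zeros(d_ff)
W2 = rng.normal(scale=0.1, size=(d_ff, d_model))
b2 = np.zeros(d_model)

embeddings = rng.normal(size=(seq_len, d_model))    # stand-in token embeddings
pe = positional_encoding(seq_len, d_model)
x = embeddings + pe                                 # inject order information
out = layer_norm(x + feed_forward(x, W1, b1, W2, b2))  # residual, then layer norm

print(out.shape)  # (5, 8)
```

A full encoder layer would apply the same residual-plus-normalization wrapper around a multi-head self-attention sublayer before the feed-forward sublayer.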

7. The LLM Era: BERT, GPT, T5

Model	Year	Architecture	Focus
BERT (Google)	2018	Encoder-only	Understanding (classification, QA)
GPT-2 (OpenAI)	2019	Decoder-only	Generation
T5 (Google)	2019	Encoder-Decoder	Universal text-to-text
GPT-3 (OpenAI)	2020	Decoder-only, 175B parameters	Few-shot learning
ChatGPT/GPT-4 (OpenAI)	2022/23	GPT decoder + RLHF	Conversational
Claude, Llama, Gemini	2023+	Various decoder-based	State-of-the-art
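A large part of the encoder-only versus decoder-only distinction in the table comes down to the attention mask: BERT-style encoders let every token attend to every other token, while GPT-style decoders mask out future positions so the model can be trained to predict the next token. The sketch below illustrates this with random vectors and, as a simplifying assumption, uses the inputs directly as queries and keys without learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_weights(x, causal):
    """Attention weights over a sequence; causal=True hides future positions."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    if causal:
        # True above the diagonal, i.e. where the key position is in the future.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    return softmax(scores, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 tokens, 8-dimensional vectors

w_bidirectional = self_attention_weights(x, causal=False)  # encoder-style
w_causal = self_attention_weights(x, causal=True)          # decoder-style

print(np.round(w_causal, 2))  # lower-triangular: token t sees only tokens <= t
```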

8. Case Study

🌟 Real World

Google Search & BERT (2019)

When Google deployed BERT in its search engine in 2019, it gave this example: the query "can you get medicine for someone pharmacy". Before BERT, Google focused on the keywords "medicine" and "pharmacy". After BERT, Google understood the context of "for someone": the question is about picking up medicine for another person.

Takeaway: the Transformer fundamentally changed semantic understanding. Google said at launch that BERT would affect about 1 in 10 English-language searches in the US, and called it one of the biggest leaps forward in the history of Search.

📝 Assignment

Build a Sentiment Classifier

Use Hugging Face Transformers in Colab.
Load a pretrained model (DistilBERT or IndoBERT).
Fine-tune it on an Indonesian sentiment dataset (e.g. Indonesian Twitter data).
Compare it against an LSTM baseline and a TF-IDF + Logistic Regression baseline.
Test inference: write 5 custom sentences and inspect the predicted sentiment.
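As a starting point for the inference step, here is a minimal sketch built on the Hugging Face `pipeline` API. It assumes the `transformers` package is installed and uses a public English checkpoint (`distilbert-base-uncased-finetuned-sst-2-english`) as a placeholder; for the assignment you would pass the name or path of your own fine-tuned Indonesian model instead. The helper names `predict_sentiment` and `format_predictions` are hypothetical.

```python
def predict_sentiment(texts, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Run a pretrained sentiment classifier over a list of strings.

    Returns a list of dicts with "label" and "score" keys.
    The model is downloaded on first use, so this needs network access.
    """
    # Imported lazily so the helpers below work even without transformers installed.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis", model=model_name)
    return classifier(texts)

def format_predictions(texts, predictions):
    """Pair each sentence with its predicted label and confidence."""
    return [
        f"{pred['label']:<8} {pred['score']:.2f}  {text}"
        for text, pred in zip(texts, predictions)
    ]

# Example usage (uncomment in Colab; replace with your own 5 sentences):
# sentences = ["I love this phone.", "The service was terrible."]
# for line in format_predictions(sentences, predict_sentiment(sentences)):
#     print(line)
```

Keeping the formatting separate from the model call makes it easy to reuse the same report for the LSTM and TF-IDF baselines when comparing results.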

Summary

NLP has to handle ambiguity, sequence, and context.
RNNs can process sequences but suffer from the vanishing gradient problem.
LSTM mitigates vanishing gradients with gates and a cell state.
Attention mechanism: lets the model "attend" to the relevant parts of the input.
The Transformer (2017) drops recurrence in favor of attention. It is the foundation of nearly all modern LLMs.
BERT, GPT, T5: Transformer variants with different focuses.