The project keeps raw and cleaned Dukcapil review datasets together with scraping and analysis notebooks.
sentimentAnalysis Dukcapil
This project is framed as a practical NLP learning artefact with raw and cleaned datasets, notebooks, requirements, and README narrative preserved for traceability.
A source-backed case study built for recruiter review
This reading path makes the problem choice, evidence quality, user framing, execution decisions, and proof trail visible without overstating what the sources support.
NLP sentiment-analysis project using notebook-driven preprocessing, local datasets, and model benchmarking for Dukcapil app reviews.
Problem framing before execution
The case-study layer starts with why this problem was selected and how the context justified investment.
Problem Framing Map
Public-app review text needs repeatable preprocessing before sentiment models can be compared responsibly.
The project preserves raw and cleaned datasets, scraping notebooks, analysis notebooks, requirements, and README narrative so the full NLP path stays reviewable.
It adds language-specific ML depth to the portfolio by showing that preprocessing and auditability matter as much as the final model choice.
Problem statement
Public-app review text needs repeatable preprocessing before sentiment models can be compared responsibly.
Solution thesis
Built a notebook-based workflow covering data collection artefacts, cleaned datasets, Indonesian text preprocessing, and model benchmarking.
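As a rough illustration of the Indonesian text preprocessing step, the sketch below lowercases a review, strips URLs and non-letter characters, collapses stretched letters, and maps slang tokens to standard forms. The `SLANG_MAP` entries and the function name `normalize_review` are hypothetical stand-ins; the project's actual slang-word resource and notebook code are not reproduced here.

```python
import re

# Hypothetical slang lexicon for illustration only; the project's real
# slang-word resource may differ in both format and content.
SLANG_MAP = {
    "gk": "tidak",
    "ga": "tidak",
    "yg": "yang",
    "bgt": "banget",
}


def normalize_review(text: str, slang_map: dict[str, str] = SLANG_MAP) -> str:
    """Lowercase, drop URLs and non-letters, collapse stretched letters, map slang."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep ASCII letters and whitespace only
    text = re.sub(r"(.)\1{2,}", r"\1", text)    # 3+ repeats of a character become one
    tokens = [slang_map.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(normalize_review("Aplikasinya gk bisa login!!! Lamaaaa bgt"))
# aplikasinya tidak bisa login lama banget
```

The ASCII-only filter is a simplifying assumption: it also discards digits and emoji, which a real pipeline might want to keep or handle separately.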
What supports the narrative
Evidence is surfaced with its source type and credibility note so the recruiter can quickly see what is directly backed versus intentionally constrained.
The workflow explicitly handles Indonesian review text before benchmarking models.
Credibility Notes
- The project is framed as notebook-driven NLP experimentation, not as a deployed sentiment product.
- No live user feedback, service integration, or product-impact claim is added beyond the source-backed analysis artefacts.
User framing stays explicit
When formal research artefacts are not available, the page still explains who the work served and why that user framing is justified by the existing sources.
The strongest project value lies in the explicit data trail and preprocessing discipline, not a production-facing interface.
The source-backed narrative connects cleaned review text and comparative experimentation to practical analysis goals.
How design thinking translated into decisions
The goal is to show the trace from research and insight to concrete product or system decisions, then to the outcomes those decisions supported.
Design Thinking Flow
Each step keeps the movement from evidence to action explicit before the rationale expands it.
- Step 1: Data acquisition framing
  Started from preserving source and cleaned review datasets before optimizing model comparison.
  Signal: Raw-to-processed traceability became part of the evidence story.
- Step 2: Preprocessing discipline
  Handled Indonesian review normalization as a first-class problem rather than a hidden notebook detail.
  Signal: Language-specific text preparation became central to model credibility.
- Step 3: Benchmarking transparency
  Used notebook-based experimentation so model behavior remains reviewable alongside the data pipeline.
  Signal: The workflow supports auditable comparison rather than opaque model claims.
Decision Rationale
Each decision keeps the path from insight to execution visible before ending on the outcome signal.
- Insight: Sentiment experiments become harder to audit when source text and processed data are merged too early.
  Decision: Kept raw and cleaned review datasets separate inside the project workflow.
  Outcome: The analysis path becomes easier to verify and discuss during portfolio review.
- Insight: Preprocessing choices can change model conclusions as much as algorithm selection.
  Decision: Used notebooks to keep preprocessing and model benchmarking steps explicit.
  Outcome: The project demonstrates disciplined NLP reasoning rather than only a final score narrative.
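One way the raw-versus-cleaned separation described above could be enforced in a notebook helper is sketched below. It reads the raw CSV, writes a separate cleaned CSV with an added column, and refuses to overwrite the raw file. The function name, the `content` column name, and the `_clean` suffix are assumptions for illustration, not the project's actual schema.

```python
import csv
from pathlib import Path
from typing import Callable


def write_cleaned_copy(
    raw_path: Path,
    cleaned_path: Path,
    clean: Callable[[str], str],
    text_column: str = "content",  # hypothetical column name
) -> int:
    """Write a cleaned copy of a raw CSV and return the number of rows written.

    The raw file is only read, never modified.
    """
    if raw_path.resolve() == cleaned_path.resolve():
        raise ValueError("cleaned output must not overwrite the raw dataset")

    with raw_path.open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    clean_column = f"{text_column}_clean"
    with cleaned_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames + [clean_column])
        writer.writeheader()
        for row in rows:
            # Keep the original text next to the cleaned text for auditing.
            row[clean_column] = clean(row[text_column])
            writer.writerow(row)
    return len(rows)
```

Keeping the original text column next to the cleaned one means a reviewer can compare both side by side without reopening the raw file.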
Execution choices and delivery details
This section preserves the technical and operational substance: architecture, responsibilities, trade-offs, and implementation quality signals.
System Design
Notebook-first ML workflow with raw and processed CSV datasets, slang-word resources, scraping notebooks, and model-analysis notebooks documented in the local archive.
Source-backed Impact
Shows disciplined NLP experimentation by keeping data artefacts, requirements, and analysis notebooks together for source-level review.
Responsibilities
- Prepared source and cleaned review datasets for analysis
- Implemented preprocessing steps for Indonesian review text
- Compared model behavior through notebook-based experimentation
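The model comparison responsibility can be pictured as a small harness that scores every candidate on the same held-out reviews. The sources do not name the models that were benchmarked, so the two classifiers below (a majority-class baseline and a keyword rule) are toy stand-ins, and all names and sample reviews are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[str], str]


def accuracy(model: Classifier, texts: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model(text) == label for text, label in zip(texts, labels))
    return correct / len(labels)


def benchmark(
    models: dict[str, Classifier], texts: Sequence[str], labels: Sequence[str]
) -> dict[str, float]:
    """Score every model on the same evaluation set so results are comparable."""
    return {name: accuracy(model, texts, labels) for name, model in models.items()}


def make_majority_baseline(train_labels: Sequence[str]) -> Classifier:
    """Toy baseline: always predict the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda text: majority


# Hypothetical keyword list, for illustration only.
NEGATIVE_WORDS = {"tidak", "gagal", "error", "lama"}


def keyword_model(text: str) -> str:
    """Toy rule: any negative keyword makes the review negative."""
    return "negative" if NEGATIVE_WORDS & set(text.split()) else "positive"


test_texts = ["aplikasi bagus", "tidak bisa login", "sangat membantu", "gagal terus"]
test_labels = ["positive", "negative", "positive", "negative"]
models = {
    "majority": make_majority_baseline(["negative", "negative", "positive"]),
    "keyword": keyword_model,
}
print(benchmark(models, test_texts, test_labels))
# {'majority': 0.5, 'keyword': 1.0}
```

Including a trivial baseline in the comparison is a common sanity check: a model that cannot beat it has not learned much from the text.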
Stack Decisions
- Used notebooks to keep exploratory NLP decisions auditable
- Kept raw and processed datasets separate to preserve reviewability
- Used requirements metadata so the experiment environment can be reconstructed
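To make the environment-reconstruction point concrete, the sketch below compares exact pins from a requirements file with the versions installed in the current environment, using the standard-library `importlib.metadata`. It assumes a plain `name==version` format; the package names and versions in the usage note are illustrative and are not taken from the project's requirements file.

```python
import re
from importlib import metadata
from typing import Optional


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Return {package: version} for lines pinned with '=='; other lines are skipped."""
    pins = {}
    for line in requirements_text.splitlines():
        # Drop trailing comments and environment markers before matching.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        match = re.fullmatch(r"([A-Za-z0-9._-]+)\s*==\s*(\S+)", line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Map each mismatched package to (pinned version, installed version or None)."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

For example, `find_mismatches(parse_pins("pandas==2.1.0\nnumpy>=1.24\n"))` checks only `pandas`, because unpinned lines such as `numpy>=1.24` are skipped by this simplified parser.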
Trade-offs
- Accepted notebook workflow limits in exchange for transparent experimentation
- Avoided production-readiness claims because the available source is an analysis artefact, not a deployed service
Challenges
- Handling informal Indonesian review text before modeling
- Keeping data acquisition, preprocessing, and model comparison traceable across notebooks
What was delivered and what can be verified
Outcome claims remain conservative and source-backed, while proof records and recruiter-safe links surface the strongest verification trail available.
Validation Signals
- Local archive includes raw and cleaned Dukcapil app review datasets.
- Project contains scraping and analysis notebooks plus requirements metadata.
Source-backed Outcomes
- Insight archive records 9 files across dataset, notebook, README, and requirements artefacts
Proof
- DBS Foundation Coding Camp Project
  NLP sentiment-analysis artefacts available
  DBS Foundation Coding Camp 2024
What the project proves, and what it does not
Strong case studies show both what was learned and where the current evidence stops.
Retrospective
The next iteration should add a concise model card and a reproducible evaluation summary before any notebook results are turned into product claims.
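A reproducible evaluation summary could be as small as a function that turns true and predicted labels into a JSON-serializable record. The sketch below uses only the standard library; the field names and the sample labels are assumptions for illustration and do not report any result from the project.

```python
import json
from typing import Sequence


def evaluation_summary(y_true: Sequence[str], y_pred: Sequence[str]) -> dict:
    """Compute accuracy plus per-class precision, recall, F1, and support."""
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true examples of this class
        }
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"n_samples": len(pairs), "accuracy": accuracy, "per_class": per_class}


# Made-up labels, purely to show the output shape.
summary = evaluation_summary(
    ["positive", "positive", "negative", "negative"],
    ["positive", "negative", "negative", "negative"],
)
print(json.dumps(summary, indent=2))
```

Saving this record next to the notebook that produced it would give a model card a fixed set of numbers to cite.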
Evidence Limits
- Current sources do not support deployment, online inference, or ongoing feedback-loop claims.
- The project should remain framed as auditable NLP experimentation and analysis.
Lessons
- Language-specific preprocessing can matter as much as algorithm choice in sentiment analysis
- A clear raw-to-cleaned data trail improves ML project auditability