sentimentAnalysis Dukcapil

Public app reviews are noisy, and sentiment models cannot be compared responsibly without a repeatable preprocessing pipeline. I built this NLP workflow to extract genuine signal from raw reviews of the Dukcapil app.

The project takes a notebook-first approach so that exploratory decisions stay transparent. Raw scraped data is kept separate from the cleaned datasets. The workflow normalizes Indonesian slang and preprocesses the text before passing it to Scikit-learn for model benchmarking. Keeping the environment traceable through requirements metadata was a core priority. The biggest takeaway is that structured data cleaning creates real analytical value: disciplined NLP experimentation is less about the final algorithm and more about preserving the data artifacts and analysis notebooks for source-level review.
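To make the slang normalization and preprocessing step concrete, here is a minimal sketch of what such a cleaning function could look like. It is an illustration, not the code from the notebooks: the SLANG_MAP entries, the function name preprocess, and the exact cleaning rules (lowercasing, stripping URLs and non-letters, collapsing elongated characters) are all assumptions made for this example.

```python
import re

# Hypothetical slang lexicon for illustration only; a real project would
# load a much larger mapping from a file kept alongside the raw data.
SLANG_MAP = {
    "gk": "tidak",
    "ga": "tidak",
    "tdk": "tidak",
    "bgt": "banget",
    "yg": "yang",
    "udh": "sudah",
    "apk": "aplikasi",
}


def preprocess(text: str) -> str:
    """Clean one review and normalize slang tokens (assumed rules)."""
    text = text.lower()
    # Remove URLs before stripping punctuation so they vanish as a unit.
    text = re.sub(r"https?://\S+", " ", text)
    # Keep only ASCII letters and whitespace (drops digits, punctuation, emoji).
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse any character repeated three or more times: "lamaaaa" -> "lama".
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Replace known slang tokens; unknown tokens pass through unchanged.
    tokens = [SLANG_MAP.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


if __name__ == "__main__":
    print(preprocess("Apk nya gk bisa login, lamaaaa bgt!!!"))
    # -> aplikasi nya tidak bisa login lama banget
```

Keeping this step as a single pure function means the same cleaning can be reapplied to every raw dataset before any model is benchmarked, which is what makes the comparison between models repeatable.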

Repository

sentimentAnalysis Dukcapil