The README names Isolation Forest as the focal method, with OneClassSVM and LocalOutlierFactor as comparators.
SST El Niño iForest
This project is structured as a reviewable ML workflow rather than a black-box notebook: input data, model comparisons, and exported outputs are separated so reviewers can audit assumptions and outcomes.
A source-backed case study built for recruiter review
This reading path makes the problem choice, evidence quality, user framing, execution decisions, and proof trail visible without overstating what the sources support.
Reproducible anomaly-detection workflow using Isolation Forest and comparator models to analyze SST variability as an El Niño proxy signal.
Improves auditability for exploratory climate ML work by preserving clear boundaries between data, analysis notebooks, and generated artefacts.
Notebook-centered pipeline with explicit dataset input (`data/`), analysis notebooks (`notebooks/`), and reproducible outputs (`results/` + `figures/`).
Problem framing before execution
The case-study layer starts with why this problem was selected and how the context justified investment.
Problem Framing Map
Anomaly detection projects are hard to trust when modelling choices and outputs are not reproducible.
The project intentionally frames SST anomaly detection as a proxy-analysis workflow with explicit comparator methods and exported evidence artefacts.
It offers strong reviewer value by coupling ML experimentation with clean evidence packaging, while keeping interpretation claims conservative.
Problem statement
Climate anomaly analysis loses credibility when preprocessing, model choice, and output artefacts are not reproducible.
Solution thesis
Built a repository with notebook-driven anomaly detection, comparator benchmarks, and exported result tables/figures for transparent review.
What supports the narrative
Evidence is surfaced with its source type and credibility note so the recruiter can quickly see what is directly backed versus intentionally constrained.
Result files and figures are persisted under dedicated folders for independent review.
Credibility Notes
- Public copy is restricted to reproducibility, method scope, and documented artefact availability.
- No operational forecasting-accuracy or climate-policy impact claim is made beyond repository-backed analysis outputs.
User framing stays explicit
When formal research artefacts are not available, the page still explains who the work served and why that user framing is justified by the existing sources.
The repository structure is designed to expose data inputs, notebooks, and generated outputs clearly.
Benchmark comparators and exported metrics enable side-by-side model review.
How design thinking translated into decisions
The goal is to show the trace from research and insight to concrete product or system decisions, then to the outcomes those decisions supported.
Design Thinking Flow
Each step keeps the movement from evidence to action explicit before the rationale expands it.
- Step 1: Reproducibility-first packaging
Organized repository into clear data, notebook, results, and figure boundaries.
Signal: Auditability is treated as a design objective.
- Step 2: Comparator method framing
Evaluated multiple unsupervised methods rather than reporting a single model in isolation.
Signal: Model selection discussion gains context and credibility.
- Step 3: Proxy interpretation control
Positioned outcomes as proxy-signal analysis instead of broad climate prediction claims.
Signal: Narrative remains conservative and source-backed.
Decision Rationale
Each decision keeps the path from insight to execution visible before ending on the outcome signal.
- Insight: Reviewers need immediate visibility into outputs before rerunning heavy analysis.
Decision: Included both executed and rerunnable notebook variants.
Outcome: Project supports both quick inspection and full rerun workflows.
- Insight: Notebook-only output becomes hard to compare and validate over time.
Decision: Exported benchmark and diagnostic outputs into dedicated CSV/figure files.
Outcome: Result traceability is stronger for technical review and iteration.
Execution choices and delivery details
This section preserves the technical and operational substance: architecture, responsibilities, trade-offs, and implementation quality signals.
System Design
Notebook-centered pipeline with explicit dataset input (`data/`), analysis notebooks (`notebooks/`), and reproducible outputs (`results/` + `figures/`).
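The layout above can be checked mechanically. The sketch below is an illustration only: the helper name `missing_dirs` and the idea of a layout check are assumptions and are not taken from the repository. Only the four directory names come from the description above.

```python
from pathlib import Path

# Directory names taken from the layout described above.
EXPECTED_DIRS = ("data", "notebooks", "results", "figures")


def missing_dirs(repo_root):
    """Return the expected directories that are absent under repo_root.

    Hypothetical helper for illustration; not part of the repository.
    """
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_dirs(".")` from the repository root would return an empty list when all four folders are present.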
Source-backed Impact
Improves auditability for exploratory climate ML work by preserving clear boundaries between data, analysis notebooks, and generated artefacts.
Responsibilities
- Structured anomaly-detection workflow for reproducible reviewer access
- Benchmarked multiple unsupervised methods against the same proxy setup
- Packaged outputs and figures into a transparent artefact hierarchy
Stack Decisions
- Used Isolation Forest as the focal method while keeping the comparison with the other models fair
- Kept notebook and output directories separate for traceability
- Included an executed notebook to speed up reviewer onboarding
Trade-offs
- Accepted notebook-centric execution in exchange for a transparent exploratory workflow
- Prioritized reproducibility artefacts over production pipeline orchestration
Challenges
- Keeping anomaly-proxy interpretation conservative under limited domain context
- Balancing comparative model breadth with clear reviewer narrative
Architecture and outcome snapshot
This visual layer keeps execution readable: how the system or delivery flow was structured and which source-backed outcomes mattered most.
Execution Flow
- Step 1: Data Conditioning
SST source data is loaded and transformed into monthly anomaly proxy features.
Signal: Feature preparation is explicit and reviewable.
- Step 2: Model Benchmarking
Isolation Forest and the comparator models are evaluated under the same analysis flow.
Signal: Model behavior is interpreted comparatively, not in isolation.
- Step 3: Artefact Export
Metrics and diagnostics are exported to CSV and figures for transparent review.
Signal: Outputs remain reusable beyond notebook runtime.
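The data-conditioning step can be pictured with a small example. The source does not say how the monthly anomaly features are computed, so the sketch below assumes one common approach: subtracting each calendar month's long-term mean from that month's values. The function name, the input shape, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_anomalies(records):
    """Subtract each calendar month's mean SST from that month's values.

    records: list of (year, month, sst) tuples.
    Returns a list of (year, month, anomaly) tuples in the same order.
    Assumed method for illustration; the notebooks may differ.
    """
    by_month = defaultdict(list)
    for _, month, sst in records:
        by_month[month].append(sst)
    # Mean per calendar month across all years (the "climatology").
    climatology = {m: mean(vals) for m, vals in by_month.items()}
    return [(y, m, sst - climatology[m]) for y, m, sst in records]


# Made-up values for illustration only, not real SST data.
sample = [
    (2000, 1, 26.0), (2000, 2, 27.0),
    (2001, 1, 28.0), (2001, 2, 27.0),
]
anomalies = monthly_anomalies(sample)
```

Here January averages 27.0 across the two sample years, so the January anomalies are -1.0 and 1.0, while both February values sit exactly on their mean.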
Outcome Snapshot
- Comparator Set: 3 unsupervised models
IsolationForest + OneClassSVM + LocalOutlierFactor
- Output Surface: CSV + figure artefacts
Results and diagnostics stored in dedicated folders
- Reproducibility: Executed + clean notebook
Supports quick audit and rerun pathways
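A side-by-side run of the three comparators might look like the sketch below. It uses scikit-learn's `IsolationForest`, `OneClassSVM`, and `LocalOutlierFactor`, each fitted on the same feature matrix. The synthetic data, the hyperparameters (`contamination`, `nu`, `n_neighbors`), and the choice of flag counts as the compared quantity are assumptions for illustration, not the project's settings or metrics.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the anomaly features; NOT the project's data.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(240, 1))
X[:5] += 6.0  # inject a few obvious outliers

# Hyperparameters below are illustrative assumptions.
models = {
    "IsolationForest": IsolationForest(contamination=0.05, random_state=0),
    "OneClassSVM": OneClassSVM(nu=0.05, gamma="scale"),
    "LocalOutlierFactor": LocalOutlierFactor(n_neighbors=20, contamination=0.05),
}

flag_counts = {}
for name, model in models.items():
    # fit_predict returns -1 for flagged outliers and 1 for inliers.
    labels = model.fit_predict(X)
    flag_counts[name] = int((labels == -1).sum())
```

Feeding every model the same `X` is what keeps the comparison aligned; any difference in `flag_counts` then reflects the method rather than the input.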
What was delivered and what can be verified
Outcome claims remain conservative and source-backed, while proof records and recruiter-safe links surface the strongest verification trail available.
Validation Signals
- The README and repository tree expose an explicit benchmark and output artefact structure.
- A dedicated `results/benchmark_metrics.csv` file provides direct model-comparison evidence.
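Exporting comparison rows to a file such as `results/benchmark_metrics.csv` can be sketched with the standard library alone. The helper names and the column names used in the usage note are hypothetical; the actual columns of the repository's CSV are not stated in the source.

```python
import csv
from pathlib import Path


def write_metrics(rows, path):
    """Write a non-empty list of dicts to a CSV file, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0].keys())
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_metrics(path):
    """Read the CSV back as a list of dicts (all values come back as strings)."""
    with Path(path).open(newline="") as handle:
        return list(csv.DictReader(handle))
```

For example, `write_metrics([{"model": "IsolationForest", "n_flagged": 0}], "results/benchmark_metrics.csv")` would write a header row and one placeholder record; the values here are dummies, not results.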
Source-backed Outcomes
- Comparator benchmark includes IsolationForest, OneClassSVM, and LocalOutlierFactor
- Result artefacts are exported into dedicated CSV output files
- Repository preserves both executed and rerunnable notebook variants
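One way to derive a rerunnable notebook from an executed one is to clear the stored outputs of every code cell. The sketch below does this on the notebook's JSON structure (nbformat 4, where code cells carry `outputs` and `execution_count`). How the repository actually produced its two variants is not stated; the function name and the tiny in-memory notebook are assumptions for illustration.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of an nbformat-4 notebook dict with outputs cleared."""
    clean = copy.deepcopy(notebook)
    for cell in clean.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


# Tiny in-memory notebook standing in for an executed file.
executed = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 1,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}
clean = strip_outputs(executed)
clean_json = json.dumps(clean, indent=1)  # ready to write to an .ipynb file
```

The executed variant is left untouched, so both files can be kept side by side: one for quick inspection, one for a full rerun.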
Proof
- Benchmark Artefacts
Comparator metrics and anomaly outputs exported as reviewable CSV files
- Reproducibility Surface
Notebook, requirements, figures, and results are all versioned
What the project proves, and what it does not
Strong case studies show both what was learned and where the current evidence stops.
Retrospective
The next step is to add deeper, domain-grounded interpretation notes and to communicate uncertainty explicitly for the climate context.
Evidence Limits
- Current work is a reproducible analysis pipeline, not a production climate-monitoring service.
- Proxy interpretation requires additional domain validation before operational decision use.
Lessons
- Exploratory ML becomes stronger when output artefacts are explicit and versioned
- Benchmark comparators improve interpretability even in notebook-first projects