| 1 | 1 | # Data Preparation |
| 2 | + |
| 3 | +## Haiti Tourism Data Pipeline |
| 4 | + |
| 5 | +A clean, reproducible, and fully documented data foundation built from official
| 6 | +UNWTO sources.
| 7 | + |
| 8 | +This project transforms the raw and often unwieldy United Nations World Tourism Organization |
| 9 | +(UNWTO) Compendium of Tourism Statistics into clear, structured, |
| 10 | +and analysis-ready datasets, with a special focus on **Haiti**
| 11 | +and a complete comparative panel of **30 Caribbean countries and territories**. |
| 12 | +The data cover the full available series from **1995 to 2022**.
| 13 | + |
| 14 | +## Project Structure |
| 15 | + |
| 16 | +### 1. `Extract_data_haiti_and_caribbean.ipynb` |
| 17 | + |
| 18 | +**Objective**: Fast, reliable, and one-time raw extraction from massive global files |
| 19 | + |
| 20 | +- Loads the four official UNWTO Excel workbooks (Arrivals, Purpose, Mode of Transport, |
| 21 | + and Expenditure) |
| 22 | +- Isolates every single record belonging to Haiti across all sheets and years |
| 23 | +- Simultaneously pulls the same indicators for 30 Caribbean countries |
| 24 | + and territories for regional comparison |
| 25 | +- Saves compact, raw extracted subsets as CSVs (one file per indicator) |
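The extraction step above can be sketched roughly as follows. This is a minimal, standard-library illustration rather than the notebook's actual code: it assumes the rows of one sheet have already been loaded as dictionaries, and the `country` field name, the `CARIBBEAN` subset, and the sample values are all hypothetical.

```python
import csv
import io

# Hypothetical subset of the Caribbean panel; the real project covers 30
# countries and territories.
CARIBBEAN = {"Haiti", "Jamaica", "Dominican Republic"}


def extract_subset(rows, countries, country_field="country"):
    """Keep only the rows whose country name is in `countries`."""
    return [row for row in rows if row.get(country_field, "").strip() in countries]


def write_csv(rows, handle):
    """Write the extracted rows to an open text handle as CSV."""
    if not rows:
        return
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Toy rows standing in for one sheet of a workbook (values are made up).
raw_rows = [
    {"country": "Haiti", "year": "2019", "value": "100"},
    {"country": "France", "year": "2019", "value": "200"},
    {"country": " Jamaica ", "year": "2019", "value": "300"},
]

subset = extract_subset(raw_rows, CARIBBEAN)

# An in-memory buffer stands in for the per-indicator CSV file.
buffer = io.StringIO()
write_csv(subset, buffer)
csv_text = buffer.getvalue()
```

Filtering once and saving the small subset means later notebooks never need to reopen the large global workbooks.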
| 26 | + |
| 27 | +### 2. `script_UN_tourism_caribbean_countries_cleaned.ipynb` |
| 28 | + |
| 29 | +**Objective**: Deliver one unified, tidy dataset for the entire Caribbean region |
| 30 | + |
| 31 | +- Removes all metadata clutter (flags, country codes, footnotes, notes) |
| 32 | +- Applies intuitive, consistent column names (`country_receiving`, `year`, `number_of_tourists`)
| 33 | +- Harmonizes visitor categories across countries and years: |
| 34 | +  - Tourists (overnight visitors)
| 35 | +  - Excursionists (same-day visitors)
| 36 | +  - Total visitors
| 37 | +- Final output: **3,025 rows** of clean, |
| 38 | + immediately usable Caribbean-wide tourism data |
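As a rough illustration of the cleaning steps above, the sketch below renames columns, harmonizes visitor categories, and drops metadata fields. Only the three clean column names come from this README; the raw headers (`Country`, `Year`, `Value`, `Series`), the raw series labels, the `visitor_category` column, and the sample values are assumptions made for the example.

```python
# Hypothetical raw headers mapped to the clean names listed above.
RENAME = {
    "Country": "country_receiving",
    "Year": "year",
    "Value": "number_of_tourists",
}

# Hypothetical raw series labels mapped to the three harmonized categories.
CATEGORY_MAP = {
    "overnight visitors (tourists)": "tourists",
    "same-day visitors (excursionists)": "excursionists",
    "total arrivals": "total_visitors",
}


def tidy_row(raw):
    """Rename columns, harmonize the category, and drop everything else."""
    label = raw["Series"].strip().lower()
    if label not in CATEGORY_MAP:
        return None  # unrecognized series: leave it out rather than guess
    tidy = {clean: raw[original] for original, clean in RENAME.items()}
    tidy["visitor_category"] = CATEGORY_MAP[label]
    return tidy


# Toy records; the values and the metadata columns are made up.
raw_rows = [
    {"Country": "Haiti", "Code": "X1", "Year": "2019",
     "Series": "Overnight visitors (tourists)", "Value": "10", "Notes": "a"},
    {"Country": "Haiti", "Code": "X1", "Year": "2019",
     "Series": "Same-day visitors (excursionists)", "Value": "20", "Notes": ""},
    {"Country": "Haiti", "Code": "X1", "Year": "2019",
     "Series": "Some other series", "Value": "5", "Notes": ""},
]

tidy_rows = [row for row in map(tidy_row, raw_rows) if row is not None]
```

Because only the fields named in `RENAME` are copied, metadata columns such as codes and notes disappear without needing an explicit drop list.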
| 39 | + |
| 40 | +### 3. `cleaned_data_haiti.ipynb` |
| 41 | + |
| 42 | +**Objective**: Produce publication-grade, Haiti-only datasets,
| 43 | +delivered as four focused and meticulously cleaned tables:
| 44 | + |
| 45 | +- Arrivals by type of visitor |
| 46 | +- Purpose of visit (business, personal, total) – 66 rows |
| 47 | +- Mode of transport (air, sea/water, all modes combined) |
| 48 | +- Tourism expenditure and receipts (in current US$) – 28 rows |
| 49 | + |
| 50 | +Every file features simple column names, correct data types, |
| 51 | +transparent handling of missing values, and complete removal |
| 52 | +of the long original UN labels.
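One way to make data types and missing values explicit is sketched below. It is an illustration under stated assumptions, not the notebook's implementation: the placeholder strings in `MISSING_MARKERS`, the `expenditure_usd` column name, and the sample values are hypothetical.

```python
# Assumed placeholders for "no data"; the real source files may use others.
MISSING_MARKERS = {"", "..", "n/a"}


def parse_number(text):
    """Convert a raw cell to float, or None when it marks a missing value."""
    cleaned = text.strip().replace(",", "")
    if cleaned.lower() in MISSING_MARKERS:
        return None
    return float(cleaned)


def missing_years(records, first=1995, last=2022):
    """List the years in [first, last] that have no record at all."""
    present = {record["year"] for record in records}
    return [year for year in range(first, last + 1) if year not in present]


# Toy rows; the values are made up.
raw = [
    {"year": "1995", "expenditure_usd": "1,250.5"},
    {"year": "1996", "expenditure_usd": ".."},
]

records = [
    {"year": int(row["year"]),
     "expenditure_usd": parse_number(row["expenditure_usd"])}
    for row in raw
]

# Years in the 1995-2022 series that the toy data does not cover.
gaps = missing_years(records)
```

The 1995 to 2022 series spans 28 years, which is consistent with the 28-row expenditure table if that table holds one row per year. Keeping missing values as `None` instead of zero avoids silently distorting totals and averages.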
| 53 | + |
| 54 | +## Conclusion – Ready for Impact |
| 55 | + |
| 56 | +What started as sprawling, hard-to-navigate global spreadsheets has become a |
| 57 | +**clean, trustworthy, and fully documented data foundation** that finally lets |
| 58 | +the real story of Haitian and Caribbean tourism emerge clearly. |
| 59 | + |
| 60 | +These datasets are now ready to power: |
| 61 | + |
| 62 | +- Policy briefs and evidence-based recovery strategies for Haiti’s tourism sector |
| 63 | +- Academic research and economic impact studies |
| 64 | +- Interactive dashboards and compelling data visualizations |
| 65 | +- Regional benchmarking and competitiveness analyses |
| 66 | + |
| 67 | +In just three transparent and reproducible notebooks, complex international |
| 68 | +statistics have been transformed into clear, actionable evidence. |
| 69 | + |
| 70 | +The numbers cleaned here are more than data points; they are the solid
| 71 | +groundwork for informed decisions that can help rebuild, reimagine, |
| 72 | +and revitalize tourism in Haiti and across the wider Caribbean. |