Commit 06b9982

adding readme for data preparation
1 parent cd3661e commit 06b9982

1 file changed

+71
-0
lines changed

2_data_preparation/README.md

Lines changed: 71 additions & 0 deletions
@@ -1 +1,72 @@
# Data Preparation

## Haiti Tourism Data Pipeline

A clean, reproducible, and fully documented data foundation built from official
UNWTO sources.

This project transforms the raw and often unwieldy United Nations World Tourism Organization
(UNWTO) Compendium of Tourism Statistics into clear, structured,
and analysis-ready datasets, with a special focus on **Haiti**
and a complete comparative panel of **30 Caribbean countries and territories**.
The data cover the full available series from **1995 to 2022**.

## Project Structure

### 1. `Extract_data_haiti_and_caribbean.ipynb`

**Objective**: Fast, reliable, and one-time raw extraction from massive global files

- Loads the four official UNWTO Excel workbooks (Arrivals, Purpose, Mode of Transport,
  and Expenditure)
- Isolates every single record belonging to Haiti across all sheets and years
- Simultaneously pulls the same indicators for 30 Caribbean countries
  and territories for regional comparison
- Saves compact, raw extracted subsets as CSVs (one file per indicator)
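
The extraction steps above can be sketched as a simple filter-and-save routine. This is a minimal illustration using only the Python standard library, not the notebook's actual code: the field names (`country`, `indicator`), the short country list, and the sample values are all hypothetical, and reading the Excel workbooks themselves is left out.

```python
import csv
import io

# Hypothetical subset of the Caribbean panel; the full list of 30 is not given here.
CARIBBEAN = {"Haiti", "Jamaica", "Cuba", "Dominican Republic"}


def extract_subset(raw_rows, countries, country_field="country"):
    """Keep only the rows whose (whitespace-trimmed) country is in `countries`."""
    return [
        row for row in raw_rows
        if row.get(country_field, "").strip() in countries
    ]


def to_csv_text(rows, fieldnames):
    """Serialize rows as CSV text; in practice this would be one file per indicator."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy stand-in for one sheet of a global workbook (made-up values, not UNWTO figures).
raw_arrivals = [
    {"country": "Haiti ", "indicator": "Total arrivals", "1995": "100", "1996": ".."},
    {"country": "France", "indicator": "Total arrivals", "1995": "900", "1996": "950"},
    {"country": "Jamaica", "indicator": "Total arrivals", "1995": "300", "1996": "310"},
]

caribbean_rows = extract_subset(raw_arrivals, CARIBBEAN)
haiti_rows = extract_subset(raw_arrivals, {"Haiti"})
arrivals_csv = to_csv_text(caribbean_rows, ["country", "indicator", "1995", "1996"])
```

The rows are kept raw at this stage (for example, the trailing space in the country name and the `..` cell are preserved), so that all cleaning decisions stay in the later notebooks.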

### 2. `script_UN_tourism_caribbean_countries_cleaned.ipynb`

**Objective**: Deliver one unified, tidy dataset for the entire Caribbean region

- Removes all metadata clutter (flags, country codes, footnotes, notes)
- Applies intuitive and consistent column names: `country_receiving`, `year`, `number_of_tourists`
- Harmonizes visitor categories across countries and years:
  - Tourists (overnight visitors)
  - Excursionists (same-day visitors)
  - Total visitors
- Final output: **3,025 rows** of clean,
  immediately usable Caribbean-wide tourism data
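
As a rough illustration of the tidy reshaping described above, the sketch below turns one-row-per-series data (one column per year) into one row per country, year, and visitor category. The output columns `country_receiving`, `year`, and `number_of_tourists` come from this README; the raw labels in `CATEGORY_MAP`, the `visitor_category` column, the input field names, and the sample values are assumptions made for the example.

```python
# Hypothetical mapping from raw series labels to the three harmonized categories.
CATEGORY_MAP = {
    "Overnight visitors (tourists)": "tourists",
    "Same-day visitors (excursionists)": "excursionists",
    "Total arrivals": "total_visitors",
}


def to_tidy(wide_rows):
    """Reshape wide rows (one column per year) into long, tidy rows."""
    tidy = []
    for row in wide_rows:
        category = CATEGORY_MAP.get(row["indicator"].strip())
        if category is None:
            continue  # series outside the three harmonized categories is skipped
        for key, value in row.items():
            if not key.isdigit():
                continue  # non-year columns (codes, flags, notes) are dropped
            tidy.append({
                "country_receiving": row["country"].strip(),
                "year": int(key),
                "visitor_category": category,  # assumed column name
                "number_of_tourists": value,   # still a raw string at this step
            })
    return tidy


# Made-up sample rows, not UNWTO figures.
wide = [
    {"country": "Haiti", "code": "332", "indicator": "Total arrivals",
     "1995": "100", "1996": "120", "notes": "a"},
    {"country": "Haiti", "code": "332", "indicator": "Some other series",
     "1995": "5", "1996": "6", "notes": ""},
]

tidy_rows = to_tidy(wide)
```

Each wide row with N year columns yields N tidy rows, which is why a long-format panel of many countries and years reaches thousands of rows.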

### 3. `cleaned_data_haiti.ipynb`

**Objective**: Produce publication-grade, Haiti-only datasets

Four focused and meticulously cleaned tables:

- Arrivals by type of visitor
- Purpose of visit (business, personal, total) – 66 rows
- Mode of transport (air, sea/water, all modes combined)
- Tourism expenditure and receipts (in current US$) – 28 rows

Every file features simple column names, correct data types,
transparent handling of missing values, and complete removal
of long original UN labels.
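
One way to make "correct data types" and "transparent handling of missing values" concrete is to convert each cell explicitly and keep gaps as `None` rather than silently turning them into zeros. The sketch below is an assumption about how this could be done, not the notebook's code: the set of missing-value markers (including `..`) and the helper names are hypothetical.

```python
# Assumed placeholders for "no data"; the markers actually used in the source files may differ.
MISSING_MARKERS = {"", "..", "n/a"}


def parse_value(raw):
    """Return a float, or None when the cell is an explicit missing marker."""
    text = str(raw).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    # Remove thousands separators before converting.
    return float(text.replace(",", ""))


def clean_series(cells_by_year):
    """Turn a {year: raw_cell} mapping into typed rows sorted by year."""
    return [
        {"year": int(year), "value": parse_value(cell)}
        for year, cell in sorted(cells_by_year.items())
    ]


# Made-up sample cells, not UNWTO figures.
series = clean_series({"1996": "..", "1995": "1,250.5"})
```

Keeping missing years as explicit `None` rows means a series covering 1995 to 2022 always has 28 rows, and downstream users can see exactly where the gaps are.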

## Conclusion – Ready for Impact

What started as sprawling, hard-to-navigate global spreadsheets has become a
**clean, trustworthy, and fully documented data foundation** that finally lets
the real story of Haitian and Caribbean tourism emerge clearly.

These datasets are now ready to power:

- Policy briefs and evidence-based recovery strategies for Haiti’s tourism sector
- Academic research and economic impact studies
- Interactive dashboards and compelling data visualizations
- Regional benchmarking and competitiveness analyses

In just three transparent and reproducible notebooks, complex international
statistics have been transformed into clear, actionable evidence.

The numbers cleaned here are more than data points: they are the solid
groundwork for informed decisions that can help rebuild, reimagine,
and revitalize tourism in Haiti and across the wider Caribbean.
