Python project for searching, extracting and analyzing open data from dati.gov.it, the official Italian open data portal.
- Search and filtering of available datasets
- Automatic download and analysis of CSV data
- Geographic visualization of road accidents
- Export to CSV and Excel formats
- Well-organized modular architecture
- Complete test suite for validation
- Centralized configuration for easy maintenance
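The README does not show how the project queries the portal. As a rough sketch only, assuming dati.gov.it exposes the standard CKAN action API (its `package_search` endpoint), a keyword search could be built as below. The base URL is an assumption to verify against the portal's API documentation, and `build_search_url` and `filter_datasets` are hypothetical names, not functions from `services.py`. The filter runs offline on CKAN-style package dicts (`title`, `notes`, `tags`).

```python
from urllib.parse import urlencode

# ASSUMPTION: standard CKAN action API path on dati.gov.it; verify before use.
CKAN_SEARCH_URL = "https://www.dati.gov.it/opendata/api/3/action/package_search"


def build_search_url(keyword, rows=20):
    """Build a CKAN package_search URL for a free-text keyword (hypothetical helper)."""
    return CKAN_SEARCH_URL + "?" + urlencode({"q": keyword, "rows": rows})


def filter_datasets(packages, keyword):
    """Keep CKAN-style package dicts whose title, notes or tag names contain keyword.

    The match is case-insensitive; missing or None fields are treated as empty.
    """
    needle = keyword.lower()
    matches = []
    for pkg in packages:
        tags = " ".join(tag.get("name", "") for tag in pkg.get("tags") or [])
        text = " ".join([pkg.get("title") or "", pkg.get("notes") or "", tags])
        if needle in text.lower():
            matches.append(pkg)
    return matches
```

With `requests`, the live call would look roughly like `requests.get(build_search_url("incidenti"), timeout=30).json()["result"]["results"]`, which follows the standard CKAN response shape; that list can then be narrowed further with `filter_datasets`.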
Installation instructions:
git clone https://github.com/1ESA1/AnalisiOpenData.git
cd AnalisiOpenData
Requirements:
- Python 3.6 or higher
- Main dependencies: requests, pandas, folium (plus streamlit for the web interface)
- Support files: JSON and CSV management
- Compatible with datasets in JSON and CSV format
Running the web interface:
cd src
streamlit run app.py
Local access (same machine):
http://localhost:8501
Deploy to Streamlit Cloud (external users):
- Push your project to a GitHub repository
- Sign up at streamlit.io/cloud
- Click "New app" and select your repository
- Share the resulting public URL with your users
Live app (public URL):
https://brluhecnkuvhp99tuzhzv3.streamlit.app/
Click the link above to access the live application.
Features:
- Interactive dataset search with keyword filtering
- Browse and select datasets from dati.gov.it
- Analyze individual datasets or process all results
- View interactive maps for geographic data
- Download analysis results as CSV
- View data summaries and statistics
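Processing all search results with a progress bar can be organized as a loop that isolates per-dataset failures and reports progress through a callback. This is a sketch, not the app's code: `process_all`, `analyze` and `on_progress` are hypothetical names, and each dataset is assumed to be a dict with an optional `"name"` key.

```python
def process_all(datasets, analyze, on_progress=None):
    """Run analyze() on every dataset, collecting results and per-dataset errors.

    on_progress(done, total) is called after each dataset so a UI can update a
    progress bar; one failing dataset does not stop the rest of the batch.
    """
    results, errors = {}, {}
    total = len(datasets)
    for index, dataset in enumerate(datasets, start=1):
        # HYPOTHETICAL naming: fall back to a positional label when "name" is missing.
        name = dataset.get("name", "dataset-{}".format(index))
        try:
            results[name] = analyze(dataset)
        except Exception as exc:  # record the failure and keep going
            errors[name] = str(exc)
        if on_progress is not None:
            on_progress(index, total)
    return results, errors
```

In a Streamlit page, the callback could drive a progress bar, for example `bar = st.progress(0)` and `on_progress=lambda done, total: bar.progress(done / total)`, since `st.progress` accepts a float between 0.0 and 1.0.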
Command-line version:
cd src
python main.py
Original script (backup):
cd src
python AnalisiOpenData.py
Running the tests:
# Run all tests
cd tests
python run_unified_tests.py
# Specific tests
python test_config.py  # Configuration tests
python test_unified.py  # Unified tests
python test_utils.py  # Utility tests
How it works:
- Dataset Search: Enter a keyword to filter available datasets
- Dataset Selection: Choose the desired dataset from the filtered list
- Automatic Download: The system automatically downloads the CSV data
- Accident Analysis: Analyzes road accident data, when the dataset contains it
- Visualization: Creates interactive maps of accidents
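The automatic download step fetches CSV files whose separators can differ, and the project advertises detection of comma, semicolon, tab and pipe. A standard-library way to do this is `csv.Sniffer` restricted to those four characters. `detect_separator` and `read_rows` are hypothetical names, and falling back to a comma when sniffing fails is a choice of this sketch, not documented project behaviour.

```python
import csv
import io

# The four separators listed in this README: comma, semicolon, tab, pipe.
CANDIDATE_SEPARATORS = ",;\t|"


def detect_separator(sample, default=","):
    """Guess the field separator of a CSV text sample.

    Falls back to `default` when csv.Sniffer cannot decide
    (for example, a single-column file).
    """
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=CANDIDATE_SEPARATORS)
        return dialect.delimiter
    except csv.Error:
        return default


def read_rows(text):
    """Parse CSV text into a list of dicts using the detected separator."""
    separator = detect_separator(text[:4096])  # sniff only the first few KB
    return list(csv.DictReader(io.StringIO(text), delimiter=separator))
```

pandas can also infer the separator itself with `pd.read_csv(path, sep=None, engine="python")`, which relies on `csv.Sniffer` internally.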
Directories:
- data/: JSON files with dataset metadata
- output/: Output files (CSV, Excel, HTML maps)
  - mappa_incidenti.html: Interactive map
  - output.xlsx: Excel report
  - output.csv: Data exported to CSV
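The CSV output and the data-cleaning step (automatic duplicate removal) can be illustrated without extra dependencies. This sketch removes exact duplicate rows and writes a CSV file with the standard `csv` module, creating the target directory if needed; `drop_duplicates` and `export_csv` are hypothetical helpers, not the API of `file_manager.py`. With pandas, the equivalents are `DataFrame.drop_duplicates()` and `DataFrame.to_csv()`, while `DataFrame.to_excel()` for `output.xlsx` additionally requires an Excel engine such as openpyxl.

```python
import csv
from pathlib import Path


def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent, hashable row key
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def export_csv(rows, path):
    """Write a list of dicts to CSV, creating the parent directory if missing."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```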
- Separation of Concerns: Each module has a specific role
- Modular Architecture: Independent and reusable components
- Configuration Management: Centralized settings
- Error Handling: Robust error handling
- Test-Driven: Complete test suite for validation
- Documentation: Detailed documentation
- Backward Compatibility: Legacy code maintained (src/AnalisiOpenData.py)
- Separation of responsibilities into modules
- Robust error handling with exception management
- Centralized configuration
- Improved user interface
- Code documentation
- Data validation
- Automatic directory management
- Complete test suite
- Interactive Web Interface: Modern Streamlit-based UI for easy data exploration
- Advanced Search: Filter datasets from dati.gov.it by keyword
- Batch Processing: Analyze all search results in a single run
- Enhanced Maps: Intelligent coordinate detection (latitude/longitude variations)
- CSV Analysis Tools: Automatic CSV separator detection
- Live Statistics: Real-time data summaries and metrics
- Download Support: Export analyzed data as CSV files
- Progress Tracking: Visual progress bars for batch operations
- Flexible Coordinate Detection: Supports multiple column naming conventions:
  - Latitude: latitudine, latitude, lat, y_coord, y
  - Longitude: longitudine, longitude, lon, x_coord, x
- Multi-Dataset Maps: Create comprehensive geographic visualizations
- Enhanced Error Messages: Detailed debug information for troubleshooting
- Data Analysis Pipeline: Complete automatic analysis workflow
- CSV Separator Detection: Auto-detects comma (,), semicolon (;), tab (\t) and pipe (|) separators
- Dataset Retrieval: Integrated methods for package data extraction
- Data Cleaning: Automatic duplicate removal and validation
- Resource Management: Proper handling of multiple file formats
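The flexible coordinate detection can be sketched as a case-insensitive lookup over the column names listed for latitude and longitude, tried in that order. `find_column` and `extract_points` are hypothetical; the decimal-comma handling (`"41,9"`) and the latitude/longitude range check are additions of this sketch, not documented behaviour of `analyzer.py`. Rows are dicts such as those produced by `csv.DictReader`.

```python
# Column-name candidates from this README, in priority order.
LAT_CANDIDATES = ["latitudine", "latitude", "lat", "y_coord", "y"]
LON_CANDIDATES = ["longitudine", "longitude", "lon", "x_coord", "x"]


def find_column(columns, candidates):
    """Return the first column whose trimmed, lowercased name is a candidate, else None."""
    normalized = {col.strip().lower(): col for col in columns}
    for name in candidates:
        if name in normalized:
            return normalized[name]
    return None


def extract_points(rows):
    """Return (lat, lon) float pairs, skipping rows with unusable coordinates.

    ASSUMPTION: values may use a decimal comma ("41,9028"); values outside
    the valid latitude/longitude ranges are discarded.
    """
    if not rows:
        return []
    lat_col = find_column(rows[0].keys(), LAT_CANDIDATES)
    lon_col = find_column(rows[0].keys(), LON_CANDIDATES)
    if lat_col is None or lon_col is None:
        return []
    points = []
    for row in rows:
        try:
            lat = float(str(row[lat_col]).replace(",", "."))
            lon = float(str(row[lon_col]).replace(",", "."))
        except (TypeError, ValueError):
            continue  # empty or non-numeric value
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            points.append((lat, lon))
    return points
```

The resulting pairs could feed a folium map: add `folium.Marker(location=[lat, lon])` to a `folium.Map` for each point, then call `m.save("mappa_incidenti.html")`.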
AnalisiOpenData/
│
├── src/                         # Modular source code
│   ├── main.py                  # Main entry point
│   ├── config.py                # Centralized configuration (URLs, paths, constants)
│   ├── services.py              # API and data services
│   ├── file_manager.py          # File I/O management (JSON, CSV, Excel)
│   ├── analyzer.py              # Accident analysis
│   ├── ui.py                    # User interface
│   └── AnalisiOpenData.py       # Original code (backup)
│
├── tests/                       # Complete test suite
│   ├── base_test.py             # Base tests
│   ├── test_config.py           # Configuration tests
│   ├── test_unified.py          # Unified tests
│   ├── test_utils.py            # Test utilities
│   └── run_unified_tests.py     # Unified test runner
│
├── data/                        # Input data
│   ├── Condizioni.xlsx          # Weather conditions
│   ├── DatiGovIt.json           # Raw data from dati.gov.it
│   ├── DatiGovItFiltrati.json   # Filtered data
│   └── DatiSelezionati.json     # Data selected for analysis
│
├── output/                      # Generated output files
│   ├── mappa_incidenti.html     # Interactive map
│   ├── output.xlsx              # Excel report
│   └── output.csv               # Data exported to CSV
│
├── README.md                    # Complete documentation
└── LICENSE                      # Apache 2.0 License
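`config.py` is described only as holding URLs, paths and constants. One possible shape for its path handling, including automatic creation of `data/` and `output/`, is sketched here. `get_paths`, `ensure_directories` and the dictionary keys are hypothetical, and `BASE_DIR` defaults to the current working directory only to keep the sketch self-contained.

```python
from pathlib import Path

# HYPOTHETICAL constants; inside src/config.py the root could instead be
# Path(__file__).resolve().parent.parent (the folder that contains src/).
BASE_DIR = Path.cwd()
DATA_DIR_NAME = "data"
OUTPUT_DIR_NAME = "output"
OUTPUT_FILES = {
    "map": "mappa_incidenti.html",
    "excel": "output.xlsx",
    "csv": "output.csv",
}


def get_paths(base_dir=BASE_DIR):
    """Return a dict of the project's data/output paths under base_dir."""
    base = Path(base_dir)
    output_dir = base / OUTPUT_DIR_NAME
    paths = {"data": base / DATA_DIR_NAME, "output": output_dir}
    for key, filename in OUTPUT_FILES.items():
        paths[key] = output_dir / filename
    return paths


def ensure_directories(paths):
    """Create data/ and output/ if missing; safe to call repeatedly."""
    for key in ("data", "output"):
        paths[key].mkdir(parents=True, exist_ok=True)
```

Keeping file names in one mapping means a rename (such as correcting a misspelled output file) touches a single line.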
- Tests isolated in a dedicated directory
- Do not interfere with production code
- Facilitates maintenance and development
- Follow Python best practices
- Each module has a specific responsibility
- Reusable and testable code
- Easy debugging and maintenance
- Extensible for future features
- Tests for configuration and utilities
- Import and structure tests
- Component functionality tests
- Complete integration tests
- Modules: 6/6 source files validated
- Configuration: Settings tests passing
- Functionality: Core components tested
- Integration: Complete system validated
# Run the full test suite
cd tests
python run_unified_tests.py
# Run individual test files (from the tests/ directory)
python test_config.py  # Configuration tests
python test_unified.py  # Unified tests
python test_utils.py  # Utility tests
# Display project structure (from the project root)
tree -I '__pycache__'
# Check main files
ls -la src/
ls -la tests/
ls -la data/
ls -la output/
Guidelines for those who wish to contribute:
- Before opening a new issue, verify that it has not already been reported
- Clearly describe the problem, expected behavior and actual behavior
- If possible, attach screenshots, logs or code examples that help clarify the issue
- Fork the repository and create a new branch for your changes
- Make sure your code is well formatted and doesn't introduce errors
- Clearly describe the changes in the Pull Request message
- Link the Pull Request to an Issue, if relevant
- Respond to comments and review requests from maintainers
- Follow the project's style conventions (e.g. PEP 8 for Python)
- If you modify existing functionality, make sure everything continues to work correctly
- Update documentation, if necessary
- If possible, add tests that cover new functionality or fixes
- Make sure all existing tests continue to pass
- For questions or proposals, open a discussion in the Issues section
- Updated file structure to reflect project reality
- Corrected test commands to use actually present files
- Updated module and component counts
- Improved documentation of data and output directories
- The file ouput.xlsx in /output/ has a spelling error in its name
- Tests could be extended to cover more use cases
- Documentation could be enriched with practical examples
This project is distributed under the Apache 2.0 License.
The project was developed by: