
AnalisiOpenData - Open Data Analyzer

Description

A Python project for extracting and analyzing open data from dati.gov.it, the official Italian open data portal.

Main Features

  • Search and filtering of available datasets
  • Automatic download and analysis of CSV data
  • Geographic visualization of road accidents
  • Export to CSV and Excel formats
  • Well-organized modular architecture
  • Complete test suite for validation
  • Centralized configuration for easy maintenance

Installation

Clone the repository and move into the project directory:

git clone https://github.com/1ESA1/AnalisiOpenData.git
cd AnalisiOpenData

Requirements

  • Python 3.6 or higher
  • Main dependencies: requests, pandas, folium
  • streamlit (needed only for the web application)
  • Built-in handling of JSON and CSV files
  • Compatible with datasets in JSON and CSV format

Usage

Streamlit Web Application (Recommended)

cd src
streamlit run app.py

Local Access (Same Machine):

http://localhost:8501

Deploy to Streamlit Cloud (External Users)

  1. Create a GitHub repository with your project
  2. Sign up at streamlit.io/cloud
  3. Click "New app" and select your repository
  4. Share the public URL with your users

Live App: Public URL

https://brluhecnkuvhp99tuzhzv3.streamlit.app/

Open the link above to access the live application.

Features:

  • Interactive dataset search with keyword filtering
  • Browse and select datasets from dati.gov.it
  • Analyze individual datasets or process all results
  • View interactive maps for geographic data
  • Download analysis results as CSV
  • View data summaries and statistics

Modular CLI Version (Recommended for Command-Line Use)

cd src
python main.py

Original Version (Compatibility)

cd src
python AnalisiOpenData.py

Application Testing

# Run all tests
cd tests
python run_unified_tests.py

# Specific tests
python test_config.py      # Configuration tests
python test_unified.py     # Unified tests
python test_utils.py       # Utility tests

Features

  1. Dataset Search: Enter a keyword to filter the available datasets
  2. Dataset Selection: Choose the desired dataset from the filtered list
  3. Automatic Download: The system downloads the dataset's CSV data automatically
  4. Accident Analysis: If the dataset contains road accident data, the system analyzes it
  5. Visualization: Creates interactive maps of accidents
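
As a rough illustration of steps 1 and 2, the sketch below filters a list of dataset metadata records by keyword. It assumes each record is a dictionary with optional `title` and `notes` fields, a layout common in CKAN-style catalogs. The function name `filter_datasets` and the sample records are hypothetical and are not taken from the project's code.

```python
from typing import Dict, List


def filter_datasets(datasets: List[Dict], keyword: str) -> List[Dict]:
    """Return the datasets whose title or description contains the keyword.

    Assumes each record has optional "title" and "notes" string fields
    (hypothetical layout, not taken from the project's code).
    """
    needle = keyword.strip().lower()
    if not needle:
        # An empty keyword means "no filter": return every record.
        return list(datasets)
    matches = []
    for record in datasets:
        text = f"{record.get('title') or ''} {record.get('notes') or ''}".lower()
        if needle in text:
            matches.append(record)
    return matches


# Made-up sample records, for illustration only.
sample = [
    {"title": "Incidenti stradali 2022", "notes": "Road accidents by municipality"},
    {"title": "Qualità dell'aria", "notes": "Air quality measurements"},
]
selected = filter_datasets(sample, "incidenti")
print([d["title"] for d in selected])
```

Matching is case-insensitive and uses plain substring search, which keeps the sketch simple; the actual application may rely on the portal's own search instead.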

Output

  • data/: JSON files with dataset metadata
  • output/: Output files (CSV, Excel, HTML maps)
    • mappa_incidenti.html - Interactive map
    • output.xlsx - Excel report
    • output.csv - Data exported to CSV
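
For readers curious how a file such as mappa_incidenti.html could be produced, here is a minimal sketch using folium. It assumes the accident records have already been reduced to (latitude, longitude) pairs. The function names `valid_points` and `build_accident_map` are hypothetical; the project's analyzer module may work differently.

```python
from typing import Iterable, List, Tuple


def valid_points(rows: Iterable[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep only (lat, lon) pairs that fall inside valid coordinate ranges."""
    points = []
    for lat, lon in rows:
        # NaN values fail both comparisons and are dropped as well.
        if -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0:
            points.append((lat, lon))
    return points


def build_accident_map(rows, output_path="mappa_incidenti.html"):
    """Write an interactive HTML map with one marker per valid point."""
    import folium  # imported here so valid_points works without folium installed

    points = valid_points(rows)
    if not points:
        raise ValueError("no valid coordinates to plot")
    # Center the map on the mean position of the points.
    center = [
        sum(lat for lat, _ in points) / len(points),
        sum(lon for _, lon in points) / len(points),
    ]
    accident_map = folium.Map(location=center, zoom_start=10)
    for lat, lon in points:
        folium.Marker(location=[lat, lon]).add_to(accident_map)
    accident_map.save(output_path)
    return output_path
```

Calling `build_accident_map([(45.46, 9.19), (41.90, 12.50)])` would write the HTML file to the given path, which can then be opened in any browser.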

Improvements Implemented

  1. Separation of Concerns: Each module has a specific role
  2. Modular Architecture: Independent and reusable components
  3. Configuration Management: Centralized settings
  4. Error Handling: Robust handling of exceptions
  5. Test-Driven: Complete test suite for validation
  6. Documentation: Detailed documentation
  7. Backward Compatibility: Legacy code maintained

Implemented Improvements

  • Separation of responsibilities into modules
  • Robust error handling with exception management
  • Centralized configuration
  • Improved user interface
  • Code documentation
  • Data validation
  • Automatic directory management
  • Complete test suite
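
To make "centralized configuration" and "automatic directory management" more concrete, the following sketch keeps the paths in one place and creates any missing directories on demand. The constant names and the `ensure_directories` helper are assumptions for illustration; the real config.py may use different names and values.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical settings; the real config.py may use different names and values.
BASE_DIR = Path(".")
DATA_DIR = BASE_DIR / "data"
OUTPUT_DIR = BASE_DIR / "output"


def ensure_directories(directories: Iterable[Path]) -> List[Path]:
    """Create each directory if it is missing and return the paths."""
    ensured = []
    for directory in directories:
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        directory.mkdir(parents=True, exist_ok=True)
        ensured.append(directory)
    return ensured
```

A caller would typically run `ensure_directories([DATA_DIR, OUTPUT_DIR])` once at startup, so that later file writes do not fail on a missing folder.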

Latest Updates

New Streamlit Web Application v2.0

  • Interactive Web Interface: Modern Streamlit-based UI for easy data exploration
  • Advanced Search: Filter datasets from dati.gov.it by keyword
  • Batch Processing: Analyze all search results in a single batch
  • Enhanced Maps: Intelligent coordinate detection (latitude/longitude naming variations)
  • CSV Analysis Tools: Automatic CSV separator detection
  • Live Statistics: Real-time data summaries and metrics
  • Download Support: Export analyzed data as CSV files
  • Progress Tracking: Visual progress bars for batch operations

Analyzer Module Enhancements

  • Flexible Coordinate Detection: Supports multiple column naming conventions:
    • Latitude: latitudine, latitude, lat, y_coord, y
    • Longitude: longitudine, longitude, lon, x_coord, x
  • Multi-Dataset Maps: Create comprehensive geographic visualizations
  • Enhanced Error Messages: Detailed debug information for troubleshooting
  • Data Analysis Pipeline: Complete automatic analysis workflow
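
The column naming conventions listed above suggest a simple lookup. The sketch below shows one possible way to pick the latitude and longitude columns from a list of column names. Case-insensitive exact matching and the function names are assumptions, not the analyzer's confirmed behavior.

```python
from typing import Iterable, Optional, Tuple

# Candidate names listed in this README, in priority order.
LATITUDE_NAMES = ("latitudine", "latitude", "lat", "y_coord", "y")
LONGITUDE_NAMES = ("longitudine", "longitude", "lon", "x_coord", "x")


def find_column(columns: Iterable[str], candidates: Tuple[str, ...]) -> Optional[str]:
    """Return the first column whose normalized name matches a candidate."""
    normalized = {str(name).strip().lower(): name for name in columns}
    for candidate in candidates:
        if candidate in normalized:
            return normalized[candidate]
    return None


def detect_coordinate_columns(columns: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (latitude_column, longitude_column), or None if either is missing."""
    columns = list(columns)
    lat = find_column(columns, LATITUDE_NAMES)
    lon = find_column(columns, LONGITUDE_NAMES)
    if lat is None or lon is None:
        return None
    return lat, lon


print(detect_coordinate_columns(["Data", "Latitudine", "Longitudine"]))
```

With pandas, the same function can be called as `detect_coordinate_columns(df.columns)`, since it only needs an iterable of names.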

Data Service Improvements

  • CSV Separator Detection: Auto-detects comma (,), semicolon (;), tab (\t), and pipe (|) separators
  • Dataset Retrieval: Integrated methods for package data extraction
  • Data Cleaning: Automatic duplicate removal and validation
  • Resource Management: Proper handling of multiple file formats
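
Separator detection can be approximated with a small counting heuristic, sketched below under the assumption that a real separator appears the same number of times on every line. This is not necessarily how the data service does it, and `detect_separator` is a hypothetical name. Quoted fields that contain separators are deliberately not handled.

```python
from typing import Sequence

CANDIDATE_SEPARATORS = (",", ";", "\t", "|")


def detect_separator(text: str, candidates: Sequence[str] = CANDIDATE_SEPARATORS,
                     max_lines: int = 20, default: str = ",") -> str:
    """Guess the field separator from the first lines of a CSV text.

    Simplified heuristic: quoted fields containing separators are not handled.
    """
    lines = [line for line in text.splitlines() if line.strip()][:max_lines]
    if not lines:
        return default
    best, best_score = default, 0
    for sep in candidates:
        counts = [line.count(sep) for line in lines]
        # A real separator appears the same number of times on every line.
        if counts[0] > 0 and all(count == counts[0] for count in counts):
            if counts[0] > best_score:
                best, best_score = sep, counts[0]
    if best_score > 0:
        return best
    # No perfectly consistent candidate: fall back to the most frequent one.
    totals = {sep: sum(line.count(sep) for line in lines) for sep in candidates}
    most_common = max(candidates, key=lambda sep: totals[sep])
    return most_common if totals[most_common] > 0 else default
```

The result can be passed to pandas as `pd.read_csv(path, sep=detect_separator(sample_text))`. The standard library's `csv.Sniffer` is an alternative when quoted fields need to be handled.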

Project Structure

Module Architecture

AnalisiOpenData/
│
├── src/                        # Modular source code
│   ├── main.py                 # Main entry point
│   ├── config.py               # Centralized configuration (URLs, paths, constants)
│   ├── services.py             # API and data services
│   ├── file_manager.py         # File I/O management (JSON, CSV, Excel)
│   ├── analyzer.py             # Accident analysis
│   ├── ui.py                   # User interface
│   └── AnalisiOpenData.py      # Original code (backup)
│
├── tests/                      # Complete test suite
│   ├── base_test.py            # Base tests
│   ├── test_config.py          # Configuration tests
│   ├── test_unified.py         # Unified tests
│   ├── test_utils.py           # Test utilities
│   └── run_unified_tests.py    # Unified test runner
│
├── data/                       # Input data
│   ├── Condizioni.xlsx         # Weather conditions
│   ├── DatiGovIt.json          # Raw data from dati.gov.it
│   ├── DatiGovItFiltrati.json  # Filtered data
│   └── DatiSelezionati.json    # Data selected for analysis
│
├── output/                     # Generated output files
│   ├── mappa_incidenti.html    # Interactive map
│   ├── output.xlsx             # Excel report
│   └── output.csv              # Data exported to CSV
│
├── README.md                   # Complete documentation
└── LICENSE                     # Apache 2.0 License

Advantages of the New Organization

Test Separation

  • Tests are isolated in a dedicated directory
  • Tests do not interfere with production code
  • Maintenance and development are easier
  • The layout follows Python best practices

Modular Architecture

  • Each module has a specific responsibility
  • Reusable and testable code
  • Easy debugging and maintenance
  • Extensible for future features

Complete Test Coverage

  • Tests for configuration and utilities
  • Import and structure tests
  • Component functionality tests
  • Complete integration tests

Successfully Completed Tests

  • Modules: 6/6 source files validated
  • Configuration: Working settings tests
  • Functionality: Core components tested
  • Integration: Complete system validated

Testing

Run All Tests

cd tests
python run_unified_tests.py

Individual Tests

cd tests
python test_config.py      # Configuration tests
python test_unified.py     # Unified tests
python test_utils.py       # Utility tests

Project Structure Verification

# Display project structure
tree -I '__pycache__'

# Check main files
ls -la src/
ls -la tests/
ls -la data/
ls -la output/

Contributing

Guidelines for those who wish to contribute:

Opening an Issue

  • Before opening a new issue, verify that it has not already been reported
  • Clearly describe the problem, expected behavior and actual behavior
  • If possible, attach screenshots, logs or code examples that help clarify the issue

Proposing a Pull Request

  • Fork the repository and create a new branch for your changes
  • Make sure your code is well formatted and doesn't introduce errors
  • Clearly describe the changes in the Pull Request message
  • Link the Pull Request to an Issue, if relevant
  • Respond to comments and review requests from maintainers

Coding Standards

  • Follow the project's style conventions (e.g. PEP8 for Python)
  • If you modify existing functionality, make sure everything continues to work correctly
  • Update documentation, if necessary

Testing

  • If possible, add tests that cover new functionality or fixes
  • Make sure all existing tests continue to pass

Discussion

  • For questions or proposals, open a discussion in the Issues section

Notes on Improvements

Applied Structural Corrections:

  • Updated the file structure to reflect the actual project layout
  • Corrected the test commands to use files that actually exist
  • Updated module and component counts
  • Improved documentation of the data and output directories

Observations:

  • The file ouput.xlsx in /output/ has a misspelled name (the intended name is output.xlsx)
  • Tests could be extended to cover more use cases
  • Documentation can be enriched with practical examples

License

This project is distributed under the Apache 2.0 License.

Authors

The project was developed by:
