Day-Ahead Electricity Price Forecasting (Bergen)

I built this project to test myself on data engineering fundamentals combined with a small, realistic forecasting problem.

The goal was not to build a highly complex machine learning model, but to understand how a real-world pipeline would look end to end: API ingestion → cleaning and structuring data → running a forecasting model → preparing outputs for a simple dashboard.

Electricity prices turned out to be a good fit for this purpose. The data is real, time-dependent, publicly available, and naturally suited for incremental ingestion. Combined with weather data, it allowed me to explore both engineering and modeling decisions without artificial assumptions.

Data source

Hourly electricity spot prices are retrieved from the publicly available and generously maintained API provided by
https://www.hvakosterstrommen.no

The data ultimately originates from the ENTSO-E Transparency Platform:
https://transparency.entsoe.eu

The openness and reliability of these sources make them well suited for experimentation, validation, and future deployment in a production-like setting.

Weather data (air temperature) is included as an additional explanatory variable to capture demand-side effects.

Background and related literature

For a more academic perspective on electricity price forecasting, the following papers provide useful context and motivation:

Jaehnert, S., Farahmand, H., & Doorman, G. L. (2009). Modelling of Prices Using the Volume in the Norwegian Regulating Power Market. NTNU, IEEE-style conference paper.
Jensen, M. W., Ren, H., & Shalaginov, A. (2024). Day-ahead Electricity Price Forecasting of Elspot Markets in Norway. ENERGY 2024 (IARIA), Kristiania University College.
Mabilangan, G. (2025). Statistical Analysis of Electricity Prices in Southern Norway. Noroff School of Technology and Digital Media, March 2025.
Scott, O. H., & Mellander, W. (2025). Scenario-Based Long-Term Electricity Price Forecasting in NO4 (Norway) Using a Hybrid Machine Learning Model. Bachelor Thesis, KTH Royal Institute of Technology.

These works highlight the importance of autoregressive effects, calendar structure, and external drivers such as weather, hydropower availability, global fuel prices, and exchange rates, while also emphasizing the trade-offs between accuracy, stability, and interpretability.

Notes

This project is intentionally iterative. The current model is not presented as a final or optimal solution, but as a well-reasoned and reproducible baseline that can be extended as more data sources and operational constraints are introduced.

🔗 Live dashboard:
https://www.syedamjadali.no/portfolio.html

Project evolution

I started this project in a purely analytical way.

Initially, I worked with a static dataset covering electricity prices and weather data from January 2023 to January 2026. Working in Jupyter notebooks, I focused on:

Exploratory data analysis
Understanding seasonality and intraday patterns
Testing different feature choices
Comparing simple statistical models

After experimenting with several specifications, I intentionally settled on a simple but stable OLS-based model. The objective at this stage was not to optimize performance aggressively, but to establish a baseline that behaves sensibly and is easy to reason about.

Once the modeling side felt solid enough, I moved on to the main learning goal of the project: turning an analysis notebook into a production-style pipeline.

Pipeline overview

I migrated the project to Azure Databricks and broke the large notebook into smaller, purpose-specific notebooks aligned with a bronze–silver–gold (medallion) architecture.

What started as a linear notebook gradually became a non-linear DAG of dependent tasks, closer to what I would expect in a real data platform.

Bronze

One-time historical backfill from CSV
Incremental ingestion from electricity price and weather APIs
Append-only storage using Delta tables
Ingestion timestamps preserved for auditability
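
To make the append-only idea concrete, here is a minimal sketch in plain Python. It is not the project's notebook code: the in-memory list stands in for a Delta table, and the field names (`time_start`, `NOK_per_kWh`), the metadata columns (`_source`, `_ingested_at`), and the sample values are all assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def to_bronze_rows(records, source, ingested_at=None):
    """Attach ingestion metadata to raw records without altering their fields."""
    stamp = (ingested_at or datetime.now(timezone.utc)).isoformat()
    return [{**rec, "_source": source, "_ingested_at": stamp} for rec in records]


# Hypothetical in-memory stand-in for an append-only bronze table.
bronze_prices = []

# Illustrative payload; field names and values are assumptions, not real data.
batch = [
    {"time_start": "2024-01-15T00:00:00+01:00", "NOK_per_kWh": 0.91},
    {"time_start": "2024-01-15T01:00:00+01:00", "NOK_per_kWh": 0.88},
]

# Ingesting the same batch twice keeps both copies: bronze only ever appends.
first_run = datetime(2024, 1, 15, 13, 0, tzinfo=timezone.utc)
second_run = datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc)
bronze_prices.extend(to_bronze_rows(batch, "price_api", first_run))
bronze_prices.extend(to_bronze_rows(batch, "price_api", second_run))
```

Keeping duplicates at this layer is deliberate in such a design: the ingestion timestamp records when each copy arrived, and cleaning is deferred to the silver layer.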

Silver

Incrementally maintained, cleaned datasets
Deduplication and late-arriving data handling
Clear separation between price data and weather data
Joins performed only when required for modeling
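
Deduplication and late-arriving data can be handled with a "latest ingested version wins" rule. The sketch below shows that rule in plain Python under assumed column names (`time_start`, `price`, `_ingested_at`) and made-up values; it also assumes all timestamps are ISO strings in the same format, so that string comparison matches chronological order.

```python
def deduplicate_latest(rows, key="time_start", order_by="_ingested_at"):
    """Keep one row per key, preferring the most recently ingested version."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return sorted(latest.values(), key=lambda r: r[key])


# Illustrative bronze rows (assumed schema, invented values).
bronze_rows = [
    {"time_start": "2024-01-15T00:00", "price": 0.91, "_ingested_at": "2024-01-15T13:00"},
    {"time_start": "2024-01-15T01:00", "price": 0.88, "_ingested_at": "2024-01-15T13:00"},
    # Late-arriving correction for hour 00:00, ingested a day later.
    {"time_start": "2024-01-15T00:00", "price": 0.93, "_ingested_at": "2024-01-16T13:00"},
    # Exact duplicate of hour 01:00 from a repeated ingestion run.
    {"time_start": "2024-01-15T01:00", "price": 0.88, "_ingested_at": "2024-01-15T13:00"},
]

silver_prices = deduplicate_latest(bronze_rows)
```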

Gold

Model-ready training datasets
Stored and versioned model parameters
Day-ahead forecasts written as reproducible outputs

At first, I focused mainly on getting the jobs to run correctly. Once the pipeline was stable, I went back to improve:

Table partitioning
Incremental logic
Proper use of Delta Lake MERGE patterns
Avoiding unnecessary full rewrites

This iterative approach mirrors how I would expect a real system to evolve over time.

Architecture and design principles

Some of the principles I consciously tried to apply while building the pipeline:

Incremental processing instead of recomputing full history
Idempotent transformations using Delta Lake
One notebook per logical responsibility
Clear separation between ingestion, transformation, and modeling
Ability to inspect table history and understand how data evolved
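
The first two principles can be illustrated together: process only rows newer than a stored watermark, and write them with a keyed upsert so that re-running the step changes nothing. This is a plain-Python sketch of the semantics, not Delta Lake code; the dictionary stands in for a target table, and the function name, column names, and values are hypothetical.

```python
def incremental_upsert(target, source_rows, watermark, key="time_start", ts="_ingested_at"):
    """Upsert rows ingested after the watermark and return the new watermark."""
    new_rows = [r for r in source_rows if r[ts] > watermark]
    # Apply in ingestion order so the latest version of a key is written last.
    for row in sorted(new_rows, key=lambda r: r[ts]):
        target[row[key]] = row  # insert or overwrite by key, similar in spirit to a MERGE
    return max((r[ts] for r in new_rows), default=watermark)


# Illustrative source rows (assumed schema, invented values).
source = [
    {"time_start": "2024-01-15T00:00", "price": 0.91, "_ingested_at": "2024-01-15T13:00"},
    {"time_start": "2024-01-15T01:00", "price": 0.88, "_ingested_at": "2024-01-15T13:00"},
    {"time_start": "2024-01-15T00:00", "price": 0.93, "_ingested_at": "2024-01-16T13:00"},
]

target = {}
watermark = incremental_upsert(target, source, watermark="")
snapshot = dict(target)

# A second run with the stored watermark finds nothing new to process.
watermark_again = incremental_upsert(target, source, watermark)
```

Because writes are keyed, even a full replay from an empty watermark produces the same target, which is what makes a failed job safe to retry.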

The pipeline is intentionally kept readable rather than overly abstract.

Data sources

Electricity prices
Hourly spot prices retrieved from:
https://www.hvakosterstrommen.no

Underlying source:
ENTSO-E Transparency Platform
https://transparency.entsoe.eu

Weather data
Hourly air temperature data retrieved from the Norwegian Meteorological Institute (Frost API).

Both data sources are free, reliable, and well suited for incremental ingestion, which made them ideal for this project.
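
For incremental ingestion, a job only needs the days it has not fetched yet. The sketch below builds the request URLs for those days. It assumes the price API serves one JSON file per day and price area under a path of the form `/api/v1/prices/{year}/{month}-{day}_{area}.json`, and that Bergen belongs to price area NO5; both assumptions should be checked against the API's own documentation. No network call is made here.

```python
from datetime import date, timedelta

# Assumed base path; verify against the API documentation before relying on it.
BASE_URL = "https://www.hvakosterstrommen.no/api/v1/prices"


def price_url(day, area="NO5"):
    """Build the URL for one day of hourly prices in one price area."""
    return f"{BASE_URL}/{day:%Y}/{day:%m-%d}_{area}.json"


def urls_to_fetch(last_ingested, today, area="NO5"):
    """URLs for every day after last_ingested, up to and including today."""
    n_days = (today - last_ingested).days
    return [price_url(last_ingested + timedelta(days=i), area) for i in range(1, n_days + 1)]


pending = urls_to_fetch(date(2024, 1, 13), date(2024, 1, 15))
```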

Modeling objective

The modeling task is deliberately constrained:

Forecast the next 24 hours of electricity prices
Hourly resolution
Use only information that would realistically be available at prediction time
Keep the model transparent and explainable

This constraint influenced both feature design and pipeline structure.
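
One practical consequence of the "available at prediction time" rule is that short price lags cannot be used for every hour of a 24-hour horizon. The following sketch checks which candidate lags are usable; the function names, the candidate lag set, and the example timestamps are illustrative assumptions rather than the project's actual feature logic.

```python
from datetime import datetime, timedelta


def usable_lags(target_hour, last_known_hour, candidate_lags=(1, 24, 48, 168)):
    """Lags (in hours) whose source hour is already observed for this target."""
    return [k for k in candidate_lags
            if target_hour - timedelta(hours=k) <= last_known_hour]


def lags_usable_for_horizon(first_target, last_known_hour, horizon=24,
                            candidate_lags=(1, 24, 48, 168)):
    """Lags that are usable for every hour in the forecast horizon."""
    usable = set(candidate_lags)
    for h in range(horizon):
        target = first_target + timedelta(hours=h)
        usable &= set(usable_lags(target, last_known_hour, candidate_lags))
    return sorted(usable)


# Assume prices are known through 23:00 today and tomorrow is being forecast.
last_known = datetime(2024, 1, 15, 23)
first_target = datetime(2024, 1, 16, 0)
horizon_lags = lags_usable_for_horizon(first_target, last_known)
```

In this setup the one-hour lag is only available for the first target hour, so a model that must cover the whole day is limited to lags of 24 hours or more.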

Modeling approach

The current model is intentionally simple:

Ordinary Least Squares (OLS)
Hour-of-day and weekday effects
Autoregressive structure using lagged prices
Weather effects via air temperature
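
A model of this shape can be written as price(t) = intercept + hour effect + weekday effect + phi * price(t-24) + beta * temperature(t). The sketch below fits it with NumPy's least-squares solver on synthetic, noise-free data, so the known coefficients should be recovered almost exactly. The specific lag (24 hours), the dummy encoding, and every number in the data generator are assumptions for illustration; they are not the project's fitted values.

```python
import numpy as np


def design_matrix(hours, weekdays, lag_price, temperature):
    """Columns: intercept, 23 hour dummies, 6 weekday dummies, lagged price, temperature."""
    hour_d = (hours[:, None] == np.arange(1, 24)).astype(float)    # hour 0 is the baseline
    wday_d = (weekdays[:, None] == np.arange(1, 7)).astype(float)  # weekday 0 is the baseline
    return np.column_stack([np.ones(len(hours)), hour_d, wday_d, lag_price, temperature])


def fit_ols(X, y):
    """Ordinary least squares via numpy.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic hourly data: eight weeks, generated from known coefficients without noise.
rng = np.random.default_rng(0)
n = 24 * 7 * 8
t = np.arange(n)
hours, weekdays = t % 24, (t // 24) % 7
temperature = rng.normal(5.0, 8.0, n)
hour_effect = rng.normal(0.0, 0.1, 24)
hour_effect[0] = 0.0
weekday_effect = rng.normal(0.0, 0.05, 7)
weekday_effect[0] = 0.0

price = np.empty(n)
price[:24] = rng.uniform(0.5, 1.5, 24)  # arbitrary starting day
for i in range(24, n):
    price[i] = (0.3 + hour_effect[hours[i]] + weekday_effect[weekdays[i]]
                + 0.6 * price[i - 24] - 0.02 * temperature[i])

# The first 24 hours have no lagged price, so they are dropped from the fit.
X = design_matrix(hours[24:], weekdays[24:], price[:-24], temperature[24:])
coef = fit_ols(X, price[24:])
lag_coef, temp_coef = coef[-2], coef[-1]
```

One hour and one weekday are dropped from the dummies to avoid perfect collinearity with the intercept.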

Model parameters are:

Trained on gold-layer datasets
Versioned monthly
Stored explicitly in Delta tables
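
As a hedged illustration of what monthly versioned parameters could look like, the sketch below stores coefficients as rows tagged with a `YYYY-MM` version label and picks the newest version that is not later than the forecast month. The row layout, the label format, the selection rule, and the coefficient values are all assumptions, not the project's actual schema.

```python
from datetime import date

# Hypothetical parameter rows; coefficient values are invented for illustration.
model_parameters = [
    {"model_version": "2025-01", "feature": "lag_24", "coefficient": 0.58},
    {"model_version": "2025-02", "feature": "lag_24", "coefficient": 0.61},
    {"model_version": "2025-03", "feature": "lag_24", "coefficient": 0.60},
]


def select_model_version(rows, forecast_day):
    """Return the newest version label that is not later than the forecast month."""
    forecast_month = f"{forecast_day:%Y-%m}"
    eligible = {r["model_version"] for r in rows if r["model_version"] <= forecast_month}
    if not eligible:
        raise LookupError(f"no model version available for {forecast_month}")
    return max(eligible)


def parameters_for(rows, forecast_day):
    """Map feature name to coefficient for the selected version."""
    version = select_model_version(rows, forecast_day)
    return {r["feature"]: r["coefficient"] for r in rows if r["model_version"] == version}


version = select_model_version(model_parameters, date(2025, 2, 17))
params = parameters_for(model_parameters, date(2025, 2, 17))
```

Storing parameters as plain rows keeps every past forecast reproducible, because the coefficients that produced it can be looked up by version.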

The model is not treated as the “final answer”. Instead, it acts as a baseline that the pipeline is built around.

Operational workflow

A typical execution flow looks like this:

Bronze ingestion jobs append new hourly price and weather data
Silver tables incrementally update trusted datasets
Silver joins produce feature tables for modeling
Gold training datasets are refreshed
Model parameters are retrained on a monthly cadence
Day-ahead forecasts are generated and stored

This structure is reflected in the non-linear Databricks Job graph.
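
The dependency structure can be sketched as a small graph and resolved into a valid run order with Python's standard `graphlib` module. The task names and edges below are assumptions inferred from the layer descriptions, not an export of the actual Databricks job definition.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (assumed edges).
dependencies = {
    "bronze_prices": set(),
    "bronze_weather": set(),
    "silver_prices": {"bronze_prices"},
    "silver_weather": {"bronze_weather"},
    "silver_joined": {"silver_prices", "silver_weather"},
    "silver_features": {"silver_joined"},
    "gold_training": {"silver_features"},
    "gold_model_train": {"gold_training"},
    "gold_forecast": {"silver_features", "gold_model_train"},
}

# static_order() yields tasks so that every dependency precedes its dependents.
run_order = list(TopologicalSorter(dependencies).static_order())
```

The two bronze branches have no dependency on each other, which is what makes the graph non-linear: they can run in parallel before the join.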

Results and evaluation

The model captures intraday price patterns reasonably well under normal conditions.
Errors are largest during sudden price spikes, which is expected given the limited information set and the linear model structure.

Predicted and realized prices are stored separately, making it possible to evaluate performance transparently and prepare the data for future dashboarding.
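
With forecasts and actuals stored separately, evaluation reduces to joining on the hour and computing error metrics. The sketch below uses plain dictionaries keyed by hour and invented prices, including one spike, to show how a single spike hour affects MAE and RMSE differently. The metric choice and all values are illustrative assumptions, not results from this project.

```python
from math import sqrt


def evaluate(forecasts, actuals):
    """MAE and RMSE over the hours present in both mappings."""
    common = sorted(forecasts.keys() & actuals.keys())
    if not common:
        raise ValueError("no overlapping hours to evaluate")
    errors = [forecasts[h] - actuals[h] for h in common]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return {"n": len(errors), "mae": mae, "rmse": rmse}


# Invented example: the third hour is a price spike the forecast misses.
forecasts = {"2024-01-16T00": 1.0, "2024-01-16T01": 1.2, "2024-01-16T02": 0.9, "2024-01-16T03": 1.1}
actuals = {"2024-01-16T00": 1.1, "2024-01-16T01": 1.0, "2024-01-16T02": 2.9}

metrics = evaluate(forecasts, actuals)
```

Hours without a realized price yet are skipped, so the same function can be re-run as actuals arrive. Because RMSE squares the errors, the single spike hour pulls it well above MAE.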

Project status and next steps

Planned improvements include:

Reworking the autoregressive structure (for example, testing t-168 weekly lags instead of weekday dummy variables)
Replacing weather imputation with true 24-hour weather forecast data
Updating the Databricks pipeline to support forward-looking feature generation
Separating forecast generation from historical backtesting so both can coexist cleanly

Once these changes are implemented, the system will be able to:

Generate a genuine day-ahead (24h) forecast using only information available at prediction time
Store forecasts independently from realized prices
Continuously evaluate forecast accuracy as actual prices become available
