An AWS SAM template that automates the AWS Forecast process with an AWS Step Functions state machine. It is based on a real case study: forecasting new daily positive cases from the Italian COVID-19 datasets.
This AWS SAM template runs in my AWS Account and pushes the forecast daily to this repository: https://github.com/heyteacher/COVID-19 (folder dati_json_forecast)
Furthermore, datasets and forecasts are visualized in this chart dashboard: https://heyteacher.github.io/COVID-19. It is an Angular 9 project hosted in this repository: https://github.com/heyteacher/ng-covid-19-ita-charts
This AWS SAM template is general purpose. It can be adapted to other forecasts based on AWS Forecast by removing or replacing the tasks specific to this use case.
It's difficult to automate the AWS Forecast process because:
- AWS Forecast tasks are long-running processes, and a task cannot start until the previous step has successfully finished.
- AWS Forecast doesn't implement push notifications (for example via AWS SNS) to signal the end of a task, so it isn't possible to build an event-driven flow of AWS Forecast tasks. The only option is to poll the entity status after creation to find out whether it was successfully created.
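Because polling is the only option, each check step ends up following the pattern below. This is a minimal sketch under stated assumptions, not the project's code. `describeStatus` is a hypothetical async callback that stands in for a real AWS Forecast describe call (for example DescribeDatasetImportJob, whose response carries a `Status` field). `classifyStatus` and `waitUntilActive` are illustrative names.

```javascript
// Minimal polling sketch. describeStatus is a HYPOTHETICAL async callback
// returning the entity Status string (e.g. taken from a Forecast describe call).

// Map a single Forecast status value to the next action.
function classifyStatus(status) {
  if (status === "ACTIVE") return "DONE";
  if (status.endsWith("_FAILED")) return "FAIL"; // e.g. CREATE_FAILED
  return "WAIT"; // e.g. CREATE_PENDING, CREATE_IN_PROGRESS
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the entity is ACTIVE, fails, or attempts run out.
// Resolves with the number of attempts that were needed.
async function waitUntilActive(describeStatus, { maxAttempts = 10, delayMs = 60000 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await describeStatus();
    const action = classifyStatus(status);
    if (action === "DONE") return attempt;
    if (action === "FAIL") throw new Error(`Entity creation failed with status ${status}`);
    if (attempt < maxAttempts) await sleep(delayMs);
  }
  throw new Error(`Entity not ACTIVE after ${maxAttempts} attempts`);
}

module.exports = { classifyStatus, waitUntilActive };
```

Inside a single Lambda, a loop like this is bounded by the Lambda timeout. That is one reason to move the waiting into Step Functions Wait and Retry states instead, which is what this template does.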
Why automate AWS Forecast tasks using AWS Step Functions?
AWS Step Functions is a serverless state machine. It orchestrates the AWS Lambda functions that implement the AWS Forecast API calls managing AWS Forecast entities, and it supports Retry, Fallback and other flow controls.
Only the first state machine execution creates the persistent entities Dataset, Dataset Group and Predictor. The following daily executions update the forecast by creating a Dataset Import Job, a Forecast and an Export Job.
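The split between the first execution and the daily ones can be sketched as a pure function. This is a simplified, hypothetical illustration: `planSteps` and its input flags are not part of the template. It lists only the entity-related steps and leaves out the wait states and case-study tasks.

```javascript
// HYPOTHETICAL sketch: which entity-related steps run in one execution.
// datasetExists / predictorExists mirror the CheckDatasetExist and
// CheckPredictorExists states; waits and case-study tasks are omitted.
function planSteps({ datasetExists, predictorExists }) {
  const steps = [];
  if (!datasetExists) steps.push("CreateDataset"); // Dataset + Dataset Group, first run only
  steps.push("CreateDatasetImportJob"); // every run: load the new daily data
  if (!predictorExists) steps.push("CreatePredictor"); // first run only
  steps.push(
    "CreateForecast",
    "CreateForecastExportJob",
    "DeleteDatasetImportExportJob",
    "DeleteForecast"
  );
  return steps;
}

module.exports = { planSteps };
```

On the first run (`{ datasetExists: false, predictorExists: false }`) this yields seven steps. On a daily run (both flags `true`) it yields five, starting with CreateDatasetImportJob.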
The AWS Step Functions state machine is launched by an AWS CloudWatch Event Rule, according to the schedule expression defined in the StateMachineEventRuleScheduleExpression parameter. However, the forecast is generated only on the days of the week defined in the ForecastDaysOfWeekExecution parameter.
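The day-of-week gate can be sketched as below. The exact format of ForecastDaysOfWeekExecution is not stated here. This sketch assumes a comma-separated list of three-letter day names such as `MON,THU`. It also assumes the value reaches the Lambda through a hypothetical `FORECAST_DAYS_OF_WEEK` environment variable.

```javascript
// HYPOTHETICAL sketch of the CheckDaysOfWeekForecastExec logic.
// ASSUMPTION: ForecastDaysOfWeekExecution is a comma-separated list of
// three-letter day names such as "MON,THU"; the real template may differ.
const DAY_NAMES = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"];

function isToExecuteForecast(daysParam, now = new Date()) {
  const allowed = daysParam
    .split(",")
    .map((d) => d.trim().toUpperCase())
    .filter((d) => d.length > 0);
  // Lambda's default time zone is UTC, so compare against the UTC weekday.
  return allowed.includes(DAY_NAMES[now.getUTCDay()]);
}

// Handler shape: pass the input through and add the flag that the
// ChoiceForecastExecution state reads. The env variable name is ASSUMED.
async function handler(event) {
  return {
    ...event,
    isToExecuteForecast: isToExecuteForecast(process.env.FORECAST_DAYS_OF_WEEK || ""),
  };
}

module.exports = { isToExecuteForecast, handler };
```

Returning the original input plus the flag keeps the rest of the execution state available to the later steps.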
Below is the daily flow of the AWS Step Functions steps:
- Extend Dataset is a task specific to this case study, and you can drop it. It downloads the daily official dataset, extends it and pushes it to the configured GitHub repository. It retries until a new dataset is pushed to the official repository.
- CheckDaysOfWeekForecastExec is a simple inline lambda. It sets isToExecuteForecast = true if today's day of the week is in the ForecastDaysOfWeekExecution parameter.
- ChoiceForecastExecution is a choice on isToExecuteForecast. If true, it generates the forecast; otherwise it goes to the Done task and exits.
- CheckDatasetExist is the first state of the forecast generation. It checks whether the Dataset (and Dataset Group) exists.
  - If it doesn't exist, this is the first execution, and CreateDataset creates the Dataset and Dataset Group.
- WaitGithubRawRefresh is another task specific to this case study, and it can be dropped. It waits a few minutes to be sure the GitHub raw cache has been refreshed after the push.
- CreateDatasetImportJob downloads the dataset (the new daily COVID-19 time series) from the configured GitHub repository. It transforms the data to match the Forecast dataset structure, uploads it to the S3 Input Bucket and creates the daily Dataset Import Job.
- CheckPredictorExists checks whether the Predictor exists.
  - If the Predictor doesn't exist (meaning this is the first execution), it runs CreatePredictor, which creates the Predictor. The Predictor can be created only if at least one Dataset Import Job has been loaded. Then it runs WaitPredictorCreation, which waits 50 minutes to be sure the Predictor has been created.
  - Otherwise, it runs WaitDatasetImport, which sleeps 5 minutes.
- CreateForecast creates the daily Forecast from the Predictor, which is updated by the daily Dataset Import Job.
- WaitForecastCreation sleeps 15 minutes to be sure the Forecast creation has finished.
- CreateForecastExportJob exports the daily forecast to the S3 Output Bucket. The upload wakes up PushForecastInGithubFunction, which downloads the forecast and pushes it to the configured GitHub repository (this AWS Lambda is specific to the case study).
- WaitExportJob sleeps 3 minutes to be sure the export has finished.
- DeleteDatasetImportExportJob deletes the daily Dataset Import Job and the daily Export Job.
- WaitDeleteDatasetImportExportJob sleeps 5 minutes to be sure the deletion has finished.
- DeleteForecast deletes the daily Forecast.
- Done is the end state of the workflow.
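The transformation done by CreateDatasetImportJob can be sketched as follows. The input shape and the target schema are assumptions: input records carry a `data` timestamp and a `nuovi_positivi` (new positives) count. The Forecast dataset schema is assumed to declare `item_id`, `timestamp`, `target_value` in that order, with daily frequency. `toForecastCsv` is a hypothetical name, not the project's function.

```javascript
// HYPOTHETICAL sketch of the CreateDatasetImportJob transformation.
// ASSUMPTIONS: input records look like { data: "2020-03-01T17:00:00",
// nuovi_positivi: 100 } and the Forecast dataset schema declares the
// columns item_id, timestamp, target_value in this order (daily frequency).
function toForecastCsv(records, itemId) {
  return records
    // skip rows without a date or without a numeric new-cases value
    .filter((r) => r.data && r.nuovi_positivi != null && Number.isFinite(Number(r.nuovi_positivi)))
    .map((r) => {
      const timestamp = r.data.slice(0, 10); // keep yyyy-MM-dd for a daily series
      return `${itemId},${timestamp},${Number(r.nuovi_positivi)}`;
    })
    .join("\n");
}

module.exports = { toForecastCsv };
```

The resulting CSV string would then be uploaded to the S3 Input Bucket before the Dataset Import Job is created. That upload is not shown here.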
Some tasks retry after a failure, so that they wait until the previous step has successfully finished.
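In Step Functions, retries like these are declared with a `Retry` field on a Task state, using `IntervalSeconds`, `MaxAttempts` and `BackoffRate`. The helper below only computes the wait schedule those three fields imply, which helps when sizing retries around long-running Forecast tasks. The values actually configured in this template are not reproduced here. The numbers in the usage note are only an example, and the optional delay cap and jitter settings are ignored.

```javascript
// Delays (in seconds) Step Functions waits before each retry, given the
// Retry fields IntervalSeconds, MaxAttempts and BackoffRate.
// Defaults follow the Amazon States Language: 1 s, 3 attempts, rate 2.0.
// Optional delay caps and jitter are NOT modelled.
function retryDelays({ IntervalSeconds = 1, MaxAttempts = 3, BackoffRate = 2.0 } = {}) {
  const delays = [];
  for (let n = 0; n < MaxAttempts; n++) {
    // the n-th retry (0-based) waits IntervalSeconds * BackoffRate^n
    delays.push(IntervalSeconds * Math.pow(BackoffRate, n));
  }
  return delays;
}

// Total waiting time if every retry is used.
function totalRetryWait(retrier) {
  return retryDelays(retrier).reduce((sum, d) => sum + d, 0);
}

module.exports = { retryDelays, totalRetryWait };
```

For example, `{ IntervalSeconds: 60, MaxAttempts: 3, BackoffRate: 2 }` gives delays of 60, 120 and 240 seconds. A flat `{ IntervalSeconds: 300, MaxAttempts: 4, BackoffRate: 1 }` waits 1200 seconds (20 minutes) in total.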
The AWS SAM template gives each AWS Lambda function the minimum permissions it needs to complete its task. All the entities (S3 Buckets, AWS Lambda Functions, IAM Roles, AWS Step Functions state machine, Event Rule) are created, updated and deleted by the AWS SAM template stack, so no manual activity is needed.
- This project is inspired by https://github.com/aws-samples/amazon-automated-forecast
- BE CAREFUL if you try to create a stack from this SAM template. The first execution costs about 4.00 EUR, and each following daily execution costs about 1.00 EUR.
- I already run a stack in my AWS Account, which produces the forecast here: https://github.com/heyteacher/COVID-19. So you can support this project by making a donation.
- If you decide to delete the AWS SAM template stack, you must delete some AWS Forecast entities manually: the Predictor, the first Dataset Import Job, the Dataset and the Dataset Group.
- All AWS Lambda functions are implemented in NodeJs 12.X.
- AWS Forecast doesn't provide an epidemiological forecasting scenario for a series like the Italian COVID-19 new cases, so the algorithm is chosen with PerformAutoML=True. I'm not an expert, so help with tuning the algorithm for this use case is appreciated: https://docs.aws.amazon.com/forecast/index.html
- I spent a lot of time improving the AWS SAM template, but I'm sure it could be better. So do not hesitate to submit an Issue or Pull Request.
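With PerformAutoML set to true, AWS Forecast picks the algorithm itself, so the CreatePredictor request carries no AlgorithmArn. The sketch below builds such a request object. The predictor name, the 14-day horizon and the builder function are assumptions for illustration, not the values or code used by this template. The parameter names (PredictorName, ForecastHorizon, PerformAutoML, InputDataConfig.DatasetGroupArn, FeaturizationConfig.ForecastFrequency) are those of the AWS Forecast CreatePredictor API.

```javascript
// HYPOTHETICAL builder for the CreatePredictor request parameters.
// With PerformAutoML: true, AWS Forecast chooses the algorithm, so
// AlgorithmArn is omitted. Name and horizon defaults are ASSUMPTIONS.
function buildCreatePredictorParams(datasetGroupArn, { name = "covid19_ita_predictor", horizonDays = 14 } = {}) {
  if (!datasetGroupArn) throw new Error("datasetGroupArn is required");
  return {
    PredictorName: name,
    ForecastHorizon: horizonDays, // number of future time steps (days for a daily series)
    PerformAutoML: true,
    InputDataConfig: { DatasetGroupArn: datasetGroupArn },
    FeaturizationConfig: { ForecastFrequency: "D" }, // daily frequency
  };
}

// In a NodeJs Lambda using aws-sdk v2 the object could be sent with:
//   new AWS.ForecastService().createPredictor(params).promise()
module.exports = { buildCreatePredictorParams };
```

Keeping the request in a small pure builder makes it easy to try different horizons or, later, to replace AutoML with an explicit algorithm while tuning.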
- install nodejs, aws-cli, aws-sam-cli and docker
- generate an aws_access_key_id and aws_secret_access_key for an AWS user with the permissions to create/update/delete CloudFormation stacks
- create the GitHub repository <GITHUB_REPO> in your account <GITHUB_USER>
- generate a <GITHUB_TOKEN> at https://github.com/settings/tokens with scope repo
- to test lambda functions locally (for example ExtendDataFunction):

      sam local invoke ExtendDataFunction \
        --parameter-overrides GitHubToken=<GITHUB_TOKEN> GitHubRepo=<GITHUB_REPO> GitHubUser=<GITHUB_USER>

You can customize the bash scripts sam_local_invoke.sh.template and sam_local_invoke_push_github.sh to run lambda functions locally.
You can customize the bash script deploy_stack.sh.template to automate the stack deployment (package and deploy steps).
- delete old stack

      aws cloudformation delete-stack --stack-name forecast-automation-covid-19-ita

- package

      aws cloudformation package --template-file template.yaml \
        --output-template-file packaged.yaml \
        --s3-bucket <SAM_TEMPLATE_BUCKET>

- deploy

      aws cloudformation deploy --template-file packaged.yaml \
        --stack-name forecast-automation-covid-19-ita \
        --capabilities CAPABILITY_IAM \
        --parameter-overrides GitHubToken=<GITHUB_TOKEN> GitHubRepo=<GITHUB_REPO> GitHubUser=<GITHUB_USER>

- show stack events

      aws cloudformation describe-stack-events --stack-name forecast-automation-covid-19-ita

- tail lambda logs (for example ExtendDataFunction)

      sam logs -n ExtendDataFunction --stack-name forecast-automation-covid-19-ita --tail
