User Management Service

Runtime: Rust

AWS Services Used: API Gateway, Lambda, SQS, DynamoDB, EventBridge

The user management service manages everything related to user accounts. It allows users to register and log in, generating a JWT that other services use for authentication. It also tracks the number of orders a user has placed. It is made up of 2 independent services:

  1. The Api service provides API endpoints to register new users, log in, and retrieve details about a given user
  2. The BackgroundWorker service is an anti-corruption layer that consumes events published by external services, translates them into internal events, and processes them

The Rust implementation uses OpenTelemetry for all of the tracing, which means trace propagation through SNS/SQS/EventBridge needs to be done manually, from both a producer and a consumer perspective. All of the logic to propagate traces is held in a shared observability crate, centered on the TracedMessage struct.

Messages are published using the TracedMessage struct as a wrapper, to ensure trace and span IDs are consistently sent. On the consumer side, the From trait is used to transform the Lambda event structs (SnsEvent, SqsEvent, etc.) back into a TracedMessage.

// Deserialize the SNS payload back into the shared TracedMessage wrapper
let traced_message: TracedMessage = serde_json::from_str(value.sns.message.as_str()).unwrap();

// Rebuild the upstream trace and span IDs from the propagated hex strings
let trace_id = TraceId::from_hex(traced_message.trace_id.as_str()).unwrap();
let span_id = SpanId::from_hex(traced_message.span_id.as_str()).unwrap();

// Reconstruct the upstream span context
let span_context = SpanContext::new(
    trace_id,
    span_id,
    TraceFlags::SAMPLED,
    false,
    TraceState::NONE,
);

// Attach the reconstructed context as the parent of the current span
let inflight_ctx = Context::new().with_remote_span_context(span_context.clone());
tracing::Span::current().set_parent(inflight_ctx.clone());
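As a sketch of the wrapper pattern described above, the snippet below models a simplified TracedMessage using only the standard library. The field names and the hand-rolled JSON are illustrative assumptions; the real struct lives in the shared observability crate and uses serde derive together with OpenTelemetry's TraceId/SpanId types.

```rust
// Illustrative sketch only: field names are assumptions, and the real
// implementation uses serde rather than hand-built JSON.
struct TracedMessage {
    trace_id: String,
    span_id: String,
    message: String,
}

impl TracedMessage {
    // Producer side: wrap the payload together with the current trace context.
    fn new(trace_id: &str, span_id: &str, message: &str) -> Self {
        Self {
            trace_id: trace_id.to_string(),
            span_id: span_id.to_string(),
            message: message.to_string(),
        }
    }

    // Basic sanity check mirroring TraceId::from_hex / SpanId::from_hex:
    // a W3C trace id is 32 hex characters, a span id is 16.
    fn has_valid_context(&self) -> bool {
        self.trace_id.len() == 32
            && self.span_id.len() == 16
            && self.trace_id.chars().all(|c| c.is_ascii_hexdigit())
            && self.span_id.chars().all(|c| c.is_ascii_hexdigit())
    }
}
```

Because the wrapper always travels with the payload, every consumer can rebuild the upstream context without depending on transport-specific propagation headers.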

This README contains relevant instructions for deploying the sample application with each of the available IaC tools, as well as any Rust-specific implementation details when instrumenting with Datadog.

Observability for Asynchronous Systems

Span Links

The default behavior of the Datadog tracer when working with serverless is to automatically create parent-child relationships. For example, if you consume a message from Amazon SNS and that message contains the _datadog trace context, the context is automatically extracted and your Lambda handler is set as a child of the upstream call.

This is useful in some cases, but can cause confusion by creating traces that are extremely long or have hundreds of spans underneath them. Span Links are an alternative approach that links together causally related spans that you don't necessarily want to include in a parent-child relationship. This can be useful when events cross service boundaries, or when you're processing a batch of messages.

To configure Span Links, you can see an example in event_bridge.rs on line 40. The trace and span IDs are parsed from the inbound event and then used to create a link to the upstream context.

// Collect links to causally related upstream spans
let mut span_links = vec![];

if let Some(span_link) = cloud_event.generate_span_link() {
    span_links.push(span_link);
}

let mut span: global::BoxedSpan = tracer
    .span_builder(
        record
            .source
            .clone()
            .unwrap_or("aws.eventbridge".to_string()),
    )
    .with_kind(SpanKind::Internal)
    .with_start_time(record.time)
    .with_end_time(end_time)
    .with_links(span_links)
    .start_with_context(&tracer, &current_span);

if let Some(remote_ctx) = cloud_event.remote_span_context.clone() {
    span.add_link(remote_ctx, vec![]);
}

Semantic Conventions

The Open Telemetry Semantic Conventions for Messaging Spans define a set of best practices that all spans related to messaging should follow.

You can see examples of this in the handler.rs handler on line 26, both for starting a span and for adding the default attributes.
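As an illustration of the convention, the std-only sketch below builds the core messaging attributes for a consumed message. The attribute keys follow the OpenTelemetry messaging semantic conventions; the helper function itself is hypothetical and not part of the sample's real code.

```rust
// Hypothetical helper: returns the core OpenTelemetry messaging attributes
// for a consumed message as key/value pairs. The keys follow the messaging
// semantic conventions; the function is illustrative only.
fn messaging_attributes(system: &str, destination: &str, operation: &str) -> Vec<(String, String)> {
    vec![
        // The messaging system in use, e.g. "aws_sns" or "aws_sqs"
        ("messaging.system".to_string(), system.to_string()),
        // The topic or queue the message was received from
        ("messaging.destination.name".to_string(), destination.to_string()),
        // The operation being performed, e.g. "publish" or "process"
        ("messaging.operation".to_string(), operation.to_string()),
    ]
}
```

Applying a consistent attribute set like this is what lets downstream tooling group and filter messaging spans across services.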

AWS CDK

The AWS CDK does not support Rust. The CDK implementation uses NodeJS for the IaC and Rust for the application code.

The Datadog CDK Construct simplifies the setup when instrumenting with Datadog. To get started:

npm i --save-dev datadog-cdk-constructs-v2

Once installed, you can use the Construct to configure all of your Datadog settings, and then use the addLambdaFunctions function to instrument your Lambda functions.

const datadogConfiguration = new Datadog(this, "Datadog", {
  nodeLayerVersion: 130,
  extensionLayerVersion: 90,
  site: process.env.DD_SITE,
  apiKeySecret: ddApiKey,
  service,
  version,
  env,
  enableColdStartTracing: true,
  enableDatadogTracing: true,
  captureLambdaPayload: true,
});

This CDK implementation uses a custom InstrumentedFunction L3 construct to ensure all Lambda functions are instrumented correctly and consistently. This uses the RustFunction construct from the cargo-lambda-cdk package.

Deploy

To simplify deployment, all of the different microservices are managed in the same CDK project. This is not recommended in real applications, but simplifies the deployment for demonstration purposes.

Each microservice is implemented as a separate CloudFormation Stack, and there are no direct dependencies between stacks. Each stack stores relevant resource ARNs (SNS Topic ARN etc.) in SSM Parameter Store, and the other stacks dynamically load the ARNs:

const productCreatedTopicArn = StringParameter.fromStringParameterName(
  this,
  "ProductCreatedTopicArn",
  "/node/product/product-created-topic"
);
const productCreatedTopic = Topic.fromTopicArn(
  this,
  "ProductCreatedTopic",
  productCreatedTopicArn.stringValue
);

The Datadog extension retrieves your Datadog API key from a Secrets Manager secret. For this to work, ensure you create a secret in your account containing your API key and set the DD_API_KEY_SECRET_ARN environment variable before deployment.

To deploy all stacks and resources, run:

export DD_API_KEY_SECRET_ARN=<YOUR SECRET ARN>
export DD_SITE=<YOUR PREFERRED DATADOG SITE>
cdk deploy --all --require-approval never

If you wish to deploy individual stacks, you can do that by running the respective command below:

cdk deploy RustSharedStack --require-approval never
cdk deploy RustProductApiStack --require-approval never
cdk deploy RustProductPublicEventPublisherStack --require-approval never
cdk deploy RustProductPricingServiceStack --require-approval never

Cleanup

To clean up resources, run:

cdk destroy --all

AWS SAM

The AWS SAM template manually adds the Datadog Lambda Extension and sets the required environment variables.

Globals:
  Function:
    Runtime: provided.al2
    Timeout: 29
    MemorySize: 512
    Layers:
      - !Sub arn:aws:lambda:${AWS::Region}:464622532012:layer:Datadog-Extension:66
    Environment:
      Variables:
        ENV: !Ref Env
        DD_ENV: !Ref Env
        DD_API_KEY_SECRET_ARN: !Ref DDApiKeySecretArn
        DD_SITE: !Ref DDSite
        DD_VERSION: !Ref CommitHash
        DD_EXTENSION_VERSION: "next"
        DD_SERVICE: !Ref ServiceName

Ensure you have set the below environment variables before starting deployment:

  • DD_API_KEY_SECRET_ARN: The Secrets Manager Secret ARN holding your Datadog API Key
  • DD_SITE: The Datadog Site to use
  • AWS_REGION: The AWS region you want to deploy to

Deploy

The template.yaml file contains an example of using a nested stack to deploy all 6 backend services in a single command. This is not recommended for production use cases, where independent deployments are preferred; for the purposes of this demonstration, a single template makes test deployments easier.

sam build
sam deploy --stack-name RustTracing --parameter-overrides ParameterKey=DDApiKeySecretArn,ParameterValue="$DD_API_KEY_SECRET_ARN" ParameterKey=DDSite,ParameterValue="$DD_SITE" --resolve-s3 --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND --region $AWS_REGION

Cleanup

Use the command below to clean up resources deployed with AWS SAM.

sam delete --stack-name RustTracing --region $AWS_REGION

Terraform

Terraform does not natively support compiling Rust. Before you deploy, you first need to compile and ZIP up the compiled code. The deploy.sh script performs this action, iterating over all of the Cargo.toml files, running cargo lambda build, and then zipping up all folders in the output folder.

Configuration

A custom lambda_function module is used to group together all the functionality for deploying Lambda functions. This handles the creation of the CloudWatch Log Groups, and default IAM roles.

The Datadog Lambda Terraform module is used to create and configure the Lambda function with the required extensions, layers and configurations.

IMPORTANT! If you are using AWS Secrets Manager to hold your Datadog API key, ensure your Lambda function has permissions to call the secretsmanager:GetSecretValue IAM action.

module "aws_lambda_function" {
  source  = "DataDog/lambda-datadog/aws"
  version = "1.4.0"

  filename                 = var.zip_file
  function_name            = var.function_name
  role                     = aws_iam_role.lambda_function_role.arn
  handler                  = var.lambda_handler
  runtime                  = "provided.al2023"
  memory_size              = 128
  logging_config_log_group = aws_cloudwatch_log_group.lambda_log_group.name
  source_code_hash         = filebase64sha256(var.zip_file)
  timeout                  = 29
  environment_variables = merge(tomap({
    "DD_API_KEY_SECRET_ARN" : var.dd_api_key_secret_arn
    "DD_EXTENSION_VERSION": "next"
    "DD_CAPTURE_LAMBDA_PAYLOAD": "true",
    "DD_ENV" : var.env
    "DD_SERVICE" : var.service_name
    "DD_SITE" : var.dd_site
    "DD_VERSION" : var.app_version
    "ENV": var.env
    "RUST_LOG": "info",
    "POWERTOOLS_SERVICE_NAME": var.service_name
    "POWERTOOLS_LOG_LEVEL": "INFO" }),
    var.environment_variables
  )

  datadog_extension_layer_version = 90
}

Deploy

To deploy, first create a file named infra/dev.tfvars. In your tfvars file, add the AWS Secrets Manager ARN for the secret containing your Datadog API Key.

dd_api_key_secret_arn="<DD_API_KEY_SECRET_ARN>"
dd_site="<YOUR PREFERRED DATADOG SITE>"

There's a single main.tf that contains all 7 backend services as modules. This is not recommended in production, where you should deploy backend services independently. However, a single file simplifies this demo deployment.

The root of the repository contains a Makefile that compiles all Rust code, generates the ZIP files, and runs terraform apply.

You can optionally provide an S3 backend to use as your state store. To do this, set the environment variables below and run terraform init. To deploy the Terraform example, run:

export AWS_REGION=<YOUR PREFERRED AWS_REGION>
export TF_STATE_BUCKET_NAME=<THE NAME OF THE S3 BUCKET>
export ENV=<ENVIRONMENT NAME>
make tf-rust-local

Alternatively, comment out the S3 backend section in providers.tf.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.61"
    }
  }
#  backend "s3" {}
}

provider "aws" {
  region = var.region
}

Cleanup

To clean up all Terraform resources, run:

cd infra
terraform destroy --var-file dev.tfvars

Serverless Framework

Datadog provides a plugin to simplify configuration of your serverless applications when using the Serverless Framework. Inside your serverless.yml, add a custom.datadog block. The available configuration options are described in the documentation.

IMPORTANT: Ensure you add the secretsmanager:GetSecretValue permission for the Secrets Manager secret holding your Datadog API key.

custom:
  datadog:
    apiKeySecretArn: ${param:DD_API_KEY_SECRET_ARN}
    site: ${param:DD_SITE}
    env: ${sls:stage}
    service: ${self:custom.serviceName}
    version: latest
    # Use this property with care in production to ensure PII/Sensitive data is not stored in Datadog
    captureLambdaPayload: true
    propagateUpstreamTrace: true

Deploy

Ensure you have set the below environment variables before starting deployment:

  • DD_API_KEY_SECRET_ARN: The Secrets Manager Secret ARN holding your Datadog API Key
  • DD_SITE: The Datadog site to use
  • AWS_REGION: The AWS region you want to deploy to

Once set, use the below commands to deploy each of the individual backend services one by one.

serverless deploy --stage dev --region=${AWS_REGION} --config serverless-shared.yml &&
serverless deploy --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-api.yml &&
serverless deploy --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-pricing-service.yml &&
serverless deploy --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-api-worker.yml &&
serverless deploy --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-product-event-publisher.yml &&
serverless deploy --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-inventory-acl.yml &&
serverless deploy --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-inventory-ordering-service.yml &&
serverless deploy --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-analytics-service.yml

Cleanup

The same commands can be used to clean up all resources, replacing deploy with remove.

serverless remove --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-analytics-service.yml &&
serverless remove --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-inventory-ordering-service.yml &&
serverless remove --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-inventory-acl.yml &&
serverless remove --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-product-event-publisher.yml &&
serverless remove --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-api-worker.yml &&
serverless remove --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-pricing-service.yml &&
serverless remove --param="DD_API_KEY_SECRET_ARN=${DD_API_KEY_SECRET_ARN}" --param="DD_SITE=${DD_SITE}" --stage dev --region=${AWS_REGION} --config serverless-api.yml &&
serverless remove --stage dev --region=${AWS_REGION} --config serverless-shared.yml