sidsinghms/azm-observability
Observability-Driven Infrastructure Discovery for Azure Migration

Executive Summary

Modern enterprises face significant challenges when planning cloud migrations, with infrastructure discovery often being the most time-consuming and error-prone phase. This document outlines an innovative approach that leverages existing observability platforms as the primary source of truth for infrastructure inventory, dramatically reducing the friction traditionally associated with migration discovery.

The Challenge of Traditional Discovery

Traditional infrastructure discovery methods for cloud migration typically involve:

  • Agent-based scanning tools that must be deployed across the entire estate.
  • Credential provisioning and firewall changes to grant scanners access.
  • Manual spreadsheet collection that is error-prone and quickly outdated.

These approaches often take weeks or months and require significant coordination across teams.

The Observability Advantage

Why Observability Platforms Are Ideal for Migration Discovery

Observability platforms offer unique advantages for migration discovery:

  1. Already Deployed: Agents are already running on production infrastructure
  2. Real-time Accuracy: Data reflects actual running state, not theoretical configurations
  3. Rich Telemetry: Comprehensive performance, configuration, and dependency data
  4. Historical Context: Trending data enables right-sizing decisions
  5. Zero Additional Footprint: No new agents or scanners required
  6. API-First Architecture: Programmatic access to all data

New Relic as a Source of Truth - a POC

This proof of concept demonstrates how New Relic's Infrastructure monitoring capabilities can be harnessed to generate an accurate inventory of on-premises and cloud-hosted infrastructure for Azure migration planning.

New Relic exposes data-export APIs that allow extraction of comprehensive details about infrastructure, applications, and performance metrics needed for migration assessment. We built a Python script that takes the API key of a New Relic account and uses the NerdGraph API to fetch a detailed view of the inventory. The data is extracted using a combination of GraphQL queries to enumerate entities and NRQL queries to aggregate telemetry data.
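A sketch of the entity-enumeration step, assuming the standard NerdGraph endpoint (`https://api.newrelic.com/graphql`) and `API-Key` header. The `build_entity_search_payload` and `fetch_hosts` helpers are illustrative, not the actual script; exact field names should be checked against the current NerdGraph schema.

```python
# Enumerate INFRA/HOST entities via NerdGraph entitySearch, following
# the nextCursor token until all pages are consumed.
import json
import urllib.request

NERDGRAPH_URL = "https://api.newrelic.com/graphql"

def build_entity_search_payload(cursor=None):
    """Build a GraphQL payload that enumerates INFRA/HOST entities."""
    cursor_arg = f'(cursor: "{cursor}")' if cursor else ""
    query = f"""
    {{
      actor {{
        entitySearch(query: "domain = 'INFRA' AND type = 'HOST'") {{
          results{cursor_arg} {{
            nextCursor
            entities {{ guid name entityType tags {{ key values }} }}
          }}
        }}
      }}
    }}
    """
    return {"query": query}

def fetch_hosts(api_key):
    """Page through all host entities, following nextCursor."""
    hosts, cursor = [], None
    while True:
        req = urllib.request.Request(
            NERDGRAPH_URL,
            data=json.dumps(build_entity_search_payload(cursor)).encode(),
            headers={"Content-Type": "application/json", "API-Key": api_key},
        )
        with urllib.request.urlopen(req) as resp:
            results = json.load(resp)["data"]["actor"]["entitySearch"]["results"]
        hosts.extend(results["entities"])
        cursor = results.get("nextCursor")
        if not cursor:
            break
    return hosts
```

Cursor-based pagination matters here because `entitySearch` returns results in pages; a single query would silently truncate large estates.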

As part of this POC, we used the API contract and sample data from community forums, and used Copilot to generate representative data matching what the API would return. This dummy data takes the form of the CSV files that the script would generate when run against a New Relic account with Infrastructure agents deployed. This allowed us to validate the transformation logic without needing access to a live New Relic account.

The transformation logic is implemented in a separate script that reads the exported CSV files, maps the relevant fields to the Azure Migrate machine schema, and generates JSON files ready for import into Azure Migrate. We then used exporter.ps1 (from the azmc collector code) to load the inventory into an Azure Migrate project and generated assessment reports.
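A minimal sketch of that mapping step. The CSV column names (`hostname`, `operatingSystem`, `coreCount`, `memoryTotalBytes`, `tags`) and the payload field names are illustrative stand-ins; the real Azure Migrate machine schema has many more fields.

```python
# Map exported New Relic host rows to simplified Azure Migrate-style
# machine payloads and write them out as a JSON array.
import csv
import json

def map_host_row(row):
    """Map one CSV row (dict) to a machine payload dict."""
    return {
        "machineName": row["hostname"],
        "operatingSystem": row["operatingSystem"],
        "numberOfCores": int(row["coreCount"]),
        # Convert bytes to whole megabytes for the payload.
        "megabytesOfMemory": int(float(row["memoryTotalBytes"]) / (1024 * 1024)),
        # Tags are exported as a JSON string; default to empty.
        "tags": json.loads(row.get("tags", "{}")),
    }

def convert(csv_path, json_path):
    with open(csv_path, newline="") as f:
        machines = [map_host_row(r) for r in csv.DictReader(f)]
    with open(json_path, "w") as f:
        json.dump(machines, f, indent=2)
```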

Observations

  • As part of the POC, we were able to extract host machine data, configuration data, performance metrics, and the tags associated with each host.
  • Each host/application/container has a unique entity ID (GUID) in New Relic that can be used to correlate data across multiple datasets. We use this entity ID to generate a deterministic GUID, which serves as the BIOS GUID in the PUT machine payload for Azure Migrate.
  • The data extracted is comprehensive enough to generate a detailed inventory of machines suitable for a right-sized Azure migration assessment.
  • The entire process from data extraction to Azure Migrate import can be completed in under an hour for environments with hundreds of machines, a significant reduction from traditional methods.
  • The tags and metadata associated with each host in New Relic are preserved in the inventory, allowing for business context to be maintained in the migration planning process.
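One way to derive a stable BIOS GUID from a New Relic entity GUID is a name-based UUID (RFC 4122 version 5): it is deterministic, so re-running the export always produces the same GUID for the same host. The namespace value below is a hypothetical choice for illustration; any fixed UUID works, as long as it never changes between runs.

```python
# Derive a deterministic BIOS GUID from a New Relic entity GUID.
import uuid

# Hypothetical fixed namespace for this tool.
AZM_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "azm-observability")

def bios_guid_for(entity_guid: str) -> str:
    """Same entity GUID in, same BIOS GUID out, on every run."""
    return str(uuid.uuid5(AZM_NAMESPACE, entity_guid))
```

Determinism is what makes re-imports idempotent: a second export updates the same machines in Azure Migrate instead of creating duplicates.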

Extracted Data Points

System Information

  • Hostname and FQDN
  • Operating system type, version, and architecture
  • Kernel versions and distributions
  • BIOS/firmware information
  • System uptime and reliability metrics

Compute Resources

  • CPU cores and socket configuration
  • Processor utilization (average, peak, percentiles)
  • Memory allocation and usage patterns
  • Virtual vs. physical machine detection

Storage Configuration

  • Disk devices and mount points
  • Storage capacity and utilization
  • Filesystem types
  • I/O performance characteristics

Network Topology

  • Network interfaces and configurations
  • IP addressing schemes
  • Network throughput and utilization
  • Error rates and packet loss

Container and Kubernetes Insights

  • Container runtime detection
  • Container resource consumption
  • Kubernetes cluster, namespace, and pod inventory
  • Container image registry information

Business Context

  • Application tags and metadata
  • Environment classifications (prod, dev, test)
  • Team ownership information
  • Regional deployment data

Technical Components

The Python script leverages New Relic's NerdGraph API to extract:

  • Entity Discovery: Using GraphQL queries to enumerate all INFRA/HOST entities
  • NRQL Aggregation: Executing custom queries against telemetry data
  • Multi-Dataset Export: Parallel extraction of system, storage, network, and container metrics
  • Unified Inventory Generation: Combining multiple data sources into comprehensive host profiles
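The NRQL aggregation step can be sketched as a set of per-dataset query templates. `SystemSample`, `StorageSample`, and `NetworkSample` are standard New Relic Infrastructure event types; the attribute names used here should still be verified against the account's data dictionary.

```python
# One NRQL aggregation query per exported dataset, parameterized by
# the lookback window used for right-sizing.
NRQL_TEMPLATES = {
    "system": (
        "SELECT average(cpuPercent), max(cpuPercent), "
        "average(memoryUsedBytes) FROM SystemSample "
        "FACET hostname SINCE {since} ago"
    ),
    "storage": (
        "SELECT latest(diskTotalBytes), average(diskUsedPercent) "
        "FROM StorageSample FACET hostname, mountPoint SINCE {since} ago"
    ),
    "network": (
        "SELECT average(receiveBytesPerSecond), "
        "average(transmitBytesPerSecond) FROM NetworkSample "
        "FACET hostname, interfaceName SINCE {since} ago"
    ),
}

def build_queries(since="30 days"):
    """Render one NRQL query per dataset for a given lookback window."""
    return {name: q.format(since=since) for name, q in NRQL_TEMPLATES.items()}
```

Because each dataset is an independent query, the exports can run in parallel and be joined afterwards on hostname and entity GUID.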

Low-Friction Discovery Process

Step-by-Step Workflow

  1. Prerequisites (One-time setup)

    export NEW_RELIC_API_KEY="NRAK-..."
    export NEW_RELIC_ACCOUNT_ID="1234567"
  2. Data Extraction (Minutes, not weeks)

    python fetch_hosts.py --since "30d" --unified --format csv
  3. Transformation (Automated)

    python new_relic_csv_to_azmachine_payload.py --input ./new_relic/ --output ./out/az_machines
  4. Azure Import (Direct integration)

    • Generated JSON files ready for Azure Migrate import
    • Automatic creation of machine inventory in Azure
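The workflow above can be orchestrated with a small wrapper: check the required environment variables, then run the two scripts in sequence. This wrapper is a sketch, not part of the repository; the script names and flags match the commands shown above.

```python
# Run the extraction and transformation steps end to end,
# failing fast if the required credentials are not set.
import os
import subprocess
import sys

REQUIRED_ENV = ["NEW_RELIC_API_KEY", "NEW_RELIC_ACCOUNT_ID"]

def check_env(env=None):
    """Raise if any required environment variable is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_ENV if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")

def run_pipeline():
    check_env()
    subprocess.run(
        [sys.executable, "fetch_hosts.py", "--since", "30d",
         "--unified", "--format", "csv"],
        check=True,
    )
    subprocess.run(
        [sys.executable, "new_relic_csv_to_azmachine_payload.py",
         "--input", "./new_relic/", "--output", "./out/az_machines"],
        check=True,
    )
```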

Non-Goals for this POC

  • Application inventory extraction
  • Dependency mapping between services
  • Software inventory and web app extraction

Benefits and ROI

Pre-Sales Acceleration

  • Time Savings: Reduce discovery/assessment from weeks to hours.
  • Cost Reduction: Eliminate need for additional discovery tools.
  • Reduced Access Requirements: No need for elevated credentials or firewall changes.
  • Accuracy Improvement: Real-time data from production systems.
  • Privacy and Compliance: The exported payload can be reviewed for sensitive data before import.

Conclusion

The POC demonstrated that New Relic Infrastructure monitoring data can serve as an effective source of truth for Azure migration discovery. Pre-sales discussions benefit greatly from this approach, as it reduces the time and effort required to generate accurate migration inventories from weeks to under an hour.

  • This tool shows a promising approach for pre-sales activities: with tagged resources, applications, and business context metadata present in the inventory, it can serve as a quick first pass for generating assessment reports.
  • Software inventory and dependency mapping also come out of the box with such platforms, adding further dimensions to pre-sales conversations.
  • Extending this approach to other observability platforms such as Datadog, Dynatrace, Prometheus, and the Elastic Stack will require adaptation to each platform's API ecosystem. While AI cannot automate this entirely, it can significantly accelerate the development of connectors and transformation logic.
