articles/ai/advanced-retrieval-augmented-generation.md (+2 -2)
@@ -1,7 +1,7 @@
 ---
 title: Build Advanced Retrieval-Augmented Generation Systems
 description: As a developer, learn about real-world considerations and patterns for retrieval-augmented generation (RAG)-based chat systems.
-ms.date: 07/31/2025
+ms.date: 01/30/2026
 ms.topic: how-to
 ms.custom: build-2024-intelligent-apps
 ms.subservice: intelligent-apps
@@ -296,7 +296,7 @@ For more information, see these articles:
 
 ### Testing and verifying the safeguards
 
-_Red-teaming_ is key—it means to act like an attacker to find weak spots in the system. This step is especially important to stop jailbreaking. For tips on planning and managing red teaming for responsible AI, see [Planning red teaming for large language models (LLMs) and their applications](/azure/ai-foundry/openai/concepts/red-teaming).
+_Red-teaming_ is key—it means to act like an attacker to find weak spots in the system. This step is especially important to stop jailbreaking. For tips on planning and managing red teaming for responsible AI, see [Planning red teaming for large language models (LLMs) and their applications](/azure/ai-foundry/openai/concepts/red-teaming?view=foundry&preserve-view=true).
 
 Developers should test RAG system safeguards in different scenarios to make sure they work. This step makes the system stronger and also helps fine-tune responses to follow ethical standards and rules.
articles/ai/augment-llm-rag-fine-tuning.md (+4 -4)
@@ -1,7 +1,7 @@
 ---
 title: Augment LLMs with RAGs or Fine-Tuning
 description: Get a conceptual introduction to creating retrieval-augmented generation (RAG)-based chat systems, with an emphasis on integration, optimization, and ethical considerations for delivering contextually relevant responses.
-ms.date: 07/31/2025
+ms.date: 01/30/2026
 ms.topic: concept-article
 ms.custom: build-2024-intelligent-apps
 ms.collection: ce-skilling-ai-copilot
@@ -23,7 +23,7 @@ The next sections break down both methods.
 
 RAG enables the key "chat over my data" scenario. In this scenario, an organization has a potentially large corpus of textual content, like documents, documentation, and other proprietary data. It uses this corpus as the basis for answers to user prompts.
 
-RAG lets you build chatbots that answer questions using your own documents. Here's how it works:
+RAG lets you build chatbots or agents that answer questions using your own documents. Here's how it works:
 
 1. Store your documents (or parts of them, called *chunks*) in a database
 2. Create an *embedding* for each chunk; a list of numbers that describe it
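The store-and-embed steps in the hunk above can be sketched end to end. This is a minimal illustration with hand-made three-dimensional embeddings and a toy in-memory similarity search; the chunk texts and vectors are hypothetical stand-ins for a real embeddings API and vector database:

```python
import math

# Hypothetical, hand-made embeddings standing in for a real embeddings API.
CHUNKS = {
    "Returns are accepted within 30 days.": [0.9, 0.1, 0.0],
    "Shipping takes 3-5 business days.":    [0.1, 0.9, 0.0],
    "Support is available 24/7 by chat.":   [0.0, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Angle-based closeness of two vectors in embedding space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_embedding, top_k=1):
    """Rank stored chunks by similarity to the query embedding."""
    ranked = sorted(
        CHUNKS.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

# A query about refunds should land near the returns chunk.
query = [0.8, 0.2, 0.1]  # pretend embedding of "How do I get a refund?"
print(retrieve(query))
```

The retrieved chunk(s) would then be pasted into the prompt so the model answers from your data rather than from memory.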
@@ -51,7 +51,7 @@ One way to create an embedding is to send your content to the Azure OpenAI Embed
 
 All these numbers together show where the content sits in a multi-dimensional space. Imagine a 3D graph, but with hundreds or thousands of dimensions. Computers can work with this kind of space, even if we can’t draw it.
 
-The [Tutorial: Explore Azure OpenAI in Azure AI Foundry Models embeddings and document search](/azure/ai-foundry/openai/tutorials/embeddings?tabs=python-new%2Ccommand-line&pivots=programming-language-python) provides a guide on how to use the Azure OpenAI Embeddings API to create embeddings for your documents.
+The [Tutorial: Explore Azure OpenAI in Microsoft Foundry Models embeddings and document search](/azure/ai-foundry/openai/tutorials/embeddings?view=foundry&tabs=command-line&pivots=programming-language-python&preserve-view=true) provides a guide on how to use the Azure OpenAI Embeddings API to create embeddings for your documents.
 
 #### Storing the vector and content
 
@@ -118,7 +118,7 @@ Fine-tuning also has some challenges:
 
 
 
-[Customize a model through fine-tuning](/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo%2Cpython-new&pivots=programming-language-studio) explains how to fine-tune a model.
+[Customize a model through fine-tuning](/azure/ai-foundry/openai/how-to/fine-tuning?view=foundry&tabs=oai-sdk%2Cazure-openai&pivots=programming-language-studio&preserve-view=true) explains how to fine-tune a model.
articles/ai/gen-ai-concepts-considerations-developers.md (+7 -7)
@@ -1,7 +1,7 @@
 ---
 title: Key concepts and considerations in generative AI
 description: Learn about the limitations of large language models (LLMs) and how to get the best results by modifying prompts, building an inference pipeline, and adjusting API call parameters.
-ms.date: 07/31/2025
+ms.date: 01/30/2026
 content_well_notification:
 - AI-contribution
 ai-usage: ai-assisted
@@ -64,9 +64,9 @@ For developers, the following libraries help estimate token counts for prompts a
 
 ### Token usage affects billing
 
-Each Azure OpenAI API has a different billing methodology. For processing and generating text with the Chat Completions API, you're billed based on the number of tokens you submit as a prompt and the number of tokens that are generated as a result (completion).
+Each Azure OpenAI API has a different billing methodology. For processing and generating text with the Responses or Chat Completions API, you're billed based on the number of tokens you submit as a prompt and the number of tokens that are generated as a result (completion).
 
-Each LLM model (for example, GPT-4.1, GPT-4o, or GPT-4o mini) usually has a different price, which reflects the amount of computation required to process and generate tokens. Many times, price is presented as "price per 1,000 tokens" or "price per 1 million tokens."
+Each LLM model (for example, GPT-5.2 or GPT-5.2-mini) usually has a different price, which reflects the amount of computation required to process and generate tokens. Many times, price is presented as "price per 1,000 tokens" or "price per 1 million tokens."
 
 This pricing model has a significant effect on how you design the user interactions and the amount of preprocessing and post-processing you add.
 
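The per-token pricing described in the hunk above is easy to estimate up front. The sketch below uses hypothetical prices and a crude whitespace-based token count purely for illustration; in practice you would use a real tokenizer (such as `tiktoken`) and the current price sheet for your model:

```python
# Hypothetical per-1,000-token prices; check the current price sheet for real values.
PRICE_PER_1K_PROMPT = 0.005
PRICE_PER_1K_COMPLETION = 0.015

def rough_token_count(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def estimate_cost(prompt: str, completion: str) -> float:
    """Bill prompt and completion tokens at their respective rates."""
    prompt_cost = rough_token_count(prompt) / 1000 * PRICE_PER_1K_PROMPT
    completion_cost = rough_token_count(completion) / 1000 * PRICE_PER_1K_COMPLETION
    return prompt_cost + completion_cost

cost = estimate_cost("Summarize this report " * 250, "Here is the summary " * 100)
print(f"${cost:.4f}")
```

Because completions are often priced higher than prompts, trimming verbose outputs can matter as much as shortening the prompt.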
@@ -125,7 +125,7 @@ For information about the specific steps to take to build an inference pipeline,
 
 Beyond programmatically modifying the prompt, creating an inference pipeline, and other techniques, more details are discussed in [Augmenting a large-language model with retrieval-augmented generation and fine-tuning](augment-llm-rag-fine-tuning.md). Also, you can modify parameters when you make calls to the Azure OpenAI API.
 
-To review required and optional parameters to pass that can affect various aspects of the completion, see the [Chat endpoint documentation](https://platform.openai.com/docs/api-reference/chat/create). If you're using an SDK, see the SDK documentation for the language you use. You can experiment with the parameters in the [Playground](https://platform.openai.com/playground/chat).
+Here are some of the key parameters you can adjust to influence the model's output:
 
 - **`Temperature`**: Control the randomness of the output the model generates. At zero, the model becomes deterministic, consistently selecting the most likely next token from its training data. At a temperature of 1, the model balances between choosing high-probability tokens and introducing randomness into the output.
 
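The effect of the `Temperature` parameter described in the hunk above can be illustrated with plain softmax sampling. This is a minimal sketch over made-up token scores, not the provider's actual decoding code:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw token scores to probabilities; low temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.1 nearly all probability mass collapses onto the top token (near-deterministic output); at 2.0 the distribution flattens and sampling becomes visibly more random.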
@@ -145,7 +145,7 @@ To review required and optional parameters to pass that can affect various aspec
 
 In addition to keeping the LLM's responses bound to specific subject matter or domains, you also likely are concerned about the kinds of questions your users are asking of the LLM. It's important to consider the kinds of answers it's generating.
 
-First, API calls to Microsoft Azure OpenAI Services automatically filter content that the API finds potentially offensive and reports this back to you in many filtering categories.
+First, API calls to Microsoft Azure OpenAI Models in Microsoft Foundry automatically filter content that the API finds potentially offensive and reports this back to you in many filtering categories.
 
 You can use the [Content Moderation API](/azure/ai-services/content-moderator/api-reference) directly to check any content for potentially harmful content.
 
@@ -155,14 +155,14 @@ Then, you can use [Azure AI Content Safety](/azure/ai-services/content-safety/ov
 
 AI agents are a new way to build generative AI apps that work on their own. They use LLMs to read and write text, and they can also connect to outside systems, APIs, and data sources.
 AI agents can manage complex tasks, make choices using real-time data, and learn from how people use them.
-For more information about AI agents, see [Quickstart: Create a new agent](/azure/ai-foundry/agents/quickstart?pivots=programming-language-python-azure).
+For more information about AI agents, see [Quickstart: Create a new agent](/azure/ai-foundry/agents/quickstart?view=foundry-classic&pivots=programming-language-python-azure&preserve-view=true).
 
 ### Tool calling
 
 AI agents can use outside tools and APIs to get information, take action, or connect with other services. This feature lets them do more than just generate text and handle more complex tasks.
 
 For example, an AI agent can get real-time weather updates from a weather API or pull details from a database based on what a user asks.
-For more information about tool calling, see [Tool calling in Azure AI Foundry](/azure/ai-foundry/agents/how-to/tools/overview).
+For more information about tool calling, see [Discover and manage tools in the Foundry tool catalog (preview)](/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry&preserve-view=true).
articles/ai/includes/scaling-load-balancer-capacity.md (+2 -2)
@@ -1,13 +1,13 @@
 ---
 ms.custom: overview
 ms.topic: include
-ms.date: 06/26/2025
+ms.date: 01/30/2026
 ms.service: azure
 ---
 
 ## Configure the TPM quota
 
-By default, each of the Azure OpenAI instances in the load balancer is deployed with a capacity of 30,000 tokens per minute (TPM). You can use the chat app with the confidence that it scales across many users without running out of quota. Change this value when:
+By default, each of the Azure OpenAI Models in Microsoft Foundry instances in the load balancer is deployed with a capacity of 30,000 tokens per minute (TPM). You can use the chat app with the confidence that it scales across many users without running out of quota. Change this value when:
 
 * You get deployment capacity errors: Lower the value.
articles/ai/includes/scaling-load-balancer-introduction-azure-api-management.md (+3 -3)
@@ -1,11 +1,11 @@
 ---
 ms.custom: overview
 ms.topic: include
-ms.date: 06/26/2025
+ms.date: 01/30/2026
 ms.service: azure
 ---
 
-Learn how to add enterprise-grade load balancing to your application to extend the chat app beyond the Azure OpenAI Service token and model quota limits. This approach uses Azure API Management to intelligently direct traffic between three Azure OpenAI resources.
+Learn how to add enterprise-grade load balancing to your application to extend the chat app beyond the Azure OpenAI Models in Microsoft Foundry token and model quota limits. This approach uses Azure API Management to intelligently direct traffic between three Azure OpenAI resources.
 
 This article requires you to deploy two separate samples:
@@ -19,7 +19,7 @@ This article requires you to deploy two separate samples:
 
 ## Architecture for load balancing Azure OpenAI with Azure API Management
 
-Because the Azure OpenAI resource has specific token and model quota limits, a chat app that uses a single Azure OpenAI resource is prone to have conversation failures because of those limits.
+Because the Azure OpenAI Models in Microsoft Foundry have specific token and model quota limits, a chat app that uses a single Azure OpenAI resource is prone to have conversation failures because of those limits.
 
 :::image type="content" source="../media/get-started-scaling-load-balancer-azure-api-management/chat-app-original-architecuture.png" alt-text="Diagram that shows chat app architecture with an Azure OpenAI resource highlighted.":::
articles/ai/includes/scaling-load-balancer-introduction-azure-container-apps.md (+4 -4)
@@ -1,11 +1,11 @@
 ---
 ms.custom: overview
 ms.topic: include
-ms.date: 12/20/2024
+ms.date: 01/30/2026
 ms.service: azure
 ---
 
-Learn how to add load balancing to your application to extend the chat app beyond the Azure OpenAI Service token and model quota limits. This approach uses Azure Container Apps to create three Azure OpenAI endpoints and a primary container to direct incoming traffic to one of the three endpoints.
+Learn how to add load balancing to your application to extend the chat app beyond the Azure OpenAI Models in Microsoft Foundry token and model quota limits. This approach uses Azure Container Apps to create three Azure OpenAI endpoints and a primary container to direct incoming traffic to one of the three endpoints.
 
 This article requires you to deploy two separate samples:
@@ -25,9 +25,9 @@ This article requires you to deploy two separate samples:
 
 ## Architecture for load balancing Azure OpenAI with Azure Container Apps
 
-Because the Azure OpenAI resource has specific token and model quota limits, a chat app that uses a single Azure OpenAI resource is prone to have conversation failures because of those limits.
+Because the Azure OpenAI Models in Microsoft Foundry resource has specific token and model quota limits, a chat app that uses a single Azure OpenAI Models in Microsoft Foundry resource is prone to have conversation failures because of those limits.
 
-:::image type="content" source="../media/get-started-scaling-load-balancer-azure-container-apps/chat-app-original-architecuture.png" alt-text="Diagram that shows chat app architecture with the Azure OpenAI resource highlighted.":::
+:::image type="content" source="../media/get-started-scaling-load-balancer-azure-container-apps/chat-app-original-architecuture.png" alt-text="Diagram that shows chat app architecture with the Azure OpenAI Models in Microsoft Foundry resource highlighted.":::
 
 To use the chat app without hitting those limits, use a load-balanced solution with Container Apps. This solution seamlessly exposes a single endpoint from Container Apps to your chat app server.
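The single-endpoint balancer described above can be sketched as round-robin with failover across the three backends. The endpoint URLs and the throttling check are hypothetical; a real deployment delegates this routing to Azure Container Apps or API Management rather than application code:

```python
from itertools import cycle

# Hypothetical backend endpoints standing in for three Azure OpenAI deployments.
ENDPOINTS = [
    "https://openai-1.example.com",
    "https://openai-2.example.com",
    "https://openai-3.example.com",
]

class RoundRobinBalancer:
    """Rotate across backends; skip any that currently report quota exhaustion."""

    def __init__(self, endpoints):
        self._ring = cycle(endpoints)
        self._count = len(endpoints)

    def pick(self, is_throttled) -> str:
        # Try each backend at most once per call before giving up.
        for _ in range(self._count):
            endpoint = next(self._ring)
            if not is_throttled(endpoint):
                return endpoint
        raise RuntimeError("All backends are throttled")

balancer = RoundRobinBalancer(ENDPOINTS)
throttled = {"https://openai-1.example.com"}  # pretend this one hit its TPM quota
print(balancer.pick(lambda e: e in throttled))
```

Spreading traffic this way means a single backend hitting its TPM quota degrades capacity instead of failing the conversation outright.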