
Commit 50a9b98

Merge pull request #8472 from MicrosoftDocs/main
Auto Publish – main to live - 2026-02-02 18:00 UTC
Parents: ed7bf4b + bbc4d48 · Commit: 50a9b98

23 files changed

Lines changed: 73 additions & 75 deletions

articles/ai/advanced-retrieval-augmented-generation.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 title: Build Advanced Retrieval-Augmented Generation Systems
 description: As a developer, learn about real-world considerations and patterns for retrieval-augmented generation (RAG)-based chat systems.
-ms.date: 07/31/2025
+ms.date: 01/30/2026
 ms.topic: how-to
 ms.custom: build-2024-intelligent-apps
 ms.subservice: intelligent-apps
@@ -296,7 +296,7 @@ For more information, see these articles:

 ### Testing and verifying the safeguards

-_Red-teaming_ is key—it means to act like an attacker to find weak spots in the system. This step is especially important to stop jailbreaking. For tips on planning and managing red teaming for responsible AI, see [Planning red teaming for large language models (LLMs) and their applications](/azure/ai-foundry/openai/concepts/red-teaming).
+_Red-teaming_ is key—it means to act like an attacker to find weak spots in the system. This step is especially important to stop jailbreaking. For tips on planning and managing red teaming for responsible AI, see [Planning red teaming for large language models (LLMs) and their applications](/azure/ai-foundry/openai/concepts/red-teaming?view=foundry&preserve-view=true).

 Developers should test RAG system safeguards in different scenarios to make sure they work. This step makes the system stronger and also helps fine-tune responses to follow ethical standards and rules.

articles/ai/augment-llm-rag-fine-tuning.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 ---
 title: Augment LLMs with RAGs or Fine-Tuning
 description: Get a conceptual introduction to creating retrieval-augmented generation (RAG)-based chat systems, with an emphasis on integration, optimization, and ethical considerations for delivering contextually relevant responses.
-ms.date: 07/31/2025
+ms.date: 01/30/2026
 ms.topic: concept-article
 ms.custom: build-2024-intelligent-apps
 ms.collection: ce-skilling-ai-copilot
@@ -23,7 +23,7 @@ The next sections break down both methods.

 RAG enables the key "chat over my data" scenario. In this scenario, an organization has a potentially large corpus of textual content, like documents, documentation, and other proprietary data. It uses this corpus as the basis for answers to user prompts.

-RAG lets you build chatbots that answer questions using your own documents. Here's how it works:
+RAG lets you build chatbots or agents that answer questions using your own documents. Here's how it works:

 1. Store your documents (or parts of them, called *chunks*) in a database
 2. Create an *embedding* for each chunk; a list of numbers that describe it
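
To make the chunking step in the list above concrete, here's a minimal, illustrative Python helper. The fixed-size strategy and the `handbook.md` file name are assumptions for the sketch; real systems often split by headings or sentences instead.

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with a small overlap so context isn't lost at boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

# Hypothetical source document for the example.
chunks = chunk_text(open("handbook.md", encoding="utf-8").read())
print(f"{len(chunks)} chunks ready for embedding")
```
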
@@ -51,7 +51,7 @@ One way to create an embedding is to send your content to the Azure OpenAI Embed

 All these numbers together show where the content sits in a multi-dimensional space. Imagine a 3D graph, but with hundreds or thousands of dimensions. Computers can work with this kind of space, even if we can’t draw it.

-The [Tutorial: Explore Azure OpenAI in Azure AI Foundry Models embeddings and document search](/azure/ai-foundry/openai/tutorials/embeddings?tabs=python-new%2Ccommand-line&pivots=programming-language-python) provides a guide on how to use the Azure OpenAI Embeddings API to create embeddings for your documents.
+The [Tutorial: Explore Azure OpenAI in Microsoft Foundry Models embeddings and document search](/azure/ai-foundry/openai/tutorials/embeddings?view=foundry&tabs=command-line&pivots=programming-language-python&preserve-view=true) provides a guide on how to use the Azure OpenAI Embeddings API to create embeddings for your documents.

 #### Storing the vector and content
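
For the Embeddings API call referenced in the hunk above, a rough sketch with the `openai` Python SDK follows. The endpoint, key, API version, and deployment name are placeholders; the linked tutorial remains the authoritative walkthrough.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2024-10-21",
)

# Embed each chunk; the response preserves the input order.
response = client.embeddings.create(
    model="<your-embedding-deployment>",  # for example, a text-embedding-3-small deployment
    input=["chunk one text", "chunk two text"],
)
vectors = [item.embedding for item in response.data]
print(len(vectors), "vectors of dimension", len(vectors[0]))
```
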

@@ -118,7 +118,7 @@ Fine-tuning also has some challenges:


-[Customize a model through fine-tuning](/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo%2Cpython-new&pivots=programming-language-studio) explains how to fine-tune a model.
+[Customize a model through fine-tuning](/azure/ai-foundry/openai/how-to/fine-tuning?view=foundry&tabs=oai-sdk%2Cazure-openai&pivots=programming-language-studio&preserve-view=true) explains how to fine-tune a model.

 ## Fine-tuning vs. RAG
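
For orientation on the fine-tuning article linked in the hunk above, here's a minimal sketch of starting a job with the `openai` Python SDK. The training file name, base model, and client setup are assumptions; the linked article is the authoritative guide.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2024-10-21",
)

# Upload chat-formatted training examples (JSONL), then start the fine-tuning job.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder; use a model your region supports for fine-tuning
)
print(job.id, job.status)
```
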

articles/ai/gen-ai-concepts-considerations-developers.md

Lines changed: 7 additions & 7 deletions
@@ -1,7 +1,7 @@
 ---
 title: Key concepts and considerations in generative AI
 description: Learn about the limitations of large language models (LLMs) and how to get the best results by modifying prompts, building an inference pipeline, and adjusting API call parameters.
-ms.date: 07/31/2025
+ms.date: 01/30/2026
 content_well_notification:
 - AI-contribution
 ai-usage: ai-assisted
@@ -64,9 +64,9 @@ For developers, the following libraries help estimate token counts for prompts a

 ### Token usage affects billing

-Each Azure OpenAI API has a different billing methodology. For processing and generating text with the Chat Completions API, you're billed based on the number of tokens you submit as a prompt and the number of tokens that are generated as a result (completion).
+Each Azure OpenAI API has a different billing methodology. For processing and generating text with the Responses or Chat Completions API, you're billed based on the number of tokens you submit as a prompt and the number of tokens that are generated as a result (completion).

-Each LLM model (for example, GPT-4.1, GPT-4o, or GPT-4o mini) usually has a different price, which reflects the amount of computation required to process and generate tokens. Many times, price is presented as "price per 1,000 tokens" or "price per 1 million tokens."
+Each LLM model (for example, GPT-5.2 or GPT-5.2-mini) usually has a different price, which reflects the amount of computation required to process and generate tokens. Many times, price is presented as "price per 1,000 tokens" or "price per 1 million tokens."

 This pricing model has a significant effect on how you design the user interactions and the amount of preprocessing and post-processing you add.
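
Because billing is per prompt and completion token, it helps to estimate counts before calling the API. A small sketch with the `tiktoken` library follows; the encoding choice is an assumption, since newer models may ship with different tokenizers.

```python
import tiktoken

# Fall back to a known encoding if the model isn't registered in the installed tiktoken release.
try:
    encoding = tiktoken.encoding_for_model("gpt-4o")
except KeyError:
    encoding = tiktoken.get_encoding("cl100k_base")

prompt = "Summarize the key limits of large language models in three bullet points."
print(len(encoding.encode(prompt)), "prompt tokens")
```
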

@@ -125,7 +125,7 @@ For information about the specific steps to take to build an inference pipeline,

 Beyond programmatically modifying the prompt, creating an inference pipeline, and other techniques, more details are discussed in [Augmenting a large-language model with retrieval-augmented generation and fine-tuning](augment-llm-rag-fine-tuning.md). Also, you can modify parameters when you make calls to the Azure OpenAI API.

-To review required and optional parameters to pass that can affect various aspects of the completion, see the [Chat endpoint documentation](https://platform.openai.com/docs/api-reference/chat/create). If you're using an SDK, see the SDK documentation for the language you use. You can experiment with the parameters in the [Playground](https://platform.openai.com/playground/chat).
+Here are some of the key parameters you can adjust to influence the model's output:

 - **`Temperature`**: Control the randomness of the output the model generates. At zero, the model becomes deterministic, consistently selecting the most likely next token from its training data. At a temperature of 1, the model balances between choosing high-probability tokens and introducing randomness into the output.
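
A minimal sketch of passing these parameters on a Chat Completions call with the `openai` Python SDK; the endpoint, key, and deployment name are placeholders.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="<your-deployment-name>",  # the deployment name, not the base model name
    messages=[{"role": "user", "content": "Name three uses for retrieval-augmented generation."}],
    temperature=0.2,   # lower values make output more deterministic
    max_tokens=200,    # caps completion length, and therefore completion cost
    top_p=0.9,
)
print(response.choices[0].message.content)
```
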

@@ -145,7 +145,7 @@ To review required and optional parameters to pass that can affect various aspec

 In addition to keeping the LLM's responses bound to specific subject matter or domains, you also likely are concerned about the kinds of questions your users are asking of the LLM. It's important to consider the kinds of answers it's generating.

-First, API calls to Microsoft Azure OpenAI Services automatically filter content that the API finds potentially offensive and reports this back to you in many filtering categories.
+First, API calls to Microsoft Azure OpenAI Models in Microsoft Foundry automatically filter content that the API finds potentially offensive and reports this back to you in many filtering categories.

 You can use the [Content Moderation API](/azure/ai-services/content-moderator/api-reference) directly to check any content for potentially harmful content.
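
As a brief, illustrative screening step alongside the built-in filtering described above, here's a sketch using the Azure AI Content Safety Python SDK; the resource endpoint, key, and sample text are placeholders.

```python
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

client = ContentSafetyClient(
    "https://<your-resource>.cognitiveservices.azure.com/",
    AzureKeyCredential("<your-key>"),
)

# Score a user message across the harm categories before sending it to the model.
result = client.analyze_text(AnalyzeTextOptions(text="<user message to screen>"))
for item in result.categories_analysis:
    print(item.category, item.severity)
```
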

@@ -155,14 +155,14 @@ Then, you can use [Azure AI Content Safety](/azure/ai-services/content-safety/ov

 AI agents are a new way to build generative AI apps that work on their own. They use LLMs to read and write text, and they can also connect to outside systems, APIs, and data sources.
 AI agents can manage complex tasks, make choices using real-time data, and learn from how people use them.
-For more information about AI agents, see [Quickstart: Create a new agent](/azure/ai-foundry/agents/quickstart?pivots=programming-language-python-azure).
+For more information about AI agents, see [Quickstart: Create a new agent](/azure/ai-foundry/agents/quickstart?view=foundry-classic&pivots=programming-language-python-azure&preserve-view=true).

 ### Tool calling

 AI agents can use outside tools and APIs to get information, take action, or connect with other services. This feature lets them do more than just generate text and handle more complex tasks.

 For example, an AI agent can get real-time weather updates from a weather API or pull details from a database based on what a user asks.
-For more information about tool calling, see [Tool calling in Azure AI Foundry](/azure/ai-foundry/agents/how-to/tools/overview).
+For more information about tool calling, see [Discover and manage tools in the Foundry tool catalog (preview)](/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry&preserve-view=true).

 ### Model Context Protocol (MCP)
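
Returning to the tool-calling paragraph in the hunk above, here's a sketch that declares a hypothetical `get_weather` function and lets the model decide whether to call it. The tool, deployment name, and client values are invented for the example; client setup mirrors the earlier sketches.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2024-10-21",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, not a real service
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="<your-deployment-name>",
    messages=[{"role": "user", "content": "What's the weather in Seattle right now?"}],
    tools=tools,
)

# If the model chose the tool, it returns the tool name and JSON arguments instead of prose.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:
    print(message.content)
```
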

articles/ai/includes/scaling-load-balancer-capacity.md

Lines changed: 2 additions & 2 deletions
@@ -1,13 +1,13 @@
 ---
 ms.custom: overview
 ms.topic: include
-ms.date: 06/26/2025
+ms.date: 01/30/2026
 ms.service: azure
 ---

 ## Configure the TPM quota

-By default, each of the Azure OpenAI instances in the load balancer is deployed with a capacity of 30,000 tokens per minute (TPM). You can use the chat app with the confidence that it scales across many users without running out of quota. Change this value when:
+By default, each of the Azure OpenAI Models in Microsoft Foundry instances in the load balancer is deployed with a capacity of 30,000 tokens per minute (TPM). You can use the chat app with the confidence that it scales across many users without running out of quota. Change this value when:

 * You get deployment capacity errors: Lower the value.
 * You need higher capacity: Raise the value.

articles/ai/includes/scaling-load-balancer-cleanup-azure-api-management.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 ms.custom: overview
 ms.topic: include
-ms.date: 06/26/2025
+ms.date: 01/30/2026
 ms.service: azure
 ---

articles/ai/includes/scaling-load-balancer-cleanup-azure-container-apps.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 ms.custom: overview
 ms.topic: include
-ms.date: 06/26/2025
+ms.date: 01/30/2026
 ms.service: azure
 ---

@@ -27,7 +27,7 @@ azd down --purge --force

 The switches provide:

-* `purge`: Deleted resources are immediately purged so that you can reuse the Azure OpenAI Service tokens per minute.
+* `purge`: Deleted resources are immediately purged so that you can reuse the Azure OpenAI Models in Microsoft Foundry tokens per minute.
 * `force`: The deletion happens silently, without requiring user consent.

 ### Clean up GitHub Codespaces and Visual Studio Code

articles/ai/includes/scaling-load-balancer-introduction-azure-api-management.md

Lines changed: 3 additions & 3 deletions
@@ -1,11 +1,11 @@
 ---
 ms.custom: overview
 ms.topic: include
-ms.date: 06/26/2025
+ms.date: 01/30/2026
 ms.service: azure
 ---

-Learn how to add enterprise-grade load balancing to your application to extend the chat app beyond the Azure OpenAI Service token and model quota limits. This approach uses Azure API Management to intelligently direct traffic between three Azure OpenAI resources.
+Learn how to add enterprise-grade load balancing to your application to extend the chat app beyond the Azure OpenAI Models in Microsoft Foundry token and model quota limits. This approach uses Azure API Management to intelligently direct traffic between three Azure OpenAI resources.

 This article requires you to deploy two separate samples:

@@ -19,7 +19,7 @@ This article requires you to deploy two separate samples:

 ## Architecture for load balancing Azure OpenAI with Azure API Management

-Because the Azure OpenAI resource has specific token and model quota limits, a chat app that uses a single Azure OpenAI resource is prone to have conversation failures because of those limits.
+Because the Azure OpenAI Models in Microsoft Foundry have specific token and model quota limits, a chat app that uses a single Azure OpenAI resource is prone to have conversation failures because of those limits.

 :::image type="content" source="../media/get-started-scaling-load-balancer-azure-api-management/chat-app-original-architecuture.png" alt-text="Diagram that shows chat app architecture with an Azure OpenAI resource highlighted.":::

articles/ai/includes/scaling-load-balancer-introduction-azure-container-apps.md

Lines changed: 4 additions & 4 deletions
@@ -1,11 +1,11 @@
 ---
 ms.custom: overview
 ms.topic: include
-ms.date: 12/20/2024
+ms.date: 01/30/2026
 ms.service: azure
 ---

-Learn how to add load balancing to your application to extend the chat app beyond the Azure OpenAI Service token and model quota limits. This approach uses Azure Container Apps to create three Azure OpenAI endpoints and a primary container to direct incoming traffic to one of the three endpoints.
+Learn how to add load balancing to your application to extend the chat app beyond the Azure OpenAI Models in Microsoft Foundry token and model quota limits. This approach uses Azure Container Apps to create three Azure OpenAI endpoints and a primary container to direct incoming traffic to one of the three endpoints.

 This article requires you to deploy two separate samples:

@@ -25,9 +25,9 @@ This article requires you to deploy two separate samples:

 ## Architecture for load balancing Azure OpenAI with Azure Container Apps

-Because the Azure OpenAI resource has specific token and model quota limits, a chat app that uses a single Azure OpenAI resource is prone to have conversation failures because of those limits.
+Because the Azure OpenAI Models in Microsoft Foundry resource has specific token and model quota limits, a chat app that uses a single Azure OpenAI Models in Microsoft Foundry resource is prone to have conversation failures because of those limits.

-:::image type="content" source="../media/get-started-scaling-load-balancer-azure-container-apps/chat-app-original-architecuture.png" alt-text="Diagram that shows chat app architecture with the Azure OpenAI resource highlighted.":::
+:::image type="content" source="../media/get-started-scaling-load-balancer-azure-container-apps/chat-app-original-architecuture.png" alt-text="Diagram that shows chat app architecture with the Azure OpenAI Models in Microsoft Foundry resource highlighted.":::

 To use the chat app without hitting those limits, use a load-balanced solution with Container Apps. This solution seamlessly exposes a single endpoint from Container Apps to your chat app server.

articles/ai/includes/scaling-load-balancer-logs-azure-container-apps.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 ms.topic: include
-ms.date: 06/26/2025
+ms.date: 01/30/2026
 ms.service: azure
 ms.custom:
 - overview

articles/ai/includes/scaling-load-balancer-procedure-azure-api-management.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 ms.custom: overview
 ms.topic: include
-ms.date: 06/26/2025
+ms.date: 01/30/2026
 ms.service: azure
 ---
