Is your feature request related to a problem?
The ML Commons stats framework (MLStatsJobProcessor) publishes adoption metrics for models, agents, and connectors as OTel counters with rich tags describing what was created: service provider, model type, deployment mode, etc. However, there is no way to attribute which plugin or caller provisioned a given resource, so the metrics cannot distinguish between resources created by an automated plugin provisioning flow (e.g., the Flow Framework plugin), resources created directly by users via the API, and resources created by other plugins.
The MachineLearningClient interface (used by all plugins integrating with ML Commons) provides no mechanism to pass caller provenance. The underlying input objects, MLCreateConnectorInput, MLRegisterModelInput, and MLAgent, have no created_by field. The transport actions that persist these objects (TransportCreateConnectorAction, TransportRegisterModelAction, TransportRegisterAgentAction) never record provenance. And MLModel.getTags() / MLAgent.getTags() have no such dimension to emit.
As a concrete example: a plugin (such as Flow Framework) that automates ML resource provisioning (connectors, models, agents) as a "one-and-done" setup step wants to measure continued active use of the resources it provisioned, as distinct from resources provisioned by other means. This is currently impossible with the existing stats framework.
What solution would you like?
Add an optional created_by field as first-class metadata across the ML resource creation path, surfaced as a tag in the adoption metrics framework. The changes required span four areas:
- Domain objects and input classes (common module)
  Add String createdBy to MLCreateConnectorInput, MLRegisterModelInput, MLAgent, and MLModel. (Given that connectors are currently not used in stats and have a tight relationship to models, the Connector class itself could be left out.) Implement toXContent, parse, writeTo, and StreamInput constructors in each class, version-gated on a new VERSION_X_Y_Z constant following the existing pattern.
- Transport actions (plugin module)
  - TransportRegisterModelAction: copy createdBy from MLRegisterModelInput onto MLModel before indexing
  - TransportRegisterAgentAction: MLAgent is indexed directly, so no additional propagation is needed beyond the field addition in the first area
- Tag emission (common module)
  - MLModel.getTags() / getTags(Connector): add created_by tag to all three tag-building paths (remote, pre-trained, custom)
  - MLAgent.getTags(): add created_by tag
- Connector metrics in MLStatsJobProcessor (plugin module)
  AdoptionMetric.CONNECTOR_COUNT is currently defined but never incremented. As part of this work, add connector collection to MLStatsJobProcessor parallel to the existing model collection, reading created_by from the stored connector document and emitting it as a tag. This completes coverage for all three resource types.
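The version gate in the first area can be sketched with stand-in stream types. Everything below is illustrative only: the class, method, and version names are assumptions, not the actual ML Commons implementation (which uses OpenSearch's Version checks with StreamInput/StreamOutput).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of the version-gating pattern for an optional field.
public class VersionGateSketch {
    // Stand-in for the new VERSION_X_Y_Z wire-format constant (value is arbitrary).
    public static final int VERSION_X_Y_Z = 30100;

    // writeTo counterpart: emit the optional field only when the peer understands it.
    public static void writeTo(DataOutputStream out, int peerVersion, String createdBy)
            throws IOException {
        if (peerVersion >= VERSION_X_Y_Z) {
            out.writeBoolean(createdBy != null); // optional-field presence flag
            if (createdBy != null) {
                out.writeUTF(createdBy);
            }
        }
        // Older peers: nothing is written, so the wire format is unchanged for them.
    }

    // StreamInput-constructor counterpart: read the field only from new-enough senders.
    public static String readFrom(DataInputStream in, int peerVersion) throws IOException {
        if (peerVersion >= VERSION_X_Y_Z && in.readBoolean()) {
            return in.readUTF();
        }
        return null; // field absent on older wire versions
    }

    public static void main(String[] args) throws IOException {
        // New peer: the field round-trips.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeTo(new DataOutputStream(buf), VERSION_X_Y_Z, "my-plugin");
        String round = readFrom(
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray())),
                VERSION_X_Y_Z);
        System.out.println(round); // prints my-plugin

        // Old peer: nothing is written, so the field is silently dropped.
        ByteArrayOutputStream old = new ByteArrayOutputStream();
        writeTo(new DataOutputStream(old), VERSION_X_Y_Z - 1, "my-plugin");
        System.out.println(old.size()); // prints 0
    }
}
```

This is why mixed-version clusters stay compatible: both sides gate on the same version constant, so an older node neither sends nor expects the extra bytes.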
With these changes, a plugin provisioning ML resources via the ML Client simply sets the field on the input builder:
```java
MLCreateConnectorInput.builder()
    // ... existing fields ...
    .createdBy("my-plugin")
    .build();

MLRegisterModelInput.builder()
    // ... existing fields ...
    .createdBy("my-plugin")
    .build();

MLAgent.builder()
    // ... existing fields ...
    .createdBy("my-plugin")
    .build();
```
The stats framework then emits metrics like:
```
ml.commons.MODEL_COUNT{created_by="my-plugin", deployment="remote", service_provider="bedrock", type="llm", ...}
ml.commons.AGENT_COUNT{created_by="my-plugin", type="conversational", ...}
ml.commons.CONNECTOR_COUNT{created_by="my-plugin", service_provider="bedrock", ...}
```
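The getTags() change that feeds these counters amounts to conditionally adding one more dimension. A minimal sketch, with a hypothetical helper (the real MLModel/MLAgent methods build the full tag set shown above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the tag-emission change; the real getTags() paths
// also compute deployment, service_provider, and the other dimensions.
public class TagEmissionSketch {
    public static Map<String, String> getTags(String type, String createdBy) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("type", type);
        // created_by is optional metadata: emit the tag only when it was set,
        // so resources created directly via the API (or before the feature
        // existed) produce the same tag set as today.
        if (createdBy != null) {
            tags.put("created_by", createdBy);
        }
        return tags;
    }
}
```

Keeping the tag conditional avoids introducing an empty-valued dimension on every pre-existing resource.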
What alternatives have you considered?
- Using the existing app_type field on MLAgent: MLAgent already has an appType field, but it is a user-facing classification of the agent's functional purpose (e.g. "chatbot"), not a record of which plugin provisioned it. Overloading it for provenance would conflate two distinct concepts and would not cover connectors or models, which have no equivalent field.
- Tagging via connector/model parameters: A plugin could embed a created_by key in the parameters map of a connector or model. However, this is an undocumented convention with no guarantee of surviving updates, no first-class support in getTags(), and no way to filter it out of functional parameters passed to the remote endpoint.
- Tracking provenance outside ML Commons: The calling plugin could maintain its own index of resource IDs it provisioned and join that against ML Commons data at query time. This is fragile, requires the plugin to manage additional state, and produces metrics that are disconnected from the rich tag context (service provider, model type, etc.) that MLStatsJobProcessor already computes.
Do you have any additional context?
- created_by is purely informational metadata: a free-form string with no validation or enforcement by ML Commons. The framework does not need to know or care about the value.
- The field follows the exact version-gating pattern already used, ensuring backward compatibility in mixed-version clusters where older nodes simply ignore the field.
- created_by will be visible in GET model/agent/connector API responses, which is desirable for operator visibility into resource provenance.
- This is not a security boundary. Any caller can set any value. It is not intended to replace or interact with the existing owner/user access control fields.