Runai

`zenml.integrations.runai`

Run:AI integration for ZenML.

Attributes

`RUNAI = 'runai'` `module-attribute`

`RUNAI_STEP_OPERATOR_FLAVOR = 'runai'` `module-attribute`

Classes

`Flavor`

Class for ZenML Flavors.

Attributes

`config_class: Type[StackComponentConfig]` `abstractmethod` `property`

Returns StackComponentConfig config class.

Returns:

Type	Description
`Type[StackComponentConfig]`	The config class.

`config_schema: Dict[str, Any]` `property`

The config schema for a flavor.

Returns:

Type	Description
`Dict[str, Any]`	The config schema.

`display_name: Optional[str]` `property`

The display name of the flavor.

By default, converts the technical name to a human-readable format. For example, "vm_kubernetes" becomes "VM Kubernetes". Flavors can override this to provide custom display names.

Returns:

Type	Description
`Optional[str]`	The display name of the flavor.

`docs_url: Optional[str]` `property`

A URL pointing to the docs that explain this flavor.

Returns:

Type	Description
`Optional[str]`	A flavor docs url.

`implementation_class: Type[StackComponent]` `abstractmethod` `property`

Implementation class for this flavor.

Returns:

Type	Description
`Type[StackComponent]`	The implementation class for this flavor.

`logo_url: Optional[str]` `property`

A URL of the logo that represents the flavor in the dashboard.

Returns:

Type	Description
`Optional[str]`	The flavor logo.

`name: str` `abstractmethod` `property`

The flavor name.

Returns:

Type	Description
`str`	The flavor name.

`sdk_docs_url: Optional[str]` `property`

A URL pointing to the SDK docs that explain this flavor.

Returns:

Type	Description
`Optional[str]`	A flavor SDK docs url.

`service_connector_requirements: Optional[ServiceConnectorRequirements]` `property`

Service connector resource requirements for this flavor.

Specifies resource requirements that are used to filter the available service connector types that are compatible with this flavor.

Returns:

Type	Description
`Optional[ServiceConnectorRequirements]`	Requirements for compatible service connectors, if a service connector is required for this flavor.

`type: StackComponentType` `abstractmethod` `property`

The stack component type.

Returns:

Type	Description
`StackComponentType`	The stack component type.

Functions

`from_model(flavor_model: FlavorResponse) -> Flavor` `classmethod`

Loads a flavor from a model.

Parameters:

Name	Type	Description	Default
`flavor_model`	`FlavorResponse`	The model to load from.	required

Raises:

Type	Description
`CustomFlavorImportError`	If the custom flavor can't be imported.
`ImportError`	If the flavor can't be imported.

Returns:

Type	Description
`Flavor`	The loaded flavor.

Source code in src/zenml/stack/flavor.py

@classmethod
def from_model(cls, flavor_model: FlavorResponse) -> "Flavor":
    """Loads a flavor from a model.

    Args:
        flavor_model: The model to load from.

    Raises:
        CustomFlavorImportError: If the custom flavor can't be imported.
        ImportError: If the flavor can't be imported.

    Returns:
        The loaded flavor.
    """
    try:
        flavor = source_utils.load(flavor_model.source)()
    except (ModuleNotFoundError, ImportError, NotImplementedError) as err:
        if flavor_model.is_custom:
            flavor_module, _ = flavor_model.source.rsplit(".", maxsplit=1)
            expected_file_path = os.path.join(
                source_utils.get_source_root(),
                flavor_module.replace(".", os.path.sep),
            )
            raise CustomFlavorImportError(
                f"Couldn't import custom flavor {flavor_model.name}: "
                f"{err}. Make sure the custom flavor class "
                f"`{flavor_model.source}` is importable. If it is part of "
                "a library, make sure it is installed. If "
                "it is a local code file, make sure it exists at "
                f"`{expected_file_path}.py`."
            )
        else:
            raise ImportError(
                f"Couldn't import flavor {flavor_model.name}: {err}"
            )
    return cast(Flavor, flavor)
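
When a custom flavor cannot be imported, the error message names the file where ZenML expected to find it. The path computation can be sketched in isolation as below. The helper name `expected_flavor_file`, the example source string, and the `/repo` root are hypothetical; the real code obtains the root from `source_utils.get_source_root()`.

```python
import os


def expected_flavor_file(source: str, source_root: str) -> str:
    """Illustrative helper mirroring the path hint built in `from_model`."""
    # Drop the trailing class name, keeping only the dotted module path.
    flavor_module, _ = source.rsplit(".", maxsplit=1)
    # Map the dotted module path onto a file path under the source root.
    module_path = os.path.join(
        source_root, flavor_module.replace(".", os.path.sep)
    )
    return f"{module_path}.py"


# Hypothetical custom flavor source and source root.
hint = expected_flavor_file("my_pkg.flavors.MyFlavor", "/repo")
print(hint)
```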

`generate_default_docs_url() -> str`

Generate the doc urls for all inbuilt and integration flavors.

Note that this method is not going to be useful for custom flavors, which do not have any docs in the main zenml docs.

Returns:

Type	Description
`str`	The complete url to the zenml documentation

Source code in src/zenml/stack/flavor.py

def generate_default_docs_url(self) -> str:
    """Generate the doc urls for all inbuilt and integration flavors.

    Note that this method is not going to be useful for custom flavors,
    which do not have any docs in the main zenml docs.

    Returns:
        The complete url to the zenml documentation
    """
    component_type = self.type.plural.replace("_", "-")
    name = self.name.replace("_", "-")

    base = "https://docs.zenml.io"
    return f"{base}/stack-components/{component_type}/{name}"
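
As a rough illustration of the URL this method produces, the following standalone sketch repeats the same string handling outside the class. The function name `default_docs_url` is hypothetical, and the plural form `step_operators` is an assumption about what `self.type.plural` returns for a step operator flavor.

```python
def default_docs_url(component_type_plural: str, flavor_name: str) -> str:
    """Standalone copy of the string handling in generate_default_docs_url."""
    # Underscores become hyphens in both URL segments.
    component_type = component_type_plural.replace("_", "-")
    name = flavor_name.replace("_", "-")

    base = "https://docs.zenml.io"
    return f"{base}/stack-components/{component_type}/{name}"


# Assumed inputs: plural component type and the `runai` flavor name.
url = default_docs_url("step_operators", "runai")
print(url)
```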

`generate_default_sdk_docs_url() -> str`

Generate SDK docs url for a flavor.

Returns:

Type	Description
`str`	The complete url to the zenml SDK docs

Source code in src/zenml/stack/flavor.py

def generate_default_sdk_docs_url(self) -> str:
    """Generate SDK docs url for a flavor.

    Returns:
        The complete url to the zenml SDK docs
    """
    from zenml import __version__

    base = f"https://sdkdocs.zenml.io/{__version__}"

    component_type = self.type.plural

    if "zenml.integrations" in self.__module__:
        # Get integration name out of module path which will look something
        #  like this "zenml.integrations.<integration>....
        integration = self.__module__.split(
            "zenml.integrations.", maxsplit=1
        )[1].split(".")[0]

        # Get the config class name to point to the specific class
        config_class_name = self.config_class.__name__

        return (
            f"{base}/integration_code_docs"
            f"/integrations-{integration}"
            f"#zenml.integrations.{integration}.flavors.{config_class_name}"
        )

    else:
        return (
            f"{base}/core_code_docs/core-{component_type}/"
            f"#{self.__module__}"
        )
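
The integration branch above derives the integration name from the module path of the flavor class. A minimal sketch of that parsing follows; the function names, the example module path, the config class name `RunAIStepOperatorConfig`, and the version string `0.0.0` are placeholders chosen for illustration, not values confirmed by this page.

```python
def integration_from_module(module_path: str) -> str:
    """Extract `<integration>` from `zenml.integrations.<integration>...`."""
    remainder = module_path.split("zenml.integrations.", maxsplit=1)[1]
    return remainder.split(".")[0]


def integration_sdk_docs_url(
    version: str, module_path: str, config_class_name: str
) -> str:
    """Rebuild the integration SDK docs URL from its three inputs."""
    base = f"https://sdkdocs.zenml.io/{version}"
    integration = integration_from_module(module_path)
    return (
        f"{base}/integration_code_docs"
        f"/integrations-{integration}"
        f"#zenml.integrations.{integration}.flavors.{config_class_name}"
    )


# Placeholder inputs for illustration only.
module = "zenml.integrations.runai.flavors.runai_step_operator_flavor"
sdk_url = integration_sdk_docs_url("0.0.0", module, "RunAIStepOperatorConfig")
print(sdk_url)
```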

`to_model(integration: Optional[str] = None, is_custom: bool = True) -> FlavorRequest`

Converts a flavor to a model.

Parameters:

Name	Type	Description	Default
`integration`	`Optional[str]`	The integration to use for the model.	`None`
`is_custom`	`bool`	Whether the flavor is a custom flavor.	`True`

Returns:

Type	Description
`FlavorRequest`	The model.

Source code in src/zenml/stack/flavor.py

def to_model(
    self,
    integration: Optional[str] = None,
    is_custom: bool = True,
) -> FlavorRequest:
    """Converts a flavor to a model.

    Args:
        integration: The integration to use for the model.
        is_custom: Whether the flavor is a custom flavor.

    Returns:
        The model.
    """
    connector_requirements = self.service_connector_requirements
    connector_type = (
        connector_requirements.connector_type
        if connector_requirements
        else None
    )
    resource_type = (
        connector_requirements.resource_type
        if connector_requirements
        else None
    )
    resource_id_attr = (
        connector_requirements.resource_id_attr
        if connector_requirements
        else None
    )

    model = FlavorRequest(
        name=self.name,
        display_name=self.display_name,
        type=self.type,
        source=source_utils.resolve(self.__class__).import_path,
        config_schema=self.config_schema,
        connector_type=connector_type,
        connector_resource_type=resource_type,
        connector_resource_id_attr=resource_id_attr,
        integration=integration,
        logo_url=self.logo_url,
        docs_url=self.docs_url,
        sdk_docs_url=self.sdk_docs_url,
        is_custom=is_custom,
    )
    return model

`Integration`

Base class for integrations in ZenML.

Functions

`activate() -> None` `classmethod`

Abstract method to activate the integration.

Source code in src/zenml/integrations/integration.py

@classmethod
def activate(cls) -> None:
    """Abstract method to activate the integration."""

`check_installation() -> bool` `classmethod`

Method to check whether the required packages are installed.

Returns:

Type	Description
`bool`	True if all required packages are installed, False otherwise.

Source code in src/zenml/integrations/integration.py

@classmethod
def check_installation(cls) -> bool:
    """Method to check whether the required packages are installed.

    Returns:
        True if all required packages are installed, False otherwise.
    """
    for requirement in cls.get_requirements():
        parsed_requirement = Requirement(requirement)

        if not requirement_installed(parsed_requirement):
            logger.debug(
                "Requirement '%s' for integration '%s' is not installed "
                "or installed with the wrong version.",
                requirement,
                cls.NAME,
            )
            return False

        dependencies = get_dependencies(parsed_requirement)

        for dependency in dependencies:
            if not requirement_installed(dependency):
                logger.debug(
                    "Requirement '%s' for integration '%s' is not "
                    "installed or installed with the wrong version.",
                    dependency,
                    cls.NAME,
                )
                return False

    logger.debug(
        f"Integration '{cls.NAME}' is installed correctly with "
        f"requirements {cls.get_requirements()}."
    )
    return True

`flavors() -> List[Type[Flavor]]` `classmethod`

Abstract method to declare new stack component flavors.

Returns:

Type	Description
`List[Type[Flavor]]`	A list of new stack component flavors.

Source code in src/zenml/integrations/integration.py

@classmethod
def flavors(cls) -> List[Type[Flavor]]:
    """Abstract method to declare new stack component flavors.

    Returns:
        A list of new stack component flavors.
    """
    return []

`get_requirements(target_os: Optional[str] = None, python_version: Optional[str] = None) -> List[str]` `classmethod`

Method to get the requirements for the integration.

Parameters:

Name	Type	Description	Default
`target_os`	`Optional[str]`	The target operating system to get the requirements for.	`None`
`python_version`	`Optional[str]`	The Python version to use for the requirements.	`None`

Returns:

Type	Description
`List[str]`	A list of requirements.

Source code in src/zenml/integrations/integration.py

@classmethod
def get_requirements(
    cls,
    target_os: Optional[str] = None,
    python_version: Optional[str] = None,
) -> List[str]:
    """Method to get the requirements for the integration.

    Args:
        target_os: The target operating system to get the requirements for.
        python_version: The Python version to use for the requirements.

    Returns:
        A list of requirements.
    """
    return cls.REQUIREMENTS

`get_uninstall_requirements(target_os: Optional[str] = None) -> List[str]` `classmethod`

Method to get the uninstall requirements for the integration.

Parameters:

Name	Type	Description	Default
`target_os`	`Optional[str]`	The target operating system to get the requirements for.	`None`

Returns:

Type	Description
`List[str]`	A list of requirements.

Source code in src/zenml/integrations/integration.py

@classmethod
def get_uninstall_requirements(
    cls, target_os: Optional[str] = None
) -> List[str]:
    """Method to get the uninstall requirements for the integration.

    Args:
        target_os: The target operating system to get the requirements for.

    Returns:
        A list of requirements.
    """
    ret = []
    for each in cls.get_requirements(target_os=target_os):
        is_ignored = False
        for ignored in cls.REQUIREMENTS_IGNORED_ON_UNINSTALL:
            if each.startswith(ignored):
                is_ignored = True
                break
        if not is_ignored:
            ret.append(each)
    return ret
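
The filtering loop above is a prefix match against `REQUIREMENTS_IGNORED_ON_UNINSTALL`. The same behavior can be sketched as a standalone function; the function name and the requirement strings below are invented for illustration.

```python
from typing import List


def uninstall_requirements(
    requirements: List[str], ignored_prefixes: List[str]
) -> List[str]:
    """Keep only requirements that start with none of the ignored prefixes."""
    return [
        requirement
        for requirement in requirements
        if not any(requirement.startswith(p) for p in ignored_prefixes)
    ]


# Invented requirement strings, for illustration only.
reqs = ["example-sdk>=1.0", "pandas>=2.0", "pandas-stubs"]
kept = uninstall_requirements(reqs, ["pandas"])
print(kept)
```

Because the check uses `startswith`, an ignored entry such as `pandas` also excludes any other requirement whose string begins with the same characters, such as `pandas-stubs` in this example.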

`RunAIIntegration`

Bases: Integration

Definition of Run:AI integration for ZenML.

Functions

`flavors() -> List[Type[Flavor]]` `classmethod`

Declare the stack component flavors for the Run:AI integration.

Returns:

Type	Description
`List[Type[Flavor]]`	List of stack component flavors for this integration.

Source code in src/zenml/integrations/runai/__init__.py

@classmethod
def flavors(cls) -> List[Type[Flavor]]:
    """Declare the stack component flavors for the Run:AI integration.

    Returns:
        List of stack component flavors for this integration.
    """
    from zenml.integrations.runai.flavors import (
        RunAIStepOperatorFlavor,
    )

    return [RunAIStepOperatorFlavor]

Modules

`client`

Run:AI client utilities.

Classes

`RunAIAuthenticationError`

Bases: RunAIClientError

Raised when authentication with Run:AI fails.

`RunAIClient(client_id: str, client_secret: str, runai_base_url: str)`

Wrapper around the runapy SDK providing typed responses.

This client encapsulates all Run:AI API interactions and returns typed models (Pydantic `BaseModel` subclasses such as `RunAICluster` and `RunAIProject`) instead of raw dictionaries.

Initialize the Run:AI client.

Parameters:

Name	Type	Description	Default
`client_id`	`str`	Run:AI client ID for authentication.	required
`client_secret`	`str`	Run:AI client secret for authentication.	required
`runai_base_url`	`str`	Run:AI control plane base URL.	required

Raises:

Type	Description
`RunAIConnectionError`	If connecting to Run:AI fails.
`RunAIAuthenticationError`	If client configuration fails.

Source code in src/zenml/integrations/runai/client/runai_client.py

def __init__(
    self, client_id: str, client_secret: str, runai_base_url: str
) -> None:
    """Initialize the Run:AI client.

    Args:
        client_id: Run:AI client ID for authentication.
        client_secret: Run:AI client secret for authentication.
        runai_base_url: Run:AI control plane base URL.

    Raises:
        RunAIConnectionError: If connecting to Run:AI fails.
        RunAIAuthenticationError: If client configuration fails.
    """
    try:
        config = Configuration(
            client_id=client_id,
            client_secret=client_secret,
            runai_base_url=runai_base_url,
        )
        self._raw_client = self._create_raw_client(config)
    except Exception as exc:
        if self._is_connection_error(exc):
            raise RunAIConnectionError(
                f"Failed to connect to Run:AI API ({type(exc).__name__}): "
                f"{exc}. Verify your runai_base_url and network "
                "connectivity."
            ) from exc
        raise RunAIAuthenticationError(
            f"Failed to initialize Run:AI client ({type(exc).__name__}): {exc}. "
            "Verify your client_id, client_secret, and runai_base_url are correct."
        ) from exc
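
The constructor maps any failure onto one of two exceptions and chains the original error with `from exc`. The sketch below reproduces that pattern with local stand-in classes so it runs without ZenML installed. The `looks_like_connection_error` check is an assumption, since the real `_is_connection_error` logic is not shown on this page, and `build_client` and `failing_factory` are hypothetical names.

```python
from typing import Any, Callable


class RunAIClientError(Exception):
    """Stand-in for the base client error."""


class RunAIConnectionError(RunAIClientError):
    """Stand-in for the connection error."""


class RunAIAuthenticationError(RunAIClientError):
    """Stand-in for the authentication error."""


def looks_like_connection_error(exc: Exception) -> bool:
    # Assumption: treat built-in network errors as connection failures.
    return isinstance(exc, (ConnectionError, TimeoutError))


def build_client(factory: Callable[[], Any]) -> Any:
    """Run the factory and translate failures, keeping the original cause."""
    try:
        return factory()
    except Exception as exc:
        if looks_like_connection_error(exc):
            raise RunAIConnectionError(
                f"Failed to connect ({type(exc).__name__}): {exc}"
            ) from exc
        raise RunAIAuthenticationError(
            f"Failed to initialize ({type(exc).__name__}): {exc}"
        ) from exc


def failing_factory() -> Any:
    raise ConnectionError("network unreachable")


try:
    build_client(failing_factory)
except RunAIClientError as err:
    # Catching the base class covers both failure modes.
    caught = err
    print(type(caught).__name__, "caused by", type(caught.__cause__).__name__)
```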

Attributes

`raw_client: RunapyClient` `property`

Access the underlying runapy client for advanced operations.

Returns:

Type	Description
`RunaiClient`	The raw runapy client.

Functions

`create_training_workload(request: TrainingCreationRequest) -> WorkloadSubmissionResult`

Submit a training workload to Run:AI.

Parameters:

Name	Type	Description	Default
`request`	`TrainingCreationRequest`	TrainingCreationRequest from runai.models.	required

Returns:

Type	Description
`WorkloadSubmissionResult`	WorkloadSubmissionResult with the workload ID.

Source code in src/zenml/integrations/runai/client/runai_client.py

def create_training_workload(
    self, request: TrainingCreationRequest
) -> WorkloadSubmissionResult:
    """Submit a training workload to Run:AI.

    Args:
        request: TrainingCreationRequest from runai.models.

    Returns:
        WorkloadSubmissionResult with the workload ID.
    """
    try:
        response = self._raw_client.workloads.trainings.create_training1(
            training_creation_request=request
        )
        workload_id = self._extract_workload_id(response)
        return WorkloadSubmissionResult(
            workload_id=workload_id or request.name,
            workload_name=request.name,
        )
    except Exception as exc:
        self._raise_api_error(
            exc,
            f"Failed to submit Run:AI workload "
            f"({type(exc).__name__}): {exc}",
        )

`delete_training_workload(workload_id: str) -> None`

Delete a training workload.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID to delete.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def delete_training_workload(self, workload_id: str) -> None:
    """Delete a training workload.

    Args:
        workload_id: The workload ID to delete.
    """
    try:
        self._raw_client.workloads.trainings.delete_training(workload_id)
    except Exception as exc:
        self._raise_api_error(
            exc,
            "Failed to delete Run:AI workload "
            f"{workload_id} ({type(exc).__name__}): {exc}",
        )

`get_cluster_by_id(cluster_id: str) -> RunAICluster`

Get a Run:AI cluster by ID.

Parameters:

Name	Type	Description	Default
`cluster_id`	`str`	The cluster ID to find.	required

Returns:

Type	Description
`RunAICluster`	The matching RunAICluster.

Raises:

Type	Description
`RunAIClusterNotFoundError`	If no cluster matches the ID.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_cluster_by_id(self, cluster_id: str) -> RunAICluster:
    """Get a Run:AI cluster by ID.

    Args:
        cluster_id: The cluster ID to find.

    Returns:
        The matching RunAICluster.

    Raises:
        RunAIClusterNotFoundError: If no cluster matches the ID.
    """
    clusters = self.get_clusters()
    for cluster in clusters:
        if cluster.id == cluster_id:
            return cluster
    available = [c.id for c in clusters]
    raise RunAIClusterNotFoundError(cluster_id, available)

`get_cluster_by_name(name: str) -> RunAICluster`

Get a Run:AI cluster by exact name match.

Parameters:

Name	Type	Description	Default
`name`	`str`	The cluster name to find.	required

Returns:

Type	Description
`RunAICluster`	The matching RunAICluster.

Raises:

Type	Description
`RunAIClusterNotFoundError`	If no cluster matches the name.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_cluster_by_name(self, name: str) -> RunAICluster:
    """Get a Run:AI cluster by exact name match.

    Args:
        name: The cluster name to find.

    Returns:
        The matching RunAICluster.

    Raises:
        RunAIClusterNotFoundError: If no cluster matches the name.
    """
    clusters = self.get_clusters()

    for cluster in clusters:
        if cluster.name == name:
            return cluster

    available = [c.name for c in clusters]
    raise RunAIClusterNotFoundError(name, available)

`get_clusters() -> List[RunAICluster]`

Get all Run:AI clusters.

Returns:

Type	Description
`List[RunAICluster]`	List of RunAICluster objects.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_clusters(self) -> List[RunAICluster]:
    """Get all Run:AI clusters.

    Returns:
        List of RunAICluster objects.
    """
    try:
        response = self._raw_client.organizations.clusters.get_clusters()
        clusters_data = response.data if response.data else []

        return [
            RunAICluster(
                id=c.get("uuid", c.get("id", "")),
                name=c.get("name", ""),
            )
            for c in clusters_data
        ]
    except Exception as exc:
        self._raise_api_error(
            exc,
            f"Failed to fetch Run:AI clusters "
            f"({type(exc).__name__}): {exc}",
        )
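
Each cluster entry is read with a key fallback: `uuid` is preferred, then `id`, then an empty string. A small standalone sketch of that mapping, using plain tuples in place of `RunAICluster` and invented sample payloads:

```python
from typing import Any, Dict, List, Tuple


def parse_clusters(clusters_data: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (id, name) pairs using the same key fallbacks as get_clusters."""
    return [
        (c.get("uuid", c.get("id", "")), c.get("name", ""))
        for c in clusters_data
    ]


# Invented payloads covering each fallback path.
sample = [
    {"uuid": "u-1", "id": "i-1", "name": "alpha"},
    {"id": "i-2"},
    {},
]
pairs = parse_clusters(sample)
print(pairs)
```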

`get_first_cluster() -> RunAICluster`

Get the first available Run:AI cluster.

Returns:

Type	Description
`RunAICluster`	The first RunAICluster.

Raises:

Type	Description
`RunAIClientError`	If no clusters are available.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_first_cluster(self) -> RunAICluster:
    """Get the first available Run:AI cluster.

    Returns:
        The first RunAICluster.

    Raises:
        RunAIClientError: If no clusters are available.
    """
    clusters = self.get_clusters()
    if not clusters:
        raise RunAIClientError("No Run:AI clusters available")
    return clusters[0]

`get_project_by_name(name: str) -> RunAIProject`

Get a Run:AI project by exact name match.

Parameters:

Name	Type	Description	Default
`name`	`str`	The project name to find.	required

Returns:

Type	Description
`RunAIProject`	The matching RunAIProject.

Raises:

Type	Description
`RunAIProjectNotFoundError`	If no project matches the name.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_project_by_name(self, name: str) -> RunAIProject:
    """Get a Run:AI project by exact name match.

    Args:
        name: The project name to find.

    Returns:
        The matching RunAIProject.

    Raises:
        RunAIProjectNotFoundError: If no project matches the name.
    """
    projects = self.get_projects(search=name)

    for project in projects:
        if project.name == name:
            return project

    available = [p.name for p in projects]
    raise RunAIProjectNotFoundError(name, available)
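
The lookup applies an exact name comparison on top of whatever `get_projects(search=name)` returns, and reports the candidate names when nothing matches. The following sketch shows the same logic with local stand-ins; `Project`, `ProjectNotFoundError`, `find_project`, and the sample names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    """Stand-in for RunAIProject with only the fields used here."""

    id: str
    name: str


class ProjectNotFoundError(Exception):
    """Stand-in for RunAIProjectNotFoundError."""

    def __init__(self, project_name: str, available: List[str]) -> None:
        self.project_name = project_name
        self.available = available
        super().__init__(
            f"Project '{project_name}' not found. "
            f"Available projects: {available}"
        )


def find_project(projects: List[Project], name: str) -> Project:
    """Return the project whose name equals `name` exactly."""
    for project in projects:
        if project.name == name:
            return project
    raise ProjectNotFoundError(name, [p.name for p in projects])


# Invented search results: similar names, but no exact match for "train".
results = [Project("1", "training"), Project("2", "train-v2")]
try:
    find_project(results, "train")
except ProjectNotFoundError as err:
    missing = err
    print(err)
```

Keeping the candidate names on the exception lets a caller suggest likely alternatives without issuing a second API request.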

`get_projects(search: Optional[str] = None) -> List[RunAIProject]`

Get Run:AI projects, optionally filtered by name.

Parameters:

Name	Type	Description	Default
`search`	`Optional[str]`	Optional search string to filter projects.	`None`

Returns:

Type	Description
`List[RunAIProject]`	List of RunAIProject objects.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_projects(self, search: Optional[str] = None) -> List[RunAIProject]:
    """Get Run:AI projects, optionally filtered by name.

    Args:
        search: Optional search string to filter projects.

    Returns:
        List of RunAIProject objects.
    """
    try:
        response = self._raw_client.organizations.projects.get_projects(
            search=search
        )
        projects_data = (
            response.data.get("projects", []) if response.data else []
        )

        return [
            RunAIProject(
                id=str(p.get("id")),
                name=p.get("name", ""),
                cluster_id=p.get("clusterId"),
            )
            for p in projects_data
        ]
    except Exception as exc:
        self._raise_api_error(
            exc,
            f"Failed to fetch Run:AI projects "
            f"({type(exc).__name__}): {exc}",
        )

`get_training_workload(workload_id: str) -> RunAITrainingWorkload`

Get full training workload details.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID to query.	required

Returns:

Type	Description
`RunAITrainingWorkload`	The workload details as a typed model.

Raises:

Type	Description
`RunAIClientError`	If the query fails or response is invalid.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_training_workload(self, workload_id: str) -> RunAITrainingWorkload:
    """Get full training workload details.

    Args:
        workload_id: The workload ID to query.

    Returns:
        The workload details as a typed model.

    Raises:
        RunAIClientError: If the query fails or response is invalid.
    """
    try:
        response = self._raw_client.workloads.trainings.get_training(
            workload_id
        )
        if not response.data:
            raise RunAIClientError(
                f"Empty response when querying workload {workload_id}. "
                "The workload may not exist or the API returned no data."
            )
        if not isinstance(response.data, dict):
            raise RunAIClientError(
                f"Unexpected response format for workload {workload_id}. "
                f"Expected dict, got {type(response.data).__name__}."
            )
        return RunAITrainingWorkload.model_validate(response.data)
    except RunAIClientError:
        raise
    except Exception as exc:
        self._raise_api_error(
            exc,
            "Failed to query Run:AI workload "
            f"{workload_id} ({type(exc).__name__}): {exc}",
        )

`get_training_workload_status(workload_id: str) -> Optional[str]`

Get the status of a training workload.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID to query.	required

Returns:

Type	Description
`Optional[str]`	The workload status string, or None if the response is missing a status field.

Raises:

Type	Description
`RunAIWorkloadNotFoundError`	If the workload was not found (404).
`RunAIClientError`	If the API call fails for other reasons or the response is malformed.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_training_workload_status(self, workload_id: str) -> Optional[str]:
    """Get the status of a training workload.

    Args:
        workload_id: The workload ID to query.

    Returns:
        The workload status string, or None if the response is missing a
        status field.

    Raises:
        RunAIWorkloadNotFoundError: If the workload was not found (404).
        RunAIClientError: If the API call fails for other reasons or the
            response is malformed.
    """
    try:
        response = self._raw_client.workloads.trainings.get_training(
            workload_id
        )
        if not response.data:
            raise RunAIClientError(
                f"Empty response when querying workload {workload_id}. "
                "The API returned no data."
            )
        if not isinstance(response.data, dict):
            raise RunAIClientError(
                f"Unexpected response format for workload {workload_id}. "
                f"Expected dict, got {type(response.data).__name__}."
            )
        status = response.data.get("actualPhase") or response.data.get(
            "status"
        )
        if status is None:
            logger.warning(
                f"Workload {workload_id} response has no status field. "
                "Available keys: %s",
                list(response.data.keys()),
            )
        return cast(Optional[str], status)
    except RunAIClientError:
        raise
    except Exception as exc:
        status_code = self._get_status_code(exc)
        if status_code == 404:
            raise RunAIWorkloadNotFoundError(workload_id) from exc
        error_msg = str(exc).lower()
        if "not found" in error_msg or "404" in error_msg:
            raise RunAIWorkloadNotFoundError(workload_id) from exc
        self._raise_api_error(
            exc,
            f"Failed to query workload status "
            f"({type(exc).__name__}): {exc}",
        )
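
The status is taken from `actualPhase` when that key holds a truthy value and from `status` otherwise, so an empty `actualPhase` also falls through to `status`. A minimal sketch of that extraction; the function name `extract_status` and the status strings are illustrative values, not a list of phases confirmed by this page.

```python
from typing import Any, Dict, Optional


def extract_status(data: Dict[str, Any]) -> Optional[str]:
    """Prefer `actualPhase`, fall back to `status`, else return None."""
    return data.get("actualPhase") or data.get("status")


# Illustrative payloads covering each branch.
print(extract_status({"actualPhase": "Running", "status": "Pending"}))
print(extract_status({"actualPhase": "", "status": "Failed"}))
print(extract_status({"name": "job-without-status"}))
```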

`suspend_training_workload(workload_id: str) -> None`

Suspend a training workload.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID to suspend.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def suspend_training_workload(self, workload_id: str) -> None:
    """Suspend a training workload.

    Args:
        workload_id: The workload ID to suspend.
    """
    try:
        self._raw_client.workloads.trainings.suspend_training(workload_id)
    except Exception as exc:
        self._raise_api_error(
            exc,
            "Failed to suspend Run:AI workload "
            f"{workload_id} ({type(exc).__name__}): {exc}",
        )

`RunAIClientError`

Bases: Exception

Base exception for Run:AI client errors.

`RunAICluster`

Bases: BaseModel

Typed representation of a Run:AI cluster.

`RunAIClusterNotFoundError(cluster_name: str, available: List[str])`

Bases: RunAIClientError

Raised when a Run:AI cluster cannot be found.

Initialize the exception.

Parameters:

Name	Type	Description	Default
`cluster_name`	`str`	The cluster name that was not found.	required
`available`	`List[str]`	List of available cluster names.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def __init__(self, cluster_name: str, available: List[str]) -> None:
    """Initialize the exception.

    Args:
        cluster_name: The cluster name that was not found.
        available: List of available cluster names.
    """
    self.cluster_name = cluster_name
    self.available = available
    super().__init__(
        f"Cluster '{cluster_name}' not found in Run:AI. "
        f"Available clusters: {available}"
    )

`RunAIConnectionError`

Bases: RunAIClientError

Raised when connection to Run:AI API fails.

`RunAIProject`

Bases: BaseModel

Typed representation of a Run:AI project.

`RunAIProjectNotFoundError(project_name: str, available: List[str])`

Bases: RunAIClientError

Raised when a Run:AI project cannot be found.

Initialize the exception.

Parameters:

Name	Type	Description	Default
`project_name`	`str`	The project name that was not found.	required
`available`	`List[str]`	List of available project names.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def __init__(self, project_name: str, available: List[str]) -> None:
    """Initialize the exception.

    Args:
        project_name: The project name that was not found.
        available: List of available project names.
    """
    self.project_name = project_name
    self.available = available
    super().__init__(
        f"Project '{project_name}' not found in Run:AI. "
        f"Available projects: {available}"
    )

`RunAITrainingWorkload`

Bases: BaseModel

Typed representation of a Run:AI training workload.

`RunAIWorkloadNotFoundError(workload_id: str)`

Bases: RunAIClientError

Raised when a Run:AI workload cannot be found.

Initialize the exception.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID that was not found.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def __init__(self, workload_id: str) -> None:
    """Initialize the exception.

    Args:
        workload_id: The workload ID that was not found.
    """
    self.workload_id = workload_id
    super().__init__(f"Workload '{workload_id}' not found in Run:AI.")

`WorkloadSubmissionResult`

Bases: BaseModel

Result of submitting a workload to Run:AI.

Modules

`runai_client`

Run:AI API client wrapper with typed responses.

Classes

`RunAIAuthenticationError`

Bases: RunAIClientError

Raised when authentication with Run:AI fails.

`RunAIClient(client_id: str, client_secret: str, runai_base_url: str)`

Wrapper around the runapy SDK providing typed responses.

This client encapsulates all Run:AI API interactions and returns typed models (Pydantic `BaseModel` subclasses such as `RunAICluster` and `RunAIProject`) instead of raw dictionaries.

Initialize the Run:AI client.

Parameters:

Name	Type	Description	Default
`client_id`	`str`	Run:AI client ID for authentication.	required
`client_secret`	`str`	Run:AI client secret for authentication.	required
`runai_base_url`	`str`	Run:AI control plane base URL.	required

Raises:

Type	Description
`RunAIConnectionError`	If connecting to Run:AI fails.
`RunAIAuthenticationError`	If client configuration fails.

Source code in src/zenml/integrations/runai/client/runai_client.py

def __init__(
    self, client_id: str, client_secret: str, runai_base_url: str
) -> None:
    """Initialize the Run:AI client.

    Args:
        client_id: Run:AI client ID for authentication.
        client_secret: Run:AI client secret for authentication.
        runai_base_url: Run:AI control plane base URL.

    Raises:
        RunAIConnectionError: If connecting to Run:AI fails.
        RunAIAuthenticationError: If client configuration fails.
    """
    try:
        config = Configuration(
            client_id=client_id,
            client_secret=client_secret,
            runai_base_url=runai_base_url,
        )
        self._raw_client = self._create_raw_client(config)
    except Exception as exc:
        if self._is_connection_error(exc):
            raise RunAIConnectionError(
                f"Failed to connect to Run:AI API ({type(exc).__name__}): "
                f"{exc}. Verify your runai_base_url and network "
                "connectivity."
            ) from exc
        raise RunAIAuthenticationError(
            f"Failed to initialize Run:AI client ({type(exc).__name__}): {exc}. "
            "Verify your client_id, client_secret, and runai_base_url are correct."
        ) from exc

Attributes

raw_client: RunapyClient property

Access the underlying runapy client for advanced operations.

Returns:

Type	Description
`RunaiClient`	The raw runapy client.

Functions

create_training_workload(request: TrainingCreationRequest) -> WorkloadSubmissionResult

Submit a training workload to Run:AI.

Parameters:

Name	Type	Description	Default
`request`	`TrainingCreationRequest`	TrainingCreationRequest from runai.models.	required

Returns:

Type	Description
`WorkloadSubmissionResult`	WorkloadSubmissionResult with the workload ID.

Source code in src/zenml/integrations/runai/client/runai_client.py

def create_training_workload(
    self, request: TrainingCreationRequest
) -> WorkloadSubmissionResult:
    """Submit a training workload to Run:AI.

    Args:
        request: TrainingCreationRequest from runai.models.

    Returns:
        WorkloadSubmissionResult with the workload ID.
    """
    try:
        response = self._raw_client.workloads.trainings.create_training1(
            training_creation_request=request
        )
        workload_id = self._extract_workload_id(response)
        return WorkloadSubmissionResult(
            workload_id=workload_id or request.name,
            workload_name=request.name,
        )
    except Exception as exc:
        self._raise_api_error(
            exc,
            f"Failed to submit Run:AI workload "
            f"({type(exc).__name__}): {exc}",
        )

delete_training_workload(workload_id: str) -> None

Delete a training workload.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID to delete.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def delete_training_workload(self, workload_id: str) -> None:
    """Delete a training workload.

    Args:
        workload_id: The workload ID to delete.
    """
    try:
        self._raw_client.workloads.trainings.delete_training(workload_id)
    except Exception as exc:
        self._raise_api_error(
            exc,
            "Failed to delete Run:AI workload "
            f"{workload_id} ({type(exc).__name__}): {exc}",
        )

get_cluster_by_id(cluster_id: str) -> RunAICluster

Get a Run:AI cluster by ID.

Parameters:

Name	Type	Description	Default
`cluster_id`	`str`	The cluster ID to find.	required

Returns:

Type	Description
`RunAICluster`	The matching RunAICluster.

Raises:

Type	Description
`RunAIClusterNotFoundError`	If no cluster matches the ID.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_cluster_by_id(self, cluster_id: str) -> RunAICluster:
    """Get a Run:AI cluster by ID.

    Args:
        cluster_id: The cluster ID to find.

    Returns:
        The matching RunAICluster.

    Raises:
        RunAIClusterNotFoundError: If no cluster matches the ID.
    """
    clusters = self.get_clusters()
    for cluster in clusters:
        if cluster.id == cluster_id:
            return cluster
    available = [c.id for c in clusters]
    raise RunAIClusterNotFoundError(cluster_id, available)

get_cluster_by_name(name: str) -> RunAICluster

Get a Run:AI cluster by exact name match.

Parameters:

Name	Type	Description	Default
`name`	`str`	The cluster name to find.	required

Returns:

Type	Description
`RunAICluster`	The matching RunAICluster.

Raises:

Type	Description
`RunAIClusterNotFoundError`	If no cluster matches the name.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_cluster_by_name(self, name: str) -> RunAICluster:
    """Get a Run:AI cluster by exact name match.

    Args:
        name: The cluster name to find.

    Returns:
        The matching RunAICluster.

    Raises:
        RunAIClusterNotFoundError: If no cluster matches the name.
    """
    clusters = self.get_clusters()

    for cluster in clusters:
        if cluster.name == name:
            return cluster

    available = [c.name for c in clusters]
    raise RunAIClusterNotFoundError(name, available)

get_clusters() -> List[RunAICluster]

Get all Run:AI clusters.

Returns:

Type	Description
`List[RunAICluster]`	List of RunAICluster objects.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_clusters(self) -> List[RunAICluster]:
    """Get all Run:AI clusters.

    Returns:
        List of RunAICluster objects.
    """
    try:
        response = self._raw_client.organizations.clusters.get_clusters()
        clusters_data = response.data if response.data else []

        return [
            RunAICluster(
                id=c.get("uuid", c.get("id", "")),
                name=c.get("name", ""),
            )
            for c in clusters_data
        ]
    except Exception as exc:
        self._raise_api_error(
            exc,
            f"Failed to fetch Run:AI clusters "
            f"({type(exc).__name__}): {exc}",
        )
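
The `uuid`/`id` fallback used when building `RunAICluster` objects can be sketched in isolation. This is a minimal illustration, not the library's code: `Cluster` is a plain dataclass stand-in for `RunAICluster`, and `parse_clusters` is a hypothetical helper that takes the already-extracted `response.data` list.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Cluster:
    """Stand-in for RunAICluster (the real class is a Pydantic model)."""

    id: str
    name: str


def parse_clusters(data: Optional[List[Dict[str, Any]]]) -> List[Cluster]:
    """Convert raw cluster dicts into typed objects (hypothetical helper)."""
    return [
        Cluster(
            # Prefer "uuid", fall back to "id", and finally to an empty string.
            id=c.get("uuid", c.get("id", "")),
            name=c.get("name", ""),
        )
        for c in (data or [])
    ]


clusters = parse_clusters(
    [{"uuid": "abc", "name": "a"}, {"id": "42", "name": "b"}, {}]
)
```

An entry with neither key yields an empty `id`, so callers that look clusters up by ID may want to treat empty IDs as invalid.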

get_first_cluster() -> RunAICluster

Get the first available Run:AI cluster.

Returns:

Type	Description
`RunAICluster`	The first RunAICluster.

Raises:

Type	Description
`RunAIClientError`	If no clusters are available.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_first_cluster(self) -> RunAICluster:
    """Get the first available Run:AI cluster.

    Returns:
        The first RunAICluster.

    Raises:
        RunAIClientError: If no clusters are available.
    """
    clusters = self.get_clusters()
    if not clusters:
        raise RunAIClientError("No Run:AI clusters available")
    return clusters[0]

get_project_by_name(name: str) -> RunAIProject

Get a Run:AI project by exact name match.

Parameters:

Name	Type	Description	Default
`name`	`str`	The project name to find.	required

Returns:

Type	Description
`RunAIProject`	The matching RunAIProject.

Raises:

Type	Description
`RunAIProjectNotFoundError`	If no project matches the name.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_project_by_name(self, name: str) -> RunAIProject:
    """Get a Run:AI project by exact name match.

    Args:
        name: The project name to find.

    Returns:
        The matching RunAIProject.

    Raises:
        RunAIProjectNotFoundError: If no project matches the name.
    """
    projects = self.get_projects(search=name)

    for project in projects:
        if project.name == name:
            return project

    available = [p.name for p in projects]
    raise RunAIProjectNotFoundError(name, available)
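
The lookup above combines a server-side search with a client-side exact match. The sketch below illustrates that pattern under the assumption that the search may return partial matches; `find_project` and `fake_search` are hypothetical stand-ins and do not call the Run:AI API.

```python
from typing import Callable, List


def find_project(search: Callable[[str], List[str]], name: str) -> str:
    """Return the project whose name equals `name` exactly."""
    candidates = search(name)
    for candidate in candidates:
        if candidate == name:
            return candidate
    # Only the search results are reported, not every project that exists.
    raise LookupError(f"Project '{name}' not found. Candidates: {candidates}")


def fake_search(term: str) -> List[str]:
    """Stand-in for a search that matches on substrings (an assumption)."""
    return [n for n in ["vision", "vision-dev", "nlp"] if term in n]


found = find_project(fake_search, "vision")
```

Because the candidate list comes from the filtered search, the `available` names attached to `RunAIProjectNotFoundError` are limited to projects that matched the search term.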

get_projects(search: Optional[str] = None) -> List[RunAIProject]

Get Run:AI projects, optionally filtered by name.

Parameters:

Name	Type	Description	Default
`search`	`Optional[str]`	Optional search string to filter projects.	`None`

Returns:

Type	Description
`List[RunAIProject]`	List of RunAIProject objects.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_projects(self, search: Optional[str] = None) -> List[RunAIProject]:
    """Get Run:AI projects, optionally filtered by name.

    Args:
        search: Optional search string to filter projects.

    Returns:
        List of RunAIProject objects.
    """
    try:
        response = self._raw_client.organizations.projects.get_projects(
            search=search
        )
        projects_data = (
            response.data.get("projects", []) if response.data else []
        )

        return [
            RunAIProject(
                id=str(p.get("id")),
                name=p.get("name", ""),
                cluster_id=p.get("clusterId"),
            )
            for p in projects_data
        ]
    except Exception as exc:
        self._raise_api_error(
            exc,
            f"Failed to fetch Run:AI projects "
            f"({type(exc).__name__}): {exc}",
        )
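
As a rough illustration of the parsing step, the sketch below applies the same field handling to a hand-written payload. `Project` is a dataclass stand-in for `RunAIProject`, and `parse_projects` is a hypothetical helper operating on the already-extracted `response.data` dictionary.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Project:
    """Stand-in for RunAIProject (the real class is a Pydantic model)."""

    id: str
    name: str
    cluster_id: Optional[str]


def parse_projects(data: Optional[Dict[str, Any]]) -> List[Project]:
    """Convert a {"projects": [...]} payload into typed objects."""
    projects_data = data.get("projects", []) if data else []
    return [
        Project(
            id=str(p.get("id")),  # a missing id becomes the string "None"
            name=p.get("name", ""),
            cluster_id=p.get("clusterId"),
        )
        for p in projects_data
    ]


projects = parse_projects(
    {"projects": [{"id": 7, "name": "x", "clusterId": "c1"}, {"name": "y"}]}
)
```

Note that `str(p.get("id"))` turns a missing `id` into the literal string `"None"` rather than an empty value, which is worth keeping in mind when comparing project IDs.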

get_training_workload(workload_id: str) -> RunAITrainingWorkload

Get full training workload details.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID to query.	required

Returns:

Type	Description
`RunAITrainingWorkload`	The workload details as a typed model.

Raises:

Type	Description
`RunAIClientError`	If the query fails or response is invalid.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_training_workload(self, workload_id: str) -> RunAITrainingWorkload:
    """Get full training workload details.

    Args:
        workload_id: The workload ID to query.

    Returns:
        The workload details as a typed model.

    Raises:
        RunAIClientError: If the query fails or response is invalid.
    """
    try:
        response = self._raw_client.workloads.trainings.get_training(
            workload_id
        )
        if not response.data:
            raise RunAIClientError(
                f"Empty response when querying workload {workload_id}. "
                "The workload may not exist or the API returned no data."
            )
        if not isinstance(response.data, dict):
            raise RunAIClientError(
                f"Unexpected response format for workload {workload_id}. "
                f"Expected dict, got {type(response.data).__name__}."
            )
        return RunAITrainingWorkload.model_validate(response.data)
    except RunAIClientError:
        raise
    except Exception as exc:
        self._raise_api_error(
            exc,
            "Failed to query Run:AI workload "
            f"{workload_id} ({type(exc).__name__}): {exc}",
        )

get_training_workload_status(workload_id: str) -> Optional[str]

Get the status of a training workload.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID to query.	required

Returns:

Type	Description
`Optional[str]`	The workload status string, or None if the response is missing a status field.

Raises:

Type	Description
`RunAIWorkloadNotFoundError`	If the workload was not found (404).
`RunAIClientError`	If the API call fails for other reasons or the response is malformed.

Source code in src/zenml/integrations/runai/client/runai_client.py

def get_training_workload_status(self, workload_id: str) -> Optional[str]:
    """Get the status of a training workload.

    Args:
        workload_id: The workload ID to query.

    Returns:
        The workload status string, or None if the response is missing a
        status field.

    Raises:
        RunAIWorkloadNotFoundError: If the workload was not found (404).
        RunAIClientError: If the API call fails for other reasons or the
            response is malformed.
    """
    try:
        response = self._raw_client.workloads.trainings.get_training(
            workload_id
        )
        if not response.data:
            raise RunAIClientError(
                f"Empty response when querying workload {workload_id}. "
                "The API returned no data."
            )
        if not isinstance(response.data, dict):
            raise RunAIClientError(
                f"Unexpected response format for workload {workload_id}. "
                f"Expected dict, got {type(response.data).__name__}."
            )
        status = response.data.get("actualPhase") or response.data.get(
            "status"
        )
        if status is None:
            logger.warning(
                f"Workload {workload_id} response has no status field. "
                "Available keys: %s",
                list(response.data.keys()),
            )
        return cast(Optional[str], status)
    except RunAIClientError:
        raise
    except Exception as exc:
        status_code = self._get_status_code(exc)
        if status_code == 404:
            raise RunAIWorkloadNotFoundError(workload_id) from exc
        error_msg = str(exc).lower()
        if "not found" in error_msg or "404" in error_msg:
            raise RunAIWorkloadNotFoundError(workload_id) from exc
        self._raise_api_error(
            exc,
            f"Failed to query workload status "
            f"({type(exc).__name__}): {exc}",
        )
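
Two details of this method are easy to miss: the status is read from `actualPhase` first and from `status` second, and a "not found" condition is detected both by status code and by message text. The following sketch reproduces both checks as standalone functions; the function names are hypothetical and the code does not touch the Run:AI API.

```python
from typing import Any, Dict, Optional


def extract_status(data: Dict[str, Any]) -> Optional[str]:
    """Prefer "actualPhase"; fall back to "status" if it is missing or empty."""
    return data.get("actualPhase") or data.get("status")


def looks_like_not_found(status_code: Optional[int], message: str) -> bool:
    """Mirror the 404 detection: status code first, then message text."""
    if status_code == 404:
        return True
    lowered = message.lower()
    return "not found" in lowered or "404" in lowered


status = extract_status({"actualPhase": "", "status": "Pending"})
false_positive = looks_like_not_found(None, "HTTP 500 for workload job-4041")
```

The text-based fallback is deliberately permissive: any error message containing `404`, including one that merely quotes a workload ID such as `job-4041`, is classified as "not found".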

suspend_training_workload(workload_id: str) -> None

Suspend a training workload.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID to suspend.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def suspend_training_workload(self, workload_id: str) -> None:
    """Suspend a training workload.

    Args:
        workload_id: The workload ID to suspend.
    """
    try:
        self._raw_client.workloads.trainings.suspend_training(workload_id)
    except Exception as exc:
        self._raise_api_error(
            exc,
            "Failed to suspend Run:AI workload "
            f"{workload_id} ({type(exc).__name__}): {exc}",
        )

RunAIClientError

Bases: Exception

Base exception for Run:AI client errors.

RunAICluster

Bases: BaseModel

Typed representation of a Run:AI cluster.

RunAIClusterNotFoundError(cluster_name: str, available: List[str])

Bases: RunAIClientError

Raised when a Run:AI cluster cannot be found.

Initialize the exception.

Parameters:

Name	Type	Description	Default
`cluster_name`	`str`	The cluster name that was not found.	required
`available`	`List[str]`	List of available cluster names.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def __init__(self, cluster_name: str, available: List[str]) -> None:
    """Initialize the exception.

    Args:
        cluster_name: The cluster name that was not found.
        available: List of available cluster names.
    """
    self.cluster_name = cluster_name
    self.available = available
    super().__init__(
        f"Cluster '{cluster_name}' not found in Run:AI. "
        f"Available clusters: {available}"
    )
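
Because the exception keeps `cluster_name` and `available` as attributes, callers can build their own diagnostics. The sketch below is one possible use, with stand-in classes that mirror the constructor shown above; `suggest_cluster` is a hypothetical helper and the cluster names are made up.

```python
import difflib
from typing import List, Optional


class RunAIClientError(Exception):
    """Stand-in for the base exception documented in this module."""


class RunAIClusterNotFoundError(RunAIClientError):
    """Stand-in mirroring the constructor shown above."""

    def __init__(self, cluster_name: str, available: List[str]) -> None:
        self.cluster_name = cluster_name
        self.available = available
        super().__init__(
            f"Cluster '{cluster_name}' not found in Run:AI. "
            f"Available clusters: {available}"
        )


def suggest_cluster(error: RunAIClusterNotFoundError) -> Optional[str]:
    """Return the closest available name, if any (hypothetical helper)."""
    matches = difflib.get_close_matches(
        error.cluster_name, error.available, n=1
    )
    return matches[0] if matches else None


try:
    raise RunAIClusterNotFoundError("prod-gpu", ["prod-gpus", "staging"])
except RunAIClusterNotFoundError as exc:
    hint = suggest_cluster(exc)
```

Since `get_cluster_by_id` raises the same exception with a list of IDs, `available` may contain IDs rather than names depending on which lookup failed.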

RunAIConnectionError

Bases: RunAIClientError

Raised when connection to Run:AI API fails.

RunAIProject

Bases: BaseModel

Typed representation of a Run:AI project.

RunAIProjectNotFoundError(project_name: str, available: List[str])

Bases: RunAIClientError

Raised when a Run:AI project cannot be found.

Initialize the exception.

Parameters:

Name	Type	Description	Default
`project_name`	`str`	The project name that was not found.	required
`available`	`List[str]`	List of available project names.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def __init__(self, project_name: str, available: List[str]) -> None:
    """Initialize the exception.

    Args:
        project_name: The project name that was not found.
        available: List of available project names.
    """
    self.project_name = project_name
    self.available = available
    super().__init__(
        f"Project '{project_name}' not found in Run:AI. "
        f"Available projects: {available}"
    )

RunAITrainingWorkload

Bases: BaseModel

Typed representation of a Run:AI training workload.

RunAIWorkloadNotFoundError(workload_id: str)

Bases: RunAIClientError

Raised when a Run:AI workload cannot be found.

Initialize the exception.

Parameters:

Name	Type	Description	Default
`workload_id`	`str`	The workload ID that was not found.	required

Source code in src/zenml/integrations/runai/client/runai_client.py

def __init__(self, workload_id: str) -> None:
    """Initialize the exception.

    Args:
        workload_id: The workload ID that was not found.
    """
    self.workload_id = workload_id
    super().__init__(f"Workload '{workload_id}' not found in Run:AI.")

WorkloadSubmissionResult

Bases: BaseModel

Result of submitting a workload to Run:AI.

`constants`

Run:AI integration constants and status mappings.

Classes

`RunAIWorkloadStatus`

Bases: str, Enum

Run:AI workload status values.

Functions

`is_failure_status(status: str) -> bool`

Check if a Run:AI status indicates failure.

Parameters:

Name	Type	Description	Default
`status`	`str`	The Run:AI workload status string.	required

Returns:

Type	Description
`bool`	True if the status indicates failure.

Source code in src/zenml/integrations/runai/constants.py

def is_failure_status(status: str) -> bool:
    """Check if a Run:AI status indicates failure.

    Args:
        status: The Run:AI workload status string.

    Returns:
        True if the status indicates failure.
    """
    parsed_status = _parse_runai_status(status)
    return parsed_status in _FAILURE_STATUSES

`is_pending_status(status: str) -> bool`

Check if a Run:AI status indicates pending scheduling.

Parameters:

Name	Type	Description	Default
`status`	`str`	The Run:AI workload status string.	required

Returns:

Type	Description
`bool`	True if the status indicates the workload is pending.

Source code in src/zenml/integrations/runai/constants.py

def is_pending_status(status: str) -> bool:
    """Check if a Run:AI status indicates pending scheduling.

    Args:
        status: The Run:AI workload status string.

    Returns:
        True if the status indicates the workload is pending.
    """
    parsed_status = _parse_runai_status(status)
    return parsed_status in _PENDING_STATUSES

`is_success_status(status: str) -> bool`

Check if a Run:AI status indicates success.

Parameters:

Name	Type	Description	Default
`status`	`str`	The Run:AI workload status string.	required

Returns:

Type	Description
`bool`	True if the status indicates successful completion.

Source code in src/zenml/integrations/runai/constants.py

def is_success_status(status: str) -> bool:
    """Check if a Run:AI status indicates success.

    Args:
        status: The Run:AI workload status string.

    Returns:
        True if the status indicates successful completion.
    """
    parsed_status = _parse_runai_status(status)
    return parsed_status in _SUCCESS_STATUSES
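
The three predicates lend themselves to a polling loop that waits for a terminal state. The sketch below shows one way such a loop might look. `wait_for_terminal` is a hypothetical helper, the status strings are placeholders, and the predicates are passed in as callables so the example runs without ZenML installed.

```python
import time
from typing import Callable, Optional


def wait_for_terminal(
    get_status: Callable[[], Optional[str]],
    is_success: Callable[[str], bool],
    is_failure: Callable[[str], bool],
    poll_interval: float = 0.0,
    max_polls: int = 10,
) -> bool:
    """Poll until success (True) or failure (False); raise on timeout."""
    for _ in range(max_polls):
        status = get_status()
        # A missing status is treated as "not finished yet".
        if status is not None:
            if is_success(status):
                return True
            if is_failure(status):
                return False
        time.sleep(poll_interval)
    raise TimeoutError(f"No terminal status after {max_polls} polls")


# Simulated status sequence; real values come from the Run:AI API.
statuses = iter([None, "Pending", "Running", "Completed"])
succeeded = wait_for_terminal(
    lambda: next(statuses),
    is_success=lambda s: s == "Completed",
    is_failure=lambda s: s == "Failed",
)
```

In real use, `get_status` could wrap `get_training_workload_status` and the predicates could be `is_success_status` and `is_failure_status` from this module.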

`map_runai_status_to_execution_status(runai_status: str) -> ExecutionStatus`

Maps Run:AI workload status to ZenML ExecutionStatus.

Parameters:

Name	Type	Description	Default
`runai_status`	`str`	The Run:AI workload status string.	required

Returns:

Type	Description
`ExecutionStatus`	The corresponding ZenML ExecutionStatus.

Source code in src/zenml/integrations/runai/constants.py

def map_runai_status_to_execution_status(runai_status: str) -> ExecutionStatus:
    """Maps Run:AI workload status to ZenML ExecutionStatus.

    Args:
        runai_status: The Run:AI workload status string.

    Returns:
        The corresponding ZenML ExecutionStatus.
    """
    status_enum = _parse_runai_status(runai_status)
    if status_enum is not None:
        return RUNAI_STATUS_TO_EXECUTION_STATUS.get(
            status_enum, ExecutionStatus.RUNNING
        )

    return ExecutionStatus.RUNNING
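
The mapping falls back to `RUNNING` in two cases: when the status string cannot be parsed, and when it parses but has no entry in the lookup table. The sketch below reproduces that control flow with stand-in enums. The status values, the table contents, and the parsing rule are all assumptions, since `_parse_runai_status` and `RUNAI_STATUS_TO_EXECUTION_STATUS` are not shown here.

```python
from enum import Enum
from typing import Dict, Optional


class WorkloadStatus(str, Enum):
    """Hypothetical subset of Run:AI status values."""

    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"
    PENDING = "Pending"


class ExecStatus(str, Enum):
    """Stand-in for ZenML's ExecutionStatus."""

    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumed table: only terminal states are listed explicitly.
STATUS_MAP: Dict[WorkloadStatus, ExecStatus] = {
    WorkloadStatus.COMPLETED: ExecStatus.COMPLETED,
    WorkloadStatus.FAILED: ExecStatus.FAILED,
}


def parse_status(raw: str) -> Optional[WorkloadStatus]:
    """Return the enum member, or None for unknown strings (assumed rule)."""
    try:
        return WorkloadStatus(raw)
    except ValueError:
        return None


def map_status(raw: str) -> ExecStatus:
    parsed = parse_status(raw)
    if parsed is not None:
        return STATUS_MAP.get(parsed, ExecStatus.RUNNING)
    return ExecStatus.RUNNING


mapped = [map_status(s) for s in ("Completed", "Pending", "SomethingNew")]
```

Defaulting to `RUNNING` keeps polling alive when a new or unexpected status appears, at the cost of never terminating on a status the table does not know about.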

`flavors`

Run:AI integration flavors.

Classes

`RunAIConfigMapMountSettings`

Bases: RunAIMountBase

Settings for a Run:AI ConfigMap storage mount.

Attributes

container_mount_path: str property

The absolute container path for this ConfigMap mount.

Returns:

Type	Description
`str`	The absolute container mount path.

`RunAIExternalURLSettings`

Bases: _RunAIStrictSettings

Settings for exposing a Run:AI workload external URL.

`RunAIHostPathMountSettings`

Bases: RunAIMountBase

Settings for a Run:AI HostPath storage mount.

Attributes

container_mount_path: str property

The absolute container path for this HostPath mount.

Returns:

Type	Description
`str`	The absolute container mount path.

`RunAINFSMountSettings`

Bases: RunAIMountBase

Settings for a Run:AI NFS storage mount.

Attributes

container_mount_path: str property

The absolute container path for this NFS mount.

Returns:

Type	Description
`str`	The absolute container mount path.

`RunAIPVCMountSettings`

Bases: RunAIMountBase

Settings for a Run:AI PVC storage mount.

Attributes

container_mount_path: str property

The absolute container path for this PVC mount.

Returns:

Type	Description
`str`	The absolute container mount path.

`RunAIPortSettings`

Bases: _RunAIStrictSettings

Settings for exposing a container port on the Run:AI workload.

`RunAIS3MountSettings`

Bases: RunAIMountBase

Settings for a Run:AI S3 storage mount.

Attributes

container_mount_path: str property

The absolute container path for this S3 mount.

Returns:

Type	Description
`str`	The absolute container mount path.

`RunAISecretMountSettings`

Bases: RunAIMountBase

Settings for a Run:AI Secret storage mount.

Attributes

container_mount_path: str property

The absolute container path for this Secret mount.

Returns:

Type	Description
`str`	The absolute container mount path.

`RunAISecurityContextSettings`

Bases: _RunAIStrictSettings

Settings for the Run:AI workload security context.

`RunAIStepOperatorConfig(warn_about_plain_text_secrets: bool = False, **kwargs: Any)`

Bases: BaseStepOperatorConfig, RunAIStepOperatorSettings

Configuration for the Run:AI step operator.

This step operator enables running individual pipeline steps on Run:AI clusters with fractional GPU allocation.

Example stack configuration:

zenml step-operator register runai \
    --flavor=runai \
    --client_id="xxx" \
    --client_secret="xxx" \
    --runai_base_url="https://myorg.run.ai" \
    --project_name="my-project"

Source code in src/zenml/stack/stack_component.py

def __init__(
    self, warn_about_plain_text_secrets: bool = False, **kwargs: Any
) -> None:
    """Ensures that secret references don't clash with pydantic validation.

    StackComponents allow the specification of all their string attributes
    using secret references of the form `{{secret_name.key}}`. This however
    is only possible when the stack component does not perform any explicit
    validation of this attribute using pydantic validators. If this were
    the case, the validation would run on the secret reference and would
    fail or in the worst case, modify the secret reference and lead to
    unexpected behavior. This method ensures that no attributes that require
    custom pydantic validation are set as secret references.

    Args:
        warn_about_plain_text_secrets: If true, then warns about using
            plain-text secrets.
        **kwargs: Arguments to initialize this stack component.

    Raises:
        ValueError: If an attribute that requires custom pydantic validation
            is passed as a secret reference, or if the `name` attribute
            was passed as a secret reference.
    """
    for key, value in kwargs.items():
        try:
            field = self.__class__.model_fields[key]
        except KeyError:
            # Value for a private attribute or non-existing field, this
            # will fail during the upcoming pydantic validation
            continue

        if value is None:
            continue

        if not secret_utils.is_secret_reference(value):
            if (
                secret_utils.is_secret_field(field)
                and warn_about_plain_text_secrets
            ):
                logger.warning(
                    "You specified a plain-text value for the sensitive "
                    f"attribute `{key}` for a `{self.__class__.__name__}` "
                    "stack component. This is currently only a warning, "
                    "but future versions of ZenML will require you to pass "
                    "in sensitive information as secrets. Check out the "
                    "documentation on how to configure your stack "
                    "components with secrets here: "
                    "https://docs.zenml.io/deploying-zenml/deploying-zenml/secret-management"
                )
            continue

        if pydantic_utils.has_validators(
            pydantic_class=self.__class__, field_name=key
        ):
            raise ValueError(
                f"Passing the stack component attribute `{key}` as a "
                "secret reference is not allowed as additional validation "
                "is required for this attribute."
            )

    super().__init__(**kwargs)
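
The core rule enforced by this constructor is that a value of the form `{{secret_name.key}}` must not be passed for a field that has its own validator. A minimal sketch of that rule follows. The regular expression is an assumed approximation of the check in `secret_utils`, `reject_validated_secret_refs` is a hypothetical helper, and treating `runai_base_url` as a validated field is purely illustrative.

```python
import re
from typing import Any, Dict, Set

# Assumed pattern for references such as {{secret_name.key}}; the real
# implementation in zenml's secret_utils may differ in detail.
_SECRET_REF = re.compile(r"\{\{\s*\S+?\.\S+\s*\}\}")


def is_secret_reference(value: Any) -> bool:
    """Return True if `value` looks like a secret reference string."""
    return isinstance(value, str) and _SECRET_REF.fullmatch(value) is not None


def reject_validated_secret_refs(
    kwargs: Dict[str, Any], validated_fields: Set[str]
) -> None:
    """Raise if a secret reference targets a field with custom validation."""
    for key, value in kwargs.items():
        if value is None or not is_secret_reference(value):
            continue
        if key in validated_fields:
            raise ValueError(
                f"Passing the stack component attribute `{key}` as a "
                "secret reference is not allowed as additional validation "
                "is required for this attribute."
            )


# Passes: the secret reference targets a field without a validator.
reject_validated_secret_refs(
    {
        "client_secret": "{{runai_creds.client_secret}}",
        "project_name": "my-project",
    },
    validated_fields={"runai_base_url"},
)
```

Plain-text values never trigger the error; at most they produce the warning shown above when `warn_about_plain_text_secrets` is enabled.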

Attributes

is_local: bool property

Checks if this stack component is running locally.

Run:AI step operator never runs locally.

Returns:

Type	Description
`bool`	Always `False` because the Run:AI step operator is remote.

is_remote: bool property

Checks if this stack component is running remotely.

Run:AI step operator always runs remotely on Run:AI clusters.

Returns:

Type	Description
`bool`	Always `True` because the Run:AI step operator runs remotely.

`RunAIStepOperatorFlavor`

Bases: BaseStepOperatorFlavor

Run:AI step operator flavor.

Attributes

config_class: Type[RunAIStepOperatorConfig] property

Returns RunAIStepOperatorConfig config class.

Returns:

Type	Description
`Type[RunAIStepOperatorConfig]`	The config class.

display_name: str property

Display name of the flavor.

Returns:

Type	Description
`str`	The display name of the flavor.

docs_url: Optional[str] property

A url to point at docs explaining this flavor.

Returns:

Type	Description
`Optional[str]`	A flavor docs url.

implementation_class: Type[RunAIStepOperator] property

Implementation class for this flavor.

Returns:

Type	Description
`Type[RunAIStepOperator]`	The implementation class.

logo_url: str property

A url to represent the flavor in the dashboard.

Returns:

Type	Description
`str`	The flavor logo.

name: str property

Name of the flavor.

Returns:

Type	Description
`str`	The name of the flavor.

sdk_docs_url: Optional[str] property

A url to point at SDK docs explaining this flavor.

Returns:

Type	Description
`Optional[str]`	A flavor SDK docs url.

`RunAIStepOperatorSettings(warn_about_plain_text_secrets: bool = False, **kwargs: Any)`

Bases: BaseSettings

Per-step settings for Run:AI execution.

These settings can be configured per-step using the step decorator:

from zenml import step

@step(
    step_operator="runai",
    settings={"step_operator": {"gpu_portion_request": 0.5}}
)
def my_step():
    ...

Source code in src/zenml/config/secret_reference_mixin.py

def __init__(
    self, warn_about_plain_text_secrets: bool = False, **kwargs: Any
) -> None:
    """Ensures that secret references are only passed for valid fields.

    This method ensures that secret references are not passed for fields
    that explicitly prevent them or require pydantic validation.

    Args:
        warn_about_plain_text_secrets: If true, then warns about using plain-text secrets.
        **kwargs: Arguments to initialize this object.

    Raises:
        ValueError: If an attribute that requires custom pydantic validation
            or an attribute which explicitly disallows secret references
            is passed as a secret reference.
    """
    for key, value in kwargs.items():
        try:
            field = self.__class__.model_fields[key]
        except KeyError:
            # Value for a private attribute or non-existing field, this
            # will fail during the upcoming pydantic validation
            continue

        if value is None:
            continue

        if not secret_utils.is_secret_reference(value):
            if (
                secret_utils.is_secret_field(field)
                and warn_about_plain_text_secrets
            ):
                logger.warning(
                    "You specified a plain-text value for the sensitive "
                    f"attribute `{key}`. This is currently only a warning, "
                    "but future versions of ZenML will require you to pass "
                    "in sensitive information as secrets. Check out the "
                    "documentation on how to configure values with secrets "
                    "here: https://docs.zenml.io/deploying-zenml/deploying-zenml/secret-management"
                )
            continue

        if secret_utils.is_clear_text_field(field):
            raise ValueError(
                f"Passing the `{key}` attribute as a secret reference is "
                "not allowed."
            )

        requires_validation = has_validators(
            pydantic_class=self.__class__, field_name=key
        )
        if requires_validation:
            raise ValueError(
                f"Passing the attribute `{key}` as a secret reference is "
                "not allowed as additional validation is required for "
                "this attribute."
            )

    super().__init__(**kwargs)

`RunAITolerationSettings`

Bases: BaseModel

Settings for a Kubernetes toleration on a Run:AI workload.

Modules

`runai_step_operator_flavor`

Run:AI step operator flavor.

Classes

RunAIStepOperatorConfig(warn_about_plain_text_secrets: bool = False, **kwargs: Any)

Bases: BaseStepOperatorConfig, RunAIStepOperatorSettings

Configuration for the Run:AI step operator.

This step operator enables running individual pipeline steps on Run:AI clusters with fractional GPU allocation.

Example stack configuration:

zenml step-operator register runai \
    --flavor=runai \
    --client_id="xxx" \
    --client_secret="xxx" \
    --runai_base_url="https://myorg.run.ai" \
    --project_name="my-project"

Source code in src/zenml/stack/stack_component.py

def __init__(
    self, warn_about_plain_text_secrets: bool = False, **kwargs: Any
) -> None:
    """Ensures that secret references don't clash with pydantic validation.

    StackComponents allow the specification of all their string attributes
    using secret references of the form `{{secret_name.key}}`. This however
    is only possible when the stack component does not perform any explicit
    validation of this attribute using pydantic validators. If this were
    the case, the validation would run on the secret reference and would
    fail or in the worst case, modify the secret reference and lead to
    unexpected behavior. This method ensures that no attributes that require
    custom pydantic validation are set as secret references.

    Args:
        warn_about_plain_text_secrets: If true, then warns about using
            plain-text secrets.
        **kwargs: Arguments to initialize this stack component.

    Raises:
        ValueError: If an attribute that requires custom pydantic validation
            is passed as a secret reference, or if the `name` attribute
            was passed as a secret reference.
    """
    for key, value in kwargs.items():
        try:
            field = self.__class__.model_fields[key]
        except KeyError:
            # Value for a private attribute or non-existing field, this
            # will fail during the upcoming pydantic validation
            continue

        if value is None:
            continue

        if not secret_utils.is_secret_reference(value):
            if (
                secret_utils.is_secret_field(field)
                and warn_about_plain_text_secrets
            ):
                logger.warning(
                    "You specified a plain-text value for the sensitive "
                    f"attribute `{key}` for a `{self.__class__.__name__}` "
                    "stack component. This is currently only a warning, "
                    "but future versions of ZenML will require you to pass "
                    "in sensitive information as secrets. Check out the "
                    "documentation on how to configure your stack "
                    "components with secrets here: "
                    "https://docs.zenml.io/deploying-zenml/deploying-zenml/secret-management"
                )
            continue

        if pydantic_utils.has_validators(
            pydantic_class=self.__class__, field_name=key
        ):
            raise ValueError(
                f"Passing the stack component attribute `{key}` as a "
                "secret reference is not allowed as additional validation "
                "is required for this attribute."
            )

    super().__init__(**kwargs)

Attributes

is_local: bool property

Checks if this stack component is running locally.

The Run:AI step operator never runs locally.

Returns:

Type	Description
`bool`	Always `False` because the Run:AI step operator is remote.

is_remote: bool property

Checks if this stack component is running remotely.

The Run:AI step operator always runs remotely on Run:AI clusters.

Returns:

Type	Description
`bool`	Always `True` because the Run:AI step operator runs remotely.

RunAIStepOperatorFlavor

Bases: BaseStepOperatorFlavor

Run:AI step operator flavor.

Attributes

config_class: Type[RunAIStepOperatorConfig] property

Returns RunAIStepOperatorConfig config class.

Returns:

Type	Description
`Type[RunAIStepOperatorConfig]`	The config class.

display_name: str property

Display name of the flavor.

Returns:

Type	Description
`str`	The display name of the flavor.

docs_url: Optional[str] property

A url to point at docs explaining this flavor.

Returns:

Type	Description
`Optional[str]`	A flavor docs url.

implementation_class: Type[RunAIStepOperator] property

Implementation class for this flavor.

Returns:

Type	Description
`Type[RunAIStepOperator]`	The implementation class.

logo_url: str property

A url to represent the flavor in the dashboard.

Returns:

Type	Description
`str`	The flavor logo.

name: str property

Name of the flavor.

Returns:

Type	Description
`str`	The name of the flavor.

sdk_docs_url: Optional[str] property

A url to point at SDK docs explaining this flavor.

Returns:

Type	Description
`Optional[str]`	A flavor SDK docs url.

RunAIStepOperatorSettings(warn_about_plain_text_secrets: bool = False, **kwargs: Any)

Bases: BaseSettings

Per-step settings for Run:AI execution.

These settings can be configured per-step using the step decorator:

```python
from zenml import step


@step(
    step_operator="runai",
    settings={"step_operator": {"gpu_portion_request": 0.5}},
)
def my_step():
    ...
```

Source code in src/zenml/config/secret_reference_mixin.py

def __init__(
    self, warn_about_plain_text_secrets: bool = False, **kwargs: Any
) -> None:
    """Ensures that secret references are only passed for valid fields.

    This method ensures that secret references are not passed for fields
    that explicitly prevent them or require pydantic validation.

    Args:
        warn_about_plain_text_secrets: If true, then warns about using plain-text secrets.
        **kwargs: Arguments to initialize this object.

    Raises:
        ValueError: If an attribute that requires custom pydantic validation
            or an attribute which explicitly disallows secret references
            is passed as a secret reference.
    """
    for key, value in kwargs.items():
        try:
            field = self.__class__.model_fields[key]
        except KeyError:
            # Value for a private attribute or non-existing field, this
            # will fail during the upcoming pydantic validation
            continue

        if value is None:
            continue

        if not secret_utils.is_secret_reference(value):
            if (
                secret_utils.is_secret_field(field)
                and warn_about_plain_text_secrets
            ):
                logger.warning(
                    "You specified a plain-text value for the sensitive "
                    f"attribute `{key}`. This is currently only a warning, "
                    "but future versions of ZenML will require you to pass "
                    "in sensitive information as secrets. Check out the "
                    "documentation on how to configure values with secrets "
                    "here: https://docs.zenml.io/deploying-zenml/deploying-zenml/secret-management"
                )
            continue

        if secret_utils.is_clear_text_field(field):
            raise ValueError(
                f"Passing the `{key}` attribute as a secret reference is "
                "not allowed."
            )

        requires_validation = has_validators(
            pydantic_class=self.__class__, field_name=key
        )
        if requires_validation:
            raise ValueError(
                f"Passing the attribute `{key}` as a secret reference is "
                "not allowed as additional validation is required for "
                "this attribute."
            )

    super().__init__(**kwargs)
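
The order of checks in this constructor can be summarized as a small decision procedure. The sketch below is a standalone approximation and not ZenML's implementation: the `{{name.key}}` pattern, the `classify_value` helper, and its boolean flags are assumptions introduced for illustration, standing in for `secret_utils.is_secret_reference`, `secret_utils.is_secret_field`, `secret_utils.is_clear_text_field`, and `has_validators`.

```python
import re

# Assumed secret-reference syntax: "{{<secret_name>.<key>}}".
_SECRET_REF = re.compile(r"\{\{\s*\S+?\.\S+?\s*\}\}")


def is_secret_reference(value: object) -> bool:
    """Return True if the value looks like a secret reference string."""
    return isinstance(value, str) and bool(_SECRET_REF.fullmatch(value))


def classify_value(
    value: object,
    *,
    is_secret_field: bool = False,
    is_clear_text_field: bool = False,
    has_validators: bool = False,
    warn_about_plain_text_secrets: bool = False,
) -> str:
    """Follow the same order of checks as the constructor shown above."""
    if value is None:
        return "skip"
    if not is_secret_reference(value):
        # Plain-text values pass; sensitive fields only trigger a warning.
        if is_secret_field and warn_about_plain_text_secrets:
            return "warn"
        return "accept"
    if is_clear_text_field or has_validators:
        # The constructor raises ValueError in both of these cases.
        return "reject"
    return "accept"
```

Under these assumptions, a secret reference is rejected only when the target field either forbids references outright or needs custom pydantic validation, because a reference cannot be validated before it is resolved.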

`runai_training_workload_settings`

Run:AI training workload settings.

Classes

RunAIConfigMapMountSettings

Bases: RunAIMountBase

Settings for a Run:AI ConfigMap storage mount.

Attributes

container_mount_path: str property

The absolute container path for this ConfigMap mount.

Returns:

Type	Description
`str`	The absolute container mount path.

RunAIExternalURLSettings

Bases: _RunAIStrictSettings

Settings for exposing a Run:AI workload external URL.

RunAIHostPathMountSettings

Bases: RunAIMountBase

Settings for a Run:AI HostPath storage mount.

Attributes

container_mount_path: str property

The absolute container path for this HostPath mount.

Returns:

Type	Description
`str`	The absolute container mount path.

RunAIMountBase

Bases: _RunAIStrictSettings

Common fields for Run:AI workload storage mount settings.

Attributes

container_mount_path: str property

The absolute container path where this mount is exposed.

Returns:

Type	Description
`str`	The absolute container mount path.

Raises:

Type	Description
`NotImplementedError`	Subclasses must override this property.
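
The base-class contract described here (a property that subclasses must override) can be illustrated with a minimal sketch. The class names, the `mount_path` field, and the `find_duplicate_mount_paths` helper are hypothetical and are not taken from the ZenML source; they only show why a uniform `container_mount_path` accessor is convenient across mount types.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class MountBase:
    """Hypothetical stand-in for a mount settings base class."""

    @property
    def container_mount_path(self) -> str:
        # Subclasses are expected to override this property.
        raise NotImplementedError(
            f"{type(self).__name__} must override container_mount_path."
        )


@dataclass
class NFSMount(MountBase):
    """Hypothetical NFS mount; the field name is illustrative."""

    mount_path: str = ""

    @property
    def container_mount_path(self) -> str:
        return self.mount_path


def find_duplicate_mount_paths(mounts: List[MountBase]) -> Set[str]:
    """Return container paths claimed by more than one mount."""
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for mount in mounts:
        path = mount.container_mount_path
        if path in seen:
            duplicates.add(path)
        seen.add(path)
    return duplicates
```

Because every mount type exposes the same property, code that inspects mounts does not need to know which concrete settings class it is handling.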

RunAINFSMountSettings

Bases: RunAIMountBase

Settings for a Run:AI NFS storage mount.

Attributes

container_mount_path: str property

The absolute container path for this NFS mount.

Returns:

Type	Description
`str`	The absolute container mount path.

RunAIPVCMountSettings

Bases: RunAIMountBase

Settings for a Run:AI PVC storage mount.

Attributes

container_mount_path: str property

The absolute container path for this PVC mount.

Returns:

Type	Description
`str`	The absolute container mount path.

RunAIPortSettings

Bases: _RunAIStrictSettings

Settings for exposing a container port on the Run:AI workload.

RunAIS3MountSettings

Bases: RunAIMountBase

Settings for a Run:AI S3 storage mount.

Attributes

container_mount_path: str property

The absolute container path for this S3 mount.

Returns:

Type	Description
`str`	The absolute container mount path.

RunAISecretMountSettings

Bases: RunAIMountBase

Settings for a Run:AI Secret storage mount.

Attributes

container_mount_path: str property

The absolute container path for this Secret mount.

Returns:

Type	Description
`str`	The absolute container mount path.

RunAISecurityContextSettings

Bases: _RunAIStrictSettings

Settings for the Run:AI workload security context.

RunAITolerationSettings

Bases: BaseModel

Settings for a Kubernetes toleration on a Run:AI workload.

`step_operators`

Run:AI step operators.

Classes

`RunAIStepOperator(name: str, id: UUID, config: StackComponentConfig, flavor: str, type: StackComponentType, user: Optional[UUID], created: datetime, updated: datetime, environment: Optional[Dict[str, str]] = None, secrets: Optional[List[UUID]] = None, labels: Optional[Dict[str, Any]] = None, connector_requirements: Optional[ServiceConnectorRequirements] = None, connector: Optional[UUID] = None, connector_resource_id: Optional[str] = None, *args: Any, **kwargs: Any)`

Bases: BaseStepOperator

Step operator to run individual steps on Run:AI.

This step operator enables selective GPU offloading by running individual pipeline steps on Run:AI clusters.

Example usage:

```python
from zenml import step


@step(step_operator="runai")
def train_model(data):
    # GPU-intensive training runs on Run:AI
    ...
```

Source code in src/zenml/stack/stack_component.py

def __init__(
    self,
    name: str,
    id: UUID,
    config: StackComponentConfig,
    flavor: str,
    type: StackComponentType,
    user: Optional[UUID],
    created: datetime,
    updated: datetime,
    environment: Optional[Dict[str, str]] = None,
    secrets: Optional[List[UUID]] = None,
    labels: Optional[Dict[str, Any]] = None,
    connector_requirements: Optional[ServiceConnectorRequirements] = None,
    connector: Optional[UUID] = None,
    connector_resource_id: Optional[str] = None,
    *args: Any,
    **kwargs: Any,
):
    """Initializes a StackComponent.

    Args:
        name: The name of the component.
        id: The unique ID of the component.
        config: The config of the component.
        flavor: The flavor of the component.
        type: The type of the component.
        user: The ID of the user who created the component.
        created: The creation time of the component.
        updated: The last update time of the component.
        environment: Environment variables to set when running on this
            component.
        secrets: Secrets to set as environment variables when running on
            this component.
        labels: The labels of the component.
        connector_requirements: The requirements for the connector.
        connector: The ID of a connector linked to the component.
        connector_resource_id: The custom resource ID to access through
            the connector.
        *args: Additional positional arguments.
        **kwargs: Additional keyword arguments.

    Raises:
        ValueError: If a secret reference is passed as name.
    """
    if secret_utils.is_secret_reference(name):
        raise ValueError(
            "Passing the `name` attribute of a stack component as a "
            "secret reference is not allowed."
        )

    self.id = id
    self.name = name
    self._config = config
    self.flavor = flavor
    self.type = type
    self.user = user
    self.created = created
    self.updated = updated
    self.labels = labels
    self.environment = environment or {}
    self.secrets = secrets or []
    self.connector_requirements = connector_requirements
    self.connector = connector
    self.connector_resource_id = connector_resource_id
    self._connector_instance: Optional[ServiceConnector] = None

Attributes

client: RunAIClient property

Get or create the Run:AI client.

The client is cached for reuse across multiple calls.

Returns:

Type	Description
`RunAIClient`	The RunAIClient instance.

config: RunAIStepOperatorConfig property

Returns the step operator config.

Returns:

Type	Description
`RunAIStepOperatorConfig`	The configuration.

settings_class: Optional[Type[BaseSettings]] property

Settings class for the Run:AI step operator.

Returns:

Type	Description
`Optional[Type[BaseSettings]]`	The settings class.

validator: Optional[StackValidator] property

Validates the stack.

Returns:

Type	Description
`Optional[StackValidator]`	A validator that checks that the stack contains a remote container registry and a remote artifact store.
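
As a rough illustration of the kind of check this validator performs, the sketch below tests a stack for a remote container registry and a remote artifact store. The `Component` and `Stack` classes and the `validate_remote_components` function are hypothetical stand-ins, not ZenML types; the `(is_valid, message)` return shape is likewise an assumption chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Component:
    """Hypothetical stack component with a locality flag."""

    name: str
    is_remote: bool


@dataclass
class Stack:
    """Hypothetical stack holding only the two components checked here."""

    artifact_store: Component
    container_registry: Optional[Component] = None


def validate_remote_components(stack: Stack) -> Tuple[bool, str]:
    """Return (is_valid, message) for the remote-component requirement."""
    registry = stack.container_registry
    if registry is None:
        return False, "The stack has no container registry."
    if not registry.is_remote:
        return False, f"Container registry '{registry.name}' is local."
    if not stack.artifact_store.is_remote:
        return False, f"Artifact store '{stack.artifact_store.name}' is local."
    return True, ""
```

Both requirements follow from the operator running remotely: the Run:AI cluster has to pull the step image and read and write artifacts without access to the local machine.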

Functions

cancel(step_run: StepRunResponse) -> None

Cancels a submitted step by suspending its Run:AI training workload.

Parameters:

Name	Type	Description	Default
`step_run`	`StepRunResponse`	The step run.	required

Source code in src/zenml/integrations/runai/step_operators/runai_step_operator.py

def cancel(self, step_run: "StepRunResponse") -> None:
    """Cancels a submitted step.

    Args:
        step_run: The step run.
    """
    workload_id = self._get_workload_id(step_run)
    self.client.suspend_training_workload(workload_id)

get_docker_builds(snapshot: PipelineSnapshotBase) -> List[BuildConfiguration]

Gets the Docker builds required for the component.

Parameters:

Name	Type	Description	Default
`snapshot`	`PipelineSnapshotBase`	The pipeline snapshot for which to get the builds.	required

Returns:

Type	Description
`List[BuildConfiguration]`	The required Docker builds.

Source code in src/zenml/integrations/runai/step_operators/runai_step_operator.py

def get_docker_builds(
    self, snapshot: "PipelineSnapshotBase"
) -> List[BuildConfiguration]:
    """Gets the Docker builds required for the component.

    Args:
        snapshot: The pipeline snapshot for which to get the builds.

    Returns:
        The required Docker builds.
    """
    builds = []
    for step_name, step in snapshot.step_configurations.items():
        if step.config.uses_step_operator(self.name):
            build = BuildConfiguration(
                key=RUNAI_STEP_OPERATOR_DOCKER_IMAGE_KEY,
                settings=step.config.docker_settings,
                step_name=step_name,
            )
            builds.append(build)

    return builds
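
The selection logic above produces one build entry per step that is assigned to this step operator and ignores all other steps. A simplified, self-contained sketch of that filtering follows; `StepConfig`, `Build`, `collect_builds`, and the `IMAGE_KEY` value are hypothetical simplifications rather than ZenML types.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical value; the real key constant is defined by the integration.
IMAGE_KEY = "runai_step_operator"


@dataclass
class StepConfig:
    """Hypothetical step configuration with an optional operator name."""

    step_operator: Optional[str] = None


@dataclass
class Build:
    """Hypothetical build request for a single step image."""

    key: str
    step_name: str


def collect_builds(
    steps: Dict[str, StepConfig], operator_name: str
) -> List[Build]:
    """Return one build per step that runs on the named step operator."""
    return [
        Build(key=IMAGE_KEY, step_name=name)
        for name, config in steps.items()
        if config.step_operator == operator_name
    ]
```

Keying each build by step name lets every step carry its own Docker settings while sharing a single image key for lookup at submission time.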

get_status(step_run: StepRunResponse) -> ExecutionStatus

Gets the status of a submitted step. A workload that cannot be found, or that reports no status, is treated as failed.

Parameters:

Name	Type	Description	Default
`step_run`	`StepRunResponse`	The step run.	required

Returns:

Type	Description
`ExecutionStatus`	The step status.

Source code in src/zenml/integrations/runai/step_operators/runai_step_operator.py

def get_status(self, step_run: "StepRunResponse") -> ExecutionStatus:
    """Gets the status of a submitted step.

    Args:
        step_run: The step run.

    Returns:
        The step status.
    """
    workload_id = self._get_workload_id(step_run)
    try:
        status = self.client.get_training_workload_status(workload_id)
    except RunAIWorkloadNotFoundError:
        logger.warning(
            "Run:AI workload `%s` for step run `%s` was not found.",
            workload_id,
            step_run.id,
        )
        return ExecutionStatus.FAILED

    if status is None:
        logger.warning(
            "Run:AI workload `%s` for step run `%s` has no status.",
            workload_id,
            step_run.id,
        )
        return ExecutionStatus.FAILED

    return map_runai_status_to_execution_status(status)
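
The fallback behaviour in `get_status` can be sketched independently of the Run:AI client. In the example below, the status strings, the `WorkloadNotFound` exception, and the `resolve_status` helper are assumptions made for illustration; the actual mapping lives in `map_runai_status_to_execution_status` and may cover more states.

```python
from enum import Enum
from typing import Callable, Optional


class ExecutionStatus(str, Enum):
    """Reduced, hypothetical set of execution states."""

    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


class WorkloadNotFound(Exception):
    """Hypothetical error raised when a workload ID is unknown."""


# Hypothetical Run:AI status strings; real names may differ.
_SUCCESS = {"Completed"}
_FAILURE = {"Failed"}


def resolve_status(fetch: Callable[[], Optional[str]]) -> ExecutionStatus:
    """Map a fetched workload status, treating missing data as failure."""
    try:
        status = fetch()
    except WorkloadNotFound:
        return ExecutionStatus.FAILED
    if status is None:
        return ExecutionStatus.FAILED
    if status in _SUCCESS:
        return ExecutionStatus.COMPLETED
    if status in _FAILURE:
        return ExecutionStatus.FAILED
    # Anything else is assumed to be still in progress.
    return ExecutionStatus.RUNNING
```

Treating a missing workload as failed keeps the pipeline from waiting indefinitely on a job that no longer exists.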

submit(info: StepRunInfo, entrypoint_command: List[str], environment: Dict[str, str]) -> None

Submits a step to Run:AI as a training workload.

Parameters:

Name	Type	Description	Default
`info`	`StepRunInfo`	Information about the step run.	required
`entrypoint_command`	`List[str]`	Command that executes the step.	required
`environment`	`Dict[str, str]`	Environment variables to set in the step operator environment.	required

Raises:

Type	Description
`RuntimeError`	If building or submitting the Run:AI training request fails.

Source code in src/zenml/integrations/runai/step_operators/runai_step_operator.py

def submit(
    self,
    info: "StepRunInfo",
    entrypoint_command: List[str],
    environment: Dict[str, str],
) -> None:
    """Submits a step to Run:AI as a training workload.

    Args:
        info: Information about the step run.
        entrypoint_command: Command that executes the step.
        environment: Environment variables to set in the step operator
            environment.

    Raises:
        RuntimeError: If building or submitting the Run:AI training request fails.
    """
    settings = cast(RunAIStepOperatorSettings, self.get_settings(info))

    image = info.get_image(key=RUNAI_STEP_OPERATOR_DOCKER_IMAGE_KEY)

    project_id, cluster_id = self._resolve_project_and_cluster()

    workload_name = self._build_workload_name(info)

    try:
        training_request = self._build_training_request(
            settings=settings,
            image=image,
            workload_name=workload_name,
            project_id=project_id,
            cluster_id=cluster_id,
            entrypoint_command=entrypoint_command,
            environment=environment,
        )
    except ValueError as exc:
        raise RuntimeError(
            f"Failed to build Run:AI training request for step "
            f"'{info.pipeline_step_name}': {exc}"
        ) from exc

    info.force_write_logs()

    try:
        result = self.client.create_training_workload(training_request)
        logger.info(
            "Submitted step '%s' to Run:AI as workload '%s' (ID: %s)",
            info.pipeline_step_name,
            result.workload_name,
            result.workload_id,
        )
    except RunAIClientError as exc:
        raise RuntimeError(
            f"Failed to submit step '{info.pipeline_step_name}' to Run:AI: {exc}. "
            "Verify credentials, project name, cluster access, and quota."
        ) from exc

    publish_step_run_metadata(
        info.step_run_id,
        {
            self.id: {
                RUNAI_WORKLOAD_ID_METADATA_KEY: result.workload_id,
                RUNAI_WORKLOAD_NAME_METADATA_KEY: result.workload_name,
            }
        },
    )
    info.step_run.run_metadata[RUNAI_WORKLOAD_ID_METADATA_KEY] = (
        result.workload_id
    )
    info.step_run.run_metadata[RUNAI_WORKLOAD_NAME_METADATA_KEY] = (
        result.workload_name
    )
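
`submit` converts both request-building and client failures into a `RuntimeError` while keeping the original exception attached through `raise ... from`. The sketch below isolates that pattern; `ClientError`, `submit_workload`, and the callables passed to it are hypothetical and only mimic the control flow shown above.

```python
from typing import Any, Callable


class ClientError(Exception):
    """Hypothetical client-side failure."""


def submit_workload(
    step_name: str,
    build: Callable[[], Any],
    create: Callable[[Any], Any],
) -> Any:
    """Build a request and submit it, wrapping failures in RuntimeError."""
    try:
        request = build()
    except ValueError as exc:
        # Invalid settings surface as a build failure for this step.
        raise RuntimeError(
            f"Failed to build request for step '{step_name}': {exc}"
        ) from exc

    try:
        return create(request)
    except ClientError as exc:
        # API failures surface as a submission failure for this step.
        raise RuntimeError(
            f"Failed to submit step '{step_name}': {exc}"
        ) from exc
```

Chaining with `from exc` preserves the underlying error as `__cause__`, so callers see a single exception type without losing the original traceback.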

wait(step_run: StepRunResponse) -> ExecutionStatus

Waits for a submitted step to finish.

Parameters:

Name	Type	Description	Default
`step_run`	`StepRunResponse`	The step run.	required

Returns:

Type	Description
`ExecutionStatus`	The final step status.

Source code in src/zenml/integrations/runai/step_operators/runai_step_operator.py

def wait(self, step_run: "StepRunResponse") -> ExecutionStatus:
    """Waits for a submitted step to finish.

    Args:
        step_run: The step run.

    Returns:
        The final step status.
    """
    settings = cast(RunAIStepOperatorSettings, self.get_settings(step_run))
    workload_id = self._get_workload_id(step_run)
    status = self._wait_for_completion(
        client=self.client,
        workload_id=workload_id,
        settings=settings,
    )
    logger.info("Run:AI step operator job completed.")
    return status
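
The body of `_wait_for_completion` is not shown in this reference. One common way to implement such a wait is a polling loop with an optional timeout, sketched below. Everything here is an assumption for illustration: the terminal state names, the `poll_interval` and `timeout` parameters, and the injectable `sleep` and `clock` callables are not taken from the ZenML source.

```python
import time
from typing import Callable, Optional

# Hypothetical terminal states; real status names may differ.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_completion(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status until a terminal state or the timeout is reached."""
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(
                f"Workload still '{status}' after {timeout}s."
            )
        sleep(poll_interval)
```

Passing `sleep` and `clock` as parameters is a design choice made for this sketch so the loop can be exercised in tests without real delays.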

Modules

`runai_step_operator`

Run:AI step operator implementation.

