Artifacts

zenml.artifacts special

external_artifact

External artifact definition.

ExternalArtifact (ExternalArtifactConfiguration) pydantic-model

External artifacts can be used to provide values as input to ZenML steps.

ZenML steps accept either artifacts (outputs of other steps), parameters (raw, JSON-serializable values) or external artifacts. External artifacts can be used to provide any value as input to a step without writing an additional step that returns this value.

This class can be configured using the following parameters:

  • value: The artifact value (any Python object) that will be uploaded to the artifact store.
  • id: The ID of an artifact that is already registered in ZenML.
  • pipeline_name & artifact_name: Name of a pipeline and of an artifact to search for in the last successful run of that pipeline.
  • model_name & model_version & model_artifact_name & model_artifact_version: Name of a model, model version, model artifact and artifact version to search for.
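The options above are alternatives: an external artifact is expected to use one of them at a time. The following is a minimal sketch of that rule in plain Python, loosely mirroring the checks in the `_validate_all` validator shown in the source listing. The function name `check_reference_mode` is hypothetical and not part of ZenML, and the sketch is slightly stricter than the validator because it always requires exactly one mode.

```python
from typing import Any, Optional


def check_reference_mode(
    value: Optional[Any] = None,
    id: Optional[str] = None,
    pipeline_name: Optional[str] = None,
    artifact_name: Optional[str] = None,
    model_artifact_name: Optional[str] = None,
) -> str:
    """Return the reference mode in use, or raise ValueError (hypothetical helper)."""
    # pipeline_name and artifact_name are only meaningful as a pair.
    if (pipeline_name is None) != (artifact_name is None):
        raise ValueError("pipeline_name and artifact_name must be given together")

    modes = {
        "value": value is not None,
        "id": id is not None,
        "pipeline": pipeline_name is not None and artifact_name is not None,
        "model": model_artifact_name is not None,
    }
    selected = [name for name, used in modes.items() if used]
    if len(selected) != 1:
        raise ValueError(f"expected exactly one reference mode, got {selected}")
    return selected[0]


print(check_reference_mode(value=[1, 2, 3]))
print(check_reference_mode(pipeline_name="training", artifact_name="model"))
```

Passing both a value and an ID, or only one half of the pipeline/artifact pair, raises a `ValueError` in this sketch.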

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| value | Optional[Any] | The artifact value. | None |
| id | Optional[UUID] | The ID of an artifact that should be referenced by this external artifact. | None |
| pipeline_name | Optional[str] | Name of a pipeline whose last successful run is searched for the artifact. | None |
| artifact_name | Optional[str] | Name of the artifact to search for in that pipeline run. | None |
| model_name | Optional[str] | Name of a model to search for the artifact in (if None, derived from the step context). | None |
| model_version | Optional[Union[str, int, ModelStages]] | Version of a model to search for the artifact in (if None, derived from the step context). | None |
| model_artifact_name | Optional[str] | Name of a model artifact to search for. | None |
| model_artifact_version | Optional[str] | Version of a model artifact to search for. | None |
| materializer | Optional[MaterializerClassOrSource] | The materializer to use for saving the artifact value to the artifact store. Only used when value is provided. | None |
| store_artifact_metadata | bool | Whether metadata for the artifact should be stored. Only used when value is provided. | True |
| store_artifact_visualizations | bool | Whether visualizations for the artifact should be stored. Only used when value is provided. | True |

Examples:

from zenml import step, pipeline
from zenml.artifacts.external_artifact import ExternalArtifact
import numpy as np

@step
def my_step(value: np.ndarray) -> None:
  print(value)

my_array = np.array([1, 2, 3])

@pipeline
def my_pipeline():
  my_step(value=ExternalArtifact(my_array))
Source code in zenml/artifacts/external_artifact.py
class ExternalArtifact(ExternalArtifactConfiguration):
    """External artifacts can be used to provide values as input to ZenML steps.

    ZenML steps accept either artifacts (=outputs of other steps), parameters
    (raw, JSON serializable values) or external artifacts. External artifacts
    can be used to provide any value as input to a step without needing to
    write an additional step that returns this value.

    This class can be configured using the following parameters:
    - value: The artifact value (any python object), that will be uploaded to the
        artifact store.
    - id: The ID of an artifact that is already registered in ZenML.
    - pipeline_name & artifact_name: Name of a pipeline and artifact to search in
        latest run.
    - model_name & model_version & model_artifact_name & model_artifact_version: Name of a
        model, model version, model artifact and artifact version to search.

    Args:
        value: The artifact value.

        id: The ID of an artifact that should be referenced by this external
            artifact.

        pipeline_name: Name of a pipeline to search for artifact in latest run.
        artifact_name: Name of an artifact to be searched in latest pipeline run.

        model_name: Name of a model to search for artifact in (if None - derived from step context).
        model_version: Version of a model to search for artifact in (if None - derived from step context).
        model_artifact_name: Name of a model artifact to search for.
        model_artifact_version: Version of a model artifact to search for.

        materializer: The materializer to use for saving the artifact value
            to the artifact store. Only used when `value` is provided.
        store_artifact_metadata: Whether metadata for the artifact should
            be stored. Only used when `value` is provided.
        store_artifact_visualizations: Whether visualizations for the
            artifact should be stored. Only used when `value` is provided.

    Example:
    ```
    from zenml import step, pipeline
    from zenml.artifacts.external_artifact import ExternalArtifact
    import numpy as np

    @step
    def my_step(value: np.ndarray) -> None:
      print(value)

    my_array = np.array([1, 2, 3])

    @pipeline
    def my_pipeline():
      my_step(value=ExternalArtifact(my_array))
    ```
    """

    value: Optional[Any] = None
    materializer: Optional[MaterializerClassOrSource] = None
    store_artifact_metadata: bool = True
    store_artifact_visualizations: bool = True

    @root_validator
    def _validate_all(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        value = values.get("value", None)
        id = values.get("id", None)
        pipeline_name = values.get("pipeline_name", None)
        artifact_name = values.get("artifact_name", None)
        model_name = values.get("model_name", None)
        model_version = values.get("model_version", None)
        model_artifact_name = values.get("model_artifact_name", None)

        if (value is not None) + (id is not None) + (
            pipeline_name is not None and artifact_name is not None
        ) + (model_artifact_name is not None) > 1:
            raise ValueError(
                "Only a value, an ID, pipeline/artifact name pair or "
                "model name/model version/model artifact name group can be "
                "provided when creating an external artifact."
            )
        elif all(
            v is None
            for v in [
                value,
                id,
                pipeline_name or artifact_name,
                model_name or model_version or model_artifact_name,
            ]
        ):
            raise ValueError(
                "Either a value, an ID, pipeline/artifact name pair or "
                "model name/model version/model artifact name group must be "
                "provided when creating an external artifact."
            )
        elif (pipeline_name is None) != (artifact_name is None):
            raise ValueError(
                "`pipeline_name` and `artifact_name` can be only provided "
                "together when creating an external artifact."
            )
        return values

    def upload_by_value(self) -> UUID:
        """Uploads the artifact by value.

        Returns:
            The uploaded artifact ID.

        Raises:
            RuntimeError: If artifact URI already exists.
        """
        from zenml.client import Client
        from zenml.utils.artifact_utils import upload_artifact

        client = Client()

        artifact_store_id = client.active_stack.artifact_store.id

        logger.info("Uploading external artifact...")
        artifact_name = f"external_{uuid4()}"
        materializer_class = self._get_materializer_class(value=self.value)

        uri = os.path.join(
            client.active_stack.artifact_store.path,
            "external_artifacts",
            artifact_name,
        )
        if fileio.exists(uri):
            raise RuntimeError(f"Artifact URI '{uri}' already exists.")
        fileio.makedirs(uri)

        materializer = materializer_class(uri)

        artifact_id: UUID = upload_artifact(
            name=artifact_name,
            data=self.value,
            materializer=materializer,
            artifact_store_id=artifact_store_id,
            extract_metadata=self.store_artifact_metadata,
            include_visualizations=self.store_artifact_visualizations,
        )

        # To avoid duplicate uploads, switch to referencing the uploaded
        # artifact by ID
        self.id = artifact_id
        # clean-up state after upload done
        self.value = None
        logger.info("Finished uploading external artifact %s.", artifact_id)

        return self.id

    @property
    def config(self) -> ExternalArtifactConfiguration:
        """Returns the lightweight config without hard-to-serialize properties.

        Returns:
            The config object to be evaluated at runtime by the step interface.
        """
        """
        return ExternalArtifactConfiguration(
            id=self.id,
            pipeline_name=self.pipeline_name,
            artifact_name=self.artifact_name,
            model_name=self.model_name,
            model_version=self.model_version,
            model_artifact_name=self.model_artifact_name,
            model_artifact_version=self.model_artifact_version,
            model_artifact_pipeline_name=self.model_artifact_pipeline_name,
            model_artifact_step_name=self.model_artifact_step_name,
        )

    def _get_materializer_class(self, value: Any) -> Type[BaseMaterializer]:
        """Gets a materializer class for a value.

        If a custom materializer is defined for this artifact it will be
        returned. Otherwise it will get the materializer class from the
        registry, falling back to the Cloudpickle materializer if no concrete
        materializer is registered for the type of value.

        Args:
            value: The value for which to get the materializer class.

        Returns:
            The materializer class.
        """
        from zenml.materializers.materializer_registry import (
            materializer_registry,
        )
        from zenml.utils import source_utils

        if isinstance(self.materializer, type):
            return self.materializer
        elif self.materializer:
            return source_utils.load_and_validate_class(
                self.materializer, expected_class=BaseMaterializer
            )
        else:
            return materializer_registry[type(value)]
config: ExternalArtifactConfiguration property readonly

Returns the lightweight config, without the properties that are hard to serialize to JSON.

Returns:

| Type | Description |
|---|---|
| ExternalArtifactConfiguration | The config object to be evaluated at runtime by the step interface. |
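To illustrate why a lightweight config is useful, here is a minimal sketch using standard-library dataclasses in place of the pydantic models. `SketchConfig` and `SketchArtifact` are hypothetical stand-ins, not ZenML classes, and only three of the reference fields are modelled. The point is that the config keeps the reference fields and leaves out the in-memory value, so it can be serialized to JSON.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class SketchConfig:
    """Stand-in for the lightweight configuration: reference fields only."""

    id: Optional[str] = None
    pipeline_name: Optional[str] = None
    artifact_name: Optional[str] = None


@dataclass
class SketchArtifact(SketchConfig):
    """Stand-in for the full class: adds the in-memory value."""

    value: Optional[Any] = None

    @property
    def config(self) -> SketchConfig:
        # Copy only the reference fields; the value is deliberately left out.
        return SketchConfig(
            id=self.id,
            pipeline_name=self.pipeline_name,
            artifact_name=self.artifact_name,
        )


# A set is not JSON serializable, so the full object cannot be dumped directly.
artifact = SketchArtifact(id="1234", value={1, 2, 3})
print(json.dumps(asdict(artifact.config)))
```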

upload_by_value(self)

Uploads the artifact by value.

Returns:

| Type | Description |
|---|---|
| UUID | The uploaded artifact ID. |

Exceptions:

| Type | Description |
|---|---|
| RuntimeError | If artifact URI already exists. |
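The upload location and the "already exists" check can be sketched as follows. This is a simplified illustration that assumes a local directory as the artifact store and uses `os` calls where the library uses its `fileio` module. The helper name `reserve_external_artifact_dir` is hypothetical.

```python
import os
import tempfile
from uuid import uuid4


def reserve_external_artifact_dir(store_path: str) -> str:
    """Create and return a fresh directory for one external artifact (hypothetical helper)."""
    artifact_name = f"external_{uuid4()}"
    uri = os.path.join(store_path, "external_artifacts", artifact_name)
    # A random UUID makes a collision very unlikely; the check guards against overwriting.
    if os.path.exists(uri):
        raise RuntimeError(f"Artifact URI '{uri}' already exists.")
    os.makedirs(uri)
    return uri


with tempfile.TemporaryDirectory() as store_path:
    uri = reserve_external_artifact_dir(store_path)
    # Prints a path of the form external_artifacts/external_<uuid>
    print(os.path.relpath(uri, store_path))
```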

external_artifact_config

External artifact definition.

ExternalArtifactConfiguration (BaseModel) pydantic-model

External artifact configuration.

Lightweight class that is passed to steps so that the referenced artifact can be resolved at runtime.

Source code in zenml/artifacts/external_artifact_config.py
class ExternalArtifactConfiguration(BaseModel):
    """External artifact configuration.

    Lightweight class to pass in the steps for runtime inference.
    """

    id: Optional[UUID] = None
    pipeline_name: Optional[str] = None
    artifact_name: Optional[str] = None
    model_name: Optional[str] = None
    model_version: Optional[Union[str, int, ModelStages]] = None
    model_artifact_name: Optional[str] = None
    model_artifact_version: Optional[str] = None
    model_artifact_pipeline_name: Optional[str] = None
    model_artifact_step_name: Optional[str] = None

    def _get_artifact_from_pipeline_run(self) -> "ArtifactResponseModel":
        """Get artifact from pipeline run.

        Returns:
            The fetched Artifact.

        Raises:
            RuntimeError: If artifact was not found in pipeline run.
        """
        from zenml.client import Client

        client = Client()

        response = None
        pipeline = client.get_pipeline(self.pipeline_name)  # type: ignore [arg-type]
        for artifact in pipeline.last_successful_run.artifacts:
            if artifact.name == self.artifact_name:
                response = artifact
                break

        if response is None:
            raise RuntimeError(
                f"Artifact with name `{self.artifact_name}` was not found "
                f"in last successful run of pipeline `{self.pipeline_name}`. "
                "Please check your inputs and try again."
            )

        return response

    def _get_artifact_from_model(
        self, model_config: Optional["ModelConfig"] = None
    ) -> "ArtifactResponseModel":
        """Get artifact from Model Control Plane.

        Args:
            model_config: The model containing the model version.

        Returns:
            The fetched Artifact.

        Raises:
            RuntimeError: If artifact was not found in model version
            RuntimeError: If `model_artifact_name` is set, but `model_name` is empty and
                model configuration is missing in @step and @pipeline.
        """
        from zenml.model.model_config import ModelConfig

        if self.model_name is None:
            if model_config is None:
                raise RuntimeError(
                    "ExternalArtifact initiated with `model_artifact_name`, "
                    "but no model config was provided and missing in @step or "
                    "@pipeline definitions."
                )
            self.model_name = model_config.name
            self.model_version = model_config.version

        _model_config = ModelConfig(
            name=self.model_name,
            version=self.model_version,
            suppress_warnings=True,
        )
        model_version = _model_config._get_model_version()

        for artifact_getter in [
            model_version.get_artifact_object,
            model_version.get_model_object,
            model_version.get_deployment,
        ]:
            response = artifact_getter(
                name=self.model_artifact_name,  # type: ignore [arg-type]
                version=self.model_artifact_version,
                pipeline_name=self.model_artifact_pipeline_name,
                step_name=self.model_artifact_step_name,
            )
            if response is not None:
                break

        if response is None:
            raise RuntimeError(
                f"Artifact with name `{self.model_artifact_name}` was not found "
                f"in model `{self.model_name}` version `{self.model_version}`. "
                "Please check your inputs and try again."
            )

        return response

    def get_artifact_id(
        self, model_config: Optional["ModelConfig"] = None
    ) -> UUID:
        """Get the artifact.

        - If an artifact is referenced by ID, it will verify that the artifact
          exists and is in the correct artifact store.
        - If an artifact is referenced by pipeline and artifact name pair, it
            will be searched in the artifact store by the referenced pipeline.
        - If an artifact is referenced by model name and model version, it will
            be searched in the artifact store by the referenced model.

        Args:
            model_config: The model config of the step (from step or pipeline).

        Returns:
            The artifact ID.

        Raises:
            RuntimeError: If the artifact store of the referenced artifact
                is not the same as the one in the active stack.
            RuntimeError: If the URI of the artifact already exists.
            RuntimeError: If `model_artifact_name` is set, but `model_name` is empty and
                model configuration is missing in @step and @pipeline.
            RuntimeError: If no value, id, pipeline/artifact name pair or model name/model version/model
                artifact name group is provided when creating an external artifact.
        """
        from zenml.client import Client

        client = Client()

        if self.id:
            response = client.get_artifact(artifact_id=self.id)
        elif self.pipeline_name and self.artifact_name:
            response = self._get_artifact_from_pipeline_run()
        elif self.model_artifact_name:
            response = self._get_artifact_from_model(model_config)
        else:
            raise RuntimeError(
                "Either an ID, pipeline/artifact name pair or "
                "model name/model version/model artifact name group can be "
                "provided when creating an external artifact configuration.\n"
                "Potential root cause: you instantiated an ExternalArtifact and "
                "called this method before `upload_by_value` was called."
            )

        artifact_store_id = client.active_stack.artifact_store.id
        if response.artifact_store_id != artifact_store_id:
            raise RuntimeError(
                f"The artifact {response.name} (ID: {response.id}) "
                "referenced by an external artifact is not stored in the "
                "artifact store of the active stack. This will lead to "
                "issues loading the artifact. Please make sure to only "
                "reference artifacts stored in your active artifact store."
            )

        self.id = response.id

        return self.id
get_artifact_id(self, model_config=None)

Get the artifact ID.

  • If an artifact is referenced by ID, the method verifies that the artifact exists and is stored in the artifact store of the active stack.
  • If an artifact is referenced by a pipeline and artifact name pair, it is looked up in the last successful run of the referenced pipeline.
  • If an artifact is referenced by a model artifact name, it is looked up in the referenced model version, or in the model version of the given model config when no model name is set.
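The order in which these references are tried can be sketched in plain Python. The function below is a hypothetical helper, not part of ZenML. It only names the lookup that would be chosen and mirrors the `if`/`elif` chain in the source listing.

```python
from typing import Optional


def resolution_strategy(
    id: Optional[str] = None,
    pipeline_name: Optional[str] = None,
    artifact_name: Optional[str] = None,
    model_artifact_name: Optional[str] = None,
) -> str:
    """Name the lookup that would be used, checked in priority order (hypothetical helper)."""
    if id:
        return "by_id"
    if pipeline_name and artifact_name:
        return "from_last_successful_pipeline_run"
    if model_artifact_name:
        return "from_model_version"
    raise RuntimeError(
        "No ID, pipeline/artifact name pair or model artifact name was provided."
    )


print(resolution_strategy(id="1234", model_artifact_name="classifier"))
print(resolution_strategy(pipeline_name="training", artifact_name="model"))
```

An ID takes precedence over the other references. Because the method stores the resolved ID on the object, later calls resolve by ID.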

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model_config | Optional[ModelConfig] | The model config of the step (from step or pipeline). | None |

Returns:

| Type | Description |
|---|---|
| UUID | The artifact ID. |

Exceptions:

| Type | Description |
|---|---|
| RuntimeError | If the artifact store of the referenced artifact is not the same as the one in the active stack. |
| RuntimeError | If the URI of the artifact already exists. |
| RuntimeError | If model_artifact_name is set, but model_name is empty and model configuration is missing in @step and @pipeline. |
| RuntimeError | If no value, id, pipeline/artifact name pair or model name/model version/model artifact name group is provided when creating an external artifact. |
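Whichever lookup is used, the resolved artifact has to live in the artifact store of the active stack. A minimal sketch of that final check is shown below. `ArtifactRecord` and `ensure_in_active_store` are hypothetical stand-ins for the artifact response and the comparison done in the source listing.

```python
from dataclasses import dataclass


@dataclass
class ArtifactRecord:
    """Hypothetical stand-in for the artifact response returned by a lookup."""

    id: str
    name: str
    artifact_store_id: str


def ensure_in_active_store(artifact: ArtifactRecord, active_store_id: str) -> str:
    """Return the artifact ID, or raise if it lives in another artifact store."""
    if artifact.artifact_store_id != active_store_id:
        raise RuntimeError(
            f"The artifact {artifact.name} (ID: {artifact.id}) is not stored "
            "in the artifact store of the active stack."
        )
    return artifact.id


record = ArtifactRecord(id="a1", name="model", artifact_store_id="store-1")
print(ensure_in_active_store(record, active_store_id="store-1"))
```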

unmaterialized_artifact

Unmaterialized artifact class.

UnmaterializedArtifact (ArtifactResponseModel) pydantic-model

Unmaterialized artifact class.

Typing a step input to have this type will cause ZenML to not materialize the artifact. This is useful for steps that need to access the artifact metadata instead of the actual artifact data.

Usage example:

from zenml import step
from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact

@step
def my_step(input_artifact: UnmaterializedArtifact):
    print(input_artifact.uri)
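A step that receives an unmaterialized artifact typically works with the storage location rather than the loaded object. The sketch below illustrates that idea without ZenML. `ArtifactReference` is a hypothetical stand-in exposing only a `uri` attribute, and the URI is assumed to be a local directory.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List


@dataclass
class ArtifactReference:
    """Hypothetical stand-in that exposes only a `uri` attribute."""

    uri: str


def list_stored_files(artifact: ArtifactReference) -> List[str]:
    """List the files at the artifact location without loading their content."""
    return sorted(os.listdir(artifact.uri))


with tempfile.TemporaryDirectory() as directory:
    # Write a small placeholder file to play the role of stored artifact data.
    with open(os.path.join(directory, "data.bin"), "wb") as handle:
        handle.write(b"\x00" * 16)
    print(list_stored_files(ArtifactReference(uri=directory)))
```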
Source code in zenml/artifacts/unmaterialized_artifact.py
class UnmaterializedArtifact(ArtifactResponseModel):
    """Unmaterialized artifact class.

    Typing a step input to have this type will cause ZenML to not materialize
    the artifact. This is useful for steps that need to access the artifact
    metadata instead of the actual artifact data.

    Usage example:

    ```python
    from zenml import step
    from zenml.artifacts.unmaterialized_artifact import UnmaterializedArtifact

    @step
    def my_step(input_artifact: UnmaterializedArtifact):
        print(input_artifact.uri)
    ```
    """
__json_encoder__(obj) special staticmethod

partial(func, *args, **keywords) - new function with partial application of the given arguments and keywords.