Welcome to the ZenML API Docs
Core
The core module is where all the base ZenML functionality is defined, including a Pydantic base class for components, a git wrapper and a class for ZenML's own repository methods.
This module is also where the local service functionality (which keeps track of all the ZenML components) is defined. Every ZenML project has its own ZenML repository, and the repo module is where the associated methods are defined. The repo.init_repo method is where all our functionality is kickstarted when you first initialize everything through the `zenml init` CLI command.
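The same initialization can also be triggered from Python. The snippet below is a minimal sketch: it assumes the Repository class lives in the core repo module described above and that init_repo can be called without arguments, both of which may differ between versions.
.. code:: python

    from zenml.core.repo import Repository  # assumed import path

    # Roughly equivalent to running `zenml init` in the project directory:
    # creates the ZenML repository metadata for the current working directory.
    Repository.init_repo()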
Pipelines
A ZenML pipeline is a sequence of tasks that execute in a specific order and
yield artifacts. The artifacts are stored within the artifact store and indexed
via the metadata store. Each individual task within a pipeline is known as a
step. The standard pipelines within ZenML are designed to have easy interfaces
to add pre-decided steps, with the order also pre-decided. Other sorts of
pipelines can be created as well from scratch, building on the BasePipeline
class.
Pipelines can be written as simple functions and are created using decorators appropriate to your specific use case. The moment a pipeline is run, it is compiled and passed directly to the orchestrator.
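As a minimal sketch, a decorator-based pipeline might look like the following; the two steps here are trivial stand-ins, and the exact import paths and wiring may differ between ZenML versions.
.. code:: python

    from zenml.pipelines import pipeline
    from zenml.steps import step

    @step
    def importer() -> int:
        # Stand-in for a real data-ingestion step.
        return 42

    @step
    def trainer(dataset: int) -> int:
        # Stand-in for a real training step.
        return dataset + 1

    @pipeline
    def my_pipeline(importer_step, trainer_step):
        # Wire the steps together: the importer's output artifact
        # becomes the trainer's input artifact.
        dataset = importer_step()
        trainer_step(dataset=dataset)

    # Instantiating the pipeline with concrete step instances and running it
    # compiles the pipeline and hands it to the configured orchestrator.
    my_pipeline(importer_step=importer(), trainer_step=trainer()).run()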
Materializers
Materializers are used to convert a ZenML artifact into a specific format. They
are most often used to handle the input or output of ZenML steps, and can be
extended by building on the BaseMaterializer
class.
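A rough sketch of a custom materializer follows; the base-class import path, the ASSOCIATED_TYPES attribute and the handle_input / handle_return method names are assumptions that may differ between ZenML versions.
.. code:: python

    # Import path, ASSOCIATED_TYPES and the handle_* method names are assumed.
    from zenml.materializers.base_materializer import BaseMaterializer

    class MyObject:
        """A custom type produced or consumed by a step."""

    class MyObjectMaterializer(BaseMaterializer):
        # Assumed registration attribute: the types this materializer handles.
        ASSOCIATED_TYPES = [MyObject]

        def handle_input(self, data_type):
            # Read the serialized artifact from the artifact store and
            # reconstruct a MyObject instance from it.
            ...

        def handle_return(self, my_object):
            # Write my_object to the artifact store in a chosen format.
            ...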
Steps
A step is a single piece or stage of a ZenML pipeline. Think of each step as one of the nodes of a Directed Acyclic Graph (DAG): a discrete, independent part of the pipeline responsible for one particular aspect of processing or interacting with the data and artifacts in the pipeline.
ZenML currently implements a basic step interface, but there will be other, more customized interfaces (layered in a hierarchy) for specialized implementations.
Steps can be subclassed from the BaseStep
class, or used via our @step
decorator.
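For example, a plain function can be turned into a step with the decorator; this is a minimal sketch and the import path may vary between versions.
.. code:: python

    from zenml.steps import step

    @step
    def add_one(input_number: int) -> int:
        # A trivial step: consumes an integer artifact and produces another.
        return input_number + 1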
Artifact Stores
An artifact store is a place where artifacts are stored. These artifacts may have been produced by the pipeline steps, or they may be the data first ingested into a pipeline via an ingestion step.
Definitions of the BaseArtifactStore
class and the LocalArtifactStore
that builds on it are in this module.
Other artifact stores corresponding to specific integrations are to be found in the integrations module. For example, the GCPArtifactStore, used when running ZenML on Google Cloud Platform, is defined in integrations.gcp.artifact_stores.
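For illustration only, instantiating these stores might look like the sketch below; the path constructor argument and the exact import locations are assumptions and may differ between versions.
.. code:: python

    # Both import paths and the `path` argument are assumptions for illustration.
    from zenml.artifact_stores import LocalArtifactStore
    from zenml.integrations.gcp.artifact_stores import GCPArtifactStore

    local_store = LocalArtifactStore(path="/tmp/zenml/artifacts")
    gcp_store = GCPArtifactStore(path="gs://my-bucket/zenml/artifacts")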
Constants
Config
The config module contains classes and functions that manage user-specific configuration. ZenML's configuration is stored in a file called .zenglobal.json, located in the user's directory for configuration files. (The exact location differs from operating system to operating system.)
The GlobalConfig
class is the main class in this module. It provides a
Pydantic configuration object that is used to store and retrieve configuration.
This GlobalConfig
object handles the serialization and deserialization of
the configuration options that are stored in the file in order to persist the
configuration across sessions.
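To illustrate the pattern (this is not ZenML's actual GlobalConfig; the fields shown are hypothetical), a Pydantic object persisted to a JSON file might look like this:
.. code:: python

    from pathlib import Path

    from pydantic import BaseModel

    # Hypothetical fields, used only to illustrate the serialize/deserialize pattern.
    class ExampleGlobalConfig(BaseModel):
        user_id: str = "unknown"
        analytics_opt_in: bool = True

    # The real file location is OS-specific; this is a simplification.
    config_path = Path.home() / ".zenglobal.json"

    def save(config: ExampleGlobalConfig) -> None:
        # Serialize the Pydantic object to JSON and persist it across sessions.
        config_path.write_text(config.json())

    def load() -> ExampleGlobalConfig:
        # Deserialize the stored options, falling back to defaults if absent.
        if config_path.exists():
            return ExampleGlobalConfig.parse_raw(config_path.read_text())
        return ExampleGlobalConfig()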
Post Execution
After executing a pipeline, the user needs to be able to fetch it from history and perform certain tasks. The post_execution submodule provides a set of interfaces through which the user can interact with the post-run pipeline object, its steps, and their output artifacts.
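As an illustrative sketch only, fetching a past run might look like the following; the attribute and method names used here are hypothetical placeholders rather than the exact post_execution API.
.. code:: python

    # get_pipelines, runs, steps, name and outputs are illustrative placeholders
    # for the kinds of interfaces the post_execution module exposes.
    from zenml.core.repo import Repository

    repo = Repository()
    pipeline = repo.get_pipelines()[-1]   # most recently registered pipeline
    run = pipeline.runs[-1]               # its latest run
    for step in run.steps:
        print(step.name, step.outputs)    # inspect the produced artifacts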
Logger
Utils
The utils module contains utility functions for handling analytics and for reading and writing YAML data, as well as other general-purpose functions.
Exceptions
Orchestrators
An orchestrator is a special kind of backend that manages the running of each step of the pipeline. Orchestrators administer the actual pipeline runs. You can think of it as the 'root' of any pipeline job that you run during your experimentation.
ZenML supports a local orchestrator out of the box, which allows you to run your pipelines in a local environment. We also support using Apache Airflow as the orchestrator to handle the steps of your pipeline.
Container Registries
Io
The io
module handles file operations for the ZenML package. It offers a
standard interface for reading, writing and manipulating files and directories.
It is heavily influenced and inspired by the io module of tfx.
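A hedged sketch of what such an interface typically looks like follows; the fileio submodule name and the helper functions shown are assumptions and may differ between versions.
.. code:: python

    # The submodule name `fileio` and these helpers are assumptions made for
    # illustration; they mirror the tfx-style file interface described above.
    from zenml.io import fileio

    if not fileio.exists("artifacts"):
        fileio.makedirs("artifacts")

    with fileio.open("artifacts/example.txt", "w") as f:
        f.write("hello")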
Visualizers
The visualizers
module offers a way of constructing and displaying
visualizations of steps and pipeline results. The BaseVisualizer
class is at
the root of all the other visualizers, including options to view the results of
pipeline runs, steps and pipelines themselves.
Artifacts
Artifacts are the data that power your experimentation and model training. It is actually steps that produce artifacts, which are then stored in the artifact store. Artifacts are written in the signature of a step like so:
.. code:: python

    import torch

    def my_step(first_artifact: int, second_artifact: torch.nn.Module) -> int:
        # first_artifact is an integer
        # second_artifact is a torch.nn.Module
        return 1
Artifacts can be serialized and deserialized (i.e. written to and read from the Artifact Store) in various ways, like TFRecords or saved model pickles, depending on what the step produces. The serialization and deserialization logic of artifacts is defined by the appropriate Materializer.
Integrations
The ZenML integrations module contains sub-modules for each integration that we
support. This includes orchestrators like Apache Airflow, visualization tools
like the facets
library, as well as deep learning libraries like PyTorch.
Metadata Stores
The configurations of each pipeline, step and backend, along with the produced artifacts, are all tracked within the metadata store. The metadata store is an SQL database and can be sqlite or mysql.
Metadata are the pieces of information tracked about the pipelines, experiments and configurations that you are running with ZenML. Metadata are stored inside the metadata store.