Great Expectations
zenml.integrations.great_expectations
Great Expectations integration for ZenML.
The Great Expectations integration lets you use Great Expectations to profile and validate your data.
Attributes
GREAT_EXPECTATIONS = 'great_expectations'
module-attribute
GREAT_EXPECTATIONS_DATA_VALIDATOR_FLAVOR = 'great_expectations'
module-attribute
Classes
Flavor
Class for ZenML Flavors.
Attributes
config_class: Type[StackComponentConfig]
abstractmethod
property
Returns the StackComponentConfig config class.
Returns:
Type | Description |
---|---|
Type[StackComponentConfig] | The config class. |
config_schema: Dict[str, Any]
property
The config schema for a flavor.
Returns:
Type | Description |
---|---|
Dict[str, Any] | The config schema. |
docs_url: Optional[str]
property
A URL pointing to docs that explain this flavor.
Returns:
Type | Description |
---|---|
Optional[str] | A flavor docs URL. |
implementation_class: Type[StackComponent]
abstractmethod
property
Implementation class for this flavor.
Returns:
Type | Description |
---|---|
Type[StackComponent] | The implementation class for this flavor. |
logo_url: Optional[str]
property
The URL of the logo that represents the flavor in the dashboard.
Returns:
Type | Description |
---|---|
Optional[str] | The flavor logo. |
name: str
abstractmethod
property
The flavor name.
Returns:
Type | Description |
---|---|
str | The flavor name. |
sdk_docs_url: Optional[str]
property
A URL pointing to SDK docs that explain this flavor.
Returns:
Type | Description |
---|---|
Optional[str] | A flavor SDK docs URL. |
service_connector_requirements: Optional[ServiceConnectorRequirements]
property
Service connector resource requirements for service connectors.
Specifies resource requirements that are used to filter the available service connector types that are compatible with this flavor.
Returns:
Type | Description |
---|---|
Optional[ServiceConnectorRequirements] | Requirements for compatible service connectors, if a service connector is required for this flavor. |
type: StackComponentType
abstractmethod
property
Functions
from_model(flavor_model: FlavorResponse) -> Flavor
classmethod
Loads a flavor from a model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
flavor_model | FlavorResponse | The model to load from. | required |
Raises:
Type | Description |
---|---|
CustomFlavorImportError | If the custom flavor can't be imported. |
ImportError | If the flavor can't be imported. |
Returns:
Type | Description |
---|---|
Flavor | The loaded flavor. |
Source code in src/zenml/stack/flavor.py
generate_default_docs_url() -> str
Generate the docs URL for a built-in or integration flavor.
This method is not useful for custom flavors, which have no docs in the main ZenML documentation.
Returns:
Type | Description |
---|---|
str | The complete URL to the ZenML documentation. |
Source code in src/zenml/stack/flavor.py
generate_default_sdk_docs_url() -> str
Generate the SDK docs URL for a flavor.
Returns:
Type | Description |
---|---|
str | The complete URL to the ZenML SDK docs. |
Source code in src/zenml/stack/flavor.py
to_model(integration: Optional[str] = None, is_custom: bool = True) -> FlavorRequest
Converts a flavor to a model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
integration | Optional[str] | The integration to use for the model. | None |
is_custom | bool | Whether the flavor is a custom flavor. Custom flavors are then scoped by user and workspace. | True |
Returns:
Type | Description |
---|---|
FlavorRequest | The model. |
Source code in src/zenml/stack/flavor.py
GreatExpectationsIntegration
Bases: Integration
Definition of Great Expectations integration for ZenML.
Functions
activate() -> None
classmethod
Activate the Great Expectations integration.
Source code in src/zenml/integrations/great_expectations/__init__.py
flavors() -> List[Type[Flavor]]
classmethod
Declare the stack component flavors for the Great Expectations integration.
Returns:
Type | Description |
---|---|
List[Type[Flavor]] | List of stack component flavors for this integration. |
Source code in src/zenml/integrations/great_expectations/__init__.py
get_requirements(target_os: Optional[str] = None) -> List[str]
classmethod
Method to get the requirements for the integration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_os | Optional[str] | The target operating system to get the requirements for. | None |
Returns:
Type | Description |
---|---|
List[str] | A list of requirements. |
Source code in src/zenml/integrations/great_expectations/__init__.py
Integration
Base class for integrations in ZenML.
Functions
activate() -> None
classmethod
Abstract method to activate the integration.
Source code in src/zenml/integrations/integration.py
check_installation() -> bool
classmethod
Method to check whether the required packages are installed.
Returns:
Type | Description |
---|---|
bool | True if all required packages are installed, False otherwise. |
Source code in src/zenml/integrations/integration.py
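As a rough illustration of what a check like `check_installation` involves, the sketch below tests whether each pip-style requirement string names an installed distribution, using only `importlib.metadata`. It is not ZenML's implementation: the helper names are hypothetical, and it deliberately ignores version specifiers and environment markers, which a complete check would also need to evaluate.

```python
import re
from importlib import metadata
from typing import List


def distribution_name(requirement: str) -> str:
    """Return the distribution name at the start of a pip requirement string.

    Simplified parsing: extras, version specifiers and environment markers
    after the name are ignored.
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"Cannot parse requirement: {requirement!r}")
    return match.group(1)


def check_installation(requirements: List[str]) -> bool:
    """Return True if every required distribution is installed.

    Only presence is checked; version constraints are not compared.
    """
    for requirement in requirements:
        try:
            metadata.version(distribution_name(requirement))
        except metadata.PackageNotFoundError:
            return False
    return True
```

A call such as `check_installation(["pandas>=1.0"])` returns `True` only if a `pandas` distribution is present, whatever its version.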
flavors() -> List[Type[Flavor]]
classmethod
Abstract method to declare new stack component flavors.
Returns:
Type | Description |
---|---|
List[Type[Flavor]] | A list of new stack component flavors. |
Source code in src/zenml/integrations/integration.py
get_requirements(target_os: Optional[str] = None) -> List[str]
classmethod
Method to get the requirements for the integration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_os | Optional[str] | The target operating system to get the requirements for. | None |
Returns:
Type | Description |
---|---|
List[str] | A list of requirements. |
Source code in src/zenml/integrations/integration.py
get_uninstall_requirements(target_os: Optional[str] = None) -> List[str]
classmethod
Method to get the uninstall requirements for the integration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_os | Optional[str] | The target operating system to get the requirements for. | None |
Returns:
Type | Description |
---|---|
List[str] | A list of requirements. |
Source code in src/zenml/integrations/integration.py
plugin_flavors() -> List[Type[BasePluginFlavor]]
classmethod
Abstract method to declare new plugin flavors.
Returns:
Type | Description |
---|---|
List[Type[BasePluginFlavor]] | A list of new plugin flavors. |
Source code in src/zenml/integrations/integration.py
Modules
data_validators
Initialization of the Great Expectations data validator for ZenML.
Classes
GreatExpectationsDataValidator(name: str, id: UUID, config: StackComponentConfig, flavor: str, type: StackComponentType, user: Optional[UUID], workspace: UUID, created: datetime, updated: datetime, labels: Optional[Dict[str, Any]] = None, connector_requirements: Optional[ServiceConnectorRequirements] = None, connector: Optional[UUID] = None, connector_resource_id: Optional[str] = None, *args: Any, **kwargs: Any)
Bases: BaseDataValidator
Great Expectations data validator stack component.
Source code in src/zenml/stack/stack_component.py
config: GreatExpectationsDataValidatorConfig
property
Returns the GreatExpectationsDataValidatorConfig config.
Returns:
Type | Description |
---|---|
GreatExpectationsDataValidatorConfig | The configuration. |
context_config: Optional[DataContextConfig]
property
Get the Great Expectations data context configuration.
Raises:
Type | Description |
---|---|
ValueError | If the context_config value is invalid. |
Returns:
Type | Description |
---|---|
Optional[DataContextConfig] | The GE data context configuration. |
data_context: AbstractDataContext
property
Returns the Great Expectations data context configured for this component.
Returns:
Type | Description |
---|---|
AbstractDataContext | The Great Expectations data context configured for this component. |
local_path: Optional[str]
property
Return a local path where this component stores information.
If an existing local GE data context is used, it is interpreted as a local path that needs to be accessible in all runtime environments.
Returns:
Type | Description |
---|---|
Optional[str] | The local path where this component stores information. |
root_directory: str
property
Returns the path to the root directory for all local files concerning this data validator.
Returns:
Type | Description |
---|---|
str | Path to the root directory. |
data_profiling(dataset: pd.DataFrame, comparison_dataset: Optional[Any] = None, profile_list: Optional[Sequence[str]] = None, expectation_suite_name: Optional[str] = None, data_asset_name: Optional[str] = None, profiler_kwargs: Optional[Dict[str, Any]] = None, overwrite_existing_suite: bool = True, **kwargs: Any) -> ExpectationSuite
Infer a Great Expectations Expectation Suite from a given dataset.
This Great Expectations-specific implementation of data profiling builds an Expectation Suite automatically by running a UserConfigurableProfiler on the input dataset, as covered in the official GE documentation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DataFrame | The dataset from which the expectation suite will be inferred. | required |
comparison_dataset | Optional[Any] | Optional dataset used to generate data comparison (i.e. data drift) profiles. Not supported by the Great Expectations data validator. | None |
profile_list | Optional[Sequence[str]] | Optional list identifying the categories of data profiles to be generated. Not supported by the Great Expectations data validator. | None |
expectation_suite_name | Optional[str] | The name of the expectation suite to create or update. If not supplied, a unique name will be generated from the current pipeline and step name, if running in the context of a pipeline step. | None |
data_asset_name | Optional[str] | The name of the data asset to use to identify the dataset in the Great Expectations docs. | None |
profiler_kwargs | Optional[Dict[str, Any]] | A dictionary of custom keyword arguments to pass to the profiler. | None |
overwrite_existing_suite | bool | Whether to overwrite an existing expectation suite, if one exists with that name. | True |
kwargs | Any | Additional keyword arguments (unused). | {} |
Returns:
Type | Description |
---|---|
ExpectationSuite | The inferred Expectation Suite. |
Raises:
Type | Description |
---|---|
ValueError | if an |
Source code in src/zenml/integrations/great_expectations/data_validators/ge_data_validator.py
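The `expectation_suite_name` fallback described above can be sketched as follows. The helper is hypothetical, and the `<pipeline>_<step>` naming pattern is an assumption made for illustration; the source only says the name is generated from the current pipeline and step name. Raising `ValueError` when no step context is available is likewise an assumption of this sketch.

```python
from typing import Optional


def resolve_suite_name(
    expectation_suite_name: Optional[str],
    pipeline_name: Optional[str] = None,
    step_name: Optional[str] = None,
) -> str:
    """Return the explicit suite name, or derive one from the step context.

    Hypothetical helper: the '<pipeline>_<step>' pattern is an assumption.
    """
    if expectation_suite_name:
        return expectation_suite_name
    if pipeline_name and step_name:
        return f"{pipeline_name}_{step_name}"
    # Assumption: outside a pipeline step there is nothing to derive from.
    raise ValueError(
        "expectation_suite_name was not supplied and cannot be derived "
        "outside of a pipeline step."
    )
```

Passing an explicit name always wins, which is the safer choice when the same suite is reused across pipelines.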
data_validation(dataset: pd.DataFrame, comparison_dataset: Optional[Any] = None, check_list: Optional[Sequence[str]] = None, expectation_suite_name: Optional[str] = None, data_asset_name: Optional[str] = None, action_list: Optional[List[Dict[str, Any]]] = None, **kwargs: Any) -> CheckpointResult
Great Expectations data validation.
This Great Expectations-specific implementation of data validation validates an input dataset against an Expectation Suite (the GE definition of a profile), as covered in the official GE documentation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DataFrame | The dataset to validate. | required |
comparison_dataset | Optional[Any] | Optional dataset used to run data comparison (i.e. data drift) checks. Not supported by the Great Expectations data validator. | None |
check_list | Optional[Sequence[str]] | Optional list identifying the data validation checks to be performed. Not supported by the Great Expectations data validator. | None |
expectation_suite_name | Optional[str] | The name of the expectation suite to use to validate the dataset. A value must be provided. | None |
data_asset_name | Optional[str] | The name of the data asset to use to identify the dataset in the Great Expectations docs. | None |
action_list | Optional[List[Dict[str, Any]]] | A list of additional Great Expectations actions to run after the validation check. | None |
kwargs | Any | Additional keyword arguments (unused). | {} |
Returns:
Type | Description |
---|---|
CheckpointResult | The Great Expectations validation (checkpoint) result. |
Raises:
Type | Description |
---|---|
ValueError | if the |
Source code in src/zenml/integrations/great_expectations/data_validators/ge_data_validator.py
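`action_list` takes Great Expectations checkpoint action entries, each a dict with a `name` and an `action` whose `class_name` selects the action. The sketch below builds one extra entry, a Slack notification on failure. The helper `make_action` and the `SLACK_WEBHOOK_URL` environment variable are hypothetical, and the `SlackNotificationAction` options follow GE 0.x conventions, so verify them against the GE version you have installed.

```python
import os
from typing import Any, Dict, List


def make_action(name: str, class_name: str, **options: Any) -> Dict[str, Any]:
    """Build one checkpoint action entry in GE's {'name', 'action'} layout."""
    return {"name": name, "action": {"class_name": class_name, **options}}


# Hypothetical extra action: notify Slack when validation fails.
action_list: List[Dict[str, Any]] = [
    make_action(
        "notify_slack_on_failure",
        "SlackNotificationAction",
        slack_webhook=os.environ.get("SLACK_WEBHOOK_URL", ""),
        notify_on="failure",
        renderer={
            "module_name": "great_expectations.render.renderer.slack_renderer",
            "class_name": "SlackRenderer",
        },
    ),
]
```

The list would then be passed as `action_list=action_list` to `data_validation`, together with an `expectation_suite_name`, since a suite name must be provided.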
get_data_context() -> AbstractDataContext
classmethod
Get the Great Expectations data context managed by ZenML.
Call this method to retrieve the data context managed by ZenML through the active Great Expectations data validator stack component.
Returns:
Type | Description |
---|---|
AbstractDataContext | A Great Expectations data context managed by ZenML as configured through the active data validator stack component. |
Source code in src/zenml/integrations/great_expectations/data_validators/ge_data_validator.py
get_data_docs_config(prefix: str, local: bool = False) -> Dict[str, Any]
Generate Great Expectations data docs configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix | str | The path prefix for the ZenML data docs configuration. | required |
local | bool | Whether the data docs site is local or remote. | False |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary with the GE data docs site configuration. |
Source code in src/zenml/integrations/great_expectations/data_validators/ge_data_validator.py
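A data docs site configuration in Great Expectations is a dict naming a `SiteBuilder` and the store backend it writes to. The sketch below shows one plausible shape for the dictionary this method returns, pointing the remote case at `ZenMLArtifactStoreBackend` by module path (taken from the source file location on this page). The local branch, which uses GE's `TupleFilesystemStoreBackend` rooted at `prefix`, is an assumption, and the function name is hypothetical.

```python
from typing import Any, Dict

# Module path of ZenMLArtifactStoreBackend, from the source location on this page.
ZENML_BACKEND_MODULE = "zenml.integrations.great_expectations.ge_store_backend"


def data_docs_site_config(prefix: str, local: bool = False) -> Dict[str, Any]:
    """Sketch a GE data docs site config whose files live under `prefix`."""
    if local:
        # Assumption: a local site writes straight to the filesystem.
        store_backend: Dict[str, Any] = {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": prefix,
        }
    else:
        # Remote site: route writes through the ZenML Artifact Store backend.
        store_backend = {
            "module_name": ZENML_BACKEND_MODULE,
            "class_name": "ZenMLArtifactStoreBackend",
            "prefix": prefix,
        }
    return {
        "class_name": "SiteBuilder",
        "store_backend": store_backend,
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    }
```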
get_store_config(class_name: str, prefix: str) -> Dict[str, Any]
Generate a Great Expectations store configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
class_name | str | The store class name. | required |
prefix | str | The path prefix for the ZenML store configuration. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | A dictionary with the GE store configuration. |
Source code in src/zenml/integrations/great_expectations/data_validators/ge_data_validator.py
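A Great Expectations store configuration pairs a store class with a store backend. A minimal sketch, assuming the returned dict routes the store through `ZenMLArtifactStoreBackend` with the given prefix; the function name and the example prefixes are hypothetical, while `ExpectationsStore` and `ValidationsStore` are standard GE 0.x store classes.

```python
from typing import Any, Dict

ZENML_BACKEND_MODULE = "zenml.integrations.great_expectations.ge_store_backend"


def store_config(class_name: str, prefix: str) -> Dict[str, Any]:
    """Sketch a GE store config backed by ZenMLArtifactStoreBackend."""
    return {
        "class_name": class_name,
        "store_backend": {
            "module_name": ZENML_BACKEND_MODULE,
            "class_name": "ZenMLArtifactStoreBackend",
            "prefix": prefix,
        },
    }


# Hypothetical "stores" section of a data context configuration.
stores = {
    "expectations_store": store_config(
        "ExpectationsStore", "great_expectations/expectations"
    ),
    "validations_store": store_config(
        "ValidationsStore", "great_expectations/validations"
    ),
}
```

Giving each store its own prefix keeps expectations and validation results in separate subpaths of the same Artifact Store.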
Modules
ge_data_validator
Implementation of the Great Expectations data validator.
flavors
Great Expectations integration flavors.
Classes
GreatExpectationsDataValidatorConfig(warn_about_plain_text_secrets: bool = False, **kwargs: Any)
Bases: BaseDataValidatorConfig
Config for the Great Expectations data validator.
Attributes:
Name | Type | Description |
---|---|---|
context_root_dir | Optional[str] | Location of an already initialized Great Expectations data context. If configured, the data validator will only be usable with local orchestrators. |
context_config | Optional[Dict[str, Any]] | In-line Great Expectations data context configuration. If the |
configure_zenml_stores | bool | If set, ZenML will automatically configure stores that use the Artifact Store as a backend. If neither |
configure_local_docs | bool | Configure a local data docs site where Great Expectations docs are generated and can be visualized locally. |
Source code in src/zenml/stack/stack_component.py
is_local: bool
property
Checks if this stack component is running locally.
Returns:
Type | Description |
---|---|
bool | True if this config is for a local component, False otherwise. |
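The attribute table above says a configured `context_root_dir` restricts the validator to local orchestrators, which suggests `is_local` is driven by that field. A minimal sketch under that assumption; the dataclass is a hypothetical stand-in for the real config class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GEConfigSketch:
    """Hypothetical stand-in for GreatExpectationsDataValidatorConfig."""

    context_root_dir: Optional[str] = None

    @property
    def is_local(self) -> bool:
        # Assumption: an existing on-disk data context ties the component
        # to the local machine; otherwise it is treated as non-local.
        return self.context_root_dir is not None
```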
validate_context_config(data: Dict[str, Any]) -> Dict[str, Any]
classmethod
Convert the context configuration if given in JSON/YAML format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Dict[str, Any] | The configuration values. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The validated configuration values. |
Raises:
Type | Description |
---|---|
ValueError | If the context configuration is not a valid JSON/YAML object. |
Source code in src/zenml/integrations/great_expectations/flavors/great_expectations_data_validator_flavor.py
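`validate_context_config` accepts a `context_config` given as a string and converts it to a dict. The stdlib-only sketch below handles the JSON case; YAML input, which the docstring also mentions, would need a YAML parser such as PyYAML and is omitted. The function body is an illustration of the idea, not ZenML's code.

```python
import json
from typing import Any, Dict


def validate_context_config(data: Dict[str, Any]) -> Dict[str, Any]:
    """Parse a string-valued `context_config` entry into a dict (JSON only)."""
    data = dict(data)  # avoid mutating the caller's dict
    raw = data.get("context_config")
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as err:
            raise ValueError(f"context_config is not valid JSON: {err}") from err
        if not isinstance(parsed, dict):
            raise ValueError("context_config must be a JSON object.")
        data["context_config"] = parsed
    return data
```

Values that are already dicts pass through unchanged, so the conversion only applies when the configuration arrives as text.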
GreatExpectationsDataValidatorFlavor
Bases: BaseDataValidatorFlavor
Great Expectations data validator flavor.
config_class: Type[GreatExpectationsDataValidatorConfig]
property
Returns the GreatExpectationsDataValidatorConfig config class.
Returns:
Type | Description |
---|---|
Type[GreatExpectationsDataValidatorConfig] | The config class. |
docs_url: Optional[str]
property
A URL pointing to docs that explain this flavor.
Returns:
Type | Description |
---|---|
Optional[str] | A flavor docs URL. |
implementation_class: Type[GreatExpectationsDataValidator]
property
Implementation class for this flavor.
Returns:
Type | Description |
---|---|
Type[GreatExpectationsDataValidator] | The implementation class. |
logo_url: str
property
The URL of the logo that represents the flavor in the dashboard.
Returns:
Type | Description |
---|---|
str | The flavor logo. |
name: str
property
Name of the flavor.
Returns:
Type | Description |
---|---|
str | The name of the flavor. |
sdk_docs_url: Optional[str]
property
A URL pointing to SDK docs that explain this flavor.
Returns:
Type | Description |
---|---|
Optional[str] | A flavor SDK docs URL. |
Modules
great_expectations_data_validator_flavor
Great Expectations data validator flavor.
ge_store_backend
Great Expectations store plugin for ZenML.
Classes
ZenMLArtifactStoreBackend(prefix: str = '', **kwargs: Any)
Bases: TupleStoreBackend
Great Expectations store backend that uses the active ZenML Artifact Store as a store.
Create a Great Expectations ZenML store backend instance.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix | str | Subpath prefix to use for this store backend. | '' |
kwargs | Any | Additional keyword arguments passed by the Great Expectations core. These are transparently passed to the | {} |
Source code in src/zenml/integrations/great_expectations/ge_store_backend.py
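One way to picture how a store backend with a `prefix` lays out objects is to join the store root, the prefix and the key parts into a single path. The sketch below assumes that layout; `key_to_path` is a hypothetical helper, not a method of `ZenMLArtifactStoreBackend`.

```python
import posixpath
from typing import Tuple


def key_to_path(root: str, prefix: str, key: Tuple[str, ...]) -> str:
    """Map a tuple key to an object path under root/prefix (hypothetical helper)."""
    parts = [root]
    if prefix:  # the default empty prefix adds no extra subdirectory
        parts.append(prefix)
    # posixpath keeps forward slashes, which also suits URIs such as s3://...
    return posixpath.join(*parts, *key)
```

For example, `key_to_path("/artifacts/ge", "expectations", ("my_suite", "v1.json"))` yields `/artifacts/ge/expectations/my_suite/v1.json`.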
config: Dict[str, Any]
property
Get the store configuration.
Returns:
Type | Description |
---|---|
Dict[str, Any] | The store configuration. |
get_public_url_for_key(key: str, protocol: Optional[str] = None) -> str
Get the public URL of an object in the store.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str | Object key identifier. | required |
protocol | Optional[str] | Optional protocol to use instead of the store protocol. | None |
Returns:
Type | Description |
---|---|
str | The public URL where the object can be accessed. |
Raises:
Type | Description |
---|---|
StoreBackendError | if a |
Source code in src/zenml/integrations/great_expectations/ge_store_backend.py
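The Raises entry suggests that a public URL can only be built when the store is set up for public access. A minimal sketch under the assumption that this means a configured public base URL; the `public_base_url` parameter and the local `StoreBackendError` class are stand-ins, and the `protocol` override is left out here.

```python
from typing import Optional


class StoreBackendError(Exception):
    """Local stand-in for the Great Expectations exception of the same name."""


def public_url_for_key(key: str, public_base_url: Optional[str]) -> str:
    """Join a key onto a public base URL (hypothetical helper).

    Assumption: objects are only publicly addressable when a base URL is set.
    """
    if not public_base_url:
        raise StoreBackendError("No public base URL is configured for this store.")
    # Normalize slashes so "base/" + "/key" does not produce "base//key".
    return f"{public_base_url.rstrip('/')}/{key.lstrip('/')}"
```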
get_url_for_key(key: Tuple[str, ...], protocol: Optional[str] = None) -> str
Get the URL of an object in the store.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | Tuple[str, ...] | Object key identifier. | required |
protocol | Optional[str] | Optional protocol to use instead of the store protocol. | None |
Returns:
Type | Description |
---|---|
str | The URL of the object in the store. |
Source code in src/zenml/integrations/great_expectations/ge_store_backend.py
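A sketch of how the optional `protocol` argument might override the store's own URL scheme, assuming the store root is a URI such as `s3://bucket/path` and that "protocol" means the URL scheme. `url_for_key` is a hypothetical helper built on `urllib.parse`, not the backend's actual implementation.

```python
import posixpath
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def url_for_key(
    store_uri: str, key: Tuple[str, ...], protocol: Optional[str] = None
) -> str:
    """Append key parts to the store URI, optionally swapping its scheme."""
    parts = urlsplit(store_uri)
    path = posixpath.join(parts.path, *key)
    scheme = protocol or parts.scheme  # assumption: "protocol" means URL scheme
    return urlunsplit((scheme, parts.netloc, path, parts.query, parts.fragment))
```

With `store_uri="s3://bucket/zenml/ge"` and `key=("suite", "v1.json")`, passing `protocol="gs"` changes only the scheme and keeps the bucket and path.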
list_keys(prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]
List the keys of all objects identified by a partial key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prefix | Tuple[str, ...] | Partial object key identifier. | () |
Returns:
Type | Description |
---|---|
List[Tuple[str, ...]] | List of keys identifying all objects present in the store that match the input partial key. |
Source code in src/zenml/integrations/great_expectations/ge_store_backend.py
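Partial-key matching means a key is returned when its leading components equal the given prefix tuple. A minimal sketch over a local directory tree, assuming each file's path relative to the store root is its key; this `list_keys` is a standalone illustration, not the backend method.

```python
import os
import tempfile
from typing import List, Tuple


def list_keys(root: str, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return the keys of all files under root whose leading parts equal prefix."""
    keys: List[Tuple[str, ...]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        dir_parts = () if rel_dir == os.curdir else tuple(rel_dir.split(os.sep))
        for filename in filenames:
            key = dir_parts + (filename,)
            # Partial-key match: compare whole components, not string prefixes.
            if key[: len(prefix)] == prefix:
                keys.append(key)
    return sorted(keys)


with tempfile.TemporaryDirectory() as root:
    for parts in [("a", "b", "x.json"), ("a", "y.json"), ("c", "z.json")]:
        path = os.path.join(root, *parts)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("{}")
    a_keys = list_keys(root, ("a",))
    all_keys = list_keys(root)
```

The default `prefix=()` matches every key, which is why `list_keys(root)` returns all three files.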
remove_key(key: Tuple[str, ...]) -> bool
Delete an object from the store.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | Tuple[str, ...] | Object key identifier. | required |
Returns:
Type | Description |
---|---|
bool | True if the object existed in the store and was removed, otherwise False. |
Source code in src/zenml/integrations/great_expectations/ge_store_backend.py
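The boolean return value distinguishes "deleted" from "nothing to delete". A minimal local-filesystem sketch, assuming keys map to file paths under a root directory; this `remove_key` is a hypothetical stand-in and does not clean up emptied directories.

```python
import os
import tempfile
from typing import Tuple


def remove_key(root: str, key: Tuple[str, ...]) -> bool:
    """Delete the file for key under root; report whether it existed."""
    path = os.path.join(root, *key)
    if not os.path.isfile(path):
        return False  # nothing stored under this key
    os.remove(path)
    return True


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "suite"))
    with open(os.path.join(root, "suite", "v1.json"), "w") as f:
        f.write("{}")
    first = remove_key(root, ("suite", "v1.json"))   # True: file existed
    second = remove_key(root, ("suite", "v1.json"))  # False: already gone
```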
rrmdir(start_path: str, end_path: str) -> None
staticmethod
Recursively removes empty directories between start_path and end_path, inclusive.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start_path | str | Directory to use as a starting point. | required |
end_path | str | Directory to use as a destination point. | required |
Source code in src/zenml/integrations/great_expectations/ge_store_backend.py
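One possible reading of "between start_path and end_path, inclusive": walk upward from `start_path`, removing each directory while it is empty, and stop after removing `end_path` or at the first non-empty directory. The sketch below assumes `start_path` lies inside `end_path` and uses only the standard library; it is not the ZenML implementation.

```python
import os
import tempfile


def rrmdir(start_path: str, end_path: str) -> None:
    """Remove empty directories from start_path up to end_path (sketch)."""
    start = os.path.abspath(start_path)
    end = os.path.abspath(end_path)
    # Assumption: start_path must be end_path itself or one of its descendants.
    if os.path.commonpath([start, end]) != end:
        raise ValueError("start_path must be inside end_path")
    current = start
    while os.path.isdir(current) and not os.listdir(current):
        os.rmdir(current)
        if current == end:
            break  # end_path itself was empty and has been removed
        current = os.path.dirname(current)


with tempfile.TemporaryDirectory() as root:
    # Fully empty chain: a/b/c, a/b and a are all removed.
    os.makedirs(os.path.join(root, "a", "b", "c"))
    rrmdir(os.path.join(root, "a", "b", "c"), os.path.join(root, "a"))
    a_removed = not os.path.exists(os.path.join(root, "a"))

    # x holds a file, so only x/y is removed and the walk stops at x.
    os.makedirs(os.path.join(root, "x", "y"))
    with open(os.path.join(root, "x", "keep.txt"), "w") as f:
        f.write("data")
    rrmdir(os.path.join(root, "x", "y"), root)
    y_removed = not os.path.exists(os.path.join(root, "x", "y"))
    x_kept = os.path.isdir(os.path.join(root, "x"))
```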
Functions
Modules
materializers
Materializers for Great Expectations serializable objects.
Classes
Modules
ge_materializer
Implementation of the Great Expectations materializers.
GreatExpectationsMaterializer(uri: str, artifact_store: Optional[BaseArtifactStore] = None)
Bases: BaseMaterializer
Materializer to read/write Great Expectations objects.
Source code in src/zenml/materializers/base_materializer.py
extract_metadata(data: Union[ExpectationSuite, CheckpointResult]) -> Dict[str, MetadataType]
Extract metadata from the given Great Expectations object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[ExpectationSuite, CheckpointResult] | The Great Expectations object to extract metadata from. | required |
Returns:
Type | Description |
---|---|
Dict[str, MetadataType] | The extracted metadata as a dictionary. |
Source code in src/zenml/integrations/great_expectations/materializers/ge_materializer.py
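To illustrate flattening a Great Expectations object into scalar metadata, here is a sketch that works on plain dicts. It assumes a serialized suite carries an `expectations` list and a serialized validation result carries `success` and `statistics` fields; the output metadata keys are hypothetical and may differ from what the materializer actually records.

```python
from typing import Any, Dict, Union

# Hypothetical: scalar values a metadata store can usually display.
MetadataValue = Union[str, int, float, bool]


def extract_metadata(data: Dict[str, Any]) -> Dict[str, MetadataValue]:
    """Flatten a serialized GE object into scalar metadata (sketch)."""
    if "expectations" in data:
        # Assumed shape of a serialized expectation suite.
        return {"expectation_count": len(data["expectations"])}
    # Assumed shape of a serialized validation result.
    stats = data.get("statistics", {})
    return {
        "validation_success": bool(data.get("success", False)),
        "evaluated_expectations": stats.get("evaluated_expectations", 0),
        "success_percent": stats.get("success_percent", 0.0),
    }
```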
load(data_type: Type[Any]) -> SerializableDictDot
Reads and returns a Great Expectations object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data_type | Type[Any] | The type of the data to read. | required |
Returns:
Type | Description |
---|---|
SerializableDictDot | A loaded Great Expectations object. |
Source code in src/zenml/integrations/great_expectations/materializers/ge_materializer.py
preprocess_checkpoint_result_dict(artifact_dict: Dict[str, Any]) -> None
staticmethod
Preprocesses a GE checkpoint result dict before it is used to deserialize a GE CheckpointResult object.
The GE CheckpointResult object cannot be fully deserialized because some code is missing from the GE codebase. This method compensates by manually converting some of the attributes in the dict to their correct data types.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
artifact_dict | Dict[str, Any] | A dict containing the GE checkpoint result. | required |
Source code in src/zenml/integrations/great_expectations/materializers/ge_materializer.py
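The general problem is that a JSON round trip loses some Python types (tuples come back as lists, datetimes as strings), so they must be restored before an object can be rebuilt. A toy sketch of such an in-place preprocessing pass; the `run_time` and `batch_key` fields are hypothetical and are not the actual CheckpointResult attributes.

```python
import json
from datetime import datetime
from typing import Any, Dict


def preprocess_result_dict(artifact_dict: Dict[str, Any]) -> None:
    """Restore types lost in a JSON round trip, modifying the dict in place.

    The "run_time" and "batch_key" fields are hypothetical examples.
    """
    if isinstance(artifact_dict.get("run_time"), str):
        artifact_dict["run_time"] = datetime.fromisoformat(artifact_dict["run_time"])
    if isinstance(artifact_dict.get("batch_key"), list):
        artifact_dict["batch_key"] = tuple(artifact_dict["batch_key"])


original = {"run_time": datetime(2024, 1, 2, 3, 4, 5), "batch_key": ("suite", "asset")}
# Serialize datetimes as ISO strings; tuples become JSON arrays automatically.
raw = json.loads(json.dumps(original, default=lambda o: o.isoformat()))
preprocess_result_dict(raw)  # raw now equals original again
```

Mutating the dict in place, rather than returning a copy, matches the `-> None` signature documented above.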
save(obj: SerializableDictDot) -> None
Writes a Great Expectations object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
obj | SerializableDictDot | A Great Expectations object. | required |
Source code in src/zenml/integrations/great_expectations/materializers/ge_materializer.py
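A minimal sketch of the save/load round trip, assuming the object can be reduced to a JSON dict via `to_json_dict()` and rebuilt from it. `ToyGEObject` and the `artifact.json` filename are hypothetical; the documented `load` additionally receives a `data_type` argument naming the type to read.

```python
import json
import os
import tempfile
from typing import Any, Dict


class ToyGEObject:
    """Hypothetical stand-in for a GE object that exposes to_json_dict()."""

    def __init__(self, payload: Dict[str, Any]) -> None:
        self.payload = payload

    def to_json_dict(self) -> Dict[str, Any]:
        return dict(self.payload)


ARTIFACT_FILENAME = "artifact.json"  # hypothetical filename


def save(obj: ToyGEObject, uri: str) -> None:
    """Serialize the object to JSON inside the artifact directory."""
    os.makedirs(uri, exist_ok=True)
    with open(os.path.join(uri, ARTIFACT_FILENAME), "w", encoding="utf-8") as f:
        json.dump(obj.to_json_dict(), f)


def load(uri: str) -> ToyGEObject:
    """Read the JSON back and rebuild the (toy) object from it."""
    with open(os.path.join(uri, ARTIFACT_FILENAME), encoding="utf-8") as f:
        return ToyGEObject(json.load(f))


with tempfile.TemporaryDirectory() as uri:
    save(ToyGEObject({"expectation_suite_name": "my_suite", "expectations": []}), uri)
    restored = load(uri)
```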
save_visualizations(data: Union[ExpectationSuite, CheckpointResult]) -> Dict[str, VisualizationType]
Saves visualizations for the given Great Expectations object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[ExpectationSuite, CheckpointResult] | The Great Expectations object to save visualizations for. | required |
Returns:
Type | Description |
---|---|
Dict[str, VisualizationType] | A dictionary of visualization URIs and their types. |
Source code in src/zenml/integrations/great_expectations/materializers/ge_materializer.py
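The return value maps each saved file to its visualization type. A sketch of that contract, assuming a single HTML report per object; the report content, the `validation_report.html` filename and the `VisualizationKind` enum are hypothetical stand-ins, not ZenML's `VisualizationType` or the materializer's actual output.

```python
import html
import os
import tempfile
from enum import Enum
from typing import Any, Dict


class VisualizationKind(str, Enum):
    """Hypothetical stand-in for a visualization type enum."""

    HTML = "html"


def save_visualizations(data: Dict[str, Any], uri: str) -> Dict[str, VisualizationKind]:
    """Write one HTML report under uri and map its path to its type (sketch)."""
    os.makedirs(uri, exist_ok=True)
    path = os.path.join(uri, "validation_report.html")  # hypothetical filename
    items = "".join(
        f"<li>{html.escape(str(k))}: {html.escape(str(v))}</li>" for k, v in data.items()
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"<html><body><ul>{items}</ul></body></html>")
    return {path: VisualizationKind.HTML}


with tempfile.TemporaryDirectory() as uri:
    visualizations = save_visualizations({"success": True}, uri)
    report_path = next(iter(visualizations))
    with open(report_path, encoding="utf-8") as f:
        report = f.read()
```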
steps
Great Expectations data profiling and validation standard steps.
Functions
Modules
ge_profiler
Great Expectations data profiling standard step.
great_expectations_profiler_step(dataset: pd.DataFrame, expectation_suite_name: str, data_asset_name: Optional[str] = None, profiler_kwargs: Optional[Dict[str, Any]] = None, overwrite_existing_suite: bool = True) -> ExpectationSuite
Infer data validation rules from a pandas dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DataFrame | The dataset from which the expectation suite will be inferred. | required |
expectation_suite_name | str | The name of the expectation suite to infer. | required |
data_asset_name | Optional[str] | The name of the data asset to profile. | None |
profiler_kwargs | Optional[Dict[str, Any]] | A dictionary of keyword arguments to pass to the profiler. | None |
overwrite_existing_suite | bool | Whether to overwrite an existing expectation suite. | True |
Returns:
Type | Description |
---|---|
ExpectationSuite | The generated Great Expectations suite. |
Source code in src/zenml/integrations/great_expectations/steps/ge_profiler.py
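Profiling means inferring expectations from the data itself. The toy sketch below does this over a list of row dicts using only the standard library. The two expectation type names are built-in Great Expectations expectations, but the inference rules are illustrative assumptions and are not what the Great Expectations profiler (or `profiler_kwargs`) actually does.

```python
from typing import Any, Dict, List


def profile_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Infer simple expectation configs from row dicts (toy profiler)."""
    expectations: List[Dict[str, Any]] = []
    for column in sorted({name for row in rows for name in row}):
        values = [row.get(column) for row in rows]
        non_null = [v for v in values if v is not None]
        # Rule 1 (assumed): no missing values seen -> expect none in future.
        if len(non_null) == len(values):
            expectations.append({
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": column},
            })
        # Rule 2 (assumed): purely numeric column -> expect the observed range.
        if non_null and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
        ):
            expectations.append({
                "expectation_type": "expect_column_values_to_be_between",
                "kwargs": {"column": column, "min_value": min(non_null), "max_value": max(non_null)},
            })
    return expectations


suite = profile_rows([{"age": 30, "name": "a"}, {"age": 45, "name": None}])
```

Here `age` yields both a not-null and a range expectation, while `name` yields none because it contains a missing value and is not numeric.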
ge_validator
Great Expectations data validation standard step.
great_expectations_validator_step(dataset: pd.DataFrame, expectation_suite_name: str, data_asset_name: Optional[str] = None, action_list: Optional[List[Dict[str, Any]]] = None, exit_on_error: bool = False) -> CheckpointResult
Validate a pandas dataset against a Great Expectations expectation suite.
Used in a pipeline, this step validates an input pd.DataFrame dataset and returns the result as a Great Expectations CheckpointResult object. The validation results are also persisted in the Great Expectations validation store.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset | DataFrame | The dataset to run the expectation suite on. | required |
expectation_suite_name | str | The name of the expectation suite to use to validate the dataset. | required |
data_asset_name | Optional[str] | The name of the data asset to use to identify the dataset in the Great Expectations docs. | None |
action_list | Optional[List[Dict[str, Any]]] | A list of additional Great Expectations actions to run after the validation check. | None |
exit_on_error | bool | Set this flag to raise an error and exit the pipeline early if the validation fails. | False |
Returns:
Type | Description |
---|---|
CheckpointResult | The Great Expectations validation (checkpoint) result. |
Raises:
Type | Description |
---|---|
RuntimeError | If the step is configured to exit on error and the data validation failed. |
Source code in src/zenml/integrations/great_expectations/steps/ge_validator.py
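The `exit_on_error` contract from the Raises table can be sketched as follows. `ToyCheckpointResult` is a stand-in that models only a `success` flag, and `finish_validation` is a hypothetical helper, not part of the step's API.

```python
from dataclasses import dataclass


@dataclass
class ToyCheckpointResult:
    """Stand-in for a GE CheckpointResult; only the success flag is modeled."""

    success: bool


def finish_validation(
    result: ToyCheckpointResult, exit_on_error: bool = False
) -> ToyCheckpointResult:
    """Return the result, or fail the step when configured to exit on error."""
    if exit_on_error and not result.success:
        raise RuntimeError("The Great Expectations data validation failed.")
    # With exit_on_error=False a failed result is still returned, so a
    # downstream step can inspect it and decide what to do.
    return result
```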
utils
Utility functions for the Great Expectations integration.
Functions
create_batch_request(context: AbstractDataContext, dataset: pd.DataFrame, data_asset_name: Optional[str]) -> RuntimeBatchRequest
Create a temporary runtime GE batch request from a dataset step artifact.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | AbstractDataContext | Great Expectations data context. | required |
dataset | DataFrame | Input dataset. | required |
data_asset_name | Optional[str] | Optional custom name for the data asset. | required |
Returns:
Type | Description |
---|---|
RuntimeBatchRequest | A Great Expectations runtime batch request. |
Source code in src/zenml/integrations/great_expectations/utils.py