pandera is a project that provides a flexible and expressive API for performing data validation on dataframe-like objects to make data processing pipelines more readable and robust.
Dataframes contain information that pandera explicitly validates at runtime. This is useful in production-critical or reproducible research settings.
pandera is free and open source software.
Key Features
- Define a schema once and use it to validate different dataframe types including pandas, dask, modin, and pyspark.
- Check the types and properties of columns in a DataFrame or values in a Series.
- Perform more complex statistical validation like hypothesis testing.
- Seamlessly integrate with existing data analysis/processing pipelines via function decorators.
- Define dataframe models with the class-based API with pydantic-style syntax and validate dataframes using the typing syntax.
- Synthesize data from schema objects for property-based testing with pandas data structures.
- Lazily validate dataframes so that all validation checks are executed before raising an error.
- Integrate with a rich ecosystem of Python tools like pydantic, fastapi, and mypy.
Website: www.union.ai/pandera
Support: GitHub Code Repository
Developer: Niels Bantilan
License: MIT License
pandera is written in Python.
Related Software
| Python Data Validation | |
|---|---|
| Pydantic | Data validation using Python type hints |
| pandera | Framework for precision data testing |
| jsonschema | Implementation of JSON Schema for Python |
| Cerberus | Lightweight and extensible data validation library |
| schema | Library for validating Python data structures |
| GX | Validating, documenting, and profiling data |
| marshmallow | ORM/ODM/framework-agnostic library |
| Voluptuous | Python data validation library |
| Schematics | Combine types into structures, validate, and transform the shapes of data |
| Colander | Serialization / deserialization / validation library |
| Valideer | Lightweight data validation and adaptation Python library |
| OpenRefine | Desktop program for data cleanup and transformation |
| Soda Core | Data quality and data contract verification engine |
| OpenMetadata | Unified metadata platform |
| Elementary OSS | dbt-native data observability command-line tool |
Read our verdict in the software roundup.

