pandera – framework for precision data testing

pandera is a project that provides a flexible and expressive API for performing data validation on dataframe-like objects to make data processing pipelines more readable and robust.

Dataframes contain information that pandera explicitly validates at runtime. This is useful in production-critical or reproducible research settings.

pandera is free and open source software.

Key Features

  • Define a schema once and use it to validate different dataframe types including pandas, dask, modin, and pyspark.
  • Check the types and properties of columns in a DataFrame or values in a Series.
  • Perform more complex statistical validation like hypothesis testing.
  • Seamlessly integrate with existing data analysis/processing pipelines via function decorators.
  • Define dataframe models using the class-based API with pydantic-style syntax, and validate dataframes using the typing syntax.
  • Synthesize data from schema objects for property-based testing with pandas data structures.
  • Lazily validate dataframes so that all validation checks are executed before raising an error.
  • Integrate with a rich ecosystem of Python tools like pydantic, fastapi, and mypy.

Website: www.union.ai/pandera
Support: GitHub Code Repository
Developer: Niels Bantilan
License: MIT License

pandera is written in Python.


Related Software

Python Data Validation

  • Pydantic – Data validation using Python type hints
  • pandera – Framework for precision data testing
  • jsonschema – Implementation of JSON Schema for Python
  • Cerberus – Lightweight and extensible data validation library
  • schema – Library for validating Python data structures
  • GX – Validating, documenting, and profiling data
  • marshmallow – ORM/ODM/framework-agnostic library
  • Voluptuous – Python data validation library
  • Schematics – Combine types into structures, validate, and transform the shapes of data
  • Colander – Serialization / deserialization / validation library
  • Valideer – Lightweight data validation and adaptation Python library
  • OpenRefine – Desktop program for data cleanup and transformation
  • Soda Core – Data quality and data contract verification engine
  • OpenMetadata – Unified metadata platform
  • Elementary OSS – dbt-native data observability command-line tool
