Optimus is the missing framework to profile, clean, process and do ML in a distributed fashion using Apache Spark (PySpark).
Prepare, explore, visualize and create Machine Learning models for Big Data with this library.
Optimus is free and open source software.
- Simple and robust: prepare, explore and visualize your data in a few lines of code.
- Easy, fast, parallelized and scalable data cleansing, exploration and creation of Machine Learning models.
- Runs locally or in the cloud.
- Easy-to-use API: Optimus extends the Spark DataFrame by adding .rows and .cols attributes.
- Connect to external APIs to enrich your data.
- String clustering: group similar strings and replace them with a single value.
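String clustering replaces groups of near-identical spellings with one canonical value. The sketch below shows one common way to do this, key-collision clustering with an OpenRefine-style fingerprint, written in plain Python rather than on a Spark DataFrame. It is an illustration under that assumption, not Optimus's own implementation; the function names `fingerprint` and `cluster_strings` are hypothetical.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Key-collision fingerprint in the style of OpenRefine (hypothetical helper)."""
    # Strip accents: decompose characters, then drop anything non-ASCII.
    normalized = unicodedata.normalize("NFKD", value)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Trim, lowercase and remove punctuation.
    cleaned = re.sub(r"[^\w\s]", "", ascii_only.strip().lower())
    # Keep unique tokens, sorted, so word order and repeats do not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster_strings(values):
    """Replace each value with the most frequent spelling sharing its fingerprint."""
    counts = Counter(values)
    groups = defaultdict(list)
    for value in counts:
        groups[fingerprint(value)].append(value)
    replacement = {}
    for members in groups.values():
        # Most common spelling wins; ties are broken alphabetically.
        canonical = min(members, key=lambda v: (-counts[v], v))
        for member in members:
            replacement[member] = canonical
    return [replacement[v] for v in values]


cities = ["New York", "new york", "York, New", "Nueva York", "New York"]
print(cluster_strings(cities))
# ['New York', 'New York', 'New York', 'Nueva York', 'New York']
```

Fingerprinting only merges values that normalize to the same key, so "Nueva York" stays separate; catching looser matches would need a different method, such as n-gram fingerprints or edit-distance clustering.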