KNIME
KNIME (Konstanz Information Miner) is a coherent and
comprehensive open source visual platform for data integration,
processing, analysis, reporting and exploration. It enables users to
visually create data flows (often referred to as pipelines),
selectively execute some or all analysis steps, and later investigate
the results through interactive views on data and models.
KNIME integrates various components for machine learning and
data mining through its modular data pipelining concept. The graphical
user interface enables users to assemble nodes for data
preprocessing, modeling, data analysis, and visualization.
KNIME is based on the Eclipse integrated development environment
(IDE) and, through its modular API, is easily extensible.
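
The data-flow idea described above (nodes assembled into a pipeline, with only the selected steps executed and intermediate results kept) can be illustrated with a small toy model. This is a minimal sketch and not KNIME's API: the `Node` class, its `execute` method, and the example nodes are hypothetical names invented for illustration.

```python
class Node:
    """One processing step that caches its output once executed (toy model)."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func            # function computing this node's output
        self.inputs = list(inputs)  # upstream nodes feeding this node
        self.result = None
        self.executed = False

    def execute(self):
        """Run upstream nodes if needed, then this node; reuse cached results."""
        if not self.executed:
            args = [node.execute() for node in self.inputs]
            self.result = self.func(*args)
            self.executed = True
        return self.result


# Hypothetical four-node flow: read -> filter missing -> (sort | sum)
reader = Node("reader", lambda: [3, 1, 2, None, 5])
cleaner = Node("filter missing",
               lambda rows: [r for r in rows if r is not None], [reader])
sorter = Node("sort", sorted, [cleaner])
total = Node("sum", sum, [cleaner])

# Executing only the "sort" branch runs its upstream nodes but not "sum".
sorted_rows = sorter.execute()
```

After `sorter.execute()`, the `reader` and `cleaner` nodes hold cached results while `total` has not run; calling `total.execute()` later reuses the cached output of `cleaner` instead of recomputing it.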
Features include:
- Scalability through sophisticated data handling
(intelligent automatic caching of data in the background while
maximizing throughput performance)
- Simple extensibility via a well-defined API for
plugin extensions
- Intuitive user interface
- Import/export of workflows (for exchanging with other
KNIME users)
- Parallel execution on multi-core systems
- Command line version for "headless" batch executions
- Incorporates over 100 processing nodes for:
  - Data I/O: retrieving data from files or databases
  - Preprocessing and cleansing: filtering, group-by, pivoting,
    binning, normalization, aggregation, joining, sampling,
    partitioning, and more
  - Modeling
  - Analysis
  - Data mining:
    - Clustering
    - Rule induction
    - Decision tree
    - Association rules
    - Naïve Bayes
    - Neural networks
    - Support vector machines
- Various interactive views for data exploration, including:
  - Box Plot - displays robust statistical parameters:
    minimum, lower quartile, median, upper quartile, and maximum. These
    parameters are called robust because they are not sensitive to
    extreme outliers
  - Conditional Box Plot - partitions the data of a numeric
    column into classes according to another nominal column and creates
    a box plot for each of the classes
  - Histogram - displays a histogram view with different
    viewing options
  - Histogram (interactive) - displays an interactive
    histogram view with different viewing options. The interactive
    histogram supports hiliting and changing the x axis and
    aggregation column on the fly
  - Interactive Table - displays data in a table view
  - Lift Chart - used to evaluate a predictive model. The
    higher the lift (the difference between the "lift" line and the
    base line), the better the predictive model performs
  - Line Plot - plots the numeric columns of the input table
    as lines
  - Parallel Coordinates - a representation of
    multi-dimensional information or data, in which multiple dimensions
    are allocated one-to-one to an equal number of parallel axes
    on-screen
  - Pie Chart - displays a pie chart with different viewing
    options
  - Pie Chart (interactive) - displays an interactive pie
    chart with different viewing options. The interactive pie chart
    supports hiliting and changing the pie and aggregation column on
    the fly
  - Scatter Matrix - each matrix element Eij is a scatterplot
    of the columns i and j, where the values of the i-th column are
    displayed on the x axis and the values of the j-th column on the
    y axis. The coordinates are displayed alternately on all sides of
    the plot
  - Scatter Plot - creates a scatterplot of two selectable
    attributes
- Integrates analysis modules of the Weka data mining
environment
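
The parameters shown by the Box Plot and Conditional Box Plot views listed above can be computed by hand, which makes the meaning of "robust" concrete. The following is a minimal sketch, not KNIME's implementation. It assumes the common Tukey convention in which the reported minimum and maximum are the most extreme values within 1.5 times the interquartile range of the box, and it uses the median-of-halves rule for quartiles; KNIME may use a different quartile rule. The function and column names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def box_plot_stats(values):
    """Return box plot parameters for a list of numbers.

    Assumption: quartiles are medians of the lower and upper halves, and
    min/max are the extreme values inside the 1.5 * IQR fences.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    half = n // 2
    lower = data[:half] or data            # lower half (whole list if n == 1)
    upper = data[half + n % 2:] or data    # upper half (whole list if n == 1)
    q1, med, q3 = median(lower), median(data), median(upper)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {"min": inside[0], "q1": q1, "median": med, "q3": q3,
            "max": inside[-1], "outliers": outliers}


def conditional_box_plot_stats(rows, value_column, class_column):
    """Partition rows by a nominal column and compute stats per class."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_column]].append(row[value_column])
    return {label: box_plot_stats(vals) for label, vals in groups.items()}


# One extreme value barely moves the box, while the mean shifts strongly.
clean = box_plot_stats([2, 4, 4, 5, 6, 7, 8, 9])
with_outlier = box_plot_stats([2, 4, 4, 5, 6, 7, 8, 9, 100])

# Hypothetical table with a numeric "price" and a nominal "type" column.
table = [{"type": "a", "price": 1}, {"type": "a", "price": 3},
         {"type": "b", "price": 10}, {"type": "b", "price": 20},
         {"type": "b", "price": 30}]
per_class = conditional_box_plot_stats(table, "price", "type")
```

In this example the value 100 is reported as an outlier and the maximum stays at 9, whereas the raw maximum and the arithmetic mean would both be pulled toward 100.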

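
The lift evaluated by the Lift Chart view listed above can be sketched numerically. This example assumes a common definition of cumulative lift: rows are ranked by predicted score, and the response rate in the top fraction is divided by the overall response rate, so the base line sits at 1.0. KNIME's own chart may compute or scale the curve differently; the function name and sample data are hypothetical.

```python
def cumulative_lift(scores, labels, bins=10):
    """Return the cumulative lift after each of `bins` equal-sized fractions.

    scores: predicted scores (higher means more likely positive)
    labels: actual outcomes, 1 for positive and 0 for negative
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    overall_rate = positives / len(labels)
    # Rank the actual outcomes by descending predicted score.
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    lifts = []
    for b in range(1, bins + 1):
        cutoff = max(1, round(len(ranked) * b / bins))
        top = ranked[:cutoff]
        lifts.append((sum(top) / len(top)) / overall_rate)
    return lifts


# Hypothetical scores from a model and the true outcomes for ten rows.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
lifts = cumulative_lift(scores, labels, bins=5)
```

A model that ranks positives first produces lift values well above 1.0 in the early fractions, and the curve falls back to exactly 1.0 once all rows are included. A model no better than random stays close to the base line throughout.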
Last Updated Saturday, October 20 2012 @ 11:11 AM EDT