KNIME — Quick reference

Visual, no-code data science · drag-and-drop node canvas · free & open-source desktop · runs locally, data never leaves your machine

At a glance

KNIME Analytics Platform is a free, open-source desktop workbench: you build data pipelines by wiring nodes on a canvas, with no programming required. Each node does one step (read a CSV, filter rows, plot a chart, train a model) and passes its output table to the next. A workflow runs left to right along its connections, reproducibly, every time, and is shareable as a portable "recipe". Download from knime.com (optional registration form, then install); docs at docs.knime.com. The desktop app is fully free; team sharing, scheduling and web execution need a paid Community Hub plan (Pro ~$19/mo, Team ~$99/mo for up to 3 users).

Core building blocks — key nodes

Node | What it does
CSV Reader | Read a CSV (e.g. a mass-spec, NGS, or flow-cytometry export) into a table
Excel Reader | Read an Excel sheet into a table
Row Filter | Keep only the rows you care about (e.g. a column above a threshold)
Concatenate | Stack two tables' rows into one combined table
Column Renamer | Tidy column names so tables line up and downstream nodes make sense
Histogram | Plot the distribution of a numeric column as an interactive chart view
Table Partitioner | Split a table into training and test sets (e.g. 80/20); called "Partitioning" in older KNIME
Decision Tree Learner / XGBoost Tree Ensemble Learner | Train a model on the training partition
Decision Tree Predictor / XGBoost Predictor | Apply the trained model to held-out test data to produce predictions
Scorer | Report accuracy and a confusion matrix so you can compare two models side by side

KNIME ships 300+ connectors (Excel, databases, cloud storage, REST APIs) plus built-in machine learning. The same canvas also hosts AI/LLM nodes (OpenAI, Ollama, or local models) wired in exactly the same drag-and-drop way.

Read the canvas — ports & traffic lights

Traffic light (at each node's base)
RED  not yet configured (freshly dropped node)
YELLOW  configured and ready to run
GREEN  executed successfully — output port now carries a table
Ports (the triangles)
Data goes in on the left, out on the right. Hover a port to see what kind of data it expects or produces. A wire carries the table from one node's output port to the next node's input port.

Reading a canvas is three things: nodes (each one step), ports (where data enters and leaves), and the traffic light (red → yellow → green). Sources sit on the left (CSV / Excel / database readers), transformers in the middle (Concatenate, Column Renamer, filters), analysis and output on the right — every step visible, so the workflow is self-documenting.
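
For readers who think in code, the traffic light can be pictured as a tiny state machine. This is only an illustrative sketch: the Node class and its configure/execute methods are hypothetical names invented here, not part of any KNIME API.

```python
from enum import Enum


class NodeState(Enum):
    RED = "not configured"
    YELLOW = "configured, ready to run"
    GREEN = "executed"


class Node:
    """Hypothetical stand-in for a canvas node; not a KNIME class."""

    def __init__(self, name):
        self.name = name
        self.state = NodeState.RED  # a freshly dropped node starts red

    def configure(self):
        # In this sketch, configuring always leaves the node ready but not yet run.
        self.state = NodeState.YELLOW

    def execute(self):
        if self.state is NodeState.RED:
            raise RuntimeError(f"{self.name} must be configured before it can run")
        self.state = NodeState.GREEN


reader = Node("CSV Reader")
states = [reader.state.name]
reader.configure()
states.append(reader.state.name)
reader.execute()
states.append(reader.state.name)
print(" -> ".join(states))
```

The point of the model is that a node cannot jump from red to green: it has to be configured first, which is why a red node on the canvas always means "open the configuration dialog".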

Build a workflow — your first 2 nodes

Step | Action
1 | Create a new workflow; drag a CSV Reader from the node repository onto the empty canvas and point it at a lab/research CSV
2 | Add a Row Filter; drag a connection from the CSV Reader's output port to the Row Filter's input port
3 | Configure the Row Filter to keep only the rows you want (e.g. one column above a threshold)
4 | Hit the green play button to execute the workflow
5 | Right-click the Row Filter and open its output table to see your filtered rows (no code written)
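
If it helps to see what those two nodes compute, here is a rough plain-Python equivalent of the CSV Reader → Row Filter pair. It is a sketch under assumptions: the sample data, the column names (sample_id, intensity) and the threshold of 100 are all made up for illustration, and the functions are stand-ins, not KNIME code.

```python
import csv
import io

# Hypothetical export; in practice this would be a file on disk.
RAW_CSV = """sample_id,intensity
S1,120.5
S2,48.0
S3,310.2
"""


def read_csv(text):
    """Stand-in for CSV Reader: parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def row_filter(rows, column, threshold):
    """Stand-in for Row Filter: keep rows whose numeric column exceeds the threshold."""
    return [row for row in rows if float(row[column]) > threshold]


table = read_csv(RAW_CSV)
filtered = row_filter(table, "intensity", 100.0)
for row in filtered:
    print(row["sample_id"], row["intensity"])
```

Each function takes a table and returns a table, which mirrors how one node's output port feeds the next node's input port.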

Grow it from there: add a Column Renamer and Concatenate to stack two messy exports into one clean table → add a Histogram to explore a column → add a Table Partitioner + Learner + Predictor branch for machine learning. Drive every node to all-green before moving on, and save the workflow so it re-runs and shares as a reproducible recipe.

Data & config — readers and combining

Bring data in
Reader nodes
Add a CSV Reader, Excel Reader, database or cloud reader for each source. Execute each and confirm a green light plus a populated output table. With 300+ connectors you can pull from Excel, databases, cloud storage and REST APIs onto one canvas.
Combine & tidy
Concatenate vs Column Renamer
Concatenate stacks two tables' rows together; Column Renamer tidies column names so they line up. Concatenate matches columns by name, not by a key, so first rename each export's columns to a shared set of names, then wire both branches into Concatenate so downstream nodes (and collaborators) can make sense of the result.
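
As a rough picture of what rename-then-stack does, the sketch below mimics the two nodes in plain Python. The function names, column names and sample rows are hypothetical, and filling missing cells with None over the union of columns is an assumption of this sketch rather than a statement about the node's settings.

```python
def rename_columns(rows, mapping):
    """Stand-in for Column Renamer: rename listed columns, leave the rest untouched."""
    return [{mapping.get(col, col): value for col, value in row.items()} for row in rows]


def concatenate(first, second):
    """Stand-in for Concatenate: stack rows over the union of column names.

    Cells that one table lacks are filled with None (an assumption of this sketch).
    """
    columns = []
    for row in first + second:
        for col in row:
            if col not in columns:
                columns.append(col)
    return [{col: row.get(col) for col in columns} for row in first + second]


# Two hypothetical exports whose headers do not line up yet.
export_a = [{"Sample ID": "S1", "Intensity": 120.5}]
export_b = [{"sample": "S2", "intensity": 48.0, "batch": "B7"}]

tidy_a = rename_columns(export_a, {"Sample ID": "sample_id", "Intensity": "intensity"})
tidy_b = rename_columns(export_b, {"sample": "sample_id"})
combined = concatenate(tidy_a, tidy_b)
for row in combined:
    print(row)
```

Without the renaming step the stacked table would carry both "Sample ID" and "sample" as separate, half-empty columns, which is the mess the Column Renamer is there to prevent.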
Machine learning split
Partition before you train
A Table Partitioner splits your clean table into training and test sets (e.g. 80/20). The Learner trains on the training partition only; the Predictor is applied to the held-out test partition — never train and test on the same rows.
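
The 80/20 idea can be sketched in a few lines of plain Python. This is an illustration only: the partition function, its seed parameter and the row_id column are hypothetical names, and a seeded shuffle is just one way to draw the split.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test) partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = [{"row_id": i} for i in range(10)]
train, test = partition(rows)

train_ids = {row["row_id"] for row in train}
test_ids = {row["row_id"] for row in test}
print(len(train), len(test))
print(train_ids.isdisjoint(test_ids))
```

The disjointness check is the code version of "never train and test on the same rows": every row lands in exactly one partition.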

Run, execute & share

Action | What it does
Green play button | Execute the workflow; each node turns green in order as its step completes
Run a single node | Execute just one configured node to check it goes yellow → green before continuing
Right-click → open output table | Inspect any node's result table to confirm it is what you expected
Open a node's view | For visualisation nodes (e.g. Histogram), open the interactive chart; it re-runs with the pipeline, so the figure is part of the recipe
Save the workflow | Persist the whole pipeline as a portable, reproducible recipe a collaborator can re-run
KNIME Hub (paid for teams) | Publish/share workflows; a KNIME account is only needed for Hub. Team sharing, scheduling & web execution require a paid Community Hub plan

Reproducibility is the point: because every step and connection is a visible node, a collaborator who opens your saved workflow on their own machine can regenerate the exact same filtered table, figure, or model predictions — no Python environment, and your data never had to leave your machine.

Tips

Watch the lights to debug: a yellow node is configured but hasn't run, and a red node still needs configuring. If a run stops partway, the last green node marks how far it got; the problem is in the first non-green node after it.

Compare models fairly: to choose between two models, wire a Scorer onto each Predictor's output and read the accuracy figures and confusion matrices side by side.
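
To make the two Scorer outputs concrete, the sketch below computes accuracy and a confusion matrix for two sets of predictions. Everything here is illustrative: the score function is a hypothetical stand-in, and the labels and predictions are invented toy values, not results from any real model.

```python
from collections import Counter


def score(actual, predicted):
    """Stand-in for Scorer: return (accuracy, confusion matrix).

    The confusion matrix is a Counter keyed by (actual label, predicted label).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    confusion = Counter(zip(actual, predicted))
    correct = sum(count for (a, p), count in confusion.items() if a == p)
    return correct / len(actual), confusion


# Toy held-out labels and two models' predictions for the same rows.
actual = ["pos", "pos", "neg", "neg", "neg"]
model_a = ["pos", "neg", "neg", "neg", "pos"]
model_b = ["pos", "pos", "neg", "neg", "pos"]

acc_a, cm_a = score(actual, model_a)
acc_b, cm_b = score(actual, model_b)
print("model A accuracy:", acc_a)
print("model B accuracy:", acc_b)
print("model A missed positives:", cm_a[("pos", "neg")])
print("model B missed positives:", cm_b[("pos", "neg")])
```

Scoring both models against the same held-out rows is what makes the comparison fair; the confusion matrix then shows which kind of mistake each model makes, something the accuracy figure alone hides.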

Mind the memory caveat: performance degrades once a dataset is large enough to exceed KNIME's allocated Java heap (the -Xmx setting in knime.ini), and the visual canvas gets hard to read with complex multi-branch workflows (node sprawl). Keep pipelines tidy and split very large inputs into smaller pieces.