KNIME — Quick reference

Visual, no-code data science · drag-and-drop node canvas · free & open-source desktop · runs locally, data never leaves your machine

At a glance

KNIME Analytics Platform is a free, open-source desktop workbench: you build data pipelines by wiring nodes on a canvas, with no programming required. Each node does one step (read a CSV, filter rows, plot a chart, train a model) and passes its output table to the next. A workflow runs left-to-right along its connections, reproducibly, every time, and is shareable as a portable "recipe". Download from knime.com (optional registration form, then install); docs at docs.knime.com. The desktop app is fully free; team sharing, scheduling and web execution need a paid Community Hub plan (Pro ~$19/mo, Team ~$99/mo for up to 3 users).

Core building blocks — key nodes

Node	What it does
`CSV Reader`	Read a CSV (e.g. a mass-spec, NGS, or flow-cytometry export) into a table
`Excel Reader`	Read an Excel sheet into a table
`Row Filter`	Keep only the rows you care about (e.g. a column above a threshold)
`Concatenate`	Stack two tables' rows into one combined table
`Column Renamer`	Tidy column names so tables line up and downstream nodes make sense
`Histogram`	Plot the distribution of a numeric column as an interactive chart view
`Table Partitioner`	Split a table into training and test sets (e.g. 80/20) — called "Partitioning" in older KNIME
`Decision Tree Learner` / `XGBoost Tree Ensemble Learner`	Train a model on the training partition
`Decision Tree Predictor` / `XGBoost Predictor`	Apply the trained model to held-out test data to produce predictions
`Scorer`	Report accuracy and a confusion matrix so you can compare two models side by side

KNIME ships 300+ connectors (Excel, databases, cloud storage, REST APIs) plus built-in machine learning. The same canvas also hosts AI/LLM nodes (OpenAI, Ollama, or local models) wired in exactly the same drag-and-drop way.

Read the canvas — ports & traffic lights

Traffic light (at each node's base)

RED not yet configured (freshly dropped node)
YELLOW configured and ready to run
GREEN executed successfully — output port now carries a table

Ports (the triangles)

Data goes in on the left, out on the right. Hover a port to see what kind of data it expects or produces. A wire carries the table from one node's output port to the next node's input port.

Reading a canvas is three things: nodes (each one step), ports (where data enters and leaves), and the traffic light (red → yellow → green). Sources sit on the left (CSV / Excel / database readers), transformers in the middle (Concatenate, Column Renamer, filters), analysis and output on the right — every step visible, so the workflow is self-documenting.

Build a workflow — your first 2 nodes

Step	Action
`1`	Create a new workflow; drag a `CSV Reader` from the node repository onto the empty canvas and point it at a lab/research CSV
`2`	Add a `Row Filter`; drag a connection from the CSV Reader's output port to the Row Filter's input port
`3`	Configure the Row Filter to keep only the rows you want (e.g. one column above a threshold)
`4`	Hit the green play button to execute the workflow
`5`	Right-click the Row Filter and open its output table to see your filtered rows — no code written
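
For readers who think in code, the two-node workflow above does roughly what the following Python sketch does. It is only an analogy, not how KNIME works internally. The column names (`sample`, `intensity`), the inline data, and the threshold of 1000 are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for a lab CSV export; in KNIME the CSV Reader
# would point at a real file instead.
RAW_CSV = """sample,intensity
A1,250.0
A2,1800.5
A3,1200.0
A4,90.0
"""


def read_csv(text):
    """Rough analogue of CSV Reader: parse text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def row_filter(rows, column, threshold):
    """Rough analogue of Row Filter: keep rows whose column exceeds threshold."""
    return [row for row in rows if float(row[column]) > threshold]


table = read_csv(RAW_CSV)
filtered = row_filter(table, "intensity", 1000.0)

# Equivalent of opening the Row Filter's output table.
for row in filtered:
    print(row["sample"], row["intensity"])
```

In KNIME the same result comes from configuring the two nodes' dialogs; nothing like this has to be typed.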

Grow it from there: add a Column Renamer and Concatenate to stack two messy exports into one clean table → add a Histogram to explore a column → add a Table Partitioner + Learner + Predictor branch for machine learning. Drive every node to all-green before moving on, and save the workflow so it re-runs and shares as a reproducible recipe.

Data & config — readers and combining

Bring data in

Reader nodes

Add a CSV Reader, Excel Reader, database or cloud reader for each source. Execute each and confirm a green light plus a populated output table. With 300+ connectors you can pull from Excel, databases, cloud storage and REST APIs onto one canvas.

Combine & tidy

Concatenate vs Column Renamer

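Concatenate stacks two tables' rows together, matching columns by name; Column Renamer tidies column names so they line up. Rename first so both tables share the same headers, then wire both readers into Concatenate. No key column is needed for stacking (a key only matters when matching rows side by side, which is the Joiner node's job). Clean names also help downstream nodes (and collaborators) make sense of the result.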
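
As a rough mental model, the rename-then-stack pattern looks like the Python sketch below. It assumes Concatenate is set to keep the union of columns and fills the gaps with missing values; the helper names, column names, and rows are hypothetical.

```python
def rename_columns(rows, mapping):
    """Rough analogue of Column Renamer: rename listed columns, keep the rest."""
    return [
        {mapping.get(name, name): value for name, value in row.items()}
        for row in rows
    ]


def concatenate(first, second):
    """Rough analogue of Concatenate: stack rows over the union of columns.

    Cells that a table does not have are filled with None (a missing value).
    """
    stacked = first + second
    columns = []
    for row in stacked:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in stacked]


# Two hypothetical exports whose headers do not line up yet.
export_a = [{"Sample ID": "A1", "Intensity": 250.0}]
export_b = [{"sample": "B1", "intensity": 410.0, "batch": "2"}]

tidy_a = rename_columns(export_a, {"Sample ID": "sample", "Intensity": "intensity"})
combined = concatenate(tidy_a, export_b)
print(combined)
```

Without the rename step, `Sample ID` and `sample` would end up as two separate, half-empty columns, which is why tidying names comes before stacking.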

Machine learning split

Partition before you train

A Table Partitioner splits your clean table into training and test sets (e.g. 80/20). The Learner trains on the training partition only; the Predictor is applied to the held-out test partition — never train and test on the same rows.
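
The idea behind an 80/20 split can be sketched in a few lines of Python. This is an illustration of random partitioning in general, not KNIME's implementation; the function name, the fixed seed, and the ten dummy rows are assumptions.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows, then cut it into (train, test)."""
    shuffled = list(rows)
    # A fixed seed makes the split repeatable from run to run.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = [{"id": i} for i in range(10)]
train, test = partition(rows)

# No row may appear in both partitions.
train_ids = {row["id"] for row in train}
test_ids = {row["id"] for row in test}
print(len(train), len(test), train_ids & test_ids)
```

Because the two partitions never share a row, accuracy measured on the test partition reflects data the model has not seen.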

Run, execute & share

Action	What it does
Green play button	Execute the workflow; each node turns green in order as its step completes
Run a single node	Execute just one node to check it turns green (red → yellow once configured, yellow → green once executed) before continuing
Right-click → open output table	Inspect any node's result table to confirm it is what you expected
Open a node's view	For visualisation nodes (e.g. `Histogram`), open the interactive chart — it re-runs with the pipeline, so the figure is part of the recipe
Save the workflow	Persist the whole pipeline as a portable, reproducible recipe a collaborator can re-run
KNIME Hub (paid for teams)	Publish/share workflows; a KNIME account is only needed for Hub. Team sharing, scheduling & web execution require a paid Community Hub plan

Reproducibility is the point: because every step and connection is a visible node, a collaborator who opens your saved workflow on their own machine can regenerate the exact same filtered table, figure, or model predictions — no Python environment, and your data never had to leave your machine.

Tips

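Watch the lights to debug: a yellow node is configured but hasn't run, and a red node still needs configuring. You can tell at a glance where a workflow stopped: everything up to the last green node ran, so the problem is the first non-green node directly downstream of it.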
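
The same reading of the lights can be written as a small Python sketch. It assumes a simple linear chain of nodes (real workflows can branch), and the `Light` enum and node list are hypothetical illustrations rather than anything KNIME exposes.

```python
from enum import Enum


class Light(Enum):
    RED = "not configured"
    YELLOW = "configured, not executed"
    GREEN = "executed"


def first_stalled_node(chain):
    """Return the first (name, light) pair that is not green, or None if all ran."""
    for name, light in chain:
        if light is not Light.GREEN:
            return name, light
    return None


# Hypothetical left-to-right chain as it might look after a partial run.
chain = [
    ("CSV Reader", Light.GREEN),
    ("Row Filter", Light.GREEN),
    ("Column Renamer", Light.YELLOW),
    ("Histogram", Light.YELLOW),
]

stalled = first_stalled_node(chain)
print(stalled)
```

Here the node to inspect is the Column Renamer: it sits immediately after the last green node, so it is where execution stopped.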

Compare models fairly: to choose between two models, wire a Scorer onto each Predictor's output and read the accuracy figures and confusion matrices side by side.
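
To make the two figures concrete, here is a minimal Python sketch of what accuracy and a confusion matrix mean. It is not the Scorer node's implementation; the labels and both models' predictions are invented for illustration.

```python
from collections import Counter


def score(actual, predicted):
    """Return (accuracy, confusion), where confusion counts (actual, predicted) pairs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return correct / len(actual), confusion


# Hypothetical held-out labels and the predictions of two models.
actual = ["pos", "pos", "neg", "neg", "neg"]
model_a = ["pos", "neg", "neg", "neg", "neg"]
model_b = ["pos", "pos", "neg", "pos", "pos"]

acc_a, cm_a = score(actual, model_a)
acc_b, cm_b = score(actual, model_b)

print("model A", acc_a, dict(cm_a))
print("model B", acc_b, dict(cm_b))
```

Accuracy alone can hide how a model fails: in this made-up case model A misses one positive while model B raises two false alarms, which only the confusion counts reveal.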

Mind the memory caveat: performance degrades on datasets large enough to exceed KNIME's allocated heap, and the visual canvas gets hard to read with complex multi-branch workflows (node sprawl). Keep pipelines tidy and split very large datasets into smaller pieces before processing them.