Add machine learning: a Learner / Predictor branch

USE 0 - 15 min

Split your data, then train and predict as nodes

Machine learning in KNIME is built from ordinary nodes too. The standard pattern is: a Table Partitioner node splits your clean table into a training set and a test set; a Learner node trains a model on the training set; a Predictor node applies that trained model to the test set. No code — you wire the boxes.

1 Start from your clean table and add a Table Partitioner node to split it into training and test sets (for example an 80/20 split).
2 Add a Decision Tree Learner (or XGBoost Tree Ensemble Learner), wire the training partition into it, and in its configuration choose the column you want to predict as the target, then train the model.
3 Add the matching Predictor node, wiring in both the trained model from the Learner and the test partition.
4 Execute the branch and open the Predictor's output table to see predictions alongside the actual values.

✓

The Predictor node is green and its output table shows predicted values next to your test data.
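
If it helps to see the same pattern outside KNIME, here is a minimal plain-Python sketch of the three roles. It is an analogy, not what KNIME runs internally: the function names, the 80/20 default, the seed, the toy data, and the very simple one-threshold "stump" learner are all assumptions made for illustration.

```python
import random


def partition(rows, train_fraction=0.8, seed=42):
    """Partitioner: shuffle a copy of the rows and split it into (training, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def learn(training_rows):
    """Learner: pick the single threshold that classifies the training rows best."""
    labels = sorted({label for _, label in training_rows})
    best_model, best_hits = None, -1
    for threshold in sorted({value for value, _ in training_rows}):
        for below in labels:
            for above in labels:
                hits = sum(
                    (below if value <= threshold else above) == label
                    for value, label in training_rows
                )
                if hits > best_hits:
                    best_model = {"threshold": threshold, "below": below, "above": above}
                    best_hits = hits
    return best_model


def predict(model, test_rows):
    """Predictor: apply the trained model, keeping the actual label alongside."""
    return [
        (value, label, model["below"] if value <= model["threshold"] else model["above"])
        for value, label in test_rows
    ]


# Hypothetical clean table: (feature value, known label) pairs.
rows = [(float(x), "low" if x < 50 else "high") for x in range(100)]

training, test = partition(rows)      # the split
model = learn(training)               # trained on the training partition only
predictions = predict(model, test)    # (value, actual, predicted) for held-out rows
```

The model is just a piece of data handed from `learn` to `predict`, which mirrors the model connection you wire from the Learner into the Predictor.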

UNDERSTAND 15 - 35 min

The Learner / Predictor pattern, end to end

The Learner trains; the Predictor applies. Keeping them as two separate nodes, both fed by a single Table Partitioner node, is what lets you compare models fairly: every model is trained on the same training rows and scored on the same held-out test rows. To evaluate and compare two models, attach a Scorer node to each Predictor's output. Each Scorer produces an accuracy table and a confusion matrix for its model, so you can read the accuracy figures side by side and pick the better one.
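
To make the Scorer's two outputs concrete, here is a small plain-Python sketch of what "accuracy" and "confusion matrix" mean. The `score` function and the two prediction lists are hypothetical, and this is not the Scorer node's actual implementation.

```python
from collections import Counter


def score(actual, predicted):
    """Return (accuracy, confusion) for two equally long lists of labels.

    confusion[(a, p)] counts the rows whose actual label is a and whose
    predicted label is p.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    confusion = Counter(zip(actual, predicted))
    correct = sum(count for (a, p), count in confusion.items() if a == p)
    accuracy = correct / len(actual) if actual else 0.0
    return accuracy, confusion


# Hypothetical outputs of two Predictor branches on the same six test rows.
actual  = ["yes", "yes", "no", "no", "yes", "no"]
model_a = ["yes", "no",  "no", "no", "yes", "no"]
model_b = ["yes", "yes", "no", "yes", "no", "yes"]

accuracy_a, confusion_a = score(actual, model_a)
accuracy_b, confusion_b = score(actual, model_b)
better = "A" if accuracy_a >= accuracy_b else "B"
```

Here model A gets 5 of 6 rows right and model B gets 3 of 6, so A is the better pick. The confusion counts show where each model goes wrong: A misses one "yes", while B calls two "no" rows "yes".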

Key concept

KNIME builds machine learning from the same wired nodes as everything else: a Table Partitioner node (called 'Partitioning' in older KNIME) splits the data, a Learner (Decision Tree, XGBoost) trains a model, and a Predictor applies it to held-out data. To compare two models, wire a Scorer node onto each Predictor's output — the Scorer reports accuracy and a confusion matrix so you can read the numbers side by side and pick the better model. Because the whole ML pipeline is on the canvas, it is reproducible and shareable as a recipe, all without writing code. The same canvas also hosts AI/LLM nodes (OpenAI, Ollama, or local models) wired in the same way.
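
One way to see why the held-out data matters is a deliberately bad learner that only memorizes its training rows. The sketch below is plain Python with hypothetical names and toy data, and it uses a simple "first 80 rows" split to stay deterministic.

```python
from collections import Counter


def learn_by_memorizing(training_rows):
    """A deliberately bad learner: remember each training row's label by its key."""
    lookup = {key: label for key, label in training_rows}
    # For rows it has never seen, fall back to the most common training label.
    fallback = Counter(label for _, label in training_rows).most_common(1)[0][0]
    return lookup, fallback


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the known label."""
    lookup, fallback = model
    hits = sum(lookup.get(key, fallback) == label for key, label in rows)
    return hits / len(rows)


# Hypothetical data: a row ID as the only feature, so memorizing cannot generalize.
rows = [(i, "a" if i % 3 == 0 else "b") for i in range(100)]
training, test = rows[:80], rows[80:]

model = learn_by_memorizing(training)
score_on_training_rows = accuracy(model, training)   # rows the model has already seen
score_on_test_rows = accuracy(model, test)           # held-out rows
```

Scored on its own training rows, the memorizer looks perfect (1.0). Scored on the held-out rows it drops to 0.65, which is only the share of the majority label. The held-out score is the honest one.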

?Why does the Table Partitioner node sit before both the Learner and the Predictor, and what would go wrong if you trained and tested on the same rows?
?How would you put a Decision Tree branch and an XGBoost branch on the same canvas, each with its own Scorer node, to compare their accuracy figures?
?What makes a no-code ML pipeline like this easier to hand to a collaborator than the equivalent training script?

BUILD 35 - 55 min

Train, predict, and turn it into a real deliverable

Wire a full Learner / Predictor branch onto your clean data so the workflow goes from raw export all the way to predictions you could act on.

Your task

Extend your pipeline with a Table Partitioner, a Learner, and a Predictor to produce a model and its predictions on held-out data.

1 Wire a Table Partitioner node onto your clean table to create training and test sets.
2 Train a model with a Decision Tree Learner (or XGBoost Tree Ensemble Learner) on the training partition, with the column you want to predict set as its target.
3 Apply it with the matching Predictor (e.g. Decision Tree Predictor or XGBoost Predictor) on the test partition.
4 Open the Predictor output and sanity-check the predictions against the known values.
5 Optionally add a second Learner/Predictor branch fed by the same Table Partitioner, and put a Scorer node on each Predictor's output to compare accuracy figures and pick the better model. Then save the workflow as a reproducible recipe.
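
A comparison between two branches is only like-for-like when both are scored on the same held-out rows, and a recipe is only reproducible when re-running it gives the same split. A minimal plain-Python sketch of both ideas, assuming a seeded shuffle (the function name, seed value, and row IDs are hypothetical, and this is not KNIME's implementation):

```python
import random


def partition(rows, train_fraction=0.8, seed=7):
    """Seeded shuffle-and-split: the same seed always yields the same two sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(50))                      # hypothetical row IDs

# One split, shared by both branches: each model is tested on identical rows.
training, test = partition(rows)
branch_a_test = test
branch_b_test = test

# Re-running the "workflow" later reproduces the split exactly.
training_again, test_again = partition(rows)
```

If each branch drew its own random split instead, a difference in accuracy could come from the rows rather than from the models.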

Deliverable

A saved KNIME workflow that reads real data, trains a model, and outputs predictions on held-out data.