jz1b-2026-02-09_15_48_52-101-pdfsam-hands-on-machine-learning-with-scikit-learn-keras-and-tensorflow-2nd-edition-aurelien-geron.pdf
from sklearn.compose import ColumnTransformer

full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])
housing_prepared = full_pipeline.fit_transform(housing)
First we import the ColumnTransformer class, next we get the list of numerical column names and the list of categorical column names, and then we construct a ColumnTransformer. The constructor requires a list of tuples, where each tuple contains a name, a transformer, and a list of names (or indices) of columns that the transformer should be applied to. In this example, we specify that the numerical columns should be transformed using the num_pipeline that we defined earlier, and the categorical columns should be transformed using a OneHotEncoder. Finally, we apply this ColumnTransformer to the housing data: it applies each transformer to the appropriate columns and concatenates the outputs along the second axis (the transformers must return the same number of rows).
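As a self-contained sketch of the same construction, the snippet below builds a ColumnTransformer on a tiny made-up DataFrame. The toy values, the `toy_` names, and the two-step numeric pipeline are assumptions for illustration only; they stand in for the housing data and the num_pipeline defined earlier.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the housing data (hypothetical values).
toy = pd.DataFrame({
    "median_income": [8.3, 7.2, np.nan, 5.6],
    "housing_median_age": [41.0, 21.0, 52.0, 52.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "<1H OCEAN"],
})

toy_num_attribs = ["median_income", "housing_median_age"]
toy_cat_attribs = ["ocean_proximity"]

# Simplified numeric pipeline (an assumption): impute, then standardize.
toy_num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])

# Each tuple is (name, transformer, columns the transformer applies to).
toy_full_pipeline = ColumnTransformer([
    ("num", toy_num_pipeline, toy_num_attribs),
    ("cat", OneHotEncoder(), toy_cat_attribs),
])

toy_prepared = toy_full_pipeline.fit_transform(toy)
# 4 rows; 2 numeric columns followed by 3 one-hot columns
print(toy_prepared.shape)
```

The outputs are concatenated in the order the transformers are listed, so the two scaled numeric columns come first and the one-hot columns follow.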
Note that the OneHotEncoder returns a sparse matrix, while the num_pipeline returns a dense matrix. When there is such a mix of sparse and dense matrices, the ColumnTransformer estimates the density of the final matrix (i.e., the ratio of nonzero cells), and it returns a sparse matrix if the density is lower than a given threshold (by default, sparse_threshold=0.3). In this example, it returns a dense matrix. And that's it! We have a preprocessing pipeline that takes the full housing data and applies the appropriate transformations to each column.
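The effect of sparse_threshold can be seen in a small sketch. The data here is hypothetical: a categorical column with ten distinct values makes the one-hot block mostly zeros, so the estimated density falls below the default threshold. The helper name `build` is also just for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one numeric column, one categorical column
# with 10 distinct categories (a very sparse one-hot encoding).
df = pd.DataFrame({
    "x": np.arange(10, dtype=float),
    "cat": [f"c{i}" for i in range(10)],
})

def build(threshold):
    """Illustrative helper: same transformers, different sparse_threshold."""
    return ColumnTransformer(
        [("num", StandardScaler(), ["x"]),
         ("cat", OneHotEncoder(), ["cat"])],
        sparse_threshold=threshold,
    )

# Default threshold: the combined output is sparse enough to stay sparse.
out_default = build(0.3).fit_transform(df)

# A threshold of 0 means the density is never below it: always dense.
out_dense = build(0.0).fit_transform(df)

print(sparse.issparse(out_default), sparse.issparse(out_dense))
```

Setting sparse_threshold=0 is a simple way to force a dense NumPy array when a downstream estimator does not accept sparse input.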
Instead of using a transformer, you can specify the string "drop" if you want the columns to be dropped, or you can specify "passthrough" if you want the columns to be left untouched. By default, the remaining columns (i.e., the ones that were not listed) will be dropped, but you can set the remainder hyperparameter to any transformer (or to "passthrough") if you want these columns to be handled differently.
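A minimal sketch of these options, using a made-up four-column DataFrame (the column names and values are assumptions for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [10.0, 20.0, 30.0],
    "c": [100.0, 200.0, 300.0],
    "d": [7.0, 8.0, 9.0],
})

# "a" is scaled, "b" is dropped explicitly, "c" is left untouched.
# "d" is not listed, so the default remainder="drop" discards it.
ct_default = ColumnTransformer([
    ("scale", StandardScaler(), ["a"]),
    ("skip", "drop", ["b"]),
    ("keep", "passthrough", ["c"]),
])
out_default = ct_default.fit_transform(df)      # scaled a, c

# Same transformers, but unlisted columns are now passed through
# and appended after the listed ones.
ct_keep_rest = ColumnTransformer([
    ("scale", StandardScaler(), ["a"]),
    ("skip", "drop", ["b"]),
    ("keep", "passthrough", ["c"]),
], remainder="passthrough")
out_keep_rest = ct_keep_rest.fit_transform(df)  # scaled a, c, d

print(out_default.shape, out_keep_rest.shape)
```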
Prepare the Data for Machine Learning Algorithms
If you are using Scikit-Learn 0.19 or earlier, you can use a third-party library such as sklearn-pandas, or you can roll out your own custom transformer to get the same functionality as the ColumnTransformer. Alternatively, you can use the FeatureUnion class, which can apply different transformers and concatenate their outputs. But you cannot specify different columns for each transformer; they all apply to the whole data. It is possible to work around this limitation using a custom transformer for column selection (see the Jupyter notebook for an example).
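The column-selection workaround might look roughly like the sketch below. This is not the notebook's code: the `ColumnSelector` class, the toy DataFrame, and the branch names are hypothetical, chosen only to show how a selector placed at the start of each pipeline lets a FeatureUnion act on different columns.

```python
import pandas as pd
from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: pick DataFrame columns, return a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return X[self.attribute_names].values

df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [10.0, 20.0, 30.0, 40.0],
    "cat": ["a", "b", "a", "c"],
})

# Every branch receives the whole DataFrame, so each one starts
# by selecting the columns it should work on.
num_branch = Pipeline([
    ("select", ColumnSelector(["x1", "x2"])),
    ("scale", StandardScaler()),
])
cat_branch = Pipeline([
    ("select", ColumnSelector(["cat"])),
    ("one_hot", OneHotEncoder()),
])

union = FeatureUnion([("num", num_branch), ("cat", cat_branch)])
out = union.fit_transform(df)

# The one-hot branch is sparse, so convert to dense if needed.
out_dense = out.toarray() if sparse.issparse(out) else out
print(out_dense.shape)
```

Compared with a ColumnTransformer, the column lists live inside each branch rather than in the top-level constructor, which is more verbose but works on older Scikit-Learn releases.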