Fit a single configuration

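Auto-sklearn uses Scikit-Learn Pipelines to search for the best combination of machine learning algorithms and their hyperparameter configuration for a given task. To further improve performance, these pipelines are combined into an ensemble using Ensemble Selection from Caruana (2004).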

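This example shows how to fit one of these pipelines, either with a user-defined configuration or with a configuration sampled at random from the configuration space.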
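To make the ensembling step more concrete, here is a minimal sketch of greedy forward selection with replacement, the core idea behind Ensemble Selection. It is not Auto-sklearn's implementation: the function names, the toy validation probabilities, and the use of accuracy with a 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def ensemble_selection(val_probs, y_val, ensemble_size):
    """Greedily pick models (repeats allowed) that maximise validation accuracy.

    val_probs maps a hypothetical model name to its predicted P(class=1)
    for every validation sample.
    """
    selected = []
    running_sum = [0.0] * len(y_val)
    for _ in range(ensemble_size):
        best_name, best_score = None, -1.0
        for name, probs in val_probs.items():
            # Average the probabilities of the current ensemble plus this candidate.
            size = len(selected) + 1
            averaged = [(s + p) / size for s, p in zip(running_sum, probs)]
            score = accuracy(y_val, [int(a >= 0.5) for a in averaged])
            # Strict comparison: on ties, the first model encountered wins.
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
        running_sum = [s + p for s, p in zip(running_sum, val_probs[best_name])]
    return selected


# Toy validation labels and per-model probabilities (illustrative values only).
y_val = [1, 0, 1, 0]
val_probs = {
    "a": [0.9, 0.6, 0.8, 0.2],
    "b": [0.4, 0.1, 0.7, 0.3],
    "c": [0.6, 0.7, 0.3, 0.6],
}

selected = ensemble_selection(val_probs, y_val, ensemble_size=3)
# How often a model was picked determines its weight in the ensemble.
weights = {name: count / len(selected) for name, count in Counter(selected).items()}
print(selected)
print(weights)
```

Because a model can be selected more than once, the selection counts double as ensemble weights, so a strong model can dominate without the weak ones being forced in.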

The pipelines that Auto-Sklearn fits are compatible with the Scikit-Learn API. You can find more documentation about Scikit-Learn models here: <https://scikit-learn.cn/stable/getting_started.html>
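As a reminder of what Scikit-Learn API compatibility means in practice, the sketch below builds a plain Scikit-Learn `Pipeline` by hand and exercises `fit`, `predict`, and `named_steps`. It does not use Auto-sklearn at all: the synthetic dataset, the step names, and the hyperparameter values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic binary classification problem (illustrative only).
X_toy, y_toy = make_classification(n_samples=200, n_features=10, random_state=0)
X_toy_train, X_toy_test, y_toy_train, y_toy_test = train_test_split(
    X_toy, y_toy, test_size=0.5, random_state=3
)

# A hand-built pipeline: one preprocessing step followed by a classifier.
toy_pipeline = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        (
            "classifier",
            RandomForestClassifier(
                n_estimators=20, min_samples_split=11, random_state=1
            ),
        ),
    ]
)

# The same fit/predict calls work on any object that follows the Pipeline API.
toy_pipeline.fit(X_toy_train, y_toy_train)
toy_predictions = toy_pipeline.predict(X_toy_test)
toy_accuracy = accuracy_score(y_toy_test, toy_predictions)

print(list(toy_pipeline.named_steps))
print("Accuracy:", toy_accuracy)
```

Steps are addressed by name through `named_steps`, which is how the example on this page later inspects the classifier that was fitted.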

import numpy as np
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics

from ConfigSpace.configuration_space import Configuration

import autosklearn.classification

Data Loading

X, y = sklearn.datasets.fetch_openml(data_id=3, return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.5, random_state=3
)

Define an estimator

cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=60,
    memory_limit=4096,
    # We will limit the configuration space only to
    # have RandomForest as a valid model. We recommend enabling all
    # possible models to get a better performance.
    include={"classifier": ["random_forest"]},
    delete_tmp_folder_after_terminate=False,
)

Fit a user-provided configuration

# Create a configuration with a user-defined min_samples_split for the
# Random Forest. To learn how the ConfigSpace package works, see:
# https://automl.net.cn/ConfigSpace/master/
cs = cls.get_configuration_space(X, y, dataset_name="kr-vs-kp")
config = cs.sample_configuration()
# Note: this writes directly to the private _values mapping of the sampled configuration.
config._values["classifier:random_forest:min_samples_split"] = 11

# Make sure that your changed configuration complies with the configuration space
config.is_valid_configuration()

pipeline, run_info, run_value = cls.fit_pipeline(
    X=X_train,
    y=y_train,
    dataset_name="kr-vs-kp",
    config=config,
    X_test=X_test,
    y_test=y_test,
)

# This object complies with Scikit-Learn Pipeline API.
# https://scikit-learn.cn/stable/modules/generated/sklearn.pipeline.Pipeline.html
print(pipeline.named_steps)

# The fit_pipeline command also returns a named tuple with the pipeline constraints
print(run_info)

# The fit_pipeline command also returns a named tuple with train/test performance
print(run_value)

# We can make sure that our pipeline configuration was honored as follows
print("Passed Configuration:", pipeline.config)
print("Random Forest:", pipeline.named_steps["classifier"].choice.estimator)

# We can also search for new configurations using the fit() method.
# Any configurations found by Auto-Sklearn, even the ones created using
# fit_pipeline(), are stored to disk and can be used for Ensemble Selection.
cls.fit(X, y, dataset_name="kr-vs-kp")

/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/data/target_validator.py:187: UserWarning: Fitting transformer with a pandas series which has the dtype category. Inverse transform may not be able preserve dtype when converting to np.ndarray
  warnings.warn(
{'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d3f33b50>, 'balancing': Balancing(random_state=1, strategy='weighting'), 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d24aed00>, 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d3f6f6d0>}
RunInfo(config=Configuration(values={
  'balancing:strategy': 'weighting',
  'classifier:__choice__': 'random_forest',
  'classifier:random_forest:bootstrap': 'True',
  'classifier:random_forest:criterion': 'gini',
  'classifier:random_forest:max_depth': 'None',
  'classifier:random_forest:max_features': 0.9678506216566037,
  'classifier:random_forest:max_leaf_nodes': 'None',
  'classifier:random_forest:min_impurity_decrease': 0.0,
  'classifier:random_forest:min_samples_leaf': 4,
  'classifier:random_forest:min_samples_split': 11,
  'classifier:random_forest:min_weight_fraction_leaf': 0.0,
  'data_preprocessor:__choice__': 'feature_type',
  'data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding',
  'data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer',
  'data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.003464940795376728,
  'feature_preprocessor:__choice__': 'fast_ica',
  'feature_preprocessor:fast_ica:algorithm': 'deflation',
  'feature_preprocessor:fast_ica:fun': 'cube',
  'feature_preprocessor:fast_ica:whiten': 'False',
})
, instance=None, instance_specific=None, seed=1, cutoff=60, capped=False, budget=0.0, source_id=0)
RunValue(cost=0.06161137440758291, time=32.43082547187805, status=<StatusType.SUCCESS: 1>, starttime=1663663980.6781428, endtime=1663664013.1358533, additional_info={'duration': 32.330575466156006, 'num_run': 2, 'train_loss': 0.004670714619336769, 'configuration_origin': None})
Passed Configuration: Configuration(values={
  'balancing:strategy': 'weighting',
  'classifier:__choice__': 'random_forest',
  'classifier:random_forest:bootstrap': 'True',
  'classifier:random_forest:criterion': 'gini',
  'classifier:random_forest:max_depth': 'None',
  'classifier:random_forest:max_features': 0.9678506216566037,
  'classifier:random_forest:max_leaf_nodes': 'None',
  'classifier:random_forest:min_impurity_decrease': 0.0,
  'classifier:random_forest:min_samples_leaf': 4,
  'classifier:random_forest:min_samples_split': 11,
  'classifier:random_forest:min_weight_fraction_leaf': 0.0,
  'data_preprocessor:__choice__': 'feature_type',
  'data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding',
  'data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer',
  'data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.003464940795376728,
  'feature_preprocessor:__choice__': 'fast_ica',
  'feature_preprocessor:fast_ica:algorithm': 'deflation',
  'feature_preprocessor:fast_ica:fun': 'cube',
  'feature_preprocessor:fast_ica:whiten': 'False',
})

Random Forest: RandomForestClassifier(max_features=62, min_samples_leaf=4,
                       min_samples_split=11, n_estimators=512, n_jobs=1,
                       random_state=1, warm_start=True)
/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/data/target_validator.py:187: UserWarning: Fitting transformer with a pandas series which has the dtype category. Inverse transform may not be able preserve dtype when converting to np.ndarray
  warnings.warn(
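
The numbers in the RunValue output above can be turned into more familiar quantities. The sketch below assumes the default accuracy metric and the default holdout resampling strategy, in which case the reported cost is 1 minus the accuracy on the internally held-out split. The variable names are illustrative, and the values are copied from the printed output.

```python
# Values copied from the RunValue / RunInfo output above.
cost = 0.06161137440758291
train_loss = 0.004670714619336769
start_time = 1663663980.6781428  # Unix timestamp
end_time = 1663664013.1358533  # Unix timestamp
per_run_time_limit = 60  # seconds, reported as cutoff=60 in RunInfo

# Assumption: the metric is accuracy, so loss = 1 - accuracy.
validation_accuracy = 1.0 - cost
train_accuracy = 1.0 - train_loss

# Wall-clock duration of the run and whether it stayed within the limit.
wall_clock_seconds = end_time - start_time
within_limit = wall_clock_seconds <= per_run_time_limit

print(f"Validation accuracy: {validation_accuracy:.4f}")
print(f"Train accuracy: {train_accuracy:.4f}")
print(f"Wall-clock seconds: {wall_clock_seconds:.2f}")
print(f"Within per-run limit: {within_limit}")
```

The gap between train and validation accuracy is a rough indicator of how much the sampled configuration overfits the training split.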

Total running time of the script: (2 minutes 46.876 seconds)
