Fit a single configuration
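Auto-sklearn uses Scikit-Learn Pipelines to search for the best combination of machine learning algorithms and hyperparameter configurations for a given task. To further improve performance, these pipelines are combined into an ensemble using Ensemble Selection from Caruana (2004).
This example shows how to fit one of these pipelines, either with a user-defined configuration or with a configuration randomly sampled from the configuration space.
Pipelines fitted by Auto-Sklearn are compatible with the Scikit-Learn API. More documentation on Scikit-Learn models is available here: https://scikit-learn.cn/stable/getting_started.html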
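To make the Ensemble Selection idea mentioned above more concrete, the following is a minimal sketch of greedy forward selection with replacement on a toy binary problem. It is not auto-sklearn's implementation: the model names (rf_a, rf_b, rf_c), the validation labels, and the predicted probabilities are all made up for illustration, and accuracy is assumed as the selection metric.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positions where prediction and label agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def ensemble_predict(members: List[str], probas: Dict[str, List[float]]) -> List[int]:
    """Average the class-1 probabilities of the members, then threshold at 0.5."""
    n_samples = len(probas[members[0]])
    averaged = [
        sum(probas[m][i] for m in members) / len(members) for i in range(n_samples)
    ]
    return [1 if p >= 0.5 else 0 for p in averaged]


def greedy_ensemble_selection(
    probas: Dict[str, List[float]], y_true: Sequence[int], ensemble_size: int
) -> List[str]:
    """Repeatedly add the model (with replacement) that maximizes ensemble accuracy."""
    selected: List[str] = []
    for _ in range(ensemble_size):
        best_name, best_score = None, -1.0
        for name in probas:
            score = accuracy(y_true, ensemble_predict(selected + [name], probas))
            if score > best_score:  # ties keep the earlier candidate
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Hypothetical validation labels and per-model class-1 probabilities.
y_valid = [1, 0, 1, 1, 0, 0]
probas = {
    "rf_a": [0.9, 0.2, 0.2, 0.8, 0.2, 0.8],
    "rf_b": [0.6, 0.4, 0.9, 0.3, 0.6, 0.1],
    "rf_c": [0.2, 0.7, 0.3, 0.3, 0.9, 0.7],
}

selected = greedy_ensemble_selection(probas, y_valid, ensemble_size=2)
print(selected, accuracy(y_valid, ensemble_predict(selected, probas)))
```

In this toy data each of rf_a and rf_b misclassifies two different samples, so averaging them corrects both sets of errors. That complementarity is what the greedy step rewards.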
import numpy as np
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
from ConfigSpace.configuration_space import Configuration
import autosklearn.classification
Data Loading
X, y = sklearn.datasets.fetch_openml(data_id=3, return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.5, random_state=3
)
Define an estimator
cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=60,
    memory_limit=4096,
    # Restrict the configuration space to RandomForest as the only valid
    # model. For better performance, we recommend enabling all
    # available models.
    include={"classifier": ["random_forest"]},
    delete_tmp_folder_after_terminate=False,
)
Fit a user-provided configuration
# Create a configuration with a user-defined min_samples_split
# for the Random Forest. We recommend reading up on how the
# ConfigSpace package works here:
# https://automl.net.cn/ConfigSpace/master/
cs = cls.get_configuration_space(X, y, dataset_name="kr-vs-kp")
config = cs.sample_configuration()
config._values["classifier:random_forest:min_samples_split"] = 11
# Make sure that your changed configuration complies with the configuration space
config.is_valid_configuration()
pipeline, run_info, run_value = cls.fit_pipeline(
    X=X_train,
    y=y_train,
    dataset_name="kr-vs-kp",
    config=config,
    X_test=X_test,
    y_test=y_test,
)
# This object complies with the Scikit-Learn Pipeline API.
# https://scikit-learn.cn/stable/modules/generated/sklearn.pipeline.Pipeline.html
print(pipeline.named_steps)
# The fit_pipeline command also returns a named tuple with the pipeline constraints
print(run_info)
# The fit_pipeline command also returns a named tuple with train/test performance
print(run_value)
# We can make sure that our pipeline configuration was honored as follows
print("Passed Configuration:", pipeline.config)
print("Random Forest:", pipeline.named_steps["classifier"].choice.estimator)
# We can also search for new configurations using the fit() method.
# Any configuration found by Auto-Sklearn, including those created with
# fit_pipeline(), is stored to disk and can be used for Ensemble Selection.
cls.fit(X, y, dataset_name="kr-vs-kp")
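The manual override and validity check applied to the sampled configuration above can be pictured with a toy stand-in for a configuration space. The sketch below uses only the standard library and does not call ConfigSpace. The names TOY_SPACE, sample_configuration and check_configuration are hypothetical, and the integer bounds are illustrative assumptions rather than auto-sklearn's actual ranges.

```python
import random
from typing import Dict, Tuple

# Hypothetical toy space: hyperparameter name -> inclusive (low, high) bounds.
# These bounds are assumptions for illustration only.
TOY_SPACE: Dict[str, Tuple[int, int]] = {
    "min_samples_split": (2, 20),
    "min_samples_leaf": (1, 20),
}


def sample_configuration(
    space: Dict[str, Tuple[int, int]], rng: random.Random
) -> Dict[str, int]:
    """Draw one value uniformly at random for every hyperparameter."""
    return {name: rng.randint(low, high) for name, (low, high) in space.items()}


def check_configuration(
    config: Dict[str, int], space: Dict[str, Tuple[int, int]]
) -> None:
    """Raise ValueError if keys do not match the space or a value is out of bounds."""
    if set(config) != set(space):
        raise ValueError("configuration keys do not match the space")
    for name, value in config.items():
        low, high = space[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")


rng = random.Random(3)
config = sample_configuration(TOY_SPACE, rng)

# Override one sampled value by hand, then re-validate the whole configuration.
config["min_samples_split"] = 11
check_configuration(config, TOY_SPACE)

# An out-of-range override is rejected by the same check.
bad_config = dict(config, min_samples_split=1)
try:
    check_configuration(bad_config, TOY_SPACE)
    rejected = False
except ValueError:
    rejected = True
print(config, rejected)
```

The point of re-validating after a manual edit is that a hand-set value bypasses the sampling step, which is the only place where the bounds are otherwise enforced.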
/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/data/target_validator.py:187: UserWarning: Fitting transformer with a pandas series which has the dtype category. Inverse transform may not be able preserve dtype when converting to np.ndarray
warnings.warn(
{'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d3f33b50>, 'balancing': Balancing(random_state=1, strategy='weighting'), 'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d24aed00>, 'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d3f6f6d0>}
RunInfo(config=Configuration(values={
'balancing:strategy': 'weighting',
'classifier:__choice__': 'random_forest',
'classifier:random_forest:bootstrap': 'True',
'classifier:random_forest:criterion': 'gini',
'classifier:random_forest:max_depth': 'None',
'classifier:random_forest:max_features': 0.9678506216566037,
'classifier:random_forest:max_leaf_nodes': 'None',
'classifier:random_forest:min_impurity_decrease': 0.0,
'classifier:random_forest:min_samples_leaf': 4,
'classifier:random_forest:min_samples_split': 11,
'classifier:random_forest:min_weight_fraction_leaf': 0.0,
'data_preprocessor:__choice__': 'feature_type',
'data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding',
'data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer',
'data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.003464940795376728,
'feature_preprocessor:__choice__': 'fast_ica',
'feature_preprocessor:fast_ica:algorithm': 'deflation',
'feature_preprocessor:fast_ica:fun': 'cube',
'feature_preprocessor:fast_ica:whiten': 'False',
})
, instance=None, instance_specific=None, seed=1, cutoff=60, capped=False, budget=0.0, source_id=0)
RunValue(cost=0.06161137440758291, time=32.43082547187805, status=<StatusType.SUCCESS: 1>, starttime=1663663980.6781428, endtime=1663664013.1358533, additional_info={'duration': 32.330575466156006, 'num_run': 2, 'train_loss': 0.004670714619336769, 'configuration_origin': None})
Passed Configuration: Configuration(values={
'balancing:strategy': 'weighting',
'classifier:__choice__': 'random_forest',
'classifier:random_forest:bootstrap': 'True',
'classifier:random_forest:criterion': 'gini',
'classifier:random_forest:max_depth': 'None',
'classifier:random_forest:max_features': 0.9678506216566037,
'classifier:random_forest:max_leaf_nodes': 'None',
'classifier:random_forest:min_impurity_decrease': 0.0,
'classifier:random_forest:min_samples_leaf': 4,
'classifier:random_forest:min_samples_split': 11,
'classifier:random_forest:min_weight_fraction_leaf': 0.0,
'data_preprocessor:__choice__': 'feature_type',
'data_preprocessor:feature_type:categorical_transformer:categorical_encoding:__choice__': 'one_hot_encoding',
'data_preprocessor:feature_type:categorical_transformer:category_coalescence:__choice__': 'minority_coalescer',
'data_preprocessor:feature_type:categorical_transformer:category_coalescence:minority_coalescer:minimum_fraction': 0.003464940795376728,
'feature_preprocessor:__choice__': 'fast_ica',
'feature_preprocessor:fast_ica:algorithm': 'deflation',
'feature_preprocessor:fast_ica:fun': 'cube',
'feature_preprocessor:fast_ica:whiten': 'False',
})
Random Forest: RandomForestClassifier(max_features=62, min_samples_leaf=4,
                       min_samples_split=11, n_estimators=512, n_jobs=1,
                       random_state=1, warm_start=True)
/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/data/target_validator.py:187: UserWarning: Fitting transformer with a pandas series which has the dtype category. Inverse transform may not be able preserve dtype when converting to np.ndarray
warnings.warn(
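The RunValue printed above can be turned into more familiar numbers. The sketch below copies its values by hand and assumes that the default accuracy metric was used, in which case auto-sklearn reports cost and train_loss as 1 - accuracy. It also assumes that cost was measured on auto-sklearn's internal hold-out split. The variable names are illustrative only.

```python
# Values copied from the RunValue output shown above.
cost = 0.06161137440758291
train_loss = 0.004670714619336769
starttime = 1663663980.6781428
endtime = 1663664013.1358533

# Assumption: with the accuracy metric, loss = 1 - accuracy.
holdout_accuracy = 1.0 - cost
train_accuracy = 1.0 - train_loss

# Wall-clock duration of the run, in seconds, from the two timestamps.
wallclock_seconds = endtime - starttime

print(f"hold-out accuracy: {holdout_accuracy:.4f}")
print(f"train accuracy:    {train_accuracy:.4f}")
print(f"wall-clock time:   {wallclock_seconds:.2f} s")
```

The gap between the training and hold-out figures gives a rough sense of how much this single Random Forest configuration overfits.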
Total running time of the script: (2 minutes 46.876 seconds)