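Early stopping and Callbacks
The example below shows how to use the get_trials_callback parameter of auto-sklearn to implement an early stopping mechanism through a callback.
The callback receives the result of every model + hyperparameter configuration evaluated by SMAC, the optimizer underlying auto-sklearn. By checking the cost of each result, we can implement a simple yet effective early stopping mechanism.
Note, however, that the callback only sees individual models, not the ensemble that auto-sklearn builds from them. You may want a more sophisticated stopping rule so that auto-sklearn has enough good models to build an ensemble from. This example keeps the rule deliberately simple.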
from __future__ import annotations
from pprint import pprint
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import autosklearn.classification
from smac.optimizer.smbo import SMBO
from smac.runhistory.runhistory import RunInfo, RunValue
Build and fit a classifier
def callback(
    smbo: SMBO,
    run_info: RunInfo,
    result: RunValue,
    time_left: float,
) -> bool | None:
    """Stop early if we get a very low cost value for a single run

    The return value indicates to SMAC whether to stop or not. False will
    stop the search process while any other value will mean it continues.
    """
    # You can find out the parameters in the SMAC documentation
    # https://automl.net.cn/SMAC3/main/
    if result.cost <= 0.02:
        print("Stopping!")
        print(run_info)
        print(result)
        return False


X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120, per_run_time_limit=30, get_trials_callback=callback
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")
Stopping!
RunInfo(config=Configuration(values={
'balancing:strategy': 'none',
'classifier:__choice__': 'extra_trees',
'classifier:extra_trees:bootstrap': 'False',
'classifier:extra_trees:criterion': 'gini',
'classifier:extra_trees:max_depth': 'None',
'classifier:extra_trees:max_features': 0.5707983257382487,
'classifier:extra_trees:max_leaf_nodes': 'None',
'classifier:extra_trees:min_impurity_decrease': 0.0,
'classifier:extra_trees:min_samples_leaf': 3,
'classifier:extra_trees:min_samples_split': 11,
'classifier:extra_trees:min_weight_fraction_leaf': 0.0,
'data_preprocessor:__choice__': 'feature_type',
'data_preprocessor:feature_type:numerical_transformer:imputation:strategy': 'median',
'data_preprocessor:feature_type:numerical_transformer:rescaling:__choice__': 'none',
'feature_preprocessor:__choice__': 'polynomial',
'feature_preprocessor:polynomial:degree': 2,
'feature_preprocessor:polynomial:include_bias': 'False',
'feature_preprocessor:polynomial:interaction_only': 'False',
})
, instance='{"task_id": "breast_cancer"}', instance_specific='0', seed=0, cutoff=30.0, capped=False, budget=0.0, source_id=0)
RunValue(cost=0.014184397163120588, time=1.6877820491790771, status=<StatusType.SUCCESS: 1>, starttime=1663663263.9033709, endtime=1663663265.6127412, additional_info={'duration': 1.5963304042816162, 'num_run': 7, 'train_loss': 0.0, 'configuration_origin': 'Initial design'})
AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
get_trials_callback=<function callback at 0x7f05d16c1f70>,
per_run_time_limit=30, time_left_for_this_task=120)
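The callback above stops on the first run whose cost is at most 0.02, which can leave the ensemble with few strong candidates. One possible refinement, sketched here under assumptions rather than taken from auto-sklearn, is to stop only after several runs have reached the threshold. The factory name make_min_good_runs_callback, its parameters cost_threshold and min_good_runs, and the simulate helper are all hypothetical; the sketch relies only on result.cost and on the convention that returning False stops the search. To keep it runnable without auto-sklearn installed, fake results stand in for SMAC's RunValue.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any, Callable


def make_min_good_runs_callback(
    cost_threshold: float = 0.02,
    min_good_runs: int = 3,
) -> Callable[[Any, Any, Any, float], bool | None]:
    """Return a callback that stops once enough low-cost runs were seen.

    Both parameter names are hypothetical; tune the values for your task.
    """
    good_runs = 0

    def callback(smbo: Any, run_info: Any, result: Any, time_left: float) -> bool | None:
        nonlocal good_runs
        # Count every run whose cost reaches the threshold.
        if result.cost <= cost_threshold:
            good_runs += 1
        # Returning False asks SMAC to stop; None lets the search continue.
        return False if good_runs >= min_good_runs else None

    return callback


def simulate(callback, costs):
    """Feed fake results to the callback until it asks to stop."""
    decisions = []
    for cost in costs:
        fake_result = SimpleNamespace(cost=cost)  # stand-in for RunValue
        decision = callback(None, None, fake_result, 0.0)
        decisions.append(decision)
        if decision is False:
            break
    return decisions


decisions = simulate(
    make_min_good_runs_callback(cost_threshold=0.02, min_good_runs=2),
    costs=[0.50, 0.01, 0.30, 0.02, 0.015],
)
print(decisions)  # the search stops at the second low-cost run
```

In a real run, the function returned by the factory would be passed as get_trials_callback in place of callback. The counter lives in a closure, so the object handed over is still a plain function, like the one in the example.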
View the models found by auto-sklearn
print(automl.leaderboard())
          rank  ensemble_weight           type      cost  duration
model_id
7            1             0.68    extra_trees  0.014184  1.687782
2            2             0.10  random_forest  0.028369  2.002935
3            3             0.22            mlp  0.028369  1.103178
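The leaderboard above lists three ensemble members. A fixed cost threshold is only one possible stopping rule. Another common one is patience: stop when the best cost has not improved for a number of consecutive runs. The sketch below is an illustration, not something auto-sklearn ships. The names make_patience_callback, patience, min_delta, and simulate are hypothetical, and fake results again stand in for SMAC's RunValue so the block runs on its own.

```python
from __future__ import annotations

import math
from types import SimpleNamespace
from typing import Any, Callable


def make_patience_callback(
    patience: int = 10,
    min_delta: float = 0.0,
) -> Callable[[Any, Any, Any, float], bool | None]:
    """Return a callback that stops after `patience` runs without improvement.

    A run counts as an improvement only if it lowers the best cost seen so
    far by more than `min_delta`. Both names are hypothetical.
    """
    best_cost = math.inf
    runs_since_improvement = 0

    def callback(smbo: Any, run_info: Any, result: Any, time_left: float) -> bool | None:
        nonlocal best_cost, runs_since_improvement
        if result.cost < best_cost - min_delta:
            best_cost = result.cost
            runs_since_improvement = 0
        else:
            runs_since_improvement += 1
        # Returning False asks SMAC to stop; None lets the search continue.
        return False if runs_since_improvement >= patience else None

    return callback


def simulate(callback, costs):
    """Feed fake results to the callback until it asks to stop."""
    decisions = []
    for cost in costs:
        fake_result = SimpleNamespace(cost=cost)  # stand-in for RunValue
        decision = callback(None, None, fake_result, 0.0)
        decisions.append(decision)
        if decision is False:
            break
    return decisions


decisions = simulate(
    make_patience_callback(patience=3),
    costs=[0.10, 0.05, 0.06, 0.05, 0.07, 0.01],
)
print(decisions)  # stops after three runs that fail to beat 0.05
```

A patience rule does not require knowing in advance what a good cost looks like, but it can stop too early when the patience window is short, so the value is worth tuning against the time budget.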
Print the final ensemble constructed by auto-sklearn
pprint(automl.show_models(), indent=4)
{ 2: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d61c9370>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d45ee910>,
'ensemble_weight': 0.1,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d61c9400>,
'model_id': 2,
'rank': 1,
'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1,
random_state=1, warm_start=True)},
3: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d24ae4c0>,
'cost': 0.028368794326241176,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05e9de29a0>,
'ensemble_weight': 0.22,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d3db8670>,
'model_id': 3,
'rank': 2,
'sklearn_classifier': MLPClassifier(activation='tanh', alpha=0.0001363185819149026, beta_1=0.999,
beta_2=0.9, early_stopping=True,
hidden_layer_sizes=(115, 115, 115),
learning_rate_init=0.00018009776276177523, max_iter=32,
n_iter_no_change=32, random_state=1, verbose=0, warm_start=True)},
7: { 'balancing': Balancing(random_state=1),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d457f820>,
'cost': 0.014184397163120588,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d0d67f40>,
'ensemble_weight': 0.68,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d61aa1f0>,
'model_id': 7,
'rank': 3,
'sklearn_classifier': ExtraTreesClassifier(max_features=34, min_samples_leaf=3, min_samples_split=11,
n_estimators=512, n_jobs=1, random_state=1,
warm_start=True)}}
Get the score of the final ensemble
predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score: 0.9440559440559441
Total running time of the script: (0 minutes 22.430 seconds)