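Interpretable models
The following example shows how to inspect the models that auto-sklearn optimizes over and how to restrict them to an interpretable subset.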
from pprint import pprint

import autosklearn.classification
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection  # needed for train_test_split below
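Show available classification models
We will first list all classifiers Auto-sklearn chooses from. A similar call is available for preprocessors (see below) and for regression (not shown).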
from autosklearn.pipeline.components.classification import ClassifierChoice
for name in ClassifierChoice.get_components():
    print(name)
adaboost
bernoulli_nb
decision_tree
extra_trees
gaussian_nb
gradient_boosting
k_nearest_neighbors
lda
liblinear_svc
libsvm_svc
mlp
multinomial_nb
passive_aggressive
qda
random_forest
sgd
Show available preprocessors
from autosklearn.pipeline.components.feature_preprocessing import (
    FeaturePreprocessorChoice,
)

for name in FeaturePreprocessorChoice.get_components():
    print(name)
densifier
extra_trees_preproc_for_classification
extra_trees_preproc_for_regression
fast_ica
feature_agglomeration
kernel_pca
kitchen_sinks
liblinear_svc_preprocessor
no_preprocessing
nystroem_sampler
pca
polynomial
random_trees_embedding
select_percentile_classification
select_percentile_regression
select_rates_classification
select_rates_regression
truncatedSVD
Data loading
X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
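Build and fit a classifier
We will now use only a subset of the available classifiers and preprocessors. In addition, we restrict the ensemble size to 1 so that only the single best model is used in the end. Note, however, that which models count as interpretable depends heavily on the user and can change from use case to use case.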
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder="/tmp/autosklearn_interpretable_models_example_tmp",
    include={
        "classifier": ["decision_tree", "lda", "sgd"],
        "feature_preprocessor": [
            "no_preprocessing",
            "polynomial",
            "select_percentile_classification",
        ],
    },
    ensemble_kwargs={"ensemble_size": 1},
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")
AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
                      ensemble_kwargs={'ensemble_size': 1},
                      include={'classifier': ['decision_tree', 'lda', 'sgd'],
                               'feature_preprocessor': ['no_preprocessing',
                                                        'polynomial',
                                                        'select_percentile_classification']},
                      per_run_time_limit=30, time_left_for_this_task=120,
                      tmp_folder='/tmp/autosklearn_interpretable_models_example_tmp')
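The names passed to include have to match the component names printed in the two listings earlier in this example. As a minimal sketch, the check below catches a misspelled name before a long run is started. The helper unknown_components and the AVAILABLE table are hypothetical and not part of auto-sklearn; the names in the table are copied from the listings above, and only the classifier and feature_preprocessor steps are covered.

```python
# Component names copied from the listings printed earlier in this example.
# AVAILABLE and unknown_components are illustrative, not auto-sklearn API.
AVAILABLE = {
    "classifier": {
        "adaboost", "bernoulli_nb", "decision_tree", "extra_trees",
        "gaussian_nb", "gradient_boosting", "k_nearest_neighbors", "lda",
        "liblinear_svc", "libsvm_svc", "mlp", "multinomial_nb",
        "passive_aggressive", "qda", "random_forest", "sgd",
    },
    "feature_preprocessor": {
        "densifier", "extra_trees_preproc_for_classification",
        "extra_trees_preproc_for_regression", "fast_ica",
        "feature_agglomeration", "kernel_pca", "kitchen_sinks",
        "liblinear_svc_preprocessor", "no_preprocessing", "nystroem_sampler",
        "pca", "polynomial", "random_trees_embedding",
        "select_percentile_classification", "select_percentile_regression",
        "select_rates_classification", "select_rates_regression",
        "truncatedSVD",
    },
}


def unknown_components(include):
    """Return {step: [names]} for every requested name that is not listed."""
    problems = {}
    for step, names in include.items():
        known = AVAILABLE.get(step)
        if known is None:
            # The step itself is not in the table, so flag all of its names.
            problems[step] = list(names)
            continue
        missing = [name for name in names if name not in known]
        if missing:
            problems[step] = missing
    return problems


include = {
    "classifier": ["decision_tree", "lda", "sgd"],
    "feature_preprocessor": [
        "no_preprocessing",
        "polynomial",
        "select_percentile_classification",
    ],
}
print(unknown_components(include))  # prints {}
```

An empty result means every requested name appears in the listings. In practice the table could be filled from ClassifierChoice.get_components() and FeaturePreprocessorChoice.get_components() instead of being typed by hand.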
Print the final ensemble constructed by auto-sklearn
pprint(automl.show_models(), indent=4)
{ 28: { 'balancing': Balancing(random_state=1, strategy='weighting'),
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d0c22910>,
'cost': 0.007092198581560294,
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d6264250>,
'ensemble_weight': 1.0,
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d0c22490>,
'model_id': 28,
'rank': 1,
'sklearn_classifier': SGDClassifier(alpha=0.0003272354910051561, average=True,
eta0=2.9976399065090562e-05, l1_ratio=0.14999999999999974,
learning_rate='invscaling', loss='squared_hinge', max_iter=1024,
penalty='elasticnet', power_t=0.5037491320052959, random_state=1,
tol=2.59922433981394e-05, warm_start=True)}}
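The dictionary returned by show_models() is keyed by model id, and each entry carries fields such as rank, cost and ensemble_weight. As a rough sketch, the entries can be reduced to a compact table ordered by rank. The summarize helper is hypothetical, and the models dictionary is a hand-written stand-in that copies only the numeric fields from the output above, so that the snippet runs without a fitted automl object.

```python
# Stand-in for automl.show_models(): same keys and values as the output
# above, with the fitted pipeline objects left out.
models = {
    28: {
        "model_id": 28,
        "rank": 1,
        "cost": 0.007092198581560294,
        "ensemble_weight": 1.0,
    }
}


def summarize(models):
    """Return (model_id, rank, ensemble_weight, cost) tuples sorted by rank."""
    rows = sorted(models.values(), key=lambda entry: entry["rank"])
    return [
        (entry["model_id"], entry["rank"], entry["ensemble_weight"], entry["cost"])
        for entry in rows
    ]


for model_id, rank, weight, cost in summarize(models):
    print(f"model {model_id}: rank={rank} weight={weight} cost={cost:.4f}")
```

With ensemble_size set to 1 there is a single row whose weight is 1.0, which matches the output above.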
Get the score of the final ensemble
predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))
Accuracy score: 0.9440559440559441
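Accuracy is the fraction of test samples whose predicted label equals the true label. The sketch below spells that out in plain Python. The accuracy function is an illustrative re-implementation for plain lists, not the scikit-learn code, and the labels are toy values invented for the example. The score printed above is consistent with 135 correct predictions out of 143 test samples, which is the size of the default 25% split of the 569 samples in this dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true one."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of predictions")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


# Toy labels, invented for illustration: three of four predictions match.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # prints 0.75

# 135 correct out of 143 gives approximately the score reported above.
score = accuracy([1] * 143, [1] * 135 + [0] * 8)
print(round(score, 4))  # prints 0.9441
```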
Total running time of the script: (1 minute 54.458 seconds)