Interpretable models

The following example shows how to inspect the models that auto-sklearn optimizes over, and how to restrict them to an interpretable subset.

from pprint import pprint

import autosklearn.classification
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection  # needed for train_test_split below

Show available classification models

We will first list all classifiers Auto-sklearn chooses from. Similar calls are available for preprocessors (see below) and for regression (not shown here).

from autosklearn.pipeline.components.classification import ClassifierChoice

for name in ClassifierChoice.get_components():
    print(name)

adaboost
bernoulli_nb
decision_tree
extra_trees
gaussian_nb
gradient_boosting
k_nearest_neighbors
lda
liblinear_svc
libsvm_svc
mlp
multinomial_nb
passive_aggressive
qda
random_forest
sgd

Show available preprocessors

from autosklearn.pipeline.components.feature_preprocessing import (
    FeaturePreprocessorChoice,
)

for name in FeaturePreprocessorChoice.get_components():
    print(name)

densifier
extra_trees_preproc_for_classification
extra_trees_preproc_for_regression
fast_ica
feature_agglomeration
kernel_pca
kitchen_sinks
liblinear_svc_preprocessor
no_preprocessing
nystroem_sampler
pca
polynomial
random_trees_embedding
select_percentile_classification
select_percentile_regression
select_rates_classification
select_rates_regression
truncatedSVD
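
The names printed in the two listings are the strings that the `include` argument of `AutoSklearnClassifier` expects. A minimal sketch of a hypothetical helper, `build_include` (not part of auto-sklearn), can check a chosen subset against these names before a two-minute search is started. The name sets are copied from the output above and may differ in other auto-sklearn versions.

```python
# Component names copied from the two listings printed above. They reflect the
# auto-sklearn version used to build this page and may differ in other versions.
AVAILABLE = {
    "classifier": {
        "adaboost", "bernoulli_nb", "decision_tree", "extra_trees",
        "gaussian_nb", "gradient_boosting", "k_nearest_neighbors", "lda",
        "liblinear_svc", "libsvm_svc", "mlp", "multinomial_nb",
        "passive_aggressive", "qda", "random_forest", "sgd",
    },
    "feature_preprocessor": {
        "densifier", "extra_trees_preproc_for_classification",
        "extra_trees_preproc_for_regression", "fast_ica",
        "feature_agglomeration", "kernel_pca", "kitchen_sinks",
        "liblinear_svc_preprocessor", "no_preprocessing", "nystroem_sampler",
        "pca", "polynomial", "random_trees_embedding",
        "select_percentile_classification", "select_percentile_regression",
        "select_rates_classification", "select_rates_regression",
        "truncatedSVD",
    },
}


def build_include(classifiers, preprocessors):
    """Hypothetical helper: build an `include` dict, rejecting unknown names."""
    include = {
        "classifier": list(classifiers),
        "feature_preprocessor": list(preprocessors),
    }
    for key, names in include.items():
        unknown = sorted(set(names) - AVAILABLE[key])
        if unknown:
            raise ValueError(f"unknown {key} component(s): {unknown}")
    return include


# The interpretable subset used in the next section.
include = build_include(
    ["decision_tree", "lda", "sgd"],
    ["no_preprocessing", "polynomial", "select_percentile_classification"],
)
print(include)
```

The resulting dictionary can be passed directly as `include=...`; a misspelled name such as `"decision_trees"` raises a `ValueError` immediately instead.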

Data Loading

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
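
Because neither `test_size` nor `train_size` is passed, `train_test_split` holds out its default 25% of the rows for testing. `load_breast_cancer` returns 569 samples. The sketch below works out the resulting sizes, assuming (as we understand scikit-learn's rule) that the test size is rounded up and the training set gets the remainder.

```python
import math

N_SAMPLES = 569       # rows returned by load_breast_cancer
TEST_FRACTION = 0.25  # train_test_split default when no size is given

# Assumed rounding rule: test size rounded up, training set gets the rest.
n_test = math.ceil(TEST_FRACTION * N_SAMPLES)
n_train = N_SAMPLES - n_test
print(n_train, n_test)  # 426 143
```

So `X_train` should have 426 rows and `X_test` 143, which can be confirmed with `X_train.shape` and `X_test.shape`.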

Build and fit a classifier

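Now we will use only a subset of the given classifiers and preprocessors. Furthermore, we restrict the ensemble size to 1 so that only the single best model is used in the end. Note, however, that which models count as interpretable is very much up to the user and will vary from use case to use case.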

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder="/tmp/autosklearn_interpretable_models_example_tmp",
    include={
        "classifier": ["decision_tree", "lda", "sgd"],
        "feature_preprocessor": [
            "no_preprocessing",
            "polynomial",
            "select_percentile_classification",
        ],
    },
    ensemble_kwargs={"ensemble_size": 1},
)
automl.fit(X_train, y_train, dataset_name="breast_cancer")

AutoSklearnClassifier(ensemble_class=<class 'autosklearn.ensembles.ensemble_selection.EnsembleSelection'>,
                      ensemble_kwargs={'ensemble_size': 1},
                      include={'classifier': ['decision_tree', 'lda', 'sgd'],
                               'feature_preprocessor': ['no_preprocessing',
                                                        'polynomial',
                                                        'select_percentile_classification']},
                      per_run_time_limit=30, time_left_for_this_task=120,
                      tmp_folder='/tmp/autosklearn_interpretable_models_example_tmp')
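
The repr above shows that the default `ensemble_class` is `EnsembleSelection`. As we understand it, this performs greedy forward ensemble selection with replacement. In each of `ensemble_size` rounds, it adds the model whose inclusion gives the best validation score for the averaged predictions. With `ensemble_size=1` there is only one round, so the "ensemble" is just the single model with the best validation score, carrying weight 1.0. The simplified sketch below uses made-up validation probabilities for three hypothetical models and accuracy as the score. auto-sklearn's own implementation differs in details such as the metric, tie-breaking, and randomness.

```python
from collections import Counter


def ensemble_proba(members, proba):
    """Average positive-class probabilities over members (repeats count again)."""
    n_samples = len(proba[members[0]])
    return [sum(proba[m][i] for m in members) / len(members) for i in range(n_samples)]


def accuracy(p_pos, y_true):
    """Fraction of samples where thresholding at 0.5 gives the true label."""
    hits = sum(int(p >= 0.5) == t for p, t in zip(p_pos, y_true))
    return hits / len(y_true)


def ensemble_selection(proba, y_true, ensemble_size):
    """Greedy forward selection with replacement; returns {model: weight}."""
    members = []
    for _ in range(ensemble_size):
        # Try adding each candidate (repeats allowed) and keep the best one.
        best = max(
            proba,
            key=lambda m: accuracy(ensemble_proba(members + [m], proba), y_true),
        )
        members.append(best)
    # Weight = how often a model was picked, divided by the number of rounds.
    counts = Counter(members)
    return {m: c / ensemble_size for m, c in counts.items()}


# Hypothetical validation labels and positive-class probabilities.
y_val = [1, 0, 1, 1, 0, 0]
proba = {
    "decision_tree": [0.9, 0.2, 0.4, 0.8, 0.3, 0.6],  # 4/6 correct
    "lda": [0.7, 0.6, 0.6, 0.3, 0.1, 0.7],            # 3/6 correct
    "sgd": [0.8, 0.4, 0.7, 0.9, 0.6, 0.2],            # 5/6 correct
}

print(ensemble_selection(proba, y_val, ensemble_size=1))  # {'sgd': 1.0}
```

With a larger `ensemble_size`, later rounds may add other models or re-add the same one, and the weights become selection counts divided by the number of rounds.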

Print the final ensemble constructed by auto-sklearn

pprint(automl.show_models(), indent=4)

{   28: {   'balancing': Balancing(random_state=1, strategy='weighting'),
            'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f05d0c22910>,
            'cost': 0.007092198581560294,
            'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f05d6264250>,
            'ensemble_weight': 1.0,
            'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f05d0c22490>,
            'model_id': 28,
            'rank': 1,
            'sklearn_classifier': SGDClassifier(alpha=0.0003272354910051561, average=True,
              eta0=2.9976399065090562e-05, l1_ratio=0.14999999999999974,
              learning_rate='invscaling', loss='squared_hinge', max_iter=1024,
              penalty='elasticnet', power_t=0.5037491320052959, random_state=1,
              tol=2.59922433981394e-05, warm_start=True)}}
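
`show_models()` returns a dictionary keyed by model id. As printed above, each entry carries fields such as `rank`, `ensemble_weight`, `cost` and the fitted `sklearn_classifier`. When the ensemble holds more than one model, a compact summary is easier to scan than the full dump. The sketch below uses a hand-copied stand-in for that dictionary, with the estimator replaced by a string so it runs without auto-sklearn. `summarize` is a hypothetical helper. With a fitted model you would call `summarize(automl.show_models())` instead.

```python
# Stand-in for automl.show_models(): same keys as the output printed above,
# with the fitted estimator replaced by a string so the sketch runs standalone.
models = {
    28: {
        "model_id": 28,
        "rank": 1,
        "ensemble_weight": 1.0,
        "cost": 0.007092198581560294,
        "sklearn_classifier": "SGDClassifier(loss='squared_hinge', penalty='elasticnet', ...)",
    },
}


def summarize(models):
    """Return (rank, model_id, weight, cost, estimator) tuples, best rank first."""
    rows = [
        (info["rank"], model_id, info["ensemble_weight"], info["cost"],
         info["sklearn_classifier"])
        for model_id, info in models.items()
    ]
    # Sort on rank only, so estimator objects are never compared.
    return sorted(rows, key=lambda row: row[0])


for rank, model_id, weight, cost, est in summarize(models):
    print(f"rank={rank} id={model_id} weight={weight:.2f} cost={cost:.4f} {est}")
```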

Get the score of the final ensemble

predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))

Accuracy score: 0.9440559440559441
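
Accuracy is the number of correct predictions divided by the number of test samples. Assuming the default 25% split of the 569 breast-cancer samples, which gives 143 test rows, the reported score can be turned back into a count of correct predictions and errors:

```python
import math

N_TEST = 143                   # assumed len(X_test): ceil(0.25 * 569)
REPORTED = 0.9440559440559441  # accuracy printed above

# Invert accuracy = n_correct / N_TEST to recover the integer count.
n_correct = round(REPORTED * N_TEST)
assert math.isclose(n_correct / N_TEST, REPORTED)
print(n_correct, N_TEST - n_correct)  # 135 8
```

In other words, the single SGD model misclassifies 8 of the 143 held-out samples.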

Total running time of the script: (1 minutes 54.458 seconds)
