Logging and debugging¶
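This example shows how to provide a custom logging configuration to auto-sklearn. We will fit 2 pipelines and show every INFO-level message on the console. Even if you do not pass a logging_config, auto-sklearn creates a log file in the temporary working directory. This directory can be set with tmp_folder, as shown in the example below.
This example also highlights additional information about the internal directory structure of auto-sklearn.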
import pathlib
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection
import autosklearn.classification
Data loading¶
Load the kr-vs-kp dataset from https://www.openml.org/d/3
X, y = sklearn.datasets.fetch_openml(data_id=3, return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
Create a logging config¶
auto-sklearn uses a default logging config. We will instead create a custom one, as follows:
logging_config = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "custom": {
            # More format options are available in the official
            # `documentation <https://docs.pythonlang.cn/3/howto/logging-cookbook.html>`_
            "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
        }
    },
    # Any INFO level msg will be printed to the console
    "handlers": {
        "console": {
            "level": "INFO",
            "formatter": "custom",
            "class": "logging.StreamHandler",
            "stream": "ext://sys.stdout",
        },
    },
    "loggers": {
        "": {  # root logger
            "level": "DEBUG",
        },
        "Client-EnsembleBuilder": {
            "level": "DEBUG",
            "handlers": ["console"],
        },
    },
}
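The dictionary above follows the schema of the standard library's logging.config.dictConfig (version, formatters, handlers, loggers), so one way to sanity-check such a config before handing it to auto-sklearn is to apply it directly with the standard library. The sketch below is only an illustration under stated assumptions: it uses a trimmed copy named check_config, writes to an in-memory buffer instead of sys.stdout, and sets disable_existing_loggers to False so that it does not silence other loggers in the session. How auto-sklearn applies the config internally is not covered by this example.

```python
import io
import logging
import logging.config

# Assumption: an in-memory stream replaces "ext://sys.stdout" so the
# emitted records can be inspected afterwards.
buffer = io.StringIO()

# Trimmed, hypothetical copy of the config above, used only for this check.
check_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "custom": {"format": "%(name)s - %(levelname)s - %(message)s"}
    },
    "handlers": {
        "console": {
            "level": "INFO",
            "formatter": "custom",
            "class": "logging.StreamHandler",
            "stream": buffer,
        },
    },
    "loggers": {
        "Client-EnsembleBuilder": {
            "level": "DEBUG",
            "handlers": ["console"],
        },
    },
}

# dictConfig raises ValueError (or a related error) if the schema is invalid.
logging.config.dictConfig(check_config)

logger = logging.getLogger("Client-EnsembleBuilder")
# The logger accepts DEBUG, but the handler only lets INFO and above through.
logger.debug("dropped by the handler level")
logger.info("kept by the handler level")

captured = buffer.getvalue()
print(captured, end="")
```

The two levels act as successive filters: the logger level decides which records are created, and the handler level decides which of those are written. That is why the config above can keep loggers at DEBUG while the console shows only INFO and higher.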
Build and fit a classifier¶
cls = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=30,
    # The two flags below are provided to speed up calculations
    # Not recommended for a real implementation
    initial_configurations_via_metalearning=0,
    smac_scenario_args={"runcount_limit": 2},
    # Pass the logging config we created
    logging_config=logging_config,
    # *auto-sklearn* generates temporary files under tmp_folder
    tmp_folder="./tmp_folder",
    # By default tmp_folder is deleted. We will preserve it
    # for debug purposes
    delete_tmp_folder_after_terminate=False,
)
cls.fit(X_train, y_train, X_test, y_test)
# *auto-sklearn* generates intermediate files which can be of interest
# Dask multiprocessing information. Useful on multi-core runs:
# * tmp_folder/distributed.log
# The individual fitted estimators are written to disk on:
# * tmp_folder/.auto-sklearn/runs
# SMAC output is stored in this directory.
# For more info, you can check the `SMAC documentation <https://github.com/automl/SMAC3>`_
# * tmp_folder/smac3-output
# Auto-sklearn always outputs to this log file
# * tmp_folder/AutoML*.log
for filename in pathlib.Path("./tmp_folder").glob("*"):
    print(filename)
/home/runner/work/auto-sklearn/auto-sklearn/autosklearn/data/target_validator.py:187: UserWarning: Fitting transformer with a pandas series which has the dtype category. Inverse transform may not be able preserve dtype when converting to np.ndarray
warnings.warn(
2022-09-20 08:55:45,145 - Client-EnsembleBuilder - INFO - DummyFuture: ([{'Timestamp': Timestamp('2022-09-20 08:55:45.128931'), 'ensemble_optimization_score': 0.9886363636363636, 'ensemble_test_score': 0.9899874843554443}], 50)/SingleThreadedClient() Started Ensemble builder job at 2022.09.20-08.55.45 for iteration 0.
2022-09-20 08:55:48,103 - Client-EnsembleBuilder - INFO - DummyFuture: ([{'Timestamp': Timestamp('2022-09-20 08:55:48.084418'), 'ensemble_optimization_score': 0.9911616161616161, 'ensemble_test_score': 0.9887359198998749}], 50)/SingleThreadedClient() Started Ensemble builder job at 2022.09.20-08.55.48 for iteration 1.
tmp_folder/distributed.log
tmp_folder/smac3-output
tmp_folder/AutoML(1):fa2baf03-38c1-11ed-8830-892d16569fbe.log
tmp_folder/.auto-sklearn
tmp_folder/space.json
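Since the AutoML*.log file is always written, a quick way to debug a run is to look at the end of that file. The helper below is a minimal sketch, not part of auto-sklearn: tail_automl_logs and its n_lines parameter are hypothetical names, and the demonstration runs on a throwaway directory with a made-up log file so it works without a finished run.

```python
import pathlib
import tempfile


def tail_automl_logs(tmp_folder, n_lines=5):
    """Return the last n_lines of every AutoML*.log file in tmp_folder."""
    tails = {}
    for log_path in sorted(pathlib.Path(tmp_folder).glob("AutoML*.log")):
        # errors="replace" keeps the helper from failing on odd bytes.
        lines = log_path.read_text(errors="replace").splitlines()
        tails[log_path.name] = lines[-n_lines:]
    return tails


# Demonstration on fabricated content (assumption: 10 placeholder lines).
with tempfile.TemporaryDirectory() as demo_dir:
    demo_log = pathlib.Path(demo_dir) / "AutoML_demo.log"
    demo_log.write_text("\n".join(f"line {i}" for i in range(10)))
    demo_tails = tail_automl_logs(demo_dir, n_lines=3)

print(demo_tails)
```

After the run above, calling tail_automl_logs("./tmp_folder") should return the closing lines of the preserved log, provided delete_tmp_folder_after_terminate was set to False as in this example.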
Total running time of the script: ( 0 minutes 23.992 seconds)