Parallel Usage on a Single Machine
Auto-sklearn uses `dask.distributed <https://distributed.dask.org.cn/en/latest/index.html>`_ for parallel optimization.
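This example shows how to start Auto-sklearn on a single machine so that it uses multiple cores. In this mode, Auto-sklearn starts a dask cluster, manages the workers, and shuts the cluster down once the computation is done. To run Auto-sklearn on multiple machines, see the example "Parallel Usage: Spawning workers from the command line".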
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics
import autosklearn.classification
Data Loading
X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, random_state=1
)
Build and Fit a Classifier
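To use n_jobs, the code must be guarded with if __name__ == "__main__". Parallel runs start additional processes, and these may re-import the main script; the guard keeps the fit from being launched again during such an import.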
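The need for the guard can be illustrated with the standard library alone. The sketch below is an analogy, not Auto-sklearn's implementation: a `ProcessPoolExecutor` stands in for the managed workers, and `evaluate_candidate` is a hypothetical placeholder for one expensive model evaluation.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def evaluate_candidate(n: int) -> int:
    """Hypothetical stand-in for one expensive model evaluation."""
    return math.factorial(n) % 1000


def main() -> None:
    # The pool starts the worker processes, distributes the work, and
    # shuts the workers down when the `with` block exits.
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(evaluate_candidate, range(5, 9)))
    print(results)


# Start methods such as "spawn" and "forkserver" re-import the main module
# in each worker. Without this guard, that import would try to call main()
# again, which typically fails with a RuntimeError.
if __name__ == "__main__":
    main()
```

The Auto-sklearn call below applies the same guard to the classifier construction and the call to fit.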
if __name__ == "__main__":
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=120,
        per_run_time_limit=30,
        tmp_folder="/tmp/autosklearn_parallel_1_example_tmp",
        n_jobs=4,
        # Each one of the 4 jobs is allocated 3GB
        memory_limit=3072,
        seed=5,
    )
    automl.fit(X_train, y_train, dataset_name="breast_cancer")

    # Print statistics about the auto-sklearn run such as number of
    # iterations, number of models failed with a time out.
    print(automl.sprint_statistics())
auto-sklearn results:
  Dataset name: breast_cancer
  Metric: accuracy
  Best validation score: 0.985816
  Number of target algorithm runs: 43
  Number of successful target algorithm runs: 43
  Number of crashed target algorithm runs: 0
  Number of target algorithms that exceeded the time limit: 0
  Number of target algorithms that exceeded the memory limit: 0
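Because the comment in the constructor call states that each of the 4 jobs is allocated 3 GB, the machine needs room for all of them at once. A minimal sketch of that arithmetic follows; `total_memory_mb` is a hypothetical helper written for this illustration, not part of Auto-sklearn.

```python
def total_memory_mb(n_jobs: int, memory_limit_mb: int) -> int:
    """Worst-case memory if every job uses its full per-job limit."""
    return n_jobs * memory_limit_mb


# Values taken from the example above: n_jobs=4, memory_limit=3072 (MB).
budget_mb = total_memory_mb(n_jobs=4, memory_limit_mb=3072)
print(f"{budget_mb} MB = {budget_mb / 1024:.0f} GB")
```

When raising n_jobs, it may be worth lowering memory_limit accordingly so that the product stays below the memory actually available.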
Total running time of the script: (2 minutes 1.338 seconds)