
Abstract Facade

smac.facade.abstract_facade #

AbstractFacade #

AbstractFacade(
    scenario: Scenario,
    target_function: Callable | str | AbstractRunner,
    *,
    model: AbstractModel | None = None,
    acquisition_function: AbstractAcquisitionFunction
    | None = None,
    acquisition_maximizer: AbstractAcquisitionMaximizer
    | None = None,
    initial_design: AbstractInitialDesign | None = None,
    random_design: AbstractRandomDesign | None = None,
    intensifier: AbstractIntensifier | None = None,
    multi_objective_algorithm: AbstractMultiObjectiveAlgorithm
    | None = None,
    runhistory_encoder: AbstractRunHistoryEncoder
    | None = None,
    config_selector: ConfigSelector | None = None,
    logging_level: int
    | Path
    | Literal[False]
    | None = None,
    callbacks: list[Callback] = None,
    overwrite: bool = False,
    dask_client: Client | None = None
)

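The facade is an abstraction on top of the SMBO backend. It organizes the components of a Bayesian optimization loop in a configurable and separable manner, to suit the various needs of different (hyperparameter) optimization pipelines.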

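Apart from `scenario` and `target_function`, which the user must provide, the parameters `model`, `acquisition_function`, `acquisition_maximizer`, `initial_design`, `random_design`, `intensifier`, `multi_objective_algorithm`, and `runhistory_encoder` can either be specified explicitly in a subclass's `get_*` methods (defining a specific BO pipeline) or be instantiated by the user to override a pipeline component explicitly.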
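The fallback described above can be pictured with a small stand-alone sketch. It does not import SMAC: `ToyFacade`, `ToyForestFacade`, and the string-valued "components" are hypothetical stand-ins, chosen only to show how a subclass's `get_*` method supplies a default that a user-supplied argument overrides.

```python
from __future__ import annotations


class ToyFacade:
    """Hypothetical stand-in for a facade; not SMAC's implementation."""

    def __init__(self, scenario: dict, *, model: str | None = None) -> None:
        # Fall back to the subclass default only when nothing was passed in.
        if model is None:
            model = self.get_model(scenario)
        self.model = model

    @staticmethod
    def get_model(scenario: dict) -> str:
        raise NotImplementedError


class ToyForestFacade(ToyFacade):
    @staticmethod
    def get_model(scenario: dict) -> str:
        # The subclass fixes the default component of its pipeline.
        return f"random-forest(seed={scenario['seed']})"


default_facade = ToyForestFacade({"seed": 0})
custom_facade = ToyForestFacade({"seed": 0}, model="my-custom-model")
print(default_facade.model)  # random-forest(seed=0)
print(custom_facade.model)   # my-custom-model
```

Because the `get_*` methods are static, the default component can also be built outside the constructor, adjusted, and then passed back in as the override.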

Parameters #

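scenario : Scenario
    The scenario object, holding all environmental information.
target_function : Callable | str | AbstractRunner
    This function is called internally to judge a trial's performance. If a string is passed, it is assumed to be a script. In this case, TargetFunctionScriptRunner is used to run the script.
model : AbstractModel | None, defaults to None
    The surrogate model.
acquisition_function : AbstractAcquisitionFunction | None, defaults to None
    The acquisition function.
acquisition_maximizer : AbstractAcquisitionMaximizer | None, defaults to None
    The acquisition maximizer, which decides which configuration is most promising based on the surrogate model and the acquisition function.
initial_design : AbstractInitialDesign | None, defaults to None
    Configurations sampled from the initial design are evaluated before the Bayesian optimization loop starts.
random_design : AbstractRandomDesign | None, defaults to None
    The random design is used in the acquisition maximizer. It decides whether the next configuration should be drawn from the acquisition function or sampled randomly.
intensifier : AbstractIntensifier | None, defaults to None
    The intensifier decides which trial (a combination of configuration, seed, budget, and instance) should be run next.
multi_objective_algorithm : AbstractMultiObjectiveAlgorithm | None, defaults to None
    In the multi-objective case, the objectives need to be interpreted so that optimization is possible. The multi-objective algorithm takes care of that.
runhistory_encoder : AbstractRunHistoryEncoder | None, defaults to None
    The surrogate model is trained on the runhistory. The data first needs to be encoded, which is done by the runhistory encoder. For example, inactive hyperparameters need to be encoded, or cost values can be log-transformed.
logging_level : int | Path | Literal[False] | None, defaults to None
    The logging level (the lowest level, 0, is the debug level). If a path is passed, it is expected to be a yaml file with the logging configuration. If nothing is passed, SMAC's default logging.yml is used. If False is passed, SMAC does not customize the logging setup and leaves that responsibility to the user.
callbacks : list[Callback], defaults to None
    Callbacks, which are incorporated into the optimization loop. None is treated as an empty list.
overwrite : bool, defaults to False
    When True, overwrites the run results if a previous run is found whose meta data is consistent with the current setup. When False and a previous run with consistent meta data is found, that run is continued. When False and a previous run with inconsistent meta data is found, the user is asked for the exact behaviour (overwrite completely or rename the old run first).
dask_client : Client | None, defaults to None
    A user-created dask client, which can be used to start a dask cluster and attach SMAC to it. If provided explicitly, it is not closed automatically and has to be closed manually. If none is provided (the default), a local client is created for you and closed upon completion.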
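The four kinds of value accepted by `logging_level` are easy to mix up, because `0 == False` is true in Python even though the two mean different things here. The sketch below is a hypothetical helper (`describe_logging_choice` is not part of SMAC) that only reports which documented branch a value would select; it assumes nothing about how SMAC's `setup_logging` is implemented.

```python
from __future__ import annotations

from pathlib import Path
from typing import Literal


def describe_logging_choice(logging_level: int | Path | Literal[False] | None) -> str:
    """Hypothetical helper: name the documented branch a value selects."""
    # Identity check, so that the integer level 0 is not mistaken for False.
    if logging_level is False:
        return "leave the logging setup to the user"
    if logging_level is None:
        return "use the default logging.yml"
    if isinstance(logging_level, Path):
        return f"load the logging configuration from {logging_level}"
    return f"use numeric level {logging_level}"


print(describe_logging_choice(0))      # use numeric level 0
print(describe_logging_choice(False))  # leave the logging setup to the user
```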

Source code in smac/facade/abstract_facade.py
def __init__(
    self,
    scenario: Scenario,
    target_function: Callable | str | AbstractRunner,
    *,
    model: AbstractModel | None = None,
    acquisition_function: AbstractAcquisitionFunction | None = None,
    acquisition_maximizer: AbstractAcquisitionMaximizer | None = None,
    initial_design: AbstractInitialDesign | None = None,
    random_design: AbstractRandomDesign | None = None,
    intensifier: AbstractIntensifier | None = None,
    multi_objective_algorithm: AbstractMultiObjectiveAlgorithm | None = None,
    runhistory_encoder: AbstractRunHistoryEncoder | None = None,
    config_selector: ConfigSelector | None = None,
    logging_level: int | Path | Literal[False] | None = None,
    callbacks: list[Callback] = None,
    overwrite: bool = False,
    dask_client: Client | None = None,
):
    setup_logging(logging_level)

    if callbacks is None:
        callbacks = []

    if model is None:
        model = self.get_model(scenario)

    if acquisition_function is None:
        acquisition_function = self.get_acquisition_function(scenario)

    if acquisition_maximizer is None:
        acquisition_maximizer = self.get_acquisition_maximizer(scenario)

    if initial_design is None:
        initial_design = self.get_initial_design(scenario)

    if random_design is None:
        random_design = self.get_random_design(scenario)

    if intensifier is None:
        intensifier = self.get_intensifier(scenario)

    if multi_objective_algorithm is None and scenario.count_objectives() > 1:
        multi_objective_algorithm = self.get_multi_objective_algorithm(scenario=scenario)

    if runhistory_encoder is None:
        runhistory_encoder = self.get_runhistory_encoder(scenario)

    if config_selector is None:
        config_selector = self.get_config_selector(scenario)

    # Initialize empty stats and runhistory object
    runhistory = RunHistory(multi_objective_algorithm=multi_objective_algorithm)

    # Set the seed for configuration space
    scenario.configspace.seed(scenario.seed)

    # Set variables globally
    self._scenario = scenario
    self._model = model
    self._acquisition_function = acquisition_function
    self._acquisition_maximizer = acquisition_maximizer
    self._initial_design = initial_design
    self._random_design = random_design
    self._intensifier = intensifier
    self._multi_objective_algorithm = multi_objective_algorithm
    self._runhistory = runhistory
    self._runhistory_encoder = runhistory_encoder
    self._config_selector = config_selector
    self._callbacks = callbacks
    self._overwrite = overwrite

    # Prepare the algorithm executor
    runner: AbstractRunner
    if isinstance(target_function, AbstractRunner):
        runner = target_function
    elif isinstance(target_function, str):
        runner = TargetFunctionScriptRunner(
            scenario=scenario,
            target_function=target_function,
            required_arguments=self._get_signature_arguments(),
        )
    else:
        runner = TargetFunctionRunner(
            scenario=scenario,
            target_function=target_function,
            required_arguments=self._get_signature_arguments(),
        )

    # In case of multiple jobs, we need to wrap the runner again using DaskParallelRunner
    if (n_workers := scenario.n_workers) > 1 or dask_client is not None:
        if dask_client is not None and n_workers > 1:
            logger.warning(
                "Provided `dask_client`. Ignore `scenario.n_workers`, directly set `n_workers` in `dask_client`."
            )
        else:
            available_workers = joblib.cpu_count()
            if n_workers > available_workers:
                logger.info(f"Workers are reduced to {available_workers}.")
                n_workers = available_workers

        # We use a dask runner for parallelization
        runner = DaskParallelRunner(single_worker=runner, dask_client=dask_client)

    # Set the runner to access it globally
    self._runner = runner

    # Adding dependencies of the components
    self._update_dependencies()

    # We have to update our meta data (basically arguments of the components)
    self._scenario._set_meta(self.meta)

    # We have to validate if the object compositions are correct and actually make sense
    self._validate()

    # Finally we configure our optimizer
    self._optimizer = self._get_optimizer()
    assert self._optimizer

    # Register callbacks here
    for callback in callbacks:
        self._optimizer.register_callback(callback)

    # Additionally, we register the runhistory callback from the intensifier to efficiently update our incumbent
    # every time new information is available
    self._optimizer.register_callback(self._intensifier.get_callback(), index=0)
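The way the constructor turns `target_function` into a runner is a three-way dispatch on the argument's type. A minimal sketch of that dispatch, using hypothetical toy classes (`ToyRunner`, `ToyScriptRunner`, `ToyFunctionRunner`) in place of SMAC's runner classes:

```python
from __future__ import annotations

from typing import Callable


class ToyRunner:
    """Hypothetical base class standing in for AbstractRunner."""


class ToyScriptRunner(ToyRunner):
    def __init__(self, script: str) -> None:
        self.script = script


class ToyFunctionRunner(ToyRunner):
    def __init__(self, function: Callable) -> None:
        self.function = function


def select_runner(target_function: Callable | str | ToyRunner) -> ToyRunner:
    # A ready-made runner is used unchanged.
    if isinstance(target_function, ToyRunner):
        return target_function
    # A string is treated as the path of a script to execute.
    if isinstance(target_function, str):
        return ToyScriptRunner(target_function)
    # Anything else is assumed to be a Python callable.
    return ToyFunctionRunner(target_function)


print(type(select_runner("./train.sh")).__name__)      # ToyScriptRunner
print(type(select_runner(lambda config: 0.0)).__name__)  # ToyFunctionRunner
```

The order of the checks matters: the runner test comes first so that a runner object is never wrapped a second time.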

intensifier property #

intensifier: AbstractIntensifier

The intensifier used in the BO loop. It decides which trial should be run next.

meta property #

meta: dict[str, Any]

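Generates a hash based on all components of the facade. This is used for the run name or to determine whether a run should be continued.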

optimizer property #

optimizer: SMBO

The optimizer responsible for the BO loop. It keeps track of useful information such as the status.

runhistory property #

runhistory: RunHistory

The runhistory, which is filled with all trial results during the optimization process.

scenario property #

scenario: Scenario

The scenario object, holding all environmental information.

ask #

ask() -> TrialInfo

Asks the intensifier for the next trial.

Source code in smac/facade/abstract_facade.py
def ask(self) -> TrialInfo:
    """Asks the intensifier for the next trial."""
    return self._optimizer.ask()

get_acquisition_function abstractmethod staticmethod #

get_acquisition_function(
    scenario: Scenario,
) -> AbstractAcquisitionFunction

Returns the acquisition function instance used in the BO loop, defining the exploration/exploitation trade-off.

Source code in smac/facade/abstract_facade.py
@staticmethod
@abstractmethod
def get_acquisition_function(scenario: Scenario) -> AbstractAcquisitionFunction:
    """Returns the acquisition function instance used in the BO loop,
    defining the exploration/exploitation trade-off.
    """
    raise NotImplementedError

get_acquisition_maximizer abstractmethod staticmethod #

get_acquisition_maximizer(
    scenario: Scenario,
) -> AbstractAcquisitionMaximizer

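Returns the acquisition optimizer instance to be used in the BO loop, specifying how the acquisition function instance is optimized.

Source code in smac/facade/abstract_facade.py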
@staticmethod
@abstractmethod
def get_acquisition_maximizer(scenario: Scenario) -> AbstractAcquisitionMaximizer:
    """Returns the acquisition optimizer instance to be used in the BO loop,
    specifying how the acquisition function instance is optimized.
    """
    raise NotImplementedError

get_config_selector staticmethod #

get_config_selector(
    scenario: Scenario,
    *,
    retrain_after: int = 8,
    retries: int = 16
) -> ConfigSelector

Returns the default configuration selector.

Source code in smac/facade/abstract_facade.py
@staticmethod
def get_config_selector(
    scenario: Scenario,
    *,
    retrain_after: int = 8,
    retries: int = 16,
) -> ConfigSelector:
    """Returns the default configuration selector."""
    return ConfigSelector(scenario, retrain_after=retrain_after, retries=retries)

get_initial_design abstractmethod staticmethod #

get_initial_design(
    scenario: Scenario,
) -> AbstractInitialDesign

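Returns an instance of the initial design class to be used in the BO loop, specifying how the configurations used to "warm-start" the BO loop are selected.

Source code in smac/facade/abstract_facade.py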
@staticmethod
@abstractmethod
def get_initial_design(scenario: Scenario) -> AbstractInitialDesign:
    """Returns an instance of the initial design class to be used in the BO loop,
    specifying how the configurations the BO loop is 'warm-started' with are selected.
    """
    raise NotImplementedError

get_intensifier abstractmethod staticmethod #

get_intensifier(scenario: Scenario) -> AbstractIntensifier

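Returns the intensifier instance to be used in the BO loop, specifying how to challenge the incumbent configuration on other problem instances.

Source code in smac/facade/abstract_facade.py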
@staticmethod
@abstractmethod
def get_intensifier(scenario: Scenario) -> AbstractIntensifier:
    """Returns the intensifier instance to be used in the BO loop,
    specifying how to challenge the incumbent configuration on other problem instances.
    """
    raise NotImplementedError

get_model abstractmethod staticmethod #

get_model(scenario: Scenario) -> AbstractModel

Returns the surrogate cost model instance used in the BO loop.

Source code in smac/facade/abstract_facade.py
@staticmethod
@abstractmethod
def get_model(scenario: Scenario) -> AbstractModel:
    """Returns the surrogate cost model instance used in the BO loop."""
    raise NotImplementedError

get_multi_objective_algorithm abstractmethod staticmethod #

get_multi_objective_algorithm(
    scenario: Scenario,
) -> AbstractMultiObjectiveAlgorithm

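Returns the multi-objective algorithm instance to be used in the BO loop, specifying the scalarization strategy for the costs of multiple objectives.

Source code in smac/facade/abstract_facade.py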
@staticmethod
@abstractmethod
def get_multi_objective_algorithm(scenario: Scenario) -> AbstractMultiObjectiveAlgorithm:
    """Returns the multi-objective algorithm instance to be used in the BO loop,
    specifying the scalarization strategy for multiple objectives' costs.
    """
    raise NotImplementedError

get_random_design abstractmethod staticmethod #

get_random_design(
    scenario: Scenario,
) -> AbstractRandomDesign

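Returns an instance of the random design class to be used in the BO loop, specifying how to interleave the BO iterations with randomly selected configurations.

Source code in smac/facade/abstract_facade.py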
@staticmethod
@abstractmethod
def get_random_design(scenario: Scenario) -> AbstractRandomDesign:
    """Returns an instance of the random design class to be used in the BO loop,
    specifying how to interleave the BO iterations with randomly selected configurations.
    """
    raise NotImplementedError

get_runhistory_encoder abstractmethod staticmethod #

get_runhistory_encoder(
    scenario: Scenario,
) -> AbstractRunHistoryEncoder

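Returns an instance of the runhistory encoder class to be used in the BO loop, specifying how the runhistory is to be prepared for the next surrogate model.

Source code in smac/facade/abstract_facade.py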
@staticmethod
@abstractmethod
def get_runhistory_encoder(scenario: Scenario) -> AbstractRunHistoryEncoder:
    """Returns an instance of the runhistory encoder class to be used in the BO loop,
    specifying how the runhistory is to be prepared for the next surrogate model.
    """
    raise NotImplementedError
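The `get_*` methods above stack `@staticmethod` on top of `@abstractmethod`. In Python, `abstractmethod` only blocks instantiation when the class uses `ABCMeta`, for example by inheriting from `abc.ABC`. This section does not show the base classes of `AbstractFacade`, so the sketch below assumes an `ABC` base purely for illustration; all class names in it are hypothetical.

```python
from abc import ABC, abstractmethod


class ToyAbstractFacade(ABC):
    """Hypothetical base class; assumes ABC so that abstractmethod is enforced."""

    @staticmethod
    @abstractmethod
    def get_model(scenario: dict) -> str:
        raise NotImplementedError


class IncompleteFacade(ToyAbstractFacade):
    """Forgets to implement get_model."""


class CompleteFacade(ToyAbstractFacade):
    @staticmethod
    def get_model(scenario: dict) -> str:
        return "toy-model"


try:
    IncompleteFacade()
    instantiation_error = None
except TypeError as error:
    # Raised because get_model is still abstract in IncompleteFacade.
    instantiation_error = error

print(type(instantiation_error).__name__)  # TypeError
# Static methods can be called on the class, without creating an instance.
print(CompleteFacade.get_model({}))        # toy-model
```

Without an `ABCMeta` metaclass the same decorators are still valid Python, but a missing implementation would only surface when the method is called and raises `NotImplementedError`.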

optimize #

optimize(
    *, data_to_scatter: dict[str, Any] | None = None
) -> Configuration | list[Configuration]

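Optimizes the configuration of the algorithm.

Parameters #

data_to_scatter : dict[str, Any] | None
    Note that this argument is only valid for the dask_runner. When a user scatters data from their local process to the distributed network, the data is distributed in a round-robin fashion, grouped by number of cores. Roughly speaking, the data is kept in memory so that it does not have to be (de-)serialized every time a target function is executed with a big dataset. The argument is useful, for example, when a big dataset is shared across all target function calls.

Returns #

incumbent : Configuration
    Best found configuration.

Source code in smac/facade/abstract_facade.py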
def optimize(self, *, data_to_scatter: dict[str, Any] | None = None) -> Configuration | list[Configuration]:
    """
    Optimizes the configuration of the algorithm.

    Parameters
    ----------
    data_to_scatter: dict[str, Any] | None
        Note that this argument is only valid for the dask_runner!
        When a user scatters data from their local process to the distributed network,
        this data is distributed in a round-robin fashion grouping by number of cores.
        Roughly speaking, we can keep this data in memory and then we do not have to (de-)serialize the data
        every time we would like to execute a target function with a big dataset.
        For example, when your target function has a big dataset shared across all target function calls,
        this argument is very useful.

    Returns
    -------
    incumbent : Configuration
        Best found configuration.
    """
    incumbents = None
    if isinstance(data_to_scatter, dict) and len(data_to_scatter) == 0:
        raise ValueError("data_to_scatter must be None or dict with some elements, but got an empty dict.")

    try:
        incumbents = self._optimizer.optimize(data_to_scatter=data_to_scatter)
    finally:
        self._optimizer.save()

    return incumbents
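Two details of the method body are worth isolating: an empty dict is rejected before any work starts, and the `try`/`finally` guarantees that the optimizer state is saved even when optimization raises. The following sketch reproduces just that control flow with a hypothetical `ToyOptimizer` that always crashes; it is not SMAC's optimizer.

```python
from __future__ import annotations

from typing import Any


class ToyOptimizer:
    """Hypothetical optimizer that fails midway, to show that save() still runs."""

    def __init__(self) -> None:
        self.saved = False

    def optimize(self, data_to_scatter: dict[str, Any] | None = None) -> str:
        raise RuntimeError("simulated crash during optimization")

    def save(self) -> None:
        self.saved = True


def run(optimizer: ToyOptimizer, data_to_scatter: dict[str, Any] | None = None) -> str:
    # Reject an empty dict up front, mirroring the check in the method above.
    if isinstance(data_to_scatter, dict) and len(data_to_scatter) == 0:
        raise ValueError("data_to_scatter must be None or a non-empty dict.")
    try:
        return optimizer.optimize(data_to_scatter=data_to_scatter)
    finally:
        # Runs on success and on failure alike.
        optimizer.save()


optimizer = ToyOptimizer()
try:
    run(optimizer)
except RuntimeError:
    pass
print(optimizer.saved)  # True
```

Note that the empty-dict check sits outside the `try`, so a rejected argument does not trigger a save.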

tell #

tell(
    info: TrialInfo, value: TrialValue, save: bool = True
) -> None

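Adds the result of a trial to the runhistory and updates the intensifier.

Parameters #

info : TrialInfo
    Describes the trial from which to process the results.
value : TrialValue
    Contains relevant information regarding the execution of a trial.
save : bool, defaults to True
    Whether the runhistory should be saved.

Source code in smac/facade/abstract_facade.py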
def tell(self, info: TrialInfo, value: TrialValue, save: bool = True) -> None:
    """Adds the result of a trial to the runhistory and updates the intensifier.

    Parameters
    ----------
    info: TrialInfo
        Describes the trial from which to process the results.
    value: TrialValue
        Contains relevant information regarding the execution of a trial.
    save : bool, defaults to True
        Whether the runhistory should be saved.
    """
    return self._optimizer.tell(info, value, save=save)
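Together, `ask` and `tell` let the caller own the evaluation loop: the optimizer proposes a trial, the caller runs it however it likes, and the result is reported back. The sketch below shows that protocol with a hypothetical random-search stand-in. `ToyAskTell`, `ToyTrialInfo`, and `ToyTrialValue` are illustrative names and far simpler than SMAC's `TrialInfo` and `TrialValue`.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTrialInfo:
    x: float
    seed: int


@dataclass(frozen=True)
class ToyTrialValue:
    cost: float


class ToyAskTell:
    """Hypothetical random-search optimizer exposing ask() and tell()."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.history: list[tuple[ToyTrialInfo, ToyTrialValue]] = []

    def ask(self) -> ToyTrialInfo:
        # Propose the next trial; here simply a random point in [-5, 5].
        return ToyTrialInfo(x=self._rng.uniform(-5.0, 5.0), seed=0)

    def tell(self, info: ToyTrialInfo, value: ToyTrialValue) -> None:
        # Record the result so later decisions can use it.
        self.history.append((info, value))

    def incumbent(self) -> ToyTrialInfo:
        return min(self.history, key=lambda pair: pair[1].cost)[0]


optimizer = ToyAskTell(seed=0)
for _ in range(20):
    info = optimizer.ask()
    cost = (info.x - 1.0) ** 2  # the caller evaluates the trial itself
    optimizer.tell(info, ToyTrialValue(cost=cost))

best = optimizer.incumbent()
print(len(optimizer.history))  # 20
```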

validate #

validate(
    config: Configuration, *, seed: int | None = None
) -> float | list[float]

Validates a configuration on seeds different from the ones used in the optimization process and on the highest budget (if the budget type is real-valued).

Parameters #

config : Configuration
    Configuration to validate.
seed : int | None, defaults to None
    If None, the seed from the scenario is used.

Returns #

cost : float | list[float]
    The averaged cost of the configuration. In the multi-objective case, the cost of each objective is averaged.

Source code in smac/facade/abstract_facade.py
def validate(
    self,
    config: Configuration,
    *,
    seed: int | None = None,
) -> float | list[float]:
    """Validates a configuration on seeds different from the ones used in the optimization process and on the
    highest budget (if budget type is real-valued).

    Parameters
    ----------
    config : Configuration
        Configuration to validate
    seed : int | None, defaults to None
        If None, the seed from the scenario is used.

    Returns
    -------
    cost : float | list[float]
        The averaged cost of the configuration. In the multi-objective case, the cost of each objective is
        averaged.
    """
    return self._optimizer.validate(config, seed=seed)
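The return type `float | list[float]` reflects an average over validation runs, taken per objective when there are several. How SMAC chooses the validation seeds is not shown in this section, so the sketch below takes them as an explicit argument; `validate_on_seeds` and its `evaluate` callback are hypothetical names used only to illustrate the averaging.

```python
from __future__ import annotations

from statistics import fmean
from typing import Callable, Sequence


def validate_on_seeds(
    evaluate: Callable[[int], float | Sequence[float]],
    seeds: Sequence[int],
) -> float | list[float]:
    """Hypothetical helper: average the cost over a set of validation seeds."""
    costs = [evaluate(seed) for seed in seeds]
    if isinstance(costs[0], (int, float)):
        # Single objective: one averaged number.
        return fmean(costs)
    # Several objectives: average each objective separately.
    return [fmean(objective) for objective in zip(*costs)]


single = validate_on_seeds(lambda seed: float(seed), seeds=[1, 2, 3])
multi = validate_on_seeds(lambda seed: (float(seed), 10.0), seeds=[1, 2, 3])
print(single)  # 2.0
print(multi)   # [2.0, 10.0]
```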