blackbox: A Python module for parallel optimization of expensive black-box functions
What is this?
Let's say you need to find the optimal parameters of some computationally intensive system (for example, the hyperparameters of a neural network). If you can construct a simple Python function that takes a set of trial parameters, performs an evaluation, and returns a scalar measure of how good the chosen parameters are, then the problem becomes one of mathematical optimization. However, such a function is typically expensive (one evaluation can take hours) and a black box (only its input-output behavior is known).
blackbox is a minimalistic and easy-to-use Python module that efficiently searches for a global optimum (minimum) of an expensive black-box function. The user needs to provide a function, a search region (a range of values for each input parameter) and the number of function evaluations available. The code scales well on clusters and multicore CPUs by performing all expensive function evaluations in parallel.
The mathematical method behind the code is described in this arXiv note: https://arxiv.org/pdf/1605.00998.pdf
Feel free to cite this note if you use the method/code in your research.
Figure: (a) a function given on 0 < x, y < 1; (b) result of the procedure using 15 evaluations; (c) result of the procedure using 30 evaluations.
How do I represent my objective function?
It simply needs to be wrapped in a Python function. If an external application is used, it can be accessed via a system call:
```python
def fun(par):
    # running external application for given set of parameters
    os.system(...)
    # calculating output
    ...
    return output
```
Here `par` is a vector of input parameters (a Python list), and `output` is a scalar measure to be minimized.
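For instance, a wrapper around a command-line application might look like the following sketch. Here a throwaway Python subprocess stands in for the external solver (so the example is self-contained); replace the command with a call to your own application and adapt the output parsing:

```python
import subprocess
import sys

def fun(par):
    # stand-in "external application": a separate Python process that
    # computes the sum of squares of its arguments and prints it;
    # replace this command with a call to your own solver
    cmd = [sys.executable, '-c',
           'import sys; print(sum(float(x)**2 for x in sys.argv[1:]))'] \
          + [str(p) for p in par]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # the scalar objective is read from the process's standard output
    return float(result.stdout.strip())
```

If your application writes results to a file instead of standard output, read and parse that file before returning the scalar.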
How do I run the procedure?
No installation is needed. Just place `blackbox.py` into your working directory. The main file should look like this:
```python
import blackbox as bb


def fun(par):
    return par[0]**2 + par[1]**2  # dummy 2D example


def main():
    bb.search(f=fun,  # given function
              box=[[-10., 10.], [-10., 10.]],  # range of values for each parameter (2D case)
              n=20,  # number of function calls on initial stage (global search)
              m=20,  # number of function calls on subsequent stage (local search)
              batch=4,  # number of calls that will be evaluated in parallel
              resfile='output.csv')  # text file where results will be saved


if __name__ == '__main__':
    main()
```
- All function calls are divided into batches that are evaluated in parallel; the total number of these parallel cycles is `(n + m) / batch`. Keep in mind that:
  - `n` must be greater than the number of parameters,
  - `m` must be greater than 1,
  - `batch` should not exceed the number of CPU cores available.
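For the settings in the example above, this arithmetic works out as:

```python
# values from the example above
n, m, batch = 20, 20, 4

# each cycle evaluates `batch` function calls in parallel,
# so (n + m) calls take (n + m) / batch cycles in total
cycles = (n + m) // batch
print(cycles)  # 10 parallel cycles
```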
- An optional parameter `executor=...` can be specified when calling `bb.search()` in case the code is used on a cluster with some custom parallel engine (ipyparallel, dask.distributed, pathos etc). `executor` should be an object that has a `map` method.
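As an illustration, any object exposing a `map(func, iterable)` method can play this role; the standard library's pool executors already do, as do the pools provided by ipyparallel, dask.distributed and pathos. Below is a minimal sketch of a hypothetical wrapper (the class name and logging are illustrative, not part of blackbox):

```python
from concurrent.futures import ThreadPoolExecutor

class LoggingExecutor:
    """Hypothetical executor that logs each batch before evaluating it
    on a standard-library thread pool."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def map(self, func, iterable):
        points = list(iterable)
        print('evaluating a batch of %d points' % len(points))
        # delegate the actual parallel evaluation to the pool
        return list(self._pool.map(func, points))
```

On a real cluster you would replace the thread pool with your engine's own pool object (or pass that object directly as `executor`).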
How about results?
Iterations are sorted by function value (the best solution is at the top) and saved in a text file with the following structure:
| Parameter #1 | Parameter #2 | ... | Parameter #n | Function value |
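To post-process the results, one could load the best row back into Python. This sketch is a hypothetical helper, not part of blackbox; it assumes the file is comma-separated with a single header row:

```python
import csv

def best_point(resfile='output.csv'):
    # read the sorted results file; the best point is assumed to be
    # in the first data row, with the function value in the last column
    with open(resfile) as f:
        rows = list(csv.reader(f))
    best = rows[1]  # rows[0] is the header
    params = [float(v) for v in best[:-1]]
    value = float(best[-1])
    return params, value
```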
Paul Knysh (email@example.com)
I receive tons of useful feedback that helps me improve the code. Feel free to email me if you have any questions or comments.