API Reference

This is the API for the signac-flow application.

Command Line Interface
Some core signac-flow functions are accessible, in addition to the Python interface, directly via the $ flow command.
For more information, please see $ flow --help.
    usage: flow [-h] [--debug] [--version] {init} ...

    flow provides the basic components to set up workflows for projects as part of
    the signac framework.

    positional arguments:
      {init}
        init        Initialize a signac-flow project.

    optional arguments:
      -h, --help    show this help message and exit
      --debug       Show traceback on error for debugging.
      --version     Display the version number and exit.
The FlowProject

class flow.FlowProject(config=None, environment=None)
    A signac project class specialized for workflow management.
    This class provides a command line interface for the definition, execution, and submission of workflows based on condition and operation functions.
    This is a typical example of how to use this class:

        @FlowProject.operation
        def hello(job):
            print('hello', job)

        FlowProject().main()

    Parameters: config (A signac config object.) – A signac configuration, defaults to the configuration loaded from the environment.
Attributes

FlowProject.ALIASES – These are default aliases used within the status output.
FlowProject.add_operation(name, cmd[, pre, post]) – Add an operation to the workflow.
FlowProject.classify(job) – Generator function which yields labels for job.
FlowProject.completed_operations(job) – Determine which operations have been completed for job.
FlowProject.eligible_for_submission(…) – Deprecated since version 0.8.
FlowProject.export_job_stati(collection, stati) – Export the job stati to a database collection.
FlowProject.get_job_status(job[, …]) – Return a dict with detailed information about the status of a job.
FlowProject.label([label_name_or_func]) – Designate a function to be a label function of this class.
FlowProject.labels(job) – Yields all labels for the given job.
FlowProject.main([parser]) – Call this function to use the main command line interface.
FlowProject.next_operation(job) – Determine the next operation for this job.
FlowProject.next_operations(*jobs) – Determine the next eligible operations for jobs.
FlowProject.operation(func[, name]) – Add the function func as an operation function to the class workflow definition.
FlowProject.operations – The dictionary of operations that have been added to the workflow.
FlowProject.post(condition[, tag]) – Specify a function of job that must evaluate to True for this operation to be considered complete.
FlowProject.post.always(func) – Returns True.
FlowProject.post.copy_from(*other_funcs) – True if and only if all post conditions of the other operation-function(s) are met.
FlowProject.post.false(key) – True if the specified key is present in the job document and evaluates to False.
FlowProject.post.isfile(filename) – True if the specified file exists for this job.
FlowProject.post.never(func) – Returns False.
FlowProject.post.not_(condition) – Returns not condition(job) for the provided condition function.
FlowProject.post.true(key) – True if the specified key is present in the job document and evaluates to True.
FlowProject.pre(condition[, tag]) – Specify a function of job that must be true for this operation to be eligible for execution.
FlowProject.pre.after(*other_funcs) – True if and only if all post conditions of the other operation-function(s) are met.
FlowProject.pre.always(func) – Returns True.
FlowProject.pre.copy_from(*other_funcs) – True if and only if all pre conditions of the other operation-function(s) are met.
FlowProject.pre.false(key) – True if the specified key is present in the job document and evaluates to False.
FlowProject.pre.isfile(filename) – True if the specified file exists for this job.
FlowProject.pre.never(func) – Returns False.
FlowProject.pre.not_(condition) – Returns not condition(job) for the provided condition function.
FlowProject.pre.true(key) – True if the specified key is present in the job document and evaluates to True.
FlowProject.run([jobs, names, pretend, np, …]) – Execute all pending operations for the given selection.
FlowProject.run_operations([operations, …]) – Execute the next operations as specified by the project's workflow.
FlowProject.scheduler_jobs(scheduler) – Fetch jobs from the scheduler.
FlowProject.script(operations[, parallel, …]) – Generate a run script to execute the given operations.
FlowProject.submit([bundle_size, jobs, …]) – Submit function for the project's main submit interface.
FlowProject.submit_operations(operations[, …]) – Submit a sequence of operations to the scheduler.
FlowProject.update_aliases(aliases) – Update the ALIASES table for this class.
class flow.FlowProject(config=None, environment=None)
    Bases: signac.contrib.project.Project

    A signac project class specialized for workflow management.
    This class provides a command line interface for the definition, execution, and submission of workflows based on condition and operation functions.
    This is a typical example of how to use this class:

        @FlowProject.operation
        def hello(job):
            print('hello', job)

        FlowProject().main()

    Parameters: config (A signac config object.) – A signac configuration, defaults to the configuration loaded from the environment.

ALIASES = {'active': 'A', 'inactive': 'I', 'queued': 'Q', 'registered': 'R', 'requires_attention': '!', 'unknown': 'U'}
    These are default aliases used within the status output.
PRINT_STATUS_ALL_VARYING_PARAMETERS = True
    This constant can be used to signal that the print_status() method should automatically show all varying parameters.
add_operation(name, cmd, pre=None, post=None, **kwargs)
    Add an operation to the workflow.
    This method will add an instance of FlowOperation to the operations-dict of this project.

    See also: A Python function may be defined as an operation function directly using the operation() decorator.

    Any FlowOperation is associated with a specific command, which should be a function of Job. The command (cmd) can be stated as a function, either by using str-substitution based on a job's attributes, or by providing a unary callable, which expects an instance of job as its first and only positional argument.

    For example, if we wanted to define a command for a program called 'hello', which expects a job id as its first argument, we could construct the following two equivalent operations:

        op = FlowOperation('hello', cmd='hello {job._id}')
        op = FlowOperation('hello', cmd=lambda job: 'hello {}'.format(job._id))

    Here are some more useful examples for str-substitutions:

        # Substitute job state point parameters:
        op = FlowOperation('hello', cmd='cd {job.ws}; hello {job.sp.a}')

    Pre-requirements (pre) and post-conditions (post) can be used to trigger an operation only when certain conditions are met. Conditions are unary callables, which expect an instance of job as their first and only positional argument and return either True or False.

    An operation is considered "eligible" for execution when all pre-requirements are met and when at least one of the post-conditions is not met. Requirements are always met when the list of requirements is empty, and post-conditions are never met when the list of post-conditions is empty.

    Please note, eligibility in this context refers only to the workflow pipeline and not to other contributing factors, such as whether the job-operation is currently running or queued.

    Parameters:
        - name (str) – The name of the operation.
        - cmd (str or callable) – The command to execute; either a string formatted with the job's attributes, or a unary callable of job.
        - pre (sequence of callables) – Pre-requirement conditions (unary callables of job).
        - post (sequence of callables) – Post-conditions (unary callables of job).
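The eligibility rule described above can be sketched in plain Python. This is an illustrative helper under stated assumptions (a bare job object and plain lists of condition callables), not flow's implementation:

```python
def is_eligible(job, pre=(), post=()):
    """Sketch of operation eligibility (illustrative only).

    An operation is eligible when all pre-requirements are met and at
    least one post-condition is not met. An empty list of requirements
    counts as met; an empty list of post-conditions counts as not met.
    """
    requirements_met = all(cond(job) for cond in pre)
    complete = bool(post) and all(cond(job) for cond in post)
    return requirements_met and not complete
```

With both lists empty, the operation is always eligible, which is why a missing post-condition leads to repeated execution unless limited elsewhere.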
classify(job)
    Generator function which yields labels for job.
    By default, this method yields from the project's labels() method.
    Parameters: job (Job) – The signac job handle.
    Yields: The labels for the provided job.
    Yield type: str
    Deprecated since version 0.8: This will be removed in 0.10. Use labels() instead.
completed_operations(job)
    Determine which operations have been completed for job.
    Parameters: job (Job) – The signac job handle.
    Returns: The names of the operations that are complete.
    Return type: str
detect_operation_graph()
    Determine the directed acyclic graph defined by operation pre- and post-conditions.
    In general, executing a given operation registered with a FlowProject just involves checking the operation's pre- and post-conditions to determine eligibility. More generally, however, the pre- and post-conditions define a directed acyclic graph that governs the execution of all operations. Visualizing this graph can be useful for finding logic errors in the specified conditions, and having this graph computed also enables additional execution modes. For example, using this graph it is possible to determine exactly what operations need to be executed in order to make the operation eligible so that the task of executing all necessary operations can be automated.
    The graph is determined by iterating over all pairs of operations and checking for equality of pre- and post-conditions. The algorithm builds an adjacency matrix based on whether the pre-conditions for one operation match the post-conditions for another. The comparison of operations is conservative; by default, conditions must be composed of identical code to be identified as equal (technically, they must be bytecode equivalent, i.e. cond1.__code__.co_code == cond2.__code__.co_code). Users can specify that conditions should be treated as equal by providing tags to the operations.
    Given a FlowProject subclass defined in a module project.py, the output graph could be visualized using Matplotlib and NetworkX with the following code:

        import numpy as np
        import networkx as nx
        from matplotlib import pyplot as plt

        from project import Project

        project = Project()
        ops = project.operations.keys()
        adj = np.asarray(project.detect_operation_graph())

        plt.figure()
        g = nx.DiGraph(adj)
        pos = nx.spring_layout(g)
        nx.draw(g, pos)
        nx.draw_networkx_labels(
            g, pos,
            labels={key: name for (key, name) in
                    zip(range(len(ops)), [o for o in ops])})
        plt.show()

    Raises a RuntimeError if a condition does not have a tag. This can occur when using functools.partial and a manually specified condition tag has not been set.
    Raises: RuntimeError
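The bytecode comparison described above can be illustrated with plain functions. The helper below is a hypothetical sketch of the check (flow compares cond1.__code__.co_code == cond2.__code__.co_code); the condition functions are made-up examples:

```python
def same_bytecode(cond1, cond2):
    # Conditions are treated as equal only when their compiled bytecode
    # matches exactly, which is why the comparison is conservative.
    return cond1.__code__.co_code == cond2.__code__.co_code

def positive_a(job):
    return job['a'] > 0

def positive_a_again(job):
    # Same source as positive_a, so it compiles to the same bytecode.
    return job['a'] > 0

def nonnegative_a(job):
    # A different comparison operator yields different bytecode.
    return job['a'] >= 0
```

Two conditions created via functools.partial share the bytecode of the wrapped function, which is why flow requires an explicit tag in that case.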
eligible_for_submission(job_operation)
    Deprecated since version 0.8: This will be removed in 0.10.
export_job_stati(collection, stati)
    Export the job stati to a database collection.
    Deprecated since version 0.8: This will be removed in 0.10. Use export_job_statuses() instead.
export_job_statuses(collection, statuses)
    Export the job statuses to a database collection.
get_job_status(job, ignore_errors=False, cached_status=None)
    Return a dict with detailed information about the status of a job.
classmethod label(label_name_or_func=None)
    Designate a function to be a label function of this class.
    For example, we can define a label function like this:

        @FlowProject.label
        def foo_label(job):
            if job.document.get('foo', False):
                return 'foo-label-text'

    The foo-label-text label will now show up in the status view for each job where the foo key evaluates to true.

    If the label function returns any type other than str, the label name will be the name of the function if and only if the return value evaluates to True, for example:

        @FlowProject.label
        def foo_label(job):
            return job.document.get('foo', False)

    Finally, you can specify a different default label name by providing it as the first argument to the label() decorator.
    Parameters: label_name_or_func (str or callable) – A label name or callable.
main(parser=None)
    Call this function to use the main command line interface.
    In most cases one would want to call this function as part of the class definition, e.g.:

        # my_project.py
        from flow import FlowProject

        class MyProject(FlowProject):
            pass

        if __name__ == '__main__':
            MyProject().main()

    You can then execute this script on the command line:

        $ python my_project.py --help
next_operation(job)
    Determine the next operation for this job.
    Parameters: job (Job) – The signac job handle.
    Returns: An instance of JobOperation to execute next, or None if no operation is eligible.
    Return type: JobOperation or NoneType
    Deprecated since version 0.8: This will be removed in 0.10. Use next_operations() instead.
next_operations(*jobs)
    Determine the next eligible operations for jobs.
    Parameters: jobs – The signac job handles.
    Yields: All instances of JobOperation that the jobs are eligible for.
classmethod operation(func, name=None)
    Add the function func as an operation function to the class workflow definition.
    This function is designed to be used as a decorator, for example:

        @FlowProject.operation
        def hello(job):
            print('Hello', job)

    See also: add_operation().
operations
    The dictionary of operations that have been added to the workflow.
print_status(jobs=None, overview=True, overview_max_lines=None, detailed=False, parameters=None, param_max_width=None, expand=False, all_ops=False, only_incomplete=False, dump_json=False, unroll=True, compact=False, pretty=False, file=None, err=None, ignore_errors=False, no_parallelize=False, template=None, profile=False, eligible_jobs_max_lines=None)
    Print the status of the project.
    Parameters:
        - jobs (Sequence of instances of Job) – Only execute operations for the given jobs, or all if the argument is omitted.
        - overview (bool) – Aggregate an overview of the project's status.
        - overview_max_lines (int) – Limit the number of overview lines.
        - eligible_jobs_max_lines (int) – Limit the number of eligible jobs that are printed in the overview.
        - detailed (bool) – Print a detailed status of each job.
        - parameters (list of str) – Print the value of the specified parameters.
        - param_max_width (int) – Limit the number of characters of parameter columns; see also update_aliases().
        - expand (bool) – Present labels and operations in two separate tables.
        - all_ops (bool) – Include operations that are not eligible to run.
        - only_incomplete (bool) – Only show jobs that have eligible operations.
        - dump_json (bool) – Output the data as JSON instead of printing the formatted output.
        - unroll (bool) – Separate columns for jobs and the corresponding operations.
        - compact (bool) – Print a compact version of the output.
        - pretty (bool) – Prettify the output.
        - file (str) – Redirect all output to this file, defaults to sys.stdout.
        - err (str) – Redirect all error output to this file, defaults to sys.stderr.
        - ignore_errors (bool) – Print status even if querying the scheduler fails.
        - no_parallelize (bool) – Do not parallelize the status update.
        - template (str) – User-provided Jinja2 template file.
run(jobs=None, names=None, pretend=False, np=None, timeout=None, num=None, num_passes=1, progress=False, order=None)
    Execute all pending operations for the given selection.
    This function will run in an infinite loop until all pending operations are executed, unless it reaches the maximum number of passes per operation or the maximum number of executions.
    By default there is no limit on the total number of executions, but a specific operation will only be executed once per job. This is to avoid accidental infinite loops when no, or faulty, post conditions are provided.
    See also: run_operations()
    Parameters:
        - jobs (Sequence of instances of Job) – Only execute operations for the given jobs, or all if the argument is omitted.
        - names (Sequence of str) – Only execute operations that are in the provided set of names, or all if the argument is omitted.
        - pretend (bool) – Do not actually execute the operations, but show which command would have been used.
        - np (int) – Parallelize to the specified number of processors. Use -1 to parallelize over all available processing units.
        - timeout (int) – An optional timeout for each operation in seconds, after which execution will be cancelled. Use -1 to indicate no timeout (the default).
        - num (int) – The total number of operations that are executed will not exceed this argument, if provided.
        - num_passes (int) – The number of times a specific job-operation pair is executed will not exceed this argument. The default is 1; there is no limit if this argument is None.
        - progress (bool) – Show a progress bar during execution.
        - order (str, callable, or NoneType) – Specify the order of operations. Possible values are:
            - 'none' or None (no specific order)
            - 'by-job' (operations are grouped by job)
            - 'cyclic' (order operations cyclically by job)
            - 'random' (shuffle the execution order randomly)
            - a callable returning a comparison key for an operation, used to sort operations
          The default value is 'none', which is equivalent to 'by-job' in the current implementation.

    Note: Users are advised not to rely on a specific execution order as a substitute for defining the workflow in terms of pre- and post-conditions. However, a specific execution order may be more performant in cases where operations need to access and potentially lock shared resources.
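The pass-limiting loop described above can be sketched with the standard library. The names get_pending and execute are hypothetical stand-ins for the project's workflow logic; this is not flow's code:

```python
from collections import Counter

def run_pending(get_pending, execute, num_passes=1):
    """Sketch of run()'s loop: execute pending operations until none
    remain, but skip any operation that already ran num_passes times.
    This is what prevents an infinite loop when a post-condition never
    becomes true (e.g. because it is missing or faulty)."""
    counts = Counter()
    while True:
        pending = [op for op in get_pending() if counts[op] < num_passes]
        if not pending:
            break
        for op in pending:
            execute(op)
            counts[op] += 1
    return counts

# Example: operation 'b' has a faulty post-condition that never completes.
state = {'a': False, 'b': False}

def get_pending():
    return [name for name, done in state.items() if not done]

def execute(name):
    if name == 'a':          # 'a' completes normally...
        state['a'] = True    # ...while 'b' never flips its post-condition.

counts = run_pending(get_pending, execute)
```

Despite 'b' remaining pending forever, the loop terminates after one pass per operation, mirroring the default num_passes=1 behavior.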
run_operations(operations=None, pretend=False, np=None, timeout=None, progress=False)
    Execute the next operations as specified by the project's workflow.
    See also: run()
    Parameters:
        - operations (Sequence of instances of JobOperation) – The operations to execute (optional).
        - pretend (bool) – Do not actually execute the operations, but show which command would have been used.
        - np (int) – The number of processors to use for each operation.
        - timeout (int) – An optional timeout for each operation in seconds, after which execution will be cancelled. Use -1 to indicate no timeout (the default).
        - progress (bool) – Show a progress bar during execution.
scheduler_jobs(scheduler)
    Fetch jobs from the scheduler.
    This function will fetch all scheduler jobs from the scheduler and also expand bundled jobs automatically.
    However, this function will not automatically filter scheduler jobs which are not associated with this project.
    Parameters: scheduler (Scheduler) – The scheduler instance.
    Yields: All scheduler jobs fetched from the scheduler instance.
script(operations, parallel=False, template='script.sh', show_template_help=False)
    Generate a run script to execute the given operations.
    Parameters:
        - operations (Sequence of instances of JobOperation) – The operations to execute.
        - parallel (bool) – Execute all operations in parallel (default is False).
        - template (str) – The name of the template to use to generate the script.
        - show_template_help (bool) – Show help related to the templating system and then exit.
submit(bundle_size=1, jobs=None, names=None, num=None, parallel=False, force=False, walltime=None, env=None, **kwargs)
    Submit function for the project's main submit interface.
    Parameters:
        - bundle_size (int) – Specify the number of operations to be bundled into one submission, defaults to 1.
        - jobs (Sequence of instances of Job) – Only submit operations associated with the provided jobs. Defaults to all jobs.
        - names (Sequence of str) – Only submit operations with any of the given names, defaults to all names.
        - num (int) – Limit the total number of submitted operations, defaults to no limit.
        - parallel (bool) – Execute all bundled operations in parallel. Has no effect with the default bundle_size=1.
        - force (bool) – Ignore all warnings or checks during submission, just submit.
        - walltime – Specify the walltime in hours or as an instance of datetime.timedelta.
submit_operations(operations, _id=None, env=None, parallel=False, flags=None, force=False, template='script.sh', pretend=False, show_template_help=False, **kwargs)
    Submit a sequence of operations to the scheduler.
    Parameters:
        - operations (A sequence of instances of JobOperation) – The operations to submit.
        - _id (str) – The _id to be used for this submission.
        - parallel (bool) – Execute all bundled operations in parallel.
        - flags (list) – Additional options to be forwarded to the scheduler.
        - force (bool) – Ignore all warnings or checks during submission, just submit.
        - template (str) – The name of the template file used to generate the submission script.
        - pretend (bool) – Do not actually submit, but only print the submission script to screen. Useful for testing the submission workflow.
        - show_template_help (bool) – Show information about available template variables and filters and exit.
        - **kwargs – Additional keyword arguments forwarded to the scheduler.
    Returns: The submission status after successful submission, or None.
FlowProject.post(tag=None)
    Specify a function of job that must evaluate to True for this operation to be considered complete. For example:

        @Project.operation
        @Project.post(lambda job: job.doc.get('bye'))
        def bye(job):
            print('bye', job)
            job.doc.bye = True

    The bye-operation would be considered complete, and therefore no longer eligible for execution, once the 'bye' key in the job document evaluates to True.
    An optional tag may be associated with the condition. These tags are used by detect_operation_graph() when comparing conditions for equality. The tag defaults to the bytecode of the function.
classmethod post.always(func)
    Returns True.
    Deprecated since version 0.9: This will be removed in 0.11. This condition decorator is obsolete.

classmethod post.copy_from(*other_funcs)
    True if and only if all post conditions of the other operation-function(s) are met.

classmethod post.false(key)
    True if the specified key is present in the job document and evaluates to False.

classmethod post.isfile(filename)
    True if the specified file exists for this job.

classmethod post.never(func)
    Returns False.

classmethod post.not_(condition)
    Returns not condition(job) for the provided condition function.

classmethod post.true(key)
    True if the specified key is present in the job document and evaluates to True.
FlowProject.pre(tag=None)
    Specify a function of job that must be true for this operation to be eligible for execution. For example:

        @Project.operation
        @Project.pre(lambda job: not job.doc.get('hello'))
        def hello(job):
            print('hello', job)
            job.doc.hello = True

    The hello-operation would only execute if the 'hello' key in the job document does not evaluate to True.
    An optional tag may be associated with the condition. These tags are used by detect_operation_graph() when comparing conditions for equality. The tag defaults to the bytecode of the function.
classmethod pre.after(*other_funcs)
    True if and only if all post conditions of the other operation-function(s) are met.

classmethod pre.always(func)
    Returns True.
    Deprecated since version 0.9: This will be removed in 0.11. This condition decorator is obsolete.

classmethod pre.copy_from(*other_funcs)
    True if and only if all pre conditions of the other operation-function(s) are met.

classmethod pre.false(key)
    True if the specified key is present in the job document and evaluates to False.

classmethod pre.isfile(filename)
    True if the specified file exists for this job.

classmethod pre.never(func)
    Returns False.

classmethod pre.not_(condition)
    Returns not condition(job) for the provided condition function.

classmethod pre.true(key)
    True if the specified key is present in the job document and evaluates to True.
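The semantics of the true/false key conditions above can be sketched with a plain dict standing in for the job document. These helpers are hypothetical illustrations, not flow's implementations:

```python
def doc_true(key):
    # True if the key is present in the document and evaluates to True.
    return lambda doc: key in doc and bool(doc[key])

def doc_false(key):
    # True if the key is present in the document and evaluates to False.
    # Note: a missing key yields False for BOTH conditions, since the
    # key must be present in either case.
    return lambda doc: key in doc and not bool(doc[key])
```

This makes the asymmetry explicit: false(key) is not the negation of true(key), because both require the key to exist.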
@flow.cmd

flow.cmd(func)
    Specifies that func returns a shell command.
    If this function is an operation function defined by FlowProject, it will be interpreted to return a shell command, instead of executing the function itself.
    For example:

        @FlowProject.operation
        @flow.cmd
        def hello(job):
            return "echo {job._id}"
@flow.with_job

flow.with_job(func)
    Specifies that func(arg) will use arg as a context manager.
    If this function is an operation function defined by FlowProject, it will be the same as using with job:.
    For example:

        @FlowProject.operation
        @flow.with_job
        def hello(job):
            print("hello {}".format(job))

    is equivalent to:

        @FlowProject.operation
        def hello(job):
            with job:
                print("hello {}".format(job))

    This also works with the @cmd decorator:

        @FlowProject.operation
        @with_job
        @cmd
        def hello(job):
            return "echo 'hello {}'".format(job)

    is equivalent to:

        @FlowProject.operation
        @cmd
        def hello_cmd(job):
            return 'trap "cd `pwd`" EXIT && cd {} && echo "hello {}"'.format(job.ws, job)
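The trap/cd shell pattern in the equivalent command above can be exercised directly. This sketch assumes a POSIX shell and uses a temporary directory as a stand-in for a job workspace:

```python
import os
import subprocess
import tempfile

# Run the same trap/cd pattern: cd into the workspace stand-in, execute
# a command there (pwd), and restore the original directory on exit.
with tempfile.TemporaryDirectory() as ws:
    cmd = 'trap "cd `pwd`" EXIT && cd {} && pwd'.format(ws)
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    output = result.stdout.strip()
```

The printed working directory matches the workspace path, confirming the command executed inside it.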
@flow.directives

class flow.directives(**kwargs)
    Decorator for operation functions to provide additional execution directives.
    Directives can, for example, be used to provide information about required resources, such as the number of processes required for execution of parallelized operations.
    In addition, you can use the @directives(fork=True) directive to enforce that a particular operation is always executed within a subprocess and not within the Python interpreter's process, even if there are no other reasons that would prevent that.

    Note: Setting fork=False will not prevent forking if there are other reasons for forking, such as a timeout.
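A minimal sketch of how a directives-style decorator can attach such metadata to an operation function. The decorator name and the _flow_directives attribute are hypothetical and do not reflect flow's internal representation:

```python
def directives_sketch(**kwargs):
    """Attach execution directives as metadata on an operation function."""
    def decorator(func):
        # Store the directives on the function object itself so that a
        # workflow engine can read them later when scheduling the operation.
        func._flow_directives = dict(kwargs)
        return func
    return decorator

@directives_sketch(np=4, fork=True)
def expensive_op(job):
    pass  # placeholder operation body
```

The decorated function behaves unchanged; only the attached metadata differs.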
flow.run()

flow.run(parser=None)
    Access to the "run" interface of an operations module.
    Executing this function within a module will start a command line interface that can be used to execute operations defined within the same module. All top-level unary functions will be interpreted as executable operation functions.
    For example, if we have a module such as:

        # operations.py
        def hello(job):
            print('hello', job)

        if __name__ == '__main__':
            import flow
            flow.run()

    Then we can execute the hello operation for all jobs from the command line like this:

        $ python operations.py hello

    Note: You can control the degree of parallelization with the --np argument.
    For more information, see:

        $ python operations.py --help
flow.init()

flow.get_environment()
flow.get_environment(test=False, import_configured=True)
    Attempt to detect the present environment.
    This function iterates through all defined ComputeEnvironment classes in reversed order of definition and returns the first environment where the is_present() method returns True.
    Parameters: test (bool) – Whether to return the TestEnvironment.
    Returns: The detected environment class.