The FlowProject
This chapter describes how to set up a complete workflow by implementing a FlowProject.
Setup and Interface
To implement a more automated workflow, we can subclass FlowProject:
# project.py
from flow import FlowProject


class MyProject(FlowProject):
    pass


if __name__ == '__main__':
    MyProject().main()
Tip
You can generate boilerplate templates like the one above with the $ flow init command. Several templates are available via the -t/--template option.
Executing this script on the command line will give us access to this project’s specific command line interface:
$ python project.py
usage: project.py [-h] {status,next,run,script,submit} ...
Note
You can have multiple implementations of FlowProject that all operate on the same signac project. This may be useful, for example, if you want to implement two very distinct workflows that operate on the same data space. Simply put them in different modules, e.g., project_a.py and project_b.py.
Classification
The FlowProject uses a classify() method to generate labels for a job. A label is a short text string that represents a condition. Following the last chapter’s example, we could implement a greeted label like this:
# project.py
from flow import FlowProject
from flow import staticlabel


class MyProject(FlowProject):

    @staticlabel()
    def greeted(job):
        return job.isfile('hello.txt')

    # ...
Using the staticlabel decorator turns the greeted() function into a label function, which is evaluated during classification. We can verify this by executing the hello operation for a few jobs and then looking at the project’s status:
$ python operations.py hello 0d32 2e6
hello 0d32543f785d3459f27b8746f2053824
hello 2e6ba580a9975cf0c01cb3c3f373a412
$ python project.py status --detailed
Status project 'MyProject':
Total # of jobs: 10
label progress
------- ----------
greeted |########--------------------------------| 20.00%
Detailed view:
job_id S next_op labels
-------------------------------- --- --------- --------
0d32543f785d3459f27b8746f2053824 U greeted
14fb5d016557165019abaac200785048 U
2af7905ebe91ada597a8d4bb91a1c0fc U
2e6ba580a9975cf0c01cb3c3f373a412 U greeted
42b7b4f2921788ea14dac5566e6f06d0 U
751c7156cca734e22d1c70e5d3c5a27f U
81ee11f5f9eb97a84b6fc934d4335d3d U
9bfd29df07674bc4aa960cf661b5acd2 U
9f8a8e5ba8c70c774d410a9107e2a32b U
b1d43cd340a6b095b41ad645446b6800 U
Abbreviations used:
S: status
U: unknown
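Conceptually, classification evaluates each label function against a job and reports the labels whose condition holds; the progress column is the fraction of jobs carrying a label. The following plain-Python sketch mimics this for the greeted label, assuming each job is represented only by its workspace directory. The classify() and label_progress() helpers here are hypothetical illustrations, not signac-flow's implementation.

```python
import os
import tempfile
from typing import Callable, Dict, List


def greeted(job_dir: str) -> bool:
    # Same condition as the greeted label: the job has a hello.txt file.
    return os.path.isfile(os.path.join(job_dir, "hello.txt"))


def classify(job_dir: str, labels: Dict[str, Callable[[str], bool]]) -> List[str]:
    # Hypothetical helper: names of all labels whose condition is met for this job.
    return [name for name, condition in labels.items() if condition(job_dir)]


def label_progress(job_dirs: List[str], name: str, condition: Callable[[str], bool]) -> str:
    # Hypothetical helper: percentage of jobs that carry the label.
    count = sum(1 for job_dir in job_dirs if condition(job_dir))
    return "{} {:.2f}%".format(name, 100.0 * count / len(job_dirs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        job_dirs = []
        for i in range(10):
            job_dir = os.path.join(workspace, "job{}".format(i))
            os.makedirs(job_dir)
            job_dirs.append(job_dir)
        # "Greet" two of the ten jobs, as in the status output above.
        for job_dir in job_dirs[:2]:
            with open(os.path.join(job_dir, "hello.txt"), "w") as file:
                file.write("Hello!\n")
        labels = {"greeted": greeted}
        print(classify(job_dirs[0], labels))  # ['greeted']
        print(classify(job_dirs[5], labels))  # []
        print(label_progress(job_dirs, "greeted", greeted))  # greeted 20.00%
```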
Determining the next operation
Next, we tell the project that the hello() operation should be executed whenever the greeted condition is not met. We achieve this by adding the operation to the project:
class MyProject(FlowProject):

    def __init__(self, *args, **kwargs):
        super(MyProject, self).__init__(*args, **kwargs)
        self.add_operation(
            name='hello',
            cmd='python operations.py hello {job._id}',
            post=[MyProject.greeted])
Let’s go through the individual arguments of the add_operation() method:
The name argument is arbitrary, but must be unique among all operations of the project’s workflow. It simply helps us identify the operation without having to look at the full command.
The cmd argument determines how the operation is executed; ideally, it should be a function of the job. We can construct the cmd by using formatting fields, as shown above. We can use any attribute of the job instance, including state point values (e.g., job.sp.a) or the workspace directory (job.ws). The command is later evaluated like this: cmd.format(job=job).
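Because the command is produced with Python's str.format(), format fields can reach into nested attributes such as {job.sp.a}. The sketch below illustrates this with a hypothetical stand-in object built from types.SimpleNamespace instead of a real signac job; the id and state point value are taken from the script example later in this chapter, and the echo and cd commands are made up for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a signac job; only the attributes used below exist.
job = SimpleNamespace(
    _id="2af7905ebe91ada597a8d4bb91a1c0fc",
    sp=SimpleNamespace(a=4),
    ws="workspace/2af7905ebe91ada597a8d4bb91a1c0fc",
)

# Command templates with format fields, evaluated as cmd.format(job=job).
templates = [
    "python operations.py hello {job._id}",
    "echo a={job.sp.a}",
    "cd {job.ws} && ls",
]

for cmd in templates:
    print(cmd.format(job=job))
```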
Alternatively, we can define a function that returns a command or script, e.g.:
        # ...
        self.add_operation(
            name='hello',
            cmd=lambda job: "python operations.py hello {}".format(job),
            post=[MyProject.greeted])
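Since cmd may be either a format string or a function, something has to turn it into a concrete command for a given job. One plausible way to resolve both variants is sketched below; the resolve_cmd() helper and the Job stand-in class are hypothetical, and signac-flow's internal handling may differ. The lambda above relies on str(job) returning the job id, which the stand-in imitates.

```python
from typing import Callable, Union


class Job:
    """Hypothetical stand-in for a signac job."""

    def __init__(self, _id: str):
        self._id = _id

    def __str__(self) -> str:
        # Imitates a signac job, whose string representation is its id.
        return self._id


Cmd = Union[str, Callable[[Job], str]]


def resolve_cmd(cmd: Cmd, job: Job) -> str:
    # Hypothetical helper: call cmd if it is a function, otherwise format it.
    if callable(cmd):
        return cmd(job)
    return cmd.format(job=job)


job = Job("9bfd29df07674bc4aa960cf661b5acd2")
as_template = "python operations.py hello {job._id}"
as_function = lambda job: "python operations.py hello {}".format(job)

print(resolve_cmd(as_template, job))
print(resolve_cmd(as_function, job))
```

Both variants produce the same command here; the function form is useful when the command cannot be expressed with format fields alone.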
Finally, the post argument is a list of unary condition functions.
Definition:
A specific operation is eligible for execution whenever all pre-conditions (pre) are met and at least one of the post-conditions (post) is not met.
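The eligibility rule can be written down directly. In the sketch below, jobs are plain dictionaries and greeted() is a hypothetical stand-in that checks a set of file names instead of the job's workspace. The case without any post-conditions is not covered by the definition; the sketch assumes such an operation is eligible whenever its pre-conditions are met.

```python
from typing import Callable, Sequence

Condition = Callable[[dict], bool]


def eligible(job: dict, pre: Sequence[Condition] = (), post: Sequence[Condition] = ()) -> bool:
    # All pre-conditions must be met.
    pre_met = all(cond(job) for cond in pre)
    if not post:
        # Assumption: without post-conditions, only the pre-conditions matter.
        return pre_met
    # At least one post-condition must not be met.
    return pre_met and any(not cond(job) for cond in post)


def greeted(job: dict) -> bool:
    # Hypothetical stand-in for the greeted label.
    return "hello.txt" in job["files"]


new_job = {"files": set()}
done_job = {"files": {"hello.txt"}}
print(eligible(new_job, post=[greeted]))   # True: hello is still pending
print(eligible(done_job, post=[greeted]))  # False: nothing left to do
```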
In this case, the hello operation will only be executed when greeted() returns False; we can verify this again by looking at the status:
$ python project.py status --detailed
Status project 'MyProject':
Total # of jobs: 10
label progress
------- -------------------------------------------------
greeted |########--------------------------------| 20.00%
Detailed view:
job_id S next_op labels
-------------------------------- --- --------- --------
0d32543f785d3459f27b8746f2053824 U greeted
14fb5d016557165019abaac200785048 U ! hello
2af7905ebe91ada597a8d4bb91a1c0fc U ! hello
2e6ba580a9975cf0c01cb3c3f373a412 U greeted
42b7b4f2921788ea14dac5566e6f06d0 U ! hello
751c7156cca734e22d1c70e5d3c5a27f U ! hello
81ee11f5f9eb97a84b6fc934d4335d3d U ! hello
9bfd29df07674bc4aa960cf661b5acd2 U ! hello
9f8a8e5ba8c70c774d410a9107e2a32b U ! hello
b1d43cd340a6b095b41ad645446b6800 U ! hello
Abbreviations used:
!: requires_attention
S: status
U: unknown
Running project operations
Similar to the run() interface used earlier, we can execute all pending operations with the python project.py run command:
$ python project.py run
hello 42b7b4f2921788ea14dac5566e6f06d0
hello 2af7905ebe91ada597a8d4bb91a1c0fc
hello 14fb5d016557165019abaac200785048
hello 751c7156cca734e22d1c70e5d3c5a27f
hello 9bfd29df07674bc4aa960cf661b5acd2
hello 81ee11f5f9eb97a84b6fc934d4335d3d
hello 9f8a8e5ba8c70c774d410a9107e2a32b
hello b1d43cd340a6b095b41ad645446b6800
Again, the execution is automatically parallelized.
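The parallelization is handled by signac-flow itself. To illustrate the idea with the standard library only, here is a sketch that runs a hypothetical hello() function for all pending job directories concurrently via concurrent.futures. The function, the directory layout, and the choice of a thread pool are assumptions for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List


def hello(job_dir: str) -> str:
    # Hypothetical operation: write hello.txt into the job's directory.
    with open(os.path.join(job_dir, "hello.txt"), "w") as file:
        file.write("Hello!\n")
    return "hello {}".format(os.path.basename(job_dir))


def run_pending(job_dirs: List[str], max_workers: int = 4) -> List[str]:
    # Only jobs without hello.txt are pending (the greeted condition is not met).
    pending = [d for d in job_dirs if not os.path.isfile(os.path.join(d, "hello.txt"))]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() returns results in input order, although the calls run concurrently.
        return list(executor.map(hello, pending))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        dirs = []
        for name in ("2af7905e", "9bfd29df", "0d32543f"):
            path = os.path.join(workspace, name)
            os.makedirs(path)
            dirs.append(path)
        print(run_pending(dirs))  # all three jobs are pending on the first run
        print(run_pending(dirs))  # [] on the second run
```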
Let’s remove a few hello.txt files so that some operations become pending again:
$ rm workspace/2af7905ebe91ada597a8d4bb91a1c0fc/hello.txt
$ rm workspace/9bfd29df07674bc4aa960cf661b5acd2/hello.txt
Generating Execution Scripts
Using the script command, we can generate an execution script for the pending operations, which might look like this:
$ python project.py script
---- BEGIN SCRIPT ----
set -u
set -e
cd /Users/johndoe/my_project
# Statepoint:
#
# {{
# "a": 4
# }}
python operations.py hello 2af7905ebe91ada597a8d4bb91a1c0fc &
wait
---- END SCRIPT ----
---- BEGIN SCRIPT ----
set -u
set -e
cd /Users/johndoe/my_project
# Statepoint:
#
# {{
# "a": 0
# }}
python operations.py hello 9bfd29df07674bc4aa960cf661b5acd2 &
wait
---- END SCRIPT ----
These scripts can be used to execute operations directly, or they can be submitted to a cluster environment for remote execution. For more information about submitting operations to a cluster environment, see the Cluster Submission chapter.
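The structure of these scripts is simple: shell safety options, a change into the project root, the state point as comment lines, and each command launched in the background followed by wait. The sketch below assembles a script with the same layout; the generate_script() function is hypothetical, signac-flow builds its scripts from its own templates, and this sketch writes the state point as indented JSON.

```python
import json
from typing import Dict, List


def generate_script(root: str, statepoint: Dict, cmds: List[str]) -> str:
    # Hypothetical helper mirroring the layout of the scripts shown above.
    lines = ["set -u", "set -e", "cd {}".format(root), "# Statepoint:", "#"]
    # Comment out each line of the pretty-printed state point.
    lines += ["# " + line for line in json.dumps(statepoint, indent=2).splitlines()]
    # Launch each command in the background, then wait for all of them.
    lines += ["{} &".format(cmd) for cmd in cmds]
    lines.append("wait")
    return "\n".join(lines) + "\n"


script = generate_script(
    "/Users/johndoe/my_project",
    {"a": 4},
    ["python operations.py hello 2af7905ebe91ada597a8d4bb91a1c0fc"],
)
print(script)
```

Saving the returned string to a file and running it with bash would execute the operation locally.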
Full Demonstration
The screencast below is a complete demonstration of all steps:
Check out the next chapter for a guide on how to submit operations to a cluster environment.