Tips and Tricks

This is a collection of miscellaneous tips that might help new users avoid common mistakes or provide useful hints to more experienced users.

How to design a good schema.

There is no universal recipe for designing a good schema, because the right choice depends heavily on the domain and the specific application. Nonetheless, there are some basic rules worth following:

  1. Be descriptive. Although we are using short variable names in the tutorial, in general metadata keys should be as long as necessary for a third party to understand their meaning without needing to ask someone.
  2. Any parameter which is likely to be varied at some point during the study should be part of the metadata right from the start to avoid needing to modify the schema later.
  3. Take advantage of grouping keys! The job metadata mapping may be nested, just like any other Python dict.
  4. Even if you don’t use “official” schemas, consider working out standardized schemas with your peers or collaborators.
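
As a rough illustration of rules 1 and 3, the sketch below builds a descriptive, nested metadata mapping as a plain Python dict. All key names and values are hypothetical, and the flatten helper is not part of signac; it only shows that grouped keys can still be listed as dotted paths.

```python
# Hypothetical metadata for a gas simulation; every key name is illustrative.
statepoint = {
    "pressure": 4.0,
    "system": {
        "num_particles": 1000,
        "temperature": 1.0,
    },
    "integrator": {
        "kind": "velocity-verlet",
        "timestep": 0.005,
    },
}


def flatten(mapping, prefix=""):
    """Yield (dotted_key, value) pairs for a nested mapping (helper, not a signac function)."""
    for key, value in mapping.items():
        dotted_key = prefix + key
        if isinstance(value, dict):
            # Descend into the group and extend the dotted path.
            yield from flatten(value, prefix=dotted_key + ".")
        else:
            yield dotted_key, value


flat = dict(flatten(statepoint))
print(sorted(flat))
```

A mapping like this could then be used as the metadata when opening a job, for example via project.open_job(statepoint).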

Do not replicate job metadata in file names.

Many users, especially those new to signac, fall into the trap of storing metadata in filenames within a job’s workspace even though that metadata is already encoded in the job itself.

Using the Tutorial project as an example, we might have stored the volume corresponding to the job at pressure 4 in a file called volume_pressure_4.txt. However, this is completely unnecessary since that information can already be accessed through the job via job.sp.p. Furthermore, creating files this way causes additional complications, such as the need to modify filenames whenever we operate on the data space. For example, extracting the volume from a particular job originally consisted of doing this:

volume = float(open(job.fn('volume.txt')).read())

Now, we instead need to adjust the filename for each job:

volume = float(open(job.fn('volume_pressure_{}.txt'.format(job.sp.p))).read())

In general, it is desirable to keep the filenames across the workspace as uniform as possible.
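
The benefit of uniform filenames can be sketched without signac at all. The example below is a stand-in under stated assumptions: the directory layout, the metadata.json file, and the read_volume helper are all hypothetical and only mimic a workspace with one directory per job. Because every job uses the same filename, a single helper works for all of them, and the pressure is read from the job's metadata rather than parsed from a filename.

```python
import hashlib
import json
import tempfile
from pathlib import Path

VOLUME_FILENAME = "volume.txt"  # the same name in every job directory


def read_volume(job_dir):
    """Read the volume stored in one job directory."""
    with open(Path(job_dir) / VOLUME_FILENAME) as file:
        return float(file.read())


volumes = {}
with tempfile.TemporaryDirectory() as workspace:
    # Stand-in workspace: one directory per job, named by a hash of its metadata.
    for pressure in (1, 2, 4):
        metadata = {"p": pressure}
        encoded = json.dumps(metadata, sort_keys=True).encode()
        job_dir = Path(workspace) / hashlib.sha256(encoded).hexdigest()
        job_dir.mkdir()
        (job_dir / "metadata.json").write_text(json.dumps(metadata))
        # Ideal gas with N = 1000 and kT = 1.0 (illustrative numbers).
        (job_dir / VOLUME_FILENAME).write_text(str(1000 * 1.0 / pressure))

    # The metadata comes from the job, never from the filename.
    for job_dir in Path(workspace).iterdir():
        metadata = json.loads((job_dir / "metadata.json").read_text())
        volumes[metadata["p"]] = read_volume(job_dir)

print(volumes)
```

In an actual signac project, the path would come from job.fn('volume.txt') and the pressure from job.sp.p, as in the snippets above.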

Do not hard-code job ids in your scripts.

While it is perfectly fine to copy and paste job ids during interactive work or for small tests, hard-coded job ids within code are almost always a bad sign. One of the main advantages of using signac for data management is that the schema is flexible and may be migrated at any time without too much hassle. Because a job id is derived from the job’s metadata, migrating the schema changes existing ids, and scripts that hard-code them will fail.

Whenever you find yourself hard-coding ids into your code, consider replacing them with a call to find_jobs() that selects the jobs by their metadata instead.
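
A minimal sketch of such a replacement is shown below. The helper name jobs_at_pressure is hypothetical; it assumes that the project object provides find_jobs() accepting a filter dict, and that the pressure is stored under the key p as in the tutorial.

```python
def jobs_at_pressure(project, pressure):
    """Select jobs by metadata instead of by hard-coded id.

    Assumes `project` provides find_jobs(filter) as a signac project does,
    and that the metadata key for pressure is "p" as in the tutorial.
    """
    return list(project.find_jobs({"p": pressure}))


# Possible usage, assuming signac is installed and a project exists
# in the current directory:
#
#     import signac
#     project = signac.get_project()
#     for job in jobs_at_pressure(project, 4):
#         print(job.id, job.sp.p)
```

Since the selection is expressed in terms of metadata, the helper keeps working after a schema migration as long as the p key still exists, whereas a hard-coded id would silently point at nothing.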