Author Archives: Rob Edwards

Diversity and Inclusion Statement

Diversity is a key element to the success of every organization and the tech community. Freedom of thought and the open exchange of ideas are key to an effective learning environment. That kind of exchange can happen only in an environment that recognizes the value of each person and fosters mutual respect. Rob Edwards and members of the Edwards Lab are committed to increasing and fostering a diverse community of colleagues.

Code of Conduct

We are dedicated to creating an inclusive environment for everyone, regardless of race, ethnicity, nationality, religion, skin color, sex, sexual orientation, gender identity, national origin, age, health (physical or mental), genetic information, parental status, marital status, political affiliation, veteran status, socioeconomic status or background, neuro(a)typicality, appearance, body size, computing experience, or clothing. Consider that calling attention to differences can feel alienating.

We do not tolerate harassment in any form towards any person. Harassment includes offensive verbal comments related to the protected classes above, sexual images in public spaces, deliberate intimidation, stalking, following, photography or audio/video recording against reasonable consent, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention (even without sexual contact). Harassment does not need to be recognized as unwanted or unwelcome by anyone other than the person being harassed. Be careful in the words that you choose. Remember that sexist, racist, and other exclusionary jokes can be offensive to those around you. Offensive jokes are not appropriate and will not be tolerated under any circumstances.

If you are being harassed, notice that someone else is being harassed, or have any other concerns, please immediately contact Dr. Rob Edwards.

Harassment Policy

Harassment of any kind is not acceptable behavior. We are committed to creating an environment in which every individual can work, study, and live without being harassed.

This policy covers harassment on the basis of race, ethnicity, nationality, religion, skin color, sex, sexual orientation, gender identity, national origin, age, health (physical or mental), genetic information, parental status, marital status, political affiliation, veteran status, socioeconomic status or background, neuro(a)typicality, appearance, body size, computing experience, or clothing. It includes harassment of an individual in terms of a stereotyped group characteristic, or because of that person’s identification with a particular group.

Sexual harassment may take many forms. Sexual assault and requests for sexual favors that affect educational or employment decisions constitute sexual harassment. However, sexual harassment may also consist of unwanted physical contact, requests for sexual favors, visual displays of degrading sexual images, sexually suggestive conduct, or offensive remarks of a sexual nature.

We are committed under this policy to stopping harassment and associated retaliatory behavior.

Anyone who feels harassed is encouraged to seek assistance from Rob or his supervisors.


This Code of Conduct was adapted from the Make School Code of Conduct, which in turn was adapted from the Hack Code of Conduct, which was inspired by the Conference Code of Conduct. The harassment policy has been adapted from MIT’s Harassment Policy.

Licensed under Creative Commons 4.0 (Attribution, Share Alike).

Publishing a Django Website behind a proxy server

We use proxy servers all the time: we have a main server that serves applications, but the application itself runs on different hardware than the web server.

Here, we show how to host a Django project behind a proxy server using the Apache web server and make it accessible.

Submitting a PhiSpy update to pip and conda

First, make sure everything is up to date in GitHub.

We are going to call this release version 4.0 and we will have release candidates, starting at rc1.

Next, create a release on GitHub. Strictly speaking you don’t need to do that, but it is a great thing to do.

PyPi Release

The PyPi instructions cover this, but I have abstracted out the parts we need to focus on (since we have a already!)

As a regular user we build everything. This makes a new release that we will upload:

python3 sdist

This will create the tarball and the wheel file in the dist directory. Then we need to upload those to PyPi.

We are going to use the PyPi test interface to make sure that everything is OK. Do not skip this step!

If you need an API key, navigate to the PyPi login page. However, if you have done this before, you probably don’t need to save it again 😉

python3 -m twine upload --repository testpypi dist/PhiSpy-4.0.0rc1.tar.gz

Note that you cannot upload the wheel. Binary wheels from Linux are not supported.

Now we are going to test it out. Let’s make a virtual environment and install it there:

virtualenv test_phispy
cd test_phispy
source bin/activate
which pip

This should tell you that the current pip is from your virtual environment. If it is not, solve that problem!
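The same check can be made from inside Python, which helps when the output of which pip is ambiguous. This is a minimal sketch and not part of the PhiSpy release process itself; in_virtualenv is a hypothetical helper name. It assumes the environment was created by venv or virtualenv, which make sys.prefix differ from the base interpreter prefix.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases set sys.base_prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

If this reports False after you have sourced bin/activate, the python3 on your path is probably not the one from test_phispy.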

For PhiSpy, we have a couple of dependencies that you should install with regular pip before you can install your new release candidate:

pip3 install scikit-learn biopython

This will install other things like numpy that you need.

Now you can install your new release.

pip install -i PhiSpy==4.0.0rc1

If you are not sure of the exact URL, logging into the PyPi test login page will show your available repositories, including the newly uploaded repository. If you click on the version you want, you can get the link to download and install that.
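Before running any tests, it is worth confirming that the virtual environment really picked up the release candidate and not an older copy. A small sketch using the standard library (Python 3.8 or later); installed_version is a hypothetical helper name, and the package name PhiSpy is taken from the install command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints None if PhiSpy is not installed in the current environment.
print(installed_version("PhiSpy"))
```

If the install from the test repository worked, this should report the release candidate version and not None.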

Once you are happy and have run some tests, log in to the real PyPi page (good to do anyway, even if you have an API key).

Now you can upload the final version to PyPi for everyone to access

python3 -m twine upload dist/PhiSpy-4.0.0.tar.gz

It’s worth logging into the real PyPi page to make sure that you can download it!

Making a CONDA release

It turns out that for most code all you have to do is wait! The conda bots will take care of incrementing to the next version and running the continuous integration tests for you.

However, if you need to update the code manually, you probably need to change the version in meta.yaml and then you should update the SHA hash:

wget -O- | shasum -a 256

and then paste the output of that into the SHA field. In this case, the shasum should be



For a long time we ran the project PhAnToMe at the website

Alas, all great things come to an end, and it is with sadness that we are winding down the phage annotation tools and methods project. However, we have not given up and are still working on new challenges.

We have migrated most of our tools to the Edwards’ lab website, but if you can’t find anything, please let us know. We still have all our tools, we are just not maintaining them any longer.

Memory and Core Usage on SGE

We are running into issues with one of our applications requesting too much memory on the cluster. We need to set appropriate limits and ensure that the application knows how much memory it has available.

To begin, we need some code to test how many cores and how much memory the application thinks it has. The application that is causing us issues is written in Java, but we’re going to do this in python3 to ensure we can debug what is going on.

There are two python3 modules that you can use to test what is available: psutil (python system and process utilities) and resource (basic mechanisms for measuring and controlling system resources utilized by a program). The former gives us access to core system information, while the latter gives us access to available resources. Here is some code to print what is, or may be, available. Before we start, a little helper function to convert bytes to human-readable format (from this SO post):

def sizeof_fmt(num, suffix='B'):
    for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)

Here we figure out our host and important information:

import resource
import socket

import psutil

hostname = socket.gethostname()
m = psutil.virtual_memory()
c = psutil.cpu_count()
t = sizeof_fmt(m.total)
a = sizeof_fmt(m.available)
# soft and hard limits on the address space, in bytes
(ms, mh) = resource.getrlimit(resource.RLIMIT_AS)
# an unlimited value is not positive on Linux, so it is printed unchanged
if ms > 0:
    ms = sizeof_fmt(ms)
if mh > 0:
    mh = sizeof_fmt(mh)
print(f"Running on: {hostname}")
print(f"Number of cpus: {c}")
print(f"Total memory: {t}\nAvailable memory: {a}")
print(f"Memory limit (ulimit): soft: {ms} hard: {mh}")

You may be wondering why we used resource.RLIMIT_AS to get the virtual memory as opposed to resource.RLIMIT_VMEM. Linux systems don’t report RLIMIT_VMEM, and instead use RLIMIT_AS for address space.

Note that we are using both psutil and resource to get information, and they tell us different things. If I run these on my laptop, I see something like this:

Running on: Laptop
Number of cpus: 8
Total memory: 15.6GiB
Available memory: 7.0GiB
Memory limit (ulimit): soft: -1 hard: -1

Note that the memory limit is -1 for both hard and soft limits (from the resource man page: the soft limit is the current limit, and may be lowered or raised by a process over time. The soft limit can never exceed the hard limit. The hard limit can be lowered to any value greater than the soft limit, but not raised.) This value is actually the value of resource.RLIM_INFINITY and so may not be -1 in your case (but probably is)!
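Because the unlimited value is platform dependent, it is safer to compare against resource.RLIM_INFINITY than to test for -1 or for a negative number. A minimal sketch; describe_limit is a hypothetical helper name.

```python
import resource


def describe_limit(value):
    """Return a readable description of a single rlimit value."""
    # Compare against the constant instead of assuming it is -1.
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes"


soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print(f"Memory limit (ulimit): soft: {describe_limit(soft)} hard: {describe_limit(hard)}")
```

On a machine with no limits set, both values should come out as unlimited.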

The equivalent information is pulled from /proc/cpuinfo or /proc/meminfo on a Linux system, and the memory limit comes from ulimit (see the man page)
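To see where those numbers come from, here is a rough sketch that reads /proc/meminfo directly. It only works on Linux, parse_meminfo is a hypothetical helper name, and it assumes the usual layout of a field name, a colon, a number, and an optional kB unit.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value.

    Fields reported in kB are converted to bytes; unitless fields
    (such as HugePages_Total) are kept as plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if not fields or not fields[0].isdigit():
            continue
        amount = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[name.strip()] = amount
    return values


try:
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    print(f"MemTotal: {info.get('MemTotal')} bytes")
    print(f"MemAvailable: {info.get('MemAvailable')} bytes")
except FileNotFoundError:
    print("/proc/meminfo is not available on this system")
```

MemTotal and MemAvailable correspond to the total and available values that psutil reports.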

So … how does this help us on the cluster? Let’s try a few simple tests. I create a script that basically just runs that python3 code above. When I submit it with default parameters, this is what I get:

$ qsub -cwd -o mem.out -e mem.err ./

Running on: node15
At the start:
Number of cpus: 16
Total memory: 125.9GiB
Available memory: 124.1GiB
Memory limit (ulimit): soft: -1 hard: -1

On my cluster, node15 has 16 CPUs and 126 GiB RAM, but some of it is currently being used.
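Notice that the job sees every CPU on the node, not just the ones it was given. A hedged sketch of how a job might work out how many cores it should actually use: it assumes SGE exports the NSLOTS environment variable to the job (check this on your own cluster), and falls back to the CPU affinity mask on Linux. usable_cores is a hypothetical helper name.

```python
import os


def usable_cores():
    """Best guess at the number of cores this process should use."""
    # Assumption: the scheduler exports NSLOTS with the number of granted slots.
    slots = os.environ.get("NSLOTS", "")
    if slots.isdigit() and int(slots) > 0:
        return int(slots)
    # On Linux, the affinity mask lists the CPUs this process may run on.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Last resort: every CPU on the machine.
    return os.cpu_count() or 1


print(f"Usable cores: {usable_cores()}")
```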

With SGE, you can pass a couple of parameters to adjust memory settings. If we restrict memory usage using the h_vmem setting, we see this answer:

$ qsub -cwd -o mem.out -e mem.err -l h_vmem=1G ./

Running on: node48
At the start:
Number of cpus: 16
Total memory: 125.9GiB
Available memory: 123.8GiB
Memory limit (ulimit): soft: 1.0GiB hard: 1.0GiB

In this case, adding the -l h_vmem option has limited the amount of resources available via ulimit, and has set both hard and soft limits.

In contrast, setting s_vmem sets the ulimit soft limit, but leaves the hard limit unchanged:

$ qsub -cwd -o mem.out -e mem.err -l s_vmem=2G ./

Running on: node47
At the start:
Number of cpus: 16
Total memory: 125.9GiB
Available memory: 123.8GiB
Memory limit (ulimit): soft: 2.0GiB hard: -1
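You can reproduce the s_vmem behaviour on a laptop without a cluster. The sketch below starts a child python3 process with a lowered soft limit and an unchanged hard limit, and prints what the child sees. It assumes Linux, it is only an approximation of what SGE does when it starts a job, and child_limits is a hypothetical helper name.

```python
import resource
import subprocess
import sys

GIB = 1024 ** 3


def child_limits(soft_bytes):
    """Run a child Python with a lowered RLIMIT_AS soft limit.

    Returns the (soft, hard) limits as reported by the child.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # The soft limit can never exceed the hard limit.
        soft_bytes = min(soft_bytes, hard)

    def lower_soft_limit():
        # Runs in the child just before exec, so the parent is unaffected.
        resource.setrlimit(resource.RLIMIT_AS, (soft_bytes, hard))

    report = "import resource; print(*resource.getrlimit(resource.RLIMIT_AS))"
    result = subprocess.run(
        [sys.executable, "-c", report],
        preexec_fn=lower_soft_limit,
        capture_output=True,
        text=True,
        check=True,
    )
    soft, hard_seen = (int(x) for x in result.stdout.split())
    return soft, hard_seen


soft, hard = child_limits(2 * GIB)
print(f"Child limits: soft: {soft} hard: {hard}")
```

Passing the same value for both the soft and the hard limit in setrlimit would mimic h_vmem instead.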

Using Java?

Unfortunately, setting the limit on SGE using -l h_vmem causes Java to crash with a known bug. You will see an error like this:

Error occurred during initialization of VM 
Could not allocate metaspace: 1073741824 bytes

There is a workaround, and on my cluster I have to set both of these:

First, export MALLOC_ARENA_MAX and ensure that your qsub inherits this variable (e.g. qsub -V)


Then append this Java option:


It got Java to run, but it would still crash if I was trying to do anything remotely complex.