So you want to allow users to upload files to your server? This can be dangerous: sooner or later someone will upload a malicious PHP script that gives them access to the directories of your web applications.
Here are some tips and tricks to aid in the safety of your server. We use all of these, and some others that are not included here so that the bad guys can’t figure out all of our security approaches!
At some point you will want to request a letter of recommendation from one of your professors. Here are some tips that you should consider before you do so. [source: the original idea was on a poster I saw at Tulsa Community College, Oklahoma, but I have added advice from others too]
If you try to modify a file (removing all empty lines for example) using a command like:
cat file.txt | sed '/^$/d' > file.txt
you will end up with an empty file.txt. The reason is that bash parses the command line looking for “metacharacters” (“|”, “>” and spaces in this case) that separate words, and sets up the redirections before it runs the commands. This means that “> file.txt” is handled FIRST: it creates an empty file.txt (truncating any existing file) and connects standard output to that file. Only then does “cat file.txt” run, but by now file.txt is empty. So “cat file.txt” outputs 0 lines, “sed '/^$/d'” deletes all 0 empty lines, and 0 lines get written to file.txt. This works as “intended”, so bash reports no error.
You can get around this using a temporary file.
cat file.txt | sed '/^$/d' > tmp_file.txt
mv tmp_file.txt file.txt
But, as file.txt is technically a new file, you might lose some information, in particular its permissions and whether file.txt was originally a symbolic link.
Another option is to use sponge, which is part of moreutils and sadly is not installed by default on many systems.
cat file.txt | sed '/^$/d' | sponge file.txt
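If sponge is not available, the same idea (soak up all of the input before touching the output) can be sketched in a few lines of Python. This is only an illustration in the spirit of sponge, not how sponge itself is implemented; the function name `remove_empty_lines_in_place` is made up for this example, and it assumes the file fits comfortably in memory.

```python
def remove_empty_lines_in_place(path):
    """Drop empty lines from path, buffering everything before writing.

    Hypothetical helper: mimics `sed '/^$/d'` followed by a sponge-like write.
    """
    # Read the whole file first, so nothing is lost when it is truncated.
    with open(path, "r") as handle:
        kept = [line for line in handle if line.rstrip("\n")]

    # Opening with "w" truncates the existing file instead of replacing it,
    # so its permissions are kept and a symbolic link is followed.
    with open(path, "w") as handle:
        handle.writelines(kept)

    return len(kept)
```

Because the file is rewritten rather than replaced, there is no temporary file to move into place. The trade-off is that an interruption halfway through the write can leave file.txt incomplete.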
We are pleased to announce the SoCal Bioinformatics Hackathon.
From 10-12 January, 2018, the NCBI will help run a bioinformatics hackathon in Southern California hosted by San Diego State University! The hackathon will focus on advanced bioinformatics analysis of next generation sequencing data, proteomics, and metadata. This event is for researchers, including students and postdocs, who have already engaged in the use of bioinformatics data or in the development of pipelines for bioinformatics analyses from high-throughput experiments. Some projects are available to other non-scientific developers, mathematicians, or librarians.
While I was sitting in a master-class on ecology from Jennifer Martiny at UCI, someone asked if you could explain ecological diversity in terms of ice cream. Of course you can.
Several people ask me about tips for learning new programming languages. Here, we talk about some of the broader concepts in learning a language.
For several years the NSF has been prototyping a spreadsheet-based conflicts reporting system. The spreadsheet typically has the following fields:
| C | Name: | Organizational Affiliation | Optional (email, Department) | Last Active |
The problem is you need to make this file every time you submit a grant. Here is a somewhat trivial solution, but hopefully it will help you create this file.
A lot of software benefits from paired fastq files that contain mate pair information, and usually you get these from your sequence provider. However, sometimes (e.g. when you download them from the SRA) you get sequences that are not appropriately paired.
Recently, however, we’ve been handling very large files, and the performance of these programs (yes, even the lowmem version) is hindering our ability to process these files.
Therefore, we introduce fastq_pair, a C implementation for pairing fastq files and sorting out which reads have matches in both files and which are singletons. This code starts with two fastq files and creates four output files. It is quick and efficient, especially if you tune the size of the hash table (which you can do with a command line option).
It takes advantage of random access when reading files. We open the first file and make an index of the IDs in the file and the positions at which those IDs occur. Then we read the second file, and if an ID matches one in the index, we scoot to the start of the appropriate line in the first file and write out both sequences to the “pairs” files. We also set a flag in our data structure so we know that we’ve printed that sequence out. If the ID doesn’t match, we write the sequence to the “singles” file, and at the end of all the processing we go through the IDs in our data structure and print out those sequences we haven’t printed yet.
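The index-and-seek approach can be sketched in Python to show the idea. This is not the fastq_pair source, which is written in C; the function names and the output file names are invented for this illustration. It also assumes four-line fastq records, and that mates share an ID once a trailing /1 or /2 is removed.

```python
def read_record(handle):
    """Return (offset, read_id, record_bytes) for the next record, or None at EOF."""
    offset = handle.tell()
    lines = [handle.readline() for _ in range(4)]  # assumes 4-line records
    if not lines[0].strip():
        return None
    read_id = lines[0][1:].split()[0]  # drop "@" and any description
    if read_id.endswith((b"/1", b"/2")):  # assumed mate suffix convention
        read_id = read_id[:-2]
    return offset, read_id, b"".join(lines)


def pair_fastq(left_path, right_path, prefix):
    """Write paired and single reads to four files; return how many of each."""
    # Pass 1: index the first file as read ID -> [byte offset, printed flag].
    index = {}
    with open(left_path, "rb") as left:
        while (record := read_record(left)) is not None:
            offset, read_id, _ = record
            index[read_id] = [offset, False]

    counts = {"pairs": 0, "left_singles": 0, "right_singles": 0}
    # Output names are hypothetical, chosen only for this sketch.
    with open(left_path, "rb") as left, open(right_path, "rb") as right, \
            open(prefix + ".left.paired.fq", "wb") as left_pairs, \
            open(prefix + ".right.paired.fq", "wb") as right_pairs, \
            open(prefix + ".left.single.fq", "wb") as left_singles, \
            open(prefix + ".right.single.fq", "wb") as right_singles:
        # Pass 2: stream the second file and seek into the first on a match.
        while (record := read_record(right)) is not None:
            _, read_id, right_text = record
            entry = index.get(read_id)
            if entry is None:
                right_singles.write(right_text)
                counts["right_singles"] += 1
                continue
            left.seek(entry[0])
            left_pairs.write(read_record(left)[2])
            right_pairs.write(right_text)
            entry[1] = True  # remember that this read has been printed
            counts["pairs"] += 1
        # Finally, anything in the index that was never printed is a single.
        for offset, printed in index.values():
            if not printed:
                left.seek(offset)
                left_singles.write(read_record(left)[2])
                counts["left_singles"] += 1
    return counts
```

Only the IDs and offsets of the first file are held in memory, never the sequences themselves, which is what makes the approach practical for very large files. A Python dictionary stands in here for the hash table whose size fastq_pair lets you set.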
Take a look and give it a try!