Identifying Prophages in Bacterial Genomes

Identifying prophages in microbial genomes remains an open problem with no definitive solution. Most existing tools detect genomic regions enriched in proteins with known phage homologs, which hinders the de novo discovery of phage regions. In this study, a weighted phage detection algorithm, Phage_detector, was developed based on seven distinctive characteristics of prophages, i.e., protein length, transcription strand directionality, customized AT and GC skew, the abundance of unique phage words, phage insertion points, and the similarity of phage proteins. The first five characteristics can identify prophages without any sequence similarity to known phage genes. Phage_detector locates prophages by ranking genomic regions enriched in distinctive phage traits, which leads to the successful prediction of 92% of prophages (including 33 previously unidentified prophages) in 95 complete bacterial genomes, with an 8% false-negative rate and an 18% false-positive rate.
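
To illustrate one of the similarity-independent characteristics, the following minimal sketch computes AT and GC skew over sliding windows of a genome sequence. It assumes the standard definitions, (G - C)/(G + C) and (A - T)/(A + T), not the customized skew used by Phage_detector, which is not specified here. The function names `skew` and `windowed_skews` and the window and step sizes are hypothetical choices made for illustration only.

```python
def skew(seq, plus, minus):
    """Standard skew (plus - minus) / (plus + minus).

    Returns 0.0 when neither base occurs. This is the textbook
    definition, assumed here; it is not the customized skew of the study.
    """
    seq = seq.upper()
    p = seq.count(plus)
    m = seq.count(minus)
    total = p + m
    return (p - m) / total if total else 0.0


def windowed_skews(seq, window=1000, step=500):
    """Yield (start, gc_skew, at_skew) for sliding windows along seq.

    window and step are illustrative defaults, not values from the study.
    A sequence shorter than one window is treated as a single window.
    """
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        yield start, skew(chunk, "G", "C"), skew(chunk, "A", "T")
```

A region whose skew profile departs from that of the flanking host sequence could then be scored as one line of evidence among several; how such evidence is weighted and ranked is specific to Phage_detector and is not reproduced in this sketch.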