During the course of our studies on the C. thermocellum genome, we observed the presence of several family-3 CBMs (CBM3s) that were portions of polypeptides annotated as ‘hypothetical proteins’ or ‘membrane-associated proteins’. More extensive bioinformatic analysis of these hypothetical proteins indicated possible homology to membrane-associated anti-σ factors. Following this initial cryptic identification, systematic analysis of public nucleotide and protein databases revealed that C. thermocellum genomes
(from three strains) contain a unique set of multiple ORFs resembling both Bacillus subtilis sigI and rsgI genes that encode an alternative σI factor and its negative membrane-associated regulator RsgI, respectively (Asai et al., 2007). In this communication, we present data on the genomic organization of sigI- and rsgI-like genes in C. thermocellum ATCC 27405 and provide a preliminary functional analysis of three of the carbohydrate-binding C-terminal domains originating from the RsgI-like proteins.

Sequence entries, primary analyses and ORF searches were performed using the National Center for Biotechnology Information server
ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and the Clone Manager 7 program (Scientific & Educational Software, Durham, NC). The deduced amino acid sequences of B. subtilis SigI and RsgI
(accession numbers NP_389228 and NP_389229, respectively) were used as BLAST (Altschul et al., 1997) queries to mine public databases, including those at the Joint Genome Institute (JGI) (http://genome.jgi-psf.org/). The C. thermocellum genome databases of strains ATCC 27405, DSM 2360 (LQR1) and DSM 4150 (JW20, ATCC 31549) were analyzed using the JGI BLAST servers (http://genome.jgi-psf.org/cloth/cloth.home.html), (http://genome.jgi-psf.org/clotl/clotl.home.html) and (http://genome.jgi-psf.org/clotj/clotj.home.html), respectively. CBM and glycoside hydrolase (GH) domains were identified using the CAZy (Carbohydrate-Active EnZymes) website (Cantarel et al., 2008) (http://www.cazy.org/), the Simple Modular Architecture Research Tool (SMART) (Letunic et al., 2004) (http://smart.embl-heidelberg.de/), the Pfam protein families database (Finn et al., 2010) (http://pfam.sanger.ac.uk), the integrated resource of protein domains (InterPro) (Hunter et al., 2009) (http://www.ebi.ac.uk/interpro/), the PROSITE database of protein families and domains (Sigrist et al., 2010) (http://www.expasy.ch/prosite/) and the SUPERFAMILY database of structural and functional annotation for all proteins and genomes (Gough et al., 2001).
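
As a rough illustration of what an ORF search of the kind mentioned above involves, the following Python sketch scans all six reading frames of a nucleotide sequence for stretches that begin with ATG and end at the first in-frame stop codon. It is not the algorithm used by the NCBI ORF Finder or by Clone Manager; the function names, the `min_codons` threshold and the restriction to ATG start codons are assumptions made here for illustration only.

```python
# Illustrative six-frame ORF scan. This is a simplified sketch, not the
# NCBI ORF Finder algorithm; all names and defaults are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return a list of (strand, frame, start, end, orf_sequence) tuples.

    start and end are 0-based, end-exclusive offsets on the scanned strand
    (for strand "-", offsets refer to the reverse complement). Each ORF
    runs from an ATG through the first in-frame stop codon, inclusive.
    min_codons counts both the start and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # offset of the currently open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and look for the next ATG
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGGG")` reports a single forward-strand ORF, `ATGAAATTTTAG`, in frame 2. Candidate ORFs found this way would still need to be translated and compared against reference proteins (here, SigI and RsgI) before any homology could be claimed.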