In this study, we proposed a computational approach to map 1.2 million somatic mutations across 36 cancer types from the COSMIC database and TCGA onto the protein pocket regions of over 5,000 3D protein structures. We sought to answer two overarching questions: (1) Do the somatic mutations located in protein pocket regions tend to be actionable mutations? (2) Are those specific mutations more likely to be involved in tumorigenesis and anticancer drug responses? Through our systematic analyses, we showed that genes harboring protein pocket somatic mutations tend to be cancer genes. Furthermore, genes harboring protein pocket somatic mutations tend to be highly co-expressed in the co-expressed protein interaction network.
We identified four putative cancer genes whose gene expression profiles were associated with poor overall survival rates in melanoma, lung, or colorectal cancer patients. Moreover, by integrating cancer cell line mutations and drug pharmacological data from the Cancer Cell Line Encyclopedia, we showed that genes harboring protein pocket mutations are enriched in drug sensitivity genes. In a case study, we demonstrated that pocket mutations in the BAX gene were significantly associated with the responses to three anticancer drugs. Collectively, we showed that somatic mutations in protein pocket regions tend to be functionally important during tumorigenesis and relevant to anticancer drug responses. In summary, the protein pocket-based prioritization of somatic mutations provides a promising approach to uncover putative cancer drivers and anticancer drug response biomarkers in the post-genomic era for cancer precision medicine.
Methods

Protein pocket information

We downloaded a list of 5,371 PDB structures with protein pocket information from the Center for the Study of Systems Biology website at Georgia Institute of Technology. This library contained only non-redundant, monomeric, single-domain protein structures, measuring 40 to 250 residues in length and registering less than 35% global pairwise sequence identity. A pocket detection algorithm called LPC was applied to the PDB dataset to generate a set of 20,414 ligand-binding protein pockets whose coordinates were given in each PDB file under the header PKT, which is an abbreviation for pocket. We first parsed all 5,371 PDB files to obtain pocket residues and their PDB coordinates under the PKT header. Then, we used information from the Structure Integration with Function, Taxonomy, and Sequence (SIFTS) database to translate the PDB coordinates into UniProt coordinates. As of April 2014, approximately 100,000 3D structures had been added to the PDB database, including approximately 22,000 human protein and nucleic acid structures.
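The coordinate translation step can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes that the residue-level mapping has already been reduced to contiguous segments, each giving a PDB residue range and the UniProt position of its first residue, and it ignores PDB insertion codes. The `Segment` class, the function names, and all numeric values are hypothetical and serve only to show how pocket residues might be converted to UniProt positions and then intersected with mutation positions.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Segment:
    """One contiguous PDB-to-UniProt segment (hypothetical layout)."""
    pdb_start: int      # first PDB residue number in the segment
    pdb_end: int        # last PDB residue number in the segment (inclusive)
    uniprot_start: int  # UniProt position aligned to pdb_start


def pdb_to_uniprot(pdb_resnum: int, segments: Iterable[Segment]) -> Optional[int]:
    """Return the UniProt position for a PDB residue, or None if unmapped."""
    for seg in segments:
        if seg.pdb_start <= pdb_resnum <= seg.pdb_end:
            # Positions within a segment are assumed to advance one-to-one.
            return seg.uniprot_start + (pdb_resnum - seg.pdb_start)
    return None


def pocket_uniprot_positions(pocket_residues: Iterable[int],
                             segments: List[Segment]) -> Set[int]:
    """Translate pocket residues to UniProt positions, dropping unmapped ones."""
    mapped = (pdb_to_uniprot(r, segments) for r in pocket_residues)
    return {pos for pos in mapped if pos is not None}


def mutations_in_pocket(mutation_positions: Iterable[int],
                        pocket_positions: Set[int]) -> List[int]:
    """Return the sorted mutation positions that fall on pocket residues."""
    return sorted(set(mutation_positions) & pocket_positions)


if __name__ == "__main__":
    # Toy values for illustration only; not taken from any real structure.
    segments = [Segment(1, 50, 101), Segment(60, 90, 160)]
    pocket = pocket_uniprot_positions([10, 55, 62], segments)
    print(mutations_in_pocket([110, 111, 162], pocket))
```

Residues that fall outside every segment (for example, unresolved loops or expression tags) return `None` and are dropped rather than guessed, so a mutation is only counted as a pocket mutation when its UniProt position maps unambiguously.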