Machine Learning in Computational Biology (MLCB) 2014 @ Montreal
A workshop at the Annual Conference on Neural Information Processing Systems (NIPS 2014) @ Montreal, Date: Saturday Dec 13, Room: TBD
- Deadline extended: submission due Oct 23, 2014, 11:59pm (time zone of your choice)
- Original submission deadline: Oct 22, 2014, 11:59pm (time zone of your choice)
- Decision notifications: Nov 4, 2014 (tentative)
- Workshop room: TBD
The field of computational biology has seen dramatic growth over the past few years, in terms of newly available data, new scientific questions, and new challenges for learning and inference. In particular, biological data are often relationally structured and highly diverse, making them well suited to approaches that combine multiple pieces of weak evidence from heterogeneous sources. These data may include sequenced genomes of a variety of organisms, gene expression data from multiple technologies, protein expression data, protein sequence and 3D structural data, protein interactions, gene ontology and pathway databases, genetic variation data (such as SNPs), cell images, and an enormous amount of textual data in the biological and medical literature. New types of scientific and clinical problems require the development of novel supervised and unsupervised learning methods that can use these growing resources. Furthermore, next-generation sequencing technologies are yielding terabyte-scale data sets that require novel algorithmic solutions.
The goal of this workshop is to present emerging problems and machine learning techniques in computational biology. We invite contributed talks on novel learning approaches in computational biology. We encourage contributions describing either progress on new bioinformatics problems or work on established problems using methods that are substantially different from standard approaches. Kernel methods, graphical models, feature selection, and other techniques applied to relevant bioinformatics problems would all be appropriate for the workshop. The target audience is people interested in learning and its applications to relevant problems in the life sciences.
- Mark Gerstein, Yale University (USA)
- Title: Comparative Genome Analysis
- Abstract: The ENCODE and modENCODE consortia have generated a resource containing large amounts of transcriptomic data, extensive mapping of chromatin states, as well as the binding locations of over 300 transcription-regulatory factors for human, worm and fly. The consortium performed extensive data integration on this data set. Here I will give an overview of the data and some of the key analyses. In particular: (1) Conservation & Divergence of Transcription (1a) A novel cross-species clustering algorithm to integrate the co-expression networks of the three species, resulting in conserved modules shared between the organisms. These modules are enriched in developmental genes and exhibited hourglass behavior. (1b) The extent of the non-coding, non-canonical transcription is consistent between worm, fly and human. (1c) In contrast, analyses of pseudogenes (fossil genes) show that they diverged greatly between the organisms, much more so than genes. Nevertheless, they had a consistent amount of residual transcription. (2) Conservation of Regulation (2a) A global optimization algorithm to examine the hierarchical organization of the regulatory network. Despite extensive rewiring of binding targets, high-level organization principles such as a three-layer hierarchy are conserved across the three species. (2b) The gene expression levels in the organisms, both coding and non-coding, can be predicted consistently based on their upstream histone marks. In fact, a "universal model" with a single set of cross-organism parameters can predict expression level for both protein coding genes and ncRNAs.
- Anshul Kundaje, Stanford University (USA)
- Title: Three-dimensional regulation of gene expression across cell types and individuals
- Abstract: Networks of long-range chromatin interactions between dynamic gene proximal and distal regulatory elements are responsible for the diverse patterns of gene expression across cell types and tissues. Experiments probing genome-wide chromatin interactions are expensive and result in low-resolution, noisy contact maps. First, we present a complementary computational approach based on a novel generative probabilistic model that leverages modular co-dynamics of activity of regulatory elements and genes across a diverse panel of 56 human cell types and tissues from the Roadmap Epigenomics and ENCODE consortia to predict tissue-specific long-range regulatory interactions. The accuracy and tissue-specificity of our predictions are strongly validated by experimental data. Interacting pairs of regulatory elements are enriched for transcription factor motif pairs, providing mechanistic insights for looping interactions through protein-protein interactions. Networks of interacting elements with similar motif composition regulating common genes are depleted of disease-associated variants, suggesting buffering mechanisms for increased robustness. Next, we investigate whether non-coding genetic variants can influence regulatory activity of local and distal regulatory elements and genes through long-range interactions. By integrating genetic, regulatory and expression variation data across lymphoblastoid lines from 76 sequenced individuals with experimentally derived long-range chromatin contact maps, we show for the first time extensive coordinated co-variation of regulatory chromatin activity and expression initiated by genetic variants affecting transcription factor binding at regulatory elements and propagated by long-range physical interaction networks. These regulatory variants are also enriched for association with immune diseases and specific cancers. Collectively, these studies provide novel insights into the three-dimensional regulatory architecture of the human genome and serve as important resources for interpreting the regulatory impact of natural and disease-associated non-coding variants.
|8:25-8:30||Introduction and Welcome|
|8:30-8:50||Using Deep Learning to Predict Variable Polyadenylation. Michael Leung and Brendan Frey.|
|8:50-9:10||Ensemble Learning Based Sparse High-Order Boltzmann Machine for Unsupervised Feature Interaction Identification. Martin Renqiang Min, Xia Ning, Yanjun Qi, Chao Cheng, Anthony Bonner and Mark Gerstein.|
|9:10-10:00||Invited talk: Mark Gerstein. Comparative Genome Analysis.|
|10:30-10:50||Analysis of cryptic splice sites with deep convolutional networks. Hannes Bretschneider, Babak Alipanahi, Leo J. Lee and Brendan J. Frey.|
|10:50-11:10||A Bayesian nonparametric statistical framework for haplotype phasing. Derek Aguiar, Lloyd T. Elliott, Yee Whye Teh and Barbara Engelhardt.|
|11:10-11:30||Efficient multi-task Gaussian process models for association testing of sets of genetic variants in non-IID sample structured data. Francesco Paolo Casale, Barbara Rakitsch, Christoph Lippert and Oliver Stegle.|
|11:30-11:50||Inferring Edges in Biological Networks Using Trees. Loïc Schwaller, Michael Stumpf and Stéphane Robin.|
|3:00-3:20||Bayesian Tree Priors for Clonal Reconstruction of Tumors. Amit Deshwar, Shankar Vembu and Quaid Morris.|
|3:20-3:40||SPARROW: Identifying expression drivers in cancer expression data. Benjamin Logsdon, Andrew Gentles, Chris Miller, C. Anthony Blau, Pamela Becker and Su-In Lee.|
|3:40-4:30||Invited talk: Anshul Kundaje. Three-dimensional regulation of gene expression across cell types and individuals.|
|5:00-5:20||Data-Driven Mortality Prediction for Trauma Patients. Yuanyang Zhang, Bernie Daigle, Lisa Ferrigno, Mitchell Cohen and Linda Petzold.|
|5:20-5:40||Microscopic Advances with Large-Scale Learning: Stochastic Optimization for Cryo-EM. Ali Punjani and Marcus Brubaker.|
|5:40-6:00||Toward computational cumulative biology by combining models of biological datasets. Jaakko Peltonen, Ali Faisal, Elisabeth Georgii, Johan Rung and Samuel Kaski.|
|6:00-6:30||Wrap up and closing remarks by organizers|
Researchers interested in contributing should upload an extended abstract of 4 pages in PDF format to the MLCB submission web site by Oct 22, 2014, 11:59pm (time zone of your choice).
No special style is required. Authors may use the NIPS style file, but are also free to use other styles as long as they use standard font size (11 pt) and margins (1 in).
Submissions should be suitably anonymized and meet the requirements for double-blind reviewing.
All submissions will be anonymously peer reviewed and will be evaluated on the basis of their technical content. A strong submission to the workshop typically presents a new learning method that yields new biological insights, or applies an existing learning method to a new biological problem. However, submissions that improve upon existing methods for solving previously studied problems will also be considered. Examples of research presented in previous years can be found online.
The workshop allows submissions of papers that are under review or have recently been published in a conference or a journal. This is done to encourage presentation of mature research projects that are interesting to the community. Authors should clearly state any overlapping published work at the time of submission, and should not anonymize their paper in that case.
- Anna Goldenberg, SickKids Research Institute program of Genetics and Genome Biology (Canada)
- Sara Mostafavi, University of British Columbia (Canada)
- Oliver Stegle, EMBL (UK)
- Su-In Lee, University of Washington, Seattle (USA)
- Martin Min, NEC Labs, Princeton (USA)
These pages are kindly hosted by the Rätschlab.