Papers and Publications, Department of Computer Science

Permanent URI for this collection


Recent Submissions

Now showing 1 - 2 of 2
  • Item
    Explicit programming strategies
    (Empirical Software Engineering, 2020-03-07) LaToza, Thomas D.; Arab, Maryam; Loksa, Dastyni; Ko, Amy J.
    Software developers solve a diverse and wide range of problems. While software engineering research often focuses on tools to support this problem solving, the strategies that developers use to solve problems are at least as important. In this paper, we offer a novel approach for enabling developers to follow explicit programming strategies that describe how an expert tackles a common programming problem. We define explicit programming strategies, grounding our definition in prior work both within software engineering and in other professions which have adopted more explicit procedures for problem solving. We then present a novel notation called Roboto and a novel strategy tracker tool that explicitly represent programming strategies and frame executing strategies as a collaborative effort between human abilities to make decisions and computer abilities to structure process and persist information. In a formative evaluation, 28 software developers of varying expertise completed a design task and a debugging task. We found that, compared to developers who are free to choose their own strategies, developers given explicit strategies experienced their work as more organized, systematic, and predictable, but also more constrained. Developers using explicit strategies were objectively more successful at the design and debugging tasks. We discuss the implications of Roboto and these findings, envisioning a thriving ecosystem of explicit strategies that accelerate and improve developers’ programming problem solving.
  • Item
    Effective Automated Feature Construction and Selection for Classification of Biological Sequences
    (Public Library of Science, 2014-07-17) Kamath, Uday; De Jong, Kenneth; Shehu, Amarda
    Background Many open problems in bioinformatics involve elucidating underlying functional signals in biological sequences. DNA sequences, in particular, are characterized by rich architectures in which functional signals are increasingly found to combine local and distal interactions at the nucleotide level. Problems of interest include detection of regulatory regions, splice sites, exons, hypersensitive sites, and more. These problems naturally lend themselves to formulation as classification problems in machine learning. When classification is based on features extracted from the sequences under investigation, success is critically dependent on the chosen set of features. Methodology We present an algorithmic framework (EFFECT) for automated detection of functional signals in biological sequences. We focus here on classification problems involving DNA sequences which state-of-the-art work in machine learning shows to be challenging and involve complex combinations of local and distal features. EFFECT uses a two-stage process to first construct a set of candidate sequence-based features and then select a most effective subset for the classification task at hand. Both stages make heavy use of evolutionary algorithms to efficiently guide the search towards informative features capable of discriminating between sequences that contain a particular functional signal and those that do not. Results To demonstrate its generality, EFFECT is applied to three separate problems of importance in DNA research: the recognition of hypersensitive sites, splice sites, and ALU sites. Comparisons with state-of-the-art algorithms show that the framework is both general and powerful. In addition, a detailed analysis of the constructed features shows that they contain valuable biological information about DNA architecture, allowing biologists and other researchers to directly inspect the features and potentially use the insights obtained to assist wet-laboratory studies on retainment or modification of a specific signal. Code, documentation, and all data for the applications presented here are provided for the community at​s.