Information Sciences and Technology Faculty Research

Recent Submissions

  • Item
    Toward Understanding Civic Data Bias in 311 Systems: An Information Deserts Perspective
    (CSCW Workshop on Civic Technologies: Research, Practice, and Open Challenges, 2020-10) Lee, Myeong; Wang, Jieshu; Johnston, Erik; Harlow, John; Gordon, Eric; Janzen, Shawn; Winter, Susan
    While civic technologies for public issues and services, such as 311 systems, are widely adopted in many U.S. cities, the impact of these emerging technologies and their data-level dynamics remains unclear. Because the patterns in which civic issues are reported to technological systems differ across neighborhoods and populations, it is difficult for city officials to understand whether the provided data itself reflects civic issues. Disparities in the information provided to civic technologies in different neighborhoods may also exacerbate existing inequality. To understand how civic data is created and how people's use of civic technologies acts as an intermediary process in shaping community performance, we take an information deserts perspective in studying 311 systems. The concept of information deserts is informed by a material understanding of local information landscapes, making it possible to distinguish local information's structural features from its social-construction process. Based on this theoretical lens, we suggest new opportunities for civic technology and data research.
  • Item
    A Technical Report on Real-Estate Rent Prediction
    (2017) Rafatirad, Setareh
    Real-estate rent prediction is sensitive to several independent parameters and has, in the past few years, attracted many researchers to build automated tools using machine learning (ML). However, most of the proposed solutions are limited in scope and are investigated only for a particular locality or house type, or with a single type of machine learning algorithm. Furthermore, past work often used synthetic data, which can compromise the accuracy of the output because it does not closely resemble real-world datasets. To address these challenges, we study a wide range of machine learning techniques applied to three real-estate housing types, using real-world data. Unlike prior work, which attempts to develop a one-size-fits-all model with a fixed set of features, our study shows that the important parameters for rent prediction depend heavily on property type and locality. Further, a different algorithm performs best for each property type. Accordingly, we construct multiple rent prediction models using a large Zillow dataset of 50K real-estate properties in the states of Virginia and Maryland. In addition to the Zillow data, external attributes such as walk/transit score and crime rate are collected from online sources. Our comprehensive case study indicates that real-estate rent behavior strongly depends on the type of house and locality. As such, we deploy a two-layer clustering approach to partition the data into multiple training sets based on house type and similar zip codes. We evaluate and report the performance of the prediction models studied in this work on unseen data, using two metrics: R-squared and mean absolute error.
  • Item
    Malware Detection using Federated Learning based on HPC Events and Localized Image features
    (2019) Shukla, Sanket; Kolhe, Gaurav; Homayoun, Houman; Manoj P D, Sai; Rafatirad, Setareh
    Malware is a global threat whose tremendous growth in volume and diversity has made threat detection and analysis a pivotal challenge. The increasing diversity of malware syntax and behavior is among the basic challenges to address for efficient malware detection. Efficient detection therefore requires knowledge of different threats across the globe. However, it is impractical to rely on signature-based detection or to maintain a database of all malware signatures or syntax. To address these challenges, we propose a federated learning (FL)-based framework that helps learn threat features and characteristics irrespective of their origin and without breaching users' data or privacy, providing enhanced and robust security against malware for billions of devices across the globe. The FL model obtains models from a selected set of devices to determine the model parameters required for efficient detection of heterogeneous malware types. A single model that encompasses the knowledge of the models obtained from different devices then emerges and is broadcast to the individual devices for efficient malware detection, regardless of whether a given device has previously encountered, or been trained on, the characteristics of the malware. For the individual devices, we deploy a two-pronged malware detection technique. In the first prong, we extract the microarchitectural traces obtained while executing the application to detect traditional malware; in the second prong, we introduce an automated localized feature extraction technique to detect obfuscated malware. With the proposed FL framework, we achieved 91% malware detection accuracy, irrespective of the training data used at the device level. Furthermore, the proposed framework achieves up to 11% higher detection accuracy than existing malware detection techniques.