Automated Test Case Generator for Phishing Prevention Using Generative Grammars and Discriminative Methods




Palka, Sean




This research details a methodology for creating content in support of phishing prevention tasks, including live exercises and detection algorithm research. Our system uses probabilistic context-free grammars (PCFGs) and variable interpolation in a multi-pass method to create diverse and consistent phishing email content at a scale not achieved in previous research. The system, which we have named PhishGen, can generate large volumes of unique content for use in live exercises or for building training datasets for phishing detection methods and filter settings. PhishGen is a web-based application that implements this methodology and provides a user interface for building and modifying PCFG rules and weights. It is released as an open-source tool so that other researchers can use it. PhishGen has already been used in support of live commercial phishing exercises and is being used to develop content for commercial frameworks.
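
The general idea of pairing a weighted grammar with variable interpolation can be illustrated with a minimal Python sketch. This is not PhishGen's code: the grammar, the rule weights, the placeholder syntax, and the names `GRAMMAR`, `expand`, `interpolate`, and `generate_email` are all assumptions made for illustration. The first pass samples a template from a toy PCFG, and the second pass fills every placeholder from a single set of values so that the message stays internally consistent.

```python
import random
import re

# Hypothetical toy grammar (not taken from PhishGen): each nonterminal maps
# to a list of (weight, right-hand side) pairs. Strings that are not keys of
# GRAMMAR are treated as terminals and may contain {placeholders}.
GRAMMAR = {
    "EMAIL": [(1.0, ["GREETING", "BODY", "CLOSING"])],
    "GREETING": [
        (0.7, ["Dear {first_name},"]),
        (0.3, ["Hello {first_name},"]),
    ],
    "BODY": [
        (0.6, ["Your {service} password expires soon.", "ACTION"]),
        (0.4, ["We noticed unusual activity on your {service} account.", "ACTION"]),
    ],
    "ACTION": [
        (0.5, ["Please verify your details at {url}."]),
        (0.5, ["Sign in at {url} to keep your access."]),
    ],
    "CLOSING": [(1.0, ["Regards, {sender}"])],
}


def expand(symbol, rng):
    """Pass 1: expand a symbol by sampling one rule according to its weight."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal text, returned unchanged
    rules = GRAMMAR[symbol]
    weights = [weight for weight, _ in rules]
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    parts = []
    for child in rhs:
        parts.extend(expand(child, rng))
    return parts


def interpolate(text, variables):
    """Pass 2: replace each {name} placeholder using one shared value set."""
    return re.sub(r"\{(\w+)\}", lambda m: variables[m.group(1)], text)


def generate_email(variables, seed=None):
    """Sample a template from the grammar, then fill in its placeholders."""
    rng = random.Random(seed)
    template = " ".join(expand("EMAIL", rng))
    return interpolate(template, variables)


if __name__ == "__main__":
    # Illustrative values only; the URL uses a reserved test domain.
    values = {
        "first_name": "Alex",
        "service": "Payroll",
        "url": "https://training.example.test/login",
        "sender": "IT Help Desk",
    }
    for seed in range(3):
        print(generate_email(values, seed=seed))
```

Keeping interpolation as a separate pass means the same sampled template can be reused with different value sets, while the weights control how often each phrasing appears across a generated corpus.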



Information technology, Cyber Security, Generative Grammars, Natural Language Processing, Phishing