Manipulating Comprehensibility of Text: An Automated Approach to Generate Deceptive Documents for Cyber Defense




Existing approaches to cyber defense, such as access control, have been inadequate at defending targets against the ever-increasing exfiltration of intellectual property. Cyber deception is one of many solutions that can protect critical documents from advanced cyber attackers. Some cyber deception solutions require the generation and deployment of fake documents, called “honeyfiles”, which can deceive cyber attackers. Fake documents can be generated by manipulating the comprehensibility of a given document, since hard-to-comprehend fake documents can mislead attackers and waste their cognitive effort. However, generating such documents is challenging for two reasons: 1) existing research is limited in quantifying and manipulating the comprehensibility of a given technical document, and 2) the generated documents must be believable to attackers, so that they are curious enough to interact with them. Existing research has investigated several techniques to automatically generate fake documents; however, these techniques do not generate believable fake documents. In this work, we design and evaluate a novel Comprehensibility Manipulation Framework (CMF) that provides a platform for generating hard-to-comprehend, believable fake documents. Our framework includes components to measure, manipulate, and evaluate the comprehensibility and believability of text. First, we define novel quantitative comprehensibility measures of text based on principles of reading comprehension: “sequentiality”, “connectivity”, and “dispersion”. Second, we design manipulation algorithms based on “Addition”, “Deletion”, and “Shuffling” operations that manipulate the occurrences of sentences and concepts in a given technical document to generate fake documents that are hard to comprehend. Third, we optimize the selection and application of our manipulation algorithms using a genetic algorithm framework.
Fourth, we design algorithms to improve the believability of the generated fake documents by enhancing their “cohesion” and “coherence”. Finally, we conduct a task-based evaluation of our algorithms using statistical tests and user studies consisting of reading comprehension and believability tests. We compare the original and fake texts using the metrics of accuracy-of-answer, effort-to-task, and effort-to-answer in the reading comprehension tests, and probability-to-decipher fake documents in the believability tests. Our extensive experiments demonstrate that our methods for generating hard-to-comprehend, believable fake documents can improve cyber deception solutions.
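
As a rough illustration of how sentence-level Addition, Deletion, and Shuffling operations might be combined under an evolutionary search, the following minimal sketch may help. It is not the CMF implementation. All function names are hypothetical, the `adjacent_overlap` score is a toy stand-in and not the sequentiality, connectivity, or dispersion measures defined in this work, and the search loop is a mutation-only simplification of a genetic algorithm (no crossover).

```python
import random


def add_sentence(sentences, filler, rng):
    """Addition: insert a filler sentence at a random position."""
    pos = rng.randrange(len(sentences) + 1)
    return sentences[:pos] + [filler] + sentences[pos:]


def delete_sentence(sentences, rng):
    """Deletion: drop one random sentence, keeping at least two."""
    if len(sentences) <= 2:
        return list(sentences)
    pos = rng.randrange(len(sentences))
    return sentences[:pos] + sentences[pos + 1:]


def shuffle_sentences(sentences, rng):
    """Shuffling: return a randomly reordered copy."""
    out = list(sentences)
    rng.shuffle(out)
    return out


def adjacent_overlap(sentences):
    """Toy proxy (assumption): mean Jaccard word overlap of neighbours.

    Lower values are treated here as 'harder to follow'.
    """
    if len(sentences) < 2:
        return 0.0
    scores = []
    for a, b in zip(sentences, sentences[1:]):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        union = wa | wb
        scores.append(len(wa & wb) / len(union) if union else 0.0)
    return sum(scores) / len(scores)


def evolve(sentences, filler, generations=20, population=8, seed=0):
    """Mutation-only evolutionary loop that minimises the toy proxy."""
    rng = random.Random(seed)
    ops = [
        lambda s: add_sentence(s, filler, rng),
        lambda s: delete_sentence(s, rng),
        lambda s: shuffle_sentences(s, rng),
    ]
    pool = [list(sentences)]
    for _ in range(generations):
        # Each child applies one randomly chosen operation to a random parent.
        children = [rng.choice(ops)(rng.choice(pool)) for _ in range(population)]
        # Keep the candidates with the lowest score (elitist selection).
        pool = sorted(pool + children, key=adjacent_overlap)[:population]
    return pool[0]
```

Because this sketch optimizes only a comprehensibility proxy, it would tend to produce text that is obviously scrambled. A fitness function closer to the framework described above would also need a believability term, for example one based on cohesion and coherence.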