Richards, Dana
Youn, Inja
2018-05-25
2018-05-25
https://hdl.handle.net/1920/10977

In the past few years, there has been a growing need for accurate geolocation of IP addresses, which is now a must-have feature of many Internet applications. Automated geolocation of IP addresses has important applications, including targeted delivery of localized content over the Internet (news, weather, advertising, restriction of localized content based on regional policies, etc.), prevention of Internet crimes (credit card and bank fraud, identity theft, spam, phishing, etc.), and detection and prevention of cyberattacks and cyberterrorism. Current geolocation algorithms can be divided into several classes according to the data used to determine the geographic location: database-based (which use a database of mappings between Internet prefixes and their corresponding geographical locations), pure delay-based (which take as input the round-trip delays measured from probing hosts, called landmarks), location-delay based (which use both the geographical locations of the probing hosts and the delays measured from them), and supplementary-information based (which, in addition to delay and geographical location, use other available information, such as DNS parsing and geographic and demographic data). However, the use of network delay for geolocation has not proved very reliable in the past, because network congestion, queuing delay, and circuitous routes make the relationship between distance and delay non-linear. This thesis brings important advancements to two classes of geolocation methods. The first advancement is a family of pure delay-based algorithms built on a general class of proximity measures. When such measures are carefully chosen to discard data that contain little information about the geographical location of a target IP address, the resulting algorithms are more accurate than the existing pure delay-based schemes.
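As a rough illustration of the pure delay-based idea, the sketch below maps a target to the landmark whose delay vector is closest under a simple proximity measure. The measure itself (a root-mean-square difference restricted to probes with short delays), the 50 ms cutoff, and all names and numbers are assumptions made for this example; they are not the proximity measures defined in the thesis.

```python
import math

def proximity(target_delays, landmark_delays, max_delay_ms=50.0):
    """Hypothetical proximity measure: RMS difference over the probes whose
    delays to BOTH hosts are at most max_delay_ms. Longer delays are
    discarded on the assumption that they say little about location."""
    diffs = [
        (t - l) ** 2
        for t, l in zip(target_delays, landmark_delays)
        if t <= max_delay_ms and l <= max_delay_ms
    ]
    if not diffs:
        return math.inf  # no informative probe in common
    return math.sqrt(sum(diffs) / len(diffs))

def nearest_landmark(target_delays, landmarks, max_delay_ms=50.0):
    """Return the name of the landmark with the smallest proximity value."""
    return min(
        landmarks,
        key=lambda name: proximity(target_delays, landmarks[name], max_delay_ms),
    )

# Made-up round-trip delays (ms) from three probing hosts to each landmark.
landmarks = {
    "A": [5.0, 40.0, 120.0],
    "B": [45.0, 8.0, 90.0],
    "C": [110.0, 95.0, 6.0],
}
target = [7.0, 38.0, 200.0]  # delays from the same probes to the target
closest = nearest_landmark(target, landmarks)
```

The target would then be assigned the known location of `closest`. Dropping long delays before comparing is one simple way to act on the observation that congested or circuitous paths carry little geographic information.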
The second advancement, belonging to the location-delay based class of algorithms, is a statistical geolocation scheme that applies kernel density estimation to delay measurements among a set of landmarks. An estimate of the target IP location is then obtained by maximizing the likelihood of the distances from the target to the landmarks, given the measured delays. This is achieved by an algorithm that combines gradient ascent and force-directed methods. We evaluate the proposed geolocation schemes against previous methods using a measurement framework built on the PlanetLab infrastructure, comparing the experimental geolocation error of the proposed algorithms with that of the existing schemes. We find that the proposed geolocation algorithms are more accurate than the previously developed ones.

en
Geolocation, Internet measurement, Statistical geolocation, Proximity measure
Delay-Based Methods for Robust Geolocation of Internet Hosts
Dissertation
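The statistical scheme described in the abstract can be sketched in a simplified form as follows. This is a minimal illustration under several assumptions, not the thesis's algorithm: it uses planar coordinates in kilometres, a Gaussian product kernel with hand-picked bandwidths, made-up training pairs, and plain gradient ascent with a numerical gradient and step halving (the force-directed component is omitted). Maximizing the joint density of (distance, delay) over the target position is equivalent to maximizing the conditional density of distance given delay, because the density of the measured delays does not depend on the position.

```python
import math

def kde_joint(d, t, samples, h_d=50.0, h_t=2.0):
    """Gaussian-kernel estimate of the joint density of (distance km, delay ms)."""
    total = 0.0
    for d_i, t_i in samples:
        total += math.exp(-0.5 * (((d - d_i) / h_d) ** 2 + ((t - t_i) / h_t) ** 2))
    return total / (len(samples) * 2.0 * math.pi * h_d * h_t)

def log_likelihood(pos, landmarks, delays, samples):
    """Sum over landmarks of log density of (distance to landmark, measured delay)."""
    ll = 0.0
    for (lx, ly), t in zip(landmarks, delays):
        d = math.hypot(pos[0] - lx, pos[1] - ly)
        ll += math.log(kde_joint(d, t, samples) + 1e-300)  # guard against log(0)
    return ll

def locate(start, landmarks, delays, samples, step=100.0, iters=200, eps=1e-3):
    """Gradient ascent with a central-difference gradient; a step is accepted
    only if it increases the log-likelihood, otherwise the step is halved."""
    f = lambda p: log_likelihood(p, landmarks, delays, samples)
    pos, best = start, f(start)
    for _ in range(iters):
        gx = (f((pos[0] + eps, pos[1])) - f((pos[0] - eps, pos[1]))) / (2 * eps)
        gy = (f((pos[0], pos[1] + eps)) - f((pos[0], pos[1] - eps))) / (2 * eps)
        norm = math.hypot(gx, gy)
        if norm == 0.0:
            break
        cand = (pos[0] + step * gx / norm, pos[1] + step * gy / norm)
        val = f(cand)
        if val > best:
            pos, best = cand, val
        else:
            step *= 0.5
            if step < 1e-3:
                break
    return pos

# Made-up (distance km, delay ms) pairs measured between landmarks.
samples = [(100.0, 2.0), (300.0, 4.5), (500.0, 6.0), (800.0, 9.5), (1000.0, 11.0)]
landmarks = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]  # hypothetical positions
delays = [6.0, 6.5, 9.0]  # hypothetical delays from each landmark to the target
start = (300.0, 300.0)
estimate = locate(start, landmarks, delays, samples)
```

Accepting only improving steps keeps the search monotone, at the cost of possibly stopping at a local maximum; a real implementation would also need geodesic distances and data-driven bandwidth selection.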