Research Article: A Global User-Driven Model for Tile Prefetching in Web Geographical Information Systems

Date Published: January 13, 2017

Publisher: Public Library of Science

Author(s): Shaoming Pan, Yanwen Chong, Hang Zhang, Xicheng Tan, Houbing Song.


A web geographical information system is a typical service-intensive application. Tile prefetching and cache replacement can improve cache hit ratios by proactively fetching tiles from storage and replacing the appropriate tiles in the high-speed cache buffer without waiting for a client’s requests, which reduces disk latency and improves system access performance. Most popular prefetching strategies consider only the relative tile popularities to predict which tile should be prefetched, or consider only a single user’s access behavior to determine which neighbor tiles need to be prefetched. Some studies show that comprehensively considering all users’ access behaviors and all tiles’ relationships in the prediction process can achieve more significant improvements. Thus, this work proposes a new global user-driven model for tile prefetching and cache replacement. First, based on all users’ access behaviors, an expression for tile correlation is designed and implemented. Then, a conditional prefetching probability is computed based on the proposed correlation expression. The tiles to be prefetched are found by computing and comparing the conditional prefetching probabilities of the tiles in the uncached set and, similarly, replacement tiles are found in the cache buffer according to multi-step prefetching. Finally, experiments compare the proposed model with other global user-driven models, other single user-driven models, and other client-side prefetching strategies. The results show that the proposed model achieves a prefetching hit rate approximately 10.6% to 110.5% higher than the compared methods.

Partial Text

As with most time-based tasks, reducing resource consumption and improving the response speed are two key problems for a web geographical information system (GIS). Thus, many methods have been proposed to improve system performance on-the-fly, including PGSW-OS [1], MR-D [2], 2D-WDM [3], and LLLA [4], among others. Among these methods, PGSW-OS uses P2P (peer-to-peer) nodes to share resources and reduce the total resource consumption of the system. MR-D also uses a type of device-to-device (D2D) network to share the cellular spectrum. 2D-WDM is designed to improve network transmission efficiency by considering not only the time but also the wavelength in a wavelength-division multiplexed (WDM) system. LLLA provides a new method to improve the efficiency of channel assignment in wireless mesh networks (WMNs). All of these methods can be profitably applied to a system to reduce resource consumption and improve response speed. However, in addition to improving the transmission efficiency of the system by optimizing network topology or improving channel efficiency or capacity, preparing data for servers (or users) in advance can also be employed to reduce the response delay caused by slow disk I/O speeds.

Unlike traditional prefetching and caching, a user-driven model fetches and stores tiles before they are requested by mining the tiles’ popularity, their relationships, or the user’s navigation path; all of these methods are based on users’ behavior.

This model mines the correlation patterns of tiles from their historical access-log information to prefetch data from the set of hotspot tiles and replace data in the high-speed cache buffer as needed. From a typical access example such as the sequence shown previously, we can draw some conclusions about tile correlations:

The proposed method includes two parameters that must be determined: the matching radius n and the matching weights vector W. The matching radius n indicates the correlation depth or the navigation depth, that is, whether the next n movements still have an influence. Serdar (2012) [17] gives a detailed proposal for navigation depths of approximately 5 to 10; considering the symmetry of influence, n therefore varies from 2.5 to 5. In this article, we set n to 5.
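To make the roles of n and W concrete, the following is a minimal Python sketch of one way a weighted tile correlation and a conditional prefetching probability could be computed from all users' access sequences. It is an illustration under stated assumptions, not the exact expression proposed in this article: the function names, the per-user list-of-tile-IDs log format, the forward-only matching direction, and the normalization of weights into probabilities are all hypothetical choices made for the example.

```python
from collections import defaultdict


def build_correlation(sessions, n, weights):
    """Accumulate weighted co-occurrence scores between tiles.

    sessions: list of per-user tile-ID sequences (assumed log format).
    n:        matching radius; only tiles within n steps are matched.
    weights:  matching weights vector W; weights[d - 1] is the weight
              given to a tile requested d steps after the current one.
    Assumption: only forward matches (later requests) are counted.
    """
    if len(weights) != n:
        raise ValueError("weights must have one entry per distance 1..n")
    corr = defaultdict(lambda: defaultdict(float))
    for seq in sessions:
        for i, tile in enumerate(seq):
            for d in range(1, n + 1):
                if i + d >= len(seq):
                    break
                nxt = seq[i + d]
                if nxt != tile:  # ignore repeated requests for the same tile
                    corr[tile][nxt] += weights[d - 1]
    return corr


def conditional_probabilities(corr, current):
    """Normalize the correlation scores of `current` into probabilities."""
    row = corr.get(current, {})
    total = sum(row.values())
    if total == 0:
        return {}
    return {tile: score / total for tile, score in row.items()}


def prefetch_candidates(corr, current, cached, k):
    """Return up to k uncached tiles with the highest conditional probability."""
    probs = conditional_probabilities(corr, current)
    uncached = [(t, p) for t, p in probs.items() if t not in cached]
    # Sort by descending probability; break ties by tile ID for determinism.
    uncached.sort(key=lambda tp: (-tp[1], tp[0]))
    return [t for t, _ in uncached[:k]]


# Toy example with three users' access sequences (made-up data).
sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
corr = build_correlation(sessions, n=2, weights=[1.0, 0.5])
print(conditional_probabilities(corr, "A"))
# {'B': 0.5, 'C': 0.375, 'D': 0.125}
print(prefetch_candidates(corr, "A", cached={"B"}, k=2))
# ['C', 'D']
```

In this toy run the radius is 2 and a tile two steps away counts half as much as the immediate successor. With n set to 5 as in this article, W would need five entries; how those weights should be chosen is not something this sketch addresses.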

A web geographical information system is a typical service-intensive application that must store massive amounts of data on storage nodes and serve large numbers of users. Instead of reading tiles from storage on-the-fly, prefetching and caching tiles that will be requested in the future can reduce the response time of GIS services and substantially improve the quality of service. In server-side cache mode, prefetching and caching tiles can prepare data for servers in advance to reduce the latency of accessing slower disks. In client-side mode, prefetching and caching can be used to reduce the amount of data repeatedly transferred over short periods, which saves network bandwidth. However, it is difficult to predict the appropriate tiles to prefetch and cache, both because of the massive sizes of the datasets and because of the limited space available in high-speed caches. This situation requires a more effective method for finding tiles’ inner relationships to trace and predict the next movements of users.