PIQL: Success-Tolerant Query Processing in the Cloud

Advanced Topics in Foundations of Databases

Stavros Anastasios Iakovou

Introduction

Nowadays it is widely known that modern web applications are directly linked with databases. In addition, the number of users keeps increasing over time and, as a result, the underlying databases start to become overloaded. Furthermore, although data independence would be ideal for implementing lithe applications, developers abandoned this idea in order to avoid expensive queries. To address this, Michael Armbrust et al. [1] implemented PIQL, a new declarative, scale-independent query language.

A large number of frameworks have already appeared to help developers create modern web applications. However, the resulting plethora of websites with millions of users led to database failures due to a lack of request management. As a result, there was demand for a new system that would control all these requests and provide efficient results to users.

Several approaches have been introduced, and one of the most popular is NoSQL. Although a high-level interface offers data independence, that independence created scalability problems, since a large number of queries took a long time to execute. This led to several issues, such as performance failures and user dissatisfaction. To avoid this situation, developers hand-coded key/value implementations. On the one hand, this provided the desired scalability; on the other hand, it was not easy for developers to write the kind of code that parallelizes their queries to achieve high scalability. Another significant issue is the time-consuming rewriting of functions.

Having discussed several problems caused by queries, in the next section we discuss PIQL. More specifically, we present the method and give a brief summary of its implementation. In the rest of the document we discuss the performance of this implementation.

What is PIQL?

In this section we discuss and analyze the PIQL (Performance Insightful Query Language) model. One important advantage of PIQL is that it introduced the notion of scale independence. More specifically, the model preserves logical data independence. The most significant aspect of this technique is that performance is maintained not only on small datasets but also on large ones. This is why the approach is called success-tolerant: an application keeps performing well even when its success makes the dataset very large. But why is PIQL successful? The answer lies in limiting the number of key/value store operations that each query may perform.
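To make the idea of scale independence concrete, the following Python sketch counts key/value operations for two queries over a toy store. Everything here (`CountingKVStore`, `build_store`, `friends_names`, `all_names`, `measure`) is hypothetical and not part of PIQL; it only assumes that each user stores a bounded list of friend ids. The bounded query issues the same number of lookups whether the store holds 100 or 10,000 users, while the full scan grows with the data.

```python
class CountingKVStore:
    """Hypothetical in-memory key/value store that counts every get() call."""

    def __init__(self):
        self.data = {}
        self.ops = 0

    def put(self, key, value):
        self.data[key] = value  # loading data is not counted

    def get(self, key):
        self.ops += 1  # every read is one key/value operation
        return self.data.get(key)


def build_store(num_users, friends_per_user=3):
    store = CountingKVStore()
    for u in range(num_users):
        store.put(("user", u), {"name": f"user{u}"})
        # each user keeps an explicit, bounded list of friend ids (assumption)
        friends = [(u + i) % num_users for i in range(1, friends_per_user + 1)]
        store.put(("friends", u), friends)
    return store


def friends_names(store, user_id, limit):
    """Scale-independent: at most 1 + limit lookups, whatever the store size."""
    friend_ids = store.get(("friends", user_id)) or []
    return [store.get(("user", f))["name"] for f in friend_ids[:limit]]


def all_names(store, num_users):
    """Not scale-independent: one lookup per user in the database."""
    return [store.get(("user", u))["name"] for u in range(num_users)]


def measure(num_users):
    """Return (ops for the bounded query, ops for the full scan)."""
    store = build_store(num_users)
    friends_names(store, 0, limit=3)
    bounded = store.ops
    store.ops = 0
    all_names(store, num_users)
    return bounded, store.ops
```

Here `measure(100)` returns `(4, 100)` and `measure(10_000)` returns `(4, 10000)`: the first number stays fixed as the dataset grows, which is the property PIQL aims to guarantee.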

As we previously mentioned, one goal of PIQL is to avoid problems when the database grows. PIQL uses static analysis to bound the number of key/value operations performed at every step of the execution. Before moving on to the analysis of the methodology, we should mention the four classes of queries. The first class is called constant, since the processing time does not change as the data grows. The second is the bounded class, which covers queries whose work remains bounded even as the site becomes more popular, because the data they touch is limited. For instance, on Facebook every user has a limit of 5000 friends. The third class is called sub-linear (or linear) and refers to queries whose work grows as the data increases. The last class is super-linear, where intermediate computations are necessary to answer the queries.
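The static analysis described above can be approximated by a small sketch that walks a query plan and derives an upper bound on key/value requests before any data is read. This is an illustration under assumptions of our own, not PIQL's analyzer: a plan is a hypothetical list of `(description, fanout)` steps, each step issues one request per input tuple, and `fanout` is the maximum number of output tuples per input tuple (from a key lookup, a LIMIT, or a cardinality constraint), or `None` when nothing bounds it.

```python
from typing import List, Optional, Tuple

# Hypothetical plan step: (description, max output tuples per input tuple); None = unbounded
Step = Tuple[str, Optional[int]]


def max_kv_requests(plan: List[Step]) -> Optional[int]:
    """Upper bound on key/value requests, assuming one request per input tuple per step.

    Returns None if any step has no known bound on its fanout.
    """
    tuples = 1  # the query starts from one set of parameters (e.g. a user id)
    requests = 0
    for _, fanout in plan:
        if fanout is None:
            return None
        requests += tuples  # one request for each tuple entering this step
        tuples *= fanout  # worst-case number of tuples leaving this step
    return requests


def classify(plan: List[Step]) -> Tuple[str, Optional[int]]:
    """Map a plan to 'constant', 'bounded', or 'unbounded' plus its request bound.

    'unbounded' lumps together the sub-linear, linear, and super-linear classes,
    since none of them has a bound that is independent of the data size.
    """
    bound = max_kv_requests(plan)
    if bound is None:
        return "unbounded", None
    if all(fanout == 1 for _, fanout in plan):
        return "constant", bound
    return "bounded", bound


lookup_user = [("get user by primary key", 1)]
friends_profiles = [
    ("scan friends of user (cardinality constraint)", 5000),
    ("get each friend's profile", 1),
]
scan_all_users = [("scan every user", None)]
```

With these definitions, `classify(lookup_user)` yields `("constant", 1)`, `classify(friends_profiles)` yields `("bounded", 5001)`, and `classify(scan_all_users)` is reported as unbounded, which is the kind of query this analysis is meant to flag.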

Having covered the necessary theoretical parts of PIQL, we now discuss its structure. Every server is directly connected to a distributed key/value store. Hence, this methodology maintains scalability, and the response time becomes predictable. A significant drawback of this technique is that a specific key/value store is required in order to maintain data locality. On the other hand, this method is non-blocking, which, according to Chen et al. [2], can reduce memory latency.
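The benefit of non-blocking access to the key/value store can be illustrated with Python's `asyncio`. In this sketch, `SimulatedRemoteStore` is a hypothetical stand-in for a remote store whose `get` takes a fixed, assumed latency; it is not PIQL's client. Issuing independent lookups concurrently makes the total wait close to a single round trip instead of the sum of all round trips.

```python
import asyncio
import time


class SimulatedRemoteStore:
    """Hypothetical remote store: each get() waits `latency` seconds (a fake round trip)."""

    def __init__(self, data, latency=0.05):
        self.data = data
        self.latency = latency

    async def get(self, key):
        await asyncio.sleep(self.latency)  # yields control instead of blocking the thread
        return self.data.get(key)


async def fetch_sequential(store, keys):
    # each request waits for the previous one: total time ~ len(keys) * latency
    return [await store.get(k) for k in keys]


async def fetch_parallel(store, keys):
    # all requests are in flight at once: total time ~ one latency
    return list(await asyncio.gather(*(store.get(k) for k in keys)))


def timed(fetch, store, keys):
    """Run one fetch strategy and return (results, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(fetch(store, keys))
    return result, time.perf_counter() - start


store = SimulatedRemoteStore({i: i * i for i in range(10)}, latency=0.02)
keys = list(range(10))
seq_result, seq_time = timed(fetch_sequential, store, keys)
par_result, par_time = timed(fetch_parallel, store, keys)
```

Both strategies return the same values; with ten keys and a 0.02 s latency the sequential version waits roughly ten times longer, although exact timings depend on the machine.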

Another important benefit of PIQL is that it extends the cardinality constraint from the regular direction of a relationship to other directions as well. More specifically, these cardinality constraints provide information about the relationships in the data. For instance, a Facebook user may have no more than 5000 friends. This information is very significant, since choosing the wrong value for such a limit can lead back to the problems described earlier. Returning to Facebook's limit on the maximum number of friends, Brandtzæg et al. [3] point out that a significant issue is the lack of privacy. Hence, such limits matter not only for performance but also for user protection. In addition, the same person can create a new profile for free and add new friends there. As a result, 5000 friends is not really a restriction on a user; the limit exists for reasons of privacy and performance.

According to Michael Armbrust et al. [1], their algorithm for scale-independent optimization consists of two phases. The first phase is stop operator insertion. To maintain scalability, the algorithm starts by finding a linear join ordering for the parsed query. Although a stop operator is already present when the query contains a LIMIT clause, the authors introduced data-stop operators, which are pushed to lower levels of the plan in order to preserve the initial constraints without requiring the whole system to be restarted.
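The interplay between a cardinality constraint and the two kinds of stop operators can be sketched as follows. The names `FriendTable`, `CardinalityViolation`, `stop`, and `data_stop` are hypothetical. We model the stop operator as a LIMIT that truncates its input, and the data-stop operator as one that never changes the result but relies on (and here checks) a bound guaranteed by the cardinality constraint, which is what allows it to sit directly on the scan; this is our reading of the description above, not the authors' code.

```python
from itertools import islice


class CardinalityViolation(Exception):
    pass


class FriendTable:
    """Hypothetical relation with a cardinality constraint: at most max_friends per user."""

    def __init__(self, max_friends=5000):
        self.max_friends = max_friends
        self.friends = {}

    def add_friend(self, user, friend):
        current = self.friends.setdefault(user, [])
        if len(current) >= self.max_friends:
            # the constraint is enforced on write, so reads can rely on it
            raise CardinalityViolation(f"{user} already has {self.max_friends} friends")
        current.append(friend)

    def scan(self, user):
        return iter(self.friends.get(user, []))


def stop(tuples, k):
    """Stop operator: passes through at most k tuples, like LIMIT k."""
    return islice(tuples, k)


def data_stop(tuples, k):
    """Data-stop operator: does not truncate; the data itself never exceeds k tuples."""
    for i, t in enumerate(tuples):
        if i >= k:
            raise CardinalityViolation(f"constraint violated: more than {k} tuples")
        yield t


table = FriendTable(max_friends=3)
for name in ["bob", "carol", "dave"]:
    table.add_friend("alice", name)
# table.add_friend("alice", "eve") would now raise CardinalityViolation

# Query: first two friends of alice. The data-stop sits on the scan (bound from the
# constraint); the stop operator at the top implements the query's LIMIT 2.
plan = stop(data_stop(table.scan("alice"), table.max_friends), 2)
print(list(plan))  # ['bob', 'carol']
```

Because the data-stop bound comes from the constraint rather than from the query, it holds on the scan even when the query asks for fewer rows.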

After phase 1, the second phase, called remote operator matching, takes place. As previously mentioned, scalability must be ensured; hence, every intermediate result has to be bounded. But how are the logical operators mapped onto remote operators? For an Index Scan, this means that at most one attribute can be affected by predicates. For an Index Foreign Key Join, the number of tuples after the join is less than or equal to the number of tuples before the join.
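Remote operator matching can be illustrated with two hypothetical operators over a toy sorted key/value store. `RemoteStore`, `index_scan`, and `index_fk_join` are names of our own, and tuple-shaped keys such as `("friend", "alice", "bob")` are an assumed layout. The index scan returns at most `limit` tuples in one request, and the foreign-key join issues one lookup per input tuple and emits at most one tuple for each, so the whole plan needs at most `1 + limit` requests.

```python
from bisect import bisect_left


class RemoteStore:
    """Hypothetical key/value store with sorted tuple keys that counts requests."""

    def __init__(self, items):
        self.items = dict(items)
        self.sorted_keys = sorted(self.items)
        self.requests = 0

    def range_scan(self, prefix, limit):
        """One request: at most `limit` (key, value) pairs whose key starts with prefix."""
        self.requests += 1
        start = bisect_left(self.sorted_keys, prefix)
        out = []
        for key in self.sorted_keys[start:]:
            if key[:len(prefix)] != prefix or len(out) == limit:
                break
            out.append((key, self.items[key]))
        return out

    def get(self, key):
        self.requests += 1
        return self.items.get(key)


def index_scan(store, prefix, limit):
    """IndexScan: bounded by `limit`, so it returns at most `limit` tuples."""
    return [value for _, value in store.range_scan(prefix, limit)]


def index_fk_join(store, tuples, fk_field):
    """IndexFKJoin: one lookup per input tuple via its foreign key.

    Each foreign key matches at most one target tuple, so the output
    never has more tuples than the input.
    """
    joined = []
    for t in tuples:
        target = store.get(("user", t[fk_field]))
        if target is not None:
            joined.append({**t, **target})
    return joined


store = RemoteStore([
    (("friend", "alice", "bob"), {"friend": "bob"}),
    (("friend", "alice", "carol"), {"friend": "carol"}),
    (("friend", "alice", "dave"), {"friend": "dave"}),
    (("user", "bob"), {"name": "Bob"}),
    (("user", "carol"), {"name": "Carol"}),
    (("user", "dave"), {"name": "Dave"}),
])
friends = index_scan(store, ("friend", "alice"), limit=2)
rows = index_fk_join(store, friends, "friend")
# rows == [{"friend": "bob", "name": "Bob"}, {"friend": "carol", "name": "Carol"}]
# store.requests == 3 (one scan + two foreign-key lookups)
```

The request count depends only on `limit`, not on how many friends or users are stored, which is the property remote operator matching is meant to preserve.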

References

[1] Armbrust, Michael, et al. “PIQL: Success-tolerant query processing in the cloud.” Proceedings of the VLDB Endowment 5.3 (2011): 181-192.

[2] Chen, Tien-Fu, and Jean-Loup Baer. “Reducing memory latency via non-blocking and prefetching caches.” Vol. 27. No. 9. ACM, 1992.

[3] Brandtzæg, Petter Bae, Marika Lüders, and Jan Håvard Skjetne. “Too many Facebook friends? Content sharing and sociability versus the need for privacy in social network sites.” Intl. Journal of Human-Computer Interaction 26.11-12 (2010): 1006-1030.
