Structured query language: Development

Abstract

Database technology and the Structured Query Language (SQL) have grown enormously in recent years. This paper presents results of the research on the historical perspective of the development of SQL and its continuing development. Applications from different domains have different requirements for using database technology and SQL. This paper also presents an overview on definition of the Standard Query Language, and highlighted the several benefits and produce literature on risks of adopting a Standard Query language.

Keyword

SQL, DBMS, XML, Data, Database.

Introduction

Standard Query Language (SQL) is today the standard language for relational and object-relational databases. Application programs commonly include a relatively large number of SQL queries and updates, which are sent to the Database Management System (DBMS) for execution. The most generally used database management systems, such as Oracle, Access, mySQL, SQLserver, Paradox, Ingres, and others, are all claimed to be relational. Certainly, they all use SQL which itself is often simulated to be an indicator of a relational database system. The purpose of this paper is to discuss the historical perspective of the development of SQL and its continuing development. This article also highlighted the benefits and risks of adopting a standard query language. This paper is based on review history of the Standard Query Language and literature on benefit and risks of adopting SQL.

Evolution and current situation of the SQL

Abreu, et.al. (2006) described that the relational model came about as a result of E. Codd’s research at IBM during the sixties. The SQL, originally named SEQUEL (Structured English Query Language), was implemented in an IBM prototype (SEQUEL-XRM), during the mid-seventies. Some years later, a subset of this language was implemented in IBM’s System-R.

In 1979, ORACLE emerged as the first commercial DBMS based on SQL, followed by several other products such as SQL/DS, DB2, DG/SQL, SYBASE, INTERBASE, INFORMIX, UNIFY. Even those which had not originally implemented SQL as their base query language, offered SQL interfaces such as INGRES, ADABAS, SUPRA, IDMS/R. As a result of this process, Standard Query Language became a de facto standard.

In late 1982, American National Standards Institute (ANSI) H23 begins to standardize SEQUEL which is version of the relational data model through the IBM language. In 1986, renamed SQL by H2, basic SQL was completed and become an American National Standard and soon an ISO standard.

In 1989, the first version of the SQL standard was revised and an addendum, which included main improvements on referential integrity issues, was published. Meanwhile, ANSI brought out a standard for embedded SQL.

In the early nineties, a new version, known as SQL2 or SQL-92, was published by ISO. Both the semantic capabilities of the language and error management were then considerably improved. That standard was complemented a few years later, with the approval of SQL/CLI (Call-Level Interface) and SQL/PSM (Persistent Stored Modules). SQL became a complete computational language, with features such as control structures and exception handling.

During the last half of the nineties, SQL was extended by the inclusion of object-oriented capabilities. The resulting standard was divided into several parts. This version, previously known as SQL3 and then finally called SQL:1999, incorporated features such as new basic data types for example very large objects, user defined data types, recursive query operators, sensitive cursors, tables generalization and user roles.

According to Abreu, et.al. (2006), The latest version of the standard is the SQL:2003, which is there are major revisions and extensions to most parts of the SQL:1999 standard. This version contains SQL/XML which is XML related specifications, new basic data types such as bigint, multiset and XML, enhancements to SQL-invoked routines, extensions to the CREATE TABLE statement, there are new MERGE statement, schema object which is the sequence generator and two new sorts of columns for identity and generated. He also produced the Table 1 as summarizes the evolution of SQL.

Table 1 – Evolution of SQL

Year	SQL
70s	Relational model DBMS prototypes (SEQUEL XRM) First relational DBMS
80s	ANSI SQL-86 standard ISO SQL-87 standard SQL-89 addendum ANSI embedded SQL
90s	SQL 92 SQL/CLI SQL/PSM SQL:1999
2003	SQL:2003

Accordingto Wikipedia, thereis another standard produced at 2006 and 2008, The SQL:2006 by ISO/IEC 9075-14:2006 which defines waysfor SQL for conjunction with XML. It also defines how to do storing and importing XML data for SQL, manipulating the data on database and publishing SQL data and XML in XML form. In addition, by use of Query and XML Query Language, it can integrate into their SQL code the published by World Wide Web Consortium (W3C) in order to concurrently access ordinary SQL-data and XML documents. The latest standard is SQL:2008, it legalizes ORDER BY outside cursor definitions. It adds INSTEAD OF triggers and the TRUNCATE statement.

Abreu, et.al. (2006) highlighted the SQL:2003 standard is composed of nine parts, which are briefly described in Table 2. The numeration of parts is not contiguous due to historical reasons: some parts have disappeared such as SQL:1999’s part 5 – SQL/Bindings – was included in part 2 of SQL:2003 and other parts are new. The latter resulted either from further partitioning of previous parts such as part 11 was previously included in SQL:1999 part 2 or from the implementation of new requirements, such as parts 13 and 14, dealing with Java methods and XML data, respectively.

Since the SQL:1999, the SQL standard has evolved, to support the object-relational paradigm. This paradigm proposes a good compromise between relational and object-oriented databases. The former have a robust data model which is the relational one and powerful query optimization, recovery, security and concurrency mechanisms. The latter incorporate object-oriented mechanisms such as encapsulation, generalization, aggregation and polymorphism, and allow representing more complex elements which are required in several domains, such as CAD, CAM or GIS.

Object-relational databases offer the possibility of defining classes or abstract data types, as well as tables, primary and foreign keys and constraints, as relational databases also do. Furthermore, generalization hierarchies can be defined among classes or tables. Table attributes can be defined in a simple domain for example CHAR(25) or in a user-defined class, as a complex number or image.

Table 2 – Structure and summary of the SQL:2003 standard

Part	Name	Description
1	Framework (SQL/Framework)	Overviewof the standard. It describes terms and notation used in the other parts. It also defines the result of processing statements in that language and the grammar of SQL by a SQL-implementation that has been illustrating in the conceptual framework used for other parts.
2	Foundation (SQL/Foundation)	Thispart describes the basic operations on SQL-data and data structures. The functional query such as controlling, maintaining, accessing, creating, and protecting SQL-data. It also defines the semantics and syntax of a database language. It deals with the portability of data definitions and compilation units between SQL-implementations and the interconnection of SQL-implementations.
3	Call-Level Interface (SQL/CLI)	Itdescribes procedures to execute SQL statements within standard programming language in application written, such the SQL statements will be executed use independent functions.
4	Persistent Stored Modules (SQL/PSM)	It defines how the semantics and syntax of a database language has been declare and maintain persistent database language systems in SQL-server modules.
9	Management of External Data (SQL/MED)	Extensionsto Database Language SQL is definedfor support management of external data purpose by using data link types and foreign-data wrappers
10	Object Language Bindings (SQL/OLB)	Itdefines SQLJQ, which is describes extensions in the Java programming language, for support embedding of SQL statements into programs written. The syntax and semanticsof SQLJ described by this part, as well as mechanisms to make sure binary portability of resulting SQLJ applications. This part also defines a number of Java packages and their classes.
11	Information and Definition Schema (SQL/Schemata)	Thispart defines a Definition Schema that characterizes the SQL object identifier, the integrity and structure constraints of SQL-data, the authorization and security specifications associated to SQL-data, the features, sub-features and the support that each of these has in an SQL implementation and also packages of this standard, and an Information Schema. It also includes SQL-implementationinformation and sizing items.
13	Routines and Types Using the Java Programming Language (SQL/JRT)	Itdescribes the Java programming language on doing capability of invoking static methods as SQL-invoked routines and SQL structured user-defined types for using classes specified in the Java programming language.
14	XML-Related Specifications (SQL/XML)	Thispart describes ways in which SQL also can be used for conjunction with XML.

SQL Overview

Gardner and Hagenbuch (1983) stated that Relational DBMSs are normally accessed via a data language. In order to allow the DBMS to optimize accesses to the database and to deal with problems related to concurrent access by multiple users, the languages are typically very high level. The SQL is an example of such a language that can be used for that purpose.

They also said that SQL statements are non-procedural statements of “what data to retrieve”, for example the query statements to retrieve employee such as SELECT * FROM EMP , or “what operation to perform”, for example the query statement to delete some data which is non-queries such as DELETE EMP WHERE NAME = ‘Jones’ .While simple SQL statements can easily be entered interactively by end-users, the complex sequences of operations required by most applications for example closing the books for the end of month, those are better handled by embedding SQL in a procedural programming language.

According to Kuhlemann, et al. (2008) SQL is a database query language used for formulating statements that are processed by a database management system for create and maintain a database. The SELECT statement is the most commonly used by the SQL query which can retrieve data from one or more tables in the database. It can limit the retrieved data using conditional statements in the WHERE clause, the GROUP BY clause can use for group related data and it can limit the grouped data with the HAVING clause; for order or sort data which based on different columns using the ORDER BY clause.

He also stated that SQL consists of many statements to create and manipulate database objects. Since its first standardization in 1986, more and more functionality is being included in SQL in each subsequent standard covering a variety of aspects of user interaction. The latest edition of the SQL standard, referred to as SQL:2003, supports various functionality such as call level interfacing, foreign-data wrappers, embedding SQL statements in Java, business intelligence and data warehousing functions, support for XML, new data types, etc.

The next standard, called SQL 20071, it will most likely add features like regular expression support, binary and outing decimal data types, materialized views, streaming data support, XQuery support and support for the Resource Description Framework (RDF) and the semantic web. Many researchers have led the huge scope of SQL’s functionality to advocate the usage of a `scaled down’ version of SQL, especially for embedded systems. The hardware limitations such as small RAM, small stable storage, and large data read/write ratio come up from embedded systems. Also the applications where embedded systems are used, for an example such as healthcare and bank cash cards, need only a small set of queries like select, project, views, and aggregations.

Kuhlemann, et al. (2008) described that a standard called Structured Card Query Language (SCQL) by ISO considers inter-industry commands for use in smart cards with constrained functionality of SQL. This issue has been proposed to address by some database systems and SQL engines. They are distinguished as `tiny’, e.g., the TinyDB2 database system, for extracting information from a sensor network and tinySQL3 SQL engine, which is a Java SQL engine that supports only a few SQL statements like select, update, insert, delete. While the standardization process shows how SQL has increased in size and complexity in terms of features provided, efforts for `scaled down’ versions show a need to control and manipulate features of SQL.

Gardner and Hagenbuch (1983) stated that SQL is a data language designed for use with the relational data model. The executable unit of SQL is the statement, but there are no SQL “programs”. SQL statements execute in the context of a single enrolled user of the database. The context in which a statement executes chooses what advantages it may exercise on objects in the database. Just only one or two contexts will be absorbed by an application program. Many SQL statements may run within each context. Each statement isdescribed by the DBMS, i.e., prepared for execution.

Catrambone and Yuasa (2006) cited in (Smelcer, 1989) described that the Structured Query Language for databases, which sometimes referred to as the ‘‘Standard Query Language”, is a command language for relational databases. It was preferred here as the test domain because writing a query with SQL is a relatively complicated task and because the knowledge required to write queries can be fully specified.

Moore (1992) said that “SQL” was once an acronym for the “Structured Query Language” which was associated with a propriety implementation. When SQL is used to refer to the ANSI standard, it is no longer an acronym, simply a short form of “Database Language-SQL”.

Benefits of adopting SQL

Donaho and Davis listed that several features make Standard Query Language at least as good as any other query language currently in use :

The basic concepts and syntax of SQL are easy for learned. This short initial learning period can decreases the amount of training required and it also can increases productivity.
SQL is a reasonably high-level language. The programmer can write queries without knowing all of the intimate details of the DBMS implementation. For example, a SELECT clause allows the user to identify the needed data without indicating how to access it.
SQL associate the data definition and data manipulation languages. Unlike other query languages, SQL uses the same syntactic constructs for definition functions and manipulation functions. This regularity makes the language easier to learn and use.
SQL provides the functionality needed for most database applications. That is, the language is powerful enough to do most of the things required in a database application.

According to Biggar, Chawla and Meese (2009), other benefits of adopting SQL include:

Lower capital and operational costs by reduced IT footprint, easier management, storage platform features and licensing advantages.
Improved agility and manageability by isolating application workloads on the storage system, providing heterogeneous storage support, and allowing live automated provisioning of storage.
Higher availability by simplified backup and recovery, availability during server maintenance, failover support through physical host clustering, and business continuity or disaster recovery.

Hoffer (2009) also stated the benefits of SQL are as follow:

Reduced training cost, training for organization can focus one language only.
Productivity, IS professionals become more proficient and learn SQL with it from continued use. They can increase the productive, usually programmers can more rapidly maintain the existing program.
Application portability, when each machine uses SQL the applications can be moved from machine to machine.
Application longevity, when new versions of DBMSs are introduced and the standard of language is enhanced, the applications will simply be updated.
Reduced dependence on a single vendor, which may lower prices and improve service because of the market for such vendors can become more competitive,.
Cross-system communication, managing data and processing user programs can become more easily communicated.

Risks of adopting SQL

Maciol (2008) stated that SQL has a row of limitations coming from its foundations such as:

It is difficult for defining terms and lists.
There is limitation of atomic data.
Lack of repetition and iteration.
Limited possibilities of data processing control.
Lack of deduction possibility.

Chan, Lu and Wei (2003) listed the problem while using SQL :

Comprehension difficulty :

– Complex queries are not easy to analyze, usually by another person.

– The “nested maze” is quite confusing. This confirms one of the theoretical flaws of SQL

not well defined semantics for nesting (Codd 1990).

– Multiple joins of many tables can lead to uncertainty of the query accuracy.

– Logical errors are difficult to detect, as compared to 3GLs.

Formulation problem :

– Joins are difficult for end-users.

– Too many aggregate functions in a single query have led to problems.

– Use of wrong field and name definition.

– Unable to format the output as desired.

– Variables used with wrong variable types, especially for embedded SQL.

Performance

– Response is slow when system does not select the best path to access tables.

– Database contention occurs by simultaneous accesses

– A query may need to be broken into smaller queries to speed up processing time. This

requires more temporary space.

Unclear error message sometimes give wrong impressions.

They also stated that usually whenusers encounter problems with SQL, the majority refers to the manual. This also confirms the finding that manuals form a substantial secondary source of SQL knowledge. Mostly prefer try to find the assistance of colleagues or superiors. Only a minority, attempt to query with other languages, while less will try another means, one of which was to try until get it right, to SQL manuals.

Brass and Goldberg (2005) highlighted that errors in SQL queries can be classified into syntactic errors and semantic errors. A syntactic error means that the entered character string is not valid SQL. Then there are print an error message because it cannot execute the query by any DBMS. Thus, the error is easy to correct and definitely detected. A semantic error means that a legal SQL query was entered, but the query does not or not always produce the intended results, and is therefore incorrect for the given task. Semantic errors can be further classified into cases where the task must be known in order to detect that the query is incorrect, and cases where there is sufficient evidence that the query is incorrect no matter what the task is.

Kiefer and Nicola (2009) observed that the adoption of SQL/XML faces several challenges. When relational legacy applications need access to new XML data, it is often too expensive to convert them from SQL to SQL/XML. Another frequent challenge is to actually write queries and updates with SQL/XML and Query. There are numbers of problem can occur such as:

Users need to learn these new languages, which are often perceived as difficult to master. This stems from the differences between the XML data model and the relational data model.
SQL/XML involves path expressions that navigate the tree structure of XML documents. To write path expressions, users must know the structure of the XML data in detail. It is not enough to know which data items exist, it is also necessary to know their exact case-sensitive name, namespace, and location within the document structure. But, this structure is often complex, difficult to understand, or even unknown to the user.
As more XML documents are accumulated in a database, newer documents may have a different XML Schema than older ones. This requires queries and updates to work across documents for different schemas, which compounds the complexity of writing SQL/XML statements. Also, existing XML queries may need to be changed when the XML Schema evolves.
In a hybrid database, where some data is stored in relational format and some in XML format, users need to know which data is in which format before they can write correct queries.

Conclusion

The SQL (Structured Query Language) is presently available for most database management systems. The SQL:2008 is the latest standard whichlegalizes ORDER BY outside cursor definitions. It also adds INSTEAD OF triggers and the TRUNCATE statement. This paper present briefly overview history of Standard Query Language, it also analyze benefits and risk of adopting SQL. There are benefit of adopting SQL such as lower capital and operational costs, improved agility and manageability and higher availability. There are many risk occur when adopting SQL, but usually the user can easily handle that risks. In future work, it can investigate patterns for SQL queries in greater detail such as analyzing SQL queries from real projects.

References

Abreu, F.B., Baroni, A., Calero, C., & Piattini, M. (2006). An ontological approach to describe the SQL:2003object-relational features. Computer Standards & Interfaces, 28, 695-713.

Biggar, H., Chawla, A., & Meese, L. (2009). The Benefits of Virtualizing Microsoft® SQL Server® in Hitachi Storage Environments. http://www.hds.com/assets/pdf/the-benefits-of-virtualizing-microsoft-sql-server-wp.pdf.

Brass, S., & Goldberg, C. (2005). Semantic errors in SQL queries: A quite complete list. The Journal of Systems and Software, 79, 630-644.

Catrambone, R., & Yuasa, M. (2006). Acquisition of procedures: The effects of example elaborations and active learning exercises. Learning and Instruction,16 , 139e153.

Chan, H.C., Lu, H., & Wei, K.K. (1993). A Survey on Usage of SQL. Sigmod Record, Vol. 22.

Davis, G.K., & Donaho, J.E. Ada-Embedded SQL : the Options. Vii.3-72.

Gardner, M., & Hagenbuch, B. (1983). Interfacing Apl And Sql. .ACM, 0-89791-095-8/83/0400-0109.

Hoffer, J.A., Prescott, M.B., & Topi, H. (2002). Modern Database Management. New Jersey : Pearson Education.

Kiefer, T., & Nicola, M. (2009). Generating SQL/XML Query and Update Statements. ACM,978-1-60558-512-3/09/11.

Kuhlemann, M., Saake, G., Siegmund, N., Sunkle, S., & Uller, M.R. (2008). Generating Highly Customizable SQL Parsers. ACM, 978-1-59593-964-7/08/03.

Maciol, A. (2008). An application of rule-based tool in attributive logic for business rules modeling. Expert Systems with Applications, 34, 1825-1836.

Moore, J. W. (1992). Minimizing the disadvantages of embedded sql/ada. ACM, 0-S9791-529-1 /92/0011-0407.

Wikipedia, the free encyclopedia – http://en.wikipedia.org/wiki/SQL.

Order Now