Chapter 06 - Retrieving Data with Transact-SQL

Retrieve data from a table by using a SELECT statement - The SELECT statement is used to display data from database tables.
Use comparison, boolean, and range operators in SELECT statements to specify table rows - You can write queries that specify selected table rows.
Return rows of a table in sorted order by one or more columns and eliminate duplicate rows - You can manipulate the rows of a table, including changing their presentation order.
Use embedded queries to return rows used as the input values to outer queries
Use the ANY and ALL keywords to constrain the rows retrieved in a SELECT statement
Use the Select Data dialog box rather than ISQL to perform data retrieval from database tables

You usually don't want to access and display all the data stored in a database in each query or report. You may want some—but not all—of the rows and columns of data. Although you can access all the information, you probably don't need to display all rows and columns simply because it's too much information to examine at one time.

In Chapter 1, "Introducing Microsoft SQL Server" and Chapter 2, "Understanding Data Modeling and Database Design," you learned that the information stored in a relational database is always accessed in the form of a table. If you reference a printed table of information, you usually don't read all the rows and columns. You probably look at only part of the table to obtain the information you need. The table exists in a printed form only because it's a primitive way of storing information.

If you can reconsider your requests for information from the database, you can start to eliminate the queries that produce unwanted or unneeded results. In these cases, you can produce output that presents exactly what's needed and nothing more.

Understanding Relevant Storage Characteristics for Retrieval

The data stored on your database—the disk of your computer system—is analogous to a set of printed tables. You don't need to retrieve an entire table when you issue queries to display information from the database. You construct a query using a Transact-SQL statement that returns only the relevant portion of the column or rows of your database tables.

Table 6.1 shows an example table structure and its data, which will be used for several examples in this chapter. For information on creating tables, see Chapter 4, "Creating Devices, Databases, and Transaction Logs" and Chapter 5, "Creating Database Tables and Using Datatypes"

Table 6.1—A Table Containing 12 Rows
Name Department badge

Bob Smith SALES 1834

Fred Sanders SALES 1051

Stan Humphries Field Service 3211

Fred Stanhope Field Service 6732

Sue Sommers Logistics 4411

Lance Finepoint Library 5522

Mark McGuire Field Service 1997

Sally Springer Sales 9998

Ludmilla Valencia Software 7773

Barbara Lint Field Service 8883

Jeffrey Vickers Mailroom 8005

Jim Walker Unit Manager 7779

The table is limited to 12 rows to make it easier to work with the examples. The typical size of a table for a production database might have more columns of information and will nearly always have more rows of information. The size of the table won't make any difference in showing the operation of Transact-SQL statements. The statements work identically regardless of the size of the tables operated on. The examples in this chapter are easier to understand if a small number of rows and columns are present in the table used to show SQL operations.

Retrieving Data from a Table with SELECT

Unless you retrieve all the columns and rows from all tables, your queries of a database are a selection process that narrows the information retrieved from the database. Your goal as you work with tables should always be to return only the information needed to fulfill the user's request. Any more information, and the user will be required to wait for a longer period of time than needed. Any less information than is needed will result in additional queries against the database. This modeling of sets of data is always a balancing act that requires continued refinement.

The Transact-SQL SELECT statement is used for the selection process. Various parts of a SELECT statement target some—but not all—of the data in the database tables. The complete syntax of the SELECT statement is as follows:

SELECT [ALL | DISTINCT] select_list
      [INTO [new_table_name]]
[FROM {table_name | view_name}[(optimizer_hints)]
      [[, {table_name2 | view_name2}[(optimizer_hints)]
      [..., {table_name16 | view_name16}[(optimizer_hints)]]]
[WHERE clause]
[GROUP BY clause]
[HAVING clause]
[ORDER BY clause]
[COMPUTE clause]
[FOR BROWSE]

A SELECT statement is like a triangle superimposed on a database table. Using SQL keywords, the database is narrowed to target the columns and rows that are to be retrieved in a query. In the triangle comparison shown in Figure 6.1, the widest part of the triangle selects all the rows and columns of a database for retrieval. The point of the triangle opposite its wide base selects the least amount of data that can be retrieved from a table (a single row consisting of one column).

Fig. 6.1 - SELECT queries are used to target specific columns and rows of a database. You can also use the SELECT statement to extract a single column or row, or all columns and all rows from the database.

Most SQL queries retrieve rows and columns that are narrower than the entire table, represented by the base of the triangle in the Figure, but wider than the single row and column as shown in the point of the triangle opposite the triangle's base. You'll typically need to retrieve more than a single row and column but less than all the rows and columns of the database.

Specifying the Table with FROM

Different parts of the SELECT statement are used to specify the data to be returned from the database. The first part of the selection process occurs when fewer than all the database tables are referenced. You can retrieve data stored in the database separately by referencing some tables, but not others, in a SELECT statement.

A SELECT statement uses the FROM clause to target the tables from which rows and columns are included in a query. The syntax of the FROM clause is

 [FROM {table_name | view_name}[(optimizer_hints)]
      [[, {table_name2 | view_name2}[(optimizer_hints)]
      [..., {table_name16 | view_name16}[(optimizer_hints)]]]

In the following complete SELECT statement, the FROM clause specifies that the returned data should include only data from the employee table:

SELECT *
FROM employee

In the examples shown in this chapter, the Transact-SQL keywords used to form clauses are written in uppercase. You can, however, use lowercase keywords. However, if you installed Microsoft SQL Server with the default binary sort order, the names of your database objects—including the names of tables and columns—must match in case.

You can also specify multiple tables in the FROM clause, as in the following example:

FROM table_name_1,...,table_name_n

Each table is separated from the names of other tables with a comma, a separator used with lists of information in FROM and other Transact-SQL clauses. The list in a FROM clause often specifies multiple tables rather than a single table.

In the following example of a SELECT statement, the FROM clause references the data from two tables:

SELECT *
FROM employee,pay

The employee and pay tables are targeted, from which all rows and columns are retrieved.

As you'll see later in the section "Using a Wild Card in the SELECT Clause," using Select * returns all columns from the requested table or tables. This can cause queries that take quite some time to complete. You should avoid using Select * if possible.

In a relational database, you must provide instructions within the SELECT statement to match the rows from two or more tables together. To learn how to match, or join rows from multiple tables, see Chapter 7, "Performing Operations on Tables"

Transact-SQL allows you to choose tables from different databases. You can specify the name of the database in which the table is located by inserting the database name to the left of the table name. Next, place a period, the database owner name, and another period, followed by the table you need to work with:

database_name.owner.table_name

In the following example, the employee table in the database company and the owner dbo is specified:

SELECT *
FROM company.dbo.employee

The DBO keyword specifies the database owner. You can refer to the dbo at any time. SQL will know that you're referring to the owner of the specific database.

In the previous example, the table was created by using the system administrator's account, so the owner is dbo. If you omit the name of the database and owner when you reference a table, SQL Server looks for the table or tables that you specified in the FROM clause in the current database. You must enter the name of the database in which a table was created, along with its owner, to include the rows and columns from tables in different databases.

Specifying Columns with SELECT

The columns of values returned from database tables are specified as part of the SELECT clause immediately following the SELECT keyword. One or more of the columns are entered as a list. Each column, like the tables in a FROM clause, is separated by a comma:

SELECT column_name_1,...column_name_n
FROM table_name_1,...table_name_n

In the following code example, the name and badge columns are selected for retrieval. The results are shown in Figure 6.2. This example uses the SQL Windows application utility ISQL/w to perform the retrieval of rows.

Select name, badge 
from employee

Fig. 6.2 - Select statements are used to retrieve data from the database..

The contents of the listings will be what should be entered into the query page of an ISQL/w session. Figures will then display what the results should look like when the Execute button (refer to fig. 6.2) is pressed.

Follow these steps to begin a ISQL/w session:

Click the ISQL/w icon in the SQL Server for Windows NT program group.
The main window of ISQL/w appears, along with the Connect Server dialog box. You can connect to one or more SQL Server systems if you have a valid Login Id and Password. Use the List Servers button to see a list of the available servers. Figure 6.3 shows the connection to begin an ISQL/w session with a local Microsoft SQL Server system.
Fig. 6.3 - You can also connect to a local Microsoft SQL Server through ISQL/w.
Enter a Server, Login Id, and Password.

If you've implemented Integrated Security, you don't have to specify a Login Id and Password. Your network name and password will be used to validate you on the server.
Click Connect.
If your server is available and your Login Id and Password are valid, a query window appears. Unless you've set the default database to the one in which you want to access tables and other objects, you must explicitly set ISQL/w to the correct database.
Enter the command Use name_of_your_database in the query window, or select the name of the database from the DB drop-down list box (above the Results tab).
Click the Execute button on the toolbar.

The example in Figure 6.4 shows the result of the execution of the USE command to set the database to employees. Use the Erase button on the toolbar to delete the previous command before you enter and execute a new command.

Fig. 6.4 - You can also define the database to which a user is automatically positioned to when he starts a SQL Server session during the creation of the user logon.

In Chapter 1, "Introducing Microsoft SQL Server" and Chapter 2, "Data Modeling and Database Design" you learned that one of the basic tenets of a relational database is that operations on database tables always return another table. The rows and columns of database tables that are targeted for retrieval are always assembled into a temporary table. In most cases, this table is maintained only until the data is provided to the requesting client.

The new temporary table shown in Figure 6.2 was constructed from the three-column employee table. According to the SELECT statement, "the temporary table targets all rows of the permanent table's three columns and eliminates the second column, Department. The temporary table is deleted after the rows are provided to the requesting client.

Figure 6.5 shows the query with the employee table performed with the command-line form of ISQL. As you can see, you must enter the default command-line terminator go at the end of each statement.

Fig. 6.5 - Transact SQL will produce the same results from a command-line prompt that it will in the GUI environment.

Use the ED command to edit a long statement to be entered at the command line. This will invoke the system editor with the previous command entered. Upon exiting the system editor, it will place the edited statement as the next statement.

You can display table columns in a different order than you originally defined. To change the default order for the display of table columns, simply list the names of the columns in the order in which you want the columns displayed. In the following example—the results are shown in Figure 6.6—the order of display for the columns of the employee table are reversed from the order in which they were defined.

select badge, department, name
from employee

Fig. 6.6 - The SELECT clause determines the order columns are listed.

Changing the order of the displayed columns of a database table is consistent with the characteristics of a relational database. You learned in Chapter 2, "Understanding Data Modeling and Database Design," that the access of data from a relational database doesn't depend on the manner in which the data is physically stored. You simply specify the names of the columns in the order in which you want them returned in the SELECT clause of the SELECT statement.

You can display the same column of a table in multiple places if you need to improve the readability of the table, as in a train schedule.

Using a Wild Card in the SELECT Clause

You can use an asterisk (*) in the SELECT clause to specify all columns for inclusion in the retrieval. The following code shows a query that uses an asterisk to reference all columns of the employee table with the results shown in Figure 6.7. The name, department, and badge columns are displayed in the query results.

Select * 
from employee

Fig. 6.7 - You can use the asterisk wild-card character in the SELECT clause of a SELECT statement.

Although numerous examples throughout this book show SELECT statements with the asterisk (*) in the SELECT clause, you should always use caution in using the asterisk with production databases. The asterisk is used in the examples because it's convenient to use to reference all the columns. In many of the sample queries in which the asterisk is used, it has little effect on the amount of time it takes to perform the query.

You shouldn't use an asterisk with a production database because you probably need to access only some of the table columns in a query rather than all of them. Eliminating some table columns can dramatically reduce the time it takes to retrieve the rows when several rows are retrieved.

You can specify the column names even if all the columns should be retrieved so that the query is more descriptive. If the query is saved and later in need of revision, the columns and rows that the query retrieves will be easy to determine by reviewing the query.

Specifying the Rows of a Table

In the previous examples, all the table rows of a database are retrieved. Your goal often will be to retrieve only certain rows rather than all rows. For example, if you have tables that contain millions of rows, you'll probably never execute a query to retrieve all rows from the tables. Every query that you execute specifies a specific results set because it's impractical to retrieve or manipulate all rows in a single query.

The WHERE keyword is used to form a clause that you add to a SELECT statement to specify the rows of a table to be retrieved. A WHERE clause uses the following syntax:

SELECT column_name_n,...column_name_n
FROM table_name_1,...table_name_n
WHERE column_name comparison_operator value

A WHERE clause forms a row-selection expression that specifies, as narrowly as possible, the rows that should be included in the query. A SELECT statement that includes a WHERE clause may return a single row or even no rows if none of the rows matches the criteria specified in the SELECT statement.

In the example from the following code—the result is shown in Figure 6.8—a WHERE clause specifies that only the rows with the sales department are retrieved. All rows that contain the sales department, without regard for the case, are displayed.

select * 
from employee 
where department = "sales"

Fig. 6.8 - The sort order that you defined during the installation of Microsoft SQL Server determines case-sensitivity.

The default sort order is defined as case-insensitive during installation. If you change this after installation, you'll need to reinstall SQL Server.

The following code—the result is shown in Figure 6.9—shows the SELECT statement returns a single row because only one row contains the mailroom department.

select *
from employee
where department = "mailroom"

Fig. 6.9 - The count message shows "row(s)" but only a single row is selected.

The query result for a row that contains the personnel department retrieves no rows, as shown in the following code and in Figure 6.10. The count line, which displays the number of rows retrieved, is at the bottom of the output window. A retrieval in which no rows match the criteria of the SELECT statement doesn't return an error; instead, the message (0 row(s) affected) appears.

If you work with query products other than Transact-SQL, you may receive an error when no rows are retrieved. A query that returns no rows is considered valid by Transact-SQL—as well as other SQL dialects.

select *
from employee
where department = "personnel"

Fig. 6.10 - You can enclose the value in the WHERE clause in single or double quotation marks. (Double quotes are used here.)

You can refer to the @@ERROR system symbol, which is called a global variable, to learn whether the previous operation was successful. Microsoft SQL Server returns a zero (0) to @@ERROR if the previous operation was successful. After you execute a SELECT query that retrieves zero rows, @@ERROR contains a zero (0), indicating that no error occurred (see fig. 6.11).

select @@error

Fig. 6.11 - Conditional statements can be used to check @@ERROR and perform additional statements, if needed.

A SELECT statement is used in Figure 6.11 to display the contents of @@ERROR, so the results are returned in the form of rows retrieved from a table. The dashes (–) are displayed underneath the location of where a column header would appear if a column—rather than @@ERROR—was specified. A count message is also displayed for the one value (0) retrieved from the global variable. Error conditions and global variables are discussed in Chapter 13, "Managing Stored Procedures and Using Flow-Control Statements."

Using Comparison Operators in a WHERE Clause

The syntax for the WHERE clause allows the use of a comparison operator following the name of a table column and before a column value. In the earlier examples, only the comparison operator = (equal) was used. Additional comparison operators may be used to retrieve different rows. Table 6.2 lists the comparison operators that you can use in the WHERE clause.

Table 6.2—Comparison Operators
Symbol Meaning

= Equal
!= Not equal

<> Not equal

< Less than

> Greater than

<= Less than or equal to

>= Greater than or equal to

LIKE Equal to value fragment

Table 6.2 lists the LIKE keyword as one of the comparison operators. Although LIKE isn't listed as one of the comparison operator symbols in the Microsoft documentation, LIKE is used exactly as a comparison operator. For more information on this operator, see the later section "Using the Comparison Operator LIKE."

The syntax for a WHERE clause that uses a comparison operator is as follows:

SELECT column_name_1,...column_name_n
FROM table_name_1,...table_name_n
WHERE column_name comparison_operator value

You can optionally use spaces around the comparison operations if you wish. You query will execute correctly with or without spaces around the comparison operators.

In addition to the = (equal) comparison operator that was used in the preceding section (refer to figs. 6.8 through 6.11), you can use the <> (not equal) operator. You can use the not-equal operator to retrieve all rows except those that contain the value to the right of the <> operator. The following code—the result is shown in Figure 6.12—shows a SELECT statement that contains a WHERE clause for all rows from the employee table except those that contain the sales department.

select *
from employee
where department <> "sales"

Fig. 6.12 - You can the comparison operators <> or != for 'not equal'.

The < (less than) comparison operator can be used to retrieve rows that are less than the value specified for the column in the WHERE clause. In the following code—the result is shown in Figure 6.13—the rows that contain a badge number less than 5000 are retrieved from the employee table.

select *
from employee
where badge < 5000

Fig. 6.13 - The results include rows with badge numbers less than 5000.

The > (greater than) comparison operator retrieves rows that contain a value greater than the value used in the WHERE clause. In the following code, rows with badge numbers greater than 8000 are retrieved from the employee table (see fig. 6.14).

select *
from employee
where badge > 8000

Fig. 6.14 - The results include rows with badge numbers greater than 8000.

The <= (less than or equal to) comparison operator returns rows that have a value equal to or greater than the value in the WHERE statement. The following code returns rows that contain the value less than or equal to badge number 3211 (see fig. 6.15).

select *
from employee
where badge <= 3211

Fig. 6.15 - The results include rows with badge numbers less than or equal to 3211.

The >= (greater than or equal to) comparison operator returns rows that are greater than or equal to the value in the WHERE clause. Comparison operators can be used with columns that contain alphabetic values as well as numeric values. The following code uses >= in the WHERE clause to retrieve rows that are alphabetically greater than or equal to software for the department column (see fig. 6.16).

select *
from employee
where department >= "software"

Fig. 6.16 - The results include rows that contain the software department.

When you use comparison operations with columns that are defined as datatypes such as CHAR or VARCHAR, SQL Server uses the binary representation of all characters including alphabetic characters. For example, an uppercase letter A is stored with a lower binary value than an uppercase B. A character value of B is considered greater than the value of an uppercase A using its binary representation.

For values that are more than a single character, each character is successively compared using the binary representation.

Using the Comparison Operator LIKE

The last of the comparison operators is a keyword—rather than one or two special symbols. The LIKE keyword is followed by a value fragment rather than a complete column value. The example query in the following code retrieves all rows that contain a department name beginning with the alphabetic character S (see fig. 6.17). A wild-card character such as the percent sign (%) can follow the letter S. This wild card is used to match any number of characters up to the size of the column, minus the number of characters that precede the percent sign.

You can also use the % before a value fragment in the WHERE clause of a SELECT statement, such as '%s'. You can also use wildcards multiple times.

select *
from employee
where department like "s%"

Fig. 6.17 - The query retrieves only the rows that contain a department that starts with the letter s.

An underscore (_) is another wild card that you can use to specify a value fragment. Each underscore used in the specification of a value fragment can match any one character. The example shown in the following code uses four underscores following the S to match any rows that contain a column value that begins with an s followed by any four characters (see fig. 6.18). Unlike the example shown in Figure 6.17, the query retrieves only the rows that contain Sales or SALES; it doesn't retrieve the rows that include Software.

select *
from employee
where department like "s____"

Fig. 6.18 - You can use the underscore (_) wild card along with the percent sign (%) in a WHERE clause.

You can use square brackets ([]) as wild cards in a WHERE clause that uses the LIKE comparison operator. The square brackets specify a range of values. In the following code, the brackets are used to specify a range of any upper- or lowercase characters as the first character of the department column (see fig. 6.19).

select *
from employee
where department like "[a-zA-Z]%"

Fig. 6.19 - You can use any wild-card combination in the value fragment.

In Figure 6.19, % and [] are combined to specify that the rows for retrieval have any upper- or lowercase letter as their first character, as well as any additional characters up to the width of the column. The department column was created wide enough to store 20 characters. The Figure shows that you can combine wild cards to specify a value fragment.

You can also use a caret (^) after the left bracket to specify a range of values to be excluded from the rows retrieved in a SELECT statement. For example, the SELECT statement shown in the following code retrieves all rows from the employee table except those with first characters that fall within the range F through M (see fig. 6.20).

select *
from employee
where department like "[^F-M]%"

Fig. 6.20 - A SELECT statement that excludes a range of values from F through M.

You can use wild cards only in a WHERE clause that uses the LIKE keyword. If you use the asterisk (*), underscore (_), brackets ([]), or caret (^) with any of the other comparison-operator symbols, they're treated as literal column values. For example, the following code contains the same query issued in Figure 6.20, but an equal sign (=) has been substituted for the LIKE query (see fig. 6.21). The identical query with an equal comparison operator rather than LIKE doesn't retrieve any rows.

select *
from employee
where department = "[^F-M]%"

Fig. 6.21 - When you use wild cards with comparison operators other than LIKE, they're treated as literal column values.

Selecting Columns and Rows with the WHERE Clause

You can retrieve a subset of a table's columns and rows in a SELECT statement by combining the use of specific, called-out columns and the use of a constricting WHERE clause. In the following code, only two columns, name and department, are selected, and only for the rows that contain the 'Field Service' value in the department column (see fig. 6.22).

select name, department
from employee
where department = "field service"

Fig. 6.22 - A SELECT statement can limit both the rows and columns retrieved.

In Chapter 1, "Introducing Microsoft SQL Server" you learned that you can create more than 250 columns in a table and an unlimited number of rows. It's almost always impractical—except for small, simple tables—to try to retrieve all columns and rows of a table. If you correctly construct your query, you can also reference columns from multiple tables in a single query.

The SELECT clause is descriptive because it invokes a selection operation on a table's rows and columns. Keep in mind the analogy of the data-retrieval triangle introduced earlier in this chapter (refer to fig. 6.1). The SELECT statement effectively superimposes a triangle over a table to retrieve some (but not all) columns and some (but not all) rows.

I executed a SELECT statement that resulted in no rows returned from a table, and I was surprised that SQL Server didn't return an error.

SQL Server doesn't return an error for a query that results in no returned rows. Although this is not true of other software products—for example, some programming languages expect an error when no records are retrieved from a file—Microsoft SQL Server and Transact-SQL consider a query that returns no rows a valid query.

I executed a SELECT statement that resulted in no rows returned, but I'm certain information is in the database. How can I debug what went wrong?

Try modifying your WHERE clause so only one condition is applied. Run the query with only the first condition. If no rows are returned, you can examine the WHERE clause to determine the problem. If information is returned, add the next portion or portions of the original WHERE clause, running the query after each. You should be able to quickly narrow down which portion of the constricting clause is going awry.

Using Boolean Operators and Other Keywords in a WHERE Clause

You can use Boolean operators to retrieve table rows that are based on multiple conditions specified in the WHERE clause. Booleans are used the way conjunctions are used in the English language. Boolean operators are used to form multiple row-retrieval criteria. Use Boolean operators to closely control the rows that are retrieved.

The syntax for the use of a Boolean is as follows:

SELECT column_name_1,...column_name_n
FROM table_name_1,...table_name_n
WHERE column_name comparison_operator value
Boolean_operator column_name comparison operator

Using the OR Operator

The first of the Boolean operators is OR, which you can use to select multiple values for the same column. In the following code, OR is used to form a WHERE clause to retrieve rows containing two column values (see fig. 6.23). Continue to add ORs to the WHERE clause to select additional values for the same column.

select *
from employee
where department = "field service"
or department = "logistics"

Fig. 6.23 - You can use any number of ORs with different comparison operators in each comparison.

The following query retrieves the rows of the employee table that contain three column values (see fig. 6.24).

select *
from employee
where department = "field service"
or department = "logistics"
or department = "software"

Fig. 6.24 - You can use additional ORs to select more than two values of the same column.

You can specify different columns in the WHERE clause of a SELECT statement that uses an OR. The query in following code retrieves rows that either are members of the Field Service department (with any badge number) or have a badge number that's less than 6000 (but are members of any department). See Figure 6.25.

select *
from employee
where department = "field service"
or badge < 6000

Fig. 6.25 - You can use multiple Boolean operators to specify a criteria for the rows to be returned by a SELECT statement.

Using the AND Operator

Use the AND Boolean operator if you want the rows returned by the query to meet both comparisons specified in the WHERE clause.

In the following code, a query is used to retrieve a row that contains a specific Name and badge combination (see fig. 6.26). If two rows in the table contained Bob Smith, the boolean AND is used to specify a criteria that requires the row to also contain the badge value 1834. Multiple rows would be returned only if more than one row contained the values Bob Smith and 1834.

select *
from employee
where name = "bob smith"
and badge = 1834

Fig. 6.26 - You can also use AND (as shown here) and OR together in a WHERE clause.

Populating one column of the table with unique values allows individual rows to be retrieved.A unique row is returned if one of the specified columns is the column that contains unique values.

Defining a column with a unique row or a combination of rows allows individual rows to be retrieved or manipulated. In Chapter 1, "Introducing Microsoft SQL Server" you learned that Microsoft SQL Server allows you to store rows that have duplicate values across all table columns. If you allow rows to be individually selected, you establish the capability to reference one table row at a time, if necessary. Chapter 5, "Creating Database Tables and Using Datatypes" reviews the datatypes that are available. You may recall that the TIMESTAMP datatype is a perfect way to manage uniqueness for tables.

See the section in Chapter 5 entitled Specialized Datatypes

Using the NOT Operator

NOT is an additional Boolean operator that you can use as part of a WHERE clause. Use NOT to specify negation. Use NOT before the column name and a comparison operator, such as = (equal), after the column name and before the value.

For example, the following query retrieves all rows of the employee table that contain any department except Field Service (see fig. 6.27).

You can use NOT instead of the not-equal comparison operators != and <>. A WHERE clause that uses NOT for negation is visually easier to understand than one that uses != (not equal).

select *
from employee
where not department = "field service"

Fig. 6.27 - You can use NOT for negation the same way the not-equal comparison operators (!= and <>) are used.

You can also use NOT in a WHERE clause in combination with AND and OR. In the following code, NOT is used to retrieve all rows of the employee table that are members of the Field Service department, except for Mark McGuire (see fig. 6.28).

select *
from employee
where department = "field service"
and not name = "mark mcguire"

Fig. 6.28 - You can use OR as well as AND with the NOT Boolean operator.

Using BETWEEN to Select a Range of Values

Although you can use a number of ORs in a WHERE clause to specify the selection of multiple rows, another construction is available in Transact-SQL. You can use the BETWEEN keyword with AND to specify a range of column values to be retrieved.

In the following code, a WHERE clause that includes BETWEEN is used to specify a range of badge values to be retrieved from the employee table (see fig. 6.29). Use BETWEEN after the name of the column, followed by one end of the range of values, the AND keyword, and the other end of the range of values.

select *
from employee
where badge between 2000 and 7000

Fig. 6.29 - The table doesn't have to contain rows that are identical to the column values used to specify the range of values referenced by BETWEEN.

The two numbers that form the range of values for retrieval don't actually need to be stored in the table. For example, in Figure 6.29, badge numbers 2000 and 7000 don't have to be stored in the table. Those numbers simply specify a range. Also, a successful query (one that returns an error code of 0) can return zero rows within the range specified by the WHERE clause that contains a BETWEEN. No rows need to be stored in the table for the query to execute correctly.

Using IN to Specify a List of Values

You can't always use a WHERE clause with BETWEEN to specify the rows that you want to retrieve from a table in place of a WHERE clause that contains multiple ORs. The rows that contain the column values specified within the range of values will include rows that you don't want. However, you can use the IN keyword in a WHERE clause to specify multiple rows more easily than if you use multiple ORs with a WHERE clause.

A statement that uses IN uses the following syntax:

SELECT column_name_1,...column_name_n
FROM table_name_1,...table_name_n
WHERE column_name IN (value_1, ...value_n)

In the following code, IN is followed by a list of values to specify the rows to be retrieved (see fig. 6.30).

select *
from employee
where badge in (3211,6732,4411,5522)

Fig. 6.30 - A WHERE clause that contains IN is simpler to write than a WHERE clause that contains multiple ORs and is more specific than using the BETWEEN operator.

Using an ORDER BY Clause

In Chapter 1, "Introducing Microsoft SQL Server" you learned that the rows of a relational database are unordered. As part of your SELECT statement, you can specify the order in which you want the rows retrieved and displayed: Add an ORDER BY clause to sort the table rows that are retrieved by a SELECT statement.

The rows of a SQL Server database usually will be retrieved in the order in which you inserted the rows into the table. If you create a clustered index for a table, the order of the rows returned by a query is the order of the clustered index. But you can't rely on the stored order of rows for two reasons:

You can create an index for a table that didn't have a clustered index. After the clustered index is created for the table, the rows will be retrieved in a different order than before the index is created.
The clustered index for a table can be deleted at any time, which affects the order in which rows are subsequently retrieved.

If you want to return the table rows in a specific order, you must add an ORDER BY clause to a SELECT statement.

The syntax of a SELECT statement that contains an ORDER BY clause is as follows:

SELECT column_name_1,...column_name_n
FROM table_name_1,...table_name_n
ORDER BY column_name_1,...column_name_n

The following code shows a SELECT statement in which the rows are ordered by department (see fig. 6.31). The rows of the employee table are retrieved in ascending order by default.

select *
from employee
order by department

Fig. 6.31 - You can use multiple columns to determine the order of rows retrieved.

badgeIf you include the name of a second column after the name of the first column, the second column orders the rows that are duplicates of the first column. In the following code, ORDER BY is used to order the rows of the employee table first by the department and then by the badge column (see fig. 6.32).

select *
from employee
order by department, badge desc

Fig. 6.32 - You can reference a third column if you have the need.

Notice that the DESC keyword is added after the second named column to order the badge numbers in descending orderbadge. You can also use the keyword ASC to specify explicitly that a column's order in the ORDER BY clause be ascending. However, it's unnecessary because the order of a column is ascending by default.

When you use ORDER BY in a SELECT statement, you can specify the columns in the ORDER BY clause by their order number in the SELECT clause. In the following code, the department and badge columns from the employee table are referenced in the ORDER BY clause by their order of occurrence from left to right in the SELECT clause (see fig. 6.33).

select department, badge
from employee
order by 1, 2

Fig. 6.33 - A SELECT statement that references the columns using column numbers rather than names.

The number of columns referenced by column number in the ORDER BY clause can't be greater than the number of columns in the SELECT clause of a SELECT statement. If you specify a number that's larger than the number of columns in the SELECT clause, you receive an error message saying that you're out of range of the number of items in the select list.

You must understand why SQL Server doesn't use the order of the column defined in a table such as employee in the ORDER BY clause. In a relational database, the syntax of a query language that references database data should be independent of the manner in which the data is stored.

If Transact-SQL used the order of columns as they are defined in the table, the column number would be based as a physical characteristic of the stored data. It's more appropriate to reference the columns using the relative order of the columns in the SELECT clause.

Another problem exists in trying to reference table columns by the order in which the columns are defined in a table. You can specify columns from different tables in the SELECT clause. You can't, for example, reference two columns that are both defined as the third column in two tables, because the column numbers are identical.

You have complete control over specifying columns in the SELECT clause of a SELECT statement. You can reference the same column more than once in a SELECT clause, and the column values will be displayed multiple times. In the following code, the badge column is referenced twice in the SELECT clause (see fig. 6.34).

select badge, name, department, badge
from employee

Fig. 6.34 - Multiple copies of the same column can be used to improve readability.

You may not immediately see a reason for displaying the values of a column more than once. If a listing is wide enough, it can be convenient to display a column—often the unique identifier for each row—as the first and last columns of a display, as Figure 6.34 shows. A train schedule is an example of a wide output in which the stations often are displayed in the first, last, and center columns to make the output display easier to read.

Using DISTINCT to Retrieve Unique Column Values

You can construct your database table so that you never allow duplicate rows to be stored or to allow duplicate rows. Unless you define a constraint on your table, such as a unique key, you can store duplicate rows in the table. Although you can disallow duplicate rows in the table, you may allow duplicates for some columns. You may want to find the unique entries that exist in a table column. The DISTINCT keyword is used to return the unique values of a column.

The following code shows the different departments of the employee table (see fig. 6.35).

select distinct department
from employee

Fig. 6.35 - You can use DISTINCT with multiple columns to return the unique values of a column.

If you use DISTINCT with multiple columns, the rows retrieved are unique by the combination of the columns specified after the DISTINCT keyword. The combination of the values from the Department and badge columns must return all rows. Whenever you combine a column that contains non-duplicate values (such as badge) with a column that contains duplicate values (such as Department), the combination of the two is non-duplicate.

Using Arithmetic Operators

You can use arithmetic operators to form expressions in Transact-SQL. Expressions are evaluated within the statements in which they appear. You need arithmetic operators to manipulate the data retrieved from tables. You can use the arithmetic operators in the SELECT clause to add, subtract, multiply, and divide data from columns that store numeric data.

Table 6.3 shows the arithmetic operators you can use in Transact-SQL.

Table 6.3—Transact-SQL Arithmetic Operators
Symbol Operation

+ Addition

– Subtraction

* Multiplication

/ Division

% Modulo (remainder)

You can form expressions by using arithmetic operators on columns defined with the datatypes TINYINT, SMALLINT, INT, FLOAT, REAL, SMALLMONEY, and MONEY. You can't use the modulo operator (%) on columns defined with the MONEY, SMALLMONEY, FLOAT, or REAL datatypes.

The modulo operator (%) is used to return an integer remainder that results from the division of two integer values. As a result, it can't be used with datatypes that can contain non-integer values. You'll receive an error message if you try to use the operator on columns defined as other than integer datatypes, even if the column values are whole numbers.

You can use arithmetic operators to form expressions in the SELECT clause of a SELECT statement with both numeric constants and columns. In the following code, an expression is used to increment the badge numbers by 5 (see fig. 6.36).

select badge, badge + 5
from employee

Fig. 6.36 - You can use multiple arithmetic operators to operate on column names, constants, or a combination of column names and constants.

When you use an expression in a SELECT clause, the display for the evaluation of the expression doesn't have a column header. You can specify a column header for the expression by preceding the expression with a text string followed by an equal sign (=). For example, the following code shows the expression preceded by the specified column header in the SELECT clause of the SELECT statement (see fig. 6.37). You can enclose the text string in single or double quotation marks to retain embedded spaces.

If you perform an arithmetic operation on a column that contains a NULL, the result is an unknown value.

select badge, "badge + 5" = badge + 5
from employee

Fig. 6.37 - You can also specify an alternate column header for any column without using the column name in an expression.

Computed columns are not stored in the database. They may exist in a temporary table that is created during the execution of the SELECT statement. Since the data is not in the database, there is no way to directly verify the results of a computed column. One easy way to compute a value incorrectly is to ignore operator precedence. Arithmetic operators are performed in predetermined order unless parenthesis are used to force otherwise. Table 6.4 shows the order in which arithmetic operators are executed.

Table 6.4—Precedence Order of Arithmetic Operators
Operator Order of Precedence

* 1st

/ 1st

% 1st

+ 2nd

– 2nd

If you use multiple arithmetic operators with the same order of precedence, the expressions are evaluated from left to right. You can use parentheses to control the order of execution. Expressions in parentheses are evaluated before any other expressions. Use parentheses to evaluate expressions that contain addition and subtraction before the expressions that contain multiplication, division, and modulo operators.

The following code shows the use of parentheses in the SELECT clause. The constant 5 is added to each value of the badge column. After the constant is added to badge, the sum is multiplied by 2. See Figure 6.38.

select "badge + 5 * 2" = (badge + 5) * 2
from employee

Fig. 6.38 - You can use parentheses to make the evaluation order more descriptive, even if the parentheses are unnecessary.

You can perform arithmetic operations on different numeric datatypes in the same expression—a procedure called mixed mode arithmetic. The datatype of the result is determined by the rank of the datatype code stored in a column of a system table.

You can use a SELECT statement to retrieve the datatype names and their code numbers from the systypes system table. Table 6.5 shows the codes for the numeric datatypes.

Table 6.5—Type Codes for the Numeric Datatypes
Datatype Code

TINYINT 48

SMALLINT 52

INT[EGER] 56

REAL 59

MONEY 60

FLOAT 62

SMALLMONEY 122

When you write expressions using different datatypes, the results are returned in the datatype of the highest ranked datatype. For example, the values of a column that's defined as either a TINYINT or a SMALLINT datatype is converted to INT if they're evaluated in a expression that contains an INT datatype.

One exception to the datatype code rule applies to expressions that include columns with the FLOAT and MONEY datatypes. The evaluation of an expression that contains FLOAT and Money is returned as the MONEY datatype, even though the code number for MONEY (60) is lower than FLOAT (62). You can retrieve the code numbers for all the Transact-SQL datatypes by using the query in following code.

select name, type
from systypes
order by type desc

This statement retrieves the names and codes for all Transact-SQL datatypes in order by the highest code numbers.

Using a GROUP BY Clause

The GROUP BY clause divides a table into groups of rows. The rows in each group have the same value for a specified column. Duplicate values for each different value are placed in the same group. Grouping allows you to perform the same functions on groups of rows.

You can group by any number of columns in a statement. Columns in the select list must be in the GROUP BY clause or have a function used on it. The syntax of a SELECT statement that contains a GROUP BY clause is as follows:

SELECT column 1,...column n
FROM tablename
GROUP BY columnname 1, columnname n

GROUP BY targets only unique column values after sorting by ascending column value (default). GROUP BY is unlike the ORDER BY clause, which though it also sorts records in ascending order, it doesn't remove duplicate column values.

The example query shown in the following code groups the rows by the department column (see fig. 6.39). The departments are first sorted to group them together. The duplicate departments aren't displayed because the purpose of the GROUP BY clause in a SELECT statement is to form groups of rows for subsequent action by other clauses.

select department, "headcount" = count(*)
from employee
group by department

Fig. 6.39 - A SELECT statement containing a GROUP BY clause that sorts rows by the column referenced in the GROUP BY clause.

See the Chapter 8 section entitled Using COUNT

For example, you can select specific groups with a HAVING clause, which compares some property of the group with a constant value. If a group satisfies the logical expression in the HAVING clause, it's included in the query result. The syntax of a SELECT statement with a HAVING clause is

SELECT column 1,...column n
FROM tablename
GROUP BY columnname
HAVING expression

The HAVING clause is used to determine the groups to be displayed in the output of the SELECT statement. The following code shows the use of a HAVING clause (see fig. 6.40).

select department, "headcount" = count(*)
from employee
group by department
having count(*) = 1

Fig. 6.40 - A SELECT statement containing a HAVING clause that limits the returned rows.

Using a COMPUTE Clause in a SELECt Statement

You can use a COMPUTE clause in a SELECT statement with functions such as SUM(), AVG(), MIN(), MAX(), and COUNT(). The COMPUTE clause generates summary values that are displayed as additional rows. The COMPUTE clause works like a so-called control break, a mechanism used in applications called report writers. You can use the COMPUTE clause to produce summary values for groups as well as calculate values using more than one function for the same group.

A report writer is an application that permits you to retrieve data from a database without using SQL statements. Nowadays, a report writer is designed with a graphical user interface which permits you to point and click on buttons and menu commands to retrieve database data. You might find it useful to purchase a report writer to retrieve data from your database as well as for using SQL statements.

The general syntax of the COMPUTE clause is as follows:

COMPUTE row_aggregate(column name)
[,row_aggregate(column name,...]
[BY column name [,column name...]

See Chapter 8, "Using Functions" for more information.

Several restrictions apply to the use of a COMPUTE clause in a SELECT statement. The following list summarizes the COMPUTE clause restrictions:

You can't include text or image datatypes in a COMPUTE or COMPUTE BY clause.
DISTINCT isn't allowed with row aggregate functions.
Columns in a COMPUTE clause must appear in the statement's SELECT clause.
You can't use SELECT INTO in the same statement as a COMPUTE clause.
If you use COMPUTE BY, you must also use an ORDER BY clause.
Columns listed after COMPUTE BY must be identical to or a subset of those in the ORDER BY clause. They must also be in the same order, left to right, start with the same expression, and not skip any expressions.
You must use a column name or an expression in the ORDER BY clause, not a column heading.

Using COMPUTE Without BY
You can use a clause that contains the keyword COMPUTE without BY to display grand totals or counts. You can also use both a COMPUTE and a COMPUTE BY clause in a SELECT statement.

Using Subqueries

You can nest a complete SELECT statement within another SELECT statement. A SELECT statement that's nested within another SELECT statement is called a subquery. The nested or inner SELECT statement is evaluated and the result is available to the outer SELECT statement. To use a subquery, enclose a SELECT statement within parentheses to specify that it should be evaluated before the outer query.

The row or rows returned by the SELECT statement in parentheses are used by the outer SELECT statement. The rows returned by the inner SELECT statement are used in the position of the value in the WHERE clause of the outer SELECT statement. For example, in the following code, all rows are retrieved for the employee table, where the department is equal to the same department to which Bob Smith is a member (see fig. 6.41).

select *
from employee
where department = (
select department
from employee
where name = "bob smith")

Fig. 6.41 - You can nest a subquery within the subquery by using an additional set of parentheses around an enclosed SELECT statement.

Some restrictions apply to the use of subqueries. The SELECT list of a subquery must return either a single value or one or more rows (if an EXISTS is used) unless IN is used in the outer query. This would result in the following error if a comparison operator such as = (equal to) was used in the WHERE clause of the outer query:

Msg 512, Level 16, State 1
Subquery returned more than 1 value. This is illegal when the subquery
follows =, !=, <, <= , >, >=, or when the subquery is used as an expression.
Command has been aborted.

Us a NOT IN to eliminate rows that match the results of a subquery.

You're restricted in the choice of datatypes within subqueries. You can't use either the IMAGE or TEXT datatypes in the SELECT clause of the subquery. Also, the datatype of the value(s) returned by the subquery must match the datatype used in the WHERE clause of the outer query.

I tried to use the rows returned by a SELECT statement within another SELECT statement. Although I enclosed the inner SELECT statement in parentheses, I received an error when the statement executed.

Your nested SELECT statement probably returned more than one row value to the WHERE clause of your outer SELECT statement. Unless you're sure that the inner SELECT statement will return only a single row value, you should use an IN in the WHERE clause of the outer SELECT statement.

Using ANY and ALL

You can use ANY and ALL to modify the comparison operators that precede a subquery. In Transact-SQL, ANY and ALL don't have the same meaning that they do in the English language. For example, when > (greater than) is followed by ALL, it's interpreted as greater than all values—including the maximum value returned by a subquery. When > (greater than) is followed by ANY, it's interpreted as greater than at least one—the minimum.

If you have difficulty understanding the results of queries that contain one or more nested queries, you can separately execute each subquery. If you record the result of the execution of an inner query, you can use the values to help you interpret the results of the outer queries.

The = ANY keyword is evaluated identically to IN. It would be clearer to use an IN instead of = ANY.

Specifying Optimizer Hints

The optimizer hints clause is somewhat misleading. You use the optimizer clause of the SELECT statement to override the data-retrieval methods that are automatically chosen by the query optimizer. When a query is executed, a portion of SQL Server, called the query optimizer, determines how the retrieval of the data from the database is performed.

For example, although an index may exist on a table that's referenced in a query, the query optimizer may determine that it would be faster to retrieve the rows from the table without using the index. The query optimizer may not use an index because the number of rows requested by the query are few in number, and it would be faster to directly access the data rows rather than both the index rows and then data rows.

If multiple indexes exist on a table, the query optimizer will chose to return the rows of the table using the index that would result in the fastest retrieval of information.

You may, however, want to override the way in which the retrieval of rows will be done. For example, if two indexes—one clustered and one non-clustered—exist on a table, the query optimizer may choose to use the non-clustered index for the retrieval of the table rows. You can use the optimizer hints clause in a SELECT statement to force a retrieval using the clustered index. You want the rows to be retrieval by the clustered index because they will automatically be returned in ascending sorted order by the column or columns on which the clustered index was created.

For example, you can use the following optimizer hints clause in a SELECT clause to specify that the rows of a table will be retrieved by the clustered index:

select * from employee (index=0)

This statement specifies the index name or ID to use for that table. 0 forces the use of a clustered index (if one exists). If you use a optimizer hint of 1, a non-clustered index is used for retrieval of rows targeted in the SELECT statement.

You can confirm the method that's used to retrieve your rows by using the query option SHOWPLAN. This option returns information about how SQL Server performed your query. For example, if the rows of the query were retrieved without using an index, the method table scan appears in the information returned by SHOWPLAN. SHOWPLAN also clearly specifies if a clustered or non-clustered index is used for retrieval in a query. Click the Query Options button on the toolbar to bring up the Query Options dialog box, then select the Show Query Plan check box to return SHOWPLAN information.

You can also use the syntax index=index_column_name as an optimizer hint in a SELECT statement in place of the index number.

See the section in Chapter 17 entitled Interpreting SHOWPLAN Reports

You can use a second set of optimizer hints to control the synchronization, or locking of the tables in your SELECT statement. You can use the locking optimizer hints to override the way in which SQL Server normally controls access to data from multiple clients.

Using NOLOCK

You use the NOLOCK optimizer hint to permit you to read rows that SQL Server would normally not permit you to access. For example, if you use NOLOCK in a SELECT statement, you can read uncommitted rows.

Using HOLDLOCK

You use HOLDLOCK to prevent other clients from changing rows that are part of your SELECT clause until your transaction is complete. Normally, other clients can modify the rows as soon as they're displayed. One restriction of HOLDLOCK is that you can't use it in a SELECT clause that also contains a FOR BROWSE clause.

Using UPDLOCK

You use an UPDLOCK like HOLDLOCK to prevent other clients from changing rows that are part of your SELECT clause. UPDLOCK releases the rows of your table and the end of the command or next transaction, rather than at the end of the transaction only.

Using TABLOCK

You use TABLOCK like HOLDLOCK to prevent other clients from changing rows. TABLOCK, unlike HOLDLOCK, acts on the entire table, rather than just on the rows of your table. You can use TABLOCK along with HOLDLOCK to prevent other clients from changing rows of you entire table until your transaction completes.

Using PAGLOCK

You use PAGLOCK like HOLDLOCK to prevent other clients from changing rows. PAGLOCK prevents other clients from changing rows a table page at a time, rather than the entire table.

Using TABLOCKx

You use TABLOCKx to prevent other clients from displaying as well as changing an entire table referenced in your SELECT clause until your command or transaction is complete.

Using FASTFIRSTROW

You use FASTFIRSTROW to retrieve the rows of a table using a non-clustered index. Unlike the index=1 optimizer hint, the first row of the query is returned more quickly through use of optimized read techniques. The total time that it takes to perform the query may be longer than if the non-clustered index were used with the FASTFIRSTROW option. You use FASTFIRSTROW to get better response time by returning the initial results of your query faster.

The response time for a database such as Microsoft SQL Server is usually defined as time that it takes to display on a client system monitor the first row of a query. Throughput is the amount of time that it takes to complete an operation, whether or not part of the operation involves the display of information as feedback to a client. Often enough in computer systems, better response time can be achieved by though the throughput may be slower.

Using the FOR BROWSE option

You can use the FOR BROWSE clause in a SELECT statement to read a table that another client is now adding, deleting, or updating rows within. Normally, SQL Server won't permit you to read a table while pending updates, deletes, or inserts are uncommitted. There are restrictions on what other clauses your SELECT statement can contain to use the FOR BROWSE clause.

To use the FOR BROWSE clause in a SELECT statement, the SELECT statement must contain a table with a timestamp column and a unique index. To use the FOR BROWSE clause in a SELECT statement, the SELECT statement can't contain a UNION clause. FOR BROWSE should be the last clause of a SELECT statement.

Unique indexes and timestamp columns are required attributes of tables to be used with the FOR BROWSE clause. If a table doesn't meet these requirements, a retrieval statement executes as if the FOR BROWSE clause weren't present in the SELECT statement. A SELECT statement that tries to read the rows from a table that's being modified waits up the default query timeout interval of 5 minutes.

If the modification is completed within that time, the rows are displaying by the waiting query. If the pending modification doesn't complete within the timeout interval, the query fails. The FOR BROWSE clause in a SELECT statement permits you to read rows of a table while they're being changed.

If you use the FOR BROWSE clause in a SELECT statement, remember that you're looking at table rows whose values may not be kept by the user who's modifying the table. You must be willing to take a chance that a change you see in a table using a SELECT statement containing the FOR BROWSE clause may not be kept.

See the section in Chapter 12 entitled Defining Transactions

From Here...

In this chapter you've learned to write queries for the retrieval of data from a database contains only the required rows or columns. In addition, you've learned to manipulate the returned data, performing arithmetic operations and sorting the rows. Finally, you've learned to use optimization techniques to override the default actions of SQL Server to retrieve data faster or in different ways.

For more information about the topics mentioned in this chapter, see the following chapters:

Chapter 1, "Introducing Microsoft SQL Server" shows the syntax used for the command-line form of interactive SQL (ISQL).
Chapter 5, "Creating Database Tables and Using Datatypes" discusses the treatment of NULLs as column values as well as column datatypes.
Chapter 7, "Performing Operations on Tables" teaches you how to combine rows from multiple tables in the same query.
Chapter 10, "Managing and Using Indexes and Keys" teaches you how to create and use keys and indexes and how to constrain rows and columns to unique values.
Chapter 13, Managing Stored Procedures and Using Flow-Control Statements," teaches you how to use conditional statements and check status in global variables such as @@ERROR.

QUE Home Page

For technical support For our books And software contact support@mcp.com

Table of Contents

05 - Creating Database Tables and Using Datatypes

07 - Performing Operations on Tables