Home | Tip of the Week | Tip of the Week Archive | Partners | Clients | History | Friends

Tip of the Week (Archived)
Data Access and Loading Using External Tables

Several months ago, I had to do some custom one-time data loads for a client. The data was in flat files, and in the past that meant using SQL*Loader or UTL_FILE. But for this particular effort, I decided to use Oracle's external table feature. External tables allow a flat file to be queried as if it was another table in the database. DML operations are not permitted against an external table, and it may not be indexed. However, I was able to query the external using a standard SELECT statement, and even join the external table to an internal Oracle table with no problem.

In this example, I am using a file I worked with last baseball season that contained baseball standing information. The file is located in the c:\ directory and is named BaseballStandings.txt. The file contains the following data:

NY Yankees,92,64,AL,East
Boston,91,64,AL,East
Toronto,76,79,AL,East
Baltimore,70,86,AL,East
Tampa Bay,65,91,AL,East
Chi White Sox,94,62,AL,Central
Cleveland,92,64,AL,Central
Minnesota,78,78,AL,Central
Detroit,70,86,AL,Central
Kansas City,54,102,AL,Central
LA Angels,90,66,AL,West
Oakland,85,71,AL,West
Texas,77,79,AL,West
Seattle,67,89,AL,West
Atlanta,89,68,NL,East
Philadelphia,84,73,NL,East
Florida,80,77,NL,East
NY Mets,79,77,NL,East
Washington,79,78,NL,East
St. Louis,97,60,NL,Central
Houston,85,71,NL,Central
Milwaukee,78,78,NL,Central
Chi Cubs,77,79,NL,Central
Cincinnati,72,84,NL,Central
Pittsburgh,63,94,NL,Central
San Diego,77,79,NL,West
San Francisco,74,82,NL,West
Arizona,72,84,NL,West
LA Dodgers,70,86,NL,West
Colorado,65,91,NL,West

To use an external table, first you have to tell Oracle where the external file is located. You do this using a CREATE DIRECTORY command like this one:

CREATE DIRECTORY c_external AS 'c:\';

Directory created.

The user executing this command must have CREATE DIRECTORY privileges. This creates a directory object, c_external, that points to the directory location c:\. Once that is complete, I need to grant access to that directory to anyone who needs to use it. In this example, I am granting the access to that directory object to the user scott.

GRANT READ, WRITE ON DIRECTORY c_external TO scott;

Grant succeeded.

The READ privilege is necessary to create the external table. The WRITE access is necessary to query the table, which seems odd, until you see that the query actually creates a log file in the directory that is similar to a SQL*Loader log file. Once the privileges have been granted, the table may be created.

CREATE TABLE baseball_standings_external
(team VARCHAR2(20)
, wins NUMBER(3)
, losses NUMBER(3)
, league VARCHAR2(2)
, division VARCHAR2(7))
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
DEFAULT DIRECTORY c_external
ACCESS PARAMETERS
(RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY ','
MISSING FIELD VALUES ARE NULL)
LOCATION ('baseballstandings.txt'))
REJECT LIMIT UNLIMITED
/

The beginning of the CREATE TABLE statement looks very much like any other. The differences begin with the follow clauses:

  • ORGANIZATION EXTERNAL - Defines the table as external, meaning the data is stored in a file outside the database.

  • TYPE ORACLE_LOADER - The "access_driver_type." I've not seen anything other than ORACLE_LOADER.

  • DEFAULT DIRECTORY - Directory object corresponding to the file system directory of the data file. Log and error files will also be placed in this directory.

  • ACCESS PARAMETERS - Defines the format of the record. In this case, the record is delimited by the new line character, the fields are terminated by a comma, and any missing field values are treated as null..

  • LOCATION - The data source. I've not seen anything other than a file name.

  • REJECT LIMIT - Number of errors that may occur before the query aborts. UNLIMITED means do not abort the query due to errors. An error might be that the data in a column is longer than defined, as we will see below.

Now when I execute a query against the table, I see the following results:

SELECT *
FROM   baseball_standings_external;

TEAM                       WINS     LOSSES LE DIVISIO
-------------------- ---------- ---------- -- -------
NY Yankees                   92         64 AL East
Boston                       91         64 AL East
Toronto                      76         79 AL East
Baltimore                    70         86 AL East
Tampa Bay                    65         91 AL East
Detroit                      70         86 AL Central
LA Angels                    90         66 AL West
Oakland                      85         71 AL West
Texas                        77         79 AL West
Seattle                      67         89 AL West
Atlanta                      89         68 NL East
Philadelphia                 84         73 NL East
Florida                      80         77 NL East
NY Mets                      79         77 NL East
Washington                   79         78 NL East
San Diego                    77         79 NL West
San Francisco                74         82 NL West
Arizona                      72         84 NL West
LA Dodgers                   70         86 NL West
Colorado                     65         91 NL West

20 rows selected.

If you know baseball, you know that 8 teams are missing. In the same directory as the data file, two new files have been created, baseballstandings.bad and baseballstandings.log. The log file contains the following information

LOG file opened at 02/08/06 13:40:09

Field Definitions for table BASEBALL_STANDINGS_EXTERNAL
  Record format DELIMITED BY NEWLINE
  Data in file has same endianness as the platform
  Rows with all null fields are accepted

  Fields in Data Source:

    TEAM                            CHAR (255)
      Terminated by ","
      Trim whitespace same as SQL Loader
    WINS                            CHAR (255)
      Terminated by ","
      Trim whitespace same as SQL Loader
    LOSSES                          CHAR (255)
      Terminated by ","
      Trim whitespace same as SQL Loader
    LEAGUE                          CHAR (255)
      Terminated by ","
      Trim whitespace same as SQL Loader
    DIVISION                        CHAR (255)
      Terminated by ","
      Trim whitespace same as SQL Loader
error processing column DIVISION in row 6 for datafile c:\baseballstandings.txt
ORA-01401: inserted value too large for column
...

The bad file contains the following:

Chi White Sox,94,62,AL,Central
Cleveland,92,64,AL,Central
Minnesota,78,78,AL,Central
Kansas City,54,102,AL,Central
St. Louis,97,60,NL,Central
Houston,85,71,NL,Central
Milwaukee,78,78,NL,Central
Chi Cubs,77,79,NL,Central
Cincinnati,72,84,NL,Central
Pittsburgh,63,94,NL,Central

It turns out that there is an extra space at the end of the line causing the DIVISION column to actually have a maximum length of 8. I can alter the table definition using the normal ALTER TABLE command and requery the data to retrieve all data.

ALTER TABLE baseball_standings_external MODIFY (division VARCHAR2(8))

 

Table altered.

 

SELECT *
FROM   baseball_standings_external;

 

TEAM                       WINS     LOSSES LE DIVISION

-------------------- ---------- ---------- -- --------

NY Yankees                   92         64 AL East

Boston                       91         64 AL East

Toronto                      76         79 AL East

Baltimore                    70         86 AL East

Tampa Bay                    65         91 AL East

Chi White Sox                94         62 AL Central

Cleveland                    92         64 AL Central

Minnesota                    78         78 AL Central

Detroit                      70         86 AL Central

Kansas City                  54        102 AL Central

LA Angels                    90         66 AL West

Oakland                      85         71 AL West

Texas                        77         79 AL West

Seattle                      67         89 AL West

Atlanta                      89         68 NL East

Philadelphia                 84         73 NL East

Florida                      80         77 NL East

NY Mets                      79         77 NL East

Washington                   79         78 NL East

St. Louis                    97         60 NL Central

Houston                      85         71 NL Central

Milwaukee                    78         78 NL Central

Chi Cubs                     77         79 NL Central

Cincinnati                   72         84 NL Central

Pittsburgh                   63         94 NL Central

San Diego                    77         79 NL West

San Francisco                74         82 NL West

Arizona                      72         84 NL West

LA Dodgers                   70         86 NL West

Colorado                     65         91 NL West

 

30 rows selected.

But the bottom line was to load the data into our internal table. The internal table can be created as simply as the following DDL.

CREATE TABLE baseball_standings AS
SELECT *
FROM   baseball_standings_external;

Note: This tip was tested using Oracle9ir2.

Was this tip useful? Did you find any errors? Do you have any suggestions? Do you care? Click here for the tip feedback page. Thank you.

2000 Turnberry Circle, Glenmoore, PA 19343
Voice: (610) 942-1979
Fax: (610) 942-1990
Email


© 2006 Alydan Consulting, Inc.