|
Several months ago, I had to do some custom
one-time data loads for a client. The data was in flat files,
and in the past that meant using SQL*Loader or UTL_FILE. But for
this particular effort, I decided to use Oracle's external table
feature. External tables allow a flat file to be queried as if
it was another table in the database. DML operations are not
permitted against an external table, and it may not be indexed.
However, I was able to query the external using a standard
SELECT statement, and even join the external table to an
internal Oracle table with no problem.
In this example, I am using a file I worked with
last baseball season that contained baseball standing
information. The file is located in the c:\ directory and
is named BaseballStandings.txt. The file contains the
following data:
NY
Yankees,92,64,AL,East
Boston,91,64,AL,East
Toronto,76,79,AL,East
Baltimore,70,86,AL,East
Tampa Bay,65,91,AL,East
Chi White Sox,94,62,AL,Central
Cleveland,92,64,AL,Central
Minnesota,78,78,AL,Central
Detroit,70,86,AL,Central
Kansas City,54,102,AL,Central
LA Angels,90,66,AL,West
Oakland,85,71,AL,West
Texas,77,79,AL,West
Seattle,67,89,AL,West
Atlanta,89,68,NL,East
Philadelphia,84,73,NL,East
Florida,80,77,NL,East
NY Mets,79,77,NL,East
Washington,79,78,NL,East
St. Louis,97,60,NL,Central
Houston,85,71,NL,Central
Milwaukee,78,78,NL,Central
Chi Cubs,77,79,NL,Central
Cincinnati,72,84,NL,Central
Pittsburgh,63,94,NL,Central
San Diego,77,79,NL,West
San Francisco,74,82,NL,West
Arizona,72,84,NL,West
LA Dodgers,70,86,NL,West
Colorado,65,91,NL,West
To use an external table, first you have to tell
Oracle where the external file is located. You do this using a
CREATE DIRECTORY command like this one:
CREATE
DIRECTORY c_external AS 'c:\';
Directory created.
The user executing this command must have CREATE DIRECTORY
privileges. This creates a directory object, c_external, that
points to the directory location c:\. Once that is
complete, I need to grant access to that directory to anyone who
needs to use it. In this example, I am granting the access to
that directory object to the user scott.
GRANT READ, WRITE ON DIRECTORY
c_external TO scott;
Grant succeeded.
The READ privilege is necessary to create the
external table. The WRITE access is necessary to query the
table, which seems odd, until you see that the query actually
creates a log file in the directory that is similar to a
SQL*Loader log file. Once the privileges have been granted, the
table may be created.
CREATE TABLE baseball_standings_external
(team VARCHAR2(20)
, wins NUMBER(3)
, losses NUMBER(3)
, league VARCHAR2(2)
, division VARCHAR2(7))
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
DEFAULT DIRECTORY c_external
ACCESS PARAMETERS
(RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY ','
MISSING FIELD VALUES ARE NULL)
LOCATION ('baseballstandings.txt'))
REJECT LIMIT UNLIMITED
/
The beginning of the CREATE TABLE statement
looks very much like any other. The differences begin with the
follow clauses:
-
ORGANIZATION EXTERNAL - Defines the table as
external, meaning the data is stored in a file outside the
database.
-
TYPE ORACLE_LOADER - The "access_driver_type."
I've not seen anything other than ORACLE_LOADER.
-
DEFAULT DIRECTORY - Directory object
corresponding to the file system directory of the data file.
Log and error files will also be placed in this directory.
-
ACCESS PARAMETERS - Defines the format of
the record. In this case, the record is delimited by the new
line character, the fields are terminated by a comma, and
any missing field values are treated as null..
-
LOCATION - The data source. I've not seen
anything other than a file name.
-
REJECT LIMIT - Number of errors that may
occur before the query aborts. UNLIMITED means do not abort
the query due to errors. An error might be that the data in
a column is longer than defined, as we will see below.
Now when I execute a query against the table, I
see the following results:
SELECT *
FROM baseball_standings_external;
TEAM WINS LOSSES LE DIVISIO
-------------------- ---------- ---------- -- -------
NY Yankees 92 64 AL East
Boston 91 64 AL East
Toronto 76 79 AL East
Baltimore 70 86 AL East
Tampa Bay 65 91 AL East
Detroit 70 86 AL Central
LA Angels 90 66 AL West
Oakland 85 71 AL West
Texas 77 79 AL West
Seattle 67 89 AL West
Atlanta 89 68 NL East
Philadelphia 84 73 NL East
Florida 80 77 NL East
NY Mets 79 77 NL East
Washington 79 78 NL East
San Diego 77 79 NL West
San Francisco 74 82 NL West
Arizona 72 84 NL West
LA Dodgers 70 86 NL West
Colorado 65 91 NL West
20 rows selected.
If you know
baseball, you know that 8 teams are missing. In the same
directory as the data file, two new files have been created,
baseballstandings.bad and baseballstandings.log.
The log file contains the following information
LOG
file opened at 02/08/06 13:40:09
Field Definitions for table BASEBALL_STANDINGS_EXTERNAL
Record format DELIMITED BY NEWLINE
Data in file has same endianness as the platform
Rows with all null fields are accepted
Fields in Data Source:
TEAM CHAR (255)
Terminated by ","
Trim whitespace same as SQL Loader
WINS CHAR (255)
Terminated by ","
Trim whitespace same as SQL Loader
LOSSES CHAR (255)
Terminated by ","
Trim whitespace same as SQL Loader
LEAGUE CHAR (255)
Terminated by ","
Trim whitespace same as SQL Loader
DIVISION CHAR (255)
Terminated by ","
Trim whitespace same as SQL Loader
error processing column DIVISION in row 6 for datafile c:\baseballstandings.txt
ORA-01401: inserted value too large for column
...
The bad file contains the following:
Chi White
Sox,94,62,AL,Central
Cleveland,92,64,AL,Central
Minnesota,78,78,AL,Central
Kansas City,54,102,AL,Central
St. Louis,97,60,NL,Central
Houston,85,71,NL,Central
Milwaukee,78,78,NL,Central
Chi Cubs,77,79,NL,Central
Cincinnati,72,84,NL,Central
Pittsburgh,63,94,NL,Central
It turns out that there is an extra space at the
end of the line causing the DIVISION column to actually have a
maximum length of 8. I can alter the table definition using the
normal ALTER TABLE command and requery the data to retrieve all
data.
ALTER TABLE baseball_standings_external MODIFY (division
VARCHAR2(8))
Table altered.
SELECT *
FROM baseball_standings_external;
TEAM WINS LOSSES LE DIVISION
-------------------- ---------- ---------- -- --------
NY Yankees 92 64 AL East
Boston 91 64 AL East
Toronto 76 79 AL East
Baltimore 70 86 AL East
Tampa Bay 65 91 AL East
Chi White Sox 94 62 AL Central
Cleveland 92 64 AL
Central
Minnesota 78 78 AL
Central
Detroit 70 86 AL Central
Kansas City
54 102 AL Central
LA Angels 90 66 AL West
Oakland 85 71 AL West
Texas 77 79 AL West
Seattle 67 89 AL West
Atlanta 89 68 NL East
Philadelphia 84 73 NL East
Florida 80 77 NL East
NY Mets 79 77 NL East
Washington 79 78 NL East
St. Louis 97 60 NL Central
Houston 85 71 NL Central
Milwaukee 78 78 NL Central
Chi Cubs 77 79 NL Central
Cincinnati 72 84 NL Central
Pittsburgh 63 94 NL Central
San Diego 77 79 NL West
San Francisco 74 82 NL West
Arizona 72 84 NL West
LA Dodgers 70 86 NL West
Colorado 65 91 NL West
30 rows selected.
But the bottom line was to
load the data into our internal table. The internal table can be
created as simply as the following DDL.
CREATE TABLE
baseball_standings AS
SELECT *
FROM baseball_standings_external;
Note: This tip was tested using Oracle9ir2.
Was this tip useful? Did you find any errors? Do you have any suggestions? Do you care? Click
here for the tip feedback page. Thank you.
|