Home | Tip of the Week | Tip of the Week Archive | Partners | Clients | History | Friends

Tip of the Week (Archived)
The 75,000,000 Row Question

The subject line from the OTN forum was BEST WAY TO DELETE OVER A 75,000,000 RECORDS. Further explanation included the fact that the table contained approximately 113,294,685 records (a very accurate approximation) and the query condition

WHERE MONTHS_BETWEEN(SYSDATE, TO_DATE(YRMO_NBR||'01','YYYYMMDD')) > 38

It also seems like this is something that needs to be done on a regular basis. I think there is some discrepancy between the amount of data to be deleted, the need for the delete to be recurring, and the time frame, but taking the statements at face value, the requirement is to remove the majority of rows from a large table based on a date.

So what are the options? The first one mentioned in the forum was to use CREATE TABLE AS SELECT (CTAS) to create a copy of the table containing only the rows not to be deleted, since there are far fewer of them, drop the original table, and rename the copy. It would work, but it would not be my choice. There would need to be downtime for the table during the process, since new data would have to be prevented from entering the original table while the CTAS was running. There would also be issues dealing with dependencies and, for a period of time, additional disk space is required to hold a second copy of a large table.

Another option is partitioning. If this type of delete would need to be done on a ongoing basis, then partitioning is a terrific solution. The table is partitioned by month, the partition that is 39 months old is dropped each month, and a new partition is created for the upcoming month. It's quick, and as a by product, it is possible that query performance improves as well. There is some application redesign involved, but the biggest drawback is cost. This is a pricey solution if you are not already licensed to use partitioning. Partitioning is only available with the Enterprise version of Oracle (list price $40K per processor), and partitioning is even an extra cost option (list price $10K per processor) on top of that. (If anyone is interested, we offer a nice discount off list.)

A third option (my suggestion) is to actually figure out a way to do the delete as efficiently as possible. I put together the following example using bulk collect and delete. First I built a test table and populated it with 1,000,000 rows. Each row contains a column with a single digit from 0 through 9.

CREATE TABLE test_tbl
(test_col INTEGER);

BEGIN
   FOR i in 1..1000000 LOOP
      INSERT INTO test_tbl
      VALUES (i - TRUNC(i, -1));
   END LOOP;
   COMMIT;
END;
/

Then I used the following PL/SQL script to delete 70% of the rows in the table, trying to somewhat replicate the original request.

DECLARE
--
-- Create table of rowids to hold rows to be deleted.
   TYPE tt_delete IS TABLE OF ROWID;
   t_delete tt_delete;
--
-- Query to retrieve data to be deleted
   CURSOR c_delete IS
      SELECT ROWID
      FROM   test_tbl
      WHERE  test_col < 7;
--
-- Number of rows to be retrieved by each bulk collect fetch
   l_delete_buffer   CONSTANT PLS_INTEGER := 100000;
BEGIN
   OPEN c_delete;
   LOOP
      FETCH c_delete BULK COLLECT                 -- Fetch the defined number of rows
         INTO t_delete LIMIT l_delete_buffer;
      FORALL i IN 1..t_delete.COUNT               -- Delete the fetched rows in a single call
         DELETE test_tbl
         WHERE ROWID = t_delete (i);
      EXIT WHEN c_delete%NOTFOUND;
      COMMIT;                                     -- Commit after each delete call
   END LOOP;
   CLOSE c_delete;
END;
/

The procedure can be tweaked by changing the value of  l_delete_buffer, set above to 100,000. A larger value could improve performance, but consume more resources, such as memory and undo space. A smaller value would reduce the resource usage, but also limit performance.

How was the thread resolved? The initial response to the above approach was "Awesome!" Then the thread went on (and on) with how to tweak the above and partitioning.

Note: This tip was tested on Oracle9i.

Suggestions or comments? Click here for the tip feedback page. Thank you.

2000 Turnberry Circle, Glenmoore, PA 19343
Voice: (610) 942-1979
Fax: (610) 942-1990
Email


© 2005 Alydan Consulting, Inc.