Synthetic Analytic
by Jaromir D.B. Nemec
Analytic functions, introduced in Oracle 8i, are no longer really a new feature. In the meantime they have become widely popularized and well accepted. A typical example of using an analytic function is a product sales ranking report, elegantly solving a problem that would otherwise lead to a painful self outer join or would have to be solved procedurally. This article points out an alternative way of using them that goes beyond the “partition and rank” or “lag and lead” approach, demonstrating the power of this SQL extension.
This article is not meant as an analytic function primer, so if you are not familiar with the basics of the analytic function approach, check the resource section for introductory information. But even without deep knowledge of analytic functions, the main idea of this article should be understandable.
Let’s start with a simplified accounting application whose main table, consisting of account entries, is described in Listing 1. The table contains an account number identifying the account (account_no), the transaction date (when_charged), the booking date (when_booked) and an amount. The amount is positive for deposits and negative for withdrawals. A possible report ranking the total deposit amount per account and month is shown in Listing 2. In the inner query the total sum of deposits (positive amounts) per month is calculated using SUM as a classical aggregate function. In the next step the analytic function RANK is used to order the accounts by total deposit (ORDER BY deposit DESC) per month (PARTITION BY booking_month). We see, for example, that the winner (rank = 1) in January 2005 is account 1003/1.
The SUM function, used in this example as an aggregate function to calculate the total deposits, can easily be leveraged, in its analytic form, to calculate the account balance. Let’s assume for a moment that our table contains the complete list of transactions. This is not a very realistic assumption, as most applications hold detailed transaction data only for a limited period of time, but we will see how to deal with such a limitation shortly. Calculating an account balance is not really complicated. You must consider each account separately (PARTITION BY account_no) and, for every transaction, sum up all preceding transactions (ORDER BY when_booked ROWS UNBOUNDED PRECEDING). The windowing clause ROWS UNBOUNDED PRECEDING is an abbreviation; a more verbose syntax with the same meaning is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This syntax states more clearly that we sum up all rows of a specific account from the first one to the current one, which is exactly the definition of a balance. The order of the rows is defined by the ORDER BY clause; in our case the booking date is used. For the complete statement, see Listing 3. Note that for the purpose of the balance calculation both deposits and withdrawals are considered.
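The effect of SUM as a running, per-account total can be sketched procedurally; the following Python fragment (names and data hard-coded from Listing 1 for one account, purely illustrative) mirrors the PARTITION BY / ORDER BY / ROWS UNBOUNDED PRECEDING semantics:

```python
# Sketch of the analytic SUM ... ROWS UNBOUNDED PRECEDING of Listing 3:
# partition by account, order by booking date, running total per row.
from itertools import groupby

# (account_no, when_booked, amount) -- the rows of account 1003/1 from Listing 1
entries = [
    ("1003/1", "2004-12-20", 1000),
    ("1003/1", "2005-01-05", 100),
    ("1003/1", "2005-01-12", 50),
    ("1003/1", "2005-01-19", -120),
    ("1003/1", "2005-01-31", 10),
]

def running_balance(rows):
    """PARTITION BY account_no ORDER BY when_booked ROWS UNBOUNDED PRECEDING."""
    result = []
    rows = sorted(rows, key=lambda r: (r[0], r[1]))        # order within partition
    for account, group in groupby(rows, key=lambda r: r[0]):
        balance = 0                                        # restart per partition
        for _, when, amount in group:
            balance += amount        # sum of all rows from the first to the current
            result.append((account, when, balance))
    return result

print(running_balance(entries))
# final balance of 1003/1 is 1040, matching Listing 3
```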
How do we deal with the more complicated case, where there is no complete transaction history of an account? If the older transactions have been removed from the table, it is not possible to start the sum at the beginning of the account, so some workaround must be found. Assume there is a table containing a snapshot of the balance of all accounts at some point in time, say at the end of the year 2004 (see table account_12_31_2004 in Listing 4), and that after this point the transactions are complete. This allows us to start with the balance as of 12/31/2004 (from the snapshot table) and consider the complete transaction record of the year 2005. A simple UNION ALL merges the snapshot information (table account_12_31_2004) with the transaction data. The rest of the statement is identical to the preceding one; see Listing 5. The column source indicates the source of the information, so we can distinguish between real transactions and information reconstructed from the snapshot table.
A more complicated, though more realistic, case is that there is a table containing the current balance of each account (see table account_header in Listing 6). In this case we can start with the current state of the account (we'll use a scalar subquery to get it) and trace the balance backwards in time. Note the descending order in the ORDER BY clause of the SUM function. The summed-up amount is subtracted from the current balance (remember, we are moving backwards in time). Additionally, due to the reverse order, the amount of the current transaction must be added back to the result; otherwise the balance before the transaction (and not after the transaction, as required) would be calculated. See Listing 6.
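The backward reconstruction can be sketched in a few lines; this Python fragment (data hard-coded for account 1003/1, current balance 1040 from Listing 6) shows why the current amount must be added back:

```python
# Sketch of Listing 6's idea: walk backwards from the current balance.
current_balance = 1040   # from account_header for account 1003/1

# (when_booked, amount), complete history of account 1003/1
entries = [
    ("2004-12-20", 1000),
    ("2005-01-05", 100),
    ("2005-01-12", 50),
    ("2005-01-19", -120),
    ("2005-01-31", 10),
]

balances = {}
running = 0
for when, amount in sorted(entries, reverse=True):   # ORDER BY when_booked DESC
    running += amount                                # reverse running sum
    # adding the amount back yields the balance *after* the row,
    # not the balance before it
    balances[when] = current_balance + amount - running

print(balances["2005-01-19"])   # 1030, the same value Listing 3 computes forward
```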
So much for the different ways of calculating the balance of an account. In the following examples we will use the simplest one, assuming the full transaction history is available. Of course, all the other options are applicable to the following examples as well.
Up to now we have been using only the booking date (when_booked) as the column defining the order of transactions. A typical account entry has two date columns, so it is perfectly possible to use the charging date (when_charged) instead of the booking date to build the balance. Moreover, it is possible to calculate both balances in one statement. Note that because the booking date (i.e. the date the transaction was entered on the account) may deviate from the charging date (i.e. the real time point of the transaction) in both directions, the resulting balances can be substantially different; see Listing 7.
Computing a balance for an account enables threshold-based supervision of booking entries (generating threshold events). The definition of such an event is very simple. An underflow event for threshold X (X stands for an arbitrary balance amount) is raised if the current balance (i.e. the balance after the transaction) is lower than X and the balance after the preceding transaction (i.e. the balance before the transaction) is greater than or equal to X; in other words, the current transaction crossed the threshold.
The overflow event is defined analogously. The constraint on the balance before the event is important to suppress repeated alarms: if several successive transactions violate the defined limit, only the first one is treated as a threshold event and the following ones are ignored.
Supervising such threshold events can be very useful: overflow events can trigger some bonus treatment of the customer, while underflow events are connected with features from the opposite side of the customer relationship management toolset. How to calculate the events is illustrated in Listing 8. The balance is computed the same way as in the preceding examples; based on the balance, the “before” balance (i.e. the balance after the transaction immediately preceding the current one) is calculated using the LAG function. The last step then tests the threshold according to the definition and decides whether an event occurred.
Note the usage of inline views in the select statement; this is typical when combining analytic functions that can’t be used in the same select statement.
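The LAG-and-compare logic of the threshold test can be sketched as follows (a minimal Python model of Listing 8's conditions; function name and data are illustrative only):

```python
# Sketch of the threshold-event logic of Listing 8: compare each balance with
# the previous one (LAG, defaulting to 0 before the first row) and report
# only the row that actually crosses a limit.
def threshold_events(balances, low=-25, high=1000):
    events = []
    prev = 0                      # lag(balance, 1, 0)
    for when, balance in balances:
        if balance < low <= prev:                 # crossed the lower threshold
            events.append((when, f"threshold {low} underflow"))
        if balance > high >= prev:                # crossed the upper threshold
            events.append((when, f"threshold {high} overflow"))
        prev = balance
    return events

# running balances of account 2009/2 (see Listing 3)
print(threshold_events([("2004-12-10", 10), ("2005-01-01", 30),
                        ("2005-01-05", 60), ("2005-01-15", -40),
                        ("2005-01-20", 20)]))
# [('2005-01-15', 'threshold -25 underflow')]
```

Note that the row returning to 20 on 2005-01-20 raises no second alarm, exactly the repeated-alarm suppression described above.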
Let’s extend our application by introducing a table containing the opening and closing log of accounts; see table account_status in Listing 9. Any time an account is opened or closed, an appropriate row is stored in the table. Using this record, is it possible to reconstruct the status of an account at an arbitrary point in time? We will see that this task is very similar to computing the balance. The necessary abstraction is very straightforward: simply consider the status of an account as a number, where zero describes an inactive account and one stands for an active account. In this context, opening an account corresponds to adding one to the numeric status, and closing an account to subtracting one from the current state. Reconstructing the status of an account is then as simple as balancing the open and close events. This is illustrated in Listing 10. The principle is the same as for the balance (see Listing 3); the CASE expression within the SUM function transforms the OPEN/CLOSE events into the appropriate +1/-1 values, and the DECODE translates the numeric information back into the ACTIVE/DEACTIVE status.
Using the LEAD function, a thru_date attribute is calculated that defines the validity interval of the account status. The status of an account becomes valid at the booking date of the transaction (when_booked) and remains unchanged until thru_date. The thru_date is a non-inclusive valid-through date; it is set to NULL if the interval has an open end.
This approach is very intuitive and works even when an account is reopened, but it contains a possible pitfall. If the input data is not correct, e.g. contains a closing record without a corresponding opening entry, we can face some surprising results. Note the “UNKNOWN” option in the DECODE statement. Let’s simulate this behavior by inserting an invalid row, a record with a closing event for an already closed account; see the top of Listing 11. Without care, this leads to an incorrect result: simply repeat the statement from Listing 10 with this invalid row present to see the invalid state propagate.
This behavior is generally bad, as it can skew the reporting. However, it can easily be fixed by introducing the concept of event validation. We accept an opening event if and only if the status of the account is DEACTIVE (note that a nonexistent account is considered DEACTIVE); similarly, a closing event is only permitted for an active account. Technically, we use a concept similar to the threshold event tracing (see Listing 8): for each event we additionally track the account status before the event. This information is used to filter out the invalid events. Listing 11 shows the implementation of the filter distinguishing between valid and invalid events. The business rules for the validation of events are quite simple: only active accounts can be closed and only DEACTIVE (or nonexistent) accounts can be opened.
Note that the computation of the account status and the validation of events is a classical chicken-and-egg problem: for the calculation of the account status, of course, only valid events should be considered, but for event validation the account status after the previous transaction must be known. These two tasks are not separable; the solution is to perform both of them simultaneously in one pass.
One could argue, though, that this state model of an account is too coarse. There is of course a great difference between an account with one movement per year and another with several transactions each day, yet both extremes are considered the same in our status model. To distinguish such behavior, we introduce an alternative status model based on active periods of an account. The definition of an active period is based on the usage of the account: to stay active there is a mandatory condition, say at least one transaction every seven days (the length of the interval is arbitrarily chosen). In simple words, each account stays active for seven days after a transaction; after seven days of inactivity the status changes to inactive. If a new transaction is performed within this interval, the active period is prolonged.
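The grouping rule can be stated compactly outside SQL; this Python sketch (function name hypothetical, booking dates from Listing 1) groups dates into active periods exactly as defined above:

```python
# Sketch of the active-period definition: a new period starts at the first
# transaction and after every gap of more than `gap_days` days.
from datetime import date

def active_periods(dates, gap_days=7):
    """Group sorted transaction dates into active periods."""
    periods = []
    for d in sorted(dates):
        if periods and (d - periods[-1][-1]).days <= gap_days:
            periods[-1].append(d)        # within the window: prolong the period
        else:
            periods.append([d])          # gap too big (or first row): new period
    return periods

# booking dates of account 1003/1 (Listing 1)
dates = [date(2004, 12, 20), date(2005, 1, 5), date(2005, 1, 12),
         date(2005, 1, 19), date(2005, 1, 31)]
print(len(active_periods(dates)))   # 3 periods, matching Listing 12
```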
A possible implementation of active periods is shown in Listing 12. First of all, in the innermost inline view, the starting rows of each active period are flagged. The first row of each account gets special treatment, as it is always the starting row of an active period. According to the definition, any row with a gap of more than seven days to the preceding row is considered a starting point. The starting points receive the corresponding row number (simply the sequential number of the transaction within the account); all other rows are set to NULL. See column RN.
The good news is that column RN now uniquely identifies an active period within each account; the bad news is that this identification is defined in the starting row only, not in the subsequent rows of the period. But the gaps can easily be filled using the LAST_VALUE function with the IGNORE NULLS option. Note that IGNORE NULLS is a new feature in 10g, so in previous releases you must use the MAX function instead (simply comment out the LAST_VALUE line and reinstate the MAX line in Listing 12). As each period has only one non-NULL value in column RN, the MAX function overwrites all NULLs with this value; see column period_id. In the next step, the period’s index (the relative position within the period) and cardinality (the number of transactions within the period) are determined using the ROW_NUMBER and COUNT functions, respectively. The final step tags the transactions as start or stop of a period using index and cardinality, considering the special case of a “dummy” one-transaction period that is simultaneously start and stop.
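The gap-filling step is nothing more than a forward fill; a minimal sketch (function name hypothetical, input taken from column RN of account 1003/1 in Listing 12):

```python
# Sketch of LAST_VALUE(rn IGNORE NULLS) OVER (... ORDER BY when_booked):
# carry the last non-NULL period id forward over the following rows.
def fill_forward(values):
    last = None
    out = []
    for v in values:
        if v is not None:
            last = v                 # remember the most recent non-NULL value
        out.append(last)
    return out

# column RN of account 1003/1 from Listing 12
print(fill_forward([1, 2, None, None, 5]))   # [1, 2, 2, 2, 5] -> period_id
```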
As already noted, the interval between transactions (seven days in our case) is chosen arbitrarily; if we shrink it to the level of seconds, the problem transforms into the question of grouping events into sessions. Check the resource box for credit for the idea of how to solve this problem using analytic functions.
The next logical step after introducing the notion of an active period is a so-called “parallel report”, giving at any point in time the total number of active accounts (i.e. accounts within an active period). Reporting this is not really new for us; we apply the same principle as used for balancing the account state. Each start and stop of an active period of an account is treated as a relevant event, increasing or decreasing the “degree of parallelism” by one. A trivial active period (consisting of only one record) doesn’t change the balance, as it both adds and subtracts one. The only difference from the account status example (see Listing 10) is the definition of the partition: the partition is not bound to a particular account but defined globally. See the select statement in Listing 13. Note that to get a reliable report, only validated events should be considered.
An alternative way of storing the status information of accounts is shown in Listing 14. Each active period has an opening and a closing timestamp (start_date and stop_date); the main difference is that for each period of activity there is only one row in the table, containing both the opening and the closing information, i.e. not two separate records as in Listing 13. Note that we discarded the “zero-length” periods to keep the example simple.
But this alternative representation is not really very different from the previous one. A very simple trick transforms this view of the data into the formerly used form: a Cartesian join with a table consisting of two rows (a “double dual” table, whose first row represents an opening record and second row a closing record) effectively multiplies the rows by two. The account identification and period_id are carried over into both records, while the timestamp is processed selectively: the opening record gets the opening timestamp and the closing record the closing timestamp of the period; see Listing 15. Having transformed the data back into this “transaction” view with this simple trick, you can reuse the solution of Listing 13 to report the parallel degree.
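The “double dual” trick is just an unpivot via a cross join; a sketch of the row multiplication (data abbreviated from Listing 14, names illustrative):

```python
# Sketch of Listing 15's "double dual" trick: a Cartesian join with a
# two-row table turns each period row into an opening and a closing event.
periods = [
    # (account_no, period_id, start_date, stop_date), see Listing 14
    ("1003/1", 2, "2005-01-05", "2005-01-19"),
    ("2009/2", 2, "2005-01-01", "2005-01-05"),
]

double_dual = [1, -1]    # 1 = opening event, -1 = closing event

events = [(acc, pid, start if ev == 1 else stop, ev)
          for acc, pid, start, stop in periods
          for ev in double_dual]         # the Cartesian join doubles the rows

print(len(events))    # 4: two events per period
```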
An interesting question is the handling of the opening and closing points of a period. Imagine there are only two periods, one ending at timestamp X and the other starting at the same timestamp X. It is obvious that most of the time the degree of parallelism is one, but what exactly is the degree at timestamp X? The possible answers range from zero to two, depending on the inclusive or exclusive interpretation of the endpoints. This problem becomes visible when we report the maximum (or minimum) degree of parallelism. In this case the order of the events, particularly those with the same timestamp, is extremely important. To see the difference, consider the two alternative statements in Listing 16; both compute the running degree of parallelism, yet they yield different results at the shared timestamp. The subtle difference is the ordering clause of the window: one statement prioritizes the opening events, considering them before the closing events with the same timestamp, while the other goes the other way round. Prioritizing the openings reports a higher degree at the shared point than prioritizing the closings. Which solution is right? Well, it depends on the nature of the business. I’d expect that when reporting the throughput of a business workflow engine the first interpretation would be used, whereas when reporting open shipments, unresolved bugs, etc., the second would be appropriate. The important thing is to define the order explicitly if the transaction timestamps are not distinctive enough; otherwise the result could fluctuate randomly.
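The effect of the tie-break order can be demonstrated in isolation; a Python sketch of the two interpretations (names and data hypothetical, modeled on the two-period scenario above):

```python
# Sketch of the endpoint question: two periods share timestamp "2005-01-05",
# one ending and one starting there. +1 = opening event, -1 = closing event.
events = [("2005-01-01", 1), ("2005-01-05", -1),   # period A
          ("2005-01-05", 1), ("2005-01-19", -1)]   # period B starts as A ends

def peak(events, openings_first):
    # sort by timestamp; break ties by event sign, as in Listing 16
    order = sorted(events, key=lambda e: (e[0], -e[1] if openings_first else e[1]))
    degree = best = 0
    for _, ev in order:
        degree += ev
        best = max(best, degree)
    return best

print(peak(events, openings_first=True))    # 2: both periods counted at the endpoint
print(peak(events, openings_first=False))   # 1: the closing is counted first
```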
For a long time some analytic tasks have been beyond the scope of an easy solution in relational SQL. With the introduction of the SQL functions for analysis, this is no longer the case. A great deal of analytic problems, especially in the data warehouse environment, is now easily solvable with the extended SQL; this article points out only a small part of them.
Jaromir
D.B. Nemec (jaromir at db-nemec dot com) is an independent consultant focusing
on Oracle databases, data warehouse systems and integration solutions.
Reference

Script with test data and all examples: synthetic_analytic.sql

Analytics “official site”: Oracle Database Data Warehousing Guide
http://www.oracle.com/technology/documentation

Search Ask Tom
http://asktom.oracle.com/pls/ask/search?p_string=analytics%20rock%20analytics%20roll

Check http://asktom.oracle.com/pls/ask/f?p=4950:8:3135579430049857077::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:13946369553642 for the discussion and credit for the solution of the “session problem”

last revision 11.1.2006
Listing 1: Create environment

SQL> create table booking_entry
     (
       account_no   varchar2(10) not null,
       when_booked  date not null,
       when_charged date not null,
       amount       number(10,4) not null
     );

Table created.

SQL> select * from booking_entry order by 1,2;

ACCOUNT_NO WHEN_BOOK WHEN_CHAR     AMOUNT
---------- --------- --------- ----------
1003/1     20-DEC-04 10-DEC-04       1000
1003/1     05-JAN-05 10-JAN-05        100
1003/1     12-JAN-05 08-JAN-05         50
1003/1     19-JAN-05 09-JAN-05       -120
1003/1     31-JAN-05 01-FEB-05         10
2009/2     10-DEC-04 10-DEC-04         10
2009/2     01-JAN-05 05-JAN-05         20
2009/2     05-JAN-05 15-JAN-05         30
2009/2     15-JAN-05 13-JAN-05       -100
2009/2     20-JAN-05 01-FEB-05         60

10 rows selected.
Listing 2: Top deposits per account and month

SQL> select booking_month, account_no, deposit,
       rank() over (partition by booking_month order by deposit desc) account_rank
     from (
       select account_no, trunc(when_booked,'MM') booking_month,
              sum(amount) deposit
       from booking_entry
       where amount > 0
       group by account_no, trunc(when_booked,'MM')
     )
     order by booking_month, 4;

BOOKING_M ACCOUNT_NO    DEPOSIT ACCOUNT_RANK
--------- ---------- ---------- ------------
01-DEC-04 1003/1           1000            1
01-DEC-04 2009/2             10            2
01-JAN-05 1003/1            160            1
01-JAN-05 2009/2            110            2
Listing 3: Balance of an account

SQL> select account_no, when_booked, amount,
       sum(amount) over (partition by account_no order by when_booked
                         rows unbounded preceding) as balance
     from booking_entry
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK     AMOUNT    BALANCE
---------- --------- ---------- ----------
1003/1     20-DEC-04       1000       1000
1003/1     05-JAN-05        100       1100
1003/1     12-JAN-05         50       1150
1003/1     19-JAN-05       -120       1030
1003/1     31-JAN-05         10       1040
2009/2     10-DEC-04         10         10
2009/2     01-JAN-05         20         30
2009/2     05-JAN-05         30         60
2009/2     15-JAN-05       -100        -40
2009/2     20-JAN-05         60         20

10 rows selected.
Listing 4: Account balance using snapshot

SQL> create table account_12_31_2004
     (
       account_no  varchar2(10),
       when_closed date,
       balance     number(10,4)
     );

Table created.

SQL> select * from account_12_31_2004 order by 1;

ACCOUNT_NO WHEN_CLOS    BALANCE
---------- --------- ----------
1003/1     31-DEC-04       1000
2009/2     31-DEC-04         10
Listing 5: Balance of an account with incomplete transaction history

SQL> select account_no, when_booked, amount,
       sum(amount) over (partition by account_no order by when_booked
                         rows unbounded preceding) as balance,
       source
     from
     ( -- merge snapshot with transactions
       select account_no, when_closed when_booked, balance amount, 'snapshot' source
       from account_12_31_2004
       union all
       select account_no, when_booked, amount, 'transaction' source
       from booking_entry
       where when_booked > to_date('12/31/2004','mm/dd/yyyy')
     )
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK     AMOUNT    BALANCE SOURCE
---------- --------- ---------- ---------- -----------
1003/1     31-DEC-04       1000       1000 snapshot
1003/1     05-JAN-05        100       1100 transaction
1003/1     12-JAN-05         50       1150 transaction
1003/1     19-JAN-05       -120       1030 transaction
1003/1     31-JAN-05         10       1040 transaction
2009/2     31-DEC-04         10         10 snapshot
2009/2     01-JAN-05         20         30 transaction
2009/2     05-JAN-05         30         60 transaction
2009/2     15-JAN-05       -100        -40 transaction
2009/2     20-JAN-05         60         20 transaction

10 rows selected.
Listing 6: Balance reconstructed from current state

SQL> create table account_header
     (
       account_no      varchar2(10),
       when_booked     date, -- last transaction
       current_balance number(10,4)
     );

Table created.

SQL> select * from account_header order by 1;

ACCOUNT_NO WHEN_BOOK CURRENT_BALANCE
---------- --------- ---------------
1003/1     01-FEB-05            1040
2009/2     01-FEB-05              20

SQL> select account_no, when_booked, amount,
       (select current_balance from account_header
        where account_no = a.account_no) + amount -
       sum(amount) over (partition by account_no order by when_booked desc
                         rows unbounded preceding) as balance
     from booking_entry a
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK     AMOUNT    BALANCE
---------- --------- ---------- ----------
1003/1     20-DEC-04       1000       1000
1003/1     05-JAN-05        100       1100
1003/1     12-JAN-05         50       1150
1003/1     19-JAN-05       -120       1030
1003/1     31-JAN-05         10       1040
2009/2     10-DEC-04         10         10
2009/2     01-JAN-05         20         30
2009/2     05-JAN-05         30         60
2009/2     15-JAN-05       -100        -40
2009/2     20-JAN-05         60         20

10 rows selected.
Listing 7: Double balance, ordered by when_charged

SQL> select account_no, when_booked, when_charged, amount,
       sum(amount) over (partition by account_no order by when_booked
                         rows unbounded preceding) as booking_balance,
       sum(amount) over (partition by account_no order by when_charged
                         rows unbounded preceding) as charging_balance
     from booking_entry
     order by account_no, when_charged;

ACCOUNT_NO WHEN_BOOK WHEN_CHAR     AMOUNT BOOKING_BALANCE CHARGING_BALANCE
---------- --------- --------- ---------- --------------- ----------------
1003/1     20-DEC-04 10-DEC-04       1000            1000             1000
1003/1     12-JAN-05 08-JAN-05         50            1150             1050
1003/1     19-JAN-05 09-JAN-05       -120            1030              930
1003/1     05-JAN-05 10-JAN-05        100            1100             1030
1003/1     31-JAN-05 01-FEB-05         10            1040             1040
2009/2     10-DEC-04 10-DEC-04         10              10               10
2009/2     01-JAN-05 05-JAN-05         20              30               30
2009/2     15-JAN-05 13-JAN-05       -100             -40              -70
2009/2     05-JAN-05 15-JAN-05         30              60              -40
2009/2     20-JAN-05 01-FEB-05         60              20               20

10 rows selected.
Listing 8: Generate threshold events

SQL> select * from
      (select
         account_no, when_booked, amount, balance,
         case
           when balance < -25 and balance_lag >= -25 then 'threshold -25 underflow'
           when balance > 1000 and balance_lag <= 1000 then 'threshold 1000 overflow'
         end threshold_event
       from
        (select account_no, when_booked, amount, balance,
           lag(balance,1,0) over (partition by account_no
                                  order by when_booked) as balance_lag
         from (
           --- balance generating query
           select account_no, when_booked, amount,
             sum(amount) over (partition by account_no order by when_booked
                               rows unbounded preceding) as balance
           from booking_entry
         )))
     where -- filter events
       threshold_event is not null
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK     AMOUNT    BALANCE THRESHOLD_EVENT
---------- --------- ---------- ---------- -----------------------
1003/1     05-JAN-05        100       1100 threshold 1000 overflow
2009/2     15-JAN-05       -100        -40 threshold -25 underflow
Listing 9: Account status

SQL> create table account_status
     (
       account_no     varchar2(10),
       when_booked    date,
       account_change varchar2(20) check (account_change in ('OPEN','CLOSE'))
     );

Table created.

SQL> select * from account_status order by 1;

ACCOUNT_NO WHEN_BOOK ACCOUNT_CHANGE
---------- --------- --------------------
1003/1     01-JAN-04 OPEN
2009/2     01-MAY-05 CLOSE
2009/2     01-JUN-04 OPEN
Listing 10: Tracking account status

SQL> select account_no, when_booked,
       lead(when_booked) over (partition by account_no
                               order by when_booked) as thru_date,
       account_change,
       decode(
         sum(case when account_change = 'OPEN' then 1
                  when account_change = 'CLOSE' then -1
                  else 0 end) over (partition by account_no order by when_booked
                                    rows unbounded preceding),
         1,'ACTIVE',0,'DEACTIVE','UNKNOWN')
       as account_status
     from account_status
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK THRU_DATE ACCOUNT_CHANGE       ACCOUNT_
---------- --------- --------- -------------------- --------
1003/1     01-JAN-04           OPEN                 ACTIVE
2009/2     01-JUN-04 01-MAY-05 OPEN                 ACTIVE
2009/2     01-MAY-05           CLOSE                DEACTIVE
Listing 11: Invalid events

SQL> -- add an invalid event as a test case
SQL> insert into account_status values
     ('2009/2', to_date('05/02/2005','mm/dd/yyyy'), 'CLOSE');

1 row created.

SQL> commit;

Commit complete.

SQL> select account_no, when_booked, account_change,
       case when (account_change = 'OPEN' and status_before = 'DEACTIVE') or
                 (account_change = 'CLOSE' and status_before = 'ACTIVE') then 'VALID'
            else 'INVALID' end as event_validation
     from
      (select account_no, when_booked, account_change,
         lag(account_status,1,'DEACTIVE') over (partition by account_no
                                                order by when_booked) as status_before
       from
        (select account_no, when_booked, account_change,
           decode(
             sum(case when account_change = 'OPEN' then 1
                      when account_change = 'CLOSE' then -1
                      else 0 end) over (partition by account_no order by when_booked
                                        rows unbounded preceding),
             1,'ACTIVE',0,'DEACTIVE','UNKNOWN')
           as account_status
         from account_status
       ))
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK ACCOUNT_CHANGE       EVENT_V
---------- --------- -------------------- -------
1003/1     01-JAN-04 OPEN                 VALID
2009/2     01-JUN-04 OPEN                 VALID
2009/2     01-MAY-05 CLOSE                VALID
2009/2     02-MAY-05 CLOSE                INVALID
Listing 12: Active period

SQL> create or replace view select_from_listing_12 as
     select account_no, when_booked,
       case
         when period_cardinality = 1 then 'PERIOD START/STOP'
         when period_index = 1 then 'PERIOD START'
         when period_index = period_cardinality then 'PERIOD STOP' end as period_change,
       rn, period_id, period_index, period_cardinality
     from
      (select account_no, when_booked, period_id,
         row_number() over (partition by account_no, period_id
                            order by when_booked) as period_index,
         count(*) over (partition by account_no, period_id) as period_cardinality, rn
       from
        (select account_no, when_booked, rn,
           last_value(rn ignore nulls) over (partition by account_no
                                             order by when_booked) as period_id
           -- use max(rn) in Oracle 9i
           -- max(rn) over (partition by account_no order by when_booked) as period_id
         from
          (select account_no, when_booked,
             case
               when row_number() over (partition by account_no
                                       order by when_booked) = 1 then 1 -- initial period
               when when_booked - lag(when_booked) over (partition by account_no
                                                         order by when_booked) > 7 then
                 row_number() over (partition by account_no order by when_booked)
               else null end as rn
           from booking_entry
         )))
     order by account_no, when_booked;

View created.

SQL> select * from select_from_listing_12;

ACCOUNT_NO WHEN_BOOK PERIOD_CHANGE             RN  PERIOD_ID PERIOD_INDEX PERIOD_CARDINALITY
---------- --------- ----------------- ---------- ---------- ------------ ------------------
1003/1     20-DEC-04 PERIOD START/STOP          1          1            1                  1
1003/1     05-JAN-05 PERIOD START               2          2            1                  3
1003/1     12-JAN-05                                       2            2                  3
1003/1     19-JAN-05 PERIOD STOP                           2            3                  3
1003/1     31-JAN-05 PERIOD START/STOP          5          5            1                  1
2009/2     10-DEC-04 PERIOD START/STOP          1          1            1                  1
2009/2     01-JAN-05 PERIOD START               2          2            1                  2
2009/2     05-JAN-05 PERIOD STOP                           2            2                  2
2009/2     15-JAN-05 PERIOD START               4          4            1                  2
2009/2     20-JAN-05 PERIOD STOP                           4            2                  2

10 rows selected.
Listing 13: Parallelism of active periods

SQL> select account_no, when_booked, period_change,
       sum(period_event) over (order by when_booked) as dop
     from
      (select account_no, when_booked,
         case
           when period_change = 'PERIOD START/STOP' then 0
           when period_change = 'PERIOD START' then 1
           when period_change = 'PERIOD STOP' then -1 end as period_event,
         period_change
       from select_from_listing_12
       where period_change is not null
      )
     order by when_booked;

ACCOUNT_NO WHEN_BOOK PERIOD_CHANGE            DOP
---------- --------- ----------------- ----------
2009/2     10-DEC-04 PERIOD START/STOP          0
1003/1     20-DEC-04 PERIOD START/STOP          0
2009/2     01-JAN-05 PERIOD START               1
1003/1     05-JAN-05 PERIOD START               1
2009/2     05-JAN-05 PERIOD STOP                1
2009/2     15-JAN-05 PERIOD START               2
1003/1     19-JAN-05 PERIOD STOP                1
2009/2     20-JAN-05 PERIOD STOP                0
1003/1     31-JAN-05 PERIOD START/STOP          0

9 rows selected.
Listing 14: Normalized view on account periods

SQL> create or replace view select_from_listing_14 as
     select account_no, period_id,
       min(when_booked) start_date, max(when_booked) stop_date
     from select_from_listing_12
     group by account_no, period_id
     order by account_no, period_id;

View created.

SQL> select * from select_from_listing_14 where start_date != stop_date;

ACCOUNT_NO  PERIOD_ID START_DAT STOP_DATE
---------- ---------- --------- ---------
1003/1              2 05-JAN-05 19-JAN-05
2009/2              2 01-JAN-05 05-JAN-05
2009/2              4 15-JAN-05 20-JAN-05
Listing 15: Transformation of the transaction view

SQL> create or replace view select_from_listing_15 as
     select a.account_no, a.period_id,
       case when b.period_event = 1 then start_date else stop_date
       end as transaction_date,
       period_event,
       case when b.period_event = 1 then 'PERIOD START' else 'PERIOD STOP'
       end as period_change
     from
       (select * from select_from_listing_14
        where start_date != stop_date) a,
       -- cartesian join
       (select 1 period_event from dual
        union all
        select -1 period_event from dual) b
     order by 3;

View created.

SQL> select * from select_from_listing_15;

ACCOUNT_NO  PERIOD_ID TRANSACTI PERIOD_EVENT PERIOD_CHANG
---------- ---------- --------- ------------ ------------
2009/2              2 01-JAN-05            1 PERIOD START
1003/1              2 05-JAN-05            1 PERIOD START
2009/2              2 05-JAN-05           -1 PERIOD STOP
2009/2              4 15-JAN-05            1 PERIOD START
1003/1              2 19-JAN-05           -1 PERIOD STOP
2009/2              4 20-JAN-05           -1 PERIOD STOP

6 rows selected.
Listing 16: Degree of parallelism

SQL> select account_no, period_id, transaction_date, period_change, period_event,
       sum(period_event) over (order by transaction_date, period_event
                               rows unbounded preceding) as parallel_degree
     from select_from_listing_15
     order by transaction_date, period_event;

ACCOUNT_NO  PERIOD_ID TRANSACTI PERIOD_CHANG PERIOD_EVENT PARALLEL_DEGREE
---------- ---------- --------- ------------ ------------ ---------------
2009/2              2 01-JAN-05 PERIOD START            1               1
2009/2              2 05-JAN-05 PERIOD STOP            -1               0
1003/1              2 05-JAN-05 PERIOD START            1               1
2009/2              4 15-JAN-05 PERIOD START            1               2
1003/1              2 19-JAN-05 PERIOD STOP            -1               1
2009/2              4 20-JAN-05 PERIOD STOP            -1               0

6 rows selected.

SQL> select account_no, period_id, transaction_date, period_change, period_event,
       sum(period_event) over (order by transaction_date, period_event desc
                               rows unbounded preceding) as parallel_degree
     from select_from_listing_15
     order by transaction_date, period_event desc;

ACCOUNT_NO  PERIOD_ID TRANSACTI PERIOD_CHANG PERIOD_EVENT PARALLEL_DEGREE
---------- ---------- --------- ------------ ------------ ---------------
2009/2              2 01-JAN-05 PERIOD START            1               1
1003/1              2 05-JAN-05 PERIOD START            1               2
2009/2              2 05-JAN-05 PERIOD STOP            -1               1
2009/2              4 15-JAN-05 PERIOD START            1               2
1003/1              2 19-JAN-05 PERIOD STOP            -1               1
2009/2              4 20-JAN-05 PERIOD STOP            -1               0

6 rows selected.