Synthetic Analytic
by Jaromir D.B. Nemec
Analytic functions, introduced in Oracle 8i, are no longer really a new feature. In the meantime they have become widely popularized and well accepted. A typical example of using an analytic function is a product sales ranking report, elegantly solving a problem that would otherwise lead to a painful self outer join or would have to be solved procedurally. This article points out an alternative way of using them that goes beyond the “partition and rank” or “lag and lead” approach, demonstrating the power of this SQL extension.
This article is not meant as an analytic function primer, so if you are not familiar with the basics of the analytic function approach, check the resource section for introductory information. But even without deep knowledge of analytic functions, the main idea of this article should be understandable.
Let’s start with a simplified accounting application whose main table, consisting of account entries, is described in Listing 1. The table contains an account number identifying the account (account_no), the transaction date (when_charged), the booking date (when_booked) and an amount. The amount is positive for deposits and negative for withdrawals. A possible report ranking the total deposit amount per account and month is shown in Listing 2. In the inner query the total sum of deposits (positive amounts) per month is calculated using SUM as a classical aggregate function. In the next step the analytic function RANK is used to order the accounts by total deposit (ORDER BY deposit DESC) per month (PARTITION BY booking_month). We see, for example, that the winner (rank = 1) in January 2005 is account 1003/1.
The SUM function, used in this example as an aggregate function to calculate the total deposits, can easily be leveraged, in its analytic form, to calculate the account balance. Let’s assume for a moment that our table contains the complete list of transactions. This is not a very realistic assumption, as most applications hold detailed transaction data only for a limited period of time, but we will see how to deal with such a limitation shortly. Calculating an account balance is not really complicated. You must consider each account separately (PARTITION BY account_no) and, for every transaction, sum up all preceding transactions (ORDER BY when_booked ROWS UNBOUNDED PRECEDING). The windowing clause ROWS UNBOUNDED PRECEDING is an abbreviation; a more verbose syntax with the same meaning is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This syntax states more clearly that we sum up all rows of a specific account from the first one to the current one, which is exactly the definition of a balance. The order of the rows is defined by the ORDER BY clause; in our case the booking date is used. For the complete statement, see Listing 3. Note that for the purpose of the balance calculation both deposits and withdrawals are considered.
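The effect of SUM as a running, per-account total can be sketched procedurally; the following Python fragment (names and data hard-coded from Listing 1 for one account, purely illustrative) mirrors the PARTITION BY / ORDER BY / ROWS UNBOUNDED PRECEDING semantics:

```python
# Sketch of the analytic SUM ... ROWS UNBOUNDED PRECEDING of Listing 3:
# partition by account, order by booking date, running total per row.
from itertools import groupby

# (account_no, when_booked, amount) -- the rows of account 1003/1 from Listing 1
entries = [
    ("1003/1", "2004-12-20", 1000),
    ("1003/1", "2005-01-05", 100),
    ("1003/1", "2005-01-12", 50),
    ("1003/1", "2005-01-19", -120),
    ("1003/1", "2005-01-31", 10),
]

def running_balance(rows):
    """PARTITION BY account_no ORDER BY when_booked ROWS UNBOUNDED PRECEDING."""
    result = []
    rows = sorted(rows, key=lambda r: (r[0], r[1]))        # order within partition
    for account, group in groupby(rows, key=lambda r: r[0]):
        balance = 0                                        # restart per partition
        for _, when, amount in group:
            balance += amount        # sum of all rows from the first to the current
            result.append((account, when, balance))
    return result

print(running_balance(entries))
# final balance of 1003/1 is 1040, matching Listing 3
```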
How do we deal with the more complicated case, where there is no complete transaction history of an account? If the older transactions have been removed from the table, it is not possible to start the sum at the beginning of the account, so some workaround must be found. Assume there is a table containing a snapshot of the balance of all accounts at some point in time, say at the end of the year 2004 (see table account_12_31_2004 in Listing 4), and that after this point the transactions are complete. This allows us to start with the balance as of 12/31/2004 (from the snapshot table) and consider the complete transaction record of the year 2005. A simple UNION ALL merges the snapshot information (table account_12_31_2004) with the transaction data. The rest of the statement is identical to the preceding one; see Listing 5. The column source indicates the source of the information, so we can distinguish between real transactions and information reconstructed from the snapshot table.
A more complicated, though more realistic, case is that there is a table containing the current balance of each account (see table account_header in Listing 6). In this case we can start with the current state of the account (we'll use a scalar subquery to get it) and trace the balance backwards in time. Note the descending order in the ORDER BY clause of the SUM function. The summed-up amount is subtracted from the current balance (remember, we are moving backwards in time). Additionally, due to the reverse order, the amount of the current transaction must be added back to the result; otherwise the balance before the transaction (and not after the transaction, as required) would be calculated. See Listing 6.
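The backward reconstruction can be sketched in a few lines; this Python fragment (data hard-coded for account 1003/1, current balance 1040 from Listing 6) shows why the current amount must be added back:

```python
# Sketch of Listing 6's idea: walk backwards from the current balance.
current_balance = 1040   # from account_header for account 1003/1

# (when_booked, amount), complete history of account 1003/1
entries = [
    ("2004-12-20", 1000),
    ("2005-01-05", 100),
    ("2005-01-12", 50),
    ("2005-01-19", -120),
    ("2005-01-31", 10),
]

balances = {}
running = 0
for when, amount in sorted(entries, reverse=True):   # ORDER BY when_booked DESC
    running += amount                                # reverse running sum
    # adding the amount back yields the balance *after* the row,
    # not the balance before it
    balances[when] = current_balance + amount - running

print(balances["2005-01-19"])   # 1030, the same value Listing 3 computes forward
```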
So much for the different ways of calculating the balance of an account. In the following examples we will use the simplest one, assuming the full transaction history is available. Of course, all the other options are applicable to the following examples as well.
Up to now we have been using only the booking date (when_booked) as the column defining the order of transactions. A typical account entry has two date columns, so it is perfectly possible to use the charging date (when_charged) instead of the booking date to build the balance. Moreover, it is possible to calculate both balances in one statement. Note that because the booking date (i.e. the date the transaction was entered on the account) may deviate from the charging date (i.e. the real time point of the transaction) in both directions, the resulting balances can be substantially different; see Listing 7.
Computing a balance for an account enables threshold-based supervision of booking entries (generating threshold events). The definition of such an event is very simple. An underflow event for threshold X (X stands for an arbitrary balance amount) is raised if the current balance (i.e. the balance after the transaction) is lower than X and the balance after the preceding transaction (i.e. the balance before the transaction) is greater than or equal to X; in other words, the current transaction crossed the threshold.
The overflow event is defined analogously. The constraint on the balance before the event is important to suppress repeated alarms: if several successive transactions violate the defined limit, only the first one is treated as a threshold event and the following ones are ignored.
Supervising such threshold events can be very useful: overflow events can trigger some bonus treatment of the customer, while underflow events are connected with features from the opposite side of the customer relationship management toolset. How to calculate the events is illustrated in Listing 8. The balance is computed the same way as in the preceding examples; based on the balance, the “before” balance (i.e. the balance after the transaction immediately preceding the current one) is calculated using the LAG function. The last step then tests the threshold according to the definition and decides whether an event occurred.
Note the usage of inline views in the select statement; this is typical when combining analytic functions that can’t be used in the same select statement.
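The LAG-and-compare logic of the threshold test can be sketched as follows (a minimal Python model of Listing 8's conditions; function name and data are illustrative only):

```python
# Sketch of the threshold-event logic of Listing 8: compare each balance with
# the previous one (LAG, defaulting to 0 before the first row) and report
# only the row that actually crosses a limit.
def threshold_events(balances, low=-25, high=1000):
    events = []
    prev = 0                      # lag(balance, 1, 0)
    for when, balance in balances:
        if balance < low <= prev:                 # crossed the lower threshold
            events.append((when, f"threshold {low} underflow"))
        if balance > high >= prev:                # crossed the upper threshold
            events.append((when, f"threshold {high} overflow"))
        prev = balance
    return events

# running balances of account 2009/2 (see Listing 3)
print(threshold_events([("2004-12-10", 10), ("2005-01-01", 30),
                        ("2005-01-05", 60), ("2005-01-15", -40),
                        ("2005-01-20", 20)]))
# [('2005-01-15', 'threshold -25 underflow')]
```

Note that the row returning to 20 on 2005-01-20 raises no second alarm, exactly the repeated-alarm suppression described above.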
Let’s extend our application by introducing a table containing the opening and closing log of accounts; see table account_status in Listing 9. Any time an account is opened or closed, an appropriate row is stored in the table. Using this record, is it possible to reconstruct the status of an account at an arbitrary point in time? We will see that this task is very similar to computing the balance. The necessary abstraction is very straightforward: simply consider the status of an account as a number, where zero describes an inactive account and one stands for an active account. In this context, opening an account corresponds to adding one to the numeric status, and closing an account to subtracting one from the current state. Reconstructing the status of an account is then as simple as balancing the open and close events. This is illustrated in Listing 10. The principle is the same as for the balance (see Listing 3); the CASE expression within the SUM function transforms the OPEN/CLOSE events into the appropriate +1/-1 values, and the DECODE translates the numeric information back into the ACTIVE/DEACTIVE status.
Using the LEAD function, a thru_date attribute is calculated that defines the validity interval of the account status. The status of an account becomes valid at the booking date of the transaction (when_booked) and remains unchanged until thru_date. The thru_date is a non-inclusive valid-through date; it is set to NULL if the interval has an open end.
This approach is very intuitive and works even when an account is reopened, but it contains a possible pitfall. If the input data is not correct, e.g. contains a closing record without a corresponding opening entry, we can face some surprising results. Note the “UNKNOWN” option in the DECODE statement. Let’s simulate this behavior by inserting an invalid row, a record with a closing event for an already closed account; see the top of Listing 11. Without care, this leads to an incorrect result: simply repeat the statement from Listing 10 with this invalid row present to see the invalid state propagate.
This behavior is generally bad, as it can skew the reporting. However, it can easily be fixed by introducing the concept of event validation. We accept an opening event if and only if the status of the account is DEACTIVE (note that a nonexistent account is considered DEACTIVE); similarly, a closing event is only permitted for an active account. Technically, we use a concept similar to the threshold event tracing (see Listing 8): for each event we additionally track the account status before the event. This information is used to filter out the invalid events. Listing 11 shows the implementation of the filter distinguishing between valid and invalid events. The business rules for the validation of events are quite simple: only active accounts can be closed and only DEACTIVE (or nonexistent) accounts can be opened.
Note that the computation of the account status and the validation of events is a classical chicken-and-egg problem: for the calculation of the account status, of course, only valid events should be considered, but for event validation the account status after the previous transaction must be known. These two tasks are not separable; the solution is to perform both of them simultaneously in one pass.
One could argue, though, that this state model of an account is too coarse. There is of course a great difference between an account with one movement per year and another with several transactions each day, yet both extremes are considered the same in our status model. To distinguish such behavior, we introduce an alternative status model based on active periods of an account. The definition of an active period is based on the usage of the account: to stay active there is a mandatory condition, say at least one transaction every seven days (the length of the interval is arbitrarily chosen). In simple words, each account stays active for seven days after a transaction; after seven days of inactivity the status changes to inactive. If a new transaction is performed within this interval, the active period is prolonged.
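The grouping rule can be stated compactly outside SQL; this Python sketch (function name hypothetical, booking dates from Listing 1) groups dates into active periods exactly as defined above:

```python
# Sketch of the active-period definition: a new period starts at the first
# transaction and after every gap of more than `gap_days` days.
from datetime import date

def active_periods(dates, gap_days=7):
    """Group sorted transaction dates into active periods."""
    periods = []
    for d in sorted(dates):
        if periods and (d - periods[-1][-1]).days <= gap_days:
            periods[-1].append(d)        # within the window: prolong the period
        else:
            periods.append([d])          # gap too big (or first row): new period
    return periods

# booking dates of account 1003/1 (Listing 1)
dates = [date(2004, 12, 20), date(2005, 1, 5), date(2005, 1, 12),
         date(2005, 1, 19), date(2005, 1, 31)]
print(len(active_periods(dates)))   # 3 periods, matching Listing 12
```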
A possible implementation of active periods is shown in Listing 12. First of all, in the innermost inline view, the starting rows of each active period are flagged. The first row of each account gets special treatment, as it is always the starting row of an active period. According to the definition, any row with a gap of more than seven days to the preceding row is considered a starting point. The starting points receive the corresponding row number (simply the sequential number of the transaction within the account); all other rows are set to NULL. See column RN.
The good news is that column RN now uniquely identifies an active period within each account; the bad news is that this identification is defined in the starting row only, not in the subsequent rows of the period. But the gaps can easily be filled using the LAST_VALUE function with the IGNORE NULLS option. Note that IGNORE NULLS is a new feature in 10g, so in previous releases you must use the MAX function instead (simply comment out the LAST_VALUE line and reinstate the MAX line in Listing 12). As each period has only one non-NULL value in column RN, the MAX function overwrites all NULLs with this value; see column period_id. In the next step, the period’s index (the relative position within the period) and cardinality (the number of transactions within the period) are determined using the ROW_NUMBER and COUNT functions, respectively. The final step tags the transactions as start or stop of a period using index and cardinality, considering the special case of a “dummy” one-transaction period that is simultaneously start and stop.
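The gap-filling step is nothing more than a forward fill; a minimal sketch (function name hypothetical, input taken from column RN of account 1003/1 in Listing 12):

```python
# Sketch of LAST_VALUE(rn IGNORE NULLS) OVER (... ORDER BY when_booked):
# carry the last non-NULL period id forward over the following rows.
def fill_forward(values):
    last = None
    out = []
    for v in values:
        if v is not None:
            last = v                 # remember the most recent non-NULL value
        out.append(last)
    return out

# column RN of account 1003/1 from Listing 12
print(fill_forward([1, 2, None, None, 5]))   # [1, 2, 2, 2, 5] -> period_id
```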
As already noted, the interval between transactions (seven days in our case) is chosen arbitrarily; if we shrink it to the level of seconds, the problem transforms into the question of grouping events into sessions. Check the resource box for credit for the idea of how to solve this problem using analytic functions.
The next logical step after introducing the notion of an active period is a so-called “parallel report”, giving at any point in time the total number of active accounts (i.e. accounts within an active period). Reporting this is not really new for us; we apply the same principle as used for balancing the account state. Each start and stop of an active period of an account is treated as a relevant event, increasing or decreasing the “degree of parallelism” by one. A trivial active period (consisting of only one record) doesn’t change the balance, as it both adds and subtracts one. The only difference from the account status example (see Listing 10) is the definition of the partition: the partition is not bound to a particular account but defined globally. See the select statement in Listing 13. Note that to get a reliable report, only validated events should be considered.
An alternative way of storing the status information of accounts is shown in Listing 14. Each active period has an opening and a closing timestamp (start_date and stop_date); the main difference is that for each period of activity there is only one row in the table, containing both the opening and the closing information, i.e. not two separate records as in Listing 13. Note that we discarded the “zero-length” periods to keep the example simple.
But this alternative representation is not really very different from the previous one. A very simple trick transforms this view of the data into the formerly used form: a Cartesian join with a table consisting of two rows (a “double dual” table, whose first row represents an opening record and second row a closing record) effectively multiplies the rows by two. The account identification and period_id are carried over into both records, while the timestamp is processed selectively: the opening record gets the opening timestamp and the closing record the closing timestamp of the period; see Listing 15. Having transformed the data back into this “transaction” view with this simple trick, you can reuse the solution of Listing 13 to report the parallel degree.
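The “double dual” trick is just an unpivot via a cross join; a sketch of the row multiplication (data abbreviated from Listing 14, names illustrative):

```python
# Sketch of Listing 15's "double dual" trick: a Cartesian join with a
# two-row table turns each period row into an opening and a closing event.
periods = [
    # (account_no, period_id, start_date, stop_date), see Listing 14
    ("1003/1", 2, "2005-01-05", "2005-01-19"),
    ("2009/2", 2, "2005-01-01", "2005-01-05"),
]

double_dual = [1, -1]    # 1 = opening event, -1 = closing event

events = [(acc, pid, start if ev == 1 else stop, ev)
          for acc, pid, start, stop in periods
          for ev in double_dual]         # the Cartesian join doubles the rows

print(len(events))    # 4: two events per period
```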
An interesting question is the handling of the opening and closing points of a period. Imagine there are only two periods, one ending at timestamp X and the other starting at the same timestamp X. It is obvious that most of the time the degree of parallelism is one, but what exactly is the degree at timestamp X? The possible answers range from zero to two, depending on the inclusive or exclusive interpretation of the endpoints. This problem becomes visible when we report the maximum (or minimum) degree of parallelism. In this case the order of the events, particularly those with the same timestamp, is extremely important. To see the difference, consider the two alternative statements in Listing 16; both compute the running degree of parallelism, yet they yield different results at the shared timestamp. The subtle difference is the ordering clause of the window: one statement prioritizes the opening events, considering them before the closing events with the same timestamp, while the other goes the other way round. Prioritizing the openings reports a higher degree at the shared point than prioritizing the closings. Which solution is right? Well, it depends on the nature of the business. I’d expect that when reporting the throughput of a business workflow engine the first interpretation would be used, whereas when reporting open shipments, unresolved bugs, etc., the second would be appropriate. The important thing is to define the order explicitly if the transaction timestamps are not distinctive enough; otherwise the result could fluctuate randomly.
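The effect of the tie-break order can be demonstrated in isolation; a Python sketch of the two interpretations (names and data hypothetical, modeled on the two-period scenario above):

```python
# Sketch of the endpoint question: two periods share timestamp "2005-01-05",
# one ending and one starting there. +1 = opening event, -1 = closing event.
events = [("2005-01-01", 1), ("2005-01-05", -1),   # period A
          ("2005-01-05", 1), ("2005-01-19", -1)]   # period B starts as A ends

def peak(events, openings_first):
    # sort by timestamp; break ties by event sign, as in Listing 16
    order = sorted(events, key=lambda e: (e[0], -e[1] if openings_first else e[1]))
    degree = best = 0
    for _, ev in order:
        degree += ev
        best = max(best, degree)
    return best

print(peak(events, openings_first=True))    # 2: both periods counted at the endpoint
print(peak(events, openings_first=False))   # 1: the closing is counted first
```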
For a long time some analytic tasks have been beyond the scope of an easy solution in relational SQL. With the introduction of the SQL functions for analysis, this is no longer the case. A great deal of analytic problems, especially in the data warehouse environment, is now easily solvable with the extended SQL; this article points out only a small part of them.
Jaromir
D.B. Nemec (jaromir at db-nemec dot com) is an independent consultant focusing
on Oracle databases, data warehouse systems and integration solutions.
Reference

Script with test data and all examples: synthetic_analytic.sql

Analytics “official site”: Oracle Database Data Warehousing Guide
http://www.oracle.com/technology/documentation

Search Ask Tom
http://asktom.oracle.com/pls/ask/search?p_string=analytics%20rock%20analytics%20roll

Check http://asktom.oracle.com/pls/ask/f?p=4950:8:3135579430049857077::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:13946369553642 for the discussion and credit for the solution of the “session problem”

last revision 11.1.2006
Listing 1: Create environment

SQL> create table booking_entry
     (
       account_no   varchar2(10) not null,
       when_booked  date not null,
       when_charged date not null,
       amount       number(10,4) not null
     );

Table created.

SQL> select * from booking_entry order by 1,2;

ACCOUNT_NO WHEN_BOOK WHEN_CHAR     AMOUNT
---------- --------- --------- ----------
1003/1     20-DEC-04 10-DEC-04       1000
1003/1     05-JAN-05 10-JAN-05        100
1003/1     12-JAN-05 08-JAN-05         50
1003/1     19-JAN-05 09-JAN-05       -120
1003/1     31-JAN-05 01-FEB-05         10
2009/2     10-DEC-04 10-DEC-04         10
2009/2     01-JAN-05 05-JAN-05         20
2009/2     05-JAN-05 15-JAN-05         30
2009/2     15-JAN-05 13-JAN-05       -100
2009/2     20-JAN-05 01-FEB-05         60

10 rows selected.
Listing 2: Top deposits per account and month

SQL> select booking_month, account_no, deposit,
       rank() over (partition by booking_month order by deposit desc) account_rank
     from (
       select account_no, trunc(when_booked,'MM') booking_month,
              sum(amount) deposit
       from booking_entry
       where amount > 0
       group by account_no, trunc(when_booked,'MM')
     )
     order by booking_month, 4;

BOOKING_M ACCOUNT_NO    DEPOSIT ACCOUNT_RANK
--------- ---------- ---------- ------------
01-DEC-04 1003/1           1000            1
01-DEC-04 2009/2             10            2
01-JAN-05 1003/1            160            1
01-JAN-05 2009/2            110            2
Listing 3: Balance of an account

SQL> select account_no, when_booked, amount,
       sum(amount) over (partition by account_no order by when_booked
                         rows unbounded preceding) as balance
     from booking_entry
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK     AMOUNT    BALANCE
---------- --------- ---------- ----------
1003/1     20-DEC-04       1000       1000
1003/1     05-JAN-05        100       1100
1003/1     12-JAN-05         50       1150
1003/1     19-JAN-05       -120       1030
1003/1     31-JAN-05         10       1040
2009/2     10-DEC-04         10         10
2009/2     01-JAN-05         20         30
2009/2     05-JAN-05         30         60
2009/2     15-JAN-05       -100        -40
2009/2     20-JAN-05         60         20

10 rows selected.
Listing 4: Account balance using snapshot

SQL> create table account_12_31_2004
     (
       account_no  varchar2(10),
       when_closed date,
       balance     number(10,4)
     );

Table created.

SQL> select * from account_12_31_2004 order by 1;

ACCOUNT_NO WHEN_CLOS    BALANCE
---------- --------- ----------
1003/1     31-DEC-04       1000
2009/2     31-DEC-04         10
Listing 5: Balance of an account with incomplete transaction history

SQL> select account_no, when_booked, amount,
       sum(amount) over (partition by account_no order by when_booked
                         rows unbounded preceding) as balance,
       source
     from
     ( -- merge snapshot with transactions
       select account_no, when_closed when_booked, balance amount, 'snapshot' source
       from account_12_31_2004
       union all
       select account_no, when_booked, amount, 'transaction' source
       from booking_entry
       where when_booked > to_date('12/31/2004','mm/dd/yyyy')
     )
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK     AMOUNT    BALANCE SOURCE
---------- --------- ---------- ---------- -----------
1003/1     31-DEC-04       1000       1000 snapshot
1003/1     05-JAN-05        100       1100 transaction
1003/1     12-JAN-05         50       1150 transaction
1003/1     19-JAN-05       -120       1030 transaction
1003/1     31-JAN-05         10       1040 transaction
2009/2     31-DEC-04         10         10 snapshot
2009/2     01-JAN-05         20         30 transaction
2009/2     05-JAN-05         30         60 transaction
2009/2     15-JAN-05       -100        -40 transaction
2009/2     20-JAN-05         60         20 transaction

10 rows selected.
Listing 6: Balance reconstructed from current state

SQL> create table account_header
     (
       account_no      varchar2(10),
       when_booked     date, -- last transaction
       current_balance number(10,4)
     );

Table created.

SQL> select * from account_header order by 1;

ACCOUNT_NO WHEN_BOOK CURRENT_BALANCE
---------- --------- ---------------
1003/1     01-FEB-05            1040
2009/2     01-FEB-05              20

SQL> select account_no, when_booked, amount,
       (select current_balance from account_header
        where account_no = a.account_no) + amount -
       sum(amount) over (partition by account_no order by when_booked desc
                         rows unbounded preceding) as balance
     from booking_entry a
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK     AMOUNT    BALANCE
---------- --------- ---------- ----------
1003/1     20-DEC-04       1000       1000
1003/1     05-JAN-05        100       1100
1003/1     12-JAN-05         50       1150
1003/1     19-JAN-05       -120       1030
1003/1     31-JAN-05         10       1040
2009/2     10-DEC-04         10         10
2009/2     01-JAN-05         20         30
2009/2     05-JAN-05         30         60
2009/2     15-JAN-05       -100        -40
2009/2     20-JAN-05         60         20

10 rows selected.
Listing 7: Double balance, ordered by when_charged

SQL> select account_no, when_booked, when_charged, amount,
       sum(amount) over (partition by account_no order by when_booked
                         rows unbounded preceding) as booking_balance,
       sum(amount) over (partition by account_no order by when_charged
                         rows unbounded preceding) as charging_balance
     from booking_entry
     order by account_no, when_charged;

ACCOUNT_NO WHEN_BOOK WHEN_CHAR     AMOUNT BOOKING_BALANCE CHARGING_BALANCE
---------- --------- --------- ---------- --------------- ----------------
1003/1     20-DEC-04 10-DEC-04       1000            1000             1000
1003/1     12-JAN-05 08-JAN-05         50            1150             1050
1003/1     19-JAN-05 09-JAN-05       -120            1030              930
1003/1     05-JAN-05 10-JAN-05        100            1100             1030
1003/1     31-JAN-05 01-FEB-05         10            1040             1040
2009/2     10-DEC-04 10-DEC-04         10              10               10
2009/2     01-JAN-05 05-JAN-05         20              30               30
2009/2     15-JAN-05 13-JAN-05       -100             -40              -70
2009/2     05-JAN-05 15-JAN-05         30              60              -40
2009/2     20-JAN-05 01-FEB-05         60              20               20

10 rows selected.
Listing 8: Generate threshold events

SQL> select * from
      (select
         account_no, when_booked, amount, balance,
         case
           when balance < -25 and balance_lag >= -25 then 'threshold -25 underflow'
           when balance > 1000 and balance_lag <= 1000 then 'threshold 1000 overflow'
         end threshold_event
       from
        (select account_no, when_booked, amount, balance,
           lag(balance,1,0) over (partition by account_no
                                  order by when_booked) as balance_lag
         from (
           --- balance generating query
           select account_no, when_booked, amount,
             sum(amount) over (partition by account_no order by when_booked
                               rows unbounded preceding) as balance
           from booking_entry
         )))
     where -- filter events
       threshold_event is not null
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK     AMOUNT    BALANCE THRESHOLD_EVENT
---------- --------- ---------- ---------- -----------------------
1003/1     05-JAN-05        100       1100 threshold 1000 overflow
2009/2     15-JAN-05       -100        -40 threshold -25 underflow
Listing 9: Account status

SQL> create table account_status
     (
       account_no     varchar2(10),
       when_booked    date,
       account_change varchar2(20) check (account_change in ('OPEN','CLOSE'))
     );

Table created.

SQL> select * from account_status order by 1;

ACCOUNT_NO WHEN_BOOK ACCOUNT_CHANGE
---------- --------- --------------------
1003/1     01-JAN-04 OPEN
2009/2     01-MAY-05 CLOSE
2009/2     01-JUN-04 OPEN
Listing 10: Tracking account status

SQL> select account_no, when_booked,
       lead(when_booked) over (partition by account_no
                               order by when_booked) as thru_date,
       account_change,
       decode(
         sum(case when account_change = 'OPEN' then 1
                  when account_change = 'CLOSE' then -1
                  else 0 end) over (partition by account_no order by when_booked
                                    rows unbounded preceding),
         1,'ACTIVE',0,'DEACTIVE','UNKNOWN')
       as account_status
     from account_status
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK THRU_DATE ACCOUNT_CHANGE       ACCOUNT_
---------- --------- --------- -------------------- --------
1003/1     01-JAN-04           OPEN                 ACTIVE
2009/2     01-JUN-04 01-MAY-05 OPEN                 ACTIVE
2009/2     01-MAY-05           CLOSE                DEACTIVE
Listing 11: Invalid events

SQL> -- add an invalid event as a test case
SQL> insert into account_status values
     ('2009/2', to_date('05/02/2005','mm/dd/yyyy'), 'CLOSE');

1 row created.

SQL> commit;

Commit complete.

SQL> select account_no, when_booked, account_change,
       case when (account_change = 'OPEN' and status_before = 'DEACTIVE') or
                 (account_change = 'CLOSE' and status_before = 'ACTIVE') then 'VALID'
            else 'INVALID' end as event_validation
     from
      (select account_no, when_booked, account_change,
         lag(account_status,1,'DEACTIVE') over (partition by account_no
                                                order by when_booked) as status_before
       from
        (select account_no, when_booked, account_change,
           decode(
             sum(case when account_change = 'OPEN' then 1
                      when account_change = 'CLOSE' then -1
                      else 0 end) over (partition by account_no order by when_booked
                                        rows unbounded preceding),
             1,'ACTIVE',0,'DEACTIVE','UNKNOWN')
           as account_status
         from account_status
       ))
     order by account_no, when_booked;

ACCOUNT_NO WHEN_BOOK ACCOUNT_CHANGE       EVENT_V
---------- --------- -------------------- -------
1003/1     01-JAN-04 OPEN                 VALID
2009/2     01-JUN-04 OPEN                 VALID
2009/2     01-MAY-05 CLOSE                VALID
2009/2     02-MAY-05 CLOSE                INVALID
Listing 12: Active period

SQL> create or replace view select_from_listing_12 as
     select account_no, when_booked,
       case
         when period_cardinality = 1 then 'PERIOD START/STOP'
         when period_index = 1 then 'PERIOD START'
         when period_index = period_cardinality then 'PERIOD STOP' end as period_change,
       rn, period_id, period_index, period_cardinality
     from
      (select account_no, when_booked, period_id,
         row_number() over (partition by account_no, period_id
                            order by when_booked) as period_index,
         count(*) over (partition by account_no, period_id) as period_cardinality, rn
       from
        (select account_no, when_booked, rn,
           last_value(rn ignore nulls) over (partition by account_no
                                             order by when_booked) as period_id
           -- use max(rn) in Oracle 9i
           -- max(rn) over (partition by account_no order by when_booked) as period_id
         from
          (select account_no, when_booked,
             case
               when row_number() over (partition by account_no
                                       order by when_booked) = 1 then 1 -- initial period
               when when_booked - lag(when_booked) over (partition by account_no
                                                         order by when_booked) > 7 then
                 row_number() over (partition by account_no order by when_booked)
               else null end as rn
           from booking_entry
         )))
     order by account_no, when_booked;

View created.

SQL> select * from select_from_listing_12;

ACCOUNT_NO WHEN_BOOK PERIOD_CHANGE             RN  PERIOD_ID PERIOD_INDEX PERIOD_CARDINALITY
---------- --------- ----------------- ---------- ---------- ------------ ------------------
1003/1     20-DEC-04 PERIOD START/STOP          1          1            1                  1
1003/1     05-JAN-05 PERIOD START               2          2            1                  3
1003/1     12-JAN-05                                       2            2                  3
1003/1     19-JAN-05 PERIOD STOP                           2            3                  3
1003/1     31-JAN-05 PERIOD START/STOP          5          5            1                  1
2009/2     10-DEC-04 PERIOD START/STOP          1          1            1                  1
2009/2     01-JAN-05 PERIOD START               2          2            1                  2
2009/2     05-JAN-05 PERIOD STOP                           2            2                  2
2009/2     15-JAN-05 PERIOD START               4          4            1                  2
2009/2     20-JAN-05 PERIOD STOP                           4            2                  2

10 rows selected.
Listing 13: Parallelism of active periods

SQL> select account_no, when_booked, period_change,
       sum(period_event) over (order by when_booked) as dop
     from
      (select account_no, when_booked,
         case
           when period_change = 'PERIOD START/STOP' then 0
           when period_change = 'PERIOD START' then 1
           when period_change = 'PERIOD STOP' then -1 end as period_event,
         period_change
       from select_from_listing_12
       where period_change is not null
      )
     order by when_booked;

ACCOUNT_NO WHEN_BOOK PERIOD_CHANGE            DOP
---------- --------- ----------------- ----------
2009/2     10-DEC-04 PERIOD START/STOP          0
1003/1     20-DEC-04 PERIOD START/STOP          0
2009/2     01-JAN-05 PERIOD START               1
1003/1     05-JAN-05 PERIOD START               1
2009/2     05-JAN-05 PERIOD STOP                1
2009/2     15-JAN-05 PERIOD START               2
1003/1     19-JAN-05 PERIOD STOP                1
2009/2     20-JAN-05 PERIOD STOP                0
1003/1     31-JAN-05 PERIOD START/STOP          0

9 rows selected.
Listing 14: Normalized view on account periods

SQL> create or replace view select_from_listing_14 as
     select account_no, period_id,
       min(when_booked) start_date, max(when_booked) stop_date
     from select_from_listing_12
     group by account_no, period_id
     order by account_no, period_id;

View created.

SQL> select * from select_from_listing_14 where start_date != stop_date;

ACCOUNT_NO  PERIOD_ID START_DAT STOP_DATE
---------- ---------- --------- ---------
1003/1              2 05-JAN-05 19-JAN-05
2009/2              2 01-JAN-05 05-JAN-05
2009/2              4 15-JAN-05 20-JAN-05
Listing 15: Transformation of the transaction view

SQL> create or replace view select_from_listing_15 as
     select a.account_no, a.period_id,
       case when b.period_event = 1 then start_date else stop_date
       end as transaction_date,
       period_event,
       case when b.period_event = 1 then 'PERIOD START' else 'PERIOD STOP'
       end as period_change
     from
       (select * from select_from_listing_14
        where start_date != stop_date) a,
       -- cartesian join
       (select 1 period_event from dual
        union all
        select -1 period_event from dual) b
     order by 3;

View created.

SQL> select * from select_from_listing_15;

ACCOUNT_NO  PERIOD_ID TRANSACTI PERIOD_EVENT PERIOD_CHANG
---------- ---------- --------- ------------ ------------
2009/2              2 01-JAN-05            1 PERIOD START
1003/1              2 05-JAN-05            1 PERIOD START
2009/2              2 05-JAN-05           -1 PERIOD STOP
2009/2              4 15-JAN-05            1 PERIOD START
1003/1              2 19-JAN-05           -1 PERIOD STOP
2009/2              4 20-JAN-05           -1 PERIOD STOP

6 rows selected.
Listing 16: Degree of parallelism

SQL> select account_no, period_id, transaction_date, period_change, period_event,
       sum(period_event) over (order by transaction_date, period_event
                               rows unbounded preceding) as parallel_degree
     from select_from_listing_15
     order by transaction_date, period_event;

ACCOUNT_NO  PERIOD_ID TRANSACTI PERIOD_CHANG PERIOD_EVENT PARALLEL_DEGREE
---------- ---------- --------- ------------ ------------ ---------------
2009/2              2 01-JAN-05 PERIOD START            1               1
2009/2              2 05-JAN-05 PERIOD STOP            -1               0
1003/1              2 05-JAN-05 PERIOD START            1               1
2009/2              4 15-JAN-05 PERIOD START            1               2
1003/1              2 19-JAN-05 PERIOD STOP            -1               1
2009/2              4 20-JAN-05 PERIOD STOP            -1               0

6 rows selected.

SQL> select account_no, period_id, transaction_date, period_change, period_event,
       sum(period_event) over (order by transaction_date, period_event desc
                               rows unbounded preceding) as parallel_degree
     from select_from_listing_15
     order by transaction_date, period_event desc;

ACCOUNT_NO  PERIOD_ID TRANSACTI PERIOD_CHANG PERIOD_EVENT PARALLEL_DEGREE
---------- ---------- --------- ------------ ------------ ---------------
2009/2              2 01-JAN-05 PERIOD START            1               1
1003/1              2 05-JAN-05 PERIOD START            1               2
2009/2              2 05-JAN-05 PERIOD STOP            -1               1
2009/2              4 15-JAN-05 PERIOD START            1               2
1003/1              2 19-JAN-05 PERIOD STOP            -1               1
2009/2              4 20-JAN-05 PERIOD STOP            -1               0

6 rows selected.