REAL APPLICATION TEST 10G
Implementation (before execute anything read the documentation below).
1 – Add an entry on the tnsnames.ora of both sides, production and test databases.
2 – Run the script rap_install.sql on both sides, as a DBA user.
2 – Connect as user RAP and create dblink from production to test and from test to production, with the name raplink in both cases, the parameter GLOBAL_NAMES should be set as FALSE, if you need to use it as TRUE, then you will need to create the dblinks with the same name as the instance you are connecting, also you will need to edit the RAP package body and change all the raplink entries to the dblink name you have created on each side, much more complicated but it works.
Test to production:
Conn rap/rapver1
Create database link raplink
Connect to rap identified by rapver1
Using ‘tnsprod’;
Production to test:
Conn rap/rapver1
Create database link raplink
Connect to rap identified by rapver1
Using ‘tnstest’;
3 – Run the script rap_package_v1.0.sql on both sides, connected as RAP user.
4 – Run the script rap_package_body_ver_1.0.sql on both sides, connected as RAP user.
5 – Recompile the package body on both sides, connected as RAP user.
Alter package rap compile body;
Documentation
Oracle has created the Real Application Test on 11g, this feature is really interesting and can save you a lot of time when you need to test your application’s sql code, mainly when dealing with upgrades, patches, migrations, etc…
How does it work?
Basically it capture the workload (sql and dml) on your production environment and execute everything again on another instance, doing so you can compare executions times, IO usage, cpu and also the execution plan.
Oracle also allows you to use this tool from 10g (10.2.0.4) with 11g, which means you can capture the workload on 10g and test on 11g, but not 10g to 10g.
But at this point with the 11g with just a few miles on the road, will take some time till we definitely upgrade, so getting inspiration on this new 11g feature I decide to create a similar one that would allow me to do almost the same thing from 10g to 10g or 11g databases doesn’t matter.
The Oracle tool is based on PL/SQL and a new binary tool, and it is able not only to replay the workload but also you can simulate different scenarios like increasing the number of user executing the queries.
On this first release I’ve created I just replay the workload based on the number of executions * 2, of each query, that means if a query was executed 10 times on production I will execute it 20 times on test, there are two basic reasons for that, the first one is to stress the test environment, the second one is to minimize the overhead the replay process can have when executing the queries, what that means is during the replay phase when binding and processing the queries we can have an overhead of 10% to 200% depending on the number of bind variables, so when executing more times the statistic/execution is minimized and the overhead became almost insignificant.
At this moment that is enough for my needs, but I intend to create a new release soon and include other types of replay like the oracle 11g does, and capture other types of statistics to have a more detailed comparison report.
How does my “Real Application Test” works?
The process of capturing the workload is based on 3 views V$SQLAREA, V$SQL_BIND_CAPTURE, V$SQL_PLAN.
V$SQLAREA – shows all the queries in the cache, that means the most used SQL and DML.
V$SQL_BIND_CAPTURE – shows information about bind variables, by default every 15 minutes Oracle capture the active binds and update this view.
V$SQL_PLAN – shows all execution plans in the cache.
I’ve create a package called RAP, on this package there is a main procedure called CAPTURE_AND_REPLAY, this procedure is able to capture the workload and automatically execute the replay, below the steps it will perform:
1 – CAPTURE_WORKLOAD: During the specified time the capture process, will be saving all the queries, binds and executions plans.
2 – TRANSFER_TABLE: It will transfer the workload data to the test database
3 – CREATE_SNAP: It will create a snapshot using a similar concept of the statspack, and the data will be saved on partitioned table where the partition clause is the SNAP_ID, that means it is easier to perform clean up and also improve the performance when executing the comparison reports.
4 – RUN_SNAP_TYPE_SQL – This is the replay phase for SQL commands.
5 – RUN_SNAP_TYPE_DML - This is the replay phase for DML commands.
6 – CREATE_SNAP_REPLAY – During the replay steps, this procedure is called to store the replay data, from the same views V$SQLAREA and V$SQL_PLAN on the test database.
7 – RUN_REPORTS (1 TO 5) – I’ve created 5 automatic reports that are generated after the replay phase, these reports are explained in detail later.
If you want you can call each individual procedure manually, but just use this option when you understand exactly how everything works, otherwise you may cause some confusion with the workload and replay data.
I will explain in details the call of the main procedure, the others you can read inside the package code where I tried to explain each step.
First a brief description of the objects involved within the whole process.
SCHEMA:
RAP – this is the schema used to store all the data and the package responsible to capture and replay
Tables used to store the workload data:
SQL
SQLPLAN
BNID
SQL_SNAP
SQLPLAN_SNAP
BIND_SNAP
Tables used to store the replay data:
SQL_SNAP_REPLAY
SQLPLAN_SNAP_REPLAY
Tables used to keep tracking of the capture and replay phases:
RAP_LOGRUN – Used to log the sql_id and message error of queries that failed to execute or to parse during the replay phase.
RAP_FOLLOW – Used to show the progress of the procedure CAPTURE_AND_REPLAY, with this table you can trace how your process is performing.
RAP_CONTROL – Used to store the SNAP_ID, date of execution and instances involved, and comments that you may want to save about this snapshot.
Report Tables:
RAP_REP1 – Report 1 will show the average (disk reads, cpu_time, elapsed_time) per execution, and the difference between the test and prod in percents.
RAP_REP2 – Report 2 will show the number of queries that were tested and how many are with different execution plans.
RAP_REP3 – Report 3 will show all the sql_id with different execution plans also the percentage (disk reads, cpu_time, elapsed_time) per execution over the replay.
RAP_REP4 – Report 4 will show all sql_id that took more than 0.2 seconds per execution and when the replay had 10% increase in the elapsed_time per execution.
RAP_REP5 – Report 5 will show all the queries that failed to replay.
DBLINK:
RAPLINK – Used to transfer data from prod to test and also to update the progress on the RAP_FOLLOW table.
PACKAGE:
RAP – Package contains all the procedures involved on the process
Procedure in the package:
CAPTURE_AND_REPLAY – As explained above capture the workload and also execute the replay.
CAPTURE_WORKLOAD – Capture the workload on the production environment and save on the SQL, SQLPLAN and BIND tables.
TRANSFER_TABLE – Transfer the base tables SQL, SQLPLAN and BIND from production to test.
CREATE_SNAP – Create the snapshots based on the SQL,SQLPLAN and BIND tables that were transferred from production, prepare the snapshot for the replay.
RUN_SNAP_TYPE_DML – Execute the replay for all DML captured on the workload
RUN_SNAP_TYPE_SQL – Execute the replay for all SQL captured on the workload
CREATE_SNAP_REPLAY – Create the snapshot replay based on the V$SQLAREA and V$SQL_PLAN on the test database during the replay phase.
CLEAN_SNAP – Used to clean snapshots, you can choose ALL or a specified id, you can check the snap_id on RAP_CONTROL table
RUN_SQL_ID – Used to replay a specific sql_id.
RUN_RAP_REPORT1 – generates the report 1
RUN_RAP_REPORT2_3 - generates the report 2 and 3
RUN_RAP_REPORT4 - generates the report 4
RUN_RAP_REPORT5 - generates the report 5
As I said I will explain CAPTURE_AND_REPLAY and CLEAN_SNAP the others you can study by yourself using the package code. Of course if you have any question please let me know.
The clean_snap procedure as the name says, you can use to delete snapshot, it will drop the partitions for both workload and replay data.
PROCEDURE CLEAN_SNAP
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
TSNAP_ID VARCHAR2 IN DEFAULT
Definitions:
TSNAP_ID – You can specify a snap_id you would like to drop or the default ALL.
PROCEDURE CAPTURE_AND_REPLAY
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
TRUNTIME NUMBER IN DEFAULT
TFREQ NUMBER IN DEFAULT
TCHANGE BOOLEAN IN DEFAULT
TSCHEMA VARCHAR2 IN DEFAULT
CLEANLOG BOOLEAN IN DEFAULT
TPRODLINK VARCHAR2 IN
TTESTLINK VARCHAR2 IN
TYPECMD VARCHAR2 IN DEFAULT
TFLUSH BOOLEAN IN DEFAULT
VCOMMENT VARCHAR2 IN DEFAULT
Definitions:
TRUNTIME – Specify the time in minutes the capture process will run, the default value is 30 minutes.
TFREQ – Specify the frequency time in seconds between the each capture, default value is 15 seconds.
VERY IMPORTANT PARAMETER READ WITH ATTENTION
TCHANGE – In order to have a better replay the ideal is to use the most different set of bind variables as possible, to do that we need to change a hidden parameter “_cursor_bind_capture_interval”, this parameter specify in seconds how long it takes Oracle to capture the active binds and show on v$sql_bind_capture view, the default value is 900 seconds(15 min), that means if you let it that way, you won’t be able to capture a good number of different variables, so setting the TCHANGE to true, will change this hidden parameter to the value specified in TFREQ, when the process completes it return the hidden parameter to the default value of 900. On the several tests that I did, it didn’t show any bad side effect in performance on my production databases during the capture process, but that is on my environment I can’t guarantee there are no side effect at all or bugs related to it, you will need to take the risk by your own, or better get a test database that is exactly the same as your production and do some tests with this parameter to see if it will cause any problems, if you don’t want to change it at all, no problem, you will still be able to use the capture, the difference is that your queries will be tested with a small variety of binds, and perhaps you will still need to do some manual tests by yourself.
TSCHEMA – You can capture the workload for all the schemas of for a specific one, the default is ‘ALL’.
CLENLOG – This parameter will clean up the RAP_LOGRUN table during the replay, the default is false.
TPRODLINK – You must specify the TNS alias you have used to create the dblink to the prod database, this will eliminate the risk of running the replay on the wrong server.
TTESTLINK – You must specify the TNS alias you have used to create the dblink to the test database, this will eliminate the risk of running the replay on the wrong server.
TYPECMD – You can choose to run the replay for SQL and DML or everything, the default is ‘ALL’, the other options are SQL and DML.
ALL THE DML WILL COMMIT WHEN COMPLETED, THE REASON FOR THAT IS IF I’VE USED ROLLBACK INSTEAD, THE STATISTICS WOULD BE AFFECTED.
TFLUSH – If you set this parameter to TRUE it will flush the shared_pool and buffer_cache in both Production and Test databases, this option is good if you want to capture queries that start only after the flush, otherwise the capture process will get everything that was on cache.
REMEMBER FLUSHING THE SHARED_POOL AND BUFFER_CACHE IN HIGH USAGE DATABASES WILL CAUSE A TEMPORARY PERFORMANCE SLOWDOWN TILL THE CACHE IS RELOADED, IS MOST CASES THE SLOWDOWN IS NOT SIGNIFICANT, BUT YOU ARE THE ONLY ONE THAT KNOWS YOUR ENVIRONMENT TO MAKE THAT DECISION.
VCOMMENT – You can enter comments here that will be stored on the RAP_CONTROL table, such as “USING A DIFFERENT SET OF STATISTICS ON THE REPLAY SIDE”.
Example:
My prod instance call PROD and my test instance call TEST,
My tns alias used to create the dblink are PRODLINK and TESTLINK.
To check the alias used to create the dblink you can run the following query from the TEST database:
TEST DATABASE
CONNECT RAP/RAPVER1
SELECT HOST FROM USER_DB_LINKS; -- THIS IS THE LINK TO THE PRODUCTION DATABASE
PRODLINK
SELECT HOST FROM USER_DB_LINKS@RAPLINK; -- THIS IS THE LINK FROM THE PRODUCTION DATABASE TO TEST
TESTLINK
After confirmed this information we can run the CAPTURE_AND_REPLAY procedure, REMEMBER THE CAPTURE_AND_REPLAY PROCEDURE MUST BE EXECUTED FROM THE TEST SIDE, IN OTHER WORDS, WHERE YOU WANT THE REPLAY TO HAPPEN.
exec rap.CAPTURE_AND_REPLAY(TRUNTIME =>60,
TFREQ =>5,
TCHANGE=>true,
TSCHEMA=>'ALL',
CLEANLOG=>true,
TPRODLINK=>'PRODLINK',
TTESTLINK=>'TESTLINK',
TYPECMD=>'ALL',
TFLUSH=>TRUE,
VCOMMENT=>'STATISTICS ARE EXACTLY THE SAME');
What this is saying is:
Run the capture for 1 hour.
The frequency of captures is 5 seconds.
Capture all schemas.
Clean the rap_logrun table before the replay
Prod and Test links specified
Replay all command types
Flush the shared_pool and buffer_cache to get only new commands
Add a comment on the RAP_CONTROL table.
The best way to run it, is on a background process, so you could create a script like this on the TEST side:
$vi rap_run.sh
sqlplus -s "rap/rapver1" << EOF
set timing on
spool capture_and_replay.log
begin
rap.CAPTURE_AND_REPLAY(TRUNTIME =>60 ,TFREQ=>5, TCHANGE=>true, TSCHEMA=>'ALL', CLEANLOG=>true, TPRODLINK=>'PRODLINK', TTESTLINK=>'TESTLINK', TYPECMD=>'ALL', tflush=>true, VCOMMENT=>'STATISTICS ARE EXACTLY THE SAME');
end;
/
spool off
exit
EOF
$ chmod +x rap_run.sh
And execute with nohup and &.
$ nohup /home/oracle/rap_run.sh &
To check how the process is performing, you can use the below query:
SQL> select * from rap_follow;
SNAP_ID RUNDATE STATUS
---------- --------- --------------------------------------------------------------------------------
14-APR-08 CAPTURING WORKLOAD - ESTIMATED TIME TO COMPLETE: 58 MINUTES
When the capture is completed a SNAP_ID will be assigned and it will start the replay phase, that can also be monitored using the RAP_FOLLOW table.
The RAP_LOGRUN table show queries that were not executed, but anyway when the process completes the Report 5 will get all the information on the RAP_LOGRUN and store on the RAP_REP5 table together with information of queries that were not tested for different reasons that are logged on rap_logrun.
And then we can check the reports.
select * from rap.rap_rep1 WHERE SNAP_ID=
select * from rap.rap_rep2 WHERE SNAP_ID=
select * from rap.rap_rep3 WHERE SNAP_ID=
select * from rap.rap_rep4 WHERE SNAP_ID=
select * from rap.rap_rep5 where snap_id=
Other useful queries:
If you want to compare individually queries, between the replay the capture and perhaps what is on memory:
select sql_id||' - capture' sql_id, executions,
buffer_gets, disk_reads, cpu_time, elapsed_time,
rows_processed,sql_fulltext
from rap.sql_snap where sql_7id='druv8yxdgjwgm' and snap_id=13
union all
select sql_id||' - replay' sql_id, executions,
buffer_gets, disk_reads, cpu_time, elapsed_time,#
rows_processed,sql_fulltext
from rap.sql_snap_replay where sql_id='druv8yxdgjwgm' and snap_id=13
union all
select sql_id||' - mem' sql_id, executions,
buffer_gets, disk_reads, cpu_time, elapsed_time,
rows_processed,sql_fulltext
from v$sqlarea where sql_id='druv8yxdgjwgm'
To check the execution plan of both capture and replay use the tables, with a similar query as above:
RAP.SQLPLAN_SNAP – Capture from production
RAP.SQLPLAN_SNAP_REPLAY – Replay
if you want to check how many were capture during the process, you can run this on production:
select count(distinct sql_id) from sql
union all
select count(distinct sql_id) from sqlplan
union all
select count(distinct sql_id) from bind
If you want to check if the hidden parameter is back after the process was completed:
Create this view as SYS user:
create
or replace view hide_param
as
select KSPPINM
name , KSPPSTVL value , KSPPDESC
description
from x$ksppi, x$ksppsv
where
x$ksppi.indx = x$ksppsv.indx;
select * from hide_param
where name ='_cursor_bind_capture_interval';
If you want to rerun the replay you will need to do it manually calling the run_snap_type_sql or dml, and then rerun the report procedures again.
There are many things you can do to analyze these data, and you can play around with the package and tables to personalize for your environment, if you think your changes are worth to share please do so.
Also if there are any errors, questions or you think in better way to do it please let me know.
http://dbabrain.blogspot.com