Vishal desai’s Oracle Blog

April 24, 2015

Wait Chain analysis using Tanel’s awesome ash_wait_chains script

Filed under: Oracle Database, Troubleshooting — Tags: — vishaldesai @ 7:14 pm

Our database performance engineering team was trying to troubleshoot high wait time on enq:SQ – contention and was working with Oracle support. Snippet from Active Sessions and AWR report is shown below:

 

image

image

image

I thought it could be classic case of Sequence concurrency in RAC environment. I started looking as dba_hist_active_sess_history and found all the sessions were waiting on same sequence object id (p2=362). Object id 362 belongs to sequence AUDSES$ which is used by AUD$ and sequence cache size was already set to 10000. In previous releases there were couple of bugs where sequence cache for AUDSES$ was not high enough and workaround was to fix cache size but this particular database is on 11.2.0.4 and cache size is also 10000 so there was no reason to believe we were hitting those bugs.

image

I manually started mining ASH for sample session that led me all the way from one instance to another instance’s lgwr session.

image

image

image

image

So for given session 1286 and few others wait chain was as follows:

image

Then I realized, instead of doing this manually, I could use Tanel’s ash_wait_chains.sql to see the overall impact. As you can see from top of the output that more than 70% of impact was along the same line.

 

@dash_wait_chains username||':'||event  1=1 1596 1596

%This     SECONDS WAIT_CHAIN
------ ---------- ----------------------------------------------------------------------------------------------------------------------------------------------------
  70%   105023520 -> SYS:enq: SQ - contention -> SYS:row cache lock -> SYS:log file switch (archiving needed) -> SYS:control file sequential read
   9%    13698720 -> SYS:enq: SQ - contention -> SYS:row cache lock -> SYS:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   6%     9803520 -> SYS:enq: SQ - contention -> SYS:log file switch (archiving needed) -> SYS:control file sequential read
   6%     9132480 -> SYS:enq: SQ - contention -> SYS:row cache lock -> SYS:log file switch (archiving needed) -> SYS:
   3%     4566240 -> SYS:enq: SQ - contention -> SYS:row cache lock -> SYS:log file switch (archiving needed) -> SYS:enq: CF - contention
   1%     1887840 -> SYS:buffer busy waits -> SYS:log file switch (archiving needed) -> SYS:control file sequential read
   1%     1278720 -> SYS:enq: SQ - contention -> SYS:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   1%     1159430 -> DBFS_dbname01:buffer busy waits -> DBFS_dbname01:log file switch (archiving needed) -> SYS:control file sequential read
   1%      852480 -> SYS:enq: SQ - contention -> SYS:log file switch (archiving needed) -> SYS:
   0%      695520 -> DBSNMP:buffer busy waits -> ETL_USER:log file switch (archiving needed) -> SYS:control file sequential read
   0%      695520 -> SYS:row cache lock -> SYS:log file switch (archiving needed) -> SYS:control file sequential read
   0%      426240 -> SYS:enq: SQ - contention -> SYS:log file switch (archiving needed) -> SYS:enq: CF - contention
   0%      246240 -> SYS:buffer busy waits -> SYS:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   0%      195500 -> SYS:log file switch (archiving needed) -> SYS:control file sequential read
   0%      182160 -> ETL_USER:log file switch (archiving needed) -> SYS:control file sequential read
   0%      164160 -> SYS:buffer busy waits -> SYS:log file switch (archiving needed) -> SYS:
   0%      151230 -> DBFS_dbname01:buffer busy waits -> DBFS_dbname01:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   0%      100820 -> DBFS_dbname01:buffer busy waits -> DBFS_dbname01:log file switch (archiving needed) -> SYS:
   0%       90720 -> SYS:row cache lock -> SYS:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   0%       90720 -> DBSNMP:buffer busy waits -> ETL_USER:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   0%       82080 -> SYS:buffer busy waits -> SYS:log file switch (archiving needed) -> SYS:enq: CF - contention
   0%       60480 -> SYS:row cache lock -> SYS:log file switch (archiving needed) -> SYS:
   0%       60480 -> DBSNMP:buffer busy waits -> ETL_USER:log file switch (archiving needed) -> SYS:
   0%       50410 -> DBFS_dbname01:buffer busy waits -> DBFS_dbname01:log file switch (archiving needed) -> SYS:enq: CF - contention
   0%       32660 -> DBFS_dbname01:log file switch (archiving needed) -> SYS:control file sequential read
   0%       30240 -> SYS:row cache lock -> SYS:log file switch (archiving needed) -> SYS:enq: CF - contention
   0%       30240 -> DBSNMP:buffer busy waits -> ETL_USER:log file switch (archiving needed) -> SYS:enq: CF - contention
   0%       25500 -> SYS:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   0%       23760 -> ETL_USER:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   0%       17000 -> SYS:log file switch (archiving needed) -> SYS:
   0%       15840 -> ETL_USER:log file switch (archiving needed) -> SYS:
   0%       13590 -> SYS:enq: SQ - contention -> ETL_USER:
   0%        8500 -> SYS:log file switch (archiving needed) -> SYS:enq: CF - contention
   0%        7920 -> ETL_USER:log file switch (archiving needed) -> SYS:enq: CF - contention
   0%        5520 -> DBSNMP:log file switch (archiving needed) -> SYS:control file sequential read
   0%        4260 -> DBFS_dbname01:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   0%        3600 -> ETL_USER:
   0%        3020 -> SYS:enq: SQ - contention -> ETL_USER:cell single block physical read
   0%        2840 -> DBFS_dbname01:log file switch (archiving needed) -> SYS:
   0%        1420 -> DBFS_dbname01:log file switch (archiving needed) -> SYS:enq: CF - contention
   0%         970 -> SYS:
   0%         870 -> GGADMIN:SQL*Net more data from client
   0%         720 -> DBSNMP:log file switch (archiving needed) -> SYS:Disk file Mirror Read
   0%         600 -> ETL_USER:cell single block physical read
   0%         480 -> DBSNMP:log file switch (archiving needed) -> SYS:
   0%         240 -> DBSNMP:log file switch (archiving needed) -> SYS:enq: CF - contention
   0%         230 -> SYS:control file sequential read
   0%         160 -> SYS:gc current request
   0%          90 -> SYS:flashback log file write
   0%          90 -> SYS:db file parallel write
   0%          80 -> ETL_USER:log file sync -> SYS:
   0%          80 -> ETL_USER:log file sync -> SYS:log file parallel write
   0%          80 -> GGADMIN:cell single block physical read
   0%          80 -> GGADMIN:
   0%          70 -> SYS:reliable message
   0%          70 -> SYS:log file sequential read
   0%          70 -> SYS:JS kgl get object wait
   0%          60 -> ETL_USER:resmgr:cpu quantum
   0%          40 -> SYS:log file parallel write
   0%          40 -> ETL_USER:cell multiblock physical read
   0%          30 -> SYS:ges lms sync during dynamic remastering and reconfig -> SYS:
   0%          30 -> GGADMIN:gcs drm freeze in enter server mode -> SYS:ges lms sync during dynamic remastering and reconfig -> SYS:
   0%          30 -> SYS:Disk file Mirror Read
   0%          20 -> SYS:enq: CF - contention
   0%          20 -> GGADMIN:Streams miscellaneous event
   0%          10 -> ETL_USER:direct path write temp
   0%          10 -> ETL_USER:gc cr grant 2-way
   0%          10 -> DBSNMP:log file sync -> SYS:log file parallel write
   0%          10 -> SYS:change tracking file synchronous read
   0%          10 -> U002585:
   0%          10 -> ETL_USER:direct path read temp
   0%          10 -> SYS:library cache lock
   0%          10 -> DBFS_dbname01:
   0%          10 -> ETL_USER:enq: FB - contention
   0%          10 -> ETL_USER:gc current block 3-way
   0%          10 -> ETL_USER:gc current block 2-way
   0%          10 -> SYS:ASM file metadata operation
   0%          10 -> DBFS_dbname01:direct path write
   0%          10 -> DBSNMP:log file sync -> SYS:

79 rows selected.

Further from below output its also clear that the ultimate blocker was session id 1262 on instance 2 (LGWR process). Line id with 3% impact matches my manual analysis.

 

dash_wait_chains instance_number||':'||session_id  1=1 1596 1596

%This     SECONDS WAIT_CHAIN
------ ---------- -----------------------------------------------------
  24%    35955360 -> 1:3164 -> 1:654 -> 2:2540 -> 2:1262
  21%    32447520 -> 1:2533 -> 1:654 -> 2:2540 -> 2:1262
  17%    25431840 -> 1:1914 -> 1:654 -> 2:2540 -> 2:1262
  13%    19293120 -> 1:1913 -> 1:654 -> 2:2540 -> 2:1262
  10%    14908320 -> 1:6 -> 1:654 -> 2:2540 -> 2:1262
   3%     4384800 -> 1:1286 -> 1:654 -> 2:2540 -> 2:1262
   1%     1503360 -> 2:2542 -> 2:2540 -> 2:1262
   1%     1503360 -> 2:1276 -> 2:2540 -> 2:1262
   1%     1503360 -> 2:2520 -> 2:2538 -> 2:1262
   1%     1503360 -> 2:1907 -> 2:2540 -> 2:1262
   1%     1503360 -> 2:3170 -> 2:2540 -> 2:1262
   1%     1503360 -> 2:2533 -> 2:2540 -> 2:1262
   1%     1503360 -> 2:1282 -> 2:2540 -> 2:1262
   1%     1503360 -> 2:25 -> 2:2540 -> 2:1262
   1%     1503360 -> 2:3171 -> 2:2540 -> 2:1262
   1%     1461890 -> 2:1906 -> 2:1909 -> 2:1262
   1%      876960 -> 1:654 -> 2:2540 -> 2:1262
   1%      876960 -> 2:1263 -> 2:2538 -> 2:1262
   0%      709920 -> 2:650 -> 2:3169 -> 2:1262
   0%      229680 -> 2:650 -> 2:2540 -> 2:1262
   0%      167040 -> 2:645 -> 2:3169 -> 2:1262
   0%      104400 -> 2:3163 -> 2:2540 -> 2:1262
   0%       20880 -> 2:653 -> 2:1262
   0%       20880 -> 2:1908 -> 2:1262
   0%       20880 -> 2:20 -> 2:1262
   0%       20880 -> 2:2540 -> 2:1262
   0%       20880 -> 2:1281 -> 2:1262
   0%       20880 -> 2:1269 -> 2:1262
   0%       20880 -> 2:2536 -> 2:1262
   0%       20880 -> 2:649 -> 2:1262
   0%       20880 -> 2:6 -> 2:1262
   0%       20880 -> 2:3169 -> 2:1262
   0%       20880 -> 2:1284 -> 2:1262
   0%       20880 -> 2:1896 -> 2:1262
   0%       20880 -> 2:2538 -> 2:1262
   0%       20880 -> 2:17 -> 2:1262
   0%       20880 -> 2:655 -> 2:1262
   0%       20880 -> 2:1280 -> 2:1262
   0%       20880 -> 2:1895 -> 2:1262
   0%       20880 -> 2:1899 -> 2:1262
   0%       20880 -> 2:3158 -> 2:1262
   0%       20880 -> 2:3167 -> 2:1262
   0%       20880 -> 2:24 -> 2:1262
   0%       20880 -> 2:1288 -> 2:1262
   0%       20590 -> 2:1909 -> 2:1262
   0%       20590 -> 2:2541 -> 2:1262
   0%       16820 -> 2:1913 -> 2:1262
   0%        6960 -> 2:2523 -> 2:1262
   0%        4510 -> 1:3164 -> 1:654
   0%        4070 -> 1:2533 -> 1:654
   0%        3190 -> 1:1914 -> 1:654
   0%        2420 -> 1:1913 -> 1:654
   0%        1870 -> 1:6 -> 1:654
   0%         880 -> 3:3162
   0%         720 -> 2:632

Now the question is what was LGWR process doing and why was it constantly trying to read control file. If you run 10046 on LGWR process you will notice that LGWR will read control file while doing log switches.

WAIT #0: nam='control file sequential read' ela= 59 file#=0 block#=1 blocks=1 obj#=-1 tim=1429886768732384
WAIT #0: nam='control file sequential read' ela= 203 file#=0 block#=40 blocks=1 obj#=-1 tim=142988668732612
WAIT #0: nam='control file sequential read' ela= 197 file#=0 block#=42 blocks=1 obj#=-1 tim=1429886768732845
WAIT #0: nam='control file sequential read' ela= 216 file#=0 block#=1 blocks=1 obj#=-1 tim=1429886768733206
WAIT #0: nam='control file sequential read' ela= 192 file#=1 block#=1 blocks=1 obj#=-1 tim=1429886768733440
WAIT #0: nam='control file sequential read' ela= 200 file#=0 block#=40 blocks=1 obj#=-1 tim=1429886768733674
WAIT #0: nam='control file sequential read' ela= 198 file#=0 block#=42 blocks=1 obj#=-1 tim=1429886768733909

Then I checked alert log on instance 2 and found number of occurrences of following errors:

Tue Apr 21 19:59:21 2015
ORA-19815: WARNING: db_recovery_file_dest_size of 858993459200 bytes is 100.00% used, and has 0 remaining bytes available.
ARC1: Error 19809 Creating archive log file to '+RECO1'
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance dbname2 - Archival Error
ORA-16038: log 21 sequence# 2506 cannot be archived
ORA-19809: limit exceeded for recovery files

Session id 1262 on instance 2 was doing number of small control file sequential read and none of single IO was stuck in large wait.

@event_hist.sql control.*read session_id=1262 1596 1596
                                 Wait time    Num ASH   Estimated    Estimated    % Event  Estimated
Wait Event                      bucket ms+    Samples Total Waits    Total Sec       Time  Time Graph
------------------------------ ----------- ---------- ----------- ------------ ---------- ------------
control file sequential read           < 1         17     66149.3         66.1       88.3 |######### |
                                      < 16          2       171.8          2.7        3.6 |          |
                                      < 32          3       128.2          4.1        5.5 |#         |
                                     < 128          1        15.6          2.0        2.7 |          |

 

image

Majority of sessions were ultimately waiting for session id 1262 on instance 2 (LGWR process). I think wait counts for control file read went up as LGWR was constantly trying to switch log file but it couldn’t as recovery destination was full. So enq: SQ – contention was the symptom of LGWR spinning on log switch due to no space in recovery destination(db_recovery_file_dest_size).

References:

Advanced Oracle Troubleshooting Guide – Part 11: Complex Wait Chain Signature Analysis

Diagnosing buffer busy waits with the ash_wait_chains.sql script (v0.2)

Update: 04/27/2015

I deliberately filled recovery destination area and then tried to switch log files. Switch log file command was hung and LGWR 10046 trace showed repeated loop of trying to read control file blocks. First two lines shows it was trying to read control file header block from multiplexed control files. Below is the excerpt from LGWR trace file.

nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 
nam='control file sequential read' file#=0 block#=40 
nam='control file sequential read' file#=0 block#=42 
nam='control file sequential read' file#=0 block#=43 
nam='control file sequential read' file#=0 block#=815 
nam='control file sequential read' file#=0 block#=46 
nam='control file sequential read' file#=0 block#=1 
nam='control file sequential read' file#=1 block#=1 

Advertisements

3 Comments »

  1. I hit a similar issue on 10G, it looks like whenever the DB is stuck on log writer, if there was something else is stuck at the exact same time for concurrency, they continuously increase counter. I even remember discussing this with Tanel back in 2009, his recommendation was to suspend logwr to confirm the theory but I was to lazy.

    Comment by coskan — April 25, 2015 @ 1:34 pm

    • Coskan, I think I remember this chat (I hope we were talking about suspending the LGWR in a test environment, not production 😉

      Comment by Tanel Poder — April 26, 2015 @ 7:51 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: