It all started with a DDL I had to perform - a Truncate.
This failed as it follows:
truncate table TRANSACTION_FLAGS_IOT;
truncate table TRANSACTION_FLAGS_IOT
Error report:
SQL Error: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
00054. 00000 - "resource busy and acquire with NOWAIT specified"
*Cause: Resource interested is busy.
*Action: Retry if necessary.
The error tells us that I still have an active DML on the resource, which have found being true, using the below query:
select session_id,owner,name,mode_held,blocking_others;
from dba_dml_locks
where name like 'TRANSACTION_FLAGS_IOT'
151 SMART TRANSACTION_FLAGS_IOT Row-X (SX) Not Blocking
Now, I want to see some more details on this session:
select
sid, serial#,paddr, status,type, last_call_et,blocking_Session,event,wait_class,state, command
from v$session
where sid=151
151 2439 000000015F4DBEA0 ACTIVE USER 537179 db file sequential read User I/O WAITING 0
Since this session is in Waiting state for so much time, I confirmed with users this can be safely killed:
alter system kill session '151,2439';
alter system kill session '151,2439'
Error report:
SQL Error: ORA-00030: User session ID does not exist.
00030. 00000 - "User session ID does not exist."
*Cause: The user session ID no longer exists, probably because the
session was logged out.
*Action: Use a valid session ID.
This normally happens when you want to kill a session that has a (large) transaction to rollback.
We cannot kill it since it has to rollback, so we expect that the serial# in v$session for the process will keep changing, but in my case, this change of serial# does not happen.
Such cleanup of a rollback is on a low priority, done by SMON background process.
I kept checking and checking, my transaction was not doing a rollback in the shadow, I suspect it was somehow blocked.
Now I don't want to kill the OS process associate with my session, I want to understand where is stuck.
So I identify the OS process as being 1783:
select addr,pid,spid,username,pga_used_mem from v$process where addr = '000000015F4DBEA0'
000000015F4DBEA0 22 1783 oracle 2343004
I verify the existance of the process on the DB server:
[oracle@performance ~]$ ps -flp 1783
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
0 D oracle 1783 1 0 77 0 - 1108041 sync_p Aug10 ? 00:54:24 oracletoptech (LOCAL=NO)
Now, in order to track the operations done in the background by my process, I take a pstack (process stack - current state of server process 1783):
[oracle@performance ~]$ pstack 1783
#0 0x000000364c60e3f3 in __pread_nocancel () from /lib64/libpthread.so.0
#1 0x00000000085b9ee1 in skgfqio ()
#2 0x000000000849c745 in ksfd_skgfqio ()
#3 0x000000000849ba4b in ksfd_io ()
#4 0x000000000849b104 in ksfdread ()
#5 0x0000000001b28a4d in kcfrbd1 ()
#6 0x0000000000d1d9c8 in kcbzib ()
#7 0x00000000082e5a04 in kcbgcur ()
#8 0x00000000082adb53 in ktbgcur ()
#9 0x00000000015353d5 in kdiulk ()
#10 0x0000000008476bf4 in kcoubk ()
#11 0x0000000008291fce in ktundo ()
#12 0x0000000000c53a5f in kturCurrBackoutOneChg ()
#13 0x0000000000c52037 in ktuabt ()
#14 0x0000000000c4c516 in ktcAbortTxn ()
#15 0x0000000000c46a23 in ktcsod ()
#16 0x000000000822c757 in kssdel ()
#17 0x000000000822da91 in kssdch ()
#18 0x0000000000b5dc5b in ksuxds ()
#19 0x0000000000b5d1b7 in ksudel ()
#20 0x00000000014394e1 in opidcl ()
#21 0x0000000001437bb1 in opidrv ()
#22 0x00000000018aac5b in sou2o ()
#23 0x00000000009d3eb5 in opimai_real ()
#24 0x00000000018afeea in ssthrdmain ()
#25 0x00000000009d3e21 in main ()
This shows that my process is doing some work, and have to identify where is waiting.
The mnemonics shown are described in ORA-600 Lookup Error Categories [ID 175982.1], for example, reading it buttom-up
- ktc - transaction control operations at the block level
- ktcAbortTxn - function to abort transaction
- ktu - internal management of undo and rollback segments
The last (from buttom-up) command is __pread_nocancel () from /lib64/libpthread.so.0
The pread system call does an atomic position-and-read, in our case of /lib64/libpthread.so.0.
So I am having a blocking read, but why on /lib64/libpthread.so.0? Do I have this file, why my Oracle account is not able to read it?
[oracle@performance ~]$ cd /lib64
[oracle@performance lib64]$ ls -lrt libpthread.so.0
lrwxrwxrwx 1 root root 17 Nov 23 2010 libpthread.so.0 -> libpthread-2.5.so
The file is there, is defined as a symbolic link, but am not able to read it.
Now I will look in my environment variable LD_LIBRARY_PATH and will find the answer, the 64 bit library in not defined in my environment:
[oracle@performance ~]$ echo $LD_LIBRARY_PATH
/u01/app/oracle/product/11.1.0/db_1/lib:/lib:/usr/lib
So the /lib64/libpthread.so.0 could have never been able to be read with my current definition of the $LD_LIBRARY_PATH.
I had to do a classic kill -9 1783 in order to kill the process and continue to innitiate the Truncate command.
I then changed the variable definition.
References
ORA-600 Lookup Error Categories [ID 175982.1]
ORA-04031 "KSFQ Buffers" ksmlgpalloc [ID 453521.1]
http://blog.tanelpoder.com/2007/09/06/advanced-oracle-troubleshooting-guide-part-3-more-adventures-in-process-stack/
thanks, it helped me a lot!
ReplyDelete