Tuesday, October 11, 2011

ORA-00030 raised on kill session - troubleshoot with Oracle pstack

It all started with a DDL I had to perform - a Truncate.
This failed as it follows:
truncate table TRANSACTION_FLAGS_IOT;

truncate table TRANSACTION_FLAGS_IOT
Error report:
SQL Error: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
00054. 00000 -  "resource busy and acquire with NOWAIT specified"
*Cause:    Resource interested is busy.
*Action:   Retry if necessary.

The error tells us that I still have an active DML on the resource, which have found being true, using the below query:

select session_id,owner,name,mode_held,blocking_others;
from dba_dml_locks
where name like 'TRANSACTION_FLAGS_IOT'
151 SMART TRANSACTION_FLAGS_IOT Row-X (SX) Not Blocking

Now, I want to see some more details on this session:

select
sid, serial#,paddr, status,type, last_call_et,blocking_Session,event,wait_class,state, command
from v$session
where sid=151

151 2439 000000015F4DBEA0 ACTIVE USER 537179  db file sequential read User I/O WAITING 0

Since this session is in Waiting state for so much time, I confirmed with users this can be safely killed:

alter system kill session '151,2439';

alter system kill session '151,2439'
Error report:
SQL Error: ORA-00030: User session ID does not exist.
00030. 00000 -  "User session ID does not exist."
*Cause:    The user session ID no longer exists, probably because the
           session was logged out.
*Action:   Use a valid session ID.

This normally happens when you want to kill a session that has a (large) transaction to rollback.
We cannot kill it since it has to rollback, so we expect that the serial# in v$session for the process will keep changing, but in my case, this change of serial# does not happen.
Such cleanup of a rollback is on a low priority, done by SMON background process.
I kept checking and checking, my transaction was not doing a rollback in the shadow, I suspect it was somehow blocked.
Now I don't want to kill the OS process associate with my session, I want to understand where is stuck.
So I identify the OS process as being 1783:

select addr,pid,spid,username,pga_used_mem from v$process where addr = '000000015F4DBEA0'
000000015F4DBEA0 22 1783 oracle 2343004

I verify the existance of the process on the DB server:

[oracle@performance ~]$ ps -flp 1783
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 D oracle    1783     1  0  77   0 - 1108041 sync_p Aug10 ?      00:54:24 oracletoptech (LOCAL=NO)

Now, in order to track the operations done in the background by my process, I take a pstack (process stack - current state of server process 1783):

[oracle@performance ~]$ pstack 1783
#0  0x000000364c60e3f3 in __pread_nocancel () from /lib64/libpthread.so.0
#1  0x00000000085b9ee1 in skgfqio ()
#2  0x000000000849c745 in ksfd_skgfqio ()
#3  0x000000000849ba4b in ksfd_io ()
#4  0x000000000849b104 in ksfdread ()
#5  0x0000000001b28a4d in kcfrbd1 ()
#6  0x0000000000d1d9c8 in kcbzib ()
#7  0x00000000082e5a04 in kcbgcur ()
#8  0x00000000082adb53 in ktbgcur ()
#9  0x00000000015353d5 in kdiulk ()
#10 0x0000000008476bf4 in kcoubk ()
#11 0x0000000008291fce in ktundo ()
#12 0x0000000000c53a5f in kturCurrBackoutOneChg ()
#13 0x0000000000c52037 in ktuabt ()
#14 0x0000000000c4c516 in ktcAbortTxn ()
#15 0x0000000000c46a23 in ktcsod ()
#16 0x000000000822c757 in kssdel ()
#17 0x000000000822da91 in kssdch ()
#18 0x0000000000b5dc5b in ksuxds ()
#19 0x0000000000b5d1b7 in ksudel ()
#20 0x00000000014394e1 in opidcl ()
#21 0x0000000001437bb1 in opidrv ()
#22 0x00000000018aac5b in sou2o ()
#23 0x00000000009d3eb5 in opimai_real ()
#24 0x00000000018afeea in ssthrdmain ()
#25 0x00000000009d3e21 in main ()

This shows that my process is doing some work, and have to identify where is waiting.
The mnemonics shown are described in ORA-600 Lookup Error Categories [ID 175982.1], for example, reading it buttom-up
    - ktc - transaction control operations at the block level
    - ktcAbortTxn - function to abort transaction
    - ktu - internal management of undo and rollback segments
The last (from buttom-up) command is __pread_nocancel () from /lib64/libpthread.so.0
The pread system call does an atomic position-and-read, in our case of /lib64/libpthread.so.0.
So I am having a blocking read, but why on /lib64/libpthread.so.0? Do I have this file, why my Oracle account is not able to read it?

[oracle@performance ~]$ cd /lib64
[oracle@performance lib64]$ ls -lrt libpthread.so.0
lrwxrwxrwx 1 root root 17 Nov 23  2010 libpthread.so.0 -> libpthread-2.5.so

The file is there, is defined as a symbolic link, but am not able to read it.
Now I will look in my environment variable LD_LIBRARY_PATH and will find the answer, the 64 bit library in not defined in my environment:

[oracle@performance ~]$ echo $LD_LIBRARY_PATH
/u01/app/oracle/product/11.1.0/db_1/lib:/lib:/usr/lib

So the /lib64/libpthread.so.0 could have never been able to be read with my current definition of the $LD_LIBRARY_PATH.
I had to do a classic kill -9 1783 in order to kill the process and continue to innitiate the Truncate command.
I then changed the variable definition.

References
ORA-600 Lookup Error Categories [ID 175982.1]
ORA-04031 "KSFQ Buffers" ksmlgpalloc [ID 453521.1]
http://blog.tanelpoder.com/2007/09/06/advanced-oracle-troubleshooting-guide-part-3-more-adventures-in-process-stack/

Monday, October 10, 2011

Read alert log file in Oracle 11g

In Oracle 11g we have the possibility to read the alert log file directly from a X$ view.
This view can be read, only when connected with sysdba role.

Below sql will display the content of the alert log file for the current sysdate , however you can restrict the result set based on your needs:

select to_char(originating_timestamp, 'DD-MON-YYYY HH:MI:SS HH24') "TIME",
       MESSAGE_TEXT
from   X$DBGALERTEXT
where  to_char(originating_timestamp,'DD-MON-YYYY') = TO_CHAR(SYSDATE,'DD-MON-YYYY')
ORDER BY originating_timestamp ASC

Change DB name for a cloned DB

Clone a DB and ORACLE_HOME:

1) perform a cold backup of source DB, and copy the following to the target machine              :
  a) oracle software
  b) oracle home and datafiles (datafiles, controlfiles, redo log files)
  c) configuration files, environment variables, system files
 
2) on the target machine
  a) edit the init ora file and keep the db_name and db_unique_name as in source DB
3) start-up and mount the database with the source db name unchanged
SQL> startup pfile='/u01/DEV3/apps/proddb/11.2.0/dbs/initDEV3.ora nomount;
ORACLE instance started.
Total System Global Area 5344731136 bytes
Fixed Size                  2213136 bytes
Variable Size            3489663728 bytes
Database Buffers         1811939328 bytes
Redo Buffers               40914944 bytes
SQL> alter database mount;
Database altered.
SQL> exit

4) backup the control file to trace as 'file_name' in order to edit the path of the datafiles
SQL> alter database backup controlfile to trace as '/u01/DEV3/apps/proddb/11.2.0/dbs/ctl_bck.sql';
Database altered.
SQL> exit

5) edit the control file sql script in order for the datafiles and redo log files to point to the correct location on the target machine

6) bring the database to nomount in order to run the modified control file script
SQL> shutdown immediate
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.
SQL> startup pfile='/u01/DEV3/apps/proddb/11.2.0/dbs/initDEV3.ora' nomount;
ORACLE instance started.
Total System Global Area 5344731136 bytes
Fixed Size                  2213136 bytes
Variable Size            3489663728 bytes
Database Buffers         1811939328 bytes
Redo Buffers               40914944 bytes
SQL>  @ctl_bck.sql
Control file created.

SQL> alter database mount;
alter database mount
*
ERROR at line 1:
ORA-01100: database already mounted

SQL> exit

The error is raised since the control file script already contains the statement to mount the database.

7) Open the database in order for the file header to synchronize and your new controlfile to become current, then shut it down:
SQL> alter database open;
Database altered.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
8) Startup mount - in order to change the DB name         
SQL> startup pfile='/u01/DEV3/apps/proddb/11.2.0/dbs/initDEV3.ora' nomount;
ORACLE instance started.
Total System Global Area 5344731136 bytes
Fixed Size                  2213136 bytes
Variable Size            3489663728 bytes
Database Buffers         1811939328 bytes
Redo Buffers               40914944 bytes
SQL> alter database mount;
Database altered.
             
9) Change the DBID, using 'nid'
applprod@gvaebsdb3:/u01/DEV3/apps/proddb/11.2.0/dbs $ nid target=system/<password> DBNAME=DEV3
DBNEWID: Release 11.2.0.1.0 - Production on Mon Oct 10 12:38:44 2011
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
Connected to database PROD (DBID=141797709)
Connected to server version 11.2.0
Control Files in database:
    /u02/DEV3/dbf/DEV3/control01.ctl
    /u01/DEV3/apps/proddb/flash_recovery_area/PROD/control02.ctl
Change database ID and database name PROD to DEV3? (Y/[N]) => Y
Proceeding with operation
Changing database ID from 141797709 to 3627549972
Changing database name from PROD to DEV3
    Control File /u02/DEV3/dbf/DEV3/control01.ctl - modified
    Control File /u01/DEV3/apps/proddb/flash_recovery_area/PROD/control02.ctl - modified
    Datafile /u02/DEV3/dbf/DEV3/system01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/sysaux01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_MEDIA03.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/users01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/example01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/UNDO01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_ARCHIVE01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_NOLOGGING01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_QUEUES01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_SEED01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_SUMMARY01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TOOLS01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/DBA01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/DISCOVERER01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/INTERIM.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_INTERFACE01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_INTERFACE02.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_MEDIA01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_MEDIA02.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_IDX01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_IDX02.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_IDX03.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_IDX04.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_IDX05.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_IDX06.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_IDX07.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_DATA01.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_DATA02.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_DATA03.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_DATA04.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_DATA05.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_DATA06.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TX_DATA07.db - dbid changed, wrote new name
    Datafile /u02/DEV3/dbf/DEV3/APPS_TS_TEM_MV01.db - dbid changed, wrote new name
    Control File /u02/DEV3/dbf/DEV3/control01.ctl - dbid changed, wrote new name
    Control File /u01/DEV3/apps/proddb/flash_recovery_area/PROD/control02.ctl - dbid changed, wrote new name
    Instance shut down
Database name changed to DEV3.
Modify parameter file and generate a new password file before restarting.
Database ID for database DEV3 changed to 3627549972.
All previous backups and archived redo logs for this database are unusable.
Database is not aware of previous backups and archived logs in Recovery Area.
Database has been shutdown, open database with RESETLOGS option.
Succesfully changed database name and ID.
DBNEWID - Completed succesfully.

10) Edit the init<SID>.ora file and change the db_name and db_unique_name to the new DB name
11) Start the db using the new init.ora file and check the status and the name of the db:
applprod@gvaebsdb3:/u01/DEV3/apps/proddb/11.2.0/dbs $ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Mon Oct 10 12:41:56 2011
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup pfile='/u01/DEV3/apps/proddb/11.2.0/dbs/initDEV3.ora'
ORACLE instance started.
Total System Global Area 5344731136 bytes
Fixed Size                  2213136 bytes
Variable Size            3489663728 bytes
Database Buffers         1811939328 bytes
Redo Buffers               40914944 bytes
Database mounted.
ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

SQL> alter database open resetlogs;
Database altered.
SQL> select status from v$instance;
STATUS
------------
OPEN
SQL> select name,db_unique_name from v$database;
NAME      DB_UNIQUE_NAME
--------- ------------------------------
DEV3      DEV3

12) Create a spfile from pfile in order for the database to start using a spfile on the next startup
SQL> create spfile from pfile;
File created.

References: MOS: [ID 28433.1]