Ken's DBA blog

Thursday, December 19, 2019

How to merge two large tables? (Oracle)

Scenario:
There are two large tables, table A has 3 billion records and table B has 80 million records. How to merge them into one table?

Solutions:
################
# Solution 1:
################
Use a MERGE SQL statement with parallel execution.
--pros: simple and easy to understand
--cons: hardware requirements are relatively high
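
A minimal sketch of what this could look like. The table names table_a and table_b, the columns id and val, and the degree of parallelism 8 are made up for illustration; adjust them to your own schema and hardware.

```sql
-- Parallel DML is disabled by default in a session, so enable it first.
ALTER SESSION ENABLE PARALLEL DML;

-- Hypothetical tables: table_a (3 billion rows) is the target,
-- table_b (80 million rows) is the source.
MERGE /*+ PARALLEL(a 8) PARALLEL(b 8) */ INTO table_a a
USING table_b b
ON (a.id = b.id)
WHEN MATCHED THEN
  UPDATE SET a.val = b.val
WHEN NOT MATCHED THEN
  INSERT (id, val) VALUES (b.id, b.val);

COMMIT;
```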

################
# Solution 2:
################

Hash partition both tables by ID, then use MERGE with parallel execution. This lets Oracle use a partition-wise join (PWJ).
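
As a rough sketch, both tables could be created with the same hash partitioning on the join key, so that matching IDs land in matching partitions. The table names, columns and the partition count of 64 are assumptions for illustration only.

```sql
-- Hypothetical hash-partitioned copies of A and B, both partitioned
-- on the join column with the same number of partitions.
CREATE TABLE table_a_hash (
  id  NUMBER NOT NULL,
  val VARCHAR2(100)
)
PARTITION BY HASH (id) PARTITIONS 64;

CREATE TABLE table_b_hash (
  id  NUMBER NOT NULL,
  val VARCHAR2(100)
)
PARTITION BY HASH (id) PARTITIONS 64;
```

A full partition-wise join needs both tables partitioned the same way on the join column. Check the execution plan to confirm it is actually being used.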

################
# Solution 3:
################

Create a new table C, then insert into C using a full outer join of tables A and B. Then rename table C, rebuild indexes, etc.

--PWJ + parallel + nologging + compress
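
A sketch of this approach as a single CREATE TABLE ... AS SELECT. The names table_a, table_b, table_c, the columns id and val, and the rule "the row from B wins when the ID exists in both tables" are assumptions for illustration.

```sql
-- Build the merged table in one pass, with minimal redo and compression.
CREATE TABLE table_c
  NOLOGGING
  COMPRESS
  PARALLEL 8
AS
SELECT COALESCE(b.id, a.id) AS id,
       -- take B's value whenever the row exists in B, otherwise keep A's
       CASE WHEN b.id IS NOT NULL THEN b.val ELSE a.val END AS val
FROM   table_a a
FULL OUTER JOIN table_b b ON a.id = b.id;

-- Swap the new table in; indexes, constraints and grants
-- still have to be recreated on it afterwards.
ALTER TABLE table_a RENAME TO table_a_old;
ALTER TABLE table_c RENAME TO table_a;
```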

################
# Solution 4:
################

11g and later only

Use DBMS_PARALLEL_EXECUTE to split table B (which has fewer records) into chunks by ROWID. Then use a PL/SQL cursor to loop through the smaller chunks.

--pros: less rollback/UNDO

--cons: very slow when using a cursor.
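
A sketch of the chunking part. Note that it runs one MERGE per chunk instead of a row-by-row cursor loop, which is a swapped-in variation and not exactly the approach described above. The task name, table names, columns, chunk size and parallel level are all assumptions.

```sql
DECLARE
  l_task VARCHAR2(30) := 'MERGE_B_INTO_A';  -- hypothetical task name
  l_sql  VARCHAR2(4000);
BEGIN
  DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => l_task);

  -- Split the smaller table into ROWID ranges of roughly 100,000 rows each.
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
    task_name   => l_task,
    table_owner => USER,
    table_name  => 'TABLE_B',
    by_row      => TRUE,
    chunk_size  => 100000);

  -- :start_id and :end_id are bound to each chunk's ROWID range.
  l_sql := 'MERGE INTO table_a a
            USING (SELECT id, val FROM table_b
                   WHERE rowid BETWEEN :start_id AND :end_id) b
            ON (a.id = b.id)
            WHEN MATCHED THEN UPDATE SET a.val = b.val
            WHEN NOT MATCHED THEN INSERT (id, val) VALUES (b.id, b.val)';

  -- Each chunk runs and commits on its own, which keeps UNDO usage small.
  DBMS_PARALLEL_EXECUTE.RUN_TASK(
    task_name      => l_task,
    sql_stmt       => l_sql,
    language_flag  => DBMS_SQL.NATIVE,
    parallel_level => 8);

  DBMS_PARALLEL_EXECUTE.DROP_TASK(task_name => l_task);
END;
/
```

In a real run, check the task status (for example in USER_PARALLEL_EXECUTE_CHUNKS) before dropping the task, so that failed chunks can be retried.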

################
# Solution 5:
################
Compare the records in A and B and remove the duplicates. Then do a direct INSERT.
--pros: no MERGE, less expensive
--cons: more DML statements than the other solutions
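
One possible reading of this, sketched below: delete from B the rows whose ID already exists in A, then load what is left with a direct-path insert. The table names, columns and the choice to delete from B (rather than from A) are assumptions.

```sql
-- Step 1: drop the rows from B that already exist in A.
DELETE FROM table_b b
WHERE EXISTS (SELECT 1 FROM table_a a WHERE a.id = b.id);

COMMIT;

-- Step 2: direct-path, parallel insert of the remaining rows.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(table_a 8) */ INTO table_a
SELECT /*+ PARALLEL(b 8) */ b.id, b.val
FROM   table_b b;

-- A direct-path insert must be committed before the same session
-- can query the table again.
COMMIT;
```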

How to see progress of long running sql statements? (Oracle)

Every now and then, teammates ask you when their SQL statements will finish. Here is a script that shows an estimate. It can be used for any long-running operation, including RMAN jobs.

Sometimes this estimate is not accurate. For some long-running jobs, elapsed_seconds may frequently reset to zero or to a few seconds. That means the job is progressing and is not hung.

prompt ====================================================
prompt == Start of check_longops.sql
prompt ====================================================

prompt ====================================================
prompt == Check Long Operation notes
prompt == These are just Estimations and sometimes not accurate.
prompt ==
prompt ==
prompt ====================================================

SET PAGES 100
SET LINE 200
COLUMN sid FORMAT 99999
COLUMN serial# FORMAT 9999999
COLUMN machine FORMAT A20
COLUMN progress_pct FORMAT 99999999.00
COLUMN elapsed FORMAT A10
COLUMN remaining FORMAT A10
COLUMN message FORMAT a80
COLUMN opname FORMAT a20
column username format a20

SELECT s.sid,
sl.username,
opname,
MESSAGE,
s.serial#,
s.machine,
-- minutes:seconds; TRUNC (not ROUND) so that 90 seconds shows as 1:30, not 2:30
TRUNC (sl.elapsed_seconds / 60) || ':' || LPAD (MOD (sl.elapsed_seconds, 60), 2, '0')
elapsed,
TRUNC (sl.time_remaining / 60) || ':' || LPAD (MOD (sl.time_remaining, 60), 2, '0')
remaining,
DECODE (totalwork, 0, 0, ROUND (sl.sofar / sl.totalwork * 100, 2))
progress_pct
FROM gv$session s, gv$session_longops sl
-- gv$ views cover all RAC instances, so match on inst_id as well
WHERE s.inst_id = sl.inst_id AND s.sid = sl.sid AND s.serial# = sl.serial# AND sofar <> totalwork;

prompt ====================================================
prompt == End of check_longops.sql
prompt ====================================================
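
If a teammate just wants a clock time, the same view can be turned into an estimated finish time. This is only a small sketch built on the time_remaining column (in seconds) and it inherits all the inaccuracy mentioned above.

```sql
-- Estimated finish time = now + remaining seconds (86400 seconds per day).
SELECT sid,
       serial#,
       opname,
       TO_CHAR(SYSDATE + time_remaining / 86400,
               'YYYY-MM-DD HH24:MI:SS') AS estimated_finish
FROM   v$session_longops
WHERE  totalwork > 0
AND    sofar <> totalwork;
```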

Wednesday, December 18, 2019

How to check locking status? (Oracle)

When you are called to help with an application hang, you want to know whether any database sessions are blocked, and which sessions, users and machines are blocking them.

The following script helps you find out whether there are any blocking locks in the database and who is holding them.

prompt ====================================================
prompt == Start of check_lock.sql
prompt ====================================================
prompt == Check Locks notes
prompt ==
prompt == Who is blocking whom?
prompt ==
prompt ====================================================

select s1.username || ' '|| s1.machine
|| ' ( SID=' || s1.sid || ' ) is blocking '
|| s2.username || ' '|| s2.machine || ' ( SID=' || s2.sid || ' ) ' AS blocking_status , s2.event as wait_event
from v$lock l1, v$session s1, v$lock l2, v$session s2
where s1.sid=l1.sid and s2.sid=l2.sid
and l1.BLOCK=1 and l2.request > 0
and l1.id1 = l2.id1
and l1.id2 = l2.id2 ;

prompt ====================================================
prompt == End of check_lock.sql
prompt ====================================================
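
As a quick cross-check, v$session itself exposes the blocker in its blocking_session column (10g and later). A minimal sketch:

```sql
-- One row per blocked session, with the SID of the session blocking it.
SELECT sid,
       serial#,
       username,
       machine,
       blocking_session,
       event,
       seconds_in_wait
FROM   v$session
WHERE  blocking_session IS NOT NULL;
```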

How to check IO performance? (Oracle)

During performance tuning, or when called to debug a performance issue in the database system, we always want to know the current I/O status and how it compares with baseline periods. This is a handy script that gives you useful information.

Please keep in mind that busy I/O does not necessarily mean that I/O is the bottleneck or the culprit of the performance issue.

prompt ====================================================
prompt == Start of check_io.sql
prompt ====================================================

prompt ====================================================
prompt == Check IO notes
prompt == The OS should support asynchronous I/O for better performance.
prompt == In general, for events with 1000+ waits per hour, the average
prompt == wait time should be less than 15~20 ms.
prompt ==
prompt == Single Block -->wait on db file sequential read -->index visit or rollback
prompt ==
prompt == Multi Block -->wait on db file scattered read -->full table scan or index fast full scan
prompt ==
prompt ==
prompt ==
prompt == How to Tell if the I/O of the Database is Slow (Doc ID 1275596.1)
prompt ====================================================
prompt ====================================================
prompt == Show IO Stats
prompt ====================================================
set lines 300
set pages 300
select * from DBA_HIST_IOSTAT_DETAIL;

prompt ====================================================
prompt == Show IO Metric
prompt ====================================================
select * from dba_hist_sysmetric_history where metric_name like 'I/O%Megabytes%';

prompt ====================================================
prompt == End of check_io.sql
prompt ====================================================
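
To compare against the 15~20 ms rule of thumb in the notes, the average wait time for the two read events can be pulled from v$system_event. This is a small sketch; the numbers are cumulative since instance startup, so they smooth over short spikes.

```sql
-- Average wait in milliseconds for single-block and multi-block reads.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / total_waits / 1000, 2) AS avg_wait_ms
FROM   v$system_event
WHERE  event IN ('db file sequential read', 'db file scattered read')
AND    total_waits > 0;
```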

How to generate a report of all invalid objects in the database? (Oracle)

When should we use this script?
--Before and after upgrading your database
--Before and after deploying new DDL changes to your database
--Debug ORA-04063 or similar errors

DOC
#######################################################################
#######################################################################
The following statements generate an HTML report, Invalid.html,
which shows details of the invalid objects in the database.
Please note that the last two statements might fail in higher db
versions.
#######################################################################
#######################################################################
#

spool Invalid.html

set pagesize 50000
set markup html on

col comp_id format a10
col comp_name format a30
col version format a10
col object_name format a30
col owner format a15
select * from v$version;
select comp_id, comp_name, status, version from dba_registry;
select count(*) from dba_objects where status='INVALID';
select owner,object_type,count(*) from dba_objects where status='INVALID' group by owner,object_type;
select owner,object_name,object_type, status from dba_objects where status = 'INVALID';
select owner,object_name,object_type, status from dba_objects where status = 'INVALID' and owner IN ('SYSTEM' , 'SYS' );
select comp_name, version,status from dba_registry where STATUS <> 'VALID';
select * from registry$sys_inv_objs;
select * from registry$nonsys_inv_objs;

set markup html off
spool off

Tuesday, December 17, 2019

How to find the sessions that are using the most CPU? (Oracle)

Here is a handy script I usually use.

check_cpu.sql

prompt ====================================================
prompt == Start of check_cpu.sql
prompt ====================================================

prompt ====================================================
prompt == Check CPU notes
prompt == OS level CPU usage can be monitored with top or topas,
prompt == where you should be able to pinpoint the top
prompt == process ID
prompt ==
prompt ====================================================
prompt ====================================================
prompt == Show top 20 CPU usage sessions
prompt ====================================================
set lines 300
set pages 300
col machine format a20
col program format a30
col name format a30
select * from
(SELECT se.username, ss.sid,se.machine,se.program, name, se.status, ROUND (value/100) "CPU Usage Seconds", se.sql_id
FROM v$session se, v$sesstat ss, v$statname st
WHERE ss.statistic# = st.statistic#
AND name LIKE '%CPU used by this session%'
AND se.sid = ss.SID
AND se.username IS NOT NULL
ORDER BY value DESC)
where rownum <=20;

prompt ====================================================
prompt == Check Sql Text related to above sql_id
prompt ====================================================
SET LONG 100000
select sql_id, SQL_TEXT from v$SQL where SQL_ID in (
select sql_id from
(SELECT se.username, ss.sid,se.machine,se.program, name, se.status, ROUND (value/100) "CPU Usage Seconds", se.sql_id
FROM v$session se, v$sesstat ss, v$statname st
WHERE ss.statistic# = st.statistic#
AND name LIKE '%CPU used by this session%'
AND se.sid = ss.SID
AND se.username IS NOT NULL
ORDER BY value DESC)
where rownum <=20
);

prompt ====================================================
prompt == End of check_cpu.sql
prompt ====================================================
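
Once top or topas has given you the top OS process ID, it can be mapped back to a database session through v$process. A small sketch; os_pid is just a SQL*Plus substitution variable name chosen here, and this applies to dedicated server processes.

```sql
-- Map an OS process ID (from top/topas) to its database session.
SELECT s.sid,
       s.serial#,
       s.username,
       s.machine,
       s.program,
       s.sql_id
FROM   v$session s
JOIN   v$process p ON p.addr = s.paddr
WHERE  p.spid = '&os_pid';
```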

Wednesday, June 29, 2016

/app keeps growing in Cassandra (Cassandra)

During a nodetool repair job, we found an interesting issue: usage of /app goes up quickly and drops back down after the job finishes.

The official document says:
"By default, the repair command takes a snapshot of each replica immediately and then sequentially repairs each replica from the snapshots. For example, if you have RF=3 and A, B and C represents three replicas, this command takes a snapshot of each replica immediately and then sequentially repairs each replica from the snapshots (A<->B, A<->C, B<->C) instead of repairing A, B, and C all at once. This allows the dynamic snitch to maintain performance for your application via the other replicas, because at least one replica in the snapshot is not undergoing repair."

So basically, during the repair job, data is compared across all 3 replica nodes (if RF=3) and any inconsistencies are synced.

If the repair jobs on all nodes are set to start at the same time, this causes HUGE resource and disk usage issues, because snapshots are created for the job across all nodes!

So the solution is:
1) Run the repair job on one node and note how long it takes.
2) Schedule the cron jobs for all nodes accordingly, so that they do not overlap each other.
3) Keep an eye on the cron jobs to ensure they work as expected.