본문 바로가기

Real Application Cluster

Oracle RAC GCS Monitoring

Global Cache Services (GCS) Monitoring
The use of the GCS relative to the number of buffer cache reads,
or logical reads can be estimated by dividing the sum of GCS requests
(global cache gets + global cache converts + global cache cr blocks received + global cache current blocks received )
by the number of logical reads (consistent gets   + db block gets ) for a given statistics collection interval.

A global cache service request is made in Oracle when a user attempts to access a buffer cache to read or modify a data block
and the block is not in the local cache. A remote cache read, disk read or change access privileges is the inevitable result.
These are logical read related. Logical reads form a superset of the global cache service operations.
The calculation for global cache hit ratio since instance startup is:

SELECT
  a.inst_id "Instance",
  (A.VALUE+B.VALUE+C.VALUE+D.VALUE)/(E.VALUE+F.VALUE) "GLOBAL CACHE HIT RATIO"
FROM
  GV$SYSSTAT A,
  GV$SYSSTAT B,
  GV$SYSSTAT C,
  GV$SYSSTAT D,
  GV$SYSSTAT E,
  GV$SYSSTAT F
WHERE
  A.NAME='gc gets'
  AND B.NAME='gc converts'
  AND C.NAME='gc cr blocks received'
  AND D.NAME='gc current blocks received'
  AND E.NAME='consistent gets'
  AND F.NAME='db block gets'
  AND B.INST_ID=A.INST_ID
  AND C.INST_ID=A.INST_ID
  AND D.INST_ID=A.INST_ID
  AND E.INST_ID=A.INST_ID
  AND F.INST_ID=A.INST_ID;
 
SEE CODE DEPOT FOR MORE SCRIPTS
  Instance GLOBAL CACHE HIT RATIO
---------- ----------------------
         1              .02403656
         2             .014798887

The instance with the best access to the drives, or the faster I/O path, will likely have the best cache hit ratio.
This is due to the way Oracle's RAC caching algorithm works as it may decide that the cost of doing a local read is higher
than reading into the other cache and siphoning it across the cluster interconnect. In formula form:
(gc gets + gc converts + gc cr blocks received +
gc current blocks received) / (consistent gets + db block gets)
Blocks frequently requested by local and remote users will be very hot. If a block is hot,
its transfer is delayed for a few milliseconds to allow the local users to complete their work.
The following ratio provides a rough estimate of how prevalent this is:

SELECT
  A.INST_ID "Instance",
  A.VALUE/B.VALUE "BLOCK TRANSFER RATIO"
FROM
  GV$SYSSTAT A, GV$SYSSTAT B
WHERE
  A.NAME='gc defers'
  AND B.NAME='gc current blocks served'
  AND B.INST_ID=A.INST_ID;
 
SEE CODE DEPOT FOR MORE SCRIPTS
  Instance BLOCK TRANSFER RATIO
---------- --------------------
         1           .052600105
         2           .078004479

If the above SELECT generates a ratio of more than 0.3, a fairly hot data set is indicated.
If this is the case, blocks involved in busy waits should be analyzed.
The following columns should be queried to find the blocks involved in busy waits:
- name
- kind
- forced_reads
- forced_writes

For example:

col instance format 99999999
col name format a20
col kind format a10
set lines 80 pages 55
Select
INST_ID "Instance",
NAME,
KIND,
sum(FORCED_READS) "Forced Reads",
sum(FORCED_WRITES) "Forced Writes"
FROM GV$CACHE_TRANSFER
WHERE owner#!=0
GROUP BY INST_ID,NAME,KIND
ORDER BY 1,4 desc,2;
 
SEE GRID CODE DEPOT FOR DOWNLOAD

Instance NAME                 KIND       Forced Reads Forced Writes
--------- -------------------- ---------- ------------ -------------
        1 MOD_TEST_IND         INDEX               308             0
        1 TEST2                TABLE                64             0
        1 AQ$_QUEUE_TABLES     TABLE                 5             0
        2 TEST2                TABLE               473             0
        2 MOD_TEST_IND         INDEX               221             0
        2 AQ$_QUEUE_TABLES     TABLE                 2             0

These values come from the gv$cache_transfer   view. Alternatively,
the cr_requests and current_requests columns in gv$cr_block_server can be examined.
Also, the values shown for the global cache busy , buffer busy global cache ,
and buffer busy global cr statistics from the gv$sysstat   view should be examined.

SELECT
  INST_ID,
  sum(CR_REQUESTS) "CR Requests",
  sum(CURRENT_REQUESTS) "Current Requests"
FROM
  GV$CR_BLOCK_SERVER
GROUP BY
  INST_ID;
 
   INST_ID CR Requests Current Requests
---------- ----------- ----------------
         1       28940             2244
         2       31699              837
 
SELECT
  inst_id "Instance",
  event "Wait Event",
  total_waits,
  time_waited
FROM
  GV$SYSTEM_EVENT
WHERE
      event in (
       'global cache busy',
       'buffer busy global cache',
       'buffer busy global CR')
ORDER BY
  INST_ID;
 
SEE CODE DEPOT FOR MORE SCRIPTS

 Instance Wait Event               TOTAL_WAITS TIME_WAITED
--------- ------------------------ ----------- -----------
        1 buffer busy global CR              1           0
        1 global cache busy               1073        7171
        2 global cache busy                973        7524

If a problem is discovered, the object causing the problem should be identified along with the instance that is accessing the object,
and how the object is being accessed. If necessary, the contention can be alleviated by:
- Reducing hot spots by spreading the accesses to index blocks or data blocks.
- Using Oracle hash or range partitions wherever applicable, just as it would be done in single instance Oracle databases.
- Reducing concurrency on the object by implementing load balancing or resource management.
For example, decrease the rate of modifications to that object by using fewer database processes.
In RAC, as in a single instance Oracle database, blocks are only written to disk for aging,
cache replacement, or checkpoints. When a data block is replaced from the cache due to aging or
when a checkpoint occurs and the block was previously changed in another instance but not written to disk,
Oracle sends a message to notify the other instance that Oracle will perform a fusion write to move the data block to disk.
These fusion writes are monitored with the following ratio. It reveals the proportion of writes that Oracle manages.

SELECT
 a.inst_id "Instance",
 A.VALUE/B.VALUE "Cache Fusion Writes Ratio"
FROM
  GV$SYSSTAT A,
  GV$SYSSTAT B
WHERE
      a.name='DBWR fusion writes'
  AND b.name='physical writes'
  AND b.inst_id=a.inst_id
ORDER BY
  A.INST_ID;
 
Instance Cache Fusion Writes Ratio
--------- -------------------------
        1                .216290958
        2                .131862042
 
The larger this ratio is, the higher the number of written blocks that have been copied with their previous changes between the RAC instances.
A large ratio is the result of:
- Insufficiently sized caches.
- Insufficient checkpoints.
- Large numbers of buffers written due to cache replacement or checkpointing.
For example, 0.21 means that 21% of the buffers written to disk were globally dirty.
A fusion write does not involve an additional write to disk.
A fusion write does require messaging to arrange the transfer with the other instances.
This indicates that fusion writes are in fact a subset of all the instance's physical writes.

'Real Application Cluster' 카테고리의 다른 글

CRS 와 10G REAL APPLICATION CLUSTERS  (0) 2012.07.09