We had batch system that starts concurrent sessions all with parallel(8). With every release more concurrent sessions are added to that job and at times few sessions are getting serialized which in turn impacts runtime of batch. Downgraded SQL that used to take few minutes now run for couple of hours and does not complete. Lot of concurrent sessions completes within range of 1-5 minutes so it would be ideal to queue the sessions until it gets all parallel slave processes instead of downgrade and missing SLA. We did not want all the features of parallel automatic tuning so I thought of quickly testing only subset of feature called parallel statement queuing.
Operational warehouse system is on 5 node RAC (non Exadata) and batch is tied to service running on single node and parallel_force_local was set to TRUE for known reasons.
Test 1: There are two instances with total 16 slave processes and parallel_force_local set to TRUE. When we submit 4 concurrent sessions with parallel(4) there was downgrade and no queuing. When we submit 4 concurrent sessions with parallel(8), one session is queued, two are downgraded and one gets all 8 slaves.
Test 2: Now I turn off parallel_force_local and we don’t get queuing with 4 concurrent sessions and parallel(4) as there are 16 slaves available across node. When we submit 6 concurrent sessions with parallel(4) we do get queuing and eventually all processes can get requested number of slaves and there were no downgrades.
Test 3: If I turn on parallel_force_local (same as Test 1) we get downgrades.
Test 4: Now I left parallel_force_local turned on but set parallel_min_server, parallel_max_servers and parallel_server_Targets to 0 on second instance. So there are only 8 slaves to serve requests across the cluster. Now I get queuing for 4 concurrent threads with parallel(4) and parallel(6).
Test 5: Now I tried parameter instead of hint and got same behavior.
Based on above observations, I think statement queuing logic does not take into consideration parallel_force_local set to TRUE or always assumes its FALSE. When we submit concurrent requests, Oracle thinks that there are slaves available (across the cluster) but when it actually tries to execute SQL due to parallel_force_local set to TRUE it has to get all slaves from same node and there are not enough slaves so its getting downgraded.
I will check with Oracle support to find out if its bug or design limitation and update this blog.
Update from Oracle support:
Bug 21438369 is fixed in 12.2
If the system is under a high load of PFL queries, then you may see that parallel_servers_target is not conformed by Resource Manager, or may result in downgrades even.
Parallel statement queuing did not conform to parallel_servers_target when a lot of parallel_force_local (PFL) queries are run at the same time. This was happening due to the stale published load value: kxfpinaslv.
It looks like there are some ongoing problems with these settings.
Instead of parallel_force_local = true setting, dev suggests:
If they want to restrict certain jobs to certain nodes, they should do the following:
– Create a service for each job type.
– Enable the service on the nodes where it should run.
– Create a consumer group for each service and map the service to the consumer group.