
royw on "gradual deterioration of Hive performance"


Hi,

We are developing a daily batch process with Hive (DSE 3.0). As more tables are loaded and processed, we are observing a distinct slowdown of Hive for identical HQL batch runs (operating on batch data of approximately the same size).

One distinct factor that appears to be related to the slowdown is the increasing delay between the completion of the MapReduce job and the generation of the result table. The following is an example of the symptom we are observing: for the simple HQL execution below, the last entry in the execution log has a timestamp of 17:53:53, while the target file shows a timestamp of 17:54:18 (converted from UTC), so there is a 25-second gap between the processing finishing and the output table appearing. When we first started this daily processing, the gap was about 1 or 2 seconds; as more data has been processed, the delay has gradually increased to the current 25 seconds. Our current DSE node size is about 80GB.
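
For reference, this is roughly how we compare the two timestamps (the log and table paths are from the run shown below; the tail/awk one-liner is just a sketch that assumes the log lines start with a timestamp):

# last timestamp written to the Hive execution log
tail -n 1 /tmp/root/root_20130513175353_58d446d0-b8c5-4206-9d22-bdb692c98d14.log | awk '{print $1, $2}'
# modification time of the output file on CFS (dfs -stat prints UTC)
hive -e "dfs -stat /user/hive/warehouse/staging.db/tmp_bind_00_20130116A/000000_0;"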

Based on the CFS design, we cannot think of any reason why the amount of data already stored would negatively affect new data insertion. I am wondering if anyone else has experienced a similar problem, or has any idea which configuration options could contribute to this?
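
In case anyone wants to compare notes, we dump the effective configuration before each run and diff it between runs; these are just the standard Hive CLI commands, nothing DSE-specific:

hive -e "set -v;" > hive_settings.txt
# look for anything reducer-, scratch-, or tmp-related that might explain the move cost
grep -i -e reducer -e scratch -e tmp hive_settings.txt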

thanks,
Roy

##HQL output:
> insert overwrite table tmp_bind_00_20130116A
> select distinct
> concat(cast(site_id as string),'-',bind_id) as bind_guid
> , bind_date
> , day_of_week
> , bind_id as ori_bind_id
> , site_id
> , device_id
> , substring(bind_date,1,10) as bind_day
> from rawdata_00_20130116
> ;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Execution log at: /tmp/root/root_20130513175353_58d446d0-b8c5-4206-9d22-bdb692c98d14.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2013-05-13 17:53:15,452 null map = 0%, reduce = 0%
2013-05-13 17:53:21,456 null map = 100%, reduce = 0%
2013-05-13 17:53:27,459 null map = 100%, reduce = 100%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Loading data to table staging.tmp_bind_00_20130116A
Table staging.tmp_bind_00_20130116A stats: [num_partitions: 0, num_files: 0, num_rows: 21685, total_size: 0, raw_data_size: 2233555]
OK
bind_guid bind_date day_of_week ori_bind_id site_id device_id bind_day
Time taken: 73.574 seconds

##excerpt of execution log (/tmp/root/root_20130513175353_58d446d0-b8c5-4206-9d22-bdb692c98d14.log) -- note the roughly 14-second gap between the two "Moving tmp dir" entries:

2013-05-13 17:53:39,467 INFO exec.ExecDriver (SessionState.java:printInfo(391)) - Ended Job = job_local_0001
2013-05-13 17:53:39,473 INFO exec.FileSinkOperator (Utilities.java:mvFileToFinalPath(1267)) - Moving tmp dir: cfs://IVM-CRS-VM41/tmp/hive-root/hive_2013-05-13_17-53-05_049_2835076602177774524/_tmp.-ext-10000 to: cfs://IVM-CRS-VM41/tmp/hive-root/hive_2013-05-13_17-53-05_049_2835076602177774524/_tmp.-ext-10000.intermediate
2013-05-13 17:53:53,348 INFO exec.FileSinkOperator (Utilities.java:mvFileToFinalPath(1278)) - Moving tmp dir: cfs://IVM-CRS-VM41/tmp/hive-root/hive_2013-05-13_17-53-05_049_2835076602177774524/_tmp.-ext-10000.intermediate to: cfs://IVM-CRS-VM41/tmp/hive-root/hive_2013-05-13_17-53-05_049_2835076602177774524/-ext-10000

##timestamp of target table:
> dfs -stat /user/hive/warehouse/staging.db/tmp_bind_00_20130116A/000000_0;
2013-05-13 21:54:18
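
(The stat above is in UTC; a quick way to convert it to our local time, assuming GNU date:)

date -d '2013-05-13 21:54:18 UTC'
# prints the same instant in the local zone, i.e. 17:54:18 for us (UTC-4)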

