12

I was using OpenTSDB on HBase (pseudo-distributed Hadoop in a VirtualBox VM) to push data at a very high rate (~50,000 records/s). The system worked fine for a while, but then it suddenly went down. I shut down OpenTSDB and HBase. Unfortunately, I have never been able to bring them back up: every time I try to start HBase and OpenTSDB, they show error logs. The HBase region server aborts and can never be brought up again after this. Here are the logs:

regionserver:

2015-07-01 18:15:30,752 INFO [sync.3] wal.FSHLog: Slow sync cost: 112 ms, current pipeline: [192.168.56.101:50010] 
2015-07-01 18:15:41,277 INFO [regionserver/node1.vmcluster/192.168.56.101:16201.logRoller] wal.FSHLog: Rolled WAL /hbase/WALs/node1.vmcluster,16201,1435738612093/node1.vmcluster%2C16201%2C1435738612093.default.1435742101122 with entries=3841, filesize=123.61 MB; new WAL /hbase/WALs/node1.vmcluster,16201,1435738612093/node1.vmcluster%2C16201%2C1435738612093.default.1435742141109 
2015-07-01 18:15:41,278 INFO [regionserver/node1.vmcluster/192.168.56.101:16201.logRoller] wal.FSHLog: Archiving hdfs://node1.vmcluster:9000/hbase/WALs/node1.vmcluster,16201,1435738612093/node1.vmcluster%2C16201%2C1435738612093.default.1435742061805 to hdfs://node1.vmcluster:9000/hbase/oldWALs/node1.vmcluster%2C16201%2C1435738612093.default.1435742061805 
2015-07-01 18:15:42,249 INFO [MemStoreFlusher.0] regionserver.HRegion: Started memstore flush for tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72., current region memstore size 132.20 MB 
2015-07-01 18:15:42,381 INFO [MemStoreFlusher.1] regionserver.HRegion: Started memstore flush for tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72., current region memstore size 133.09 MB 
2015-07-01 18:15:42,382 WARN [MemStoreFlusher.1] regionserver.DefaultMemStore: Snapshot called again without clearing previous. Doing nothing. Another ongoing flush or did we fail last attempt? 
2015-07-01 18:15:42,391 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: ABORTING region server node1.vmcluster,16201,1435738612093: Replay of WAL required. Forcing server shutdown 
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72. 
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) 
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) 
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) 
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) 
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) 
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) 
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.NegativeArraySizeException 
    at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:494) 
    at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) 
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) 
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) 
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) 
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) 
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) 
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) 
    ... 7 more 
2015-07-01 18:15:42,398 FATAL [MemStoreFlusher.0] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 

Master:

this is the same as the one on the region server:

2015-07-01 18:15:42,596 ERROR [B.defaultRpcServer.handler=15,queue=0,port=16020] master.MasterRpcServices: Region server node1.vmcluster,16201,1435738612093 reported a fatal error: 
ABORTING region server node1.vmcluster,16201,1435738612093: Replay of WAL required. Forcing server shutdown 
Cause: 
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72. 
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) 
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1772) 
    at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1704) 
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:445) 
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:407) 
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:69) 
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:225) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.NegativeArraySizeException 
    at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:494) 
    at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) 
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) 
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) 
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) 
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) 
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) 
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) 
    ... 7 more 

and this:

2015-07-01 18:17:16,971 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:17:21,976 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:17:26,979 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:17:31,983 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:17:36,985 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:17:41,992 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:17:45,283 WARN [CatalogJanitor-node1:16020] master.CatalogJanitor: Failed scan of catalog table 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=351, exceptions: 
Wed Jul 01 18:17:45 KST 2015, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68275: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node1.vmcluster,16201,1435738612093, seqNum=0 

    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:264) 
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:215) 
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:56) 
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) 
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:288) 
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:267) 
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:139) 
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:134) 
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:823) 
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:187) 
    at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89) 
    at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:169) 
    at org.apache.hadoop.hbase.master.CatalogJanitor.getMergedRegionsAndSplitParents(CatalogJanitor.java:121) 
    at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:222) 
    at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:103) 
    at org.apache.hadoop.hbase.Chore.run(Chore.java:80) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=68275: row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=node1.vmcluster,16201,1435738612093, seqNum=0 
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159) 
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:310) 
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:291) 
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    ... 1 more 
Caused by: java.net.ConnectException: Connection refused 
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) 
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739) 
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) 
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529) 
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493) 
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupConnection(RpcClientImpl.java:403) 
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:709) 
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:880) 
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:849) 
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1173) 
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216) 
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300) 
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:31889) 
    at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:344) 
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:188) 
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62) 
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126) 
    ... 6 more 
2015-07-01 18:17:46,995 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:17:52,002 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:17:57,004 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:18:02,006 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 
2015-07-01 18:18:07,011 INFO [node1.vmcluster,16020,1435738420751.splitLogManagerTimeoutMonitor] master.SplitLogManager: total tasks = 1 unassigned = 1 tasks={/hbase/splitWAL/WALs%2Fnode1.vmcluster%2C16201%2C1435738612093-splitting%2Fnode1.vmcluster%252C16201%252C1435738612093..meta.1435738616764.meta=last_update = -1 last_version = -1 cur_worker_name = null status = in_progress incarnation = 0 resubmits = 0 batch = installed = 1 done = 0 error = 0} 

After that, when I restarted HBase, the region server showed this error:

2015-07-02 09:17:49,151 INFO [RS_OPEN_REGION-node1:16201-0] regionserver.HRegion: Replaying edits from hdfs://node1.vmcluster:9000/hbase/data/default/tsdb/1a692e2668a2b4a71aaf2805f9b00a72/recovered.edits/0000000000000169657 
2015-07-02 09:17:49,343 INFO [RS_OPEN_REGION-node1:16201-0] regionserver.HRegion: Started memstore flush for tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72., current region memstore size 132.20 MB; wal is null, using passed sequenceid=169615 
2015-07-02 09:17:49,428 ERROR [RS_OPEN_REGION-node1:16201-0] handler.OpenRegionHandler: Failed open of region=tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72., starting to roll back the global memstore size. 
org.apache.hadoop.hbase.DroppedSnapshotException: region: tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72. 
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2001) 
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:3694) 
    at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3499) 
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:889) 
    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:769) 
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:742) 
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4921) 
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4887) 
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4858) 
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4814) 
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4765) 
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356) 
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126) 
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
    at java.lang.Thread.run(Thread.java:745) 
Caused by: java.lang.NegativeArraySizeException 
    at org.apache.hadoop.hbase.CellComparator.getMinimumMidpointArray(CellComparator.java:490) 
    at org.apache.hadoop.hbase.CellComparator.getMidpoint(CellComparator.java:448) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishBlock(HFileWriterV2.java:165) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.checkBlockBoundary(HFileWriterV2.java:146) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:263) 
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87) 
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:932) 
    at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:121) 
    at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:71) 
    at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:879) 
    at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2128) 
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1955) 
    ... 16 more 
2015-07-02 09:17:49,430 INFO [RS_OPEN_REGION-node1:16201-0] coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 1a692e2668a2b4a71aaf2805f9b00a72, NAME => 'tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72.', STARTKEY => '', ENDKEY => '\x00\x15\x08M|kp\x00\x00\x01\x00\x00\x01'} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 1 
2015-07-02 09:17:49,443 INFO [PriorityRpcServer.handler=9,queue=1,port=16201] regionserver.RSRpcServices: Open tsdb,,1435740133573.1a692e2668a2b4a71aaf2805f9b00a72. 
2015-07-02 09:17:49,458 INFO [StoreOpener-1a692e2668a2b4a71aaf2805f9b00a72-1] hfile.CacheConfig: blockCache=LruBlockCache{blockCount=3, currentSize=533360, freeSize=509808592, maxSize=510341952, heapSize=533360, minSize=484824864, minFactor=0.95, multiSize=242412432, multiFactor=0.5, singleSize=121206216, singleFactor=0.25}, cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false 
2015-07-02 09:17:49,458 INFO [StoreOpener-1a692e2668a2b4a71aaf2805f9b00a72-1] compactions.CompactionConfiguration: size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point 2684354560; major period 604800000, major jitter 0.500000, min locality to compact 0.000000 
2015-07-02 09:17:49,519 INFO [RS_OPEN_REGION-node1:16201-2] regionserver.HRegion: Replaying edits from hdfs://node1.vmcluster:9000/hbase/data/default/tsdb/1a692e2668a2b4a71aaf2805f9b00a72/recovered.edits/0000000000000169567 

Update

Today I ran a test with very light traffic: 1,000 records/s for 2,000 seconds, and the problem appeared again.

+0

It could be memory-related, since you are running it in pseudo-distributed mode. Can you increase the VM's memory and give the HBase daemons more memory? –

+0

@brandon.bell The VM's memory is set to 5 GB. My host has 8 GB of RAM, so I could not allocate any more to the VM. Anyway, what do you mean by "give the HBase daemons more memory" here? Can I explicitly assign more memory to the HBase daemons? If so, how much should I give them? –

+0

In hbase-env.sh (usually in /etc/hbase/conf/) you can set HBASE_HEAPSIZE. Example: 'export HBASE_HEAPSIZE=2000'. –

Answer

3

According to HBASE-13329, a short-type overflow occurs at "short diffIdx = 0" in hbase-common/src/main/java/org/apache/hadoop/hbase/CellComparator.java. A patch is available as of today (2015-07-06). In the patch, the HBase developers change the declaration from "short diffIdx = 0" to "int diffIdx = 0".
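
To illustrate the failure mode, here is a minimal standalone sketch (not the actual CellComparator code, just the overflow mechanism): when two keys share a common prefix longer than 32,767 bytes, a short index silently wraps around to a negative value, and sizing an array from it throws the NegativeArraySizeException seen in the logs above.

// Minimal sketch of the short-index overflow behind HBASE-13329.
// This is NOT the real HBase code, only a demo of the mechanism.
public class ShortIndexOverflowDemo {
    public static void main(String[] args) {
        // Two keys that share a common prefix longer than Short.MAX_VALUE (32767)
        byte[] left = new byte[40_000];
        byte[] right = new byte[40_000];
        right[right.length - 1] = 1; // they differ only in the last byte

        int minLength = Math.min(left.length, right.length);
        short diffIdx = 0; // pre-patch declaration; the fix makes this an int
        // The diffIdx >= 0 guard is only here so this demo exits the loop
        // cleanly once the short wraps around.
        while (diffIdx >= 0 && diffIdx < minLength && left[diffIdx] == right[diffIdx]) {
            diffIdx++; // silently wraps from 32767 to -32768
        }

        System.out.println("diffIdx = " + diffIdx); // prints -32768, not 39999

        // Sizing an array from the wrapped index reproduces the symptom
        // in the region server logs:
        byte[] midpoint = new byte[diffIdx + 1]; // throws NegativeArraySizeException
        System.out.println(midpoint.length);     // never reached
    }
}

With the patched int declaration, the same loop would stop at index 39999, the position of the first differing byte, and the array allocation would succeed.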

I ran tests with the patch, and it worked correctly.