Channel: DataStax Support Forums » Recent Topics

acchen on "Cassandra Node dying, saw OpsCenter thrift operation queue full prior"


We moved Cassandra 1.1.7 into production today, but just before that two Cassandra nodes went down with an OOM error. We saw this error in the past during load tests and tuned the nofile limit accordingly, so it should not occur. Also note that this time the error happened when there was NO load on the infrastructure.

ERROR [Thread-22] 2013-07-08 16:31:50,905 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[Thread-22,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:652)
at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:581)
at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:155)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:113)
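
For context, "unable to create new native thread" is raised when the JVM cannot start another OS thread. On Linux that is usually governed by the per-user process/thread limit (nproc) or by memory available for thread stacks, not by the open-file limit. The "error=11, Resource temporarily unavailable" line in the agent.log further down may be consistent with that, though this is only a guess. Below is a minimal Python sketch, assuming a Linux host and that it is run as the same user as the Cassandra process, that prints both limits side by side. The names `read_limits` and `fmt` are only illustrative.

```python
import resource


def read_limits():
    """Return (soft, hard) pairs for the open-file and process/thread limits."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        # On Linux, threads count against RLIMIT_NPROC for the user.
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
    }


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in read_limits().items():
        print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```

These are the limits of the shell running the script. A service started from an init script may have different values.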

We could NOT start the Cassandra server back up (it kept giving the OOM error). Only after we shut down the OpsCenter (Enterprise 2.1.3) agent were we able to start Cassandra again, and then we restarted the agent. Below is the agent.log from around the time the Cassandra node died. We are seeing a lot of "Thrift operation queue is full" warnings and operations being dropped. We are also NOT using secondary indexes. Any thoughts are welcome, thanks!!

In agent.log:
WARN [pool-4-thread-1] 2013-07-08 16:31:41,395 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 367168 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 367169 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,396 367170 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,397 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,397 367171 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,397 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,397 367172 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 367173 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 367174 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,398 367175 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 367176 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 367177 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,399 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 367178 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 367179 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,400 367180 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,401 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,401 367181 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,401 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,401 367182 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 367183 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 367184 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,402 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 367185 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 367186 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,403 367187 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 367188 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 367189 operations dropped so far.
WARN [pool-4-thread-1] 2013-07-08 16:31:41,404 Thrift operation queue is full, discarding thrift operation
WARN [pool-4-thread-1] 2013-07-08 16:31:41,405 367190 operations dropped so far.
ERROR [Thread-4] 2013-07-08 16:31:45,347 Error when proccessing thrift call me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
ERROR [pool-5-thread-1] 2013-07-08 16:31:47,793 Error connecting via JMX: java.io.IOException: Cannot run program "cat": java.io.IOException: error=11, Resource temporarily unavailable
ERROR [Thread-4] 2013-07-08 16:31:50,348 Error when proccessing thrift call me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be enough replicas present to handle consistency level.
INFO [pool-5-thread-1] 2013-07-08 16:31:52,794 New JMX connection (127.0.0.1:7199)
ERROR [pool-5-thread-1] 2013-07-08 16:31:52,857 Error connecting via JMX: java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:
java.net.ConnectException: Connection refused]
WARN [pool-3-thread-4] 2013-07-08 16:31:53,127 Thrift operation queue is full, discarding thrift operation
WARN [pool-3-thread-4] 2013-07-08 16:31:53,128 367191 operations dropped so far.
WARN [pool-3-thread-4] 2013-07-08 16:31:53,128 Thrift operation queue is full, discarding thrift operation
WARN [pool-3-thread-4] 2013-07-08 16:31:53,128 367192 operations dropped so far.
WARN [pool-3-thread-4] 2013-07-08 16:31:53,128 Thrift operation queue is full, discarding thrift operation
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 367193 operations dropped so far.
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 Thrift operation queue is full, discarding thrift operation
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 367194 operations dropped so far.
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 Thrift operation queue is full, discarding thrift operation
WARN [pool-3-thread-4] 2013-07-08 16:31:53,129 367195 operations dropped so far.
WARN [pool-3-thread-4] 2013-07-08 16:31:53,130 Thrift operation queue is full, discarding thrift operation
WARN [pool-3-thread-4] 2013-07-08 16:31:53,130 367196 operations dropped so far.
ERROR [Thread-4] 2013-07-08 16:31:55,350 Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient<16.211.56.72:9160-3>
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147)
at org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156)
at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:98)
at me.prettyprint.cassandra.connection.client.HThriftClient.close(HThriftClient.java:26)
at me.prettyprint.cassandra.connection.HConnectionManager.closeClient(HConnectionManager.java:311)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:260)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at clj_hector.core$put.doInvoke(core.clj:164)
at clojure.lang.RestFn.invoke(RestFn.java:470)
at opsagent.cassandra$store_rollup.invoke(cassandra.clj:107)
at clojure.lang.AFn.applyToHelper(AFn.java:161)
at clojure.lang.AFn.applyTo(AFn.java:151)
at clojure.core$apply.invoke(core.clj:540)
at opsagent.cassandra$async_call$fn__582$fn__583.invoke(cassandra.clj:164)
at opsagent.cassandra$process_queue$fn__587.invoke(cassandra.clj:170)
at opsagent.cassandra$process_queue.invoke(cassandra.clj:169)
at opsagent.cassandra$setup_cassandra$fn__595.invoke(cassandra.clj:203)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
... 19 more
ERROR [Thread-4] 2013-07-08 16:31:55,351 MARK HOST AS DOWN TRIGGERED for host 16.211.56.72(16.211.56.72):9160
ERROR [Thread-4] 2013-07-08 16:31:55,351 Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{16.211.56.72(16.211.56.72):9160}; IsActive?: true; Active: 1; Blocked: 0; Idle: 0; NumBeforeExhausted: 0
INFO [Thread-4] 2013-07-08 16:31:55,351 Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{16.211.56.72(16.211.56.72):9160}
INFO [Thread-4] 2013-07-08 16:31:55,351 Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{16.211.56.72(16.211.56.72):9160}
INFO [Thread-4] 2013-07-08 16:31:55,352 Host detected as down was added to retry queue: 16.211.56.72(16.211.56.72):9160
WARN [Thread-4] 2013-07-08 16:31:55,392 Could not fullfill request on this host CassandraClient<16.211.56.72:9160-3>
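
Since the excerpt above is mostly the same two WARN lines repeated, a summary may be easier to read than the raw log. Here is a rough Python sketch, assuming every agent.log entry follows the `LEVEL [thread] date time,millis message` layout seen above. The function name `summarize` and the result keys are made up for illustration. It counts entries per log level and tracks how the "operations dropped so far" counter moves.

```python
import re
from collections import Counter

# LEVEL [thread] YYYY-MM-DD HH:MM:SS,mmm message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+(?P<msg>.*)$"
)
DROPPED_RE = re.compile(r"^(\d+) operations dropped so far\.$")


def summarize(lines):
    """Summarize agent.log lines: entries per level and dropped-counter range."""
    levels = Counter()
    drops_per_second = Counter()
    first_dropped = last_dropped = None
    for line in lines:
        entry = LINE_RE.match(line.strip())
        if entry is None:
            continue  # stack-trace or continuation line, not a new log entry
        levels[entry.group("level")] += 1
        dropped = DROPPED_RE.match(entry.group("msg"))
        if dropped:
            count = int(dropped.group(1))
            if first_dropped is None:
                first_dropped = count
            last_dropped = count
            # Bucket by whole second to see how fast operations are dropped.
            drops_per_second[entry.group("ts")] += 1
    return {
        "levels": dict(levels),
        "first_dropped": first_dropped,
        "last_dropped": last_dropped,
        "drops_per_second": dict(drops_per_second),
    }
```

It could be called as `summarize(open("agent.log"))`. A counter that is already above 367,000 at the first line of the excerpt would suggest the queue had been full well before the node died.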

Regards,
Alvin