Hello. I launched a game on the app store yesterday, with 2 large AWS EC2 instances running community edition 1.2. After a few hours a rush of traffic came in, and I kept seeing a lot of errors in my logs about timeouts to Cassandra. I eventually found the problem in the Cassandra logs, but I was not sure what to do about it: repairs didn't work, and I couldn't find any obvious cause. I took one of the snapshots that was still good, from about 30 minutes into the rush of traffic, and started a new cluster from it. Everything seems to be OK on that cluster now, but my traffic has dropped to nowhere near what it was, so I don't know whether it is actually OK. The constantly repeating messages that I get are included at the end of this post.
Aside from these corrupt sstable messages, I also got corrupt index messages a few days before I launched, but a repair seemed to fix those. Still, it's concerning, because at that point it was just me and maybe 10 friends testing. Why would I get corrupt index messages so soon, with hardly any traffic?
I'd really appreciate any help here. This killed my first day: the game was climbing the store charts until this happened, and I lost all the momentum.
Thanks for your time, everyone.
ERROR [ReadStage:2843] 2013-02-20 02:09:56,671 CassandraDaemon.java (line 133) Exception in thread Thread[ReadStage:2843,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:106)
at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:38)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:90)
at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:171)
at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:154)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:143)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:86)
at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:45)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:134)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1358)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1215)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1127)
at org.apache.cassandra.db.Table.getRow(Table.java:355)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:48)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:399)
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:377)
at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:380)
at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:371)
at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:84)
at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:73)
at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:102)
... 23 more