Nud freezes when booting up

The following has been spammed many times, though not excessively (only 454 KiB so far):

BDB2506 file walletB.dat has LSN 2008/4971684, past end of log at 92/7190103
BDB2507 Commonly caused by moving a database from one database environment
BDB2508 to another without clearing the database LSNs, or by removing all of
BDB2509 the log files from a database environment
BDB2516 DB_ENV->log_flush: LSN of 2008/4971684 past current end-of-log of 92/7203245
BDB2517 Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
BDB0061 PANIC: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
BDB3027 walletB.dat: unable to flush page: 18
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 0
BDB3027 blkindex.dat: unable to flush page: 0
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 773
BDB3027 blkindex.dat: unable to flush page: 773
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 2846
BDB3027 blkindex.dat: unable to flush page: 2846
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 3316
BDB3027 blkindex.dat: unable to flush page: 3316
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 4108
BDB3027 blkindex.dat: unable to flush page: 4108
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 5176
BDB3027 blkindex.dat: unable to flush page: 5176
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 5721
BDB3027 blkindex.dat: unable to flush page: 5721
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 6550
BDB3027 blkindex.dat: unable to flush page: 6550
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 6635
BDB3027 blkindex.dat: unable to flush page: 6635
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 8719
BDB3027 blkindex.dat: unable to flush page: 8719
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 9256
BDB3027 blkindex.dat: unable to flush page: 9256
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 17189
BDB3027 blkindex.dat: unable to flush page: 17189
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 17352
BDB3027 blkindex.dat: unable to flush page: 17352
BDB0060 PANIC: fatal region error detected; run recovery
BDB3015 blkindex.dat: write failed for page 17353
BDB3027 blkindex.dat: unable to flush page: 17353
BDB4519 txn_checkpoint: failed to flush the buffer cache: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery

By the way, are the following log lines normal?

2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block 35a5ebbabc9ad1441248a31a23c64ec6a01b91eba4601f1080d64b5dbc9748bf
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: block (534 bytes)
2015-09-15 10:36:01 UTC received block 6259114e8c9fcf88ace02d8f9d22506a644cfb85b967ccc3af4be70d6e75a24d
2015-09-15 10:36:01 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed
2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block 6259114e8c9fcf88ace02d8f9d22506a644cfb85b967ccc3af4be70d6e75a24d
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: liquidity (137 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: block (537 bytes)
2015-09-15 10:36:01 UTC received block 67209539cbc613659ae6d22b5189aec1962ee26d1f43d6011e0dafdbd2619d9a
2015-09-15 10:36:01 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed
2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block 67209539cbc613659ae6d22b5189aec1962ee26d1f43d6011e0dafdbd2619d9a
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: block (536 bytes)
2015-09-15 10:36:01 UTC received block df33dacd7a975b30aee5996a4b66b28dbc1f35f3e5a8221f9b152034f901de7c
2015-09-15 10:36:01 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed
2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block df33dacd7a975b30aee5996a4b66b28dbc1f35f3e5a8221f9b152034f901de7c
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: block (595 bytes)
2015-09-15 10:36:01 UTC received block b344065bbbd5fbecbfcdae163db622673b9ff2b1dd94816c2ad12405c9b1a21d
2015-09-15 10:36:01 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed
2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block b344065bbbd5fbecbfcdae163db622673b9ff2b1dd94816c2ad12405c9b1a21d
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: block (534 bytes)
2015-09-15 10:36:01 UTC received block 8c4893bce24a421866cf1dd9a3b272e8d7fef660e03a907d87966bccb07353a5
2015-09-15 10:36:01 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed
2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block 8c4893bce24a421866cf1dd9a3b272e8d7fef660e03a907d87966bccb07353a5
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: block (535 bytes)
2015-09-15 10:36:01 UTC received block 35a5ebbabc9ad1441248a31a23c64ec6a01b91eba4601f1080d64b5dbc9748bf
2015-09-15 10:36:01 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed
2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block 35a5ebbabc9ad1441248a31a23c64ec6a01b91eba4601f1080d64b5dbc9748bf
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: block (534 bytes)
2015-09-15 10:36:01 UTC received block 6259114e8c9fcf88ace02d8f9d22506a644cfb85b967ccc3af4be70d6e75a24d
2015-09-15 10:36:01 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed
2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block 6259114e8c9fcf88ace02d8f9d22506a644cfb85b967ccc3af4be70d6e75a24d
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: liquidity (137 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: block (537 bytes)
2015-09-15 10:36:01 UTC received block 67209539cbc613659ae6d22b5189aec1962ee26d1f43d6011e0dafdbd2619d9a
2015-09-15 10:36:01 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed
2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block 67209539cbc613659ae6d22b5189aec1962ee26d1f43d6011e0dafdbd2619d9a
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:01 UTC 2015-09-15 10:36:01 UTC received: block (536 bytes)
2015-09-15 10:36:01 UTC received block df33dacd7a975b30aee5996a4b66b28dbc1f35f3e5a8221f9b152034f901de7c
2015-09-15 10:36:01 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed
2015-09-15 10:36:01 UTC WARNING: ProcessBlock(): check proof-of-stake failed for block df33dacd7a975b30aee5996a4b66b28dbc1f35f3e5a8221f9b152034f901de7c
2015-09-15 10:36:02 UTC 2015-09-15 10:36:02 UTC sending: getblocks (965 bytes)
2015-09-15 10:36:02 UTC 2015-09-15 10:36:02 UTC received: block (595 bytes)
2015-09-15 10:36:02 UTC received block b344065bbbd5fbecbfcdae163db622673b9ff2b1dd94816c2ad12405c9b1a21d
2015-09-15 10:36:02 UTC ERROR: CheckProofOfStake() : INFO: read txPrev failed

Some errors and failures appear periodically in the debug.log file. I started to download everything from scratch again.

Yes, I compiled it myself. Could the fact that the external USB drive I use is formatted as NTFS cause any problems?

/dev/sda1: LABEL="TOURO" UUID="EC1EA3241EA2E734" TYPE="ntfs" PARTUUID="f7340f82-01"

When I bought it I forgot to format it with ext4, so it turns out I’ve been using it as NTFS ever since :smiley: Not sure how important it is to change it now.

Everything in this log happened within a second. It’s not receiving new blocks as a node normally does. You are showing a segment from when the client is not yet synced. You should see things like this:

received block 4df9a64d998f484c25e6cf23885c399511f56c950766276cf3d1178b11f32267
SetBestChain: new best=4df9a64d998f484c25e6 height=527968 trust=765451674720
moneysupply(S)=877032065.7799 moneysupply(B)=4607984.5713
ProcessBlock: ACCEPTED
accepted liquidity info from B8BxkGL8kK6X9fBxbeEBHwaSTCLuvvHqAM
2015-09-15 11:03:40 UTC Flushing wallet.dat
Flushed wallet.dat 74ms

Yes I mostly see this stuff in the log that you pointed out. However, sometimes I see those failures.

That means you received a block but could not find the Proof of Stake input in your transaction database. It may happen, even repeatedly (because you received a chain of blocks but you still miss a large part of the chain) but it should solve itself later. If it doesn’t, it’s likely your database is missing important parts.

The other errors suggest your database is corrupted. You may have a problem with your storage device. I think NTFS should work, but it’s probably more risky than a well-tested ext4. You may also want to check your system and kernel logs for errors (if the device or the filesystem failed, it should have been reported there when it happened).

Ok, thanks for the info. Let’s see how the downloading goes this time. I’ve been downloading the block chain from scratch for the past 36 hours and it hasn’t got stuck yet. Isn’t it possible to save the block chain in a way that it would not become corrupted so easily? For example, bad blocks could be downloaded again and replaced. The system should be more resilient so that it could overcome such problems. The bigger the block chain, the more likely it is to become corrupted. I suspect that a power failure may have caused it. Perhaps nud got killed in the middle of writing something and later it was unable to understand that last block or whatever.

Ok I finished downloading the block chain and it seems that nud is attempting to mint now although I haven’t yet found any blocks. Here’s the screenshot of top now that the block chain has been downloaded.

edit:
any ideas if these errors are normal:

2015-09-17 10:45:41 UTC ThreadDumpAddress exiting
2015-09-17 10:47:49 UTC GetVoteFromDataFeed error: Data feed failed: SSL connect error
2015-09-17 10:47:49 UTC UpdateFromDataFeed failed: Data feed failed: SSL connect error
2015-09-17 10:47:50 UTC ThreadUpdateFromDataFeed exited

also I should point out that I am currently having a situation where nud and bcexchanged are both totally unresponsive. Sent kill -TERM <pid> to them and they just won’t die. This is important to get fixed since I suspect that killing the daemon might get the block chain corrupted.

That “72,6 wa” indicates that your CPU is spending lots of time waiting for IO. That IO is from swapping. Thrashing, actually. The 6+ load shows that your system is very stressed. I bet the debug.log shows wallet flushing times of several seconds to several tens of seconds, which normally should be tens of millisec…

I suggest getting another Pi for bcexchanged.

Why not try “ramdrive-nud”?
I mean, it takes some MB of RAM, but since flushing the wallets might cause a lot of the IO, you could still be better off with less RAM left and the wallets flushing at RAM speed (which will only take milliseconds!).

Compare

 tail ~/.nu/debug.log -n 1000 | grep Flushed

with the wallets on disk and the wallets on ramdrive to find out whether it helps to move the wallets to RAM.
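To compare the two runs with more than a glance, the flush times can be pulled out of the log and summarized. This is only a minimal Python sketch: it assumes the `Flushed wallet.dat <N>ms` line format that appears in the debug.log excerpts in this thread, and the name `flush_times_ms` is made up for illustration.

```python
import re

# Matches lines such as "2015-09-17 10:26:47 UTC Flushed wallet.dat 397452ms".
# The format is assumed from the debug.log excerpts quoted in this thread.
FLUSH_RE = re.compile(r"Flushed\s+\S+\s+(\d+)ms")


def flush_times_ms(lines):
    """Return the flush durations (in milliseconds) found in the given log lines."""
    times = []
    for line in lines:
        match = FLUSH_RE.search(line)
        if match:
            times.append(int(match.group(1)))
    return times


def summarize(times):
    """Return (count, maximum, average) for a non-empty list of durations."""
    return len(times), max(times), sum(times) / len(times)
```

Feeding it the lines of each debug.log (for example via `open(path)`) and comparing the two summaries should show whether the ramdrive brings the flush times back into the millisecond range.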

I’m not going to get another Pi just for that, because I think a Pi should handle multiple PoS daemons with its 1 GiB of RAM. So I expect some optimizations to be done. Before that happens I might just stop minting nushares, since they are cheaper. Alternatively I could make a script that would make sure one day nud gets to run and the other day bcexchanged gets to run. Ideally a good PoS coin should not give a penalty for that since I should accumulate coin days and get a higher chance of minting a block the day I run the daemon.

pi@pi-desktop:~$ tail /media/TOURO/nu/17-09-2015-data/debug.log -n 1000 | grep Flushed
2015-09-17 10:26:47 UTC Flushed wallet.dat 397452ms
2015-09-17 10:54:10 UTC Flushed wallet.dat 886614ms

is this too much?

Try to come to your own conclusion:

Remark: this wallet is empty, but it’s an order of magnitude less time than without wallets on ramdrive:

Maybe the next release of nud will do.

I think Nu doesn’t use coin days when looking for kernels; it only uses the coin amount. There is some security reason for it.

OMG. Your pi is miserable.

False alarm about having downloaded the block chain. It turned out that for some reason nud stopped downloading new blocks. It periodically downloaded voting information but no new blocks, and I wrongly assumed that it had finished. I restarted the system and it started to download again. The blockchain height is currently 483912 but according to blockexplorer.nu the height is 530732, so it will take a while.

If ramdrive-nud helps speed up the flushing of the wallets, you might try ramdrive-bcexchanged as well :wink:

Again it stopped downloading new blocks. This time I see the following failures being spammed in the debug.log file:

pi@pi-desktop:~$ tail -F /media/TOURO/nu/data/debug.log
2015-09-18 07:50:01 UTC connection timeout
2015-09-18 07:50:01 UTC ComputeNextStakeModifier: prev modifier=0x05f184d670d94edd time=2015-09-07 00:01:31 UTC epoch=1441584091
2015-09-18 07:50:01 UTC ComputeNextStakeModifier: no new interval keep current modifier: pindexPrev nHeight=516553 nTime=1441592413
2015-09-18 07:50:02 UTC trying connection 14.192.213.121:7890 lastseen=-559.4hrs
2015-09-18 07:50:02 UTC SetBestChain: new best=f7d533250bb1198dc2f0  height=516554  trust=752130327424  moneysupply(S)=876575780.4714 moneysupply(B)=4602036.0149
2015-09-18 07:50:02 UTC ProcessBlock: ACCEPTED
2015-09-18 07:50:02 UTC 2015-09-18 07:50:02 UTC received: block (633 bytes)
2015-09-18 07:50:02 UTC received block bdc96d88ef830f8a17f29798d18618366cc6f90d5ff21a233ebfa53f3b9c56a4
2015-09-18 07:50:02 UTC CheckStakeKernelHash() : using modifier 0x06e9d201b5234f61 at height=514192 timestamp=2015-09-05 12:01:50 UTC for block from height=505958 timestamp=2015-08-30 12:54:19 UTC
2015-09-18 07:50:02 UTC CheckStakeKernelHash() : check protocol=0.3 modifier=0x06e9d201b5234f61 nTimeBlockFrom=1440939259 nTxPrevOffset=160 nTimeTxPrev=1440939259 nPrevout=1 nTimeTx=1441592456 hashProof=000000354c3b1e83df5da00988a10883938e0b0ea2bdc8494aa18404384d5dde
2015-09-18 07:50:02 UTC ComputeNextStakeModifier: prev modifier=0x05f184d670d94edd time=2015-09-07 00:01:31 UTC epoch=1441584091
2015-09-18 07:50:02 UTC ComputeNextStakeModifier: no new interval keep current modifier: pindexPrev nHeight=516554 nTime=1441592446
2015-09-18 07:50:03 UTC SetBestChain: new best=bdc96d88ef830f8a17f2  height=516555  trust=752131547484  moneysupply(S)=876575820.4714 moneysupply(B)=4602036.0149
2015-09-18 07:50:03 UTC ProcessBlock: ACCEPTED
2015-09-18 07:50:03 UTC 2015-09-18 07:50:03 UTC received: block (668 bytes)
2015-09-18 07:50:03 UTC received block d739f753ab2715ff6e34615e4f3a263233bf73dc251b4a50418466b5d32c5057
2015-09-18 07:50:04 UTC CheckStakeKernelHash() : using modifier 0x3070c383ddce04de at height=0 timestamp=1970-01-01 00:00:00 UTC for block from height=466553 timestamp=2015-08-03 02:20:04 UTC
2015-09-18 07:50:04 UTC CheckStakeKernelHash() : check protocol=0.3 modifier=0x3070c383ddce04de nTimeBlockFrom=1438568404 nTxPrevOffset=422 nTimeTxPrev=1438567899 nPrevout=322 nTimeTx=1441592472 hashProof=00000134bf0a07882f9ed9b4714612a3337577f2775e4dae6a5f0b1217b2aaa2
2015-09-18 07:50:04 UTC ComputeNextStakeModifier: prev modifier=0x05f184d670d94edd time=2015-09-07 00:01:31 UTC epoch=1441584091
2015-09-18 07:50:04 UTC ComputeNextStakeModifier: no new interval keep current modifier: pindexPrev nHeight=516555 nTime=1441592456
2015-09-18 07:50:07 UTC connection timeout
2015-09-18 07:50:07 UTC trying connection 77.128.146.141:7890 lastseen=-8842.8hrs
2015-09-18 07:50:12 UTC connection timeout
2015-09-18 07:50:13 UTC trying connection 2.38.211.181:7890 lastseen=-254.3hrs
2015-09-18 07:50:15 UTC Updating from data feed https://raw.githubusercontent.com/cryptog/nu_data_feed/master/cryptog_nu_data_feed.json
2015-09-18 07:50:18 UTC connection timeout
2015-09-18 07:50:18 UTC trying connection 86.135.208.10:7890 lastseen=-2826.6hrs
2015-09-18 07:50:23 UTC connection timeout
2015-09-18 07:50:24 UTC trying connection 46.109.129.53:7890 lastseen=-42.8hrs
2015-09-18 07:50:29 UTC connection timeout
2015-09-18 07:50:29 UTC trying connection 95.188.131.121:7890 lastseen=-3762.9hrs
2015-09-18 07:50:34 UTC connection timeout
2015-09-18 07:50:35 UTC trying connection 85.212.85.147:7890 lastseen=-320.6hrs
2015-09-18 07:50:35 UTC connect() failed after select(): No route to host
2015-09-18 07:50:35 UTC trying connection 210.174.40.53:7890 lastseen=-5165.4hrs
2015-09-18 07:50:40 UTC connect() failed after select(): Connection refused
2015-09-18 07:50:40 UTC trying connection 193.138.219.233:7890 lastseen=-18.8hrs
2015-09-18 07:50:40 UTC connect() failed after select(): Connection refused
2015-09-18 07:50:41 UTC trying connection 217.71.44.124:7890 lastseen=-587.7hrs
2015-09-18 07:50:46 UTC connection timeout

Any ideas why the lastseen parameter is negative?

Finished downloading the block chain. Again, nud started to freeze at startup and bcexchanged also stopped minting blocks. I stopped bcexchanged and this time only started nud. It started to work properly and I’ve been minting nushares for some 24 hours now. I think it’s now confirmed that the issue is RAM. Nud, consuming more than 700 MiB, cannot be run in parallel with other block chains. Restarting it every 4 hours did not help either; it froze at startup. I might make a script that would run nud and bcexchanged in turns, so that nud gets to run 50% of the time and bcexchanged the other 50%. I guess that is better than running only one of them.
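The "take turns" idea could be reduced to a small scheduling rule. The following is a rough Python sketch of that rule only, not a tested setup: the function name `daemon_for` and the default one-day slot are assumptions, and actually starting and cleanly stopping the daemons is left out.

```python
def daemon_for(now_s, daemons=("nud", "bcexchanged"), slot_s=24 * 3600):
    """Pick which daemon should be running at Unix time now_s.

    Time is cut into slots of slot_s seconds and the daemons take
    turns, one slot each, in the order given.
    """
    slot_index = int(now_s // slot_s)
    return daemons[slot_index % len(daemons)]
```

A cron job or a small loop could call it with the current time (for example `time.time()`) and switch daemons when the answer changes. Since killing a daemon in the middle of a write is suspected of corrupting the database, the switch should wait for a clean shutdown before starting the other one.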

If I understand it right, every second the minting part of the client wakes up and loops through all UTXOs looking for kernels. If you let nud wake up 0.5 seconds after the “round second” moment on the Pi’s clock, at which point bcexchanged starts its loop, then the two minters might take turns using the Pi’s resources within each one-second period, assuming they each still take less than 0.5 sec most of the time.
If my assumptions are right and you compile your own code, the change in the source could be a trivial one-liner. @sigmike
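The half-second stagger boils down to computing how long to wait until the clock reaches a given offset within the second. Here is a small Python sketch of that arithmetic; it is an illustration rather than the nud source (which is C++), and the name `delay_until_offset` and its parameters are hypothetical.

```python
def delay_until_offset(now_s, offset_s=0.5, period_s=1.0):
    """Seconds to wait from now_s until the clock next sits offset_s
    into a period of period_s seconds (0.0 if it is there already)."""
    phase = now_s % period_s          # how far into the current period we are
    return (offset_s - phase) % period_s
```

One minter would sleep for `delay_until_offset(now, 0.0)` and the other for `delay_until_offset(now, 0.5)` before each search pass, so their loops start half a second apart. This only helps if each pass really finishes in under half a second, as assumed above.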

Yes, the code could be changed to run for a maximum amount of time every second, as I said earlier.

The code starts here: https://bitbucket.org/JordanLeePeershares/nubit/src/6fe88883f316b1ff3448d52438e9d122f808da51/src/wallet.cpp?at=master&fileviewer=file-view-default#wallet.cpp-1533
For each output it searches for a kernel at each of the nSearchInterval seconds (which is usually 1, unless there are seconds to catch up on). You could easily get the current time at the beginning (with GetTimeMillis()) and return error("something") in the loop whenever the time elapsed since the start is above a threshold (it would not be precise, but probably good enough).

You may also want to force nSearchInterval to always be 1, as having to catch up probably happens rarely in normal situations and only makes things worse when there aren’t enough resources.
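The time-budget idea can be shown in isolation. The sketch below is Python rather than the C++ of wallet.cpp, and it is not the nud code: `search_with_budget`, `check_kernel` and `budget_ms` are hypothetical names, and `time.monotonic()` merely stands in for the role GetTimeMillis() would play in the real source.

```python
import time


def search_with_budget(outputs, check_kernel, budget_ms=400):
    """Scan outputs for a kernel, giving up once budget_ms has elapsed.

    Returns (found_output_or_None, timed_out). The check runs once per
    output, so the cut-off is approximate, not exact.
    """
    start = time.monotonic()
    for output in outputs:
        elapsed_ms = (time.monotonic() - start) * 1000.0
        if elapsed_ms > budget_ms:
            return None, True      # out of time: abandon this pass
        if check_kernel(output):
            return output, False   # kernel found within the budget
    return None, False             # scanned everything, nothing found
```

Measuring a full pass with no kernel found, as suggested in the thread, would give a sensible starting value for the budget.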

Thanks. I will first try measuring how long it takes to loop through when no kernel is found.