Another server outage!
@Cybnate perhaps the VPS you are using has issues?
It is not the VPS; the server is running, including the Nu daemon. The problem is with server.py, which becomes unresponsive. It gives me this error:
2015/10/12-20:26:33 ERROR: exception caught in main loop: dictionary changed size during iteration
I have no clue what I can do about this error. I will continue with trial and error and revert the config files to how they were before last week. Keeping my fingers crossed. My apologies, but there is no support available to investigate the root cause.
Will restart in a few minutes…
Is there no way to restart automatically after an error crash?
Not for the server. The manual payouts and logs make automating a restart too complicated.
It is even more complicated because the server.py process is, in a way, still running. Detecting that it is down would require an external trigger, such as feedback from client.py running on a different PC. There might be a smart way to build something into the software itself so that it restarts automatically when it hits those exceptions. I will need some help to code that. Edit: it would still require manual payouts every time that happens.
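Since a stuck server.py still looks "alive", a liveness check has to come from outside, as suggested above. A minimal sketch of such a watchdog, meant to run on a different machine, could poll the server over HTTP and act only after several consecutive failures. The address, port and path in `STATUS_URL` are placeholders, and the assumption that the pool server answers plain HTTP requests is not taken from the nu-pool code; `is_responsive` and `watch` are hypothetical helpers.

```python
import http.client
import time
import urllib.error
import urllib.request

# Placeholder address: the real host, port and path of the pool server
# are assumptions and must be adjusted to the actual setup.
STATUS_URL = "http://127.0.0.1:3333/"


def is_responsive(url, timeout=10.0):
    """Return True if something answers an HTTP request at `url` in time.

    Checking that the process exists is not enough, because a stuck
    server.py is still "running"; this probes it from the outside instead.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server replied, just with an error status code.
        return err.code < 500
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a garbled reply (URLError and
        # socket timeouts are both subclasses of OSError).
        return False


def watch(url=STATUS_URL, probe=is_responsive, on_down=None,
          interval=60.0, max_failures=3, max_checks=None):
    """Probe `url` every `interval` seconds and call `on_down()` after
    `max_failures` consecutive failed probes. `max_checks=None` loops forever.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe(url):
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                if on_down is not None:
                    on_down()  # e.g. send an alert, or restart the service
                failures = 0   # start counting again after acting
        time.sleep(interval)
```

For example, `watch(on_down=lambda: print("pool server unresponsive"))` would only report the outage. Whether `on_down` should also trigger a restart is the operator's call, since the payouts for the missed period would still have to be handled manually.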
Let’s see how we go in the next 48h. I’ve reverted everything to how it was weeks ago and stopped the cryptsy and bter wrappers from being loaded in the first place. They have been there for 8 weeks or so just for testing purposes, so this is unlikely to resolve it, but it is worth a try. At least it helps narrow down the problem.
I’m considering some generic compensation for those who have been active on LiquidBits in the last 2 weeks, rather than spending a lot of time calculating all the manual payouts. I will focus on stabilising the situation first, though.
Same error again; deleted more config and restarted. Not much left now. If it goes down again, I’m afraid I’m out of options. I can only keep restarting until someone is available to help (@woolly_sammoth).
Need some sleep now, will check later again…
The server is down again. It really appears we have a new problem and not a fluke. I don’t have the knowledge to fix this in the server software. Unless someone recommends otherwise, my next steps are:
Step 1:
- Restart the full server (today), including NuD and the liquidity server software, and update to the latest OS patches.
If the problem continues over the next few days:
Step 2:
- Rebuild the server from a backup from 8 weeks ago, leaving out the recent liquidity provisioning features and a few other improvements. This is more work and will have to wait until the weekend.
If the problem still continues:
Step 3:
- Default on the grant, end the services and ask the shareholders for advice on how to proceed.
In the meantime, I have to advise people to assess their exposure and risks on the exchanges, given the current instability and my inability to restart the server several times a day, let alone do manual payouts. I’m effectively already defaulting on the last grant, and I have to think about how to compensate any liquidity provided during this period, given the lack of proof of liquidity provision during the outages.
Did these problems start when you tested the extra configuration for southX?
Let’s hope that step 1 or step 2 solves the issue.
True, but that code has not only been made obsolete in config.py, it has also been physically removed from exchanges.py. It might just have been a coincidence…
Edit 7:50 UTC: server updated and restarted as per step 1; liquidity is being submitted.
Are we sure that the problem is ccedk-specific? Is the server-side software the same as those running on other pools?
take it easy.
I haven’t seen reports from the other pools, so it seems to be specific to this server. As far as I’m aware, we haven’t tried running the CCEDK exchange on other pools. That would be a bit of trial and error anyway; I’d rather do some debugging to find the root cause. It might be the custom-made bots that make the server software misbehave, or some changes on CCEDK we are not aware of. Either way, it shouldn’t crash the software like it does.
Not being a coder, I’m staying away from changing the code myself. I’m just updating the config files and following the updates on the nu-pool GitHub. If there is a code difference, we will learn that in step 2, but unless I made a stupid mistake syncing code on GitHub, that is unlikely. Meanwhile, Woolly-sammoth will help me with some debugging code to identify what is causing this.
Right, I think the best thing to do is to output a lot of debugging messages to pin down exactly where the server crashes, then log the context of that part in the next run.
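The log line quoted at the top only records the exception message, not where it was raised. A minimal sketch of the debugging idea above, assuming the main loop can be wrapped in a function like the hypothetical `run_main_loop` below, uses Python's standard `logging` module: `log.exception()` writes the full traceback, which names the file and line that raised. The logger name and the `step` callable are assumptions for illustration, not the nu-pool code.

```python
import logging
import time

# Timestamps let crashes be lined up with other events; pass
# filename="server_debug.log" to basicConfig to write to a file instead.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("pool-server")  # hypothetical logger name


def run_main_loop(step, iterations=None, pause=1.0):
    """Call `step()` repeatedly, logging full tracebacks on failure.

    `step` stands in for one pass of the server's main loop (hypothetical);
    `iterations=None` loops forever, a number is handy for testing.
    """
    done = 0
    while iterations is None or done < iterations:
        done += 1
        log.debug("main loop pass %d starting", done)
        try:
            step()
        except Exception:
            # Unlike logging only str(exc), log.exception() appends the
            # traceback, so the log shows which line actually raised.
            log.exception("exception caught in main loop")
        time.sleep(pause)
```

Extra `log.debug(...)` calls inside `step` (for example, the size of the shared dictionaries before each loop) then give the surrounding context on the next crash.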
Agree. I’ve already added detailed logging, but it still doesn’t reveal what is happening. The event is also very random in time, and there are two types of exceptions causing the crash, though the cause might be the same. If you are interested in more detail, I just shared the latest log here: https://gitter.im/inuitwallet/nu-pool. Any thoughts are welcome.
That error suggests to me that your config dictionary changed size during iteration. I’d put a print statement showing the size or type of config before line 800 in server.py. Do you know what steps reproduce the errors, or do you just run it and it errors eventually? Does it happen at 24-hour intervals or anything like that?
It is very random and not reproducible at this stage. I’ve done a lot of log analysis to find a trend, but with only 5 examples and seemingly random occurrences, it is hard to identify one.
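In Python, `RuntimeError: dictionary changed size during iteration` is raised when keys are added to or removed from a dict while a loop is iterating over it. That can happen in the loop body itself or, in a threaded server, when another thread (for example a request handler registering a user) changes a shared dict mid-loop; the second case would fit the random timing, though that is only a guess here. The sketch below reproduces the error and shows two common fixes: iterating over a snapshot, and guarding a shared dict with a lock. `SharedDict` is a hypothetical helper, not part of server.py.

```python
import threading


def reproduce():
    """Trigger the same RuntimeError in one thread and return its message."""
    config = {"a": 1, "b": 2}
    try:
        for key in config:
            config[key + "_copy"] = config[key]  # dict grows mid-loop
    except RuntimeError as err:
        return str(err)
    return None


def safe_copy_loop():
    """Iterate over a snapshot, so changing the dict in the loop is fine."""
    config = {"a": 1, "b": 2}
    for key, value in list(config.items()):  # list() copies the items first
        config[key + "_copy"] = value
    return config


class SharedDict:
    """Hypothetical helper: a dict shared between threads behind a lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def discard(self, key):
        with self._lock:
            self._data.pop(key, None)

    def snapshot(self):
        # Copy while holding the lock; callers then iterate the copy
        # without blocking writers for the whole loop.
        with self._lock:
            return dict(self._data)
```

The snapshot alone is enough when the loop body is what modifies the dict; when other threads write to it, every writer also has to go through the same lock, otherwise the copy itself can still race with them.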
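To check for a pattern such as 24-hour intervals, the crash times can be pulled out of the log and the gaps between them computed. This sketch assumes crash lines follow the format of the one quoted at the top of this thread; the second exception type may be logged differently, in which case the pattern needs extending. With only a handful of crashes it can rule out obvious regularities rather than prove a trend.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes crash lines look like the one quoted at the top of this thread:
# 2015/10/12-20:26:33 ERROR: exception caught in main loop: <message>
CRASH_LINE = re.compile(
    r"^(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}) ERROR: "
    r"exception caught in main loop: (.*)$"
)
TIME_FORMAT = "%Y/%m/%d-%H:%M:%S"


def parse_crashes(lines):
    """Return sorted (timestamp, message) pairs for every crash line."""
    crashes = []
    for line in lines:
        match = CRASH_LINE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), TIME_FORMAT)
            crashes.append((when, match.group(2)))
    return sorted(crashes)


def hours_between(crashes):
    """Gaps in hours between consecutive crashes."""
    return [
        (later[0] - earlier[0]).total_seconds() / 3600
        for earlier, later in zip(crashes, crashes[1:])
    ]


def count_by_message(crashes):
    """How often each exception message appears."""
    return Counter(message for _, message in crashes)
```

Usage would be along the lines of `with open("server.log") as f: crashes = parse_crashes(f)` (the file name is a placeholder), then printing `hours_between(crashes)`, `count_by_message(crashes)` and each crash's `when.hour` to look for a time-of-day pattern.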
How would you feel about a proposal for running NuBot at CCEDK to provide liquidity at the NBT/BTC pair?
Would you consider that supportive (for Nu) or a slap in the face (for your ALP operation)?
I could imagine running NuBot at the CCEDK NBT/BTC pair with the funds I used for the ALP bot so far.
Although I’m a proponent of supporting the NBT/USD pair, I’d rather not support that pair in my operation-to-be, because I might end up with USD that I could neither rebalance nor withdraw.
I can try setting up an ALP for the NBT/BTC pair on ccedk and see if it errors.
I don’t have much funds left at CCEDK, because I withdrew most of them when I stopped my ALP bot.
I’m willing to support the troubleshooting with the remaining funds, though.
I can help by providing some liquidity.
Just tell me when.
I won’t be able to set up for another day at least. I have to remember how to open a new port and stuff (shouldn’t be too bad, sam gave us a great resource for server setup a while ago).