Random server crashes


henshall_protects

Our server is crashing randomly. It has done so ever since we set it up recently. It can be anything from once in 3 days to a few times per day.

 

- Random number of players: it's been as high as 30 and as low as 4 at the time of the crash

- Resource monitor on our dedi shows that nothing is being overloaded.

- 8 GB of RAM currently allocated for use, slots now set to 20

- Xeon E3 3.1 GHz processor, 1 TB HDD, 16 GB RAM dedi.

- No other demands on the dedi box at the moment.

- No errors in player or admin logs, nor anywhere else

- Message in the command box when the Zomboid server dies: "connection lost", repeatedly

 

Anyone got any ideas? Having run a Minecraft server a few times before, I feel the need to shake an angry stick at Java.

 

Thanks very much.

Hello,

Sorry to jump into this thread, but I've been experiencing the same issue on my server since the build 29 update.

Symptoms: The server randomly (afaik) stops responding to clients and shows "connection lost" for every client. Clients are not disconnected per se, but obviously nothing works in-game. When anyone tries to reconnect, nothing shows up in the server logs.

 

We have been running the server since build 27 and this is the first time we have seen this kind of behavior/issue.

 

Specs:

OS: Ubuntu 14.04 LTS (x64)

RAM: 12 GB (10 GB dedicated to Java)

Java version: 1.7.0_72

Server usage: Nothing else runs on this server

 

Troubleshoot:

-We tried updating the Java version to Java 8, but the issue still happens.

-We re-installed the dedicated server software from scratch (from steamcmd), but with no success.

-We have implemented a lot of monitoring to help us pinpoint the problem and, based on the graphs we have, there is nothing unusual on the server when it happens (CPU load minimal, RAM usage ok, network interface latency ok - avg 15ms, bandwidth usage ok).

-We checked for patterns or anything that would look like a trigger for this issue, but nothing seems suspicious on the server.

 

Let me know if you need some logs or if I can provide anything else that could help with this matter.

Regards.

Might help to have a copy of the console log. Can you PM it to myself, RobertJohnson, or EasyPickins?

If you mean the console window, sure, I will copy it into a pastebin the next time it happens. Out of coincidence, I'm usually not playing when the crash happens, but if I happen to be when it dies, I'll send my console.txt too. Thanks.

Yes indeed, sounds like the same thing. The players are still on the server, they can open doors and "use" some items but it's all pointless until the server gets restarted.

I think I've run into this a couple of times on servers (as a player.)

The symptoms look to me like the network thread is working fine (for example when a server is in this state if you try to connect to it the client gets as far as 'Connection Request Accepted' and then there is no further output.) Server responds properly to raknet pings.

So it seems maybe the main server loop is stuck processing something/in an infinite loop/blocking and never gets around to handling any further traffic that the networking thread hands it.

 

It'd be interesting to see the server cpu usage when it's in this state (per core! if it's an infinite loop I think it would be maxing out one core which wouldn't necessarily show as a heavy load overall.)

Timestamps for those connection lost messages might be interesting as well, I think those are basically when players close their clients because exiting a 'hanging' server doesn't seem to really work.

Do commands still work on the server console? I suspect not but that might be useful information.

 

Maybe the same issue as http://theindiestone.com/forums/index.php/tracker/issue-935-server-connection-lost-100-cpu/ 

 

Edit: if you have the JDK installed you could maybe use jstack or jmap to help diagnose it?
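
For anyone wanting to try the jstack idea on a Linux server, here is a rough sketch rather than a confirmed procedure. It assumes the PID of the server's java process is already known and that `top` (procps) and the JDK's `jstack` are on the PATH; the helper names are made up for illustration. `top -H` lists per-thread CPU usage with decimal thread IDs, while jstack labels each thread with a hexadecimal `nid=0x...`, so one helper converts between the two.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical, not part of any
# server tooling.

# Convert a decimal Linux thread ID (as shown by `top -H`) to the
# hexadecimal form that jstack prints after "nid=".
tid_to_nid() {
    printf '0x%x\n' "$1"
}

# Print one batch-mode snapshot of the process's threads with their CPU usage.
# $1 = PID of the server's java process (assumed to be known).
busiest_threads() {
    top -H -b -n 1 -p "$1" | head -n 20
}

# Dump all Java thread stacks to a timestamped file so that several
# dumps taken a few seconds apart can be compared.
# $1 = PID of the server's java process.
dump_stacks() {
    jstack "$1" > "jstack-$(date +%Y%m%d-%H%M%S).txt"
}

# Example (commented out so that sourcing this file has no side effects;
# 12345 and 12361 are placeholder IDs):
#   busiest_threads 12345
#   tid_to_nid 12361        # then search the jstack dump for this nid
#   dump_stacks 12345
```

If one thread sits near 100% in `top -H` and its stack looks the same across several dumps, that would be consistent with the stuck main loop theory above.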

Hello Brybry,

The symptoms are exactly the same as the ones I see in the thread you are referring to (connection lost - 100% CPU). Since I'm using a VPS, I'm not sure I can get the CPU "per core" data. I'll dig further to be sure, though. And just to be sure, we are talking about server CPU behavior and not client side, right?

As for the timestamps, it is a bit complicated since the console itself doesn't show timestamps. If there is a way to make the console more verbose, I'm all ears :).

Server-side commands, such as "players" or "save", stop working when it happens.

I do have the JDK installed. I'll check how to diagnose with jstack.

 

I'll try to come back with more information such as jstack diagnosis and CPU usage per core (if I can). Let me know if there is something else that would be useful.

Cheers!
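
On the timestamp question above: one possible workaround, sketched here under assumptions, is to pipe the console output through a small filter that prefixes each line with the current time. `stamp_lines` is a hypothetical helper and the start script name in the example is a placeholder; the times show when each line reached the pipe, which may lag slightly behind when the server wrote it.

```shell
#!/usr/bin/env bash
# Sketch only: stamp_lines is a hypothetical helper, not a server feature.

# Read lines from stdin and print each one prefixed with the local time.
stamp_lines() {
    local line
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Example (placeholder script name; only the output is piped, so typing
# commands into the console should still work):
#   ./start-server.sh 2>&1 | stamp_lines | tee -a console-stamped.log
```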

I've finally been able to gather some live traces when the problem occurred.

I performed a jstack trace and a jmap trace as soon as I detected the issue in the console.

I also have graphs for outgoing bandwidth, CPU usage, RAM usage and latency for this period.

Based on the CPU graph, I can definitely see a steady increase in CPU load when it happens.

 

PS: I've attached the traces and CPU usage info in this post.

Cheers!

JMap Results .txt

JStack Results .txt

post-18787-0-04788600-1414211121_thumb.p

Hello,

Small question here :)

I performed an update check today on my server and I think I downloaded a very tiny one. Might it be a hotfix for this issue?

Meanwhile I'll disable the auto-restart script and try to see for myself. But if you can confirm, that would be even better.

 

EDIT: Never mind this question. I witnessed the issue again. I'll wait for the next update :)

Cheers!

  • 3 weeks later...

For reference, and so the fixed version can be tested when it comes out, I was able to work out reproduction steps:

 

- Make sure HoursForZombiesRespawn is set higher than 0 (I think this is necessary but not sure, I didn't test it extensively.)

- Go to coordinates (10636,10612,0). Do a 360 on the player to view all of the tiles.

- Spawn some zombies and aggro them to make sure they see tiles in the area (I think this is necessary but not sure, I didn't test it extensively.)

 

For a list of other coordinates that should also cause the crash, you can try the following (a very quick, ugly regex, I'm guilty :cry:). Note that awk here is gawk:

grep -inPr "x.*\=.*(\d+),.*y.*\=.*(\d+)" media/lua/server/metazones | awk '{ match($0,/x = ([0-9]+),.*y = ([0-9]+),.*height = ([0-9]+)/,a); if (a[1]+0 < a[2]+0+a[3]) { print a[1]; print a[2]; print;  fflush();} }'

As a side note, there seem to be some buildings/area that don't have metazones. I wonder how hard it would be to make something to do a border overlay of all of the metazones on blindcoder's map for debugging purposes, especially for loot respawn testing.
