Sunday, December 14, 2014

is the debug node binary supposed to work?

I'm trying to track down a periodic SIGSEGV in the node process running my app. (Such things aren't supposed to happen in a memory-safe language like JavaScript, and much of the node community is probably not used to wrestling with them; the cause must be either a bug in a native-code extension I'm using or a bug in V8's code generation.) I thought running the debug build of node itself might shed some additional light on the problem.

I built node from source (0.10.33) after doing `./configure --debug` and grabbed the resulting out/Debug/node binary [1], and when I run my app with that, it crashes pretty soon with a V8 runtime fatal error message:

#
# Fatal error in ../deps/v8/src/hydrogen-instructions.cc, line 641
# CHECK(cur == other_operand) failed
#

It also spits out a backtrace of the JavaScript code it was running (well, trying to translate) at the time. It doesn't always crash at the same point in my program, but it does always crash with this same symptom at hydrogen-instructions.cc:641.

I went looking for more standard workloads to run to see if the problem was with my codebase (or, again, the native-code node extensions it uses, bugs in which could mean all bets are off), and I chanced across a minimal repro case in npm itself: with my debug node binary, `npm install archiver connect` crashes the same way. (See here for more details.)

So I wonder, does anyone else have experience running the debug node binary and is it known to work better than this? Does the condition that's failing here mean anything to you? Can you repro this, or did I perhaps do something wrong when building node?

thanks,

Matt

[1]: One thing I did learn along the way while playing with building node from source is to be careful with the build configuration, which apparently gets stamped into the node binary and autodetected by node-gyp, which is a problem if you want to build native-code extensions. Normally you do `./configure` without the --debug flag, you build for release, you grab out/Release/node, and node-gyp can tell it's release. If you do `./configure --debug` then `make`, you get both out/Debug/node and out/Release/node, and their own runtime behavior is as you'd expect from the names, but both of them got stamped with default_configuration: debug, so node-gyp thinks they're both debug builds and builds all your native code extensions for debug, which breaks most of the packages with native code you'll find in the npm repository (because package.json or index.js looks for something with Release in the name). And if you do `./configure` without --debug and then do `make node_g`, you get the opposite problem: it builds out/Debug/node but with default_configuration: release stamped into it, and node-gyp will build all the extensions for release. The release extension binaries seem to work fine with the debug node binary and vice versa, as long as whatever loads them can actually find them; I don't know if there are any further problems from mixing and matching but I'm trying to eliminate variables I don't understand. Forrest helped me on Twitter to figure this out (thanks again Forrest).
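
For anyone who wants to check which configuration got stamped into a given binary, here is a minimal sketch. It assumes `process.config.target_defaults.default_configuration` is the stamped value that node-gyp picks up, which fits the behavior described in the footnote but is not verified here; `stampedConfiguration` is a made-up helper name.

```javascript
// Report the build configuration stamped into the running node binary.
// Assumption: process.config.target_defaults.default_configuration is the
// value node-gyp keys off. stampedConfiguration is a hypothetical helper.
function stampedConfiguration(config) {
  var defaults = (config && config.target_defaults) || {};
  return defaults.default_configuration || 'unknown';
}

console.log('binary path: ' + process.execPath);
console.log('stamped as:  ' + stampedConfiguration(process.config));
```

Running this once with out/Debug/node and once with out/Release/node should show whether both binaries carry the same stamp.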



I tried to use the debug node binary from 0.10.33 last week and ran into exactly the same problem as you. I get the same hydrogen-instructions.cc:641 failure message when attempting to run anything with node_g (which is a symlink to out/Debug/node). I don't have any solutions either, but I wanted to chime in and say it's not just you.




I could not reproduce this issue. How did you get the source for node (GitHub clone and if so which branch/tag did you checkout, source tarball, other)? Also, which OS did you use to run node?






I cloned the github repo, checked out the v0.10.33-release branch, did `./configure --debug --without-snapshot` and `make`.

I'm running on 64-bit Ubuntu 12.04.5.

I just tried my repro steps and I can repro this but it apparently also depends on the contents of the package.json in the current directory -- if I run the same "npm install archiver connect" command from the original directory it fails as described; if I run it from an empty directory (or rm ./package.json) it completes successfully. I have to run somewhere right now but I will boil this down to a minimal repro case that's actually self-contained and get back to you later tonight.

I also tried reconfiguring with `./configure --debug` (no --without-snapshot) and doing a clean build, to see whether the --without-snapshot flag makes a difference. (In my experience I need that flag to be able to use the heap profiler, which I like to have available, and I assume the official nodejs.org binaries are built that way.) It doesn't seem to make a difference.




OK, the extra step to repro is to have a package.json with the following contents:

{
  "dependencies": {
    "archiver": "0.13",
    "connect": "~2.15.0"
  }
}

And then run "npm install archiver connect" when the node binary on your path is built for debug.
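
To make that setup self-contained, the sketch below writes the same package.json into a fresh temporary directory and prints the command to run there. `writeReproPackage` and the directory prefix are hypothetical names, and `fs.mkdtempSync` needs a newer node than 0.10, so on an old binary the directory would have to be created by hand.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Write the package.json from the post into dir and return its path.
// writeReproPackage is a hypothetical helper name.
function writeReproPackage(dir) {
  var manifest = {
    dependencies: {
      archiver: '0.13',
      connect: '~2.15.0'
    }
  };
  var file = path.join(dir, 'package.json');
  fs.writeFileSync(file, JSON.stringify(manifest, null, 2) + '\n');
  return file;
}

// Create a scratch directory so no other package.json interferes.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), 'node-debug-repro-'));
console.log('wrote ' + writeReproPackage(dir));
console.log('next: cd ' + dir + ' && npm install archiver connect');
```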

This repro is at least somewhat dependent on the precise versions specified in package.json there; it doesn't crash if I remove the tilde or change the version of connect to something more current. To be clear, I don't think the problem I'm asking about has anything to do with npm or connect or the precise version of connect; this is just a repro case I stumbled onto by accident that deterministically follows some code path that manages to repro the bug I'm asking about.
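
For readers unsure why the tilde matters: in npm's semver syntax, `~2.15.0` accepts any 2.15.x release at or above 2.15.0, while a bare `2.15.0` pins exactly that release, so the two specs are not equivalent. A rough sketch of the tilde rule for full x.y.z versions follows. It is a hand-rolled illustration, not the `semver` package that npm actually uses, and `satisfiesTilde` is a hypothetical name.

```javascript
// Hand-rolled illustration of the tilde rule for full "x.y.z" versions:
// ~x.y.z accepts versions >= x.y.z and < x.(y+1).0.
// Not the real semver module; prerelease tags are not handled.
function parse(version) {
  return version.split('.').map(Number);
}

function satisfiesTilde(version, base) {
  var v = parse(version);
  var b = parse(base);
  return v[0] === b[0] && v[1] === b[1] && v[2] >= b[2];
}

console.log(satisfiesTilde('2.15.0', '2.15.0')); // true
console.log(satisfiesTilde('2.15.3', '2.15.0')); // true
console.log(satisfiesTilde('2.16.0', '2.15.0')); // false
```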


I'll be very interested to learn whether the underlying failure in the hydrogen-instructions verify stuff is something totally superficial or actually represents a real problem. Thanks for looking into this.





I could not reproduce the issue on 64-bit Ubuntu 12.04.4. I would suggest creating an issue in joyent/node, with as many details as possible, here: https://github.com/joyent/node/issues. It might get more attention there, and we might be able to find what causes this. Also, John, would you be able to provide us with more details about your setup (OS, etc.)? That could help us find a pattern.
 
This repro is at least somewhat dependent on the precise versions specified in package.json there; it doesn't crash if I remove the tilde or change the version of connect to something more current. To be clear, I don't think the problem I'm asking about has anything to do with npm or connect or the precise version of connect; this is just a repro case I stumbled onto by accident that deterministically follows some code path that manages to repro the bug I'm asking about.


Yes, that's very clear, thank you!
 
I'll be very interested to learn whether the underlying failure in the hydrogen-instructions verify stuff is something totally superficial or actually represents a real problem. Thanks for looking into this.

I'm not familiar with V8's internals, but it seems like it's an error triggered when V8's optimizing compiler doesn't generate code properly, which to me sounds like a genuine problem.

