Common Traps for C extensions

aris · August 29, 2012, 8:08pm

Hi, I am writing a toy library in C++ and I am currently writing a Ruby
extension for it.

I have one C++ layer that catches all the C++ exceptions, converts them
into error codes, and casts the void pointers it gets from C to the
appropriate classes etc. Then I have one C layer which handles the Ruby
data types and raises the correct exceptions.

But now I am thinking of possible traps I might run into. For example,
the Ruby interpreter forces me to differentiate between allocation and
initialization, and in my current implementation the user could redefine
the initialize() method of my class and then call another method, which
results in undefined behaviour or possibly a segfault. I can easily solve
that problem (e.g. by setting some internal flag), but there are probably
a thousand other typical traps like that, where the dynamic nature of Ruby
messes with my C memory management. Do you know of any other important ones?

lykos · August 30, 2012, 9:18am

On Wed, Aug 29, 2012 at 8:08 PM, Bernhard B. wrote:

Hi, I am writing a toy library in C++ and I am currently writing a Ruby
extension for it.

I have one C++ layer that catches all the C++ exceptions, converts them
into error codes, and casts the void pointers it gets from C to the
appropriate classes etc. Then I have one C layer which handles the Ruby
data types and raises the correct exceptions.

Wouldn’t it be simpler to implement just one layer of C++ functions
with extern “C” that do all the adjustments (i.e. catch C++ exceptions
and convert types)?
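
A minimal sketch of what such a single layer could look like, leaving out the Ruby headers so it stays self-contained. The `Encrypter` class, the `crypto_*` function names, and the error codes are all made up for illustration; the only point is that the `extern "C"` functions are compiled as C++, so they can use try/catch and `static_cast` directly while still presenting a C interface.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical library class, standing in for the poster's C++ code.
class Encrypter {
public:
    explicit Encrypter(const std::string& key) {
        if (key.empty()) throw std::invalid_argument("empty key");
        key_ = key;
    }
    std::size_t key_length() const { return key_.size(); }
private:
    std::string key_;
};

// Hypothetical error codes that the glue code can map to Ruby exceptions.
enum crypto_status {
    CRYPTO_OK = 0,
    CRYPTO_BAD_ARG = 1,
    CRYPTO_NO_MEMORY = 2,
    CRYPTO_UNKNOWN = 3
};

extern "C" {

// Creates an Encrypter behind an opaque void pointer.
// Every C++ exception is turned into an error code before returning.
int crypto_create(const char* key, void** out) {
    if (key == nullptr || out == nullptr) return CRYPTO_BAD_ARG;
    *out = nullptr;
    try {
        *out = new Encrypter(key);
        return CRYPTO_OK;
    } catch (const std::invalid_argument&) {
        return CRYPTO_BAD_ARG;
    } catch (const std::bad_alloc&) {
        return CRYPTO_NO_MEMORY;
    } catch (...) {
        return CRYPTO_UNKNOWN;
    }
}

// Casts the opaque pointer back to the real class and destroys it.
// Deleting a null pointer is a no-op, so a failed create is safe to pass in.
void crypto_destroy(void* handle) {
    delete static_cast<Encrypter*>(handle);
}

}  // extern "C"
```

In a real extension the same functions could take and return VALUE and raise Ruby exceptions after the catch blocks have finished, instead of returning codes.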

But now I am thinking of possible traps I might run into. For example,
the Ruby interpreter forces me to differentiate between allocation and
initialization, and in my current implementation the user could redefine
the initialize() method of my class and then call another method, which
results in undefined behaviour or possibly a segfault.

I am not sure I understand the scenario. Are you talking about a user
redefining #initialize in Ruby land leading to improperly initialized
C / C++ data structures?

I can easily solve
that problem (e.g. by setting some internal flag), but there are probably
a thousand other typical traps like that, where the dynamic nature of Ruby
messes with my C memory management. Do you know of any other important ones?

I never did serious C extension coding so I can’t help you with
general guidelines. Storing something which verifies integrity of the
C++ data structures is certainly a good idea. If I think about it,
isn’t it sufficient to check whether a pointer to the C++ struct is
valid, i.e. not NULL? It certainly depends on how you design the
interface between Ruby and C / C++ world: you could completely rely on
C / C++ state or make use of Ruby instance variables from C / C++
which would probably make things more complicated.

Kind regards

robert

lykos · August 30, 2012, 11:23am

On Thu, Aug 30, 2012 at 9:34 AM, Bernhard B. wrote:

Hi, thanks for your answer. Somehow it didn’t compile as C++ code, if I
do the conversions,

Well, as long as we do not see the code and / or the error we can’t
really tell why it did not compile. I created a test this morning
which worked but threw it away. You do need to compile the catching
and conversion code with extern “C” with a C++ compiler though.

Exactly, that is what I am talking about, checking for non-null does not
suffice, because Ruby calls my alloc function first, so the pointer is
actually valid, but it points to an uninitialized object.

Yeah, but your allocation function could place one or more NULL
pointers in the structure which get filled later, couldn’t it?
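
The idea can be sketched without the Ruby headers as follows. All names here (`Wrapper`, `Engine`, the `wrapper_*` functions, the status codes) are hypothetical stand-ins: `wrapper_alloc` plays the role of the allocation function, `wrapper_initialize` the role of initialize, and `wrapper_free` the role of the free function handed to the GC.

```cpp
#include <cstddef>
#include <string>

// Hypothetical C++ object owned by the Ruby wrapper.
struct Engine {
    std::string key;
};

// What the allocation function would wrap: cheap to create, always safe to inspect.
struct Wrapper {
    Engine* engine;  // stays NULL until initialize has run
};

enum wrapper_status { WRAPPER_OK = 0, WRAPPER_NOT_INITIALIZED = 1 };

// Stand-in for the allocation function: no real work, just a safe empty state.
Wrapper* wrapper_alloc() {
    Wrapper* w = new Wrapper;
    w->engine = nullptr;
    return w;
}

// Stand-in for initialize: fills the pointer.
// Calling it a second time replaces the old engine instead of leaking it.
void wrapper_initialize(Wrapper* w, const std::string& key) {
    Engine* fresh = new Engine{key};
    delete w->engine;
    w->engine = fresh;
}

// Every other method checks the pointer first and reports an error
// instead of dereferencing NULL.
int wrapper_key_length(const Wrapper* w, std::size_t* out) {
    if (w->engine == nullptr) return WRAPPER_NOT_INITIALIZED;
    *out = w->engine->key.size();
    return WRAPPER_OK;
}

// Stand-in for the free function; it must cope with an engine that was never set.
void wrapper_free(Wrapper* w) {
    delete w->engine;
    delete w;
}
```

In the actual extension the "not initialized" branch would presumably raise a Ruby exception (for example via rb_raise) rather than return a code, so a subclass that overrides initialize without calling super gets an error instead of a crash.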

Kind regards

robert

lykos · August 31, 2012, 12:41am

On 30/08/2012, at 7:34 PM, Bernhard B. wrote:

But now I am thinking of possible traps I might run into. For example,
the Ruby interpreter forces me to differentiate between allocation and
initialization, and in my current implementation the user could redefine
the initialize() method of my class and then call another method, which
results in undefined behaviour or possibly a segfault.

Someone might sub-class and not call super.

If it’s possible to segfault your extension then you are doing it
wrong. After calling the alloc function, your class might not be
‘valid’, but it must be ‘safe’. This is the reason Ruby added the alloc
hook.

Remember, when you are writing an extension you are effectively writing
Ruby, and in Ruby implementing initialize is optional and nil (or Qnil)
is a valid value. You should bear this in mind when designing your
extension.

Henry

lykos · August 30, 2012, 9:34am

Hi, thanks for your answer. Somehow it didn’t compile as C++ code when I
did the conversions there, so I assumed it was not possible and split it
into one part that is compiled as C++ with extern "C" functions and one
part that is compiled as C.

Exactly, that is what I am talking about, checking for non-null does not
suffice, because Ruby calls my alloc function first, so the pointer is
actually valid, but it points to an uninitialized object.

lykos · September 3, 2012, 1:52am

Robert K. wrote in post #1073887:

On Thu, Aug 30, 2012 at 9:34 AM, Bernhard B. wrote:

Hi, thanks for your answer. Somehow it didn’t compile as C++ code, if I
do the conversions,

Well, as long as we do not see the code and / or the error we can’t
really tell why it did not compile. I created a test this morning
which worked but threw it away. You do need to compile the catching
and conversion code with extern “C” with a C++ compiler though.

I defined a function with two VALUE args and a VALUE return type, and I
wanted this to be a method with one argument (plus the self argument),
but I always got this error:

error: invalid conversion from ‘VALUE (*)(VALUE, VALUE) {aka long
unsigned int (*)(long unsigned int, long unsigned int)}’ to ‘VALUE
(*)(...) {aka long unsigned int (*)(...)}’ [-fpermissive]

Exactly, that is what I am talking about, checking for non-null does not
suffice, because Ruby calls my alloc function first, so the pointer is
actually valid, but it points to an uninitialized object.

Yeah, but your allocation function could place one or more NULL
pointers in the structure which get filled later, couldn’t it?

Yes, I solved it exactly this way.

Henry M. wrote in post #1073985:

On 30/08/2012, at 7:34 PM, Bernhard B. wrote:

But now I am thinking of possible traps I might run into. For example,
the Ruby interpreter forces me to differentiate between allocation and
initialization, and in my current implementation the user could redefine
the initialize() method of my class and then call another method, which
results in undefined behaviour or possibly a segfault.

Someone might sub-class and not call super.

If it’s possible to segfault your extension then you are doing it
wrong. After calling the alloc function, your class might not be
‘valid’, but it must be ‘safe’. This is the reason Ruby added the alloc
hook.

Yes, that is what I am trying to achieve. It turned out to be more
difficult than I thought at first, but it is now safe for every
scenario I can think of.

Cheers,
Bernhard

lykos · September 3, 2012, 9:40am

On Mon, Sep 3, 2012 at 1:52 AM, Bernhard B. wrote:

and conversion code with extern “C” with a C++ compiler though.

I defined a function with two VALUE args and a VALUE return type, and I
wanted this to be a method with one argument (plus the self argument),
but I always got this error:

error: invalid conversion from ‘VALUE (*)(VALUE, VALUE) {aka long
unsigned int (*)(long unsigned int, long unsigned int)}’ to ‘VALUE
(*)(...) {aka long unsigned int (*)(...)}’ [-fpermissive]

As I said, as long as we do not see the code… I suggest you create
a http://sscce.org/ and post it.

Cheers

robert

lykos · September 4, 2012, 1:16pm

Thanks anyway, but I rewrote everything and I am using Rice now, which
works much better. The one exception is that it doesn’t automatically
handle the case where the user redefines initialize, and a workaround is
a little cumbersome since Rice assumes it defines the initialize method
itself. Maybe I should report this to the Rice developers, because I
guess nobody wants a segfault if the user redefines initialize.

The smallest code to reproduce my previous problem looks roughly like this:

#include <ruby.h>

extern "C" {

VALUE encrypt(VALUE self, VALUE key) {
    return Qnil;
}

static VALUE Encrypter = Qnil;

void Init_RubyCrypto() {
    Encrypter = rb_define_class("Encrypter", rb_cObject);
    /* arity 1: one argument besides self; this is the line C++ rejects */
    rb_define_method(Encrypter, "encrypt", encrypt, 1);
}

}

The problem appears to be that rb_define_method takes a function pointer
with a variable number of arguments, while encrypt has exactly 2
arguments. C somehow converts this implicitly, while C++ complains. But
it doesn’t matter anymore since I use Rice now.
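
For anyone hitting the same error: C++ never converts between function pointer types implicitly, so the pointer has to be cast explicitly at the registration site. As far as I know, Ruby's headers provide the RUBY_METHOD_FUNC macro for exactly this cast. The sketch below reproduces the situation without ruby.h; `VALUE`, `any_args_fn`, `define_method_stub`, and `call_registered` are stand-ins invented for illustration, not Ruby's real definitions.

```cpp
typedef unsigned long VALUE;        // stand-in for Ruby's VALUE
typedef VALUE (*any_args_fn)(...);  // the parameter shape the C++ compiler sees

static any_args_fn registered = nullptr;
static int registered_arity = -1;

// Stand-in for rb_define_method: only stores the pointer and the arity.
void define_method_stub(any_args_fn fn, int arity) {
    registered = fn;
    registered_arity = arity;
}

// A method taking self plus one argument; the body is a placeholder.
static VALUE encrypt(VALUE self, VALUE key) {
    return self + key;
}

void register_methods() {
    // define_method_stub(encrypt, 1);  // rejected by C++: invalid conversion
    define_method_stub(reinterpret_cast<any_args_fn>(encrypt), 1);
}

// The stored pointer is cast back to its real signature before the call;
// calling through the mismatched type directly would be undefined behaviour.
VALUE call_registered(VALUE self, VALUE arg) {
    typedef VALUE (*one_arg_fn)(VALUE, VALUE);
    return reinterpret_cast<one_arg_fn>(registered)(self, arg);
}
```

With the real headers, the equivalent change to the snippet above would presumably be to wrap the third argument as RUBY_METHOD_FUNC(encrypt).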

Thanks for the help