Inverse scanf: finding format specifiers of existing fields

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

‘0.4577’ → ‘0.7728’

or

‘-2.345e-02’ → ‘ 1.232e-03’

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

Thanks,

Bil K.

[1] Legacy formatted-Fortran data files.

On May 2, 2007, at 12:50 PM, Bil K. wrote:

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

‘0.4577’ -> ‘0.7728’

or

‘-2.345e-02’ -> ‘ 1.232e-03’

Are there many different formats?

– fxn

On 02.05.2007 12:47, Bil K. wrote:

the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

If there is a fixed number of formats you can probably use a cascade of
RX matches. Otherwise it probably becomes a bit more complex like
matching sequences of digits and measuring their lengths.

md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
=> #<MatchData:0x7ef61250>

pa = "%#{md[0].size}.#{md[2].size}f"
=> "%6.4f"

pa % 0.4577111
=> "0.4577"

HTH

robert

Hi –

On 5/2/07, Bil K. [email protected] wrote:

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

You could probably just do a gsub, like this:

require 'scanf'

re = /-?\d+\.\d+(e-\d+)?/

a = "'0.4577' -> '0.7728'"
b = "'-2.345e-02' -> ' 1.232e-03'"

# Turn each sample string into its own scanf format
as = a.gsub(re, "%f")
bs = b.gsub(re, "%f")

p a.scanf(as)
p b.scanf(bs)

Output:

[0.4577, 0.7728]
[-0.02345, 0.001232]

David

On May 2, 2007, at 2:50 PM, Bil K. wrote:

Xavier N. wrote:

Are there many different formats?

Yes, in that the field lengths are different.

No, in that there are really only three “types”:
integers, vanilla floats, and exponentials.

Then I think you could base the solution on String#index/regexps
depending on the existence of “e” and “.”, since we can assume
numbers are well-formed. The idea would be:

if none
  %d
elsif "e"
  %e
else
  %f with computed widths
end

– fxn
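
A minimal sketch of that three-way dispatch, assuming each field is a single well-formed number and that the whole field, sign and leading blanks included, counts toward the width. The name guess_format is hypothetical and does not come from the thread.

```ruby
# Hypothetical helper: guess a printf-style format for one numeric field
# by checking for an exponent marker first, then for a decimal point.
def guess_format(field)
  width = field.length
  if field =~ /[eE]/
    # exponential: the digits between "." and the exponent marker give the precision
    frac = field[/\.(\d*)[eE]/, 1].to_s
    "%#{width}.#{frac.length}#{field[/[eE]/]}"
  elsif field.include?('.')
    # vanilla float: the digits after "." give the precision
    frac = field[/\.(\d*)/, 1].to_s
    "%#{width}.#{frac.length}f"
  else
    # integer: only the width matters
    "%#{width}d"
  end
end

p guess_format('5')
p guess_format('0.4577')
p guess_format('-2.345e-02')
```

Counting the sign as part of the width is a design choice: it leaves room for a positive replacement to be right-justified in the column the minus sign used to occupy.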

On 5/2/07, Bil K. [email protected] wrote:

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

Bill,

How’s this for a start? I wrote it leaning towards clarity vs.
conciseness.

rick@frodo:/public/rubyscripts$ cat number_format.rb
class String
  def to_number_format
    m = match(%r{^([ ]*)([+-]?)(.*)$})
    leading_blanks, sign, rest = m[1], m[2], m[3]
    plus_flag = sign == '+' ? sign : ''
    case rest
    when %r{^([\d]+\.([\d]+)([eE])([+-][\d]+))(.*)$}
      # exponentiated float
      entirety, frac_part, e_or_E, exponent, suffix = $1, $2, $3, $4, $5
      entirety = leading_blanks << entirety
      "%#{entirety.length}.#{frac_part.length}#{e_or_E}#{suffix}"
    when %r{^([\d]+\.([\d]*))(.*)$}
      # simple float
      entirety, frac_part, suffix = $1, $2, $3
      zero = frac_part.match(/00$/) ? '0' : ''
      "%#{zero}#{entirety.length}.#{frac_part.length}f#{suffix}"
    when %r{^(0[\d]+)([^e.]*)$}
      # zero padded integer
      digits, suffix = $1, $2
      "#{leading_blanks}%#{plus_flag}0#{digits.length}d#{suffix}"
    when %r{^([\d]+)([^e.]*)$}
      # whitespace padded integer
      digits, suffix = $1, $2
      digits = leading_blanks << digits
      "%#{digits.length}d#{suffix}"
    else
      nil
    end
  end
end

x = '0.4577'
puts x
puts x.to_number_format
puts x.to_number_format % x.to_f
puts(x.to_number_format % 0.7728)
puts (x.to_number_format % x.to_f) == x
puts

x = '-2.345e-02'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_f)
puts(x.to_number_format % 1.232e-03)
puts (x.to_number_format % x.to_f) == x
puts

x = '12345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_f) == x
puts

x = ' 00012345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_i) == x
puts

x = '  12345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_i) == x

rick@frodo:/public/rubyscripts$ ruby number_format.rb
0.4577
%6.4f
0.4577
0.7728
true

-2.345e-02
%9.3e
-2.345e-02
1.232e-03
true

12345
%5d
12345
  765
true

00012345
%08d
00012345
00000765
true

  12345
%7d
  12345
    765
true


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Rick DeNatale wrote:

On 5/4/07, Bil K. [email protected] wrote:

  assert_equal( '%8.7f', '.0001170'.to_number_format )

Not sure how this one worked, it fails for me. As a matter of fact:
irb(main):001:0> '%8.7f' % 0.0001170
=> "0.0001170"

And I haven’t been able to find an sprintf format string which
suppresses a leading zero on a float.

You’re correct; as you wrote, I wasn’t testing round-trip.

Thanks,

Bil K. wrote:

Puzzling the minus sign part now…

"%#{zero}#{sign.length+entirety.length}.#{frac_part.length}f#{suffix}"
           ^^^^^^^^^^^^
Later,

On 02.05.2007 15:08, Bil K. wrote:

md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
capacity of the existing format.
For floating point numbers you might even get away with a single regexp
if that is crafted appropriately and group values are evaluated
accordingly.

Kind regards

robert

David A. Black wrote:

Hi –

Hi.

Output:

[0.4577, 0.7728]
[-0.02345, 0.001232]

The second output indicates that I failed to express
my predicament clearly, as the numbers are no longer
in exponential format?

A brief re-cast:

The original file has numbers of the form

5 0.4577 -2.345e-02

Something reads the numbers and spits out new numbers,
but in exactly the same format as the original file, e.g.,

8 0.7728 1.232e-03

I.e., I can’t write the last number out as 0.001232 –
it has to be in exponential format with the same field
lengths.
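
A rough sketch of that read-and-rewrite loop for a single line, under some loudly stated assumptions: fields are whitespace-separated, the surrounding blanks are left untouched rather than counted as part of each field, and the new value comes from a caller-supplied block. The names NUMBER, format_like, and rewrite_line are hypothetical.

```ruby
# Hypothetical sketch: replace every numeric field in a line with a new
# value written in the same notation, width, and precision as the old one.
NUMBER = /[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/

# Format value so that it looks like the original field text.
def format_like(field, value)
  frac = field[/\.(\d*)/, 1]          # digits after the point, nil for integers
  if field =~ /[eE]/
    "%#{field.length}.#{frac.to_s.length}#{field[/[eE]/]}" % value
  elsif frac
    "%#{field.length}.#{frac.length}f" % value
  else
    "%#{field.length}d" % value.round
  end
end

# Yield each number to the block and substitute the block's result in place.
def rewrite_line(line)
  line.gsub(NUMBER) { |field| format_like(field, yield(field.to_f)) }
end

puts rewrite_line('5 0.4577 -2.345e-02') { |v| v * 2 }
```

Because the width is only a minimum in printf-style formats, a value that needs more room simply widens its field, which is exactly the edge case a real tool would have to police.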

Regards,

Xavier N. wrote:

%f with computed widths

end

This, coupled with Robert’s computed field lengths
is beginning to look tractable…

Thanks,

On 5/5/07, Rick DeNatale [email protected] wrote:

On 5/4/07, Bil K. [email protected] wrote:

Rick DeNatale wrote:

How’s this for a start?

Excellent! Thanks.

By the way Bill, seeing who you seem to work for, I’d like to dedicate
whatever help I’ve given to you to the memory of Wally Schirra!

Are you a turtle?


Rick DeNatale

Visit the Project Mercury Wiki Site
http://www.mercuryspacecraft.com/

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Robert K. wrote:

If there is a fixed number of formats you can probably use a cascade of
RX matches.

Unfortunately not.

Otherwise it probably becomes a bit more complex like
matching sequences of digits and measuring their lengths.

md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
=> #<MatchData:0x7ef61250>

pa="%#{md[0].size}.#{md[2].size}f"

Hmmm, this looks like a viable path.

I hadn’t thought of using MatchData groups, but as you say,
it may get ugly fast… I’m thinking of edge cases like
dealing with the leading space if positive numbers become
negative, or accommodating the number of digits needed for
exponentials or integers if the new number exceeds the
capacity of the existing format.
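
One way to catch those overflow cases, sketched under the assumption that widening a field is never acceptable: format first, then compare the result against the original width. The helper format_in_field is hypothetical.

```ruby
# Hypothetical guard: apply a printf-style format but refuse to let the
# result grow past the width of the field it is replacing.
def format_in_field(format, value, width)
  text = format % value
  if text.length > width
    raise ArgumentError, "#{text.inspect} does not fit in #{width} columns"
  end
  text
end

p format_in_field('%6.4f', 0.7728, 6)

begin
  # the minus sign pushes the result to seven characters
  format_in_field('%6.4f', -0.7728, 6)
rescue ArgumentError => e
  puts e.message
end
```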

Thanks,

Rick DeNatale wrote:

By the way Bill, seeing who you seem to work for, I’d like to dedicate
whatever help I’ve given to you to the memory of Wally Schirra!

You helped me learn more Ruby; always a pure joy. Thank you.

I’ve since decided that I’m going to require that users
specify the format instead of trying to back it out –
there are cases for which you just can’t back out the
correct format. Besides, the need is infrequent, and
I have no sympathy for code that employs formatted reads…

Are you a turtle?

You bet your sweet ass I am! ;-)

Regards,

Xavier N. wrote:

Are there many different formats?

Yes, in that the field lengths are different.

No, in that there are really only three “types”:
integers, vanilla floats, and exponentials.

Regards,

Rick DeNatale wrote:

How’s this for a start?

Excellent! Thanks.

All but my last test passed:

require 'test/unit'
require 'number_format'
class TestNumberFormat < Test::Unit::TestCase
  def test_some_floats
    assert_equal( '%3.1f', '8.3'.to_number_format )
    assert_equal( '%05.3f', '0.500'.to_number_format )
    assert_equal( '%8.7f', '.0001170'.to_number_format )
    assert_equal( '%7.1f', '14000.0'.to_number_format )
    assert_equal( '%9.3E', '4.480E+09'.to_number_format )
    assert_equal( '%6.1e', '3.2e-5'.to_number_format )
    assert_equal( '%6.1f', '-254.2'.to_number_format )
  end
end

  1. Failure:
    test_some_floats(TestNumberFormat) [-:11]:
    <"%6.1f"> expected but was
    <"%5.1f">.

Note: made the simple float leading digit match 0
or more to get the third test to pass.

Puzzling the minus sign part now…

Thanks again,

On 5/4/07, Bil K. [email protected] wrote:

class TestNumberFormat < Test::Unit::TestCase
  def test_some_floats
    assert_equal( '%3.1f', '8.3'.to_number_format )
    assert_equal( '%05.3f', '0.500'.to_number_format )
    assert_equal( '%8.7f', '.0001170'.to_number_format )

Not sure how this one worked, it fails for me. As a matter of fact:
irb(main):001:0> '%8.7f' % 0.0001170
=> "0.0001170"

And I haven’t been able to find an sprintf format string which
suppresses a leading zero on a float.
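
Since no format flag seems to do it, one workaround is to post-process the formatted string. This is only a sketch: format_without_leading_zero is a hypothetical helper, and it assumes the caller knows the original field had no digit before the point.

```ruby
# Hypothetical helper: format with sprintf-style %, then drop the single
# zero that precedes the decimal point (keeping any blanks and sign).
def format_without_leading_zero(format, value)
  (format % value).sub(/\A(\s*[-+]?)0\./, '\1.')
end

p format_without_leading_zero('%8.7f', 0.0001170)
p format_without_leading_zero('%6.1f', -254.2)
```

Note that the result is one character shorter than what the format alone produces, so any width bookkeeping has to account for the removed zero.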

<"%5.1f">.

Note: made the simple float leading digit match 0
or more to get the third test to pass.

Puzzling the minus sign part now…

I see that you figured this out.

Another thing to test is that the values actually round trip. Here’s my
test:

rick@frodo:/public/rubyscripts$ cat test_number_format.rb
require 'test/unit'
require 'number_format'
class TestNumberFormat < Test::Unit::TestCase
  def test_some_floats
    assert_equal( '%3.1f', '8.3'.to_number_format )
    assert_nf('8.3')
    assert_equal( '%05.3f', '0.500'.to_number_format )
    assert_nf('0.500')
    assert_equal( '%8.7f', '.0001170'.to_number_format )
    assert_nf('.0001170')
    assert_equal( '%7.1f', '14000.0'.to_number_format )
    assert_nf('14000.0')
    assert_equal( '%9.3E', '4.480E+09'.to_number_format )
    assert_nf('4.480E+09')
    assert_equal( '%6.1e', '3.2e-5'.to_number_format )
    assert_nf('3.2e-5')
    assert_equal( '%6.1f', '-254.2'.to_number_format )
    assert_nf('-254.2')
  end

  private
  # Round-trip check: formatting the parsed value must reproduce the string
  def assert_nf(str)
    assert_equal(str, str.to_number_format % eval(str))
  end
end


Rick DeNatale
