Awk regexp search

baptiste_AuguiSSSS · July 1, 2007, 3:14pm

Hi,

The last bit of a bash program is still resisting me. Here is the
code I used before:

awk '/scattering efficiency/{print $4}' …/OUTPUTFILES/Output.dat

How would you do that in Ruby? I just need to locate this regexp in
the file, and get the following value in the same line. I’ve tried
something like,

output_file=IO.readlines('…/OUTPUTFILES/Output.dat').to_s
myarray = output_file.each_line(/scattering efficiency/){|elt| print elt.to_s, '\t'}

but it clearly doesn’t work.

Best regards,

baptiste

baptiste_AuguiSSSS · July 1, 2007, 3:22pm

On Sun, Jul 01, 2007 at 10:13:16PM +0900, baptiste Auguié wrote:

output_file=IO.readlines('…/OUTPUTFILES/Output.dat').to_s
myarray = output_file.each_line(/scattering efficiency/){|elt| print elt.to_s, '\t'}

but it clearly doesn’t work.

First off, don’t be too quick to abandon awk. I once took a dozen lines of
Ruby that someone had written, which did almost what I wanted, and replaced
them with three lines of sh/awk. For what awk can do, it is excellent.

To set up the equivalent of the awk code above, you want something like
this (note: untested):

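ARGF.each { |line|
  case line
  when /scattering efficiency/
    # Index 3 instead of awk's $4: Ruby arrays are 0-based. If the line starts
    # with whitespace, split(/\s+/) yields an empty first field, which shifts
    # every index up by one.
    puts line.split(/\s+/)[3]
  end
}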

Note that this basic framework does not support awk’s /regex1/,/regex2/
notation that captures lines between (and including) the lines matching
those regular expressions.

–Greg
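
One way to get the effect of awk's /regex1/,/regex2/ range in plain Ruby is to carry a state flag across lines. The sketch below is only an illustration: the method name `lines_in_range` and the sample lines are made up, and it assumes the range should close on the first line that matches the second pattern, as in awk. Ruby's flip-flop operator (`if cond1..cond2`) is another option for the same job.

```ruby
# Emulates awk's /start/,/stop/ range pattern with an explicit flag.
# Collects every line from a match of start_re through the next match
# of stop_re, inclusive. (lines_in_range is a hypothetical helper.)
def lines_in_range(lines, start_re, stop_re)
  inside = false
  selected = []
  lines.each do |line|
    inside = true if !inside && line =~ start_re
    if inside
      selected << line
      # The stop pattern is tested on the same line that opened the range,
      # so a line matching both patterns forms a one-line range.
      inside = false if line =~ stop_re
    end
  end
  selected
end

# Made-up sample input.
sample = ["header", "start1 a", "middle", "stop1 b", "trailer"]
result = lines_in_range(sample, /start1/, /stop1/)
puts result
```

With the sample above, the three lines from "start1 a" through "stop1 b" are printed.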

baptiste_AuguiSSSS · July 1, 2007, 3:26pm

Hi –

On Sun, 1 Jul 2007, baptiste Auguié wrote:

output_file=IO.readlines('…/OUTPUTFILES/Output.dat').to_s
myarray = output_file.each_line(/scattering efficiency/){|elt| print elt.to_s, '\t'}

but it clearly doesn’t work.

Here’s a little test/demo (using stdin) – I’ve added => to the output
lines:

$ ruby -ne 'puts $1 if /scattering efficiency\s+(\S+)/'
scattering efficiency blah
=> blah
nothing
this has scattering efficiency just like the other one
=> just

David
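
The same capture idea also works inside a script. The following is a rough sketch, not a tested solution for the real file: the two sample lines are hypothetical and assume a "name = value" layout, and the pattern is widened to skip an optional "=" so that the capture lands on the number rather than on the equals sign.

```ruby
# Hypothetical sample lines; the "name = value" layout is an assumption.
lines = [
  "  extinction efficiency =    4.9374E-06",
  "  scattering efficiency =    1.2345E-06",
]

# Skip an optional "=" between the label and the value.
pattern = /scattering efficiency\s*=?\s*(\S+)/

# String#[] with a regexp and a group number returns that capture,
# or nil when the line does not match; compact drops the nils.
values = lines.map { |line| line[pattern, 1] }.compact
puts values
```

Because the pattern anchors on the label itself, this does not depend on how many fields precede the value.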

baptiste_AuguiSSSS · July 1, 2007, 4:52pm

On 7/1/07, [email protected] wrote:

How would you do that in Ruby? I just need to locate this regexp in the file,
and get the following value in the same line. I’ve tried something like,

$ ruby -ne 'puts $1 if /scattering efficiency\s+(\S+)/'
scattering efficiency blah
=> blah
nothing
this has scattering efficiency just like the other one
=> just

The problem is that we have no idea where “scattering efficiency” is
relative to $4.
However,

ruby -ane 'puts $F[3] if /scattering efficiency/' …/ton/beau/fichier

does the same as the awk script above.

Side remark:
a pity that one cannot use my favorite options: -anpe

Robert
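
For readers unfamiliar with the switches: -n wraps the code in a loop over the input lines, and -a splits each line into the array $F. Spelled out as an ordinary method, the one-liner might look roughly like the sketch below. The method name `fourth_fields` and the input data are made up for illustration.

```ruby
require "stringio"

# Roughly what `ruby -ane 'puts $F[3] if /scattering efficiency/'` does,
# written as a plain method (fourth_fields is a hypothetical name).
def fourth_fields(io, pattern)
  found = []
  io.each_line do |line|   # -n: loop over every line of input
    fields = line.split    # -a: split the line into fields, like $F
    found << fields[3] if line =~ pattern
  end
  found
end

# Hypothetical input; the layout is an assumption.
input = StringIO.new(
  "  extinction efficiency =    4.9374E-06\n" \
  "  scattering efficiency =    1.2345E-06\n"
)
qsca = fourth_fields(input, /scattering efficiency/)
puts qsca
```

Any object that responds to each_line, such as an open File, could be passed in place of the StringIO.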

baptiste_AuguiSSSS · July 1, 2007, 5:28pm

Thanks everybody,

this piece of code works fine for me,

output_file.each { |line|
  case line
  when /scattering efficiency/
    qsca << line.split(/\s+/)[4]
  end
}

although I realize now that “output_file” may contain a duplicate of
the line I want to extract. How can I specify to take only the last
occurrence?

The file to parse looks something like this,

  extinction efficiency =    4.9374E-06

…
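
The index 4 (rather than 3) follows from the leading whitespace in lines like the one above. A small sketch of the difference between the two ways of splitting, using a hypothetical "scattering efficiency" line that is assumed to have the same layout:

```ruby
# Hypothetical line, assumed to follow the layout of the line shown above.
line = "  scattering efficiency =    1.2345E-06"

# With an explicit regexp, the leading blanks produce an empty first field.
by_regexp = line.split(/\s+/)

# With no argument, split ignores leading whitespace, as awk does.
by_default = line.split

p by_regexp
p by_default
```

So `line.split(/\s+/)[4]` and `line.split[3]` pick out the same value here, and the second form corresponds directly to awk's $4.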

On 1 Jul 2007, at 15:51, Robert D. wrote:

scattering efficiency blah

does the same as the awk script above

This piece of code needs to be part of a script, not a one-line call.

How would that work in a Ruby script?

Side remark:
a pity that one cannot use my favorite options: -anpe

; )

Robert

Thanks,

baptiste

baptiste_AuguiSSSS · July 1, 2007, 8:09pm

On 7/1/07, baptiste Auguié [email protected] wrote:

ruby -ane 'puts $F[3] if /scattering efficiency/' …/ton/beau/fichier

does the same as the awk script above

this piece of code needs to be part of a script, not a one line call -

how would that work in a Ruby script?
I am not sure I understand. This command can be put into a bash script
just like the awk example above. If, however, you want to replace the bash
script with a Ruby script, you will need to read the file explicitly:

puts File.readlines("/le/beau/fichier").grep(/scattering efficiency/).
  map { |line| line.split[3] }

or only the first

puts File.readlines("/le/beau/fichier").grep(…).first.split[3]

or if performance is an issue and you do not want to read further lines

File.foreach("/le/grand/fichier") do |line|
  next unless /…/ === line
  puts line.split[3]
  break
end

Cheers
Robert
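
A self-contained sketch of the early-exit variant, assuming File.foreach is used to read the file line by line. The helper name `first_match_field` and the temporary file contents are invented for the example.

```ruby
require "tempfile"

# Returns the 4th whitespace-separated field of the first matching line,
# or nil when nothing matches. File.foreach reads line by line, so
# returning early avoids scanning the rest of the file.
# (first_match_field is a hypothetical helper name.)
def first_match_field(path, pattern)
  File.foreach(path) do |line|
    return line.split[3] if line =~ pattern
  end
  nil
end

first = nil
Tempfile.create("output") do |f|
  # Made-up file contents.
  f.puts "  extinction efficiency =    4.9374E-06"
  f.puts "  scattering efficiency =    1.2345E-06"
  f.puts "  scattering efficiency =    9.9999E-06"
  f.flush
  first = first_match_field(f.path, /scattering efficiency/)
end
puts first
```

Only the first of the two matching lines is used; the second is never read.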

baptiste_AuguiSSSS · July 1, 2007, 9:55pm

Hi –

On Sun, 1 Jul 2007, Robert D. wrote:

=> blah
nothing
this has scattering efficiency just like the other one
=> just

The problem is that we have no idea where “scattering efficiency” is
relative to $4

He said “the following value”, so I assumed it would be a \S+ match
after a \s+ match.

However

ruby -ane 'puts $F[3] if /scattering efficiency/' …/ton/beau/fichier

does the same as the awk script above

Wow – I really must brush up on ‘man ruby’

David

baptiste_AuguiSSSS · July 12, 2007, 1:24pm

A few mistakes:

instead of

      @file = File.open(filename, "r").close;;

should be
      @file = File.open(filename, "r");
and

    ARGF.each { |@line|

replace with

    @file.each { |@line|

baptiste_AuguiSSSS · July 1, 2007, 8:31pm

On 1 Jul 2007, at 19:09, Robert D. wrote:

or only the first

puts File.readlines("/le/beau/fichier").grep(…).first.split[3]

Perfect! exactly what I wanted (with last instead of first). I just
didn’t know about this grep command in Ruby, basically.

Thanks,

baptiste
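
For reference, the "last instead of first" variant could look something like this sketch. The lines are held in an array here instead of being read from the real file, and their contents are hypothetical.

```ruby
# Hypothetical contents; in a real script these lines would come from
# File.readlines on the output file.
lines = [
  "  extinction efficiency =    4.9374E-06",
  "  scattering efficiency =    1.2345E-06",
  "  scattering efficiency =    9.9999E-06",
]

# Array#grep keeps the elements matching the pattern; last picks the
# final occurrence and is nil when nothing matched.
last_line = lines.grep(/scattering efficiency/).last
qsca = last_line && last_line.split[3]
puts qsca
```

The `last_line &&` guard keeps the script from raising an error when the file contains no matching line.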

baptiste_AuguiSSSS · July 12, 2007, 12:40pm

Note that this basic framework does not support awk’s /regex1/,/regex2/
notation that captures lines between (and including) the lines matching
those regular expressions.

–Greg

How about something like this simple implementation?

AWK in Ruby

module AWK
class ClassAwk
  def initialize(filename = "")
    @NR = 0;      # record number
    @NF = 0;      # number of fields
    @FS = /\s+/;  # field separator
    @line = "";   # current line
    @fields = []; # fields of the matched line
    @trace = [];  # regexp trace (range state per rule)

    if (filename == "")
      @file = ARGF;
    else
      @file = File.open(filename, "r").close;;
    end
  end

  #NOTE: every rule has to be in separate line
  #to get unique rule id
  def rule(regexp1, regexp2 = nil)
    msg = "regexp parameter must be Regexp";
    raise ArgumentError, msg unless regexp1.kind_of?(Regexp);

    if regexp2 == nil
      if @line =~ regexp1
        @fields = @line.split(@FS)
        yield
      end
    else
      raise ArgumentError, msg unless regexp2.kind_of?(Regexp);
      rule_id = /.+:([0-9]+)/.match(caller.first.to_s)[1].to_i;

      @trace[rule_id] = true if @line =~ regexp1;
      if @trace[rule_id]
        @fields = @line.split(@FS)
        yield
      end
      @trace[rule_id] = false if @line =~ regexp2;
    end
  end

  def analyze()
    @NR = 0;
    ARGF.each { |@line|
      @line = @line.chop;
      @NR += 1;
      yield
    }
  end

  #get particular field
  def getField(index)
    output = "";

    if (index == 0)
      output = @line;
    else
      if index - 1 < @fields.length
        return @fields[index - 1];
      end
    end
  end

  #get NR (record number)
  def getNR
    return @NR;
  end

  #get number of fields
  def getNF
    return @fields.length;
  end

end
end

and an example how to use it:

require "awk.rb"

awk = AWK::ClassAwk.new();

awk.analyze() {
  awk.rule(/start1/, /stop1/) {
    print "1, NR:", awk.getNR(), ", ";
    print "NF: ", awk.getNF(), ", ", awk.getField(0), "\n";
  };
  awk.rule(/start2/, /stop2/) {
    print "2, NR:", awk.getNR(), ", ";
    print "NF: ", awk.getNF(), ", ", awk.getField(0), "\n";
  };
  awk.rule(/start1/) {
    print awk.getField(0);
  };
}

baptiste_AuguiSSSS · July 16, 2007, 5:11pm

A small update to make it easier to use:

AWK implementation

module AWK
class ClassAwk
  def initialize(filename = "")
    @NR = 0;     # record number
    @NF = 0;     # number of fields
    @FS = /\s+/; # field separator
    @f = [];     # fields of the matched line; f[0] holds the whole line
    @trace = []; # regexp trace (range state per rule)

    #input file
    @file = filename == "" ? ARGF: File.open(filename, "r");
  end

  #NOTE: every rule has to be in separate line
  #to get unique rule id
  def rule(regexp1, regexp2 = nil)
    msg = "regexp parameter must be Regexp";
    raise ArgumentError, msg unless regexp1.kind_of?(Regexp);

    if regexp2 == nil
      yield if @f[0] =~ regexp1;
    else
      raise ArgumentError, msg unless regexp2.kind_of?(Regexp);
      rule_id = /.+:([0-9]+)/.match(caller.first.to_s)[1].to_i;

      @trace[rule_id] = true if @f[0] =~ regexp1;
      yield if @trace[rule_id]
      @trace[rule_id] = false if @f[0] =~ regexp2;
    end
  end

  def analyze()
    @NR = 0;
    @file.each { |line|
      @NR += 1;
      @f = line.split(@FS)
      @NF = @f.length
      @f.unshift(line.chop);
      yield
    }
  end

  attr_reader :NR, :NF, :f;

end
end

example:

require "awk.rb"

awk = AWK::ClassAwk.new();

awk.analyze() {
  awk.rule(/start1/, /stop1/) {
    print "1, NR:#{awk.NR}, NF:#{awk.NF}, #{awk.f[0]}\n";
  };
  awk.rule(/start2/, /stop2/) {
    print "2, NR:#{awk.NR}, NF:#{awk.NF}, #{awk.f[0]}\n";
  };
  awk.rule(/start1/) {
    print "3, NR:#{awk.NR}, NF:#{awk.NF}, #{awk.f[0]}\n";
  };
}
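
The rule id in the class above comes from the line number of the call site, which is why each rule has to sit on its own line. As a side note, the same number can be obtained without parsing the `caller` string. The sketch below assumes a Ruby version that provides `caller_locations`, and `call_site_line` is a made-up helper, not part of the code above.

```ruby
# Hypothetical helper: identifies a call site by its line number, the
# same idea as matching /.+:([0-9]+)/ against caller.first.
def call_site_line
  # caller_locations(1, 1) returns one frame: the direct caller.
  caller_locations(1, 1).first.lineno
end

# Two calls on consecutive lines get consecutive ids.
first_id  = call_site_line
second_id = call_site_line
puts first_id, second_id
```

Two rules written on the same source line would still share an id with this approach, so the one-rule-per-line restriction remains.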