I tried the CSV library:
require 'csv'
require 'net/http'
uri = URI(url)
response = Net::HTTP.get(uri)
lines = CSV.parse(response, headers: true)
But the issue I am facing is that there is a space between the titles and the data.
There is a skip_blanks option, documented under Class: CSV (Ruby 3.0.2). Does that help?
e.g. for file.csv:
a b c
1 2 3
4 5 6
Read using:
> table = File.open('file.csv') { |f| CSV.parse(f, headers: true, skip_blanks: true, col_sep: " ") }
=> #<CSV::Table mode:col_or_row row_count:3>
> table[0]
=> #<CSV::Row "a":"1" "b":"2" "c":"3">
> table[1]
=> #<CSV::Row "a":"4" "b":"5" "c":"6">
> table["b"]
=> ["2", "5"]
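Since the data in the question comes from an HTTP response rather than a file, here is a minimal sketch of the same options applied to a string. The `response` string is a made-up stand-in for what `Net::HTTP.get(uri)` would return, and `parse_spaced` is a hypothetical helper name, not part of the CSV library.

```ruby
require 'csv'

# Hypothetical helper: parse space-separated text and ignore blank lines.
def parse_spaced(text)
  CSV.parse(text, headers: true, skip_blanks: true, col_sep: ' ')
end

# Made-up stand-in for the body returned by Net::HTTP.get(uri).
# Note the blank line between the header row and the data rows.
response = "a b c\n\n1 2 3\n4 5 6\n"

table = parse_spaced(response)
p table[0].to_h
p table['b']
```

Keep in mind that `col_sep: ' '` treats every single space as a separator, so columns padded with runs of spaces would produce empty fields.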
Assuming your file is named p, you can run this script, which finds the MxT column and prints each value in that column.
Note that it works even if the file is big, bigger than your system’s memory:
#!/usr/bin/env ruby
COLUMN = 'MxT'
data = IO.foreach('p')
# Find the index of column MxT
mxt_col = 0
data.next.tap(&:strip!).split { |x|
  break if x == COLUMN
  mxt_col += 1
}
# Get the data in columns MxT
loop { p data.next.tap(&:strip!).split[mxt_col] }
Here we are reading the file line by line. data.next raises StopIteration at the end of the file, and the purpose of loop { } is to rescue that error and stop.
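The same pattern can be tried without a file. This is only a sketch: the array below stands in for `IO.foreach('p')`, the column names other than MxT are invented, and `column_from` is a hypothetical helper name.

```ruby
# Hypothetical helper: read one column from an enumerator of lines.
def column_from(lines, name)
  col = lines.next.split.index(name) # the first line is the header row
  raise ArgumentError, %(no column "#{name}") unless col

  values = []
  # Kernel#loop rescues StopIteration, so this ends after the last line.
  loop { values << lines.next.split[col] }
  values
end

# Invented sample lines standing in for IO.foreach('p').
sample = ["Dy MxT MnT\n", "1 88 59\n", "2 79 63\n"].each
p column_from(sample, 'MxT')
```

Only one line is held in memory at a time, which is why the approach scales to large files.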
If you need more performance, just replace the whole loop with:
mxt_col_2 = mxt_col + 2
loop { p data.next.tap(&:strip!).split(' ', mxt_col_2)[mxt_col] }
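To see why the limit is mxt_col + 2: a limit of n makes split return at most n fields, and the last one holds the unsplit remainder of the line. The sketch below uses an invented sample line and a hypothetical helper name, `field_at`.

```ruby
# Hypothetical helper: fetch one whitespace-separated field without
# splitting the rest of the line.
def field_at(line, index)
  # index + 1 clean fields, plus one slot for the unsplit remainder.
  line.split(' ', index + 2)[index]
end

line = '1 88 59 74 more fields here' # invented sample line
p line.split(' ', 3)  # the third element is the unsplit remainder
p field_at(line, 1)
```

With a limit of only index + 1, the wanted field would itself be the remainder and would still contain the rest of the line.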
I am parsing from a URL, not a file.
The last line is not working. It turns into an infinite loop.
If you are getting data from the internet, you can send a GET request to the website with Net::HTTP.get().
I have no idea why the last line of the previous code would turn into an infinite loop if you already have a file. That can't happen unless something is continuously writing to the file, in which case the program behaves like tail -f p without the waiting that tail does.
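One other possibility, and this is only a guess since the adapted code was not posted: if the downloaded text is a String and each_line is called inside the loop, every call builds a fresh enumerator, so next keeps returning the first line and StopIteration is never raised. A small sketch with invented data:

```ruby
text = "h1 h2\n1 2\n3 4\n" # invented stand-in for a downloaded body

# A new enumerator on every call: always the first line.
fresh = 3.times.map { text.each_line.next }
p fresh

# One shared enumerator: advances line by line as expected.
enum = text.each_line
shared = 3.times.map { enum.next }
p shared
```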
For a regular saved text file, data.next must raise StopIteration when the end of the file is reached, and the only purpose of the loop { } iterator is to rescue that. With a while loop, we would have to rescue StopIteration and break ourselves.
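For comparison, here is roughly what the while version would look like. It is a sketch with invented data, and `drain` is a hypothetical helper name.

```ruby
# Hypothetical helper: collect every remaining item from an enumerator
# using while instead of loop.
def drain(enum)
  items = []
  while true
    begin
      items << enum.next
    rescue StopIteration
      break # loop { } does this rescue for us
    end
  end
  items
end

p drain(%w[a b c].each)
```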
The modified code for getting the data from the internet will look like this:
#!/usr/bin/env ruby
COLUMN = 'MxT'
URL = 'https://www.example.net/csv?rows=1000'
require 'net/http'
data = begin
  Net::HTTP.get(URI(URL)).split(?\n)
rescue SocketError, OpenSSL::SSL::SSLError
  abort "Can't download data. Error:\n\t#{$!}"
end
# Find the index of column MxT
presence, mxt_col = false, 0
data[0].tap(&:strip!).split { |x|
  if x == COLUMN
    presence = true
    break
  end
  mxt_col += 1
}
abort %(Can't find column "#{COLUMN}") unless presence
# Get the data in columns MxT
all_data = data.drop(1).map! { |x| x.tap(&:strip!).split[mxt_col] }
p all_data
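If the downloaded text has a blank line between the titles and the data, as the original question suggests, the script above would print nil for that line. A possible variant that drops blank lines first is sketched below. The `body` string is an invented stand-in for the result of `Net::HTTP.get`, the column names other than MxT are made up, and `column_values` is a hypothetical helper name.

```ruby
# Hypothetical helper: extract one whitespace-separated column from text,
# skipping blank lines. Returns nil if the column is not in the header.
def column_values(body, column)
  lines = body.each_line.map(&:strip).reject(&:empty?)
  index = lines.first.to_s.split.index(column)
  return nil unless index

  lines.drop(1).map { |line| line.split[index] }
end

# Invented stand-in for Net::HTTP.get(URI(URL)).
body = "Dy MxT MnT\n\n1 88 59\n2 79 63\n"
p column_values(body, 'MxT')
```

Unlike the line-by-line file version, this holds the whole response in memory, which is already the case once Net::HTTP.get has returned it as a String.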