Ruby Question: Parsing CSV data

John1234 · August 2, 2021, 6:13am

I want all the data under the MxT column. Can anyone help?

pcl · August 2, 2021, 9:19am

Probably, but what have you tried so far?

Ruby has a CSV library: Class: CSV (Ruby 3.0.2)

John1234 · August 2, 2021, 9:32am

I tried the CSV library.
require 'net/http'
require 'csv'

uri = URI(url)

response = Net::HTTP.get(uri)
lines = CSV.parse(response, headers: true)
But the issue I am facing is that there is a space between the titles and the data.

pcl · August 3, 2021, 8:32am

There is a skip_blanks option: Class: CSV (Ruby 3.0.2) - does that help?

e.g. for a space-separated file.csv with the headers a, b and c and two data rows (1 2 3 and 4 5 6), read using:

> table = File.open('file.csv') { |f| CSV.parse(f, headers: true, skip_blanks: true, col_sep: " ") }
 => #<CSV::Table mode:col_or_row row_count:3> 
> table[0]
 => #<CSV::Row "a":"1" "b":"2" "c":"3"> 
> table[1]
 => #<CSV::Row "a":"4" "b":"5" "c":"6"> 
> table["b"]
 => ["2", "5"]
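The same idea can be tried without a file on disk. The following is a minimal sketch, assuming the downloaded text has a header row, a blank line, and then rows whose fields are separated by exactly one space; the sample data and the Dy and MnT column names are made up for illustration. If the real data pads fields with several spaces, a single-space col_sep will produce empty fields, so this would need adjusting.

```ruby
require 'csv'

# Hypothetical sample standing in for the downloaded text: a header row,
# a blank line, then two data rows with single spaces between fields.
sample = <<~DATA
  Dy MxT MnT

  1 88 59
  2 79 63
DATA

# skip_blanks drops the empty line; col_sep tells CSV to split on a space.
table = CSV.parse(sample, headers: true, skip_blanks: true, col_sep: ' ')

# Indexing a CSV::Table with a header name returns that whole column.
mxt = table['MxT']
p mxt  # => ["88", "79"]
```

Because CSV.parse accepts a String, the body returned by Net::HTTP.get can be passed in directly in place of sample.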

SouravGoswami · August 3, 2021, 2:05pm

Assuming your file is named p, you can run this script, which finds the MxT column and prints every value in that column, one per line.

Note that it works even on a big file, bigger than your system’s memory, because it only holds one line at a time:

#!/usr/bin/env ruby
COLUMN = 'MxT'

data = IO.foreach('p')

# Find the index of column MxT
mxt_col = 0
data.next.tap(&:strip!).split { |x|
	break if x == COLUMN
	mxt_col += 1
}

# Print the MxT value of every remaining line
loop { p data.next.tap(&:strip!).split[mxt_col] }

Here we are reading the file line by line. The purpose of loop { } is to rescue the StopIteration that data.next raises at the end of the file, so the script stops cleanly there.

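If you need more performance, replace the whole loop with the version below. It passes a limit to split, so each line is only split as far as the MxT field and the rest is left untouched: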

mxt_col_2 = mxt_col + 2
loop  { p data.next.tap(&:strip!).split($;, mxt_col_2)[mxt_col] }
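To see why the limit is mxt_col + 2 rather than mxt_col + 1, here is a small sketch on a made-up line, assuming MxT is the second column (index 1). The last element that split returns always holds the unsplit remainder, so one extra slot is needed to keep the wanted field clean.

```ruby
# Hypothetical data line; assume the MxT column was found at index 1.
line = "1 88 59 74 53.8"
mxt_col = 1

# With a limit of mxt_col + 1, the wanted field is the last element,
# so it still carries the rest of the line.
too_small = line.split(nil, mxt_col + 1)
p too_small  # => ["1", "88 59 74 53.8"]

# With a limit of mxt_col + 2, the remainder lands one slot later.
enough = line.split(nil, mxt_col + 2)
p enough     # => ["1", "88", "59 74 53.8"]

p enough[mxt_col]  # => "88"
```

Passing nil as the pattern splits on whitespace, which is also what $; gives by default.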

John1234 · August 4, 2021, 3:26am

I am parsing from a URL, not a file.
The last line is not working. It’s turning into an infinite loop.

SouravGoswami · August 4, 2021, 5:23am

If you are getting data from the internet, you can send a GET request to the website with Net::HTTP.get().

I have no idea why the last line in the previous code would turn into an infinite loop if you already have a file. There’s no way it can happen unless you are continuously writing to that file. In that case, the program will behave like tail -f p without the wait that tail does.

For regular saved text files, data.next() must raise StopIteration when the end of the file is reached, and the purpose of the loop { } iterator is just to rescue that. With a while loop, we would have to rescue StopIteration and break ourselves.
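The difference between the two kinds of loop can be sketched on in-memory data. This is only an illustration: the three lines below stand in for the file, and String#each_line without a block returns an Enumerator, much like IO.foreach('p') does.

```ruby
# Made-up text standing in for the contents of the file.
text = "1\n2\n3\n"

# loop { } rescues StopIteration for us and simply ends.
with_loop = []
lines = text.each_line
loop { with_loop << lines.next.strip }

# A while loop does not, so the rescue and the break are written out.
with_while = []
lines = text.each_line
while true
  begin
    with_while << lines.next.strip
  rescue StopIteration
    break
  end
end

p with_loop   # => ["1", "2", "3"]
p with_while  # => ["1", "2", "3"]
```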

The modified code for getting the data from the internet will look like this:

#!/usr/bin/env ruby
COLUMN = 'MxT'
URL = 'https://www.example.net/csv?rows=1000'

require 'net/http' # 'net/https' is only a legacy alias for this

data = begin
	Net::HTTP.get(URI(URL)).split(?\n)
rescue SocketError, OpenSSL::SSL::SSLError
	abort "Can't download data. Error:\n\t#{$!}"
end

# Find the index of column MxT
presence, mxt_col = false, 0
data[0].tap(&:strip!).split { |x|
	if x == COLUMN
		presence = true
		break
	end

	mxt_col += 1
}

abort %(Can't find column "#{COLUMN}") unless presence

# Get the data in columns MxT
all_data = data.drop(1).map! { |x| x.tap(&:strip!).split[mxt_col] }
p all_data
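A shorter variant of the column lookup is possible with Array#index, which returns nil when the column is missing, so no separate presence flag is needed. This is a sketch rather than a drop-in replacement: the data array below is made up and stands in for the result of Net::HTTP.get(URI(URL)).split(?\n). It also skips blank lines, which would otherwise show up as nil in the result.

```ruby
column = 'MxT'

# Hypothetical lines standing in for the downloaded, split response.
data = ["Dy MxT MnT", "", "1 88 59", "2 79 63"]

# index returns the position of the header, or nil if it is absent.
mxt_col = data[0].split.index(column)
abort %(Can't find column "#{column}") if mxt_col.nil?

# Drop the header, ignore blank lines, then pick the MxT field of each row.
all_data = data.drop(1)
               .reject { |line| line.strip.empty? }
               .map { |line| line.split[mxt_col] }
p all_data  # => ["88", "79"]
```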