Cee J. wrote in post #995830:
7stud – wrote in post #995821:
I suggest that people never use irb because it has too many quirks.
The first thing you need to realize is that ‘>’ is
not the separator you want to look for. That is the second bit of
erroneous advice your mentor gave you. That’s because you don’t care
what character marks the beginning of every entry, rather you care what
character marks the end of every entry. The end of every entry in your
file is marked by the string "\n\n", so you should use that as your
input line terminator. Remember, Ruby uses "\n" as the input line
separator by default, which means that when you read a file using
IO#each, Ruby reads lines, where the end of a line is marked by a
newline.
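For anyone following along, here is a minimal sketch of that idea: passing "\n\n" to each so that every iteration yields a whole entry rather than a single line. The sample data and the read_records helper are made up for illustration; they are not from the original poster's file.

```ruby
require 'stringio'

# Hypothetical sample data: two entries separated by a blank line.
SAMPLE = ">entry1\nAAAA\nCCCC\n\n>entry2\nGGGG\n"

# Read records from io, treating terminator as the end-of-record marker
# instead of the default "\n". Trailing whitespace (including the
# terminator itself) is stripped from each record.
def read_records(io, terminator = "\n\n")
  records = []
  io.each(terminator) do |record|
    records << record.rstrip
  end
  records
end

read_records(StringIO.new(SAMPLE)).each_with_index do |record, i|
  puts "record #{i + 1}:"
  puts record
end
```

The same call works on a File object, since both File and StringIO accept a separator argument to each.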
I understand the logic; it makes sense. What if the file looked like
this, where a single newline separates the entries?
What if you had presented that possibility from the very beginning?
require 'stringio'
str =<<ENDOFSTRING
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG
CGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTT
AATAGCGCGCCATCTGAGCAG
TTAGTCGCTGACGCATGCACG
ENDOFSTRING
input = StringIO.new(str)
buffer = ''

input.each do |line|
  if line[0, 1] == '>'    # a header line starts a new entry
    if buffer != ''       # nothing to flush before the first entry
      puts buffer         # or do something else to buffer
      puts '-' * 20
    end

    buffer = ''
    buffer << line
  else
    # sequence line: strip the trailing newline so the pieces join up
    buffer << line.sub(/ \n+ \z /xms, '')
  end
end

puts buffer  # for last entry,
             # or do something else to buffer
--output:--
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG
--------------------
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATGCGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
--------------------
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAGTTAGTCGCTGACGCATGCACG
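If you would rather not manage the buffer by hand, here is a rough alternative sketch using Enumerable#slice_before, which starts a new group at every line beginning with '>'. This is a different technique from the loop above, not a replacement the thread endorsed; parse_entries and the sample data are made-up names for illustration. Because blank lines are dropped first, it should behave the same whether entries are separated by "\n" or "\n\n". It assumes the first non-blank line is a header.

```ruby
require 'stringio'

# Group lines into entries: each entry starts at a line beginning with '>'.
# Returns an array of [header, sequence] pairs, with the sequence lines
# of each entry joined into one string.
def parse_entries(io)
  io.each_line
    .map(&:chomp)
    .reject(&:empty?)
    .slice_before { |line| line.start_with?('>') }
    .map { |header, *seq_lines| [header, seq_lines.join] }
end

# Hypothetical sample input in the same shape as the data above.
sample = StringIO.new(<<~DATA)
  >id1 first entry
  AAAA
  CCCC
  >id2 second entry
  GGGG
DATA

parse_entries(sample).each do |header, sequence|
  puts header
  puts sequence
  puts '-' * 20
end
```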
If the entries in a file will all be separated by "\n\n", or will all
be separated by "\n", then you could also ask for some user input:
print "What's the entry separator: "
sep = gets.chomp
Then:
input.each(sep) do |section|
…
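One thing to watch with this approach: gets returns the characters exactly as typed, so a user who types \n\n at the prompt gives you a four-character string (backslash, n, backslash, n), not two newlines. A small sketch of one way around that is to have the user pick from a menu and map the choice to the real separator. The SEPARATORS table, the "1"/"2" convention, and separator_for are all hypothetical names chosen for this example.

```ruby
require 'stringio'

# Hypothetical convention: "1" means one newline, "2" means a blank line.
SEPARATORS = { '1' => "\n", '2' => "\n\n" }.freeze

# Translate the user's menu choice into the actual separator string.
def separator_for(answer)
  SEPARATORS.fetch(answer.strip) do
    raise ArgumentError, "unknown separator choice: #{answer.inspect}"
  end
end

# In a real script the answer would come from the prompt:
#   print "Entry separator (1 = newline, 2 = blank line): "
#   answer = gets.chomp
answer = '2'
sep = separator_for(answer)

# Made-up input with a blank line between the two sections.
input = StringIO.new("first\nentry\n\nsecond\nentry\n")
input.each(sep) do |section|
  p section
end
```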