I have 2 lists in 2 separate files.
I call them 1st file = old file, 2nd file = new file.
They are similar, except that some lines have been added and some lines
removed in both files, and I only want to keep: lines common to both
files and lines existing only in my 2nd file.
My first solution is to read the 2nd file and compare each line with each
line of the 1st file to see if I find it; if not, I save it in a 3rd file
that lists all the lines I have to remove from my 2nd file.
Then I will read my 2nd file and compare each line with each line of my
3rd file; if it does not exist there, I write the line to a 4th file (the
final one).
I’m not sure if this is the quickest and best method; maybe something
faster exists.
I also assume there is a fast method with high memory usage and a
slower one with low memory usage.
My files are Excel sheets exported as CSV files, with something like
40k lines each.
So if someone has clues to help me, I would be grateful.
You want two separate output files, one containing lines common to both,
and one containing lines only in the 2nd file?
As long as preserving the order isn’t important, look at the manpages
for ‘sort’ and ‘join’.
Otherwise, since the files aren’t too big, you can read all of file1
into a Hash of {linedata=>true}. Then read through file2 and print a
line only if the corresponding Hash entry is true.
# Remember every line of file1
seen = {}
File.open("file1") do |f|
  f.each_line do |line|
    seen[line] = true
  end
end
# Label each line of file2 depending on whether file1 had it too
File.open("file2") do |f|
  f.each_line do |line|
    if seen[line]
      print "In both,#{line}"
    else
      print "Only in file 2,#{line}"
    end
  end
end
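If two separate output files are wanted, the same idea could be sketched as below. This is only a sketch: the method name split_by_presence and the four path arguments are made up for illustration, and it assumes file1 fits comfortably in memory. It uses a Set instead of a Hash and chomps each line, so a missing newline at the end of one file does not make two otherwise identical lines look different.

```ruby
require "set"

# Hypothetical helper: copies each line of new_path into both_path if the
# line also occurs in old_path, otherwise into only_new_path.
def split_by_presence(old_path, new_path, both_path, only_new_path)
  # Load every line of the old file into a Set for fast lookups.
  seen = Set.new
  File.foreach(old_path) { |line| seen << line.chomp }

  File.open(both_path, "w") do |both|
    File.open(only_new_path, "w") do |only_new|
      # Stream the new file line by line, keeping its original order.
      File.foreach(new_path) do |line|
        line = line.chomp
        target = seen.include?(line) ? both : only_new
        target.puts(line)
      end
    end
  end
end
```

It would be called as, for example, split_by_presence("file1", "file2", "both.csv", "only_new.csv"), where the two output names are again just placeholders.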
Phillip and Dwayne, your answers were very useful. I did not know that
such commands and software existed for this (but as I’m on Windows, the
diff command was not working because of my file size), and I do not only
want to see the differences, but also to create a new file with some
specific things.
As Bartosz and Brian said, I’ll try your solution; the files are not
so big (~300k text each).
Just because I think it could be good to know: does someone have some
kind of solution if the files are bigger?
I was thinking of sorting the lines in the files before removing lines
from the 2nd file.
I’ll try your ideas and maybe think about an alternate method in
case of …
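For bigger files, the sorting idea could look roughly like the sketch below. It assumes both files have already been sorted in plain byte order (the order Ruby's String comparison uses), for example by an external sort tool; the method name merge_compare is made up for illustration. Because both files are walked in step, only one line of each is held in memory at a time, but the original line order of file 2 is lost.

```ruby
# Hypothetical helper: old_path and new_path must already be sorted in
# the same byte-wise order. Labels every line of the new file on `out`.
def merge_compare(old_path, new_path, out = $stdout)
  File.open(old_path) do |old_f|
    File.open(new_path) do |new_f|
      old_line = old_f.gets&.chomp
      new_line = new_f.gets&.chomp
      while new_line
        if old_line.nil? || new_line < old_line
          # The old file is exhausted or already past this line.
          out.puts "Only in file 2,#{new_line}"
          new_line = new_f.gets&.chomp
        elsif new_line == old_line
          out.puts "In both,#{new_line}"
          # Advance only the new file so repeated lines still match.
          new_line = new_f.gets&.chomp
        else
          # This old line sorts first, so it exists only in the old file.
          old_line = old_f.gets&.chomp
        end
      end
    end
  end
end
```

Compared with the Hash approach, this trades the cost of sorting both files first for memory use that stays constant however large the files are.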