string - Sort counted hits into separate files
I have short.txt (contains strings) and long.txt (contains strings).
For example, short.txt contains:
this
that
long.txt contains:
this
thisis
that
thisisan
thisisanexample
I have code that counts how often each string from short.txt occurs in long.txt:
grep -F -o -f short.txt long.txt | sort | uniq -c | sort -nr > counted.txt
So counted.txt contains:
3 this
1 that
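For comparison, the counting step of that pipeline can be sketched in Python. This is only a rough equivalent under my own assumptions: the function name count_hits is made up for illustration, and sorting the strings longest-first is meant to mimic how grep -o picks the longest match at a given position.

```python
import re
from collections import Counter


def count_hits(words, lines):
    """Count non-overlapping occurrences of each word across all lines."""
    words = [w for w in words if w]  # ignore empty strings
    if not words:
        return []
    # Longest words first, so the alternation prefers the longest match
    # when several words start at the same position.
    ordered = sorted(words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(w) for w in ordered))
    counts = Counter()
    for line in lines:
        counts.update(m.group(0) for m in pattern.finditer(line))
    # Highest count first, like the final `sort -nr`.
    return counts.most_common()


if __name__ == "__main__":
    try:
        with open("short.txt") as s, open("long.txt") as l:
            words = [w.rstrip("\n") for w in s]
            for word, n in count_hits(words, l):
                print(n, word)
    except FileNotFoundError:
        print("short.txt and long.txt are expected in the current directory")
```

With 10,000+ short strings the combined pattern gets large, so this is a way to see the logic rather than a replacement for grep -F.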
My question is: how can I get the results into separate files, like:
3_this.txt (so number of hits + _ + word + .txt)
(which contains)
this
thisis
thisisan
thisisanexample
1_that.txt
(which contains)
that
The short list can contain 10,000+ strings, the long list 100,000,000+.
I am playing with .sh because I can run it on a Mac with ease. I don't know if there is a faster solution for this.
long.txt is updated every month, short.txt every day.
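Here is a simple Python solution. It doesn't assume you have already created counted.txt.

import os

with open('short.txt', 'r') as shorttxt:
    for s in shorttxt:
        # Strip the newline without cutting a final line that lacks one.
        word = s.rstrip('\n')
        if not word:
            continue  # a blank line would match every line of long.txt
        outfilename = word + '.txt'
        count = 0
        # Scan long.txt once per short string and copy every matching line.
        with open('long.txt', 'r') as longtxt, open(outfilename, 'w') as out:
            for l in longtxt:
                if word in l:
                    count += 1
                    out.write(l)
        # Prefix the file name with the number of hits, e.g. 1_that.txt.
        os.rename(outfilename, str(count) + '_' + outfilename)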
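The script above reads long.txt once for every string in short.txt, which may get slow with 10,000+ strings. A possible variation, sketched below, reads long.txt only once and groups the matching lines per string. The function name split_hits and the out_dir parameter are my own and purely illustrative. It assumes the matching lines fit in memory and that the strings are valid file names. Unlike the script above, it writes no 0_word.txt file for strings without hits.

```python
import os
from collections import defaultdict


def split_hits(short_path, long_path, out_dir="."):
    """Write <count>_<word>.txt for each word that occurs in long_path."""
    with open(short_path) as f:
        words = [w.rstrip("\n") for w in f if w.strip()]

    hits = defaultdict(list)  # word -> list of matching lines
    with open(long_path) as f:  # single pass over the long file
        for line in f:
            if not line.endswith("\n"):
                line += "\n"  # normalise a final line without a newline
            for w in words:
                if w in line:
                    hits[w].append(line)

    names = []
    for w, lines in hits.items():
        # The count is the number of matching lines, as in the script above.
        name = os.path.join(out_dir, f"{len(lines)}_{w}.txt")
        with open(name, "w") as out:
            out.writelines(lines)
        names.append(name)
    return names


if __name__ == "__main__":
    try:
        for name in split_hits("short.txt", "long.txt"):
            print(name)
    except FileNotFoundError:
        print("short.txt and long.txt are expected in the current directory")
```

This still compares every line against every string, so it mainly saves disk reads. For the full 100,000,000-line list, the in-memory grouping would probably need to be flushed to disk in batches.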