string - Sort counted hits into separate files


I have short.txt (contains strings) and long.txt (contains strings).

For example, short.txt contains:
this
that

long.txt contains:
thisis
that
thisisan
thisisanexample

I have a command that counts how often each string from short.txt occurs in long.txt:

grep -F -o -f short.txt long.txt | sort | uniq -c | sort -nr > counted.txt

So counted.txt contains:
3 this
1 that
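
To make clear what the pipeline above counts, here is a rough Python sketch of the same idea. It is only an approximation: the function name count_hits and the in-memory lists are made up for illustration, and it assumes the short strings are non-empty. grep -o with several patterns reports non-overlapping matches per line, so the numbers can differ when the short strings overlap each other.

```python
from collections import Counter


def count_hits(short_words, long_lines):
    """Count how often each short string occurs in the long lines."""
    counts = Counter()
    for line in long_lines:
        for word in short_words:
            # str.count returns the number of non-overlapping occurrences.
            n = line.count(word)
            if n:
                counts[word] += n
    # Highest count first, similar to `sort -nr`.
    return counts.most_common()


short_words = ["this", "that"]
long_lines = ["thisis", "that", "thisisan", "thisisanexample"]
result = count_hits(short_words, long_lines)
print(result)
```

With the example data from the question, this gives 3 for "this" and 1 for "that", matching counted.txt.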

My question is: how can I get the results into separate files, like this:
3_this.txt (number of hits + _ + word + .txt), which contains:
thisis
thisisan
thisisanexample
1_that.txt, which contains:
that

The short list can contain 10,000+ strings, and the long list 100,000,000+.

I am using a .sh script because I can run it on a Mac with ease. I don't know whether there is a faster solution for this.
long.txt is updated every month, short.txt every day.

Here is a simple Python solution. It doesn't assume that you have already created counted.txt.

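import os

# Scans long.txt once per short string: simple, but slow for large inputs.
with open('short.txt', 'r') as shorttxt:
    for s in shorttxt:
        word = s.rstrip('\n')  # drop the newline, even if the last line has none
        if not word:
            continue  # skip blank lines
        outfilename = word + '.txt'
        count = 0
        with open('long.txt', 'r') as longtxt, open(outfilename, 'w') as out:
            for l in longtxt:
                if word in l:
                    count += 1
                    out.write(l)
        # The count is only known after the scan, so rename afterwards.
        os.rename(outfilename, str(count) + '_' + outfilename)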
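
The script above reads long.txt once for every line of short.txt, which may be slow with 10,000+ short strings. A possible variant, sketched below, reads long.txt only once and buffers the matching lines per word. The function name split_hits and its parameters are made up for illustration. It assumes the matching lines fit in memory and that the short strings are valid file names (no "/"). Unlike the script above, it writes no 0_word.txt file for a word without hits.

```python
import os
from collections import defaultdict


def split_hits(short_path, long_path, out_dir):
    """Write one <count>_<word>.txt file per short string found in long_path."""
    # Read the short strings once, skipping blank lines.
    with open(short_path, "r") as f:
        words = [w for w in (line.rstrip("\n") for line in f) if w]

    # Single pass over the long file; matching lines are buffered per word.
    hits = defaultdict(list)
    with open(long_path, "r") as f:
        for line in f:
            if not line.endswith("\n"):
                line += "\n"  # normalise a final line without a newline
            for word in words:
                if word in line:
                    hits[word].append(line)

    # The count is known only after the pass, so files are written at the end.
    written = []
    for word, lines in hits.items():
        name = os.path.join(out_dir, f"{len(lines)}_{word}.txt")
        with open(name, "w") as out:
            out.writelines(lines)
        written.append(name)
    return written
```

Calling split_hits("short.txt", "long.txt", ".") on the example data should produce 3_this.txt and 1_that.txt in the current directory. Note that this still does one substring test per word per line, so it saves file reads rather than comparisons.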
