text processing - sed to copy part of line to end
I'm trying to copy part of a line and append it to the end:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz
becomes:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1/gca_900169985_ionxpress_024_genomic.fna.gz
I have tried:
sed 's/\(.*(gca_\)\(.*\))/\1\2\2)'
$ f1=$'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz'
$ echo "$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz
$ sed -E 's/(.*)(gca_.[^.]*)(.[^_]*)(.*)/\1\2\3\/\2\4/' <<<"$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1/gca_900169985_ionxpress_024_genomic.fna.gz

sed -E (or -r on some systems) enables extended regex support in sed, so you don't need to escape the group parentheses ( ).
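To illustrate what not escaping the parentheses buys you, here is a minimal sketch that runs the same substitution twice: once with extended regex, and once with sed's default basic regex, where each group must be written as \( \). The variable names ere and bre are only illustrative, and printf with a pipe is used instead of <<< so the snippet does not depend on bash.

```shell
f1='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz'

# Extended regex (-E): capture groups use bare parentheses.
ere=$(printf '%s\n' "$f1" | sed -E 's/(.*)(gca_.[^.]*)(.[^_]*)(.*)/\1\2\3\/\2\4/')

# Basic regex (sed default): the same groups, each parenthesis escaped.
bre=$(printf '%s\n' "$f1" | sed 's/\(.*\)\(gca_.[^.]*\)\(.[^_]*\)\(.*\)/\1\2\3\/\2\4/')

# Both lines printed here should be identical.
printf '%s\n%s\n' "$ere" "$bre"
```

Both forms should produce the same rewritten URL; the extended form is just easier to read.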
The pattern (gca_.[^.]*) means "match gca_ and all following characters up to, but excluding, the first dot":
$ sed -E 's/(.*)(gca_.[^.]*)(.[^_]*)(.*)/\2/' <<<"$f1"
gca_900169985

Similarly, (.[^_]*) means "all characters up to the first _ (excluding the _ character itself)". This is the usual regex way to perform a non-greedy (lazy) capture in sed; in Perl regex it could have been written as .*?_

$ sed -E 's/(.*)(gca_.[^.]*)(.[^_]*)(.*)/\3/' <<<"$f1"
.1
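The difference between a greedy .* and the negated character class can be seen on the tail of the file name alone. This is a small sketch rather than part of the original answer; the variable names s, greedy and bounded are only illustrative.

```shell
s='.1_ionxpress_024_genomic.fna.gz'

# Greedy: .* runs as far as it can, so the capture ends at the LAST underscore.
greedy=$(printf '%s\n' "$s" | sed -E 's/^(.*)_.*$/\1/')

# Negated class: [^_]* cannot cross an underscore, so the capture ends at the FIRST one.
bounded=$(printf '%s\n' "$s" | sed -E 's/^([^_]*)_.*$/\1/')

printf 'greedy:  %s\n' "$greedy"    # greedy:  .1_ionxpress_024
printf 'bounded: %s\n' "$bounded"   # bounded: .1
```

Since sed has no lazy quantifier, restricting the character class this way is how the capture is stopped at the first delimiter.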