Text processing: sed to copy part of a line to the end
I'm trying to copy part of a line and append it further along the line, so that this:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz
becomes:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1/gca_900169985_ionxpress_024_genomic.fna.gz
I have tried:
sed 's/\(.*(gca_\)\(.*\))/\1\2\2)'
$ f1=$'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz'
$ echo "$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz
$ sed -E 's/(.*)(gca_.[^.]*)(.[^_]*)(.*)/\1\2\3\/\2\4/' <<<"$f1"
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1/gca_900169985_ionxpress_024_genomic.fna.gz
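A possible variation on the same sed -E command, offered as a sketch rather than a required change: using # as the s/// delimiter means the / in the replacement needs no backslash, and writing \. makes the version separator a literal dot instead of "any character". It assumes the accession always looks like gca_<digits>.<digits>, as in the example URL; other accession formats would need a different pattern.

```shell
#!/usr/bin/env bash
# Assumption: the accession has the form gca_<digits>.<digits>, as in the example.
f1='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz'

# Groups: \1 = directory prefix ending in '/', \2 = gca_<digits>,
#         \3 = .<version> (literal dot), \4 = the rest of the file name.
# '#' is the s/// delimiter, so the '/' inserted between \3 and \2 is plain text.
result=$(sed -E 's#(.*/)(gca_[0-9]+)(\.[0-9]+)(_.*)#\1\2\3/\2\4#' <<<"$f1")
printf '%s\n' "$result"
```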
sed -E (or -r on some systems, e.g. GNU sed) enables extended regular expressions in sed, so you don't need to escape the group parentheses ( ).
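For comparison, here is a sketch of the same substitution written as a POSIX basic regular expression, i.e. without -E. Each group parenthesis must then be escaped as \( and \); the pattern and replacement are otherwise unchanged, so this form is useful on a sed that accepts neither -E nor -r.

```shell
#!/usr/bin/env bash
f1='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz'

# Basic regex (no -E): groups are written \( ... \) instead of ( ... ).
result=$(sed 's/\(.*\)\(gca_.[^.]*\)\(.[^_]*\)\(.*\)/\1\2\3\/\2\4/' <<<"$f1")
printf '%s\n' "$result"
```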
The group (gca_.[^.]*) means "gca_, then any one character, then every character up to, but not including, the first dot":
$ sed -E 's/(.*)(gca_.[^.]*)(.[^_]*)(.*)/\2/' <<<"$f1"
gca_900169985
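Similarly, (.[^_]*) means "any one character (here the dot), then every character up to the first _", with the _ itself excluded. A negated character class like [^_]* is the usual way to emulate a non-greedy/lazy capture in sed, which has no lazy quantifiers (with Perl regex it could have been written as a lazy .*? followed by _):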
$ sed -E 's/(.*)(gca_.[^.]*)(.[^_]*)(.*)/\3/' <<<"$f1"
.1
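To see why the negated character class matters, this sketch captures "gca_ up to an underscore" twice on the same URL: once with a greedy .* and once with [^_]*. The greedy version runs on to the last underscore that still lets the rest of the pattern match, while [^_]* cannot cross an underscore and so stops at the first one.

```shell
#!/usr/bin/env bash
f1='ftp://ftp.ncbi.nlm.nih.gov/genomes/all/gca/900/169/985/gca_900169985.1_ionxpress_024_genomic.fna.gz'

# Greedy: .* inside the group extends to the LAST '_' that can still be followed by _.*
greedy=$(sed -E 's/.*(gca_.*)_.*/\1/' <<<"$f1")

# Negated class: [^_]* cannot contain '_', so the capture ends at the FIRST '_'.
first=$(sed -E 's/.*(gca_[^_]*)_.*/\1/' <<<"$f1")

printf 'greedy: %s\nfirst:  %s\n' "$greedy" "$first"
```

With the example URL, the greedy capture should be gca_900169985.1_ionxpress_024 and the negated-class capture gca_900169985.1.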