html - Extracting links from a page using grep on a Mac


I have seen other similar questions, but none of them solved my problem.

I have a local HTML page and I want to extract all the links in it. I don't want just the links; I want the whole tag that creates each link, like

<a href="page1.html">my page 1</a> <a href="page2.html">my page 2</a> <a href="page3.html">my page 3</a> 

I would also be fine with this, if it is easier:

my page 1 page1.html
my page 2 page2.html
my page 3 page3.html

I have tried this command from an answer to another question on SO:

grep "<a href=" t2.html | sed "s/<a href/\\n<a href/g" | sed 's/\"/\"><\/a>\n/2' | grep href 

but for some reason it is extracting only a couple of the links on the page.

If you want to see it, this is the page I am trying to extract the links from.

Thanks.

cat indexantigo.html | grep -oiE "<a([^>]+)>([^<]+)</a>"

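It will match inline <a> tags that have no other tags inside them. The -E option is needed so that the parentheses and + are treated as regex operators rather than literal characters.

Details

<a([^>]+)>: starts with <a and ends with >, and contains no > in between.

([^<]+): the link text, which contains no <

</a>: ends with </a>

Note that it will not match <a> tags that have another tag inside them, such as <a href="#"><img src="1.jpg" /></a>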
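To make the behaviour of the pattern concrete, here is a small sketch that runs the same expression through Python's re module instead of grep. The sample string is made up for illustration and is not taken from the page in the question; re.IGNORECASE stands in for grep's -i.

```python
import re

# Same pattern as in the grep command; IGNORECASE mirrors grep's -i option.
ANCHOR_RE = re.compile(r"<a([^>]+)>([^<]+)</a>", re.IGNORECASE)

# Hypothetical sample input, invented for this example.
sample = (
    '<a href="page1.html">my page 1</a> '
    '<a href="#"><img src="1.jpg" /></a> '
    '<A HREF="page2.html">my page 2</A>'
)

# group(0) is the whole matched tag, which is what grep -o would print.
matches = [m.group(0) for m in ANCHOR_RE.finditer(sample)]

for tag in matches:
    print(tag)
```

The two plain-text links are found, while the link that wraps an <img> is skipped, because ([^<]+) needs at least one character that is not < right after the opening tag.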

Edit: I agree with Anthony Geoghegan's answer; it is more convenient to use a scripting language such as Python.
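As a rough illustration of the scripting approach, the sketch below uses only html.parser from the Python standard library. It is not the code from the other answer; the names LinkExtractor and extract_links and the sample string are hypothetical, chosen just for this example. It prints the link text followed by the href, and it also copes with links that contain other tags.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_links(html):
    """Return a list of (text, href) tuples found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample input, invented for this example.
sample = (
    '<a href="page1.html">my page 1</a> '
    '<a href="#"><img src="1.jpg" /></a> '
    '<a href="page3.html">my <b>page</b> 3</a>'
)

for text, href in extract_links(sample):
    print(text, href)
```

For a local file such as t2.html, the contents could be read with open(...).read() and passed to extract_links; the file's encoding would have to be known or assumed.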

