html - Extracting links from a page using grep on a Mac
I have seen other similar questions, but none of them solved my problem.
I have a local HTML page and I want to extract the links from it. I don't want just the links; I want the whole tag that creates each link, like:
<a href="page1.html">my page 1</a>
<a href="page2.html">my page 2</a>
<a href="page3.html">my page 3</a>
I would also be fine with this, if it is easier:
my page 1 page1.html
my page 2 page2.html
my page 3 page3.html
I have tried this command, from an answer to another question on SO:
grep "<a href=" t2.html | sed "s/<a href/\\n<a href/g" | sed 's/\"/\"><\/a>\n/2' | grep href
but for some reason it is extracting only a couple of the links on the page.
If you want to see it, this is the page I am trying to extract links from.
Thanks.
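cat indexantigo.html | grep -oiE "<a([^>]+)>([^<]+)</a>"
It matches inline <a> tags that have no other tags inside them. The -E option (extended regular expressions) is needed so that ( ) and + are treated as grouping and repetition rather than as literal characters.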
Details:
<a([^>]+)>: starts with <a and ends with >, with no > in between.
([^<]+): the link text, which contains no <.
</a>: ends with </a>.
Note that it will not match <a> tags that have another tag inside them, such as <a href="#"><img src="1.jpg" /></a>.
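To see what the pattern does and does not match without a file at hand, here is a small Python sketch that applies the same regular expression to a sample string. The sample HTML and the names LINK_RE, sample and matches are made up for illustration. Python's re module is a different engine from grep's, although this particular pattern only uses constructs that, to my knowledge, both treat the same way.

```python
import re

# Same pattern as in the grep command above; re.IGNORECASE mirrors grep's -i.
LINK_RE = re.compile(r"<a([^>]+)>([^<]+)</a>", re.IGNORECASE)

# Hypothetical sample input, not the asker's real page.
sample = (
    '<a href="page1.html">my page 1</a> '
    '<a href="#"><img src="1.jpg" /></a> '
    '<A HREF="page2.html">my page 2</A>'
)

# group(0) is the whole matched tag, which is what grep -o would print.
matches = [m.group(0) for m in LINK_RE.finditer(sample)]
for tag in matches:
    print(tag)
```

The first and third links are printed; the second is skipped because an <img> tag sits where the pattern expects plain text.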
Edit: I agree with Anthony Geoghegan's answer; it is more convenient to use a scripting language such as Python.
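As a rough idea of what the scripting route could look like, here is a minimal sketch using html.parser from Python's standard library. This is my own illustration, not code from the other answer; the names LinkCollector and extract_links and the sample string are hypothetical. Unlike the regular expression, it also picks up links that wrap other tags.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag, nested tags included."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished (href, text) pairs
        self._href = None     # href of the <a> that is currently open
        self._inside = False  # True while between <a ...> and </a>
        self._text = []       # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            self.links.append((self._href, "".join(self._text).strip()))
            self._inside = False


def extract_links(html):
    """Return a list of (href, link text) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample input; for a real file, read its contents first.
sample = '<a href="page1.html">my page 1</a> <a href="#"><img src="1.jpg" /></a>'
for href, text in extract_links(sample):
    print(text, href)
```

For a local file, the same function could be called on the text returned by open("t2.html").read(). The image-only link comes back with an empty text, since it contains no character data.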