html - Extracting links from a page using grep on a Mac
I see other similar questions on SO, but none of them solved my problem.
I have a local HTML page and want to extract all the links from it. I don't want just the links; I want the whole tag that creates each link, like
<a href="page1.html">my page 1</a> <a href="page2.html">my page 2</a> <a href="page3.html">my page 3</a>
I would also be fine with this, if it is easier:
my page 1 page1.html my page 2 page2.html my page 3 page3.html
I have tried this command, taken from an answer to another question on SO:
grep "<a href=" t2.html | sed "s/<a href/\\n<a href/g" | sed 's/\"/\"><\/a>\n/2' | grep href
but for some reason it is extracting only a couple of the links on the page.
If you want to see it, this is the page I am trying to extract the links from.
Thanks.
cat indexantigo.html | grep -oiE "<a([^>]+)>([^<]+)</a>"
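It matches inline <a> tags that have no other tags inside them. The -E option is needed so that the parentheses and + are treated as extended regex operators, -o prints only the matched part of each line, and -i ignores case.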
Details:
<a([^>]+)> : starts with <a, ends with >, and contains no > in between.
([^<]+) : contains no <.
</a> : ends with </a>.
Note: it will not match <a> tags that have another tag inside them, such as <a href="#"><img src="1.jpg" /></a>.
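To see what the pattern does and does not catch, here is a small sketch that applies the same regex with Python's re module. Python is used only because it is easy to run anywhere; the sample HTML is made up for illustration and is not the page from the question.

```python
import re

# Same pattern as the grep command above; IGNORECASE plays the role of -i.
LINK_RE = re.compile(r"<a([^>]+)>([^<]+)</a>", re.IGNORECASE)

# Made-up sample input: two plain links and one link wrapping an image.
sample = (
    '<p><a href="page1.html">my page 1</a> '
    '<A HREF="page2.html">my page 2</A> '
    '<a href="#"><img src="1.jpg" /></a></p>'
)

# group(0) is the whole match, which corresponds to what grep -o prints.
matches = [m.group(0) for m in LINK_RE.finditer(sample)]
for tag in matches:
    print(tag)
```

The two plain links are found, while the link that wraps an <img> is skipped, because ([^<]+) needs at least one character that is not < right after the opening tag.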
Edit: I agree with Anthony Geoghegan's answer; it is more convenient to use a scripting language such as Python.
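A minimal sketch of that approach, assuming Python 3 and only the standard library's html.parser module. The names LinkExtractor and extract_links are made up for illustration, and the sample HTML is invented. It produces the "text then href" form asked for in the question, and it also handles links that contain other tags.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (text, href) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears while an <a href=...> is open.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace.
            text = " ".join("".join(self._text).split())
            self.links.append((text, self._href))
            self._href = None


def extract_links(html):
    """Hypothetical helper: return a list of (text, href) tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up sample: a plain link, a link with nested markup, an image link.
sample = (
    '<a href="page1.html">my page 1</a> '
    '<a href="page2.html"><b>my</b> page 2</a> '
    '<a href="#"><img src="1.jpg" /></a>'
)
for text, href in extract_links(sample):
    print(text, href)
```

For a real file, something like extract_links(open("t2.html").read()) should work the same way. An image-only link comes back with an empty text, so filter those out if you do not want them.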