Lubos Rendek

The only solution is determination.

Regex to Extract URL From HTML Code


A simple hint on how to extract URLs from HTML code. For example, save the following code into a file called url-extract.txt:

</a>
</li></ul></li><li  class="jsn-icon-search"><a  href="http://jobs.linuxcareers.com/" >
>
>
</a>
</li><li  class="parent jsn-icon-info"><a  href="http://how-to.linuxcareer.com" title="Default contents" >
>
 class="jsn-menutitle">Docs</span><span class="jsn-menudescription">Default contents</span>'
</a>

Whether a URL uses the HTTP or HTTPS protocol, it can be extracted with the following sed command:

$ sed -ne 's/.*\(http[^"]*\).*/\1/p' url-extract.txt
http://jobs.linuxcareers.com/
http://how-to.linuxcareer.com
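
Note that the leading `.*` in the sed expression is greedy, so when a line holds more than one URL only the last one is printed. As a rough alternative sketch, `grep -oE` prints every match on its own line. The file name url-sample.txt, its contents, and the example.org link below are made up for illustration, and the `-o` option assumes GNU or BSD grep:

```shell
# Create a small sample file (hypothetical content, modeled on the HTML above).
cat > url-sample.txt <<'EOF'
<li><a href="http://jobs.linuxcareers.com/">Jobs</a> <a href="https://example.org/docs">Docs</a></li>
<li class="parent"><a href="http://how-to.linuxcareer.com" title="Default contents">
EOF

# -o prints only the matched text, one match per output line.
# -E enables extended regular expressions, so "https?" covers both schemes.
# [^"]+ stops the match at the closing double quote of the href attribute.
urls=$(grep -oE 'https?://[^"]+' url-sample.txt)
printf '%s\n' "$urls"
```

With this sample file, both URLs on the first line are printed, followed by the URL on the second line. The pattern only stops at a double quote, so URLs in single-quoted or unquoted attributes would need a different character class.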
