<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The WebZappr &#187; extract urls</title>
	<atom:link href="http://blog.webzappr.com/tag/extract-urls/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.webzappr.com</link>
	<description>My Random Web Snippets</description>
	<lastBuildDate>Wed, 17 Mar 2010 00:11:10 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Get a list of URLs/domains from a text file</title>
		<link>http://blog.webzappr.com/2008/11/get-a-list-of-urlsdomains-from-a-text-file/</link>
		<comments>http://blog.webzappr.com/2008/11/get-a-list-of-urlsdomains-from-a-text-file/#comments</comments>
		<pubDate>Wed, 05 Nov 2008 20:05:27 +0000</pubDate>
		<dc:creator>Thorsten</dc:creator>
				<category><![CDATA[linux]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[extract domains]]></category>
		<category><![CDATA[extract urls]]></category>
		<category><![CDATA[sed]]></category>

		<guid isPermaLink="false">http://snipplr.wordpress.com/2008/11/05/get-a-list-of-urlsdomains-from-a-text-file/</guid>
		<description><![CDATA[I needed a small script that extracts all URLs from a text file. Here is the result.
sed 's/http/\^http/g' FILENAME &#124; tr -s &#34;^&#34; &#34;\n&#34; &#124; grep http&#124; sed 's/[\ &#124;\\\&#124;\&#34;].*//g' &#124; sed &#34;s/['].*//g&#34; &#124; sort &#124; uniq
And as my final goal was to extract a list of domain names from the [...]]]></description>
			<content:encoded><![CDATA[<p>I needed a small script that extracts all URLs from a text file. Here is the result:</p>
<pre class="brush: php;">sed 's/http/\^http/g' FILENAME | tr -s &quot;^&quot; &quot;\n&quot; | grep http | sed 's/[\ |\\\|\&quot;].*//g' | sed &quot;s/['].*//g&quot; | sort | uniq</pre>
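<p>For comparison, here is a minimal sketch of the same idea built on <code>grep -o</code>, which avoids the temporary <code>^</code> marker. It is not the pipeline above: the function name <code>extract_urls</code> is made up for illustration, it only matches text starting with <code>http://</code> or <code>https://</code>, and it ends a URL at any whitespace, quote, backslash or pipe character.</p>

```shell
# Hypothetical helper (the name is an assumption, not from the original post):
# print each distinct URL found in the file given as $1, one per line, sorted.
extract_urls() {
    # -o prints only the matched text, one match per line.
    # The bracket expression ends a URL at whitespace, a double or single
    # quote, a backslash or a pipe character.
    grep -oE "https?://[^[:space:]\"'\\|]+" "$1" | sort -u
}
```

<p>Calling <code>extract_urls FILENAME</code> then prints one URL per line, with duplicates removed.</p>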
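<p>My final goal was a list of the domain names in the file that I could use later in a PHP script, so here is the hardcore version, which prints a copy&amp;paste array definition of all domains found in the file. Note that it drops the first label of each host name (such as <code>www</code>) and keeps only results of the form <code>name.tld</code>.</p>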
<pre class="brush: php;">echo -n '$domains = array ( &quot;' ; sed 's/http/\^http/g' FILENAME | tr -s &quot;^&quot; &quot;\n&quot; | grep http | sed 's/[\ |\\\|\&quot;].*//g' | sed &quot;s/['].*//g&quot; | sort | uniq | awk 'BEGIN{FS=&quot;/&quot;}{print $3}' | cut -d . -f 2- | grep -E '^[a-z]+\.[a-z]+$' | sort | uniq | tr &quot;\n&quot; &quot;,&quot; | sed &quot;s/,/\&quot;, \&quot;/g&quot; | sed 's/, \&quot;$//'; echo -n ' );'</pre>
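<p>The domain step can also be sketched as a small filter that reads URLs on standard input. This is an illustration under stated assumptions, not the one-liner above: the name <code>urls_to_php_array</code> is hypothetical, only a leading <code>www.</code> is removed (instead of always cutting the first label), and host names with a port or upper-case letters are skipped.</p>

```shell
# Hypothetical helper (the name is an assumption, not from the original post):
# read URLs on standard input, one per line, and print a PHP array
# definition of the distinct host names.
urls_to_php_array() {
    # The host is the third "/"-separated field of scheme://host/path.
    awk -F/ '{ print $3 }' |
        # Assumption: only a leading "www." is dropped.
        sed 's/^www\.//' |
        # Keep plain lower-case host names that end in an alphabetic TLD.
        grep -E '^[a-z0-9.-]+\.[a-z]+$' |
        sort -u |
        # Join the hosts into a quoted, comma-separated PHP array literal.
        awk 'BEGIN { printf "$domains = array ( " }
             { printf "%s\"%s\"", (NR == 1 ? "" : ", "), $0 }
             END { print " );" }'
}
```

<p>For example, <code>printf '%s\n' 'http://www.example.com/a' 'https://blog.example.org/x' | urls_to_php_array</code> prints <code>$domains = array ( "blog.example.org", "example.com" );</code>.</p>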
]]></content:encoded>
			<wfw:commentRss>http://blog.webzappr.com/2008/11/get-a-list-of-urlsdomains-from-a-text-file/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
