Split up URLs into their components

// October 10th, 2008 // development

This little call finds URLs in a string and splits each of them up into its components. Every component is captured in a named group, so the results are easy to use.

// $data receives one array per named group (protocol, domain, sld, tld, ip, port, script, parameters, anchor), plus the full matches in $data[0]
preg_match_all('/(?P<protocol>(?:(?:f|ht)tp|https):\/\/)?(?P<domain>(?:(?!-)(?P<sld>[a-zA-Z\d\-]+)(?<!-)[\.]){1,2}(?P<tld>(?:[a-zA-Z]{2,}\.?){1,}){1,}|(?P<ip>(?:(?(?<!\/)\.)(?:25[0-5]|2[0-4]\d|[01]?\d?\d)){4}))(?::(?P<port>\d{2,5}))?(?:\/(?P<script>[~a-zA-Z\/.0-9_-]*)?(?:\?(?P<parameters>[=a-zA-Z+%&;\'\(\)0-9,.\/_ -]*))?)?(?:\#(?P<anchor>[=a-zA-Z+%&0-9._]*))?/x', $text, $data);
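A minimal usage sketch: the call above run over a made-up sample string, reading each URL's components back out of $data. It assumes the default PREG_PATTERN_ORDER layout, in which $data['domain'][$i] belongs to the i-th URL found. The $pattern and $text values are illustrative, and the pattern is written so that it matches exactly the same characters as the one above.

```php
<?php
// Illustrative sketch: the URL regex from the post, stored in a variable.
$pattern = '/(?P<protocol>(?:(?:f|ht)tp|https):\/\/)?(?P<domain>(?:(?!-)(?P<sld>[a-zA-Z\d\-]+)(?<!-)[\.]){1,2}(?P<tld>(?:[a-zA-Z]{2,}\.?){1,}){1,}|(?P<ip>(?:(?(?<!\/)\.)(?:25[0-5]|2[0-4]\d|[01]?\d?\d)){4}))(?::(?P<port>\d{2,5}))?(?:\/(?P<script>[~a-zA-Z\/.0-9_-]*)?(?:\?(?P<parameters>[=a-zA-Z+%&;\'\(\)0-9,.\/_ -]*))?)?(?:\#(?P<anchor>[=a-zA-Z+%&0-9._]*))?/x';

// Made-up sample text containing two URLs.
$text = 'See http://www.example.com:8080/path/index.php?id=5&lang=en#top or the mirror at ftp://192.168.0.1/pub';

// Default PREG_PATTERN_ORDER: $data['domain'][0] is the domain of the first URL,
// $data['domain'][1] the domain of the second, and so on.
$count = preg_match_all($pattern, $text, $data);

for ($i = 0; $i < $count; $i++) {
    // Groups that did not take part in a match come back as empty strings.
    printf(
        "%s | protocol=%s domain=%s port=%s script=%s parameters=%s anchor=%s\n",
        $data[0][$i],
        $data['protocol'][$i],
        $data['domain'][$i],
        $data['port'][$i],
        $data['script'][$i],
        $data['parameters'][$i],
        $data['anchor'][$i]
    );
}
```

Groups that take no part in a match (the port of the FTP URL, for instance) come back as empty strings. Two quirks follow from the pattern itself: an IP address is only picked up when it directly follows a `/`, because the conditional `(?(?<!\/)\.)` otherwise demands a dot before the first octet; and since the parameters class contains a space, a query string that is not followed by an `#anchor` will also swallow the words after it.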

2 Responses to “Split up URLs into their components”

  1. nikolay says:

    http://bg.php.net/parse_url

    The regexp does a better job of partitioning the domain, but the core PHP function has been tested over the years :-)

  2. tott says:

    Sure thing! But parse_url only handles a single URL, not a string containing plenty of them.
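The two approaches in this exchange can be combined: a loose pattern only locates URL-like substrings in the text, and parse_url() then splits each one. The sketch below assumes a hypothetical helper split_urls() and an illustrative finder pattern that recognises only http, https and ftp URLs and stops at whitespace, so trailing punctuation stays attached.

```php
<?php
// Hypothetical helper: a loose regex (an assumption, not the one from the post)
// only *finds* URL-like substrings; parse_url() does the splitting for each one.
// The finder stops at whitespace, so trailing punctuation such as "." is kept.
function split_urls(string $text): array
{
    $parts = [];
    if (preg_match_all('~\b(?:https?|ftp)://\S+~i', $text, $found)) {
        foreach ($found[0] as $url) {
            $parsed = parse_url($url);
            // parse_url() may return false on seriously malformed URLs; skip those.
            if ($parsed !== false) {
                $parts[] = $parsed;
            }
        }
    }
    return $parts;
}

$text = 'See http://www.example.com:8080/path/index.php?id=5&lang=en#top or the mirror at ftp://192.168.0.1/pub';
print_r(split_urls($text));
```

Unlike the regex in the post, parse_url() returns the port as an integer and keeps the leading `/` in the path.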
