I just updated the second version of my github project, URLFinderAndVerifier. Needing to verify a http URL as it was typed in to toggle an “Accept” button, I searched the web for such a thing, and unfortunately found oodles of them, none of which had any credentials.
Looking closer on another day, I found a site run by Jeff Roberson where he had meticulously worked through RFC-3896, constructing simple regular expressions then combining them to create a full regular expression that should properly parse any valid URL. Note that the standard allows much of what we think of as necessary information to be empty.
So I started with his expressions, then tweaked them slightly to handle more real world conditions, such as forcing a person to enter (the optional) ‘/’ at the start of the path segment (which acts as a end of authority marker).
Using my previous Email verifier test harness, I constructed a test engine that lets you test out various combinations of options. It has three components:
– construct the regular expressions for use in a text file or a NSString constant
– URLSearcher, a class that uses to regular expression to find or verify URLs
– a test app that you use to test various URLs, or “live” input mode
All regular expressions exist in text files that are heavily commented. A pre-processor reads these files and removes comments and extraneous characters (like spaces). This makes the files much more readable and understandable. These partial regular expressions that are used to build a full regular expression based on the options you select in the GUI.
Users can optionally enter Unicode characters, which the regular expression can optionally allow. Given a verified URL, a utility function in URLSearcher converts these syntactically correct strings to ‘%’ encoded ones that fully meet the RFC spec (Europeans should like this!)
Other options are to look for every scheme, or http/https, or http/https/ftp and the ability to spec capture groups to see URL subcomponents: user, host, port, path, query, and fragment.
PS: if you end up using this it would be great if you could give the StackOverflow post an up-arrow.