Now that Xcode 5 supports doxygen annotation, it’s time for you to start using it!

Ever since Xcode 5 added native support for doxygen, it really pays to annotate your code with doxygen-style comments, which take only a few extra characters on top of the comments you may already write.

In looking for tips and tricks, I turned up an excellent how-to written 4 years ago (but still correct) that offers many different techniques. For example, you can remind yourself (and others) how to use a particular variable:

@property(assign) int ts; ///< Short Comment

or line wrap for up to 3 lines of comments:

@property(assign) int ts; ///< Set the complete ...
                          ///< the progress ...

If you’d rather use C-style comments you can:

@property(assign) int ts; /**< Set the complete ...
                               the progress ... */

The parser seems really smart about finding continuation comments and ignoring leading white space. In my examples, I indented the second lines, but you can leave them pegged to the left margin if you prefer.

The more common doxygen usage is to provide annotation blocks above methods, and the oodles of options you can use to do that can be found in this recent post on StackOverflow (make sure to upvote the question and answer!):

You can add this handy code snippet to your Xcode Code Snippet library:

/**
 <#description#>
 @param <#parameter#>
 @returns <#retval#>
 @exception <#throws#>
 */

as detailed here (and I just did!):

From then on, you and others can get help during autocomplete, as well as by option-clicking on a method. I was slow to start using this, but now it’s getting to be second nature.
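As an illustration, here is the snippet filled in for a small C function. The function clamp and its parameters are hypothetical, made up for this example, and @exception is left out because plain C has no exceptions:

```c
/**
 Clamps a value to a closed range.
 @param value The number to clamp.
 @param low   The smallest value the function may return.
 @param high  The largest value the function may return.
 @returns value, limited to the range low...high.
 */
static int clamp(int value, int low, int high)
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}
```

The same block works above an Objective-C method declaration; only the comment matters to the parser.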

Also, if you start posting your open source code (using podspecs) to CocoaPods, they run doxygen on your files and create really nice documentation for them on CocoaDocs.


The Near-Perfect Email Validating Regular Expression

There have been hundreds of posts on the internet by developers claiming to have an “excellent” regular expression through which an app can validate an address supplied by an end user. However, what you get is a long, unintelligible string. While you can test it against one or more addresses, you don’t really know its limitations or what it’s going to reject.

Fortunately some interested parties have created sites for the sole purpose of testing supplied expressions: the one I’ve most often used is here. However, as you can see there, no expression passes all good addresses or rejects all the bad addresses.

What really got me interested in this was the inconsistency of the expressions, the lack of traceability to the relevant specs, and even the opportunity to fix a known failure (as if I could even read the expression)! Then, while trying to find a URL validator, I tripped on a site created by Jeff Roberson, who constructed a complicated URL validator by using the relevant RFC, then developing a small regular expression snippet for each of the components, then finally assembling the final full-featured expression. Really impressive.

So early in 2013 I set out to develop a totally standards-compliant regular expression. The first big hurdle was the claim that the spec used recursion, and thus no regular expression could ever meet it. Well, it turns out that only comments embedded within an email address can be nested, and who ever saw a comment in an email address?

Comments take the form “(some text)”, and nesting occurs when “some text” contains a comment. So, all this fuss about recursion is focused on a feature that no one uses! In the end, to ensure the regular expression I produced would pass some pretty severe existing tests, I implemented comments to a user-specified fixed level using the “or” regular expression feature. Thus, a “comment” is { “comment” | “comment nested to one level” | “comment nested to two levels” }. Note that to pass the tests you need to handle a nesting level of 5, which greatly increases the size of the regular expression, and which no real app would ever use.
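The fixed-depth idea can be sketched in plain C with the POSIX regex API. This is not the app's actual code: the helper names comment_pattern and is_comment are hypothetical, and the comment body is simplified to "any character except a parenthesis" rather than the RFC's full rules.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds a POSIX extended regex matching a "(comment)" nested at most
   `depth` levels deep. Returns a malloc'd string, or NULL on failure. */
static char *comment_pattern(int depth)
{
    /* Level 0: parentheses around a run of non-parenthesis characters. */
    const char *base = "\\([^()]*\\)";
    char *pat = malloc(strlen(base) + 1);
    if (pat == NULL)
        return NULL;
    strcpy(pat, base);

    /* Each extra level allows "plain character OR a comment of the
       previous level" between the parentheses. */
    for (int level = 0; level < depth; level++) {
        size_t size = strlen(pat) + 16;
        char *next = malloc(size);
        if (next == NULL) {
            free(pat);
            return NULL;
        }
        snprintf(next, size, "\\(([^()]|%s)*\\)", pat);
        free(pat);
        pat = next;
    }
    return pat;
}

/* Returns 1 if `text` is one comment nested no deeper than `depth`,
   0 if it is not, and -1 on an allocation or compile error. */
static int is_comment(const char *text, int depth)
{
    char *inner = comment_pattern(depth);
    if (inner == NULL)
        return -1;

    size_t size = strlen(inner) + 3;      /* '^', '$', and the NUL */
    char *anchored = malloc(size);
    if (anchored == NULL) {
        free(inner);
        return -1;
    }
    snprintf(anchored, size, "^%s$", inner);
    free(inner);

    regex_t re;
    int rc = regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB);
    free(anchored);
    if (rc != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With this sketch, "(a(b))" is rejected at depth 0 but accepted at depth 1, which is the trade-off described above: every additional level makes the expression longer.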

The second hurdle is that the test suites do not rely solely on the principal RFC (RFC-5322), but on related RFCs, some of which contradict 5322! In the end I had to incorporate information relating to the specification of IPv6 addresses (complex!), IPv4 addresses, part lengths, and even contradictions within the core RFC (text says one thing, the ABNF says something else).

A final hurdle was dealing with “deprecated” rules: alongside the rules the spec officially recommends are older ones that should still be supported but shouldn’t be used anyway. In the end I solved this by deciding to NOT support deprecated rules (made my life easier).

It became obvious immediately that there is no “perfect” RFC: if the text and ABNF contradict each other, you can have it one way or the other, but not both ways! The solution was to punt the decision to the person generating the final expression, and let that person decide what to accept and what not to.

Another issue was what to do about regex syntax: Objective-C on the Mac uses the ICU package, which was based on Perl. Portions of that syntax are not supported by the “C” POSIX side of Macs, or by any other POSIX-derived regular expression package. The spec uses terms that better match Perl too. In the end, I was able to craft the regular expression so that it would mostly work on both, and could be tailored for one or the other by changing a few items.

The final result is a Mac app that can construct a regular expression to your specification, output it in text or string format, and interactively test it against text the user types or pastes into the app. Additionally, the app contains a class for use in validating or extracting addresses with the regular expressions, which could be ported to other languages with some small effort. There is a C function to validate a single email address too.

[Screenshot of the app]

So, what does the near-perfect email validating expression look like? Like this:

"^(?:(?:(?:(?: )*(?:(?:(?:\\t| )*\\r\\n)?(?:\\t| )+))+(?: )*)|(?: )+)?(?:(?:(?:[-A-Za-z0-9!#$%&'*+/=?^_`{|}~]+(?:\\.[-A-Za-z0-9!#$%&'*+/=?^_`{|}~]+)*)|(?:\"(?:(?:(?:(?: )*(?:(?:[!#-Z^-~]|\\[|\\])|(?:\\\\(?:\\t|[ -~]))))+(?: )*)|(?: )+)\"))(?:@)(?:(?:(?:[A-Za-z0-9](?:[-A-Za-z0-9]{0,61}[A-Za-z0-9])?)(?:\\.[A-Za-z0-9](?:[-A-Za-z0-9]{0,61}[A-Za-z0-9])?)*)|(?:\\[(?:(?:(?:(?:(?:[0-9]|(?:[1-9][0-9])|(?:1[0-9][0-9])|(?:2[0-4][0-9])|(?:25[0-5]))\\.){3}(?:[0-9]|(?:[1-9][0-9])|(?:1[0-9][0-9])|(?:2[0-4][0-9])|(?:25[0-5]))))|(?:(?:(?: )*[!-Z^-~])*(?: )*)|(?:[Vv][0-9A-Fa-f]+\\.[-A-Za-z0-9._~!$&'()*+,;=:]+))\\])))(?:(?:(?:(?: )*(?:(?:(?:\\t| )*\\r\\n)?(?:\\t| )+))+(?: )*)|(?: )+)?$"

This expression is the result of setting the “Validation” preset in the app, pasted as one string. Note that it only tests against “local-part@domain” style addresses, not “mailbox” specs (“DisplayName <local-part@domain>”). If that’s what you want, then check the appropriate box and generate a different one!

The Xcode project used to generate this Mac app can be found on my GitHub site with the (historical) name of EmailAddressFinder.

A property/ivar/outlet was just changed, how do I find out who did it?

I see this post all the time on StackOverflow. The answer is surprisingly simple:

– if you have an ivar, convert it to a property, with “@synthesize ivar = ivar” if need be to avoid having to prepend a “_” to usages

– write your own setter, and add logic tests and NSLog messages

– put a breakpoint on the NSLog message, and run your app

Voila! Your app stops when the value changes, and you can see who the offender is!

URL Verifier, Parser, and Scraper in Objective-C

I just updated my GitHub project, URLFinderAndVerifier, to its second version. Needing to verify an http URL as it was typed in, in order to toggle an “Accept” button, I searched the web for such a thing, and unfortunately found oodles of them, none of which had any credentials.

Looking closer on another day, I found a site run by Jeff Roberson where he had meticulously worked through RFC-3986, constructing simple regular expressions then combining them to create a full regular expression that should properly parse any valid URL. Note that the standard allows much of what we think of as necessary information to be empty.

So I started with his expressions, then tweaked them slightly to handle more real world conditions, such as forcing a person to enter (the optional) ‘/’ at the start of the path segment (which acts as an end of authority marker).

Using my previous Email verifier test harness, I constructed a test engine that lets you test out various combinations of options. It has three components:

– construct the regular expressions for use in a text file or an NSString constant

– URLSearcher, a class that uses the regular expression to find or verify URLs

– a test app that you use to test various URLs, or “live” input mode

[Screenshot of the app]

All regular expressions exist in text files that are heavily commented, which makes the files much more readable and understandable. A pre-processor reads these files and removes comments and extraneous characters (like spaces). These partial regular expressions are then used to build a full regular expression based on the options you select in the GUI.
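The format of those text files is not shown here, so the following is only a sketch of what such a pre-processing step might look like in C. It assumes, purely for illustration, that ‘#’ starts a comment running to the end of the line and that all spaces, tabs, and line breaks are extraneous; the function name strip_regex_source is hypothetical.

```c
#include <stddef.h>

/* Strips '#' comments and all whitespace from `src`, in place.
   ASSUMPTION: '#' begins a comment that runs to end of line; a real
   file format would also need a way to escape a literal '#' or space.
   Returns the length of the stripped string. */
static size_t strip_regex_source(char *src)
{
    char *out = src;
    int in_comment = 0;

    for (const char *in = src; *in != '\0'; in++) {
        if (*in == '\n') {          /* a newline always ends a comment */
            in_comment = 0;
            continue;
        }
        if (in_comment)
            continue;
        if (*in == '#') {
            in_comment = 1;
            continue;
        }
        if (*in == ' ' || *in == '\t' || *in == '\r')
            continue;
        *out++ = *in;               /* keep everything else */
    }
    *out = '\0';
    return (size_t)(out - src);
}
```

Given the two lines “[A-Za-z]+  # scheme” and “:  # separator”, this leaves just “[A-Za-z]+:”.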

Users can optionally enter Unicode characters, which the regular expression can optionally allow. Given a verified URL, a utility function in URLSearcher converts these syntactically correct strings to ‘%’ encoded ones that fully meet the RFC spec. (Europeans should like this!)
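This is not URLSearcher’s actual code, but the idea can be sketched in plain C: a hypothetical percent_encode_non_ascii function replaces each non-ASCII byte of a UTF-8 string with its %XX form and passes ASCII through untouched. It assumes the input is already valid UTF-8 and does not re-encode reserved ASCII characters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of `utf8` in which every byte >= 0x80 is
   written as %XX (uppercase hex). Returns NULL if allocation fails.
   The caller frees the result. */
static char *percent_encode_non_ascii(const char *utf8)
{
    size_t len = strlen(utf8);
    char *out = malloc(len * 3 + 1);   /* worst case: every byte grows to 3 */
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)utf8[i];
        if (c < 0x80)
            *p++ = (char)c;                     /* ASCII: copy as-is */
        else
            p += sprintf(p, "%%%02X", c);       /* non-ASCII: %XX */
    }
    *p = '\0';
    return out;
}
```

For example, the “é” in “café” is the two UTF-8 bytes C3 A9, so it comes out as “%C3%A9”.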

Other options are to look for every scheme, or http/https, or http/https/ftp, and the ability to specify capture groups to see URL subcomponents: user, host, port, path, query, and fragment.
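To give a feel for capture groups, here is a sketch in C using the much simpler generic expression from Appendix B of RFC 3986 and the POSIX regex API, not the project’s own expressions. Note that this expression captures the authority as one piece rather than splitting out user, host, and port. The function name uri_component is hypothetical.

```c
#include <regex.h>
#include <string.h>

/* The generic URI-splitting expression from RFC 3986, Appendix B.
   Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment. */
static const char *URI_PATTERN =
    "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?";

/* Copies capture group `group` of `uri` into `out`.
   Returns 0 on success, or -1 if the group did not take part in the
   match, `out` is too small, or the expression failed to compile. */
static int uri_component(const char *uri, int group, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[10];
    int result = -1;

    if (group < 0 || group > 9)
        return -1;
    if (regcomp(&re, URI_PATTERN, REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, uri, 10, m, 0) == 0 && m[group].rm_so != -1) {
        size_t len = (size_t)(m[group].rm_eo - m[group].rm_so);
        if (len < out_size) {
            memcpy(out, uri + m[group].rm_so, len);
            out[len] = '\0';
            result = 0;
        }
    }
    regfree(&re);
    return result;
}
```

Asking for group 4 of “http://example.com:8080/a/b?x=1#top” yields “example.com:8080”, and group 7 yields “x=1”.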

PS: if you end up using this it would be great if you could give the StackOverflow post an up-arrow.

Your friend, the comma operator

C has had a comma operator for decades, although most commas you see in C code merely separate function arguments and are not the operator at all.

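By definition, the value of a comma expression is its right-hand operand:

i = (j = 3, k = 4, l = 5);

sets i to 5, and the operands are evaluated left to right. (The parentheses matter: assignment binds tighter than the comma, and in a declaration such as “int i = 3, k = 4;” the commas merely separate declarators.) However, this is not the usage I have found for it.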
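A small sketch of how precedence plays out here: the comma operator yields its right-hand operand only when the whole comma expression is what gets assigned or returned, so parentheses matter. The function names are hypothetical.

```c
/* Comma operator: operands are evaluated left to right and the
   expression takes the value of the last one. */
static int last_of_three(int *j, int *k, int *l)
{
    return (*j = 3, *k = 4, *l = 5);
}

/* Without parentheses, assignment binds tighter than the comma, so
   the line below is parsed as (i = 3), (*k = 4), (*l = 5). */
static int first_of_three(int *k, int *l)
{
    int i;
    i = 3, *k = 4, *l = 5;
    return i;
}
```

Here last_of_three returns 5 while first_of_three returns 3, even though both perform the same three assignments.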

Rather, I use it to bind statements that I fear some junior engineer is going to separate in the future – by mistake.

For instance:

[timer invalidate];
timer = nil;

I could of course do:

{
    [timer invalidate];
    timer = nil;
}

but that looks like so much work for such a small thing. So instead, what I do is this:

[timer invalidate], timer = nil;

It’s inconceivable that some developer in the future will separate those statements.

Thus, you can use the comma to tightly bind two or more expression statements into a single statement that won’t get broken up by mistake.
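The same trick works in plain C. As a sketch, assuming a hypothetical helper named release_if, the classic “free it, then clear the pointer” pair can be written as one statement, which also keeps both halves under an unbraced if:

```c
#include <stdlib.h>

/* Frees *buffer and clears the pointer, but only when `condition` is
   true. Because the two expressions are joined by the comma operator,
   they form ONE statement, so both are guarded by the if. */
static void release_if(int condition, char **buffer)
{
    if (condition)
        free(*buffer), *buffer = NULL;
}
```

Had the two been written as separate statements without braces, only the free would be conditional and the pointer would be cleared every time.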

Thoughts on formatting blocks

When I first started writing serious Objective-C code in 2005, I had the benefit of a stern taskmaster on proper style in Aaron Hillegass of the Big Nerd Ranch. He read us a list of standards to follow, such as that class methods always appear before instance ones, and that all braces for methods and ivar wrappers appear as the first character of a line (well, I think that’s what he said). In addition, this is the proper method syntax:

- (NSString *)foo:(SomeClass *)name;

The ‘-’ or ‘+’ is the first character, followed by a space, the return type, and the name, with no spaces between the ‘:’ and the next character, etc., as you can see.

With blocks, well, I never saw a style I liked in sample code, and the sample code is all over the place in terms of styles. After a few months I’ve settled on an uncommon style (which I hope to make less uncommon by publicizing it!):

[dict enumerateKeysAndObjectsUsingBlock:^(NSString *key, NSArray *obj, BOOL *stop)
    {
        ... the code
    } ];

Using this style makes it very easy to spot the blocks in code, and also to visually identify where blocks are when you quickly scan a big file.

To my eye, putting the braces directly under the ‘[‘ makes it look too much like a simple C block unconnected to the line above.

Note that I’ve changed the id key and id obj to specific types. I totally forget where I picked this up – I may in fact have just tried to use it and found it would work. It sure saves me a lot of time compared with creating a second object cast from the id value, or using one or more casts within an expression.

When I assign a block variable, it gets a bit more tricky, but this is what I do for multi-line blocks:

dispatch_block_t b = ^{
                          ... code
                      };

I’ve really gotten to like this style. Now, with my strong opinions on C, Objective-C, and block styles, I’m afraid I’ll never be able to integrate into a coding group that has already written a style guide. YMMV.