Now that Xcode 5 supports doxygen annotation, it’s time for you to start using it!

With Xcode 5's native doxygen support, it really pays to annotate your code with doxygen-style comments, which require only a few extra characters beyond the comments you may already write.

In looking for tips and tricks, I turned up an excellent how-to written 4 years ago (but still correct) that offers many different techniques. For example, you can remind yourself (and others) how to use a particular variable:

@property(assign) int ts; ///< Short Comment

or line wrap for up to 3 lines of comments:

@property(assign) int ts; ///< Set the complete ...
                          ///< the progress ...

If you’d rather use C-style comments you can:

@property(assign) int ts; /**< Set the complete ...
                               the progress ... */

The parser seems really smart about finding continuation comments and ignoring leading white space. In my examples I indented the second lines, but you can leave them pegged to the left margin if you prefer.

The more common doxygen usage is to provide annotation blocks above methods, and the oodles of options you can use for that can be found in this recent post on StackOverflow (make sure to upvote the question and answer!):

You can add this handy code snippet to your Xcode Code Snippet library:

/**
 <#description#>
 @param <#parameter#>
 @returns <#retval#>
 @exception <#throws#>
 */

as detailed here (and I just did!):

From then on, you and others get help during autocomplete, as well as by option-clicking on a method. I was slow to start using this, but now it's getting to be second nature.
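The same markup should also work on plain C declarations. Here is a sketch of a filled-in comment block on a small C function, plus trailing member comments on a struct; `Progress` and `progress_percent` are hypothetical names made up for this example, and the `@exception` line from the snippet is left out because C has no exceptions.

```c
/**
 A hypothetical download-progress record; the members use the
 trailing ///< and /**< forms.
 */
typedef struct {
    long completed; ///< Bytes received so far
    long total;     /**< Total bytes expected,
                         or 0 if the server did not say */
} Progress;

/**
 Computes how far along a download is.
 @param p the progress record to inspect; must not be NULL
 @returns the completion percentage, or -1 if the total is unknown
 */
int progress_percent(const Progress *p)
{
    if (p->total <= 0) {
        return -1;   // no total, so no meaningful percentage
    }
    return (int)(p->completed * 100 / p->total);
}
```

For example, `progress_percent(&(Progress){ 512, 2048 })` returns 25, and option-clicking the call should show the @param and @returns text.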

Also, if you start posting your open source code (using podspecs) to CocoaPods, they run doxygen on your files and create really nice documentation for them on CocoaDocs.

The Near-Perfect Email Validating Regular Expression

There have been hundreds of posts on the internet by developers claiming to have an “excellent” regular expression an app can use to validate an address supplied by an end user. What you get, however, is a long, unintelligible string. While you can test it against one or more addresses, you don't really know its limitations or what it's going to reject.

Fortunately, some interested parties have created sites for the sole purpose of testing supplied expressions; the one I've used most often is here. However, as you can see there, no expression both accepts all the good addresses and rejects all the bad ones.

What really got me interested in this was the inconsistency of the expressions, their lack of traceability to the relevant specs, and even the chance to fix a known failure (as if I could even read the expression)! Then, while looking for a URL validator, I tripped over a site created by Jeff Roberson, who built a complicated URL validator from the relevant RFC: he developed a small regular expression snippet for each of the components, then assembled them into the final full-featured expression. Really impressive.

So early in 2013 I set out to develop a fully standards-compliant regular expression. The first big hurdle was the claim that the spec uses recursion, and thus no regular expression could ever meet it. Well, it turns out that only comments embedded within an email address can be nested, and who has ever seen a comment in an email address?

Comments take the form “(some text)”, and nesting occurs when “some text” itself contains a comment. So all this fuss about recursion is focused on a feature no one uses! In the end, to ensure the regular expression I produced would pass some pretty severe existing tests, I implemented comments to a user-specified fixed depth using the regular expression “or” (alternation) feature. Thus a “comment” is { “comment” | “comment nested to one level” | “comment nested to two levels” }. Note that to pass the tests you need to handle a nesting depth of 5, which greatly increases the size of the regular expression, and which no real app would ever use.
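To make the fixed-depth idea concrete, here is a sketch that generates such a pattern in C and checks it with the POSIX regex API. It is a simplification assumed for this example, not the app's actual rule: comment text is taken to be any character except a parenthesis, and RFC 5322 quoted-pairs and folding white space are ignored. Each extra level wraps the previous pattern in one more alternation, which is why the expression grows with every level you allow.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Writes into buf a POSIX ERE matching a comment "(text)" nested at most
   `depth` levels deep (simplified: text is any char except a parenthesis).
   Returns 0 on success, -1 if buf is too small. */
int build_comment_pattern(int depth, char *buf, size_t size)
{
    int n;
    if (depth == 0) {
        /* Innermost level, no nesting allowed: \( [^()]* \) */
        n = snprintf(buf, size, "\\([^()]*\\)");
    } else {
        /* Level k: \( ( [^()] | <level k-1> )* \) */
        char inner[2048];
        if (build_comment_pattern(depth - 1, inner, sizeof inner) != 0) {
            return -1;
        }
        n = snprintf(buf, size, "\\(([^()]|%s)*\\)", inner);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* True if the whole of `text` is one comment nested at most `depth` deep. */
bool matches_comment(int depth, const char *text)
{
    char body[2048], anchored[2052];
    if (build_comment_pattern(depth, body, sizeof body) != 0) {
        return false;
    }
    snprintf(anchored, sizeof anchored, "^%s$", body);
    regex_t re;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) {
        return false;
    }
    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

With a depth of 1, `matches_comment(1, "(a(b))")` is true but `matches_comment(1, "(a(b(c)))")` is false; raising the depth to 2 accepts the latter.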

The second hurdle is that the test suites do not rely solely on the principal RFC (RFC 5322), but also on related RFCs, some of which contradict 5322! In the end I had to incorporate the specifications for IPv6 addresses (complex!), IPv4 addresses, and part lengths, and even deal with contradictions within the core RFC (the text says one thing, the ABNF says something else).

A final hurdle was dealing with “deprecated” rules: older syntax that the spec says should still be accepted but should no longer be used. In the end I solved this by deciding NOT to support deprecated rules (it made my life easier).

It became obvious immediately that there is no “perfect” RFC: if the text and the ABNF contradict each other, you can have it one way or the other, but not both! The solution was to punt the decision to whoever generates the final regular expression, and let that person decide what to accept and what not to.

Another issue was regex syntax: Objective-C on the Mac (NSRegularExpression) uses the ICU library, whose syntax is based on Perl's. Portions of that syntax are not supported by the C POSIX regex functions on the Mac, or by any other POSIX-derived regular expression package. The spec also uses terms that map better onto Perl. In the end, I was able to craft the regular expression so that it mostly works with both, and can be tailored for one or the other by changing a few items.

The final result is a Mac app that can construct a regular expression to your specification, output it in text or string format, and test it interactively against text the user types or pastes into the app. Additionally, the app contains a class for validating or extracting email addresses, which could be ported to other languages with some small effort. There is a C function to validate a single email address too.


So, what does the near-perfect email validating expression look like? Like this:

"^(?:(?:(?:(?: )*(?:(?:(?:\\t| )*\\r\\n)?(?:\\t| )+))+(?: )*)|(?: )+)?(?:(?:(?:[-A-Za-z0-9!#$%&'*+/=?^_`{|}~]+(?:\\.[-A-Za-z0-9!#$%&'*+/=?^_`{|}~]+)*)|(?:\"(?:(?:(?:(?: )*(?:(?:[!#-Z^-~]|\\[|\\])|(?:\\\\(?:\\t|[ -~]))))+(?: )*)|(?: )+)\"))(?:@)(?:(?:(?:[A-Za-z0-9](?:[-A-Za-z0-9]{0,61}[A-Za-z0-9])?)(?:\\.[A-Za-z0-9](?:[-A-Za-z0-9]{0,61}[A-Za-z0-9])?)*)|(?:\\[(?:(?:(?:(?:(?:[0-9]|(?:[1-9][0-9])|(?:1[0-9][0-9])|(?:2[0-4][0-9])|(?:25[0-5]))\\.){3}(?:[0-9]|(?:[1-9][0-9])|(?:1[0-9][0-9])|(?:2[0-4][0-9])|(?:25[0-5]))))|(?:(?:(?: )*[!-Z^-~])*(?: )*)|(?:[Vv][0-9A-Fa-f]+\\.[-A-Za-z0-9._~!$&'()*+,;=:]+))\\])))(?:(?:(?:(?: )*(?:(?:(?:\\t| )*\\r\\n)?(?:\\t| )+))+(?: )*)|(?: )+)?$"

This expression is the result of selecting the “Validation” preset in the app, and is pasted here as one string. Note that it only tests “local-part@domain” style addresses, not the “mailbox” form (DisplayName <local-part@domain>). If that's what you want, check the appropriate box and generate a different one!

The Xcode project used to generate this Mac app can be found on my github site with the (historical) name of EmailAddressFinder.

A property/ivar/outlet was just changed, how do I find out who did it?

I see this question all the time on StackOverflow. The answer is surprisingly simple:

– if you have an ivar, convert it to a property, adding @synthesize ivar = ivar; if need be, so you don't have to prepend a “_” to existing usages

– write your own setter, and add tests on the new value along with NSLog messages

– put a breakpoint on the NSLog message, and run your app

Voilà! Your app stops when the value changes, and the stack trace shows you who the offender is!
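The same trick carries over to plain C: there is no @property, but you can still force every write through one function and break on the log line. A minimal sketch with hypothetical names (`Player`, `player_set_score`); the change counter exists only so the behavior is easy to check:

```c
#include <stdio.h>

typedef struct {
    int score;
} Player;

static int g_score_changes = 0;   /* counts real changes; only here for testing */

void player_set_score(Player *p, int newScore)
{
    /* The "logic test": only report writes that actually change the value.
       Add any other suspicious-value checks you need here. */
    if (p->score != newScore) {
        g_score_changes++;
        fprintf(stderr, "player_set_score: %d -> %d\n", p->score, newScore); /* breakpoint here */
    }
    p->score = newScore;
}
```

Setting the score to 10, then 10 again, then 3 logs (and stops at the breakpoint) only twice, and the backtrace at each stop shows the caller.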

URL Verifier, Parser, and Scraper in Objective-C

I just updated my GitHub project, URLFinderAndVerifier, to its second version. Needing to verify an http URL as it was typed, in order to toggle an “Accept” button, I searched the web and unfortunately found oodles of candidate expressions, none of which had any credentials.

Looking closer another day, I found a site run by Jeff Roberson, where he had meticulously worked through RFC 3986, constructing simple regular expressions and then combining them into a full regular expression that should properly parse any valid URL. Note that the standard allows much of what we think of as necessary information to be empty.

So I started with his expressions, then tweaked them slightly to handle more real-world conditions, such as forcing a person to enter the (optional) ‘/’ at the start of the path segment, which acts as an end-of-authority marker.

Using my previous email verifier test harness, I built a test engine that lets you try out various combinations of options. It has three components:

– regular expression construction, for use in a text file or an NSString constant

– URLSearcher, a class that uses the regular expression to find or verify URLs

– a test app you use to test various URLs, including a “live” input mode


All the regular expressions live in heavily commented text files. A pre-processor reads these files and removes the comments and extraneous characters (like spaces), so the files themselves stay readable and understandable. These partial regular expressions are then used to build a full regular expression based on the options you select in the GUI.
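The project's actual fragment file format isn't described here, so this sketch assumes a hypothetical one: a line whose first non-blank character is ‘#’ is a comment, and all white space on other lines is discarded (which means a fragment cannot contain a literal space). It shows the comment-and-whitespace stripping step in C; `strip_fragment` is a made-up name.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies `in` to `out`, dropping '#' comment lines and all white space.
   Returns 0 on success, -1 if `out` is too small. */
int strip_fragment(const char *in, char *out, size_t outsize)
{
    size_t len = 0;
    if (outsize == 0) {
        return -1;                          // no room even for the terminator
    }
    while (*in != '\0') {
        while (*in == ' ' || *in == '\t') {
            in++;                           // skip indentation
        }
        if (*in == '#') {
            while (*in != '\0' && *in != '\n') {
                in++;                       // drop a whole-line comment
            }
        } else {
            for (; *in != '\0' && *in != '\n'; in++) {
                if (isspace((unsigned char)*in)) {
                    continue;               // drop white space inside the line
                }
                if (len + 1 >= outsize) {
                    return -1;              // keep room for the terminator
                }
                out[len++] = *in;
            }
        }
        if (*in == '\n') {
            in++;                           // advance to the next line
        }
    }
    out[len] = '\0';
    return 0;
}
```

Fed "# scheme\nhttps?\n  # separator\n :// \n", it produces "https?://", ready to be concatenated with the other fragments.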

Users can optionally enter Unicode characters, which the regular expression can be configured to allow. Given a verified URL, a utility function in URLSearcher converts such syntactically correct strings to ‘%’-encoded ones that fully meet the RFC spec (Europeans should like this!).
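URLSearcher's actual utility isn't shown here. As a rough sketch of one way to do the conversion, assuming the input is a UTF-8 C string that has already been verified, every non-ASCII byte is replaced by ‘%’ and two uppercase hex digits. Host names are deliberately out of scope: internationalized domain names are converted with IDNA (Punycode), not percent-encoding. `percent_encode_non_ascii` is a hypothetical name.

```c
#include <stddef.h>

/* Percent-encodes every byte >= 0x80 of a UTF-8 string; ASCII bytes are
   copied unchanged. Returns 0 on success, -1 if `out` is too small. */
int percent_encode_non_ascii(const char *in, char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = 0;
    if (outsize == 0) {
        return -1;
    }
    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        if (*p < 0x80) {
            if (len + 1 >= outsize) return -1;   // char + terminator
            out[len++] = (char)*p;
        } else {
            if (len + 3 >= outsize) return -1;   // "%XX" + terminator
            out[len++] = '%';
            out[len++] = hex[*p >> 4];           // high nibble
            out[len++] = hex[*p & 0x0F];         // low nibble
        }
    }
    out[len] = '\0';
    return 0;
}
```

For example, "http://example.com/caf\xC3\xA9" (the UTF-8 bytes for "café") becomes "http://example.com/caf%C3%A9".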

Other options let you accept every scheme, only http/https, or http/https/ftp, and let you specify capture groups to extract URL subcomponents: user, host, port, path, query, and fragment.

PS: if you end up using this it would be great if you could give the StackOverflow post an up-arrow.

Your friend, the comma operator

C has had a comma operator for decades. (The commas that separate function arguments are punctuation, not this operator; you most often see the operator in for loop headers.)

The operands are evaluated left to right, and by definition the value of a comma expression is its right-hand operand:

i = (j = 3, k = 4, l = 5);

sets j, k, and l in that order, then sets i to 5. However, this is not the usage I have found for it.

Rather, I use it to bind statements that I fear some junior engineer is going to separate in the future – by mistake.

For instance:

[timer invalidate];
timer = nil;

I could of course do:

{
    [timer invalidate];
    timer = nil;
}

but that looks like so much work for such a small thing. So instead, what I do is this:

[timer invalidate], timer = nil;

It's inconceivable that some developer in the future will separate those statements.

Thus, you can use the comma to tightly bind two or more statements together in a way that won’t get broken up by mistake.
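Both uses can be sketched in plain C. The first function shows that a comma expression yields its rightmost operand; the second is a C analogue of [timer invalidate], timer = nil; that frees a buffer and clears the now-dangling pointer in one statement. `Holder` and both function names are hypothetical, made up for illustration.

```c
#include <stdlib.h>

/* Operands run left to right; the whole expression has the value of
   the rightmost one, so this returns 5. */
int comma_value(void)
{
    int j, k;
    return (j = 3, k = j + 1, k + 1);
}

/* Hypothetical holder for a heap buffer, standing in for the timer. */
typedef struct {
    char *buffer;
} Holder;

/* Free the buffer and nil out the pointer as one bound statement. */
void holder_discard(Holder *h)
{
    free(h->buffer), h->buffer = NULL;
}
```

Because free() returns void, it can still be the left operand: the comma operator simply discards the left-hand value.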

Thoughts on formatting blocks

When I first started writing serious Objective-C code in 2005, I had the benefit of a stern taskmaster on proper style: Aaron Hillegass of the Big Nerd Ranch. He read us a list of standards to follow, such as that class methods always appear before instance methods, and that all braces for methods and ivar wrappers appear as the first character of a line (well, I think that's what he said). In addition, this is the proper method syntax:

- (NSString *)foo:(SomeClass *)name;

The ‘-’ or ‘+’ is the first character, followed by a space, the type, and the name, with no spaces between the ‘:’ and the next character, and so on, as you can see.

With blocks, well, I never saw a style I liked in sample code, which is all over the place stylistically. After a few months I've settled on an uncommon style (which I hope to make less uncommon by publicizing it!):

[dict enumerateKeysAndObjectsUsingBlock:^(NSString *key, NSArray *obj, BOOL *stop)
    {
        ... the code
    } ];

Using this style makes it very easy to spot the blocks in code, and also to visually identify where blocks are when you quickly scan a big file.

To my eye, putting the braces directly under the ‘[’ makes it look too much like a plain C compound statement unconnected to the line above.

Note that I've changed the id key and id obj parameters to specific types. I've totally forgotten where I picked this up; I may in fact have just tried it and found it worked. It sure saves me a lot of time compared with creating a second variable cast from the id value, or using one or more casts within an expression.

When I assign a block variable, it gets a bit more tricky, but this is what I do for multi-line blocks:

dispatch_block_t b = ^{
                          ... code
                      };

I've really come to like this style. Now that I have such strong opinions on C, Objective-C, and block styles, I'm afraid I'll never be able to fit into a coding group that has already written a style guide. YMMV.

Best Practices for Xcode Configurations

When you create a project, Xcode gives you Debug and Release configurations. The former has DEBUG set to 1 and the optimizer turned off. The latter uses an optimizer setting of -Os (fastest, smallest: optimize for speed and size together), which is what I always use on submitted apps. Together these are not enough to thoroughly develop and test your app! I cannot emphasize enough how important it is to add more configurations, and to use them judiciously.

You should add new configurations, chiefly Deployment and ReleaseWithAsserts. The former is used to build the release that goes into the store (or is otherwise distributed to customers), and the latter is the one you will use most during development. The full set looks like this:

  • Deployment: a Release clone that adds DEPLOYMENT to the Preprocessor Macros, so that code dealing with, say, push notifications uses the proper production information. It also has NS_BLOCK_ASSERTIONS=1 and NDEBUG, which disable NSAssert() and assert() respectively.
  • Release: used for ad hoc or local testing; it may pull in the development push server or modify some settings needed for ad hoc testing. It is Apple's Release configuration with the two assertion flags added: NS_BLOCK_ASSERTIONS=1 and NDEBUG.
  • ReleaseWithAsserts: a pure clone of Apple's Release, fully optimized but without the two assertion flags, so assertions stay active.
  • ReleaseWithAssertsUnitTesting: a Release clone with a UNIT_TESTING flag.
  • Debug: as provided by Apple.
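As a sketch of how code can key off these configurations, here is a C function that picks a server based on the DEPLOYMENT macro, reusing the #warning trick described below so a Deployment build announces itself. The function name and URLs are hypothetical placeholders, and DEPLOYMENT only exists if you add it to the Deployment configuration's Preprocessor Macros.

```c
/* Hypothetical: choose a push-registration server per build configuration.
   The URLs are placeholders, not real endpoints. */
const char *push_server_url(void)
{
#ifdef DEPLOYMENT
#warning Using Deployment Push Notification
    return "https://push.example.com/register";      /* Deployment builds only */
#else
    return "https://push-dev.example.com/register";  /* every other configuration */
#endif
}
```

Keeping the #ifdef in one small function means the rest of the app never needs to know which configuration it was built with.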


The reasoning is as follows. Sprinkle your code with as many assert()/NSAssert() calls as possible, to catch logic and pure coding errors as early as possible. These are compiled into the code only for Debug and ReleaseWithAsserts.

So many developers I've met use Debug for all development, and only switch to Release when done. This is a huge mistake, as the compiled code differs hugely between optimized and unoptimized builds. Particularly now with ARC, many retains and releases that the optimizer would eliminate stay in unoptimized code. Debug code runs slower, uses bloated machine code, and in many ways is quite different from what you will ultimately ship. ARC issues may never appear with Debug, only showing themselves at the last minute when you test with Release. Do yourself a huge favor and use Debug only for its true purpose: working in the debugger to track down crashes or bugs.

The reason you want Debug for bug hunting is that truly every line of code in your program maps to some machine code. With the optimizer, that one-to-one relationship is lost: the compiler can reorganize your code and even remove some of it. When you need Debug, you need it!

Also, now that you have the assert flags to differentiate test builds from distributed ones, you can add other checks and/or logging to the code by wrapping it in #ifndef NDEBUG (a double negative!).
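Here is a minimal C sketch of both mechanisms in one hypothetical function (`average` is made up for illustration). When NDEBUG is defined, as in Release and Deployment, the assert() and the #ifndef NDEBUG logging both compile away; in Debug and ReleaseWithAsserts they are active.

```c
#include <assert.h>
#include <stdio.h>

double average(const double *values, int count)
{
    assert(values != NULL && count > 0);   // catches caller bugs in test builds

#ifndef NDEBUG
    fprintf(stderr, "average: %d values\n", count);   // test-build-only logging
#endif

    double sum = 0.0;
    for (int i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum / count;
}
```

Calling average() with a count of 0 stops immediately on the assertion in ReleaseWithAsserts, instead of quietly returning NaN in the build you ship.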

I also add a #warning line to any code wrapped in #ifdef DEPLOYMENT, so that when building it I know absolutely that the code has in fact been compiled into the binary:

#ifdef DEPLOYMENT
#warning Using Deployment Push Notification
// ... deployment-only code ...
#endif

I finally figured out weakSelf and strongSelf

One problem with using blocks and asynchronous dispatch is that you can get into a retain cycle: the block can retain ‘self’, sometimes in mysterious ways. For instance, if you reference an ivar directly, the code says ‘theIvar’ but the compiler generates ‘self->theIvar’. Thus ‘self’, a strong variable, is retained by the block, the queue retains the block, and the object retains the queue.

Apple recommends first assigning ‘self’ to a weak local variable, then referencing that in the block (see 1). Since the block captures the variable along with its qualifier (__weak), there is no strong reference to ‘self’; the object can be deallocated, and at that moment the captured weak variable becomes nil.

__weak __typeof__(self) weakSelf = self;
dispatch_group_async(_operationsGroup, _operationsQueue, ^
{
    [weakSelf doSomething];
} );

Thinking about this, it occurred to me that ‘weakSelf’ might turn to nil in the middle of ‘doSomething’, but after a few posts on the Xcode list, I was given a reference to a clang document that explicitly states that a __weak object read within an expression is retained for the complete expression, and only released thereafter (see 2). Whew!

But how about:

__weak __typeof__(self) weakSelf = self;
dispatch_group_async(_operationsGroup, _operationsQueue, ^
{
    [weakSelf doSomething];
    [weakSelf doSomethingElse];
} );

Well, in this case it's possible for ‘weakSelf’ to be non-nil for the first message but nil for the second. Hmm. The above is a simple example; most real code gets much more complex, with other uses of ‘weakSelf’.

Apple calls this second case ‘non-trivial’ (see 1), and does what at first seems like a bizarre set of steps: first create the ‘weakSelf’ variable, then, inside the block, assign it to a ‘strongSelf’:

__weak __typeof__(self) weakSelf = self;
dispatch_group_async(_operationsGroup, _operationsQueue, ^
{
    __typeof__(self) strongSelf = weakSelf;
    if (strongSelf) {
        [strongSelf doSomething];
        [strongSelf doSomethingElse];
    }
} );

or in Swift:

DispatchQueue.main.async { [weak self] in
    if let strongSelf = self {
        // …
    }
}
// See “Resolving Strong Reference Cycles for Closures” in The Swift Programming Language

I looked and looked at this trying to reason it out (guess I'm just slow). Finally the light bulb lit, and I figured it out! The block has captured only ‘weakSelf’, so at the instant the block starts up, ‘weakSelf’ is either ‘self’ or nil. Your code (as Apple's example does) can test whether ‘strongSelf’ is set, and if so use ‘strongSelf->theIvar’ or the more normal ‘strongSelf.someProperty’ (the latter works fine with nil messaging; the former will crash if ‘strongSelf’ is nil).

If ‘weakSelf’ still points to ‘self’, then ‘strongSelf’ retains it, and it stays retained until the block returns, when it's released. It's all or nothing.

I felt really good finally getting this, and it's making my coding of blocks much easier.

NOTE:

If you're puzzled by the use of __typeof__(self), it's a non-standard compiler extension, taken from GCC and supported by clang, that evaluates to the type of the parenthesized expression (here, the class of ‘self’). The nice thing about using it is that you can make this line a Code Snippet (mine is called ‘WeakSelf’), and you don't have to keep adjusting the class type.

1) “Programming With ARC Release Notes”, search for “For non-trivial cycles, however, you should use”

2) http://clang.llvm.org/docs/AutomaticReferenceCounting.html :

For __weak objects, the current pointee is retained and then released at the end of the current full-expression. This must execute atomically with respect to assignments and to the final release of the pointee.

NSOperation-WebFetches-MadeEasy now FastEasyConcurrentWebFetches

After reading the latest Apple Guides on NSOperation and Concurrency, I updated NSOperation-WebFetches-MadeEasy to use normal NSOperations. It simplified the code for sure. But then I asked myself why I was still using NSOperation and NSOperationQueue, and not GCD alone.

Since this established project has NSOperation in its title, it didn't feel right to just update it to GCD, so I cloned it as FastEasyConcurrentWebFetches and have now pushed that code up.

The big difference is that almost everything now uses blocks and GCD. It turns out you can run an NSRunLoop inside a block and get the same ability to “sleep” while awaiting messages as you have with NSOperation.


This code will be the basis of a library I’m doing for my current employer, so it will get regular updates (as did NSOperation-WebFetches-MadeEasy), as bugs or other issues become evident.