waffle

Objective Dash

There aren’t very many web sites that I’ll say this about, but I am thoroughly honored to have been part of the inspiration for Matt Gallagher’s entry on the post-Objective-C world. Matt’s a fine programmer with a deep understanding of user interface design and a rare gift for education. When I am researching a Cocoa problem and I find an article on Cocoa with Love, the problem is usually solved in a better, deeper and more idiomatic way than in most of the alternatives.

That said, I don’t necessarily actually agree with him on everything. So let’s start out by tackling this section by section.

Programmers, onlookers and pundits have criticised Objective-C for longer than Apple has been using it. If the criticisms were valid and pressing, most could actually be addressed without replacing Objective-C/Cocoa.

To be sure, because Objective-C looks and smells funny, it has its fair share of detractors. Every language does, even those that are either widely beloved or neutral enough to not ruffle anyone’s feathers, but Objective-C probably has more than the average. When I address or mention Objective-C criticism, I do implicitly mean valid criticism: tempered by actual use, forged over time and by actual scars and wounds. Matt goes on to list a few of these and I’ll admit to wanting some, but most are criticisms that are requests for a broader API, or seem like checkbox features: “if you add this, then I’ll use your damn language”. Never build anything out of interest to silence those requests.

Matt homes in on my actual concerns:

More relevant in the long term is the one feature that Objective-C can’t remove or fix: complaints about C itself, in particular C pointers. While code level compatibility with C is arguably Objective-C’s best feature, it is also the source of Objective-C’s most unpopular feature: the direct memory model.

Precisely. Many moons ago, Daniel Jalkut proclaimed that “C is the new Assembly”, and whether or not you agree, there’s a deeper meaning to that. Even if you avoid inline assembler in C, you can effectively poke every bit in your memory space at your leisure. You can’t build a robust abstraction over that as long as anyone can dip down into those bits. You can’t promise anything. Most of the time, this is not a huge issue, and most of the time, there are other abstraction layers (the OS, cosmic radiation) that could fuck something up just as badly. But that doesn’t mean that maintaining this layer is warranted in every case.
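To make “you can’t promise anything” concrete, here is a minimal sketch in plain C. The Account type and both functions are hypothetical, invented purely for illustration: one function is the checked front door, the other walks around it by treating the object as raw bytes, which the language permits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type whose balance is meant to change only through
   account_deposit(), which rejects negative amounts. */
typedef struct {
    int id;
    int balance;
} Account;

/* The intended, checked way in. Returns 1 on success, 0 if rejected. */
int account_deposit(Account *account, int amount) {
    assert(account != NULL);
    if (amount < 0) return 0;
    account->balance += amount;
    return 1;
}

/* The way around it: view the object as bytes and overwrite the field.
   Copying into an object's bytes through unsigned char * is legal C,
   so no check in account_deposit() can be relied upon to hold. */
void account_scribble(Account *account, int forged_balance) {
    assert(account != NULL);
    unsigned char *raw = (unsigned char *)account;
    memcpy(raw + offsetof(Account, balance), &forged_balance, sizeof forged_balance);
}
```

Nothing in the type system distinguishes the second function from the first; any code that holds the pointer can do the same.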

This means that whether or not Apple deprecate their direct memory model APIs (they probably won’t), and whether or not they introduce a new language, Apple will introduce an official application environment with an abstracted memory model.

I would hope so.

It is possible to create a memory model abstraction without using a genuine virtual machine [..] but a virtual machine allows you to lock down the abstraction so that there aren’t any accidental loop holes. You no longer need pointers. You can’t simply overwrite memory. You can’t overrun arrays. You can’t overflow buffers.

Ding ding ding. Life’s full of leaky abstractions, but C and C++ GCs have always been leakier. You have the pleasure of saying “now everything is managed automatically, where by everything I mean the things I and library providers opt into”. I know some C++ but I stay out of it because its magic implicitness scares the shit out of me, and I skinned my knees on Perl. I don’t need to juggle my choice of GC, manual memory management and whichever scheme the current class seems to be using, like “smart pointers” for every program I write.

Ahem. So anyway, a VM is good. Capability-based security is a good model — if you can’t reach something that does X and you can’t magically assemble whatever you want, you can’t do X. To the extent that you can go scribbling over memory in a language running on a VM, it’s a limitation of the implementation of the VM.

An important point to note is that the introduction of a virtual machine could precede a new language.

If Apple decided that fear of manual memory management was keeping good programmers away but wasn’t ready to actually transition to a new language in one leap, it would be possible to transition to the virtual machine first (and gain many of the memory abstraction advantages) while keeping the code-level changes relatively minor.

This is unfortunately where Matt loses me. (Quotes from his post cease here, but his post doesn’t — I again recommend that you read it in its entirety.) It’s not that it’s not a great idea — it is! It would be an immediate payoff and it’d smooth the transition.

Like I’ve said before, I just don’t believe that this leap can be taken with Objective-C, and Matt has already touched upon the reason why. Objective-C is a layer on top of C. It started out as a veneer and now is a slightly bulkier, Mexican Gulf-esque coating of modern object orientation. You know this spiel by now; Objective-C is a superset; C++ is a C variant. Objective-C can’t ruin any C that would have compiled, and so to successfully transition every Objective-C program, you must keep everything from C.

But let’s actually conjure up a new language; maybe what I guess a lot of people think I’m talking about when I say “xlang”. Let’s talk about Objective Dash, which is the proverbial Objective-C without the C. Let’s start off by imagining it at 100% source compatibility with Objective-C and being the host language for the virtual machine. Let’s work through some interesting proposed source code.

#include "foo.h"

Okay, so right off the bat, we hit includes (or imports). Rather, we hit an environment where this matters, where for every compilation unit (which is tech for “source file”) the compiler has to read in the header files in order. Every time a symbol of any kind (a type, a function, a variable) is referenced, it must already have been declared. The compiler starts from scratch for every compilation unit and follows all these imports to satisfy these demands, and this leads to a long list of declarations.

Try right clicking in Xcode and running Preprocess. This expands and flattens every import and macro; if I create a Foundation command line tool using that Xcode template and do this, I get an 85662-line file which takes seconds to load into Xcode, and don’t even think about typing if you have Code Sense on.

But this is not fair and not even relevant. The compiler does that much more efficiently and will be able to reuse much of that metadata. You could argue that most VMs known to man use languages for which order of declaration doesn’t matter, but it doesn’t block you, necessarily, from doing such a VM. So this was an intentional red herring — not every design decision of C that would have been made differently today is a deep offense, crushing the hope of VM-hood. It goes against the very grain of VM-hood to construct one that works like this (they are most happy when they prepare a set of data that they can check against at will, which is the way many modern language compilers work), but it’s probably not impossible.

What is a problem is pointers, like Matt said. This is a profoundly interesting statement:

"xyzzy";

What type is this data? In C, thanks to some syntactic sugar that you probably don’t appreciate as such, it’s a null-terminated char array. In early C, you had to declare arrays by a fixed length, so to avoid having to type char abc[4] = "abc";, you could type char *abc = "abc";, and thus is shown the unification of arrays and pointers.
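A small sketch in standard C (C11 for static_assert) of what that statement is hiding. The function names are made up for illustration; the sizes follow from the language rules, except for the size of a pointer, which depends on the platform.

```c
#include <assert.h>
#include <stddef.h>

/* The literal itself is an array: five letters plus the terminating '\0'. */
static_assert(sizeof "xyzzy" == 6, "five letters plus the terminator");

/* An array initialized from the literal owns a copy of all six bytes. */
size_t array_size(void) {
    char as_array[] = "xyzzy";
    return sizeof as_array;
}

/* A pointer initialized from the literal only records where the bytes live. */
size_t pointer_size(void) {
    const char *as_pointer = "xyzzy";
    return sizeof as_pointer;   /* size of a pointer, whatever that is here */
}

/* Indexing reads the same either way, which is where the two blur together. */
char third_letter(void) {
    char as_array[] = "xyzzy";
    const char *as_pointer = "xyzzy";
    return (as_array[2] == as_pointer[2]) ? as_array[2] : '\0';
}
```

The array and the pointer behave alike under indexing but are different things under sizeof, and that gap is exactly what a language without pointers would have to paper over.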

But that’s still C-level stuff. Let’s move upwards to Objective Dash level. Let’s check in on our favorite pair: the opaque value type NSDecimal and the class wrapper and NSNumber descendant NSDecimalNumber. In Objective-C, variables of each are declared as NSDecimal and NSDecimalNumber *. In Objective Dash you’d use… what, exactly? NSDecimal and NSDecimalNumber? Mixing value types and class types in the same namespace is not a new or bold or impossible idea (that’s how everything else does it), but it does have more than a small impact on backward-compatibility. Let’s say you introduce a macro for cross-compilation:

#if defined(__OBJC__)
#    define CLASSNAME(X)    X *
#elif defined(__OBJDASH__)
#    define CLASSNAME(X)    X
#endif

Now you have to go through your entire code base, but you can still keep just one. There’s more, though. What about error parameters? During the past several years, as Apple has continued modernizing Cocoa, they have deprecated initializers and other methods where something could go horribly wrong at runtime due to no fault of the programmer (warranting an exception in some languages but the delivery of an NSError object in Objective-C) that don’t provide a way to get at the error. They return the error by indirection: a brief example is NSDocument’s dataOfType:error:. The error parameter is an input parameter only in a weird sense — in the sense that you input a pointer (NSError **) to where you’d like for the error to end up if it happens.
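Stripped of Cocoa, the error-by-indirection pattern looks roughly like the following C sketch. Error and parse_digit are hypothetical stand-ins, not real API; the point is only the shape: the result is the return value, and the error travels through a pointer to a pointer that the caller may or may not supply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NSError: just a code and a message. */
typedef struct {
    int code;
    const char *message;
} Error;

/* Returns the value of the first character of text as a digit, or -1.
   On failure, a pointer to the error is written through out_error,
   but only if the caller passed somewhere to put it. */
int parse_digit(const char *text, const Error **out_error) {
    static const Error not_a_digit = { 1, "not a digit" };
    assert(text != NULL);
    if (text[0] < '0' || text[0] > '9') {
        if (out_error != NULL) *out_error = &not_a_digit;
        return -1;
    }
    return text[0] - '0';
}
```

A caller declares const Error *err = NULL; and passes &err, or passes NULL to say it doesn’t care. Either way, the mechanism is the address of a variable, which is precisely what a pointer-free language has no way to spell.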

Now what’ll Objective Dash do? Maybe it’ll invent actual output parameters and the ability to pass in references. Or maybe it’ll provide a special NSReturnedError class, where you’re expected to assign the error to a special property instead. This requires some longer macros to maintain the same source compatibility.

The list of pointer fun goes on, though. Dive into NSArray, which has getObjects:range:. This is a method where you work out in advance how many objects you want to deal with and get them into a memory buffer (presumably to avoid the message-sending overhead and memory management in acquiring all of them one by one). The memory buffer is id *, a pointer to the start of a C array of object pointers; this doesn’t make sense without pointers and without being able to blit memory left and right. Objective Dash would probably rather have none of it. This means that you’ll have to rework everything that uses that to use another approach entirely. Now we’re starting to push serious overhead in maintaining separate methods.
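As a rough C analogue of what a method shaped like getObjects:range: asks of its caller (ObjectArray and object_array_get are hypothetical names, and this is a sketch of the pattern, not of how NSArray is implemented):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an array class: a counted list of object pointers. */
typedef struct {
    void **items;
    size_t count;
} ObjectArray;

/* Copies length pointers, starting at location, into a caller-owned buffer
   in one go. The range can be checked against the array, but nothing here
   can check that buffer really has room for length pointers; that part is
   the caller's promise. */
void object_array_get(const ObjectArray *array, void **buffer,
                      size_t location, size_t length) {
    assert(array != NULL && buffer != NULL);
    assert(location <= array->count && length <= array->count - location);
    memcpy(buffer, array->items + location, length * sizeof *buffer);
}
```

The speed comes from the single memcpy, and so does the danger: hand over a buffer that is too small and the copy runs straight past its end.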

It’s not even like getObjects:range: is alone. See NSData for two other methods related to much the same thing, only for actual byte data. NSString can provide and initialize with pointers to what would have to look very much like string data in some specific encoding. Tons of callbacks also use void * as generic pointers to “whatever data you’d please”. These methods are being replaced en masse with blocks where the context could probably be passed in through value capturing, but not all of them are, and it’s not going to happen overnight.
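The void * context convention, reduced to a C sketch with made-up names (for_each_int, Sum, add_to_sum): the library passes the pointer along untouched, and the callback simply trusts that it is what it expects.

```c
#include <assert.h>
#include <stddef.h>

/* Classic C callback type: the library neither knows nor cares
   what context points at. */
typedef void (*VisitFn)(int value, void *context);

/* Hypothetical library routine that calls visit once per element. */
void for_each_int(const int *values, size_t count, VisitFn visit, void *context) {
    assert(visit != NULL);
    for (size_t i = 0; i < count; i++) {
        visit(values[i], context);
    }
}

/* Caller-side state, smuggled through the void * parameter. */
typedef struct {
    long total;
    size_t seen;
} Sum;

void add_to_sum(int value, void *context) {
    /* The conversion back is unchecked: pass the wrong pointer
       and this still compiles. */
    Sum *sum = context;
    sum->total += value;
    sum->seen += 1;
}
```

A block that captures sum does the same job with the types intact, which is why the block-based replacements translate to a managed environment and these don’t.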

The NSCoder class, used for standard serialization since the very beginning, has several methods that work with void *, and you’d have to assume that that’s a wide net if anything. I’ve worked with plenty of platforms with toothless serialization schemes where everyone wrote their own and didn’t bother, but Cocoa’s is decent enough and widely adopted; if you’re in a property list on disk or you’re anything that wants to have an Interface Builder palette, you have to conform to NSCoding and work against methods where you are provided an NSCoder object, and some of that code will use those methods.

And, lest we forget, this is only pointers. What about inline assembly? There’s plenty of code to stay compatible with that uses assembly, some where needed and maybe some where it isn’t, but that’s potential for breakage that requires skipping two steps up the abstraction layer ladder (depending on your opinion of C, of course). There’s a guarantee for bifurcation if I ever saw one.

I haven’t even touched a lot of the hairy stuff; I’m also assuming that Objective Dash will solve things like its own primitives in the backwards-compatible way, at least in the beginning. My point at the end of all this is that you can’t possibly just shove it in there and it’ll work. It will require non-trivial amounts of rejiggering, to the point where a project that could easily slip into this new shape would likely be a project that has very little to gain from going to a virtual machine; that does none of these dangerous things.

So that’s why, I guess, I’m convinced that xlang is coming. Apple’s running out of incremental steps towards this virtual machine. They can get there with Objective-C intact, but only in the trunk, with another, better suited language in the driver’s seat. I believe I speak for every Objective-C programmer when I say that lobotomizing the language directly until it’s able to drive itself will only make it a lot harder to wrap your head around.

For this one, for this language, for this virtual machine, I think Apple actually needs to start fresh, or at least unrestricted by C.

Comments

  1. I agree that, if Apple does move to a completely managed environment, they should introduce a new language along with it. I’m still not convinced that’s happening, though–I think iOS has committed Apple to Objective-C for the indefinite future.

    Still not big on the theoretical xlang name, though. ;) If I was king, I would call this fantasy language Cocoa since it would probably resemble the framework without the C-isms.

    By Preston · 2010.07.17 02:02

  2. Preston: Indeed. xlang isn’t really going to be called xlang. It’s just my placeholder for a language, not even a codename for whichever language that may be. It’s the idea of such a language.

    By Jesper · 2010.07.17 08:16

  3. iLang?

    By Chris · 2010.07.17 14:14

  4. Please don’t make me slap you.

    By Jesper · 2010.07.17 14:33

  5. There’s the LLVM bytecode format. I’m rather sure that someone on the LLVM team has been experimenting with compiling native C code to LLVM bytecode, and an acid test for that would be running the bytecode on different platforms. (Intel and, say, ARM would be rather sensible choices.)

    I think it’d be entirely possible to evolve Objective-C and slowly phase out C compatibility. The compiler already has enough type information to distinguish between Objective-C objects vs C objects and can start disallowing things such as pointer arithmetic on Objective-C objects. Provide a different language that’s “mostly” compatible with C (Objective-C–) that gets rid of the Cisms in some APIs (bitmasks for flags, structs), encourage developers to switch to it, and start phasing out Objective-C. All hypothetical, but possible.

    By Andre Pang · 2010.07.18 05:12

  6. Sure it’d be possible. I just don’t think it’d be worth it. The bigger the code base is, and the more helped it’d be by a seamless transition, the more likely it is to run into stuff because you remove pointer tricks and inline assembly. It won’t be seamless that way, it’d be quite the opposite.

    By Jesper · 2010.07.18 11:31

  7. As I mentioned in a previous post’s comments, my theoretical language Objective-CX is exactly what you’re pointing towards.

    Basically, take all the Cocoa objects as the primitives (boxed integers/floats/etc as NSNumbers/NSValues) and have all normal suffix/infix/postfix/surroundfix operations become (internally, just like Python) special method calls on those classes.

    Then all your language constants like simple unadorned strings and numbers would start off automatically as NSStrings and NSNumber or NSDecimalNumbers, etc. It’d also perhaps be helpful to have things like NSPoint (3@4) and NSDate, etc. literals, unless that got overly baroque. NSDictionary, NSArray, and NSSet literals would be Javascript-like: {"a":1, "b":2}, [1, 2, 3], {1, 2, 3} etc. Even NSRects could be Smalltalk-like: 3@4 extent: 10@10.

    You could even have “atoms” (#name or #method:name:with:params) that are simple SEL literals. (And implemented that way, interned normally.) But those would be more generally useful, just as in Ruby or Lisp.

    Since everything’s an object, you wouldn’t really need type declarations, so it’d be like Javascript, with vars everywhere, or perhaps just “id” (since you’d no longer need the “*” part of object class declarations). Or, you could be like Python where the first assignment effectively creates the variable (but then you’d need some kind of “upframe” reference syntax).

    Of course, then to get efficiency for inner loops, you’d have to do runtime optimizations like all the Javascript engines are doing for dynamic type detection & dispatching, but that’s pretty well understood now.

    C-level pointer manipulations like * and [ ] either go away (in the case of *, or else become the Python “splat” operator) or become higher-level __subscript method invocations (in the case of [ ], which then becomes either a dictionary lookup or an array reference, depending on the class involved).

    Property getter/setter syntax would remain foo.bar and foo.bar = baz, and could stay at the current Obj-C semantics. You could even take a huge Pythonic/Javascriptic leap and have dynamic property creation, making any object dictionary-like (you already have the runtime support for same in the latest 10.6/64-bit/iOS runtimes).

    Higher-level constructs like “for x in” stay higher-level and could in fact stay probably nearly as efficient.

    Real lambdas, going beyond blocks and their quirks, perhaps using the block syntax, would be critical.

    switch statements could finally be Ruby or Javascript-level dispatches with obvious internal optimizations for literal ranges, etc.

    You wouldn’t really need the [ ] messaging syntax, except as a crutch/transitional aid; you could parse normal Smalltalk-flavored messaging without the [ ] hint that you’re leaving C and entering Objective-C. Of course, you’d have to use parens to disambiguate in some cases. E.g., a add: b, or [a add: b] or (a add: b) would all be fine.

    Personally, I’d make Objective-CX a full expression language, where any construct, including control structures, could return a value, but that’s just my Lisp bias. It would probably make parsing too hard, particularly since { } [ ] and friends are doing such multiple duties (control structures vs. dictionary/array/set literals).

    You could have Python generators (which are generally understood to be more general than iterators).

    Etc., etc.

    You’d definitely have a much different language, but one which was quite familiar to Objective-C programmers, just moved up one major level of abstraction.

    By Chris Ryland · 2010.07.19 17:25

  8. Chris: I find that the message sending brackets help more than they annoy with regards to disambiguating where the message goes. Of course, this could be helped tremendously with some advanced syntax highlighting in Xcode to highlight the different messages, but it only knows to do that so far. (Won’t work well for proxies.) Then again, some of the reasons why you nest a lot of them have to do with lack of other constructs. Literals and operators could do away with much of that unless you’re really into confusion.

    My second language was Perl, I love Ruby and I can stand JavaScript, so I’m aboard with expression languages. I am certainly onboard with as many literals as you can fit in.

    The interesting thing about generators and iterators is that you can actually create them using Objective-C now. You just have to turn them “inside-out” by passing a block and you’re able to do pretty well lazy evaluation of things that used to go build an array and return that.

    The latest additions to “dynamic property creation” in Objective-C are like, but not the same as, fully dynamic properties. It just means you can leave out @synthesize and the ivar declaration and the compiler will hook you up. In a dynamic language, with a dynamic property bag approach, you could redefine properties to be read/write and stick a value of any type in any property; the runtime doesn’t particularly allow the latter today.

    All that said, I believe Objective-CX is a lot closer to xlang than to Objective Dash. You don’t wanna be Objective Dash because it wouldn’t provide any advantages, and it was the objective (heh) of this post to show that. Better start fresh and carry over what you do want.

    By Jesper · 2010.07.20 06:55

  9. What I meant about dynamic property creation is that there are now Obj-C implementation routines for adding new properties on the fly, and the runtimes will adapt for those.

    By Chris Ryland · 2010.07.23 02:04

  10. Wasn’t that already available? Maybe not, at second thought. They added that for MacRuby (or xlang); properties are declared in Objective-C and you can’t possibly call something with property syntax with the intention of calling a property (you can also use it for illicit messages to 0-and-1 parameter methods) without seeing it defined somewhere.

    By Jesper · 2010.07.23 10:13
