Thought Transforms

Friday, October 31, 2008

Persistent View Data in MVC

So, here's a question that's been puzzling me for a long time. I don't have an answer, I'm just throwing this out to see what folks think.

Imagine that you have an application with a Model-View-Controller architecture. You have multiple views on the model, and at least one of the views has some state that ought to get saved with the document. For example, imagine an application for developing an interactive presentation (yes, I did actually write this one once) - your model contains presentation screens, and links between them. Some screens are menus, which might branch out to multiple other screens depending on the user's selection. One of your views is a graphical layout, representing the screens as nodes and the links as arcs between them.

Obviously, the screens and the links between them belong in the model, but what about the way that they're laid out in the view? That's not technically part of the data - for example, it has no effect on the final presentation. It's relevant only to the single view that uses it - if you had two graphical layout views, you could reasonably expect that they might have different layouts on the same model. What if you had three or four views, each with its own persistent state of this type? For example, imagine that one of your other views is a list of slides, and the list ordering is customizable.

As I see it, there are two choices:
1. Store the layout in the model. This has the advantage of simplicity - the view has no state, there's no synchronization to take care of, everything gets serialized together. Unfortunately, it means that any time you add another view, you have to add a whole bunch of view-specific crap to your model, even though you technically haven't changed the underlying data. It also means that all views have access to all other views' data, which could cause problems and violates the Principle of Least Knowledge.
2. Store the layout in the view. This has the advantage of keeping the data where it's needed, and hiding it from everyone who doesn't need it. However, it seriously complicates serialization of the data to a file (because now all the views have to get involved), and it means that the view's data needs to be kept in sync with the model's data (if someone deletes a slide from one view, all the other views need to discard their related data).

My personal experience is that option 1 is less damaging in the long run than option 2, even though option 2 is technically the more elegant design. There are hacks you can put into the model to make this less bad - for example, you can hand each view a unique cookie that it can use to store arbitrary data in a bucket that the model treats as a black box, but that it serializes out with its data. That way, the views don't have access to each other's data, but it's still a kludge.
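To make the cookie idea a little more concrete, here is a minimal sketch of what such a model might look like. Everything in it (the Model class, registerView, setViewData, viewData, serializeViewData, and the toy "cookie=blob" format) is made up for illustration, not taken from any real application. It also assumes the blobs contain no newlines, and it hands out sequential integers as cookies, which a real design would probably replace with something stable across sessions and harder to guess.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: the real document data would live here too; only the
// opaque per-view buckets are shown.
class Model {
public:
    using Cookie = int;

    // Hand a new view its own cookie; the view keeps it private.
    Cookie registerView() { return nextCookie_++; }

    // Store a view's blob. The model never interprets the contents.
    void setViewData(Cookie c, const std::string& blob) {
        assert(c > 0 && c < nextCookie_);  // cookie must have been issued by this model
        buckets_[c] = blob;
    }

    // Fetch a view's blob, or an empty string if it never stored one.
    std::string viewData(Cookie c) const {
        auto it = buckets_.find(c);
        return it == buckets_.end() ? std::string() : it->second;
    }

    // Write the buckets out alongside the real data (toy format, one line per view).
    std::string serializeViewData() const {
        std::string out;
        for (const auto& entry : buckets_)
            out += std::to_string(entry.first) + "=" + entry.second + "\n";
        return out;
    }

private:
    Cookie nextCookie_ = 1;
    std::map<Cookie, std::string> buckets_;
};
```

A graph view would call registerView() once, stash the cookie, and read and write its node positions through it. The list view does the same with its own cookie and never sees the graph view's blob. The sync problem from option 2 doesn't go away, though: when a slide is deleted, each view still has to clean up its own bucket.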

Any thoughts? Am I missing some elegant solution for this one? This comes up surprisingly often, though usually not as dramatically as in my example.

Monday, October 27, 2008

Technologies: Multithreading

The multicore "revolution" is upon us, and we're all going to have to adapt or die - at least those of us who have ambitions beyond the 8051! Unfortunately, writing threads is a pain in the ass. It's platform-specific, error-prone, and the most common interface (pthreads) is a steaming pile of...well, you get the point. Technology to the rescue!

The goal of multithreading technology, as usual for software, is to raise the level of abstraction. Instead of starting, joining, and stopping threads, locking mutexes, and signaling semaphores, you'd like to be able to express yourself at a higher level, like "run this operation in parallel". You don't want to have to worry about figuring out how many free processors the machine has, load-balancing your threads across those processors, synchronizing the processing results of your threads, etc., etc., ad nauseam.

The most common technology right now for high-level threading is OpenMP. This is an industry standard for extensions to the C and C++ languages, implemented by compilers. It's a fairly successful standard, in that most major compilers these days actually support it. It provides a small function library and a set of pragmas that allow you to specify the intended parallel behavior of your code, and then the compiler generates the threading behind the scenes.

An up-and-coming contender is Intel's Threading Building Blocks library. This is a pure library, with no compiler dependencies. By means of generic (template-based) classes in the style of the standard C++ library, it provides powerful parallel-processing primitives (sorry, couldn't resist the alliteration), and handles all the threading itself.

The common factor between these two technologies is that both of them relieve you of ever having to deal with threads directly, and both of them gracefully degrade to the single-threaded case, which is extremely useful for debugging ("let me turn off the threading and see if the bug still happens").
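Here's a rough sketch of what that graceful degradation looks like on the OpenMP side. The function names (addArrays, maxThreads) are mine, purely for illustration. The idea is that a compiler with OpenMP switched off (for example, GCC without -fopenmp) simply ignores the pragma, possibly with a warning, so the same source builds as plain serial code. The standard _OPENMP macro tells you which case you're in.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>  // only pulled in when the compiler's OpenMP support is on
#endif

// Element-wise sum. With OpenMP enabled the iterations are split across
// threads; without it the pragma is ignored and this is an ordinary loop.
std::vector<float> addArrays(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    const int n = static_cast<int>(a.size());
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
    return c;
}

// Upper bound on the threads the loop above may use; 1 when OpenMP is off.
int maxThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Build it once with OpenMP and once without, and addArrays gives the same answer both times, which is exactly what you want when chasing a bug that may or may not be thread-related.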

Conveniently enough, I recently had to rewrite some OpenMP code in TBB, which gave me a great chance to compare their strengths and weaknesses. Bottom line, OpenMP is easier and results in clearer code, as long as you're dealing with a fairly simple problem. TBB is much more powerful, and scales much better when your problem is not so simple. For example, here's the same code written both ways:

float a[100], b[100], c[100];
// ... initialize a and b

OpenMP

#pragma omp parallel for
for (int cnt = 0; cnt < 100; ++cnt)
    c[cnt] = a[cnt] + b[cnt];  // each iteration touches only its own element

TBB

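#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/task_scheduler_init.h"

using namespace tbb;

// Function object that adds one sub-range of the arrays; TBB calls it once per chunk.
class Adder {
    const float *const myA, *const myB;
    float *const myC;
public:
    Adder(const float *a, const float *b, float *c) : myA(a), myB(b), myC(c) {}
    void operator()(const blocked_range<int>& r) const {
        for (int cnt = r.begin(); cnt != r.end(); ++cnt)
            myC[cnt] = myA[cnt] + myB[cnt];
    }
};

task_scheduler_init init;  // starts the TBB scheduler
parallel_for(blocked_range<int>(0, 100), Adder(a, b, c), auto_partitioner());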

A bit more complicated in the TBB case, no? However, all that complexity means that if I suddenly decide to change the for loop, say, to a do-while loop, it's pretty easy with TBB. With OpenMP, not so much - in fact, some very simple things (such as breaking out of a loop being executed in parallel) are virtually impossible. That's the price you pay for a simple syntax.

Overall, if all you want is to decompose a processing step using data parallelism, OpenMP is the better bet. However, as your problem becomes more complicated, you may find that you need to switch over to TBB for its greater expressiveness and power.

Wednesday, October 22, 2008

Any ideas on a better name for the Blog

OK, so yeah, some of you have commented that "Computer Science" is not the ideal name for the blog.

Any thoughts on what the title should be?

Programming Languages: Late Binding

So over the past few months I have started to do some iPhone development. As I started reading up on how to write software for the iPhone I quickly figured out that I would need to ramp up on Objective C, which up until now I always thought of as another Object Oriented version of C (C with Objects) like C++. What I didn't realize was how far, under the hood, it is from C++ and C. As I started playing around with it, I noticed that methods were actually messages that were bound at run-time, à la Smalltalk. This rekindled an old research interest I had in programming languages that dates back to graduate school. I was always intrigued by languages with late binding or that were loosely typed. We studied LISP and Smalltalk, to name a few. Of course, like many CS majors, when I entered the real world those languages were left behind for C/C++.
LISP folks will tell you that there is no language as powerful and expressive (I know very little about LISP, other than how to solve Towers of Hanoi). Paul Graham is a huge advocate for LISP, and developed ViaWeb back in the 90s using LISP (the entire ViaWeb product is written in LISP). He credits their utilization of LISP as a major reason they were able to rapidly add new features to their site. In his book Hackers and Painters he tells stories of their competition spending weeks/months working on some new enhancement to their web site, then formally announcing the new feature. In most cases, within hours the ViaWeb team would have that same feature available on their site. Again, Graham credits their utilization of LISP and the ability to make updates/enhancements on the live system.
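For anyone who hasn't run into late binding before, here is a rough C++ imitation of the "methods are messages bound at run-time" idea from the Objective C discussion above. This is only an analogy: DynamicObject, addMethod, respondsTo and send are names I made up, and the real Objective C runtime is a good deal more elaborate. The point is just that the target of a call is looked up by name when the message is sent, so it can be added or swapped while the program is running.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical object whose "methods" live in a table keyed by selector name,
// so a call is resolved at run time rather than by the compiler.
class DynamicObject {
public:
    using Method = std::function<std::string(const std::string&)>;

    // Methods can be added or replaced while the program is running.
    void addMethod(const std::string& selector, Method m) {
        assert(m != nullptr);  // an empty std::function could not be invoked later
        methods_[selector] = std::move(m);
    }

    bool respondsTo(const std::string& selector) const {
        return methods_.count(selector) != 0;
    }

    // "Send a message": look the selector up now; report it if nothing handles it.
    std::string send(const std::string& selector, const std::string& arg) const {
        auto it = methods_.find(selector);
        if (it == methods_.end())
            return "unrecognized selector: " + selector;
        return it->second(arg);
    }

private:
    std::map<std::string, Method> methods_;
};
```

Sending "greet" before anyone has registered it just yields the "unrecognized selector" string; register a handler and the very same send call starts working, and registering a second handler under the same name replaces the behavior on the fly. That last part is the flavor of the "update the live system" story above.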

Anyone have any thoughts/real world experience on these kinds of languages?

Tuesday, October 21, 2008

Operating Systems: UNIX: Realtime Signals pointer payload option

So I have been working with real-time signals on Linux. I wanted a way to have two processes signal each other and pass some data along with the signal itself. The way to do this is to send the signal with the int sigqueue(pid_t pid, int signo, const union sigval value); system call. The usual parameters are in play here and obvious (pid, signo). What I found interesting is how the value parameter is specified.

The union sigval has the following members:

    int   sival_int;  // a simple integer value
    void *sival_ptr;  // a pointer

The sival_int member met my needs, but I then got curious as to how one would/could use the pointer value instead. According to the documentation, sival_ptr enables the sending process to send a pointer to the receiving process. This confused me at first glance because each process has its own address space, so an address in one does not mean the same thing in the other. One possibility would be if the two processes had a shared memory region between them. But even then, I might just pass an index into that region (via the sival_int union member) and have the receiving process look up the correct address within the shared memory.
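Here is a minimal sketch of the sival_int route, Linux only. To keep it short and self-contained the process signals itself, and the function name sendIndexToSelf is just something I made up for the example. With two real processes the sender would pass the other process's pid to sigqueue, and the receiver would pick the value up the same way (or in a handler installed with SA_SIGINFO), then use it as an index into its own mapping of the shared region.

```cpp
#include <cassert>
#include <signal.h>
#include <time.h>
#include <unistd.h>

// Queue a real-time signal to this process with an integer payload, then
// collect it synchronously. Returns the payload received, or -1 on failure.
int sendIndexToSelf(int index) {
    assert(index >= 0);  // -1 is reserved for "failed"
    const int signo = SIGRTMIN;

    // Block the signal so it stays queued until we explicitly wait for it.
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, nullptr) != 0)
        return -1;

    union sigval value;
    value.sival_int = index;            // the data that rides along with the signal
    if (sigqueue(getpid(), signo, value) != 0)
        return -1;

    siginfo_t info;
    timespec timeout = {1, 0};          // give up after one second rather than hang
    if (sigtimedwait(&set, &info, &timeout) != signo)
        return -1;
    return info.si_value.sival_int;     // the same integer the sender supplied
}
```

Note that the function leaves SIGRTMIN blocked when it returns, which is fine for a demo but something you'd want to restore in real code.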

Anyone have any experience with sival_ptr? Any idea what the original intent of this field was?

Wednesday, August 22, 2007

Programming Languages - LISP


Welcome!

Welcome to crt0!

This blog is specifically for discussion of Computer Science related topics. Its purpose is to provide a forum for interesting conversations/ideas in the field of Computer Science.

First some Ground Rules:

No ranting is allowed on this blog. This blog's purpose is specifically to provide a forum for discussion of interesting CS related topics. The idea for this came during a conversation between an old friend of mine (CS guy too) and me, as we reminisced about being younger and how we used to spend hours learning/sharing/discussing interesting CS topics, many of which led to fun projects.
No CS topic is off limits, but do try to keep topics organized. No need for a discussion on LISP and its cool features/capabilities mixed in with a discussion on the latest scheduling algorithm that someone has come up with for their way-cool OS (unless, of course, the OS is written in LISP).

2. No CS topic is off limits. The more topics, the better.