BREAKING NEWS
latest

728x90

468x60

Showing posts with label parallel computing. Show all posts
Showing posts with label parallel computing. Show all posts

Wednesday, August 17, 2011

Parallelism On The Cloud And Polygot Programmers


I am very passionate about the idea of giving developers the control over parallelism without them having to deal with the underlying execution semantics of their code.

The programming languages and the constructs, today, are designed to provide abstraction, but they are not designed to estimate the computational complexity and dependencies. The frameworks such as MapReduce is designed not to have any dependencies between the computing units, but that's not true for the majority of the code. It is also not trivial to rewrite existing code to leverage parallelism. As, with the cloud, when the parallel computing continues to be a norm rather than an exception, the current programs are not going to run any faster. In fact, they will be relatively slower compared to other programs that would leverage parallel computation. Robert Harper, a Professor of Computer Science at Carnegie Mellon University recently wrote an excellent blog post - parallelism is not concurrency. I would encourage you to spend a few minutes to read that. I have quoted a couple of excerpts from that post.

"what is needed is a language-based model of computation in which we assign costs to the steps of the program we actually write, not the one it (allegedly) compiles into. Moreover, in the parallel setting we wish to think in terms of dependencies among computations, rather than the exact order in which they are to be executed. This allows us to factor out the properties of the target platform, such as the number, p, of processing units available, and instead write the program in an intrinsically parallel manner, and let the compiler and run-time system (that is, the semantics of the language) sort out how to schedule it onto a parallel fabric."

The post argues that language-based optimization is far better than machine-based optimization. There's an argument that the machine knows better than a developer what runs faster and what the code depends upon. This is why, for relational databases, the SQL optimizers have moved from rule-based to cost-based. The developers used to write rules inside a SQL statement to instruct the optimizer, but now the developers focus on writing a good SQL query and an optimizer picks a plan to execute the query based on the cost of various alternatives. This machine-based optimization argument quickly falls apart when you want to introduce language-based parallelism that can be specified by a developer in a scale-out situations where it's not a good idea to depend on a machine-based optimization. The cloud is designed based on this very principle. It doesn't optimize things for you, but it has native support for you to introduce deterministic parallelism through functional programming.

"Just as abstract languages allow us to think in terms of data structures such as trees or lists as forms of value and not bother about how to “schedule” the data structure into a sequence of words in memory, so a parallel language should allow us to think in terms of the dependencies among the phases of a large computation and not bother about how to schedule the workload onto processors. Storage management is to abstract values as scheduling is to deterministic parallelism."

As far as the cloud computing goes, we're barely scratching the surface of what's possible. It's absolutely absurd to assume that the polygot programmers will stick to one programming model and learn to spot difference between parallelism and concurrency. The language constructs, annotations, and runtime need to evolve to help the programmers automate most of these tasks to write cloud-native code. These will also be the core tenants of any new programming languages and frameworks. There's also a significant opportunity to move the existing legacy code in the cloud if people can figure out a way to break it down for computational purposes without changing it i.e. using annotations, aspects etc. The next step would be to simplify the design-deploy-maintain life cycle on the cloud. If you're reading this, it's a multi-billion dollars opportunity. Imagine, if you could turn your implementation-specific concurrent access to resources into abstract deterministic parallelism, you can indeed leverage the scale-out properties of cloud fairly easily since that's the guiding principle behind the cloud.

There are other examples you would see where people are moving away from an implementation-centric approach to an abstraction that is closer to the developers and end users. The most important shift that I have seen is from files to documents. People want to work on documents; files are just an instantiation of how things are done. Google Docs and iPad are great examples that are document-centric and not file-centric.

Photo courtesy: bass_nroll on Flickr

Friday, September 12, 2008

Google Chrome Design Principles

Many of you would have read the Google Chrome comic-strip and also would have test driven the browser. I have been following few blog posts that have been discussing the technical and business impact but let's take a moment and look at some of the fundamental architectural design principles behind this browser and its impact on the ecosystem of web developers.
  • Embrace uncertainty and chaos: Google does not expect people to play nice. There are billions of pages with unique code and rendering all of them perfectly is not what Google is after. Instead Chrome puts people in charge of shutting down pages (applications) that do not behave. Empowering people to pick what they want and allow them to filter out the bad experience is a great design approach.
  • Support the journey from pages to applications to the cloud: Google embraced the fact that the web is transitioning from pages to applications. Google took an application-centric approach to design the core architecture of Chrome and turned it into a gateway to the cloud and yet maintained the tab metaphor to help users transition through this journey.
  • Scale through parallelism: Chrome's architecture makes each application a separate process. This architecture would allow Chrome to better tap into the multi-core architecture if it gets enough help from an underlying operating system. Not choosing a multi-threaded architecture reinforces the fact that parallelism on the multi-core is the only way to scale. I see an opportunity in designing a multi-core adaptation layer for Chrome to improve process-context switching since it still relies on a scheduler to get access to a CPU core.
  • Don't change developers' behavior: JavaScript still dominates the web design. Instead of asking developers to code differently Google actually accelerated Javascript via their V8 virtual machine. One of the major adoption challenges of parallel computing is to compose applications to utilize the multi-core architecture. This composition requires developers to acquire and apply new skill set to write code differently.
  • Practice traditional wisdom: Java introduced a really good garbage collector that was part of the core language from day one and did not require developers to explicitly manage memory. Java also had a sandbox model for the Applets (client-side runtime) that made Applets secured. Google recognized this traditional wisdom and applied the same concepts to Javascript to make Chrome secured and memory-efficient.
  • Growing up as an organization: The Chrome team collaborated with Android to pick up webkit and did not build one on their own (actually this is not a common thing at Google). They used their existing search infrastructure to find the most relevant pages and tested Chrome against them. This makes it a good 80-20 browser (80% of the people always visit the same 20% of the pages). This approach demonstrates a high degree of cross-pollination. Google is growing up as an organization!