<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Programming Practices</title>
	<atom:link href="http://bolour.com/blog/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://bolour.com/blog</link>
	<description></description>
	<lastBuildDate>Mon, 14 Sep 2009 18:17:57 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Update Succession in Replicated Key-Value Stores</title>
		<link>http://bolour.com/blog/2009/09/update-succession-in-replicated-key-value-stores/</link>
		<comments>http://bolour.com/blog/2009/09/update-succession-in-replicated-key-value-stores/#comments</comments>
		<pubDate>Fri, 11 Sep 2009 07:16:19 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[database]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/?p=114</guid>
		<description><![CDATA[In an earlier blog we looked at the use of vector clocks for keeping track of temporal relations between events in an asynchronous event system. To recap:
In an asynchronous event system each event is marked by a node-clock pair; intra-node temporal relations between events are based on clock values at a given node; and inter-node [...]]]></description>
			<content:encoded><![CDATA[<p>In an <a href="http://bolour.com/blog/2009/09/vector-clocks-for-representing-temporal-relations-between-distributed-events/">earlier blog</a> we looked at the use of vector clocks for keeping track of temporal relations between events in an <a href="http://bolour.com/blog/2009/09/vector-clocks-for-representing-temporal-relations-between-distributed-events/#asynchronous-distributed-event-system">asynchronous event system</a>. To recap:</p>
<blockquote><p>In an <strong><em>asynchronous event system</em></strong> each event is marked by a node-clock pair; intra-node temporal relations between events are based on clock values at a given node; and inter-node temporal relations between events rely on message transmittals being earlier than corresponding message receipts.</p></blockquote>
<p>And we saw <a href="http://bolour.com/blog/2009/09/vector-clocks-for-representing-temporal-relations-between-distributed-events/#vector-clock-dominance">earlier</a> that in such a system, an event E2 is a <em>temporal successor</em> of another event E1 if and only if the vector clock of E2 dominates the vector clock of E1. [The vector clock of an event is a map from the nodes of a system to the latest clock values of those nodes <strong><em>known to have occurred</em></strong> before (or at the same time as) the event.]</p>
<p>In this blog we&#8217;ll see how to extend the use of vector clocks to keep track of update succession in update-anywhere replicated key-value stores. This type of store is exemplified by Amazon&#8217;s <a href="http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf">Dynamo</a>, and by the open source system <a href="http://project-voldemort.com/design.php">Voldemort</a>. In the literature I have seen so far on these systems, it is assumed without elaboration that vector clocks can be used to represent update succession. But as we&#8217;ll see shortly, this assumption is not self-evident, and proving it requires some constraints on asynchronous updates.</p>
<p>My aim in this blog is to outline the difference between <em>temporal succession</em> and <em>update succession</em>, and to show what this difference means for the use of vector clocks in update-anywhere data stores.</p>
<h3>General Asynchronous Events</h3>
<p>In order to demonstrate the use of vector clocks for update succession, I need to digress first to generalize the model of message passing between asynchronous events. The first generalization is to allow events to be both transmitters and receivers of multiple messages. The second generalization is to allow loopback messages from a node to itself.</p>
<p>Figure 1 depicts this more general model.</p>
<blockquote><p><img class="aligncenter size-full wp-image-119" title="vector-clock-general" src="http://bolour.com/blog/wp-content/uploads/2009/09/vector-clock-general.gif" alt="vector-clock-general" width="521" height="232" /></p>
<p>Figure 1. General Asynchronous Event System</p></blockquote>
<p>[In Figure 1, the notation <em>E(node, clock)</em> designates an event that occurred at the given node and the given clock value at that node.]</p>
<p>It is easy to extend <a href="http://bolour.com/blog/2009/09/vector-clocks-for-representing-temporal-relations-between-distributed-events/#vector-clock-dominance">earlier arguments</a> about the equivalence of temporal succession and vector clock dominance to this more general model. The main difference between the two models is that in <a href="http://bolour.com/blog/2009/09/vector-clocks-for-representing-temporal-relations-between-distributed-events/#vector-clock-propagation">propagating vector clocks</a> we may now have to include the vector clocks from multiple message sources in our <em>maximal merge</em> computation.</p>
<h3>Asynchronous Updates and Distributed Versions of Data Items</h3>
<p>A replicated data store includes a set of key-value pairs, called <em>data items</em>, each of which is replicated to a number of nodes. For high write availability, an update to the value of a data item is allowed to be written at any available node.</p>
<p>Then independent and asynchronous updates of the value associated with a given key may have to be written to different nodes, so that multiple versions of a data item may coexist in the store as a whole. Each such differing version of a data item may, in its own right, carry useful information. Therefore, in general, these versions are not allowed to blindly overwrite each other. For maximum flexibility, coexisting versions of a data item are resolved (merged) by application code specific to the use of each instance of a data store.</p>
<p>This scenario leads to a model of the evolution of a data item in which:</p>
<ul>
<li>A read by the application may cause a number of different versions of the data item for a given key to be read from the data store.</li>
<li>The application creates a single updated version of the data item based on all the versions read.</li>
<li>The new version is then written and it obsoletes and replaces all the versions read in this particular update operation.</li>
</ul>
<p>The updated version is then an <em>update successor</em> of each version read. And the versions read for the update are <em>update precursors</em> of the updated version. Of course, the update successor and the update precursor relations are transitive. And we define them to be reflexive as well.</p>
<p>We know that an updated version of a data item should obsolete and replace all of its [proper] precursors. But when the new version of the data item is first written, some of its precursors may not be present at the node of this initial write. And even for those precursors that are present at this initial write node, there are replicate copies at other nodes that also need to be purged. While this new version will be replicated to all replicate nodes, replications may have to take place asynchronously to the original write of this new version. Therefore, the new version of the data item needs to carry with it information about its precursors, so that they can be purged once its replicate copies reach other nodes.</p>
<h3>Writes of Data Items as Asynchronous Events</h3>
<p>Conceptually, we may consider an entire update operation &#8211; including all its reads, their resolution, and the subsequent write of a new version &#8211; as an event in a general asynchronous event system. And for the purpose of tracking update succession, we may consider this event as occurring at the node in which the new update version is first written and at the clock value of the write at that node. Looked at in this manner, the corresponding reads can be thought of as messages sent from earlier write events (earlier versions of the data item)  to the new update event (the new version of the data item).</p>
<p>The upshot is that if we identify versions of a data item with update (or initial write) events, we have here a system of events that is similar to our earlier general asynchronous event system.</p>
<p>Figure 2 depicts the update succession of versions in this scenario.</p>
<blockquote><p><img class="aligncenter size-full wp-image-122" title="vector-clock-update-succession" src="http://bolour.com/blog/wp-content/uploads/2009/09/vector-clock-update-succession.gif" alt="vector-clock-update-succession" width="521" height="232" /></p>
<p>Figure 2. Update Succession with Asynchronous Versions</p></blockquote>
<p>In Figure 2, a version of a data item initially written to node <em>n </em>at clock value <em>c </em>of that node is depicted as <em>V(n, c)</em>. Note in particular, in Figure 2, that a version may be the immediate update precursor to two asynchronous versions &#8211; as in V(1, 1) being asynchronously updated to V(2, 3) and to V(3, 2) &#8211; and that a version may be the immediate update successor of two asynchronous versions &#8211; as in V(1, 7) succeeding both V(2, 3) and V(3, 4).</p>
<h3>Update Succession versus Temporal Succession</h3>
<p>The similarity of our update/version event system and our earlier asynchronous event system depicted in Figure 1 leads us to associate vector clocks with write events (and corresponding versions of a data item) and to try to use them to determine the <em>update succession</em> of versions, and thereby to cause the obsolescence and purge of superseded versions.</p>
<p>But before we can make the leap between the two event systems, there is another crucial property of asynchronous event systems that we have yet to establish for write events in a replicated data store: the linear temporal succession of events within each node according to their clock values.</p>
<p>Is there, in fact, a linear order of <em><strong>update succession</strong></em> for a data item within each node according to clock value in an update-anywhere replicated data store?  Well, not by default. Following is a trivial counter-example.</p>
<p>Consider two different clients reading the same version of a data item, and proceeding to update it independently at the same node. If the system blindly writes both update versions to the data store, then one can occur at a clock time later than the other. But the second update version is not an <em><strong>update</strong></em> successor of the first: <em><strong>it was not created by reading the first and updating it, and it does not obsolete the first</strong></em>. This is a crucial difference between the event system of update-anywhere replicated data stores, and the general asynchronous event system we saw earlier.</p>
<h3>The Case for Read Validation</h3>
<p>The only way I know to remove this difference is to assume that in an update, reads are validated within the update transaction at its primary write node for those versions of the data item that were <em>created</em> at that node. If further versions of the data item &#8211; later versions than those that were read by the update operation &#8211; were created at the node where the update is first written, the update would be rejected and possibly retried.</p>
<p>Read validation specialized to the primary node of an update in this manner would imply that the versions of a data item created at a given node are totally ordered in time via the local clock value at the node, and that this total ordering entails update succession: each version of a data item created at a node is an update successor of the version immediately before it. I&#8217;ll call this condition <strong><em>totally ordered local update succession</em></strong>.</p>
<blockquote><p><em><strong>Totally ordered local update succession</strong></em>: Within the sequence of versions of a data item created at a given node, ordered by their clock values at that node, each version is the result of an update whose read set included the immediately preceding version.</p></blockquote>
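<p>Here is a minimal sketch of what such read validation might look like at the primary write node. The class and method names are my own illustrative choices, and I am assuming that an update reports the clock values of the locally created versions it read; a real store would validate inside its own transaction machinery.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of read validation at an update's primary write node.
final class ReadValidatingNode {
    private int clock = 0;
    // For each key, the clock value of the latest version created at this node.
    private final Map<String, Integer> latestLocalVersion = new HashMap<>();

    // readLocalClocks: clock values of the versions in the update's read set
    // that were created at this node.
    // Returns the clock value of the new version, or -1 if the update is rejected.
    synchronized int tryWrite(String key, Set<Integer> readLocalClocks) {
        Integer latest = latestLocalVersion.get(key);
        if (latest != null && !readLocalClocks.contains(latest)) {
            // A later local version exists that this update did not read.
            return -1;
        }
        clock++;
        latestLocalVersion.put(key, clock);
        return clock;
    }
}
```

<p>With this check in place, the second of two clients that both read the same version and write to the same node is rejected, and can retry after re-reading.</p>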
<p>At this point, we have the sought-after similarity in the structure of the <em>predecessor </em>relation and its relation to clocks for general asynchronous events, and the structure of the <em>precursor </em>relation and its relation to clocks for write events of a given data item (and for corresponding versions) in an update-anywhere replicated data store. But as we have seen, <em>temporal succession</em> defined through the predecessor relation for asynchronous events is equivalent to vector clock dominance. Therefore, <em>update succession</em> defined through the precursor relation for write events of a given data item (and for corresponding versions) must be equivalent to vector clock dominance as well.</p>
<h3>Propagating Vector Clocks to New Versions</h3>
<p>To maintain vector clocks for versions of data items, we need to perform the maximal merge of the immediate precursors of a version plus the node-clock of the version itself. The precursors are the versions read by the update operation. So reads need to piggy-back vector clocks with each version of a data item read. And, of course, writes need to store the new (maximally merged) vector clock with each update version of a data item. All versions of the same data item residing at the node of the write and dominated by the new vector clock then become obsolete and may be purged.</p>
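<p>The write path just described can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the <em>VersionStore</em> class is hypothetical, it tracks a single data item at a single node, it keeps only the vector clocks of versions (not their values), and a node missing from a vector clock is treated as clock value -1.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the versions of one data item at one node.
final class VersionStore {
    // Vector clocks of the versions currently held at this node.
    final List<Map<Integer, Integer>> versions = new ArrayList<>();

    // True when every clock value in a is no greater than the matching value in b.
    static boolean isDominatedBy(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        return a.entrySet().stream()
                .allMatch(e -> e.getValue() <= b.getOrDefault(e.getKey(), -1));
    }

    // Writes a new version created at (node, clock) from the versions read,
    // then purges every held version whose vector clock the new one dominates.
    Map<Integer, Integer> write(int node, int clock, List<Map<Integer, Integer>> readClocks) {
        Map<Integer, Integer> merged = new HashMap<>();
        for (Map<Integer, Integer> vc : readClocks) {
            vc.forEach((n, c) -> merged.merge(n, c, Integer::max));
        }
        // Add the node-clock of the new version itself.
        merged.merge(node, clock, Integer::max);
        versions.removeIf(held -> isDominatedBy(held, merged));
        versions.add(merged);
        return merged;
    }
}
```

<p>For example, writing a version at node 1, clock 7 after reading versions with vector clocks {1:1, 2:3} and {1:1, 3:4} produces the vector clock {1:7, 2:3, 3:4} and purges both versions read, while an unrelated version with vector clock {4:9} survives.</p>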
<h3>But What about Replicate Writes?</h3>
<p>Replicate writes were excluded from our event system of writes/versions  because replicate writes do not in fact create new versions of data items, nor new vector clocks. A replicate write simply copies a version and its vector clock intact from one node to another. Whether an update operation reads a version of a data item from its initial write node or from a replicate node is immaterial to the relationship between that version and the update version.</p>
<p>Of course, upon reaching a replicate node, a replica&#8217;s vector clock obsoletes any versions of the data item whose vector clocks it dominates, and allows them to be purged from that replicate node.</p>
<h3>Acknowledgments</h3>
<p>Thanks to the members of the Silicon Valley Patterns Group and in particular to Wayne Vucenic and Chris Tucker for useful discussions on distributed key-value stores. A special thanks to Jay Kreps, the creator of Voldemort, for participating in our group discussions.</p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2009/09/update-succession-in-replicated-key-value-stores/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Vector Clocks for Representing Temporal Relations between Distributed Events</title>
		<link>http://bolour.com/blog/2009/09/vector-clocks-for-representing-temporal-relations-between-distributed-events/</link>
		<comments>http://bolour.com/blog/2009/09/vector-clocks-for-representing-temporal-relations-between-distributed-events/#comments</comments>
		<pubDate>Thu, 10 Sep 2009 22:23:19 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[distributed systems]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/?p=59</guid>
		<description><![CDATA[In this blog I&#8217;ll review the use of vector clocks for comparing the times of occurrence of dispersed events in a distributed system. Vector clocks are well-known in the literature, and there are plenty of resources on the net about them. See, for example, the Wikipedia entry about them for original references and credits.
Vector clocks [...]]]></description>
			<content:encoded><![CDATA[<p>In this blog I&#8217;ll review the use of vector clocks for comparing the times of occurrence of dispersed events in a distributed system. Vector clocks are well-known in the literature, and there are plenty of resources on the net about them. See, for example, the <a href="http://en.wikipedia.org/wiki/Vector_clock">Wikipedia entry about them</a> for original references and credits.</p>
<p>Vector clocks are useful in a scenario where we are not able or willing to make assumptions about clock drift rates at independent nodes of a distributed system, or about message transmission rates between nodes. In such cases, our primary source of information about temporal relations between events across nodes is the fact that for each message sent from node i to node j, the transmittal of the message from node i happens before the receipt of the message at node j. For example, suppose that a message is transmitted from node 1 when node 1&#8217;s clock value is 1, and is received at node 2 when node 2&#8217;s clock value is 3. If the clock value and the node of origin are piggy-backed on the message, then node 2 will know that its clock value of 3 is after node 1&#8217;s clock value of 1.</p>
<p>Based solely on the fact that message traversal entails some (unknown) delay, and that clock values at each node are increasing over time, we would like to be able to determine, for any two events in the system, whether one occurred before the other, or whether nothing can be asserted about their temporal relation.</p>
<p>My primary motivation for going over vector clocks here is to set the stage for my next blog about the use of vector clocks in update-anywhere replicated data stores.</p>
<p>But before we get there, here is a concrete example of the direct application of vector clocks. Consider a battlefield situation where a number of independent moving agents are observing a number of moving objects and communicating the positions of these objects to each other. The observers may get in and out of communication range of each other, and their communications equipment may get jammed or damaged and later be unjammed and repaired. Vector clocks make it possible to determine when an observation in such a scenario is obsoleted by another, so that decisions may be made based on the most recent observations for each object of interest.</p>
<h3>Asynchronous Event System</h3>
<p>The kind of event system described above is known as an <em>asynchronous event system</em>.</p>
<p><a name="asynchronous-distributed-event-system"> </a></p>
<blockquote><p><strong><em>Asynchronous event system.</em></strong> A system of events in which each event is marked by a node+clock pair, where intra-node temporal relations between events are based on clock values at a given node, and where inter-node temporal relations between events rely on message transmittals being earlier than corresponding message receipts.</p></blockquote>
<p>Figure 1 depicts the elements of an asynchronous event system.</p>
<blockquote><p>
<img src="http://bolour.com/blog/wp-content/uploads/2009/09/vector-clock-basic-event-system.gif" alt="vector-clock-basic-event-system" title="vector-clock-basic-event-system" width="521" height="232" class="aligncenter size-full wp-image-78" /></p>
<p>Figure 1. Nodes, Clocks, and Messages in an Asynchronous Event System
</p></blockquote>
<p>The notation <em>E(node, clock)</em> is used to represent an event that occurred at a given node and a given clock value at that node. We&#8217;ll call the pairing of a node and a clock, a<em><strong> node-clock</strong></em> (e.g., (3, 6) &#8211; the clock value of 6 at node 3). Nodes and clock values are represented as non-negative integers. We&#8217;ll typedef these integers to <em>Node</em>, and<em> Clock</em>, respectively, and represent their pairing as a class named<em> NodeClock</em>.</p>
<h3>The Predecessor Relation between Events</h3>
<p>In this scenario, the events can be thought of as forming the vertices of a graph with two types of edges: a <em><strong>last</strong></em> edge from an event to the last event that immediately preceded it at the same node; and a<em><strong> source</strong></em> edge from a message receipt event to the corresponding message transmittal event.  Let&#8217;s call this graph the <em><strong>predecessor graph</strong></em> of the set of events, and let&#8217;s call the direct relation represented by this graph, that is, the union of the <em>last</em> and the <em>source</em> relations, the <em><strong>immediate-predecessor</strong></em> relation.</p>
<p>Figure 2 depicts the predecessor graph of the distributed events in Figure 1.</p>
<blockquote><p>
<img src="http://bolour.com/blog/wp-content/uploads/2009/09/vector-clock-predecessor-graph.gif" alt="vector-clock-predecessor-graph" title="vector-clock-predecessor-graph" width="518" height="235" class="aligncenter size-full wp-image-95" /></p>
<p>Figure 2. Predecessors of Events
</p></blockquote>
<p>Suppose we have two references E1 and E2 to events in such a system. Then E2 succeeds E1 in time, <em><strong>as far as we know</strong></em>, if and only if there is a path from E2 to E1 in the predecessor graph of events. Of course, if E1 and E2 refer to the same event, then their times are also comparable.  (For simplicity, I am making the (inessential) assumption that at each node, different events occur at different clock values.) Thus, comparability in time leads directly to the reflexive-transitive closure of the immediate-predecessor relation, which I will call simply the <em>predecessor</em> relation.</p>
<blockquote><p><em><strong>predecessor: reflexive-transitive closure of (source + last)</strong></em></p></blockquote>
<h3>Representing the Predecessor Relation</h3>
<p>Vector clocks are a natural data structure motivated by the need for an efficient representation of the predecessor relation.</p>
<p>The direct (and inefficient) representation of the predecessor relation would include, for each event, a list of all node-clocks of events that precede it (directly or transitively) in the predecessor graph.</p>
<pre>	class Event {
		Node origin;
		Set&lt;NodeClock&gt; predecessors;
	}</pre>
<p>In this representation, the predecessor set for event E(3, 5) of Figure 2 is:</p>
<blockquote><p>
(3, 5),<br />
(3, 4),<br />
(2, 4),<br />
(2, 3),<br />
(1, 1)
</p></blockquote>
<p>But we can keep the size of this representation bounded by removing, for each predecessor node, all but the latest node-clock for that node. In our example, the predecessors of event E(3, 5) at node 2 include both E(2, 4), and E(2, 3). But we would know by comparing clock values at node 2, that event E(2, 3) is a predecessor of event E(2, 4). So there is no need to keep (2, 3) explicitly in the predecessor set. Since E(2, 4) is the <em>latest</em> predecessor of E(3, 5) at node 2, we know that any other event originating from node 2 whose clock value is less than 4 is also a predecessor of E(3, 5).</p>
<p>So we end up with a representation in which the set of predecessors can be replaced by a map of latest predecessors:</p>
<pre>	class Event {
		Node origin;
		Map&lt;Node, Clock&gt; latestPredecessors;
	}</pre>
<p>The map of latest predecessors is called a <em><strong>vector clock</strong></em>.</p>
<p>In what follows, I&#8217;ll use the terms <em>vector clock</em> and <em>latest predecessors</em> interchangeably.  The former is standard terminology. The latter is more intention-revealing in this writeup.</p>
<blockquote><p><em><strong>vector clock of event: map of latest predecessors of an event for each node</strong></em></p></blockquote>
<p>The vector clock of event E(3, 5) in our running example is then:</p>
<blockquote><p>
(1, 1),<br />
(2, 4),<br />
(3, 5)
</p></blockquote>
<p>Clock values for particular nodes in a vector clock may be represented by array reference notation. For example, E.vectorClock[1] (or E.latestPredecessors[1]) refers to the clock value of event E&#8217;s latest predecessor originating at node 1.</p>
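<p>As a small sketch of how the full predecessor set collapses into a map of latest predecessors, assuming nodes and clocks are plain integers (the <em>NodeClock</em> class and the <em>latestPredecessors</em> helper here are illustrative names of mine):</p>

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pairing of a node id and a clock value at that node.
final class NodeClock {
    final int node;
    final int clock;

    NodeClock(int node, int clock) {
        this.node = node;
        this.clock = clock;
    }
}

final class PredecessorCollapse {
    // Keeps, for each node, only the largest clock value found in the predecessor set.
    static Map<Integer, Integer> latestPredecessors(List<NodeClock> predecessors) {
        Map<Integer, Integer> latest = new HashMap<>();
        for (NodeClock nc : predecessors) {
            latest.merge(nc.node, nc.clock, Integer::max);
        }
        return latest;
    }
}
```

<p>Applied to the five predecessors of E(3, 5) listed earlier, this yields the three-entry vector clock shown above.</p>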
<p><a name="vector-clock-direct-test"> </a></p>
<h4>Direct Test of Temporal Succession</h4>
<p>To summarize, including vector clocks in the representation of events provides for an easy test of temporal succession:</p>
<p>For any two events E1(n1, c1) and E2(n2, c2):</p>
<blockquote><p>E1 is a predecessor of E2 (E2 succeeds E1), if and only if,</p>
<p>c1 &lt;= E2.latestPredecessors[n1] (c1 &lt;= E2.vectorClock[n1]) &#8230; <strong> (1)</strong></p></blockquote>
<p>Of course, for the latter comparison to be well-defined, E2 must have a latest predecessor for node n1. To finesse this case, we may use a clock value of -1 for each node that has no predecessor in a vector clock.</p>
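<p>Test (1), together with the -1 convention, might be coded along these lines. This is a sketch only; the class and method names are assumptions of mine, and the vector clock is represented as a plain map from node to clock value.</p>

```java
import java.util.Map;

final class DirectSuccessionTest {
    // Clock value standing in for "no predecessor at this node".
    static final int NO_PREDECESSOR = -1;

    // Test (1): E1(n1, c1) is a predecessor of E2 iff c1 <= E2.latestPredecessors[n1].
    static boolean isPredecessor(int n1, int c1, Map<Integer, Integer> e2LatestPredecessors) {
        return c1 <= e2LatestPredecessors.getOrDefault(n1, NO_PREDECESSOR);
    }
}
```

<p>Because clock values are non-negative, an event at a node that is absent from E2&#8217;s vector clock can never pass the test.</p>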
<p><a name="vector-clock-dominance"> </a></p>
<h3>Vector Clock Dominance Test of Temporal Succession</h3>
<p>For our direct test (1) to be useful, we need to keep track of both the vector clocks of events, and the nodes of origination of events.</p>
<p>But what if we don&#8217;t keep track of the nodes of origination?</p>
<p>In that case, we can use an equivalent test based solely on vector clocks: <em><strong>the vector clock dominance test</strong></em>. A vector clock vc1 is dominated by another, vc2, if for every node, the clock value of vc1 is no greater than the corresponding clock value of vc2.</p>
<p>It is then easy to demonstrate that:</p>
<blockquote><p><strong><em>vector clock dominance is equivalent to temporal succession</em></strong></p></blockquote>
<p>Or, in symbols:</p>
<blockquote><p>E1(n1, c1) &lt;= E2(n2, c2), if and only if,</p>
<p>E1.latestPredecessors &lt;= E2.latestPredecessors</p></blockquote>
<p>where the relational symbol &lt;= is overloaded to depict both the<em> is-dominated-by</em> relation between vector clocks, and the temporal succession relation between events.</p>
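<p>A possible rendering of the dominance test is sketched below, again with vector clocks as maps and absent nodes treated as clock value -1. The names are illustrative, and the <em>areConcurrent</em> helper is my own addition for the case where nothing can be asserted about the temporal relation of two events.</p>

```java
import java.util.Map;

final class VectorClockDominance {
    // True when vc1 <= vc2: for every node, vc1's clock value is no greater than vc2's.
    // Only nodes present in vc1 need checking, since an absent node counts as -1.
    static boolean isDominatedBy(Map<Integer, Integer> vc1, Map<Integer, Integer> vc2) {
        for (Map.Entry<Integer, Integer> e : vc1.entrySet()) {
            if (e.getValue() > vc2.getOrDefault(e.getKey(), -1)) {
                return false;
            }
        }
        return true;
    }

    // Neither clock dominates the other: the events are not comparable in time.
    static boolean areConcurrent(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        return !isDominatedBy(a, b) && !isDominatedBy(b, a);
    }
}
```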
<p>The equivalence of vector clock dominance and temporal succession is a fairly simple consequence of the <a href="#vector-clock-direct-test">direct test for temporal succession</a> established above. Here are the details for completeness.</p>
<p>Suppose E1(n1, c1) &lt;= E2(n2, c2). Clearly, since E2 succeeds E1, E2&#8217;s predecessors must include all of E1&#8217;s predecessors, and so E2&#8217;s latest predecessor at each node can&#8217;t have occurred any earlier than E1&#8217;s latest predecessor. In other words, temporal succession implies vector clock dominance.</p>
<p>Suppose, on the other hand, that E2&#8217;s vector clock dominates E1&#8217;s: E1(n1, c1).latestPredecessors &lt;= E2(n2, c2).latestPredecessors. Then by the definition of vector clock dominance:</p>
<blockquote><p>E1(n1, c1).latestPredecessors[n1] &lt;= E2(n2, c2).latestPredecessors[n1] &#8230; <strong>(2)</strong></p></blockquote>
<p>(this relation holds for every node, and in particular for node n1). But since the predecessor relation was defined to be reflexive, the latest predecessor of E1(n1, c1) at node n1 (its node of origination), is E1(n1, c1) itself. So (2) simplifies to:</p>
<blockquote><p>c1 &lt;= E2.latestPredecessors[n1]</p></blockquote>
<p>which by (1) above implies E1 &lt;= E2.</p>
<p><a name="vector-clock-propagation"> </a></p>
<h3>Propagating Vector Clocks</h3>
<p>It is easy to see that when a new event occurs, its vector clock has to become the <em>maximal merge</em> of the vector clocks of its immediate predecessors plus its own node-clock. The maximal merge of a set S of vector clocks is a vector clock in which the clock value for each node, n, is the maximum clock value for n within the vector clocks in S. This type of computation can easily be performed if we piggy-back the vector clock of a message transmittal event onto the message.</p>
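<p>The maximal merge can be sketched in a few lines. As before, this is an illustration rather than a definitive implementation: the names are mine, and vector clocks are plain maps from node to clock value.</p>

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VectorClockPropagation {
    // Computes the vector clock of a new event at (node, clock) from the vector clocks
    // of its immediate predecessors: the previous local event and any message source.
    static Map<Integer, Integer> newEventClock(int node, int clock,
                                               List<Map<Integer, Integer>> predecessorClocks) {
        Map<Integer, Integer> merged = new HashMap<>();
        for (Map<Integer, Integer> vc : predecessorClocks) {
            // Per node, keep the maximum clock value seen so far.
            vc.forEach((n, c) -> merged.merge(n, c, Integer::max));
        }
        // The event is its own latest predecessor at its node of origin.
        merged.merge(node, clock, Integer::max);
        return merged;
    }
}
```

<p>For instance, an event at node 3, clock 5 whose immediate predecessors carry the vector clocks {3:4} and {1:1, 2:4} ends up with the vector clock {1:1, 2:4, 3:5}.</p>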
<p>This completes our overview of the use of vector clocks in asynchronous event systems.</p>
<h3>What&#8217;s Next</h3>
<p>As mentioned earlier, this blog forms the backdrop for my next blog on the use of vector clocks in update-anywhere replicated data stores. We&#8217;ll see there that, just as vector clocks help remove obsoleted observations from consideration for each agent in our battlefield example, they help remove obsoleted versions of a data item from each replica of a replicated data store.</p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2009/09/vector-clocks-for-representing-temporal-relations-between-distributed-events/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Google App Engine Data Model</title>
		<link>http://bolour.com/blog/2009/06/the-google-app-engine-data-model/</link>
		<comments>http://bolour.com/blog/2009/06/the-google-app-engine-data-model/#comments</comments>
		<pubDate>Mon, 01 Jun 2009 22:40:53 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[database]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/?p=29</guid>
		<description><![CDATA[Decades ago in my college database class we learned about relational databases, network databases, and hierarchical databases. Back then, relational was cool. And hierarchical was definitely passé. Today, hierarchical is making a comeback with the Google App Engine (GAE). Here is a brief overview.
In GAE, persistent data for a given application consists of a set [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;">Decades ago in my college database class we learned about relational databases, network databases, and hierarchical databases. Back then, relational was cool. And hierarchical was definitely passé. Today, hierarchical is making a comeback with the Google App Engine (GAE). Here is a brief overview.</p>
<p>In GAE, persistent data for a given application consists of a set of <strong><em>entities</em></strong>. An entity has a <strong><em>kind</em></strong>: a string that designates a set of similar entities. However, entities of a given kind need not be homogeneous.</p>
<p>The properties of an entity are represented by a name-value map. The names are strings. The values are basic types including dates and blobs. Properties may also be multi-valued. The fact that homogeneity within kinds is not a requirement means that different entities of a given kind may have different sets of property names, and that they may have differently-typed properties for identically named properties.</p>
<p>Entities form distinct strict hierarchies within a data store. An entity either has a unique parent, or no parent (a root entity). A root entity and all of its descendants form a cluster of entities known as an <em><strong>entity group</strong></em>. The parent relationship between entities and the corresponding entity groups arising from this relationship play a pivotal role in the GAE data store.</p>
<p>One way in which the hierarchical nature of the model manifests itself is in the construction of entity keys. Within a given kind, a root entity is identified either by a unique name, a string, or by a unique ID, a long integer. So a root entity is globally uniquely identified by the combination of kind and name/ID. Let&#8217;s call this combination of kind and name/ID a <em>simple key</em> [my terminology]. Non-root entities are identified within their parents by unique simple keys. So in general, a key in the data model can be thought of as a path composed of simple keys. A key of length 1 uniquely identifies a root entity. A key of length 2 uniquely identifies an entity at level 2 of the hierarchy. And so on.</p>
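<p>To make the path idea concrete, here is a toy model of keys as paths of simple keys. It is emphatically not the App Engine API: the <em>SimpleKey</em> and <em>KeyPath</em> classes are hypothetical, and I represent a name or ID uniformly as a string for brevity.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simple key: a kind plus a name or ID that is unique within the parent.
final class SimpleKey {
    final String kind;
    final String nameOrId;

    SimpleKey(String kind, String nameOrId) {
        this.kind = kind;
        this.nameOrId = nameOrId;
    }

    @Override
    public String toString() {
        return kind + ":" + nameOrId;
    }
}

// Hypothetical full key: the path of simple keys from the root entity down to the entity.
final class KeyPath {
    final List<SimpleKey> path;

    private KeyPath(List<SimpleKey> path) {
        this.path = path;
    }

    static KeyPath root(String kind, String nameOrId) {
        return new KeyPath(List.of(new SimpleKey(kind, nameOrId)));
    }

    // Extends this key by one level to identify a child entity.
    KeyPath child(String kind, String nameOrId) {
        List<SimpleKey> extended = new ArrayList<>(path);
        extended.add(new SimpleKey(kind, nameOrId));
        return new KeyPath(extended);
    }

    // The first simple key identifies the root entity, and hence the entity group.
    SimpleKey entityGroupRoot() {
        return path.get(0);
    }

    int level() {
        return path.size();
    }
}
```

<p>In this model, a key built as a root followed by one child has length 2, and its first element tells us which entity group the entity belongs to.</p>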
<p>Another way in which the hierarchical nature of the model manifests itself is in the togetherness semantics of entity groups, which, as you will recall, consist of a root entity and all of its descendants. The underlying storage structure used to store entities is Google&#8217;s BigTable. BigTables are stored in a distributed fashion by assigning ranges of records (based on key) to different BigTable servers. The ranges are called <strong><em>tablets</em></strong>. What&#8217;s special about entity groups with respect to this type of sharding is that members of an entity group are not split across different servers: they stay close together at all times, and are managed by a single server at any given time. A transaction involving members of the same entity group can therefore be managed by a single server and implemented simply and efficiently.</p>
<p>Currently GAE transactions are limited to single entity groups. GAE could, but currently does not, support transparent transaction management across multiple entity groups, by implementing distributed transactions under the covers.</p>
<p>Why can&#8217;t we have a dummy root entity and have the entire data store hang off of that in a single entity group? Because GAE&#8217;s design limits throughput by entity group. Specifically:</p>
<ul>
<li>GAE uses optimistic concurrency control for transactions, and its concurrency control algorithm operates at the root level of an entity group. The optimistic concurrency control timestamp of the root reflects the last update time of any entity in the group. So large entity groups increase the likelihood of timestamp conflicts and resulting rollbacks.</li>
<li>Currently the data store cannot support more than about 10 writes per entity group per second all told. So large entity groups reduce parallelism between transactions since a transaction affecting any entity in an entity group requires a write to the group&#8217;s root entity.</li>
</ul>
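<p>The first point can be illustrated with a small simulation. This is a sketch of optimistic concurrency control at the granularity of an entity group, not GAE&#8217;s actual implementation: the classes <code>EntityGroup</code>, <code>Transaction</code> and <code>ConflictError</code> are invented for the example, and the root timestamp is modeled as a simple counter.</p>
<pre><code>
class ConflictError(Exception):
    """Raised when another transaction committed to the group first."""

class EntityGroup:
    def __init__(self):
        self.root_timestamp = 0   # stands in for the timestamp kept at the root
        self.entities = {}

class Transaction:
    def __init__(self, group):
        self.group = group
        self.start_timestamp = group.root_timestamp   # what we saw at the start
        self.writes = {}

    def put(self, key, value):
        self.writes[key] = value   # buffered until commit

    def commit(self):
        # Any commit to the group since we started counts as a conflict,
        # even if it touched a different entity.
        if self.group.root_timestamp != self.start_timestamp:
            raise ConflictError("entity group changed since transaction began")
        self.group.entities.update(self.writes)
        self.group.root_timestamp += 1   # every commit updates the root

group = EntityGroup()
t1 = Transaction(group)
t2 = Transaction(group)
t1.put("post-1", "hello")
t2.put("comment-9", "a different entity in the same group")
t1.commit()
try:
    t2.commit()
    outcome = "committed"
except ConflictError:
    outcome = "rolled back"
print(outcome)   # rolled back
</code></pre>
<p>The two transactions write different entities, yet the second one fails because the check is made against the group as a whole. The larger the group, the more unrelated work collides in this way.</p>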
<p>One other major restriction on transactions is that queries are not supported within transactions. You can get an entity by its key within a transaction. But you cannot run a search based on property values within a transaction.</p>
<p>Relationships other than the special parent relationships may be represented explicitly in an entity as a property whose value is the key of the related entity. [Parent relationships, of course, are managed implicitly by GAE.] But GAE does not provide special support for relationships other than the parent relationship. No special support for traversal or joins, for example.</p>
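<p>In practice this means the application follows such references itself. The sketch below uses a plain dictionary as a stand-in for the data store (it is not the GAE API, and the <code>author_key</code> property name and the sample data are made up) to show a join done by hand: one fetch by key per related entity.</p>
<pre><code>
# A toy in-memory store: key maps to a dict of properties. Not the GAE API.
store = {
    ("Author", "azad"): {"name": "Azad"},
    ("Author", "bob"): {"name": "Bob"},
    ("Book", 1): {"title": "Clocks", "author_key": ("Author", "azad")},
    ("Book", 2): {"title": "Stores", "author_key": ("Author", "azad")},
    ("Book", 3): {"title": "Plans", "author_key": ("Author", "bob")},
}

def get(key):
    """Fetch one entity by key, the one lookup the store is assumed to offer."""
    return store[key]

def books_with_author_names():
    """A hand-rolled join: scan one kind, then follow each stored key."""
    rows = []
    for key in sorted(k for k in store if k[0] == "Book"):
        book = get(key)
        author = get(book["author_key"])   # the relationship is just a key value
        rows.append((book["title"], author["name"]))
    return rows

print(books_with_author_names())
# [('Clocks', 'Azad'), ('Stores', 'Azad'), ('Plans', 'Bob')]
</code></pre>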
<p>By default all indexable properties are automatically indexed (text and blob values are not indexed). It is also possible to explicitly create composite indexes on more than one property. Surprisingly for a search engine company, full text indexing of text fields is not supported at this time.</p>
<p>This completes my brief overview of the data model.</p>
<p>Clearly, there are a number of choices in the basic design of the GAE data model that are different from those of relational databases. Some of these, like native support for hierarchies and multi-valued properties, can help you model your application more easily. Clearly too, other choices, such as the transaction restrictions and the lack of native support for joins, can make your applications more complex, or less able to satisfy the real requirements of your users.</p>
<p>Now let&#8217;s shift attention to GAE&#8217;s high level APIs. GAE provides JDO and JPA interfaces to its persistent data. The main abstraction embodied by these high-level interfaces is the homogeneity of kinds. Since in JDO and JPA entities are persistent versions of Java objects, an entity kind in these interfaces represents a Java class that requires persistence. All entities of a given kind then necessarily have the same set of properties and corresponding property types.</p>
<p>Unfortunately, at the moment, the high-level interfaces do not hide the fact that transactions are limited to individual entity groups. A transaction that spans a second entity group triggers an exception, independently of which interface is used.</p>
<p>Nor do the high-level interfaces transparently provide joins or certain other familiar SQL functions at this time. In other words, the JDO and JPA query language variants supported by GAE are somewhat limited. A join query appearing in a JDO query, for example, will trigger an exception.</p>
<p>The designers of the data store API have so far avoided exposing functionality whose performance may be iffy, or which may complicate the management of the data store. So for now, we have to roll our own frameworks, or look to third party offerings to fill in the gaps: gaps between our expectations of functionality from databases, conditioned as they are by relational databases, and what the GAE can reasonably deliver.</p>
<p>As an example, the impression one gets is that the App Engine folks are not at this time eager to embrace transparent distributed transactions across entity groups in the base GAE product. Clearly life can get more complicated with, say, two-phase commit and the possibility of a distributed transaction coordinator crashing after the prepare phase of a transaction. Other developers, however, are working on distributed transactions (see, for example, <a title="this talk" href="http://code.google.com/events/io/sessions/DesignDistributedTransactionLayerAppEngine.html" target="_self">this talk</a> at Google I/O 2009).</p>
<p>The extent to which such third party additions find success, and are worked into the standard GAE development ecosystem, is something to keep an eye on over the next months and years.</p>
<p><strong>References</strong></p>
<p><a title="http://labs.google.com/papers/bigtable-osdi06.pdf" href="http://labs.google.com/papers/bigtable-osdi06.pdf" target="_self">http://labs.google.com/papers/bigtable-osdi06.pdf</a><br />
<a title="http://sites.google.com/site/io/building-scalable-web-applications-with-google-app-engine" href="http://sites.google.com/site/io/building-scalable-web-applications-with-google-app-engine" target="_self">http://sites.google.com/site/io/building-scalable-web-applications-with-google-app-engine</a><br />
<a title="http://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore" href="http://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore" target="_self">http://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore</a><br />
<a title="http://www.stanford.edu/class/ee380/Abstracts/081105-slides.pdf" href="http://www.stanford.edu/class/ee380/Abstracts/081105-slides.pdf" target="_self">http://www.stanford.edu/class/ee380/Abstracts/081105-slides.pdf</a><br />
<a title="http://www-users.itlabs.umn.edu/classes/Fall-2008/csci8101/bigtable.pdf" href="http://www-users.itlabs.umn.edu/classes/Fall-2008/csci8101/bigtable.pdf" target="_self">http://www-users.itlabs.umn.edu/classes/Fall-2008/csci8101/bigtable.pdf</a></p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2009/06/the-google-app-engine-data-model/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Iterative Time-Boxing</title>
		<link>http://bolour.com/blog/2008/08/iterative-time-boxing/</link>
		<comments>http://bolour.com/blog/2008/08/iterative-time-boxing/#comments</comments>
		<pubDate>Wed, 20 Aug 2008 16:41:56 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[process]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/2008/08/20/iterative-time-boxing/</guid>
		<description><![CDATA[Iterative time-boxing, or ITB for short, is the strategy of strictly time-boxing successive stages of work on a software requirement. In this note, I&#8217;ll contrast ITB as a planning technique with conventional estimation planning for iterative and incremental development (IID), and consider its effective use.
Agile IID typically works like this: Reduce a requirement to a [...]]]></description>
			<content:encoded><![CDATA[<p>Iterative time-boxing, or <em>ITB</em> for short, is the strategy of strictly time-boxing successive stages of work on a software requirement. In this note, I&#8217;ll contrast ITB as a planning technique with conventional estimation planning for iterative and incremental development (IID), and consider its effective use.</p>
<p>Agile IID typically works like this: Reduce a requirement to a point where it is simple and well-defined, and where a simple design exists for it; estimate and fully implement that design; and then iterate, ratcheting up the requirement and/or the design a notch at each stage. I&#8217;ll call this idea of estimating and releasing a completed piece of evolving functionality at each IID iteration <em>incremental completeness</em>.</p>
<p>To illustrate, let&#8217;s take an extreme example of a requirement for which incremental completeness works well: a simple user registration/login system. We get a sketch of the registration/login screen from the customer and a short description of what is required. We build a simple system that implements basic registration and login functions, and that keeps track of the logged in user. When this implementation is subjected to feedback, a reviewer points out that passwords are not encrypted in the database, and that the system accepts weak passwords. So we enhance password security in the next iteration. Next someone points out that a user may be registered against his will. So we enhance the registration system by an explicit opt-in email. And so on.</p>
<p>The take-away from this example is that at each stage of the game, the next evolutionary step can be defined unambiguously, and we know fairly well what it means to <em>complete</em> that next step.</p>
<p>But it is not necessarily the case that the next evolutionary step in our work on a requirement be so well-defined. There are times when the goals of our next evolutionary step are fuzzy and uncertain, and when it is difficult to define with sufficient clarity what it means to <em>complete </em>the next evolutionary step.</p>
<p>This type of uncertainty crops up quite often in architectural requirements, for example, the requirements for a system&#8217;s user interface, such as performance, look and feel, and accessibility. A common scenario is that frameworks exist, be it for UI, persistence, work flow, and so on, but that none of the existing frameworks covers our requirements fully. We are then left with having to investigate, to understand in what ways we can augment each candidate framework with custom code, and in what ways we may have to compromise our requirements.</p>
<p>The goals for the next iteration of such work are often fuzzy. For example, in our UI investigation, the goal may be to come up with a <em>reasonable compromise</em> on accessibility among the third party web component frameworks available to us. The <em>completion</em> of such a task is hard to pinpoint. We can do our best in an iteration to understand the capabilities of various candidate frameworks and compare them with our requirements. But a <em>reasonable compromise</em> on accessibility in a UI framework does not admit of a binary acceptance test.</p>
<p>This type of work is investigative and experimental in nature, and it is uncertain in its goals and results. For such work, it is more important to focus on gaining as much knowledge and on reducing as much uncertainty as possible in each iteration, rather than on strictly completing something in each iteration.</p>
<p>And because completion may not be well-defined up-front for the iterations of such work, estimation as such is not as relevant to them. In fact, forcing estimation on such tasks can be counter-productive. The work is uncertain, so our estimates are likely to be way off. If they are too high, work can expand to fill its estimated time, and if they are too low, we will have set ourselves up for failure.</p>
<p>What is more important is to break up the work into reasonable time intervals, and to allow project stake holders the opportunity to weigh in on the direction of such work as it unfolds in its successive iterations.</p>
<p>The result is ITB. ITB specializes IID in three main ways. First, in ITB, time-boxing replaces estimation. Second, in ITB, backtracking and experimentation are accepted as the norm. Third, in ITB, the intermediate results of an unfolding task at each stage need not be completed code, nor even executable.</p>
<p>I have found that good results from ITB depend on a few common-sense practices:</p>
<ul>
<li><strong>Limited domain of investigation.</strong> Try to set some parameters on what will be done in each time-boxed subtask. We want to have flexibility, but not spend time on tangential investigations, or on requirements that have little chance of being realized in the near future.</li>
<li><strong>Small total number of hours in each time box.</strong> A small limit on total hours provides a natural break for stake holders other than the responsible developers to provide feedback and to effect timely course corrections.</li>
<li><strong>Flexibility in the elapsed time of each time box.</strong> To resolve uncertainty often requires gestation time: time to reflect, time to sleep on an idea, time to query forums and wait for responses, time to set up and wait for meetings with experts. Thus, work on uncertain tasks can be quite <em>sparse</em>, and our elapsed time limits for time-boxed tasks need to allow for such sparseness.</li>
<li><strong>Production of tangible artifacts in each iteration.</strong> A tangible artifact may be working code, a UML design, a benchmark, or an English writeup of the pros and cons of several alternative designs. Tangible artifacts provide focus and act as a springboard for further discussion and feedback in succeeding iterations.</li>
</ul>
<p>In summary, incremental completeness and iterative time-boxing are complementary techniques for managing the division of work in iterative and incremental development. In each case, work on a requirement unfolds incrementally and dynamically and is informed by work in previous iterations.</p>
<p>Incremental completeness focuses on estimating and completing a well-defined piece of evolutionary functionality for a requirement at each stage. At each stage, the next increment of work is considered well-defined enough to be subject to estimation. But, the estimate does not generally limit the actual time spent to complete the estimated task. So <em>incremental completeness stipulates a flexible time frame but a well-defined work product at each stage</em>.</p>
<p>ITB, on the other hand, focuses on making as much progress as possible towards an uncertain goal within a limited time frame. So <em>ITB stipulates a well-defined time frame but a flexible work product at each stage</em>.</p>
<p>Many requirements can benefit from a planning trajectory which starts with ITB and ends with incremental completeness. In general, planning is more effective if a mix of these approaches is allowed depending on context and the type of uncertainty at each stage.</p>
<p>© Copyright 2008 Bolour Computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2008/08/iterative-time-boxing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Monads through Pictures</title>
		<link>http://bolour.com/blog/2007/02/monads-through-pictures/</link>
		<comments>http://bolour.com/blog/2007/02/monads-through-pictures/#comments</comments>
		<pubDate>Sun, 04 Feb 2007 23:33:38 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[languages]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/2007/02/04/monads-through-pictures/</guid>
		<description><![CDATA[Here is an excerpt from the introduction to my recent write-up on monads.
Monads in functional programming provide a framework for aspect-aware computations to be composed to build higher-level aspect-aware computations. An aspect-aware computation is some basic computation that has been enhanced, augmented, or, more generally, just transformed, to become aware of some generic concern.
&#8230;
I have [...]]]></description>
			<content:encoded><![CDATA[<p>Here is an excerpt from the introduction to my recent write-up on monads.</p>
<p><em>Monads in functional programming provide a framework for aspect-aware computations to be composed to build higher-level aspect-aware computations. An aspect-aware computation is some basic computation that has been enhanced, augmented, or, more generally, just transformed, to become aware of some generic concern.</em></p>
<p><em>&#8230;</em><br />
<em>I have found that the introduction of a few key pieces of terminology, and the pictorial representation of some of the basic concepts of programming with monads, have enhanced my grasp of the idea of a monad, and my conversations about it with others. In this article, I&#8217;ll share some of that terminology and pictures by using them to retrace the development of the idea of a monad.</em></p>
<p><a href="http://www.bolour.com/papers/monads-through-pictures.html">Read more &#8230;</a></p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2007/02/monads-through-pictures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Some Consequences of Pull</title>
		<link>http://bolour.com/blog/2006/10/some-consequences-of-pull/</link>
		<comments>http://bolour.com/blog/2006/10/some-consequences-of-pull/#comments</comments>
		<pubDate>Tue, 24 Oct 2006 21:37:14 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[process]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/2006/10/24/some-consequences-of-pull/</guid>
		<description><![CDATA[There&#8217;s been a lot of interest recently in the application of lean principles to software development.  See for example the 2003 book Lean Software 	Development: An Agile Toolkit for Software Development Managers by Mary and Tom Poppendieck. A basic principle of lean thinking is pull. In pull, the materials and services required by a [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s been a lot of interest recently in the application of lean principles to software development. See for example the 2003 book <em>Lean Software Development: An Agile Toolkit for Software Development Managers</em> by Mary and Tom Poppendieck. A basic principle of lean thinking is <em>pull</em>. In pull, the materials and services required by a given task are requested, or <em>pulled</em>, by workers responsible for that task just in time to make them available when they can be used. Pull devolves power to workers. Workers decide based on the conditions on the ground what they need and when they need it. The role of a <em>plan</em> is de-emphasized in favor of dynamic self-organization.</p>
<p>Pull may be contrasted with <em>command-and-control</em>. A good generic account of this contrast appears in <a href="http://weblog.halmacomber.com/fayol_to_flores.pdf">Leadership in Project Management: Time for a Shift From Fayol to Flores</a>. Here is a brief summary of this contrast based in part on that account.</p>
<p>The traditional style of project organization as a command and control regime is rooted in the work of Henri Fayol a century ago. This type of management may be summarized as: make a grand plan; set it in motion; apply thermostatic controls to it to keep it on track; and motivate workers by reward and punishment.</p>
<p>In contrast, an empowering pull style of project organization can be summarized as: have people figure out what they need from others, and make requests for needed materials and services (pull); allow people to volunteer to fulfill those requests; obtain commitments for the delivery of the requests based on mutually negotiated conditions of satisfaction; and foster an environment in which commitments are taken seriously.</p>
<p>Of course, one commitment may require other pieces of work for its satisfaction, leading to additional requests and commitments to fulfill those requests, and so on. The result is a <em><strong>network of commitments</strong></em>, a phrase coined in the eighties by Fernando Flores, a philosopher-turned-management-guru.</p>
<p>The conditions of satisfaction for each commitment, including its time of delivery, are <em><strong>negotiated</strong></em> between a requester, generically called a <em><strong>customer</strong></em>, and a <em><strong>provider</strong></em>. Recognizing that negotiation is a normal part of the customer-provider relationship empowers providers by acknowledging their autonomy.</p>
<p>In an empowering regime, people are motivated by having the freedom to produce things to their standards of quality and utility, by the honor of keeping commitments, and by the need to maintain the respect and trust of their peers. Trust means that people can rely on promised commitments for the fulfillment of their requests, so that they in turn can reliably make dependent commitments to other requests. To develop trust, team members make and keep significant commitments to each other, and they welcome feedback.</p>
<p><strong>Consequences </strong></p>
<p>Organizing our work around autonomous pulled commitments means:</p>
<ol>
<li><strong><em>Incremental commitment</em></strong>.  As we commit to and start developing functions and features in an iteration, we need the ability to pull commitments of assistance from other team members to perform specialized sub-tasks, to pair up with area specialists, or simply to obtain more bandwidth for the timely delivery of our commitments.  So it must be the case that others leave time on their schedules to provide these types of assistance in a timely manner. This is only possible if we commit our time incrementally and in small chunks, and leave sufficient slack in our schedules. So the scheduling of commitments is an incremental, and highly dynamic affair, and the <em>plan</em>, as such, unfolds on a daily basis.</li>
<li><strong><em>Local autonomy in distributing work</em></strong>.  The overall decomposition of the work of an iteration into primary commitments is a group responsibility. But it is left to the provider of a commitment to divide its work as he sees fit, and to get others to work with him by pairing, sub-contracting, or pulling specific expertise. The best way to collaborate depends on the nature of the work and the sensibilities of the collaborating individuals. Management does not drive these lower-level work distribution issues.</li>
<li><strong><em>Provider estimates supersede group estimates</em></strong>.  The estimates by which the team tracks the progress of commitments are those furnished after due reflection and negotiation by individual providers.  Team members need reasonable time to reflect on what is being committed to, and to line up collaborators and include their inputs into estimates. Provider estimates are furnished dynamically during the course of the iteration as commitments are made. Group estimates arrived at in the planning game are considered as rough measures of required work. Nevertheless, group estimates are of great value. The exercise of group estimation allows all team members to contribute their experience to the planning process. And group estimates provide guidelines for comparison with the ultimate provider-supplied estimates. A large discrepancy between the two has to be understood as part of the negotiation for each commitment.</li>
</ol>
<p>Unfortunately, software is notoriously hard to estimate.  So in an atmosphere of high estimation uncertainty, what does it mean to meet commitments? Basically it means making fuzzy estimates with relatively narrow ranges, and consistently falling within the estimated range in a statistically balanced way.</p>
<p>It is unnecessary and counter-productive to measure estimation accuracy and judge people by that measurement. But estimation accuracy is one area for self-evaluation. And in a tightly-knit team, people will come to know the reliability of different people&#8217;s estimates soon enough, and to balance that with other variables that go into the trust equation, such as propensity to collaborate, amicability, and so on.</p>
<p>Upper management tracks the progress of the team as a whole by using <em>burn charts</em>: comparing group estimates for fully-completed commitments at a point in time to actual effort up to that point. Lower management tracks each individual commitment by comparing its actual effort with its provider-supplied estimate, raising a flag when actual effort approaches the pessimistic end of the provider-supplied estimate.</p>
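<p>Both comparisons are simple arithmetic. The sketch below is one possible reading, with made-up numbers: <code>burn_ratio</code> and <code>needs_flag</code> are hypothetical names, and the 80% warning threshold is an assumption chosen for the example, since what counts as &#8220;approaching&#8221; the pessimistic end is left to the team.</p>
<pre><code>
def burn_ratio(completed_group_estimates, actual_effort_to_date):
    """Group-estimated effort of fully completed commitments per unit of actual effort."""
    return sum(completed_group_estimates) / actual_effort_to_date

def needs_flag(actual_effort, pessimistic_estimate, warn_fraction=0.8):
    """Flag a commitment whose actual effort nears the pessimistic end of its range."""
    # warn_fraction is an assumed threshold, not something the text prescribes.
    return actual_effort >= warn_fraction * pessimistic_estimate

# Team level: three completed commitments estimated by the group at 3, 5 and 2 days,
# against 12.5 days of actual effort so far.
print(burn_ratio([3, 5, 2], 12.5))   # 0.8

# Commitment level: provider estimated a range whose pessimistic end is 10 days.
print(needs_flag(9, 10))             # True
print(needs_flag(4, 10))             # False
</code></pre>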
<p>© Copyright 2006 Bolour Computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2006/10/some-consequences-of-pull/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Task Dependencies in Plans</title>
		<link>http://bolour.com/blog/2006/10/task-dependencies-in-plans/</link>
		<comments>http://bolour.com/blog/2006/10/task-dependencies-in-plans/#comments</comments>
		<pubDate>Tue, 24 Oct 2006 07:32:11 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[process]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/2006/10/24/task-dependencies-in-plans/</guid>
		<description><![CDATA[The GANTT chart, as embodied in Microsoft Project, has become the de facto method of representing software development plans in many organizations. This is unfortunate. For, the precise temporal positioning of all tasks within a timeline is seldom necessary for managing the progress of a software project.
GANTT charts reinforce two false assumptions about development plans: [...]]]></description>
			<content:encoded><![CDATA[<p>The GANTT chart, as embodied in <em>Microsoft Project</em>, has become the de facto method of representing software development plans in many organizations. This is unfortunate. For, the precise temporal positioning of all tasks within a timeline is seldom necessary for managing the progress of a software project.</p>
<p>GANTT charts reinforce two false assumptions about development plans: that each task must be completed within an estimated duration and effort, and that each task must be completed by a particular time. The pre-assignment of developers to tasks in specific time periods breaks down because there is significant estimation uncertainty in software tasks.</p>
<p>Agile methodologies recognize these shortcomings, and use simpler means of planning that de-emphasize precise temporal positioning of tasks and people in development plans. Agile developers can generally work in many different areas of the code base, partly because they can easily pair with area specialists. Since most developers can then work on most of the code base, the necessity to pre-assign developers to tasks in time is considerably diminished.</p>
<p>Of course, the breakdown of a release, or an iteration, into distinct tasks typically results in temporal dependencies between those tasks. But then a PERT chart representing these dependencies that also includes just effort/duration estimates for each task is a more appropriate means of depicting the task breakdown.</p>
<p>Still, agile methodologies can generally get away with simpler plans than even such PERT charts. So what gives? <em><strong>Is it important to represent task dependencies in the development plan?</strong></em> I believe it is. At the same time, there are simple ways to avoid the explicit representation of many dependencies in the development plan.</p>
<p>As development teams, we need to make and keep delivery commitments to the rest of the organization. A crude projection of the delivery of a release is obtained by estimating effort for individual tasks, summing up the estimates, and dividing the total by the available manpower. A better projection is obtained by taking task dependencies into account and by exposing critical paths.</p>
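<p>The difference between the two projections can be seen in a few lines of Python. The task names and numbers below are invented, effort and duration are assumed to be the same, and each task is assumed to be worked by one developer without being split.</p>
<pre><code>
from functools import lru_cache

# Hypothetical task breakdown: name maps to (effort in days, prerequisite tasks).
tasks = {
    "interface": (1, []),
    "service": (6, ["interface"]),
    "client": (4, ["interface"]),
    "integration": (2, ["service", "client"]),
}

def crude_projection(tasks, developers):
    """Total effort divided by available manpower; ignores dependencies."""
    return sum(effort for effort, _ in tasks.values()) / developers

def critical_path_length(tasks):
    """Length of the longest chain of dependent tasks."""
    @lru_cache(maxsize=None)
    def finish(name):
        # Earliest finish: own effort after the latest-finishing prerequisite.
        effort, prerequisites = tasks[name]
        return effort + max((finish(p) for p in prerequisites), default=0)
    return max(finish(name) for name in tasks)

print(crude_projection(tasks, 2))    # 6.5
print(critical_path_length(tasks))   # 9
</code></pre>
<p>With two developers the crude projection promises delivery in 6.5 days, but the chain interface, service, integration takes 9 days no matter how many people are available. Under these assumptions the critical path, not the sum, bounds the delivery date.</p>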
<h4>Finessing Dependencies by Phasing</h4>
<p>A simple strategy for ensuring the correct temporal ordering of dependent tasks is the use of temporal phases. For example, the <em>Rational Unified Process</em> defines four phases of the development process: <em>inception</em>, <em>elaboration</em>, <em>construction</em>, and <em>transition</em>. So the dependency of construction work on inception tasks, like baselining the release&#8217;s vision and architecture, is automatically satisfied.</p>
<p>Phasing has gotten a bad rap because of the well-known shortcomings of the particular phases of the waterfall model. But it is not necessary to use waterfall phases in our plans.</p>
<p>An <a href="http://www.bolour.com/blog/index.php?p=20">earlier blog</a> in this series talked about a pipelined phasing model of iterative development, in which the (short) pipeline stages of <em>R&#038;D</em>, <em>construction</em>, and <em>deployment</em> occur in sequence for each iteration of an iterative and incremental development project. In that model, the dependency between the program design+construction of a feature, and the R&#038;D work needed to conceive that feature, is implicitly satisfied by the temporal ordering of the stages in which these activities are scheduled. Phasing provides a simple and intuitive method of covering dependencies without directly representing them.</p>
<p>But clearly phasing cannot cover all dependencies.</p>
<h4>Service Dependencies</h4>
<p>The most prominent example of dependency in software is the dependency of one piece of functionality on another. I will call this type of dependency a <em>service dependency</em>. In a service dependency, a <em>client task</em> is dependent on a <em>service task</em>.</p>
<p>There are times when the developer(s) providing the dependent function will also develop some of its dependencies as a side effect. In that case, from the point of view of the development plan, there is no need to track the dependencies folded into the dependent task. But folding the development of a dependency into the development of a dependent is not necessarily the right approach. Sometimes the dependency is a whole separate abstraction, and it makes sense to use a separate task for its development.</p>
<p>For example, a web site registration system generally depends on an email service to send opt-in confirmation messages to registering users. If we actually have to build the email service ourselves on top of a standard email package, say because our email service has special requirements, then the email service is considered to be a separate component, and the registration component becomes dependent on the email service.</p>
<p>The nice thing about a service dependency is that the client can be isolated from the service by using an abstract interface. Because the implementation of a service dependency can often be mocked, once a service interface is drafted, work on the client can proceed in parallel with work on the service. Later, when the production implementation of the service becomes available, the client and the service are integrated.</p>
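<p>For the registration example, the arrangement might look like the following sketch. All class and method names here (<code>EmailService</code>, <code>MockEmailService</code>, <code>Registration</code>, <code>send</code>, <code>register</code>) are hypothetical, chosen only to illustrate the shape of the dependency.</p>
<pre><code>
from abc import ABC, abstractmethod

class EmailService(ABC):
    """The drafted service interface; client and service both depend on it."""

    @abstractmethod
    def send(self, to, subject, body):
        ...

class MockEmailService(EmailService):
    """Stand-in that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, subject, body):
        self.sent.append((to, subject, body))

class Registration:
    """The client: it knows only the interface, not the implementation."""

    def __init__(self, email_service):
        self.email_service = email_service
        self.pending = set()

    def register(self, address):
        self.pending.add(address)
        self.email_service.send(address, "Please confirm", "Reply to opt in.")

# Client development and testing can proceed against the mock.
mock = MockEmailService()
registration = Registration(mock)
registration.register("user@example.com")
print(mock.sent[0][0])   # user@example.com
</code></pre>
<p>When the production email service is ready, it implements the same interface and is passed to <code>Registration</code> in place of the mock. Whatever behavior the mock could not reproduce is what remains for the integration task.</p>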
<p>When all service functions needed by the client can easily be mocked, the client can be built and tested separately in its entirety, and integration with the production implementation of the service can be expected to be a simple matter. One might then forgo the explicit representation of the integration task.</p>
<p>But significant integration work can result from the infeasibility of mocking every last nuance of the service interface that is used by the client. At integration time, client tests that exercise the hard-to-mock functions of the interface have to be written, and these tests have to be made to pass with the production implementation of the service. In this case, the integration task needs to be tracked separately from the client, since work on it is not necessarily contiguous with work on the mock-based client.</p>
<p>Here is a sketch of the resulting tasks and their dependencies embodied in an <em>activity on node</em> PERT chart with effort/duration estimates (assuming, for simplicity, that effort and duration are the same).</p>
<div style="text-align: center"><img border="0" alt="Figure-1" src="http://bolour.com/blog/images/integration-dependency.gif" /></div>
<div align="center"><strong>parallelized client and server tasks and their dependencies</strong></div>
<p>Clearly the benefits of parallelism come at the price of a more complicated task structure. But the benefits are often worth the price.</p>
<p>Of course, if we can anticipate service dependencies during the R&#038;D stage of an iteration, then we just draft up a service interface right there and then, and the dependencies of the client and service on that interface will be implicitly satisfied.</p>
<h4>Summary</h4>
<p>There are clearly a variety of ways of avoiding the explicit representation of dependencies in the development plan: phasing, folding dependencies into dependent tasks, ignoring trivial integration dependencies, or not including dependent tasks in the plan until their dependencies are satisfied. But every so often, a dependency structure will crop up that cannot be finessed, as illustrated by non-trivial service integration dependencies. These dependencies may be represented in a PERT chart for critical path analysis, and to aid in anticipating the likely division of labor between developers.</p>
<p>© Copyright 2006 Bolour Computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2006/10/task-dependencies-in-plans/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Code Stewardship</title>
		<link>http://bolour.com/blog/2006/10/code-stewardship/</link>
		<comments>http://bolour.com/blog/2006/10/code-stewardship/#comments</comments>
		<pubDate>Mon, 23 Oct 2006 19:52:46 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[process]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/2006/10/23/code-stewardship/</guid>
		<description><![CDATA[The benefits of collective code ownership are well-known.  Everyone in the team needs to be able to contribute to every part of the code base. But collective code ownership has an Achilles heel. The quality of the code can degrade as team members try to evolve unfamiliar parts of the code base under schedule [...]]]></description>
			<content:encoded><![CDATA[<p>The benefits of <em>collective code ownership</em> are well-known.  Everyone in the team needs to be able to contribute to every part of the code base. But collective code ownership has an Achilles heel. The quality of the code can degrade as team members try to evolve unfamiliar parts of the code base under schedule pressure. As innovators and craftsmen, programmers take pride in their work. And they need the satisfaction of seeing their carefully crafted designs evolved by the rest of the team without losing their conceptual integrity.</p>
<p>There is, of course, an ideal of a gelled team in which communication is so rich that all team members will do the right thing instinctively as they evolve different parts of the code. But it is a fact of life that differences in sensibility and skill exist, and persist, in many teams. And not every team member can be expected to be familiar enough with every part of the code base to preserve its integrity as it evolves.</p>
<p>The upshot is that mechanisms are needed to allow team members with different sensibilities and different levels of skill to work with confidence on pieces of code that they are not necessarily intimate with, while preserving the integrity of those pieces of code, as conceptualized by the team members who created them, and who have invested energy and craftsmanship to perfect them to their current state.</p>
<p>One mechanism to prevent code degradation is what I call <strong><em>code stewardship</em></strong>: having each module be looked after by a single steward, even as all team members are allowed to contribute to that module. The steward is the custodian of the integrity of a module. He maintains intimate knowledge of the module. But he does not necessarily work at all times on just the modules he stewards. The same developer is at the same time steward to some modules, and developer of features touching modules stewarded by others.</p>
<p>Stewardship is a temporary privilege delegated by the team to a volunteer for each module, generally someone who has worked in a sufficiently focused manner on a particular module. The steward serves the team, which retains community ownership of the code base. Stewardship is rotated every few quarters to spread knowledge, and to bring fresh ideas to bear on each module.</p>
<p>It is important to have just one person steward each module. Accountability for the integrity of that module then falls squarely on that one person.</p>
<p>How do we practice stewardship with minimum ceremony? The key is <em><strong>informal advice and consent</strong></em> throughout the development process. Developers <em>pull</em> advice and consent from stewards on a regular basis in the process of developing features impacting different modules.</p>
<p>Code stewardship requires that before checking a change to a module into the production branch, the code be subjected to the advice and consent of that module&#8217;s steward. It would be normal and expected for such advice and consent to result in significant refactoring, unless, of course, the steward was kept in the loop as the change was considered and developed.</p>
<p>Stewards need flexibility in their schedules so that they can quickly respond to requests for advice and consent. And developers need to anticipate the need for advice and consent and proactively request pair time with stewards.</p>
<p>To enhance the process of communication between the feature developers and the stewards, it is useful to give stewards visibility into actual code changes in their modules, as these changes are being developed, and before the final checkin to the production branch. The changes may be shared by using provisional code branches, with automatic notification of changes to corresponding stewards. Provisional code sharing allows the steward to reflect on pending changes and to research issues and alternatives in his own time. This type of off-line communication augments, but in no way replaces, face-to-face advice and consent sessions.</p>
<p>Finally, a trivial one-line change should never require ceremony in a small team. Just check it in to the production branch, and have the steward be automatically notified.</p>
<p>In summary: It is the prerogative of each developer to be able to work in each module, subject to code stewardship. Therefore, developers are expected to pull advice and consent from stewards. It is the prerogative of the stewards (and ultimately the architect) to protect the integrity of the code base. Therefore the stewards and the architect get veto power over production code. Veto power is used judiciously and sparingly.</p>
<p>© Copyright 2006 Bolour Computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2006/10/code-stewardship/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Iteration Pipelining</title>
		<link>http://bolour.com/blog/2006/10/iteration-pipelining/</link>
		<comments>http://bolour.com/blog/2006/10/iteration-pipelining/#comments</comments>
		<pubDate>Mon, 23 Oct 2006 03:19:43 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[process]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/2006/10/23/iteration-pipelining/</guid>
		<description><![CDATA[Iterative and incremental development (IID) is the established practice for dividing software projects in time. IID is often interpreted  to mean a succession of complete development cycles, meaning that each iteration includes within its temporal boundaries the analysis, design, implementation, and deployment of the features produced in that iteration.
But practically speaking, some analysis, and [...]]]></description>
			<content:encoded><![CDATA[<p>Iterative and incremental development (IID) is the established practice for dividing software projects in time. IID is often interpreted  to mean a succession of <em>complete development cycles</em>, meaning that each iteration includes within its temporal boundaries the analysis, design, implementation, and deployment of the features produced in that iteration.</p>
<p>But practically speaking, some analysis and some interaction design generally precede the bulk of the program design and construction work in each iteration. Customers don&#8217;t normally come to the planning game of an iteration with a blank slate. They do at least some homework on functions they need built, and on how users might interact with the system to access those functions. In fact, by the time we get to the planning game, there may well be a draft PRD outlining, albeit briefly, the required functions and user interactions for that iteration. I call the analysis of required functions and the design of the associated user interactions <em><strong>functional design</strong></em>.</p>
<p>If we accept that it makes sense for functional design work to be front-loaded for each iteration, it is then a short step from there to a two-stage pipeline that includes a <em><strong>functional design stage</strong></em>, and a <em><strong>development stage</strong></em>. Because different sets of people generally do functional design and development, and because it makes sense for much of functional design to precede development for each iteration, the activities of functional design and development may, to a large extent, be parallelized in a two-stage pipeline.</p>
<p>In a recent project we put in place this kind of pipeline, and we found it worked quite well for us in streamlining our work. During the development of the n&#8217;th iteration, the PRD for the n+1&#8217;st iteration would be researched and produced.</p>
<p>The practice of pipelining functional design and development is not uncommon in IID projects. But there is concern that pipelining can lead to disruptive cross-iteration interruptions. For example, while the functional designers are busy designing the functions of the next iteration, the developers need bandwidth from them for the clarification of the functions of this iteration. In our experience, interruptions in this direction, while frequent, were welcomed.</p>
<p>Interruptions in the other direction turn out to be more serious. Let&#8217;s see why, and consider how to reduce their impact.</p>
<p>As our functional designers come up with the functions of the next iteration, it is necessary at the same time to consider the technical feasibility of these functions. It would be futile to pursue the functional design of some cool feature, if its implementation is not feasible. That means that a level of engineering R&#038;D work has to happen in conjunction with functional design.  Who does engineering R&#038;D? Developers, of course. That is, the same people who do development. But if the same people do engineering R&#038;D and development, it becomes harder to parallelize these activities, and pipelining them can result in a loss of focus and productivity as developers multi-task between them for two successive iterations.</p>
<p>In order to allow for pipelined parallelism between engineering R&#038;D and development with minimal disruption, we must free ourselves of the ingrained notion that all developers build production code all the time. As argued in an <a href="http://www.bolour.com/blog/index.php?p=19">earlier blog on development roles</a>, one way to do this is to use two distinct development roles: <em><strong>designer/builder</strong></em> and <em><strong>R&#038;D engineer</strong></em>. Designer/builders concentrate on designing and developing the production system. R&#038;D engineers concentrate on feasibility issues and exploring alternative solutions to hard technical problems. It is, of course, not necessary to typecast team members into these roles. People can move between them. What is important is that being responsible and accountable for different kinds of things, people in these roles are each allowed to focus on their areas of responsibility, and to work at a pace appropriate to their work product.</p>
<p>Now that the first stage of the pipeline includes both functional design and engineering R&#038;D work, I prefer the name <em><strong>R&#038;D stage</strong></em> for this first stage. And since <em>development</em> includes engineering R&#038;D as well as program design and construction, the name <em>development stage</em> is no longer appropriate for the second stage of the pipeline. I will call the second stage the <em><strong>construction stage</strong></em>.</p>
<p>The R&#038;D stage of the pipeline is the time to figure out the functions to be built, and to come up with feasible approaches to their implementation, away from the day-to-day pressures of building production code. This type of work is exploratory in nature, and it has a very different pace and expectation of success than the more or less clockwork pace of construction activity in most shops. The construction stage of the pipeline includes normal program design and construction.</p>
<p>Of course, not all risk can be eliminated before construction begins.  But to the extent that conceptual and technical issues can be anticipated and explored before the construction stage of an iteration, interruptions, task switches, and tensions due to surprise under deadline pressure may be mitigated in the normal program design and construction work for the features of the iteration.</p>
<p>To recap: An initial R&#038;D pipeline stage for each iteration enhances workflow by allowing work on functional design and normal development to proceed in parallel, and by reducing the risk of hiccups in the middle of normal program design and construction, so that most developers may work on designing and building the features of an iteration at a consistently high clip most of the time.</p>
<p>Similarly, a final <em><strong>deployment stage</strong></em> can help parallelize the nitty-gritty work associated with getting all acceptance tests to pass and deploying the final system, thus freeing most developers to start working on the features of the next iteration.</p>
<p>Here is a schematic of the resulting <em><strong>3-stage development pipeline</strong></em>.</p>
<div style="text-align: center"><img border="0" alt="figure" src="http://bolour.com/blog/images/staging.gif" /></div>
<div align="center"><strong>3-stage iterative and incremental development pipeline</strong></div>
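<p>The overlap between stages can be spelled out with a small sketch. It assumes, purely for illustration, that the three stages take equal-length time slots and that each iteration moves to the next stage in the next slot; the stage labels and the <code>schedule</code> function are hypothetical names.</p>

```python
# Stage labels in pipeline order; "r_and_d" stands for the R and D stage.
STAGES = ["r_and_d", "construction", "deployment"]


def schedule(num_iterations):
    """For each time slot, map every busy stage to the iteration it is working on."""
    slots = []
    num_slots = num_iterations + len(STAGES) - 1
    for slot in range(1, num_slots + 1):
        active = {}
        for offset, stage in enumerate(STAGES):
            # A stage that is `offset` steps down the pipeline lags by `offset` slots.
            iteration = slot - offset
            if iteration in range(1, num_iterations + 1):
                active[stage] = iteration
        slots.append(active)
    return slots


plan = schedule(3)
```

<p>Under these assumptions, in the third slot all three stages are busy at once: R&#038;D for iteration 3, construction for iteration 2, and deployment for iteration 1.</p>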
<p>By the end of the construction stage, all automated tests, including automated acceptance tests, would be passing. What remains is for acceptance testers to run through the remaining manual tests one last time. This activity can be time-consuming. But if there is significant test coverage by automated tests, the bug yield from this last pass through manual tests is likely to be modest.</p>
<p>In that case, for a small development team, a single developer can usually be shunted to do release engineering work, including defect investigation, for the deployment stage of the pipeline. This release engineer (with expanded responsibility for defect tracking) buffers developers from interruptions by acceptance testers. The release engineer, being a developer, can also track down and isolate defects to particular areas of the code before involving the area specialists in the fix. That frees the area specialist from the burden of tracking down and pinpointing bugs, which is often the more time-consuming and disruptive part of bug fixing.</p>
<p>Of course, this function may be rotated between members of the team, to spread the pain, if that is how it is viewed.</p>
<p>© Copyright 2006 Bolour Computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2006/10/iteration-pipelining/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A View of Development Roles</title>
		<link>http://bolour.com/blog/2006/10/a-view-of-development-roles/</link>
		<comments>http://bolour.com/blog/2006/10/a-view-of-development-roles/#comments</comments>
		<pubDate>Thu, 19 Oct 2006 20:46:33 +0000</pubDate>
		<dc:creator>Azad Bolour</dc:creator>
				<category><![CDATA[process]]></category>

		<guid isPermaLink="false">http://bolour.com/blog/2006/10/19/a-view-of-development-roles/</guid>
		<description><![CDATA[Specialization in teams allows team members to do what they are good at and what they like to do. There are at least three types of specialization in software development teams: functional specialization, subsystem specialization, and technology specialization. Here I&#8217;d like to focus on functional specialization. A role is a specialized function in a team: [...]]]></description>
			<content:encoded><![CDATA[<p>Specialization in teams allows team members to do what they are good at and what they like to do. There are at least three types of specialization in software development teams: functional specialization, subsystem specialization, and technology specialization. Here I&#8217;d like to focus on functional specialization. A <em>role</em> is a specialized function in a team: a locus of responsibility and authority to perform certain types of functions.</p>
<p>All software development methodologies pay special attention to roles. As an example, the <em>Crystal Clear</em> methodology of Alistair Cockburn defines the roles of <em>sponsor</em>, <em>expert user</em>, <em>lead designer</em>, <em>designer-programmer</em>, <em>business expert</em>, <em>coordinator</em>, <em>requirements gatherer</em>, <em>tester</em>, and <em>writer</em>.</p>
<p>Clearly, the way we divide our work into roles has a major effect on our team&#8217;s health and productivity. One way to choose the roles appropriate for our teams is to start with a suitable methodology, and to try and fit the team to that methodology. Another is to simply follow precedent in our organizations. A different way is to consciously decide what roles make sense for our teams, by drawing on the body of knowledge in different software traditions, and by using our own experience and intuition.</p>
<p>The teams I work in are generally small development teams, typically working on vertical commercial applications. So here are some role basics, based on my experience, that might inform the conversation about what roles actually make sense for our particular teams.</p>
<p>The way I look at it, there are three basic functional areas, or <em><strong>disciplines</strong></em>, in development teams.</p>
<ul>
<li><strong><em>Functional design.</em></strong> The functions of the system have to be conceived and their externally visible form has to be designed.</li>
<li><strong><em>Engineering R&#038;D.</em></strong> Feasible approaches to implementing the required functions have to be figured out.</li>
<li><strong><em>Construction.</em></strong> The required functions have to be built in a production implementation.</li>
</ul>
<p>This view of the world is informed, but not necessarily constrained, by the ideas of Alan Cooper (see <a href="http://archive.computerhistory.org/resources/moving-image/CHM_Lectures/2002/imagine_this.cooper-alan.lecture.2002-12-05.102656964.wmv">Imagine This</a>, a 2002 talk by Cooper at SDForum).</p>
<p>Work in various disciplines occurs throughout each iteration. The term <em>discipline</em> comes from the <em>Rational Unified Process</em> (<em>RUP</em>), which defined its own set of disciplines. As explained in the <em>RUP</em> literature, disciplines and <em>phases</em> are orthogonal concepts.</p>
<h4>Functional Design versus Construction</h4>
<p>In 1970, Winston Royce in his influential paper, <a href="http://facweb.cti.depaul.edu/jhuang/is553/Royce.pdf">Managing the Development of Large Software Systems</a>, proposed that the most basic activities in software development are <em>analysis</em> and <em>coding</em>. Functional design and construction are somewhat more general incarnations of Royce&#8217;s analysis and coding disciplines. This type of distinction is also present in all major traditions of software development, including agile methodologies, though the details vary considerably from one methodology to the next.</p>
<p>Functional design as defined here includes both requirements analysis and the design of the external interfaces through which the system&#8217;s required functions are realized. Because these activities are tightly coupled, they belong to the same discipline.</p>
<p>Another activity that is tightly coupled with the conception of a system&#8217;s required functions and its user interface is acceptance testing. Clearly, it is up to the functional designers of a system to verify that what is constructed is what was conceived. So acceptance testing as a function fits into the  functional design discipline.</p>
<p>It is fairly non-controversial to separate out the activities of functional design into their own roles, such as product manager, GUI designer, and acceptance tester. The main point of departure here from established mainstream practice is to organize acceptance testing under the functional design discipline.</p>
<h4>Construction versus Engineering R&#038;D</h4>
<p>In most other technical fields, there is a fairly clear line between engineering R&#038;D and the production of the end product. In software, this line is blurred. Nevertheless, there are compelling reasons, I believe, to consider differentiating between these activities in our teams. To recap, engineering R&#038;D has to do with figuring out feasible approaches to implementing the functions of a system; and construction has to do with building robust working systems.</p>
<p>Engineering R&#038;D can easily be mistaken for what we have come to know as <em>design</em>, which reminds us of the outdated waterfall-era separation of design and coding. In fact, engineering R&#038;D does involve design, of course, but also much coding to try out ideas, prototype, simulate, benchmark, and so on. Similarly, construction involves coding, of course, but also much design to model the domain, to minimize coupling and maximize cohesion, to design for testability, and so on.</p>
<p>The distinction between engineering R&#038;D and construction leads to the roles of <em><strong>R&#038;D engineer</strong></em> and <em><strong>designer/builder</strong></em>.</p>
<p>Engineering R&#038;D per se is not a phase. It is a type of function performed in software development. This type of function may be correlated with stages of development. But that is a different topic for a different day.</p>
<p>In my mind, the principal reason for differentiating engineering R&#038;D and construction is that they each run on a different clock. Engineering R&#038;D is experimental and uncertain in its outcome. R&#038;D schedules need to tolerate a large degree of uncertainty and failure. In contrast, construction work is less risky. Construction is generally amenable to more systematic and accurate estimation. And management has much stricter expectations of timeliness and success from construction work than from engineering R&#038;D.</p>
<p>Alan Cooper made this type of distinction a number of years back between what he calls <em>engineers</em> and <em>programmers</em>. He argues that these two types of practitioners need certain distinct sensibilities, and that it is probably easier to find world-class people in each of these areas, than people who are world-class in both. At the risk of oversimplification, one might say that, while there is considerable overlap between the two disciplines, R&#038;D engineers gravitate towards the challenges of problem solving, while builders gravitate towards the challenges of creative craftsmanship to make real working systems.</p>
<p>Be that as it may, the fact alone that engineering R&#038;D and construction each has its own pace is, I think, enough to motivate an R&#038;D engineer role in our teams. The R&#038;D engineer is relatively free of the intense demands of building production code under aggressive schedule pressure, and can spend time researching high-value/high-risk ideas, and alternative solutions to the non-obvious requirements of the system. Of course, people need not be typecast into the roles of R&#038;D engineer or designer/builder. One can certainly move between the two.</p>
<p>For many vertical applications, the ratio of R&#038;D engineers to designer/builders would be small. I consider the role of the traditional hands-on software architect as an engineering R&#038;D role. And in some projects, the architect may be the sole R&#038;D engineer.</p>
<h4>Summary</h4>
<p>Here is a summary of the basic roles motivated by this world-view.</p>
<div style="text-align: center"><img border="0" alt="figure" src="http://bolour.com/blog/images/development-roles.gif" /></div>
<div align="center"><strong>software development roles and disciplines</strong></div>
<p>Each basic discipline has a lead, or <em>coordinator</em>. The coordinator of the functional design discipline is the product manager. The coordinator of the engineering R&#038;D discipline is the traditional hands-on architect. And the coordinator of the construction discipline is the <em>master builder</em>, the supervisor on the ground with ultimate responsibility for the construction of the production system. All team roles report to a single team lead with ultimate responsibility for all team functions. The team lead is easily accessible for escalation and resolution of issues between the different disciplines.</p>
<p>It goes without saying that roles are <em>open</em>: actively seeking feedback and collaboration from other roles and other disciplines.  Also, of course, the same individual may play more than one role at any given time.</p>
<p>© Copyright 2006 Bolour Computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://bolour.com/blog/2006/10/a-view-of-development-roles/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
<enclosure url="http://archive.computerhistory.org/resources/moving-image/CHM_Lectures/2002/imagine_this.cooper-alan.lecture.2002-12-05.102656964.wmv" length="256427588" type="video/x-ms-wmv" />
		</item>
	</channel>
</rss>
