May 20, 2009

The domain object that does everything

Continuing with my open-ended series of posts whereby I conduct a refactoring experiment with BlogEngine.NET

In my previous post ( Better readability using List.ForEach() and lambda expressions ) I refactored the newly created Catalog class and was left with some method calls that looked like this:

Post.PublishedPosts.ForEach(post => post.AddToCatalog());
 
The method for accessing the list of posts resides in the Post class itself, making the data access syntax somewhat similar to how it’s done with the ActiveRecord pattern or CSLA.

I don’t particularly like this structure, because it mixes what I consider to be two different concerns: domain (entity) logic and data persistence. It gives the Post class at least two distinct reasons to change, violating the Single Responsibility Principle and making the class probably a little more complex than it could be.
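To make the mixed concerns concrete, here’s a hypothetical, stripped-down sketch of what an ActiveRecord-style entity looks like. The members are illustrative stand-ins, not the actual BlogEngine.NET code:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: domain state and persistence live in one class,
// giving it two distinct reasons to change.
public class Post
{
    // Concern 1: domain (entity) logic
    public string Title { get; set; }
    public string Author { get; set; }
    public bool IsPublished { get; set; }

    // Concern 2: data persistence, mixed into the same class
    private static readonly List<Post> posts = new List<Post>();

    public static List<Post> PublishedPosts
    {
        get { return posts.FindAll(p => p.IsPublished); }
    }

    public void Save()
    {
        // manages the in-memory list (and, in the real class, would also
        // coordinate with the data store)
        if (!posts.Contains(this))
        {
            posts.Add(this);
        }
    }
}
```

Any change to how posts are stored, and any change to what a post *is*, both land in this one class.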

So, I decided to dive into the Post.cs file and see what I could do to simplify the class and remove some of the concerns that don’t fit directly in the domain object. What exactly is in the Post class? What does it do? After a quick once-through, I could see that the Post class:

  • Has properties that define the “Post” entity, like Title, Author, Content, etc.
  • Has event handlers that fire when posts are added, removed, rated, etc.
  • Manages an in-memory list of “Post” entities
  • Coordinates persistence to the data store by using the BlogService class
  • Formats its own relative, absolute, trackback, and permalink URIs by accessing the Utils and BlogSettings classes
  • Sends its own emails by calling out to the Utils class

There’s a lot going on in there. Personally, I think the domain object / entity class only needs to be concerned with the first two items in the list, and it should delegate out the rest of the stuff.

For the persistence, what I would prefer to see is something more like the Repository pattern used in Domain Driven Design. With a repository, all of the persistence is routed through a single class that acts “like an in-memory domain object collection.” Which, coincidentally, is exactly what’s stuffed inside the Post class. I just need to let it out.

In our case, the Post repository would be a class called Posts, physically and logically separate from the Post domain object. We would end up with something that looks like this:

Posts.Published.ForEach(post => post.AddToCatalog());
 
From a syntax perspective, there’s only a subtle difference, but it’s an important one. We’re now treating the collection of Posts as a first-class object. And, by separating the persistence concerns out of the Post.cs class, it becomes much simpler and easier to maintain. As does the repository (Posts.cs) itself.
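For illustration, here’s a minimal sketch of what the extracted repository might look like. Beyond the Posts and Published names shown above, the members (and the stand-in Post entity) are my own placeholders:

```csharp
using System.Collections.Generic;

// Minimal stand-in for the Post entity: domain state only.
public class Post
{
    public string Title { get; set; }
    public bool IsPublished { get; set; }
}

// Hypothetical Posts repository: it owns the collection and the
// persistence coordination, leaving Post a plain domain object.
public static class Posts
{
    private static readonly List<Post> posts = new List<Post>();

    // "Like an in-memory domain object collection" -- callers ask the
    // repository, not the entity, for filtered views of the data.
    public static List<Post> Published
    {
        get { return posts.FindAll(p => p.IsPublished); }
    }

    public static void Add(Post post)
    {
        posts.Add(post);
        // in the real repository, persistence (e.g. via BlogService)
        // would be coordinated here
    }
}
```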

Persistence, however, isn’t the only external concern that needs to be pulled out. The URI generation is now in a class called RoutingAgent, which is also where I placed former Utils methods that calculate the RelativeWebRoot and AbsoluteWebRoot and such. The email logic is in the MailAgent class, as an extension method:

public static void SendNotifications(this Comment comment, Post post)
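As a rough sketch, the extension class could be shaped like this. Only the SendNotifications signature above comes from my actual code; the entity stubs and the method body here are illustrative:

```csharp
using System.Net.Mail;

// Minimal stand-ins for the real BlogEngine.NET entities.
public class Post { public string Title { get; set; } }
public class Comment { public string Content { get; set; } }

// Hypothetical MailAgent: an extension-method class that keeps
// System.Net.Mail out of the domain objects.
public static class MailAgent
{
    public static void SendNotifications(this Comment comment, Post post)
    {
        var message = new MailMessage
        {
            Subject = "New comment on: " + post.Title,
            Body = comment.Content
        };
        // actual sending (SMTP settings, recipients, etc.) omitted
        // from this sketch
    }
}
```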
 
In the end, the Post.cs class is greatly simplified, and the post entity is no longer tightly coupled with non-post concerns like System.Net.Mail, BlogEngine.Core.Providers, System.Globalization, etc.

Next I’ll have to do the same type of refactoring with the other domain objects (Page, Comment, etc.).

May 13, 2009

Book Review: Dreaming In Code



Dreaming In Code: Two Dozen Programmers, Three Years, 4,732 Bugs, And One Quest For Transcendent Software

I found Dreaming In Code to be a fascinating book. On the surface it’s a narrative of a software development project, with some interesting industry history sprinkled throughout for context. But in telling its story, author Scott Rosenberg reveals two underlying truths about software development: producing quality, useful, non-trivial software applications is hard, and no team, no matter how good its members are, is immune to the inevitable setbacks and pitfalls.

Rosenberg, cofounder of the news and entertainment site Salon.com, does a great job detailing much of the early development of the Chandler Project, a “Note-to-Self Organizer”. The brainchild of Mitch Kapor (co-founder of Lotus Development Corp.), Chandler was intended to be the free, flexible, usable, everyman’s replacement for the expensive, corporate, stiff, hard-to-use Microsoft Outlook as a personal organization and communication tool.

“Chandler is an open source Note-to-Self Organizer. It features calendaring, task and note management and consists of a desktop application, web application and a free sharing and back-up service called Chandler Hub.
Our goal is to serve the way people actually work, independently and together, particularly in small groups, a market segment we believe is underserved. Our belief is that personal and collaborative information work is by nature iterative and that the existing binary Done/Not-Done, Read/Unread, Flagged/Unflagged paradigm in productivity software poorly accommodates the reality of how people work.”

Dreaming In Code is largely the story of how the grand vision of the birth of a new paradigm in personal organization software was slowly brought down to earth by the realities every programmer team faces, despite being helmed by a successful software visionary and staffed by some of the best and brightest in the industry. Faced with a lack of consistent direction, personality conflicts, time constraints, and sometimes simply biting off more than they could chew, the team struggled to find the success they were looking for.

Throughout the book, Rosenberg deftly helps the reader understand the motivation and principles that drove the team to try and create something new and different. To them, it wasn’t just a product, it was the beginning of a movement. But, there was no escaping the fact that it was also a software engineering project, lofty ambitions and altruistic motivations notwithstanding. We, through the author, are there in the meetings and celebrations, the good times and the frustrating times, riding the roller coaster along with everybody else.

Unfortunately, the book ends too soon. Rosenberg spends three years intimately involved with the project, but has to leave to publish it before the product is even close to being shipped. As a matter of fact, as the book comes to a close, we don’t even know if Chandler 1.0 ever shipped at all (it did). From the author’s blog:

It’s been close to six years since Mitch Kapor first announced plans for Chandler, and the application today is quite different from what was envisioned then. But it does fulfill at least a portion of the ambitious agenda Kapor set: It’s fully cross-platform, and, from the user side, it takes a very flexible approach to data. The program was once positioned as a calendar with email and task capabilities, and it’s still got those features, but it’s now presented as a notebook program — it’s “The Note-To-Self Organizer.” You store information free-form and then can organize it according to now/later/done triaging, turn items into tasks and schedule them on the calendar, group data in multiple collections, and share it across the web via the Hub server. I’m looking forward to experimenting more with it.

Aside from the missing end to the story, which can be found online anyway, the book is a great read.

The book jacket says that Dreaming In Code is “Not just for technophiles but for anyone captivated by the drama of invention.” I’m not so sure; I don’t think anybody who isn’t at least somewhat involved in software development or project management would find much interest in it. For those of us who are, though, it’s highly recommended.

May 07, 2009

Using List.ForEach() and lambda expressions

In my previous post, I refactored a small bit of the BuildCatalog() method, removing the need for the Catalog to know how to filter out the published posts and pages, and the approved comments.

What we were left with was a pretty simple method, but I think it could be made even more readable with a few tweaks. Here’s the code as it is after my changes:
private static void BuildCatalog()
{
    OnIndexBuilding();

    bool commentsEnabled = BlogSettings.Instance.EnableCommentSearch;

    lock (_SyncRoot)
    {
        _Catalog.Clear();

        foreach (Post post in Post.PublishedPosts)
        {
            AddItem(post);
            if (commentsEnabled)
            {
                foreach (Comment comment in post.ApprovedComments)
                {
                    AddItem(comment);
                }
            }
        }

        foreach (Page page in Page.PublishedPages)
        {
            AddItem(page);
        }
    }

    OnIndexBuild();
}

There are three foreach() blocks working on List<> objects, and within each block there’s a single line of code. That kind of code is just begging to be made a bit more elegant.

The .NET Framework 2.0 added the List.ForEach() method, which allowed for something like:

post.ApprovedComments.ForEach(delegate(Comment comment)
{
    AddItem(comment);
});

Using a delegate, this syntax isn’t substantially better or easier to read. However, toss in a lambda expression instead (introduced with C# 3.0 in .NET Framework 3.5), and ForEach() suddenly becomes juicily delicious:
post.ApprovedComments.ForEach(comment => AddItem(comment));

Yummy. All that blocky wordiness condensed to a simple, readable, one-line statement. Once we sprinkle those into our BuildCatalog() method, we may end up with something like:

private static void BuildCatalog()
{
    OnIndexBuilding();

    lock (_SyncRoot)
    {
        _Catalog.Clear();

        if (BlogSettings.Instance.EnableCommentSearch)
        {
            Post.PublishedPosts.ForEach(post => AddPostAndComments(post));
        }
        else
        {
            Post.PublishedPosts.ForEach(post => AddItem(post));
        }
        Page.PublishedPages.ForEach(page => AddItem(page));
    }

    OnIndexBuild();
}

Much cleaner code, I think. My final version of the Catalog class, after extracting it from the Search.cs file and adding a few more refactorings (like an extension method or two), looks like this:

public static class Catalog
{
    private static List<Entry> entries = new List<Entry>();
    private readonly static object syncRoot = new object();

    internal static List<Entry> Entries
    {
        get { return entries; }
    }

    internal static void AddItem(IPublishable item)
    {
        item.AddToCatalog();
    }

    internal static void Build()
    {
        OnIndexBuilding();
        lock (syncRoot)
        {
            entries.Clear();
            Page.PublishedPages.ForEach(page => page.AddToCatalog());
            if (BlogSettings.Instance.EnableCommentSearch)
            {
                Post.PublishedPosts.ForEach(post => post.AddSelfAndCommentsToCatalog());
            }
            else
            {
                Post.PublishedPosts.ForEach(post => post.AddToCatalog());
            }
        }
        OnIndexBuilt();
    }

    private static void AddSelfAndCommentsToCatalog(this Post post)
    {
        post.AddToCatalog();
        post.ApprovedComments.ForEach(c => c.AddToCatalog());
    }

    private static void AddToCatalog(this IPublishable item)
    {
        var entry = new Entry
        {
            Item = item,
            Title = TextAgent.CleanContent(item.Title, false),
            Content = HttpUtility.HtmlDecode(TextAgent.CleanContent(item.Content, true))
        };

        if (item is Comment)
        {
            entry.Content += HttpUtility.HtmlDecode(TextAgent.CleanContent(item.Author, false));
        }
        entries.Add(entry);
    }

    public static event EventHandler<EventArgs> IndexBuilding;

    private static void OnIndexBuilding()
    {
        if (IndexBuilding != null)
        {
            IndexBuilding(null, EventArgs.Empty);
        }
    }

    public static event EventHandler<EventArgs> IndexBuilt;

    private static void OnIndexBuilt()
    {
        if (IndexBuilt != null)
        {
            IndexBuilt(null, EventArgs.Empty);
        }
    }
}

May 04, 2009

My little baby’s all grown up

In my previous post I began refactoring the BlogEngine.NET Search.cs file, specifically a portion of the “catalog” functionality. I’m going to continue my refactoring here …

But first, a little context. It seems that the searching algorithm (a method called BuildResultSet() ) works on “entries” in a collection called “catalog”, returning a set of “results”. An Entry is a struct defined right in the Search.cs file:

/// <summary>
/// A search optimized post object cleansed from HTML and stop words.
/// </summary>
internal struct Entry
{
    /// <summary>The post object reference</summary>
    internal IPublishable Item;

    /// <summary>The title of the post cleansed for stop words</summary>
    internal string Title;

    /// <summary>The content of the post cleansed for stop words and HTML</summary>
    internal string Content;
}

A Result is actually a completely separate class, contained also in the Search.cs file:

/// <summary>
/// A result is a search result which contains a post and its ranking.
/// </summary>
internal class Result : IComparable<Result>
{
    /// <summary>
    /// The rank of the post based on the search term. The higher the rank, the higher the post is in the result set.
    /// </summary>
    internal int Rank;

    /// <summary>
    /// The post of the result.
    /// </summary>
    internal IPublishable Item;

    /// <summary>
    /// Compares the current object with another object of the same type.
    /// </summary>
    /// <param name="other">An object to compare with this object.</param>
    /// <returns>
    /// A 32-bit signed integer that indicates the relative order of the objects being compared:
    /// less than zero if this object is less than the other parameter, zero if this object is
    /// equal to other, and greater than zero if this object is greater than other.
    /// </returns>
    public int CompareTo(Result other)
    {
        return other.Rank.CompareTo(Rank);
    }

    public override int GetHashCode()
    {
        return Item.Id.GetHashCode();
    }
}

Catalog, meanwhile, is defined as a private static field:

private static Collection<Entry> _Catalog = new Collection<Entry>();

Each of these items has a few methods set aside for it, such as “BuildCatalog()”, “AddItem()”, “CleanContent()”, and so forth. These pieces of code are all folded up and hidden away within regions in the Search.cs file.

The searching itself is done in a few methods, primarily BuildResultSet():

/// <summary>
/// Builds the results set and ranks it.
/// </summary>
private static List<Result> BuildResultSet(string searchTerm, bool includeComments)
{
    List<Result> results = new List<Result>();
    string term = CleanContent(searchTerm.ToLowerInvariant().Trim(), false);
    string[] terms = term.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    string regex = string.Format(System.Globalization.CultureInfo.InvariantCulture, "({0})", string.Join("|", terms));

    foreach (Entry entry in _Catalog)
    {
        Result result = new Result();
        if (!(entry.Item is Comment))
        {
            int titleMatches = Regex.Matches(entry.Title, regex).Count;
            result.Rank = titleMatches * 20;

            int postMatches = Regex.Matches(entry.Content, regex).Count;
            result.Rank += postMatches;

            int descriptionMatches = Regex.Matches(entry.Item.Description, regex).Count;
            result.Rank += descriptionMatches * 2;
        }
        else if (includeComments)
        {
            int commentMatches = Regex.Matches(entry.Content + entry.Title, regex).Count;
            result.Rank += commentMatches;
        }

        if (result.Rank > 0)
        {
            result.Item = entry.Item;
            results.Add(result);
        }
    }

    results.Sort();
    return results;
}

So, where am I going with this? I’m glad you asked. :-)

It seems to me that this one file, Search.cs, is trying to do too many complex things. It’s defining several entities (Entry, Result, Catalog), managing them (BuildCatalog(), BuildResultSet()), searching them (APMLMatches(), Hits()), manipulating them (CleanContent(), FindRelatedItems(), ResultToPost()), and handling events (Post_Saved(), Post_CommentAdded(), OnSearching()). It’s probable that the search functionality started out pretty simple, and grew more complex over time.

When that happens to our code, we should practice a little “ruthless refactoring” and extract these groups of functionality into their own classes where appropriate. In other words, when your baby methods grow up, make them move out and get their own home. In this way, we can keep the “Search” class concerned with searching, not maintaining the content catalog or sanitizing the result’s text. I don’t think enough of that was done here, so I’m going to do it.

The first thing I did when I opened up this file (even before what I did in the previous post), was extract the “catalog”, and give it some freedom. My new Catalog.cs, all grown up now (and with a few more refactorings which I’ll explain later), looks like this:

public static class Catalog
{
    private static List<Entry> entries = new List<Entry>();
    private readonly static object syncRoot = new object();

    internal static List<Entry> Entries
    {
        get { return entries; }
    }

    internal static void AddItem(IPublishable item)
    {
        item.AddToCatalog();
    }

    internal static void Build()
    {
        OnIndexBuilding();
        lock (syncRoot)
        {
            entries.Clear();
            Page.PublishedPages.ForEach(page => page.AddToCatalog());

            bool commentSearchEnabled = BlogSettings.Instance.EnableCommentSearch;
            if (commentSearchEnabled)
            {
                Post.PublishedPosts.ForEach(post => post.AddSelfAndCommentsToCatalog());
            }
            else
            {
                Post.PublishedPosts.ForEach(post => post.AddToCatalog());
            }
        }
        OnIndexBuilt();
    }

    private static void AddSelfAndCommentsToCatalog(this Post post)
    {
        post.AddToCatalog();
        post.ApprovedComments.ForEach(c => c.AddToCatalog());
    }

    private static void AddToCatalog(this IPublishable item)
    {
        var entry = new Entry
        {
            Item = item,
            Title = TextAgent.CleanContent(item.Title, false),
            Content = HttpUtility.HtmlDecode(TextAgent.CleanContent(item.Content, true))
        };

        if (item is Comment)
        {
            entry.Content += HttpUtility.HtmlDecode(TextAgent.CleanContent(item.Author, false));
        }
        entries.Add(entry);
    }

    public static event EventHandler<EventArgs> IndexBuilding;

    private static void OnIndexBuilding()
    {
        if (IndexBuilding != null)
        {
            IndexBuilding(null, EventArgs.Empty);
        }
    }

    public static event EventHandler<EventArgs> IndexBuilt;

    private static void OnIndexBuilt()
    {
        if (IndexBuilt != null)
        {
            IndexBuilt(null, EventArgs.Empty);
        }
    }
}

I’ll apply the same type of extraction to other pieces of Search.cs, leaving each of the individual pieces simpler, more single-purpose, and ultimately (I hope) easier to understand and maintain.

Coming up next, I take advantage of some .NET 3+ functionality to further refine the code.