Saturday, May 7, 2011

SharePoint 2010 Metadata-based Tag Cloud Web Part

I recently had the opportunity to create a search-based tag cloud web part for SharePoint 2010. This blog post is a walkthrough of the code I wrote for it. Note that my business requirements called for the metadata tags to be pulled directly from a list. The code could easily be modified, however, to pull the tags from another location. You could, for example, use LINQ or search itself to pull the data.

I am going to jump right into the code and provide notes along the way.

Here are the using statements that you will need:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Web.UI;
using System.Web.UI.WebControls.WebParts;
using Microsoft.SharePoint;


Below are the namespace and the class declaration. As with virtually every web part I create, the class inherits from WebPart:

namespace My.WebParts
{
    [ToolboxItemAttribute(true)]
    public class TagCloud : WebPart
    {


I'm going to start by creating two dictionaries. The first dictionary is sorted and holds the words that will appear in the tag cloud (string key) along with the number of occurrences of each word (int value). Tag clouds are generally in alphabetical order, which is why I use a sorted dictionary for this one.

The second dictionary stores the id associated with each tag. These are the actual metadata ids stored in the SharePoint database. I need them because clicking on a tag should display the search results for that tag. We are going to use a refinement parameter in search (more on that later), and this requires the metadata id.

        private SortedDictionary<string, int> dict = new SortedDictionary<string, int>();
        private Dictionary<string, string> dictTagIds = new Dictionary<string, string>();
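To make the ordering behavior concrete, here is a minimal standalone sketch. The class name SortedTagsSketch and the sample tags are made up for illustration and are not part of the web part. It counts tags into a SortedDictionary and returns the keys in enumeration order, which is alphabetical regardless of the order in which the tags were added.

```csharp
using System.Collections.Generic;

public static class SortedTagsSketch
{
    // Counts each tag and returns the keys in the order the dictionary enumerates them.
    public static List<string> KeysInEnumerationOrder(IEnumerable<string> tags)
    {
        SortedDictionary<string, int> counts = new SortedDictionary<string, int>();
        foreach (string tag in tags)
        {
            int current;
            counts.TryGetValue(tag, out current);   // current stays 0 for a tag not seen yet
            counts[tag] = current + 1;
        }
        return new List<string>(counts.Keys);
    }
}
```

The default comparer for string keys is culture-sensitive. If tags that differ only by case should be treated as one tag, passing StringComparer.OrdinalIgnoreCase to the SortedDictionary constructor is one option.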


Next I will declare the web part properties that can be set by the SharePoint user or administrator that creates the web part.

The first property is the name of the list that contains the metadata column to be used in the tag cloud. I agree that this approach is somewhat limiting, but this was part of my business requirements. Please keep in mind that it is a rather simple change to have the data come from a SharePoint LINQ query or from Search itself.

        [WebBrowsable(true),
        Category("Custom Properties"),
        Personalizable(PersonalizationScope.Shared),
        DefaultValue("My List"),
        WebDisplayName("List Name"),
        WebDescription("Name of the list that contains the metadata column to be used in the tag cloud")]
        public string listName { get; set; }


The second property tells the code which column on the list holds the metadata to be used in the tag cloud. The code doesn't check that it is actually a metadata column, but you could easily add a check to make sure that the user enters a valid metadata column on the list. In fact, a nice enhancement would be to make it a drop-down that shows the valid metadata columns on the selected list:

   [WebBrowsable(true),
   Category("Custom Properties"),
   Personalizable(PersonalizationScope.Shared),
   DefaultValue("Tag"),
   WebDisplayName("Column Name"),
   WebDescription("Name of the metadata column to be used in the tag cloud")]
   public string fieldName { get; set; }


The third property sets the maximum tags to display in the tag cloud. I don't set an overall limit regardless of what the user enters, but that should probably be there as well:

   [WebBrowsable(true),
   Category("Custom Properties"),
   Personalizable(PersonalizationScope.Shared),
   DefaultValue(50),
   WebDisplayName("Maximum Tags to Display"),
   WebDescription("Enter the maximum number of tags to display")]
   public int maxNumberOfTags { get; set; }
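The overall limit mentioned above could look something like the following sketch. The names TagLimitSketch, HardTagLimit, and DefaultTagCount, and the values 200 and 50, are assumptions chosen for illustration; the original web part has no such cap.

```csharp
using System;

public static class TagLimitSketch
{
    // Hypothetical ceiling and fallback; neither constant exists in the original code.
    public const int HardTagLimit = 200;
    public const int DefaultTagCount = 50;

    // Returns a usable tag count no matter what was typed into the property pane.
    public static int EffectiveTagCount(int requested)
    {
        if (requested <= 0)
            return DefaultTagCount;                 // unset or nonsensical value
        return Math.Min(requested, HardTagLimit);   // never exceed the ceiling
    }
}
```

With a helper like this, the Take call in GetMetadataValues would receive EffectiveTagCount(maxNumberOfTags) instead of the raw property value.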


The fourth property is the search scope to use when you click on a tag and display search results. Again, I don't validate this, but you could easily add that:

   [WebBrowsable(true),
   Category("Custom Properties"),
   Personalizable(PersonalizationScope.Shared),
   DefaultValue("MySearchScope"),
   WebDisplayName("Search Scope"),
   WebDescription("The Search Scope to be used when clicking on a tag in the tag cloud")]
   public string searchScope { get; set; }


The fifth property is the relative URL of the search results page to show when you click on a tag. We use a custom search results page for the tag cloud, so this was necessary:

   [WebBrowsable(true),
   Category("Custom Properties"),
   Personalizable(PersonalizationScope.Shared),
   DefaultValue("../SearchCenter/Pages/Results.aspx"),
   WebDisplayName("Search Center Relative URL"),
   WebDescription("The relative URL of the search results page to use when clicking on a tag in the tag cloud")]
   public string searchCenterResultsPageUrl { get; set; }


The final two properties let the user set the minimum and maximum font sizes to use in the tag cloud:

   [WebBrowsable(true),
   Category("Custom Properties"),
   Personalizable(PersonalizationScope.Shared),
   DefaultValue(10),
   WebDisplayName("Minimum Tag Font Size"),
   WebDescription("The minimum size font to use in the tag cloud")]
   public int minFontSize { get; set; }


   [WebBrowsable(true),
   Category("Custom Properties"),
   Personalizable(PersonalizationScope.Shared),
   DefaultValue(35),
   WebDisplayName("Maximum Tag Font Size"),
   WebDescription("The maximum size font to use in the tag cloud")]
   public int maxFontSize { get; set; }
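One thing worth knowing about the DefaultValue attributes above: in .NET, DefaultValueAttribute is metadata only and does not assign anything to the property. Because the class as posted has no constructor, my reading is that the properties start out as null or 0 until someone edits the web part. The sketch below uses two made-up classes (not part of the web part) to show the difference between relying on the attribute alone and also assigning the value in a constructor.

```csharp
using System.ComponentModel;

// Relies on the attribute alone: the property keeps the CLR default of 0.
public class AttributeOnlySettings
{
    [DefaultValue(50)]
    public int MaxNumberOfTags { get; set; }
}

// Assigns the same value in the constructor so a fresh instance is usable.
public class InitializedSettings
{
    [DefaultValue(50)]
    public int MaxNumberOfTags { get; set; }

    public InitializedSettings()
    {
        MaxNumberOfTags = 50;
    }
}
```

Applying the same idea to the web part would mean adding a TagCloud constructor that sets each property to the value named in its DefaultValue attribute.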


Override the CreateChildControls method and call the methods that do the grunt work:

   protected override void CreateChildControls()
   {
       GetMetadataValues();
       Controls.Add(new LiteralControl(BuildTagCloudHtml()));
   }


The GetMetadataValues method goes out to the list that was specified in the web part properties and gets the metadata from the appropriate column. It then uses LINQ to count the occurrences of each tag. Once everything is counted and sorted, I trim off the least frequent tags based on the maximum set in the web part property. Finally, the tags and their counts are put into the sorted dictionary. If you wanted to change the web part to pull from a search query instead of a list, this is where you would do it.

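        private void GetMetadataValues()
        {
            // SPContext owns this SPWeb, so it is not wrapped in a using block or disposed here.
            SPWeb web = SPContext.Current.Site.RootWeb;
            try
            {
                SPList queryList = web.Lists[listName];
                SPListItemCollection items = queryList.GetItems();

                List<string> words = new List<string>();
                foreach (SPListItem item in items)
                {
                    // Skip items that have no value in the metadata column.
                    object fieldValue = item[fieldName];
                    if (fieldValue == null)
                        continue;

                    // Each tag is expected in the form Label|Guid, with multiple tags separated by semicolons.
                    string[] tags = fieldValue.ToString().Split(';');
                    foreach (string tag in tags)
                    {
                        string[] tagLabelAndGuid = tag.Split('|');
                        if (tagLabelAndGuid.Length < 2)
                            continue;

                        string tagLabel = tagLabelAndGuid[0];
                        string tagGuid = tagLabelAndGuid[1];
                        words.Add(tagLabel);
                        if (!dictTagIds.ContainsKey(tagLabel))
                            dictTagIds.Add(tagLabel, tagGuid);
                    }
                }

                // Count how many times each tag label occurs.
                var wordCount =
                    from word in words
                    group word by word into g
                    select new { g.Key, Count = g.Count() };

                // Keep only the most frequent tags, up to the configured maximum.
                var trimmedWordCount =
                    (from word in wordCount
                     orderby word.Count descending
                     select word).Take(maxNumberOfTags);

                foreach (var stuff in trimmedWordCount)
                {
                    dict.Add(stuff.Key, stuff.Count);
                }
            }
            catch (Exception ex)
            {
                // One constant keeps the source that is checked, created, and written to identical.
                const string eventSource = "TagCloudWebPart";
                if (!EventLog.SourceExists(eventSource))
                    EventLog.CreateEventSource(eventSource, "Application");
                EventLog.WriteEntry(eventSource, ex.Message, EventLogEntryType.Error);
            }
        }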
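The parsing and counting steps can also be tried outside SharePoint. The following is a minimal sketch under the assumption that each raw field value looks like "Label|Guid;Label|Guid", as the Split calls in the method imply. TagCountSketch and CountTags are hypothetical names, and the raw strings stand in for what item[fieldName].ToString() would return.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TagCountSketch
{
    // Counts labels found in raw values such as "Label A|guid-1;Label B|guid-2".
    // Null, empty, or malformed entries are skipped instead of throwing.
    public static SortedDictionary<string, int> CountTags(
        IEnumerable<string> rawValues, int maxTags, IDictionary<string, string> tagIds)
    {
        List<string> labels = new List<string>();
        foreach (string raw in rawValues)
        {
            if (string.IsNullOrEmpty(raw))
                continue;
            foreach (string tag in raw.Split(';'))
            {
                string[] parts = tag.Split('|');
                if (parts.Length < 2)
                    continue;
                string label = parts[0].Trim();
                labels.Add(label);
                if (!tagIds.ContainsKey(label))
                    tagIds.Add(label, parts[1].Trim());
            }
        }

        // Keep the maxTags most frequent labels; the sorted dictionary then
        // puts the survivors back into alphabetical order.
        var top = labels.GroupBy(l => l)
                        .Select(g => new { Label = g.Key, Count = g.Count() })
                        .OrderByDescending(x => x.Count)
                        .Take(maxTags);

        SortedDictionary<string, int> result = new SortedDictionary<string, int>();
        foreach (var entry in top)
            result.Add(entry.Label, entry.Count);
        return result;
    }
}
```

Keeping this logic free of SharePoint types is a design choice for the sketch only: it makes the counting testable on its own, and a search-based data source could feed the same method.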


Once I have all the data, I build the HTML that sets the relative font sizes for the tag cloud. Note that each tag cloud term has a link attached to it. Using the scope and custom search page specified in the web part properties, I add parameters to the search link that narrow the search results to only the items tagged with the metadata term that was clicked. This is the "r" parameter (for refinement) in the search results URL. The r parameter must be URL-encoded and requires both the owstaxId field name and the GUID of the tag (which is why I retrieved that in the last step).

        private string BuildTagCloudHtml()
        {
            StringBuilder htmlString = new StringBuilder();

            int minVal = FindDictionaryMinValue(dict);
            int maxVal = FindDictionaryMaxValue(dict);

            if (dict.Count > 0)
            {
                htmlString.Append("<p><center>");
                foreach (var word in dict)
                {
                    // If every tag has the same count, the log range is zero; fall back to the minimum font size.
                    double weight = (maxVal > minVal)
                        ? (Math.Log(word.Value) - Math.Log(minVal)) / (Math.Log(maxVal) - Math.Log(minVal))
                        : 0.0;
                    int fontsize = (int)(minFontSize + Math.Round((maxFontSize - minFontSize) * weight));

                    string guid;
                    dictTagIds.TryGetValue(word.Key, out guid);
                    string rParameter = "\"owstaxId" + fieldName + "\"=#" + guid + ":\"" + word.Key + "\"";
                    rParameter = "&r=" + System.Web.HttpUtility.UrlEncode(rParameter);
                    // URL-encode the query values and HTML-encode the visible text so that
                    // tags containing spaces, ampersands, or quotes do not break the link.
                    string kParameter = "?k=" + System.Web.HttpUtility.UrlEncode(word.Key);
                    string sParameter = "&s=" + System.Web.HttpUtility.UrlEncode(searchScope);
                    string encodedLabel = System.Web.HttpUtility.HtmlEncode(word.Key);
                    string title = "\" title=\"" + encodedLabel;
                    string style = "\" style=\"font-size:" + fontsize + "pt\">";

                    htmlString.Append("<a href=\"");
                    htmlString.Append(searchCenterResultsPageUrl);
                    htmlString.Append(kParameter);
                    htmlString.Append(sParameter);
                    htmlString.Append(rParameter);
                    htmlString.Append(title);
                    htmlString.Append(style);
                    htmlString.Append(encodedLabel);
                    htmlString.Append("</a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ");
                }
                htmlString.Append("</center></p>");
            }
            else
            {
                htmlString.Append("<p>No Results</p>");
            }

            return htmlString.ToString();
        }
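To see what the finished link looks like, here is a small standalone sketch of the URL assembly. TagLinkSketch and its method names are hypothetical, and the sketch swaps in Uri.EscapeDataString for System.Web.HttpUtility.UrlEncode so that it runs without a reference to System.Web. The two encoders differ in details (for example, a space becomes %20 here rather than +), so treat the exact output as illustrative.

```csharp
using System;
using System.Text;

public static class TagLinkSketch
{
    // Same shape as the r parameter in the post: "owstaxId<Field>"=#<guid>:"<label>"
    public static string BuildRefinementValue(string fieldName, string termGuid, string label)
    {
        return "\"owstaxId" + fieldName + "\"=#" + termGuid + ":\"" + label + "\"";
    }

    // Assembles the results page URL with the k, s, and r query parameters, each URL-encoded.
    public static string BuildSearchUrl(string resultsPageUrl, string scope,
        string fieldName, string termGuid, string label)
    {
        StringBuilder url = new StringBuilder(resultsPageUrl);
        url.Append("?k=").Append(Uri.EscapeDataString(label));
        url.Append("&s=").Append(Uri.EscapeDataString(scope));
        url.Append("&r=").Append(
            Uri.EscapeDataString(BuildRefinementValue(fieldName, termGuid, label)));
        return url.ToString();
    }
}
```

Calling BuildSearchUrl with a results page, a scope, the column name, a tag GUID, and the tag label returns the href value for one tag. The quotes, equals sign, hash, and colon in the refinement value are all percent-encoded, which is what keeps the hash from being read as a URL fragment.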


There are two helper methods that calculate the minimum value and the maximum value in the sorted dictionary:

        private int FindDictionaryMaxValue<T, U>(SortedDictionary<T, U> enumerable)
        {
            int maxVal = int.MinValue;

            foreach (KeyValuePair<T, U> pair in enumerable)
            {
                int curVal = Convert.ToInt32(pair.Value);
                if (curVal > maxVal)
                {
                    maxVal = curVal;
                }
            }

            return maxVal;
        }

        private int FindDictionaryMinValue<T, U>(SortedDictionary<T, U> enumerable)
        {
            int minVal = int.MaxValue;

            foreach (KeyValuePair<T, U> pair in enumerable)
            {
                int curVal = Convert.ToInt32(pair.Value);
                if (curVal < minVal)
                {
                    minVal = curVal;
                }
            }

            return minVal;
        }
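Since the dictionary values are already ints, the same minimum and maximum can be obtained with LINQ. This is an alternative sketch, not what the web part uses; CountRangeSketch and TryGetRange are made-up names. It also reports an empty dictionary explicitly instead of returning int.MaxValue and int.MinValue.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountRangeSketch
{
    // Returns false for an empty dictionary; otherwise sets min and max to the
    // smallest and largest counts.
    public static bool TryGetRange(SortedDictionary<string, int> counts, out int min, out int max)
    {
        min = 0;
        max = 0;
        if (counts.Count == 0)
            return false;

        min = counts.Values.Min();
        max = counts.Values.Max();
        return true;
    }
}
```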


Finally, close the class and the namespace:

    }
}


A few additional notes:
  • The font sizes for the tag cloud are calculated using a logarithmic method that tries to evenly distribute the font sizes across the collection of tags. You may want to change this depending on your expected data distribution. I think, however, that the formula does a very nice job of evenly distributing things.
  • The metadata column on the list and the metadata field in the store are assumed to have the same name in this code. This was based on the business requirements I was given. You may want to remove that dependency rather than follow this code exactly.
  • The owstaxId portion of the field name is hard-coded in the r parameter. This prefix is set up automatically for every metadata field created. See this post for details: http://msdn.microsoft.com/en-us/library/ff625182.aspx
  • This code was set up and tested using enterprise search on SharePoint 2010. No other versions were tested with this code.
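The logarithmic font-size calculation from the first note can be tried in isolation with the sketch below. FontScaleSketch and FontSizeFor are hypothetical names. The formula is the one from BuildTagCloudHtml, with one added assumption: when every tag has the same count, the sketch returns the minimum font size rather than dividing zero by zero.

```csharp
using System;

public static class FontScaleSketch
{
    // Maps a tag count onto the font range using a log scale.
    // Counts are expected to be 1 or greater, as produced by grouping.
    public static int FontSizeFor(int count, int minCount, int maxCount, int minFont, int maxFont)
    {
        // When all tags share one count the log range is zero; avoid 0/0.
        if (maxCount <= minCount)
            return minFont;

        double weight = (Math.Log(count) - Math.Log(minCount))
                      / (Math.Log(maxCount) - Math.Log(minCount));
        return (int)(minFont + Math.Round((maxFont - minFont) * weight));
    }
}
```

With counts ranging from 1 to 100 and fonts from 10pt to 30pt, a tag used 10 times lands at the midpoint, 20pt, because 10 is halfway between 1 and 100 on a log scale. A linear scale would put the same tag at roughly 12pt, which is why the log version spreads sizes more evenly when a few tags dominate.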
Enjoy! And please post comments if you have some thoughts about it or suggestions for improvement.

4 comments:

  1. Hi Doug,

    Very nice article. This is what I was also looking for. But I have a question. You are passing listName in GetMetadataValues(). I want to pass all the lists there, and I don't know how many lists there could be. How can I do that?

    -Ankit

  2. Hi Ankit. Thanks for your comment. Assuming you could identify which lists you need to query, the first option that comes to mind is to loop through each of the lists, query it, and append the result set of each list query to the SPListItemCollection items.

  3. Hi Doug,

    Does this web part work in SharePoint Foundation?

  4. Hi Alexandre, thanks for your question. Unfortunately, I don't know if it works in Foundation, as I haven't worked with anything other than SharePoint Server. I suspect that it might not, given the way it uses search.
