Martin Camitz - Cosmik Debris

This is a code blog, mostly. Some posts about Zero Hate, my project for battling eHate. Some Dojo, some ASP.NET MVC.

Apr 29, 2013

With my new survey software suite, a survey is as easy to enter as

What did you drink last night?
o Gin/Tonic
o Margarita
o Cosmopolitan
o Other: []

My clients will be able to enter surveys with ease and mostly without my involvement. They will be able to share surveys by email for review.

Now all I have to do is get it out of my brain and into a compiler. But why not?

I'm picturing something like Markdown that outputs an intermediate XML representation in an update-as-you-type editor, maybe online. With some metadata in the header, you'd be able to publish it to a Dropbox folder, just as I'm publishing this blog post.

It's the representation at the core of this that I want to talk about today. I propose a new representation of a survey. I'm calling it Flamingo.

Background

I sometimes dabble in surveys, in particular RDS surveys. I wrote my own survey engine many years back. The research team later replaced it with a modified version of LimeSurvey.

LimeSurvey is very nice, but it's stuck in an inflexible model, one it shares with many other engines and XML representations I've seen (1, 2, 3, 4). Sorry for ranting about LimeSurvey; it just happens to be in the line of fire.

There are the database tables. Questions. Answers. Question attributes. Conditions.

There are the question types. Radio list. Dropdowns. Yes/no. Text. Numeric. LimeSurvey offers several different types of array questions, 20 or so different types altogether. LimeSurvey calls this "vast". Nonetheless, I've had to add three question types and hack five.

So there's one motivation for rethinking the whole thing. I'm not proposing a presentation engine that will never need recoding for different needs and purposes, but I most certainly want to separate the presentation from the survey content and logic.

This last week I've been dealing with some pretty complex stuff involving a country selector. We ultimately had to find LimeSurvey-conforming solutions to it. Hacking the model and presentation was too big an operation.

What I propose is not only a brand new representation of a survey but a whole new way of thinking about a survey from the bottom up, including presentation and interpretation of responses.

The key advantages are:

  • High flexibility
  • Partial compliance with HTML
  • Minimum dependence between survey, response, presentation and interpretation of responses
  • Redundant referencing of survey elements
  • Cross references of survey elements
  • Repetition, validation and other survey logic included
  • Dynamically generated survey elements

Start

Let's give it a go then. Let's go way back. What's the atomic building block of a survey? The option. I propose this is a survey.

<option>Gin/Tonic</option>
<option>Margarita</option>
<option>Cosmopolitan</option>
<option>Steak</option>
<option>Lobster</option>
<option>All of the above</option>

That's completely answerable. The questionnaire is interpretable and so are the responses, given the right context, which may be provided as a title or verbally, for all I know. No assumptions are made, particularly about which or how many choices are possible together, nor about what question is actually being asked. The response recorded may be just a table with entries identifying the user and the options selected.

Just for the sake of it, let's impose some implicit logic and provide the user with some context.

<group>
    <label>What did you drink last night?</label>
    <option>Gin/Tonic</option>
    <option>Margarita</option>
    <option>Cosmopolitan</option>
</group>
<group>
    <label>What did you eat last night?</label>
    <option>Steak</option>
    <option>Lobster</option>
    <option>All of the above</option>
</group>

This is a refreshing change of perspective for me. All these years I've been thinking of a survey as a list of questions. What I've come to realize is that it's really a list of options. The question is just metadata, and so is the grouping and any explicit or implicit logic.

This is my basic proposition for a new way of representing a survey - with XML, using two basic elements, options and groups.

Before we move on, though, let's change the names of those two elements. The thing is, compliance with HTML turns out to be a good idea. It makes building a presentation engine based on this scheme much easier. First, most of the content can be displayed straight off. Second, for those who know HTML, the documentation is mostly implicit.

I've come up with a set of principles that seem to work rather well for me. Partial compliance with HTML is the second. Hence I'm calling groups div and options input. Ignore the fact that inputs, if they really were HTML, would have no content. Ignore also the fact that input elements without any other attributes produce textboxes.

Basic principles

The complete set of basic principles for Flamingo is as follows.

  • Two atomic elements that completely define a questionnaire
  • Comply with HTML when possible
  • Always assume multiple choice options
  • All the elements are identifiable by their XPATH
  • All contents of label attributes and elements act as resource keys when URL-encoded

Incidentally, the second to last principle is implicit in XML.

We've covered two of these principles, although their merit may be unclear for the moment. Let's move on to the third.

Assume multiple choice

The default input is a checkbox. Heh? Bear with me, I just want to change the default.

Your presentation engine's first order of business is hence to replace

<input>Gin/Tonic</input>

with

<input type="checkbox"><label>Gin/Tonic</label>
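A minimal sketch of that first step in Python, using the standard library's xml.etree.ElementTree. Python is my choice here, since the spec names no implementation language, and the name normalize_inputs is hypothetical. It keeps any explicit type, skips empty inputs (which stay textboxes) and places the label right after the input, as in the replacement above.

```python
import xml.etree.ElementTree as ET


def normalize_inputs(root: ET.Element) -> None:
    """Rewrite Flamingo's <input>text</input> as an HTML-style input plus label.

    Hypothetical helper: inputs without a type become checkboxes, the
    Flamingo default; empty inputs are left alone and stay textboxes.
    """
    parents = list(root.iter())  # snapshot, since we insert while walking
    for parent in parents:
        # Walk children back to front so inserts don't shift pending indices.
        for index, child in reversed(list(enumerate(parent))):
            text = (child.text or "").strip()
            if child.tag != "input" or not text:
                continue
            label = ET.Element("label")
            label.text = text
            label.tail = child.tail
            child.text = None
            child.tail = None
            child.set("type", child.get("type", "checkbox"))
            parent.insert(index + 1, label)


survey = ET.fromstring("<div><label>Drink:</label><input>Gin/Tonic</input></div>")
normalize_inputs(survey)
print(ET.tostring(survey, encoding="unicode"))
```

Placing the label as a sibling mirrors the HTML above; an engine could just as well nest the input inside the label.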

Not so long ago, form elements came in three flavors: text (including text areas), single choice (radios and dropdowns) and multiple choice (checkboxes). This shaped our thinking of what a survey was and what it could do. In effect, it constrains research. HTML5 specifies a wealth of new form elements, and we've been introduced to many more in the form of custom widgets. Rather than expand the model significantly (infinitely), the proposition is to narrow it down to the basics.

In my version, the basic questionnaire is built from one basic element, the option. And you can select anything! That means checkboxes. Constraints and validation come on top of this. They're context. That context could easily be provided by someone looking over your shoulder and correcting you. I'm not saying it should be that way, but that's the basic starting point.

Now let's break the mold.

<label>What did you have for dinner last night?</label>
<div>
    <label>Drink:</label>
    <input>Gin/Tonic</input>
    <input>Margarita</input>
    <input>Cosmopolitan</input>
    <input>Tequila sunrise</input>
</div>
<div>
    <label>Eat:</label>
    <input>Steak</input>
    <input>Lobster</input>
    <input>I don't know</input>
</div>

What are those? Questions? Subquestions? Any survey model that relies on the question as the basic atomic element will run into an "oops" here. I could probably hack this specific layout into LimeSurvey without altering the model, but for each variation I'd be doing more hacking. And for LimeSurvey that means PHP. Ugh.

Easy for me to say, I'm specifying the representation. You're coding the presentation engine, and I'm saying you have to be super adaptive. But I'm going to make life as easy as I can for you. Be like your browser. It can present infinite variations of HTML, even faulty HTML. I'll partially comply with HTML so you can display almost everything with a minimum of regular-expression text manipulation. And since the input is the basic atomic element, what you send back to the server as a response will be just those. If your engine can't display the survey exactly the way the author intended, fall back to displaying the unformatted HTML straight off (just remember to replace the inputs appropriately) and post the responses as is.

Time, apparently, to show you what a response might look like.

Responses and the XPATH

The response database table might look like this. It's a recommendation and not really part of the specification I propose.

respondent  timestamp  xpath              npath              value  content
1           123456     "div[0]/input[3]"  "div[0]/input[3]"  ""     "Tequila sunrise"
1           123556     "div[1]/input[0]"  "div[1]/input[0]"  ""     "Steak"
1           123756     "div[1]/input[1]"  "div[1]/input[1]"  ""     "Lobster"

The complete XPATH is noted along with the value. The value may be a textbox value or something from a custom widget. This means the interpretation of responses depends on the order and layout of the survey. That's a concern for interpreting the results, in my opinion. However, if you need more control, add a name attribute to the input element. The last entry above may then read:

1           123756     "div[1]/input[1]"  "div[1]/input[@name='lobster']"  ""     "Lobster"

That's NPATH, with the N for name. Using either the XPATH or the NPATH, you can reference the element in both your survey XML and the rendered questionnaire. This is also key: it helps with server-side validation and with cross referencing at the interpretation stage.
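A sketch of how an engine might compute both paths for every option in a survey template. It follows the 0-based, same-tag sibling numbering used in the table above (standard XPath counts from 1). The <survey> root wrapping the fragment and the function name option_paths are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def option_paths(root: ET.Element) -> list[tuple[str, str, str]]:
    """Return (xpath, npath, content) for every <input> below root.

    Indices are 0-based and count only siblings with the same tag, as in
    the post. The NPATH replaces a step's index with [@name='...'] when
    the element has a name attribute.
    """
    results = []

    def walk(element, xpath, npath):
        seen = {}  # tag -> number of earlier siblings with that tag
        for child in element:
            index = seen.get(child.tag, 0)
            seen[child.tag] = index + 1
            step = f"{child.tag}[{index}]"
            name = child.get("name")
            named_step = f"{child.tag}[@name='{name}']" if name else step
            child_xpath = xpath + [step]
            child_npath = npath + [named_step]
            if child.tag == "input":
                content = (child.text or "").strip()
                results.append(("/".join(child_xpath), "/".join(child_npath), content))
            walk(child, child_xpath, child_npath)

    walk(root, [], [])
    return results


survey = ET.fromstring(
    "<survey><label>What did you have for dinner last night?</label>"
    "<div><label>Drink:</label><input>Gin/Tonic</input><input>Tequila sunrise</input></div>"
    "<div><label>Eat:</label><input>Steak</input><input name='lobster'>Lobster</input></div>"
    "</survey>"
)
for row in option_paths(survey):
    print(row)
```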

But wasn't the objective to reduce dependency between response and representation? Yes. Firstly, the rightmost column, with the content, is still there. It may be all you need. The additional columns are there to aid you in the interpretation.

Secondly, while the responses and their interpretation depend on the survey layout, the thing to note is that the response model does not. This is key.

This is one of the things I address with the XPATH/NPATH/value/content redundancy. When you alter the survey, you can compare the XPATH with the NPATH, trace back through the response table and deduce which option was chosen. To aid you, you have the complete version history of the survey, don't you?
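One way to do that back-tracing, sketched under assumptions of my own rather than anything the spec fixes: prefer a named step, since a name survives reordering, then fall back to a unique match on content, and flag everything else for a human with the version history at hand. Rows are (xpath, npath, content) tuples; remap_response is a hypothetical name.

```python
from __future__ import annotations

Row = tuple[str, str, str]  # (xpath, npath, content), as in the response table


def last_step(path: str) -> str:
    """The final location step, e.g. "input[@name='lobster']"."""
    return path.rsplit("/", 1)[-1]


def remap_response(old: Row, new_options: list[Row]) -> str | None:
    """Find which option of a revised survey an old response refers to.

    Returns the option's XPATH in the new version, or None when the
    response is ambiguous or its option was deleted.
    """
    _, old_npath, old_content = old
    if "@name=" in last_step(old_npath):
        # Names survive reordering, so trust them first.
        for new_xpath, new_npath, _ in new_options:
            if last_step(new_npath) == last_step(old_npath):
                return new_xpath
    matches = [xpath for xpath, _, content in new_options if content == old_content]
    return matches[0] if len(matches) == 1 else None


new_version = [
    ("div[0]/input[0]", "div[0]/input[0]", "Steak"),
    ("div[0]/input[1]", "div[0]/input[@name='lobster']", "Lobster"),
    ("div[1]/input[0]", "div[1]/input[0]", "Gin/Tonic"),
]
print(remap_response(("div[1]/input[1]", "div[1]/input[@name='lobster']", "Lobster"), new_version))
```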

Reorder, add or delete questions, divide or merge questions, duplicate a question... Run two versions of the survey in parallel. You can design a response server that records data in whatever way you deem sufficient for interpretation.

I don't consider this more than a recommendation. Exactly what you choose to record is entirely up to you. You may even submit an XML representation of the response, if you wish.

Timestamp? Again, a recommendation. This is just where we are today regarding what data is. The age when a response was a check in a box is over. Valuable information can be gleaned from the behavior of the respondent, including how long he or she takes to respond and how many times an answer is changed. At a minimum, you keep a cross section of the data including only the latest answers.
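The cross section can be derived from the timestamped log. This sketch assumes a hypothetical checked column so that deselections are recorded too (the table above only shows selections). Keeping the full log and reducing it on demand means the behavioral data, such as how often answers change, is never thrown away.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    respondent: int
    timestamp: int
    xpath: str
    checked: bool  # hypothetical column: False records a deselection


def cross_section(events):
    """Reduce a timestamped log to the latest answers plus an event count.

    Returns ({respondent: sorted xpaths selected at the end},
             {respondent: number of recorded events}).
    """
    latest = {}
    changes = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        latest[(event.respondent, event.xpath)] = event.checked
        changes[event.respondent] = changes.get(event.respondent, 0) + 1
    selected = {}
    for (respondent, xpath), checked in latest.items():
        if checked:
            selected.setdefault(respondent, []).append(xpath)
    return {r: sorted(paths) for r, paths in selected.items()}, changes


log = [
    Event(1, 123456, "div[0]/input[3]", True),
    Event(1, 123556, "div[1]/input[0]", True),
    Event(1, 123656, "div[1]/input[0]", False),  # changed their mind about steak
    Event(1, 123756, "div[1]/input[1]", True),
]
print(cross_section(log))
```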

One final thing to note, the engine may reorder everything as you wish. Maybe the author has requested randomized questions. This should not affect the XPATH of the options. The XPATH references the survey XML template, not the rendered display. That includes dynamically generated elements which we'll get to later.

Another example

To further the discussion, consider the following example. I've introduced a label shorthand here (the label attribute), but that's not the point.

<div>
    <label>Rate the following dishes</label>
    <div label="Steak">
        <input>Hate it</input>
        <input>It's ok</input>
        <input>Love it</input>
    </div>
    <div label="Lobster">
        <input>Hate it</input>
        <input>It's ok</input>
        <input>Love it</input>
    </div>
</div>

It should be apparent how this is intended to be displayed. LimeSurvey calls these array questions, and we've all seen them in customer surveys, rating product experience, for example. It's very similar to the previous example except that the contents of the input elements repeat. The options' contents are to be displayed as column headers; Steak and Lobster are row headers.

However, if the presentation engine does not have support for this particular layout, that's fine too. Just display as is (formatting the inputs, obviously). The survey is entirely legible in either case.
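A sketch of the check an engine could run before choosing the grid layout, with the plain display as the fallback. The rule used here, two or more child divs with identical option contents in the same order, is an assumption; the spec leaves the detection to the engine, and as_grid is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def as_grid(div: ET.Element):
    """Return (column_headers, row_headers) if div reads as an array question.

    Returns None otherwise, in which case the engine falls back to the
    plain display.
    """
    rows = div.findall("div")
    if len(rows) < 2:
        return None
    option_lists = [[(i.text or "").strip() for i in row.findall("input")] for row in rows]
    columns = option_lists[0]
    if not columns or any(options != columns for options in option_lists[1:]):
        return None
    return columns, [row.get("label", "") for row in rows]


question = ET.fromstring(
    "<div><label>Rate the following dishes</label>"
    "<div label='Steak'><input>Hate it</input><input>It's ok</input><input>Love it</input></div>"
    "<div label='Lobster'><input>Hate it</input><input>It's ok</input><input>Love it</input></div>"
    "</div>"
)
print(as_grid(question))
```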

Single choice and multiple choice

As I've said, I consider all input elements checkboxes by default, because that's what I consider a survey to be: options. If you're in Europe, both steak and lobster may not be acceptable at the same time. I consider that validation, and validation can be specified in my XML schema, but the basic assumption is that it is not required. It's metadata, just like questions, and part of a context that may be given in XML, given verbally, or left implicit.

So you may parse type="checkbox" (your default, as opposed to HTML's "text"), type="text" and type="radio". I consider that validation too, and so, apparently, does the HTML5 working group, since they additionally provide type="email", type="date" etc.

Consider radio buttons. For them to be mutually exclusive, they must share the same name. That implies that the following are alike from the presentation engine's standpoint.

<input name="eat">Lobster</input>
<input name="eat">Steak</input>

and

<input type="radio">Lobster</input>
<input type="radio">Steak</input>

We've indicated single choice. There are subtle differences, though. The first may produce a dropdown or any other single choice input method of your devising, while the second definitely hints at radio buttons. The response recorded may also be different.

1   123756  "input[0]"    "input[@name='eat']"    ""    "Lobster"

vs

1   123756  "input[0]"    "input[0]"    ""    "Lobster"

See how beautifully that works? By column 3, they are equivalent.

I'll also allow:

<div type="radio">
    <input>Lobster</input>
    <input>Steak</input>
</div>

and

<div max="1">
    <input>Lobster</input>
    <input>Steak</input>
</div>

What about five-to-ten-tuple choice questions? Single choice is just a subset of multiple choice. The constrained mindset we're used to stems from early HTML, or even further back, from an antique radio with preset stations. That's why they're called radio buttons, in case you've ever wondered.

Imagine a set of attributes for your steak. Bloody through charred. Salted. White pepper. Black pepper. T-bone or rib eye. Some are exclusive, some are not. With LimeSurvey you would be forced to design your survey in a restrictive way, dividing it into several questions. I'll give you much more flexibility.

That's why, to be honest, I prefer the last example, the one with max. But even multiple choice is just a subset of survey logic, which I'll get to in a moment.
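The forms above all reduce to the same thing: a group of options and a maximum number that may be selected. A sketch that collects those constraints from one div; treating unnamed type="radio" siblings as one exclusive group is my assumption, as is the function name choice_constraints.

```python
import xml.etree.ElementTree as ET


def _content(element: ET.Element) -> str:
    return (element.text or "").strip()


def choice_constraints(div: ET.Element) -> list[tuple[list[str], int]]:
    """List (options, max selectable) pairs for the inputs directly in div.

    <div type="radio"> and <div max="N"> cap the whole group; inputs that
    share a name are mutually exclusive. Options without any constraint
    stay plain multiple choice and produce no entry.
    """
    inputs = div.findall("input")
    if div.get("type") == "radio":
        return [([_content(i) for i in inputs], 1)]
    if div.get("max") is not None:
        return [([_content(i) for i in inputs], int(div.get("max")))]
    groups: dict[str, list[str]] = {}
    for i in inputs:
        if i.get("name"):
            groups.setdefault("name=" + i.get("name"), []).append(_content(i))
        elif i.get("type") == "radio":
            groups.setdefault("radio", []).append(_content(i))
    return [(options, 1) for options in groups.values()]


examples = [
    '<div><input name="eat">Lobster</input><input name="eat">Steak</input></div>',
    '<div><input type="radio">Lobster</input><input type="radio">Steak</input></div>',
    '<div type="radio"><input>Lobster</input><input>Steak</input></div>',
    '<div max="1"><input>Lobster</input><input>Steak</input></div>',
]
for xml in examples:
    print(choice_constraints(ET.fromstring(xml)))
```

All four print the same constraint, which is the point: the engine is free to pick radio buttons, a dropdown or a validation message.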

In an earlier example there was a semantically different checkbox: "I don't know". It's not clear whether this should be a radio button or a checkbox. At the basic level, if the presentation has no other hints, I consider it a concern for response interpretation.

Technically though, there are many ways to infer abstention and ignorance. I may wish to have one "I don't know" and one "I don't want to answer". Or I may combine them. They may be placed before or after the "Other" option. I think this should always be up to the author.

Textboxes

What about textual data, keyboard entered data in textboxes? Again, I consider keyboard entered data just another option. Consider:

<input/>

All browsers display that as a textbox.

The response may be recorded as:

1   123456  "input[0]"    "input[0]"    ""  "My dad's pink tux"

Anyone who sees this out of context will not know whether this was a checkbox or a textbox or anything else. He or she may or may not be keen on knowing, but either way, it's not a basic premise.

As mentioned, feel free to use type="email" or anything else supported by HTML5. It's part of this spec too, and it hints to the engine what we expect in terms of display and validation.

Survey logic

Survey logic encapsulates part of what we've discussed above. I consider three separate parts.

  • Constraints/Validation
  • Control
  • Display

Constraints/Validation

Constraints usually come into play when specifying how many checkboxes we can select in a group or what type of characters we may enter into a field. This is closely related to validation, in my view. I may convey information about what I expect of the user experience. Or I may leave it to you to get me responses in a way of your own design, just so long as they pass my validation. That could mean using radio buttons. Or it could mean a flashing red alert explaining to the user that one too many options has been selected.

The max attribute, like any other attribute HTML5 specifies, is shorthand for the constrain attribute. The following is equivalent to max="1".

<div constrain=".option.count('checked')&lt;=1"/>

The following is just slightly more rigid. It implies that the constraint is to be checked at validation time, though it doesn't specify when that is. It may be on click.

<div validate=".option.count('checked')&lt;=1"/>
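The expression language isn't specified yet, so here is a sketch covering only the count form used above, with a hypothetical regular-expression grammar and hypothetical function names. The same check would serve both constrain and validate; what differs is when the engine runs it.

```python
import operator
import re

# Hypothetical grammar covering just the post's example:
#   .option.count('checked') <op> <integer>
COUNT_RULE = re.compile(r"\.option\.count\('checked'\)\s*(<=|>=|==|<|>)\s*(\d+)")
OPERATORS = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
             "<": operator.lt, ">": operator.gt}


def max_to_expression(max_value: int) -> str:
    """Expand the max shorthand into the equivalent constraint expression."""
    return f".option.count('checked')<={max_value}"


def check_constraint(expression: str, checked: list[bool]) -> bool:
    """Evaluate a count constraint against the checked state of a group."""
    match = COUNT_RULE.fullmatch(expression.strip())
    if match is None:
        raise ValueError(f"unsupported constraint: {expression!r}")
    op, limit = match.groups()
    return OPERATORS[op](sum(checked), int(limit))


print(check_constraint(max_to_expression(1), [True, False, False]))
print(check_constraint(max_to_expression(1), [True, True, False]))
```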

I may provide the following information if I'm in the mood, employing another HTML attribute. This is a more hands on constraint.

<input onchange="if(this.checked){//optgroup[@name='cook'].unset('checked');this.set('checked')}">Medium rare</input>

It's supposed to uncheck all the other cooking options. It is not JavaScript and makes no pretense of being JavaScript. You'll have to parse the XPATH, for starters, loop through the resulting nodes and add appropriate identifiers. For security reasons, no presentation engine will want to execute JavaScript from an XML sheet anyway.
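Since the engine interprets that handler rather than executing it, here is a sketch of the effect it describes, applied to the survey tree in Python. Storing the checked state as an attribute on the tree is an assumption made for the sketch, as is the function name on_change.

```python
import xml.etree.ElementTree as ET


def on_change(root: ET.Element, clicked: ET.Element, group_name: str) -> None:
    """Emulate: if(this.checked){//optgroup[@name=...].unset('checked'); this.set('checked')}

    ElementTree only accepts relative paths, so the post's //optgroup
    becomes .//optgroup here.
    """
    if clicked.get("checked") is None:
        return  # unchecking an option never affects the others
    for group in root.findall(f".//optgroup[@name='{group_name}']"):
        for option in group.iter("input"):
            option.attrib.pop("checked", None)
    clicked.set("checked", "checked")


form = ET.fromstring(
    "<div><optgroup name='cook'>"
    "<input checked='checked'>Rare</input><input>Medium rare</input>"
    "</optgroup></div>"
)
rare, medium = form.iter("input")
medium.set("checked", "checked")  # the respondent clicks "Medium rare"
on_change(form, medium, "cook")
print(rare.get("checked"), medium.get("checked"))
```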

Display

Similarly, the enabled attribute, used to control whether an option is visible/enabled, already implies a condition, for example:

<input enabled="!(//option[@name='steak'].checked)">Medium rare</input>

Control

Finally, control logic is about which questions are executed when, and possibly in what order, depending on responses and other circumstances. It should be clear by now that I'm not about to introduce a goto statement. I think control concerns should be mapped to display logic as far as possible. Having said that, I haven't ruled out control logic completely, though I can't think of any circumstances offhand where it might come in handy.

Other features

Cross references

To repeat a group or option anywhere (even before the definition), just reference.

<input ref="input[3]"/>

will suffice if it's unique. Otherwise, perhaps use our named lobster:

<input ref="@name='lobster'"/>

We can repeat that as many times as we want. The XPATH makes sure we can interpret the response. This makes it easy to have several questions with long lists of countries. In fact, one of the things that prompted these ideas was the use of long lists in LimeSurvey. If multiple options are permissible, LimeSurvey dynamically creates a response table with as many columns for that question as there are options. You can quickly run into MySQL's limit on the number of columns, and before that, PHP stalls.

My take on the issue:

<div label="What countries did you visit in 2010?" ref="@name='country'"/>
<div label="What countries did you visit in 2011?" ref="@name='country'"/>
<div label="What countries did you visit in 2012?" ref="@name='country'"/>

The concern for the presentation is to decide on a multiple choice dropdown, a full page of checkboxes with an arbitrary number of columns or perhaps an object store backed combo box widget.
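A sketch of resolving ref attributes before rendering, so that one country list can back all three questions. It understands only the two ref forms shown earlier; reading input[3] as the fourth non-ref input in document order (0-based, as in the response table), the name='country' div in the usage example and the function names are all assumptions. Targets are resolved before anything is copied, so the copies can't disturb later lookups.

```python
import copy
import re
import xml.etree.ElementTree as ET


def find_target(root: ET.Element, ref: str) -> ET.Element:
    """Resolve "@name='x'" or "tag[i]" (0-based, document order), skipping ref elements."""
    originals = [e for e in root.iter() if e.get("ref") is None]
    named = re.fullmatch(r"@name='([^']*)'", ref)
    positional = re.fullmatch(r"(\w+)\[(\d+)\]", ref)
    if named:
        matches = [e for e in originals if e.get("name") == named.group(1)]
    elif positional:
        same_tag = [e for e in originals if e.tag == positional.group(1)]
        index = int(positional.group(2))
        matches = same_tag[index:index + 1]
    else:
        raise ValueError(f"unsupported ref: {ref!r}")
    if len(matches) != 1:
        raise ValueError(f"ref {ref!r} matched {len(matches)} elements")
    return matches[0]


def expand_refs(root: ET.Element) -> None:
    """Copy each target's text and children into the elements that reference it."""
    # Resolve every target first so that inserted copies can't affect lookups.
    pending = [(e, find_target(root, e.get("ref"))) for e in root.iter() if e.get("ref")]
    for element, target in pending:
        for child in list(element):
            element.remove(child)
        element.text = target.text
        element.extend([copy.deepcopy(node) for node in target])


survey = ET.fromstring(
    "<survey>"
    "<div label='What countries did you visit in 2010?' ref=\"@name='country'\"/>"
    "<div name='country'><input>Sweden</input><input>Norway</input></div>"
    "</survey>"
)
expand_refs(survey)
print([i.text for i in survey[0].iter("input")])
```

Each referencing div keeps its own label and ref, so the engine can still tell the 2010 answers from the 2011 ones.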

Dynamically generated survey elements

Cross references, together with the severing of the dependency between responses and survey, imply the following rather significant bonus: the questionnaire may be dynamically constructed. Arbitrarily repeated questions are a breeze. In fact, there is little need for a representation at all for simple ad hoc questionnaires. Just hard code a form and send it off to your response engine. (That's rather stating the obvious. What I mean is, do try out my recommendation for a response table.)

Take the previous example with multiple choice countries. When the user selects a country, let the presentation generate a new dropdown. This is a real-life example that I had to solve with LimeSurvey, in a roundabout way, just last week.

Remember though, implicitly, you're adding elements to the template, not to the rendered HTML. It may or may not have implications for the XPATH and hence the responses. That's the rule anyway.

However, dynamically generated survey elements are more than just a bonus; in many cases they're actually implied by the principle of always assuming multiple choice. Consider the following.

<div label="Eat">
    <input>Steak</input>
    <input>Lobster</input>
    <input repeat="true" label="Other"/>
</div>

The engine should generate the last input as many times as the user requires. To be honest, I think this is an annoying loophole, one of many to be found. I'd sooner let the author opt in via the repetition feature described later. But give me a chance to find a way to patch it up that doesn't break my list of principles.

Presentation hints

I want the dependencies between the representation and the presentation to be as weak as possible. But hints to the presentation have obvious benefits, for example indicating a preference for radio buttons over a dropdown, page breaks and instructions.

The class attribute seems the immediate choice for presentation. That way a tip can be italicized by CSS with minimum fuss.

<label class="tip">

<div class="indent3">

<div class="dropdown">

<label class="faq">Why are we so interested in what you had for dinner?</label>

I intend to specify a recommended list of such hints. Presentation engines are free to specify additions to that list as long as they commit to displaying surveys that don't conform to them.

Resource keys

That last one deserves an extra look. We never got around to the last principle, the one about resource keys, did we? That's what this is.

All contents of label attributes and elements act as resource keys when URL-encoded. That kills two birds with one stone: images and locale. I'm just mentioning it; I think most can see how this will work out. The basic idea for languages is that you design the survey in your preferred language. Then you simply extract the keys and write a translation into any other language you require.

If you write a FAQ, the label above is the key into it. The label may be displayed as just a question mark icon. Up to you.
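A sketch of key extraction and lookup with Python's urllib.parse.quote. Using safe="" so that slashes are encoded too, and falling back to the original text when a translation is missing, are my choices rather than part of the spec; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def resource_key(text: str) -> str:
    """URL-encode label text; safe="" makes quote encode '/' as well."""
    return quote(text.strip(), safe="")


def resource_keys(root: ET.Element) -> list[str]:
    """Collect keys from label elements and label attributes, in document order."""
    keys: list[str] = []
    for element in root.iter():
        texts = []
        if element.tag == "label" and element.text:
            texts.append(element.text)
        if element.get("label"):
            texts.append(element.get("label"))
        for text in texts:
            key = resource_key(text)
            if key not in keys:
                keys.append(key)
    return keys


def localize(text: str, translations: dict[str, str]) -> str:
    """Look up a translation by key, falling back to the original text."""
    return translations.get(resource_key(text), text)


survey = ET.fromstring('<div label="Steak"><label>What did you eat?</label></div>')
print(resource_keys(survey))
swedish = {"What%20did%20you%20eat%3F": "Vad åt du?"}
print(localize("What did you eat?", swedish))
```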

Repetition

To repeat the country dropdown based on the answer to a previous question, try:

<div ref="@name='country'" repeat="//option[@name='nCountriesVisited'][0].value" />

Addressing the problem mentioned earlier, where selecting an option generates a new option, you would write, for instance, given that they are single choice dropdowns:

<div ref="@name='country'" repeat="..//option.count('checked')+1" />

I'll be the first to agree that's a challenge to implement, but if the history of HTML compliance is any guide, I suppose I could allow for partial compatibility.

However, the above is bound to happen quite often so the spec will probably accept a special case.

<div ref="@name='country'" repeat="true"/>
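A sketch of how an engine might turn the repeat attribute into a number of copies to render. Evaluating expressions such as the nCountriesVisited lookup is left to a hypothetical evaluate callback, and treating a missing or "false" repeat as a single copy is an assumption.

```python
from typing import Callable, Optional


def copies_to_render(repeat: Optional[str], filled: int,
                     evaluate: Optional[Callable[[str], object]] = None) -> int:
    """Number of copies of a repeated element the engine should display.

    repeat:   the raw repeat attribute, or None if absent.
    filled:   how many copies the respondent has already answered.
    evaluate: hypothetical hook that evaluates an expression such as
              "//option[@name='nCountriesVisited'][0].value".
    """
    if repeat in (None, "", "false"):
        return 1
    if repeat == "true":
        # The shorthand: keep one empty copy after those already filled in,
        # like the count('checked')+1 expression.
        return filled + 1
    if evaluate is None:
        raise ValueError(f"no evaluator for repeat expression {repeat!r}")
    return max(0, int(evaluate(repeat)))


answers = {"//option[@name='nCountriesVisited'][0].value": "3"}
print(copies_to_render("//option[@name='nCountriesVisited'][0].value", 0, answers.get))
print(copies_to_render("true", 2))
```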

That may hint at a way out of the dilemma mentioned earlier with the "Other" option. If you really wanted a repeating textbox, something like this would accomplish it.

<div>
    <label>Eat:</label>
    <input>Steak</input>
    <input>Lobster</input>
    <input repeat="true" label="Other"/>
</div>

Closing

Thank you for reading. Tell me what you think. Consider the spec open source, work in progress and feel free to contribute. I'm going to put up a wiki.

It's a long way off to 1.0. The Markdown-style app, a client-side presentation engine, a response server and a survey backend should be developed in parallel. A little further down the road, a UI.

Currently I don't have the resources but I think it would be great. Consider funding me if you stand to benefit.


Feb 11, 2013
For a post in English about the Zero Hate initiative, read Zero Hate instead. This post is intended mainly for a Swedish general audience.
I've written about Zero Hate here before.
The Zero Hate program is free to use here.

The Zero Hate initiative has spread widely thanks to the screencast, Computer Sweden's article and all my friends on Twitter and Facebook.

I've received a lot of positive comments, not to say effusive ones, which makes me very happy, and I'm becoming more and more convinced that we can make a big difference with this.

I've also received many offers of help. The project is open source and anyone can fork it and contribute code.

Above all, Måns Magnusson, @MansMeg, has joined me to build a statistical algorithm that decides which comments should be flagged for moderation. In other words, a robot that uses a statistical model to answer the question "how threatening or offensive is this comment?" with a number between 0 and 1000.

The robot needs training. We need a thousand or so real comments that you have received or written, on Facebook wall posts, newspaper forums or similar. These are then processed by the robot brain so that it learns to tell what's OK from what's threatening or offensive.

Help us collect comments for the robot brain!

At zerohate.simpletask.se/help.html there is a simple form where you can contribute comments.

Spread the word!

Ordinary, neutral and kind comments are needed too, so that the robot can learn to tell the difference. So you can take part even if you have never been subjected to online hate.

The more comments we get, the better the algorithm works.

Privacy

Your data is handled anonymously and we promise to use the comments only within the scope of the project. We have a background in medical research and we know what the Swedish Personal Data Act and research ethics entail.

But! Being the researchers we are, we see the potential of a data collection like this. Many researchers will be interested in access to this data, in one form or another, for all sorts of research that helps fight online hate.

If you like, you can enter your email address. We will then ask for your consent if it becomes relevant to use the data in a new project.

Or you can give your consent right now for the data to be kept available for all research. That's online love, that is!

Help us make the internet more democratic and free! Spread the word!


Feb 8, 2013
For a post in English directed at a coder audience, read Zero Hate instead. This post is intended mainly for a Swedish general audience.
Help us collect data to train Zero Hate.

I don't have much more to add about #näthat (online hate). I'm as sickened as you are. I can only say that, within the same week, I'm both rather pleased to be Swedish and ashamed of it. I'm thinking of The Economist's coverage of the Nordic countries in its latest issue.

I've developed a tool for quickly deleting comments en masse. I call it Zero Hate. Anyone with a Facebook page can use it to moderate comments.

I made a screencast to get the word out. You can watch it here.

If you have suggestions or comments, tweet. Want to help me code? Thanks, gladly, tweet!

What's the motivation here?

H&M say they deleted 3000 comments. 2000 remained. They get what they deserve given their policy and preparedness, but it's easy to see that the task isn't trivial. They didn't manage to limit the damage to themselves or to the people targeted. Next time they'll do better, but they won't be the next target either.

It's easy to see how comment storms arise. Comments rarely number more than 10. If they get past that, they easily reach 100. Then thousands. The mechanism is that Facebook ranks popular posts high in users' feeds, which makes them even more popular. This is called a critical phenomenon, and it occurs not only on Facebook and not only online. H&M's marketing department hopes to find the next "tipping point", only a positive one.

Certain topics attract a certain kind of comment, and then it gets scary very fast.

It is extremely important to stay a step ahead when things really kick off. It's not only about preparedness. You need tools to keep the inflow in check. If 100 comments arrive per minute, you have to be able to delete 100 comments per minute. Then you nip the dark forces in the bud. Hate-Sven never even notices the post, because hate-Niklas's and hate-Olle's comments disappeared within two minutes of being posted. Then you don't have to delete 5000 comments, maybe only 100.

If you don't have that preparedness (H&M's disaster happened over a weekend), you can limit the growth by deleting 5000 comments on Monday. But then you want to read them in order of nastiness: worst, ugliest and most threatening at the top. You can delete the top 2000 without reading them.

That's what my program is for. For the moment, comments are sorted by date. No keywords. But I'm working on it.

This is how I see the future:

  • Selectable pages
  • Keywords: rank, sort and filter
  • Custom keywords
  • Real-time updates
  • Notifications
  • Language processing/machine learning


Feb 8, 2013

For a post in Swedish directed at a Swedish general audience, read Noll Hat instead. This post is intended mainly for coders.

Dear people, and in particular those of you stumbling upon this site who happen to be from Sweden. None of you will have missed the awful display of hatred, bigotry and plain idiocy shown in this week's Uppdrag Granskning. This comes the same week that The Economist hails our Nordic region as a role model. The grass may be plenty green over here, but it is also wiry, sharp and nasty.

In short, women are subjected to insults and death threats and everything in between, when they express their views on Facebook.

Amid outcries and shame, the pragmatist within me found a foothold. H&M say they deleted 3000 comments. 2000 remained, but that's beside the point. It takes too long to read through, monitor and delete posts on your Facebook pages. I'm guessing at most 10 per minute per person. Comments stream in at many times that speed, at least when it counts the most, i.e. when most people are reading. It's clear to me that on trendy topics the rate of commenting, fair or otherwise, is a critical phenomenon and the growth is exponential. Certain subjects attract women haters, and they spur each other on to write ever more twisted comments.

We need better tools to curb the rate, early and effectively. I tweeted about it and at the same time realized it was within my capacity to build one, and fairly quickly.

The result is Zero Hate.

This rather rough proof of concept lets you select multiple comments and delete them in a batch. It's really fast and sure beats doing it one by one.

It's of limited use unless you can sort the comments by whether they contain certain words, and that's the primary target of the upcoming version.
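A sketch of the kind of keyword sorting the next version aims for, written in Python rather than the app's own code. The keyword weights are hypothetical and would be the moderator's own list; a statistical model could later replace the score function.

```python
import re


def rank_comments(comments: list[str], weights: dict[str, int]) -> list[str]:
    """Order comments by weighted keyword hits, nastiest first.

    Matching is on lower-cased whole words. sorted() is stable, so comments
    with equal scores keep their original (date) order.
    """
    def score(comment: str) -> int:
        return sum(weights.get(word, 0) for word in re.findall(r"\w+", comment.lower()))

    return sorted(comments, key=score, reverse=True)


comments = ["Nice dress!", "You idiot, I hate you", "What an idiot"]
for comment in rank_comments(comments, {"idiot": 2, "hate": 5}):
    print(comment)
```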

The following is the first screencast I've made, and I know it shows. But it got the word out. It's in Swedish.

So this is a call for participation. Tweet me.

Github: camitz/zerohate

Anyone who can take over the server coding would be most welcome. Any platform will do.

Anyone who has experience in natural language processing, let thyself be known. We need to process comments server side and score them while training the AI, only temporarily persisting them.

Feature request list:

  • Selectable pages
  • Search for keywords, rank and order comments
  • Add to and persist a keyword list
  • Locally persisted comment store
  • Support for Facebook's paging and field limitation
  • Batch deletes
  • Realtime updates
  • Notifications
  • Language processing/machine learning


Nov 29, 2012

I know there is a good rationale behind putting off code optimizations, but I think we've been going overboard with that theme. So it was with relief that I heard my boss say now is the time to start profiling. Time to install MiniProfiler, in other words. The current version is 2.0.2.

I don't know how people can live without it. It's love and kittens for ASP.NET MVC developers, as Scott H says.

I've been using it for some time on several projects, but I hadn't hooked it up to Entity Framework until now. That took a bit of head scratching, as most of the stuff out there, including the MiniProfiler docs, is about Code First setups. Some of my code is getting a bit antiquated, not least my Model First EF 4.2 usage. I just couldn't make sense of the instructions on miniprofiler.com for my setup.

EF MiniProfiler for Model First projects

Nevertheless, help was available, as countless times before, signed Darin Dimitrov.

var connectionString = ConfigurationManager
    .ConnectionStrings["MyConnectionString"]
    .ConnectionString;
var ecsb = new EntityConnectionStringBuilder(connectionString);
var sqlConn = new SqlConnection(ecsb.ProviderConnectionString);
var pConn = ProfiledDbConnection.Get(sqlConn, MiniProfiler.Current);
var context = ObjectContextUtils.CreateObjectContext<CYEntities>(pConn);

What about Unity Container?

I'm also using Unity Container to inject my object context, so if I'm going to put the above anywhere, it's in my container. Previously my container was configured like this.

IUnityContainer container = new UnityContainer()
    .RegisterType<CYEntities>(new PerHttpRequestLifetime(), new InjectionConstructor(connectionString));

I need a custom piece of code to do the creating for me, and I recall this is what is known as a factory. Unity supports this through InjectionFactory.

Hence

IUnityContainer container = new UnityContainer()
    .RegisterType<CYEntities>(GetLifetimeManager(lifeTimeManagerType), // helper returning a LifetimeManager (not shown)
         new InjectionFactory(c => {
            // Same steps as above: unwrap the provider connection string
            // and wrap the SQL connection in a profiled connection
            var ecsb = new EntityConnectionStringBuilder(connectionString);
            var sqlConn = new SqlConnection(ecsb.ProviderConnectionString);
            var pConn = new EFProfiledDbConnection(sqlConn, MiniProfiler.Current);
            return pConn.CreateObjectContext<CYEntities>();
         }));

And it works, too.


Oct 30, 2012

A month ago I successfully chained together log4net, StatsD and CloudWatch. I put together a backend for StatsD in node.js called aws-cloudwatch-statsd-backend.

Yesterday I finally updated it into something I can actually use. I'm using it in production on Cocoin.

StatsD's capability of pushing metadata along with the request is limited. But you can parse the bucket name for information. I wanted to extract namespace and metric name.

Currently I'm sending UDP packets of this style to count the requests on the site.

App/Controller/Action/Request:1|c

This gets translated to a CloudWatch request with App/Controller/Action as the namespace and Request as the metric name.
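The convention boils down to splitting the bucket name on its last slash. Here is a minimal sketch of that split; the function name splitBucket and the fallback namespace parameter are my own assumptions for illustration, not the backend's actual code or configuration. It also assumes the slashes survive StatsD's own key handling on the way in.

```javascript
// Hypothetical: split "App/Controller/Action/Request" into a CloudWatch
// namespace ("App/Controller/Action") and a metric name ("Request").
function splitBucket(bucket, defaultNamespace) {
  const i = bucket.lastIndexOf('/');
  if (i < 0) {
    // No slash: the whole bucket is the metric name.
    return { namespace: defaultNamespace, metricName: bucket };
  }
  return { namespace: bucket.slice(0, i), metricName: bucket.slice(i + 1) };
}
```

Splitting on the last slash rather than the first means the namespace can be arbitrarily deep while the metric name stays a single segment.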

Using the bucket name for metadata tampers with generality but doesn't blow it completely. You can still add on arbitrary backends.

To install it, use npm.

npm install aws-cloudwatch-statsd-backend

Here is an example configuration.

{
    backends: [ "aws-cloudwatch-statsd-backend" ],
    cloudwatch: 
    {
        accessKeyId: 'YOUR_ACCESS_KEY_ID', 
        secretAccessKey: 'YOUR_SECRET_ACCESS_KEY', 
        region: 'YOUR_REGION',
        processKeyForNames:true
    }
}

You can also override both the namespace and metric name (metricName) in the config.

Full documentation on the npm and GitHub pages.

My other monitoring project, CloudWatchAppender, an appender for log4net without aggregation, has now been downloaded 236 times. Very pleased with that.


Sep 18, 2012

I started out this blog connecting log4net to CloudWatch, a monitoring service in Amazon's AWS cloud. The idea was to turn all those log4net log events that you might have in your code into graphs on CloudWatch, while modifying only the config.

The result was CloudWatchAppender, which was my first ever project on GitHub (and NuGet) and which I've developed in a few crucial steps since. It has matured into a feature-rich little lib and can do a lot more than count events. It has seen a humble 66 downloads from NuGet. Totally worth the effort, if you ask me, not least since I use it in production extensively.

One caveat became apparent immediately: you have to be very selective about which events you send off. First, they'll quickly clog up your bandwidth. How do ten, twenty, a hundred or even one HTTP request to CloudWatch per incoming page request sound? Second, CloudWatch imposes its own limits as well.

This diminished its usefulness. What we would like is a way to aggregate the data and perhaps leverage CloudWatch's support for sending statistics. This is something you'd rather not do within a web application, where the unit of work typically lives for about a single request. The alternative is a more permanent service hanging around on the sidelines, acting as a middleman between your app and CloudWatch, sending statistics maybe a few times every minute.
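The middleman idea, reduced to its core, is to count in memory and hand over totals only every so often. A minimal sketch in JavaScript; the class name CounterAggregator and its methods are hypothetical, and the network side (receiving packets, calling CloudWatch) is left out.

```javascript
// Hypothetical in-memory aggregator: many increments in, one batch out per flush.
class CounterAggregator {
  constructor() {
    this.counters = new Map(); // bucket name -> running total since last flush
  }

  increment(bucket, value = 1) {
    this.counters.set(bucket, (this.counters.get(bucket) || 0) + value);
  }

  // Return the totals accumulated since the last flush and start over.
  flush() {
    const totals = Object.fromEntries(this.counters);
    this.counters.clear();
    return totals;
  }
}

// Example: 100 page requests become a single number to send.
const agg = new CounterAggregator();
for (let i = 0; i < 100; i++) agg.increment('Requests');
const totals = agg.flush(); // { Requests: 100 }
```

Calling flush from a timer every few seconds and sending the result is, in essence, the job the rest of this post hands over to StatsD.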

Taking a look at what's out there, there is a fairly standard solution, or so it seems to me... for Unix environments. It's called Statsd + Graphite. Statsd is an excellent and very lightweight aggregator written in node.js.

Statsd comes ready for use out of the box with Graphite, an open source viewer for all kinds of statistics. It looks great from what I can tell. However, I wasn't able to install Graphite on my system.

As for Statsd, well it's node.js. Node.js = javascript so I was not intimidated, especially since node.js now fully supports Windows. The installer is downloadable off the official site.

This holds some promise if we can also find something node.js + AWS. Google turned up Awssum and in particular node-awssum which is exactly what we're looking for.

So that's what we're going to do for this post: Connect log4net to Statsd to CloudWatch.

How easy is this?

I think very. It's all install, configure and test save for a backend for Statsd. I've never done anything in node.js before although I regularly code in javascript and dojo.

First, if you haven't already, download and install node.js. Test it in a console window by typing node. If that brings you into the interactive interpreter, you're good to go.

Download and extract Statsd or clone it with GitHub for Windows if you're on a development system. Rename exampleConfig.js to myConfig.js.

In the console window navigate to where Statsd is and issue

node stats.js ./myConfig.js

You should now have

The server is up.

and some other bits of info.

Will it receive data? Bring up myConfig.js again, replace the default backend with the one that outputs to the console, and set flushInterval to 3 seconds. You'll end up with something like the following. I've removed some stuff that is of no consequence to us.

{
flushInterval: 3000
, port: 8125
, backends: [ "./backends/console" ]
}

That will aggregate everything incoming over UDP on port 8125 and pass it on to the backend every 3 seconds. The backend is a pluggable piece of code. It can be an npm module, if you're familiar with that sort of lingo. In this case it's going to dump stuff to the console. Understandably, the input has to be formatted to suit Statsd. "gorets:1|c", taken from the Statsd readme, seems an adequate choice. That's a counter with increment 1, called "gorets". In Statsd terminology the counter name is called a bucket.
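To make the line format concrete, here is a minimal parsing sketch. It handles the general shape bucket:value|type with an optional |@rate sample rate, which is how I read the Statsd readme; parseStatsdLine is a hypothetical name, not something Statsd exports.

```javascript
// Hypothetical parser for one Statsd line, e.g. "gorets:1|c" or "gorets:1|c|@0.1".
function parseStatsdLine(line) {
  const match = /^([^:]+):([^|]+)\|([a-z]+)(?:\|@([\d.]+))?$/.exec(line.trim());
  if (!match) return null; // malformed line; Statsd reports these as bad_lines_seen
  return {
    bucket: match[1],
    value: Number(match[2]),
    type: match[3],                              // "c" counter, "ms" timer, "g" gauge, "s" set
    sampleRate: match[4] ? Number(match[4]) : 1  // 1 when no "|@rate" suffix is given
  };
}
```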

Test it

In Visual Studio we're going to set up a new console project to test our gear. Create the project from a console template. Then put the following in Program.cs and hit F5. It's basically just Microsoft example code.

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // Send one Statsd counter line over UDP to the local Statsd instance
        using (Socket s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram,
            ProtocolType.Udp))
        {
            IPAddress address = IPAddress.Parse("127.0.0.1");

            byte[] sendbuf = Encoding.ASCII.GetBytes("gorets:1|c");
            IPEndPoint ep = new IPEndPoint(address, 8125);

            s.SendTo(sendbuf, ep);
        }

        Console.WriteLine("Message sent to Statsd at 127.0.0.1:8125");
    }
}

From Statsd you'll get something like

17 Sep 13:40:08 - reading config file: ./myConfig.js
17 Sep 13:40:08 - server is up
Flushing stats at Mon Sep 17 2012 13:40:18 GMT+0200 (W. Europe Daylight Time)
{ counter: { 'statsd.packets_received': 3, 'statsd.bad_lines_seen': 0 },
  timers: { 'statsd.packet_process_time': '0', gorets: '3' },
  gauges: {},
  sets: {},
  pctThreshold: [ 90 ] }

within 3 seconds.

I can't believe I contemplated coding up my own Windows service. This could not be any easier.

Making a backend

How to roll a backend is outlined in the Statsd readme. You're listening to three events, of which flush is the one relevant to us now. Starting with the config file, let's design what we want. Add the following to myConfig.js.

  , backends: [ "./backends/aws-cloudwatch-statsd-backend" ]
  , cloudwatch: {accessKeyId:'YOUR_ACCESS_KEY', secretAccessKey:'YOUR_SECRET_KEY', region:"EU_WEST_1"}

It's clear what we expect here. The first step is to copy console.js, i.e. the backend we've already tested, and rename it to aws-cloudwatch-statsd-backend.js. It will still work of course.

Starting from console.js, we can just add the extra functionality while leaving the old code in place for tracing. Add lots of debug output as well. To pick up the cloudwatch section of the config, modify the constructor to contain

this.config = config.cloudwatch || {};

in the appropriate place.

Awesome

Node-awssum is an npm module, meaning it will install almost by itself. Run npm install -g awssum and npm installs it globally, where everybody can find it.

Browse the source code. In the examples directory we're given a wealth of coding examples. For CloudWatch we find list-metrics.js. Close enough for starters. We see that we need this

var fmt = require('fmt');
var awssum = require('awssum');
var amazon = awssum.load('amazon/amazon');
var CloudWatch = awssum.load('amazon/cloudwatch').CloudWatch;

at the top. The first of these includes is a console formatter used extensively by Awssum, it appears. You don't have to use it but it turns out to be very useful. Let's install it, too. npm install -g fmt.

The actual request should go in the flush event handler. At this point you'll realize that the variable names in myConfig.js were deliberately chosen to match Awssum's. The passed object fits snugly into the CloudWatch config, with one exception that we'll get to in a moment.

var cloudwatch = new CloudWatch(this.config);

cloudwatch.ListMetrics(function(err, data) {
    fmt.msg("listing metrics - expecting success");
    fmt.dump(err, 'Error');
    fmt.dump(data, 'Data');
 });

Firing it up now will produce a complaint about the region, and that's because our region string is not what's expected. However, it does index into an array I found in the source code. The following line in the constructor will do the trick.

config.cloudwatch.region = amazon[config.cloudwatch.region];

Now you'll have a backend that lists your metrics if you have them, plus what's left of the console backend, every time you get a flush event. It's not what we want but we're definitely getting somewhere. We've connected Statsd to CloudWatch which is more than half way. Before we get any further let's take a look at the log4net end of the chain.

The UDPAppender

Someone meant for this to be done. log4net provides an appender targeting UDP services, and it's all just configuration from here.

Halt the console app and hop into NuGet manager in Visual Studio and add the latest log4net library. Program.cs will look like this.

using System.Threading;
using log4net;
using log4net.Config;

class Program
{
    private static readonly ILog log = LogManager.GetLogger(typeof(Program));

    static void Main(string[] args)
    {
        XmlConfigurator.Configure();

        while (true)
        {
            log.Info("Counting. The message will be ignored.");

            Thread.Sleep(10);
        }
    }
}

Notice that we've added some stress to the system: up to roughly 100 calls a second, given the 10 ms sleep.

The config may look like this:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <configSections>
    <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net"/>
  </configSections>

  <log4net>
    <appender name="UdpAppender" type="log4net.Appender.UdpAppender">
      <remoteAddress value="127.0.0.1" />
      <remotePort value="8125" />
      <layout type="log4net.Layout.PatternLayout" value="gorets:1|c" />
    </appender>

    <appender name="ConsoleAppender" type="log4net.Appender.ConsoleAppender">
      <layout type="log4net.Layout.PatternLayout">
        <conversionPattern value="%date [%thread] %-5level %logger [%ndc] - %message%newline"/>
      </layout>
    </appender>

    <root>
      <level value="ALL"/>
      <appender-ref ref="ConsoleAppender"/>
      <appender-ref ref="UdpAppender"/>
    </root>
  </log4net>
</configuration>

We've added a console appender here just for the trace. Notice the layout element. We're not going to care about the actual event message. We're formatting everything to just "gorets:1|c", same as before. We're not even bothering to consider what metric name might be used; that'll be a feature for later.

Test it. You should have two console windows outputting lots of text, one continuously, one every three seconds.

PutMetricData

We'll skip ahead now. Unfortunately, there is no example code in Awssum for the AWS PutMetricData call. But a file search turned up a header declaration with the expected parameters, and also some test code. Add some guessing, a browse through some of the other examples and the online API reference for AWS, and we may produce this.

var metricDatum = {
    MetricName : 'Gorets',
    Unit : 'Count',
    Value : metrics.counters.gorets  // total for the "gorets" bucket in this flush
};

cloudwatch.PutMetricData({
        MetricData : [metricDatum],
        Namespace  : 'Gorets'
    },
    function(err, data) {
        fmt.msg("Putting metrics");
        fmt.dump(err, 'Error');
        fmt.dump(data, 'Data');
    });

We're not overreaching. This is not a finished product. But the final result will put counter data on CloudWatch. The chain is finished and we did what we set out to do. The code below adds one line to the metricDatum object, for the sake of timestamp pickiness. Don't forget, AWS lives in UTC.

Timestamp: new Date(timestamp*1000).toISOString()

I've also removed the code from the original console backend we don't need.

var awssum = require('awssum');
var amazon = awssum.load('amazon/amazon');
var CloudWatch = awssum.load('amazon/cloudwatch').CloudWatch;
var fmt = require('fmt');

function CloudwatchBackend(startupTime, config, emitter){
  var self = this;

  // Keep only the cloudwatch section of myConfig.js (empty object if missing)
  this.config = config.cloudwatch || {};
  // Awssum expects its own region constant, so look the configured string up in amazon
  this.config.region = this.config.region ? amazon[this.config.region] : null;

  // Run our flush every time Statsd flushes
  emitter.on('flush', function(timestamp, metrics) { self.flush(timestamp, metrics); });
}

CloudwatchBackend.prototype.flush = function(timestamp, metrics) {
  var cloudwatch = new CloudWatch(this.config);

  var metricDatum = {
    MetricName : 'Gorets',
    Unit : 'Count',
    Value : metrics.counters.gorets,
    // Statsd passes seconds since the epoch; CloudWatch wants ISO 8601 in UTC
    Timestamp: new Date(timestamp*1000).toISOString()
  };

  cloudwatch.PutMetricData({
      MetricData : [metricDatum],
      Namespace  : 'Gorets'
    },
    function(err, data) {
      fmt.msg("Putting metrics");
      fmt.dump(err, 'Error');
      fmt.dump(data, 'Data');
    });
};

exports.init = function(startupTime, config, events) {
  var instance = new CloudwatchBackend(startupTime, config, events);
  return true;
};

Take a look at CloudWatch now, maybe something like this will show up.
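The backend above only knows about gorets. A natural next step is to turn every counter in metrics.counters into a datum. A minimal sketch, assuming counters is a plain object mapping bucket name to value, as it is passed to flush; toMetricData is a hypothetical helper, and skipping Statsd's own internal counters (prefixed "statsd.") is my choice. CloudWatch also limits how many datums one PutMetricData call accepts, so a real version would send them in batches.

```javascript
// Hypothetical: map Statsd counters to CloudWatch MetricData entries.
// timestamp is seconds since the epoch, as passed to the flush handler.
function toMetricData(timestamp, counters) {
  const isoTime = new Date(timestamp * 1000).toISOString(); // UTC, as AWS expects
  return Object.keys(counters)
    .filter(function (key) { return key.indexOf('statsd.') !== 0; }) // skip Statsd's own stats
    .map(function (key) {
      return { MetricName: key, Unit: 'Count', Value: counters[key], Timestamp: isoTime };
    });
}
```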

I'll continue working this into something useful. I have already included it in production on cocoin.com. You can find it on GitHub already. By all means fork it and contribute, or just suggest additions. The GitHub version has a different layout, copied from another backend project for MongoDB. In the future it's meant to be an npm module that you can install just as we did Awssum and Fmt.

A couple of directions I've been thinking about: how we could integrate the system with CloudWatchAppender, or whether this is even desirable; and getting Graphite working on Windows. I'd like to see a tutorial on that.


Sep 16, 2012

Building your dojo application is a poorly documented pain, in my view. It took me two weeks to get from A to B, not full time, but there was that delay. The only relief comes from Colin Snover at SitePen, who has provided a boilerplate for us to start from. It's designed for a standalone dojo application, but it's the unavoidable starting point for any project.

I don't have to explain the gains of doing a build, though. That's been done. Suffice it to say it's beneficial to the point of being crucial.

I am perhaps in a unique situation. Certainly I've realized that I am of a fairly small number in Sweden using dojo. Using dojo in conjunction with ASP.NET probably places me in a select group globally. Vote on ASP.NET MVC suggestions to contest this view.

Build it

This is a blueprint for making a dojo build for use in an ASP.NET MVC web environment. The goal is to get all your and dojo's js into a single file, dojo.js. You can split it up later if you want. Make sure to tell me if you find any errors or omissions.

To summarize, even with the boilerplate, it's still painful. Incidentally, derived phrases such as "painstakingly" and "taking pains to..." all apply. It's a slow process involving small incremental changes, then building, then testing. Use your source control over-diligently. Be smart. Be an engineer.

Everything centers on, and is error prone at, the following:

  1. The dojo javascript include and its dojo-config.
  2. The build script build.bat
  3. main.js
  4. run.js
  5. app.profile.js (or coin.profile.js as I will later rename it to)

Add them all to your source control when appropriate.

ASP.NET MVC + dojo?

They're not in love, up in a tree, k-i-s-s-i-n-g... What does it mean to use the two in conjunction? Well, for me it means:

  1. Lots of dojo framework code in odd places, both page-specific code and application-global code.
  2. Lots of dijits on my .cshtml pages. (Razor? Yes!) Some are declared in HTML, some programmatically.
  3. Lots of my own custom widgets, similarly declared in HTML or programmatically.

At some point in the past I migrated to AMD, which was also a pain, but not so much. Mostly it felt very good. This instruction was invaluable for my widgets.

Still, for the rest of the application... pain. You need to have complete control over widget parsing and use lots of ready() and !domready. But in the end, you're also given more control as a result.

If you don't use AMD then I don't know if this blog post will work for you.

Starting off with the boilerplate

  1. Download (or clone) the boilerplate into a new directory. This is the 1.7.2 version. Lots of other versions are on there as well, but I don't know how well this tutorial will apply to them. At some point I will upgrade and update this post.

  2. Did I say I was on Windows? It's implicit in .NET, perhaps. Never mind: the build script needs modifying, and we call .bat files, not .sh. Let's take the provided build.sh and rework it. You have Java installed, right? This is what mine looks like.

    rem Base directory for this entire project
    set BASEDIR=c:/www/dojo-boiler
    set BASEDIRw=c:\www\dojo-boiler
    
    rem Source directory for unbuilt code
    set SRCDIR=%BASEDIR%/src
    set SRCDIRw=%BASEDIRw%\src
    
    rem Directory containing dojo build utilities
    set TOOLSDIRw=%SRCDIRw%\util\buildscripts
    
    rem Destination directory for built code
    set DISTDIR=%BASEDIR%/dist
    set DISTDIRw=%BASEDIRw%\dist
    
    rem Module ID of the main application package loader configuration
    set LOADERMID=app/run
    
    rem Main application package loader configuration
    set LOADERCONF=%SRCDIR%/%LOADERMID%.js
    
    rem Main application package build configuration
    set PROFILE=%SRCDIR%/app/app.profile.js
    
    
    echo Building application with %PROFILE% to %DISTDIR%.
    
    echo Cleaning old files...
    del %DISTDIRw% /s /q
    echo Done
    
    cd /d %TOOLSDIRw%
    
    java -Xms256m -Xmx256m  -cp ../shrinksafe/js.jar;../closureCompiler/compiler.jar;../shrinksafe/shrinksafe.jar org.mozilla.javascript.tools.shell.Main  ../../dojo/dojo.js baseUrl=../../dojo load=build --require %LOADERCONF% --profile %PROFILE% --releaseDir %DISTDIR%
    
    cd /d %BASEDIRw%
    
     dir dist\dojo\dojo.js
    
    echo "Build complete"
    

    Pain! Yes, I believe I said so. Most variables have forward slash and backslash variants. Go ahead and tell me about the better way to do this.

    The first lines should reflect your absolute basedir. Build it. It should work. Surf to index.html. It should work.

  3. I've made one functional addition above. It's the dir-thing. If the build does not work, dojo.js will be of size zero so this is the check that things are going in the right direction.

    What could go wrong? Well, I had a hard time with trailing commas here and there in my javascript. They're wrong, but no browser warns me about them, and neither does Visual Studio. What happens is simply a dojo.js of size zero. That was until Ken, who's a star of the dojo-interest mailing list, gave me this:

    // Reduce logging level to WARNING in Closure
    if (typeof Packages !== 'undefined' &&
        Packages.com.google.javascript.jscomp.Compiler) {
              //Packages.com.google.javascript.jscomp.Compiler.setLoggingLevel(Packages.java.util.logging.Level.WARNING);
    
              Packages.com.google.javascript.jscomp.Compiler.setLoggingLevel(Packages.java.util.logging.Level.SEVERE);
    }
    

    Trailing commas and anything else that Closure can't get its head around show up as errors. Increase your command window buffer size so you can scroll back to them.

    Also check the build report. It's more of an interesting read than anything else. Some errors about i18n and plugins appear. These are known issues for dojo 1.7.

  4. At this point I upgraded to 1.7.3. If you want, download and replace all files. Build and test.

    Preparations in your javascript

  5. Make sure everything is prepared, nice, good and correct. All files must be encoded in ANSI or UTF-8, preferably the latter. Not UTF-8 with a BOM.

    You're using AMD, you have relative paths for modules, and you have no trailing commas in your dependency lists like I did. Make sure all your widgets are in a single subdirectory of Scripts. If this subdirectory is app, lucky you. Mine was coin.

  6. Make sure there are no javascript errors on your site. Surf around with a developer panel like Firebug open. Use different browsers. Diligence.

  7. Copy dojo from the boilerplate locally: dojo, dijit, dojox and util, the works. Set your pages to load dojo.js from this place. I was using Google's CDN prior to this, even for my local development. Maybe you have your dojo locally but in a directory way over there? That's not what I mean. I mean in your development directory, in your ~/Scripts directory. You don't have to include it in the VS project or in your source control, though. I've given this some thought, but I had so many problems with the basedir and relative paths to scripts and locale bundles that... well, I'm not going there.

  8. Try your app. Run UI tests if you have them. Surf to a page that has dijits declared in markup if you have them. This will convince you that the parser is running when it should. Keep an eye on Firebug for js and network errors.

    Boilerplate goes in there

  9. Copy the boilerplate locally. So that's the app directory, index.html and the build script if I'm not mistaken. Adjust your build script. Build and surf to index.html. It should work. That means all the directories know each other.

    Bring it together

  10. I decided to parse my pages in main.js. It's a nice place to start your application. One line is:

    define([ 'dojo/has', 'require',  'dojo/ready', 'dojo/parser' ], function (has, require, ready, parser) {
    

    Some others are:

      ready(function() {
          parser.parse();
      });
    

    The complete file is lost to me now. You'll figure it out.

  11. Build. Well, you haven't changed anything really, so it will still build. Right? That's what I'm talking about. You can't build and test enough. Things will go wrong and you just can't see it.

  12. No, my app is called coin, not app. So everything in the app directory goes in the coin directory, or whatever yours is called. Copy, don't move. Leave app in place for now. Update the build script using search and replace. The comments will end up with words like coinlication. Why should that bother you? Do the same for what's now called coin.profile.js and for run.js. baseUrl is fine the way it is, i.e. ''.

  13. The contents of index.html should go into your layout definition. This is the starting point for the dojo part of the application. I put this in a separate file, dojo.cshtml, that I include with

    @Html.Partial("Partials/Dojo")
    

    The snippet has defer on each of the two script tags. Remove the first.

    Build. The web site should now work again, augmented with the little Hello world dialog from the boilerplate. Importantly, we're now loading run.js (from the dist directory, of course). run.js is used by the builder and also by the site, so it's a very important file to get right.

    One by one

  14. All the dijits and custom widgets that were declared in HTML: now is when I start including them in the build. Formerly they hung loose on a layout page or a specific cshtml. I may have had some idea about this or that widget not being required unless someone ventures into this or that page. That sort of thinking is great, but now is not the time for it.

    I'm bringing them into the bottom of my run.js, the dependency list in square brackets. You can also put them in your profile if it makes you feel better.

    One by one, folks! Start with factory-made dijits. Include each in run.js. Build. Test. Proceed with your custom widgets.

    I can't stress enough how worth your time this is, even though the build may take minutes. Read up on some blogs or whatever during the build. Lots to learn.

  15. Now it's time to start loading the site from dist/dojo. Change it in dojo.cshtml.

    Start with the uncompressed version (dojo.js.uncompressed.js) so you can debug it if there is a problem in your own modules.

    Locale bundles and other resources

    At this point, let me pause for a minute. Perhaps you are having problems with locale? Loading bundles in cldr, nls and what have you? Other resources? blank.gif?

    I had painful problems. It seemed like the client would look for them in arbitrary locations. If I navigated to Controller/Action, I'd see network errors for Controller/Action/dist/dojo/resources/blank.gif. Sometimes it would work in one browser and not the other. Sometimes it failed in the uncompressed dojo.js while working fine in the compressed one. I experimented with baseUrls and everything else I could think of. In the end I just hacked it. I put this in Global.asax.cs.

    // _regex is a static Regex field declared elsewhere in Global.asax.cs (not shown)
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        string path = Request.Url.PathAndQuery;
    
        if (!path.Contains("dist/"))
        {
            if (_regex.IsMatch(path))
            {
                var match = _regex.Match(path);
                Context.RewritePath("/dist/" + match.Groups[1].Value + 
                    match.Groups[2].Value + match.Groups[3].Value);
            }
        }
    }
    

    What that does is rewrite matching resource requests so that dist becomes the root of the path. Nothing else made any sense.

    In the end I also modified the dojo source code to work around locale failures in ValidationTextBox and CurrencyTextBox, which worked in everything except Firefox. Some kind of locale bundle failed to load. That will strike back, I know, but later, and perhaps not after I upgrade to 1.8.

  16. Take a look at the network listing of whatever developer panel you're using. You may see some files that, although loaded from the dist directory, are still being loaded separately. They've not been included in the build for some reason. You'll need to include each one specifically in your layer, in your xxx.profile.js. Don't change anything. Just add to it.

    Take a break when all that remains is dojo.js and dojo_sv.js, or whatever locale you're in. It's meant to be that way.

    At some point I also had a couple of common.js in there. I'm not sure why they're gone now. If you have them, just leave them, perhaps you'll be lucky, too. Not worth fretting over, anyway.

  17. Now start removing dependencies on the Scripts directory. Use your network panel and file search.

    I had some files that were AMD, but I had put them directly in Scripts. No real reason. Not really part of any widget or app per se, not really pure "script" either. Put them in the coin subdirectory and reference them from run.js. Feel good about that.

    I also had a collection of odd functions in Helpers.js. Some are now included in the one page that happened to be using them. Others I put directly in my main layout file. That left me with two pure "script" files that were loaded separately. Someday I'll make them AMD, good and proper.

    Build. Test.

  18. Remove the part in run.js which displays the Dialog. Remove Dialog.js and any references.

    Build. Test.

That's all there is to it!
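One more thing: the dir check in build.bat is the cheapest test there is, since a silently broken build leaves dojo.js at size zero. If you'd rather have it fail loudly, here is a minimal node.js sketch of the same check. The script name check-build.js and the default path dist/dojo/dojo.js are assumptions based on the layout above, so adjust them to yours.

```javascript
// Hypothetical check-build.js: fail if the built dojo.js is missing or empty.
const fs = require('fs');

function builtFileLooksOk(path) {
  try {
    return fs.statSync(path).size > 0;
  } catch (e) {
    return false; // file does not exist
  }
}

// Only act as a command-line check when a path is passed,
// e.g. "node check-build.js dist/dojo/dojo.js" at the end of build.bat.
if (process.argv[2]) {
  const target = process.argv[2];
  if (!builtFileLooksOk(target)) {
    console.error('Build looks broken: ' + target + ' is missing or empty');
    process.exit(1);
  }
  console.log('OK: ' + target);
}
```

A non-zero exit code is what a CI server or a batch file checking %ERRORLEVEL% picks up, which the dir listing alone won't give you.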



Sep 14, 2012

Ever thought about what region to use for what audience? Lots of people must have, but they keep their findings to themselves from what I can tell.

Obviously, if you want to target Japan, put your site in the Tokyo region. But if you want to target Vietnam or Thailand, like the client who put me up to this crude benchmark did, then you may be at a loss.

I have a PhD but this is so unscientific. Interesting perhaps, nonetheless.

I set up the same site in a number of different regions in the AWS cloud. Then I tested the first uncached page load using the very helpful webpagetest.org. Webpagetest.org lets you test your site as a browser client from a number of different locations across the globe. Unfortunately not Vietnam and not Thailand. You can even script it if you want.

The resulting graph looks like what you would get from hitting F12 in Chrome and opening the network tab.

Here are my results.

Region \ Access point        Jiangsu   Tokyo   Amsterdam
Ireland (eu-west)            55 s      8 s     4 s
Virginia (us-east)           10 s      6 s     5 s
Oregon (us-west)             8 s       5 s     6 s
Singapore (ap-southeast)     16 s      4 s     9 s
Tokyo (ap-northeast)         11 s      29 s    5 s

Disclaimer: Singapore is the original site and has some overhead, like SSL encryption. I only did this once, so that hardly qualifies as statistics. But it gives you an idea of what to expect when you launch a site intended for a certain locality.
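Reading the table column by column, here is a quick way to pick the fastest region for each access point, using the numbers above. They're single runs, so take the result with a grain of salt; the function name is made up for this example.

```javascript
// Load times in seconds from the table above, one run each.
const loadTimes = {
  'Ireland (eu-west)':        { Jiangsu: 55, Tokyo: 8,  Amsterdam: 4 },
  'Virginia (us-east)':       { Jiangsu: 10, Tokyo: 6,  Amsterdam: 5 },
  'Oregon (us-west)':         { Jiangsu: 8,  Tokyo: 5,  Amsterdam: 6 },
  'Singapore (ap-southeast)': { Jiangsu: 16, Tokyo: 4,  Amsterdam: 9 },
  'Tokyo (ap-northeast)':     { Jiangsu: 11, Tokyo: 29, Amsterdam: 5 }
};

// For each access point, find the region with the lowest load time.
function fastestRegionPerAccessPoint(times) {
  const best = {};
  for (const region of Object.keys(times)) {
    for (const point of Object.keys(times[region])) {
      if (!(point in best) || times[region][point] < times[best[point]][point]) {
        best[point] = region;
      }
    }
  }
  return best;
}

console.log(fastestRegionPerAccessPoint(loadTimes));
// Jiangsu -> Oregon (us-west), Tokyo -> Singapore (ap-southeast), Amsterdam -> Ireland (eu-west)
```

Note that Singapore wins for Tokyo despite its SSL overhead, and that the Jiangsu column has the widest spread, which is exactly the kind of access point you'd want more runs from before committing to a region.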

Someone should do an extensive benchmark. Maybe I should. By the way, take a look at Google Page Speed if you haven't already.
