Saturday, December 27, 2008

Shearing my life

I could seriously do with an indefinite timeout. With all the things in the world that need to get done, 24-hour days ain't nowhere near enough.

I've been wanting to make a website based around some 3D element. I got curious and looked at websites that did similar things, trying to figure out the simplest way of 3D-to-2D mapping. Skewing, or shear mapping (which is just a fancier name for the same transform), actually performs the kind of transform I've been wanting to do, and it isn't really that difficult.

I began figuring out the geometric interpretation. I'm very much a visual person, in that I have to picture things before I can reason about how they work. The above picture came about by applying a really trivial skew transform over a period of time.
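
As a sketch of how simple the math is, here's a minimal skew projection from 3D to 2D; the shear factors kx and ky are illustrative values, not the ones used for the picture:

struct Point2
{
    public double X, Y;
    public Point2( double x, double y ) { X = x; Y = y; }
}

// Shear the z coordinate into x and y, then drop it: a trivial 3D-to-2D "skew"
// projection. Animating the shear factors over time gives the effect described above.
static Point2 SkewProject( double x, double y, double z, double kx, double ky )
{
    return new Point2( x + kx * z, y + ky * z );
}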

...but I need more time; so many things to do, so many things to see. I have to write up my Master's thesis, I have two jobs on the side which have to get done this week, and I should really look for a third, more permanent job. My time at DICE has been most wonderful, it's truly a great place to be working at, but I'm not certain I'll be able to continue once my internship is up. EA announced layoffs of 1,000 people by the end of March. My hope is that they let HR keep hiring on the side...

And apparently some stupid company got a 10-year-old patent approved today. The consequence is that if you generate icons based on the contents of a file, you can be slapped with a lawsuit for patent infringement. Microsoft, Apple and apparently Google are already being sued. What the hell is up with the USPTO!?

Sunday, August 17, 2008

Piracy, causality and PC gaming

Piracy is the result of an elite group of reverse engineers who mostly got bored with everything else and took it upon themselves to face a greater challenge. Software (and games in particular) is not 'cracked' because there's anything to gain from it; it's the way some talented people choose to deal with boredom. And in a sense, combating piracy in the realm of software through legal action is a Sisyphean task. There will always be bored, brilliantly talented people able to circumvent the inner workings of whatever DRM software you can imagine.

If you want to pick up and play a game, the time it takes to get from that feeling in your brain to actually playing will probably end up determining whether you play at all. There's this notion of availability, in which I do believe consoles have an advantage, but it's not exclusive to consoles. Getting from the point at which you decide to play a game to the moment where you've bought and installed it is an important deciding factor when combating piracy. This ties together accessibility and distribution, that is, the very core of information technology.

The following quote was taken from the Hanselminutes #89 podcast (the context is console security).

"You know, one of the things people don't understand is some of us have like scanning electron microscopes in our living room."

He puts up a slide with a picture of his living room, and darn if there isn't a scanning electron microscope next to his couch, and he uses it. He sands the epoxy off the top of the chips of the things he's trying to hack and uses the electron microscope to read the internal circuitry and find out what it does.

Outsmarting an army of people with endless resources is just silly. ByteShield, for example, has written up a very interesting and probably plausible way of truly protecting software, but it's flawed in that it's exactly the kind of system people will take great pride in circumventing.

But what about availability? I congratulate Valve on Steam, their premium yet free-of-charge service, because it's everything the industry needs to be. Almost $2 billion in sales was collected through Steam-like services in 2007 (roughly 21% of the grand total, according to the ESA).

The download services that won't cut it basically sell you a single download opportunity, and once that window closes your purchased content is forfeit. One of the reasons I enjoy Steam so much is that I can bring my games with me anywhere I go. I just install their bootstrapping client, Steam, and let it prepare the games I wish to play. It's as simple as that.

What I'm saying is that it all comes down to this very moment, where the sheer number of clicks it takes to get the game onto your hard drive is a deciding factor in whether you're going to buy that game or not. People who understand piracy can grab a seemingly bug-free copy of the game for no money at all and start playing without having to invest any time in acquiring the game itself. This is where you (as a business exec) have to be smart and competitive.

Now, Steam is free of charge, no subscription required, and it's not pay-per-view. That's important, because you cannot compete with people who crack software just for the kicks of it, not by charging money for something they'll provide for free. This is all about value and aggressive pricing. Even though these talented reverse engineers are great at what they do, they make mistakes and the quality varies a lot. Many cracked games introduce new bugs, bugs that otherwise would not occur, and a non-engineer will presume the game itself is faulty and might end up NOT buying it on false grounds. But with clever marketing, you'll pay that extra money to get the full experience.

Peter Moore is a man I've come to respect. He said this in response to some rather harsh legal actions taken by Atari, Codemasters, Topware, Reality Pump and Techland.

[Suing consumers] didn't work for the music industry. I'm not a huge fan of trying to punish your consumer. Albeit these people have clearly stolen intellectual property, I think there are better ways of resolving this within our power as developers and publishers.
Yes, we've got to find solutions. We absolutely should crack down on piracy. People put a lot of blood, sweat and tears into their content and deserve to get paid for it. It's absolutely wrong, it is stealing.
But at the same time I think there are better solutions than chasing people for money. I'm not sure what they are, other than to build game experiences that make it more difficult for there to be any value in pirating games.
If we learned anything from the music business, they just don't win any friends by suing their consumers. Speaking personally, I think our industry does not want to fall foul of what happened with music.

Now, if you want people to buy your software, you'll have to put value into the purchase of your product as a full service package, not into the sequence of bytes you end up shipping alone. For this there's no single answer; instead there are numerous carefully planned steps that need to take place.

Make a great game, and if you think shipping NOW with what you've got is the only way out, you're already fighting a losing battle. At that point you're better off saving what you can and starting over. It's hard, yes, but it is the right thing.

Michal Kicinski, co-founder of CD Projekt, managed to sell 1 million units of The Witcher by sticking to a specific audience. You can read the rest here. What will you do to make sure your game sells?

Tuesday, July 1, 2008

Fast binary visualization (to hex conversions) and back

This spin-off started with the immutable System.Data.Linq.Binary class. It's somewhat irritating because it does not expose any read-only access to its contents. As such, one must either rely on the ToString method, which surrounds the base64-encoded value in quotes, or the ToArray method, which uses defensive copying.

At this point, I wonder why they never defined a public indexer, like on the string class (string s; s[0]), since what I need is to be able to serialize this object. Object serialization comes in many flavors; I've chosen one which uses reflection to emit a collection of strongly typed serializers. It's way faster than what you get with any of the built-in stuff, but I had purposely ignored System.Data.Linq.Binary because I didn't use it, up until now.

LINQ to SQL maps timestamps to the Binary data type, and you need timestamps to apply optimistic concurrency, so I have to be able to handle them as well. And this is where this hack'ish hex stuff comes in.

In the end, timestamps are 8 bytes in size, so I accept the defensive copying overhead that takes place. I rely on the ToArray method and the .ctor(byte[]) constructor for getting and setting the internal representation.

When you look at hex, the 4 high bits and the 4 low bits of a byte are each referred to as a nibble. A nibble maps to one hexadecimal character, usually in the range [0-9ABCDEF], but why? There's really no relation between the binary value and its ASCII representation.

I gave it some thought, looked at the bit patterns and this is what I came up with.

Each nibble represents a 4-bit integer in the range 0-15. As such, if I were to use a lookup table I could map this space to any 16 characters, but I chose not to do that. Instead I searched the byte range for a bit pattern which would not require any lookup table and would still remain a presumably safe ASCII representation.

Funnily enough, the commercial at character '@' (0x40) happens to not use the low 4 bits of its byte, and the following 16 characters constitute "ABCDEFGHIJKLMNOP", the beginning of the alphabet. What's even more coincidental is that caret notation is exactly this (Ctrl+C, i.e. ^C, is really the ASCII character at zero-based index 3).

But enough talk, let's look at the code:

public static string Serialize( this System.Data.Linq.Binary binary )
{
    StringBuilder sb = new StringBuilder();
    byte[] bytes = binary.ToArray();
    for ( int i = 0 ; i < bytes.Length ; i++ )
    {
        // Unpack each nibble and prefix it with the '@' (0x40) bits.
        sb.Append( (char)('@' | (bytes[i] >> 4)) ); // high nibble
        sb.Append( (char)('@' | (bytes[i] & 0xf)) ); // low nibble
    }
    return sb.ToString();
}

public static System.Data.Linq.Binary ToBinary( string s )
{
    byte[] bytes = new byte[s.Length >> 1];
    for ( int i = 0 ; i < bytes.Length ; i++ )
    {
        // Pack: the explicit (byte) cast truncates to 8 bits, which strips the
        // '@' (0x40) bits off the shifted high character; the low character is
        // masked explicitly with 0xf.
        bytes[i] = (byte)((s[i << 1] << 4) | (s[(i << 1) + 1] & 0xf));
    }
    return new System.Data.Linq.Binary( bytes );
}

The above C# code can be used as is, but for those of you who find bit manipulation a bit daunting, I recommend taking a while to understand it (make sure you fully understand the impact of explicit and implicit type casts in conjunction with bit arithmetic).

Once isolated, a nibble never occupies the 4 high bits of a byte, and since we want to represent it as text, a bitwise | with '@' (0x40) gives us a very nice ASCII representation. It's also very efficient to parse, as each nibble is stored unmodified alongside the '@' bits: we simply bitwise & with the mask 0xf to get the nibble back, and then bit shift everything into place. Another nice property of this method is that the byte order of the binary value is preserved in the textual representation.
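
For completeness, here's a quick round-trip sketch, assuming the two methods above live in a static class, say BinaryExtensions (the class name is mine), so that Serialize is usable as an extension method; the byte values are just illustrative:

var binary = new System.Data.Linq.Binary( new byte[] { 0x00, 0x3A, 0xFF } );

string text = binary.Serialize();                          // "@@CJOO"
var roundTrip = BinaryExtensions.ToBinary( text );         // same three bytes again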

This bi-directional relationship actually makes me want to stop using the old-fashioned hexadecimal representation altogether and start using this caret notation stuff. I reckon it's just a matter of wrapping your head around another numerical representation.

There's 10 kinds of people, right?
Those who speak binary and those who don't.

Update July 25, '08: I've observed that the commercial at character '@' sometimes gets URL-encoded, despite the fact that every document I could find on the subject says otherwise. So if you want a purely ASCII and URL-safe binary representation, add 1 to each character just before serialization and subtract 1 just after de-serialization. It will do the trick.
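
A minimal sketch of that tweak, assuming the +1/-1 offset is applied per character so the output range becomes 'A'-'P' and '@' never appears (the UrlSafe names are just for illustration):

public static string SerializeUrlSafe( this System.Data.Linq.Binary binary )
{
    StringBuilder sb = new StringBuilder();
    byte[] bytes = binary.ToArray();
    for ( int i = 0 ; i < bytes.Length ; i++ )
    {
        // Shift each character up by one so '@' never appears in the output.
        sb.Append( (char)(('@' | (bytes[i] >> 4)) + 1) );
        sb.Append( (char)(('@' | (bytes[i] & 0xf)) + 1) );
    }
    return sb.ToString();
}

public static System.Data.Linq.Binary ToBinaryUrlSafe( string s )
{
    byte[] bytes = new byte[s.Length >> 1];
    for ( int i = 0 ; i < bytes.Length ; i++ )
    {
        // Undo the shift before unpacking the nibbles.
        bytes[i] = (byte)(((s[i << 1] - 1) << 4) | ((s[(i << 1) + 1] - 1) & 0xf));
    }
    return new System.Data.Linq.Binary( bytes );
}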

Saturday, May 31, 2008

Type-casting as bugs in C#

I've been noticing more and more that people use the 'as' keyword in C# to do presumably safe type casts. But enough is enough: in most of these cases it's not the right way to go about type conversion, nor does it bring any benefit.

To start things off, the expression 'instance1 as SomeType' is just the equivalent of '(instance1 is SomeType ? (SomeType)instance1 : null)', and the following code will throw an exception anyway if the instance is not a SomeType... so why type cast this way?

[TestMethod]
public void Index()
{
    // Setup
    HomeController controller = new HomeController();

    // Execute
    ViewResult result = controller.Index() as ViewResult;

    // Verify
    ViewDataDictionary viewData = 
        result.ViewData as ViewDataDictionary;

    Assert.AreEqual( "Home Page", viewData["Title"] );
}

The above example is from the ASP.NET MVC Framework, and while it's just a sample from the testing framework, this code is bad. When the right-hand side, 'controller.Index() as ViewResult', returns null (because the return value was not a ViewResult), the test blows up with a NullReferenceException on the next line. The 'as' expression implies that the result does not have to be a ViewResult, yet there is no error handling for that case, so the exception you end up with is a side effect of some other code that failed rather than a report of the actual problem.
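
To make the difference concrete, here's a small sketch of the two failure modes (reusing the controller from the sample above):

// Direct cast: if Index() does not return a ViewResult, this throws an
// InvalidCastException right here, at the point of the actual problem.
ViewResult result = (ViewResult)controller.Index();

// 'as' cast: a failed cast silently yields null, and the crash shows up later
// as a NullReferenceException, far away from the real cause.
ViewResult maybeResult = controller.Index() as ViewResult;
var viewData = maybeResult.ViewData; // NullReferenceException here if the cast failed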

Do you know the null coalescing operator? It's about the most useless thing in the language, or is it?

(controller.Index() as ViewResult) ?? new ViewResult()

Now, if you really want the program to crash in the event of such a failure you should NOT do this, nor should you let a failed type test yield null. But if you want to type cast and, in case of a problem, fall back to some default instance, the above snippet is the way to do it.

Think about it. The test example is really poor because it fails to acknowledge that an error could occur, and that if one did, the wrong error would be reported. This snippet, on the other hand, actually makes sense of a failed type test: it falls back on other code to provide an instance, so execution of your program doesn't have to stop.
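
And if you do want to keep the 'as' cast in a test, a sketch of handling the null case explicitly, so the right error gets reported, could look like this:

ViewResult result = controller.Index() as ViewResult;

// Fail with a message that points at the actual problem instead of
// letting a NullReferenceException leak out of the test.
Assert.IsNotNull( result, "Index() did not return a ViewResult." );

ViewDataDictionary viewData = (ViewDataDictionary)result.ViewData;
Assert.AreEqual( "Home Page", viewData["Title"] );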

Sunday, May 18, 2008

Bending the Command Prompt to your will and why I don't use Microsoft's PowerShell

I'm a Windows guy for sure, but I bought a MacBook because they are kind of cool. I occasionally write code for both OS X (BSD) and Windows, and I often use command-line tools.

One thing Windows has kind of been missing is a terminal with capabilities similar to what you get from Unix/Linux (from now on referred to as -nix) based operating systems. Microsoft has noticed this, and they believe PowerShell is the one solution to all your scripting and terminal needs. I have a couple of problems with Microsoft's PowerShell, and that is why I'm writing this.

Auto-completion
PowerShell auto-completion is completely subpar. Not only does it fail to recognize aliases or commands in your PATH environment variable, it also rewrites partial paths into full suggestions that take over your prompt. You end up cycling through several suggestions before you run into the right one, or in the worst case the completion suggests a complete name when you only wanted part of it. It makes me think that the guys who wrote PS did not have any background in -nix based operating systems. #!/bin/bash

Output
PowerShell does have a somewhat nice way of pretty-printing output from different commands, but it's bloated by too much white space. I'm not very keen on output that takes up my whole screen; I want something mean and lean as often as possible. PS fails to achieve this with several commands, such as the basic ls or mkdir.

That Microsoft currently does not support SSH is just sad, but if you want to be able to SSH into your Windows computer you can always rely on Cygwin (though Cygwin is bundled with so much more). Anyhow, this post is about making the good old-fashioned Command Prompt behave more like a terminal you're familiar with if you know -nix based operating systems. The Command Prompt itself is actually alright.

Some sugar

Font-rendering capabilities are getting better in Windows; with ClearType you get true subpixel rendering, which makes things more readable, so why not apply this to your terminal? All you need is a TrueType font face with ClearType support. A monospaced font you might know of which is excellent for terminals and code is Consolas. I use it for everything code-related, and it is freely provided by Microsoft.

  • Install this font on your computer and fire up the registry editor.
  • Go to HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont
  • Add a string value "00"="Consolas"
  • Reboot (you'll have to reboot for the font to be available in the command prompt settings dialog)
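
If you'd rather not click through regedit, the same change expressed as a .reg file (just a sketch of the steps above) looks like this:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
"00"="Consolas"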

Commands

What I do is set up a PATH entry pointing at a bunch of batch files that remind me of -nix. I create a file ls.bat and put a single line, "@dir /w", in that file. This way, typing ls at the prompt will actually give something more similar to ls. You can tweak around with this to get everything you might need. (I find "dir /w" a lot more useful than plain "dir".)
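
For the record, the batch files are trivial; ls.bat contains nothing but "@dir /w", and pwd.bat below is a hypothetical companion in the same spirit (cd without arguments prints the current directory in cmd):

rem ls.bat
@dir /w

rem pwd.bat (hypothetical companion alias)
@cd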

These are just a few tricks I ended up using after realizing how annoying Microsoft's PowerShell is. The main reason I "downgrade" is that the auto-completion in PS truly is worse than the old-fashioned cmd.exe way. I'm quite certain there is a way to replace how this is done in PS, but I will not waste my time pursuing it. Cygwin comes with almost everything you need, but if you don't want to install Cygwin, you can make do with these tricks.

I primarily use the Command Prompt for svn, because I loathe TortoiseSVN. And I wrote this in the hope that people using PowerShell realize it falls short on some very basic things.

Happy prompting!

In response to Jeffrey Snover's comment

The above-mentioned problems can be further exemplified by the following:

C:\Users> ls


    Directory: Microsoft.PowerShell.Core\FileSystem::C:\Users


Mode                LastWriteTime     Length Name
----                -------------     ------ ----
d----         5/15/2008   1:07 PM            leidegre
d-r--         11/2/2006   1:49 PM            Public
d----          3/1/2008  10:45 PM            sshd_server
d----          3/1/2008   9:35 PM            SvcCOPSSH


C:\Users>

leidegre@monkey-intel /c/Users
$ ls
All Users/   Default/  Default User/  Public/  SvcCOPSSH/  desktop.ini*  leidegre/
sshd_server/

leidegre@monkey-intel /c/Users

The above output illustrates the vast difference between PowerShell and Cygwin running bash. The "Directory:" header followed by a CLR type? I definitely don't need to know when a directory was last modified, nor which attributes are set (if I did, I would pass a flag asking for that). The second example is from Cygwin running bash, and it uses colors (and a trailing slash) to show that an entry is actually a folder. The old DOS command "dir /w" puts square brackets around [directory names]; both are fine, mean and lean ways of displaying the appropriate information.

Here are more examples of that horrible output formatting, this time the pwd command in PS. Yeah, I'm well aware that it's a path.

..\ItemTemplates\CSharp\Data> pwd

Path
----
C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\ItemTemplates\CSharp\Data


..\ItemTemplates\CSharp\Data> 

leidegre@monkey-intel /c/Users
$ pwd
/c/Users

leidegre@monkey-intel /c/Users

There is an excessive amount of line breaks almost everywhere; hold back on a few, you don't need to feed the prompt that much. It's nice not to have to scroll a lot within the terminal itself.

I'm using Windows PowerShell V2 (CTP) and I can't say that it's any better with the tab completion.

C:\Users> S<TAB>
is auto-completed into
C:\Users> .\sshd_server

The obvious problem here is that, going by case, S should first of all match SvcCOPSSH, and secondly it's ambiguous. Also note how S is not completed into .\sshd_server\, which necessitates an ever so annoying extra slash since sshd_server is a directory. So if I were to cd into leidegre/Documents, I would have to type cd leidegre<SLASH>Documents. On top of this, the second completion turns leidegre/Documents into cd C:\Users\leidegre\Documents, which is longer. Not that big of a deal, right? Well, let's look at another example:

..\ItemTemplates\CSharp\Data> cd 1033/D<TAB>
is auto-completed into
..\ItemTemplates\CSharp\Data> cd 'C:\Program Files\Microsoft Visual Studio 9.0\Comm
on7\IDE\ItemTemplates\CSharp\Data\1033\DataSet.zip'

Note that my prompt is different from the default PS prompt in that it only shows the last 3 directories of the path. The full prompt running the default PS installation is...

C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\ItemTemplates\CSharp\Data> 
cd 'C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\ItemTemplates\CSharp\D
ata\1033\DataSet.zip'

That is quite a lot. Now take into consideration that the terminal is most likely narrower than the browser window you're reading this in, and that people will more often than not be running several instances of PS. And what about white space in file names? I'd prefer the old-fashioned style of just escaping the white space with a backslash (\ ). Mean and lean does have its benefits. Then again, terminals are not meant to be used by just anyone.

bash will suggest the longest unambiguous match; any ambiguity is resolved by typing more input. That is very important in auto-completion: I'm expected to put in some text, but I'm able to rapidly get exactly what I want.

Things like being able to press CTRL+D or <TAB><TAB> to get a completion list for the current input are also welcome. And note that if the list is very long, you'll be asked before being shown the 100+ items or so.

If you can't agree on any of the above, make it easily configurable, so people can use it however they want.

Tuesday, March 25, 2008

Best of MIX'08

I love that Microsoft makes all this great content available for free. Being unable to attend such a conference for many reasons, it's nice to be able to listen to what people have to say without actually being there.
Notably, there is one speaker and one talk that I want to share with everybody.

Developing ASP.NET Applications Using the Model View Controller Pattern
by Scott Hanselman

Now, you might not think ASP.NET is cool, or you might think MVC sucks, but this guy is not only hilarious (yeah, a bit off-topic, but entertaining), he also brings with him a lot more than conventional web development. The presentation runs about 1 hour 15 minutes and features things from the ASP.NET MVC Framework plus a lot of neat tricks, e.g. how to use a domain-specific language (DSL) for HTML rendering, and more; I'll let you figure it out for yourself.

Thursday, February 14, 2008

File decomposition (part 1)

Almost every game uses one or more proprietary file and/or archive formats. When bound by an underlying file system such as FAT, NTFS, HFS Plus or UFS (Mac OS X) you can only do so much, yet some of these come with features well beyond what you would be willing to implement yourself.

I spend a decent amount of time reverse engineering the internals of things, and when it comes to file formats there are a couple I have found particularly interesting. Among these are MPQ (Mo'PaQ/Blizzard Entertainment), GCF (Half-Life 2/Valve) and Scimitar (Assassin's Creed/Ubisoft).

In summary, there are a couple of things to consider when creating an archive/packing format:

Performance, compression and integrity

This is my take on it, and why these things matter.

Performance

On Win32 platforms, the WINAPI CreateFile function is used not only to create files but to open them as well. The performance implication of retrieving a single file handle through this function is negligible, but consider a scenario where there are thousands of small files that need to be placed in memory. That is an entirely different case, where it is much faster to keep an in-memory representation of the archive structure and do unbuffered, sequential block reads through a single file handle.
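
A rough sketch of what that looks like in managed code; the PackedFile structure and its fields are made up for the example, a real archive would read them from its own header and tables:

using System.IO;

// A minimal in-memory index entry: where a packed file lives inside the archive.
struct PackedFile
{
    public long Offset;
    public int Length;
}

static byte[] ReadPackedFile( FileStream archive, PackedFile entry )
{
    // One archive handle, one seek, one read; no per-file CreateFile call.
    // (A robust version would loop until all Length bytes have been read.)
    byte[] buffer = new byte[entry.Length];
    archive.Seek( entry.Offset, SeekOrigin.Begin );
    archive.Read( buffer, 0, entry.Length );
    return buffer;
}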

The MPQ archive is efficient in that it keeps the entire archive structure in memory while maintaining a very small footprint, around 64K. If I wanted to open a file in such an archive I would, however, need to know the file name, because the one thing I cannot do in an MPQ archive is browse.
It's simple: the MPQ archive uses a hash table to represent paths efficiently. E.g. 'ThisFolder/ThisFile.ext' is transformed into a 32-bit hash, say 0x7c14a6c9, which modulo 32768 comes out as 9929. Roughly speaking, the path maps to the hash table entry at that index, which in turn describes the block where the file is stored (it's all very efficient in practice).
This is also interesting because in Blizzard games a file is always looked up in patch_rt.mpq before any other archive. This gives a way to override any file, but it's a workaround for the fact that the original design failed to address replacing legacy files within an archive, something that became a challenge when World of Warcraft launched with massive monthly content updates.
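
To make the lookup idea concrete, here's a simplified sketch; the hash function is a stand-in rather than Blizzard's actual MPQ hash, BlockEntry is just an illustrative structure, and collision handling is omitted:

struct BlockEntry
{
    public long Offset;          // where the file data starts inside the archive
    public int CompressedSize;
}

// The table length is a power of two (e.g. 32768), so modulo reduces to a bit mask.
static BlockEntry? Lookup( BlockEntry?[] hashTable, string path )
{
    uint hash = StandInHash( path );                          // e.g. 0x7c14a6c9
    int index = (int)(hash & (uint)(hashTable.Length - 1));   // 0x7c14a6c9 % 32768 == 9929
    return hashTable[index];                                  // null means file not present
}

// Stand-in hash for illustration only; MPQ uses its own one-way hash function.
static uint StandInHash( string path )
{
    uint h = 2166136261;
    foreach ( char c in path.ToUpperInvariant() )
        h = (h ^ c) * 16777619;   // FNV-1a style
    return h;
}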

MPQ archives did not initially support archives larger than ~4 GB, which eventually became a problem, and the initial format left almost no room for improvements. The extensions added after the initial release never looked as clean as the original. A word of advice: if you think you might need it, make room for it. I'm talking about designing for needs you might have later, mostly by reserving binary space in headers. Backwards compatibility is nice but not strictly necessary; the reusability is worth a lot more.

I like the MPQ hash table approach; it's lean and efficient. But I realized while writing this that it's a bit hefty to go through it all at once. I will be covering compression and integrity in the future.

I've also been looking into the application of ternary search trees (TSTs) as well as partial-match algorithms, to see if there isn't an even better way of creating even more efficient archiving methods.