Attacking NTLM with Precomputed Hashtables
warlord
warlord@nologin.org 


1) Introduction


Breaking encrypted passwords has been of interest to hackers for a long
time, and protecting them has always been one of the biggest security
problems operating systems have faced, with Microsoft's Windows being no
exception. Due to errors in the design of the password encryption
scheme, especially in the LanMan(LM) scheme, Windows has a bad track in
this field of information security.  Especially in the last couple of
years, where the outdated DES encryption algorithm that LanMan is based
on faced more and more processing power in the average household,
combined with ever increasing harddisk size, made it crystal clear that
LanMan nowadays is not just outdated, but even antiquated.

Until now, breaking the LanMan hashed password required somehow
accessing the machine first of all, and grabbing the password file,
which didn't render remote password breaking impossible, but as a remote
attacker had to break into the system first to get the required data, it
didn't matter much. This paper will try to change this point of view.


2) The design of LM and NTLM

2.1) The LanMan disaster


By default Windows stores all users passwords with two different hashing
algorithms. The historically weak LanMan hash and the more robust MD4.
The LanMan hash is based on DES and has been described in Mudge's rant
on the topic. A brief recap of the LM hash is below, though those
unfamilliar with LM will probably want to read.

First of all, Windows takes a password and makes sure it's 14 bytes
long. If it's shorter than 14 bytes, the password is padded with null
bytes. Brute forcing up to 14 characters can take a very long time, but
two factors make this task way more easy. First, not only is the set of
possible characters rather small, Microsoft further reduces it by making
sure a password is stored all uppercase. That means "test" is the same
as "Test" is the same as "tesT" is the same as...well...you get the
idea. Second, the password is not really 14 bytes in size. Windows
splits it up into two times 7 bytes. So instead of having to brute force
up to 14 bytes, an attacker only has to break 7 bytes, twice. The
difference is (keyspace^14) versus (keyspace^7)*2. That's a huge
difference.

Concerning the keyspace, this paper focuses on the alphanumerical set of
characters only, but the entire possible set of valid characters is:


ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 %!@\#$%^&*()_-=+`~[]\{}|\:;"'<>,.?/


The next problem with LM stems from the total lack of salting or cipher
block chaining in the hashing process. To hash a password the first 7
bytes of it are transformed into an 8 byte odd parity DES key. This key
is used to encrypt the 8 byte string "KGS!@". Same thing happens with
the second part of the password.

This lack of salting creates two interesting consequences. Obviously
this means the password is always stored in the same way, and just begs
for a typical lookup table attack. The other consequence is that it is
easy to tell if a password is bigger than 7 bytes in size. If not, the
last 7 bytes will all be null and will result in a constant DES hash of
0xAAD3B435B51404EE.

As I already pointed out, LM has been extensively documented.
"L0phtcrack" and "John the Ripper" are both able brute force tools to
break these hashes, and Philippe Oechslin of the ETH Zuerich was the
first to precompute LM lookup tables that allow breaking these hashes in
seconds.

2.2) NTLM


Microsoft attempted to address the shortcomings of LM with NTLM. Windows
NT introduced the NTLM(NT LanManager) authentication method to provide
stronger authentication. The NTLM protocol was originally released in
version 1.0(NTLM), and was changed and fortified in NT SP6 as NTLMv2.
When exchanging files between hosts in a local area network, printing
documents on a networked printer or sending commands to a remote system,
Windows uses a protocol called CIFS - the Common Internet File System.
CIFS uses NTLM for authentication.

In NTLM, the protocol covered in this document, the authentication works
in the following manner. When the client connects to the server and
requests a new session, the server replies with a positive session
response. Next, the client sends a request to negotiate a protocol for
one of the many dialects of the SMB/CIFS family by providing a list of
dialects that it understands. The server picks the best out of those and
sends the client a response that names the protocol to use, and includes
a randomly generated 8 byte challenge.

In order to log in now, the client sends the username in plaintext(!),
and also the password, hashed NTLM style. The NTLM hash is generated in
the following manner:


[UsersPassword]->[LMHASH]->[NTLM Hash]


The NTLM hash is produced by the following algorithm. The client takes
the 16 byte LM hash, and appends 5 null bytes, so that the result is a
string of 21 bytes length. Then it splits those 21 bytes into 3 groups
of 7 bytes. Each 7 byte string is turned into an 8 byte odd parity DES
key once again. Now the first key is used to encrypt the challenge with
the DES algorithm, producing an 8 byte hash. The same is done with keys
2 and 3, so that there are two additional 8 byte hashes. These 3 hashes
are simply concatenated, resulting in a single 24 byte hash, which is
the one being sent by the client as the encrypted password.

Mudge already pointed out why this is really stupid, and I'll just
recapitulate his reasons here. An attacker capable of sniffing traffic
can see the username, the challenge and the 24 byte hash.

First of all, as stated earlier, if the password is less than 8 bytes,
the second half of the LM hash always is 0xAAD3B435B51404EE. For the
purpose of illustration, let's assume the first part of the hash is
0x1122AABBCCDDEEFF. So the entire LM hash looks like:


-------------------------------------------
| 0x1122AABBCCDDEEFF | 0xAAD3B435B51404EE |
-------------------------------------------


When transforming this into an NTLM hash, the first 8 bytes of the new
hash are based solely on the first 7(!) bytes of the LM hash. The second
8 byte chunk of the NTLM hash is based on the last byte of the first LM
hash, and first 6 bytes of the second LM hash. Now there are 2 bytes of
the second LM hash left. Those two, padded with 5 null bytes and used to
encrypt the challenge, form the third 8 byte chunk of the NTLM hash.
That means in the example this padded LM hash


------------------------------------------------------
| 0x1122AABBCCDDEE | FFAAD3B435B514 | 04EE0000000000 |
------------------------------------------------------


is being turned into the 24 byte NTLM hash. If the password is smaller
than 8 characters in size, the third part, before being hashed with the
challenge to form the NTLM hash, will always look like this. So in order
to test wether the password is smaller than 8 bytes, it's enough to take
this value, the 0x04EE0000000000, and use it to encrypt the challenge
that got sniffed from the wire. If the result equals the third part of
the NTLM hash which the client sent to the server, it's a pretty safe
bet to say the password is no longer than 7 chars. It's even possible to
make sure it is. Assuming from the previous result that the second LM
hash looks like 0xAAD3B435B51404EE, the second chunk of the 24 byte NTLM
hash is based on 0x??AAD3B435B514. The only part unknown is the first
byte, as this one is based on the first LM hash. One byte, thats 256
permutations. By brute forcing those up to 256 possibilities as the
value of the first byte, and using the resulting key to encrypt the
known challenge once again, one should eventually stumble over a result
that's the same as the second 8 bytes of the NTLM hash. Now one can rest
assured, that the password really is smaller than 8 bytes. Even if the
password is bigger than 7 bytes, and the second LM hash does not end
with 0x04EE thus, creating all possible 2 byte combinations, padding
them with 5 null bytes and hashing those with the challenge until the
final 8 byte chunk of the NTLM hash matches will easily reveal the final
2 byte of the LM hash, with no more than up to 64k permutations.

2.3) The NTLM challenge


The biggest difference between the way the LM and the NTLM hashing
mechanism works is the challenge. In NTLM the challenge acts like a a
salt in other cryptographic implementations. This throws a major wrench
in our pre-computing table designs, adding 2^64 permutations to the
equation.

3.0) Breaking NTLM with precomputed tables

3.1) Attacking the first part

Precomputing tables for NTLM has just been declared pretty much
impossible with todays computing resources. The problem is pre-computing
every possible hash value (and then, of course storing those values even
if computation was possible). By applying a trick to remove the
challenge from the equation however, precomputing NTLM hashes becomes
almost as easy as the creation of LM tables. By writing a rogue CIFS
server that hands out the same static challenge to every client that
tries to connect to it, the problem has static values all over the place
once again, and hashtable precomputation becomes possible.

The following screenshot depicts a proof of concept implementation that
accepts an incoming CIFS connection, goes through the protocol
negotiation phase with the connecting client, sends out the static
challenge, and disconnects the client after receiving username and NTLM
hash from it. The server also logs some more information that the client
conveniently sends along.


IceDragon wincatch  bin/wincatch
This is Alpha stage code from nologin.org
Distribution in any form is denied


Src Name: BARRIERICE
IP: 192.168.7.13
Username: Testuser
Primary Domain: BARRIERICE
Native OS: Windows 2002 Service Pack 2 2600
Long Password Hash: 3c19dcbdb400159002d8d5f8626e814564f3649f0f918666


That's a Windows XP machine connecting to the rogue server running
on Linux. The client is connecting from IP address 192.168.7.13. The
username is ``Testuser'', the name of the host is ``BarrierIce'',
and the password hash got captured too of course.

3.2) Table creation


The creation of rainbow tables to precompute the hashes is a good
approach to easily breaking the hashes now, but as harddisks grow bigger
and bigger while costing ever less, I decided to roll my own table
layout instead. As the reader will see, my approach requires way more
harddisk space than rainbow tables do since they are computationally
less expensive to create and contain a determined set of data, unlike
rainbow tables with their less than 100 probability approach to contain
a certain password.

In order to create those tables, the big question is how to efficiently
store all the data. In order to stay within certain bounds, I decided to
stick to alphanumeric tables only.  Alphanumeric, that's 26 chars from
a-z, 26 chars from A-Z, and additional 10 for 0-9. Thats 62 possible
values for each character, so thats 62^7 permutations, right? Wrong.
NTLM hashes use the LM hash as input. The LM hashing algorithm
upper-cases its input. Therefore the possible keyspace shrinks to 36
characters, and the number of possible permutations goes down to 36^7.
The only other input that needs accounting is the NULL padding bytes
used, bringing the total permutations to a bit more than 36^7.

The approach taken here to allow for easy storage and recovery of hashes
and plain text is essentially to place every possible plaintext password
into one of 2048 buckets. It could easily be expanded to more. The table
creation tool simply generates every valid alphanumeric password, hashes
it and checks the first 11 bits of the hash. These bits determine which
of the 2048 buckets (implemented as files in this case) the plaintext
password belongs to. The plaintext password is then added to the bucket.
Now whenever a hash is captured, looking at the first 11 bits of the
hash determines the correct bucket to look into for the password. All
that's left to do now is hashing all the passwords in the bucket until a
match is found. This will take on average case ((36^7)/2048))/2, or
19131876 hash operations.  This takes approximately three minutes on my
Pentium 4 2.8 Ghz machine. It takes the NTLM table generation tool  94
hours to run on my machine. Fortunately, I only had to do that once :)

The question is how to store more than 36^7 plaintext passwords, ranging
in size from 0(empty password) to 7 bytes.

Approach 1: Store each password separated by newlines. As most passwords
are 7 byte in size and an additional newline extends that to 8 byte, the
outcome would be somewhere around (36^7)*8 bytes. That's roughly 584
gigabytes, for the alphanumeric keyspace. There has to be a better way.

Approach 2: By storing each password with 7 bytes, be it shorter than 7
or not, the average space required for each password goes down from 8 to
7, as it's possible to get rid of the newlines. There's no need to
separate passwords by newlines if they're all the same size. (36^7)*7 is
still way too much though.

Approach 3: The plaintext passwords are generated by 7 nested loops. The
first character changes all the time. The second character changes every
time the first has exhausted the entire keyspace. The third increments
each time the second has exhausted the keyspace and so on. What's
interesting is that the final 3 bytes rarely change. By storing them
only when they change, it's possible to store only the first 4 bytes of
each password, and once in a while a marker that signals a change in the
final 3 bytes, and is followed by the 3 byte that now form the end of
each plaintext password up to the next marker. That's roughly (36^7)*4
bytes = 292 gigabytes. Much better.  Still too much.

Approach 4: For each character, there's 37 possible values. A-Z, 0-9 and
the 0 byte. 37 different values can be expressed by 6 bits.  So we can
stuff 4 characters into 4*6 = 24 bits, which is 3 byte.  How convenient!
(37^7)*3 == 265 gigabytes.  Still too much.

Approach 5: The passwords are being generated and stored in a
consecutive way. The hash determines which bucket to place each new
plaintext password into, but it's always 'bigger' than the previous one.
Using 2048 buckets, a test showed that, within any one file, no offset
between a password being stored and the next one stored into this bucket
exceeded 55000. By storing offsets to the previous password instead of
the full word, each password can be stored as a 2 byte value.

For example, say the first password stored into one bucket is the one
char word "A". That's index 10 in the list of possible characters, as it
starts with 0-9. The table creation tool would now save 10 into the
bucket, as it's the first index from the start of the new bucket, and
it's 10 bigger than zero, the start value for each bucket. Now if by
chance the one character password "C" was to be stored into the same
bucket next, the number 2 would be stored, as "C" has an offset of 2 to
the previous password. If the next password for this bucket was "JH6",
the offset might be 31337.

Basically each password is being stored in a base36 system, so the first
2 byte password, being "00", has an index of 37, and all the previous
password offsets and the offset for "00" itself of the bucket that "00"
is being stored in add up to 37. To retrieve a password saved in this
way requires a transformation of the decimal index back into the base36
system, and using the resulting individual numbers as indexes into the
char keyspace[].

The resulting table size is  (36^7 )*2 == 146 gigabytes. Still pretty
big, but small enough to easily fit on today's harddisks. As I mentioned
earlier the actual resulting size is a bit bigger in fact, as a bunch of
passwords that end with null bytes have to be stored too. In the end
it's not 146 gigabytes, but 151 instead.

3.3) The big problem


Now there's a big problem concerning the creation of the NTLM lookup
tables. The first 8 byte of the final hash are derived from the first 7
byte of the LM hash, which are derived from the first 7 byte of the
plaintext password. Creating tables to match the first 8 byte of the
NTLM hash to the first 7 bytes of the password is thus possible, but the
same tables do not work for the second or even third block of the 24
byte NTLM hash.

The second 8 byte chunk of the hash is derived from the last byte of the
first LM hash, and the first 6 byte of the second LM hash. This first
byte adds 256 possible values to the second LM hash. While the first 8
byte chunk of the 24 byte LM hash stems purely from a LM hash of a
plaintext password, the second 8 byte chunk stems from an undetermined
byte and additional 6 byte of a LM hash.

Being able to look up the first up to 7 bytes of the password is a big
advantage already though. The second part of the password, if it's
longer than 7 bytes at all, can now usually be easily guessed or brute
forced. Having determined that the password starts with "ILLUSTR" for
example, most often it may end with "ATION" or "ATOR".  On the other
hand, when applying the brute force approach to this example after
looking up the first 7 bytes, it'd require to brute force 4-5 characters
until the final password is revealed. Even off-the-shelf hardware does
this in seconds. While taking a bit longer, even brute forcing 6 bytes
is nothing one couldn't sit out.  7 bytes, however, requires an
inconvenient amount of time. That's where being able to look that part
up as well would really come in handy. Well, guess what. There is a way.

3.4) Breaking the second part of the password


As described earlier in this paper, the second part of the password,
just as the first one, is used to encrypt a known string to form an 8
byte LM hash. Knowing the challenge sent from the server to the client,
it is possible to deduce the final 2 bytes of that LM hash out of the
third chunk of the NTLM hash. Doing so was explained in section 2.2.

So the final 2 byte of the LM hash of the second half of the original
password are known. If a similar approach to breaking the first half of
the password is being applied now, looking up the second part of the
password as well becomes quite possible.

The key here is to create a set of precomputed LanMan tables that are
sorted by the final 2 bytes of the LM hash. So once the final 2 byte of
the LM hash are known, a file is thus identified that contains plaintext
passwords that when hashed result in a matching 2 byte sequence at the
end.

The second chunk of the NTLM hash is derived from 6 bytes that are the
start of the hash of one of the plaintext passwords out of the file that
just got identified, and a single byte, the first one, which is the
final byte of the first LM hash.

Considering the first part of the password broken, that byte is known.
So all that's left to do is hash all the possible passwords in the file,
fit the single known byte into the first position of a string and
concatenate this one with 6 bytes from the just created hash, hashing
those 7 bytes again and comparing the result to the second chunk of the
NTLM hash. If it matches, the second part of the password has been
broken too.

Even if looking up the first part of the password didn't prove
successful, the method may still be applied. The only change would be
that up to 256 possible values for the first byte would have to be
computed and tested as well.

What's really interesting to note here, is that the second set of
tables, the sorted LM tables, unlike the first set of NTLM tables, does
NOT depend on a certain challenge. It will work with just any challenge,
which is usually sniffed or aquired from the wire when the password hash
and the username are being taken.

4) How to get the victim to log into the rogue server?

The big question to answer is how one can get the victim to log into the
rogue server, thus exposing his username and password hash for the
attacker to break.

Approach 1: Sending a html mail that includes a link in the form of a
UNC path should do the trick, depending primarily on the sender's
rhetoric ability in getting his victim to click the link, and the mail
client to understand what it's expected to do. A UNC path is usually in
the form of 192.168.7.6share, where the IP address obviously specifies
the host to connect to, and ``share'' is a shared resource on that host.
Due to Microsoft always being concerned about comfort first, the
following will happen once the victim clicks the link on a Windows
machine. The OS will try to log into the specified resource.  When asked
for a username and password, the client happily provides the current
user's username and his hashed password to the server in an effort to
try to log in with these credentials. No user interaction required. No
joke.

Approach 2: Getting the victim to visit a site that includes a UNC path
with Internet Explorer has the same result.  An image tag like  will do
the trick. IE will make Windows try to log into the resource in order to
get the image. Again, no user interaction is required. This trick does
not work with Mozilla Firefox by the way.

Approach 3: If the rogue server is part of the LAN, advertising it in
the network neighbourhood as "warez, porn, mp3, movie" - server should
result in users trying to log into it sooner or later. There's no way
anyone can withstand the power of the 4 elements!

There's plenty of other ways that the author leaves to the readers
imagination.

5) Things to remember


Once a hash has been received and successfully broken, it may still not
be the correct password, and accordingly not allow the attacker to log
into his victims machine. That's due to the password being hashed all
uppercase for LM, while the MD4 based second hash actually is case
sensitive. So a hash that's been deciphered as being "WELCOME" may
originally have been "Welcome" or "welcome" or even "wELCOME" or
"WeLcOme" or .. well, you get the idea. Then again, how many users
actually apply uncommon spelling schemes?

6) Covering it up


Having read this paper the reader should by now realize that NTLM,
an authentication mechanism that probably most computers on this
planet support, is actually a big threat to hosts and entire
networks. Especially with the recently discovered remote Windows
exploits that require valid accounts on the victim machines for the
attacker to log into first, a worm that makes people visit a
website, which in turn makes them log into a rogue server that
breaks the hash and automatically exploits the victim is a
frightening threat scenario.


Bibliography

Windows NT rantings from the L0pht
http://www.packetstormsecurity.org/Crackers/NT/l0phtcrack/l0phtcrack.rant.nt.passwd.txt

Making a Faster Cryptanalytic Time-Memory Trade-Off
http://lasecwww.epfl.ch/ oechslin/publications/crypto03.pdf