  #1  
06-06-2003, 09:04 AM
sna
Administrator

Join Date: Jun 2003
Posts: 76
a note on signed and unsigned byte arrays

when converting assembly into a higher-level language we need to preserve how the sign of each byte is handled. the elements that form a string or simple array are usually byte-sized (8 bits), and the code treats them as either signed or unsigned.

for example, a string with bytes treated as unsigned, where esi is the base pointer and ecx is the index:

Code:
movzx eax, byte ptr [esi+ecx]
the movzx instruction (move with zero-extend) extends an 8-bit value to a 16-bit value, or an 8-bit or 16-bit value to a 32-bit value, by padding the high-order bits with zeros. the result in this case is that al holds the source byte and the rest of eax is cleared.
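
as a rough sketch of what that load looks like once translated to c (the function and parameter names are made up for illustration; `base` stands in for esi and `index` for ecx):

```c
#include <stdint.h>

/* Hypothetical helper modelling `movzx eax, byte ptr [esi+ecx]`.
   `base` plays the role of esi, `index` the role of ecx. */
uint32_t load_zero_extended(const uint8_t *base, uint32_t index)
{
    /* Converting uint8_t to uint32_t fills the upper 24 bits with zeros. */
    return (uint32_t)base[index];
}
```

whatever the byte is, the returned value is always in the range 0..255.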

in contrast, when the source is treated as signed, the msb (most significant bit) of the source is copied into every higher bit of the destination.

Code:
movsx eax, byte ptr [esi+ecx] ; move with sign-extend

now, had the source byte been negative (msb set), the result would have been that al is the source byte unchanged, and the rest of eax's bits are set to 1. had the source byte not been negative (msb clear), the result would have been the same as if movzx had been used.
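
the same rule can be spelled out with plain bit operations. this is only a sketch with a hypothetical function name, written to make the copying of the msb visible rather than to be the shortest way of doing it in c:

```c
#include <stdint.h>

/* Hypothetical helper modelling the effect of `movsx eax, <byte>`
   on a byte that has already been read from memory. */
uint32_t sign_extend_byte(uint8_t b)
{
    uint32_t result = b;        /* low 8 bits: the source byte, unchanged */
    if (b & 0x80u) {            /* msb set, so the byte is negative       */
        result |= 0xFFFFFF00u;  /* replicate the msb into bits 8..31      */
    }
    return result;
}
```

for a byte with the msb clear the `if` is skipped and the result matches zero extension.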

we'll look at a couple of actual cases to help clarify this further:

1)
Code:
movzx eax, byte ptr [esi+ecx]
al will always hold the source byte and the rest of eax will always be cleared.

2)
Code:
movsx eax, byte ptr [esi+ecx]   ; source byte is <= 127 dec
al will hold the source byte and the rest of eax will be cleared.

3)
Code:
movsx eax, byte ptr [esi+ecx]   ; source byte is > 127 dec
al will hold the source byte and the rest of eax's bits will be set to 1.
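
when rewriting such code in c, one way to carry the sign handling over is to pick the element type that matches the instruction. the two functions below are a sketch under that assumption; their names are invented for the example:

```c
#include <stdint.h>

/* case 1: the original code used movzx, so the elements are unsigned bytes */
uint32_t read_unsigned(const unsigned char *s, uint32_t i)
{
    return s[i];    /* value is zero-extended: always 0..255 */
}

/* cases 2 and 3: the original code used movsx, so the elements are signed bytes */
int32_t read_signed(const signed char *s, uint32_t i)
{
    return s[i];    /* value is sign-extended: -128..127 */
}
```

declaring the array with the wrong signedness changes the result for every byte above 127, which is exactly the case 3 situation.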

hope this makes sense and helps someone out there..
  #2  
06-19-2003, 10:40 AM
w00tz`
Member

Join Date: Dec 2002
Posts: 8

Actually, that is pretty interesting now that you mention it, but have you noticed how OllyDBG handles displaying the bytes on the stack? It doesn't handle that very well.

For instance:

Code:
movsx eax, byte ptr [esi+ecx]        ; will move only to the lower part of the register
So say you wanted to move the ASCII character 'w' into eax while eax holds FFFFFFFF. OllyDBG will move the hex ASCII value (77) into eax and what remains is

FFFFFF77 <-- the register was not cleared. It's a bug, but it can be patched. Good thing you mentioned in your article that the rest of the register has to be cleared, otherwise someone might be confused :-)
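
For comparison, here is a small C model of two possibilities, assuming eax holds FFFFFFFF beforehand as in the example. By its definition movsx writes the whole 32-bit destination, so for 'w' (77h, msb clear) one would expect 00000077; FFFFFF77 is what an 8-bit write to al alone (for example a plain `mov al, ...`) would leave behind. The function names are hypothetical and this says nothing about what OllyDBG itself does internally.

```c
#include <stdint.h>

/* Models `movsx eax, <byte>`: the whole register is replaced. */
uint32_t model_movsx(uint32_t old_eax, uint8_t b)
{
    (void)old_eax;                              /* previous contents are discarded */
    return (b & 0x80u) ? (0xFFFFFF00u | b) : b; /* sign-extend the byte            */
}

/* Models `mov al, <byte>`: only the low 8 bits are replaced. */
uint32_t model_mov_al(uint32_t old_eax, uint8_t b)
{
    return (old_eax & 0xFFFFFF00u) | b;         /* upper 24 bits are kept */
}
```

With `old_eax` set to `0xFFFFFFFF` and `b` set to `0x77`, the first model returns `0x00000077` and the second returns `0xFFFFFF77`.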

ciao
__________________
w00tzenheimer //
w00tz`
  #3  
01-30-2004, 02:11 AM
andyistic
Member

Join Date: Jan 2004
Location: Los Angeles
Posts: 7

The notion of signed characters is annoying.
Why must we have them?

Bytes used for characters should always be unsigned.
Signed bytes are a common source of negative results, which leads to headaches.
It just wastes time casting and arranging everything so that we retain positive results.
Just say NO to signed characters.
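
A typical case of the headache described here is using a character as an array index. The sketch below is only an illustration with a made-up function name; it assumes a platform where plain char is signed, as it commonly is on x86 compilers.

```c
#include <stddef.h>

/* Hypothetical example: count how often each byte value occurs
   in a NUL-terminated string. */
void count_bytes(const char *s, size_t counts[256])
{
    for (; *s != '\0'; ++s) {
        /* Cast first: where plain char is signed, bytes above 127 are
           negative, and counts[*s] would index before the array. */
        counts[(unsigned char)*s]++;
    }
}
```

Without the cast, a byte such as E9h would become the index -23 on those platforms.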
  #4  
01-31-2004, 06:35 PM
kw
Administrator

Join Date: Dec 2002
Location: The Netherlands
Posts: 116

Quote:
The notion of signed characters is annoying.
Why must we have them?
Simple: chars are just byte-sized integer values, and not necessarily characters as such. I agree with you, though, that char is usually used to hold letters, so in that sense it would be nicer to have it default to unsigned. The problem is that this would be VERY inconsistent with something like int, which defaults to signed and needs you to specify 'unsigned' to override that default. char is another basic data type, so for consistency reasons it should probably remain the way it is. Otherwise you'll have people who forget to specify 'unsigned' for ints, for example.
Anyway, if you find it easier, use:
Code:
typedef unsigned char uchar;
and only use uchar from then on while coding. You'll have no further trouble.
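
A short usage sketch of that typedef follows; the function is hypothetical and only there to show the effect. Note that the C standard leaves the signedness of plain char up to the implementation, so the typedef also pins the behaviour down across compilers.

```c
typedef unsigned char uchar;

/* Hypothetical example: add up the bytes of a buffer. */
unsigned long byte_sum(const uchar *buf, unsigned long len)
{
    unsigned long sum = 0;
    for (unsigned long i = 0; i < len; ++i) {
        sum += buf[i];      /* each element contributes 0..255, never a negative value */
    }
    return sum;
}
```

With a signed element type, bytes such as 80h or FFh would instead be added as -128 and -1.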

Greets,
kw
__________________
"It's people like this that make you realize how little you've accomplished. It is a sobering thought, for instance, that when Mozart was my age, he had been dead for two years." - Tom Lehrer