
 Compare GCC, Clang, Go and Rust, according to Taras Tsugrii

TSFlierMate
post Jun 10 2021, 11:26 PM, updated 3y ago

On my way
****
Validating
543 posts

Joined: Nov 2020
Compile the following...
CODE
int isHtmlWhitespace(int ch) {
   return ch == 0x0009 || ch == 0x000A ||
           ch == 0x000C || ch == 0x000D ||
           ch == 0x0020;
}


With optimizations enabled, GCC produces:

CODE
isHtmlWhitespace(int):
       cmp     edi, 32; compare ch with 0x0020
       ja      .L42      ; and return false if ch is larger, since 32 is the largest possible match
       movabs  rax, 4294981120
       mov     ecx, edi
       shr     rax, cl
       and     eax, 1
       ret
.L42:
       xor     eax, eax
       ret


Clang generates different but still very compact and interesting Assembly:
CODE
isHtmlWhitespace(int):                  # @isHtmlWhitespace(int)
       lea     eax, [rdi - 9]
       cmp     eax, 5
       jae     .LBB2_3
       mov     ecx, 27
       bt      ecx, eax
       jb      .LBB2_2
.LBB2_3:
       xor     eax, eax
       cmp     edi, 32
       sete    al
       ret
.LBB2_2:
       mov     eax, 1
       ret


Unfortunately, not every compiler applies this clever optimization, and Go's compiler is one of those that does not:
CODE
func is_html_whitespace(ch int) bool {
   return ch == 0x0009 || ch == 0x000A ||
       ch == 0x000C || ch == 0x000D ||
       ch == 0x0020
}

...is translated into a chain of compare-and-jump instructions (a series of IFs), which is unfortunate:

CODE
v14   00003 (+4) MOVQ "".ch(SP), AX
v30   00004 (4)  CMPQ AX, $9
b1    00005 (4)  JNE 9
v29   00006 (4)  MOVL $1, AX
v28   00007 (+4) MOVB AX, "".~r1+8(SP)
b8    00008 (4)  RET
v10   00009 (4)  CMPQ AX, $10
b2    00010 (4)  JEQ 6
v34   00011 (+5) CMPQ AX, $12
b4    00012 (5)  JEQ 6
v31   00013 (5)  CMPQ AX, $13
b6    00014 (5)  JEQ 6
v7    00015 (+6) CMPQ AX, $32
v24   00016 (6)  SETEQ AX
b9    00017 (6)  JMP 7


And what about Rust? Interestingly, its Assembly differs from what Clang generates for the C++ code and, impressively, contains no branches:

CODE
is_html_whitespace:                     # @is_html_whitespace
# %bb.0:
                                       # kill: def $edi killed $edi def $rdi
   leal    -9(%rdi), %eax
   cmpl    $2, %eax
   setb    %al
   movl    %edi, %ecx
   andl    $-2, %ecx
   cmpl    $12, %ecx
   sete    %cl
   orb    %al, %cl
   cmpl    $32, %edi
   sete    %al
   orb    %cl, %al
   retq
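
To compare the three strategies side by side, here is a minimal Python sketch that models the GCC, Clang and Rust listings above. It is not taken from the article; the helper names (reference, u32, gcc_style, clang_style, rust_style) are made up for illustration, and u32 only imitates a 32-bit register.

```python
MASK = 4294981120  # 0x100003600, the constant from the GCC listing


def reference(ch):
    """The plain chain of comparisons from the C source."""
    return ch in (0x09, 0x0A, 0x0C, 0x0D, 0x20)


def u32(x):
    """Reinterpret a Python int as an unsigned 32-bit value, like a register."""
    return x & 0xFFFFFFFF


def gcc_style(ch):
    # cmp edi, 32 / ja: an unsigned compare, so negative inputs are rejected too
    if u32(ch) > 32:
        return False
    # shr rax, cl / and eax, 1: pick the bit numbered ch out of the mask
    return ((MASK >> u32(ch)) & 1) == 1


def clang_style(ch):
    idx = u32(ch - 9)                        # lea eax, [rdi - 9]
    if idx < 5 and ((27 >> idx) & 1) == 1:   # cmp eax, 5 / bt ecx, eax (ecx = 27 = 0b11011)
        return True
    return u32(ch) == 32                     # cmp edi, 32 / sete al


def rust_style(ch):
    a = u32(ch - 9) < 2                      # leal -9 / cmpl $2 / setb   -> 9 or 10
    b = (u32(ch) & u32(-2)) == 12            # andl $-2 / cmpl $12 / sete -> 12 or 13
    c = u32(ch) == 32                        # cmpl $32 / sete
    return a | b | c                         # orb / orb, no branches


print([ch for ch in range(40) if gcc_style(ch)])
```

All three functions are meant to agree with reference() for any 32-bit input; only the way they get there differs.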


Source: https://softwarebits.hashnode.dev/bit-testing

Which is the winner? Which is the loser?

This post has been edited by FlierMate: Jun 10 2021, 11:28 PM
FlierMate1
post Jul 1 2022, 02:31 AM

Getting Started
**
Validating
139 posts

Joined: Jun 2022
I use the Compiler Explorer website. Interestingly, the following C# code never calls the Square() function; instead, the value (64) is calculated at compile time.

CODE

using System;

class Program
{
   static int Square(int num) => num * num;

   static void Main(string[] args){
       Console.WriteLine(Square(8));
   }
}    


Assembly output:
CODE

Program:.ctor():this:
      ret      

Program:Main(System.String[]):
      push     rax
      mov      edi, 64
      call     [System.Console:WriteLine(int)]
      nop      
      add      rsp, 8
      ret      

Program:Square(int):int:
      mov      eax, edi
      imul     eax, edi
      ret    


What do you guys think?
CPURanger
post Jul 8 2022, 09:42 PM

Enthusiast
*****
Senior Member
889 posts

Joined: Jun 2008


QUOTE(FlierMate1 @ Jul 1 2022, 02:31 AM)
I use the Compiler Explorer website. Interestingly, the following C# code never calls the Square() function; instead, the value (64) is calculated at compile time.

*
There is no need to call Square() because Square(8) is a constant: 8 * 8 = 64. Hence "mov edi, 64". A call to Square() would add the overhead of the jump, saving and restoring registers, and the multiply, so it would be much slower.

Turning off the optimize flag when compiling the C# code may show a different result.
FlierMate1
post Jul 9 2022, 09:05 PM

QUOTE(CPURanger @ Jul 8 2022, 09:42 PM)
Turn off optimize flag when compiling C#, may show a different results.
*
Thanks for your nice comment. No compiler option has been set. I also tried Visual Basic (another .NET compiler), and its Assembly output is much longer than the C# output for the same purpose.

Surprisingly, I found what looked like a bug in the compiler's Assembly output. Notice the "gword ptr" at line 12 and line 14?

Update: It is not a bug. See: https://github.com/dotnet/runtime/blob/11f2.../instr.cpp#L199

user posted image

This post has been edited by FlierMate1: Jul 9 2022, 09:14 PM
FlierMate1
post Aug 19 2022, 04:32 PM

This has nothing to do with compilers; I just want to show a multi-way branch in Assembly code here:

CODE
; assume AL is in range 1-4
       cmp     al,3
       ja      item4
       je      item3
       jpe     item2
       jmp     item1    


It works, and it uses only one CMP (one comparison) for four jump targets!

This is rare in the Assembly programming world. Let me try to explain:

AL is the 8-bit low-order byte register; assume it holds a value from 1 to 4.
We compare AL against the value 3.

JA=Jump if above 3, i.e. 4
JE=Jump if equals 3
JPE=Jump if parity even, in this case, 2
JMP=Jump, when value is 1

QUOTE
Parity bits are added to transmitted messages to ensure that the number of bits with a value of one in a set of bits adds up to an even or odd number.
Even parity means the number of bits set to one in a value is even. (0101=Even, 0100=Odd)


We can't achieve this directly in high-level languages, because we would need three If...Then...Else statements.

----

Comment by original author of the Assembly code above:

QUOTE
JPE=Jump if parity even, in this case, 2
This deserves a more careful explanation, to avoid it being misunderstood as jump taken because 2 is even. PF does not say that, it counts the set bits among the lowest 8 bits of the result, and reflects whether that number is even or not. In this case the data is 8-bit, so it counts 1s in the entirety of the result.

CMP is the same as SUB, except it does not update the destination register with the result, only flags. If we subtract 3 from 2, we get -1, which is 11111111b and has an even number of 1s. This allows to differentiate it from 1 - 3 = -2, which is 11111110b and has an odd number of 1s.
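
To check the flag reasoning above, here is a minimal Python sketch that models the three flags the jump sequence reads after "cmp al,3". It is only an illustration: the function names (flags_after_cmp, dispatch) are made up, and it assumes AL holds an unsigned 8-bit value.

```python
def flags_after_cmp(al, imm):
    """Model ZF, CF and PF after an 8-bit `cmp al, imm`."""
    result = (al - imm) & 0xFF             # CMP subtracts but keeps only the flags
    zf = result == 0                       # zero flag: the operands were equal
    cf = al < imm                          # carry flag: unsigned borrow
    pf = bin(result).count("1") % 2 == 0   # parity flag: even number of 1 bits in the low byte
    return zf, cf, pf


def dispatch(al):
    """Follow the ja / je / jpe / jmp sequence for AL in the range 1-4."""
    zf, cf, pf = flags_after_cmp(al, 3)
    if not cf and not zf:   # ja: above
        return "item4"
    if zf:                  # je: equal
        return "item3"
    if pf:                  # jpe: 2 - 3 = 11111111b, eight 1 bits
        return "item2"
    return "item1"          # jmp: 1 - 3 = 11111110b, seven 1 bits


for value in (1, 2, 3, 4):
    print(value, dispatch(value))
```

Outside the range 1-4 the parity test no longer separates the cases cleanly, which is why the original snippet states that assumption in its first comment.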


This post has been edited by FlierMate1: Aug 19 2022, 06:42 PM
flashang
post Aug 19 2022, 07:17 PM

Casual
***
Junior Member
355 posts

Joined: Aug 2021


QUOTE(FlierMate1 @ Aug 19 2022, 04:32 PM)
This has nothing to do with compilers; I just want to show a multi-way branch in Assembly code here:

*
Human time and code readability are more important for software maintenance,
especially for large projects and legacy code.

The cmp / multiple-jump trick is only useful in specific scenarios,
and it takes more time to re-think the logic,
which may be bad for readability and system design.

People lean towards higher-level languages
because of their productivity and flexibility.

Example:

CODE


# python - if / elif / else
def num_in_words(no):
   if(no==1):
       print("One")
   elif(no==2):
       print("Two")
   elif(no==3):
       print("Three")
   else:
       print("Give input of numbers from 1 to 3")
   
num_in_words(3)

# python dictionary :
def vowel(num):
   switch={
     1:'a',
     2:'e',
     3:'i',
     4:'o',
     5:'u'
     }
   return switch.get(num,"Invalid input")

vowel(3)



// c#
        char grade = 'B';
       
        switch (grade) {
           case 'A':
              Console.WriteLine("Excellent!");
              break;
           case 'B':
           case 'C':
              Console.WriteLine("Well done");
              break;
           case 'D':
              Console.WriteLine("You passed");
              break;
           case 'F':
              Console.WriteLine("Better try again");
              break;
           default:
              Console.WriteLine("Invalid grade");
              break;
        }
        Console.WriteLine("Your grade is  {0}", grade);
        Console.ReadLine();




FlierMate1
post Aug 19 2022, 08:05 PM

QUOTE(flashang @ Aug 19 2022, 07:17 PM)
People are more towards higher level language,
because of its productivity and flexibility.

Example :

CODE

// c#
        char grade = 'B';
       
        switch (grade) {
           case 'A':
              Console.WriteLine("Excellent!");
              break;
           case 'B':
           case 'C':
              Console.WriteLine("Well done");
              break;
           case 'D':
              Console.WriteLine("You passed");
              break;
           case 'F':
              Console.WriteLine("Better try again");
              break;
              default:
           Console.WriteLine("Invalid grade");
              break;
        }
        Console.WriteLine("Your grade is  {0}", grade);
        Console.ReadLine();

*
Correct, but a hobbyist like me likes to explore low-level stuff, so I can't stop talking about Assembly (I do go to an asm forum, don't worry about me).

For your nice C# example, I again looked at the disassembly using the Compiler Explorer website:
CODE

        char grade = 'B';
       
        switch (grade) {
           case 'A':
.....
.....

The C# code above doesn't actually get evaluated at runtime; it is evaluated at compile time. So you'll see no branches and no jumps.

On the contrary, if I change the code to:
CODE

        char grade=Console.ReadLine()[0];
       
        switch (grade) {
           case 'A':
.....
.....

Now the value of "grade" is unknown until runtime, and only now does the .NET 6.0.101 compiler generate a lookup table (LUT) to jump through, depending on the input.

If anyone wants to take a look: https://godbolt.org/z/xbW5GWvcq

CODE
G_M21826_IG02:
...
...
      mov      edi, edi
      lea      rax, [reloc @RWD00]
      mov      eax, dword ptr [rax+4*rdi]
      lea      rsi, G_M21826_IG02
      add      rax, rsi
      jmp      rax

G_M21826_IG03: ....
G_M21826_IG04: ....
G_M21826_IG05: ....
G_M21826_IG06: ....
G_M21826_IG07: ....


RWD00   dd      G_M21826_IG03 - G_M21826_IG02
       dd      G_M21826_IG04 - G_M21826_IG02   ;grade=B?
       dd      G_M21826_IG04 - G_M21826_IG02   ;grade=C?
       dd      G_M21826_IG05 - G_M21826_IG02
       dd      G_M21826_IG07 - G_M21826_IG02
       dd      G_M21826_IG06 - G_M21826_IG02


Clever job, compiler.
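
The same lookup-table idea can be sketched in a few lines of Python. This is only a model, not what the .NET compiler emits: the handler functions and grade_message are hypothetical names, and it assumes the elided instructions subtract 'A' and range-check the index, and that the six RWD00 slots correspond to 'A' through 'F'.

```python
def excellent():
    return "Excellent!"


def well_done():
    return "Well done"


def passed():
    return "You passed"


def try_again():
    return "Better try again"


def invalid():
    return "Invalid grade"


# One slot per value from 'A' to 'F'. 'B' and 'C' share a handler, and 'E'
# has no case of its own, so its slot points at the default handler.
JUMP_TABLE = [excellent, well_done, well_done, passed, invalid, try_again]


def grade_message(grade):
    index = ord(grade) - ord("A")           # turn the character into a table index
    if not 0 <= index < len(JUMP_TABLE):    # anything outside 'A'..'F' is the default case
        return invalid()
    return JUMP_TABLE[index]()              # one indexed jump instead of a chain of compares


for g in "ABCDEFZ":
    print(g, grade_message(g))
```

The cost of the lookup stays the same no matter how many cases there are, which is why compilers tend to prefer a table once the case values are dense enough.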

This post has been edited by FlierMate1: Aug 20 2022, 11:06 PM
flashang
post Aug 20 2022, 12:03 AM




In fact, the cmp / multiple conditional jump does impress me.
Then the question came up: "what if the condition ranges from number x to y, or is not just 1 to 4?"

It seems the jump table is easier to maintain and extend.


FlierMate1
post Aug 21 2022, 04:06 PM

Revisiting the first post in this thread: GCC with the -O3 optimization switch uses a magic number, 4294981120.

See the explanation below...

QUOTE(FlierMate @ Jun 10 2021, 11:26 PM)
Compile the following...
CODE
int isHtmlWhitespace(int ch) {
   return ch == 0x0009 || ch == 0x000A ||
           ch == 0x000C || ch == 0x000D ||
           ch == 0x0020;
}


GCC with enabled optimizations is able to produce:

CODE
isHtmlWhitespace(int):
       cmp     edi, 32; compare ch with 0x0020
       ja      .L42      ; and return false if ch is larger, since 32 is the largest possible match
       movabs  rax, 4294981120
       mov     ecx, edi
       shr     rax, cl
       and     eax, 1
       ret
.L42:
       xor     eax, eax
       ret

*
Essentially, if we convert this to a high-level language such as C#, we get a similar solution that uses the same magic number:

CODE
       static bool isHtmlWhitespace(int ch)
       {
           return ch == 0x0009 || ch == 0x000A || ch == 0x000C || ch == 0x000D || ch == 0x0020;
       }

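       static bool isHtmlWhitespace_2(int ch)
       {
           // Range check first, like GCC's "cmp edi, 32 / ja": C# masks the shift count
           // of a long to its low 6 bits, so without it ch = 73 (64 + 9) would also match.
           return ch >= 0 && ch <= 32 && ((4294981120 >> ch) & 1) == 1;
       }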


Both are working and give identical output, as below:
QUOTE
         1     False     False
         2     False     False
         3     False     False
         4     False     False
         5     False     False
         6     False     False
         7     False     False
         8     False     False
         9      True      True
        10      True      True
        11     False     False
        12      True      True
        13      True      True
        14     False     False
        15     False     False
        16     False     False
        17     False     False
        18     False     False
        19     False     False
        20     False     False
        21     False     False
        22     False     False
        23     False     False
        24     False     False
        25     False     False
        26     False     False
        27     False     False
        28     False     False
        29     False     False
        30     False     False
        31     False     False
        32      True      True

If you go to the Compiler Explorer website and compile the C code again, the Assembly output differs slightly, but the logic stays intact. GCC 12.1 with the -O3 switch produces rather cryptic Assembly code:

CODE
isHtmlWhitespace:
       cmp     edi, 32
       ja      .L3
       movabs  rax, 4294981120
       bt      rax, rdi
       setc    al
       movzx   eax, al
       ret
.L3:
       xor     eax, eax
       ret    


Why the magic number 4294981120 or 0x100003600?

0001 0000 0000 0000 0000 0011 0110 0000 0000

Counting bit positions from 0 at the least significant (rightmost) bit:

0x9=bit 9
0xA=bit 10
0xC=bit 12
0xD=bit 13
0x20=bit 32

So if the bit selected by the value of "ch" is one, the result is TRUE; otherwise it is FALSE.

CODE

0001 0000 0000 0000 0000 0011 0110 0000 0000
   ^                       ^^  ^^
   |                       ||  ||
   |                       ||  | --- 9
   |                       ||   ----A (10)
   |                       | ------C (12)
   |                        ------D (13)
   |
    -----------------------------0x20 (32)


Clever, isn't it?
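
The magic number does not have to be memorised; it can be rebuilt from the five code points. Here is a minimal Python sketch (build_mask and set_bits are hypothetical helper names, not anything GCC exposes):

```python
WHITESPACE = (0x09, 0x0A, 0x0C, 0x0D, 0x20)


def build_mask(code_points):
    """Set one bit per code point, counting from bit 0 at the right."""
    mask = 0
    for cp in code_points:
        mask |= 1 << cp
    return mask


def set_bits(mask):
    """List the positions of the 1 bits, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


MASK = build_mask(WHITESPACE)
print(MASK, hex(MASK), set_bits(MASK))
```

Because bit 32 is needed, the mask no longer fits in 32 bits, which explains why GCC loads it with movabs into the 64-bit register rax.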

An experienced Assembly programmer says:
QUOTE
It is a very old "trick". GCC didn't invent it. It goes way back into the dim dark history of computers.


Any input or feedback is welcome!

This post has been edited by FlierMate1: Aug 21 2022, 04:24 PM
FlierMate1
post Aug 21 2022, 04:34 PM

QUOTE(flashang @ Aug 20 2022, 12:03 AM)
In fact, the cmp / multiple conditional jump does impress me.
Then the question came up: "what if the condition ranges from number x to y, or is not just 1 to 4?"

It seems the jump table is easier to maintain and extend.
*
Thanks for your feedback. I also find it fascinating, but it is slower, as a colleague of the original author found when timing these two versions:

CODE
       cmp     al,1
       jz      item1
       cmp     al,2
       jz      item2
       cmp     al,3
       jz      item3
       cmp     al,4
       jz      item4


real 0m4.551s
user 0m4.460s
sys 0m0.008s

CODE

; assume AL is in range 1-4
       cmp     al,3
       ja      item4
       je      item3
       jpe     item2
       jmp     item1


real 0m7.139s
user 0m7.004s
sys 0m0.000s

The data pattern is 1,2,3,4 repeated 250 times:
QUOTE
        while % <= 250
                db 1,2,3,4
        end while

And I don't think the jump table is easily extendable (without its corresponding CMP); perhaps it remains a coding challenge for us to figure out.

This post has been edited by FlierMate1: Aug 21 2022, 04:35 PM
xboxrockers
post Aug 24 2022, 01:33 AM

On my way
****
Senior Member
664 posts

Joined: Dec 2011


Have you looked at RISC-V ASM?
FlierMate1
post Aug 24 2022, 02:31 PM

QUOTE(xboxrockers @ Aug 24 2022, 01:33 AM)
Have you looked at RISC-V ASM?
*
No. Why?
flashang
post Aug 26 2022, 04:31 PM



QUOTE(xboxrockers @ Aug 24 2022, 01:33 AM)
Have you looked at RISC-V ASM?
*
I believe most people don't know (or don't care) about the differences between x86-compatible, ARM and RISC-V.
Some don't know what machine code / asm is.

To encourage more people to use RISC-V,
people need to be able to easily get the hardware and decent apps.
Raspberry Pi is a good example of that.


TSFlierMate
post Nov 20 2022, 01:51 PM

Do you know how Assembly code evaluates an expression? I got the code below from a programming board, and it makes sense to me.

setXXX belongs to the "Bit and Byte Instructions" group and sets a byte based on a given condition.

CODE
mov ax, [op1]
cmp ax, [op2]
setz al; if op1 == op2 then al = 1 else al = 0

mov ax, [op1]
cmp ax, [op2]
setnz al; if op1 <> op2 then al = 1 else al = 0

; unsigned integer

mov ax, [op1]
cmp ax, [op2]
seta al; if op1 > op2 then al = 1 else al = 0

mov ax, [op1]
cmp ax, [op2]
setae al; if op1 >= op2 then al = 1 else al = 0

mov ax, [op1]
cmp ax, [op2]
setbe al; if op1 <= op2 then al = 1 else al = 0

mov ax, [op1]
cmp ax, [op2]
setb al; if op1 < op2 then al = 1 else al = 0

; signed integer

mov ax, [op1]
cmp ax, [op2]
setg al; if op1 > op2 then al = 1 else al = 0

mov ax, [op1]
cmp ax, [op2]
setge al; if op1 >= op2 then al = 1 else al = 0

mov ax, [op1]
cmp ax, [op2]
setle al; if op1 <= op2 then al = 1 else al = 0

mov ax, [op1]
cmp ax, [op2]
setl al; if op1 < op2 then al = 1 else al = 0


The conditions of setXXX are listed below:

Note the difference between "setl" / "setle" (less, less or equal) and their counterparts "setb" / "setbe" (below, below or equal), and between "setg" / "setge" (greater, greater or equal) and their counterparts "seta" / "setae" (above, above or equal): the above/below instructions (e.g. seta, setb) are for unsigned numbers, while the greater/less instructions (e.g. setl, setg) are for signed numbers.

Personally, I use the instructions for unsigned numbers more often (likewise ja, jae, jb, jbe among the jump instructions).
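
To see why the distinction matters, here is a minimal Python sketch that compares the same 16-bit patterns both ways. The Python functions only borrow the instruction names for illustration; they model the result byte, not the real flag logic.

```python
def as_unsigned16(x):
    """Read the low 16 bits as an unsigned number (0..65535)."""
    return x & 0xFFFF


def as_signed16(x):
    """Read the low 16 bits as a two's complement number (-32768..32767)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def seta(op1, op2):   # unsigned: op1 > op2
    return int(as_unsigned16(op1) > as_unsigned16(op2))


def setb(op1, op2):   # unsigned: op1 < op2
    return int(as_unsigned16(op1) < as_unsigned16(op2))


def setg(op1, op2):   # signed: op1 > op2
    return int(as_signed16(op1) > as_signed16(op2))


def setl(op1, op2):   # signed: op1 < op2
    return int(as_signed16(op1) < as_signed16(op2))


# 0xFFFF is 65535 when read as unsigned, but -1 when read as signed.
print(seta(0xFFFF, 1), setg(0xFFFF, 1))
print(setb(0xFFFF, 1), setl(0xFFFF, 1))
```

For small non-negative operands both families agree; they only diverge once the top bit of an operand is set.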

CODE

SETA     set byte if above
SETAE    set byte if above or equal
SETB     set byte if below
SETBE    set byte if below or equal
SETC     set byte if carry
SETE     set byte if equal
SETG     set byte if greater
SETGE    set byte if greater or equal
SETL     set byte if less
SETLE    set byte if less or equal
SETNA    set byte if not above
SETNAE   set byte if not above or equal
SETNB    set byte if not below
SETNBE   set byte if not below or equal
SETNC    set byte if not carry
SETNE    set byte if not equal
SETNG    set byte if not greater
SETNGE   set byte if not greater or equal
SETNL    set byte if not less
SETNLE   set byte if not less or equal
SETNO    set byte if not overflow
SETNP    set byte if not parity
SETNS    set byte if not sign (non-negative)
SETNZ    set byte if not zero



This post has been edited by FlierMate: Jun 3 2023, 10:33 PM
FlierMate4
post Feb 4 2023, 09:42 PM

Getting Started
**
Validating
90 posts

Joined: Jan 2023
Has anyone heard of "Stack Imbalance"?

I encountered the "PInvokeStackImbalance" error message when I accidentally widened the parameter of the IsProcessorFeaturePresent Win32 API function in kernel32.dll.

CODE

BOOL IsProcessorFeaturePresent(
[in] DWORD ProcessorFeature
);


As shown in the syntax above, taken from https://learn.microsoft.com/en-us/windows/w...rectedfrom=MSDN , the ProcessorFeature parameter is a DWORD, 32 bits in size.

What I did was declare the parameter as UInt64 (64-bit unsigned integer) in C# .NET:
CODE

[DllImport("kernel32.dll")]
       private static extern bool IsProcessorFeaturePresent(UInt64 ProcessorFeature);

(The correct data type is "uint", which is a 32-bit unsigned integer.)

As you can see in the screenshot, I got this nice error message:
QUOTE
Managed Debugging Assistant 'PInvokeStackImbalance'
  Message=Managed Debugging Assistant 'PInvokeStackImbalance' : 'A call to PInvoke function 'example1!example1.Program::IsProcessorFeaturePresent' has unbalanced the stack. This is likely because the managed PInvoke signature does not match the unmanaged target signature. Check that the calling convention and parameters of the PInvoke signature match the target unmanaged signature.'

I understand that in Assembly terms, I might have pushed an extra DWORD (making it a QWORD) onto the stack, like this?
CODE

cpufeature dd 39
...
....
   push [cpufeature]
   push [cpufeature];extra push (cpufeature != dq)
   call [IsProcessorFeaturePresent]

So it caused the stack to become misaligned, but isn't "....has unbalanced the stack" a nice way to say it?

What are your thoughts? Are you new to the term "stack imbalance"?

user posted image

Corrections:
QUOTE
Stack imbalance == mismatched push-to-pop ratio. So the stack either keeps growing, or keeps shrinking.

It is different from misaligned.

Misaligned = Addresses don't have the lowest bits as zero.
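
The push-to-pop mismatch can be followed with a little arithmetic. Below is a minimal Python sketch that tracks only the stack pointer. It assumes 32-bit x86 and the stdcall convention, where the callee removes its own arguments on return; call_stdcall is a hypothetical helper, not part of any API.

```python
def call_stdcall(esp, pushed_bytes, callee_arg_bytes):
    """Return ESP after a 32-bit stdcall call, tracking only the stack pointer."""
    esp -= pushed_bytes       # the caller pushes the arguments (the stack grows downward)
    esp -= 4                  # call pushes the return address
    esp += 4                  # ret pops the return address ...
    esp += callee_arg_bytes   # ... and `ret N` also removes the N argument bytes
    return esp


start = 0x1000

# Matching signature: 4 bytes pushed, 4 bytes removed by the callee.
print(start - call_stdcall(start, 4, 4))

# UInt64 declared for a DWORD parameter: 8 bytes pushed, only 4 removed.
print(start - call_stdcall(start, 8, 4))
```

A non-zero difference after the call is the imbalance: the extra bytes stay on the stack until something else restores the stack pointer.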


This post has been edited by FlierMate4: Feb 4 2023, 10:34 PM
Tullamarine
post May 15 2023, 05:02 PM

Getting Started
**
Validating
163 posts

Joined: Apr 2020
For those who are familiar with FASM 1 (flat assembler), you have likely used invoke to call a Windows API function indirectly, or cinvoke for a function call that requires the C calling convention, such as wsprintfA in USER32.DLL.

As I progressed in x86 Assembly programming, I discovered that invoke and cinvoke are macros in FASM 1.

If you use a disassembler like IDA Freeware, sooner or later you will discover that invoke is actually:

CODE
invoke ExitProcess, 0


CODE
push 0
call [ExitProcess]


And likewise, cinvoke is actually:

CODE
cinvoke wsprintfA, DWORD param1, DWORD param2, DWORD param3


CODE
push param3
push param2
push param1
call [wsprintfA]
add esp, 12    ;4 bytes x 3


As you can see above, the C calling convention requires the caller to keep the stack balanced by removing the parameters after the call, that is, by restoring the stack pointer. For your information, the stack in x86 grows downward: every time you push a parameter onto the stack, you effectively decrement the stack pointer.

This is important in x86 Assembly programming: don't rely blindly on macros, and learn the fundamentals by reading the disassembly.
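
For anyone who wants to trace the cinvoke expansion by hand, here is a minimal Python sketch of a toy 32-bit stack. The Stack class and the Python cinvoke function are hypothetical models for illustration, not FASM code, and the return address pushed by call is left out to keep it short.

```python
class Stack:
    """A toy 32-bit stack: ESP decreases on push, memory is a dict keyed by address."""

    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4                # the stack grows downward
        self.mem[self.esp] = value


def cinvoke(stack, func, *params):
    """Model of the expansion: push right to left, call, then clean up as the caller."""
    for p in reversed(params):       # push param3, then param2, then param1
        stack.push(p)
    # The callee finds param1 at the lowest address, param2 above it, and so on.
    args = [stack.mem[stack.esp + 4 * i] for i in range(len(params))]
    result = func(*args)             # call [func]; a C callee leaves the parameters in place
    stack.esp += 4 * len(params)     # add esp, 12 for three DWORD parameters
    return result


stack = Stack()
print(cinvoke(stack, lambda a, b, c: (a, b, c), 1, 2, 3), hex(stack.esp))
```

Leaving out the final adjustment in the model makes ESP end up 12 bytes lower after every call, which is the kind of slow leak the macro guards against.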

 
