64-bit package still runs 32-bit cuda50

Started by tommcg, May 20, 2014, 01:09:28 PM

Both the 0.41 64-bit Windows installer package and the individual 64-bit cuda50 package contain only 32-bit executables.  Where can I find the real 64-bit package or binaries?

Thx.

During testing, it was found that the 64-bit executables were slower than the 32-bit versions. The choice was made to release 32-bit versions for all packages.

The difference is in the app_info.xml file created by the installer, as it contains entries expected by 64-bit BOINC and necessary to retain work in progress.

Quote from: arkayn on May 20, 2014, 03:27:17 PM
During testing, it was found that the 64-bit executables were slower than the 32-bit versions.

That seems really odd, unless a large portion of the in-memory data consists mostly of pointers, like a pointer-based b-tree index or such.  Or, if the code has x86-specific asm instead of using SSE intrinsics that work on both platforms.  I've written compression code using SSE intrinsics, and it is at least 30% faster as a 64-bit app than as a 32-bit app.

Is the source code available somewhere to browse?

Thx.
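
To illustrate the pointer-heavy case mentioned above, here is a minimal C++ sketch. The node layouts and the helper `link_bytes` are hypothetical and are not taken from the SETI@home source; they only show how a pointer-linked structure grows with pointer width while an index-linked one does not.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical B-tree-like node that links to its children with raw pointers.
// Its size depends on the pointer width of the build target.
struct PointerNode {
    PointerNode* children[4];
    std::int32_t keys[3];
};

// Same logical layout, but children are 32-bit indices into a node pool,
// so the node has the same size in 32-bit and 64-bit builds.
struct IndexNode {
    std::uint32_t children[4];
    std::int32_t keys[3];
};

// Bytes spent on child links for a given fanout and pointer width in bytes.
constexpr std::size_t link_bytes(std::size_t fanout, std::size_t pointer_bytes) {
    return fanout * pointer_bytes;
}

// The index-based node never grows with the pointer width.
static_assert(sizeof(IndexNode) == 4 * sizeof(std::uint32_t) + 3 * sizeof(std::int32_t),
              "index node size is independent of pointer width");
```

With a fanout of 4, `link_bytes(4, 4)` is 16 bytes and `link_bytes(4, 8)` is 32 bytes, so a structure dominated by links takes up more cache and memory bandwidth per node in a 64-bit build.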


May 20, 2014, 11:17:47 PM #4 Last Edit: May 20, 2014, 11:25:25 PM by Claggy
Quote from: tommcg on May 20, 2014, 10:56:40 PM
That seems really odd, unless a large portion of the in-memory data consists mostly of pointers, like a pointer-based b-tree index or such.  Or, if the code has x86-specific asm instead of using SSE intrinsics that work on both platforms.  I've written compression code using SSE intrinsics, and it is at least 30% faster as a 64-bit app than as a 32-bit app.

Is the source code available somewhere to browse?

Thx.
For Cuda, it's the extra address space that makes Cuda64 apps slower.

Stock is in seti_boinc; Optimised and xbranch are in branches/sah_v7_opt:

Porting and optimizing SETI@home

https://setisvn.ssl.berkeley.edu/trac/browser

Claggy

Quote from: Claggy on May 20, 2014, 11:17:47 PM
For Cuda, it's the extra address space that makes Cuda64 apps slower.

Stock is in seti_boinc; Optimised and xbranch are in branches/sah_v7_opt:

Porting and optimizing SETI@home

https://setisvn.ssl.berkeley.edu/trac/browser

Claggy

Correct.  Simply put, with a lot of memory-bound operations at this time (meaning mostly pointer arithmetic) and few latency-hiding mechanisms in use, pointers being double the size means correspondingly larger code.  Since loading code induces various latencies, and larger pointers sap precious GPU register space, 32-bit GPU code is just faster on Windows (Linux is a different, special case, where 32-bit won't build due to OS and Cuda toolkit limitations).
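
As a rough illustration of the register-space point, here is a minimal C++ sketch. It assumes 32-bit-wide GPU registers, so that each 64-bit pointer occupies two of them; the helper `registers_for_pointers` and the pointer count used below are hypothetical and do not come from the actual kernels.

```cpp
#include <cstddef>

// Assumption: each GPU register holds 32 bits.
constexpr std::size_t kRegisterBits = 32;

// Registers needed to keep `live_pointers` pointers resident at once,
// for a given pointer width in bits (rounded up to whole registers).
constexpr std::size_t registers_for_pointers(std::size_t live_pointers,
                                             std::size_t pointer_bits) {
    return live_pointers * ((pointer_bits + kRegisterBits - 1) / kRegisterBits);
}
```

For a kernel that keeps, say, 6 pointers live, `registers_for_pointers(6, 32)` gives 6 registers and `registers_for_pointers(6, 64)` gives 12. Those extra registers per thread are the kind of cost that can reduce how many threads fit on the GPU at once.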

As with everything, though, things can change and evolve.  As we have no use whatsoever for huge amounts of GPU memory within one application instance (yet!), focussing on making native 64-bit Cuda binaries for Windows isn't high on any priority list.  That will possibly change as newer hardware, drivers, toolkits, and latency-hiding techniques come into use.

In general though, bear in mind that using huge amounts of memory (either host or GPU) tends to be an indicator of poor optimisation, not good optimisation.
