nvcc --help 【命令备忘】
【摘要】
C:\Users\panda>nvcc --help Usage : nvcc [options] <inputfile>
Options for specifying the compilation phase
===========================...
-
C:\Users\panda>nvcc --help
-
-
Usage : nvcc [options] <inputfile>
Options for specifying the compilation phase
-
-
============================================
-
More exactly, this option specifies up to which stage the input files must be compiled,
-
according to the following compilation trajectories for different input file types:
-
.c/.cc/.cpp/.cxx : preprocess, compile, link
-
.o : link
-
.i/.ii : compile, link
-
.cu : preprocess, cuda frontend, PTX assemble,
-
merge with host C code, compile, link
-
.gpu : cicc compile into cubin
-
.ptx : PTX assemble into cubin.
-
-
--cuda (-cuda)
-
Compile all .cu input files to .cu.cpp.ii output.
-
-
--cubin (-cubin)
-
Compile all .cu/.gpu/.ptx input files to device-only .cubin files. This
-
step discards the host code for each .cu input file.
-
-
--fatbin(-fatbin)
-
Compile all .cu/.gpu/.ptx/.cubin input files to device-only .fatbin files.
-
This step discards the host code for each .cu input file.
-
-
--ptx (-ptx)
-
Compile all .cu/.gpu input files to device-only .ptx files. This step discards
-
the host code for each of these input file.
-
-
--preprocess (-E)
-
Preprocess all .c/.cc/.cpp/.cxx/.cu input files.
-
-
--generate-dependencies (-M)
-
Generate a dependency file that can be included in a make file for the .c/.cc/.cpp/.cxx/.cu
-
input file (more than one are not allowed in this mode).
-
-
--compile (-c)
-
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file.
-
-
--device-c (-dc)
-
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that contains
-
relocatable device code. It is equivalent to '--relocatable-device-code=true
-
--compile'.
-
-
--device-w (-dw)
-
Compile each .c/.cc/.cpp/.cxx/.cu input file into an object file that contains
-
executable device code. It is equivalent to '--relocatable-device-code=false
-
--compile'.
-
-
--device-link (-dlink)
-
Link object files with relocatable device code and .ptx/.cubin/.fatbin files
-
into an object file with executable device code, which can be passed to the
-
host linker.
-
-
--link (-link)
-
This option specifies the default behavior: compile and link all inputs.
-
-
--lib (-lib)
-
Compile all inputs into object files (if necessary) and add the results to
-
the specified output library file.
-
-
--run (-run)
-
This option compiles and links all inputs into an executable, and executes
-
it. Or, when the input is a single executable, it is executed without any
-
compilation or linking. This step is intended for developers who do not want
-
to be bothered with setting the necessary environment variables; these are
-
set temporarily by nvcc).
-
File and path specifications.
-
-
=============================
-
-
--output-file <file> (-o)
-
Specify name and location of the output file. Only a single input file is
-
allowed when this option is present in nvcc non-linking/archiving mode.
-
-
--pre-include <file>,... (-include)
-
Specify header files that must be preincluded during preprocessing.
-
-
--library <library>,... (-l)
-
Specify libraries to be used in the linking stage without the library file
-
extension. The libraries are searched for on the library search paths that
-
have been specified using option '--library-path'.
-
-
--define-macro <def>,... (-D)
-
Specify macro definitions to define for use during preprocessing or compilation.
-
-
--undefine-macro <def>,... (-U)
-
Undefine macro definitions during preprocessing or compilation.
-
-
--include-path <path>,... (-I)
-
Specify include search paths.
-
-
--system-include <path>,... (-isystem)
-
Specify system include search paths.
-
-
--library-path <path>,... (-L)
-
Specify library search paths.
-
-
--output-directory <directory> (-odir)
-
Specify the directory of the output file. This option is intended for letting
-
the dependency generation step (see '--generate-dependencies') generate a
-
rule that defines the target object file in the proper directory.
-
-
--compiler-bindir <path> (-ccbin)
-
Specify the directory in which the host compiler executable resides. The
-
host compiler executable name can be also specified to ensure that the correct
-
host compiler is selected. In addition, driver prefix options ('--input-drive-prefix',
-
'--dependency-drive-prefix', or '--drive-prefix') may need to be specified,
-
if nvcc is executed in a Cygwin shell or a MinGW shell on Windows.
-
-
--cudart{none|shared|static} (-cudart)
-
Specify the type of CUDA runtime library to be used: no CUDA runtime library,
-
shared/dynamic CUDA runtime library, or static CUDA runtime library.
-
Allowed values for this option: 'none','shared','static'.
-
Default value: 'static'.
-
-
--libdevice-directory <directory> (-ldir)
-
Specify the directory that contains the libdevice library files when option
-
'--dont-use-profile' is used. Libdevice library files are located in the
-
'nvvm/libdevice' directory in the CUDA toolkit.
-
-
--cl-version <cl-version-number> --cl-version <cl-version-number>
-
Specify the version of Microsoft Visual Studio installation. Note: this
-
option is to be used in conjunction with '--use-local-env', and is ignored
-
when '--use-local-env' is not specified. Support for VS2010 and earlier has
-
been deprecated.
-
Allowed values for this option: 2010,2012,2013,2015,2017.
-
-
--use-local-env --use-local-env
-
Specify whether the environment is already set up for the host compiler.
Options for specifying behavior of compiler/linker.
-
-
===================================================
-
-
--profile (-pg)
-
Instrument generated code/executable for use by gprof (Linux only).
-
-
--debug (-g)
-
Generate debug information for host code.
-
-
--device-debug (-G)
-
Generate debug information for device code. Turns off all optimizations.
-
Don't use for profiling; use -lineinfo instead.
-
-
--generate-line-info (-lineinfo)
-
Generate line-number information for device code.
-
-
--optimize <level> (-O)
-
Specify optimization level for host code.
-
-
--ftemplate-backtrace-limit <limit> (-ftemplate-backtrace-limit)
-
Set the maximum number of template instantiation notes for a single warning
-
or error to <limit>. A value of 0 is allowed, and indicates that no limit
-
should be enforced. This value is also passed to the host compiler if it
-
provides an equivalent flag.
-
-
--ftemplate-depth <limit> (-ftemplate-depth)
-
Set the maximum instantiation depth for template classes to <limit>. This
-
value is also passed to the host compiler if it provides an equivalent flag.
-
-
--shared(-shared)
-
Generate a shared library during linking. Use option '--linker-options'
-
when other linker options are required for more control.
-
-
--x {c|c++|cu} (-x)
-
Explicitly specify the language for the input files, rather than letting
-
the compiler choose a default based on the file name suffix.
-
Allowed values for this option: 'c','c++','cu'.
-
-
--std {c++03|c++11|c++14} (-std)
-
Select a particular C++ dialect. Note that this flag also turns on the corresponding
-
dialect flag for the host compiler.
-
Allowed values for this option: 'c++03','c++11','c++14'.
-
-
--no-host-device-initializer-list (-nohdinitlist)
-
Do not implicitly consider member functions of std::initializer_list as __host__
-
__device__ functions.
-
-
--no-host-device-move-forward (-nohdmoveforward)
-
Do not implicitly consider std::move and std::forward as __host__ __device__
-
function templates.
-
-
--expt-relaxed-constexpr (-expt-relaxed-constexpr)
-
Experimental flag: Allow host code to invoke __device__ constexpr functions,
-
and device code to invoke __host__ constexpr functions.
-
-
--expt-extended-lambda (-expt-extended-lambda)
-
Experimental flag: Allow __host__, __device__ annotations in lambda declaration.
-
-
--machine {32|64} (-m)
-
Specify 32 vs 64 bit architecture.
-
Allowed values for this option: 32,64.
-
Default value: 64.
-
Options for passing specific phase options
-
-
==========================================
-
These allow for passing options directly to the intended compilation phase. Using
-
these, users have the ability to pass options to the lower level compilation tools,
-
without the need for nvcc to know about each and every such option.
-
-
--compiler-options <options>,... (-Xcompiler)
-
Specify options directly to the compiler/preprocessor.
-
-
--linker-options <options>,... (-Xlinker)
-
Specify options directly to the host linker.
-
-
--archive-options <options>,... (-Xarchive)
-
Specify options directly to library manager.
-
-
--ptxas-options <options>,... (-Xptxas)
-
Specify options directly to ptxas, the PTX optimizing assembler.
-
-
--nvlink-options <options>,... (-Xnvlink)
-
Specify options directly to nvlink.
Miscellaneous options for guiding the compiler driver.
-
-
======================================================
-
-
--dont-use-profile (-noprof)
-
Nvcc uses the nvcc.profiles file for compilation. When specifying this option,
-
the profile file is not used.
-
-
--dryrun(-dryrun)
-
Do not execute the compilation commands generated by nvcc. Instead, list
-
them.
-
-
--verbose (-v)
-
List the compilation commands generated by this compiler driver, but do not
-
suppress their execution.
-
-
--keep (-keep)
-
Keep all intermediate files that are generated during internal compilation
-
steps.
-
-
--keep-dir <directory> (-keep-dir)
-
Keep all intermediate files that are generated during internal compilation
-
steps in this directory.
-
-
--save-temps (-save-temps)
-
This option is an alias of '--keep'.
-
-
--clean-targets (-clean)
-
This option reverses the behavior of nvcc. When specified, none of the compilation
-
phases will be executed. Instead, all of the non-temporary files that nvcc
-
would otherwise create will be deleted.
-
-
--run-args <arguments>,... (-run-args)
-
Used in combination with option --run to specify command line arguments for
-
the executable.
-
-
--input-drive-prefix <prefix> (-idp)
-
On Windows, all command line arguments that refer to file names must be converted
-
to the Windows native format before they are passed to pure Windows executables.
-
This option specifies how the current development environment represents
-
absolute paths. Use '/cygwin/' as <prefix> for Cygwin build environments,
-
and '/' as <prefix> for MinGW.
-
-
--dependency-drive-prefix <prefix> (-ddp)
-
On Windows, when generating dependency files (see --generate-dependencies),
-
all file names must be converted appropriately for the instance of 'make'
-
that is used. Some instances of 'make' have trouble with the colon in absolute
-
paths in the native Windows format, which depends on the environment in which
-
the 'make' instance has been compiled. Use '/cygwin/' as <prefix> for a
-
Cygwin make, and '/' as <prefix> for MinGW. Or leave these file names in
-
the native Windows format by specifying nothing.
-
-
--drive-prefix <prefix> (-dp)
-
Specifies <prefix> as both --input-drive-prefix and --dependency-drive-prefix.
-
-
--dependency-target-name <target> (-MT)
-
Specify the target name of the generated rule when generating a dependency
-
file (see '--generate-dependencies').
-
-
--no-align-double --no-align-double
-
Specifies that '-malign-double' should not be passed as a compiler argument
-
on 32-bit platforms. WARNING: this makes the ABI incompatible with the cuda's
-
kernel ABI for certain 64-bit types.
-
-
--no-device-link (-nodlink)
-
Skip the device link step when linking object files.
-
Options for steering GPU code generation.
-
-
=========================================
-
-
--gpu-architecture <arch> (-arch)
-
Specify the name of the class of NVIDIA 'virtual' GPU architecture for which
-
the CUDA input files must be compiled.
-
With the exception as described for the shorthand below, the architecture
-
specified with this option must be a 'virtual' architecture (such as compute_50).
-
Normally, this option alone does not trigger assembly of the generated PTX
-
for a 'real' architecture (that is the role of nvcc option '--gpu-code',
-
see below); rather, its purpose is to control preprocessing and compilation
-
of the input to PTX.
-
For convenience, in case of simple nvcc compilations, the following shorthand
-
is supported. If no value for option '--gpu-code' is specified, then the
-
value of this option defaults to the value of '--gpu-architecture'. In this
-
situation, as only exception to the description above, the value specified
-
for '--gpu-architecture' may be a 'real' architecture (such as a sm_50),
-
in which case nvcc uses the specified 'real' architecture and its closest
-
'virtual' architecture as effective architecture values. For example, 'nvcc
-
--gpu-architecture=sm_50' is equivalent to 'nvcc --gpu-architecture=compute_50
-
--gpu-code=sm_50,compute_50'.
-
Allowed values for this option: 'compute_30','compute_32','compute_35',
-
'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
-
'compute_62','compute_70','compute_72','sm_30','sm_32','sm_35','sm_37','sm_50',
-
'sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72'.
-
-
--gpu-code <code>,... (-code)
-
Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
-
nvcc embeds a compiled code image in the resulting executable for each specified
-
<code> architecture, which is a true binary load image for each 'real' architecture
-
(such as sm_50), and PTX code for the 'virtual' architecture (such as compute_50).
-
During runtime, such embedded PTX code is dynamically compiled by the CUDA
-
runtime system if no binary load image is found for the 'current' GPU.
-
Architectures specified for options '--gpu-architecture' and '--gpu-code'
-
may be 'virtual' as well as 'real', but the <code> architectures must be
-
compatible with the <arch> architecture. When the '--gpu-code' option is
-
used, the value for the '--gpu-architecture' option must be a 'virtual' PTX
-
architecture.
-
For instance, '--gpu-architecture=compute_35' is not compatible with '--gpu-code=sm_30',
-
because the earlier compilation stages will assume the availability of 'compute_35'
-
features that are not present on 'sm_30'.
-
Allowed values for this option: 'compute_30','compute_32','compute_35',
-
'compute_37','compute_50','compute_52','compute_53','compute_60','compute_61',
-
'compute_62','compute_70','compute_72','sm_30','sm_32','sm_35','sm_37','sm_50',
-
'sm_52','sm_53','sm_60','sm_61','sm_62','sm_70','sm_72'.
-
-
--generate-code <specification>,... (-gencode)
-
This option provides a generalization of the '--gpu-architecture=<arch> --gpu-code=<code>,
-
...' option combination for specifying nvcc behavior with respect to code
-
generation. Where use of the previous options generates code for different
-
'real' architectures with the PTX for the same 'virtual' architecture, option
-
'--generate-code' allows multiple PTX generations for different 'virtual'
-
architectures. In fact, '--gpu-architecture=<arch> --gpu-code=<code>,
-
...' is equivalent to '--generate-code arch=<arch>,code=<code>,...'.
-
'--generate-code' options may be repeated for different virtual architectures.
-
Allowed keywords for this option: 'arch','code'.
-
-
--relocatable-device-code {true|false} (-rdc)
-
Enable (disable) the generation of relocatable device code. If disabled,
-
executable device code is generated. Relocatable device code must be linked
-
before it can be executed.
-
Default value: false.
-
-
--entries entry,... (-e)
-
Specify the global entry functions for which code must be generated. By
-
default, code will be generated for all entry functions.
-
-
--maxrregcount <amount> (-maxrregcount)
-
Specify the maximum amount of registers that GPU functions can use.
-
Until a function-specific limit, a higher value will generally increase the
-
performance of individual GPU threads that execute this function. However,
-
because thread registers are allocated from a global register pool on each
-
GPU, a higher value of this option will also reduce the maximum thread block
-
size, thereby reducing the amount of thread parallelism. Hence, a good maxrregcount
-
value is the result of a trade-off.
-
If this option is not specified, then no maximum is assumed.
-
Value less than the minimum registers required by ABI will be bumped up by
-
the compiler to ABI minimum limit.
-
User program may not be able to make use of all registers as some registers
-
are reserved by compiler.
-
-
--use_fast_math (-use_fast_math)
-
Make use of fast math library. '--use_fast_math' implies '--ftz=true --prec-div=false
-
--prec-sqrt=false --fmad=true'.
-
-
--ftz {true|false} (-ftz)
-
This option controls single-precision denormals support. '--ftz=true' flushes
-
denormal values to zero and '--ftz=false' preserves denormal values. '--use_fast_math'
-
implies '--ftz=true'.
-
Default value: false.
-
-
--prec-div {true|false} (-prec-div)
-
This option controls single-precision floating-point division and reciprocals.
-
'--prec-div=true' enables the IEEE round-to-nearest mode and '--prec-div=false'
-
enables the fast approximation mode. '--use_fast_math' implies '--prec-div=false'.
-
Default value: true.
-
-
--prec-sqrt {true|false} (-prec-sqrt)
-
This option controls single-precision floating-point squre root. '--prec-sqrt=true'
-
enables the IEEE round-to-nearest mode and '--prec-sqrt=false' enables the
-
fast approximation mode. '--use_fast_math' implies '--prec-sqrt=false'.
-
Default value: true.
-
-
--fmad {true|false} (-fmad)
-
This option enables (disables) the contraction of floating-point multiplies
-
and adds/subtracts into floating-point multiply-add operations (FMAD, FFMA,
-
or DFMA). '--use_fast_math' implies '--fmad=true'.
-
Default value: true.
Options for steering cuda compilation.
-
-
======================================
-
-
--default-stream {legacy|null|per-thread} (-default-stream)
-
Specify the stream that CUDA commands from the compiled program will be sent
-
to by default.
-
-
legacy
-
The CUDA legacy stream (per context, implicitly synchronizes with
-
other streams).
-
-
per-thread
-
A normal CUDA stream (per thread, does not implicitly
-
synchronize with other streams).
-
-
'null' is a deprecated alias for 'legacy'.
-
-
Allowed values for this option: 'legacy','null','per-thread'.
-
Default value: 'legacy'.
Generic tool options.
-
-
=====================
-
-
--disable-warnings (-w)
-
Inhibit all warning messages.
-
-
--keep-device-functions (-keep-device-functions)
-
In whole program compilation mode, preserve user defined external linkage
-
__device__ function definitions up to PTX.
-
-
--source-in-ptx (-src-in-ptx)
-
Interleave source in PTX. May only be used in conjunction with --device-debug
-
or --generate-line-info.
-
-
--restrict (-restrict)
-
Programmer assertion that all kernel pointer parameters are restrict pointers.
-
-
--Wreorder (-Wreorder)
-
Generate warnings when member initializers are reordered.
-
-
--Wno-deprecated-declarations (-Wno-deprecated-declarations)
-
Suppress warning on use of deprecated entity.
-
-
--Wno-deprecated-gpu-targets (-Wno-deprecated-gpu-targets)
-
Suppress warnings about deprecated GPU target architectures.
-
-
--Werror<kind>,... (-Werror)
-
Make warnings of the specified kinds into errors. The following is the list
-
of warning kinds accepted by this option:
-
-
cross-execution-space-call
-
Be more strict about unsupported cross execution space calls.
-
The compiler will generate an error instead of a warning for a
-
call from a __host__ __device__ to a __host__ function.
-
reorder
-
Generate errors when member initializers are reordered.
-
deprecated-declarations
-
Generate error on use of a deprecated entity.
-
Allowed values for this option: 'cross-execution-space-call','deprecated-declarations',
-
'reorder'.
-
-
--resource-usage (-res-usage)
-
Show resource usage such as registers and memory of the GPU code.
-
This option implies '--nvlink-options --verbose' when '--relocatable-device-code=true'
-
is set. Otherwise, it implies '--ptxas-options --verbose'.
-
-
--help (-h)
-
Print this help information on this tool.
-
-
--version (-V)
-
Print version information on this tool.
-
-
--options-file <file>,... (-optf)
-
Include command line options from specified file.
文章来源: panda1234lee.blog.csdn.net,作者:panda1234lee,版权归原作者所有,如需转载,请联系作者。
原文链接:panda1234lee.blog.csdn.net/article/details/84564540
【版权声明】本文为华为云社区用户转载文章,如果您发现本社区中有涉嫌抄袭的内容,欢迎发送邮件进行举报,并提供相关证据,一经查实,本社区将立刻删除涉嫌侵权内容,举报邮箱:
cloudbbs@huaweicloud.com
- 点赞
- 收藏
- 关注作者
评论(0)