TVM

TVM is a code generation and testing toolkit that targets specific hardware architectures. It makes optimizing code for CNN workloads more efficient across a variety of platforms.
This post records my reading of the TVM source code and my tracing (and occasionally modification) of its execution.

TVM provides interfaces for describing operators and for scheduling them (blocking and partitioning). I follow tensor_expr_get_started.ipynb as the running example.
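In TVM, blocking is expressed with schedule primitives such as split and tile on a stage. As intuition only, the pure-Python sketch below (no TVM involved; original_loop, split_by_factor and tile_2d are hypothetical helpers) shows what splitting a loop by a factor does: an outer loop over blocks, an inner loop within a block, and a guard for the tail when the extent is not a multiple of the factor. Tiling is a split on each of two axes followed by a reorder.

```python
import math


def original_loop(n):
    """Visit indices 0..n-1 in order, like the unscheduled `for x in range(n)`."""
    return [x for x in range(n)]


def split_by_factor(n, factor):
    """Visit the same indices through an outer/inner loop pair.

    The outer loop runs ceil(n / factor) times and the inner loop `factor` times;
    the guard `x < n` skips the out-of-range tail when n % factor != 0.
    """
    visited = []
    for xo in range(math.ceil(n / factor)):
        for xi in range(factor):
            x = xo * factor + xi
            if x < n:
                visited.append(x)
    return visited


def tile_2d(rows, cols, tr, tc):
    """2-D blocking: split both axes, then iterate in (ro, co, ri, ci) order."""
    order = []
    for ro in range(math.ceil(rows / tr)):
        for co in range(math.ceil(cols / tc)):
            for ri in range(tr):
                for ci in range(tc):
                    r, c = ro * tr + ri, co * tc + ci
                    if r < rows and c < cols:
                        order.append((r, c))
    return order


if __name__ == "__main__":
    # Splitting changes the loop structure, not the set or order of 1-D indices.
    assert split_by_factor(10, 4) == original_loop(10)
    # The first four visits cover the top-left 2x2 tile.
    print(tile_2d(4, 4, 2, 2)[:4])  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The point of blocking is the changed visiting order in two or more dimensions: every element is still visited exactly once, but neighbouring visits stay inside one small tile, which is what improves locality.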

Operator Description

Operators are described with three APIs: var, placeholder and compute.

From $TVM/python/tvm/__init__.py we can see that they come from api.py in the same folder, which is essentially a wrapper around C++ functions (all of them are called through the _api_internal namespace). The Python source contains the following note:

The functions in this namespace are automatically exported from C++ side via PackedFunc
that is registered by “TVM_REGISTER_*” macro. This way makes calling Python functions from C++
side very easily.
Each string starts with “_” in the “TVM_REGISTER_*” macro is an internal API. You can find
all the functions in “api_lang.cc”, “api_base.cc”, “api_arith.cc” and “api_ir.cc” under “src/api”.

In include/tvm/api_registry.h:

#define TVM_REGISTER_API(OpName) TVM_REGISTER_GLOBAL(OpName)

Tracing back further:

#define TVM_REGISTER_GLOBAL(OpName) \
  TVM_STR_CONCAT(TVM_FUNC_REG_VAR_DEF, __COUNTER__) = \
      ::tvm::runtime::Registry::Register(OpName)

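Both TVM_REGISTER_GLOBAL and Registry::Register live in include/tvm/runtime/registry.h. Each macro expansion defines a static variable (made unique by concatenating __COUNTER__ into its name) that is initialized with the return value of Register(), so the registration runs during static initialization, when the library is loaded.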

/*!
 * \brief Register a function with given name
 * \param name The name of the function.
 * \param override Whether to allow overriding an existing function.
 * \return Reference to the registry.
 */
TVM_DLL static Registry& Register(const std::string& name, bool override = false);  // NOLINT(*)
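To make the Register(name, override) contract concrete, here is a Python model of it. I assume, as the override parameter suggests, that registering an already-taken name is an error unless override is set; the sketch raises KeyError in that case. The returned entry lets the caller chain .set_body(...), matching the TVM_REGISTER_GLOBAL(...).set_body(...) pattern. The class, its methods and the error type are my own stand-ins, not TVM's implementation.

```python
class Registry:
    """Python model of a process-wide name -> function registry."""

    _entries = {}  # shared class-level table, like the single C++ registry

    def __init__(self, name):
        self.name = name
        self.body = None

    @classmethod
    def register(cls, name, override=False):
        # Assumption: a duplicate name without override is rejected.
        if name in cls._entries and not override:
            raise KeyError(f"Global function {name} is already registered")
        entry = cls(name)
        cls._entries[name] = entry
        return entry  # returned so the caller can chain .set_body(...)

    def set_body(self, fn):
        self.body = fn
        return self

    @classmethod
    def get(cls, name):
        entry = cls._entries.get(name)
        return None if entry is None else entry.body


Registry.register("_nop").set_body(lambda *args: None)
Registry.register("_nop", override=True).set_body(lambda *args: "replaced")
print(Registry.get("_nop")())  # replaced
```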

Note the definition of TVM_DLL in include/tvm/runtime/c_runtime_api.h, which controls symbol export:

#ifdef __EMSCRIPTEN__
#include <emscripten/emscripten.h>
#define TVM_DLL EMSCRIPTEN_KEEPALIVE
#endif

#ifndef TVM_DLL
#ifdef _WIN32
#ifdef TVM_EXPORTS
#define TVM_DLL __declspec(dllexport)
#else
#define TVM_DLL __declspec(dllimport)
#endif
#else
#define TVM_DLL __attribute__((visibility("default")))
#endif
#endif

There are both TVM_REGISTER_API and TVM_REGISTER_NODE_TYPE macros. We first focus on the registrations in the src/api folder.
Running grep -r TVM_REGISTER_ src/api gives 141 lines:

src/api/api_test.cc:TVM_REGISTER_NODE_TYPE(TestAttrs);
src/api/api_test.cc:TVM_REGISTER_API("_nop")
src/api/api_test.cc:TVM_REGISTER_API("_test_wrap_callback")
src/api/api_test.cc:TVM_REGISTER_API("_test_raise_error_callback")
src/api/api_test.cc:TVM_REGISTER_API("_test_check_eq_callback")
src/api/api_test.cc:TVM_REGISTER_API("_context_test")
src/api/api_test.cc:TVM_REGISTER_API("_ErrorTest")
src/api/api_test.cc:TVM_REGISTER_API("_ndarray_use_count")
src/api/api_ir.cc:TVM_REGISTER_API("_Var")
src/api/api_ir.cc:TVM_REGISTER_API("make.abs")
src/api/api_ir.cc:TVM_REGISTER_API("make.floor")
src/api/api_ir.cc:TVM_REGISTER_API("make.ceil")
src/api/api_ir.cc:TVM_REGISTER_API("make.round")
src/api/api_ir.cc:TVM_REGISTER_API("make.trunc")
src/api/api_ir.cc:TVM_REGISTER_API("make._cast")
src/api/api_ir.cc:TVM_REGISTER_API("make._range_by_min_extent")
src/api/api_ir.cc:TVM_REGISTER_API("make.For")
src/api/api_ir.cc:TVM_REGISTER_API("make.Load")
src/api/api_ir.cc:TVM_REGISTER_API("make.Store")
src/api/api_ir.cc:TVM_REGISTER_API("make.Realize")
src/api/api_ir.cc:TVM_REGISTER_API("make.Call")
src/api/api_ir.cc:TVM_REGISTER_API("make.CommReducer")
src/api/api_ir.cc: TVM_REGISTER_API("make."#Node) \
src/api/api_ir.cc: TVM_REGISTER_API("make."#Node) \
src/api/api_ir.cc: TVM_REGISTER_API("make."#Node) \
src/api/api_ir.cc: TVM_REGISTER_API("make."#Node) \
src/api/api_ir.cc: TVM_REGISTER_API("make."#Node) \
src/api/api_ir.cc: TVM_REGISTER_API("make."#Node) \
src/api/api_ir.cc: TVM_REGISTER_API("make."#Node) \
src/api/api_codegen.cc:TVM_REGISTER_API("codegen._Build")
src/api/api_codegen.cc:TVM_REGISTER_API("module._PackImportsToC")
src/api/api_base.cc:TVM_REGISTER_API("_format_str")
src/api/api_base.cc:TVM_REGISTER_API("_raw_ptr")
src/api/api_base.cc:TVM_REGISTER_API("_save_json")
src/api/api_base.cc:TVM_REGISTER_API("_load_json")
src/api/api_base.cc:TVM_REGISTER_API("_TVMSetStream")
src/api/api_base.cc:TVM_REGISTER_API("_save_param_dict")
src/api/api_lang.cc:TVM_REGISTER_API("_min_value")
src/api/api_lang.cc:TVM_REGISTER_API("_max_value")
src/api/api_lang.cc:TVM_REGISTER_API("_const")
src/api/api_lang.cc:TVM_REGISTER_API("_str")
src/api/api_lang.cc:TVM_REGISTER_API("_Array")
src/api/api_lang.cc:TVM_REGISTER_API("_ArrayGetItem")
src/api/api_lang.cc:TVM_REGISTER_API("_ArraySize")
src/api/api_lang.cc:TVM_REGISTER_API("_Map")
src/api/api_lang.cc:TVM_REGISTER_API("_MapSize")
src/api/api_lang.cc:TVM_REGISTER_API("_MapGetItem")
src/api/api_lang.cc:TVM_REGISTER_API("_MapCount")
src/api/api_lang.cc:TVM_REGISTER_API("_MapItems")
src/api/api_lang.cc:TVM_REGISTER_API("Range")
src/api/api_lang.cc:TVM_REGISTER_API("_Buffer")
src/api/api_lang.cc:TVM_REGISTER_API("_BufferAccessPtr")
src/api/api_lang.cc:TVM_REGISTER_API("_BufferVLoad")
src/api/api_lang.cc:TVM_REGISTER_API("_BufferVStore")
src/api/api_lang.cc:TVM_REGISTER_API("_Layout")
src/api/api_lang.cc:TVM_REGISTER_API("_LayoutIndexOf")
src/api/api_lang.cc:TVM_REGISTER_API("_LayoutFactorOf")
src/api/api_lang.cc:TVM_REGISTER_API("_LayoutNdim")
src/api/api_lang.cc:TVM_REGISTER_API("_LayoutGetItem")
src/api/api_lang.cc:TVM_REGISTER_API("_BijectiveLayout")
src/api/api_lang.cc:TVM_REGISTER_API("_BijectiveLayoutForwardIndex")
src/api/api_lang.cc:TVM_REGISTER_API("_BijectiveLayoutBackwardIndex")
src/api/api_lang.cc:TVM_REGISTER_API("_BijectiveLayoutForwardShape")
src/api/api_lang.cc:TVM_REGISTER_API("_BijectiveLayoutBackwardShape")
src/api/api_lang.cc:TVM_REGISTER_API("_Tensor")
src/api/api_lang.cc:TVM_REGISTER_API("_TensorIntrin")
src/api/api_lang.cc:TVM_REGISTER_API("_TensorIntrinCall")
src/api/api_lang.cc:TVM_REGISTER_API("_TensorEqual")
src/api/api_lang.cc:TVM_REGISTER_API("_TensorHash")
src/api/api_lang.cc:TVM_REGISTER_API("_Placeholder")
src/api/api_lang.cc:TVM_REGISTER_API("_ComputeOp")
src/api/api_lang.cc:TVM_REGISTER_API("_ScanOp")
src/api/api_lang.cc:TVM_REGISTER_API("_TensorComputeOp")
src/api/api_lang.cc:TVM_REGISTER_API("_ExternOp")
src/api/api_lang.cc:TVM_REGISTER_API("_HybridOp")
src/api/api_lang.cc:TVM_REGISTER_API("_OpGetOutput")
src/api/api_lang.cc:TVM_REGISTER_API("_OpNumOutputs")
src/api/api_lang.cc:TVM_REGISTER_API("_OpInputTensors")
src/api/api_lang.cc:TVM_REGISTER_API("_IterVar")
src/api/api_lang.cc:TVM_REGISTER_API("_CreateSchedule")
src/api/api_lang.cc:TVM_REGISTER_API("_StageSetScope")
src/api/api_lang.cc:TVM_REGISTER_API("_StageBind")
src/api/api_lang.cc:TVM_REGISTER_API("_StageSplitByFactor")
src/api/api_lang.cc:TVM_REGISTER_API("_StageSplitByNParts")
src/api/api_lang.cc:TVM_REGISTER_API("_StageFuse")
src/api/api_lang.cc:TVM_REGISTER_API("_StageComputeAt")
src/api/api_lang.cc:TVM_REGISTER_API("_StageComputeInline")
src/api/api_lang.cc:TVM_REGISTER_API("_StageComputeRoot")
src/api/api_lang.cc:TVM_REGISTER_API("_StageReorder")
src/api/api_lang.cc:TVM_REGISTER_API("_StageTile")
src/api/api_lang.cc:TVM_REGISTER_API("_StageEnvThreads")
src/api/api_lang.cc:TVM_REGISTER_API("_StageSetStorePredicate")
src/api/api_lang.cc:TVM_REGISTER_API("_StageUnroll")
src/api/api_lang.cc:TVM_REGISTER_API("_StageVectorize")
src/api/api_lang.cc:TVM_REGISTER_API("_StageTensorize")
src/api/api_lang.cc:TVM_REGISTER_API("_StageParallel")
src/api/api_lang.cc:TVM_REGISTER_API("_StagePragma")
src/api/api_lang.cc:TVM_REGISTER_API("_StagePrefetch")
src/api/api_lang.cc:TVM_REGISTER_API("_StageStorageAlign")
src/api/api_lang.cc:TVM_REGISTER_API("_StageDoubleBuffer")
src/api/api_lang.cc:TVM_REGISTER_API("_StageOpenGL")
src/api/api_lang.cc:TVM_REGISTER_API("_ScheduleNormalize")
src/api/api_lang.cc:TVM_REGISTER_API("_ScheduleCreateGroup")
src/api/api_lang.cc:TVM_REGISTER_API("_ScheduleCacheRead")
src/api/api_lang.cc:TVM_REGISTER_API("_ScheduleCacheWrite")
src/api/api_lang.cc:TVM_REGISTER_API("_ScheduleRFactor")
src/api/api_lang.cc:TVM_REGISTER_API("_CommReducerCombine")
src/api/dsl_api.cc:TVM_REGISTER_GLOBAL("dsl_api.singleton")
src/api/api_pass.cc:TVM_REGISTER_API("ir_pass.Simplify")
src/api/api_pass.cc:TVM_REGISTER_API("ir_pass.CanonicalSimplify")
src/api/api_pass.cc:TVM_REGISTER_API("ir_pass.Substitute")
src/api/api_pass.cc:TVM_REGISTER_API("ir_pass.Equal")
src/api/api_pass.cc:TVM_REGISTER_API("ir_pass.StorageFlatten")
src/api/api_pass.cc:TVM_REGISTER_API("ir_pass.AttrsEqual")
src/api/api_pass.cc:TVM_REGISTER_API("ir_pass.AttrsHash")
src/api/api_pass.cc:TVM_REGISTER_API("ir_pass.ExprUseVar")
src/api/api_pass.cc:TVM_REGISTER_API("ir_pass.PostOrderVisit")
src/api/api_pass.cc: TVM_REGISTER_API("ir_pass."#PassName) \
src/api/api_pass.cc: TVM_REGISTER_API("ir_pass."#PassName) \
src/api/api_pass.cc: TVM_REGISTER_API("ir_pass."#PassName) \
src/api/api_pass.cc: TVM_REGISTER_API("ir_pass."#PassName) \
src/api/api_pass.cc: TVM_REGISTER_API("ir_pass."#PassName) \
src/api/api_schedule.cc:TVM_REGISTER_API("schedule.AutoInlineElemWise")
src/api/api_schedule.cc:TVM_REGISTER_API("schedule.AutoInlineInjective")
src/api/api_schedule.cc:TVM_REGISTER_API("schedule.ScheduleOps")
src/api/api_schedule.cc: TVM_REGISTER_API("schedule."#PassName) \
src/api/api_schedule.cc: TVM_REGISTER_API("schedule."#PassName) \
src/api/api_arith.cc:TVM_REGISTER_API("arith.intset_single_point")
src/api/api_arith.cc:TVM_REGISTER_API("arith.intset_vector")
src/api/api_arith.cc:TVM_REGISTER_API("arith.intset_interval")
src/api/api_arith.cc:TVM_REGISTER_API("arith.DetectLinearEquation")
src/api/api_arith.cc:TVM_REGISTER_API("arith.DetectClipBound")
src/api/api_arith.cc:TVM_REGISTER_API("arith.DeduceBound")
src/api/api_arith.cc:TVM_REGISTER_API("arith.DomainTouched")
src/api/api_arith.cc:TVM_REGISTER_API("_IntervalSetGetMin")
src/api/api_arith.cc:TVM_REGISTER_API("_IntervalSetGetMax")
src/api/api_arith.cc:TVM_REGISTER_API("_IntSetIsNothing")
src/api/api_arith.cc:TVM_REGISTER_API("_IntSetIsEverything")
src/api/api_arith.cc:TVM_REGISTER_API("arith._make_ConstIntBound")
src/api/api_arith.cc:TVM_REGISTER_API("arith._make_ModularSet")
src/api/api_arith.cc:TVM_REGISTER_API("arith._CreateAnalyzer")

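In this list we find the entry points behind var, placeholder and compute: _Var, _Placeholder and _ComputeOp (compute uses _TensorComputeOp instead when its body is a tensor-intrinsic call).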

TVM_REGISTER_API("_Var")
.set_body([](TVMArgs args, TVMRetValue* ret) {
    *ret = Variable::make(args[1], args[0]);
  });

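_Var takes two arguments: args[0] is the name and args[1] is the dtype, so Variable::make receives them in the opposite order (dtype first). This registration lives in api_ir.cc; _Placeholder, _ComputeOp and _TensorComputeOp are registered in api_lang.cc.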
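The set_body lambda follows a packed calling convention: positional arguments arrive as one type-erased list, and the result is written into a return slot rather than returned. A minimal Python model of that convention, including the argument swap, looks like this; RetValue, var_body and call_packed are hypothetical names, and Variable is a plain dataclass standing in for the C++ node.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    dtype: str
    name_hint: str

    @staticmethod
    def make(dtype, name_hint):
        # Same argument order as the C++ Variable::make(dtype, name) call.
        return Variable(dtype, name_hint)


class RetValue:
    """Stand-in for TVMRetValue: a single slot the body writes its result into."""

    def __init__(self):
        self.value = None


def var_body(args, ret):
    # args arrive in Python call order (name, dtype); swap them for make().
    ret.value = Variable.make(args[1], args[0])


def call_packed(body, *args):
    """Pack the arguments into a list, run the body, read back the slot."""
    ret = RetValue()
    body(list(args), ret)
    return ret.value


v = call_packed(var_body, "n", "int32")
print(v)  # Variable(dtype='int32', name_hint='n')
```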

A glimpse at the other API files

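api_base: seems to provide serialization/deserialization and save/load functions (_save_json, _load_json, _save_param_dict).
api_ir: seems to provide the operators, variable definitions and loop-control constructs (_Var, make.abs, make.For, make.Load, make.Store). It is worth reading to understand how to describe a new operator.
api_codegen: only two interfaces, "codegen._Build" and "module._PackImportsToC".
api_pass.cc: the various optimization passes over the IR (ir_pass.Simplify, ir_pass.StorageFlatten and so on).
api_lang: data types, tensors, buffers, layouts and their attributes, plus the schedule primitives (_CreateSchedule, _StageSplitByFactor, _ScheduleCacheRead).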
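To check these per-file impressions against the grep listing, a small script can count registrations per file and per name prefix. This is a helper of my own (hypothetical, not part of TVM), and it embeds only a handful of sample lines; paste the full listing into GREP_OUTPUT to use it for real. Lines with TVM_REGISTER_NODE_TYPE carry no quoted name and are skipped.

```python
import re
from collections import Counter

# A few lines copied from the grep listing above.
GREP_OUTPUT = '''\
src/api/api_codegen.cc:TVM_REGISTER_API("codegen._Build")
src/api/api_codegen.cc:TVM_REGISTER_API("module._PackImportsToC")
src/api/api_ir.cc:TVM_REGISTER_API("_Var")
src/api/api_ir.cc:TVM_REGISTER_API("make.abs")
src/api/api_lang.cc:TVM_REGISTER_API("_Placeholder")
src/api/api_lang.cc:TVM_REGISTER_API("_ComputeOp")
src/api/api_test.cc:TVM_REGISTER_NODE_TYPE(TestAttrs);
'''

# "<file>:" then optional spaces, a registration macro, and a quoted name.
# No closing ")" is required, so macro templates like ("make."#Node) also match.
PATTERN = re.compile(
    r'^(?P<file>[^:]+):\s*TVM_REGISTER_(?:API|GLOBAL)\("(?P<name>[^"]+)"'
)


def summarize(listing):
    """Count registrations per source file and per namespace prefix."""
    per_file, per_prefix = Counter(), Counter()
    for line in listing.splitlines():
        m = PATTERN.match(line)
        if m is None:
            continue  # e.g. TVM_REGISTER_NODE_TYPE lines
        name = m.group("name")
        per_file[m.group("file")] += 1
        if "." in name:
            per_prefix[name.split(".", 1)[0]] += 1  # "make.abs" -> "make"
        elif name.startswith("_"):
            per_prefix["_ (internal)"] += 1
        else:
            per_prefix["(none)"] += 1
    return per_file, per_prefix


files, prefixes = summarize(GREP_OUTPUT)
print(files["src/api/api_codegen.cc"])  # 2
print(prefixes["_ (internal)"])         # 3
```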

Debugging tvm.compute

I added the following debug prints to tvm.compute (defined in python/tvm/api.py):

print("debug := ndim %d, code.co_argcount %d, code.co_varnames: %r" % (ndim, code.co_argcount, code.co_varnames))
print("debug := type of body %r" % type(body))

The output:
[screenshot: compute]

The debug output for opt_conv_cuda.ipynb:
[screenshot: compute-2]
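The debug prints show the introspection compute relies on: it reads the lambda's code object, compares co_argcount with the number of dimensions, and takes the parameter names from co_varnames. The sketch below assumes compute does something along those lines (it is not TVM's code): it creates one hypothetical IndexVar per axis, named after the lambda's parameters, and calls the lambda to obtain the body. A separate evaluate helper shows what the declaration means numerically, out[idx] = fcompute(*idx) for every index.

```python
import itertools


class IndexVar:
    """Hypothetical stand-in for the per-axis iteration variable compute creates."""

    def __init__(self, name, extent):
        self.name, self.extent = name, extent

    def __repr__(self):
        return f"IndexVar({self.name!r}, extent={self.extent})"


def compute(shape, fcompute):
    """One index variable per dimension, named after fcompute's parameters."""
    code = fcompute.__code__
    ndim = len(shape)
    if code.co_argcount != ndim:
        raise ValueError(f"fcompute takes {code.co_argcount} args, shape has {ndim} dims")
    # co_varnames lists the parameters first, then any other locals.
    arg_names = code.co_varnames[:code.co_argcount]
    axes = [IndexVar(n, e) for n, e in zip(arg_names, shape)]
    body = fcompute(*axes)  # in TVM this yields an expression AST
    return axes, body


def evaluate(shape, fcompute):
    """Eager meaning of the declaration: out[idx] = fcompute(*idx)."""
    ranges = (range(e) for e in shape)
    return {idx: fcompute(*idx) for idx in itertools.product(*ranges)}


A = [[1, 2, 3], [4, 5, 6]]
axes, body = compute((2, 3), lambda i, j: ("A", i, j))
print(axes)  # [IndexVar('i', extent=2), IndexVar('j', extent=3)]
B = evaluate((2, 3), lambda i, j: A[i][j] + 1)
print(B[(1, 2)])  # 7
```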

Expressions (tvm/expr.py)

First, the note at the top of the file:

User do not need to deal with expression AST node directly.
But they can be helpful for developer to do quick proptyping.
While not displayed in the document and python file.
Each expression node have subfields that can be visited from python side.
For example, you can use addexp.a to get the left operand of an Add node.
.. code-block:: python

    x = tvm.var("n")
    y = x + 2
    assert(isinstance(y, tvm.expr.Add))
    assert(y.a == x)

There are several kinds of expression nodes in tvm.expr, such as tvm.expr.Call, Reduce, Let, Not, Select, …. They all inherit from the Expr class defined in tvm/expr.py, which in turn inherits from ExprOp and NodeBase.
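The class layout explains why x + 2 in the note produces an Add node: ExprOp overloads the Python operators so that arithmetic builds AST nodes instead of computing values. The sketch below reproduces only that idea with plain Python objects; in TVM the nodes are C++ objects reached through handles and construction goes through the C++ side, while here the classes (and the _as_expr helper) are hypothetical stand-ins.

```python
class NodeBase:
    """Stand-in for the handle wrapper every TVM node derives from."""


class ExprOp:
    """Mixin: Python operators build AST nodes instead of computing values."""

    def __add__(self, other):
        return Add(self, _as_expr(other))

    def __radd__(self, other):
        return Add(_as_expr(other), self)

    def __mul__(self, other):
        return Mul(self, _as_expr(other))


class Expr(ExprOp, NodeBase):
    pass


class Var(Expr):
    def __init__(self, name):
        self.name = name


class IntImm(Expr):
    def __init__(self, value):
        self.value = value


class Add(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b  # left and right operands, as in the note


class Mul(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b


def _as_expr(value):
    """Wrap Python ints as constant nodes so both operands are Expr."""
    return value if isinstance(value, Expr) else IntImm(value)


x = Var("n")
y = x + 2
assert isinstance(y, Add) and y.a is x and y.b.value == 2
```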

ExprOp wraps the _make functions, calling them either directly in expr.py or through generic.py. Either way, all expression nodes seem to be created by functions registered in api_ir.cc (see the entries with the make. prefix in the API list above). In make.py, a single call exposes these C++ functions to Python:

_init_api("tvm.make")
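One way to picture what that call does: every global function registered as "make.<fname>" becomes the attribute <fname> of the make module. I assume the prefix make is derived from the module name tvm.make. The sketch below is a hypothetical model of that idea over a small hand-written table, not TVM's actual _init_api code.

```python
import types

# Hypothetical global table standing in for the C++ registry; the keys follow
# the naming scheme of the grep listing above.
GLOBAL_FUNCS = {
    "make.abs": lambda x: ("abs", x),
    "make.For": lambda *fields: ("For",) + fields,
    "ir_pass.Simplify": lambda e: e,
    "_Var": lambda name, dtype: ("Var", name, dtype),
}


def init_api_sketch(module_name, prefix):
    """Attach each function registered as "<prefix>.<fname>" as attribute <fname>."""
    module = types.ModuleType(module_name)
    for full_name, fn in GLOBAL_FUNCS.items():
        if full_name.startswith(prefix + "."):
            setattr(module, full_name[len(prefix) + 1:], fn)
    return module


make = init_api_sketch("make_sketch", "make")
print(sorted(n for n in vars(make) if not n.startswith("__")))  # ['For', 'abs']
```

Names from other namespaces (ir_pass.Simplify) and internal names (_Var) are left out, which matches the split of the grep listing into make., ir_pass., schedule. and "_"-prefixed groups.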

