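5. Type Inference

Type Recovery with General-Purpose Methods

Type inference here means recovering high-level-language type information from low-level code.

However, existing traditional techniques cannot recover that information perfectly.

This chapter analyzes the type recovery techniques used by today's general-purpose decompilers. Broadly, they work in two steps: inferring initial types, then propagating them.

Type Definitions

Today's general-purpose decompilers, whether IDA, Ghidra, JEB, or others, all aim to recover C-like high-level pseudocode, so their type systems are designed to be C-like as well.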

ref: https://www.geeksforgeeks.org/data-types-in-c/

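In addition, because the size of the int type can differ across architectures, the types a decompiler defines usually carry an explicit size to avoid ambiguity.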
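To make the idea of size-carrying types concrete, here is a minimal sketch that pairs a coarse type kind with a byte size and renders a size-explicit name. `Kind`, `SizedType`, and `toCName` are hypothetical names invented for this example, and the naming scheme is an assumption for illustration, not any decompiler's actual API.

```cpp
#include <cstdint>
#include <string>

// Hypothetical coarse type kinds, loosely modeled on a C-like type system.
enum class Kind { Unknown, Int, UInt, Float, Pointer };

// A type only becomes unambiguous once the kind is paired with a byte size.
struct SizedType {
    Kind kind;
    std::uint32_t size;  // size in bytes
};

// Render a size-explicit, C-like name such as "int4" or "uint8".
std::string toCName(const SizedType &t) {
    switch (t.kind) {
        case Kind::Int:     return "int" + std::to_string(t.size);
        case Kind::UInt:    return "uint" + std::to_string(t.size);
        case Kind::Float:   return "float" + std::to_string(t.size);
        case Kind::Pointer: return "ptr" + std::to_string(t.size);
        case Kind::Unknown: break;
    }
    // Unknown data still has a known size, so keep the size in the name.
    return "undefined" + std::to_string(t.size);
}
```

With this scheme, `toCName({Kind::Int, 4})` and `toCName({Kind::Int, 8})` yield different names, so the ambiguity of a bare `int` disappears.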

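In Ghidra, the type information the decompiler can infer is defined by the type_metatype enum in Ghidra\Features\Decompiler\src\decompile\cpp\type.hh, shown below. Note that the metatypes themselves are sizeless; the concrete size of a Ghidra type depends on the architecture of the program being analyzed: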

/// The core meta-types supported by the decompiler. These are sizeless templates
/// for the elements making up the type algebra.  Index is important for Datatype::base2sub array.
enum type_metatype {
  TYPE_VOID = 14,		///< Standard "void" type, absence of type
  TYPE_SPACEBASE = 13,		///< Placeholder for symbol/type look-up calculations
  TYPE_UNKNOWN = 12,		///< An unknown low-level type. Treated as an unsigned integer.
  TYPE_INT = 11,		///< Signed integer. Signed is considered less specific than unsigned in C
  TYPE_UINT = 10,		///< Unsigned integer
  TYPE_BOOL = 9,		///< Boolean
  TYPE_CODE = 8,		///< Data is actual executable code
  TYPE_FLOAT = 7,		///< Floating-point

  TYPE_PTR = 6,			///< Pointer data-type
  TYPE_PTRREL = 5,		///< Pointer relative to another data-type (specialization of TYPE_PTR)
  TYPE_ARRAY = 4,		///< Array data-type, made up of a sequence of "element" datatype
  TYPE_STRUCT = 3,		///< Structure data-type, made up of component datatypes
  TYPE_UNION = 2,		///< An overlapping union of multiple datatypes
  TYPE_PARTIALSTRUCT = 1,	///< Part of a structure, stored separately from the whole
  TYPE_PARTIALUNION = 0		///< Part of a union
};

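IDA is similar but more direct: its set of types is fixed outright, so whatever the platform, the result is always drawn from the same handful of types.

Inferring Initial Types

During decompilation, a variable's initial type is inferred along two dimensions: the operation size and the operation kind.

For example, the instruction mov rax, [rbx] dereferences rbx, so at this point rbx is a 64-bit pointer (which is close to stating the obvious, since pointers in a 64-bit program can only be 64 bits wide), and rax is a 64-bit value of unknown type.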
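The reasoning in this example can be sketched in a few lines. This is a toy model under stated assumptions: `LoadInsn`, `InitialType`, `Guess`, `typeOfAddress`, and `typeOfDestination` are hypothetical names, and a real decompiler works on its intermediate language rather than on a struct like this.

```cpp
#include <cstdint>
#include <string>

// Hypothetical initial guess: a coarse kind plus a size in bits.
enum class Guess { Unknown, Pointer };

struct InitialType {
    Guess kind;
    std::uint32_t bits;
};

// Hypothetical model of a load such as `mov rax, [rbx]`.
struct LoadInsn {
    std::string dst;           // register receiving the loaded value
    std::string addr;          // register holding the address
    std::uint32_t accessBits;  // width of the memory access
    std::uint32_t addrBits;    // address width of the target architecture
};

// Operation kind: the address operand is dereferenced, so it is a pointer.
InitialType typeOfAddress(const LoadInsn &insn) {
    return {Guess::Pointer, insn.addrBits};
}

// Operation size: the load only reveals how wide the destination is,
// not what kind of value it holds.
InitialType typeOfDestination(const LoadInsn &insn) {
    return {Guess::Unknown, insn.accessBits};
}
```

For `LoadInsn{"rax", "rbx", 64, 64}`, the address operand comes out as a 64-bit pointer and the destination as a 64-bit unknown, matching the description above.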

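First, the decompiler predefines type information for each operator of its intermediate language.

Taking Ghidra as an example, \Ghidra\Features\Decompiler\src\decompile\cpp\typeop.cc opens with a long block of code that initializes the predefined type information for every intermediate-language operation involved.

The initialization sorts operations into categories such as sign-inheriting, arithmetic, logical, and floating-point (whether an operation acts on a pointer is decided later, during type propagation).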

enum {
  inherits_sign = 1,        ///< Operator token inherits signedness from its inputs
  inherits_sign_zero = 2,   ///< Only inherits sign from first operand, not the second
  shift_op = 4,             ///< Shift operation
  arithmetic_op = 8,        ///< Operation involving addition, multiplication, or division
  logical_op = 0x10,        ///< Logical operation
  floatingpoint_op = 0x20   ///< Floating-point operation
};
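Because each flag occupies its own bit, one operation can belong to several categories at once. The sketch below shows how such flags might be combined and queried. The flag values are copied from the enum above, but the per-operation flag sets (`kIntAddFlags` and so on) and the helper `hasAll` are assumptions made for illustration, not Ghidra's actual assignments.

```cpp
#include <cstdint>

// Flag values copied from the enum above.
enum : std::uint32_t {
    inherits_sign = 1,
    inherits_sign_zero = 2,
    shift_op = 4,
    arithmetic_op = 8,
    logical_op = 0x10,
    floatingpoint_op = 0x20
};

// Hypothetical flag sets for a few operations (assumed, not taken from Ghidra).
constexpr std::uint32_t kIntAddFlags = arithmetic_op | inherits_sign;
constexpr std::uint32_t kIntAndFlags = logical_op | inherits_sign;
constexpr std::uint32_t kShiftLeftFlags = shift_op | inherits_sign | inherits_sign_zero;

// True when every bit in `mask` is also set in `flags`.
constexpr bool hasAll(std::uint32_t flags, std::uint32_t mask) {
    return (flags & mask) == mask;
}
```

A check such as `hasAll(kIntAddFlags, arithmetic_op)` then answers "is this an arithmetic operation?" with a single mask test, which is why bit flags are a common choice for this kind of classification.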