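LLVM IR Data Structure Analysis

After several days of digging I finally understand how LLVM's structures fit together. It wasn't strictly necessary, but... call it a pwn player's obsession with data structures and memory layout

The process was extremely bumpy: just building a .so with symbols took several attempts (╯‵□′)╯︵┻━┻ (I wanted the symbols so I could inspect the memory layout)

First, a complicated diagram (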

LLVMContext

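  • A black box that manages LLVM's basic, core "global" data, such as the type tables and the uniqued constant tables

  • LLVMContext holds the data LLVM needs to run within one thread (for example, one compilation task). In old versions all of this was global data; it is now packaged into the LLVMContext class so that LLVM can support multi-threaded compilation

  • It is passed around as an argument many times later on; you don't need to know exactly what is inside (

  • The Module class has a Context member
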
    class Module {  
    private:
    LLVMContext &Context;
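  • Creating an LLVMContext

    // llvm::getGlobalContext() was removed in LLVM 3.9; construct a context
    // directly, or reuse an existing one via module->getContext()
    LLVMContext context;
    • LLVMContext deletes its copy constructor
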
      LLVMContext(LLVMContext &) = delete;

      So this does not compile

      LLVMContext context1;
      LLVMContext context2(context1);
    • LLVMContext deletes its copy assignment operator

      LLVMContext &operator=(const LLVMContext &) = delete;

      So this does not compile either

      LLVMContext context1;
      LLVMContext context2;
      context2 = context1;

      Always pass it around by reference
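
To make the "use references" advice concrete, here is a minimal sketch with toy stand-ins. ToyContext, ToyModule and internType are hypothetical names invented for this illustration, not LLVM classes; they only mirror the deleted-copy pattern quoted above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Toy stand-in for llvm::LLVMContext: owns state and forbids copying.
class ToyContext {
public:
  ToyContext() = default;
  ToyContext(const ToyContext &) = delete;
  ToyContext &operator=(const ToyContext &) = delete;
  int internedTypes = 0;
};

// Toy stand-in for llvm::Module: it stores a reference, never a copy.
class ToyModule {
public:
  ToyModule(std::string name, ToyContext &c)
      : Name(std::move(name)), Context(c) {}
  ToyContext &getContext() const { return Context; }

private:
  std::string Name;
  ToyContext &Context;
};

// The compiler itself confirms that copies are rejected.
static_assert(!std::is_copy_constructible<ToyContext>::value,
              "copy construction is deleted");
static_assert(!std::is_copy_assignable<ToyContext>::value,
              "copy assignment is deleted");

// Helpers take the context by reference, so no copy is ever attempted.
inline void internType(ToyContext &c) { ++c.internedTypes; }
```

Every object created from the module reaches the same context through getContext(), which is the same shape as the real `module->getContext()` calls used later in this post.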

Module

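A small rant: C++ is nasty here; peel away n layers of nested classes and all you find is a doubly linked list

Module's main members are two linked lists: one of functions and one of global variables
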
class Module {
using GlobalListType = SymbolTableList<GlobalVariable>;
using FunctionListType = SymbolTableList<Function>;

private:
GlobalListType GlobalList;
FunctionListType FunctionList;

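The memory layout looks roughly like this 👇

Creating a Module

Creating a Module takes a name and an LLVMContext

Module(StringRef ModuleID, LLVMContext& C);

Example 🌰

// llvm::getGlobalContext() was removed in LLVM 3.9, so construct the context directly
LLVMContext context;
Module* module = new Module("test", context);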

Iterators

  • Function

    /// @}
    /// @name Function Iteration
    /// @{

    iterator begin() { return FunctionList.begin(); }
    const_iterator begin() const { return FunctionList.begin(); }
    iterator end () { return FunctionList.end(); }
    const_iterator end () const { return FunctionList.end(); }
    reverse_iterator rbegin() { return FunctionList.rbegin(); }
    const_reverse_iterator rbegin() const{ return FunctionList.rbegin(); }
    reverse_iterator rend() { return FunctionList.rend(); }
    const_reverse_iterator rend() const { return FunctionList.rend(); }
    size_t size() const { return FunctionList.size(); }
    bool empty() const { return FunctionList.empty(); }

    iterator_range<iterator> functions() {
    return make_range(begin(), end());
    }
    iterator_range<const_iterator> functions() const {
    return make_range(begin(), end());
    }

    Example 🌰

    Module *module;
    for(auto &func : *module);
    for(auto &func : module->functions());
  • Global Variable

    /// @}
    /// @name Global Variable Iteration
    /// @{

    global_iterator global_begin() { return GlobalList.begin(); }
    const_global_iterator global_begin() const { return GlobalList.begin(); }
    global_iterator global_end () { return GlobalList.end(); }
    const_global_iterator global_end () const { return GlobalList.end(); }
    size_t global_size () const { return GlobalList.size(); }
    bool global_empty() const { return GlobalList.empty(); }

    iterator_range<global_iterator> globals() {
    return make_range(global_begin(), global_end());
    }
    iterator_range<const_global_iterator> globals() const {
    return make_range(global_begin(), global_end());
    }

    Example 🌰

    Module *module;
    for(auto &var : module->globals());

Getting the lists

  • Function

    /// Get the Module's list of functions (constant).
    const FunctionListType &getFunctionList() const { return FunctionList; }
    /// Get the Module's list of functions.
    FunctionListType &getFunctionList() { return FunctionList; }
  • Global Variable

    /// Get the Module's list of global variables (constant).
    const GlobalListType &getGlobalList() const { return GlobalList; }
    /// Get the Module's list of global variables.
    GlobalListType &getGlobalList() { return GlobalList; }

Function operations

Looking up a function

Function *getFunction(StringRef Name) const;

Example 🌰

Module *module;
Function *pmain = module->getFunction("main");

Looking up or inserting a function

Module provides a family of functions that look up a Function and insert it if it is missing

FunctionCallee getOrInsertFunction(StringRef Name, FunctionType *T,
AttributeList AttributeList);
FunctionCallee getOrInsertFunction(StringRef Name, FunctionType *T);

template <typename... ArgsTy>
FunctionCallee getOrInsertFunction(StringRef Name,
AttributeList AttributeList, Type *RetTy,
ArgsTy... Args);
template <typename... ArgsTy>
FunctionCallee getOrInsertFunction(StringRef Name, Type *RetTy,
ArgsTy... Args);

The parameters are

  • StringRef Name: the function name

  • Type *RetTy: the return type

    The Type class provides static functions that build the various types; all they need is the Module's Context

    static Type *getVoidTy(LLVMContext &C);
    static Type *getLabelTy(LLVMContext &C);
    static Type *getHalfTy(LLVMContext &C);
    static Type *getBFloatTy(LLVMContext &C);
    static Type *getFloatTy(LLVMContext &C);
    static Type *getDoubleTy(LLVMContext &C);
    static Type *getMetadataTy(LLVMContext &C);
    static Type *getX86_FP80Ty(LLVMContext &C);
    static Type *getFP128Ty(LLVMContext &C);
    static Type *getPPC_FP128Ty(LLVMContext &C);
    static Type *getX86_MMXTy(LLVMContext &C);
    static Type *getX86_AMXTy(LLVMContext &C);
    static Type *getTokenTy(LLVMContext &C);
    static IntegerType *getIntNTy(LLVMContext &C, unsigned N);
    static IntegerType *getInt1Ty(LLVMContext &C);
    static IntegerType *getInt8Ty(LLVMContext &C);
    static IntegerType *getInt16Ty(LLVMContext &C);
    static IntegerType *getInt32Ty(LLVMContext &C);
    static IntegerType *getInt64Ty(LLVMContext &C);
    static IntegerType *getInt128Ty(LLVMContext &C);

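    Example 🌰; remember the class qualifier (Type:: is enough, since these are static members of Type)

    Type::getInt32Ty(module->getContext())
  • ArgsTy... Args: the type of each parameter

  • FunctionType *T: the function type (really just the return type plus the parameter types), which can be built with the get method
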
    class FunctionType : public Type {
    static FunctionType *get(Type *Result,
    ArrayRef<Type*> Params, bool isVarArg);

    Example 🌰; isVarArg says whether the function accepts variable arguments

    Type *i32 = Type::getInt32Ty(module->getContext());
    FunctionType *funcTy = FunctionType::get(i32, {i32, i32}, false);
  • AttributeList AttributeList: ignore it for now (

Two examples 🌰🌰: either pass a prebuilt FunctionType, or pass the return type followed by the parameter types

Type *i32 = Type::getInt32Ty(module->getContext());
FunctionType *funcTy = FunctionType::get(i32, {i32, i32}, false);
FunctionCallee callee = module->getOrInsertFunction("myadd", funcTy);

// Alternatively: return type first, then one Type* per parameter
FunctionCallee callee = module->getOrInsertFunction("myadd", i32, i32, i32);

The return type is FunctionCallee, whose members are a Value pointer and a FunctionType pointer

class FunctionCallee {

private:
FunctionType *FnTy = nullptr;
Value *Callee = nullptr;
};

Both members can be read through public methods

FunctionType *getFunctionType() { return FnTy; }
Value *getCallee() { return Callee; }

The Value pointer actually points to the Function that was created, as the source of getOrInsertFunction shows

FunctionCallee Module::getOrInsertFunction(StringRef Name, FunctionType *Ty,
AttributeList AttributeList) {
// See if we have a definition for the specified function already.
GlobalValue *F = getNamedValue(Name);
if (!F) {
// Nope, add it
Function *New = Function::Create(Ty, GlobalVariable::ExternalLinkage,
DL.getProgramAddressSpace(), Name);
if (!New->isIntrinsic()) // Intrinsics get attrs set on construction
New->setAttributes(AttributeList);
FunctionList.push_back(New);
return {Ty, New}; // Return the new prototype.
}

A cast is needed before using it as a Function

Function *customFunc = dyn_cast<Function>(callee.getCallee());
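
The get-or-insert behaviour can be sketched without LLVM. ToyModule, ToyFunction and ToyCallee below are hypothetical stand-ins made up for illustration, and an int replaces FunctionType. The source excerpt above stops before the "already exists" branch, so this toy simply hands back the existing entry in that case, which is an assumption of the sketch rather than a statement about LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Toy function record; TypeId stands in for a FunctionType*.
struct ToyFunction {
  std::string Name;
  int TypeId;
};

// Toy counterpart of FunctionCallee: a type plus the callee itself.
struct ToyCallee {
  int TypeId;
  ToyFunction *Callee;
};

class ToyModule {
  // std::list keeps element addresses stable, like an intrusive list does.
  std::list<ToyFunction> FunctionList;

public:
  // Lookup only: returns nullptr when the name is unknown.
  ToyFunction *getFunction(const std::string &Name) {
    for (ToyFunction &F : FunctionList)
      if (F.Name == Name)
        return &F;
    return nullptr;
  }

  // Lookup first; append a new prototype only when nothing was found.
  ToyCallee getOrInsertFunction(const std::string &Name, int TypeId) {
    if (ToyFunction *F = getFunction(Name))
      return {TypeId, F};
    FunctionList.push_back(ToyFunction{Name, TypeId});
    return {TypeId, &FunctionList.back()};
  }

  std::size_t size() const { return FunctionList.size(); }
};
```

Calling getOrInsertFunction twice with the same name yields the same callee pointer and leaves a single entry in the list.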

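Value, User and Use

Take %1 = add i32 %2, %3: the Instruction itself is the User (the thing that uses Values), %2 and %3 are the Values being used, and a Use is the act of using. In the data structure a Use is an edge between two nodes, a User and a Value

Value

The Value class has a UseList member
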
class Value {
Use *UseList;

It points to the first Use

Use

All the Uses of one Value are chained together as a linked list

class Use {
private:
Value *Val = nullptr;
Use *Next = nullptr;
Use **Prev = nullptr;
User *Parent = nullptr;
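  • Val: points to the Value being used
  • Next: points to the next Use of the same Value
  • Prev: points to the previous Use's Next pointer (hence the Use** type); for the first Use it points to the Value's UseList
  • Parent: points to the User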
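
The Next/Prev linking described above can be sketched as a small self-contained model. ToyValue, ToyUse and countUses are hypothetical names for this illustration; the insertion and removal logic is my reading of how a pointer-to-pointer Prev is normally maintained, not a copy of LLVM's code.

```cpp
#include <cassert>
#include <cstddef>

struct ToyUse;

// Toy Value: just the head pointer of its use list.
struct ToyValue {
  ToyUse *UseList = nullptr;
};

// Toy Use: one edge from some user to a ToyValue.
struct ToyUse {
  ToyValue *Val = nullptr;
  ToyUse *Next = nullptr;
  ToyUse **Prev = nullptr; // address of the pointer that points at this use

  // Re-point this use at V, unlinking it from the old value first.
  void set(ToyValue *V) {
    if (Val)
      removeFromList();
    Val = V;
    if (V)
      addToList(&V->UseList);
  }

private:
  // Push this use at the front of the list whose head lives at *List.
  void addToList(ToyUse **List) {
    Next = *List;
    if (Next)
      Next->Prev = &Next;
    Prev = List;
    *Prev = this;
  }
  // Unlink in O(1); no special case for the head thanks to Use**.
  void removeFromList() {
    *Prev = Next;
    if (Next)
      Next->Prev = Prev;
  }
};

// Walk the list the same way a "for each use" loop would.
inline std::size_t countUses(const ToyValue &V) {
  std::size_t N = 0;
  for (const ToyUse *U = V.UseList; U; U = U->Next)
    ++N;
  return N;
}
```

Because Prev stores the address of whichever pointer refers to this Use, removal needs no "am I the head?" branch, which is the whole point of making it a Use** instead of a Use*.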

User

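The Uses are placed in front of the User object, as shown in the diagram above 👆

There are two ways the Uses can be laid out

  • A fixed number of Uses: stored as an array directly in front of the User

  • A variable number of Uses ("hung-off" uses): a single Use* pointer is stored directly in front of the User, and it points to a separately allocated Use array

This can be seen from the getOperandList function

// HasHungOffUses is a member of Value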

const Use *getOperandList() const {
return HasHungOffUses ? getHungOffOperands() : getIntrusiveOperands();
}

const Use *getHungOffOperands() const {
return *(reinterpret_cast<const Use *const *>(this) - 1);
}
const Use *getIntrusiveOperands() const {
return reinterpret_cast<const Use *>(this) - NumUserOperands;
}

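When HasHungOffUses is 0 the linking is the same as described above: a Use's Prev points at the previous Use's Next field (or at the Value's UseList for the first Use). Because Prev is a Use**, dereferencing it leads straight back to this Use itself (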
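
The two layouts can be reproduced in a toy allocator. Everything below (ToyUse, ToyUser, createIntrusive, createHungOff, destroy) is hypothetical and only mimics the pointer arithmetic of the getOperandList code quoted above. Like the original, it steps backwards from `this` into memory that was allocated together with the object, so treat it as a layout sketch and not as portable library code.

```cpp
#include <cassert>
#include <new>

// Toy operand slot; the string stands in for the used Value.
struct ToyUse {
  const char *Val = nullptr;
};

struct ToyUser {
  unsigned NumUserOperands;
  bool HasHungOffUses;

  // Fixed count: the operand array ends exactly where `this` begins.
  ToyUse *getIntrusiveOperands() {
    return reinterpret_cast<ToyUse *>(this) - NumUserOperands;
  }
  // Variable count: one ToyUse* sits right before `this`.
  ToyUse *getHungOffOperands() {
    return *(reinterpret_cast<ToyUse **>(this) - 1);
  }
  ToyUse *getOperandList() {
    return HasHungOffUses ? getHungOffOperands() : getIntrusiveOperands();
  }
};

static_assert(alignof(ToyUser) <= alignof(ToyUse), "user must fit after uses");

// One allocation: [ToyUse x N][ToyUser]
inline ToyUser *createIntrusive(unsigned N) {
  void *Storage = ::operator new(sizeof(ToyUse) * N + sizeof(ToyUser));
  ToyUse *Uses = static_cast<ToyUse *>(Storage);
  for (unsigned I = 0; I < N; ++I)
    new (Uses + I) ToyUse();
  return new (Uses + N) ToyUser{N, false};
}

// Two allocations: [ToyUse*][ToyUser] plus a separate ToyUse array.
inline ToyUser *createHungOff(unsigned N) {
  void *Storage = ::operator new(sizeof(ToyUse *) + sizeof(ToyUser));
  ToyUse **Slot = new (Storage) ToyUse *(new ToyUse[N]());
  return new (Slot + 1) ToyUser{N, true};
}

// Release whatever the matching create function allocated.
inline void destroy(ToyUser *U) {
  if (U->HasHungOffUses) {
    ToyUse **Slot = reinterpret_cast<ToyUse **>(U) - 1;
    delete[] *Slot;
    ::operator delete(Slot);
  } else {
    ::operator delete(U->getIntrusiveOperands());
  }
}
```

With the intrusive layout the user needs no pointer to its operands at all, while the hung-off layout lets the array be reallocated later without moving the user.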

GlobalVariable

Stores the information about a global variable

Operations

  • Creating a GlobalVariable

    GlobalVariable(Type *Ty, bool isConstant, LinkageTypes Linkage,
    Constant *Initializer = nullptr, const Twine &Name = "",
    ThreadLocalMode = NotThreadLocal, unsigned AddressSpace = 0,
    bool isExternallyInitialized = false);

    GlobalVariable(Module &M, Type *Ty, bool isConstant, LinkageTypes Linkage,
    Constant *Initializer, const Twine &Name = "",
    GlobalVariable *InsertBefore = nullptr,
    ThreadLocalMode = NotThreadLocal,
    Optional<unsigned> AddressSpace = None,
    bool isExternallyInitialized = false);

    GlobalVariable(const GlobalVariable &) = delete;
    GlobalVariable &operator=(const GlobalVariable &) = delete;
  • Checking whether it is a constant (true means it cannot be modified at run time)

    bool isConstant() const

Function

The main members are

  • An Arguments pointer, which points to an array of Argument
  • A doubly linked list of BasicBlock

Operations

  • Iterators, used the same way as the Module ones

    • BasicBlock iterators

      //===--------------------------------------------------------------------===//
      // BasicBlock iterator forwarding functions
      //
      iterator begin() { return BasicBlocks.begin(); }
      const_iterator begin() const { return BasicBlocks.begin(); }
      iterator end () { return BasicBlocks.end(); }
      const_iterator end () const { return BasicBlocks.end(); }

      size_t size() const { return BasicBlocks.size(); }
      bool empty() const { return BasicBlocks.empty(); }
      const BasicBlock &front() const { return BasicBlocks.front(); }
      BasicBlock &front() { return BasicBlocks.front(); }
      const BasicBlock &back() const { return BasicBlocks.back(); }
      BasicBlock &back() { return BasicBlocks.back(); }
    • Argument iterators

      arg_iterator arg_begin() {
      CheckLazyArguments();
      return Arguments;
      }
      const_arg_iterator arg_begin() const {
      CheckLazyArguments();
      return Arguments;
      }

      arg_iterator arg_end() {
      CheckLazyArguments();
      return Arguments + NumArgs;
      }
      const_arg_iterator arg_end() const {
      CheckLazyArguments();
      return Arguments + NumArgs;
      }

      Argument* getArg(unsigned i) const {
      assert (i < NumArgs && "getArg() out of range!");
      CheckLazyArguments();
      return Arguments + i;
      }

      iterator_range<arg_iterator> args() {
      return make_range(arg_begin(), arg_end());
      }
      iterator_range<const_arg_iterator> args() const {
      return make_range(arg_begin(), arg_end());
      }
  • Creating a Function

    static Function *Create(FunctionType *Ty, LinkageTypes Linkage,
    const Twine &N, Module &M);

    static Function *Create(FunctionType *Ty, LinkageTypes Linkage,
    unsigned AddrSpace, const Twine &N = "",
    Module *M = nullptr) {
    return new Function(Ty, Linkage, AddrSpace, N, M);
    }


    static Function *Create(FunctionType *Ty, LinkageTypes Linkage,
    const Twine &N = "", Module *M = nullptr) {
    return new Function(Ty, Linkage, static_cast<unsigned>(-1), N, M);
    }

    See above for how to build a FunctionType; getOrInsertFunction is really just a wrapper around Create

  • Getting the function's return type

    Type *getReturnType() const { return getFunctionType()->getReturnType(); }

    FunctionType is a subclass of Type. Both the return type and the parameter types are stored in Type's ContainedTys member, which is an array of Type pointers

    class Type {
    protected:
    unsigned NumContainedTys = 0;
    Type * const *ContainedTys = nullptr;

    This can be seen from the getXXXType functions

    // Function
    Type *getReturnType() const { return getFunctionType()->getReturnType(); }

    // FunctionType
    FunctionType *getFunctionType() const {
    return cast<FunctionType>(getValueType());
    }
    Type *getReturnType() const { return ContainedTys[0]; }
    Type *getParamType(unsigned i) const { return ContainedTys[i+1]; }
  • Getting the function's entry BasicBlock

    const BasicBlock       &getEntryBlock() const   { return front(); }
    BasicBlock &getEntryBlock() { return front(); }
  • Setting & getting the calling convention

    CallingConv::ID getCallingConv() const {
    return static_cast<CallingConv::ID>((getSubclassDataFromValue() >> 4) &
    CallingConv::MaxID);
    }
    void setCallingConv(CallingConv::ID CC) {
    auto ID = static_cast<unsigned>(CC);
    assert(!(ID & ~CallingConv::MaxID) && "Unsupported calling convention");
    setValueSubclassData((getSubclassDataFromValue() & 0xc00f) | (ID << 4));
    }

    Example 🌰; leaving it unset doesn't seem to cause any problems either (

    Function *foo;
    foo->setCallingConv(CallingConv::C);
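
The bit twiddling in getCallingConv/setCallingConv is easier to follow in isolation. In this sketch ToyFunction is a hypothetical struct and kMaxID = 1023 is an assumption derived from the 0xc00f mask quoted above (the mask clears bits 4 to 13, which is ten bits).

```cpp
#include <cassert>
#include <cstdint>

// Assumed to match the ten bits that the 0xc00f mask clears.
constexpr unsigned kMaxID = 1023;

struct ToyFunction {
  // Bits 0-3 and 14-15 belong to other flags; bits 4-13 hold the ID.
  std::uint16_t SubclassData = 0;

  unsigned getCallingConv() const { return (SubclassData >> 4) & kMaxID; }

  void setCallingConv(unsigned ID) {
    assert(!(ID & ~kMaxID) && "Unsupported calling convention");
    // Keep the foreign bits, then drop the new ID into bits 4-13.
    SubclassData =
        static_cast<std::uint16_t>((SubclassData & 0xc00f) | (ID << 4));
  }
};
```

The masking is what lets the calling convention share one 16-bit field with unrelated flags without disturbing them.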

BasicBlock

Operations

  • Creating a BasicBlock

    static BasicBlock *Create(LLVMContext &Context, const Twine &Name = "",
    Function *Parent = nullptr,
    BasicBlock *InsertBefore = nullptr) {
    return new BasicBlock(Context, Name, Parent, InsertBefore);
    }

    When InsertBefore is null, the block is appended to the end of the Function by default

    BasicBlock::BasicBlock(LLVMContext &C, const Twine &Name, Function *NewParent,
    BasicBlock *InsertBefore)
    : Value(Type::getLabelTy(C), Value::BasicBlockVal), Parent(nullptr) {

    if (NewParent)
    // Insert unlinked basic block into a function. Inserts an unlinked basic block into Parent. If InsertBefore is provided, inserts before that basic block, otherwise inserts at the end.
    insertInto(NewParent, InsertBefore);
    else
    assert(!InsertBefore &&
    "Cannot insert block before another block with no function!");

    setName(Name);
    }

    Example 🌰

    Function *customFunc;
    BasicBlock *entryBlock = BasicBlock::Create(context, "", customFunc, nullptr);
  • Instruction iterators

    //===--------------------------------------------------------------------===//
    /// Instruction iterator methods
    ///
    inline iterator begin() { return InstList.begin(); }
    inline const_iterator begin() const { return InstList.begin(); }
    inline iterator end () { return InstList.end(); }
    inline const_iterator end () const { return InstList.end(); }

    inline reverse_iterator rbegin() { return InstList.rbegin(); }
    inline const_reverse_iterator rbegin() const { return InstList.rbegin(); }
    inline reverse_iterator rend () { return InstList.rend(); }
    inline const_reverse_iterator rend () const { return InstList.rend(); }

    inline size_t size() const { return InstList.size(); }
    inline bool empty() const { return InstList.empty(); }
    inline const Instruction &front() const { return InstList.front(); }
    inline Instruction &front() { return InstList.front(); }
    inline const Instruction &back() const { return InstList.back(); }
    inline Instruction &back() { return InstList.back(); }
  • Getting the owning Function

    const Function *getParent() const { return Parent; }
    Function *getParent() { return Parent; }

Instruction

Operations

  • Getting the parent BasicBlock

    inline const BasicBlock *getParent() const { return Parent; }
    inline BasicBlock *getParent() { return Parent; }
  • Getting the instruction's opcode

    unsigned getOpcode() const { return getValueID() - InstructionVal; }
  • Creating another instance (a copy) of the instruction

    Instruction *clone() const;

    However, the new instruction

    • has no name
    • has no Parent
  • Replacing an instruction

    void ReplaceInstWithInst(BasicBlock::InstListType &BIL,
    BasicBlock::iterator &BI, Instruction *I);
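    void ReplaceInstWithInst(Instruction *From, Instruction *To); // does not update your iterator, so using it inside a loop segfaults

    Example 🌰

    for (auto it = bas.begin(); it != bas.end(); it++){
    ……
    // This call already replaces all uses of the old instruction, erases it,
    // and leaves `it` pointing at myaddCall, so old_ope must not be touched afterwards
    ReplaceInstWithInst(old_ope->getParent()->getInstList(), it, myaddCall);
    }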
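
Why the iterator-taking overload is the safe one inside a loop can be shown with a plain std::list. replaceAt and replaceAll are hypothetical helpers written for this sketch; they follow the same three steps (insert the new element before the old one, erase the old one, move the iterator onto the new one) but are not LLVM code.

```cpp
#include <cassert>
#include <list>

// Swap the element at It for NewVal and keep It valid afterwards.
inline void replaceAt(std::list<int> &L, std::list<int>::iterator &It,
                      int NewVal) {
  std::list<int>::iterator New = L.insert(It, NewVal); // insert before old
  L.erase(It);                                         // old node is gone
  It = New;                                            // caller's iterator now valid again
}

// Replace every OldVal while iterating; returns how many were replaced.
inline int replaceAll(std::list<int> &L, int OldVal, int NewVal) {
  int Count = 0;
  for (auto It = L.begin(); It != L.end(); ++It) {
    if (*It == OldVal) {
      replaceAt(L, It, NewVal);
      ++Count;
    }
  }
  return Count;
}
```

If replaceAt took the iterator by value, the loop's `++It` would run on an erased node, which is the same failure mode as the segfault mentioned above.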

Creating the different Instructions

Only the ones I used for my homework are listed

alloca

The alloca instruction is an AllocaInst; its inheritance chain is

AllocaInst->UnaryInstruction->Instruction

It adds only one member on top of Instruction: the type of the data being allocated

$2 = {
<llvm::UnaryInstruction> = {
<llvm::Instruction> = {
<llvm::User> = {
<llvm::Value> = {
VTy = 0x46af60,
UseList = 0x0,
SubclassID = 57 '9',
HasValueHandle = 0 '\000',
SubclassOptionalData = 0 '\000',
SubclassData = 2,
NumUserOperands = 1,
IsUsedByMD = 0,
HasName = 0,
HasMetadata = 0,
HasHungOffUses = 0,
HasDescriptor = 0,
static MaxAlignmentExponent = 29,
static MaximumAlignment = 536870912
}, <No data fields>},
<llvm::ilist_node_with_parent<llvm::Instruction, llvm::BasicBlock>> = {
<llvm::ilist_node<llvm::Instruction>> = {
<llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void> >> = {
<llvm::ilist_node_base<false>> = {
Prev = 0x46abd8,
Next = 0x46abd8
}, <No data fields>}, <No data fields>}, <No data fields>},
members of llvm::Instruction:
Parent = 0x46abb0,
DbgLoc = {
Loc = {
Ref = {
MD = 0x0
}
}
},
Order = 0
}, <No data fields>},
members of llvm::AllocaInst:
AllocatedType = 0x466d80
}

Create an AllocaInst with new

public:
explicit AllocaInst(Type *Ty, unsigned AddrSpace, Value *ArraySize,
const Twine &Name, Instruction *InsertBefore);
AllocaInst(Type *Ty, unsigned AddrSpace, Value *ArraySize,
const Twine &Name, BasicBlock *InsertAtEnd);

AllocaInst(Type *Ty, unsigned AddrSpace, const Twine &Name,
Instruction *InsertBefore);
AllocaInst(Type *Ty, unsigned AddrSpace,
const Twine &Name, BasicBlock *InsertAtEnd);

AllocaInst(Type *Ty, unsigned AddrSpace, Value *ArraySize, Align Align,
const Twine &Name = "", Instruction *InsertBefore = nullptr);
AllocaInst(Type *Ty, unsigned AddrSpace, Value *ArraySize, Align Align,
const Twine &Name, BasicBlock *InsertAtEnd);
  • AddrSpace: can be obtained from the Module

    Module *module;
    module->getDataLayout().getAllocaAddrSpace()
    • The DataLayout is this line of the IR

      target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
    • DataLayout has an AllocaAddrSpace member, and getAllocaAddrSpace returns it

      class DataLayout {
      private:
      unsigned AllocaAddrSpace;
      unsigned getAllocaAddrSpace() const { return AllocaAddrSpace; }
  • Name: the name of the result value, i.e. the %1 in a statement such as

    %1 = alloca i32, align 4
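  • BasicBlock (InsertAtEnd): the basic block the new instruction is appended to

  • InsertBefore: the new Instruction is inserted before InsertBefore

  • ArraySize: the number of elements, which can be supplied by creating a ConstantInt
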
    static ConstantInt *get(LLVMContext &Context, const APInt &V);

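    An APInt can be constructed directly. The constructor quoted here takes a raw word array; the example that follows uses the APInt(numBits, value) form instead
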
    APInt(uint64_t *val, unsigned bits) : BitWidth(bits) {
    U.pVal = val;
    }

    Example 🌰

    Value* intValue = ConstantInt::get(context, APInt(32, 1));
  • Align: built with the Align constructor

    explicit Align(uint64_t Value) {
    assert(Value > 0 && "Value must not be 0");
    assert(llvm::isPowerOf2_64(Value) && "Alignment is not a power of 2");
    ShiftValue = Log2_64(Value);
    assert(ShiftValue < 64 && "Broken invariant");
    }

    Example 🌰; the alignment can also be set after the AllocaInst has been created

    ptr3->setAlignment(Align(4));
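
The Align constructor quoted above only stores log2 of the value. Here is a small sketch of that idea; ToyAlign and toyAlignTo are hypothetical names, and the rounding helper is my own addition to show what the stored power of two is good for.

```cpp
#include <cassert>
#include <cstdint>

// Toy alignment: keeps only the exponent of a power-of-two value.
struct ToyAlign {
  std::uint8_t ShiftValue = 0;

  explicit ToyAlign(std::uint64_t Value) {
    assert(Value > 0 && "Value must not be 0");
    assert((Value & (Value - 1)) == 0 && "Alignment is not a power of 2");
    // Count how many times 1 must be shifted left to reach Value.
    while ((std::uint64_t(1) << ShiftValue) != Value)
      ++ShiftValue;
  }

  std::uint64_t value() const { return std::uint64_t(1) << ShiftValue; }
};

// Round Size up to the next multiple of the alignment.
inline std::uint64_t toyAlignTo(std::uint64_t Size, ToyAlign A) {
  return (Size + A.value() - 1) & ~(A.value() - 1);
}
```

Storing the exponent makes "not a power of two" unrepresentable once construction has succeeded, which is why the asserts live in the constructor.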

Example 🌰

AllocaInst *ptr3 = new AllocaInst(IntegerType::get(context, 32), module->getDataLayout().getAllocaAddrSpace(),"",entryBlock);
ptr3->setAlignment(Align(4));

store

The store instruction is a StoreInst

It adds an SSID member on top of Instruction

$4 = {
<llvm::Instruction> = {
<llvm::User> = {
<llvm::Value> = {
VTy = 0x466c00,
UseList = 0x0,
SubclassID = 59 ';',
HasValueHandle = 0 '\000',
SubclassOptionalData = 0 '\000',
SubclassData = 4,
NumUserOperands = 2,
IsUsedByMD = 0,
HasName = 0,
HasMetadata = 0,
HasHungOffUses = 0,
HasDescriptor = 0,
static MaxAlignmentExponent = 29,
static MaximumAlignment = 536870912
}, <No data fields>},
<llvm::ilist_node_with_parent<llvm::Instruction, llvm::BasicBlock>> = {
<llvm::ilist_node<llvm::Instruction>> = {
<llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void> >> = {
<llvm::ilist_node_base<false>> = {
Prev = 0x470d48,
Next = 0x46abd8
}, <No data fields>}, <No data fields>}, <No data fields>},
members of llvm::Instruction:
Parent = 0x46abb0,
DbgLoc = {
Loc = {
Ref = {
MD = 0x0
}
}
},
Order = 0
},
members of llvm::StoreInst:
SSID = 1 '\001'
}

A StoreInst can be created through its constructors

public:
StoreInst(Value *Val, Value *Ptr, Instruction *InsertBefore);
StoreInst(Value *Val, Value *Ptr, BasicBlock *InsertAtEnd);
StoreInst(Value *Val, Value *Ptr, bool isVolatile, Instruction *InsertBefore);
StoreInst(Value *Val, Value *Ptr, bool isVolatile, BasicBlock *InsertAtEnd);
StoreInst(Value *Val, Value *Ptr, bool isVolatile, Align Align,
Instruction *InsertBefore = nullptr);
StoreInst(Value *Val, Value *Ptr, bool isVolatile, Align Align,
BasicBlock *InsertAtEnd);
StoreInst(Value *Val, Value *Ptr, bool isVolatile, Align Align,
AtomicOrdering Order, SyncScope::ID SSID = SyncScope::System,
Instruction *InsertBefore = nullptr);
StoreInst(Value *Val, Value *Ptr, bool isVolatile, Align Align,
AtomicOrdering Order, SyncScope::ID SSID, BasicBlock *InsertAtEnd);
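  • Stores Val into Ptr
  • isVolatile marks the store as volatile, meaning the optimizer must not remove it or reorder it relative to other volatile operations

Example 🌰
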
StoreInst *st0 = new StoreInst(param1, ptr4, false, entryBlock);
st0->setAlignment(Align(4));

load

load is a LoadInst, which also adds an SSID member on top of Instruction

$5 = {
<llvm::UnaryInstruction> = {
<llvm::Instruction> = {
<llvm::User> = {
<llvm::Value> = {
VTy = 0x466d80,
UseList = 0x0,
SubclassID = 58 ':',
HasValueHandle = 0 '\000',
SubclassOptionalData = 0 '\000',
SubclassData = 4,
NumUserOperands = 1,
IsUsedByMD = 0,
HasName = 0,
HasMetadata = 0,
HasHungOffUses = 0,
HasDescriptor = 0,
static MaxAlignmentExponent = 29,
static MaximumAlignment = 536870912
}, <No data fields>},
<llvm::ilist_node_with_parent<llvm::Instruction, llvm::BasicBlock>> = {
<llvm::ilist_node<llvm::Instruction>> = {
<llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void> >> = {
<llvm::ilist_node_base<false>> = {
Prev = 0x4686c8,
Next = 0x46abd8
}, <No data fields>}, <No data fields>}, <No data fields>},
members of llvm::Instruction:
Parent = 0x46abb0,
DbgLoc = {
Loc = {
Ref = {
MD = 0x0
}
}
},
Order = 0
}, <No data fields>},
members of llvm::LoadInst:
SSID = 1 '\001'
}

Constructors

public:
LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr,
Instruction *InsertBefore);
LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, BasicBlock *InsertAtEnd);
LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, bool isVolatile,
Instruction *InsertBefore);
LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, bool isVolatile,
BasicBlock *InsertAtEnd);
LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, bool isVolatile,
Align Align, Instruction *InsertBefore = nullptr);
LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, bool isVolatile,
Align Align, BasicBlock *InsertAtEnd);
LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, bool isVolatile,
Align Align, AtomicOrdering Order,
SyncScope::ID SSID = SyncScope::System,
Instruction *InsertBefore = nullptr);
LoadInst(Type *Ty, Value *Ptr, const Twine &NameStr, bool isVolatile,
Align Align, AtomicOrdering Order, SyncScope::ID SSID,
BasicBlock *InsertAtEnd);

Example 🌰

LoadInst *ld0 = new LoadInst(IntegerType::get(context, 32), ptr4, "",false, entryBlock);
ld0->setAlignment(Align(4));

add

add is a BinaryOperator, which has no data members of its own

static BinaryOperator *Create(BinaryOps Op, Value *S1, Value *S2,
const Twine &Name = Twine(),
Instruction *InsertBefore = nullptr);

static BinaryOperator *Create(BinaryOps Op, Value *S1, Value *S2,
const Twine &Name, BasicBlock *InsertAtEnd);

BinaryOps

  enum BinaryOps {
#define FIRST_BINARY_INST(N) BinaryOpsBegin = N,
#define HANDLE_BINARY_INST(N, OPC, CLASS) OPC = N,
#define LAST_BINARY_INST(N) BinaryOpsEnd = N+1
#include "llvm/IR/Instruction.def"
};

Example 🌰

BinaryOperator *add1 = BinaryOperator::Create(Instruction::Add, ld0, ld1, "", entryBlock);

icmp

icmp is an ICmpInst, which has no data members of its own

public:
/// Constructor with insert-before-instruction semantics.
ICmpInst(
Instruction *InsertBefore, ///< Where to insert
Predicate pred, ///< The predicate to use for the comparison
Value *LHS, ///< The left-hand-side of the expression
Value *RHS, ///< The right-hand-side of the expression
const Twine &NameStr = "" ///< Name of the instruction
) : CmpInst(makeCmpResultType(LHS->getType()),
Instruction::ICmp, pred, LHS, RHS, NameStr,
InsertBefore) {
#ifndef NDEBUG
AssertOK();
#endif
}

/// Constructor with insert-at-end semantics.
ICmpInst(
BasicBlock &InsertAtEnd, ///< Block to insert into.
Predicate pred, ///< The predicate to use for the comparison
Value *LHS, ///< The left-hand-side of the expression
Value *RHS, ///< The right-hand-side of the expression
const Twine &NameStr = "" ///< Name of the instruction
) : CmpInst(makeCmpResultType(LHS->getType()),
Instruction::ICmp, pred, LHS, RHS, NameStr,
&InsertAtEnd) {
#ifndef NDEBUG
AssertOK();
#endif
}

/// Constructor with no-insertion semantics
ICmpInst(
Predicate pred, ///< The predicate to use for the comparison
Value *LHS, ///< The left-hand-side of the expression
Value *RHS, ///< The right-hand-side of the expression
const Twine &NameStr = "" ///< Name of the instruction
) : CmpInst(makeCmpResultType(LHS->getType()),
Instruction::ICmp, pred, LHS, RHS, NameStr) {
#ifndef NDEBUG
AssertOK();
#endif
}

Predicate is the kind of comparison

enum Predicate : unsigned {
// Opcode U L G E Intuitive operation
FCMP_FALSE = 0, ///< 0 0 0 0 Always false (always folded)
FCMP_OEQ = 1, ///< 0 0 0 1 True if ordered and equal
FCMP_OGT = 2, ///< 0 0 1 0 True if ordered and greater than
FCMP_OGE = 3, ///< 0 0 1 1 True if ordered and greater than or equal
FCMP_OLT = 4, ///< 0 1 0 0 True if ordered and less than
FCMP_OLE = 5, ///< 0 1 0 1 True if ordered and less than or equal
FCMP_ONE = 6, ///< 0 1 1 0 True if ordered and operands are unequal
FCMP_ORD = 7, ///< 0 1 1 1 True if ordered (no nans)
FCMP_UNO = 8, ///< 1 0 0 0 True if unordered: isnan(X) | isnan(Y)
FCMP_UEQ = 9, ///< 1 0 0 1 True if unordered or equal
FCMP_UGT = 10, ///< 1 0 1 0 True if unordered or greater than
FCMP_UGE = 11, ///< 1 0 1 1 True if unordered, greater than, or equal
FCMP_ULT = 12, ///< 1 1 0 0 True if unordered or less than
FCMP_ULE = 13, ///< 1 1 0 1 True if unordered, less than, or equal
FCMP_UNE = 14, ///< 1 1 1 0 True if unordered or not equal
FCMP_TRUE = 15, ///< 1 1 1 1 Always true (always folded)
FIRST_FCMP_PREDICATE = FCMP_FALSE,
LAST_FCMP_PREDICATE = FCMP_TRUE,
BAD_FCMP_PREDICATE = FCMP_TRUE + 1,
ICMP_EQ = 32, ///< equal
ICMP_NE = 33, ///< not equal
ICMP_UGT = 34, ///< unsigned greater than
ICMP_UGE = 35, ///< unsigned greater or equal
ICMP_ULT = 36, ///< unsigned less than
ICMP_ULE = 37, ///< unsigned less or equal
ICMP_SGT = 38, ///< signed greater than
ICMP_SGE = 39, ///< signed greater or equal
ICMP_SLT = 40, ///< signed less than
ICMP_SLE = 41, ///< signed less or equal
FIRST_ICMP_PREDICATE = ICMP_EQ,
LAST_ICMP_PREDICATE = ICMP_SLE,
BAD_ICMP_PREDICATE = ICMP_SLE + 1
};

Example 🌰

ICmpInst *icmp = new ICmpInst(*entryBlock, ICmpInst::ICMP_SGT, add1, ConstantInt::get(context, APInt(32, 100)));

When a constant operand is needed, create a ConstantInt
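
The signed and unsigned predicates differ only in how the same bits are interpreted. The sketch below evaluates the integer predicates on 32-bit values; ToyPredicate and evalICmp are hypothetical, with enumerator values copied from the Predicate enum quoted above. It assumes the usual two's-complement conversion when casting to int32_t.

```cpp
#include <cassert>
#include <cstdint>

// Integer predicates only, numbered as in the enum quoted above.
enum ToyPredicate : unsigned {
  ICMP_EQ = 32, ICMP_NE = 33,
  ICMP_UGT = 34, ICMP_UGE = 35, ICMP_ULT = 36, ICMP_ULE = 37,
  ICMP_SGT = 38, ICMP_SGE = 39, ICMP_SLT = 40, ICMP_SLE = 41
};

// Evaluate one predicate on two 32-bit operands given as raw bits.
inline bool evalICmp(ToyPredicate P, std::uint32_t L, std::uint32_t R) {
  // Same bits, reinterpreted as signed for the S* predicates.
  const std::int32_t SL = static_cast<std::int32_t>(L);
  const std::int32_t SR = static_cast<std::int32_t>(R);
  switch (P) {
  case ICMP_EQ:  return L == R;
  case ICMP_NE:  return L != R;
  case ICMP_UGT: return L > R;
  case ICMP_UGE: return L >= R;
  case ICMP_ULT: return L < R;
  case ICMP_ULE: return L <= R;
  case ICMP_SGT: return SL > SR;
  case ICMP_SGE: return SL >= SR;
  case ICMP_SLT: return SL < SR;
  case ICMP_SLE: return SL <= SR;
  }
  return false; // not an integer predicate
}
```

For the bit pattern 0xFFFFFFFF, ICMP_UGT against 1 holds (it is the largest unsigned value) while ICMP_SGT does not (it is -1 when read as signed), so picking the wrong family silently changes the branch.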

br

BranchInst, no data members of its own

static BranchInst *Create(BasicBlock *IfTrue,
Instruction *InsertBefore = nullptr) {
return new(1) BranchInst(IfTrue, InsertBefore);
}

static BranchInst *Create(BasicBlock *IfTrue, BasicBlock *IfFalse,
Value *Cond, Instruction *InsertBefore = nullptr) {
return new(3) BranchInst(IfTrue, IfFalse, Cond, InsertBefore);
}

static BranchInst *Create(BasicBlock *IfTrue, BasicBlock *InsertAtEnd) {
return new(1) BranchInst(IfTrue, InsertAtEnd);
}

static BranchInst *Create(BasicBlock *IfTrue, BasicBlock *IfFalse,
Value *Cond, BasicBlock *InsertAtEnd) {
return new(3) BranchInst(IfTrue, IfFalse, Cond, InsertAtEnd);
}

Example 🌰

BranchInst::Create(block10,block19,icmp,entryBlock);

ret

ReturnInst, no data members of its own

public:
static ReturnInst* Create(LLVMContext &C, Value *retVal = nullptr,
Instruction *InsertBefore = nullptr) {
return new(!!retVal) ReturnInst(C, retVal, InsertBefore);
}

static ReturnInst* Create(LLVMContext &C, Value *retVal,
BasicBlock *InsertAtEnd) {
return new(!!retVal) ReturnInst(C, retVal, InsertAtEnd);
}

static ReturnInst* Create(LLVMContext &C, BasicBlock *InsertAtEnd) {
return new(0) ReturnInst(C, InsertAtEnd);
}

Example 🌰

ReturnInst::Create(context, ld20, block15);

call

CallInst; its base class CallBase has data members of its own

$3 = {
<llvm::CallBase> = {
<llvm::Instruction> = {
<llvm::User> = {
<llvm::Value> = {
VTy = 0x466d80,
UseList = 0x0,
SubclassID = 82 'R',
HasValueHandle = 0 '\000',
SubclassOptionalData = 0 '\000',
SubclassData = 0,
NumUserOperands = 3,
IsUsedByMD = 0,
HasName = 0,
HasMetadata = 0,
HasHungOffUses = 0,
HasDescriptor = 0,
static MaxAlignmentExponent = 29,
static MaximumAlignment = 536870912
}, <No data fields>},
<llvm::ilist_node_with_parent<llvm::Instruction, llvm::BasicBlock>> = {
<llvm::ilist_node<llvm::Instruction>> = {
<llvm::ilist_node_impl<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void> >> = {
<llvm::ilist_node_base<false>> = {
Prev = 0x0,
Next = 0x0
}, <No data fields>}, <No data fields>}, <No data fields>},
members of llvm::Instruction:
Parent = 0x0,
DbgLoc = {
Loc = {
Ref = {
MD = 0x0
}
}
},
Order = 0
},
members of llvm::CallBase:
static CalledOperandOpEndIdx = -1,
Attrs = {
pImpl = 0x0
},
FTy = 0x46aed0
}, <No data fields>}

So many overloads (

public:
static CallInst *Create(FunctionType *Ty, Value *F, const Twine &NameStr = "",
Instruction *InsertBefore = nullptr) {
return new (ComputeNumOperands(0)) CallInst(Ty, F, NameStr, InsertBefore);
}

static CallInst *Create(FunctionType *Ty, Value *Func, ArrayRef<Value *> Args,
const Twine &NameStr,
Instruction *InsertBefore = nullptr) {
return new (ComputeNumOperands(Args.size()))
CallInst(Ty, Func, Args, None, NameStr, InsertBefore);
}

static CallInst *Create(FunctionType *Ty, Value *Func, ArrayRef<Value *> Args,
ArrayRef<OperandBundleDef> Bundles = None,
const Twine &NameStr = "",
Instruction *InsertBefore = nullptr) {
const int NumOperands =
ComputeNumOperands(Args.size(), CountBundleInputs(Bundles));
const unsigned DescriptorBytes = Bundles.size() * sizeof(BundleOpInfo);

return new (NumOperands, DescriptorBytes)
CallInst(Ty, Func, Args, Bundles, NameStr, InsertBefore);
}

static CallInst *Create(FunctionType *Ty, Value *F, const Twine &NameStr,
BasicBlock *InsertAtEnd) {
return new (ComputeNumOperands(0)) CallInst(Ty, F, NameStr, InsertAtEnd);
}

static CallInst *Create(FunctionType *Ty, Value *Func, ArrayRef<Value *> Args,
const Twine &NameStr, BasicBlock *InsertAtEnd) {
return new (ComputeNumOperands(Args.size()))
CallInst(Ty, Func, Args, None, NameStr, InsertAtEnd);
}

static CallInst *Create(FunctionType *Ty, Value *Func, ArrayRef<Value *> Args,
ArrayRef<OperandBundleDef> Bundles,
const Twine &NameStr, BasicBlock *InsertAtEnd) {
const int NumOperands =
ComputeNumOperands(Args.size(), CountBundleInputs(Bundles));
const unsigned DescriptorBytes = Bundles.size() * sizeof(BundleOpInfo);

return new (NumOperands, DescriptorBytes)
CallInst(Ty, Func, Args, Bundles, NameStr, InsertAtEnd);
}

static CallInst *Create(FunctionCallee Func, const Twine &NameStr = "",
Instruction *InsertBefore = nullptr) {
return Create(Func.getFunctionType(), Func.getCallee(), NameStr,
InsertBefore);
}

static CallInst *Create(FunctionCallee Func, ArrayRef<Value *> Args,
ArrayRef<OperandBundleDef> Bundles = None,
const Twine &NameStr = "",
Instruction *InsertBefore = nullptr) {
return Create(Func.getFunctionType(), Func.getCallee(), Args, Bundles,
NameStr, InsertBefore);
}

static CallInst *Create(FunctionCallee Func, ArrayRef<Value *> Args,
const Twine &NameStr,
Instruction *InsertBefore = nullptr) {
return Create(Func.getFunctionType(), Func.getCallee(), Args, NameStr,
InsertBefore);
}

static CallInst *Create(FunctionCallee Func, const Twine &NameStr,
BasicBlock *InsertAtEnd) {
return Create(Func.getFunctionType(), Func.getCallee(), NameStr,
InsertAtEnd);
}

static CallInst *Create(FunctionCallee Func, ArrayRef<Value *> Args,
const Twine &NameStr, BasicBlock *InsertAtEnd) {
return Create(Func.getFunctionType(), Func.getCallee(), Args, NameStr,
InsertAtEnd);
}

static CallInst *Create(FunctionCallee Func, ArrayRef<Value *> Args,
ArrayRef<OperandBundleDef> Bundles,
const Twine &NameStr, BasicBlock *InsertAtEnd) {
return Create(Func.getFunctionType(), Func.getCallee(), Args, Bundles,
NameStr, InsertAtEnd);
}

static CallInst *Create(CallInst *CI, ArrayRef<OperandBundleDef> Bundles,
Instruction *InsertPt = nullptr);

static CallInst *CreateWithReplacedBundle(CallInst *CI,
OperandBundleDef Bundle,
Instruction *InsertPt = nullptr);

Example 🌰

Function* myAddFunc = module->getFunction("myadd");
Value *arg[] = {old_ope->getOperand(0), old_ope->getOperand(1)};
CallInst *myaddCall = CallInst::Create(myAddFunc, arg, "");

LLVM IR Data Structure Analysis
http://akaieurus.github.io/2023/10/02/LLVM-IR数据结构分析/
Author: Eurus
Published: October 2, 2023