Trace 文件怎么获取?
traces.txt 只保留最后一次 ANR 的信息, Android 系统有个 DropBox 功能功能, 它能记录系统出现的 crash 错误. 因此保留有发生过的 ANR 的信息.(log 路径:/data/system/dropbox)
获取系统 crash log: adb shell dumpsys dropbox --print>>log.txt
系统生成的 Trace 文件保存在 data/anr, 可以用过命令 adb pull data/anr / 取出
Trace 文件怎么生成的?
当 APP(包括系统 APP 和用户 APP)进程出现 ANR, 应用响应慢或 WatchDog 的监视没有得到回馈时, 系统会 dump 此时的 top 进程, 进程中 Thread 的运行状态就都 dump 到这个 Trace 文件中了.
导致 ANR 的常见几种情况:
KeyDispatchTimeout(5s): 按键或触摸事件在特定时间内无法处理完成
BroadcastTimeout(10s): 广播在特定时间内无法处理完成
ServiceTimeout(20s): Service 在特定的时间无法处理完成 另外还有 ProviderTimeout 和 WatchDog 等导致的 ANR
应用 ANR 产生的时候, ActivityManagerService 的 appNotResponding 方法就会被调用, 然后在 / data/anr/traces.txt 文件中写入 ANR 相关信息.
- final void appNotResponding(ProcessRecord app, ActivityRecord activity,
- ActivityRecord parent, boolean aboveSystem, final String annotation) {
- ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
- SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
- if (mController != null) {
- try {
- // 0 == continue, -1 = kill process immediately
- int res = mController.appEarlyNotResponding(app.processName, app.pid, annotation);
- if (res <0 && app.pid != MY_PID) {
- app.kill("anr", true);
- }
- } catch (RemoteException e) {
- mController = null;
- Watchdog.getInstance().setActivityController(null);
- }
- }
- long anrTime = SystemClock.uptimeMillis();
- if (MONITOR_CPU_USAGE) {
- updateCpuStatsNow(); // 更新 CPU 使用率
- }
- synchronized (this) {
- // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
- if (mShuttingDown) {
- Slog.i(TAG, "During shutdown skipping ANR:" + app + " " + annotation);
- return;
- } else if (app.notResponding) {
- Slog.i(TAG, "Skipping duplicate ANR:" + app + " " + annotation);
- return;
- } else if (app.crashing) {
- Slog.i(TAG, "Crashing app skipping ANR:" + app + " " + annotation);
- return;
- }
- // In case we come through here for the same app before completing
- // this one, mark as anring now so we will bail out.
- app.notResponding = true;
- // Log the ANR to the event log.
- EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
- app.processName, app.info.flags, annotation);
- // Dump thread traces as quickly as we can, starting with "interesting" processes.
- firstPids.add(app.pid);
- int parentPid = app.pid;
- if (parent != null && parent.app != null && parent.app.pid> 0) parentPid = parent.app.pid;
- if (parentPid != app.pid) firstPids.add(parentPid);
- if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
- for (int i = mLruProcesses.size() - 1; i>= 0; i--) {
- ProcessRecord r = mLruProcesses.get(i);
- if (r != null && r.thread != null) {
- int pid = r.pid;
- if (pid> 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
- if (r.persistent) {
- firstPids.add(pid);
- } else {
- lastPids.put(pid, Boolean.TRUE);
- }
- }
- }
- }
- }
- // Log the ANR to the main log.
- StringBuilder info = new StringBuilder();
- info.setLength(0);
- info.append("ANR in").append(app.processName);
- if (activity != null && activity.shortComponentName != null) {
- info.append("(").append(activity.shortComponentName).append(")");
- }
- info.append("\n");
- info.append("PID:").append(app.pid).append("\n");
- if (annotation != null) {
- info.append("Reason:").append(annotation).append("\n");
- }
- if (parent != null && parent != activity) {
- info.append("Parent:").append(parent.shortComponentName).append("\n");
- }
- final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
- // dumpStackTraces 是输出 traces 文件的函数
- File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,
- NATIVE_STACKS_OF_INTEREST);
- String cpuInfo = null;
- if (MONITOR_CPU_USAGE) {
- updateCpuStatsNow(); // 再次更新 CPU 信息
- synchronized (mProcessCpuTracker) {
- // 输出 ANR 发生前一段时间内的 CPU 使用率
- cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);
- }
- info.append(processCpuTracker.printCurrentLoad());
- info.append(cpuInfo);
- }
- // 输出 ANR 发生后一段时间内的 CPU 使用率
- info.append(processCpuTracker.printCurrentState(anrTime));
- Slog.e(TAG, info.toString());
- if (tracesFile == null) {
- // There is no trace file, so dump (only) the alleged culprit's threads to the log
- Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
- }
- // 将 ANR 信息同时输出到 DropBox 中
- addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
- cpuInfo, tracesFile, null);
- if (mController != null) {
- try {
- // 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
- int res = mController.appNotResponding(app.processName, app.pid, info.toString());
- if (res != 0) {
- if (res <0 && app.pid != MY_PID) {
- app.kill("anr", true);
- } else {
- synchronized (this) {
- mServices.scheduleServiceTimeoutLocked(app);
- }
- }
- return;
- }
- } catch (RemoteException e) {
- mController = null;
- Watchdog.getInstance().setActivityController(null);
- }
- }
- // Unless configured otherwise, swallow ANRs in background processes & kill the process.
- boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
- Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
- synchronized (this) {
- mBatteryStatsService.noteProcessAnr(app.processName, app.uid);
- if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) {
- app.kill("bg anr", true);
- return;
- }
- // Set the app's notResponding state, and look up the errorReportReceiver
- makeAppNotRespondingLocked(app,
- activity != null ? activity.shortComponentName : null,
- annotation != null ? "ANR" + annotation : "ANR",
- info.toString());
- //Set the trace file name to app name + current date format to avoid overrinding trace file
- String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
- if (tracesPath != null && tracesPath.length() != 0) {
- File traceRenameFile = new File(tracesPath);
- String newTracesPath;
- int lpos = tracesPath.lastIndexOf (".");
- if (-1 != lpos)
- newTracesPath = tracesPath.substring (0, lpos) + "_" + app.processName + "_" + mTraceDateFormat.format(new Date()) + tracesPath.substring (lpos);
- else
- newTracesPath = tracesPath + "_" + app.processName;
- traceRenameFile.renameTo(new File(newTracesPath));
- }
- // 显示 ANR 提示对话框
- // Bring up the infamous App Not Responding dialog
- Message msg = Message.obtain();
- HashMap<String, Object> map = new HashMap<String, Object>();
- msg.what = SHOW_NOT_RESPONDING_MSG;
- msg.obj = map;
- msg.arg1 = aboveSystem ? 1 : 0;
- map.put("app", app);
- if (activity != null) {
- map.put("activity", activity);
- }
- mUiHandler.sendMessage(msg);
- }
- }
避免 ANR?
UI 线程尽量只做跟 UI 相关的工作
耗时的工作 (比如数据库操作, I/O, 连接网络或者别的有可能阻碍 UI 线程的操作) 把它放入单独的线程处理
尽量用 Handler 来处理 UIthread 和别的 thread 之间的交互
分析 ANR 的 Log:
出现 ANR 的 log 如下:
- 06-22 10:37:46.204 3547 3604 E ActivityManager: ANR in org.code:MessengerService // ANR 出现的进程包名
- E ActivityManager: PID: 17027 // ANR 进程 ID
- E ActivityManager: Reason: executing service org.code/.ipc.MessengerService // 导致 ANR 的原因
- E ActivityManager: Load: 8.31 / 8.12 / 8.47
- E ActivityManager: CPU usage from 0ms to 6462ms later: //CPU 在 ANR 发生后的使用情况
- E ActivityManager: 61% 3547/system_server: 21% user + 39% kernel / faults: 13302 minor 2 major
- E ActivityManager: 0.2% 475/debuggerd: 0% user + 0.1% kernel / faults: 6086 minor 1 major
- E ActivityManager: 10% 5742/com.android.phone: 5.1% user + 5.1% kernel / faults: 7597 minor
- E ActivityManager: 6.9% 5342/com.android.systemui: 2.1% user + 4.8% kernel / faults: 4158 minor
- E ActivityManager: 0.1% 477/debuggerd64: 0% user + 0.1% kernel / faults: 4013 minor
- E ActivityManager: 5.7% 16313/org.code: 3.2% user + 2.4% kernel / faults: 2412 minor
- E ActivityManager: 3.7% 17027/org.code:MessengerService: 1.7% user + 2% kernel / faults: 2571 minor 6 major
- E ActivityManager: 2.6% 306/cfinteractive: 0% user + 2.6% kernel
- ... ...
- E ActivityManager: +0% 17168/kworker/0:1: 0% user + 0% kernel
- E ActivityManager: 0% TOTAL: 0% user + 0% kernel + 0% softirq // CUP 占用情况
- E ActivityManager: CPU usage from 5628ms to 6183ms later:
- E ActivityManager: 42% 3547/system_server: 17% user + 24% kernel / faults: 11 minor
- E ActivityManager: 12% 3604/ActivityManager: 1.7% user + 10% kernel
- E ActivityManager: 12% 3609/android.display: 8.7% user + 3.5% kernel
- E ActivityManager: 3.5% 5304/Binder_6: 1.7% user + 1.7% kernel
- E ActivityManager: 3.5% 5721/Binder_A: 1.7% user + 1.7% kernel
- E ActivityManager: 3.5% 5746/Binder_C: 3.5% user + 0% kernel
- E ActivityManager: 1.7% 3599/Binder_1: 0% user + 1.7% kernel
- E ActivityManager: 1.7% 3600/Binder_2: 0% user + 1.7% kernel
- I ActivityManager: Killing 17027:org.code:MessengerService/u0a85 (adj 1): bg anr
- I art : Wrote stack traces to '/data/anr/traces.txt' //art 这个 TAG 打印对 traces 操作的信息
- D GraphicsStats: Buffer count: 9
- W ActivityManager: Scheduling restart of crashed service org.code/.ipc.MessengerService in 1000ms
log 打印了 ANR 的基本信息, 我们可以分析 CPU 使用率得知 ANR 的简单情况; 如果 CPU 使用率很高, 接近 100%, 可能是在进行大规模的计算更可能是陷入死循环; 如果 CUP 使用率很低, 说明主线程被阻塞了, 并且当 IOwait 很高, 可能是主线程在等待 I/O 操作的完成.
对于 ANR 只是分析 Log 很难知道问题所在, 我们还需要通过 Trace 文件分析 stack 调用情况.
- ----- pid 17027 at 2017-06-22 10:37:39 ----- // ANR 出现的进程 pid 和时间(和上述 log 中 pid 一致)
- Cmd line: org.code:MessengerService // ANR 出现的进程名
- Build fingerprint: 'Homecare/qucii8976v3_64:6.0.1/pansen06141150:eng/test-keys' // 下面记录系统版本, 内存等状态信息
- ABI: 'arm64'
- Build type: optimized
- Zygote loaded classes=6576 post zygote classes=13
- Intern table: 13780 strong; 17 weak
- JNI: CheckJNI is on; globals=283 (plus 158 weak)
- Libraries: /system/lib64/libandroid.so /system/lib64/libcompiler_rt.so /system/lib64/libjavacrypto.so /system/lib64/libjnigraphics.so /system/lib64/libmedia_jni.so /system/lib64/libwebviewchromium_loader.so libjavacore.so (7)
- Heap: 29% free, 8MB/12MB; 75731 objects
- Dumping cumulative Gc timings
- Total number of allocations 75731
- Total bytes allocated 8MB
- Total bytes freed 0B
- Free memory 3MB
- Free memory until GC 3MB
- Free memory until OOME 183MB
- Total memory 12MB
- Max memory 192MB
- Zygote space size 3MB
- Total mutator paused time: 0
- Total time waiting for GC to complete: 0
- Total GC count: 0
- Total GC time: 0
- Total blocking GC count: 0
- Total blocking GC time: 0
- suspend all histogram: Sum: 76us 99% C.I. 0.100us-28us Avg: 7.600us Max: 28us
- DALVIK THREADS (15):
- // Signal Catcher 是记录 traces 信息的线程
- // Signal Catche(线程名),(daemon)表示守护进程, prio(线程优先级, 默认是 5),tid(线程唯一标识 ID),Runnable(线程当前状态)
- "Signal Catcher" daemon prio=5 tid=3 Runnable
- // 线程组名称, suspendCount,debugSuspendCount, 线程的 Java 对象地址, 线程的 Native 对象地址
- | group="system" sCount=0 dsCount=0 obj=0x12d8f0a0 self=0x5598ae55d0
- //sysTid 是线程号(主线程的线程号和进程号相同)
- | sysTid=17033 nice=0 cgrp=default sched=0/0 handle=0x7fb2350450
- | state=R schedstat=( 4348125 172343 3 ) utm=0 stm=0 core=1 HZ=100
- | stack=0x7fb2254000-0x7fb2256000 stackSize=1013KB
- | held mutexes= "mutator lock"(shared held)
- native: #00 pc 0000000000489e28 /system/lib64/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, int, char const*, art::ArtMethod*, void*)+236)
- native: #01 pc 0000000000458fe8 /system/lib64/libart.so (art::Thread::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char>>&) const+220)
- native: #02 pc 0000000000465bc8 /system/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+688)
- native: #03 pc 0000000000466ae0 /system/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*)+276)
- native: #04 pc 000000000046719c /system/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+188)
- native: #05 pc 0000000000467a84 /system/lib64/libart.so (art::ThreadList::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+492)
- native: #06 pc 0000000000431194 /system/lib64/libart.so (art::Runtime::DumpForSigQuit(std::__1::basic_ostream<char, std::__1::char_traits<char>>&)+96)
- native: #07 pc 000000000043e604 /system/lib64/libart.so (art::SignalCatcher::HandleSigQuit()+1256)
- native: #08 pc 000000000043f214 /system/lib64/libart.so (art::SignalCatcher::Run(void*)+452)
- native: #09 pc 0000000000068714 /system/lib64/libc.so (__pthread_start(void*)+52)
- native: #10 pc 000000000001c604 /system/lib64/libc.so (__start_thread+16)
- (no managed stack frames)
- //main(线程名),prio(线程优先级, 默认是 5),tid(线程唯一标识 ID),Sleeping(线程当前状态)
- "main" prio=5 tid=1 Sleeping
- | group="main" sCount=1 dsCount=0 obj=0x73132d10 self=0x5598a5f5e0
- //sysTid 是线程号(主线程的线程号和进程号相同)
- | sysTid=17027 nice=0 cgrp=default sched=0/0 handle=0x7fb6db6fe8
- | state=S schedstat=( 420582038 5862546 143 ) utm=24 stm=18 core=6 HZ=100
- | stack=0x7fefba3000-0x7fefba5000 stackSize=8MB
- | held mutexes=
- // java 堆栈调用信息(这里可查看导致 ANR 的代码调用流程)(分析 ANR 最重要的信息)
- at java.lang.Thread.sleep!(Native method)
- - sleeping on <0x0c60f3c7> (a java.lang.Object)
- at java.lang.Thread.sleep(Thread.java:1031)
- - locked <0x0c60f3c7> (a java.lang.Object) // 锁住对象 0x0c60f3c7
- at java.lang.Thread.sleep(Thread.java:985)
- at android.os.SystemClock.sleep(SystemClock.java:120)
- at org.code.ipc.MessengerService.onCreate(MessengerService.java:63) // 导致 ANR 的代码
- at android.app.ActivityThread.handleCreateService(ActivityThread.java:2877)
- at android.app.ActivityThread.access$1900(ActivityThread.java:150)
- at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1427)
- at android.os.Handler.dispatchMessage(Handler.java:102)
- at android.os.Looper.loop(Looper.java:148)
- at android.app.ActivityThread.main(ActivityThread.java:5417)
- at java.lang.reflect.Method.invoke!(Native method)
- at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:726)
- at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)
在 log 中显示的 pid 在 traces 文件中与之对应, 然后通过查看堆栈调用信息分析 ANR 的代码.
上述 ANR 实际上在 org.code.ipc.MessengerService.onCreate 中 63 行调用 SystemClock.sleep(10000)代码导致的; 这是发生在 Java 层的, 如果在 native 层, 那么堆栈中就不会有相关调用的路径, 这种情况只能在 native 层添加更多的 Log, 一步步来查找了
线程状态的分类: java 中定义的线程状态:
- // libcore/libart/src/main/java/java/lang/Thread.java
- /**
- * A representation of a thread's state. A given thread may only be in one
- * state at a time.
- */
- public enum State {
- /**
- * The thread has been created, but has never been started.
- */
- NEW,
- /**
- * The thread may be run.
- */
- RUNNABLE,
- /**
- * The thread is blocked and waiting for a lock.
- */
- BLOCKED,
- /**
- * The thread is waiting.
- */
- WAITING,
- /**
- * The thread is waiting for a specified amount of time.
- */
- TIMED_WAITING,
- /**
- * The thread has been terminated.
- */
- TERMINATED
- }
C 代码中定义的线程状态:
- ///art/runtime/thread_state.h
- enum ThreadState {
- // Thread.State JDWP state
- kTerminated = 66, // TERMINATED TS_ZOMBIE Thread.run has returned, but Thread* still around
- kRunnable, // RUNNABLE TS_RUNNING runnable
- kTimedWaiting, // TIMED_WAITING TS_WAIT in Object.wait() with a timeout
- kSleeping, // TIMED_WAITING TS_SLEEPING in Thread.sleep()
- kBlocked, // BLOCKED TS_MONITOR blocked on a monitor
- kWaiting, // WAITING TS_WAIT in Object.wait()
- kWaitingForGcToComplete, // WAITING TS_WAIT blocked waiting for GC
- kWaitingForCheckPointsToRun, // WAITING TS_WAIT GC waiting for checkpoints to run
- kWaitingPerformingGc, // WAITING TS_WAIT performing GC
- kWaitingForDebuggerSend, // WAITING TS_WAIT blocked waiting for events to be sent
- kWaitingForDebuggerToAttach, // WAITING TS_WAIT blocked waiting for debugger to attach
- kWaitingInMainDebuggerLoop, // WAITING TS_WAIT blocking/reading/processing debugger events
- kWaitingForDebuggerSuspension, // WAITING TS_WAIT waiting for debugger suspend all
- kWaitingForJniOnLoad, // WAITING TS_WAIT waiting for execution of dlopen and JNI on load code
- kWaitingForSignalCatcherOutput, // WAITING TS_WAIT waiting for signal catcher IO to complete
- kWaitingInMainSignalCatcherLoop, // WAITING TS_WAIT blocking/reading/processing signals
- kWaitingForDeoptimization, // WAITING TS_WAIT waiting for deoptimization suspend all
- kWaitingForMethodTracingStart, // WAITING TS_WAIT waiting for method tracing to start
- kWaitingForVisitObjects, // WAITING TS_WAIT waiting for visiting objects
- kWaitingForGetObjectsAllocated, // WAITING TS_WAIT waiting for getting the number of allocated objects
- kStarting, // NEW TS_WAIT native thread started, not yet ready to run managed code
- kNative, // RUNNABLE TS_RUNNING running in a JNI native method
- kSuspended, // RUNNABLE TS_RUNNING suspended by GC or debugger
- };
两者可以看出在 C 代码中定义更为详细, Traces 中显示的线程状态都是 C 代码定义的. 我们可以通过查看线程状态对应的信息分析 ANR 问题.
如: TimedWaiting 对应的线程状态是 TIMED_WAITING
- kTimedWaiting, // TIMED_WAITING TS_WAIT in Object.wait() with a timeout 执行了无超时参数的 wait 函数
- kSleeping, // TIMED_WAITING TS_SLEEPING in Thread.sleep() 执行了带有超时参数的 sleep 函数
ZOMBIE 线程死亡, 终止运行
RUNNING/RUNNABLE 线程可运行或正在运行
TIMED_WAIT 执行了带有超时参数的 wait,sleep 或 join 函数
MONITOR 线程阻塞, 等待获取对象锁
WAIT 执行了无超时参数的 wait 函数
INITIALIZING 新建, 正在初始化, 为其分配资源
STARTING 新建, 正在启动
NATIVE 正在执行 JNI 本地函数
VMWAIT 正在等待 VM 资源
SUSPENDED 线程暂停, 通常是由于 GC 或 debug 被暂停
来源: http://www.bubuko.com/infodetail-2754830.html