正则表达式 – 如何使用R解析sysmon文件以提取某些信息？

2020-07-12 原文

我正在尝试使用R来解析这些类型的文件,以解析信息并将数据放入数据帧中,如格式：

这是文件的内容：

last_run                        current_run                     seconds     
 ------------------------------- ------------------------------- ----------- 
             Jul  4 2016  7:17AM             Jul  4 2016  7:21AM         226 


Engine utilization (Tick %)   User Busy   System Busy    I/O Busy        Idle
  -------------------------  ------------  ------------  ----------  ---------- 
  ThreadPool : syb_default_pool                                                 
   Engine 0                         5.0 %         0.4 %      22.4 %      72.1 % 
   Engine 1                         3.9 %         0.5 %      22.8 %      72.8 % 
   Engine 2                         5.6 %         0.3 %      22.5 %      71.6 % 
   Engine 3                         5.1 %         0.4 %      22.7 %      71.8 % 

     -------------------------  ------------  ------------  ----------  ---------- 
  Pool Summary        Total       336.1 %        25.6 %    1834.6 %    5803.8 % 
                    Average         4.2 %         0.3 %      22.9 %      72.5 % 

  -------------------------  ------------  ------------  ----------  ---------- 
  Server Summary      Total       336.1 %        25.6 %    1834.6 %    5803.8 % 
                    Average         4.2 %         0.3 %      22.9 %      72.5 % 

Transaction Profile
-------------------

  Transaction Summary             per sec      per xact       count  % of total
  -------------------------  ------------  ------------  ----------  ---------- 
    Committed Xacts                 137.3           n/a       41198     n/a     

     Average Runnable Tasks            1 min         5 min      15 min  % of total
  -------------------------  ------------  ------------  ----------  ---------- 
  ThreadPool : syb_default_pool                                                 
   Global Queue                       0.0           0.0         0.0       0.0 %
   Engine 0                           0.0           0.1         0.1       0.6 %
   Engine 1                           0.0           0.0         0.0       0.0 %
   Engine 2                           0.2           0.1         0.1       2.6 %

  -------------------------  ------------  ------------  ----------             
  Pool Summary        Total           7.2           5.9         6.1             
                    Average           0.1           0.1         0.1             

  -------------------------  ------------  ------------  ----------             
  Server Summary      Total           7.2           5.9         6.1             
                    Average           0.1           0.1         0.1 

Device Activity Detail
  ----------------------

  Device:                                                                       
    /dev/vx/rdsk/sybaserdatadg/datadev_125                                         
    datadev_125                   per sec      per xact       count  % of total
  -------------------------  ------------  ------------  ----------  ---------- 
  Total I/Os                          0.0           0.0           0       n/a   
  -------------------------  ------------  ------------  ----------  ---------- 
  Total I/Os                          0.0           0.0           0       0.0 %


  ----------------------------------------------------------------------------- 

  Device:                                                                       
    /dev/vx/rdsk/sybaserdatadg/datadev_126                                         
    datadev_126                   per sec      per xact       count  % of total
  -------------------------  ------------  ------------  ----------  ---------- 
  Total I/Os                          0.0           0.0           0       n/a   
  -------------------------  ------------  ------------  ----------  ---------- 
  Total I/Os                          0.0           0.0           0       0.0 %


  ----------------------------------------------------------------------------- 

  Device:                                                                       
    /dev/vx/rdsk/sybaserdatadg/datadev_127                                         
    datadev_127                   per sec      per xact       count  % of total
  -------------------------  ------------  ------------  ----------  ---------- 
    Reads                                                                       
      APF                             0.0           0.0           5       0.4 %
      Non-APF                         0.0           0.0           1       0.1 %
    Writes                            3.8           0.0        1128      99.5 %
  -------------------------  ------------  ------------  ----------  ---------- 
  Total I/Os                          3.8           0.0        1134       0.1 %

  Mirror Semaphore Granted            3.8           0.0        1134     100.0 %
  Mirror Semaphore Waited             0.0           0.0           0       0.0 %

  ----------------------------------------------------------------------------- 

  Device:                                                                       
    /sybaser/database/sybaseR/dev/sybaseR.datadev_000                                    
    GPS_datadev_000               per sec      per xact       count  % of total
  -------------------------  ------------  ------------  ----------  ---------- 
    Reads                                                                       
      APF                             7.9           0.0        2372      55.9 %
      Non-APF                         5.5           0.0        1635      38.6 %
    Writes                            0.8           0.0         233       5.5 %
  -------------------------  ------------  ------------  ----------  ---------- 
  Total I/Os                         14.1           0.0        4240       0.3 %

  Mirror Semaphore Granted           14.1           0.0        4239     100.0 %
  Mirror Semaphore Waited             0.0           0.0           2       0.0 %

我需要捕获“2016年7月4日上午7:21”作为日期,
来自“引擎利用率(勾选％)行,服务器摘要 – >平均值”4.2％“

从“交易配置文件”部分 – >交易配置文件“计数”条目.

所以,我的数据框应该是这样的：

Date                     cpu   Count
Jul  4 2016  7:21AM      4.2   41198

有人可以帮我解析如何解析这个文件以获得这些输出吗？

我尝试过这样的事情：

read.table(text=readLines("file.txt")[count.fields("file.txt",blank.lines.skip=FALSE) == 9])

获得这一行：

Average         4.2 %         0.3 %      22.9 %      72.5 %

但我希望能够仅在之后提取平均值

引擎利用率(Tick％),因为可能有许多以Average开头的行.在引擎利用率(Tick％)之后立即显示的平均线是我想要的.

如何将其放在此行中以从此文件中提取此信息：

read.table(text=readLines("file.txt")[count.fields("file.txt",blank.lines.skip=FALSE) == 9])

我可以在read.table行中使用grep来搜索某些字符吗？

%%%% Shot 1 – 得到了一些工作

extract <- function(filenam="file.txt"){
    txt <- readLines(filenam)

    ## date of current run:
    ## assumed to be on 2nd line following the first line matching "current_run"
    ii <- 2 + grep("current_run",txt,fixed=TRUE)[1]
    line_current_run <- Filter(function(v) v!="",strsplit(txt[ii]," ")[[1]])
    date_current_run <- paste(line_current_run[5:8],collapse=" ")


    ## cpu:
    ## assumed to be on line following the first line matching "Server Summary"
    ## which comes after the first line matching "Engine utilization ..."
    jj <- grep("Engine utilization (Tick %)",fixed=TRUE)[1]
    ii <- grep("Server Summary",fixed=TRUE)
    ii <- 1 + min(ii[ii>jj])
    line_cpu <- Filter(function(v) v!=""," ")[[1]])
    cpu <- line_cpu[2]


    ## Count:
    ## assumed to be on 2nd line following the first line matching "Transaction Summary"
    ii <- 2 + grep("Transaction Summary",fixed=TRUE)[1]
    line_count <- Filter(function(v) v!=""," ")[[1]])
    count <- line_count[5]

    data.frame(Date=date_current_run,cpu=cpu,Count=count,stringsAsFactors=FALSE)
}

print(extract("file.txt"))

##file.list <- dir("./")
file.list <- rep("file.txt",3)
merged <- do.call("rbind",lapply(file.list,extract))

print(merged)

file.list <- rep("file.txt",2000)
print(system.time(merged <- do.call("rbind",extract))))
## runs in about 2.5 secs on my laptop

%%% Shot 2：第一次尝试提取(可能可变)数量的设备列

extractv2 <- function(filenam="file2.txt"){
    txt <- readLines(filenam)

    ## date of current run:
    ## assumed to be on 2nd line following the first line matching "current_run"
    ii <- 2 + grep("current_run"," ")[[1]])
    count <- line_count[5]


    ## Total I/Os
    ## 1. Each line "Device:" is assumed to be the header of a block of lines
    ##    containing info about a single device (there are 4 such blocks
    ##    in your example);
    ## 2. each block is assumed to contain one or more lines matching
    ##    "Total I/Os";
    ## 3. the relevant count data is assumed to be contained in the last
    ##    of such lines (at column 4),for each block.
    ## Approach: loop on the line numbers of those lines matching "Device:"
    ## to get: A. counts; B. device names
    ii_block_dev <- grep("Device:",fixed=TRUE)
    ii_lines_IOs <- grep("Total I/Os",fixed=TRUE)
    nblocks <- length(ii_block_dev)
    ## A. get counts for each device
    ## for each block,select *last* line matching "Total I/Os"
    ii_block_dev_aux <- c(ii_block_dev,Inf) ## just a hack to get a clean code
    ii_lines_IOs_dev <- sapply(1:nblocks,function(block){
        ## select matching liens to "Total I/Os" within each block
        IOs_per_block <- ii_lines_IOs[ ii_lines_IOs > ii_block_dev_aux[block  ] &
                                       ii_lines_IOs < ii_block_dev_aux[block+1]
                                   ]
        tail(IOs_per_block,1) ## get the last line of each block (if more than one match)
    })
    lines_IOs <- lapply(txt[ii_lines_IOs_dev],function(strng){
        Filter(function(v) v!="",strsplit(strng," ")[[1]])
    })
    IOs_counts <- sapply(lines_IOs,function(v) v[5])
    ## B. get device names:
    ## assumed to be on lines following each "Device:" match
    ii_devices <- 1 + ii_block_dev
    device_names <- sapply(ii_devices,function(ii){
        Filter(function(v) v!=""," ")[[1]])
    })
    ## Create a data.frame with "device_names" as column names and "IOs_counts" as
    ## the values of a single row.
    ## Sorting the device names by order() will help produce the same column names
    ## if different sysmon files list the devices in different order
    ord <- order(device_names)
    devices <- as.data.frame(structure(as.list(IOs_counts[ord]),names=device_names[ord]),check.names=FALSE) ## Prevent R from messing with our device names

    data.frame(stringsAsFactors=FALSE,check.names=FALSE,Date=date_current_run,devices)
}
print(extractv2("file2.txt"))


## WATCH OUT:
## merging will ONLY work if all devices have the same names across sysmon files!!
file.list <- rep("file2.txt",extractv2))
print(merged)

%%%%%%% Shot 3：提取两个表,一个包含一行,另一个包含可变行数(取决于每个sysmon文件中列出的设备).

extractv3 <- function(filenam="file2.txt"){
    txt <- readLines(filenam)

    ## date of current run:
    ## assumed to be on 2nd line following the first line matching "current_run"
    ii <- 2 + grep("current_run"," ")[[1]])
    count <- line_count[5]

    ## first part of output: fixed three-column structure
    fixed <-  data.frame(stringsAsFactors=FALSE,Count=count)

    ## Total I/Os
    ## 1. Each line "Device:" is assumed to be the header of a block of lines
    ##    containing info about a single device (there are 4 such blocks
    ##    in your example);
    ## 2. each block is assumed to contain one or more lines matching
    ##    "Total I/Os";
    ## 3. the relevant count data is assumed to be contained in the last
    ##    of such lines (at column 4),fixed=TRUE)
    if(length(ii_block_dev)==0){
        variable <- data.frame(stringsAsFactors=FALSE,date_current_run=date_current_run,device_names=NA,IOs_counts=NA)
    }else{
        ii_lines_IOs <- grep("Total I/Os",fixed=TRUE)
        nblocks <- length(ii_block_dev)
        if(length(ii_block_dev)==0){
            sprintf("WEIRD datapoint at date %s: I have %d devices but 0 I/O lines??")
            ##stop()
        }
        ## A. get counts for each device
        ## for each block,select *last* line matching "Total I/Os"
        ii_block_dev_aux <- c(ii_block_dev,Inf) ## just a hack to get a clean code
        ii_lines_IOs_dev <- sapply(1:nblocks,function(block){
            ## select matching lines to "Total I/Os" within each block
            IOs_per_block <- ii_lines_IOs[ ii_lines_IOs > ii_block_dev_aux[block  ] &
                                           ii_lines_IOs < ii_block_dev_aux[block+1]
                                          ]
            tail(IOs_per_block,1) ## get the last line of each block (if more than one match)
        })
        lines_IOs <- lapply(txt[ii_lines_IOs_dev],function(strng){
            Filter(function(v) v!=""," ")[[1]])
        })
        IOs_counts <- sapply(lines_IOs,function(v) v[5])
        ## B. get device names:
        ## assumed to be on lines following each "Device:" match
        ii_devices <- 1 + ii_block_dev
        device_names <- sapply(ii_devices,function(ii){
            Filter(function(v) v!=""," ")[[1]])
        })
        ## Create a data.frame with three columns: date,device,counts
        variable <- data.frame(stringsAsFactors=FALSE,date_current_run=rep(date_current_run,length(IOs_counts)),device_names=device_names,IOs_counts=IOs_counts)
    }
    list(fixed=fixed,variable=variable)
}
print(extractv3("file2.txt"))


file.list <- c("file.txt","file2.txt","file3.txt")
res <- lapply(file.list,extractv3)

fixed.merged <- do.call("rbind",lapply(res,function(r) r$fixed))
print(fixed.merged)

variable.merged <- do.call("rbind",function(r) r$variable))
print(variable.merged)

正则表达式 – 如何使用R解析sysmon文件以提取某些信息？的更多相关文章

HTML5数字输入仅接受整数的实现代码

这篇文章主要介绍了HTML5数字输入仅接受整数的实现代码,本文给大家介绍的非常详细，对大家的学习或工作具有一定的参考借鉴价值，需要的朋友可以参考下
Html5移动端适配IphoneX等机型的方法

这篇文章主要介绍了Html5移动端适配IphoneX等机型的方法，文中通过示例代码介绍的非常详细，对大家的学习或者工作具有一定的参考学习价值，需要的朋友们下面随着小编来一起学习学习吧
ios – 使用大写符号在字符串swift中获取URL的正则表达式

我尝试在文本中获取URL.所以,在此之前,我使用了这样一个表达式：但是当用户输入带有大写符号的URL时(例如Http://Google.com,它与它不匹配)我遇到了问题.我试过了：但什么都没发生.解决方法您可以使用正则表达式中的i内联标志关闭区分大小写,有关可用正则表达式功能的详细信息,请参阅FoundationFrameworkReference.(?ismwx-ismwx)Flagsetti
对于NSManagedObject,Xcode 9构建了Date vs NSDate的问题

Xcode9为模拟器与设备中的实体的Date类型属性生成不同的代码.我在coredata中将类设置为类别/扩展名下的codegen功能.直到Xcode8.3(最新)它一切正常.下面是Xcode9为属性自动生成的代码–在设备上：–和,在模拟器上：–有谁遇到过这个问题？对于一个有50个成员的项目来解决这个问题的最佳解决方案是什么,直到Xcode更新修复它？
iOS兼容输入类型日期 – 设置最小值最大值

我试图在UIWebViewiOS应用程序中使用jQueryMobile设置日期,值设置正确但最小和最大属性日期设置不起作用.和当我在模拟器上运行它时,当选择日期字段时,日期选择器可见,但未设置最小,最大日期.解决方法模拟器的safari不支持设置输入类型=“日期”的最小值和最大值.您可以通过导航到thissite并尝试控制来测试它.它(可能)可以在桌面浏览器上运行,但不会在模拟器的浏览器或UIWebView中运行.
ios – 设置NSDataDetector的上下文日期

假设今天是2014年1月20日.如果我使用NSDataDetector从“明天下午4点”字符串中提取日期,我将得到2014-01-21T16：00.大.但是,假设我希望NSDataDetector假装当前日期是2014年1月14日.这样,当我解析“明天下午4点”时,我将得到2014-01-15T16：00.如果我在设备上更改系统时间,我会得到我想要的.但是,有没有办法以编程方式指定它？
ios – 如何自动生成日期属性为Date而不是NSDate的NSManagedObject子类？

我目前正在将我的项目更新为Swift3,并且我将所有的NSDate方法和扩展都移动到Date以便在应用程序中保持标准.问题是我使用Xcode自动生成我的NSManagedobject子类,它生成日期属性为NSDate而不是Date.有没有办法用日期属性作为日期生成它？
ios – 如何减去日期组件？

今天是星期五,根据NSCalendar,这是6.我可以通过使用以下内容得到这个我怎么得到上周六的工作日组件,应该是7？
iOS推送通知适用于Dev而不是Enterprise Distribution

本网站上没有其他问题,我已经能够找到实际上提出了Dev将工作的原因,但企业分布不会.为什么归档总是使aps环境生产？
iOS – 友好的NSDate格式

我需要在我的应用程序中显示帖子的日期给用户,现在我用这种格式：“5月25日星期五”.如何格式化NSDate以阅读“2小时前”的内容？使其更加用户友好.解决方法NSDateFormatter不能做这样的事情;你将需要建立自己的规则.我想像：所以这是打印’x分钟前’或’x小时前’从日期起24小时,通常是一天.

随机推荐

法国电话号码的正则表达式

我正在尝试实施一个正则表达式,允许我检查一个号码是否是一个有效的法国电话号码.一定是这样的：要么：这是我实施的但是错了……
正则表达式 – perl分裂奇怪的行为

PSperl是5.18.0问题是量词*允许零空间,你必须使用,这意味着1或更多.请注意,F和O之间的空间正好为零.
正则表达式 – 正则表达式大于和小于

我想匹配以下任何一个字符：或=或=.这个似乎不起作用：[/]试试这个：它匹配可选地后跟=,或者只是=自身.
如何使用正则表达式用空格替换字符之间的短划线

我想用正则表达式替换出现在带空格的字母之间的短划线.例如,用abcd替换ab-cd以下匹配字符–字符序列,但也替换字符[即ab-cd导致d,而不是abcd,因为我希望]我如何适应以上只能取代–部分？
正则表达式 – /bb | [^ b] {2} /它是如何工作的？

有人可以解释一下吗？我在t-shirt上看到了这个：它似乎在说：“成为或不成为”怎么样？我好像没找到’e’？
正则表达式 – 在Scala中验证电子邮件一行

在我的代码中添加简单的电子邮件验证,我创建了以下函数：这将传递像bob@testmymail.com这样的电子邮件和bobtestmymail.com之类的失败邮件,但是带有空格字符的邮件会漏掉,就像bob@testmymail也会返回true.我可能在这里很傻……当我测试你的正则表达式并且它正在捕捉简单的电子邮件时,我检查了你的代码并看到你正在使用findFirstIn.我相信这是你的问题.findFirstIn将跳转所有空格,直到它匹配字符串中任何位置的某个序列.我相信在你的情况下,最好使用unapp
正则表达式对小字符串的暴力

在测试小字符串时,使用正则表达式会带来性能上的好处,还是会强制它们更快？不会通过检查给定字符串的字符是否在指定范围内比使用正则表达式更快来强制它们吗？
正则表达式 – 为什么`stoutest`不是有效的正则表达式？

isthedelimiter,thenthematch-only-onceruleof?PATTERN?
正则表达式 – 替换..与.在R

我怎样才能替换..我尝试过类似的东西：但它并不像我希望的那样有效.尝试添加fixed=T.
正则表达式 – 如何在字符串中的特定位置添加字符？

我正在使用记事本,并希望使用正则表达式替换在字符串中的特定位置插入一个字符.例如,在每行的第6位插入一个逗号是什么意思？如果要在第六个字符后添加字符,请使用搜索和更换从技术上讲,这将用MatchGroup1替换每行的前6个字符,后跟逗号.