---
title: "inSilecoMisc: an overview"
author: "inSileco group"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{inSilecoMisc: an overview}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
  comment = "R> ",
  collapse = TRUE
)
library(inSilecoMisc)
```

This vignette is a short overview of *inSilecoMisc* organized around the following themes:

1. string manipulations
2. vector manipulations
3. data frame manipulations
4. mathematical functions


# String manipulations

## LoremIpsum

It is sometimes useful to have a piece of random text to play with. `loremIpsum()` displays a piece of the commonly used placeholder text [Lorem ipsum](https://en.wikipedia.org/wiki/Lorem_ipsum), with the option of requesting a specific number of words.

```{r loremIpsum}
loremIpsum()
loremIpsum(10)
```

## Keep a selection of words or letters

Assuming I need the second word of each string, all I have to do is:

```{R}
keepWords(c(loremIpsum(18), "be or not to be"), 2)
```

and if I want a specific selection of words, I can use a sequence:

```{R}
keepWords(c(loremIpsum(18), "be or not to be"), c(1:4, 12:16))
```

As you may have noticed, `NA`s are added when the selected position does not exist. This is useful but can also be annoying! Fortunately, the argument `na.rm` allows them to be removed:

```{R}
keepWords(c(loremIpsum(18), "be or not to be"), c(1:4, 12:16), na.rm = TRUE)
```

The `collapse` argument allows the user to change the character used to separate words:

```{R}
keepWords(loremIpsum(18), c(1:6, 14:18), collapse = "-")
```

and if `collapse = NULL`, a list is returned that includes a vector of the selected words for each input string:

```{R}
keepWords(c(loremIpsum(18), "be or not to be"), c(2:3), collapse = NULL)
```

Note that all punctuation signs are removed (this can be changed with the argument `split_words`)!

There are two other functions that work similarly: `keepLetters()` and `keepInitials()`.
The former allows the user to select letters:

```{R}
keepLetters(loremIpsum(18), c(1:6, 14:18))
keepLetters(loremIpsum(18), c(1:6, 14:18), collapse = "-")
keepLetters(loremIpsum(18), c(1:6, 14:18), collapse = NULL)
```

while the latter extracts initials:

```{R}
keepInitials("National Basketball Association")
keepInitials("National Basketball Association", "-")
```

Note that if the input has a mixture of lower and upper case, so will the output:

```{R}
keepInitials("National basketball association")
```

If this annoys you, the base functions `toupper()` and `tolower()` come in handy!

```{R}
keepInitials(tolower("National basketball association"))
keepInitials(toupper("National basketball association"))
```

## Adjust the size of a character string

`adjustStrings()` has 5 arguments to adjust the strings of a character vector in a very flexible fashion:

1. `x`: the input character vector to be adjusted;
2. `n`: the number of characters to be added, or the length to which the output strings are constrained;
3. `extra`: the character(s) to be added (`0` is the default value);
4. `align`: the string alignment ("right", "left" or "center");
5. `add`: whether `n` should be the constraint or a number of characters to be added (a constraint by default).

By default, `adjustStrings()` uses `n` as a constraint for the length of the output strings.
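
The padding behaviour described by the arguments above can be approximated in base R. The sketch below is not the package's implementation: `pad_left()` is a hypothetical helper that assumes right alignment (padding on the left), returns strings that are already long enough unchanged, and repeats or cuts `extra` to fit.

```r
# Hypothetical helper, not part of inSilecoMisc: left-pad strings to n characters.
pad_left <- function(x, n, extra = "0") {
  x <- as.character(x)
  extra <- as.character(extra)
  vapply(x, function(s) {
    need <- n - nchar(s)
    # Assumption of this sketch: strings that are long enough are left untouched.
    if (need <= 0) return(s)
    # Repeat `extra` until it covers the gap, then cut it to the exact length.
    filler <- substr(strrep(extra, ceiling(need / nchar(extra))), 1, need)
    paste0(filler, s)
  }, character(1), USE.NAMES = FALSE)
}

pad_left(1:10, n = 4)
pad_left("ABCD", n = 6, extra = "efgh")
```

Treating `extra` as a filler that is recycled and then trimmed keeps the helper short and makes single- and multi-character fillers behave the same way.
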
For instance, with `n = 4`, all elements of the output vector have 4 characters:

```{R n}
adjustStrings(1:10, n = 4)
```

I can change the value of `extra` to specify the character(s) to be added:

```{R extra}
adjustStrings(1:10, n = 4, extra = 1)
adjustStrings(1:10, n = 4, extra = "a")
adjustStrings(1:10, n = 4, extra = "-")
adjustStrings(1:10, n = 4, extra = "ab")
```

With `align`, I can choose where extra characters are added:

```{R align}
adjustStrings(1:10, n = 4, extra = "-", align = "right") # default
adjustStrings(1:10, n = 4, extra = "-", align = "left")
adjustStrings(1:10, n = 4, extra = "-", align = "center")
```

And if I set `add = TRUE`, exactly `n` extra characters are added to each string:

```{R add}
adjustStrings(1:10, n = 4, extra = "-", align = "right", add = TRUE)
adjustStrings(1:10, n = 4, extra = "-", align = "left", add = TRUE)
adjustStrings(1:10, n = 4, extra = "-", align = "center", add = TRUE)
```

Note that in this case, the lengths of the output strings differ!

One last remark about how `adjustStrings()` works when `add = FALSE`: for a given string, there are 3 scenarios:

1. the string to be adjusted has more characters than `n`; in this case, the string is simply cut off:

```{R}
adjustStrings("ABCD", n = 2, extra = "efgh")
```

2. the string has fewer characters than `n`, but the number of characters needed for the adjustment is smaller than the number of characters in `extra`; in this case, `extra` is cut off:

```{R}
adjustStrings("ABCD", n = 6, extra = "efgh")
```

3. finally, when `extra` is too short to adjust the string according to `n`, `extra` is repeated:

```{R}
adjustStrings("ABCD", n = 14, extra = "efgh")
```

## Extract file info

```{r fileinfo}
getDetails("path1/path2/foo.R")
getDetails(list.files())
getExtension("foo.R")
getBasename("foo.R")
```

## Assign a symbol to a p-value

```{r signifSymbols}
sapply(c(.2, .08, .04, .008, 0.0001), signifSymbols)
```

## applyString

```{r stApply}
applyString("cool", FUN = toupper, pos = 1:2)
applyString(c("cool", "pro"), pattern = "o", FUN = toupper)
```

## Extract digits from strings

```{r getDigits}
getDigits(c("a1", "032hdje2832"))
```

## Collapse elements of a vector and add element separators

```{r commaAnd}
commaAnd(c("Judith", "Peter", "Rebecca", "Eric"))
```


# Vector manipulations

## whichIs

```{r whichIs}
vec <- LETTERS[1:7]
spl <- sample(vec)
whichIs(vec, spl)
id <- unlist(whichIs(vec, spl))
spl[id]
```

## meanAlong

```{r meanAlong}
meanAlong(1:10, 2)
```

## scaleWithin

```{r scaleWithin}
val <- runif(1000, 0, 100)
res1 <- scaleWithin(val, 20, 40, 60)
```


# Data frame manipulations

## Assign a category

```{r categorize}
(seqv <- stats::runif(40))
categorize(seqv, categ = seq(0.1, 0.9, 0.1))
```

## Turn a matrix or a data frame into a square matrix

```{r squaretize}
mat <- matrix(1:15, 3, 5, dimnames = list(LETTERS[3:1], LETTERS[1:5]))
print(mat)
squaretize(mat, reorder = FALSE)
```

## Assign classes to data frames' columns

```{r setColClass}
df1 <- matrix(signif(runif(20), 4), ncol = 2)
df2 <- setColClass(df1, 2, 'character')
str(df1)
str(df2)
```

## Create a data frame from scratch or from another data frame

See also this [blog post on inSileco](https://insileco.github.io/2019/02/03/creating-empty-data-frames-with-dftemplate-and-dftemplatematch/).

```{r dfTemplate}
dfA <- data.frame(col1 = c(1, 2), col2 = LETTERS[1:2])
dfB <- data.frame(col1 = 2, col4 = "cool")
dfTemplate(2, 2)
dfTemplate(2, 2, fill = 0)
dfTemplate(c("value", "name"), 2, col_classes = c("numeric", "character"))
dfTemplateMatch(dfA, c("col4"))
dfTemplateMatch(dfA, dfB)
dfTemplateMatch(dfA, c("col1", "col4"), yonly = TRUE)
dfTemplateMatch(dfA, c("col1", "col2"), yonly = TRUE, col_classes = "numeric", fill = 0)
```

## packagesUsed

```{r packagesUse}
packagesUsed(c('utils', 'methods'))
```

## Export a data frame or a list of data frames

```R
tblDown(list(CO2[1:2, ], CO2[3:6, ]), "./tables.docx", section = "section",
  caption = "CO2", title = "Tables")
```


# Messages

`inSilecoMisc` includes four simple message functions to standardize messages in scripts:

```{R}
# 1. msgInfo() indicates what the upcoming computation is
msgInfo("this is what's gonna happen next")
```

```{R}
# 2. msgWarning() reminds me of something important that should not affect the run
msgWarning("Got to be careful")
```

```{R}
# 3. msgError() when something went wrong (and I anticipated that it could happen)
msgError("Something wrong")
```

```{R}
# 4. msgSuccess() when a step or a computation has been successfully completed
msgSuccess("All good")
```

They are meant to help structure scripts; here is a somewhat contrived example:

```{R}
scr_min <- function() {
  # msgInfo() lets me know where I am in the script
  msgInfo("Average random values")
  set.seed(111)
  out <- mean(runif(100))
  # msgSuccess() indicates the successful completion of this part
  msgSuccess("Done!")
  out
}
scr_min()
```

As these functions are based on `message()`, one can execute the script quietly by wrapping the call in `suppressMessages()`:

```{R}
# quiet run
suppressMessages(scr_min())
```


# Writing tables in a document

`tblDown()` writes a data frame (or a list of data frames) to a document in various formats (`.docx` by default):

```{R, eval = FALSE}
# NB tblDown(head(CO2)) creates table.docx by default
tblDown(head(CO2), output_file = "table.odt")
```

![](table_odt.png)

`tblDown()` handles lists of data frames. The user can also pass a set of captions, one for each table, and even separate the tables with section headers:

```{R, eval = FALSE}
tblDown(list(head(CO2), tail(CO2)), output_file = "tables.pdf",
  caption = c("This is the head of CO2", "This is the tail of CO2"),
  section = "Table")
```

![](tables_pdf.png)

Note that if there are fewer captions or section titles than data frames, the vectors of captions (and/or sections) are repeated and an index is appended.


# Mathematical functions

## Logistic functions

```{r logistic}
seqx <- seq(-5, 5, 0.1)
par(mfrow = c(1, 2))
plot(seqx, logistic(seqx), type = "l")
abline(v = 0, h = 0, lty = 2)
plot(seqx, logistic2(seqx, yzer = .5), type = "l")
abline(v = 0, h = 0, lty = 2)
```

## Gaussian shape

```{r gaussian}
plot(gaussianShape(1:1000, 500, 2, 250, pow = 5), type = 'l')
lines(gaussianShape(1:1000, 500, 2, 250, pow = 2), lty = 2)
lines(gaussianShape(1:1000, 500, 2, 250, pow = 1), lty = 3)
legend("topleft", paste("pow = ", c(5, 2, 1)), lty = 1:3)
```
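
To get a feel for how `pow` changes the shape of such a curve, here is a minimal base-R sketch. It rests on assumptions: `gauss_sketch()` is a hypothetical stand-in that uses a generalized Gaussian, `peak_y * exp(-(|x - peak_x| / width)^pow)`, and it reads the positional arguments used above as peak location, peak height and width. The argument names are mine, and the parametrization used by `gaussianShape()` may differ.

```r
# Hypothetical stand-in, not the package's code: a generalized Gaussian curve.
# peak_x, peak_y and width are names assumed for illustration.
gauss_sketch <- function(x, peak_x, peak_y, width, pow = 2) {
  # `^` binds tighter than unary minus, so this is -(abs(...)^pow)
  peak_y * exp(-abs((x - peak_x) / width)^pow)
}

# Values at the peak and at one and two widths away from it, for three exponents
x <- c(500, 750, 1000)
res <- sapply(c(5, 2, 1), function(p) gauss_sketch(x, 500, 2, 250, pow = p))
dimnames(res) <- list(paste("x =", x), paste("pow =", c(5, 2, 1)))
res
```

Under this assumed form, the curve peaks at `peak_y` when `x` equals `peak_x`, and every curve takes the value `peak_y * exp(-1)` at a distance of one `width` from the peak, whatever the exponent. Larger values of `pow` flatten the top and steepen the sides.
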