HTMLReader
Defined in: .build/typescript/packages/readers/src/html.ts:11
Extracts the significant text from an arbitrary HTML document. The contents of any head, script, style, and xml tags are removed completely. For a[href] tags, the URL is extracted along with the tag's inner text. All other tags are removed, and their inner text is kept intact. HTML entities (e.g., &amp;) are not decoded.
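The rules above can be approximated in a few lines. This is only a rough sketch for intuition, not the reader's implementation (which delegates to string-strip-html): the function name `sketchExtractText` is hypothetical, the regexes assume simple, well-formed markup, and the "inner text followed by URL" output format for links is an assumption.

```typescript
// Illustrative approximation only; `sketchExtractText` is a hypothetical helper,
// not part of the library. Regexes cope with simple, well-formed markup only.
function sketchExtractText(html: string): string {
  return (
    html
      // Drop head, script, style, and xml elements together with their contents.
      .replace(/<(head|script|style|xml)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, " ")
      // Keep each anchor's inner text and append its href (format is an assumption).
      .replace(
        /<a\b[^>]*\bhref\s*=\s*["']([^"']*)["'][^>]*>([\s\S]*?)<\/a\s*>/gi,
        "$2 $1 ",
      )
      // Remove every remaining tag but keep the text between tags.
      .replace(/<[^>]+>/g, " ")
      // Collapse whitespace; entities such as &amp; are deliberately left undecoded.
      .replace(/\s+/g, " ")
      .trim()
  );
}

const sample =
  '<html><head><title>T</title></head><body><script>var x=1;</script>' +
  '<p>Fish &amp; chips</p><a href="https://example.com">menu</a></body></html>';

// → "Fish &amp; chips menu https://example.com"
const extracted = sketchExtractText(sample);
```

Note that the title and the script body disappear entirely, while `&amp;` survives unchanged.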
Extends
FileReader<Document>
Constructors
Constructor
new HTMLReader(): HTMLReader
Returns
HTMLReader
Inherited from
FileReader<Document>.constructor
Methods
loadDataAsContent()
loadDataAsContent(fileContent): Promise<Document<Metadata>[]>
Defined in: .build/typescript/packages/readers/src/html.ts:18
The public entry point for this reader, required by the BaseReader interface.
Parameters
fileContent
Uint8Array
The content of the file.
Returns
Promise<Document<Metadata>[]>
A Promise that resolves to an array containing zero or one Document parsed from the HTML content of the specified file.
Overrides
FileReader.loadDataAsContent
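Because fileContent is a Uint8Array rather than a string, the bytes have to be decoded before any markup is stripped. The sketch below illustrates that byte-to-document flow under stated assumptions: `SketchDocument` and `sketchLoadDataAsContent` are hypothetical stand-ins, UTF-8 decoding is assumed, and the function is synchronous for brevity whereas the documented method returns a Promise. It always returns one document; the reference allows an empty array but does not say when that happens.

```typescript
// Hypothetical stand-in for the library's Document type, reduced to one field.
interface SketchDocument {
  text: string;
}

// Assumed flow: decode the bytes as UTF-8, strip the markup, wrap the result.
// `stripTags` is a placeholder for the real parseContent() step.
function sketchLoadDataAsContent(
  fileContent: Uint8Array,
  stripTags: (html: string) => string,
): SketchDocument[] {
  const html = new TextDecoder("utf-8").decode(fileContent);
  const text = stripTags(html);
  return [{ text }];
}

// Usage: encode an HTML string to bytes, as a file read would provide them.
const bytes = new TextEncoder().encode("<p>Hello</p>");
const docs = sketchLoadDataAsContent(bytes, (html) =>
  html.replace(/<[^>]+>/g, "").trim(),
);
```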
parseContent()
parseContent(html, options): Promise<string>
Defined in: .build/typescript/packages/readers/src/html.ts:33
Wrapper around the string-strip-html library.
Parameters
html
string
Raw HTML content to be parsed.
options
Partial<Opts> = {}
An object of options for the underlying library
Returns
Promise<string>
The HTML content, stripped of unwanted tags and attributes
getOptions()
getOptions(): Partial<Opts>
Defined in: .build/typescript/packages/readers/src/html.ts:46
Returns the configuration options that this reader passes to the string-strip-html library.
Returns
Partial<Opts>
An object of options for the underlying library
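To make the shape of such an options object concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `SketchOpts` is a local stand-in for a small subset of string-strip-html's Opts type, the default values are inferred from the class description (entities left undecoded; head, script, style, and xml removed with their contents), and the merge helper shows one way a caller might layer overrides on top of defaults, not how the reader itself does it.

```typescript
// Local stand-in for a subset of string-strip-html's Opts (assumed field names).
interface SketchOpts {
  skipHtmlDecoding: boolean;
  stripTogetherWithTheirContents: string[];
  ignoreTags: string[];
}

// Hypothetical defaults inferred from the class description above.
function sketchGetOptions(): Partial<SketchOpts> {
  return {
    skipHtmlDecoding: true,
    stripTogetherWithTheirContents: ["script", "style", "xml", "head"],
  };
}

// Caller overrides win over the defaults because they are spread last.
function sketchMergeOptions(
  overrides: Partial<SketchOpts> = {},
): Partial<SketchOpts> {
  return { ...sketchGetOptions(), ...overrides };
}

// Keeps the defaults and additionally leaves <code> tags untouched.
const merged = sketchMergeOptions({ ignoreTags: ["code"] });
```

Spreading the overrides last means a caller can switch off a single default, for example by passing `{ skipHtmlDecoding: false }`, without restating the rest.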