Web scraping a dynamic-content table in PowerShell using the PowerHTML module

XML, or eXtensible Markup Language, is a versatile data format widely used for storing and exchanging structured information. In the realm of PowerShell, XML becomes a potent ally, offering a robust mechanism for handling and manipulating data in a hierarchical and human-readable way. In this article, we’ll explore the integration of XML with PowerShell, unraveling the simplicity and efficacy it brings to scriptwriters.

<?xml version="1.0" encoding="utf-8"?>
<Topology>
  <Core>
    <SERVER1 desc="computer name server"></SERVER1>
    <SERVER2 desc="computer name server"></SERVER2>
    <CORE1 desc="application core"></CORE1>
    <CORE2 desc="application core"></CORE2>
  </Core>
    <Credentials>
          <AppCredentials>
              <USERNAME></USERNAME>
              <PASSWORD></PASSWORD>
          </AppCredentials>
          <ServicesCredentials>
              <SC_USERNAME></SC_USERNAME>
              <SC_PASSWORD></SC_PASSWORD>
              <SC_DOMAIN></SC_DOMAIN>
          </ServicesCredentials>
    </Credentials>
</Topology>

Script:

$xmldata = [XML](Get-Content C:\Config.xml)
$xmldata.Core.SERVER1.InnerText = 'Admin'
$xmldata.Core.SERVER2.InnerText = 'Nimda'
$xmldata.AppCredentials.USERNAME.InnerText = 'Account'
$xmldata.AppCredentials.PASSWORD.InnerText = '123'
$xmldata.Save((Resolve-Path C:\Config.xml).Path)

Error:

+ $xmldata.Core.SERVER2.InnerText = 'Nimda'
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : PropertyNotFound

After the script runs, the XML file should look like this:

<?xml version="1.0" encoding="utf-8"?>
<Topology>
  <Core>
    <SERVER1 desc="computer name server">Admin</SERVER1>
    <SERVER2 desc="computer name server">Nimda</SERVER2>
    <CORE1 desc="application core"></CORE1>
    <CORE2 desc="application core"></CORE2>
  </Core>
    <Credentials>
          <AppCredentials>
              <USERNAME>Account</USERNAME>
              <PASSWORD>123</PASSWORD>
          </AppCredentials>
          <ServicesCredentials>
              <SC_USERNAME></SC_USERNAME>
              <SC_PASSWORD></SC_PASSWORD>
              <SC_DOMAIN></SC_DOMAIN>
          </ServicesCredentials>
    </Credentials>
</Topology>

Please let me know what additional changes are required for this to work.
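One likely cause of the PropertyNotFound error is that the property paths skip the Topology root element (and, for the credentials, the Credentials element), so the intermediate lookups return $null. Below is a minimal sketch under that assumption. It uses a shortened inline copy of the XML and a temporary file instead of C:\Config.xml, and the helper name Set-NodeText is hypothetical. Addressing nodes by XPath also sidesteps the fact that dot notation returns empty leaf elements such as USERNAME as plain strings, which have no InnerText property.

```powershell
# Shortened inline copy of the configuration (assumption: same structure as Config.xml).
$xmldata = [xml]@'
<?xml version="1.0" encoding="utf-8"?>
<Topology>
  <Core>
    <SERVER1 desc="computer name server"></SERVER1>
    <SERVER2 desc="computer name server"></SERVER2>
  </Core>
  <Credentials>
    <AppCredentials>
      <USERNAME></USERNAME>
      <PASSWORD></PASSWORD>
    </AppCredentials>
  </Credentials>
</Topology>
'@

# Hypothetical helper: set the text of the single node matched by an XPath expression.
function Set-NodeText {
    param([xml] $Document, [string] $XPath, [string] $Value)
    $node = $Document.SelectSingleNode($XPath)
    if ($null -eq $node) { throw "No node matches '$XPath'." }
    $node.InnerText = $Value
}

# Every path starts at the Topology root element.
Set-NodeText $xmldata '/Topology/Core/SERVER1' 'Admin'
Set-NodeText $xmldata '/Topology/Core/SERVER2' 'Nimda'
Set-NodeText $xmldata '/Topology/Credentials/AppCredentials/USERNAME' 'Account'
Set-NodeText $xmldata '/Topology/Credentials/AppCredentials/PASSWORD' '123'

# Save to a temporary file (stand-in for C:\Config.xml).
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'Config.xml'
$xmldata.Save($outPath)
```

The helper throws when an XPath expression matches nothing, so a mistyped path fails at the line that caused it rather than as a PropertyNotFound error later.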

I have the sample XML file below. I would like to rename the Settings element to “NewSettings” (highlighted in bold). I tried several PowerShell properties and methods to change it, but none of them worked.

<configuration>
  **<Settings>**
    <test.Settings>
Some text block
    </test.Settings>
  **</Settings>**
</configuration>

Here is what I have tried so far, along with some other PowerShell methods for changing the element, but it does not work:

$path       = 'C:\temp1\web6.config'
$Newsetting = 'NewSettings'

$xml = [xml](Get-Content $path)
$newxml = $xml.SelectSingleNode("//Settings")
$newxml.InnerText = $Newsetting
$xml.Save($path)

See: How can I update the value for a XML node using PowerShell?

The InnerText property changes the text within an element, not the element’s name. You need to create a new element with the desired name, clone the attributes and children from the old element to the new one, replace, and save.

This script will replace the <Settings> element with a <NewSettings> element, keeping all attributes and child elements.

$path = 'C:\temp1\web6.config'
$Newsetting = 'NewSettings'

# Load the XML file
$xml = [xml](Get-Content $path -Raw)

# Select the node to be renamed
$oldNode = $xml.SelectSingleNode("//Settings")

# Create a new node with the new name
$newNode = $xml.CreateElement($Newsetting)

# Clone attributes and children from the old node to the new node
foreach ($attribute in $oldNode.Attributes) {
    $newAttribute = $xml.CreateAttribute($attribute.Name)
    $newAttribute.Value = $attribute.Value
    $newNode.Attributes.Append($newAttribute)
}
foreach ($childNode in $oldNode.ChildNodes) {
    $newNode.AppendChild($childNode.Clone())
}

# Replace the old node with the new node in the parent
$oldNode.ParentNode.ReplaceChild($newNode, $oldNode)

# Save the XML document
$xml.Save($path)

Try this and let me know if this helps.

Martin Iszac


PowerShell offers several ways to work with XML, making it a powerful tool for managing configurations, parsing data, and more. In this article, we’ll explore how you can use PowerShell to create, read, and manipulate XML.

Understanding XML

XML (eXtensible Markup Language) is a common data format used to store and transport data. It’s human-readable and widely supported, making it an ideal choice for many applications.

A basic XML document might look something like this:

<?xml version="1.0"?>
<employees>
  <employee>
    <name>John Doe</name>
    <position>Manager</position>
  </employee>
  <employee>
    <name>Jane Smith</name>
    <position>Developer</position>
  </employee>
</employees>

Reading XML with PowerShell

# Read an XML file
$xml = [xml](Get-Content "C:\path\to\your\file.xml")

# Access elements
$xml.employees.employee | ForEach-Object {
  Write-Output ("Name: " + $_.name)
  Write-Output ("Position: " + $_.position)
}

Creating XML documents

You can also create XML documents with PowerShell. You can do this by creating an XmlDocument object and using its methods to add elements:

# Create an XmlDocument
$xml = New-Object System.Xml.XmlDocument

# Create and append the root element
$employees = $xml.CreateElement("employees")
$xml.AppendChild($employees)

# Create and append an employee element
$employee = $xml.CreateElement("employee")
$employees.AppendChild($employee)

# Create, set the value of, and append a name element
$name = $xml.CreateElement("name")
$name.InnerText = "John Doe"
$employee.AppendChild($name)

# Save the XML document
$xml.OuterXml | Set-Content "C:\path\to\your\file.xml"

Modifying XML documents

Modifying an XML document is as simple as reading it, making changes, and then saving it:

# Read an XML file
$xml = [xml](Get-Content "C:\path\to\your\file.xml")

# Change the name of the first employee
$xml.employees.employee[0].name = "New Name"

# Save the XML document
$xml.OuterXml | Set-Content "C:\path\to\your\file.xml"

Best practices

Here are some best practices when working with XML and PowerShell:

  • Error Handling: Always include error handling when working with files or when parsing XML.
  • Indenting XML: The OuterXml property returns the document as a single unindented string. If you want indented XML, use the Save method of the XmlDocument object.
  • XML vs. JSON: While XML is powerful, JSON may be easier to work with in some cases. PowerShell also supports JSON.
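
The first two points can be sketched as follows. This is a minimal example, assuming a small sample document written to a temporary file; the file name employees-demo.xml is made up for the demonstration.

```powershell
# Assumption: a small sample document written to a temporary file for the demo.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'employees-demo.xml'
Set-Content -Path $path -Value '<employees><employee><name>John Doe</name></employee></employees>'

try {
    # -Raw reads the file as one string; -ErrorAction Stop turns read errors into exceptions.
    # The [xml] cast itself throws if the text is not well-formed XML.
    $xml = [xml](Get-Content -Path $path -Raw -ErrorAction Stop)
}
catch {
    throw "Could not load '$path' as XML: $_"
}

# OuterXml is a single unindented string.
$flat = $xml.OuterXml

# Save() writes the document back with line breaks and indentation.
$xml.Save($path)
$indented = Get-Content -Path $path -Raw
```

Comparing $flat with $indented shows the difference between the two ways of serializing the same document.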

Conclusion

Understanding how to interact with XML using PowerShell can open up a variety of possibilities. It can make your scripts more versatile, allowing them to interact with XML-based APIs, configuration files, and more.


I’m getting an error when I try to read the contents of a table on the web page referenced in the script. Can anyone please help me with a solution to fix it? Thanks.

@mklement0, Thanks for the detailed explanation. With your help, I was able to extract the table information. However, I'm still unable to extract table rows as it's still returned as null. Can you please help? Please see below. Thanks.
 
$wc = New-Object System.Net.WebClient
$res = $wc.DownloadString('https://datatables.net/examples/data_sources/ajax.html')
$html = ConvertFrom-Html -Content $res

$ScrapeData=[System.Collections.ArrayList]::new()
$ScrapeData+=$n
$table = $html.SelectNodes('//table') | Where-Object { $_.HasClass("display") -or $_.HasClass("dataTable")}

foreach ($row in $table.SelectNodes('//tr') | Where-Object { $_.HasClass("odd") -or $_.HasClass("even")} )
{
    $cnt += 1

    if ($cnt -eq 1) { continue }

    #$name= $row.SelectSingleNode('//th').innerText.Trim() | Where-Object { $_.HasClass('sorting_1')}
    $value=$row.SelectSingleNode('td').innerText.Trim() -replace "\?", " "
    $new_obj = New-Object -TypeName psobject
    $new_obj | Add-Member -MemberType NoteProperty -Value $value
    $ScrapeData+=$new_obj 
}

Write-Output 'Extracted Table Information'
$table
 
Write-Output 'Extracted Book Details Parsed from HTML table'
$ScrapeData

Extracted data as below

asked Dec 9, 2023 at 7:12

Roshan Fernando

    • To extract content that is loaded dynamically (via scripts embedded in the source code that only execute when the page is rendered in a browser), you need a full web browser that you can control programmatically; see this answer.

    • Interactively, most browsers offer:

      • a view of the static HTML source code of a page, by right-clicking and selecting a shortcut-menu command such as View Page Source or Show Page Source.

      • a view of the dynamically generated HTML, by right-clicking and selecting a command such as Inspect or Inspect Element.

    • By default, ConvertFrom-Html returns HtmlAgilityPack.HtmlNode instances from the HtmlAgilityPack .NET library.

    • While the explanations below point to solutions, they are hypothetical, as they would require the dynamically generated HTML to operate on, which, as noted, .DownloadString() cannot provide. Specifically, the source code doesn’t contain any table rows (they are populated dynamically); also, the <table> element has only one class, display.

    • $_.HasClass('display dataTable') looks for a single class literally named display dataTable, whereas class="display dataTable" in the dynamically generated HTML means that the element has two classes, display and dataTable. Therefore your method call always returns $false.

      • The logic you were looking for is probably to find elements with class display as well as dataTable, which requires $_.HasClass("display") -and $_.HasClass("dataTable")

    • $_.HasClass("odd", "even") would have become a problem, because the method only accepts a single string.

      • The logic you were looking for is probably to find elements with class odd or class even, which requires $_.HasClass("odd") -or $_.HasClass("even")
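
The class-matching points can be illustrated without a browser or HtmlAgilityPack. The sketch below uses a hypothetical helper, Test-HasClass, as a stand-in for HtmlNode.HasClass(); it only mimics the behavior described above (one class name tested against a whitespace-separated class attribute) and is not the library's implementation.

```powershell
# Hypothetical stand-in for HtmlNode.HasClass(): tests ONE class name against
# a whitespace-separated class attribute value (case-sensitive comparison).
function Test-HasClass {
    param(
        [string] $ClassAttribute,  # e.g. 'display dataTable'
        [string] $ClassName        # a single class name to look for
    )
    ($ClassAttribute -split '\s+') -ccontains $ClassName
}

$classAttr = 'display dataTable'

# Looks for one class literally named 'display dataTable': never matches.
$literal = Test-HasClass $classAttr 'display dataTable'

# Both classes present: combine two single-class tests with -and.
$both = (Test-HasClass $classAttr 'display') -and (Test-HasClass $classAttr 'dataTable')

# Either class present: combine two single-class tests with -or.
$either = (Test-HasClass 'odd' 'odd') -or (Test-HasClass 'odd' 'even')
```

Here $literal is $false, while $both and $either are $true, which mirrors the -and / -or logic recommended above.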

answered Dec 9, 2023 at 21:47

mklement0

Reading XML with PowerShell

Reading XML in PowerShell is an intuitive process, allowing seamless navigation through the document’s hierarchical structure.

Let’s explore how to extract information from an XML file:

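# Load XML from a file (placeholder path; the original value is not given)
$xmlFilePath = 'C:\path\to\your\file.xml'
$xmlContent = Get-Content -Path $xmlFilePath -Raw
# Cast the text to an XML document
$xmlDocument = [xml]$xmlContent
# Accessing elements
# Root has child elements, so it is returned as an XmlElement with an InnerText property
$rootValue = $xmlDocument.Root.InnerText
# Leaf elements without attributes are returned as plain strings, so read them directly
$child1Value = $xmlDocument.Root.Child1
$child2Value = $xmlDocument.Root.Child2
# Displaying values
Write-Host "Root Value: $rootValue"
Write-Host "Child1 Value: $child1Value"
Write-Host "Child2 Value: $child2Value"

Assuming the file contains a Root element whose Child1 and Child2 elements hold Value1 and Value2, the output is:

Root Value: Value1Value2
Child1 Value: Value1
Child2 Value: Value2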

In this snippet, we load an XML file, cast it to an XML type, and then access specific elements and their corresponding values. The result is a straightforward display of the extracted data.
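
One detail matters when reading values with dot notation: PowerShell's XML adapter returns leaf elements that have no attributes as plain strings, and elements with attributes or child elements as XmlElement objects. The sketch below shows the difference, assuming an inline document with the Root/Child1/Child2 structure; the values and the id attribute are illustrative only.

```powershell
# Illustrative inline document (assumption: not the article's actual file).
$doc = [xml]'<Root><Child1>Value1</Child1><Child2 id="2">Value2</Child2></Root>'

# Leaf element without attributes: dot notation yields a string.
$child1 = $doc.Root.Child1

# Element with an attribute: dot notation yields an XmlElement.
$child2 = $doc.Root.Child2
$child2Text = $child2.InnerText

# XPath always yields a node, so InnerText works uniformly.
$viaXPath = $doc.SelectSingleNode('/Root/Child1').InnerText
```

SelectSingleNode is the more predictable choice when a script must treat every element the same way, regardless of whether it carries attributes.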

Modifying XML with PowerShell

Let’s delve into an example where we update the value of a child element:

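# Load XML from a file (placeholder path; the original value is not given)
$xmlFilePath = 'C:\path\to\your\file.xml'
$xmlContent = Get-Content -Path $xmlFilePath -Raw
$xmlDocument = [xml]$xmlContent
# Modify a child element's value (placeholder value; the original is not given)
$xmlDocument.Root.Child1 = 'NewValue'
# Save the modified XML
$xmlDocument.Save($xmlFilePath)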

In this scenario, we load the XML file, update the value of a specific child element (Child1), and then save the modified XML content back to the file.

Creating XML with PowerShell

PowerShell provides seamless ways to generate XML content, both manually and programmatically.

Let’s embark on a journey by creating a basic XML structure using PowerShell’s intrinsic capabilities:

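# Creating a simple XML document
$xmlDocument = New-Object System.Xml.XmlDocument
# Adding a root element
$rootElement = $xmlDocument.CreateElement('Root')
$null = $xmlDocument.AppendChild($rootElement)
# Adding child elements (names and values are placeholders; the originals are not given)
$childElement1 = $xmlDocument.CreateElement('Child1')
$childElement1.InnerText = 'Value1'
$null = $rootElement.AppendChild($childElement1)
$childElement2 = $xmlDocument.CreateElement('Child2')
$childElement2.InnerText = 'Value2'
$null = $rootElement.AppendChild($childElement2)
# Save the XML to a file (placeholder path)
$xmlDocument.Save('C:\path\to\your\file.xml')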

In this example, we create an XML document, add a root element named “Root,” and populate it with two child elements, each containing a distinct value. The resulting XML is then saved to a specified file path.
