top of page

Web scraping GitHub with Power BI

Writer's picture: Seth SanusiSeth Sanusi



Have you tried the GitHub connector in Power BI? You’ll find information on changes and statistics of the repo, but not much more. What if you need raw code or binary data for a report? Read on.


Goal: show Azure Icons and metadata based on a GitHub repo.





Ben Coleman has a superb write-up and repo of Azure Icons (past and present). He also has some tools and artifacts for web scraping.


Ben also has a well written web gallery for viewing, searching and downloading Azure Icons:

In this post I’ll walk you through creating a Power BI report based on a GitHub repo of images and Power BI web data “by providing example” feature:



  1. locate the repo

  2. in Power BI desktop: connect to the Web page

  3. Click on “Add Table using Examples”

  4. scrape the URLs of the images

  5. create a calculated column to link to GitHub’s “raw” URL

  6. use table, image grid or other Power BI visuals to render image

  7. publish the report, schedule refreshes if desired.


Here's the M code. See AzureIcons2021.pbix on Github:


let
    Source = Web.BrowserContents("https://github.com/benc-uk/icon-collection/tree/master/azure-icons"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Icon Name", ".Link\-\-primary.js-navigation-open"}, {"URL", "[data-pjax=""\#repo-content-pjax-container""]", each [Attributes][href]?}}, [RowSelector=".Box-row + *"]),
    #"Inserted Replaced Text" = Table.AddColumn(#"Extracted Table From Html", "Icon URL", each Text.Replace([URL], "/benc-uk/icon-collection/blob", "https://raw.githubusercontent.com/benc-uk/icon-collection"), type text),
    #"Added Prefix" = Table.TransformColumns(#"Inserted Replaced Text", {{"URL", each "https://github.com" & _, type text}}),
    #"Renamed Columns" = Table.RenameColumns(#"Added Prefix",{{"Icon Name", "File Name"}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Renamed Columns", "File Name", "File Name - Copy"),
    #"Renamed Columns1" = Table.RenameColumns(#"Duplicated Column",{{"File Name - Copy", "Icon"}}),
    #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns1","-"," ",Replacer.ReplaceText,{"Icon"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",".svg","",Replacer.ReplaceText,{"Icon"})
in
    #"Replaced Value1"





 
 
 

2 Comments


A leading Python training institute in Pune, offering comprehensive courses for beginners and professionals. Learn Python programming from expert trainers with hands-on projects, real-world applications, and industry-relevant curriculum. Flexible schedules and certification upon completion ensure your success.


Like

Enhance your IT capabilities with top-tier Azure Cloud training in Pune. Our comprehensive program is designed to equip you with the skills needed to excel in the ever-evolving world of cloud computing. You'll gain a deep understanding of Azure services, architecture, and deployment through theoretical knowledge and hands-on practice.


Like
Post: Blog2_Post

Subscribe Form

Thanks for submitting!

  • Twitter
  • LinkedIn
  • Facebook

©2021 by Snoozy Data. Proudly created with Wix.com

bottom of page