
Thursday 11 September 2014

Web Data Extraction / Scraping Data from Kitco Inc. Text Only Market Page

I wish to capture data from the following page:

<html>
<head>
<title>Text Only Market Page</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body bgcolor="#FFFFFF">
<br><br>
<pre>
<b><font size=6>
  Kitco Inc.

  Text Only Market Page</font></b>

    <a href="http://www.kitco.com/market/">Graphic version of this page</a>

    <a href="http://www.kitco.com/market/LFrate.html">Precious Metals Lease Rates</a> 
    <a href="http://www.kitco.com/gold.londonfix.html">Historical Price Data</a> 
    <a href="http://www.kitco.com/market/marketnews.html">Precious Metals News Headlines</a>

    <font size=4><b><a href="https://online.kitco.com/bullion/completelist_USD.html#gold">Buy gold and silver online direct from Kitco!</a>
   Live quotes for all bullion products.</b></font>


   --------------------------------------------------------------------------------
   London Fix          GOLD          SILVER       PLATINUM           PALLADIUM
                   AM       PM                  AM       PM         AM       PM
   --------------------------------------------------------------------------------
   Jun 19,2012   1628.50   1625.50   28.8100   1486.00   1486.00   629.00   634.00 
   Jun 18,2012   1623.50   1615.50   28.4300   1486.00   1484.00   626.00   628.00 
   --------------------------------------------------------------------------------


                  New York Spot Price
                MARKET IS OPEN
            Will close in 4 hour 25 minutes
   ----------------------------------------------------------------------
   Metals          Bid        Ask           Change        Low       High
   ----------------------------------------------------------------------
   Gold         1619.80     1620.80     -8.90  -0.55%    1616.60  1632.70
   Silver         28.46       28.56     -0.28  -0.97%      28.24    28.95
   Platinum     1479.00     1489.00      0.00   0.00%    1476.00  1500.00
   Palladium     627.00      632.00      0.00   0.00%     622.00   639.00
   ----------------------------------------------------------------------
   Last Update on Jun 19, 2012 at 12:50.59
   ----------------------------------------------------------------------


                Asia / Europe Spot Price
                MARKET IS OPEN
            Will close in 4 hours 25 minutes
   ----------------------------------------------------------------------
   Metals                      Bid          Ask      Change from NY close
   ----------------------------------------------------------------------
   Gold                      1619.80      1620.80     -8.90   -0.55%
   Silver                      28.46        28.56     -0.28   -0.97%
   Platinum                  1479.00      1489.00     +0.00   +0.00%
   Palladium                  627.00       632.00     +0.00   +0.00%
   ----------------------------------------------------------------------
   Last Update on Jun 19, 2012 at 12:50.59
   ----------------------------------------------------------------------


<b>   File created on Tue Jun 19 12:51:04 2012</b>


        <style type="text/css"><!--
 #main_container_footer {width:100%;text-align: center;}
    #main_container_footer #footer_container {width:auto; margin:25px auto 25px auto;}
    #main_container_footer #footer_container ul {margin:0; padding:0;}
    #main_container_footer #footer_container ul li {float:left; display:inline; list-style:none; padding:0 8px; font-family:Verdana, Arial, Helvetica, sans-serif; font-size:12px; color:#000; border-right:1px #000 solid;}
    #main_container_footer #footer_container ul li a {font-family:Verdana, Arial, Helvetica, sans-serif; font-size:12px; color:#000; text-decoration:underline; font-weight:normal;}
    #main_container_footer #footer_container ul li a:hover {color:#ac1a2f; text-decoration:none; font-weight:normal;}
    #main_container_footer #footer_container ul li.no_border {border:0px;}
--></style>
  <table border="0" cellspacing="0" cellpadding="0"><tr><td>
 <div id="main_container_footer">
        <div id="footer_container">
            <ul>
                <li class="no_border"><script type="text/javascript">
copyright=new Date();
update=copyright.getFullYear();
document.write("&copy; "+ update + " Kitco Metals Inc.");
</script></li>
                <li><a href="https://corp.kitco.com/index.html">About Us</a></li>
                <li><a href="http://www.kitco.com/TermsofUse/" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Website Terms of Use</a></li>
                <li><a href="https://online.kitco.com/help/privacy_policy.html" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Privacy Policy</a></li>
                <li><a href="http://www.kitco.com/ads/">Advertise With Us</a></li>
                <li><a href="https://corp.kitco.com/en/corporate_culture.html">Careers</a></li>
                <li><a href="https://corp.kitco.com/en/contact.html" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Contact Us</a></li>
                <li class="no_border"><a href="https://corp.kitco.com/en/feedback.html" target="_top" onclick="Window_open(this.href,'KITCO','top=120,left=250,width=500,height=350'); return false">Feedback</a></li>
            </ul>
        </div>
    </div> 

    </td></tr></table><br /><br />
<script language="JavaScript" type="text/javascript">
<!--
function Window_open (Address) {
  NewWindow = window.open(Address, "Popup", "width=695,height=600,left=100,top=200,resizable=yes,scrollbars=yes");
  NewWindow.focus();
}
// -->
</script>
 <!-- img src="http://www.kitco.com/scripts/counter/counter.pl?txtonlyE.txt" width="1" height="1" -->
<!-- Google-Analytics Code-->
<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-4074364-3']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
</script>
</body>
</html>

More specifically, I am looking to capture the following data:

--------------------------------------------------------------------------------
London Fix          GOLD          SILVER       PLATINUM           PALLADIUM
               AM       PM                  AM       PM         AM       PM
--------------------------------------------------------------------------------
Jun 19,2012   1628.50   NA        28.8100   1486.00   1486.00   629.00   634.00 
Jun 18,2012   1623.50   1615.50   28.4300   1486.00   1484.00   626.00   628.00 
--------------------------------------------------------------------------------

Does anybody have any suggestions for how I can do this using PHP?



1 Answer


Quick and dirty regex method:

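// Fetch the text-only market page.
$data = file_get_contents('http://www.kitco.com/texten/texten.html');
if ($data === false) {
    die('Could not fetch the page');
}

// Match rows such as "Jun 19,2012   1628.50   NA   28.8100 ...":
// a month abbreviation, "day,year", then one to seven values (numbers or NA).
preg_match_all('/([A-Z]{3,5}\s+[0-9]{1,2},[0-9]{4}\s+([0-9.NA]{2,10}\s+){1,7})/si', $data, $result);

$records = array();
foreach ($result[1] as $row) {
    // trim() drops the trailing whitespace that would otherwise produce an empty last field.
    $temp = preg_split('/\s+/', trim($row));
    // Rebuild the date as the array key, e.g. "Jun 19,2012".
    $index = array_shift($temp) . ' ' . array_shift($temp);
    $records[$index] = implode(',', $temp);
}
print_r($records);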

Note that you'd probably want to add some validation, error handling, etc.
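
One possible shape for that validation is sketched below. It is not part of the original answer: the function name parseLondonFix and the column names (gold_am, gold_pm, silver, and so on) are hypothetical, chosen to mirror the London Fix header on the page. The sketch assumes each data row sits on its own line with exactly seven values, treats "NA" as a missing value, and skips any row that does not fit rather than guessing.

```php
<?php
// Hypothetical helper (not from the original answer): parse London Fix rows
// out of the page text and validate every field before keeping the row.
function parseLondonFix(string $text): array
{
    // Column order as shown in the page header; silver has a single fix.
    $columns = ['gold_am', 'gold_pm', 'silver', 'platinum_am',
                'platinum_pm', 'palladium_am', 'palladium_pm'];
    $records = [];

    // One row per line: "Jun 19,2012" followed by whitespace-separated values.
    $pattern = '/^\s*([A-Z][a-z]{2} \d{1,2},\d{4})((?:[ \t]+\S+)+)[ \t\r]*$/m';
    preg_match_all($pattern, $text, $rows, PREG_SET_ORDER);

    foreach ($rows as $row) {
        $values = preg_split('/\s+/', trim($row[2]));
        if (count($values) !== count($columns)) {
            continue; // unexpected layout: skip rather than guess
        }

        $date = DateTime::createFromFormat('!M j,Y', $row[1]);
        if ($date === false) {
            continue; // date did not parse
        }

        $record = [];
        foreach ($columns as $i => $name) {
            if ($values[$i] === 'NA') {
                $record[$name] = null;            // value not available
            } elseif (is_numeric($values[$i])) {
                $record[$name] = (float) $values[$i];
            } else {
                continue 2;                       // malformed value: drop the row
            }
        }
        $records[$date->format('Y-m-d')] = $record;
    }

    return $records;
}
```

It could be called as parseLondonFix($data), with $data holding the page text returned by file_get_contents(). Keying the result by an ISO date keeps rows sortable, and a null stands in for each "NA" so that callers can tell a missing fix apart from a zero price.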


Source: http://stackoverflow.com/questions/11103001/web-data-extraction-scraping-data-from-kitco-inc-text-only-market-page
