author | Ben Wagner <benjaminwagner@google.com> | 2017-09-27 14:45:47 -0400 |
---|---|---|
committer | Skia Commit-Bot <skia-commit-bot@chromium.org> | 2017-09-28 13:45:37 +0000 |
commit | 05572fa6e784b3037e94966c74c8a358348b2245 (patch) | |
tree | 97d1584708ded030488b988157cd47e7d9c3ddaf /site/dev | |
parent | 420c4cfcd75189c03c735b1f02dee360e705c3e9 (diff) |
Update docs on Skia bots.
No-Try: true
Docs-Preview: https://skia.org/?cl=52163
Change-Id: I2bcd73bc7597219e4748c28e9120b5138a0eb3d1
Reviewed-on: https://skia-review.googlesource.com/52163
Reviewed-by: Eric Boren <borenet@google.com>
Reviewed-by: Ravi Mistry <rmistry@google.com>
Commit-Queue: Ben Wagner <benjaminwagner@google.com>
Diffstat (limited to 'site/dev')
-rw-r--r-- | site/dev/sheriffing/trooper.md | 27 |
-rw-r--r-- | site/dev/testing/skialab.md | 225 |
-rw-r--r-- | site/dev/testing/swarmingbots.md | 69 |
3 files changed, 75 insertions, 246 deletions
diff --git a/site/dev/sheriffing/trooper.md b/site/dev/sheriffing/trooper.md
index bcd456051f..e49d7cfb0e 100644
--- a/site/dev/sheriffing/trooper.md
+++ b/site/dev/sheriffing/trooper.md
@@ -50,14 +50,10 @@ Tips for troopers
 - Install the Skia trooper Chrome extension (available [here](https://chrome.google.com/webstore/a/google.com/detail/alerts-for-skia-troopers/fpljhfiomnfioecagooiekldeolcpief)) to be able to see alerts quickly in the browser.
 - Where machines are located:
-  - Machine name like "skia-gce-NNN", "ct-gce-NNN" -> GCE
-  - Machine name ends with "a3", "a4", "m3" -> Chrome Golo
+  - Machine name like "skia-gce-NNN", "skia-i-gce-NNN", "ct-gce-NNN", "skia-ct-gce-NNN", "ct-xxx-builder-NNN" -> GCE
+  - Machine name ends with "a9", "m3" -> Chrome Golo/Labs
   - Machine name ends with "m5" -> CT bare-metal bots in Chrome Golo
-  - Machine name starts with "skiabot-" -> Chapel Hill lab
-  - Machine name starts with "win8" -> Chapel Hill lab (Windows machine
-    names can't be very long, so the "skiabot-shuttle-" prefix is dropped.)
-  - slave11-c3 is a Chrome infra GCE machine (not to be confused with the Skia
-    Buildbots GCE, which we refer to as simply "GCE")
+  - Machine name starts with "skia-e-", "skia-i-" (other than "skia-i-gce-NNN"), "skia-rpi-" -> Chapel Hill lab
 - The [chrome-infra hangout](https://goto.google.com/cit-hangout) is useful for questions regarding bots managed by the Chrome Infra team and to get
@@ -76,23 +72,12 @@ Tips for troopers
   username is chrome-bot and the password can be found on [Valentine](https://valentine.corp.google.com/) as "chrome-bot (Win GCE)".
+- To log in to other bots, see the [Skolo maintenance doc](https://docs.google.com/document/d/1zTR1YtrIFBo-fRWgbUgvJNVJ-s_4_sNjTrHIoX2vulo/edit#heading=h.2nq3yd1axg0n) remote access section.
+
 - If there is a problem with a bot in the Chrome Golo or Chrome infra GCE, the best course of action is to [file a bug](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure)
-  with the Chrome infra team. But if you know what you're doing:
-  - To access bots in the Chrome Golo,
-    [follow these instructions](https://chrome-internal.googlesource.com/infra/infra_internal/+/master/doc/ssh.md).
-    - Machine name ends with "a3" or "a4" -> ssh command looks like `ssh
-      build3-a3.chrome`
-    - Machine name ends with "m3" -> ssh command looks like `ssh build5-m3.golo`
-    - Machine name ends with "m5" -> ssh command looks like `ssh build1-m5.golo`.
-      [Example bug](https://bugs.chromium.org/p/chromium/issues/detail?id=638193) to file to Infra Labs.
-    - For MacOS and Windows bots, you will be prompted for a password, which is
-      stored on [Valentine](https://valentine.corp.google.com/) as "Chrome Golo,
-      Perf, GPU bots - chrome-bot".
-  - To access bots in the Chrome infra GCE -> command looks like `gcutil
-    --project=google.com:chromecompute ssh --ssh_user=default slave11-c3` (or
-    use the ccompute ssh script from the infra_internal repo).
+  with the Chrome infra team.
 - Read over the [Skolo maintenance doc](https://docs.google.com/document/d/1zTR1YtrIFBo-fRWgbUgvJNVJ-s_4_sNjTrHIoX2vulo/edit) for more detail on dealing with device alerts.
diff --git a/site/dev/testing/skialab.md b/site/dev/testing/skialab.md
deleted file mode 100644
index 5ddb0deaf2..0000000000
--- a/site/dev/testing/skialab.md
+++ /dev/null
@@ -1,225 +0,0 @@
-SkiaLab
-=======
-
-Overview
---------
-
-Skia's buildbots are hosted in three places:
-
-* Google Compute Engine. This is the preferred location for bots which don't
-  need to run on physical hardware, ie. anything that doesn't require a GPU,
-  stable performance numbers, or a specific hardware configuration. Most of our
-  compile bots live here, along with some non-GPU test bots on Linux and
-  Windows.
-* Chrome Golo. This is the preferred location for bots which require specific
-  hardware or OS configurations that are not supported by GCE. We have several
-  Mac, Linux, and Windows bots in the Golo.
-* The local SkiaLab in Chapel Hill. Anything we can't get in GCE or the Golo
-  lives here. This includes newer or uncommon GPUs and all Android, ChromeOS,
-  and iOS devices.
-
-This page covers the local SkiaLab in Chapel Hill.
-
-
-Layout
-------
-
-The SkiaLab consists of three wireframe racks which hold machines connected to
-two KVM switches. Each KVM switch has a monitor, mouse, and keyboard and is the
-primary mode of access to the lab machines. In general, the machines are on the
-same rack as the KVM switch used to access them. The switch nearest the door
-(labeled "DOOR"), is connected to machines on its own rack as well as a smaller
-rack closer to the door.
-
-Each machine is labeled with its hostname and the number or letter used to
-access it on the KVM switch. Android devices are located on the rack nearest
-the interior of the office (the KVM switch is labeled "OFFICE"). They are
-labeled with their serial number and the name of the buildslave they are
-associated with. Each device connects to a host machine, either directly or
-by way of a powered USB hub.
-
-**Disclaimer: Please ONLY make changes on a lab machine as a last resort, as it
-is disruptive to the running bots and can leave the machines in a dirty state.
-If you must make changes, such as cloning a copy of Skia to run tests and debug
-failures, be sure to clean up after yourself. If a permanent change needs to be
-made on the machine (such as a driver update), please contact an infra team
-member.**
-
-
-Common Tasks
-------------
-
-### Locating the host machine for a failing bot
-
-Sometimes failures can only be reproduced on a particular hardware
-configuration. In these cases, it is sometimes necessary to log into the host
-machine where a failing bot is running in order to debug the failure.
-
-From the [Status](https://status.skia.org/) page:
-
-1. Click on the box associated with a failed build.
-2. A popup will appear with some information about the build, including the
-   builder and buildslave. Click the "Lookup" link next to "Host machine". This
-   will bring you to the [SkiaLab Hosts](https://status.skia.org/hosts) page,
-   which contains information about the machines in the lab, pre-filtered to
-   select the machine which runs the buildslave in question.
-3. The information box will display the hostname of the machine as well as the
-   KVM switch and number used to access the machine, if the machine is in the
-   SkiaLab.
-4. Walk over to the lab. While standing at the KVM switch indicated by the host
-   information page, double tap \<ctrl\> and then press the number or letter from
-   the information page. It may be necessary to move or click the mouse to wake
-   the machine up.
-5. Log in to the machine if necessary. The password is stored in
-   [Valentine](https://valentine/) as "Chapel Hill buildbot slave password".
-
-### Rebooting a problematic Android device
-
-Follow the same process as above, with some slight changes:
-
-1. On the [Status](https://status.skia.org/) page, click the box for the failed
-   build.
-2. Click the "Lookup" link for the host machine. Remember the name of the
-   buildslave which ran the build.
-3. The hosts page will display the information used to access the host machine
-   for the device as well as the serial number for the device next to the name
-   of its buildsave.
-4. Walk over to the lab and find the Android device with the serial number from
-   the hosts page. Hold the power and volume-up buttons until the device
-   reboots.
-5. Access the host machine for the device, per the above instructions. Use the
-   `which_devices.py` script to verify that the device has re-attached. From
-   the home directory:
-
-        $ python buildbot/scripts/which_devices.py
-
-
-Maintenance Tasks
------------------
-
-### Bringing up a new buildbot host machine
-
-This assumes that we're just adding a host machine for a new buildbot slave,
-and doesn't cover how to make changes to the buildbot code to change the
-behavior of the builder itself.
-
-1. Obtain the machine itself and place it on the racks in the lab. Connect
-   power, ethernet, and KVM cables.
-2. If we already have a disk image appropriate for this machine, follow the
-   instructions for flashing a disk image to a machine below. Otherwise, follow
-   the instructions for bringing up a new machine from scratch.
-3. Power on the machine. Be sure to kill any buildbot processes that start up,
-   eg. `killall python` on Linux and Mac, and just close any cmd instances which
-   pop up on Windows.
-4. Set the hostname for the machine.
-5. Ensure that the machine is labeled with its hostname and KVM number.
-6. Add the new slave to the slaves.cfg file on the appropriate master, eg.
-   https://chromium.googlesource.com/chromium/tools/build/+/master/masters/master.client.skia/slaves.cfg,
-   and upload the change for code review.
-7. Add an entry for the new host machine to the slave_hosts_cfg.py file in the
-   Skia infra repo: https://skia.googlesource.com/buildbot/+/master/site_config/slave_hosts_cfg.py,
-   and upload it for review.
-8. Commit the change to add the slave to the master. Once it lands, commit the
-   slave_hosts_cfg.py change immediately afterward.
-9. Restart the build master. Either ask borenet@ to do this or file a
-   [ticket](https://code.google.com/p/chromium/issues/entry?template=Build%20Infrastructure&labels=Infra-Labs,Restrict-View-Google,Infra-Troopers&summary=Restart%20request%20for%20[%20name%20]&comment=Please%20provide%20the%20reason%20for%20restart.%0A%0ASet%20to%20Pri-0%20if%20immediate%20restarted%20is%20required,%20otherwise%20please%20set%20to%20Pri-1%20and%20the%20restart%20will%20happen%20when%20the%20trooper%20gets%20a%20free%20moment.) for a trooper to do it.
-10. Reboot the machine and monitor the build master to ensure that it connects.
-    This can take some time, since the bot needs to sync Chrome.
-
-
-### Bringing up a new Android bot
-
-1. Locate or add a host machine. We generally want to keep the number of
-   devices attached to each host below 5 or so. If a new host machine is
-   required, follow the above instructions for bringing up a new buildbot
-   host machine, with the exception that the slave corresponds to the Android
-   device, not the host machine itself.
-2. Ensure that the buildslave is not yet running:
-
-        $ killall python
-
-3. Disable MTP and PTP on the device. Some devices require one or the other to
-   be enabled; in that case, select PTP and choose to 'do nothing' when
-   attaching to the host machine.
-4. Connect the device to the host machine, either through a powered USB hub or
-   directly to the machine.
-5. Make sure that the device is in developer mode and that USB debugging is
-   enabled.
-6. Authorize the device for USB debugging on the host machine by checking the
-   "always allow" box on dialog box which appears on the Android device after
-   plugging it into the host.
-7. Ensure that the device appears as "connected" when you run the
-   `which_devices.py` script:
-
-        $ python buildbot/scripts/which_devices.py
-
-8. Reboot the machine to start the buildslave.
-
-
-### Bringing up a new machine from scratch
-
-TODO(borenet): Migrate from Google Docs.
-
-OS-specific instructions are available in a
-[Google Doc](https://docs.google.com/document/d/1X7Hvsj33AlBmj-KEWfFbmdCArUJJAICLkB7ipDcxRV8/edit)
-
-
-### Flashing a disk image to a machine
-
-1. Find the USB key labeled, "Clonezilla" in the SkiaLab and insert it into the
-   machine.
-2. Turn on the machine and load the boot menu. For Shuttle machines, press
-   \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and
-   press the \<option\> key at boot. Boot from the USB key. It's typically UEFI
-   and named something like "FlashBlu" or "Kanguru".
-3. At the Clonezilla menu, choose the "to RAM" option.
-4. Choose your preferred language.
-5. "Don't touch keymap".
-6. "Start Clonezilla".
-7. "device-image".
-8. "local_dev".
-9. Unplug the flash drive and plug in the external hard drive labeled, "Disk
-   images." Wait for the "Attached Enclosure device" message to appear, then
-   hit \<enter\>.
-10. Select the external drive to use for /home/partimag, something like,
-    "1000GB_ntfs_My_Passport".
-11. Select the bot_img directory.
-12. Hit \<enter\> to continue.
-13. "Beginner"
-14. "restoredisk"
-15. Select the image to use. Make sure that it's compatible with this machine.
-16. Choose the hard drive in the machine. It should be the only option.
-17. "y" and "y"
-18. Choose "reboot" after flashing the image to the machine.
-19. Set the hostname of the machine so that it doesn't conflict with any
-    existing machines.
-
-### Capturing a disk image
-
-1. Make sure that the machine is in a clean state: no pre-existing buildslave
-   checkouts, extra software, etc.
-2. Find the USB key labeled, "Clonezilla" in the SkiaLab and insert it into the
-   machine.
-3. Turn on the machine and load the boot menu. For Shuttle machines, press
-   \<del\> or \<esc\>. Mac machines require that you plug in the Mac keyboard and
-   press the \<option\> key at boot. Boot from the USB key. It's typically UEFI
-   and named something like "FlashBlu" or "Kanguru".
-4. At the Clonezilla menu, choose the "to RAM" option.
-5. Choose your preferred language.
-6. "Don't touch keymap".
-7. "Start Clonezilla".
-8. "device-image".
-9. "local_dev"
-10. Unplug the flash drive and plug in the external hard drive labeled, "Disk
-    images." Wait for the "Attached Enclosure device" message to appear, then
-    hit \<enter\>.
-11. Select the external drive to use for /home/partimag, something like,
-    "1000GB_ntfs_My_Passport".
-12. Select the bot_img directory.
-13. "Beginner"
-14. "savedisk"
-15. Choose a name for the disk image. The convention is:
-    `skiabot-<hardware type>-<OS>-<disk image revision #>`
-12. Choose the hard drive in the machine. It should be the only option.
-13. "y"
-14. Choose "reboot" or "shut down" when finished.
diff --git a/site/dev/testing/swarmingbots.md b/site/dev/testing/swarmingbots.md
new file mode 100644
index 0000000000..75cf38b3d4
--- /dev/null
+++ b/site/dev/testing/swarmingbots.md
@@ -0,0 +1,69 @@
+Skia Swarming Bots
+==================
+
+Overview
+--------
+
+Skia's Swarming bots are hosted in three places:
+
+* Google Compute Engine. This is the preferred location for bots which don't need to run on physical
+  hardware, ie. anything that doesn't require a GPU or a specific hardware configuration. Most of
+  our compile bots live here, along with some non-GPU test bots on Linux and Windows. We get
+  surprisingly stable performance numbers from GCE, despite very few guarantees about the physical
+  hardware.
+* Chrome Golo. This is the preferred location for bots which require specific hardware or OS
+  configurations that are not supported by GCE. We have several Mac, Linux, and Windows bots in the
+  Golo.
+* The Skolo (local Skia lab in Chapel Hill). Anything we can't get in GCE or the Golo lives
+  here. This includes a wider variety of GPUs and all Android, ChromeOS, iOS, and other devices.
+
+[go/skbl](https://goto.google.com/skbl) lists all Skia Swarming bots.
+
+Adding new jobs
+---------------
+
+See [Skia Automated Testing](automated_testing) for an overview of how jobs and tasks are executed
+by the Skia Task Scheduler.
+
+If you would like to add jobs to build or test new configurations, please file a [New Bot
+Request](https://bugs.chromium.org/p/skia/issues/entry?template=New+Bot+Request).
+
+If you know that the new jobs will need new hardware or you aren't sure which existing bots should
+run the new jobs, assign to jcgregorio. Once the Infra team has allocated the hardware, we will
+assign back to you to complete the process.
+
+Generally it's possible to copy an existing job and make changes to accomplish what you want. You
+will need to add the new job to
+[infra/bots/jobs.json](https://skia.googlesource.com/skia/+/master/infra/bots/jobs.json). In some
+cases, you will need to make changes to recipes:
+
+* If there are new GN flags or compiler options:
+  [infra/bots/recipe_modules/flavor/gn_flavor.py](https://skia.googlesource.com/skia/+/master/infra/bots/recipe_modules/flavor/gn_flavor.py)
+* If there are modifications to dm flags:
+  [infra/bots/recipes/test.py](https://skia.googlesource.com/skia/+/master/infra/bots/recipes/test.py)
+* If there are modifications to nanobench flags:
+  [infra/bots/recipes/perf.py](https://skia.googlesource.com/skia/+/master/infra/bots/recipes/perf.py)
+
+If you need to do something more complicated, or if you are not sure how to add and configure the
+new jobs, please ask for help from borenet, benjaminwagner, or mtklein.
+
+Debugging
+---------
+
+If you need a physical machine/device to debug an issue, the [current
+Trooper](http://skia-tree-status.appspot.com/trooper) can loan one from the Skolo. For Internet
+access, you can connect to GoogleGuest WiFi.
+
+If you need to make changes on a Skolo device, please check with an Infra team member. Most can be
+flashed/imaged back to a clean state, but others can not.
+
+If a permanent change needs to be made on the machine (such as an OS or driver update), please [file
+a bug](https://bugs.chromium.org/p/skia/issues/entry?template=Infrastructure+Bug) and assign to
+jcgregorio for reassignment.
+
+
+Maintenance Tasks
+-----------------
+
+See the [Skolo maintenance
+doc](https://docs.google.com/document/d/1zTR1YtrIFBo-fRWgbUgvJNVJ-s_4_sNjTrHIoX2vulo/edit).
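
Reviewer note: the updated "Where machines are located" list in trooper.md is effectively a lookup on the machine name, and it could be sketched in Python roughly as follows. This is an illustration only and not part of the commit. The function name `locate_machine` is hypothetical, and reading "NNN" as one or more digits and "xxx" as any lowercase alphanumeric run is an assumption. The order of the checks is also an assumption, chosen so that "skia-i-gce-NNN" resolves to GCE before the broader "skia-i-" prefix is tried.

```python
import re

# Name patterns transcribed from the updated trooper.md list.
# Assumption: "NNN" means one or more digits, "xxx" any lowercase
# letters/digits. The doc does not define either placeholder precisely.
_GCE_PATTERNS = [
    re.compile(r"^skia-gce-\d+$"),
    re.compile(r"^skia-i-gce-\d+$"),
    re.compile(r"^ct-gce-\d+$"),
    re.compile(r"^skia-ct-gce-\d+$"),
    re.compile(r"^ct-[a-z0-9]+-builder-\d+$"),
]


def locate_machine(name: str) -> str:
    """Return the hosting location implied by a bot's machine name."""
    # GCE names are checked first so "skia-i-gce-NNN" is not mistaken
    # for a Chapel Hill "skia-i-" machine.
    if any(pattern.match(name) for pattern in _GCE_PATTERNS):
        return "GCE"
    if name.endswith("m5"):
        return "CT bare-metal bots in Chrome Golo"
    if name.endswith(("a9", "m3")):
        return "Chrome Golo/Labs"
    if name.startswith(("skia-e-", "skia-i-", "skia-rpi-")):
        return "Chapel Hill lab"
    # Anything else is not covered by the list in trooper.md.
    return "unknown"
```

For example, `locate_machine("skia-i-gce-100")` falls under the GCE rule, while a name such as `skia-rpi-003` (a made-up example) falls under the Chapel Hill rule. Names the list does not cover come back as `"unknown"` rather than being guessed.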